https://bugs.freedesktop.org/show_bug.cgi?id=97260
Bug ID: 97260
Summary: R9 290 low performance in Linux 4.7
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Keywords: regression
Severity: major
Priority: high
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: magolobel@yahoo.com.br
Starting with Linux kernel 4.7, the Radeon R9 290 shows very low performance, about 20-30% of what it delivers with kernels up to 4.6. As can be seen in this article, the performance regression started even before the release of kernel 4.7 RC1, using Alex Deucher's drm-next-4.7 branch:
http://www.phoronix.com/scan.php?page=article&item=radeon-drm-next-47&am...
Some time later it was speculated that the problem may be with DPM:
http://www.phoronix.com/scan.php?page=news_item&px=Linux-4.7-R9-290-Regr...
To replicate the bug, install 64-bit Ubuntu 16.04 and then install kernel 4.7 from here (preferably later than rc5):
http://kernel.ubuntu.com/~kernel-ppa/mainline/
The performance regression will show right away. Rebooting the system and choosing a kernel version up to 4.6 will fix the bug.
--- Comment #1 from Alex Deucher alexdeucher@gmail.com --- Can you bisect?
--- Comment #2 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com --- Just a note:
I am not experiencing such a performance regression on R9 390 with kernel 4.7.0 - 4.8.0-rc1, amdgpu.ko kernel module, Gentoo Linux.
--- Comment #3 from Clésio Luiz magolobel@yahoo.com.br --- (In reply to Alex Deucher from comment #1)
Can you bisect?
Unfortunately, no. The best I can say is that it started very early in the kernel 4.7 development cycle, since the Phoronix benchmark in the first link was made before 4.7 RC1 was out, using Alex Deucher's drm-next-4.7 branch, on 05/14/2016.
The problem is still present in 4.8 RC1.
--- Comment #4 from Alex Deucher alexdeucher@gmail.com --- Unfortunately, none of the power management code for these asics changed during that time period, so nothing jumps out as a likely culprit. Did you also change your mesa stack or firmware in that time period? With everything else the same, does the issue go away when using an older kernel? If so, any chance you could narrow down which kernels have the issue and which do not?
--- Comment #5 from Alex Deucher alexdeucher@gmail.com --- Can you verify that 4.6 works properly?
--- Comment #6 from Alex Deucher alexdeucher@gmail.com --- Does reverting 5e031d9fe8b0741f11d49667dfc3ebf5454121fd (drm/radeon/pm: update current crtc info after setting the powerstate) help?
Kai kai@dev.carbon-project.org changed:
What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |kai@dev.carbon-project.org
--- Comment #7 from Kai kai@dev.carbon-project.org --- I can confirm this issue with 4.7.0 (vs. 4.6.4 in my case). In XCOM 2 I'm losing a third in FPS (down to <= 19 FPS from 26-29 FPS with 4.6.4, "measured" with Gallium's HUD) or about a fifth in The Talos Principle 64 bit (down to 43 Avg FPS from 53 Avg FPS, measured by the benchmark of TTP when run for 60 seconds). The only change between these numbers is the different kernel (and that's with all the VM faults I'm seeing for XCOM 2 with 4.6.4 on occasion and haven't been able to reproduce with 4.7.0 yet).
The stack I'm using (Debian testing as a base) is:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/3fb4a9b3b3
libdrm: 2.4.70-1
LLVM: SVN:trunk/r277307 (4.0 devel)
X.Org: 2:1.18.4-1
Linux: 4.6.4 / 4.7.0
Firmware: Git:master/c170c8d957 (placed in /lib/firmware/updates; fallback would be firmware-amd-graphics/20160110-1)
libclc: Git:master/785bfd3719
DDX: 1:7.7.0-1
I'm not going to be able to do a bisect before the weekend and even then I don't have high hopes, since the last time I tried bisecting the kernel all the merge commits screwed me over and I was unable to find a single offending commit.
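As a quick check of the "a third" and "a fifth" figures quoted in comment #7, the relative drop can be computed from the reported FPS values. This is only an illustrative sketch; the helper name pct_drop is hypothetical and not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: relative drop from $1 (before) to $2 (after),
# printed as a percentage with one decimal place.
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}
```

With the numbers from comment #7, "pct_drop 53 43" prints 18.9 (The Talos Principle, roughly a fifth), while "pct_drop 26 19" and "pct_drop 29 19" print 26.9 and 34.5 (XCOM 2, roughly a third at the upper end).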
--- Comment #8 from Kai kai@dev.carbon-project.org --- (In reply to Alex Deucher from comment #6)
Does reverting 5e031d9fe8b0741f11d49667dfc3ebf5454121fd (drm/radeon/pm: update current crtc info after setting the powerstate) help?
No, it does not for me (I fetched the patch from https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=5e031d9fe8b0741f11d49667dfc3ebf5454121fd and applied it with patch -p1 -R < /path/to.patch). The Talos Principle is still down to 43 FPS, haven't tested XCOM 2 or anything else.
Btw, not sure if it can help tracking down the cause: the GPU doesn't seem to go into its highest performance mode or at least the fan is not reaching its highest RPM when I'm on 4.7.0 (with or without the revert).
--- Comment #9 from Alex Deucher alexdeucher@gmail.com --- (In reply to Kai from comment #8)
Btw, not sure if it can help tracking down the cause: the GPU doesn't seem to go into its highest performance mode or at least the fan is not reaching its highest RPM when I'm on 4.7.0 (with or without the revert).
You can check by running apps and monitoring the content of /sys/kernel/debug/dri/64/radeon_pm_info. Do you see it scaling up under load on the problematic kernels? Does forcing the clocks to high via sysfs work ok?
--- Comment #10 from Kai kai@dev.carbon-project.org --- (In reply to Alex Deucher from comment #9)
You can check by running apps and monitoring the content of /sys/kernel/debug/dri/64/radeon_pm_info. Do you see it scaling up under load on the problematic kernels? Does forcing the clocks to high via sysfs work ok?
While the clocks *do* scale up on 4.7.0, they don't go as high as on 4.6.4. On 4.6.4 I'm reaching the maximum clocks for my GPU (power level avg sclk: 98000 mclk: 125000). With 4.7.0 the highest I'm seeing is "power level avg sclk: 90843 mclk: 125000" and the overall level is way lower (lots of sclk values starting with a 3) compared to 4.6.4, where most values are > 60000 for sclk.
Note: I used watch to dump /sys/kernel/debug/dri/64/radeon_pm_info every five seconds to a file. Between runs the actual numbers logged vary, but the general trend stays the same.
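The sampling described above could be sketched roughly as follows. This is not the exact procedure used in the thread: the function names log_pm_info and max_sclk are hypothetical, the loop replaces watch, and the parser only assumes that each sample contains "sclk: N" as in the values quoted in this comment. Reading the debugfs file normally requires root.

```shell
# Hypothetical helper: append $3 samples of file $1 to log $2,
# waiting $4 seconds between samples.
log_pm_info() {
    src=$1; out=$2; samples=$3; interval=$4
    i=0
    while [ "$i" -lt "$samples" ]; do
        cat "$src" >> "$out"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Hypothetical helper: print the largest value that follows an
# "sclk:" field anywhere in log file $1 (0 if none is found).
max_sclk() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "sclk:" && $(i + 1) + 0 > max)
                max = $(i + 1) + 0
    } END { print max + 0 }' "$1"
}
```

For example, "log_pm_info /sys/kernel/debug/dri/64/radeon_pm_info pm.log 12 5" would collect one minute of samples, and "max_sclk pm.log" would then report the highest engine clock seen during the run.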
Writing "high" to /sys/class/drm/card0/device/power_dpm_force_performance_level does indeed lock the clocks to the highest values (sclk: 98000 mclk: 125000) on 4.7.0 and recovers almost all of the lost performance. There's still a small hit to performance.
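A minimal sketch of forcing the performance level through that sysfs node is shown below. The function name set_dpm_level is hypothetical; only "high" is confirmed by this thread, while "auto" and "low" are included on the assumption that the radeon driver accepts them as the other levels. The card0 path is the one quoted above and may differ on other systems, and writing to it requires root.

```shell
# Hypothetical helper: write a DPM performance level to the sysfs node.
# $1: level; $2: optional node path (defaults to the path quoted above).
set_dpm_level() {
    level=$1
    node=${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}
    case $level in
        auto|low|high)
            printf '%s\n' "$level" > "$node"
            ;;
        *)
            echo "unknown level: $level" >&2
            return 1
            ;;
    esac
}
```

Running "set_dpm_level high" as root before a benchmark and "set_dpm_level auto" afterwards would restore normal clock scaling once testing is done.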
Note: I've only tested "The Talos Principle" so far, if you need me to test XCOM 2 or other titles as well, let me know.
Do you still need the bisect?
--- Comment #11 from Alex Deucher alexdeucher@gmail.com --- (In reply to Kai from comment #10)
Do you still need the bisect?
That would be great. As I mentioned before, there weren't really any dpm related changes for these parts in 4.7 (there weren't many changes to radeon period), so I don't really have any ideas on what would have caused it right now.
Kai kai@dev.carbon-project.org changed:
What    |Removed                             |Added
----------------------------------------------------------------------------
Keywords|                                    |bisected
CC      |                                    |michel@daenzer.net
Summary |R9 290 low performance in Linux 4.7 |[bisected] R9 290 low performance in Linux 4.7
--- Comment #12 from Kai kai@dev.carbon-project.org --- Ok, after 14 steps of bisection I identified the first bad commit as:
c63dd758589b1f7e8398841d1f443f06ebfbcefc is the first bad commit
commit c63dd758589b1f7e8398841d1f443f06ebfbcefc
Author: Michel Dänzer michel.daenzer@amd.com
Date: Fri Apr 1 18:51:34 2016 +0900

    drm/radeon: Support DRM_MODE_PAGE_FLIP_ASYNC

    When this flag is set, we program the hardware to execute the flip during horizontal blank (i.e. for the next scanline) instead of during vertical blank (i.e. for the next frame).

    Currently this is only supported on ASICs which have a page flip completion interrupt (>= R600), and only if the use_pflipirq parameter has value 2 (the default).

    Reviewed-by: Christian König christian.koenig@amd.com
    Signed-off-by: Michel Dänzer michel.daenzer@amd.com
    Signed-off-by: Alex Deucher alexander.deucher@amd.com

:040000 040000 2f3d8295e7fa2809a3546a23c64da33311e624b9 6cd9fd9b53df0942efab559295e4c11fc6cc0463 M drivers
An additional build of 4.7.0 with c63dd758589b1f7e8398841d1f443f06ebfbcefc reverted maintains the performance level of 4.6.x.
Adding Michel to the CC list of this bug, since he authored the offending commit. Let me know, if you need anything else.
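For readers who want to repeat such a bisection, the outline below is a rough sketch, not the exact commands used here (the attached bisect log is the authoritative record). The good and bad revisions v4.6 and v4.7 are assumptions standing in for the 4.6.4 and 4.7.0 kernels tested in this thread, and bisect_steps is a hypothetical helper that only illustrates why about 14 steps are plausible: each step halves the remaining range.

```shell
# Outline of the workflow (shown as comments, to be run in a kernel tree):
#   git bisect start
#   git bisect bad v4.7        # assumed bad revision
#   git bisect good v4.6       # assumed good revision
#   ... build, boot, benchmark, then mark the result of each step:
#   git bisect good            # or: git bisect bad
#   git bisect log > bisect.log
#   git bisect reset

# Hypothetical helper: smallest number of halvings needed to narrow
# a range of $1 candidate commits down to a single commit.
bisect_steps() {
    n=$1; steps=0; span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

For instance, "bisect_steps 10000" prints 14, so a range on the order of ten thousand commits is consistent with the 14 steps reported above; the actual commit count is not stated in this thread.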
--- Comment #13 from Kai kai@dev.carbon-project.org --- Created attachment 125755 --> https://bugs.freedesktop.org/attachment.cgi?id=125755&action=edit git bisect log output for the bisection run leading to c63dd758589b1f7e8398841d1f443f06ebfbcefc
--- Comment #14 from Alexander Tsoy alexander@tsoy.me --- (In reply to Kai from comment #12)
This is interesting. The same bad commit was found here: https://bugzilla.kernel.org/show_bug.cgi?id=119631
--- Comment #15 from nadro-linux@wp.pl --- I also noticed a big performance regression since kernel v4.7rc1. With kernel 4.6 all works fine. I have Radeon R9 380 2GB (Sapphire ITX Compact). I'll try to build a custom kernel without "support for DRM_MODE_PAGE_FLIP_ASYNC" commit and check if performance will be ok.
--- Comment #16 from Alex Deucher alexdeucher@gmail.com --- Does just reverting this chunk fix the issue?
@@ -1630,6 +1631,9 @@ int radeon_modeset_init(struct radeon_device *rdev)
 
 	rdev->ddev->mode_config.funcs = &radeon_mode_funcs;
 
+	if (radeon_use_pflipirq == 2 && rdev->family >= CHIP_R600)
+		rdev->ddev->mode_config.async_page_flip = true;
+
 	if (ASIC_IS_DCE5(rdev)) {
 		rdev->ddev->mode_config.max_width = 16384;
 		rdev->ddev->mode_config.max_height = 16384;
--- Comment #17 from nadro-linux@wp.pl --- Thanks for your help. When I removed the following line: "adev->ddev->mode_config.async_page_flip = true;" from all three files "dce_v8_0.c", "dce_v10_0.c", "dce_v11_0.c" (I use the AMDGPU driver), performance is fine again!
--- Comment #18 from Michel Dänzer michel@daenzer.net --- Created attachment 125808 --> https://bugs.freedesktop.org/attachment.cgi?id=125808&action=edit radeon: Add some page flip debugging output
Well, that's a surprising result of the bisection.
I can imagine two possible causes, or possibly some combination thereof:
* The processing of asynchronous flips or the corresponding completion interrupts is delayed for some reason
* Using flips instead of blits for buffer swaps lowers the load on the GPU 3D engine, so the SMU doesn't switch to higher clocks
The attached debugging patch should give us more information about the former. With it applied, run the following while an affected application is running in fullscreen:
sudo sh -c 'echo 2 >/sys/module/drm/parameters/debug'; sleep 1; sudo sh -c 'echo 0 >/sys/module/drm/parameters/debug'
Then attach the resulting dmesg output.
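The capture step above can also be wrapped in a small function, which makes it easier to repeat. This is only a sketch: the name capture_flip_debug is hypothetical, the values 2 and 0 and the one-second window come from the command above, and the default path requires root to write.

```shell
# Hypothetical helper: enable DRM debug level 2 for one second, then
# turn it off again. $1: optional parameter file (defaults to the
# path used in the command above).
capture_flip_debug() {
    param=${1:-/sys/module/drm/parameters/debug}
    echo 2 > "$param"
    sleep 1
    echo 0 > "$param"
}
```

After running "capture_flip_debug" as root while the affected application is in fullscreen, something like "dmesg > flip-debug.txt" saves the output for attaching to the bug.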
BTW, does the problem still happen with Alex's current drm-next-4.9-wip branch?
--- Comment #19 from Michel Dänzer michel@daenzer.net --- BTW, there are some potential workarounds:
* Disable DRI3 for affected games with the environment variable LIBGL_DRI3_DISABLE=1
* Enable sync-to-vblank in affected applications, or force it with vblank_mode=3
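Both workarounds are environment variables, so they can be applied per application without changing the system. The wrappers below are a minimal sketch; the function names are hypothetical, while the variable names and values are the ones given in the list above.

```shell
# Hypothetical wrapper: run a command with DRI3 disabled.
run_without_dri3() {
    LIBGL_DRI3_DISABLE=1 "$@"
}

# Hypothetical wrapper: run a command with sync-to-vblank forced on.
run_with_forced_vsync() {
    vblank_mode=3 "$@"
}
```

For example, "run_without_dri3 ./game" starts a (hypothetical) ./game binary with DRI3 disabled only for that process.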
--- Comment #20 from Kai kai@dev.carbon-project.org --- Created attachment 125820 --> https://bugs.freedesktop.org/attachment.cgi?id=125820&action=edit dmesg output with additional debug info from attachment 125808
(In reply to Alex Deucher from comment #16)
Does just reverting this chunk fix the issue?
@@ -1630,6 +1631,9 @@ int radeon_modeset_init(struct radeon_device *rdev)
 
 	rdev->ddev->mode_config.funcs = &radeon_mode_funcs;
 
+	if (radeon_use_pflipirq == 2 && rdev->family >= CHIP_R600)
+		rdev->ddev->mode_config.async_page_flip = true;
+
 	if (ASIC_IS_DCE5(rdev)) {
 		rdev->ddev->mode_config.max_width = 16384;
 		rdev->ddev->mode_config.max_height = 16384;
Yes, just removing the default enable here has the same effect as reverting the entire patch.
(In reply to Michel Dänzer from comment #18)
Created attachment 125808 radeon: Add some page flip debugging output
Well, that's a surprising result of the bisection.
I can imagine two possible causes, or possibly some combination thereof:
- The processing of asynchronous flips or the corresponding completion interrupts is delayed for some reason
- Using flips instead of blits for buffer swaps lowers the load on the GPU 3D engine, so the SMU doesn't switch to higher clocks
The attached debugging patch should give us more information about the former. With it applied, run the following while an affected application is running in fullscreen:
sudo sh -c 'echo 2 >/sys/module/drm/parameters/debug'; sleep 1; sudo sh -c 'echo 0 >/sys/module/drm/parameters/debug'
Then attach the resulting dmesg output.
Here you go. That was generated by running XCOM 2.
BTW, does the problem still happen with Alex's current drm-next-4.9-wip branch?
Haven't tested that yet. Maybe somebody else can do that. ;-)
(In reply to Michel Dänzer from comment #19)
BTW, there are some potential workarounds:
- Disable DRI3 for affected games with the environment variable LIBGL_DRI3_DISABLE=1
- Enable sync-to-vblank in affected applications, or force it with vblank_mode=3
Well, this is going to be odd: I had VSync enabled in XCOM, since without that option I got poorer performance in the past than with it. Now, after your note here, I actually *disabled* the VSync option in the game. 4.6.4 (or 4.7.0 without the offending commit/with the enable removed) no longer shows a performance difference, and I'm getting ~30 FPS in XCOM 2. BUT vanilla 4.7.0 (with your ASYNC patch) and VSync turned off in the game ALSO gives me ~30 FPS! I'd still say this is a regression, as there is no difference without your patch, but maybe this information can help you narrow down the cause?
I hope I haven't missed any open question. Let me know if you need anything else.
--- Comment #21 from Michel Dänzer michel@daenzer.net --- Created attachment 125838 --> https://bugs.freedesktop.org/attachment.cgi?id=125838&action=edit loader/dri3: Always use 3 back buffers when flipping
Does this Mesa patch help?
--- Comment #22 from Kai kai@dev.carbon-project.org --- (In reply to Michel Dänzer from comment #21)
Created attachment 125838 loader/dri3: Always use 3 back buffers when flipping
Does this Mesa patch help?
Yes! With attachment 125838 applied on top of Mesa master 607ab6d3bf I'm seeing the same performance on 4.7.1 as on 4.6.4. You can have my Tested-by: Kai Wasserbäch kai@dev.carbon-project.org
The full stack used (Debian testing as a base) was:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/3fb4a9b3b3 / Git:master/607ab6d3bf + attachment 125838
libdrm: 2.4.70-1
LLVM: SVN:trunk/r278555 (4.0 devel)
X.Org: 2:1.18.4-1
Linux: 4.7.1
Firmware: Git:master/c170c8d957 (placed in /lib/firmware/updates; fallback would be firmware-amd-graphics/20160110-1)
libclc: Git:master/785bfd3719
DDX: 1:7.7.0-1
--- Comment #23 from Michel Dänzer michel@daenzer.net --- Note that since the crash happens when Xorg tries to use the GPU, testing with DRI3 and Xorg not using the GPU directly cannot trigger the crash.
--- Comment #24 from Michel Dänzer michel@daenzer.net --- Sorry, wrong bug. :(
Michel Dänzer michel@daenzer.net changed:
What      |Removed                         |Added
----------------------------------------------------------------------------
Version   |unspecified                     |git
Component |DRM/Radeon                      |Mesa core
QA Contact|                                |mesa-dev@lists.freedesktop.org
Product   |DRI                             |Mesa
Assignee  |dri-devel@lists.freedesktop.org |mesa-dev@lists.freedesktop.org