https://bugzilla.kernel.org/show_bug.cgi?id=44121
Summary: Reproducible GPU lockup CP stall on Radeon HD 6450
Product: Drivers
Version: 2.5
Kernel Version: 3.5-rc5
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
AssignedTo: drivers_video-dri@kernel-bugs.osdl.org
ReportedBy: khali@linux-fr.org
CC: alexdeucher@gmail.com
Regression: Yes
With kernels 3.5-rc3 to 3.5-rc5, I hit a GPU lockup CP stall whenever I perform certain actions in Firefox: when a site requires authentication, or when the download target selection window pops up. I'm running Gnome 3.2 on openSUSE 12.1.
When this happens, the whole Gnome interface freezes, with gnome-shell stuck at 100% CPU. In the kernel logs I see the following:
radeon 0000:08:00.0: GPU lockup CP stall for more than 10000msec
radeon 0000:08:00.0: GPU lockup (waiting for 0x00000000000113f3 last fence id 0x00000000000113f0)
radeon 0000:08:00.0: GPU softreset
radeon 0000:08:00.0: GRBM_STATUS=0xE55008A0
radeon 0000:08:00.0: GRBM_STATUS_SE0=0xEC000001
radeon 0000:08:00.0: GRBM_STATUS_SE1=0x00000007
radeon 0000:08:00.0: SRBM_STATUS=0x200000C0
radeon 0000:08:00.0: GRBM_SOFT_RESET=0x00007F6B
radeon 0000:08:00.0: GRBM_STATUS=0x00003828
radeon 0000:08:00.0: GRBM_STATUS_SE0=0x00000007
radeon 0000:08:00.0: GRBM_STATUS_SE1=0x00000007
radeon 0000:08:00.0: SRBM_STATUS=0x200000C0
radeon 0000:08:00.0: GPU reset succeed
[drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
radeon 0000:08:00.0: WB enabled
radeon 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88013557bc00
[drm] ring test on 0 succeeded in 0 usecs
[drm] ib test on ring 0 succeeded in 0 usecs
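For anyone comparing several such samples, the "waiting for ... last fence id ..." message can be reduced to a single number: how many fences were still outstanding when the stall was detected. The following is a minimal Python sketch. The function name `parse_lockup` and the regular expression are illustrative assumptions derived from the one log sample above, not from the driver source.

```python
import re

# Pattern for the "GPU lockup (waiting for ... last fence id ...)" message.
# Assumption: the wording matches the single sample quoted in this report.
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)\)"
)


def parse_lockup(line):
    """Return (waiting, last_signalled) fence ids, or None if no match."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


sample = (
    "radeon 0000:08:00.0: GPU lockup (waiting for 0x00000000000113f3 "
    "last fence id 0x00000000000113f0)"
)
waiting, last = parse_lockup(sample)
outstanding = waiting - last  # fences emitted but not yet signalled
print(f"waiting={waiting:#x} last={last:#x} outstanding={outstanding}")
```

For the sample above, the difference between the two fence ids is 3.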
I have more samples if needed.
No problem when doing the same with kernel 3.4.4.
I ran "git bisect" and found that reverting the following commit fixes the problem:
commit 416a2bd274566a6f607a271f524b2dc0b84d9106
Author: Alex Deucher alexander.deucher@amd.com
Date: Thu May 31 19:00:25 2012 -0400
drm/radeon: fixup tiling group size and backendmap on r6xx-r9xx (v4)
Tiling group size is always 256bits on r6xx/r7xx/r8xx/9xx. Also fix and simplify render backend map. This now properly sets up the backend map on r6xx-9xx which should improve 3D performance.
Vadim benchmarked also: Some benchmarks on juniper (5750), fullscreen 1920x1080, first result - kernel 3.4.0+ (fb21affa), second - with these patches:
Lightsmark: 91 fps => 123 fps +35%
Doom3: 74 fps => 101 fps +36%

Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Jerome Glisse jglisse@redhat.com
Signed-off-by: Dave Airlie airlied@redhat.com
Let me know if you need more debugging information, I'll do whatever I can to help.
--- Comment #1 from Alex Deucher alexdeucher@gmail.com 2012-07-03 14:53:04 ---
Can you dump the following registers using radeonreg or avivotool (http://cgit.freedesktop.org/~airlied/radeontool/) with the patch applied and reverted and attach both results?
CC_RB_BACKEND_DISABLE (0x98F4)
CC_SYS_RB_BACKEND_DISABLE (0x3F88)
GC_USER_RB_BACKEND_DISABLE (0x9B7C)
CC_GC_SHADER_PIPE_CONFIG (0x8950)
GB_BACKEND_MAP (0x98FC)
(as root): radeonreg regmatch 0x98F4 etc.
--- Comment #2 from Jean Delvare khali@linux-fr.org 2012-07-03 15:27:00 ---
With 3.5-rc5 kernel (failing):

0x98F4 0x00000001 (1)
0x3F88 0x00000001 (1)
0x9B7C 0x00000000 (0)
0x8950 0xfffcf001 (-200703)
0x98FC 0x00000000 (0)
With commit 416a2bd2 reverted (working):

0x98F4 0x00000001 (1)
0x3F88 0x00000001 (1)
0x9B7C 0x00fe0000 (16646144)
0x8950 0xfffcf001 (-200703)
0x98FC 0x00000000 (0)
So the only value that differs is that of register GC_USER_RB_BACKEND_DISABLE (0x9B7C).
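The comparison above can be done mechanically when more registers are involved. Here is a minimal Python sketch that parses the two dumps from this comment and reports the registers whose values differ. The helper names `parse_dump` and `as_signed32` are hypothetical, and the parser assumes the "address value (decimal)" layout shown above rather than any documented radeonreg output format.

```python
def parse_dump(text):
    """Parse 'ADDR VALUE (decimal)' lines into a {address: value} dict."""
    regs = {}
    for line in text.strip().splitlines():
        addr, value = line.split()[:2]
        regs[int(addr, 16)] = int(value, 16)
    return regs


def as_signed32(value):
    """Interpret a 32-bit register value as a signed integer, which is
    how the number in parentheses appears to be printed in the dumps."""
    return value - (1 << 32) if value & 0x80000000 else value


failing = parse_dump("""
0x98F4 0x00000001 (1)
0x3F88 0x00000001 (1)
0x9B7C 0x00000000 (0)
0x8950 0xfffcf001 (-200703)
0x98FC 0x00000000 (0)
""")

working = parse_dump("""
0x98F4 0x00000001 (1)
0x3F88 0x00000001 (1)
0x9B7C 0x00fe0000 (16646144)
0x8950 0xfffcf001 (-200703)
0x98FC 0x00000000 (0)
""")

# Only compare registers present in both dumps.
diff = {
    addr: (failing[addr], working[addr])
    for addr in sorted(failing.keys() & working.keys())
    if failing[addr] != working[addr]
}

for addr, (bad, good) in diff.items():
    print(f"{addr:#06x}: failing={bad:#010x} working={good:#010x}")
```

The signed helper also explains why 0xfffcf001 is shown as -200703: the tool prints the value as a signed 32-bit integer.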
--- Comment #3 from Jérôme Glisse glisse@freedesktop.org 2012-07-03 15:42:09 ---
Created an attachment (id=74671)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=74671)
properly disable render backend
Does this patch fix it ?
--- Comment #4 from Jean Delvare khali@linux-fr.org 2012-07-03 16:21:27 ---
I tested the patch in comment #3 but unfortunately it doesn't solve the problem.
--- Comment #5 from Jean Delvare khali@linux-fr.org 2012-07-03 16:39:19 ---
With this patch applied, I get:

0x98F4 0x00000001 (1)
0x3F88 0x00000001 (1)
0x9B7C 0x00fe0000 (16646144)
0x8950 0xfffcf001 (-200703)
0x98FC 0x00000000 (0)
0x8954 0x00000000 (0)
So the value of register 0x9B7C is correct now, but this was not sufficient.
Jérôme Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #74671|0                           |1
        is obsolete|                            |

--- Comment #6 from Jérôme Glisse glisse@freedesktop.org 2012-07-03 17:09:41 ---
Created an attachment (id=74701)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=74701)
properly disable render backend
This one ?
--- Comment #7 from Alex Deucher alexdeucher@gmail.com 2012-07-03 17:31:40 ---
Created an attachment (id=74711)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=74711)
possible fix
or this variant. Although AFAIK, programming the USER register variants shouldn't be necessary as the default values (0) are valid.
--- Comment #8 from Alex Deucher alexdeucher@gmail.com 2012-07-03 18:00:24 ---
Does booting up a clean kernel without any patches applied or reverted work if you manually set the following registers to their "patch reverted" values using radeonreg? Just to be sure, write all of them even if the values are the same. Do this without X running.
0x98F4 0x3F88 0x9B7C 0x8950 0x98FC 0x8954
e.g., radeonreg regset 0x8950 0xfffcf001
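To avoid typing mistakes, the full set of commands can be generated from the "patch reverted" dump in comment #2. This is a minimal Python sketch that only prints the commands and does not run them. The dictionary and function names are hypothetical, the command form is copied from the example above, and 0x8954 is left out because no value for it was dumped with the commit reverted.

```python
# "Patch reverted" values taken from the working dump in comment #2.
# 0x8954 is deliberately omitted: no reverted-kernel value was reported
# for it in this thread.
REVERTED_VALUES = {
    0x98F4: 0x00000001,
    0x3F88: 0x00000001,
    0x9B7C: 0x00FE0000,
    0x8950: 0xFFFCF001,
    0x98FC: 0x00000000,
}


def regset_commands(values):
    """Build one 'radeonreg regset ADDR VALUE' command line per register,
    in the same form as the example given in this comment."""
    return [
        f"radeonreg regset 0x{addr:04X} 0x{value:08x}"
        for addr, value in values.items()
    ]


commands = regset_commands(REVERTED_VALUES)
for command in commands:
    print(command)
```

The printed lines would then be run as root, without X running, as requested above.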
--- Comment #9 from Jean Delvare khali@linux-fr.org 2012-07-03 18:08:29 ---
Patch from comment #6 doesn't work, testing patch from comment #7 now.
--- Comment #10 from Jean Delvare khali@linux-fr.org 2012-07-03 18:56:15 ---
Patch from comment #7 did not work either. Then I followed the instructions from comment #8, but it also did not help.
Alex Deucher alexdeucher@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #74711|0                           |1
        is obsolete|                            |

--- Comment #11 from Alex Deucher alexdeucher@gmail.com 2012-07-03 19:03:52 ---
Created an attachment (id=74771)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=74771)
possible fix
Another possible fix, but I don't think it will help as it touches things never previously touched. I don't think the issue is the USER registers, but it's worth a shot I suppose.
--- Comment #12 from Jean Delvare khali@linux-fr.org 2012-07-03 21:40:52 ---
Patch from comment #11 didn't work at all: not only did it fail to fix the original issue, it also caused additional trouble (gdm wouldn't even show up).
--- Comment #13 from Jean Delvare khali@linux-fr.org 2012-07-05 07:30:03 ---
Reproducibility information:
* I cannot reproduce the GPU lockup on a Radeon HD 4350 card.
* On the Radeon HD 6450, I can reproduce the GPU lockup with applications other than Firefox. I was able to do so with Claws Mail for example. The parent window has to be maximized for it to happen. Then, as soon as a title-less dialog box is opened (for example by pressing Ctrl+S for "Save As..."), the GPU lockup happens.
Jean Delvare khali@linux-fr.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #74771|0                           |1
        is obsolete|                            |
Jean Delvare khali@linux-fr.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #14 from Jean Delvare khali@linux-fr.org 2012-07-06 12:09:24 ---
I managed to fix the problem with a user-space stack update. I updated:
* libdrm from version 2.4.26 to 2.4.33
* Mesa from version 7.11 to 8.0.3
* libX11 from xorg-x11-libX11 version 7.6 to libX11 version 1.5.0
and I no longer see the GPU lockup. So I guess I can close this bug as invalid, if the actual bug was in user-space.
--- Comment #15 from Alex Deucher alexdeucher@gmail.com 2012-07-06 13:13:17 ---
That's the problem with GPU drivers. It's impossible to test every combination of userspace and kernel drivers and there can be very subtle bugs with certain combinations like this one that are almost impossible to track down.
dri-devel@lists.freedesktop.org