https://bugs.freedesktop.org/show_bug.cgi?id=59592
Priority: medium
Bug ID: 59592
Assignee: dri-devel@lists.freedesktop.org
Summary: Reproducable GPU lockups since 3.8
Severity: major
Classification: Unclassified
OS: All
Reporter: nine@detonation.org
Hardware: Other
Status: NEW
Version: XOrg CVS
Component: DRM/Radeon
Product: DRI
Created attachment 73293 --> https://bugs.freedesktop.org/attachment.cgi?id=73293&action=edit dmesg after lockups
Since upgrading my kernel to 3.8.0-rc3 using openSUSE packages, I reproducibly get GPU lockups when running FlightGear, right after the simulation starts. It takes a couple of minutes, with some screen blanking, to quit the application. Afterwards the system continues running fine. Activating KDE's desktop effects may produce the same behaviour.
I'm using the following versions:
Mesa-9.1_git20130117-230.1.x86_64
libdrm2-2.4.99_git20130117-1.1.x86_64
X.Org X Server 1.12.3
Release Date: 2012-07-09
[ 17.264] X Protocol Version 11, Revision 0
[ 17.264] Build Operating System: openSUSE SUSE LINUX
[ 17.264] Current Operating System: Linux sphinx 3.8.0-rc3-1-desktop #1 SMP PREEMPT Thu Jan 10 20:49:22 UTC 2013 (7ce28dd) x86_64
[ 17.264] Kernel command line: BOOT_IMAGE=/vmlinuz-3.8.0-rc3-1-desktop root=/dev/mapper/system-root resume=/dev/system/swap quiet splash=silent
[ 17.264] Build Date: 08 January 2013 11:56:04AM
I could bisect the kernel or Mesa with git if it helps, and I'm fluent in C. Just tell me how I can help fix this.
nine@detonation.org changed:
What    |Removed                            |Added
----------------------------------------------------------------------------
Summary |Reproducable GPU lockups since 3.8 |Radeon HD 5670: reproducable GPU lockups since kernel 3.8
--- Comment #1 from nine@detonation.org ---
I spent the afternoon bisecting the kernel and found the commit that seems to cause the GPU lockups:
4ac0533abaec2b83a7f2c675010eedd55664bc26 is the first bad commit
commit 4ac0533abaec2b83a7f2c675010eedd55664bc26
Author: Jerome Glisse jglisse@redhat.com
Date: Thu Dec 13 12:08:11 2012 -0500
drm/radeon: fix htile buffer size computation for command stream checker
Fix the size computation of the htile buffer.
Signed-off-by: Jerome Glisse jglisse@redhat.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
:040000 040000 cf30bb09a4096c41959a27c6fc7d391dfa718028 fc571d6379b3b697a2bad0e5d097797f77c0a1b6 M drivers
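For readers unfamiliar with bisecting, the search behind a "first bad commit" result can be sketched as a binary search. This is a minimal sketch under stated assumptions, not how git implements it: the history is treated as a flat list ordered oldest to newest (git bisect works on the commit graph), the commit names `c0`..`c15` are hypothetical, and `is_bad` stands in for the manual "build, boot, run FlightGear, check for a lockup" step.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the oldest commit for which is_bad() is true.

    Assumptions (not taken from the bug report): commits are ordered oldest
    to newest, the newest commit is bad, and once a commit is bad every
    later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


# Hypothetical linear history in which the regression lands in c11.
history = [f"c{i}" for i in range(16)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 11)
print(culprit)  # c11
```

Each iteration halves the remaining range, so 16 commits need 4 test runs. The result is only as trustworthy as the `is_bad` check: a single wrong verdict sends the search into the wrong half.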
nine@detonation.org changed:
What     |Removed |Added
----------------------------------------------------------------------------
Keywords |        |regression
Alex Deucher agd5f@yahoo.com changed:
What      |Removed    |Added
----------------------------------------------------------------------------
Product   |DRI        |Mesa
Version   |XOrg CVS   |git
Component |DRM/Radeon |Drivers/Gallium/r600
--- Comment #2 from Alex Deucher agd5f@yahoo.com ---
I think this is actually a Mesa bug. The kernel commit you bisected to just allows the problematic feature to be enabled in Mesa. The Mesa commits are:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=24b1206ab2dcd506aaac3ef656a...
http://cgit.freedesktop.org/mesa/mesa/commit/?id=6532eb17baff6e61b427f29e076...
--- Comment #3 from nine@detonation.org ---
I can confirm that http://cgit.freedesktop.org/mesa/mesa/commit/?id=6532eb17baff6e61b427f29e076... is the first Mesa commit where the lockups occur.
With http://cgit.freedesktop.org/mesa/mesa/commit/?id=24b1206ab2dcd506aaac3ef656a... it still works on kernel 3.8.0-rc4.
Is there anything else I can do to help fix this bug?
Alex Deucher agd5f@yahoo.com changed:
What    |Removed                                                   |Added
----------------------------------------------------------------------------
Summary |Radeon HD 5670: reproducable GPU lockups since kernel 3.8 |Radeon HD 5670: reproducable GPU lockups with htile enabled
Alex Deucher agd5f@yahoo.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |anonymous@dodgeit.com
--- Comment #4 from Alex Deucher agd5f@yahoo.com ---
*** Bug 60347 has been marked as a duplicate of this bug. ***
--- Comment #5 from Jerome Glisse glisse@freedesktop.org ---
Created attachment 74707 --> https://bugs.freedesktop.org/attachment.cgi?id=74707&action=edit Mesa fix
Please check whether the attached Mesa patch fixes it.
--- Comment #6 from nine@detonation.org ---
The patch does indeed seem to fix the problem. I've been trying for more than half an hour to get the GPU to lock up, without success. Thank you very, very much!
Jerome Glisse glisse@freedesktop.org changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |FIXED
--- Comment #7 from Jerome Glisse glisse@freedesktop.org ---
Proper fix committed to Mesa.
nine@detonation.org changed:
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |RESOLVED |REOPENED
Resolution |FIXED    |---
--- Comment #8 from nine@detonation.org ---
While it did seem to work in my first test with your patch, I've experienced frequent GPU lockups since. Last weekend I tried to bisect it, because the lockups had again become immediate and seemingly reliable, occurring during fadeout of the splash screen. Unfortunately they turned out to be less reliable than they looked, which made the whole bisection useless.
Unfortunately I don't have more information yet. I'm running with --enable-debug but never got any assertion failures.
What can I do to debug this problem?
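A note on why an intermittent lockup ruins a bisection: a commit is marked good after a run without a lockup, but a bad commit can also survive a run by chance. The sketch below estimates how many clean runs are needed before calling a commit good. It assumes each run locks up independently with a fixed probability `p_lockup`, which is a simplification and not something measured in this report; the function name and the numbers are illustrative only.

```python
import math


def runs_needed(p_lockup: float, max_miss: float) -> int:
    """Number of clean runs needed before marking a commit good.

    Assumes (hypothetically) that a bad commit locks up in each run
    independently with probability p_lockup. A bad commit survives n runs
    with probability (1 - p_lockup) ** n, so we pick the smallest n for
    which that probability is at most max_miss.
    """
    if not (0.0 < p_lockup < 1.0) or not (0.0 < max_miss < 1.0):
        raise ValueError("both probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_lockup))


# If a bad commit locks up in half of the runs, 5 clean runs leave at most
# a 5% chance of wrongly marking it good (0.5 ** 5 = 0.03125).
print(runs_needed(0.5, 0.05))  # 5
# At one lockup in five runs, 14 clean runs are needed (0.8 ** 14 is about 0.044).
print(runs_needed(0.2, 0.05))  # 14
```

Under these assumptions the cost of each bisection step grows quickly as the lockup becomes rarer, which is consistent with a bisection based on single runs giving an unusable result.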
--- Comment #9 from Jerome Glisse glisse@freedesktop.org ---
Please check whether the patch below fixes the issue:
http://people.freedesktop.org/~glisse/0001-r600g-force-full-cache-for-hyperz...
--- Comment #10 from nine@detonation.org ---
Sorry for the late reply. It took me some time to get my test setup working after a distribution upgrade. It seems that your patch does indeed fix the problem. With R600_HYPERZ=1 I've played around with FlightGear for several hours without any lockups whatsoever. Before that, I tried it without your patch and got an immediate lockup.
It seems like your patch is already committed to master. After upgrading to current master it continues to work flawlessly. Thanks!
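The test above depends on the R600_HYPERZ=1 environment variable being set for the application. A minimal sketch of a launcher that toggles it for a single run follows. The helper names are hypothetical, and the `fgfs` command in the usage comment is an assumption about the local FlightGear install. When HyperZ is not requested the variable is removed from the environment rather than set to another value, since only R600_HYPERZ=1 appears in this thread.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def hyperz_env(enabled: bool, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with R600_HYPERZ set or cleared."""
    env = dict(os.environ if base is None else base)
    if enabled:
        env["R600_HYPERZ"] = "1"      # the value used in the test report
    else:
        env.pop("R600_HYPERZ", None)  # leave the driver at its default
    return env


def run_with_hyperz(command: Sequence[str], enabled: bool) -> int:
    """Run a command with HyperZ toggled and return its exit status."""
    return subprocess.run(list(command), env=hyperz_env(enabled)).returncode


# Usage (assumes FlightGear is installed as `fgfs`; adjust to your system):
#   run_with_hyperz(["fgfs"], enabled=True)
```

Keeping the variable scoped to one child process makes it easy to compare runs with and without HyperZ on the same Mesa build.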
Jerome Glisse glisse@freedesktop.org changed:
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |REOPENED |RESOLVED
Resolution |---      |FIXED
--- Comment #11 from Jerome Glisse glisse@freedesktop.org ---
Closing. The fix is pushed to master, and I'm going to push it to 9.1 as well.