https://bugs.freedesktop.org/show_bug.cgi?id=76501
Priority: medium
Bug ID: 76501
Assignee: dri-devel@lists.freedesktop.org
Summary: fences regression
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: odi@odi.ch
Hardware: x86-64 (AMD64)
Status: NEW
Version: XOrg CVS
Component: DRM/Radeon
Product: DRI
Created attachment 96233 --> https://bugs.freedesktop.org/attachment.cgi?id=96233&action=edit dmesg 3.12
I am seeing a GPU lockup on every kernel from v3.13 up to 3.14-rc7, which basically renders my computer unusable under recent kernels :-(
[   55.762710] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   55.762715] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x000000000000000 on ring 5)
[   55.762717] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[   55.762720] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
Hardware is an iMac 11,2 with a Radeon 4670 M96XT (RV730), 256MB GDDR3.
Working up to 3.12, broken as of 3.13.
Xorg comes up after some delays with a mostly black screen, some colored rectangular artifacts where the login fields are, and a working mouse cursor.
Console fb still works.
Bisected to this commit:
commit f9eaf9ae782d6480f179850e27e6f4911ac10227
Author: Christian König christian.koenig@amd.com
Date:   Tue Oct 29 20:14:47 2013 +0100
drm/radeon: rework and fix reset detection v2
Stop fiddling with jiffies, always wait for RADEON_FENCE_JIFFIES_TIMEOUT. Consolidate the two wait sequence implementations into just one function. Activate all waiters and remember if the reset was already done instead of trying to reset from only one thread.
v2: clear reset flag earlier to avoid timeout in IB test
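The commit message describes a single consolidated wait function with a fixed timeout and a shared reset flag, so that every waiter learns about a detected lockup instead of only the thread that happened to detect it. The following is a minimal, single-threaded C model of that idea, not the driver's code. It assumes a simulated ring where "no fence progress during one timeout slice" counts as a lockup (the real driver has its own lockup check), and every name in it (struct ring_model, fence_wait_seq, wait_one_slice) is hypothetical. On Linux, EDEADLK is errno 35, which matches the "(-35)" in the logs of this report.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ring model; these are not radeon driver structures. */
struct ring_model {
    uint64_t last_signaled;   /* last fence sequence number the GPU has signaled */
    bool     gpu_progressing; /* simulation knob: does the GPU keep advancing? */
    bool     needs_reset;     /* shared flag: set once a lockup has been detected */
};

/* Stand-in for sleeping one fixed timeout slice
 * (the role RADEON_FENCE_JIFFIES_TIMEOUT plays in the commit message). */
static void wait_one_slice(struct ring_model *r)
{
    if (r->gpu_progressing)
        r->last_signaled++;
}

/* One wait function used by every waiter. Returns 0 once `target` is
 * signaled, or -EDEADLK if a lockup is (or already was) detected. */
static int fence_wait_seq(struct ring_model *r, uint64_t target)
{
    while (r->last_signaled < target) {
        /* Another waiter already flagged the lockup: don't try to detect or
         * reset again from this thread, just report it. */
        if (r->needs_reset)
            return -EDEADLK;

        uint64_t before = r->last_signaled;
        wait_one_slice(r);

        /* Assumption for this model: no progress during a whole slice means
         * the ring is locked up. */
        if (r->last_signaled == before) {
            r->needs_reset = true;
            return -EDEADLK;
        }
    }
    return 0;
}
```

In this model, a second waiter on a stuck ring returns -EDEADLK immediately because it sees needs_reset already set, which is a miniature version of "remember if the reset was already done instead of trying to reset from only one thread".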
--- Comment #1 from Ortwin Glück odi@odi.ch --- Created attachment 96234 --> https://bugs.freedesktop.org/attachment.cgi?id=96234&action=edit dmesg 3.14
--- Comment #2 from Ortwin Glück odi@odi.ch --- NB: the UVD init does not occur every time, but the "GPU lockup" message always does.
--- Comment #3 from Christian König deathsimple@vodafone.de --- Please provide a dmesg from commits f9eaf9ae782d6480f179850e27e6f4911ac10227 and 1dac28eb726109e7ac256051b157baf60b21a5f7 as well.
Thanks in advance, Christian.
--- Comment #4 from Ortwin Glück odi@odi.ch --- Created attachment 96315 --> https://bugs.freedesktop.org/attachment.cgi?id=96315&action=edit last good commit
--- Comment #5 from Ortwin Glück odi@odi.ch --- Created attachment 96316 --> https://bugs.freedesktop.org/attachment.cgi?id=96316&action=edit first bad commit
--- Comment #6 from Ortwin Glück odi@odi.ch ---
Interestingly, the last good commit also produces the following log:
[    7.573975] [drm] UVD initialized successfully.
[    7.574210] [drm] Enabling audio 0 support
[    7.574240] [drm] ib test on ring 0 succeeded in 0 usecs
[    7.574263] [drm] ib test on ring 3 succeeded in 0 usecs
[   17.730386] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   17.730390] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000002 last fence id 0x0000000000000000)
[   17.730393] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[   17.730397] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
So that seems unrelated to the issue at hand.
--- Comment #7 from Christian König deathsimple@vodafone.de --- Created attachment 96360 --> https://bugs.freedesktop.org/attachment.cgi?id=96360&action=edit Possible fix
--- Comment #8 from Ortwin Glück odi@odi.ch --- Thanks, I will test the patch tonight.
I will also bisect the first commit that produces the GPU lockup (without visible artifacts), as that seems to me to be the real problem. Probably f9eaf9 only makes that bug visible.
--- Comment #9 from Christian König deathsimple@vodafone.de --- (In reply to comment #6)
> interestingly also the last good commit produces the following log:
> [    7.573975] [drm] UVD initialized successfully.
> [    7.574210] [drm] Enabling audio 0 support
> [    7.574240] [drm] ib test on ring 0 succeeded in 0 usecs
> [    7.574263] [drm] ib test on ring 3 succeeded in 0 usecs
> [   17.730386] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
> [   17.730390] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000002 last fence id 0x0000000000000000)
> [   17.730393] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
> [   17.730397] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
> So that seems unrelated to the issue at hand.
Actually it is related, and now the behaviour makes perfect sense.
Somewhere between 3.12 and your "last good" commit there is a patch that breaks UVD IB testing. That isn't critical on its own (3D still works fine) until the reset detection rework, because after that commit we try to get the UVD ring working again with every new IOCTL made to the card.
Please give the attached patch a try; it clears the "needs_reset" flag if the IB test fails for some reason, so that if the initial bring-up fails we won't try to get it working over and over again.
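The failure mode described in this comment is essentially an unbounded retry: after the rework, a pending reset is retried on every IOCTL, and a UVD ring whose IB test always fails keeps the flag set forever. The C sketch below models only that flag logic under stated assumptions. It is neither the radeon driver code nor the attached patch, and struct dev_model, ioctl_entry and try_reset_and_ib_test are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; the names are illustrative, not radeon's. */
struct dev_model {
    bool needs_reset;           /* set when a lockup was detected */
    bool uvd_ib_test_passes;    /* simulation knob: does the UVD IB test succeed? */
    bool clear_flag_on_failure; /* false = behaviour before the fix, true = after */
    int  reset_attempts;        /* how many times reset + IB tests were run */
};

/* Stand-in for "reset the GPU and rerun the IB tests"; 0 on success. */
static int try_reset_and_ib_test(struct dev_model *d)
{
    d->reset_attempts++;
    return d->uvd_ib_test_passes ? 0 : -EDEADLK;
}

/* In this model, called at the start of every ioctl. */
static void ioctl_entry(struct dev_model *d)
{
    if (!d->needs_reset)
        return;

    int r = try_reset_and_ib_test(d);

    /* Without the fix, a failed IB test leaves needs_reset set, so every
     * following ioctl retries the reset. With the fix, the flag is cleared
     * either way and the broken ring is not retried over and over. */
    if (r == 0 || d->clear_flag_on_failure)
        d->needs_reset = false;
}
```

With a permanently failing IB test, five ioctls trigger five reset attempts in the "before" configuration but only one in the "after" configuration, which matches the reported symptom: 3D output recovers while the one-time lockup message stays in dmesg.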
In addition, please bisect which commit breaks UVD IB testing between 3.12 and the "last good" commit and open a new bug report for that issue.
Thanks for the help, Christian.
--- Comment #10 from Ortwin Glück odi@odi.ch --- I confirm that the patch fixes the screen output (3D). The GPU lockup is still present in dmesg, as expected. Bisecting now and will open a new bug report for it.
Christian König deathsimple@vodafone.de changed:
           What|Removed                     |Added
----------------------------------------------------------------------------
         Status|NEW                         |RESOLVED
     Resolution|---                         |FIXED
--- Comment #11 from Christian König deathsimple@vodafone.de --- Perfect, thanks for the help.
The patch is on its way upstream, so any objections to closing this bug?
--- Comment #12 from Ortwin Glück odi@odi.ch --- OK to close.
The other problem has resolved itself, by the way. For convenience I had always booted these kernels via kexec, which was the reason for the GPU lockup. After a normal warm boot the problem went away.
Christian König deathsimple@vodafone.de changed:
           What|Removed                     |Added
----------------------------------------------------------------------------
         Status|RESOLVED                    |CLOSED
--- Comment #13 from Christian König deathsimple@vodafone.de --- Ah, OK. That makes sense, because kexec and UVD are known not to work together by design.
Closing this.
--- Comment #14 from Shawn Starr shawn.starr@rogers.com --- Does this fix issues with X being able to resume after a GPU lockup? When X resumes for me, I can no longer use the display server; most of the time the GPU just wedges and I need to do a reset.
dri-devel@lists.freedesktop.org