https://bugzilla.kernel.org/show_bug.cgi?id=78661
Bug ID: 78661
Summary: GPU sometimes locks up after boot and/or resume
Product: Drivers
Version: 2.5
Kernel Version: 3.15.1
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: madigens@gmail.com
Regression: No
I've been getting these hangs since at least 3.13.x on Ubuntu 14.04, but I remember getting them before that, too.
It doesn't always happen; mostly the system works as usual. Sometimes, though, I boot up or resume and, upon opening an application or moving the mouse, I get a garbled screen. After a few seconds it goes back to normal, only to hang again shortly thereafter or to hang completely. I can't remember this happening during normal use: if it doesn't happen shortly after boot or resume, the system works fine.
I cut some suspicious lines from syslog:
Jun 22 10:31:40 nikolaus-desktop kernel: [ 86.067026] radeon 0000:01:00.0: ring 5 stalled for more than 81420msec
Jun 22 10:31:40 nikolaus-desktop kernel: [ 86.067034] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000001 last fence id 0x0000000000000000 on ring 5)
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.929541] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.986723] [drm] PCIE gen 2 link speeds already enabled
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.988854] [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.988930] radeon 0000:01:00.0: WB enabled
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.988932] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8800d9752c00
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.988933] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8800d9752c0c
Jun 22 10:31:41 nikolaus-desktop kernel: [ 86.989307] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90004f9c418
Jun 22 10:31:41 nikolaus-desktop kernel: [ 87.005388] [drm] ring test on 0 succeeded in 1 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [ 87.005442] [drm] ring test on 3 succeeded in 1 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [ 87.200919] [drm] ring test on 5 succeeded in 1 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [ 87.200923] [drm] UVD initialized successfully.
Jun 22 10:31:41 nikolaus-desktop kernel: [ 87.200941] [drm] ib test on ring 0 succeeded in 0 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [ 87.200959] [drm] ib test on ring 3 succeeded in 1 usecs
Jun 22 10:31:42 nikolaus-desktop kernel: [ 87.370683] [drm:uvd_v1_0_ib_test] *ERROR* radeon: failed to get create msg (-22).
Jun 22 10:31:42 nikolaus-desktop kernel: [ 87.370688] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-22).
Jun 22 10:31:42 nikolaus-desktop kernel: [ 87.370708] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
Jun 22 10:31:52 nikolaus-desktop kernel: [ 98.016760] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
Jun 22 10:31:52 nikolaus-desktop kernel: [ 98.016768] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000001 last fence id 0x0000000000000000 on ring 5)
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.669107] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.725911] [drm] PCIE gen 2 link speeds already enabled
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.727900] [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.727974] radeon 0000:01:00.0: WB enabled
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.727976] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8800d9752c00
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.727977] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8800d9752c0c
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.728365] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90004f9c418
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.744428] [drm] ring test on 0 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.744483] [drm] ring test on 3 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.940008] [drm] ring test on 5 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.940012] [drm] UVD initialized successfully.
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.940031] [drm] ib test on ring 0 succeeded in 0 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [ 912.940050] [drm] ib test on ring 3 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [ 913.109743] [drm:uvd_v1_0_ib_test] *ERROR* radeon: failed to get create msg (-22).
Jun 22 10:45:27 nikolaus-desktop kernel: [ 913.109748] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-22).
Jun 22 10:45:27 nikolaus-desktop kernel: [ 913.109763] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
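For anyone who wants to pull the stall events out of a longer syslog or dmesg dump, here is a minimal Python sketch. The regular expression is only modelled on the "ring N stalled for more than ...msec" lines quoted above; the function name find_stalls is made up for this example and is not part of any radeon tooling.

```python
import re

# Pattern modelled on the stall lines quoted in this report (an assumption:
# other kernel versions may word the message differently).
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+radeon (?P<dev>[0-9a-f:.]+): "
    r"ring (?P<ring>\d+) stalled for more than (?P<msec>\d+)msec"
)


def find_stalls(log_text):
    """Return (kernel_timestamp, pci_device, ring, stalled_msec) tuples."""
    return [
        (float(m["ts"]), m["dev"], int(m["ring"]), int(m["msec"]))
        for m in STALL_RE.finditer(log_text)
    ]
```

Feeding it the contents of a saved log file (for example the text of an attached dmesg) gives one tuple per stall, which makes it easier to see which ring is affected and how soon after boot or resume the stall was reported.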
What else do you need from me?
--- Comment #1 from Nikolaus Waxweiler madigens@gmail.com --- Argh, I forgot: I have a HD5870.
Alex Deucher alexdeucher@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |alexdeucher@gmail.com
--- Comment #2 from Alex Deucher alexdeucher@gmail.com --- Does booting with radeon.dpm=0 on the kernel command line in grub help? Please attach your full dmesg output.
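To confirm that the option actually reached the kernel after editing the grub configuration, the running command line can be inspected. The following is a small Python sketch; the helper name radeon_dpm_setting is hypothetical, and the only assumption about the system is that the command line is available as text (on Linux it can be read from /proc/cmdline).

```python
def radeon_dpm_setting(cmdline):
    """Return the value given for radeon.dpm on a kernel command line.

    Returns None when the option is absent. If the option appears more
    than once, this sketch simply keeps the last occurrence.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            value = token.split("=", 1)[1]
    return value
```

Calling radeon_dpm_setting(open("/proc/cmdline").read()) on the booted system should return "0" if the parameter was picked up, and None if it was not passed at all.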
--- Comment #3 from Nikolaus Waxweiler madigens@gmail.com --- Created attachment 140851 --> https://bugzilla.kernel.org/attachment.cgi?id=140851&action=edit Full dmesg output for the day where the hangs occurred up to the bug report
--- Comment #4 from Nikolaus Waxweiler madigens@gmail.com --- Will try radeon.dpm=0 and report back.
--- Comment #5 from Nikolaus Waxweiler madigens@gmail.com --- Created attachment 141821 --> https://bugzilla.kernel.org/attachment.cgi?id=141821&action=edit It happened again with radeon.dpm=0 and power_profile set to low.
syslog since using radeon.dpm=0 on the kernel command line.
Dieter Nützel Dieter@nuetzel-hh.de changed:
What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |Dieter@nuetzel-hh.de
--- Comment #6 from Dieter Nützel Dieter@nuetzel-hh.de ---
(In reply to Nikolaus Waxweiler from comment #1)
> Argh, I forgot: I have a HD5870.
Try this patch: https://bugzilla.kernel.org/attachment.cgi?id=141741&action=diff
With radeon.dpm=1 ;-)
--- Comment #7 from Nikolaus Waxweiler madigens@gmail.com --- Created attachment 142021 --> https://bugzilla.kernel.org/attachment.cgi?id=142021&action=edit Rebuilt kernel with patch, just got a hang again :(
--- Comment #8 from Nikolaus Waxweiler madigens@gmail.com --- Speaking of hangs, every once in a while I get lockups on boot where the screen stays corrupted or black; even the reset button doesn't help and I have to long-press the power button. There's no log for those hangs, so I don't know if they're related to this bug. Is there anything I can do to further analyze these hangs?
--- Comment #9 from Alex Deucher alexdeucher@gmail.com --- This might be related to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=76998
Can you try this patch: https://bugs.freedesktop.org/attachment.cgi?id=102392
--- Comment #10 from Nikolaus Waxweiler madigens@gmail.com --- Alright, both patches active. Will test and report back.
--- Comment #11 from Nikolaus Waxweiler madigens@gmail.com --- Created attachment 142621 --> https://bugzilla.kernel.org/attachment.cgi?id=142621&action=edit Got a temporary hang again on boot-up, managed to reboot...
--- Comment #12 from Alex Deucher alexdeucher@gmail.com --- Make sure you power off completely before testing the new patch, rather than just doing a warm reboot, so that the old register state is not retained.
--- Comment #13 from Nikolaus Waxweiler madigens@gmail.com --- Okay :)
After a few days of use, I got my first corrupted screen this morning. I could still reboot with the REISUB SysRq sequence, though. Unfortunately, there is nothing in the logs... will keep testing.
--- Comment #14 from Nikolaus Waxweiler madigens@gmail.com --- Created attachment 143581 --> https://bugzilla.kernel.org/attachment.cgi?id=143581&action=edit New hangs on cold boot on 3.15.6
Alright, the lockups were few and far between on 3.15.5 with both patches. Now, with 3.15.6 (installed from Ubuntu's mainline kernel repo), I always get a short hang on the first boot in the morning. The system recovers, but I have to reboot once to avoid further short hangs and work normally. Log from the first cold boot attached.
--- Comment #15 from Nikolaus Waxweiler madigens@gmail.com --- The number of hangs has increased since 3.16 :(
bjo@nord-west.org changed:
What    |Removed |Added
----------------------------------------------------------------------------
CC      |        |bjo@nord-west.org
--- Comment #16 from bjo@nord-west.org --- I got lockups with 3.19.2 and 3.14.36:
[ 334.649402] radeon 0000:02:00.0: ring 5 stalled for more than 10000msec
[ 334.649408] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000002 last fence id 0x0000000000000004 on ring 5)
[ 334.649520] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[ 334.649563] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-35).
[ 334.832010] [drm:rv770_dpm_set_power_state [radeon]] *ERROR* rv770_set_sw_state failed
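The numbers in parentheses in the *ERROR* lines are negated Linux errno values: -22 in the earlier logs is EINVAL and -35 here is EDEADLK. A minimal Python sketch for decoding them from a log follows. The small lookup table is hard-coded with just these two values rather than taken from Python's errno module, since that module reflects the platform the script runs on; the names LINUX_ERRNO and decode_errors are invented for this example, and nothing here says why the driver returned a given code.

```python
import re

# Only the two codes that appear in this report; values as defined on Linux.
LINUX_ERRNO = {
    22: "EINVAL",   # Invalid argument
    35: "EDEADLK",  # Resource deadlock would occur
}

# Modelled on the "*ERROR* radeon: <message> (-NN)" lines quoted in this thread.
ERR_RE = re.compile(r"\*ERROR\* radeon: (?P<msg>.+?) \(-(?P<code>\d+)\)")


def decode_errors(log_text):
    """Return (message, errno_name) pairs for radeon error lines with a code."""
    return [
        (m["msg"], LINUX_ERRNO.get(int(m["code"]), "UNKNOWN"))
        for m in ERR_RE.finditer(log_text)
    ]
```

Lines without a numeric code, such as "rv770_set_sw_state failed", are simply skipped by this pattern.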
dri-devel@lists.freedesktop.org