https://bugs.freedesktop.org/show_bug.cgi?id=64867
Priority: medium
Bug ID: 64867
Assignee: dri-devel@lists.freedesktop.org
Summary: Hangs on Cayman (HD6950) when watching flash/using vdpau
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: serafean@gmail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: git
Component: Drivers/Gallium/r600
Product: Mesa
Created attachment 79658 --> https://bugs.freedesktop.org/attachment.cgi?id=79658&action=edit dmesg after X freeze
When watching a flash video (opera + flash-11.2.202.262), the kernel log starts filling up with:

[ 7009.603310] radeon 0000:01:00.0: GPU fault detected: 146 0x0e677004
[ 7009.603313] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 7009.603316] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000

and the GPU sometimes eventually hangs. Once these messages start appearing, there is graphics corruption everywhere. I also managed to reproduce this when playing back a video using mplayer with the vdpau backend. (EnableLinuxHWVideoDecode=1 is commented out in /etc/adobe/mms.cfg.)
I managed to get a dmesg output after the GPU hang (attached). I then tried to save the contents of /sys/kernel/debug/dri/0, but since I didn't know which files were important, I read them all in a bash for loop and got a complete hang.
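A minimal sketch of a more cautious way to collect those debugfs files, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root. The helper name dump_dir and the output path are hypothetical. It does not prevent a hang; it only syncs the output after every file, so the last header on disk shows which file was being read when the machine froze.

```python
import os


def dump_dir(src_dir, out_path):
    """Copy every regular file in src_dir into out_path, syncing after each one.

    Returns the list of file names that were fully processed.
    """
    dumped = []
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            # Write and sync the header first, so it survives a hard hang
            # that happens while the file itself is being read.
            out.write(f"==== {name} ====\n")
            out.flush()
            os.fsync(out.fileno())
            try:
                with open(path, "r", errors="replace") as src:
                    out.write(src.read())
            except OSError as exc:
                out.write(f"<read failed: {exc}>\n")
            out.write("\n")
            out.flush()
            os.fsync(out.fileno())
            dumped.append(name)
    return dumped


if __name__ == "__main__":
    # Assumed paths, adjust to the local setup.
    dump_dir("/sys/kernel/debug/dri/0", "/root/dri0-dump.txt")
```

After a reboot, the last "====" header in the output file points at the debugfs entry that was being read when the system hung.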
linux-3.10-rc1; mesa, libdrm, and radeon ddx from git.
This might be related to https://bugs.freedesktop.org/show_bug.cgi?id=62959 , but piglit just finished fine (fine = didn't hang in this case; there are still a bunch of "radeon_gem_object_create:69 alloc size 1365Mb bigger than 256Mb limit" messages in the logs). Additional info: I have a dual-screen setup.
Anything more you need, I'll be happy to provide.
--- Comment #1 from Martin Bednar serafean@gmail.com --- Created attachment 79661 --> https://bugs.freedesktop.org/attachment.cgi?id=79661&action=edit output after hang.
The output I got when trying to cat /sys/kernel/debug/dri/0/*. Sorry for the bad quality.
--- Comment #2 from Harald Judt h.judt@gmx.at ---
I too get system hangs when watching a flash video in firefox. linux-3.8.13; libdrm, mesa etc. from git. The screen simply goes black (no signal) and the machine is dead, leaving a hard reset as the only option. The dmesg is flooded with the following lines:

radeon 0000:01:00.0: GPU fault detected: 147 0x0d859002
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000012D8
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05090002
[...] repeated a hundred times with only the first line changing a bit [...]

then:

radeon 0000:01:00.0: GPU fault detected: 146 0x07151004
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[...] repeated a hundred times with only the first line changing a bit [...]
The timestamps indicate this goes on for approximately two minutes before the hang.
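For anyone who wants to measure the fault flood rather than eyeball it, here is a rough sketch that counts "GPU fault detected" lines in a saved dmesg and reports the time between the first and the last one. It assumes the log carries the bracketed kernel timestamps seen in the original report; the function name summarize_faults is made up for illustration.

```python
import re

# Matches lines such as:
# [ 7009.603310] radeon 0000:01:00.0: GPU fault detected: 146 0x0e677004
FAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+radeon\s+\S+:\s+GPU fault detected:"
)


def summarize_faults(log_text):
    """Return (number of fault lines, seconds between first and last fault)."""
    times = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            times.append(float(match.group("ts")))
    if not times:
        return 0, 0.0
    return len(times), max(times) - min(times)


if __name__ == "__main__":
    import sys

    # Usage (assumed): dmesg > dmesg.txt; python3 faults.py dmesg.txt
    with open(sys.argv[1], errors="replace") as log:
        count, span = summarize_faults(log.read())
    print(f"{count} GPU faults over {span:.1f} s")
```

Lines without a timestamp, like the ones pasted in this comment, are simply not counted, so the log has to be captured with timestamps enabled.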
--- Comment #3 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #2)
> I too get system hangs when watching a flash video in firefox. linux-3.8.13, libdrm, mesa etc. git. Screen simply becomes black (no signal) and machine is dead, leaving a hard reset as the only option. The dmesg is flooded with the following lines:
>
> radeon 0000:01:00.0: GPU fault detected: 147 0x0d859002
> radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000012D8
> radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05090002
> [...] repeated a hundred times with only the first line changing a bit [...]
Something in the mesa drivers is emitting a command buffer without a proper virtual address for CB5.
--- Comment #4 from Harald Judt h.judt@gmx.at --- Hoping that it would be a workaround, I've applied the following patch from another bug report:
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 5407459..959e7cf 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -477,6 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	if (r) {
 		goto out;
 	}
+	radeon_fence_wait(vm->fence, false);
 	radeon_cs_sync_rings(parser);
 	radeon_cs_sync_to(parser, vm->fence);
 	radeon_cs_sync_to(parser, radeon_vm_grab_id(rdev, vm, parser->ring));
The hang happened again while playing a flash video (I'll see whether I can reproduce it reliably), but this time I was able to switch VTs. X was killed, and the following additional lines were appended to dmesg:
[30243.510949] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30243.510951] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000005c90bd last fence id 0x00000000005c90b8)
[30243.510952] radeon 0000:01:00.0: couldn't schedule ib
[30243.510973] radeon 0000:01:00.0: Trying to sync to a disabled ring!
[30243.511047] radeon 0000:01:00.0: couldn't schedule ib
[30243.511048] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.511197] radeon 0000:01:00.0: couldn't schedule ib
[30243.511198] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.512323] radeon 0000:01:00.0: couldn't schedule ib
[30243.512324] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.512851] radeon 0000:01:00.0: couldn't schedule ib
[30243.512852] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30254.004957] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30254.004959] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000334 last fence id 0x0000000000000333)
[30254.004961] radeon 0000:01:00.0: couldn't schedule ib
[30254.005052] radeon 0000:01:00.0: couldn't schedule ib
[30254.005064] radeon 0000:01:00.0: couldn't schedule ib
[30254.005070] radeon 0000:01:00.0: couldn't schedule ib
[30254.005084] radeon 0000:01:00.0: couldn't schedule ib
[30254.005092] radeon 0000:01:00.0: couldn't schedule ib
[30254.005097] radeon 0000:01:00.0: Trying to sync to a disabled ring!
[30254.012901] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[...] many similar repeated lines about IB [...]
[30264.498754] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30264.498759] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000090e5 last fence id 0x00000000000090e3)
Trying to restart X didn't work (X crashed), so I had to reboot the machine. I'm not sure whether this brings any revelations. Is there anything I could do to provide more information the next time this happens?
--- Comment #5 from Harald Judt h.judt@gmx.at ---
This doesn't seem to be tied to playing flash videos. A few moments ago parts of the screen started looking corrupted; when I checked, dmesg was flooded again, and I was only able to reboot the machine over ssh. Looking at the cayman bug reports, many people seem to have the same or similar problems. I'll try a 3.7 or maybe even a 3.6 kernel; perhaps that works reliably.
--- Comment #6 from Martin Bednar serafean@gmail.com ---
Adding R600_DEBUG=nodma to my environment makes the problem go away. Not pretty, but it is a workaround. Same question though: how could I help debug this?
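To apply the workaround to a single program instead of the whole session, something like the following sketch should do. The helper names nodma_env and run_with_nodma are invented for this example, and it overwrites any R600_DEBUG value that is already set rather than merging flags.

```python
import os
import subprocess


def nodma_env(base_env=None):
    """Return a copy of the environment with R600_DEBUG=nodma set.

    Any existing R600_DEBUG value is overwritten (a simplification).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["R600_DEBUG"] = "nodma"
    return env


def run_with_nodma(argv):
    """Run argv (e.g. ["mplayer", "-vo", "vdpau", "clip.mkv"]) with the workaround."""
    return subprocess.run(argv, env=nodma_env(), check=False)


if __name__ == "__main__":
    import sys

    # Usage (assumed): python3 nodma.py <program> [args...]
    sys.exit(run_with_nodma(sys.argv[1:]).returncode)
```

Only the launched process and its children see the variable, so the rest of the desktop keeps using the default code path.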
--- Comment #7 from Harald Judt h.judt@gmx.at --- Created attachment 80920 --> https://bugs.freedesktop.org/attachment.cgi?id=80920&action=edit netconsole.log
I've been able to get a more complete output using netconsole. I'm not sure if it helps, but here it is.
Here are the steps to reproduce the crash:
1) Go to youtube.com, start playing a video.
   => This prints these lines:
   radeon 0000:01:00.0: GPU fault detected: 146 0x0b95e004
   radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000010B9
   radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050E0004
2) Close the web browser, activate xscreensaver with an opengl screensaver like photopile.
   => GPU lockup CP stall
Harald Judt h.judt@gmx.at changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #80920|0                           |1
        is obsolete|                            |
--- Comment #8 from Harald Judt h.judt@gmx.at --- Created attachment 80921 --> https://bugs.freedesktop.org/attachment.cgi?id=80921&action=edit netconsole.log
Last upload didn't work.
--- Comment #9 from Harald Judt h.judt@gmx.at ---
> adding R600_DEBUG=nodma to my environment makes the problem go away... Not pretty, but a workaround. Same question though : how could I help debugging this?
I can confirm this helps: not against the GPU faults while playing the video, but against the crashes/hangs when an opengl xscreensaver etc. activates. Thanks for mentioning the workaround.
I've also applied https://bugs.freedesktop.org/attachment.cgi?id=72794, but it doesn't help.
--- Comment #10 from Harald Judt h.judt@gmx.at --- With current up-to-date git versions of libdrm, mesa, xorg-server and xf86-video-ati, the R600_DEBUG=nodma hack no longer seems necessary (linux-3.11.0-rc6 with UVD disabled); the GPU faults have vanished and the system is stable.
Dave Airlie airlied@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED