https://bugs.freedesktop.org/show_bug.cgi?id=51344
Bug #: 51344
Summary: massive corruption on RV515
Classification: Unclassified
Product: DRI
Version: XOrg CVS
Platform: x86 (IA32)
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: bugzi11.fdo.tormod@xoxy.net
Created attachment 63356 --> https://bugs.freedesktop.org/attachment.cgi?id=63356 Xorg.0.log
This started in early May on drm-next, somewhere between 4f256e8..d3029b4, and is still present in 3.5-rc3 (and in current drm-next).
Things are smeared out vertically. The desktop background does not appear to be corrupted. Turning off "EXABitmaps" reduces the corruption.
I haven't done a git bisect, only a "download bisect" using the builds at http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-next/: v3.4-rc6-295-g4f256e8 from May 8th was good and v3.4-rc6-315-gd3029b4 from May 10th was bad. Unfortunately the build from May 9th has since been deleted, so I cannot narrow it down further this way. The commits in question should be:
d3029b4 drm/radeon/kms: fix warning on 32-bit in atomic fence printing
f2e3922 drm/radeon: make the ib an inline object
f237750 drm/radeon: remove r600 blit mutex v2
68470ae drm/radeon: move the semaphore from the fence into the ib
7c0d409 drm/radeon: immediately free ttm-move semaphore
c507f7e drm/radeon: rip out the ib pool
a8c0594 drm/radeon: simplify semaphore handling v2
c3b7fe8 drm/radeon: multiple ring allocator v3
0085c950 drm/radeon: use one wait queue for all rings add fence_wait_any v2
557017a drm/radeon: define new SA interface v3
2e0d991 drm/radeon: make sa bo a stand alone object
e6661a9 drm/radeon: keep start and end offset in the SA
711a972 drm/radeon: add sub allocator debugfs file
a651c55 drm/radeon: add proper locking to the SA v3
dd8bea2 drm/radeon: use inline functions to calc sa_bo addr
8a47cc9 drm/radeon: rework locking ring emission mutex in fence deadlock detection v2
3b7a2b2 drm/radeon: rework fence handling, drop fence list v7
bb63556 drm/radeon: convert fence to uint64_t v4
d6999bc drm/radeon: replace the per ring mutex with a global one
133f4cb drm/radeon: fix possible lack of synchronization btw ttm and other ring
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices [AMD] nee ATI Radeon Mobility X700 (PCIE) [1002:5653]
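The two build names in the report are git-describe style strings (TAG-N-gHASH), so the size of the suspect range can be read off them directly. The following is a minimal Python sketch of that arithmetic, not anything taken from the bug itself: the helper name `commits_since_tag` is made up for illustration, and the subtraction is only meaningful under the assumption that both builds are described relative to the same tag and that the bad commit descends from the good one. The step count is the usual rough log2 estimate for a bisection, so treat it as an approximation.

```python
import math
import re

# Hypothetical helper: split a git-describe style string "TAG-N-gHASH".
DESCRIBE_RE = re.compile(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)")


def commits_since_tag(describe: str) -> tuple[str, int]:
    """Return (tag, number of commits since that tag)."""
    match = DESCRIBE_RE.fullmatch(describe)
    if match is None:
        raise ValueError(f"not a git-describe string: {describe!r}")
    return match.group("tag"), int(match.group("count"))


# Build names quoted in the report.
good = "v3.4-rc6-295-g4f256e8"
bad = "v3.4-rc6-315-gd3029b4"

good_tag, good_count = commits_since_tag(good)
bad_tag, bad_count = commits_since_tag(bad)

# Assumption: the difference only counts the suspect commits if both
# strings are relative to the same tag and the history is linear.
if good_tag != bad_tag:
    raise ValueError("builds are described relative to different tags")

candidates = bad_count - good_count
# Rough upper bound on the number of kernels a bisection would need to test.
steps = math.ceil(math.log2(candidates))

print(f"{candidates} candidate commits, roughly {steps} builds to bisect")
```

The candidate count agrees with the number of commits listed in the report, which suggests a bisect over this range would need only a handful of test builds.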
Tormod Volden bugzi11.fdo.tormod@xoxy.net changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |massive corruption on RV515 |massive corruption on RV410
--- Comment #1 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-06-22 11:22:39 PDT ---
Created attachment 63357 --> https://bugs.freedesktop.org/attachment.cgi?id=63357 dmesg output
--- Comment #2 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-06-22 11:50:37 PDT ---
Created attachment 63359 --> https://bugs.freedesktop.org/attachment.cgi?id=63359 screenshot (no xorg.conf options)
--- Comment #3 from Tom Stellard tstellar@gmail.com 2012-06-24 10:57:40 PDT ---
Can you try to bisect this using git bisect and find the first bad commit?
--- Comment #4 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-06-27 12:33:37 PDT ---
Sorry, I don't know when I can have time to do that. I'll try harder if the bug can be confirmed by other people too. Maybe the right developer can make an educated guess if it's limited to this card.
--- Comment #5 from Andrea mariofutire@googlemail.com 2012-08-27 20:00:41 UTC ---
Hi guys,
can this be related to
https://bugs.freedesktop.org/show_bug.cgi?id=54129
?
I ended up in the same area of the git log.
--- Comment #6 from Jerome Glisse glisse@freedesktop.org 2012-08-27 20:26:28 UTC ---
Also, can you test whether booting with radeon.no_wb=1 fixes the issue?
--- Comment #7 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-08-28 07:22:14 UTC ---
Thanks, will test this later. BTW I already tried http://people.freedesktop.org/~glisse/0001-drm-radeon-extra-type-safe-for-fe... which came up on the dri-devel list, but that did not fix it.
--- Comment #8 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-08-28 18:29:07 UTC ---
No, booting with radeon.no_wb=1 didn't help.
--- Comment #9 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-09-10 20:27:54 UTC ---
Created attachment 66942 --> https://bugs.freedesktop.org/attachment.cgi?id=66942 backport of Christian's patch
I tried backporting Christian's patch from https://bugs.freedesktop.org/show_bug.cgi?id=54129#c11 but it did not help either. I suppose the following /sys/kernel/debug/dri/0/radeon_fence_info output indicates that the patch took effect, since the emitted numbers are above 0x100000000LL?
--- ring 0 ---
Last signaled fence 0x000000020000149f
Last emitted 0x0000000100001a9a
--- ring 0 ---
Last signaled fence 0x000000020000149f
Last emitted 0x0000000100002041
--- ring 0 ---
Last signaled fence 0x000000020000149f
Last emitted 0x000000010000294a
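The question in comment #9 can be checked mechanically. Below is a minimal Python sketch that parses the pasted readings and splits each 64-bit fence value into its upper and lower 32 bits. The regex and the helper names (`parse_fence_info`, `split64`) are invented for illustration, and the text layout of radeon_fence_info is assumed to be exactly what was pasted in the comment. The sketch only shows whether the values exceed 0x100000000; whether that proves the patch took effect is for the patch author to confirm.

```python
import re

# Assumed layout: "Last signaled fence 0x..." followed by "Last emitted 0x...",
# separated by any whitespace (same line or next line).
FENCE_RE = re.compile(
    r"Last signaled fence\s+(0x[0-9a-fA-F]+)\s+Last emitted\s+(0x[0-9a-fA-F]+)"
)


def parse_fence_info(text: str) -> list[tuple[int, int]]:
    """Return (signaled, emitted) integer pairs found in the debugfs text."""
    return [(int(s, 16), int(e, 16)) for s, e in FENCE_RE.findall(text)]


def split64(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (upper 32 bits, lower 32 bits)."""
    return value >> 32, value & 0xFFFFFFFF


# Readings copied from comment #9.
sample = """
--- ring 0 --- Last signaled fence 0x000000020000149f Last emitted 0x0000000100001a9a
--- ring 0 --- Last signaled fence 0x000000020000149f Last emitted 0x0000000100002041
--- ring 0 --- Last signaled fence 0x000000020000149f Last emitted 0x000000010000294a
"""

for signaled, emitted in parse_fence_info(sample):
    sig_hi, sig_lo = split64(signaled)
    emi_hi, emi_lo = split64(emitted)
    print(
        f"signaled hi={sig_hi:#x} lo={sig_lo:#x} | "
        f"emitted hi={emi_hi:#x} lo={emi_lo:#x} | "
        f"emitted >= 0x100000000: {emitted >= 0x100000000} | "
        f"signaled > emitted: {signaled > emitted}"
    )
```

Splitting the values this way also makes it easy to see that, in these three readings, the signaled value stays the same while the emitted value keeps advancing, and that the signaled value is numerically larger than every emitted one.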
--- Comment #10 from Tormod Volden bugzi11.fdo.tormod@xoxy.net 2012-09-11 18:42:32 UTC ---
Created attachment 66986 --> https://bugs.freedesktop.org/attachment.cgi?id=66986 backport of Christian's v2 patch
I tried backporting the v2 patch from http://lists.freedesktop.org/archives/dri-devel/2012-September/027608.html to kernel 3.5.2, see attached, but it did not help either. Maybe my card has another issue?
Output from /sys/kernel/debug/dri/0/radeon_fence_info
--- ring 0 ---
Last signaled fence 0x00000000deadbeef
Last emitted 0x0000000000000670
--- ring 0 ---
Last signaled fence 0x00000000deadbeef
Last emitted 0x0000000000000c44
--- Comment #11 from Christian König deathsimple@vodafone.de 2012-09-12 09:55:57 UTC ---
(In reply to comment #10)
WTF? Well, that's very interesting information you've got us here, thanks a lot.
"deadbeef" is a pattern we usually use for ring and IB tests, and I have no idea how that ended up as last signaled fence value.
Could you try Jerome's debugging patch (http://people.freedesktop.org/~glisse/0001-debug-fence-emission-reception.pa...) and attach the resulting output?
Thx, Christian.
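Since comment #11 identifies "deadbeef" as a pattern normally used for ring and IB tests, a quick way to spot it in further readings is to compare the signaled value against that constant. This is only a rough Python sketch: the constant name and the helper `looks_like_test_pattern` are hypothetical, and the readings are the two value pairs pasted in comment #10. It says nothing about how the pattern ended up in the fence value.

```python
# Pattern named in comment #11; the constant name is made up for this sketch.
RING_TEST_PATTERN = 0xDEADBEEF


def looks_like_test_pattern(fence_value: int) -> bool:
    """True if the 64-bit fence value is exactly the 32-bit test pattern."""
    return fence_value == RING_TEST_PATTERN


# (signaled, emitted) pairs copied from comment #10.
readings = [
    (0x00000000DEADBEEF, 0x0000000000000670),
    (0x00000000DEADBEEF, 0x0000000000000C44),
]

for signaled, emitted in readings:
    print(
        f"signaled={signaled:#018x} emitted={emitted:#018x} "
        f"test pattern: {looks_like_test_pattern(signaled)}"
    )

# The signaled value is identical in both readings while emitted advances.
signaled_stuck = len({s for s, _ in readings}) == 1
emitted_advancing = all(
    later > earlier
    for (_, earlier), (_, later) in zip(readings, readings[1:])
)
print(f"signaled stuck: {signaled_stuck}, emitted advancing: {emitted_advancing}")
```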
Christian König deathsimple@vodafone.de changed:
What                          |Removed                         |Added
----------------------------------------------------------------------------
Status                        |NEW                             |ASSIGNED
AssignedTo                    |dri-devel@lists.freedesktop.org |deathsimple@vodafone.de
Attachment #66942 is obsolete |0                               |1
Attachment #66986 is obsolete |0                               |1
--- Comment #12 from Christian König deathsimple@vodafone.de 2012-09-12 11:49:46 UTC ---
Created attachment 67047 --> https://bugs.freedesktop.org/attachment.cgi?id=67047 Possible fix