https://bugs.freedesktop.org/show_bug.cgi?id=91141
Bug ID: 91141
Summary: Lots of *ERROR* Couldn't update BO_VA (-22) since drm/radeon: stop using addr to check for BO move
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: hadack@gmx.de
Created attachment 116789 --> https://bugs.freedesktop.org/attachment.cgi?id=116789&action=edit dmesg
This is with the latest kernel from Linus's git tree on a CAPE VERDE card. When the errors appear, I get screen corruption when scrolling in a browser or file manager, and missing or changed letters in a terminal.
A bisect led to commit 161ab658a611df14fb0365b7b70a8c5fed3e4870, and reverting it on master makes everything work normally again.
--- Comment #1 from hadack@gmx.de --- Created attachment 116790 --> https://bugs.freedesktop.org/attachment.cgi?id=116790&action=edit errors
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #2 from Christian König deathsimple@vodafone.de --- Fix is already in Alex's drm-fixes-4.2 tree and should appear in -rc1.
If you need it sooner for some reason, just cherry-pick "drm/radeon: fix adding all VAs to the freed list on remove v2".
--- Comment #3 from hadack@gmx.de --- Found the fix in his amdgpu branch and it fixes it, thanks!
--- Comment #4 from Christian König deathsimple@vodafone.de --- (In reply to hadack from comment #3)
> Found the fix in his amdgpu branch and it fixes it, thanks!
Some users still report some issues even after this fix, so please keep an eye open for additional issues.
If you find some then please reopen this bug report.
hadack@gmx.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---
--- Comment #5 from hadack@gmx.de --- Hmm, seems you are right, desktop usage is fine on xfce with compton but starting a game like KSP leads to a non-refreshing screen. Reverting both commits makes it work again.
--- Comment #6 from Dave Witbrodt dawitbro@sbcglobal.net --- (In reply to hadack from comment #5)
> Hmm, seems you are right, desktop usage is fine on xfce with compton but starting a game like KSP leads to a non-refreshing screen. Reverting both commits makes it work again.
I can verify the same observations on my HD 7850 (PITCAIRN 0x1002:0x6819 0x1787:0x2320) card. I use Linux stable kernels with Radeon DRM (and core DRM) cherry-picked in from drm-next and drm-fixes. With my last local update -- from kernel 4.0.4 + DRM 4.1 cherry-picks, to 4.0.6 + DRM 4.1 + DRM 4.2 -- running 'alien-arena' as a test program causes the DE (also Xfce, as is the case with hadack) to stop responding once I exit the game; also, the DE itself seems to trigger the bug after a while, or when resuming from suspend.
I tried the patch mentioned in comment 2 ("drm/radeon: fix adding all VAs to the freed list on remove v2"), but the symptoms described above continued.
Reverting 161ab658, and not applying the "fix ... VAs ... v2" patch, gives me a working kernel. (And one I am very happy with! My current combination of LLVM 3.7, Mesa, libdrm, xf86-video-ati, and xorg-server is the fastest, most responsive system I've ever had with open source drivers.)
--- Comment #7 from Shawn Starr shawn.starr@rogers.com --- I can confirm that after removing both patches mentioned (from dri-next-4.2), the issue no longer happens for me.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #116789|0                           |1
        is obsolete|                            |
 Attachment #116790|0                           |1
        is obsolete|                            |
--- Comment #8 from Christian König deathsimple@vodafone.de --- Created attachment 116918 --> https://bugs.freedesktop.org/attachment.cgi?id=116918&action=edit Debuging patch.
I unfortunately can't reproduce the issue.
So could somebody please apply the attached patch and try to get me the resulting stack dump? I need to know who is calling this function.
Thanks in advance, Christian.
--- Comment #9 from hadack@gmx.de --- Created attachment 116924 --> https://bugs.freedesktop.org/attachment.cgi?id=116924&action=edit output with debugging patch
Here is the output with the debugging patch applied.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #116918|0                           |1
        is obsolete|                            |
 Attachment #116924|0                           |1
        is obsolete|                            |
--- Comment #10 from Christian König deathsimple@vodafone.de --- Created attachment 116933 --> https://bugs.freedesktop.org/attachment.cgi?id=116933&action=edit Possible fix
Thanks. Does the attached patch fix the issue?
--- Comment #11 from hadack@gmx.de --- Created attachment 116936 --> https://bugs.freedesktop.org/attachment.cgi?id=116936&action=edit dmesg with possible fix applied
Still not working with the possible fix applied.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #116936|0                           |1
        is obsolete|                            |
--- Comment #12 from Christian König deathsimple@vodafone.de --- Created attachment 116973 --> https://bugs.freedesktop.org/attachment.cgi?id=116973&action=edit Possible fix part 2
Please apply this one on top of the first fix and see if the problem still happens.
Sorry that I can't find it offhand and need to check each possible cause separately, but as noted before, I can't reproduce the issue here.
--- Comment #13 from hadack@gmx.de --- No problem, seems the second try was it. With both patches applied it works fine. Tested standard desktop usage and KSP.
--- Comment #14 from Christian König deathsimple@vodafone.de --- (In reply to hadack from comment #13)
> No problem, seems the second try was it. With both patches applied it works fine. Tested standard desktop usage and KSP.
Thanks for testing. Are you convinced enough that it works so that I can add a "Tested-by: hadack@gmx.de" to the patches while pushing them towards 4.2?
--- Comment #15 from Dave Witbrodt dawitbro@sbcglobal.net --- (In reply to Christian König from comment #12)
> Created attachment 116973 [details]
> Possible fix part 2
>
> Please apply this one on top of the first fix and see if the problem still happens.
Works well on my machine. The programs that triggered the bug before no longer cause any problems.
Sanity check: I had dropped 161ab658 and b13e22ae from my list of cherry picks before in order to have a working kernel. After adding those back, and applying
  0001-drm-radeon-allways-add-the-VM-clear-duplicate.patch
  0001-drm-radeon-check-if-BO_VA-is-set-before-adding-it-to.patch
everything works great again. I have not yet tested suspend-to-RAM, but after the testing I've done so far I doubt there will be problems.
HTH, DW
--- Comment #16 from hadack@gmx.de --- Still working fine here. I tested all the ways to trigger it and it's fine. Feel free to add the Tested-by.
--- Comment #17 from Daniel Exner dex+fdobugzilla@dragonslave.de --- I can also confirm that a suspend/resume cycle no longer floods my kernel log when running Linus's kernel plus the two patches.
--- Comment #18 from Dave Witbrodt dawitbro@sbcglobal.net --- (In reply to Dave Witbrodt from comment #15) [...]
> I have not yet tested suspend-to-RAM, but after the testing I've done so far I doubt there will be problems.
I tried suspend-to-RAM before leaving for work, and it resumed fine after work. No problems at all with the code in question.
DW
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #19 from Christian König deathsimple@vodafone.de --- I think we can close this one now.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED