https://bugzilla.kernel.org/show_bug.cgi?id=205169
Bug ID: 205169
Summary: AMDGPU driver with Navi card hangs Xorg in fullscreen only.
Product: Drivers
Version: 2.5
Kernel Version: 5.4.0-rc2
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: drjoms@gmail.com
Regression: No
I have another problem logged with Navi + AMDGPU drivers. It is triggered independently and reliably: https://bugzilla.kernel.org/show_bug.cgi?id=204725
With that said, starting strictly and specifically with kernel version 5.4.0*, I have a new problem.
Xorg starts successfully, and I can run OpenGL and Vulkan games windowed. But once I start them in full screen, input devices hang and the screen freezes. The machine stays responsive over SSH/ethernet. I can raise skinny elephants.
I tried opening a few games in windowed mode and in full screen mode, and I reliably hit the bug every time anything using OpenGL goes full screen at the native resolution of the screen.
I noticed that the issue is less likely to happen if the program goes full screen at a non-native resolution.
I will attach dmesg, lsmod and some other details as files, or directly as messages if they are short enough.
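Since the machine stays reachable over SSH during the hang, the diagnostics listed above can be captured from a remote session. A minimal sketch, assuming a POSIX shell; the helper name `collect_gpu_report` and the output file names are made up for illustration, and `dmesg` may need root on systems that restrict access to the kernel log:

```shell
# Hypothetical helper: gather the data mentioned in this report into one
# directory so it can be attached to the bug.
# Usage: collect_gpu_report <output-directory>
collect_gpu_report() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # Each command is allowed to fail (for example dmesg without root) so
    # that the remaining files are still written; any error text ends up
    # in the corresponding file.
    uname -a > "$outdir/uname.txt" 2>&1 || true
    lsmod    > "$outdir/lsmod.txt" 2>&1 || true
    dmesg    > "$outdir/dmesg.txt" 2>&1 || true
    echo "report written to $outdir"
}
```

It could be run over SSH while the display is frozen, for example as `collect_gpu_report /tmp/bug205169` (the path is a placeholder).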
--- Comment #1 from Dmitri Seletski (drjoms@gmail.com) ---
Created attachment 285479
  --> https://bugzilla.kernel.org/attachment.cgi?id=285479&action=edit
dmesg Sat 12 Oct 2019 03:34:43 PM IST
--- Comment #2 from Dmitri Seletski (drjoms@gmail.com) ---
Created attachment 285481
  --> https://bugzilla.kernel.org/attachment.cgi?id=285481&action=edit
.config file Sat 12 Oct 2019 03:36:01 PM IST
--- Comment #3 from Dmitri Seletski (drjoms@gmail.com) ---
Module                  Size  Used by
bridge                147456  0
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
tun                    53248  2
uvcvideo              106496  0
videobuf2_vmalloc      16384  1 uvcvideo
videobuf2_memops       16384  1 videobuf2_vmalloc
videobuf2_v4l2         24576  1 uvcvideo
videodev              204800  2 videobuf2_v4l2,uvcvideo
kvm_amd                86016  0
videobuf2_common       49152  2 videobuf2_v4l2,uvcvideo
joydev                 24576  0
mousedev               24576  0
kvm                   659456  1 kvm_amd
amdgpu               3989504  12
irqbypass              16384  1 kvm
snd_virtuoso           49152  2
snd_oxygen_lib         49152  1 snd_virtuoso
snd_mpu401_uart        16384  1 snd_oxygen_lib
gpu_sched              32768  1 amdgpu
i2c_piix4              24576  0
snd_rawmidi            32768  1 snd_mpu401_uart
ttm                    94208  1 amdgpu
sr_mod                 28672  0
cdrom                  36864  1 sr_mod
k10temp                16384  0
--- Comment #4 from Dmitri Seletski (drjoms@gmail.com) ---
I realised that I had LLVM 10 and LLVM 9 installed at the same time on my machine. I removed LLVM 10 and recompiled Mesa.
uname -a
Linux (none)dimko's Desktop 5.4.0-rc2 #1 SMP PREEMPT Tue Oct 8 19:48:16 IST 2019 x86_64 AMD Ryzen 5 1600 Six-Core Processor AuthenticAMD GNU/Linux
I am on AMD64 Gentoo.
I will test once Mesa has been recompiled with LLVM 9 support and report any changes.
--- Comment #5 from Dmitri Seletski (drjoms@gmail.com) ---
Screen resolution is 3440x1440. Refresh rate is 100 Hz; I also tried 60 Hz, which did not make any difference.
--- Comment #6 from Dmitri Seletski (drjoms@gmail.com) ---
Interesting find: under Xwayland the same issue doesn't happen! I won't blame it on Xorg, though, because under older kernels fullscreen OpenGL programs work.
Pierre-Eric Pelloux-Prayer (pierre-eric.pelloux-prayer@amd.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |pierre-eric.pelloux-prayer@amd.com
--- Comment #7 from Pierre-Eric Pelloux-Prayer (pierre-eric.pelloux-prayer@amd.com) ---
(In reply to Dmitri Seletski from comment #0)
> I have another problem logged with Navi + AMDGPU drivers. It's triggered independently and reliable. https://bugzilla.kernel.org/show_bug.cgi?id=204725
> With that said, starting strictly and specifically with kernel version 5.4.0* I have new problem.
What kernel version were you using before that didn't have the problem?
--- Comment #8 from Dmitri Seletski (drjoms@gmail.com) ---
(In reply to Pierre-Eric Pelloux-Prayer from comment #7)
> (In reply to Dmitri Seletski from comment #0)
> > I have another problem logged with Navi + AMDGPU drivers. It's triggered independently and reliable. https://bugzilla.kernel.org/show_bug.cgi?id=204725
> > With that said, starting strictly and specifically with kernel version 5.4.0* I have new problem.
> What kernel version were you using before that didn't have the problem?
With 5.3.* I could open and use OpenGL and Vulkan apps in full screen without a crash. This is the list of 5.3.* kernels I used:
ls /boot/ |grep vmlinuz-5.3.
vmlinuz-5.3.0+
vmlinuz-5.3.0-next-20190920
vmlinuz-5.3.0+.old
vmlinuz-5.3.0-rc6
vmlinuz-5.3.0-rc6+
vmlinuz-5.3.0-rc6+.old
vmlinuz-5.3.0-rc8
vmlinuz-5.3.0-rc8.old
--- Comment #9 from Dmitri Seletski (drjoms@gmail.com) ---
I had a couple of LLVM versions installed. I removed them all; now I have version 9.0.0.
dimko@(none)dimko's Desktop ~ $ ls /boot/ |grep vmlinuz-5.3.
sys-devel/llvm
Latest version available: 9.0.0
Latest version installed: 9.0.0
I have recompiled Mesa with LLVM 9 (it was previously compiled with LLVM 10, which I removed from the system manually).
glxinfo | grep "OpenGL version"
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.3.0-devel (git-1294f01e06)
--- Comment #10 from Pierre-Eric Pelloux-Prayer (pierre-eric.pelloux-prayer@amd.com) ---
"git bisect" identifies this commit as the problematic one: 617089d5837a ("drm/amd/display: revert wait in pipelock").
Reverting this commit on top of amd-staging-drm-next seems to work fine.
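The revert described above can be sketched as follows. This is a minimal sketch, assuming a local git checkout of the kernel tree; the helper name `revert_on_top` and the tree path in the example are hypothetical. Unlike checking out the parent of the commit, which moves the whole tree back in history, a revert keeps every change that came after the commit.

```shell
# Hypothetical helper: revert one commit on top of the currently checked-out
# branch, keeping all later changes.
# Usage: revert_on_top <path-to-git-tree> <commit>
revert_on_top() {
    tree=$1
    commit=$2
    # --no-edit accepts the default "Revert ..." commit message instead of
    # opening an editor.
    git -C "$tree" revert --no-edit "$commit"
}

# Example for this bug (the tree path is a placeholder):
#   revert_on_top ~/src/linux 617089d5837a
```

After the revert, the kernel is rebuilt and installed as usual.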
--- Comment #11 from Dmitri Seletski (drjoms@gmail.com) ---
(In reply to Pierre-Eric Pelloux-Prayer from comment #10)
> "git bisect" identifies this commit as the problematic one: 617089d5837a ("drm/amd/display: revert wait in pipelock").
> Reverting this commit on top of amd-staging-drm-next seems to work fine.
uname -a
Linux (none)dimko's Desktop 5.3.0-rc3+ #3 SMP PREEMPT Mon Oct 14 20:49:02 IST 2019 x86_64 AMD Ryzen 5 1600 Six-Core Processor AuthenticAMD GNU/Linux
git checkout 617089d5837a^
The issue no longer happens.
It is a major downgrade, but the problem is gone. Which commit can I use to solve this issue?
Bug 205169 - AMDGPU driver with Navi card hangs Xorg in fullscreen only. (edit) https://bugzilla.kernel.org/show_bug.cgi?id=204725
Sorry for taking advantage of you here. I will try to find the 5.3.0 commit. I am new to all this stuff.
--- Comment #12 from Dmitri Seletski (drjoms@gmail.com) ---
(In reply to Dmitri Seletski from comment #11)
> (In reply to Pierre-Eric Pelloux-Prayer from comment #10)
> > "git bisect" identifies this commit as the problematic one: 617089d5837a ("drm/amd/display: revert wait in pipelock").
> > Reverting this commit on top of amd-staging-drm-next seems to work fine.
> uname -a Linux (none)dimko's Desktop 5.3.0-rc3+ #3 SMP PREEMPT Mon Oct 14 20:49:02 IST 2019 x86_64 AMD Ryzen 5 1600 Six-Core Processor AuthenticAMD GNU/Linux
> git checkout 617089d5837a^
> Issue no longer happens
> Major downgrade, but no more problem. Which commit can I use to solve this issue?
> Bug 205169 - AMDGPU driver with Navi card hangs Xorg in fullscreen only. (edit) https://bugzilla.kernel.org/show_bug.cgi?id=204725
> Sorry that I take advantage of you here. I will try to find 5.3.0 commit. I am new into all this stuff.
With regard to that other bug: it has been there since the moment the Navi driver was first introduced.
ArneJ (kernelbug5193@arnej.de) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |kernelbug5193@arnej.de
--- Comment #13 from ArneJ (kernelbug5193@arnej.de) ---
I had a similar issue with Borderlands 2: https://gitlab.freedesktop.org/mesa/mesa/issues/2004
After I reverted the patch mentioned in comment 10, the issue seems to be fixed. The other hang later seems unrelated (looks like sdma is the problem with that one).
Shmerl (shtetldik@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |shtetldik@gmail.com
--- Comment #14 from Shmerl (shtetldik@gmail.com) ---
Looks like the same issue with Pathfinder: Kingmaker: https://bugs.freedesktop.org/show_bug.cgi?id=112266
--- Comment #15 from Dmitri Seletski (drjoms@gmail.com) ---
(In reply to ArneJ from comment #13)
> I had a similar issue with Borderlands 2: https://gitlab.freedesktop.org/mesa/mesa/issues/2004
> After I reverted the patch mentioned in comment 10, the issue seems to be fixed. The other hang later seems unrelated (looks like sdma is the problem with that one).
In my case it happens with ALL games. Please try others and report back.
--- Comment #16 from Dmitri Seletski (drjoms@gmail.com) ---
(In reply to Shmerl from comment #14)
> Looks like the same issue with Pathfinder: Kingmaker: https://bugs.freedesktop.org/show_bug.cgi?id=112266
In my case it happens with ALL games. Please try others and report back.
--- Comment #17 from Shmerl (shtetldik@gmail.com) ---
(In reply to Dmitri Seletski from comment #16)
> (In reply to Shmerl from comment #14)
> > Looks like the same issue with Pathfinder: Kingmaker: https://bugs.freedesktop.org/show_bug.cgi?id=112266
> in my case its with ALL games. pls try others and report back.
I don't know which games you mean. Some others don't hang for me, such as Ion Fury, The Bard's Tale IV, etc. Yet some others, like Hedon, hang with a gfx_0.0.0 timeout, so that is not the same as the flip_done timed out hang.
Anyway, I'll try reverting that commit, to check if it helps.
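Telling the two hang types apart can be done by searching the kernel log for the two strings named above. A minimal sketch, assuming a POSIX shell; the helper name `classify_hang` and its output labels are made up for illustration, and it matches only the substrings `flip_done timed out` and `gfx_0.0.0 timeout` quoted in this thread, without assuming anything about the rest of the log line:

```shell
# Hypothetical helper: read a kernel log on stdin and print which of the two
# hang signatures discussed in this thread it contains:
#   flip_done, gfx_ring, both, or none.
classify_hang() {
    log=$(cat)
    found=none
    case $log in
        *"flip_done timed out"*) found=flip_done ;;
    esac
    case $log in
        *"gfx_0.0.0 timeout"*)
            if [ "$found" = flip_done ]; then
                found=both
            else
                found=gfx_ring
            fi
            ;;
    esac
    echo "$found"
}
```

It could be used as `dmesg | classify_hang`, or fed a saved dmesg attachment on stdin.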
--- Comment #18 from Shmerl (shtetldik@gmail.com) ---
I can confirm that reverting that commit indeed prevents the hang in Pathfinder: Kingmaker!
--- Comment #19 from ArneJ (kernelbug5193@arnej.de) ---
(In reply to Dmitri Seletski from comment #16)
> (In reply to Shmerl from comment #14)
> > Looks like the same issue with Pathfinder: Kingmaker: https://bugs.freedesktop.org/show_bug.cgi?id=112266
> in my case its with ALL games. pls try others and report back.
I tested many games all over. Many had this issue, some did not. After reverting the aforementioned kernel patch and installing the latest LLVM and Mesa from git, I had no more hangs (around 3-4 weeks without a hang now).
Alex Deucher (alexdeucher@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |alexdeucher@gmail.com
--- Comment #20 from Alex Deucher (alexdeucher@gmail.com) ---
Created attachment 285935
  --> https://bugzilla.kernel.org/attachment.cgi?id=285935&action=edit
possible fix
Does this patch help?
--- Comment #21 from Dmitri Seletski (drjoms@gmail.com) ---
(In reply to Alex Deucher from comment #20)
> Created attachment 285935 [details] possible fix
> Does this patch help?
It did not just solve one problem, but two!
First of all, it solved the original issue. Second, some games were hanging right before quitting: Xorg was responsive, but the processes did not disappear.
I was blaming that on proprietary code.
Apparently it was the same bug, just triggered in a different way.
Please close this bug report. My problem is now fixed.
--- Comment #22 from Shmerl (shtetldik@gmail.com) ---
It fixes Pathfinder: Kingmaker too. But first let the patch be upstreamed, then it's OK to close the bug :)
--- Comment #23 from ArneJ (kernelbug5193@arnej.de) ---
I just let Borderlands 2 run for about one hour in the menu, which without this patch causes a hang in at most 3 minutes.
Consider Borderlands 2 also fixed with this :)
--- Comment #24 from Shmerl (shtetldik@gmail.com) ---
Just FYI, 5.4 is out, but the fix hasn't landed yet, so it still needs to be applied manually.
--- Comment #25 from Shmerl (shtetldik@gmail.com) ---
Also, even with the 100 ms timeout, the flip hang still happens for me, just very rarely and not in the usual scenarios. For example, when playing The Witcher 3 (Wine+dxvk) and minimizing the game window, on some rare occasions the flip hang occurs even with the patch. I suppose it has something to do with KWin (though I usually keep compositing disabled in those cases).
So maybe the 100 ms value is not always enough?
--- Comment #26 from Alex Deucher (alexdeucher@gmail.com) ---
Patch has been upstream for a while: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--- Comment #27 from aladjev.andrew@gmail.com (aladjev.andrew@gmail.com) ---
The kernel driver hangs in production under regular usage. Such issues should be escalated as much as possible: meetings of DCN authors and developers, replacement of core developers, driver refactoring/rewrite, test coverage. But that works only in a commercial environment; open source provides TIMEOUT_FOR_FLIP_PENDING.
1.5 years have passed: TIMEOUT_FOR_FLIP_PENDING is still here and nobody cares, and I am almost sure that nobody will care about it tomorrow.
Thank you.
dri-devel@lists.freedesktop.org