https://bugs.freedesktop.org/show_bug.cgi?id=97025
Bug ID: 97025
Summary: flip queue failed: Device or resource busy
Product: DRI
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: linux@bernd-steinhauser.de
Since last week I have random freezes with the amdgpu driver (running on Kaveri).
Once the issue occurs the display freezes. It's not fixable by switching to VT2 and back.
In Xorg.0.log I can find multiple times:
[ 92357.021] (WW) AMDGPU(0): flip queue failed: Device or resource busy
[ 92357.021] (WW) AMDGPU(0): Page flip failed: Device or resource busy
[ 92357.021] (EE) AMDGPU(0): present flip failed
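For anyone who wants to check their own Xorg log for the same pattern, here is a minimal Python sketch. It assumes the line layout shown in the excerpt above; the function name `count_amdgpu_failures` and the regular expression are illustrative only and not part of any driver tooling.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
#   [ 92357.021] (WW) AMDGPU(0): flip queue failed: Device or resource busy
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\((?P<level>WW|EE)\)\s+AMDGPU\(\d+\):\s+(?P<msg>.+)"
)

def count_amdgpu_failures(log_text):
    """Count AMDGPU warning/error messages per (level, message) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match["level"], match["msg"].strip())] += 1
    return counts

sample = """\
[ 92357.021] (WW) AMDGPU(0): flip queue failed: Device or resource busy
[ 92357.021] (WW) AMDGPU(0): Page flip failed: Device or resource busy
[ 92357.021] (EE) AMDGPU(0): present flip failed
"""

for (level, msg), n in count_amdgpu_failures(sample).items():
    print(level, n, msg)
```

To run it against a real log, read the file (for example /var/log/Xorg.0.log, if that is where your distribution keeps it) and pass its contents instead of `sample`.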
No related messages in the journal or dmesg afaics.
It does not seem to be related to a specific event (like a video playing), but just happens out of nowhere. I didn't find a way to reproduce it specifically.
Possibly related packages that I built in that time:
* dev-lang/llvm-scm::arbor 2016-06-11 07:42:19 UTC
* dev-lang/llvm-scm::arbor 2016-06-19 07:29:42 UTC
* x11-dri/mesa-12.0.0-rc4::x11 2016-06-21 21:40:48 UTC
* dev-lang/llvm-scm::arbor 2016-07-02 11:57:34 UTC
* dev-lang/clang-scm::arbor 2016-07-02 12:43:00 UTC
* dev-lang/llvm-3.8.0-r1::arbor 2016-07-12 20:04:14 UTC
* dev-lang/clang-3.8.0::arbor 2016-07-12 20:48:27 UTC
* x11-dri/mesa-12.0.0::x11 2016-07-13 04:42:47 UTC
* x11-dri/mesa-12.0.1::x11 2016-07-17 14:25:44 UTC
* x11-server/xorg-server-1.18.4::x11 2016-07-20 16:06:18 UTC
I couldn't get mesa 12 to build with llvm-scm anymore, so I downgraded. Still, I doubt it's related.
It could have been a regression that came with mesa 12, possibly mesa-12.0.0. I'm pretty sure I hadn't seen the freeze before 12.0.0 final, but it's hard to be certain with an issue this random.
In case it matters, my xorg settings are:
Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    Option "TearFree" "Off"
    Option "EnablePageFlip" "On"
    Option "DRI" "3"
EndSection
IIRC, this is now standard, so nothing special here.
--- Comment #1 from Michel Dänzer michel@daenzer.net --- Please attach the Xorg log and dmesg output corresponding to the problem.
(In reply to Bernd Steinhauser from comment #0)
> * x11-server/xorg-server-1.18.4::x11 2016-07-20 16:06:18 UTC
Which version of xorg-server were you using before? Does going back to that fix the problem?
--- Comment #2 from Bernd Steinhauser linux@bernd-steinhauser.de --- (In reply to Michel Dänzer from comment #1)
> Please attach the Xorg log and dmesg output corresponding to the problem.
> (In reply to Bernd Steinhauser from comment #0)
> > * x11-server/xorg-server-1.18.4::x11 2016-07-20 16:06:18 UTC
> Which version of xorg-server were you using before? Does going back to that fix the problem?
Before it was 1.18.3 installed in April. I hoped that the update might improve the situation, but it didn't. So I'm pretty sure that the xorg-server update is unrelated.
--- Comment #3 from Bernd Steinhauser linux@bernd-steinhauser.de --- I noticed that I updated my kernel from 4.6.3 to 4.6.4 on the 12th of July, so I thought it could be related and investigated a little. Then I stumbled across this log, which I think shows the first time this happened. This is from journald:
Jul 09 08:59:43 orionis kernel: Linux version 4.6.3-amdgpu (root@orionis) (gcc version 5.3.0 (GCC) ) #1 SMP PREEMPT Sat Jun 25 21:20:12 CEST 2016
[...]
Jul 09 17:04:08 orionis kernel: [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
Jul 09 17:04:09 orionis kernel: [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
No idea why in this case I can find some messages in the journal and in the other cases not. Anyway, this means that the origin is not the update mesa-12.0.0-rc4 -> final and also not linux 4.6.3 -> 4.6.4.
Also unlikely 4.6.2 -> 4.6.3, since (as you can see above) this was built approx. 2 weeks before and within that amount of time I would surely have experienced the problem. (Had it approx. 8 to 10 times during the last 2 weeks.)
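The "approx. 2 weeks" figure can be checked against the two journal timestamps quoted above. A small Python sketch, assuming both timestamps are in the same local time zone and that the year is 2016 (the journal lines themselves omit the year):

```python
from datetime import datetime

# Kernel build time from the "Linux version" banner and the first logged
# vblank error. Both are assumed to be local time (CEST) in 2016.
built = datetime(2016, 6, 25, 21, 20, 12)
first_error = datetime(2016, 7, 9, 17, 4, 8)

elapsed = first_error - built
print(elapsed)  # 13 days, 19:43:56
```

So the 4.6.3 kernel had been built just under 14 days before the first logged error.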
Another message I found in a different log is:
Jul 24 00:45:17 orionis kernel: [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
Jul 24 00:45:17 orionis kernel: [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
Not sure if it is related.
With regard to which package started to bring this up, I'm now almost out of ideas. The only thing left would be kwin/plasma 5. The update from Plasma 5.6.95 to 5.7.0 was performed on the 5th of July. So, since kwin is what I use as a compositor (and Plasma 5 as a desktop), it might be that this triggers a bug?
--- Comment #4 from Michel Dänzer michel@daenzer.net --- Still looking for the full Xorg log and dmesg output, preferably captured after the problem occurred.
Does restarting kwin recover from the hang?
--- Comment #5 from Bernd Steinhauser linux@bernd-steinhauser.de --- Sorry, missed that request in your post above.
dmesg output I don't have available as I didn't have ssh activated when the problem occurred. (now I do)
I could attach the journald kernel output if that would be sufficient?
--- Comment #6 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 125329 --> https://bugs.freedesktop.org/attachment.cgi?id=125329&action=edit Xorg.0.log
--- Comment #7 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 125333 --> https://bugs.freedesktop.org/attachment.cgi?id=125333&action=edit dmesg output
dmesg output from the currently running system.
Attaching this as I noticed that I do get those vblank/flip messages even now, when I didn't experience the bug (yet).
--- Comment #8 from Bernd Steinhauser linux@bernd-steinhauser.de --- I noticed that those two lines coincide with a certain event I can trigger: switching the DP-0 display off (an Eizo EV2455). This leads to a disconnect of the DP connection and that leads (somehow) to the quoted messages about the failed vblank. (I'm not sure if the disconnect is actually a bug in the kernel (as it's a DP1.2 display) or if my hardware/mainboard/GPU is too old.)
However, this disconnect does not lead straight to the freeze. And so far I haven't seen the bug directly after a DP disconnect, but just at some random point.
--- Comment #9 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 125433 --> https://bugs.freedesktop.org/attachment.cgi?id=125433&action=edit dmesg output after the freeze
I logged into the machine during a freeze and saved the dmesg output. Unfortunately, it doesn't seem to contain additional information.
--- Comment #10 from Bernd Steinhauser linux@bernd-steinhauser.de --- Since the weekend, I ran kwin without compositing.
Since then, I haven't seen this happen, so I think this is a bug that is triggered by kwin when compositing, likely since 5.7.0.
--- Comment #11 from Michel Dänzer michel@daenzer.net --- Does explicitly disabling the DP output in the KDE configuration before turning off the monitor avoid the problem?
--- Comment #12 from Bernd Steinhauser linux@bernd-steinhauser.de --- It does prevent the vblank messages in dmesg, but I don't know yet if it'll prevent the freeze.
--- Comment #13 from Bernd Steinhauser linux@bernd-steinhauser.de --- One more remark: I've only observed the effect when the OpenGL 3.1 compositing backend in kwin is active. I tested with OpenGL 2 backend over the last week and have not seen this happening since.
I should also mention that I've had the egl interface activated, which is not recommended for kwin. I've not had issues with it before, but it could be related, so the next thing I'm testing is glx/OpenGL 3.1 and hope I can narrow this down this way.
--- Comment #14 from Bernd Steinhauser linux@bernd-steinhauser.de --- Ok, it's not egl, the same happens with glx/OpenGL3.
--- Comment #15 from Bernd Steinhauser linux@bernd-steinhauser.de --- I tried a few things, but wasn't really able to nail this down. I downgraded to mesa 11.2 to see if that helps, but it does not.
However, today I had plasmashell freezing after unlocking the screen. Only plasmashell froze, everything else kept working as expected.
I contacted Martin on IRC and he thought it might be related to this. I'll attach the log from the conversation as well as the backtrace.
He might be right, because around the time when this happened, I get these messages in dmesg:
[88765.431890] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.436865] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.441940] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.446861] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.451865] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[88765.456903] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.510005] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.514998] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.520053] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[89579.525158] [drm:amdgpu_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
[113833.139104] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[113833.139117] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
[113833.361471] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[113833.361484] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
[113836.962993] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
--- Comment #16 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 126131 --> https://bugs.freedesktop.org/attachment.cgi?id=126131&action=edit IRC conversation with Martin Grässlin
--- Comment #17 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 126132 --> https://bugs.freedesktop.org/attachment.cgi?id=126132&action=edit plasmashell backtrace
--- Comment #18 from Michel Dänzer michel@daenzer.net --- (In reply to Bernd Steinhauser from comment #15)
> However, today I had plasmashell freezing after unlocking the screen. Only plasmashell froze, everything else kept working as expected.
> [...]
> [...] I get these messages in dmesg:
There are some messages with a timestamp around 89xxx and some with a timestamp around 11383x. Almost 7 hours passed in between, so which group of messages corresponds to the plasmashell freeze? Probably the latter? Those look again like the DP connection is lost. Were you able to determine if explicitly disabling the DP output in the kwin settings avoids the freezes?
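The "almost 7 hours" gap follows from the dmesg timestamps, which count seconds since boot. A quick Python sketch using the last timestamp of the earlier group and the first of the later group from comment #15:

```python
# dmesg timestamps are seconds since boot.
last_rbo_error = 89579.525158    # last "failed to reserve new rbo buffer" line
first_dp_error = 113833.139104   # first "displayport link status failed" line

gap_hours = (first_dp_error - last_rbo_error) / 3600
print(f"{gap_hours:.2f} hours")  # 6.74 hours
```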
--- Comment #19 from Bernd Steinhauser linux@bernd-steinhauser.de --- Yes, the ones around 11383x.
I can't yet be sure about DP, but I'll check again. The problem is that I can't find a way to trigger it, it just happens randomly.
The DisplayPort Monitor is my main screen, it would mean I have to work for 1 week or so without it.
--- Comment #20 from Bernd Steinhauser linux@bernd-steinhauser.de --- Ok, running for approx. 4 days now with DP-0 deactivated and so far didn't spot any problems. Only at the very start, I could find these messages, but that was before running kde:
[ 14.404932] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[ 14.404939] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
Still, it's hard to tell for this kind of problem that occurs so randomly.
I'll check whether I have another DP cable, so I can test that.
--- Comment #21 from Bernd Steinhauser linux@bernd-steinhauser.de --- Ok, so I replaced the DP cable and reenabled the screen. Immediately after that I got these messages in dmesg. Note the time.
[338324.267684] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
[338324.489710] [drm:amdgpu_crtc_page_flip] *ERROR* failed to get vblank before flip
[338526.834794] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[338526.834801] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
[338526.838652] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[338526.838655] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
After that, no messages (related to the graphics stack) appeared in dmesg so far. However, the X server log is now spammed with messages every few seconds:
[338324.859] (WW) AMDGPU(0): flip queue failed: Invalid argument
[338324.859] (WW) AMDGPU(0): Page flip failed: Invalid argument
[338324.859] (EE) AMDGPU(0): present flip failed
[338324.940] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[338324.942] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[338324.942] (WW) AMDGPU(0): flip queue failed: Device or resource busy
[338324.942] (WW) AMDGPU(0): Page flip failed: Device or resource busy
This started right after activating the DP screen. I guess sooner or later that will result in the freeze that I'm seeing. (I'll upload both dmesg and Xorg.0.log.)
So yeah, it seems like this is a problem with the DP. Since I don't think that I have two broken DP cables, I guess the problem is somewhere else. If that would help, I can connect one of the other screens via DP and see if that makes a difference.
--- Comment #22 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 126285 --> https://bugs.freedesktop.org/attachment.cgi?id=126285&action=edit dmesg after reenabled DP
--- Comment #23 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 126286 --> https://bugs.freedesktop.org/attachment.cgi?id=126286&action=edit Xorg.0.log after reenabled DP
--- Comment #24 from Kevin McCormack harlemsquirrel@gmail.com --- I am experiencing what I think may be a similar issue. When my display sleeps, it often does not wake up on keypress. I have to wait anywhere from a few seconds to a few minutes and then have errors in my log like the following:
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
I am running Antergos 64-bit with GNOME 3.22.2 on Wayland
Kernels 4.8.13 and 4.10.0-rc3-ga121103c9228
AMD FX-8370
Sapphire Fury X
--- Comment #25 from Kevin McCormack harlemsquirrel@gmail.com --- Created attachment 128916 --> https://bugs.freedesktop.org/attachment.cgi?id=128916&action=edit Delayed recovery from display sleep logs
--- Comment #26 from Michel Dänzer michel@daenzer.net --- Does https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... help for this?
--- Comment #27 from Bernd Steinhauser linux@bernd-steinhauser.de --- Thanks, I'm testing it right now on linux 4.16.8.
Although I'm not sure if it works as expected, since the display does still seem to disconnect when I turn the screen off.
At least the messages in dmesg are gone, so it's definitely different compared to previous tests. Can't say anything about the freezes without extensive testing, though.
--- Comment #28 from Michel Dänzer michel@daenzer.net --- (In reply to Bernd Steinhauser from comment #27)
> Although I'm not sure if it works as expected, since the display does still seem to disconnect when I turn the screen off.
AFAIK that's either a monitor or general DisplayPort issue. The drivers can't prevent it but have to cope with it.
--- Comment #29 from Bernd Steinhauser linux@bernd-steinhauser.de --- (In reply to Michel Dänzer from comment #28)
> (In reply to Bernd Steinhauser from comment #27)
> > Although I'm not sure if it works as expected, since the display does still seem to disconnect when I turn the screen off.
> AFAIK that's either a monitor or general DisplayPort issue. The drivers can't prevent it but have to cope with it.
Quite possible. I've seen such behaviour on Windows as well on some displays. I don't really get it; it's very annoying if your windows are rearranged just because you turned off a display to save some power.
Anyway, back to topic:
[595475.710884] [drm:amdgpu_atombios_dp_link_train] *ERROR* displayport link status failed
[595475.710902] [drm:amdgpu_atombios_dp_link_train] *ERROR* clock recovery failed
I do still get those messages sometimes, but at least I didn't experience any lockups or freezes.
--- Comment #30 from Fermulator freedesktop-bugs@fermulator.fastmail.org --- Note: I'm experiencing the same (or at least similar) issues. My report is filed here: https://bugs.freedesktop.org/show_bug.cgi?id=107560
Martin Peres martin.peres@free.fr changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED
--- Comment #31 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/80.