https://bugzilla.kernel.org/show_bug.cgi?id=204611
Bug ID: 204611
Summary: amdgpu error scheduling IBs when waking from sleep
Product: Drivers
Version: 2.5
Kernel Version: 5.2.9
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: tones111@hotmail.com
Regression: No
Created attachment 284485 --> https://bugzilla.kernel.org/attachment.cgi?id=284485&action=edit journalctl: amdgpu lockup on resume from sleep.
My system locks up when trying to wake from sleep (open lid). The screen remains black and is unresponsive to keyboard/mouse input. I'm able to ssh from another machine and have attached the output from journalctl -b. The log shows scrolling errors...
kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
kernel: amdgpu 0000:05:00.0: couldn't schedule ib on ring <gfx>
This is a Lenovo E585 laptop with an AMD R5 2500U APU.
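For readers decoding the `(-22)` in the log lines above: kernel code conventionally returns negative errno values, and 22 is EINVAL ("Invalid argument"). A minimal sketch of that lookup, assuming the host running the script uses the same errno numbering as the kernel that produced the log (the helper name `describe_kernel_error` is made up for illustration):

```python
import errno
import os


def describe_kernel_error(code):
    """Map a kernel-style negative return code to its symbolic errno name.

    Hypothetical helper: it relies on the host's errno table, which is
    assumed to match the kernel that produced the log.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


if __name__ == "__main__":
    # -22 is the code reported by amdgpu_job_run in the log above.
    print(describe_kernel_error(-22))
```

This only names the error. It says nothing about which argument the driver rejected.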
tones111@hotmail.com changed:
What      |Removed |Added
----------------------------------------------------------------------------
Regression|No      |Yes
Alex Deucher (alexdeucher@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |alexdeucher@gmail.com
--- Comment #1 from Alex Deucher (alexdeucher@gmail.com) --- If this is a regression can you bisect?
--- Comment #2 from tones111@hotmail.com --- The problem was introduced after v5.1 and before v5.2. It's very reproducible on v5.2, but it might become less frequent as the bisect progresses. My attempts so far have driven me into the weeds, but I'm still trying.
It looks like another user reported the same issue here: https://bugzilla.kernel.org/show_bug.cgi?id=204227
During my bisect I was seeing visual artifacts without the lockup so I believe they're separate issues.
--- Comment #3 from tones111@hotmail.com --- I'm still working on trying to bisect the problem, but it's been challenging. Following the advice at https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate... I turned on the initcall_debug and no_console_suspend boot options.
I then see the following messages in the boot log after bringing the system back up.
Sep 15 17:36:39 mobile kernel: [drm] reserve 0x400000 from 0xf400c00000 for PSP TMR SIZE
...
Sep 15 17:36:39 mobile kernel: [drm] psp command failed and response status is (0)
Sep 15 17:36:39 mobile kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
Sep 15 17:36:39 mobile kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Sep 15 17:36:39 mobile kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
Sep 15 17:36:39 mobile kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
Sep 15 17:36:39 mobile kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -22
Sep 15 17:36:39 mobile kernel: amdgpu 0000:05:00.0: pci_pm_resume+0x0/0x90 returned -22 after 19543535 usecs
Sep 15 17:36:39 mobile kernel: PM: Device 0000:05:00.0 failed to resume async: error -22
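When comparing logs from several machines, it can help to pull out just the failing IP block and the time the resume callback took. The sketch below is not a tool used in this thread. It assumes the message formats are exactly those shown in the log above, and the names `summarize`, `BLOCK_RE` and `TIME_RE` are made up for illustration.

```python
import re

# Sample lines copied from the log above (timestamp and host prefix dropped).
LOG = """\
[drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
[drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
amdgpu 0000:05:00.0: pci_pm_resume+0x0/0x90 returned -22 after 19543535 usecs
"""

# Patterns assume the exact wording seen in this thread's logs.
BLOCK_RE = re.compile(r"resume of IP block <(\w+)> failed (-?\d+)")
TIME_RE = re.compile(r"returned (-?\d+) after (\d+) usecs")


def summarize(log):
    """Return the failing IP block, its error code, and the callback time."""
    summary = {}
    block = BLOCK_RE.search(log)
    if block:
        summary["ip_block"] = block.group(1)
        summary["error"] = int(block.group(2))
    timing = TIME_RE.search(log)
    if timing:
        # The kernel reports microseconds; convert to seconds.
        summary["resume_seconds"] = int(timing.group(2)) / 1_000_000
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

On the sample, the failing block is `psp` and the 19543535 usecs work out to roughly 19.5 seconds spent in `pci_pm_resume` before it gave up.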
--- Comment #4 from tones111@hotmail.com --- I've been able to narrow the problem down a bit.
The first commit where I get the scrolling amdgpu errors is 4f8b49092c37cf0c87c43bb2698d43c71cf0e4e5
Unfortunately that's a merge commit. One of the parents, ceacbc0e145e3b27d8b12eecb881f9d87702765a, appears to be good.
The other parent 5dd6c49339126c2c8df2179041373222362d6e49 causes lockups that don't have any journal messages after going to sleep. I've tried bisecting this back to v5.1-rc1 (good) but the lockups become much less consistent.
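When a failure stops being consistent, a single successful resume no longer proves a commit is good. As a rough guide, if a bad commit locks up with probability p on each suspend/resume cycle, then n clean cycles in a row rule it out with confidence 1 - (1 - p)^n. The sketch below solves that for n. It assumes the cycles are independent and that p can be estimated, and neither assumption was measured in this thread. The function name `cycles_needed` is made up for illustration.

```python
import math


def cycles_needed(p_fail, confidence=0.95):
    """Clean suspend/resume cycles needed before calling a commit good.

    Assumes (hypothetically) that a bad commit fails independently with
    probability p_fail on every cycle.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - p_fail)**n >= confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    for p in (0.5, 0.1):
        print(f"p={p}: {cycles_needed(p)} clean cycles for 95% confidence")
```

With these assumptions, a lockup that hits half the time needs 5 clean cycles per bisect step, while one that hits one time in ten needs 29. That is why bisecting an intermittent failure gets slow quickly.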
Carmen Bianca Bakker (carmen@carmenbianca.eu) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |carmen@carmenbianca.eu
--- Comment #5 from Carmen Bianca Bakker (carmen@carmenbianca.eu) --- I have the same problem on a Thinkpad X395, Ryzen 5 3500U. I have a downstream bug report at https://bugzilla.redhat.com/show_bug.cgi?id=1731915
--- Comment #6 from Carmen Bianca Bakker (carmen@carmenbianca.eu) --- Created attachment 285365 --> https://bugzilla.kernel.org/attachment.cgi?id=285365&action=edit journalctl output on Thinkpad X395
Vic Luo (vicluo96@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |vicluo96@gmail.com
--- Comment #7 from Vic Luo (vicluo96@gmail.com) --- Same for Thinkpad E585 with Ryzen 5 2500U.
kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x80 returns -22
kernel: PM: Device 0000:05:00.0 failed to resume async: error -22
kernel: acpi LNXPOWER:01: Turning OFF
kernel: OOM killer enabled.
kernel: Restarting tasks ...
aeon.descriptor@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |aeon.descriptor@gmail.com
--- Comment #8 from aeon.descriptor@gmail.com --- The issue is also present on a Lenovo E585 with an "AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx".
I can provide debugging information on request, availability permitting. I've omitted it for now because it is substantially similar to Vic Luo's. I'm not posting this just as a 'me too'; I'll try to make time to help out in whatever way I can.
Bastian Luettig (bastian@luettig.eu) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |bastian@luettig.eu
--- Comment #9 from Bastian Luettig (bastian@luettig.eu) --- Created attachment 289265 --> https://bugzilla.kernel.org/attachment.cgi?id=289265&action=edit journalctl amdgpu fails on resume
Confirming the bug on an AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx.
Fedora 32
Kernel: 5.6.14-300.fc32.x86_64
Resume presumably fails (the fan is still active). Booting with iommu=pt and amd_iommu=on does not help, and disabling pageflip does not help either.
The latest BIOS version from HP is installed.
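Several boot options are tried in this thread (`initcall_debug`, `no_console_suspend`, `iommu=pt`, `amd_iommu=on`), and it is easy to test a kernel that never received them. A minimal sketch for checking what the running kernel was booted with, based on reading `/proc/cmdline`. The helper names and the sample command line are made up for illustration, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to True.

    Simplified sketch: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Hypothetical command line, used only to demonstrate the parser.
    sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 iommu=pt amd_iommu=on initcall_debug"
    params = parse_cmdline(sample)
    for option in ("iommu", "amd_iommu", "initcall_debug", "no_console_suspend"):
        print(option, "->", params.get(option, "not set"))
```

On a real system, `parse_cmdline(read_cmdline())` gives the same view for the currently running kernel.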
--- Comment #10 from Bastian Luettig (bastian@luettig.eu) --- Created attachment 289269 --> https://bugzilla.kernel.org/attachment.cgi?id=289269&action=edit dmesg output when switching to console
Update: when I switch to a console (Ctrl+Alt+F4) before suspending, the PC wakes up again. Switching directly back to Wayland then freezes the PC.
When I instead restart gdm from the console, the computer can resume again in Wayland (it took two logins). I've attached the dmesg output of a suspend and resume in console mode.
Daniel Parks (danielrparks@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |danielrparks@gmail.com
--- Comment #11 from Daniel Parks (danielrparks@gmail.com) --- Created attachment 289653 --> https://bugzilla.kernel.org/attachment.cgi?id=289653&action=edit 4700u journal
I am also affected by this issue on a Dell Inspiron 14 2-in-1 7405 with a Ryzen 7 4700U. I am also willing to help debug and test, but unfortunately I cannot help bisect because amdgpu did not support my GPU at all when the regression occurred.
--- Comment #12 from tones111@hotmail.com --- I haven't seen problems resuming from sleep in some time. Is anyone still experiencing this problem on newer kernels? If not then I'd like to propose this issue be marked as resolved.
--- Comment #13 from aeon.descriptor@gmail.com --- I still have this issue, but I'm using the latest Ubuntu 20.04 patched kernels, so I don't know how 'latest' that is.
What kernel versions work? I could try them out.
On Sat, Jan 29, 2022, 2:55 PM bugzilla-daemon@kernel.org wrote:
> --- Comment #12 from tones111@hotmail.com ---
> I haven't seen problems resuming from sleep in some time. Is anyone still experiencing this problem on newer kernels? If not then I'd like to propose this issue be marked as resolved.
-- You may reply to this email to add a comment.
You are receiving this mail because: You are on the CC list for the bug.
--- Comment #14 from Thierry (thierry.monnier5@gmail.com) --- No problems here for a long time either. I think it's solved.
On Sat, Jan 29, 2022 at 23:59, bugzilla-daemon@kernel.org wrote:
> --- Comment #13 from aeon.descriptor@gmail.com ---
> I still have this issue, but I'm using the latest Ubuntu 20.04 patched kernels, so I don't know how 'latest' that is.
>
> What kernel versions work? I could try them out.
dri-devel@lists.freedesktop.org