https://bugzilla.kernel.org/show_bug.cgi?id=196615
Bug ID: 196615
Summary: amdgpu - resume from suspend is no longer working on rx480
Product: Drivers
Version: 2.5
Kernel Version: >= 4.11.3
Hardware: Intel
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: psk@autistici.org
Regression: No
Hi!
Since 4.12.4 I can no longer resume from suspend using the amdgpu driver on my rx 480.
I did a bisect and it revealed the following commit being the problem:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/comm...
Can someone help me with fixing that?
Peter Spiess-Knafl (psk@autistici.org) changed:
What       |Removed |Added
----------------------------------------------------------------------------
CC         |        |psk@autistici.org
Regression |No      |Yes
--- Comment #1 from Peter Spiess-Knafl (psk@autistici.org) --- I also started a discussion thread on the arch forum:
https://bbs.archlinux.org/viewtopic.php?pid=1729393#p1729393
Peter Spiess-Knafl (psk@autistici.org) changed:
What |Removed |Added
----------------------------------------------------------------------------
URL  |        |https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.12.y&id=2dc1889ebf8501b0edf125e89a30e1cf3744a2a7
Alex Deucher (alexdeucher@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |alexdeucher@gmail.com
--- Comment #2 from Alex Deucher (alexdeucher@gmail.com) --- Please attach your dmesg output. How exactly does resume fail?
--- Comment #3 from Peter Spiess-Knafl (psk@autistici.org) --- Hi Alex!
Thanks for getting back. First there are strange artefacts where the mouse pointer should be, and shortly afterwards the system freezes altogether.
I'll attach a dmesg log. But I think the relevant errors are these:
Aug 08 22:30:29 rabe kernel: [drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 13 test failed
Aug 08 22:30:29 rabe kernel: [drm:amdgpu_resume [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
Aug 08 22:30:29 rabe kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_resume failed (-110).
Aug 08 22:30:29 rabe kernel: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
Aug 08 22:30:29 rabe kernel: PM: Device 0000:01:00.0 failed to resume async: error -110
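For anyone triaging similar logs: the -110 in these messages is the negated Linux errno ETIMEDOUT (110), meaning the VCE ring test timed out during resume. Below is a minimal Python sketch for pulling the failed IP blocks out of such log text. The regular expression, the helper name `failed_ip_blocks`, and the small errno table are illustrative assumptions, not part of any amdgpu tooling.

```python
import re

# Linux (asm-generic) errno value seen in the log above. It is hardcoded so
# the sketch does not depend on the errno table of the machine running it.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

# Matches lines such as:
#   ... *ERROR* resume of IP block <vce_v3_0> failed -110
RESUME_FAILURE = re.compile(
    r"resume of IP block <(?P<block>[^>]+)> failed -(?P<errno>\d+)"
)


def failed_ip_blocks(log_text):
    """Return (block, errno_name) pairs for every failed IP block resume."""
    results = []
    for match in RESUME_FAILURE.finditer(log_text):
        code = int(match.group("errno"))
        # Fall back to the raw number for codes not in the small table.
        results.append((match.group("block"), LINUX_ERRNO_NAMES.get(code, str(code))))
    return results


sample = (
    "[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 13 test failed\n"
    "[drm:amdgpu_resume [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110\n"
)
print(failed_ip_blocks(sample))
```

Run against the two sample lines, this reports the `vce_v3_0` block with `ETIMEDOUT`.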
--- Comment #4 from Peter Spiess-Knafl (psk@autistici.org) --- Created attachment 257861 --> https://bugzilla.kernel.org/attachment.cgi?id=257861&action=edit dmesg log regarding the freeze.
dmesg log regarding the freeze.
--- Comment #5 from Peter Spiess-Knafl (psk@autistici.org) --- Alex, do you need any further information?
Francisco J. Vazquez (dv@v44r.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |dv@v44r.com
--- Comment #6 from Francisco J. Vazquez (dv@v44r.com) --- Same error on Gentoo, kernel 4.12.7, RX 470. On 4.12.7 the screen does come up on wake up from suspend after a while (20 seconds or so) but the system is unusable: the mouse cursor moves fine and the keyboard responds to keypresses but the screen updates with a 20-30s lag (if I launch a new terminal it appears after half a minute). Changing to VT with ctrl+alt+f[1-6] garbles the screen.
4.12.7 wake up:
[...]
[ 128.978655] [drm] ring test on 10 succeeded in 6 usecs
[ 129.025563] [drm] ring test on 11 succeeded in 1 usecs
[ 129.025563] [drm] UVD initialized successfully.
[ 129.126522] [drm] ring test on 12 succeeded in 0 usecs
[ 129.331084] [drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 13 test failed
[ 129.331088] [drm:amdgpu_resume [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
[ 129.331092] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_resume failed (-110).
[ 129.331094] dpm_run_callback(): pci_pm_resume+0x0/0xd0 returns -110
[ 129.331095] PM: Device 0000:01:00.0 failed to resume async: error -110
On 4.12.3 the screen comes up instantly after resume and everything works fine.
4.12.3 wake up:
[...]
[ 14.255996] [drm] ring test on 10 succeeded in 6 usecs
[ 14.302859] [drm] ring test on 11 succeeded in 1 usecs
[ 14.302860] [drm] UVD initialized successfully.
[ 14.403827] [drm] ring test on 12 succeeded in 0 usecs
[ 14.403847] [drm] ring test on 13 succeeded in 9 usecs
[ 14.403848] [drm] VCE initialized successfully.
[ 14.403945] [drm] ib test on ring 0 succeeded
[ 14.404119] [drm] ib test on ring 1 succeeded
[...]
[ 14.405487] [drm] ib test on ring 11 succeeded
[ 14.405694] [drm] ib test on ring 12 succeeded
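The two wake-up logs differ at ring 13: the ring test succeeds on 4.12.3 and fails on 4.12.7. A small Python sketch of that comparison follows, assuming the message wording shown in the logs above. The helper name `ring_results` and both regular expressions are hypothetical and written only for this illustration.

```python
import re

# Patterns follow the wording of the log lines quoted in this report.
RING_OK = re.compile(r"ring test on (\d+) succeeded")
RING_FAILED = re.compile(r"ring (\d+) test failed")


def ring_results(log_text):
    """Return (succeeded, failed) sets of ring numbers found in a log."""
    succeeded = {int(n) for n in RING_OK.findall(log_text)}
    failed = {int(n) for n in RING_FAILED.findall(log_text)}
    return succeeded, failed


good_log = """\
[ 14.403827] [drm] ring test on 12 succeeded in 0 usecs
[ 14.403847] [drm] ring test on 13 succeeded in 9 usecs
[ 14.403945] [drm] ib test on ring 0 succeeded
"""

bad_log = """\
[ 129.126522] [drm] ring test on 12 succeeded in 0 usecs
[ 129.331084] [drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 13 test failed
"""

good_ok, _ = ring_results(good_log)
_, bad_failed = ring_results(bad_log)

# Rings that passed on the good kernel but failed on the bad one.
regressed = sorted(good_ok & bad_failed)
print(regressed)
```

The "ib test on ring N" lines are deliberately not matched, since they report a different test than the ring test that fails here.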
--- Comment #7 from Alex Deucher (alexdeucher@gmail.com) --- Created attachment 258001 --> https://bugzilla.kernel.org/attachment.cgi?id=258001&action=edit revert the change
Does reverting it help?
Alex Deucher (alexdeucher@gmail.com) changed:
What                         |Removed          |Added
----------------------------------------------------------------------------
Attachment #258001 is patch  |0                |1
Attachment #258001 mime type |application/mbox |text/plain
--- Comment #8 from Peter Spiess-Knafl (psk@autistici.org) --- Yes, it does. But I guess it was a bugfix for another problem, as indicated in your commit message.
Will you revert it?
--- Comment #9 from Alex Deucher (alexdeucher@gmail.com) --- (In reply to Peter Spiess-Knafl from comment #8)
> Yes, it does. But I guess it was a bugfix for another problem, as indicated in your commit message.
It is a bug fix for high mclks when displays are off, but it seems to regress resume for some reason, so we are just trading one bug for another. I guess maybe there is some other fix missing.
> Will you revert it?
Unless you think otherwise.
--- Comment #10 from Peter Spiess-Knafl (psk@autistici.org) --- Please revert it then. Thanks for your help.
--- Comment #11 from Peter Spiess-Knafl (psk@autistici.org) --- Alex, when will this be released?
--- Comment #12 from Alex Deucher (alexdeucher@gmail.com) --- I sent the patch to Greg last week.
dolohow (dolohow@outlook.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |dolohow@outlook.com
--- Comment #13 from dolohow (dolohow@outlook.com) --- After I updated my kernel from 4.12.9 to 4.12.10, I started experiencing screen flickering on my RX 480. I bisected, and it turns out that commit dbe5b2d70cfdc3e1df1ceb3f715c6ef7d17fc566 makes my screen flicker.
Harry Wentland (harry.wentland@amd.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |harry.wentland@amd.com
--- Comment #14 from Harry Wentland (harry.wentland@amd.com) --- (In reply to dolohow from comment #13)
> After I updated my kernel from 4.12.9 to 4.12.10, I started experiencing screen flickering on my RX 480. I bisected, and it turns out that commit dbe5b2d70cfdc3e1df1ceb3f715c6ef7d17fc566 makes my screen flicker.
Do you mind adding the commit subject and description in addition to the SHA? Which git tree is this from? I'm having trouble finding it.
--- Comment #15 from dolohow (dolohow@outlook.com) --- Sure, it's from Linus's tree:
Revert "drm/amdgpu: fix vblank_time when displays are off"
This reverts commit 2dc1889.
Fixes a suspend and resume regression.
bug: https://bugzilla.kernel.org/show_bug.cgi?id=196615
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
klavkalashj@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |klavkalashj@gmail.com
--- Comment #16 from klavkalashj@gmail.com --- This bug remains for me. Resume worked after the patch was reverted and continued to work for the rest of the 4.12 series, but as of Linux 4.13.3 I see the same symptoms again. When browsing the sources for this kernel, it seems the patch is still there. Was it supposed to be reapplied?
--- Comment #17 from Peter Spiess-Knafl (psk@autistici.org) --- Same for me here. Arch 4.13.3. Was the original patch reapplied?
--- Comment #18 from klavkalashj@gmail.com --- Looks like it when browsing the source. It's in both 4.13 and 4.14-rc3. I hope it can be removed again in time for the LTS release. For now I'm holding off the upgrade to 4.13. I don't know if I'm getting this right, but it sounds like there is a choice between suspend/resume and screen flickering...
--- Comment #19 from klavkalashj@gmail.com --- The code is still there in 4.14-rc4.
--- Comment #20 from Peter Spiess-Knafl (psk@autistici.org) --- Alex can you help out here? Why was the patch fixing the suspend/resume issue removed in 4.13?
--- Comment #21 from Peter Spiess-Knafl (psk@autistici.org) --- "git log -p drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c" reveals that the original patch (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/comm...) has been reapplied over the fix for suspend.
--- Comment #22 from Alex Deucher (alexdeucher@gmail.com) --- (In reply to Peter Spiess-Knafl from comment #20)
> Alex can you help out here? Why was the patch fixing the suspend/resume issue removed in 4.13?
The fix was only applied to 4.12. No one reported any problems with 4.13 or newer until later.
--- Comment #23 from klavkalashj@gmail.com --- Oh. I didn't realize it worked like that. The same problem happens with all versions of 4.13 and 4.14 I tried so far.
--- Comment #24 from klavkalashj@gmail.com --- I would suggest removing the fix from all kernel versions until we can confirm it doesn't break anything. Having an LTS kernel break suspend/resume for Polaris users doesn't sound too good.
contact@florentflament.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |contact@florentflament.com
--- Comment #25 from contact@florentflament.com --- Hi, Same issue here. OS freezing after resume from suspend with an AMD RX480 GPU.
$ cat /etc/redhat-release
Fedora release 26 (Twenty Six)
$ uname -a
Linux amn 4.13.4-200.fc26.x86_64 #1 SMP Thu Sep 28 20:46:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ journalctl -k -b -2 | grep amdgpu | tail -5
Oct 12 01:46:06 amn kernel: [drm] Initialized amdgpu 3.18.0 20150101 for 0000:01:00.0 on minor 0
Oct 12 23:16:29 amn kernel: [drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 14 test failed
Oct 12 23:16:29 amn kernel: [drm:amdgpu_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
Oct 12 23:16:29 amn kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_resume failed (-110).
Oct 12 23:16:30 amn kernel: amdgpu 0000:01:00.0: ffff9514d9161800 unpin not necessary
Regards
--- Comment #26 from contact@florentflament.com --- I just figured out that if instead of opening a 'Gnome on Xorg' session, I open a 'Gnome' session (which, as far as I know, starts a 'Wayland Display Server' instead of Xorg on my Fedora setup), then I don't have any more issues resuming after a suspend. It works pretty well, no more amdgpu related error messages in my journal.
--- Comment #27 from klavkalashj@gmail.com --- Alex, I'm sorry for being pushy, but is anything being done about this? The next LTS kernel is closing in on release and suspend/resume is still not working. Linux 4.12.13 is still the last kernel with it working. If there is anything I can help with to solve this, like testing, info etc. just ask.
--- Comment #28 from Alex Deucher (alexdeucher@gmail.com) --- Created attachment 260307 --> https://bugzilla.kernel.org/attachment.cgi?id=260307&action=edit possible fix
Does the attached patch fix the issue?
--- Comment #29 from klavkalashj@gmail.com --- I did a quick test on my Arch Linux install. With Linux 4.14-rc5 and this latest patch applied, it seems to work like it should! I suspended and resumed twice; no errors were reported and the computer resumed correctly. I couldn't get 4.13 to build for some reason, but I think the fault lies in my noobieness :) I will try to build 4.13 on Ubuntu tomorrow instead and get back with results. But so far so good, great job and many thanks!
Florian Schmitt (kommerz11@galois.de) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |kommerz11@galois.de
--- Comment #30 from Florian Schmitt (kommerz11@galois.de) --- Looks like that did the trick for me. I'm using linux 4.13.8 on Fedora.
--- Comment #31 from klavkalashj@gmail.com --- Yep, it seems to work fine also on 4.13 in Ubuntu. Built the current version of Ubuntu which is called 4.13.0-16-generic with the patch just posted, and the same small test with two suspend/resume cycles worked just fine with no errors.
--- Comment #32 from Philipp Claßen (philipp.classen@posteo.de) --- Solved it for me, too. Tested on Arch Linux with 4.14.0-rc6-mainline (plus the patch).
--- Comment #33 from Florian Schmitt (kommerz11@galois.de) --- Looks like the patch made it into 4.13.11. Yay. Thanks!
From the changelog:
commit 0d74253003e6370e65468f5aec8c969bdef6733e
Author: Rex Zhu <Rex.Zhu@amd.com>
Date: Fri Oct 20 15:07:41 2017 +0800
drm/amd/powerplay: fix uninitialized variable
commit 8b95f4f730cba02ef6febbdc4ca7e55ca045b00e upstream.
refresh_rate was not initialized when program display gap. this patch can fix vce ring test failed when do S3 on Polaris10.
bug: https://bugs.freedesktop.org/show_bug.cgi?id=103102
bug: https://bugzilla.kernel.org/show_bug.cgi?id=196615