[Bug 211277] New: sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

bugzilla-daemon＠bugzilla.kernel.org

19 Jan 2021

10:25 a.m.

https://bugzilla.kernel.org/show_bug.cgi?id=211277

Bug ID: 211277
Summary: sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail
Product: Drivers
Version: 2.5
Kernel Version: 5.10.4
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: kolAflash@kolahilft.de
Regression: No

I'm currently on Debian-11-Testing (Bullseye). For a few weeks now, the system has sometimes (not always) failed to wake up from suspend. Most of the time suspend works, but about 1 in 10 times it crashes.

I attached /var/log/kern.log which holds plenty of information about the crash. Looks like the crash happened in amdgpu_dm.c:7273 (amdgpu_dm_atomic_commit_tail, Linux-5.10.4).

I'm pretty sure this behavior didn't appear a few months ago. So I guess a recent change is causing it. This may be either:

1. an updated package in Debian-Testing

Indeed, I'm pretty sure the problem didn't appear before Linux-5.9. So maybe this is caused by a change between Linux-5.8 and Linux-5.9. I'll try going back to Linux-5.8 in the next few days.

2. a BIOS update

In November 2020 I installed the BIOS update sp110770.exe. Before that I was using sp107599.exe. You can find the BIOS history attached. I'll also see if I can test a BIOS downgrade in the next few days.

-- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.

bugzilla-daemon＠bugzilla.kernel.org

19 Jan

10:25 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #1 from kolAflash (kolAflash@kolahilft.de) --- Created attachment 294747 --> https://bugzilla.kernel.org/attachment.cgi?id=294747&action=edit

kern.log

bugzilla-daemon＠bugzilla.kernel.org

10:27 a.m.

--- Comment #2 from kolAflash (kolAflash@kolahilft.de) --- Created attachment 294749 --> https://bugzilla.kernel.org/attachment.cgi?id=294749&action=edit

BIOS update history (just in case someone has a clue whether something looks suspicious and this might not be a Linux problem)

bugzilla-daemon＠bugzilla.kernel.org

22 Jan

4:42 a.m.

--- Comment #3 from kolAflash (kolAflash@kolahilft.de) --- I searched through my journalctl log.

I set up the whole system in May 2020 with Linux-5.6.7. (journalctl has everything back to that date)

The bug has appeared as follows since October, starting with Linux-5.8. So Linux-5.8 was also affected (contradicting my original post).

I used the system nearly every day and always use s2ram (never shutting down, only rebooting when needed for updates). So this can be seen statistically.

- 2020-10-21 with Linux-5.8.14 (Debian 5.8.0-3, installed after 2020-09-26)
- 2020-12-11 with Linux-5.9.11 (Debian 5.9.0-4, installed 2020-12-04)
- 2020-12-25 with Linux-5.9.11
- 2021-01-13 with Linux-5.10.4 (Debian 5.10.0-1, installed 2021-01-10)
- 2021-01-16 with Linux-5.10.4
- 2021-01-19 with Linux-5.10.4

So the bug didn't appear with Linux <= 5.7. And the bug's frequency increased with Linux-5.10.

In parallel I'm still trying to rule out other factors (BIOS updates, other software changes, ...). One possibly significant detail: Debian built Linux-5.7 with GCC-9 and has used GCC-10 starting with Linux-5.8.

bugzilla-daemon＠bugzilla.kernel.org

24 Jan

7:23 p.m.

Jerome C (me@jeromec.com) changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |me@jeromec.com

--- Comment #4 from Jerome C (me@jeromec.com) --- I too have a Ryzen 5 3500U and see random resumes where screen updates are very slow (1 frame change every 1-2 minutes), which makes it look like it has crashed. In the kernel logs I see a bunch of "flip_done timed out" and "amdgpu_dm_atomic_commit_tail" errors.

This never happened for me between 5.4.6 and 5.9.14. I first noticed it with 5.10.4 and never suspended on 5.10.0 - 5.10.3, so my guess is that the issue was introduced somewhere in 5.10.0 - 5.10.3.

Do you have the kernel parameter "init_on_free=1" set, or "CONFIG_INIT_ON_FREE_DEFAULT_ON=y" in your kernel config? If so, try setting the kernel parameter "init_on_free=0". So far (for me, and still testing) it has resumed every time.

I think it's an issue with amdgpu and the kernel parameter "init_on_free=1" or the kernel config "CONFIG_INIT_ON_FREE_DEFAULT_ON=y", which zeroes memory on free/deallocation.

The kernel parameter "init_on_alloc=1" or the kernel config "CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y" works fine for me.
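A minimal sketch of how one might check which of the two settings is in effect. It assumes that an explicit init_on_free= boot parameter overrides the compiled-in default and that only the 0/1 spellings are used; the function name init_on_free_state is made up for this example.

```shell
# init_on_free_state CMDLINE CONFIG_FILE
# Prints "on" or "off". Hypothetical helper, not part of any tool.
init_on_free_state() {
    cmdline=$1
    config_file=$2
    state=""
    # Walk the whole command line so that a later occurrence wins.
    for word in $cmdline; do
        case $word in
            init_on_free=1) state=on ;;
            init_on_free=0) state=off ;;
        esac
    done
    if [ -z "$state" ]; then
        # No boot parameter given: fall back to the compiled-in default.
        if grep -q '^CONFIG_INIT_ON_FREE_DEFAULT_ON=y' "$config_file" 2>/dev/null; then
            state=on
        else
            state=off
        fi
    fi
    echo "$state"
}

# Possible use on a running Debian-style system:
#   init_on_free_state "$(cat /proc/cmdline)" "/boot/config-$(uname -r)"
```

Passing the command line and the config path as arguments keeps the check usable on saved copies of both, for example from another reporter's machine.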

bugzilla-daemon＠bugzilla.kernel.org

27 Jan

2:11 a.m.

--- Comment #5 from Jerome C (me@jeromec.com) --- Created attachment 294879 --> https://bugzilla.kernel.org/attachment.cgi?id=294879&action=edit

Kernel log

Unfortunately it crashed again although I've noticed it's been crashing a lot less (4-5 days) since I set kernel parameter "init_on_free=0".

I've attached a kernel log for 5.10.10

bugzilla-daemon＠bugzilla.kernel.org

30 Jan

10:25 a.m.

--- Comment #6 from kolAflash (kolAflash@kolahilft.de) --- (In reply to Jerome C from comment #4)

...

[...] Do you have kernel parameter set "init_on_free=1" or in your kernel config "CONFIG_INIT_ON_FREE_DEFAULT_ON=y", [...]

I'm using the Debian-11 (Testing / Bullseye) standard kernel.

$ grep -i init_on_free /boot/config-5.10.0-2-amd64
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set

bugzilla-daemon＠bugzilla.kernel.org

10:41 a.m.

--- Comment #7 from Jerome C (me@jeromec.com) --- ok, you have it turned off already

Weird thing happened this morning... I woke my laptop up and it was slow screen updates... I just closed my laptop lid, frustrated... I noticed it suspended again... I open my laptop again and it resumed

I looked in my kernel logs and saw the error messages from the first resume

NOTE: only copied the error messages

...

[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:73:eDP-1] flip_done timed out
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-3] flip_done timed out

but on the second resume... no warnings or errors

I think it's a bug somewhere between suspension and resuming
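To compare a failed resume with a clean one, it can help to count the timeout messages in a saved kernel log. A small sketch, assuming the log text is piped in on stdin; the function name count_flip_timeouts is hypothetical.

```shell
# count_flip_timeouts: reads kernel log text on stdin and prints the number
# of lines that contain "flip_done timed out".
count_flip_timeouts() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps that from aborting scripts that run with "set -e".
    grep -c 'flip_done timed out' || true
}

# Possible uses:
#   dmesg | count_flip_timeouts
#   journalctl -k -b -1 | count_flip_timeouts   # kernel log of the previous boot
```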

bugzilla-daemon＠bugzilla.kernel.org

31 Jan

1:11 p.m.

--- Comment #8 from Jerome C (me@jeromec.com) --- I've tried kernel 5.11-rc5 and same issue occurs there.

For now I've downgraded kernel to 5.9.14 ( will update it to 5.9.16 ) until this issue is fixed

What I've mentioned in comment 4 isn't really helping I think

Sometimes the issue happens frequently in a day but then other times it could be a few days before it happens again

bugzilla-daemon＠bugzilla.kernel.org

21 Feb

12:17 a.m.

--- Comment #9 from kolAflash (kolAflash@kolahilft.de) --- I've been on Linux-5.7 since 2021-01-26 and have woken up the notebook at least once a day since then, without the problem appearing. So it's clearly a regression in the kernel somewhere between 5.7 and 5.10, and probably between 5.7 and 5.8.

And it's definitely not a BIOS issue, because I haven't changed anything about the BIOS since the problem last appeared with Kernel-5.10.

Regards, kolAflash

bugzilla-daemon＠bugzilla.kernel.org

2:40 p.m.

Alex Deucher (alexdeucher@gmail.com) changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |alexdeucher@gmail.com

--- Comment #10 from Alex Deucher (alexdeucher@gmail.com) --- Can you bisect? https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html

bugzilla-daemon＠bugzilla.kernel.org

25 Feb

10:28 p.m.

--- Comment #11 from kolAflash (kolAflash@kolahilft.de) --- (In reply to Alex Deucher from comment #10)

...

Can you bisect? https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html

I will try to.

But it will definitely need some time and may not be possible at all. Because the bug cannot be reproduced completely deterministically.

bugzilla-daemon＠bugzilla.kernel.org

5 Mar

3:02 p.m.

--- Comment #12 from kolAflash (kolAflash@kolahilft.de) --- I've tried doing a bisect using this script. Unfortunately I couldn't reproduce the bug this way. So bisecting will take a lot longer.

for i in {0..19}; do
    echo -e "\n${i}"
    /usr/sbin/rtcwake --seconds 15 --mode no
    systemctl start suspend.target
    sleep 15
done

bugzilla-daemon＠bugzilla.kernel.org

3:13 p.m.

--- Comment #13 from Jerome C (me@jeromec.com) --- (In reply to kolAflash from comment #12)

...

I've tried doing a bisect using this script. Unfortunately I couldn't reproduce the bug this way. So bisecting will take a lot longer.

for i in {0..19}; do
    echo -e "\n${i}"
    /usr/sbin/rtcwake --seconds 15 --mode no
    systemctl start suspend.target
    sleep 15
done

Hiya

I did some testing myself recently, and unfortunately 20 tests were not enough for me. I found that it could take 50 - 100 resumes before it would fail, so I capped mine at 150 resumes; there were too many times where things looked fine with fewer than 50. Afterwards I tested kernels from 5.10.4 to 5.11-rc5 ( I didn't use 5.10.0 to 5.10.3 ) and found that this commit

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

was causing the issue for me
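Since a failure may only show up after 50 - 100 resumes, a test loop that stops at the first detected timeout saves time during a bisect. The following is only a sketch: SUSPEND_CMD and LOG_CMD are hypothetical hooks, their defaults are commands that appear in this thread (rtcwake needs root), and the check assumes the kernel log was clean when the loop started.

```shell
# Hooks, overridable from the environment.
: "${SUSPEND_CMD:=rtcwake -s 7 -m mem}"
: "${LOG_CMD:=dmesg}"

# suspend_loop MAX: run up to MAX suspend/resume cycles and stop at the first
# cycle after which the kernel log shows a flip_done timeout.
# Returns 0 if no failure was seen, 1 on a detected failure, 2 if the
# suspend command itself failed.
suspend_loop() {
    max=$1
    i=1
    while [ "$i" -le "$max" ]; do
        echo "cycle $i"
        $SUSPEND_CMD || return 2
        if $LOG_CMD | grep -q 'flip_done timed out'; then
            echo "failure detected on cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max cycles"
}

# Possible use (as root): suspend_loop 150
```

The return code maps naturally onto marking a kernel good or bad by hand after each run; a hard hang on resume would of course never reach the check and has to be noticed manually.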

bugzilla-daemon＠bugzilla.kernel.org

7 Mar

3:43 p.m.

--- Comment #14 from kolAflash (kolAflash@kolahilft.de) --- (In reply to Jerome C from comment #13)

I don't get how you got to your results. There's no straight path from 5.10.4 to 5.11-rc5, as they are on different branches (5.10.y and master).

Nevertheless, your result may be reasonable from the point of view of the git history. I'm not sure about the commit ID a10aad137, but it has a completely identical twin commit c6d2b0fbb (also removing AMD_PG_SUPPORT_VCN_DPG from that expression). https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=... And c6d2b0fbb was applied between v5.10-rc2 and v5.10-rc3 (a10aad137 is only in master).

So if c6d2b0fbb (a.k.a. a10aad137) is responsible, this explains why I started noticing the problem when Debian-Testing went from Linux-5.9 to Linux-5.10.

I'm now running a 5.10.21 kernel where I reverted c6d2b0fbb. And I'll try using this kernel for at least one week and also run some iterative tests with it.

Regarding reproduction in general:

I really wonder what triggers this bug. I didn't go so far as to run more than 50 tests (sleep-wake iterations). In particular, I didn't try more than 50 because the bug definitely appeared more often when it happened under "natural" (non-testing) circumstances.

Some test series I did which are hard to make sense of statistically: I tried 20 tests and nothing happened. A few minutes later I decided to try 50 more tests and it directly failed on the first one. So I had to reboot, tried again 50 tests and nothing happened. Afterwards I put my notebook into s2ram and when I woke it the next day it immediately crashed.

By the way the two times it crashed recently (see above) happened with a kernel I compiled from clean kernel.org sources. Also I never experienced the bug with a clean 5.8.18 compiled from kernel.org running with the same system for about a week. So I'm quite convinced it's nothing Debian specific.

bugzilla-daemon＠bugzilla.kernel.org

11 Mar

1:55 p.m.

--- Comment #15 from kolAflash (kolAflash@kolahilft.de) --- (In reply to Alex Deucher from comment #10)

...

Can you bisect? https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html

I've done several s2ram-wakeup cycles (100 automatic and about three manual wakeups/day) with the kernel I compiled on 2021-03-07.

It's based on 5.10.21 with c6d2b0fbb reverted (as suggested by Jerome). Result: no crashes. This looks very promising!

@Alex Can I help with anything else to solve this?

I also compiled 5.10.21 without reverting c6d2b0fbb, tested it for a few hours and got three wakeup-crashes.

bugzilla-daemon＠bugzilla.kernel.org

12 May

11:43 p.m.

--- Comment #16 from kolAflash (kolAflash@kolahilft.de) --- @Alex Any progress on this?

If there's no perfect way to fix this, what about an option to turn on/off this behaviour? A module option that can be changed at runtime would be ideal. So it can be set right before suspending. But a kernel boot parameter would be fine too.

P.S. Would someone be so kind and set this bug to "confirmed"?

bugzilla-daemon＠bugzilla.kernel.org

13 May

2:20 a.m.

--- Comment #17 from Alex Deucher (alexdeucher@gmail.com) --- I don't think we've been able to reproduce it. That said, we did double check the programming sequences and I believe it may be fixed with these patches:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/71efc8701a47aa9e3de74bab...
https://gitlab.freedesktop.org/agd5f/linux/-/commit/a8f768874aaf751738a2e035...

bugzilla-daemon＠bugzilla.kernel.org

6:26 a.m.

--- Comment #18 from Jerome C (me@jeromec.com) --- (In reply to Alex Deucher from comment #17)

...

I don't think we've been able to reproduce it. That said, we did double check the programming sequences and I believe it may be fixed with these patches:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/71efc8701a47aa9e3de74bab06020da81757893f
https://gitlab.freedesktop.org/agd5f/linux/-/commit/a8f768874aaf751738a2e0350bf2e70085f93ace

I've tried these two commits and the issue is still there, unfortunately

bugzilla-daemon＠bugzilla.kernel.org

18 May

3:17 p.m.

jamesz@amd.com (jamesz@amd.com) changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |jamesz@amd.com

--- Comment #19 from jamesz@amd.com (jamesz@amd.com) --- Created attachment 296841 --> https://bugzilla.kernel.org/attachment.cgi?id=296841&action=edit

to fix suspend/resume hung issue

Hi @kolAflash and @jeromec, Can you help check if this patch can fix the issue? Since we can't reproduce at our side. Thanks! James

bugzilla-daemon＠bugzilla.kernel.org

4:13 p.m.

--- Comment #20 from Jerome C (me@jeromec.com) --- (In reply to James Zhu from comment #19)

...

Created attachment 296841 [details] to fix suspend/resume hung issue

Hi @kolAflash and @jeromec, Can you help check if this patch can fix the issue? Since we can't reproduce at our side. Thanks! James

no, this doesn't work for me.

I'm curious how exactly you are trying to reproduce this

I start Xorg using the command "startx"

Xorg is running with LXQT

I start "Konsole" a gui terminal and execute the following

"for i in $(seq 1 150); do echo $i; sudo rtcwake -s 7 -m mem; done"

bugzilla-daemon＠bugzilla.kernel.org

4:41 p.m.

--- Comment #21 from James Zhu (jamesz@amd.com) --- Hi Jeromec, to isolate the cause, can you help run two experiments separately?

1. Run suspend/resume without launching Xorg, just in text mode.

2. Disable video acceleration (VCN IP). I need you to share the whole dmesg log after loading the amdgpu driver. I think running modprobe with ip_block_mask=0x0ff should basically disable the VCN IP for VCN1 (dmesg will contain a message telling you whether the VCN IP is disabled or not).

Thanks! James

bugzilla-daemon＠bugzilla.kernel.org

5:18 p.m.

--- Comment #22 from kolAflash (kolAflash@kolahilft.de) --- @James What do you mean by video acceleration? Is this about 3D / DRI acceleration like in video games? Or do you mean just "video" playback (movie, mp4, webm, h264, vp8, ...) acceleration?

And I don't completely understand what ip_block_mask=0x0ff is supposed to do. I just rebooted with that kernel parameter added and 3D acceleration (DRI) is still working.

----

I'm planning to run these kernels in the next few days:

1. Current Debian testing Linux-5.10.0-6 with ip_block_mask=0x0ff, Xorg and 3D acceleration in daily use.

2. amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg and with 3D acceleration in daily use.

3. amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg, but without 3D acceleration** in daily use.

4. amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff and without Xorg, doing some standby cycles for testing.

If I encounter any crash I'll post the whole dmesg starting with the boot output.

----

* amd-drm-next-5.14-2021-05-12 https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-drm-next-5.14-2021-05-... ae30d41eb

** Is there something special I should do to turn off acceleration? Or should I just not start any application that uses 3D / DRI acceleration? (The latter might be difficult: I'd have to keep an eye on every application, like Firefox, Atom, VLC, the KWin/KDE window manager, ..., to make sure it doesn't use DRI.)

bugzilla-daemon＠bugzilla.kernel.org

6:04 p.m.

--- Comment #23 from James Zhu (jamesz@amd.com) --- Hi kolAflash, the VCN IP is for video acceleration (video playback). If the VCN IP doesn't handle the suspend/resume process properly, we do observe other IP blocks being affected. In your case it is related to the display IP (dm).

ip_block_mask=0xff (in grub it should be amdgpu.ip_block_mask=0x0ff) can disable the VCN IP while the amdgpu driver is loading, so this experiment can tell whether this dm error is caused by the VCN IP or not. Sometimes /sys/kernel/debug/dri/0/amdgpu_fence_info can provide some useful information if there is a chance to dump it. These experiments can help identify which IP causes the issue, so we can find an expert in that area to continue the triage.

Your current report is case 2, so it can be replaced with:

2. amd-drm-next-5.14-2021-05-12* with ip_block_mask=0x0ff, with Xorg and without 3D acceleration in daily use.

I suggest you execute your test plan in the order 4->3->2->1. Thanks! James
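To make the mask value less opaque: as far as I understand it, amdgpu treats bit i of ip_block_mask as the enable flag for the i-th IP block in the chip's initialization order, so 0x0ff keeps blocks 0-7 and switches off everything from block 8 upward. The sketch below only does that bit arithmetic. The block count passed in is a made-up example value, and which index VCN has on a given chip is an assumption that should be confirmed in dmesg.

```shell
# disabled_blocks MASK COUNT
# Prints the indices (0-based) of the blocks whose bit is cleared in MASK,
# for a device with COUNT IP blocks. Hypothetical helper for illustration.
disabled_blocks() {
    mask=$(( $1 ))
    count=$2
    i=0
    out=""
    while [ "$i" -lt "$count" ]; do
        if [ $(( (mask >> i) & 1 )) -eq 0 ]; then
            out="$out $i"
        fi
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    echo "${out# }"
}

# Example with an assumed count of 9 blocks: disabled_blocks 0x0ff 9
```

With nine blocks, the mask 0x0ff leaves only index 8 disabled, which matches the intent described here if VCN is the last block on this chip.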

bugzilla-daemon＠bugzilla.kernel.org

6:21 p.m.

--- Comment #24 from Jerome C (me@jeromec.com) --- (In reply to James Zhu from comment #21)

...

Hi Jeromec, to isolate the cause, can you help run two experiments separately?

1. Run suspend/resume without launching Xorg, just in text mode.

2. Disable video acceleration (VCN IP). I need you to share the whole dmesg log after loading the amdgpu driver. I think running modprobe with ip_block_mask=0x0ff should basically disable the VCN IP for VCN1 (dmesg will contain a message telling you whether the VCN IP is disabled or not).

Thanks! James

1) In text mode with VCN enabled, the suspend issues are still there.
2) I see the message confirming that VCN is disabled. In text mode with VCN disabled, the suspend issues are gone. After starting Xorg with VCN disabled, the suspend issues are gone as well.

I'll gather those logs soon ( tomorrow sometime )

bugzilla-daemon＠bugzilla.kernel.org

6:21 p.m.

--- Comment #25 from Jerome C (me@jeromec.com) --- I forgot to mention... I'm on kernel 5.13.4

bugzilla-daemon＠bugzilla.kernel.org

6:48 p.m.

--- Comment #26 from Jerome C (me@jeromec.com) --- (In reply to Jerome C from comment #25)

...

I forgot to mention... I'm on kernel 5.13.4

5.12.4 I mean

bugzilla-daemon＠bugzilla.kernel.org

7:33 p.m.

--- Comment #27 from James Zhu (jamesz@amd.com) --- Hi Jeromec, thanks for your feedback, can you also add drm.debug=0x1ff modprobe? I need log: case 1 dmesg and /sys/kernel/debug/dri/0/amdgpu_fence_info (if you can). James.
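For context, drm.debug is a bitmask in which each bit enables one category of DRM debug output. The sketch below decodes a value into category names. The bit-to-name mapping (core, driver, kms, prime, atomic, vbl, state, lease, dp for bits 0-8) is my recollection of the drm module's parameter description for kernels of this period, so treat it as an assumption and check `modinfo drm` on the kernel in use. The function name is made up.

```shell
# drm_debug_categories MASK
# Prints the names of the debug categories whose bits are set in MASK.
drm_debug_categories() {
    mask=$(( $1 ))
    out=""
    bit=0
    # Assumed order of categories, lowest bit first.
    for name in core driver kms prime atomic vbl state lease dp; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
        bit=$((bit + 1))
    done
    echo "${out# }"
}

# Example: drm_debug_categories 0x1ff
```

Under that mapping 0x1ff turns on all nine categories at once, which produces a lot of log output on every commit.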

bugzilla-daemon＠bugzilla.kernel.org

19 May

7:44 p.m.

--- Comment #28 from Jerome C (me@jeromec.com) --- Created attachment 296877 --> https://bugzilla.kernel.org/attachment.cgi?id=296877&action=edit

AMDGPU fence info

(In reply to James Zhu from comment #27)

...

Hi Jeromec, thanks for your feedback, can you also add drm.debug=0x1ff modprobe? I need log: case 1 dmesg and /sys/kernel/debug/dri/0/amdgpu_fence_info (if you can). James.

I've tested text mode and gui/drm mode with "drm.debug=0x1ff" set and found no crashes... when "drm.debug=0x1ff" is unset... the crashes/timeouts are back... I think this is why you're unable to reproduce the problem...

I've never known debug option(s) to remove issue(s)... oh well

I've added the contents of the file "/sys/kernel/debug/dri/0/amdgpu_fence_info".

The file contains 4 different boot states ( vcn on/off, drm debug on/off ), clearly marked/separated in the attached file

I'm using 5.12.5 now but I also tried this on 5.12.4. Usually the crashes happen within 50 suspensions/resumes but today I left it to do over 2000 suspensions/resumes just to make sure...

I know you asked for a log but I spent so much time on this ( other things too ) that it wasn't on my mind, so I'll get that by Friday, if you still need it of course

thanks

bugzilla-daemon＠bugzilla.kernel.org

8:02 p.m.

--- Comment #29 from James Zhu (jamesz@amd.com) --- Hi Jeromec, I think turning on debug changes the timing a little bit. A log without debug info can't give me any help. The amdgpu_fence_info looks good for all cases. This issue is possibly device specific.

bugzilla-daemon＠bugzilla.kernel.org

20 May

9:31 a.m.

--- Comment #30 from kolAflash (kolAflash@kolahilft.de) --- Created attachment 296891 --> https://bugzilla.kernel.org/attachment.cgi?id=296891&action=edit

all kernel messages with ip_block_mask=0x0ff (Debian kernel 5.10.0-6)

It also crashes with ip_block_mask=0x0ff. Tested with the current Debian Testing kernel 5.10.0-6.

I attached all kernel messages from /var/log/messages from boot to crash. I think that should be the dmesg output.

bugzilla-daemon＠bugzilla.kernel.org

9:40 a.m.

--- Comment #31 from Jerome C (me@jeromec.com) --- (In reply to kolAflash from comment #30)

...

Created attachment 296891 [details] all kernel messages with ip_block_mask=0x0ff (Debian kernel 5.10.0-6)

Also crashes with ip_block_mask=0x0ff Tested with the current Debian Testing kernel 5.10.0-6.

I attached all kernel messages from /var/log/messages from boot to crash. I think that should be the dmesg output.

hiya, you may not know this, but you need to use "amdgpu.ip_block_mask=0x0ff" and not "ip_block_mask=0x0ff"

"ip_block_mask=0x0ff" will only apply to linux

"amdgpu.ip_block_mask=0x0ff" will only apply to amdgpu module

I can see in your kernel logs that VCN is still enabled
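A quick way to catch this mistake before running a long test is to inspect the boot command line. A small sketch, with a made-up function name, that distinguishes the prefixed form from the bare one:

```shell
# mask_param_status CMDLINE
# Prints "prefixed" if amdgpu.ip_block_mask=... is present, "unprefixed" if
# only the bare ip_block_mask=... is present, and "absent" otherwise.
mask_param_status() {
    status=absent
    for word in $1; do
        case $word in
            amdgpu.ip_block_mask=*) status=prefixed ;;
            ip_block_mask=*) [ "$status" = prefixed ] || status=unprefixed ;;
        esac
    done
    echo "$status"
}

# Possible use: mask_param_status "$(cat /proc/cmdline)"
```

If the module exports the parameter under /sys/module/amdgpu/parameters/ip_block_mask (an assumption worth verifying on the running kernel), reading that file shows the value the loaded driver actually uses.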

bugzilla-daemon＠bugzilla.kernel.org

6:34 p.m.

--- Comment #32 from kolAflash (kolAflash@kolahilft.de) --- Created attachment 296901 --> https://bugzilla.kernel.org/attachment.cgi?id=296901&action=edit dmesg via SSH, running amd-drm-next-5.14-2021-05-12 without ip_block_mask=0x0ff and with Xorg

(In reply to Jerome C from comment #31)

...

[...] hiya, you may not know this but use in "amdgpu.ip_block_mask=0x0ff" and not "ip_block_mask=0x0ff" [...] I can see in your kernel logs that VCN is still enabled

Ooops you're right. I know someone wrote that before. But it seems I somehow missed it while editing my Grub parameters.

I'll give it another try!

----

In the meanwhile I performed test number 2.

...

amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg [...]

This time the crash was very different!

After some minutes (about 3) the graphical screen actually turned back on. I'm pretty sure that didn't happen with the other kernels I tested. (never tested amd-drm-next-5.14-2021-05-12 before)

Nevertheless everything graphical is lagging extremely. If I move the mouse or do anything else it takes more than 10 seconds until something happens on the screen.

On the other hand SSH access is smoothly possible. And I was able to save the dmesg output. (see attachment) Unlocking the screen via SSH (loginctl) or starting graphical programs (DISPLAY=:0 xterm) works, but is extremely slow too. (> 10 seconds waiting)

bugzilla-daemon＠bugzilla.kernel.org

6:42 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #33 from Jerome C (me@jeromec.com) --- (In reply to kolAflash from comment #32)

...

In the meanwhile I performed test number 2.

...

amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg

[...]

This time the crash was very different!

After some minutes (about 3) the graphical screen actually turned back on. I'm pretty sure that didn't happen with the other kernels I tested. (never tested amd-drm-next-5.14-2021-05-12 before)

Nevertheless everything graphical is lagging extremely. If I move the mouse or do anything else it takes more than 10 seconds until something happens on the screen.

On the other hand SSH access is smoothly possible. And I was able to save the dmesg output. (see attachment) Unlocking the screen via SSH (loginctl) or starting graphical programs (DISPLAY=:0 xterm) works, but is extremely slow too. (> 10 seconds waiting)

I experienced this lag too, although I didn't try the SSH thing (I don't have it set up).

bugzilla-daemon＠bugzilla.kernel.org

28 Jun

9:01 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #34 from Jerome C (me@jeromec.com) --- Using 5.13.0 now and the issue is still here

(In reply to kolAflash from comment #32)

...

Created attachment 296901 [details] dmesg via SSH, running amd-drm-next-5.14-2021-05-12 without ip_block_mask=0x0ff and with Xorg

(In reply to Jerome C from comment #31)

...
[...] hiya, you may not know this but use in "amdgpu.ip_block_mask=0x0ff" and not "ip_block_mask=0x0ff" [...] I can see in your kernel logs that VCN is still enabled

Ooops you're right. I know someone wrote that before. But it seems I somehow missed it while editing my Grub parameters.

I'll give it another try!

In the meanwhile I performed test number 2.

...

amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg

[...]

This time the crash was very different!

After some minutes (about 3) the graphical screen actually turned back on. I'm pretty sure that didn't happen with the other kernels I tested. (never tested amd-drm-next-5.14-2021-05-12 before)

Nevertheless everything graphical is lagging extremely. If I move the mouse or do anything else it takes more than 10 seconds until something happens on the screen.

On the other hand SSH access is smoothly possible. And I was able to save the dmesg output. (see attachment) Unlocking the screen via SSH (loginctl) or starting graphical programs (DISPLAY=:0 xterm) works, but is extremely slow too. (> 10 seconds waiting)

Do you have any updates since you corrected the kernel parameter?

bugzilla-daemon＠bugzilla.kernel.org

4 Aug

12:43 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #35 from kolAflash (kolAflash@kolahilft.de) --- Created attachment 298193 --> https://bugzilla.kernel.org/attachment.cgi?id=298193&action=edit /var/log/kern.log running amd-drm-next-5.14-2021-05-12 (ae30d41eb) with Xorg

Sorry for the long delay. I've tested:

1. Current Debian-11 testing Linux-5.10.0-8 with amdgpu.ip_block_mask=0x0ff while running Xorg. Result: everything ok

2. amd-drm-next-5.14-2021-05-12* (ae30d41eb) without any special kernel options while running Xorg. Result:
   - crashes
   - also, the screen starts flickering about every 10 seconds after the second resume
   - the flickering also happens when using a8f768874^ (before the first fix commit by Alex D.)
   - log attached: 5.12.0-rc7-original-ae30d41eb_crash.txt

3. Upstream Linux-5.14.0-rc4. Result: Still broken.

----

* amd-drm-next-5.14-2021-05-12 https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-drm-next-5.14-2021-05-... ae30d41eb

bugzilla-daemon＠bugzilla.kernel.org

1:24 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #36 from Jerome C (me@jeromec.com) --- I've been watching linux-next and noticed that this commit

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/d...

was posted to linux-next some time between 5.10 and 5.11 (I don't remember exactly), but it keeps getting pushed back and has not been mainlined...

I think this is why the issue is still here. Nobody from AMD has responded here since comment 29.

bugzilla-daemon＠bugzilla.kernel.org

25 Aug

noon

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #37 from James Zhu (jamesz@amd.com) --- Hi Jerome and kolAflash, would you mind adding pci=noats to the boot parameters of your original test configuration? For example:

linux /boot/vmlinuz-5.4.0-54-generic root=UUID=803844cc-7291-4056-bd04-f1b43b54ed97 ro pci=noats

Please see if this helps. Thanks! James
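The example above edits the `linux` line of a boot entry directly. As a rough sketch of making the parameter persistent on a Debian-style system, the following Python helper appends it to the `GRUB_CMDLINE_LINUX_DEFAULT` line of a GRUB defaults file. This assumes the usual /etc/default/grub layout and that `update-grub` is run afterwards; the helper name `add_kernel_param` is invented for this illustration and it only transforms text, it does not write any file.

```python
import re


def add_kernel_param(grub_text: str, param: str = "pci=noats") -> str:
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.

    `grub_text` is the content of a GRUB defaults file (assumed to be
    /etc/default/grub). Hypothetical helper for illustration only.
    """
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")',
                         re.MULTILINE)

    def repl(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(repl, grub_text)
```

The function is idempotent, so running it twice does not duplicate the parameter. After the next boot, /proc/cmdline should contain pci=noats if the change took effect.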

bugzilla-daemon＠bugzilla.kernel.org

4:53 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #38 from Jerome C (me@jeromec.com) --- Hi James,

With "pci=noats" set, suspend and resume work fine.

I did see some errors (something about a device not being added) from "kfd" in the kernel log, but I guess that's related to PCIe ATS being disabled by the kernel parameter.

Thanks

Jerome

bugzilla-daemon＠bugzilla.kernel.org

5:09 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #40 from James Zhu (jamesz@amd.com) --- Hi Jerome, Yes, you are right. Turning off ATS will affect the IOMMU. KFD needs the IOMMU enabled. KFD supports the compute engine, so this won't affect 3D and video acceleration. Once I confirm whether ATS/IOMMU causes the issue, I will find the right person to fix it. Thanks! James

bugzilla-daemon＠bugzilla.kernel.org

1 Sep

1:34 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #41 from kolAflash (kolAflash@kolahilft.de) --- I can confirm Jerome's result.

Bug is gone with pci=noats. (Debian-11 kernel 5.10.0-8-amd64)

I ran 50 suspend/standby rounds. Also I used the notebook for 2 days and suspended it multiple times without issues.
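A test run like the one above can be automated. The following is only a sketch of one way to do it, assuming the `rtcwake` tool from util-linux is installed and the script runs as root; the function names and the default timings are illustrative choices, not what was actually used for the 50 rounds.

```python
import subprocess
import time


def rtcwake_command(seconds: int) -> list:
    """Build an rtcwake call: -m mem suspends to RAM, -s sets the wake-up delay."""
    return ["rtcwake", "-m", "mem", "-s", str(seconds)]


def run_cycles(cycles: int, sleep_s: int = 20, settle_s: float = 10.0,
               runner=subprocess.run, pause=time.sleep) -> int:
    """Run `cycles` suspend/resume rounds and return how many completed.

    `runner` and `pause` are injectable so the loop can be exercised
    without actually suspending the machine.
    """
    done = 0
    for _ in range(cycles):
        # Blocks until the machine has resumed (or forever if it hangs).
        runner(rtcwake_command(sleep_s), check=True)
        done += 1
        print(f"completed cycle {done}/{cycles}", flush=True)
        # Give the system a moment to settle before the next suspend.
        pause(settle_s)
    return done
```

Because a failed resume hangs the machine, the printed counter is lost unless the output is redirected to a file on disk or watched over SSH.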

bugzilla-daemon＠bugzilla.kernel.org

2 Sep

12:59 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #42 from James Zhu (jamesz@amd.com) --- Hi Jerome and kolAflash,

Thanks for the confirmation. I have a workaround for this issue, but I would like to find the root cause or a better workaround.

James

bugzilla-daemon＠bugzilla.kernel.org

2:20 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #43 from kolAflash (kolAflash@kolahilft.de) --- (In reply to James Zhu from comment #42)

...

Hi Jerome and kolAflash,

Thanks for confirmation. I have a workaround for this issue. But I wish I can find the root cause or better workaround.

Thanks too for your help James!

For me personally the situation is quite fine with pci=noats. I'm sometimes using Qemu/KVM and VirtualBox. But no need for absolute bleeding edge VM performance. So I'll probably be fine with pci=noats.

However, I'd love to contribute to a fix for all users without kernel parameter stuff. (including a fix in longterm Linux-5.10 for Debian) So just tell me if I can help by doing more tests, sending logs, ... :-)

bugzilla-daemon＠bugzilla.kernel.org

9:24 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #44 from James Zhu (jamesz@amd.com) --- Created attachment 298651 --> https://bugzilla.kernel.org/attachment.cgi?id=298651&action=edit A workaround for suspend/resume hung issue

The VCN block passed all ring tests; usually the VCN goes idle within 1 second. Somehow it affected the later AMD IOMMU device resume, which is controlled by the KFD resume. This workaround gates the VCN block immediately once the ring tests have passed. It can fix the suspend/resume hang issue.

Hi kolAflash, please help check the workaround (WA) in your setup. I will continue working on the root cause. Thanks! James

bugzilla-daemon＠bugzilla.kernel.org

3 Sep

6:52 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #45 from Jerome C (me@jeromec.com) --- Unfortunately this failed after 138 suspend/resume cycles.

Thanks

Jerome

bugzilla-daemon＠bugzilla.kernel.org

11:54 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #47 from James Zhu (jamesz@amd.com) --- Hi Jerome, Thanks! I knew it would not be easy to judge whether this issue is fixed, since it occurs quite randomly. On my setup, this WA passed 5 runs of up to 300 suspend/resume cycles each, and 1 run of up to 3800 suspend/resume cycles. But I doubted that it addressed the root cause, so I treated it as a WA. It seems it is not a WA for all systems, though. James

bugzilla-daemon＠bugzilla.kernel.org

12:12 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

Anthony Rabbito (ted437@gmail.com) changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |ted437@gmail.com

--- Comment #48 from Anthony Rabbito (ted437@gmail.com) --- I'm also facing consistent crashes when waking up from the screen saver on a Radeon VII. This became more apparent with 5.14.0-rc7 and has made its way to 5.14.0. After the screens blank, waking up from sleep typically leaves artifacts on one screen, another screen will be frozen, and a third screen allows me to unlock via SDDM. I will attach kernel logs of a trace captured while this happens. Please let me know if I can assist in any way.

bugzilla-daemon＠bugzilla.kernel.org

12:13 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #49 from Anthony Rabbito (ted437@gmail.com) --- Created attachment 298661 --> https://bugzilla.kernel.org/attachment.cgi?id=298661&action=edit journalctl of amdgpu trace

(In reply to Anthony Rabbito from comment #48)

bugzilla-daemon＠bugzilla.kernel.org

12:23 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #50 from James Zhu (jamesz@amd.com) --- Hi Anthony, Can you try the suggestion from comment #37 and see if it helps? But from the log that you attached, yours is a different issue: the GFX hardware reports lots of ECC errors, which cause a gfx ring timeout. After that, GPU recovery is triggered, and unfortunately the screen blanked. I think you need to create another ticket for your case. Best Regards! James

bugzilla-daemon＠bugzilla.kernel.org

4 Sep

5:41 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

Arham Jain (arhamjain@gmail.com) changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |arhamjain@gmail.com

--- Comment #51 from Arham Jain (arhamjain@gmail.com) --- I can confirm that the issue I was having after trying to wake after suspend (Ryzen 3500u, Linux 5.14 RC7) has vanished after adding pci=noats to my boot parameters a few days ago. I've had this issue on every kernel since 5.10 (5.4 and 5.9 were fine for me for several months each, not sure what I used in between). Thank you so much James for posting this (and trying to fix it)!

bugzilla-daemon＠bugzilla.kernel.org

7 Sep

2 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #52 from James Zhu (jamesz@amd.com) --- Created attachment 298691 --> https://bugzilla.kernel.org/attachment.cgi?id=298691&action=edit Fix for S3 hung issue

Hi Jerome and kolAflash,

I think the IOMMU device init is done at the wrong place during resume. I have attached a patch. Please confirm whether it works. Thanks! James

bugzilla-daemon＠bugzilla.kernel.org

2:32 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #53 from Anthony Rabbito (ted437@gmail.com) --- Thanks for chiming in, James! A few things I've observed: since adding 'pci=noats', the graphics artifacts seem to happen far less often. I did observe one lockup, which required me to hard shut down the computer. This was a wake-from-suspend scenario.

I used to deal with somewhat similar issues here: https://bugs.freedesktop.org/show_bug.cgi?id=110674 (not sure if that's of any use). Let me know if a fresh bug is warranted.

bugzilla-daemon＠bugzilla.kernel.org

6:27 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #54 from Jerome C (me@jeromec.com) --- Hi James,

After 900 suspend/resume cycles (600 on LLVM, 300 on GCC) using kernel 5.14.1 compiled with LLVM 12.0.1 (LLVM_IAS unset during compilation) and again with GCC 11.1.0, there was no crash on resume. Awesome. It usually fails within 1-150 suspend/resume cycles.

BRING ON THE RYZEN 6000 SERIES APU

Thanks

Jerome

bugzilla-daemon＠bugzilla.kernel.org

6:27 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #55 from Jerome C (me@jeromec.com) --- Created attachment 298695 --> https://bugzilla.kernel.org/attachment.cgi?id=298695&action=edit signature.asc

bugzilla-daemon＠bugzilla.kernel.org

7:47 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #56 from Jerome C (me@jeromec.com) --- damn, sorry for the ugly message layout replies

I didn't realize my e-mail provider was doing that

bugzilla-daemon＠bugzilla.kernel.org

11:02 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #57 from James Zhu (jamesz@amd.com) --- (In reply to Anthony Rabbito from comment #53)

...

Thanks for chiming in James! Few things I've observed since adding 'pci=noats' the graphic artifacts seem to happen way less. I did observe one lockup which required me to hard shut down the computer. This was a wake from suspend scenario.

I used to deal with somwhat similar issues here -- https://bugs.freedesktop.org/show_bug.cgi?id=110674 not sure if that's of any use. Let me know if a fresh bug is warranted.

Hi Anthony,

The S3 hang issue here always comes with the error: AMD-Vi: Event logged [IO_PAGE_FAULT...]. Bug 110674 doesn't have GFX ECC errors. Your case does have lots of them. Can you share the whole dmesg after you added pci=noats? Regards! James
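To check whether a given log shows the signature mentioned above, a small filter is enough. This is a minimal sketch: the helper name `find_iommu_faults` is made up, and it matches only on the two substrings "AMD-Vi" and "IO_PAGE_FAULT" quoted in this thread rather than on the full message format.

```python
def find_iommu_faults(lines):
    """Yield (line_number, text) for lines that log an AMD-Vi IO_PAGE_FAULT event.

    `lines` can be any iterable of strings, for example an open log file.
    """
    for number, line in enumerate(lines, start=1):
        if "AMD-Vi" in line and "IO_PAGE_FAULT" in line:
            yield number, line.rstrip("\n")
```

It can be applied to a saved dmesg output or to /var/log/kern.log, for example with `open(path, errors="replace")` as the input, where `path` is whichever log file is available.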

bugzilla-daemon＠bugzilla.kernel.org

20 Sep

10:47 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

youling257@gmail.com changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |youling257@gmail.com

--- Comment #58 from youling257@gmail.com --- The commit "drm/amdgpu: move iommu_resume before ip init/resume" causes resume from suspend-to-disk to fail on my 3400G (amdgpu).

bugzilla-daemon＠bugzilla.kernel.org

11:34 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #59 from James Zhu (jamesz@amd.com) --- (In reply to youling257 from comment #58)

...

drm/amdgpu: move iommu_resume before ip init/resume cause suspend to disk resume failed on my amdgpu 3400g.

Can you share the whole dmesg log? Regards! James

bugzilla-daemon＠bugzilla.kernel.org

2:43 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #60 from youling257@gmail.com --- Created attachment 298889 --> https://bugzilla.kernel.org/attachment.cgi?id=298889&action=edit dmesg5.15.txt

(In reply to James Zhu from comment #59)

...

(In reply to youling257 from comment #58)

...
drm/amdgpu: move iommu_resume before ip init/resume cause suspend to disk resume failed on my amdgpu 3400g.

Can you share whole demsg log? Regards! James

When resume fails I have to force a shutdown, so how can I output dmesg? I only have the dmesg from the boot log.

bugzilla-daemon＠bugzilla.kernel.org

2:57 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #61 from James Zhu (jamesz@amd.com) --- (In reply to youling257 from comment #60)

...

Created attachment 298889 [details] dmesg5.15.txt

(In reply to James Zhu from comment #59)

...
(In reply to youling257 from comment #58)

...
drm/amdgpu: move iommu_resume before ip init/resume cause suspend to disk resume failed on my amdgpu 3400g.

Can you share whole demsg log? Regards! James

when resume failed have to force shutdown, how to output dmesg? only has boot log dmesg.

After the reboot, you can find it in /var/log/kern.log and /var/log/syslog, based on the timestamp. You can just attach kern.log.

bugzilla-daemon＠bugzilla.kernel.org

3 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #62 from youling257@gmail.com --- (In reply to James Zhu from comment #61)

...

after reboot, you can find under /var/log/kern.log and /var/log/syslog based on timestamp. you can just attach kern.log

My userspace is Android-x86, running Linux 5.15 and Mesa 21 on amdgpu, so there is no /var/log. I ran git bisect between kernel 5.15-rc1 and rc2; the bad commit is "drm/amdgpu: move iommu_resume before ip init/resume".
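For readers unfamiliar with bisecting, the idea behind git bisect is a binary search over the commits between a known-good and a known-bad version. The sketch below shows only that search logic on an abstract list; `first_bad` and `is_bad` are hypothetical names, and in practice `is_bad` corresponds to building a kernel at that commit and testing resume by hand.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are in chronological order, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

The number of kernels to build and test grows only with the logarithm of the number of commits in the range, which is what makes bisecting between two release candidates practical.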

bugzilla-daemon＠bugzilla.kernel.org

9:07 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #63 from James Zhu (jamesz@amd.com) --- (In reply to youling257 from comment #62)

...

my userspace is androidx86, running androidx86 with linux 5.15 and mesa21 on amdgpu, no /var/log. git bisect linux kernel 5.15rc1 and rc2, bad commit is drm/amdgpu: move iommu_resume before ip init/resume.

Can you check the CONFIG_HSA_AMD setting in .config? By the way, see if the link below helps you dump the error messages during resume: https://stackoverflow.com/questions/9682306/android-how-to-get-kernel-logs-a...

bugzilla-daemon＠bugzilla.kernel.org

21 Sep

3:56 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #64 from youling257@gmail.com --- Created attachment 298899 --> https://bugzilla.kernel.org/attachment.cgi?id=298899&action=edit config-5.15.0-rc2-android-x86_64+

CONFIG_HSA_AMD=y

bugzilla-daemon＠bugzilla.kernel.org

4:04 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #65 from youling257@gmail.com --- (In reply to James Zhu from comment #63)

...

Can you check CONFIG_HSA_AMD setting in .config? By the way , see if the below link help you dump the error message during resume. https://stackoverflow.com/questions/9682306/android-how-to-get-kernel-logs-after-kernel-panic

Did you see the kernel command line in my dmesg? "memmap=1M!5M ramoops.mem_size=1048576 ramoops.ecc=1 ramoops.mem_address=0x00500000 ramoops.console_size=16384 ramoops.ftrace_size=16384 ramoops.pmsg_size=16384 ramoops.record_size=32768".
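As a quick sanity check of the command line quoted above, the region reserved with memmap (1M starting at 5M) should be the same region ramoops is told to use. The sketch below does that arithmetic in Python; the helper names are invented for this illustration, and it assumes the `memmap=nn!ss` form means "nn bytes starting at address ss", as described in the kernel parameter documentation.

```python
def parse_size(text: str) -> int:
    """Parse a size such as '1M', '0x00500000' or '1048576' into bytes."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip()
    if text and text[-1].upper() in units:
        return int(text[:-1], 0) * units[text[-1].upper()]
    return int(text, 0)  # base 0 accepts both decimal and 0x-prefixed hex


def ramoops_matches_memmap(cmdline: str) -> bool:
    """Check that ramoops.mem_address/mem_size equal the memmap=nn!ss region."""
    params = dict(tok.split("=", 1) for tok in cmdline.split() if "=" in tok)
    size_text, start_text = params["memmap"].split("!")
    reserved = (parse_size(start_text), parse_size(size_text))
    ramoops = (parse_size(params["ramoops.mem_address"]),
               parse_size(params["ramoops.mem_size"]))
    return reserved == ramoops
```

For the values in this comment, 5M is 0x00500000 and 1M is 1048576 bytes, so the two regions line up and the function returns True.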

If a kernel panic causes a reboot, I can get /sys/fs/pstore/console-ramoops-0 and /sys/fs/pstore/pmsg-ramoops-0. But when resume fails, I have to press the power button to force a shutdown, and nothing is recorded.
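After a reboot, checking whether anything was recorded comes down to listing the pstore directory. A minimal sketch, assuming pstore is mounted at /sys/fs/pstore as in the paths above (the function name is hypothetical):

```python
from pathlib import Path


def list_pstore_records(root: str = "/sys/fs/pstore") -> list:
    """Return (name, size_in_bytes) for each record file under the pstore mount.

    Returns an empty list if the directory does not exist or holds no records.
    """
    directory = Path(root)
    if not directory.is_dir():
        return []
    return sorted((p.name, p.stat().st_size)
                  for p in directory.iterdir() if p.is_file())
```

An empty result after a forced power-off matches the observation above that nothing is left behind when resume hangs.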

bugzilla-daemon＠bugzilla.kernel.org

4:53 a.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #66 from youling257@gmail.com --- Video recording of the failed resume: https://drive.google.com/drive/folders/1bWMC4ByGvudC9zBk-9Xgamz-shir0pqX?usp...

bugzilla-daemon＠bugzilla.kernel.org

2:32 p.m.

New subject: [Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #67 from James Zhu (jamesz@amd.com) --- (In reply to youling257 from comment #66)

resume failed record video, https://drive.google.com/drive/folders/1bWMC4ByGvudC9zBk-9Xgamz-shir0pqX?usp=sharing

Can you try applying this patch? https://lore.kernel.org/all/20210920163922.313113287@linuxfoundation.org/

bugzilla-daemon＠bugzilla.kernel.org

5:43 p.m.

--- Comment #68 from youling257@gmail.com --- (In reply to James Zhu from comment #67)

(In reply to youling257 from comment #66)

resume failed record video, https://drive.google.com/drive/folders/1bWMC4ByGvudC9zBk-9Xgamz-shir0pqX?usp=sharing

Can you try applying this patch? https://lore.kernel.org/all/20210920163922.313113287@linuxfoundation.org/

linux kernel 5.15rc1 is good, suspend to disk resume success. linux kernel 5.15rc2 is bad, suspend to disk failed. revert "drm/amdgpu: move iommu_resume before ip init/resume" can suspend to disk resume success.

linux kernel 5.15rc2 has "drm/amdkfd: separate kfd_iommu_resume from kfd_resume", why you suggest me apply the patch

bugzilla-daemon＠bugzilla.kernel.org

5:43 p.m.

--- Comment #69 from youling257@gmail.com --- (In reply to James Zhu from comment #67)

(In reply to youling257 from comment #66)

resume failed record video, https://drive.google.com/drive/folders/1bWMC4ByGvudC9zBk-9Xgamz-shir0pqX?usp=sharing

Can you try applying this patch? https://lore.kernel.org/all/20210920163922.313113287@linuxfoundation.org/

linux kernel 5.15rc2 has "drm/amdkfd: separate kfd_iommu_resume from kfd_resume", why you suggest me apply the patch

bugzilla-daemon＠bugzilla.kernel.org

6:02 p.m.

--- Comment #70 from James Zhu (jamesz@amd.com) --- My mistake. Can you try adding pci=noats to the boot parameters?

bugzilla-daemon＠bugzilla.kernel.org

6:29 p.m.

--- Comment #71 from youling257@gmail.com --- (In reply to James Zhu from comment #70)

My mistake. Can you try adding pci=noats to the boot parameters?

no help, still resume failed.
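When testing a boot parameter such as pci=noats, it can help to confirm that the running kernel actually received it. A small sketch, assuming a POSIX shell; has_boot_param is a hypothetical helper that looks for an exact word in /proc/cmdline (or in another file given as the second argument).

```shell
# Succeeds if the given word appears as a whole parameter on the kernel command line.
# Usage: has_boot_param <param> [cmdline-file]
has_boot_param() {
    # Put each space-separated parameter on its own line, then match the whole line literally.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `has_boot_param pci=noats && echo "pci=noats is active"` prints the message only when the option is on the command line of the running kernel.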

bugzilla-daemon＠bugzilla.kernel.org

22 Sep

1:59 p.m.

--- Comment #72 from Jerome C (me@jeromec.com) --- Hi James,

I noticed the patches that you asked us to try in comment 52 were also submitted to kernel 5.14.7

tested it, all is good for now

Thanks

Jerome

bugzilla-daemon＠bugzilla.kernel.org

15 Nov

1:39 a.m.

--- Comment #73 from kolAflash (kolAflash@kolahilft.de) --- (In reply to Jerome C from comment #72)

Hi James,

I noticed the patches that you asked us to try in comment 52 were also submitted to kernel 5.14.7

tested it, all is good for now

Pleased to hear that :-) I'm just compiling 5.15.2 to run a test myself.

@James Will those patches be backported to the Linux-5.10 LTS kernel?

master and Linux-5.15
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

Linux-5.14.7
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

bugzilla-daemon＠bugzilla.kernel.org

23 Nov

9:31 a.m.

--- Comment #74 from kolAflash (kolAflash@kolahilft.de) --- @James Zhu

Tested 5.15.2 for over a week and more than 50 standby-wakeups. No problems! Thanks :-)

I would be happy about a patch for the 5.10 longterm kernel. The bug became a problem with v5.10-rc3 (see comment 14), just before Debian made 5.10-longterm the Debian-11 kernel. So it would be great if I and probably other Debian-11 users could finally use that AMD GPU without workarounds.

bugzilla-daemon＠bugzilla.kernel.org

1:28 p.m.

--- Comment #75 from James Zhu (jamesz@amd.com) --- (In reply to kolAflash from comment #74)

@James Zhu

Tested 5.15.2 for over a week and more than 50 standby-wakeups. No problems! Thanks :-)

I would be happy about a patch for the 5.10 longterm kernel. The bug became a problem with v5.10-rc3 (see comment 14), just before Debian made 5.10-longterm the Debian-11 kernel. So it would be great if I and probably other Debian-11 users could finally use that AMD GPU without workarounds.

Hi @Alex Deucher, Can you help on this request? thanks! James

bugzilla-daemon＠bugzilla.kernel.org

8:44 p.m.

--- Comment #76 from Alex Deucher (alexdeucher@gmail.com) --- (In reply to James Zhu from comment #75)

(In reply to kolAflash from comment #74)

@James Zhu

Tested 5.15.2 for over a week and more than 50 standby-wakeups. No problems! Thanks :-)

I would be happy about a patch for the 5.10 longterm kernel. The bug became a problem with v5.10-rc3 (see comment 14), just before Debian made 5.10-longterm the Debian-11 kernel. So it would be great if I and probably other Debian-11 users could finally use that AMD GPU without workarounds.

Hi @Alex Deucher, Can you help on this request? thanks! James

I cc'ed stable with the patches so they should show up in 5.10 assuming they apply cleanly. If not, can you look at what it would take to backport them?

bugzilla-daemon＠bugzilla.kernel.org

24 Nov

3:22 a.m.

--- Comment #77 from James Zhu (jamesz@amd.com) --- Created attachment 299697 --> https://bugzilla.kernel.org/attachment.cgi?id=299697&action=edit backport patch for 5.10 stable.

Hi @kolAflash, before I send them out for public review, could you help test them? Thanks so much! James

bugzilla-daemon＠bugzilla.kernel.org

25 Nov

6:34 p.m.

--- Comment #78 from kolAflash (kolAflash@kolahilft.de) --- (In reply to James Zhu from comment #77)

Created attachment 299697 [details] backport patch for 5.10 stable.

Hi @kolAflash, before I send them out for public review, could you help test them? Thanks so much! James

Thanks for the patch! :-)

make is currently running and I'll conduct some tests in the next days.

bugzilla-daemon＠bugzilla.kernel.org

6:58 p.m.

--- Comment #79 from kolAflash (kolAflash@kolahilft.de) --- @James

Got this when compiling with Linux-5.10.81:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c: In function ‘kgd2kfd_device_init’:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:754:6: error: implicit declaration of function ‘kgd2kfd_resume_iommu’; did you mean ‘kgd2kfd_resume_mm’? [-Werror=implicit-function-declaration]
754 | if (kgd2kfd_resume_iommu(kfd))
    |     ^~~~~~~~~~~~~~~~~~~~
    |     kgd2kfd_resume_mm

Patching 5.10.81 went without problems:

$ patch -p1 -i ../../backport_patch/0001-drm-amdkfd-separate-kfd_iommu_resume-from-kfd_resume.patch
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
patching file drivers/gpu/drm/amd/amdkfd/kfd_device.c

$ patch -p1 -i ../../backport_patch/0002-drm-amdgpu-add-amdgpu_amdkfd_resume_iommu.patch
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h

$ patch -p1 -i ../../backport_patch/0003-drm-amdgpu-move-iommu_resume-before-ip-init-resume.patch
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

$ patch -p1 -i ../../backport_patch/0004-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

$ patch -p1 -i ../../backport_patch/0005-drm-amdkfd-fix-boot-failure-when-iommu-is-disabled-i.patch
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
patching file drivers/gpu/drm/amd/amdkfd/kfd_device.c
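The patch invocations above can also be run as a loop that dry-runs each patch before applying it and stops at the first failure. This is only a sketch, assuming a patch implementation that supports --dry-run (GNU patch does) and a directory of numbered .patch files; apply_series is a hypothetical name.

```shell
# Apply every *.patch file in a directory, in glob (lexical) order, to the tree
# in the current working directory. Stops at the first patch that fails.
# Usage: apply_series <patch-dir>
apply_series() {
    dir="$1"
    for p in "$dir"/*.patch; do
        [ -f "$p" ] || continue      # skip when the glob matched nothing
        # Check the patch against the tree as it is right now, without modifying it.
        if ! patch -p1 --dry-run -i "$p" >/dev/null; then
            echo "does not apply cleanly: $p" >&2
            return 1
        fi
        patch -p1 -i "$p" || return 1
    done
}
```

Run from the top of the kernel source tree, for example `apply_series ../../backport_patch`. Each patch is dry-run immediately before it is applied rather than all up front, because later patches in a series can touch files that earlier ones modify.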

bugzilla-daemon＠bugzilla.kernel.org

8:48 p.m.

--- Comment #80 from James Zhu (jamesz@amd.com) --- Hi @kolAflash, I applied those patches on top of linux-5.10.y (https://github.com/gregkh/linux.git, f884bb85b8d877d4e0c670403754813a7901705b) and linux-5.12.y (https://github.com/gregkh/linux.git, 0e6f651912bdd027a6d730b68d6d1c3f4427c0ae). I didn't see any compile issues.

Can you share your .config with me?

James

bugzilla-daemon＠bugzilla.kernel.org

26 Nov

4:04 a.m.

--- Comment #81 from kolAflash (kolAflash@kolahilft.de) --- Created attachment 299721 --> https://bugzilla.kernel.org/attachment.cgi?id=299721&action=edit Linux kernel make .config

@James

Compiling v5.10.80 (f884bb85b8d877d4e0c670403754813a7901705b) with the provided patch results in the same error.

I attached my Linux kernel make .config.

Compilation platform is Debian-11.1.0.

bugzilla-daemon＠bugzilla.kernel.org

4:37 p.m.

--- Comment #82 from James Zhu (jamesz@amd.com) --- Hi @kolAflash,

I don't have any issue with your .config on Ubuntu 20.04.

From the source code, it should be fine.

$ grep -rn "kgd2kfd_resume_iommu" drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
309:int kgd2kfd_resume_iommu(struct kfd_dev *kfd);

$ grep -rn "amdgpu_amdkfd.h\|kgd2kfd_resume_iommu" drivers/gpu/drm/amd/amdkfd/kfd_device.c
31:#include "amdgpu_amdkfd.h"
604: kfd->pci_atomic_requested = amdgpu_amdkfd_have_atomics_support(kgd);
792: if (kgd2kfd_resume_iommu(kfd))
940:int kgd2kfd_resume_iommu(struct kfd_dev *kfd)

Looks like we are using different 5.10 trees. Should we use 5.10 stable for adding these backport patches?

754 | if (kgd2kfd_resume_iommu(kfd))
    |     ^~~~~~~~~~~~~~~~~~~~
    |     kgd2kfd_resume_mm

Best Regards! James

bugzilla-daemon＠bugzilla.kernel.org

27 Nov

12:14 p.m.

--- Comment #83 from kolAflash (kolAflash@kolahilft.de) --- Hi James,

(In reply to James Zhu from comment #82)

[...]
$ grep -rn "amdgpu_amdkfd.h\|kgd2kfd_resume_iommu" drivers/gpu/drm/amd/amdkfd/kfd_device.c
31:#include "amdgpu_amdkfd.h"
604: kfd->pci_atomic_requested = amdgpu_amdkfd_have_atomics_support(kgd);
792: if (kgd2kfd_resume_iommu(kfd))
940:int kgd2kfd_resume_iommu(struct kfd_dev *kfd)

the line numbers you're quoting are for Linux v5.12.19 (0e6f651912bdd027a6d730b68d6d1c3f4427c0ae) + the attachment-299697 patch.

Looks like we are using different 5.10 trees. Should we use 5.10 stable for adding these backport patches?

754 | if (kgd2kfd_resume_iommu(kfd))
    |     ^~~~~~~~~~~~~~~~~~~~
    |     kgd2kfd_resume_mm

I'm testing with Linux v5.10.80 (f884bb85b8d877d4e0c670403754813a7901705b) + the attachment-299697 patch. And there it's line number 754.
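Because the line numbers quoted in the two replies come from different trees, it can help to run the same lookup in each tree and compare the output side by side. A minimal sketch, assuming a grep that supports -H, -n and -w (GNU and BSD grep do); symbol_lines is a hypothetical helper.

```shell
# Print "file:line:text" for every line that mentions the symbol as a whole word.
# Exit status is non-zero when the symbol does not appear in any of the files.
# Usage: symbol_lines <symbol> <file>...
symbol_lines() {
    sym="$1"
    shift
    grep -Hnw -- "$sym" "$@"
}
```

From the top of a patched tree, `symbol_lines kgd2kfd_resume_iommu drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h drivers/gpu/drm/amd/amdkfd/kfd_device.c` lists the declaration, the call site and the definition with the line numbers of that particular tree.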

bugzilla-daemon＠bugzilla.kernel.org

1:03 p.m.

--- Comment #84 from kolAflash (kolAflash@kolahilft.de) --- @James

I was able to compile!

Looks like this was some fault of mine. (I'm usually building out of source directory and did something wrong...)

Now I'm testing the current v5.10.82 with the provided attachment 299697 patches.

bugzilla-daemon＠bugzilla.kernel.org

29 Nov

7:21 p.m.

--- Comment #85 from kolAflash (kolAflash@kolahilft.de) --- (In reply to James Zhu from comment #77)

Created attachment 299697 [details] backport patch for 5.10 stable.

Hi @kolAflash, before I send them out for public review, could you help test them? Thanks so much! James

Works excellent!

Tested with Linux-5.10.82 on Debian-11.

bugzilla-daemon＠bugzilla.kernel.org

7:53 p.m.

--- Comment #86 from James Zhu (jamesz@amd.com) --- Hi @kolAflash, thanks so much for your effort on this verification! Would you mind applying those patches on 5.12 stable to check as well? They should merge automatically. Thanks! James

bugzilla-daemon＠bugzilla.kernel.org

4 Dec

10:29 p.m.

--- Comment #87 from kolAflash (kolAflash@kolahilft.de) --- (In reply to James Zhu from comment #86)

Hi @kolAflash, thanks so much for your effort on this verification! Would you mind applying those patches on 5.12 stable to check as well? They should merge automatically. Thanks! James

I'm testing Linux-5.12.19 with the patch from attachment 299697 since 2021-12-02. Until now everything works fine.

bugzilla-daemon＠bugzilla.kernel.org

23 Jan

1:54 p.m.

kolAflash (kolAflash@kolahilft.de) changed:

--- Comment #88 from kolAflash (kolAflash@kolahilft.de) --- Debian-11 just got a kernel security update, giving me Linux-5.10.92.

https://snapshot.debian.org/package/linux-signed-amd64/5.10.92%2B1/#linux-im...

Since rebooting into that kernel I got no more crashes after waking from s2ram. (not using pci=noats or any other workarounds)

Conclusion: Everything fixed! Thanks a lot to everyone involved :-)
