https://bugs.freedesktop.org/show_bug.cgi?id=108992
Bug ID: 108992
Summary: Regression: Lenovo e585 (ryzen 2500u) freezes during boot with 4.20-rc5, amdgpu error
Product: DRI
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: christian.frank.uwb@googlemail.com
Created attachment 142765 --> https://bugs.freedesktop.org/attachment.cgi?id=142765&action=edit amdgpu error message
Hi,
I upgraded from mainline kernel 4.19.7 to 4.20-rc5. Sadly, with that kernel the system freezes when it tries to show GDM.
OS: Ubuntu 18.04.1
Kernel: Linux version 4.20.0-042000rc5-generic (kernel@gloin) (gcc version 8.2.0 (Ubuntu 8.2.0-10ubuntu1)) #201812030721 SMP Mon Dec 3 12:23:24 UTC 2018
Command line: BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000rc5-generic root=UUID=1381a98d-77fd-481f-9cdb-115b30829bd8 ro ivrs_ioapic[32]=00:14.0 ivrs_ioapic[33]=00:00.1 vt.handoff=1
Mesa is at version 18.2.2 (X-Swat ppa)
Firmware files:
ll /lib/firmware/amdgpu/rav*
-rw-r--r-- 1 root root 33280 Nov 6 21:32 /lib/firmware/amdgpu/raven_asd.bin
-rw-r--r-- 1 root root 9344 Nov 6 21:32 /lib/firmware/amdgpu/raven_ce.bin
-rw-r--r-- 1 root root 316 Apr 25 2018 /lib/firmware/amdgpu/raven_gpu_info.bin
-rw-r--r-- 1 root root 17536 Nov 6 21:32 /lib/firmware/amdgpu/raven_me.bin
-rw-r--r-- 1 root root 263808 Nov 6 21:32 /lib/firmware/amdgpu/raven_mec2.bin
-rw-r--r-- 1 root root 263808 Nov 6 21:32 /lib/firmware/amdgpu/raven_mec.bin
-rw-r--r-- 1 root root 21632 Nov 6 21:32 /lib/firmware/amdgpu/raven_pfp.bin
-rw-r--r-- 1 root root 26948 Nov 6 21:32 /lib/firmware/amdgpu/raven_rlc.bin
-rw-r--r-- 1 root root 17408 Nov 6 21:32 /lib/firmware/amdgpu/raven_sdma.bin
-rw-r--r-- 1 root root 341728 Apr 25 2018 /lib/firmware/amdgpu/raven_vcn.bin
christian@christian-ThinkPad-E585:~$ apt-cache show linux-firmware
Package: linux-firmware
Architecture: all
Version: 1.173.2
Error-Log from journalctl:
Dez 09 16:26:20 christian-ThinkPad-E585 set-cpufreq[874]: Setting ondemand scheduler for all CPUs
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: gmc_v9_0_process_interrupt: 28 callbacks suppressed
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process gnome-shell pid 1102 thread g
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process gnome-shell pid 1102 thread g
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: Deferred error, no action required.
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: CPU:0 (17:11:0) MC20_STATUS[-|-|MiscV|-|AddrV|Deferred|-|SyndV
Dez 09 16:26:20 christian-ThinkPad-E585 systemd-journald[378]: Missed 68239 kernel messages
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: Deferred error, no action required.
Dez 09 16:26:20 christian-ThinkPad-E585 systemd-journald[378]: Missed 6630 kernel messages
Dez 09 16:26:20 christian-ThinkPad-E585 kernel: [Hardware Error]: Coherent Slave Extended Error Code: 1
Dez 09 16:26:20 christian-ThinkPad-E585 systemd-journald[378]: Missed 7875 kernel messages
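The amdgpu fault lines in the journal excerpt follow a regular pattern, so they can be summarized mechanically when a log contains thousands of them. The following is a minimal Python sketch, assuming the message layout quoted in this report; the regular expressions and the summarize_faults helper are illustrative and not part of any amdgpu tooling, and the kernel message format may differ between versions.

```python
import re

# Patterns modeled on the lines quoted in this report (an assumption:
# other kernel versions may word these messages differently).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] VMC page fault \(src_id:(?P<src_id>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (?P<addr>0x[0-9a-fA-F]+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(?P<status>0x[0-9a-fA-F]+)")


def summarize_faults(lines):
    """Group each fault header with its address and status lines."""
    faults = []
    for line in lines:
        header = FAULT_RE.search(line)
        if header:
            # Start a new record; numeric fields become ints.
            faults.append(
                {k: (v if k == "hub" else int(v)) for k, v in header.groupdict().items()}
            )
            continue
        if not faults:
            continue  # address/status lines before any header are ignored
        addr = ADDR_RE.search(line)
        if addr:
            faults[-1]["address"] = int(addr.group("addr"), 16)
            continue
        status = STATUS_RE.search(line)
        if status:
            faults[-1]["status"] = int(status.group("status"), 16)
    return faults


# Three lines copied from the journal excerpt in this report.
SAMPLE = [
    "Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:1 pasid:32768, for process gnome-shell pid 1102 thread g",
    "Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: in page starting at address 0x0000800100020000 from 18",
    "Dez 09 16:26:20 christian-ThinkPad-E585 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010013C",
]
```

Feeding it the output of journalctl -k, one line per list entry, would give one record per fault; the sketch deliberately does not try to decode the individual bits of the status register.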
I attached a .txt file showing more of the error messages.
I have also seen freezes with 4.19.7 with a similar error message, but that happens very rarely. With 4.20-rc5 the issue happens every time gdm tries to start, which makes the system unusable.
If you need any other info, please ping me.
Many thanks ! Christian
chris christian.frank.uwb@googlemail.com changed:
What    |Removed  |Added
----------------------------------------------------------------------------
Version |XOrg git |unspecified
chris christian.frank.uwb@googlemail.com changed:
What    |Removed                    |Added
----------------------------------------------------------------------------
Summary |Regression: Lenovo e585    |Regression: Lenovo e585
        |(ryzen 2500u) freezes      |(ryzen 2500u) freezes
        |during boot with 4.20-rc5, |during boot with
        |amdgpu error               |4.20-rc5/rc6, amdgpu error
--- Comment #1 from chris christian.frank.uwb@googlemail.com --- Hi,
I tested with the newly released rc6: same issue.
Many thanks ! Christian
--- Comment #2 from Brian Schott briancschott@gmail.com --- I have the same issue with a 2700U in a Dell Inspiron 7375. All of the 4.20 RC versions that I have tried show the same problem. The system is able to boot with a 4.19 kernel.
--- Comment #3 from Brian Schott briancschott@gmail.com --- The issue is still present in kernel 4.20.0-rc6-next-20181213.
--- Comment #4 from Alex Deucher alexdeucher@gmail.com --- Can you boot the system without amdgpu loaded (e.g., append modprobe.blacklist=amdgpu)? Or is this a general platform problem?
--- Comment #5 from chris christian.frank.uwb@googlemail.com ---
> Can you boot the system without amdgpu loaded (e.g., append modprobe.blacklist=amdgpu)

Doing this, I am able to boot my system.
--- Comment #6 from Brian Schott briancschott@gmail.com --- To clarify, the system can boot with the amdgpu module, but it will lock up when LightDM/X starts. Booting with the amdgpu module blacklisted works.
--- Comment #7 from chris christian.frank.uwb@googlemail.com --- Yes, same here. The system boots until GDM wants to start, then it freezes with the mentioned amdgpu error. Disabling amdgpu lets the system start up completely, including GDM.
--- Comment #8 from Alex Deucher alexdeucher@gmail.com --- Can you bisect?
--- Comment #9 from Brian Schott briancschott@gmail.com --- 020aa2ec15fc4a5ffdfcab7dc0db648a137abc41 lets me log in before the system freezes.
770af5859d6903049b7f39ed4f4e6612b63fd82d locks up before LightDM can start.
I'll do a bit more testing.
--- Comment #10 from Brian Schott briancschott@gmail.com --- Ignore that previous comment. I'm getting some strange results here and may have marked a commit with an intermittent crash as "good" while bisecting.
--- Comment #11 from Brian Schott briancschott@gmail.com --- "bc537a9cc47eec7f4e32b8164c494ddc35dca8ac is the first bad commit"
Well, that's kind of useless. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=bc...
Any suggestions on how to get a better idea of where the break was?
--- Comment #12 from Michel Dänzer michel@daenzer.net --- Make sure you've tested a commit plenty before declaring it "good".
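Because the freeze is intermittent on some commits, a single clean boot says little. One way to keep a bisect honest is to record every boot of a commit and only report "good" after several clean ones. The sketch below is a hypothetical bookkeeping helper in Python; the function name and the threshold of five clean boots are arbitrary assumptions, not a recommendation from this thread.

```python
def bisect_verdict(boots, required_clean=5):
    """Classify one commit from its recorded boot outcomes.

    boots: list of booleans, True when a boot reached the desktop
           without freezing.
    required_clean: how many clean boots are demanded before calling
           the commit good (an arbitrary, adjustable threshold).
    """
    if not all(boots):
        return "bad"        # a single freeze is enough to mark the commit bad
    if len(boots) >= required_clean:
        return "good"       # enough clean boots to trust the result
    return "undecided"      # keep rebooting before telling git bisect anything


# A commit that froze on the third boot is bad, however the first two went.
print(bisect_verdict([True, True, False]))
# Two clean boots are not yet enough evidence to call it good.
print(bisect_verdict([True, True]))
```

Only once the helper returns "good" or "bad" would the result be passed on with git bisect good or git bisect bad. A higher threshold costs more reboots but lowers the chance of marking an intermittently crashing commit as good.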
--- Comment #13 from Brian aeon.descriptor@gmail.com --- FYI, as a workaround, you can use the kernel option:
iommu=pt
At least it works on 4.20-rc7, which is the only kernel I've tried it on, but it should work with others.
--- Comment #14 from chris christian.frank.uwb@googlemail.com --- I can confirm that the iommu=pt workaround works; iommu=soft also works to get GDM started and use the laptop. Sadly, I have no idea what impact these workarounds have on GPU/CPU performance or battery life.
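When comparing results between machines it helps to confirm which iommu= setting a given boot actually used. The following is a small Python sketch that parses a kernel command line string; the helper names are made up for illustration, and splitting on whitespace is a simplification that ignores quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so root=UUID=... keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_mode(cmdline):
    """Return the value of iommu= (e.g. 'pt' or 'soft'), or None if unset."""
    return parse_cmdline(cmdline).get("iommu")


# Command line copied from the original report (no iommu= option present).
REPORTED = (
    "BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000rc5-generic "
    "root=UUID=1381a98d-77fd-481f-9cdb-115b30829bd8 ro "
    "ivrs_ioapic[32]=00:14.0 ivrs_ioapic[33]=00:00.1 vt.handoff=1"
)

print(iommu_mode(REPORTED))
print(iommu_mode(REPORTED + " iommu=pt"))
```

On a running system the string to parse can be read from /proc/cmdline.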
--- Comment #15 from chris christian.frank.uwb@googlemail.com --- (In reply to chris from comment #14)
> I can confirm that the iommu=pt workaround works; iommu=soft also works to get GDM started and use the laptop. Sadly, I have no idea what impact these workarounds have on GPU/CPU performance or battery life.

Sadly, I had a freeze during desktop usage shortly after boot while using iommu=pt. The driver situation for Raven Ridge is really sad at the moment :(
--- Comment #16 from Brian Schott briancschott@gmail.com --- I've tested a next-20181221 kernel with IOMMU_DEFAULT_PASSTHROUGH set, and I'm able to get the system to start properly. Still seeing some system lockups, when playing games, but it's better than crashing on the login screen.
--- Comment #17 from chris christian.frank.uwb@googlemail.com --- Hi,
the laptop is still freezing when trying to start with kernel 4.20 (release version) using latest amdgpu firmware from kernel firmware git.
Using iommu=soft still solves that issue.
I also tested with a kernel daily build from 26.12 which should include the latest drm changes, and it also shows the same issue.
Is there anything we can provide to help find the root cause?
Many thanks ! Christian
Zheng Luo vicluo96@gmail.com changed:
What     |Removed |Added
----------------------------------------------------------------------------
See Also |        |https://bugs.freedesktop.org/show_bug.cgi?id=109200
Zheng Luo vicluo96@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |vicluo96@gmail.com
--- Comment #18 from Zheng Luo vicluo96@gmail.com --- *** Bug 109200 has been marked as a duplicate of this bug. ***
--- Comment #19 from Zheng Luo vicluo96@gmail.com --- Created attachment 142928 --> https://bugs.freedesktop.org/attachment.cgi?id=142928&action=edit full kernel log
--- Comment #20 from Ian Kidd ikidd3123@gmail.com --- Seeing same issue with Dell 5575 (AMD 2500u, Vega mobile) on 4.20 Release. iommu=soft seems to allow boot.
Kernel Log: https://gist.github.com/ikidd/692dea4c63cc7656247071322d066405
--- Comment #21 from Zheng Luo vicluo96@gmail.com --- With iommu=soft I still occasionally experience a frozen screen, with the following logs:
Jan 02 16:11:18 lzThinkpad gnome-shell[1647]: Failed to flip: Cannot allocate memory
Jan 02 16:11:18 lzThinkpad kernel: amdgpu 0000:05:00.0: 00000000a2e0b642 pin failed
Jan 02 16:11:18 lzThinkpad kernel: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
--- Comment #22 from Sergio Perez Dagobertstaler@gmail.com --- I would like to add that on my Lenovo E585 iommu=pt works reliably, even for hours of gaming and web videos. But a few minutes in Wayland produce a frozen screen (without iommu=pt it does not even start).
--- Comment #23 from Alex Deucher alexdeucher@gmail.com --- Can anyone else try and bisect?
--- Comment #24 from Chí-Thanh Christopher Nguyễn chithanh@gentoo.org --- No problem here with amdgpu and iommu enabled, running kernel 4.20.0 on Dell Latitude 5495 (2700U). So BIOS issue maybe?
iommu=pt is however still needed for kfd (bug 107898).
--- Comment #25 from tones111@hotmail.com --- Created attachment 142974 --> https://bugs.freedesktop.org/attachment.cgi?id=142974&action=edit journalctl -b of lockup from bisected commit
E585 owner here. Please let me know if I can provide any additional information that would be helpful. Thanks in advance for your help.
This problem was very consistently reproduced during the bisect. I've attached a journalctl -b from the first bad commit. I was able to bisect the problem to...
284dec4317c8e76f45d3ce922f673c80331812f1 is the first bad commit
commit 284dec4317c8e76f45d3ce922f673c80331812f1
Author: Christian König christian.koenig@amd.com
Date: Wed Aug 22 16:44:56 2018 +0200

    drm/amdgpu: enable GTT PD/PT for raven v3

    Should work on Vega10 as well, but with an obvious performance hit.

    Older APUs can be enabled as well, but will probably be more work.

    v2: fix error checking
    v3: use more general check

    Signed-off-by: Christian König christian.koenig@amd.com
    Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
    Reviewed-by: Alex Deucher alexander.deucher@amd.com
    Signed-off-by: Alex Deucher alexander.deucher@amd.com
--- Comment #26 from chris christian.frank.uwb@googlemail.com --- Hi,
Many thanks for that bisect. I googled the commit and also found the following report, which seems to be the same issue:
https://bugzilla.kernel.org/show_bug.cgi?id=201727
Hope that helps.
Many thanks ! Christian
Jan Vesely jan.vesely@rutgers.edu changed:
What     |Removed |Added
----------------------------------------------------------------------------
See Also |        |https://bugzilla.kernel.org/show_bug.cgi?id=201727
--- Comment #27 from chris christian.frank.uwb@googlemail.com --- Still the same issue with kernel 5.0-rc1. Any plan on when to tackle that issue ?
--- Comment #28 from Alex Deucher alexdeucher@gmail.com --- Should be fixed with this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--- Comment #29 from tones111@hotmail.com --- I'm able to boot when building from that commit (1c1eba8) and looks like it will land in 4.20.4.
Thanks!
--- Comment #30 from chris christian.frank.uwb@googlemail.com --- Very nice. Just tried 5.0-rc2 and booting works fine now without the iommu workaround !
Martin Peres martin.peres@free.fr changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |MOVED
--- Comment #31 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/633.