https://bugzilla.kernel.org/show_bug.cgi?id=201727
Bug ID: 201727
Summary: Hardware Error reported on Ryzen 5 2500U
Product: Drivers
Version: 2.5
Kernel Version: 4.20-rc3
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: misko.herko@gmail.com
Regression: No
Created attachment 279511 --> https://bugzilla.kernel.org/attachment.cgi?id=279511&action=edit dmesg
How to reproduce: Boot and start any graphical application, for example gdm.
Expected: gdm starts.
Actual: the screen goes black with the cursor visible in the top-left corner, and the log is full of [Hardware Error] messages.
I have bisected the bug; it appears to have been introduced in commit 284dec4317c8e76f45d3ce922f673c80331812f1.
Let me know if you think the hardware is actually broken, but I have had no issues on Windows and no hardware errors on the 4.19 kernel.
--- Comment #1 from Michal Herko (misko.herko@gmail.com) --- Created attachment 279513 --> https://bugzilla.kernel.org/attachment.cgi?id=279513&action=edit /proc/cpuinfo
--- Comment #2 from Michal Herko (misko.herko@gmail.com) --- Created attachment 279515 --> https://bugzilla.kernel.org/attachment.cgi?id=279515&action=edit lspci -vvv
--- Comment #3 from Michal Herko (misko.herko@gmail.com) --- Created attachment 279517 --> https://bugzilla.kernel.org/attachment.cgi?id=279517&action=edit .config
Mike Lothian (mike@fireburn.co.uk) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
CC        |          |mike@fireburn.co.uk
--- Comment #4 from Mike Lothian (mike@fireburn.co.uk) --- If you've bisected, then it's unlikely to be a hardware issue.
If you revert the commit does everything work again?
Michel Dänzer (michel@daenzer.net) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
CC        |          |christian.koenig@amd.com
--- Comment #5 from Michel Dänzer (michel@daenzer.net) --- That's

commit 284dec4317c8e76f45d3ce922f673c80331812f1
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Aug 22 16:44:56 2018 +0200

    drm/amdgpu: enable GTT PD/PT for raven v3
--- Comment #6 from Christian König (christian.koenig@amd.com) --- Currently GTT is only used for PD/PT as a last resort, when so little stolen memory is assigned to the APU that it won't work at all otherwise.
The faulting address looks suspiciously like we fail to handle an error code correctly somewhere and use the value as a DMA address instead.
What is your BIOS setting for the stolen VRAM?
--- Comment #7 from Michal Herko (misko.herko@gmail.com) --- Everything works after a revert. There was a conflict; I am attaching a diff of the revert.
--- Comment #8 from Michal Herko (misko.herko@gmail.com) --- Created attachment 279521 --> https://bugzilla.kernel.org/attachment.cgi?id=279521&action=edit revert of 284dec
--- Comment #9 from Michal Herko (misko.herko@gmail.com) ---
> What is your BIOS setting for the stolen VRAM?
I can't find any such setting in the BIOS. I really do not have any video-related options.
Alex Deucher (alexdeucher@gmail.com) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
CC        |          |alexdeucher@gmail.com
--- Comment #10 from Alex Deucher (alexdeucher@gmail.com) --- (In reply to Christian König from comment #6)
> What is your BIOS setting for the stolen VRAM?
[ 2.665594] [drm] amdgpu: 256M of VRAM memory ready
Michal Herko (misko.herko@gmail.com) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
Status    |NEW       |RESOLVED
Resolution|---       |CODE_FIX
--- Comment #11 from Michal Herko (misko.herko@gmail.com) --- The bug is no longer present in v4.20-rc5.
Rafał Miłecki (zajec5@gmail.com) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
CC        |          |zajec5@gmail.com
--- Comment #12 from Rafał Miłecki (zajec5@gmail.com) --- I was curious what could have fixed that. I tried to reproduce it on a completely different notebook with a 2500U (EliteBook 745 G5), but I didn't get any MCE-reported errors with commit 284dec4317c8 ("drm/amdgpu: enable GTT PD/PT for raven v3"), probably because it has more stolen VRAM:
[ 5.232179] [drm] amdgpu: 1024M of VRAM memory ready
It's probably one of those:
git log --oneline v4.20-rc2..v4.20-rc5 drivers/gpu/drm/amd/amdgpu/
ad97d9de4583 drm/amdgpu: Add delay after enable RLC ucode
1954db153d18 drm/amdgpu: Avoid endless loop in GPUVM fragment processing
9ce2b991f7ea drm/amdgpu: Cast to uint64_t before left shift
a5d0f4565996 drm/amdgpu: Enable HDP memory light sleep
8d4d7c589947 drm/amdgpu: Add missing firmware entry for HAINAN
919a52fc4ca1 drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset
69756c6ff0de drm/amdgpu: Add amdgpu "max bpc" connector property (v2)
c1a17777eb45 drm/amdgpu: fix huge page handling on Vega10
c837243ff401 drm/amdgpu: fix bug with IH ring setup
5581c670fb7e drm/amdgpu: set system aperture to cover whole FB region
Michal Herko (misko.herko@gmail.com) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
Status    |RESOLVED  |REOPENED
Resolution|CODE_FIX  |---
--- Comment #13 from Michal Herko (misko.herko@gmail.com) --- I am reopening because I have realized the bug is still present. I had accidentally tested with the "iommu=soft" kernel parameter. With this parameter the bug does not show up, and the system is usable.
--- Comment #14 from tones111@hotmail.com --- I'm experiencing this bug on a Lenovo E585. My boot logs also show only "256M of VRAM memory ready" when running 4.19.12 or commit 284dec4317c8. 4.19.12 boots fine, but the commit in question causes a lockup.
Is there any data I can collect or patches to test to support resolving this? Thanks for any insight or direction.
--- Comment #15 from tones111@hotmail.com --- Created attachment 280291 --> https://bugzilla.kernel.org/attachment.cgi?id=280291&action=edit v5.0-rc1 boot lockup
Boot log from v5.0-rc1 attached.
--- Comment #16 from Alex Deucher (alexdeucher@gmail.com) --- Should be fixed with: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Michal Herko (misko.herko@gmail.com) changed:
What      |Removed   |Added
----------------------------------------------------------------------------
Status    |REOPENED  |RESOLVED
Resolution|---       |CODE_FIX
--- Comment #17 from Michal Herko (misko.herko@gmail.com) --- I confirm the fix works. v5.0-rc3 also works. Let me know if you need any more testing.
dri-devel@lists.freedesktop.org