https://bugzilla.kernel.org/show_bug.cgi?id=205497
Bug ID: 205497
Summary: Attempt to read amd gpu id causes a freeze
Product: Drivers
Version: 2.5
Kernel Version: 5.3.9
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: luya@fedoraproject.org
Regression: No
Created attachment 285871
  --> https://bugzilla.kernel.org/attachment.cgi?id=285871&action=edit
Script from radeontop to read AMD gpu ids
Running a utility named radeontop on an AMD APU causes a freeze while it attempts to read the amdgpu IDs. The script is attached. It would be nice to provide a better method to read AMD GPU cards.
--- Comment #1 from albertogomezmarin@gmail.com ---
It is happening for me too, with Vega integrated graphics. The system freezes completely with no graphics load while the utility is running.
clst (claudius+kernel@hausnetz.lettenbach.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |claudius+kernel@hausnetz.lettenbach.com
--- Comment #2 from clst (claudius+kernel@hausnetz.lettenbach.com) ---
I think this might be a regression, since radeontop worked fine with 4.19 on my Acer Nitro with Ryzen 5 2500U Raven + Polaris RX 560.
The freezes are also not instant: I get anywhere from a few seconds to a few minutes before it hangs (this might depend on load).
Some more information might be here: https://github.com/clbr/radeontop/issues/87
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #3 from Alex Deucher (alexdeucher@gmail.com) ---
Created attachment 285881
  --> https://bugzilla.kernel.org/attachment.cgi?id=285881&action=edit
possible fix
Assuming radeontop uses the info ioctl to query the registers, this patch should fix it. If it mmaps the register BAR directly, there's nothing you can do. Accessing registers while the gfx block is off will lead to garbage data and possibly hang the chip.
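The hazard described in this comment can be modelled in a few lines. The Python sketch below is a toy simulation only: it is not kernel code and it does not reproduce the attached patch. `FakeGfxBlock`, `gfx_powered` and `safe_read` are hypothetical names, and the register offset and value are made up. It assumes that a safe read path keeps the gfx block powered for the duration of the read and lets it power down again afterwards.

```python
from contextlib import contextmanager


class FakeGfxBlock:
    """Toy stand-in for a GPU gfx block; not a real driver interface."""

    def __init__(self):
        self.powered = False        # the block starts powered off
        self.keep_on_requests = 0   # number of active "keep powered" requests
        self.registers = {0x2004: 0x80000000}  # made-up register contents

    def read_register(self, offset):
        if not self.powered:
            # Real hardware may return garbage or hang; the model raises.
            raise RuntimeError("register read while gfx block is off")
        return self.registers.get(offset, 0)


@contextmanager
def gfx_powered(block):
    """Keep the block powered while the with-body runs."""
    block.keep_on_requests += 1
    block.powered = True
    try:
        yield block
    finally:
        block.keep_on_requests -= 1
        if block.keep_on_requests == 0:
            block.powered = False   # allow the block to power down again


def safe_read(block, offset):
    """Read one register with the block guaranteed to be powered."""
    with gfx_powered(block):
        return block.read_register(offset)
```

In this model, a bare `read_register` call on a powered-off block fails, which corresponds to the direct BAR access case, while `safe_read` corresponds to a read path that the driver can guard.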
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #285881|0                           |1
        is obsolete|                            |
--- Comment #4 from Alex Deucher (alexdeucher@gmail.com) ---
Created attachment 285883
  --> https://bugzilla.kernel.org/attachment.cgi?id=285883&action=edit
possible fix
Updated patch to handle cached registers properly.
V.I.S. (itemcode@mail.ru) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |itemcode@mail.ru
--- Comment #5 from V.I.S. (itemcode@mail.ru) ---
Hi. Please add the patch for 4.19.x LTS kernels too.
Thanks.
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #285883|0                           |1
        is obsolete|                            |
--- Comment #6 from Alex Deucher (alexdeucher@gmail.com) ---
Created attachment 285923
  --> https://bugzilla.kernel.org/attachment.cgi?id=285923&action=edit
possible fix
Better fix.
Trek (trek00@inbox.ru) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |trek00@inbox.ru
--- Comment #7 from Trek (trek00@inbox.ru) ---
as users reported, this bug should only affect kernels 5.2+
by default, radeontop calls amdgpu_read_mm_registers, amdgpu_query_info and amdgpu_query_sensor_info, but it can be forced by the command line to read BAR from /dev/mem
there is a kernel dump at https://github.com/clbr/radeontop/issues/87#issuecomment-529267244
thank you for the patch, but I cannot test it as my hardware is not affected (KAVERI)
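A monitoring tool could use the reported version boundary to decide whether to show a caution. The Python sketch below is only an illustration: `kernel_version_tuple` and `possibly_affected` are hypothetical helpers, and the default threshold of 5.2 is taken from the user reports mentioned in this comment, not from a confirmed analysis.

```python
import re


def kernel_version_tuple(release):
    """Extract (major, minor) from a release string such as '5.6.17-pclos1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def possibly_affected(release, first_affected=(5, 2)):
    """Return True if the release is at or above the assumed first affected version."""
    return kernel_version_tuple(release) >= first_affected
```

On a running system the release string could be obtained with `platform.release()` from the Python standard library.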
--- Comment #8 from V.I.S. (itemcode@mail.ru) ---
Please read here... https://github.com/lestofante/ksysguard-gpu/issues/4
Same issue on 4.19.x LTS kernel.
--- Comment #9 from Trek (trek00@inbox.ru) ---
thanks, I was not aware of it; maybe it is different hardware from that on which kernels 4.19/5.1 work?
--- Comment #10 from V.I.S. (itemcode@mail.ru) ---
AMD Ryzen 5 2600G + AMD RX560 (multiseat system); in my case the system froze after a few days on kernel 4.19.83.
--- Comment #11 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Trek from comment #7)
> by default, radeontop calls amdgpu_read_mm_registers, amdgpu_query_info and amdgpu_query_sensor_info, but it can be forced by the command line to read BAR from /dev/mem
If you access the BAR directly you will likely have problems in certain power saving modes.
Can someone test the patch?
--- Comment #12 from V.I.S. (itemcode@mail.ru) ---
I need approx. 3-5 days for testing, because this bug does not occur consistently.
--- Comment #13 from Luya Tshimbalanga (luya@fedoraproject.org) ---
(In reply to Alex Deucher from comment #11)
> (In reply to Trek from comment #7)
> > by default, radeontop calls amdgpu_read_mm_registers, amdgpu_query_info and amdgpu_query_sensor_info, but it can be forced by the command line to read BAR from /dev/mem
> If you access the BAR directly you will likely have problems in certain power saving modes.
> Can someone test the patch?
Currently building on https://copr.fedorainfracloud.org/coprs/luya/kernel-amgpu-gfxoff/build/10956...
--- Comment #14 from Trek (trek00@inbox.ru) ---
(In reply to Alex Deucher from comment #11)
> If you access the BAR directly you will likely have problems in certain power saving modes.
thank you, I'll add a warning message when accessing BAR directly
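A warning of the kind proposed here might look like the following Python sketch. This is an illustration under assumptions, not radeontop's code (radeontop itself is written in C): `warn_if_direct_bar`, the mode names `"mem"` and `"ioctl"`, and the message text are all hypothetical.

```python
import sys

# Hypothetical message text; the wording follows the advice given in this thread.
BAR_WARNING = (
    "warning: reading the register BAR directly; this may return garbage "
    "or hang the GPU in certain power saving modes"
)


def warn_if_direct_bar(access_mode, stream=None):
    """Print a warning when the hypothetical 'mem' access mode is selected.

    Returns True if the warning was printed, False otherwise.
    """
    if access_mode != "mem":
        return False
    print(BAR_WARNING, file=stream if stream is not None else sys.stderr)
    return True
```

Returning a boolean keeps the helper easy to check in isolation; a real tool would more likely just print the message once at startup.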
--- Comment #15 from Luya Tshimbalanga (luya@fedoraproject.org) ---
Created attachment 285947
  --> https://bugzilla.kernel.org/attachment.cgi?id=285947&action=edit
dmesg from AMD Raven Ridge Ryzen 2500U
dmesg showing latest kernel git snapshot
--- Comment #16 from Luya Tshimbalanga (luya@fedoraproject.org) ---
Created attachment 285949
  --> https://bugzilla.kernel.org/attachment.cgi?id=285949&action=edit
amdgpu firmware info
Firmware information of amdgpu installed in the testing system
--- Comment #17 from Luya Tshimbalanga (luya@fedoraproject.org) ---
Created attachment 285951
  --> https://bugzilla.kernel.org/attachment.cgi?id=285951&action=edit
Screenshot of radeontop running with patched kernel
Running radeontop with the patched test kernel, I can confirm that the patch fixes the freezing issue: it no longer occurs, and the card is correctly picked up.
--- Comment #18 from Luya Tshimbalanga (luya@fedoraproject.org) ---
Reading another bug report taken from the amd-gfx mailing list ( https://bugzilla.kernel.org/show_bug.cgi?id=204689 ): could that issue be related?
Anyway, radeontop still runs with the patched kernel. There is no noticeable freeze, and I tested with Blender rendering the old Ryzen CPU 3D model with GPU compute running on rocm-opencl (which needs optimization compared to amdgpu-pro-opencl).
To Alex: would it be possible to submit the patch to patchwork.kernel.org? Thanks.
--- Comment #19 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Luya Tshimbalanga from comment #18)
> Reading another bug report on https://bugzilla.kernel.org/show_bug.cgi?id=204689 taken from amdgfx mailing list, could that issue be related?
Not likely.
Luya Tshimbalanga (luya@fedoraproject.org) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX
--- Comment #20 from Luya Tshimbalanga (luya@fedoraproject.org) ---
I confirm the fix landed in kernel 5.4. Thanks, Alex, for the quick investigation. Closing this report.
--- Comment #21 from albertogomezmarin@gmail.com ---
(In reply to Luya Tshimbalanga from comment #20)
> I confirm the fix landed in kernel 5.4. Thanks, Alex, for the quick investigation. Closing this report.
For me it is happening again; I don't know since which kernel. I have an Asus with a Ryzen 5 3550H.
--- Comment #22 from Luya Tshimbalanga (luya@fedoraproject.org) ---
(In reply to albertogomezmarin from comment #21)
> (In reply to Luya Tshimbalanga from comment #20)
> > I confirm the fix landed in kernel 5.4. Thanks, Alex, for the quick investigation. Closing this report.
> For me it is happening again; I don't know since which kernel. I have an Asus with a Ryzen 5 3550H.
Did the latest updated kernel resolve the issue?
--- Comment #23 from albertogomezmarin@gmail.com ---
I did not test it. I do not have that laptop here to do so. I now have another laptop with a Ryzen 7 3700U.
Sling Shot (sling-shot@gmx.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sling-shot@gmx.com
--- Comment #24 from Sling Shot (sling-shot@gmx.com) ---
It is still happening. For me it is an almost instant lock. REISUB does not work.
CPU    : Quad Core AMD Ryzen 3 2200G with Radeon Vega Graphics (-MCP-)
Kernel : 5.6.17-pclos1 x86_64
Shell  : bash 4.4.23
dri-devel@lists.freedesktop.org