https://bugzilla.kernel.org/show_bug.cgi?id=205089
Bug ID: 205089
Summary: amdgpu : drm:amdgpu_cs_ioctl : Failed to initialize parser -125
Product: Drivers
Version: 2.5
Kernel Version: 5.3.2
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: maxijac@free.fr
Regression: No
Hello,
I am experiencing freezes with kernel 5.3.2 and amdgpu on a Vega 64 card.
This happens during games (I experience it in CS:GO), but it is somewhat random and takes a while to trigger. Once it triggers, my dmesg fills with errors:
[ 9156.537524] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 9156.747176] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 9156.747224] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 9156.883220] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 9156.883285] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
When it happens, the image freezes and the PC becomes unresponsive. Sometimes I manage to switch to a TTY, but then the screen is corrupted.
HW:
- AMD Ryzen 2700X CPU
- AMD RX Vega 64

SW:
- Kernel 5.3.2
- Mesa 19.2.0
Alex Deucher (alexdeucher@gmail.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |alexdeucher@gmail.com

--- Comment #1 from Alex Deucher (alexdeucher@gmail.com) ---
The GPU has reset, so you need to restart your desktop environment to continue. The error messages appear because the kernel is rejecting commands from userspace: applications need to recreate their contexts after a GPU reset. Things like desktop compositors would need to use the OpenGL robustness extensions and recreate their contexts after a GPU reset for this to work smoothly. Unfortunately, no desktop compositors do this at the moment.
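For readers decoding the message itself: the number after "Failed to initialize parser" is a negated Linux errno value, as is usual for kernel return codes. The following is a minimal Python sketch, assuming the generic Linux errno numbering (125 is ECANCELED, 110 is ETIMEDOUT). The lookup table and the helper name `decode_kernel_errno` are illustrative and not part of any driver or tool.

```python
import re

# Subset of the generic Linux errno table (asm-generic/errno.h).
# Hardcoded on purpose: Python's errno module reflects the host OS,
# which may not be Linux.
LINUX_ERRNO = {
    110: ("ETIMEDOUT", "Connection timed out"),
    125: ("ECANCELED", "Operation canceled"),
}

PARSER_ERROR = re.compile(r"Failed to initialize parser (-?\d+)!")


def decode_kernel_errno(line):
    """Return (code, name, text) for a parser error line, or None."""
    match = PARSER_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return code, name, text


line = ("[ 9156.537524] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* "
        "Failed to initialize parser -125!")
print(decode_kernel_errno(line))  # (-125, 'ECANCELED', 'Operation canceled')
```

ECANCELED is consistent with the explanation above: the submission is cancelled by the kernel rather than failing for lack of resources.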
--- Comment #2 from Bruno Jacquet (maxijac@free.fr) ---
If I understand you right this means there is still another issue that caused the GPU reset. And this issue in particular is just a consequence of the reset not being properly handled?
--- Comment #3 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Bruno Jacquet from comment #2)
> If I understand you right this means there is still another issue that caused the GPU reset. And this issue in particular is just a consequence of the reset not being properly handled?

The GPU reset succeeded. However, since the GPU has been reset, the contents of the memory (e.g., VRAM) that the application was using are undefined. So the application needs to use an API-level interface (e.g., the OpenGL robustness extensions or Vulkan's device-lost error) to query whether the GPU was reset and re-initialize its buffers if so.
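The query-then-recreate flow described here can be modelled without a GPU. The sketch below is a pure-Python simulation, not real graphics code: `FakeGpu`, `Renderer`, and `DeviceLost` are hypothetical stand-ins. In a real application the notification would come from the graphics API, for example `glGetGraphicsResetStatus` with the OpenGL robustness extensions or a `VK_ERROR_DEVICE_LOST` result in Vulkan.

```python
class DeviceLost(Exception):
    """Stand-in for a reset notification from the graphics API."""


class FakeGpu:
    """Hypothetical device that resets once, on the Nth submission."""

    def __init__(self, reset_on_submit):
        self.reset_on_submit = reset_on_submit
        self.submits = 0
        self.generation = 0  # bumped on every simulated reset

    def submit(self, buffers):
        self.submits += 1
        if self.submits == self.reset_on_submit:
            self.generation += 1
            raise DeviceLost("simulated GPU reset")
        # Work that references buffers created before a reset is rejected,
        # loosely mirroring the -125 errors in this report.
        if any(gen != self.generation for gen in buffers.values()):
            raise DeviceLost("stale buffers")


class Renderer:
    def __init__(self, gpu):
        self.gpu = gpu
        self.recreations = 0
        self.buffers = self._create_buffers()

    def _create_buffers(self):
        # Tag each buffer with the device generation it was created in.
        gen = self.gpu.generation
        return {"vertices": gen, "texture": gen}

    def draw_frame(self):
        try:
            self.gpu.submit(self.buffers)
            return True
        except DeviceLost:
            # Contents are undefined after a reset: rebuild everything
            # and let the next frame retry.
            self.buffers = self._create_buffers()
            self.recreations += 1
            return False


renderer = Renderer(FakeGpu(reset_on_submit=3))
results = [renderer.draw_frame() for _ in range(5)]
print(results, renderer.recreations)  # [True, True, False, True, True] 1
```

An application that skips the recreation step keeps submitting work against lost state, which matches the repeated errors reported at the top of this bug.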
--- Comment #4 from Bruno Jacquet (maxijac@free.fr) ---
Okay, I get it, but should I investigate the cause of the initial GPU reset?
--- Comment #5 from Alex Deucher (alexdeucher@gmail.com) ---
If you could come up with a reproducible test case, that would help for tracking down why it's hanging in the first place.
--- Comment #6 from Bruno Jacquet (maxijac@free.fr) ---
Hello Alex,

Well, my test case is still very random, but I finally managed to get the full dmesg. The initial error seems to be this:
[34856.817554] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[34858.320812] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=6337674, emitted seq=6337676
[34858.320854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process csgo_linux64 pid 12587 thread csgo_linux:cs0 pid 12595
[34858.320857] amdgpu 0000:1f:00.0: GPU reset begin!
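When comparing reports like this one, it can help to pull the hung ring, the blamed process, and the fence gap out of the log mechanically. Here is a minimal Python sketch that assumes the message wording shown in this thread. The function name `summarize_timeout` and the `pending` field are illustrative, and reading `emitted - signaled` as the number of submitted but unfinished fences is an interpretation, not something the log states.

```python
import re

RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_INFO = re.compile(
    r"Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def summarize_timeout(dmesg_text):
    """Extract the hung ring, the fence gap, and the blamed process."""
    summary = {}
    for line in dmesg_text.splitlines():
        ring = RING_TIMEOUT.search(line)
        if ring:
            signaled = int(ring["signaled"])
            emitted = int(ring["emitted"])
            summary.update(ring=ring["ring"], pending=emitted - signaled)
        proc = PROCESS_INFO.search(line)
        if proc:
            summary.update(process=proc["process"], pid=int(proc["pid"]))
    return summary


sample = """\
[34858.320812] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=6337674, emitted seq=6337676
[34858.320854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process csgo_linux64 pid 12587 thread csgo_linux:cs0 pid 12595
"""
print(summarize_timeout(sample))
# {'ring': 'gfx', 'pending': 2, 'process': 'csgo_linux64', 'pid': 12587}
```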
--- Comment #7 from Bruno Jacquet (maxijac@free.fr) ---
Created attachment 285483
  --> https://bugzilla.kernel.org/attachment.cgi?id=285483&action=edit
dmesg of fence timeout error
Andreas Schneider (asn@samba.org) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |asn@samba.org

--- Comment #8 from Andreas Schneider (asn@samba.org) ---
I've hit the same error when trying to run vkdt [1], the darktable RAW image developer prototype written in Vulkan.
I can reliably reproduce the issue with it.
Kernel 5.3.4
Mesa 19.1.7
Vulkan 1.1.123
After compiling use:
./vkdt -g default-darkroom.cfg -d all path/to/RAW_images
[1] https://github.com/hanatos/vkdt
--- Comment #9 from Andreas Schneider (asn@samba.org) ---
I totally forgot to mention: the GPU is an RX 470.
--- Comment #10 from Bruno Jacquet (maxijac@free.fr) ---
With a more recent stack it seems I am no longer experiencing this. Kernel 5.4.35 and Mesa 20.0.5 seem stable for me.

Andreas, did you try upgrading your SW components to see if you still have the issue?
--- Comment #11 from Andreas Schneider (asn@samba.org) ---
Yes, seems to work. I think this can be closed.
Bruno Jacquet (maxijac@free.fr) changed:
What        |Removed                     |Added
----------------------------------------------------------------------------
Status      |NEW                         |RESOLVED
Resolution  |---                         |CODE_FIX

--- Comment #12 from Bruno Jacquet (maxijac@free.fr) ---
OK, closing.
Lech (vandalhj@gmail.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |vandalhj@gmail.com
--- Comment #13 from Lech (vandalhj@gmail.com) ---
Jul 25 09:19:54 lech-ryzen-vega kernel: [37627.065966] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Jul 25 09:19:54 lech-ryzen-vega kernel: [37631.935858] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1228554, emitted seq=1228556
Jul 25 09:19:54 lech-ryzen-vega kernel: [37631.935939] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process HeroesOfTheStor pid 28617 thread HeroesOfTheStor pid 28691
Jul 25 09:19:54 lech-ryzen-vega kernel: [37631.935948] amdgpu 0000:0b:00.0: GPU reset begin!
Jul 25 09:19:54 lech-ryzen-vega kernel: [37632.181860] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Jul 25 09:19:54 lech-ryzen-vega kernel: [37632.312215] amdgpu 0000:0b:00.0: GPU BACO reset
Jul 25 09:19:55 lech-ryzen-vega kernel: [37632.888325] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
Jul 25 09:19:55 lech-ryzen-vega kernel: [37632.888485] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
Jul 25 09:19:55 lech-ryzen-vega kernel: [37632.888509] [drm] VRAM is lost due to GPU reset!
Jul 25 09:19:55 lech-ryzen-vega kernel: [37632.888833] [drm] PSP is resuming...
Jul 25 09:19:55 lech-ryzen-vega kernel: [37633.076488] [drm] reserve 0x400000 from 0xf5fe800000 for PSP TMR
Jul 25 09:19:55 lech-ryzen-vega kernel: [37633.255659] [drm] kiq ring mec 2 pipe 1 q 0
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373718] snd_hda_intel 0000:0b:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00af2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373723] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373726] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373728] snd_hda_intel 0000:0b:00.1: spurious response 0x233:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373730] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373731] snd_hda_intel 0000:0b:00.1: spurious response 0x1:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373733] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373735] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373736] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373738] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:56 lech-ryzen-vega kernel: [37634.373739] snd_hda_intel 0000:0b:00.1: spurious response 0x0:0x0, last cmd=0xaf2d00
Jul 25 09:19:57 lech-ryzen-vega kernel: [37635.377702] snd_hda_intel 0000:0b:00.1: No response from codec, disabling MSI: last cmd=0x00a72d01
Jul 25 09:19:58 lech-ryzen-vega kernel: [37636.393677] snd_hda_intel 0000:0b:00.1: No response from codec, resetting bus: last cmd=0x00a72d01
Jul 25 09:19:59 lech-ryzen-vega kernel: [37637.397658] snd_hda_intel 0000:0b:00.1: azx_get_response timeout, switching to single_cmd mode: last cmd=0x00b77701
Jul 25 09:19:59 lech-ryzen-vega kernel: [37637.419432] [drm] UVD and UVD ENC initialized successfully.
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519135] [drm] VCE initialized successfully.
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519149] amdgpu 0000:0b:00.0: ring gfx uses VM inv eng 0 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519151] amdgpu 0000:0b:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519153] amdgpu 0000:0b:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519155] amdgpu 0000:0b:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519156] amdgpu 0000:0b:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519158] amdgpu 0000:0b:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519159] amdgpu 0000:0b:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519161] amdgpu 0000:0b:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519162] amdgpu 0000:0b:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519164] amdgpu 0000:0b:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519166] amdgpu 0000:0b:00.0: ring sdma0 uses VM inv eng 0 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519167] amdgpu 0000:0b:00.0: ring page0 uses VM inv eng 1 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519169] amdgpu 0000:0b:00.0: ring sdma1 uses VM inv eng 4 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519170] amdgpu 0000:0b:00.0: ring page1 uses VM inv eng 5 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519171] amdgpu 0000:0b:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519173] amdgpu 0000:0b:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519174] amdgpu 0000:0b:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519176] amdgpu 0000:0b:00.0: ring vce0 uses VM inv eng 9 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519177] amdgpu 0000:0b:00.0: ring vce1 uses VM inv eng 10 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519179] amdgpu 0000:0b:00.0: ring vce2 uses VM inv eng 11 on hub 1
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519180] [drm] ECC is not present.
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.519182] [drm] SRAM ECC is not present.
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.520993] [drm] recover vram bo from shadow start
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522435] [drm] recover vram bo from shadow done
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522437] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522438] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522466] amdgpu 0000:0b:00.0: GPU reset(2) succeeded!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522477] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522479] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522481] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522482] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522484] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522485] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522487] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522488] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522489] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522491] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522492] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522493] [drm] Skip scheduling IBs!
Jul 25 09:20:00 lech-ryzen-vega kernel: [37637.522770] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.127879] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.129190] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.162337] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.164145] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.164261] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.167924] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 25 09:20:10 lech-ryzen-vega kernel: [37648.168801] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
HW:
Vega 56
Ryzen 3600X

SW:
5.7.1-050701-generic x86_64
Mesa 20.2.0-devel (git-14a12b7 2020-07-24 focal-oibaf-ppa)
You can safely reopen it.
jesper@jnsn.dev changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |jesper@jnsn.dev

--- Comment #14 from jesper@jnsn.dev ---
I'm now seeing this bug again. This time it's happening while launching dota2.
Hardware:
RX 5700 XT
Ryzen 3800X

Software:
Mesa 21.1.5 (arch mainline)
Linux 5.13.4.arch2-1
Log (Notice that it's most recent first):
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset(2) succeeded!
Jul 26 22:15:55 delusionalStation kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 26 22:15:55 delusionalStation kernel: [drm] Skip scheduling IBs!
... A bunch of repeats
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: recover vram bo from shadow done
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: recover vram bo from shadow start
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jul 26 22:15:55 delusionalStation kernel: [drm] JPEG decode initialized successfully.
Jul 26 22:15:55 delusionalStation kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jul 26 22:15:55 delusionalStation kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: SMU is resumed successfully!
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: SMU is resuming...
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jul 26 22:15:55 delusionalStation kernel: [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
Jul 26 22:15:55 delusionalStation kernel: [drm] PSP is resuming...
Jul 26 22:15:55 delusionalStation kernel: [drm] VRAM is lost due to GPU reset!
Jul 26 22:15:55 delusionalStation kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Jul 26 22:15:55 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 26 22:15:51 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: BACO reset
Jul 26 22:15:51 delusionalStation kernel: [drm] free PSP TMR buffer
Jul 26 22:15:51 delusionalStation kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Jul 26 22:15:51 delusionalStation kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jul 26 22:15:51 delusionalStation kernel: amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 26 22:15:51 delusionalStation kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Jul 26 22:15:51 delusionalStation kernel: amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 26 22:15:51 delusionalStation kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
Jul 26 22:15:51 delusionalStation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process dota2 pid 31372 thread dota2:cs0 pid 31391
Jul 26 22:15:51 delusionalStation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13190067, emitted seq=13190069
Jul 26 22:15:51 delusionalStation kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Alois Nespor (info@aloisnespor.info) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |info@aloisnespor.info

--- Comment #15 from Alois Nespor (info@aloisnespor.info) ---
I can confirm; I have the same problem now with a Ryzen 5 3400G (RX Vega 11).

Kernel 5.13.4 and Mesa 21.1.5
mcmarius@gmx.net changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |mcmarius@gmx.net

--- Comment #16 from mcmarius@gmx.net ---
I have the same problem with kernel 5.11.22-2-MANJARO.
--- Comment #17 from Alex Deucher (alexdeucher@gmail.com) ---
Does up/downgrading the mesa driver help?
--- Comment #18 from jesper@jnsn.dev ---
On 02/08/21 at 02:13pm, bugzilla-daemon@bugzilla.kernel.org wrote:
> Does up/downgrading the mesa driver help?
Upgrading to the latest git revision of mesa has fixed Dota 2 for me at least.
ctjansson@protonmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |ctjansson@protonmail.com

--- Comment #19 from ctjansson@protonmail.com ---
I just triggered this bug as well, playing Payday 2.

I also triggered this bug when playing World of Warcraft in June.

OS: EndeavourOS Linux x86_64
Kernel: 5.13.10-arch1-1
Mesa: 21.1.6
DE: GNOME 40.3
CPU: Ryzen 9 5900X
GPU: RX 6800 XT
--- Comment #20 from Alois Nespor (info@aloisnespor.info) ---
(In reply to Alois Nespor from comment #15)
> I can confirm; I have the same problem now with a Ryzen 5 3400G (RX Vega 11).
>
> Kernel 5.13.4 and Mesa 21.1.5

This seems fixed for me with linux-firmware 20210818.c46b8c3; see https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/... They reverted some firmware due to stability issues.
Joey Espinosa (jlouis.espinosa@gmail.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |jlouis.espinosa@gmail.com

--- Comment #21 from Joey Espinosa (jlouis.espinosa@gmail.com) ---
That didn't fix it for me. I'm having the exact same issue (same behavior, anyway), and I'm on linux-firmware 20211027-126.fc35 (Fedora 35).

I started experiencing it after an update a few days ago, and I thought upgrading the OS from 34 -> 35 would maybe fix it. It didn't.

OS: Fedora 35
CPU: Ryzen 5950X
GPU: RX 6900 XT
--- Comment #22 from Joey Espinosa (jlouis.espinosa@gmail.com) ---
Kernel version would help too probably :-/
5.14.16-301.fc35.x86_64
--- Comment #23 from Joey Espinosa (jlouis.espinosa@gmail.com) ---
... and I guess some of this info:

Mesa: 21.2.5
DE: Gnome 41.1
Vulkan: 1.2.189
Xorg: 1.20.11
Antoni (56turtle56@gmail.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |56turtle56@gmail.com

--- Comment #24 from Antoni (56turtle56@gmail.com) ---
This bug triggers almost every day for me. I use awesomeWM on Arch Linux (I also tried KDE, but it happens there too).

software
linux 5.14.16.arch1-1
mesa 21.2.4-1
awesome 4.3-3

hardware
amd ryzen 5 5600g (integrated gpu)
Hristos (kernel-bugs@hristos.co) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |kernel-bugs@hristos.co

--- Comment #25 from Hristos (kernel-bugs@hristos.co) ---
Kernel: 5.15.3
Mesa: 21.2.5
Xorg: 7.6

I see this when running OpenMW and a lot of mods (https://modding-openmw.com/lists/total-overhaul/). OpenMW with no mods or a smaller mod list seems to run fine.

When the program starts rendering the actual game scene (after loading data files, etc.) it will hang, and then crash with "Failed to initialize parser -125" messages in the console.
It only happens with Mesa 21.2.X, though. When I downgraded to Mesa 21.1.7 everything ran as expected.
--- Comment #26 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Hristos from comment #25)
> Kernel: 5.15.3
> Mesa: 21.2.5
> Xorg: 7.6
>
> I see this when running OpenMW and a lot of mods (https://modding-openmw.com/lists/total-overhaul/). OpenMW with no mods or a smaller mod list seems to run fine.
>
> When the program starts rendering the actual game scene (after loading data files, etc.) it will hang, and then crash with "Failed to initialize parser -125" messages in the console.
>
> It only happens with Mesa 21.2.X, though. When I downgraded to Mesa 21.1.7 everything ran as expected.
This sounds like a mesa issue. You might want to open a mesa issue: https://gitlab.freedesktop.org/groups/mesa/-/issues
kernel@feedmebits.nl changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |kernel@feedmebits.nl

--- Comment #27 from kernel@feedmebits.nl ---
(In reply to Hristos from comment #25)
> Kernel: 5.15.3
> Mesa: 21.2.5
> Xorg: 7.6
>
> I see this when running OpenMW and a lot of mods (https://modding-openmw.com/lists/total-overhaul/). OpenMW with no mods or a smaller mod list seems to run fine.
>
> When the program starts rendering the actual game scene (after loading data files, etc.) it will hang, and then crash with "Failed to initialize parser -125" messages in the console.
>
> It only happens with Mesa 21.2.X, though. When I downgraded to Mesa 21.1.7 everything ran as expected.
I'm running into the same issue with one of my games so far. What other packages did you have to downgrade besides mesa and lib32-mesa in order to get a working opengl?
James Clark (jjc@jclark.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |jjc@jclark.com

--- Comment #28 from James Clark (jjc@jclark.com) ---
I am seeing this on Ubuntu 21.10:

Kernel: 5.13.0
Mesa: 21.2.2

Hardware:
CPU: 3950X
GPU: RX 6600
This is regular desktop use: Chrome 96 with Wayland enabled (--ozone-platform=wayland --enable-features=VaapiVideoDecoder --enable-gpu-rasterization -enable-drd --enable-zero-copy --enable-canvas-oop-rasterization)
David Nichols (david@qore.org) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |david@qore.org

--- Comment #29 from David Nichols (david@qore.org) ---
Also seeing it on Ubuntu 21.10 on aarch64:

Kernel: 5.15.0
Mesa: mesa_22.0~git2111150600

Hardware:
GPU: AMD RX 580 (8GB)
CPU: 16 Core Arm Cortex A72
SoC: NXP LX2160A (SolidRun HoneyComb system)
Running 2 games: flightgear and endless-sky
--- Comment #30 from David Nichols (david@qore.org) ---
The amdgpu problems in my system were completely and definitively resolved with a memcpy() patch to glibc: https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436
The SoC I'm using (NXP LX2160A - SolidRun HoneyComb system) has a known bug regarding PCI device memory writes that can be completely addressed with a simple reordering of the assembly instructions in the arch-specific memcpy() implementation.
In any case, this is not a kernel bug for me after all. I can't comment on the source of the problem for others who most likely are running an x86_64 kernel.
Andreas Polnas (andreas.polnas93@hotmail.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |andreas.polnas93@hotmail.com

--- Comment #31 from Andreas Polnas (andreas.polnas93@hotmail.com) ---
(In reply to Jesper Jensen from comment #14)
> I'm now seeing this bug again. This time it's happening while launching dota2.
>
> Hardware:
> RX 5700 XT
> Ryzen 3800X

Same here, it happens with dota2 for me as well. With dual monitors this happens occasionally. I can either turn one of the monitors off or, as I have done lately, modify the game's launch options on Steam to use -phased_window_create. I have no idea why this works. I will run with this setup and report back on whether it continues to solve the issue or I have just been lucky.

Hardware:
Motherboard: Z97-S02 (MS-7821)
GPU: Radeon RX 5500/5500M
CPU: i7-4770K

Software:
Mesa 21.3.3
Kernel 5.10.89-1-MANJARO
zccrs (zccrs@live.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |zccrs@live.com

--- Comment #32 from zccrs (zccrs@live.com) ---
Also seeing it on Arch Linux x86_64:

Kernel: 5.17.0-rc6-next-20220304-1-next-git
Mesa: 21.3.7-2

Hardware:
GPU: VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c9)
CPU: AMD 5600G

Running GNOME 41 with Wayland.
--- Comment #33 from Alex Deucher (alexdeucher@gmail.com) ---
Please try newer or older mesa drivers if you can repro this with a particular game like dota2. The kernel driver is just the messenger.
Christine Lemmer-Webber (cwebber@dustycloud.org) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |cwebber@dustycloud.org

--- Comment #34 from Christine Lemmer-Webber (cwebber@dustycloud.org) ---
Hello,
I'm running a mostly stock upstream Linux kernel, 5.16.11 on Guix (using the nonguix channel to get the upstream kernel).
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 164c (rev c1)
model name : AMD Ryzen 7 5700U with Radeon Graphics
I've been hitting the same error ("[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!" as posted in the top of this thread) and it can take out my whole desktop if using an accelerated compositor like Gnome Shell. Here's how to reproduce:
- Open Blender 3.0
- click the "New File -> 2D Animation" option from the splash screen
- if it doesn't crash the first time, try it a few more times

Sometimes the desktop recovers, but often not. If I press ctrl-alt-f1 I see that error being spit out repeatedly at the TTY.
If I'm running XFCE, it seems like similar issues happen in Blender in that it stutters, etc, but it seems to make the screen go black for a second, then it's able to recover.
Here's another way to trigger it: try opening a fresh scene and going to view -> viewpoint -> camera. Similarly, you might have to try this a few times. Strangely, the issue may be even worse: even on XFCE, Blender can't generally recover in a usable way, I have to restart it.
Would love to see this fixed! If I should open a new bug instead, let me know.
--- Comment #35 from Christine Lemmer-Webber (cwebber@dustycloud.org) --- Here's the dmesg output that appears to be associated with when everything broke:
[ 51.645260] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645272] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000032a02a2a000 from IH client 0x1b (UTCL2)
[ 51.645278] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645280] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645282] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645283] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645284] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645285] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645286] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645302] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645305] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x000002a2a0202000 from IH client 0x1b (UTCL2)
[ 51.645310] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645312] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645314] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645316] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645318] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645319] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645321] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645335] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645338] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000030320203000 from IH client 0x1b (UTCL2)
[ 51.645342] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645343] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645345] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645346] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645348] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645349] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645350] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645368] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645371] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000202032320000 from IH client 0x1b (UTCL2)
[ 51.645375] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645376] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645377] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645379] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645380] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645382] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645383] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645404] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645407] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000320303202000 from IH client 0x1b (UTCL2)
[ 51.645411] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645413] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645414] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645416] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645418] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645419] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645421] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645435] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645438] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000032020323000 from IH client 0x1b (UTCL2)
[ 51.645442] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645444] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645445] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645447] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645448] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645450] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645452] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645465] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645469] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000323203032000 from IH client 0x1b (UTCL2)
[ 51.645473] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645475] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645477] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645479] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645481] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645482] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645484] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645501] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645504] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x00002a0202a0a000 from IH client 0x1b (UTCL2)
[ 51.645510] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645513] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645515] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645516] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645518] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645520] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645522] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645534] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645537] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000203232030000 from IH client 0x1b (UTCL2)
[ 51.645542] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645544] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645545] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645546] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645547] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645548] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645549] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 51.645593] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 thread blender:cs0 pid 1754)
[ 51.645595] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000a02032320000 from IH client 0x1b (UTCL2)
[ 51.645599] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401431
[ 51.645601] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
[ 51.645602] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[ 51.645603] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[ 51.645604] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 51.645605] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 51.645606] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[ 61.685353] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=4230, emitted seq=4232
[ 61.685637] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process blender pid 1724 thread blender:cs0 pid 1754
[ 61.685887] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
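For anyone triaging a dump like the one above, the following is a minimal sketch (not part of the original report) that tallies the faulting page addresses and the processes named in "retry page fault" messages. The function name `summarize_faults` and both regular expressions are illustrative assumptions based only on the message wording shown in this comment; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed pattern: the "in page starting at address ..." line that follows
# each "retry page fault" message in the excerpt above.
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
# Assumed pattern: process name and pid on the "retry page fault" line.
PROC_RE = re.compile(r"for process (\S+) pid (\d+)")


def summarize_faults(dmesg_text):
    """Return (Counter of faulting addresses, Counter of (process, pid))."""
    addresses = Counter(ADDR_RE.findall(dmesg_text))
    processes = Counter(
        (name, int(pid)) for name, pid in PROC_RE.findall(dmesg_text)
    )
    return addresses, processes


if __name__ == "__main__":
    sample = (
        "[ 51.645260] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault "
        "(src_id:0 ring:0 vmid:4 pasid:32774, for process blender pid 1724 "
        "thread blender:cs0 pid 1754)\n"
        "[ 51.645272] amdgpu 0000:05:00.0: amdgpu: in page starting at address "
        "0x0000032a02a2a000 from IH client 0x1b (UTCL2)\n"
    )
    addresses, processes = summarize_faults(sample)
    for address, count in addresses.items():
        print(address, count)
    for (name, pid), count in processes.items():
        print(name, pid, count)
```

Because the patterns use `findall` over the whole text, the sketch works whether the log has one message per line or has been pasted as a single run of text.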
Joris L. (commandline@protonmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC | |commandline@protonmail.com
--- Comment #36 from Joris L. (commandline@protonmail.com) --- I also see this kind of error on EL8 with kernel 4.18.0-348.20.1.el8_5.x86_64.
I've been tracking a WebKit bug with a similar impact for some time. That bug caused hard freezes, whereas here the system does not always freeze and can sometimes recover.
Since the WebKit bug originated in the browser and was specific to certain URLs, I considered it highly likely to be specific to JavaScript.
This time the impact is again specific to JavaScript/Node.js.
The freeze this time happened while I was writing content on LinkedIn.com.
Before the most recent 'partial freeze' there was a 'full freeze', where messages such as '[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!' were preceded by a lengthy build-up of the problem:
------
[ma mrt 21 17:06:55 2022] perf: interrupt took too long (2510 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [ma mrt 21 17:09:27 2022] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out! [ma mrt 21 17:09:32 2022] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=266035, emitted seq=266036 [ma mrt 21 17:09:32 2022] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out! [ma mrt 21 17:09:32 2022] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0 [ma mrt 21 17:09:32 2022] amdgpu 0000:05:00.0: GPU reset begin! [ma mrt 21 17:09:32 2022] [drm] free PSP TMR buffer [ma mrt 21 17:09:32 2022] amdgpu 0000:05:00.0: MODE2 reset [ma mrt 21 17:09:32 2022] amdgpu 0000:05:00.0: GPU reset succeeded, trying to resume [ma mrt 21 17:09:32 2022] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000). [ma mrt 21 17:09:32 2022] [drm] PSP is resuming... [ma mrt 21 17:09:32 2022] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR [ma mrt 21 17:09:32 2022] amdgpu 0000:05:00.0: RAS: optional ras ta ucode is not available [ma mrt 21 17:09:32 2022] amdgpu 0000:05:00.0: RAP: optional rap ta ucode is not available [ma mrt 21 17:09:32 2022] [drm] kiq ring mec 2 pipe 1 q 0 [ma mrt 21 17:09:33 2022] WARNING: CPU: 5 PID: 25470 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:942 dc_commit_state_no_check+0x404/0x980 [amdgpu] [ma mrt 21 17:09:33 2022] Modules linked in: snd_seq_dummy snd_hrtimer uinput xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink sunrpc vfat fat intel_rapl_msr wmi_bmof intel_rapl_common edac_mce_amd rtw88_8822be snd_ctl_led rtw88_8822b 
snd_hda_codec_conexant kvm_amd rtw88_pci snd_hda_codec_generic snd_hda_codec_hdmi uvcvideo ccp kvm rtw88_core videobuf2_vmalloc irqbypass rapl snd_hda_intel joydev mac80211 videobuf2_memops videobuf2_v4l2 pcspkr videobuf2_common snd_intel_dspcfg videodev snd_intel_sdw_acpi snd_hda_codec snd_hda_core cfg80211 k10temp snd_hwdep snd_seq snd_seq_device snd_pcm libarc4 snd_timer rtsx_pci_ms thinkpad_acpi sp5100_tco ledtrig_audio snd_rn_pci_acp3x memstick snd i2c_piix4 [ma mrt 21 17:09:33 2022] soundcore rfkill wmi video i2c_scmi acpi_cpufreq ext4 mbcache jbd2 dm_crypt mmc_block sd_mod sg amdgpu rtsx_pci_sdmmc mmc_core drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea sysfillrect crc32c_intel ahci sysimgblt fb_sys_fops libahci drm ghash_clmulni_intel libata serio_raw nvme nvme_core r8169 rtsx_pci realtek t10_pi dm_mirror dm_region_hash dm_log dm_mod fuse [ma mrt 21 17:09:33 2022] CPU: 5 PID: 25470 Comm: kworker/5:3 Kdump: loaded Not tainted 4.18.0-348.20.1.el8_5.x86_64 #1 [ma mrt 21 17:09:33 2022] Hardware name: LENOVO 20NF0000GE/20NF0000GE, BIOS R11ET44P (1.24 ) 01/26/2022 [ma mrt 21 17:09:33 2022] Workqueue: events drm_sched_job_timedout [gpu_sched] [ma mrt 21 17:09:33 2022] RIP: 0010:dc_commit_state_no_check+0x404/0x980 [amdgpu] [ma mrt 21 17:09:33 2022] Code: 74 e2 49 3b 56 08 75 dc 48 8b 93 f8 e8 00 00 48 85 d2 74 d0 48 89 de 4c 89 f7 e8 d7 58 9c c6 eb c3 80 b8 80 03 00 00 00 74 02 <0f> 0b 48 81 c5 d8 04 00 00 49 39 ed 0f 85 d9 02 00 00 48 8b 93 b8 [ma mrt 21 17:09:33 2022] RSP: 0018:ffffa2e14ae7bc20 EFLAGS: 00010202 [ma mrt 21 17:09:33 2022] RAX: ffff89a339309400 RBX: ffff89a1e4400000 RCX: 0000000000000002 [ma mrt 21 17:09:33 2022] RDX: 0000000000000e60 RSI: 00000000000008f8 RDI: 00000baa349077ea [ma mrt 21 17:09:33 2022] RBP: ffff89a3441e06c0 R08: ffffa2e14ae7bb74 R09: 0000000000000000 [ma mrt 21 17:09:33 2022] R10: 0000000000000030 R11: 0000000000001000 R12: 0000000000000000 [ma mrt 21 17:09:33 2022] R13: 
ffff89a3441e1ef8 R14: ffff89a3441e1ef8 R15: ffff89a3441e0000 [ma mrt 21 17:09:33 2022] FS: 0000000000000000(0000) GS:ffff89a860b40000(0000) knlGS:0000000000000000 [ma mrt 21 17:09:33 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ma mrt 21 17:09:33 2022] CR2: 00007face995a020 CR3: 00000001bdb68000 CR4: 00000000003506e0 [ma mrt 21 17:09:33 2022] Call Trace: [ma mrt 21 17:09:33 2022] dc_commit_state+0xa1/0xb0 [amdgpu] [ma mrt 21 17:09:33 2022] dm_resume+0x3cd/0x530 [amdgpu] [ma mrt 21 17:09:33 2022] ? psm_adjust_power_state_dynamic+0xeb/0x1b0 [amdgpu] [ma mrt 21 17:09:33 2022] amdgpu_device_ip_resume_phase2+0x63/0xd0 [amdgpu] [ma mrt 21 17:09:33 2022] amdgpu_do_asic_reset+0x28b/0x3d0 [amdgpu] [ma mrt 21 17:09:33 2022] amdgpu_device_gpu_recover+0x4e8/0xac0 [amdgpu] [ma mrt 21 17:09:33 2022] ? __drm_err+0x72/0x90 [drm] [ma mrt 21 17:09:33 2022] amdgpu_job_timedout+0x132/0x150 [amdgpu] [ma mrt 21 17:09:33 2022] drm_sched_job_timedout+0x84/0xe0 [gpu_sched] [ma mrt 21 17:09:33 2022] process_one_work+0x1a7/0x360 [ma mrt 21 17:09:33 2022] ? create_worker+0x1a0/0x1a0 [ma mrt 21 17:09:33 2022] worker_thread+0x30/0x390 [ma mrt 21 17:09:33 2022] ? create_worker+0x1a0/0x1a0 [ma mrt 21 17:09:33 2022] kthread+0x116/0x130 [ma mrt 21 17:09:33 2022] ? kthread_flush_work_fn+0x10/0x10 [ma mrt 21 17:09:33 2022] ret_from_fork+0x22/0x40 [ma mrt 21 17:09:33 2022] ---[ end trace c905cf83c622864c ]--- [ma mrt 21 17:09:33 2022] [drm] VCN decode and encode initialized successfully(under SPG Mode). 
[ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring gfx uses VM inv eng 0 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring sdma0 uses VM inv eng 0 on hub 1 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring vcn_dec uses VM inv eng 1 on hub 1 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1 [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: recover vram bo from shadow start [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: recover vram bo from shadow done [ma mrt 21 17:09:33 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: GPU reset(1) succeeded! 
[ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1688011a0 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1688011e0 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1688011c0 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168801200 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168801220 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168801260 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168801240 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168801280 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1688012c0 flags=0x0070] [ma mrt 21 17:09:33 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1688012a0 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168801300 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x1688012e0 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168801340 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168801320 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168801360 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x1688013a0 flags=0x0070] [ma mrt 21 17:09:33 
2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168801380 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x1688013e0 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x1688013c0 flags=0x0070] [ma mrt 21 17:09:33 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168801420 flags=0x0070] [ma mrt 21 17:09:43 2022] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=706702, emitted seq=706705 [ma mrt 21 17:09:43 2022] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 17823 thread Xwayland:cs0 pid 17916 [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: GPU reset begin! [ma mrt 21 17:09:43 2022] amd_iommu_report_page_fault: 412 callbacks suppressed [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x16881e560 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x16881e580 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x16881e5a0 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x16881e5c0 flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x168840000 
flags=0x0070] [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x16881e5e0 flags=0x0070] [ma mrt 21 17:09:43 2022] amd_iommu_report_page_fault: 418 callbacks suppressed [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x16881e600 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x16881e620 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x16881e6c0 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x16881e640 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x16881e680 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x16881e700 flags=0x0070] [ma mrt 21 17:09:43 2022] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x168840000 flags=0x0070] [ma mrt 21 17:09:43 2022] [drm] free PSP TMR buffer [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: MODE2 reset [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: GPU reset succeeded, trying to resume [ma mrt 21 17:09:43 2022] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000). [ma mrt 21 17:09:43 2022] [drm] PSP is resuming... 
[ma mrt 21 17:09:43 2022] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR [ma mrt 21 17:09:43 2022] core: [Hardware Error]: Machine check events logged [ma mrt 21 17:09:43 2022] [Hardware Error]: Deferred error, no action required. [ma mrt 21 17:09:43 2022] [Hardware Error]: CPU:0 (17:18:1) MC20_STATUS[-|-|MiscV|AddrV|-|-|SyndV|UECC|Deferred|-|-]: 0x9c2030000001085b [ma mrt 21 17:09:43 2022] [Hardware Error]: Error Addr: 0x00007ffcffffff00 [ma mrt 21 17:09:43 2022] [Hardware Error]: IPID: 0x0000002e00000000, Syndrome: 0x000000005b240204 [ma mrt 21 17:09:43 2022] [Hardware Error]: Coherent Slave Ext. Error Code: 1, Address Violation. [ma mrt 21 17:09:43 2022] [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout) [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: RAS: optional ras ta ucode is not available [ma mrt 21 17:09:43 2022] amdgpu 0000:05:00.0: RAP: optional rap ta ucode is not available [ma mrt 21 17:09:44 2022] [drm] kiq ring mec 2 pipe 1 q 0 [ma mrt 21 17:09:44 2022] [drm] VCN decode and encode initialized successfully(under SPG Mode). 
[ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring gfx uses VM inv eng 0 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring sdma0 uses VM inv eng 0 on hub 1 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring vcn_dec uses VM inv eng 1 on hub 1 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1 [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: recover vram bo from shadow start [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: recover vram bo from shadow done [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] amdgpu 0000:05:00.0: GPU reset(3) succeeded! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! 
[ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:44 2022] [drm] Skip scheduling IBs! [ma mrt 21 17:09:48 2022] amdgpu_cs_ioctl: 3771 callbacks suppressed [ma mrt 21 17:09:48 2022] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ma mrt 21 17:09:48 2022] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
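To get a quick overview of which rings timed out in a log like the one above, here is a minimal sketch that extracts each "ring ... timeout" message. The helper name `ring_timeouts` is hypothetical, and treating `emitted seq - signaled seq` as the number of fences still outstanding on that ring is my reading of the message, not something stated in this thread.

```python
import re

# Assumed pattern: the amdgpu_job_timedout message as it appears in this thread.
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def ring_timeouts(log_text):
    """Return a list of (ring, signaled, emitted, outstanding) tuples."""
    results = []
    for ring, signaled, emitted in TIMEOUT_RE.findall(log_text):
        signaled_seq, emitted_seq = int(signaled), int(emitted)
        # Assumption: the difference is the count of unsignaled fences.
        results.append((ring, signaled_seq, emitted_seq, emitted_seq - signaled_seq))
    return results


if __name__ == "__main__":
    sample = (
        "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, "
        "signaled seq=266035, emitted seq=266036 "
        "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "signaled seq=706702, emitted seq=706705"
    )
    for entry in ring_timeouts(sample):
        print(entry)
```

In the log above this would list the sdma0 timeout first and the gfx timeout (attributed to Xwayland) about ten seconds later, which makes it easier to see that the second reset followed the first one rather than being an independent event.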
MasterCATZ (mastercatz@hotmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC | |mastercatz@hotmail.com
--- Comment #37 from MasterCATZ (mastercatz@hotmail.com) --- My R9 290 now keeps doing this with the latest drivers on Ubuntu 22.04,
every time I try to watch anime through Kodi.
--- Comment #38 from MasterCATZ (mastercatz@hotmail.com) --- amdgpu : drm:amdgpu_cs_ioctl : Failed to initialize parser -125
AMD Radeon R9 200 Series (hawaii, LLVM 14.0.0, DRM 3.42, 5.15.34-051534-generic)
OpenGL version string: 4.6 (Compatibility Profile) Mesa 22.2.0-devel (git-6983c85 2022-05-07 impish-oibaf-ppa)
Ubuntu 22.04 LTS
Kernel command line: BOOT_IMAGE=/vmlinuz-5.15.34-051534-generic root=/dev/mapper/Raid6LVM-lvUbuntu ro rootflags=subvol=@ amdgpu.gpu_recovery=1 amd_iommu=on iommu=pt delayacct acpi_enforce_resources=lax usbcore.autosuspend=-1 apparmor=0 amdgpu.dc=1 amdgpu.dpm=1 amdgpu.ppfeaturemask=0xfffd7fff amdgpu.dcfeaturemask=2 amdgpu.si_support=1 amdgpu.cik_support=1 radeon.si_support=0
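When comparing reports, it helps to see at a glance which driver options were set at boot. The sketch below pulls the `module.option=value` pairs for one module out of a kernel command line such as the one above. The function name `module_params` is illustrative, and the whitespace-based split is a simplification that assumes no quoted values containing spaces.

```python
def module_params(cmdline, module):
    """Return {option: value} for tokens of the form '<module>.<option>=<value>'."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            # Split on the first '=' only, so values may themselves contain '='.
            key, value = token[len(prefix):].split("=", 1)
            params[key] = value
    return params


if __name__ == "__main__":
    cmdline = (
        "BOOT_IMAGE=/vmlinuz-5.15.34-051534-generic "
        "root=/dev/mapper/Raid6LVM-lvUbuntu ro rootflags=subvol=@ "
        "amdgpu.gpu_recovery=1 amd_iommu=on iommu=pt delayacct "
        "acpi_enforce_resources=lax usbcore.autosuspend=-1 apparmor=0 "
        "amdgpu.dc=1 amdgpu.dpm=1 amdgpu.ppfeaturemask=0xfffd7fff "
        "amdgpu.dcfeaturemask=2 amdgpu.si_support=1 amdgpu.cik_support=1 "
        "radeon.si_support=0"
    )
    for key, value in module_params(cmdline, "amdgpu").items():
        print(key, value)
```

On a running system the same function could be fed the contents of /proc/cmdline instead of a pasted string.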
I could not find the crash in my dmesg logs, and it did not show up in the output of
journalctl -k --since "2 hours ago"
either.
--- Comment #39 from MasterCATZ (mastercatz@hotmail.com) --- H.264 is fine.
Any H.265 video triggers it.
I do not know why my dmesg logs do not contain all the spam from when the GPU resets.
Manuel Jesús de la Fuente (m@nueljl.in) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC | |m@nueljl.in
--- Comment #40 from Manuel Jesús de la Fuente (m@nueljl.in) --- Can still reproduce using the following:
- Ryzen 9 5900XT
- Radeon RX 6700XT

- Linux 5.17.4-1-default (openSUSE Tumbleweed with KDE Plasma)
- Mesa 22.0.2-308.2
May 08 20:18:32 localhost.localdomain kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2371535, emitted seq=2371537 May 08 20:18:32 localhost.localdomain kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 1795 thread kwin_x11:cs0 pid 1801 May 08 20:18:32 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU reset begin! May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) May 08 20:18:33 localhost.localdomain kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) May 08 20:18:33 localhost.localdomain kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed May 08 20:18:33 localhost.localdomain kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx May 08 20:18:33 localhost.localdomain kernel: [drm] free PSP TMR buffer May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: MODE1 reset May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU mode1 reset May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU smu mode1 reset May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU reset succeeded, trying to resume May 08 20:18:34 localhost.localdomain kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000). May 08 20:18:34 localhost.localdomain kernel: [drm] VRAM is lost due to GPU reset! May 08 20:18:34 localhost.localdomain kernel: [drm] PSP is resuming... 
May 08 20:18:34 localhost.localdomain kernel: [drm] reserve 0xa00000 from 0x82fe000000 for PSP TMR May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: RAS: optional ras ta ucode is not available May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is resuming... May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw version = 0x00413500 (65.53.0) May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SMU driver if version not matched May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is resumed successfully! May 08 20:18:34 localhost.localdomain kernel: [drm] DMUB hardware initialized: version=0x0202000C May 08 20:18:34 localhost.localdomain kernel: [drm] kiq ring mec 2 pipe 1 q 0 May 08 20:18:34 localhost.localdomain kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode). May 08 20:18:34 localhost.localdomain kernel: [drm] JPEG decode initialized successfully. 
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1 May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: recover vram bo from shadow start May 08 
20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: recover vram bo from shadow done May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs! May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs! May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU reset(2) succeeded! May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs!
[ ... the previous line, but loads of times ]
May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs!
May 08 20:18:34 localhost.localdomain kernel: amdgpu_cs_ioctl: 46 callbacks suppressed
May 08 20:18:34 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ ... the previous line, but loads of times. These are the '-125!' ones ]
May 08 20:18:44 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 08 20:18:44 localhost.localdomain xembedsniproxy[1862]: Container window visible, stack below
May 08 20:18:44 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
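The negative numbers in these messages are kernel error codes. As a minimal sketch, the snippet below extracts them from the two message forms seen in this log and maps them to their Linux errno names. The table is hard-coded with the generic Linux values (110 is ETIMEDOUT, 125 is ECANCELED) so the lookup does not depend on the platform running the script; the helper name `decode_errors` and the regular expression are illustrative assumptions.

```python
import re

# Linux errno numbers (generic numbering, as used on x86). Hard-coded on
# purpose: other operating systems assign different numbers to these names.
LINUX_ERRNO = {110: "ETIMEDOUT", 125: "ECANCELED"}

# Assumed message forms, taken from the log in this comment:
#   "Failed to initialize parser -125!"
#   "ring kiq_2.1.0 test failed (-110)"
CODE_RE = re.compile(r"(?:Failed to initialize parser |test failed \()-(\d+)")


def decode_errors(log_text):
    """Return a sorted list of (code, name) pairs found in the log text."""
    codes = {int(code) for code in CODE_RE.findall(log_text)}
    return sorted((code, LINUX_ERRNO.get(code, "unknown")) for code in codes)


if __name__ == "__main__":
    sample = (
        "[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) "
        "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!"
    )
    for code, name in decode_errors(sample):
        print(code, name)
```

ECANCELED for the parser failures is consistent with the explanation in comment #1: after the reset the kernel rejects command submissions from contexts that existed before the reset.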
One interesting detail/partial workaround is that underclocking the RAM speed helps reduce it. Setting it to 2400 especifically (native speed of the 32GB of ram is 3600) makes it happen much less often (still does happen though).
Another thing is that it might somehow be related to the GPU's built-in audio conflicting with Intel's snd_hda_intel, which shows up in a few others' logs (and sometimes in mine too). Audio is also choppy until I restart Pulse with pulseaudio -k, which might be the cause of this first freeze with the RAM at 2400. This may be unrelated though, and is just conjecture on my part.
Happy to help debug the issue if anyone can guide me through the process a bit. Will also take a look at reporting this to the Mesa side.
emlodnaor@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |emlodnaor@gmail.com
--- Comment #41 from emlodnaor@gmail.com --- Just wanted to confirm that I also have this problem; however, I'm starting to wonder whether it's a hardware issue.
Typical situation:
When: using remote desktop, VirtualBox, or a browser.
What: the screen freezes but the mouse still moves; then the screen goes black and comes back after a few seconds, still frozen, with the mouse still working. I can move the mouse around and close windows (the screen does not update), or open a terminal and run sudo reboot etc. (nothing shows on screen)...
Why I think it might be hardware: I dual boot Windows, and something similar happens there. However, the desktop manager in Windows does succeed in unfreezing everything, but the windows have totally black content until I drag them around and they are redrawn...
AMD 5950x AMD Radeon 6700XT
So I am considering asking for a new card, but it's random when the fault happens, and sometimes it will work fine for days, so a bit worried that they will look at it quickly and claim it's fine...
Luke A. Guest (laguest@archeia.com) changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |laguest@archeia.com
--- Comment #42 from Luke A. Guest (laguest@archeia.com) --- (In reply to MasterCATZ from comment #38)
> amdgpu : drm:amdgpu_cs_ioctl : Failed to initialize parser -125
I'm getting the same with VLC hanging my machine, R9 390.
# uname -a
Linux rogue 5.18.0-gentoo-x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 26 15:51:54 BST 2022 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
I updated my firmware (taken from git HEAD), and there are no binary differences between the old and the new files; they weren't updated, though others were.
sys-kernel/linux-firmware
     Available versions: 20210518^bstd 20210629^bstd 20210716^bstd 20210818^bstd 20210919^bstd 20211027^bstd 20211216^bstd 20220209^bstd 20220310^bstd 20220411^bstd 20220509^bstd (**)99999999*l^bstd {initramfs +redistributable savedconfig unknown-license}
     Installed versions: 99999999*l^bst(12:08:44 27/05/22)(redistributable -initramfs -savedconfig -unknown-license)
     Homepage: https://git.kernel.org/?p=linux/kernel/git/firmware/linux-firmware.git
     Description: Linux firmware files
I get this on using vlc:
[ 229.233581] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, signaled seq=3, emitted seq=5 [ 229.233720] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0 [ 229.233825] amdgpu 0000:01:00.0: amdgpu: GPU reset begin! [ 233.233843] amdgpu 0000:01:00.0: amdgpu: failed to suspend display audio [ 234.612017] amdgpu: VI should always have 2 performance levels [ 234.719098] amdgpu 0000:01:00.0: amdgpu: BACO reset [ 235.160372] amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume [ 235.160416] [drm] PCIE gen 2 link speeds already enabled [ 235.161162] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000). [ 235.161207] [drm] VRAM is lost due to GPU reset! [ 235.163312] amdgpu 0000:01:00.0: amdgpu: SRBM_SOFT_RESET=0x00100040 [ 235.338304] [drm] UVD initialized successfully. [ 235.459249] [drm] VCE initialized successfully. [ 235.461738] amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow start [ 235.461827] amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow done [ 235.461867] [drm] Skip scheduling IBs! [ 235.461869] [drm] Skip scheduling IBs! [ 235.461890] amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded! [ 235.461926] [drm] Skip scheduling IBs! [ 235.461930] [drm] Skip scheduling IBs! [ 235.461934] [drm] Skip scheduling IBs! [ 235.461937] [drm] Skip scheduling IBs! [ 235.461941] [drm] Skip scheduling IBs! [ 235.461954] [drm] Skip scheduling IBs! [ 235.461958] [drm] Skip scheduling IBs! [ 235.461962] [drm] Skip scheduling IBs! [ 235.461963] [drm] Skip scheduling IBs! [ 235.461965] [drm] Skip scheduling IBs! [ 235.461968] [drm] Skip scheduling IBs! [ 235.461973] [drm] Skip scheduling IBs! [ 235.461975] [drm] Skip scheduling IBs! [ 235.461979] [drm] Skip scheduling IBs! [ 235.461981] [drm] Skip scheduling IBs! [ 235.461983] [drm] Skip scheduling IBs! [ 235.461989] [drm] Skip scheduling IBs! [ 235.461992] [drm] Skip scheduling IBs! [ 235.461994] [drm] Skip scheduling IBs! 
[ 235.461998] [drm] Skip scheduling IBs! [ 235.462003] [drm] Skip scheduling IBs! [ 235.461983] [drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* Invalid UVD handle 0xdca40001! [ 235.462198] amdgpu_cs_ioctl: 131 callbacks suppressed [ 235.462201] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.462236] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.462260] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.462545] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.462569] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.462608] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.464578] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.464793] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.466719] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! [ 235.466957] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! ( REPEATS )
--- Comment #43 from Luke A. Guest (laguest@archeia.com) --- Oh, and I've tested 5.18.0, 5.17.7/6/5 - all the same error/hang.
birbwatcher@protonmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |birbwatcher@protonmail.com
--- Comment #44 from birbwatcher@protonmail.com --- Is anybody still working on this bug?
Same error, running Ubuntu 22.04 with GNOME on kernel 5.18.0-051800-generic. New computer build: Ryzen 5600G on an Asus B550M motherboard with current BIOS. The error happens in both X11 and Wayland. Wayland is used by default; the logs below (newest entries first) are from an X11 session, which I tried to see if it behaved any differently, but the result is the same.
I can readily reproduce the error by loading Cities: Skylines in Steam. I can play for a few minutes before the screen freezes, then goes black momentarily, then comes back with frozen stuttering. Sound continues, mouse can move freely, some clicks even seem responsive (but I can't see what they're doing). Keyboard commands to close app don't resolve the issue.
Relevant systemlog: 07:17:23 kernel: Sending SIGTERM to remaining processes... 07:17:23 kernel: Syncing filesystems and block devices. 07:17:22 kernel: wlp4s0: deauthenticating from MAC:ADDRESS by local choice (Reason: 3=DEAUTH_LEAVING) 07:17:22 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! 07:17:21 kernel: amdgpu_cs_ioctl: 158 callbacks suppressed 07:17:21 kernel: rfkill: input handler enabled 07:17:00 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! 07:17:00 kernel: amdgpu_cs_ioctl: 1845 callbacks suppressed 07:16:55 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! 07:16:55 kernel: amdgpu_cs_ioctl: 2146 callbacks suppressed 07:16:50 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! 07:16:50 kernel: amdgpu_cs_ioctl: 2073 callbacks suppressed 07:16:45 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! 07:16:45 kernel: amdgpu_cs_ioctl: 2114 callbacks suppressed 07:16:40 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125! 07:16:40 kernel: [drm] Skip scheduling IBs! 07:16:40 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(2) succeeded! 07:16:40 kernel: [drm] Skip scheduling IBs! 07:16:40 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow done 07:16:40 kernel: [drm] JPEG decode initialized successfully. 07:16:40 kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully! 
07:16:39 kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR 07:16:39 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume 07:16:39 kernel: </TASK> 07:16:39 kernel: Call Trace: 07:16:39 kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] 07:16:39 kernel: Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS (WI-FI), BIOS 2604 02/25/2022 07:16:39 kernel: CPU: 9 PID: 13975 Comm: kworker/u64:2 Not tainted 5.18.0-051800-generic #202205222030 07:16:39 kernel: [drm] free PSP TMR buffer 07:16:39 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin! 07:16:39 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Cities.x64 pid 14460 thread Cities.x64:cs0 pid 14462 07:16:39 kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out! 07:13:18 kernel: process 'steamapps/common/Cities_Skylines/Cities.x64' started with executable stack
Relevant application log 07:17:22 systemd: Reached target Exit the Session. 07:17:22 gnome-session-c: Couldn't connect to session bus: Error receiving data: Connection reset by peer 07:17:22 systemd: Closed D-Bus User Message Bus Socket. 07:17:22 dbus-update-act: dbus-update-activation-environment: error: unable to connect to D-Bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 07:17:22 systemd: Stopped target Shutdown running GNOME Session. 07:17:22 gnome-session-c: Couldn't connect to session bus: Error sending credentials: Error sending message: Broken pipe 07:17:22 systemd: session.slice: Consumed 17.619s CPU time. 07:17:22 tracker-miner-f: OK 07:17:22 gnome-session-f: Cannot open display: 07:17:22 systemd: Stopped Application launched by gnome-session-binary. 07:17:22 kernel: Error releasing name org.freedesktop.portal.Documents: The connection is closed 07:17:22 systemd: gvfs-daemon.service: Killing process 7772 (gdbus) with signal SIGKILL. 07:17:22 gnome-session-b: gnome-session-binary[4326]: WARNING: Lost name on bus: org.gnome.SessionManager 07:17:22 systemd: Stopped target Main User Target. 07:17:22 gnome-session-b: gnome-session-binary[4326]: WARNING: Could not get session class: No such device or address 07:17:22 systemd: app-org.gnome.Terminal.slice: Consumed 2.641s CPU time. 07:17:22 gvfsd: A connection to the bus can't be made 07:17:22 systemd: Removed slice Slice /app/org.gnome.Terminal. 07:17:22 Xorg: (II) Server terminated successfully (0). Closing log file. 07:17:21 Xorg: (II) systemd-logind: releasing fd for 13:72 07:17:21 systemd: pulseaudio.service: Consumed 7.577s CPU time. 07:17:21 pulseaudio: After module unload, module 'module-null-sink' was still loaded! 
07:17:21 steamerrorrepor: Uploading dump (out-of-process) /tmp/dumps/assert_20220530071721_10.dmp 07:17:21 gameoverlayui: Installing breakpad exception handler for appid(gameoverlayui)/version(1.0) 07:17:21 steamerrorrepor: Uploading dump (out-of-process) /tmp/dumps/assert_20220530071721_7.dmp 07:17:21 gameoverlayui: Installing breakpad exception handler for appid(gameoverlayui)/version(1.0) 07:17:21 gnome-session-b: GnomeDesktop-WARNING: Failed to acquire idle monitor proxy: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying 07:17:21 systemd: Stopped target GNOME file sharing target. 07:17:21 kernel: [31mFATA[0m[May 30 07:17:21.369] Failed to launch [31merror[0m="exit status 1" 07:17:21 kernel: Exiting due to channel error. 07:17:21 at-spi2-registr: X connection to :0 broken (explicit kill or server shutdown). 07:17:21 systemd: Stopped target GNOME Session. 07:17:21 steam: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. 07:17:21 Xorg: (II) systemd-logind: releasing fd for 13:66 07:17:21 gnome-session-b: gnome-session-binary[4326]: GnomeDesktop-WARNING: Failed to acquire idle monitor proxy: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying 07:17:21 systemd: Stopped target GNOME X11 Session (session: ubuntu). 07:17:21 Xorg: amdgpu: The CS has been cancelled because the context is lost. 07:17:01 gnome-shell: amdgpu: The CS has been cancelled because the context is lost. 07:17:00 gnome-shell: amdgpu: The CS has been cancelled because the context is lost. 07:16:33 Xorg: (EE) event4 - Nordic 2.4G Wireless Receiver Mouse: client bug: event processing lagging behind by 33ms, your system is too slow 07:13:55 systemd: app-gnome-telegramdesktop-4541.scope: Consumed 12.705s CPU time. 
07:13:37 steam: src/steamexe/main.cpp (253) : Assertion Failed: reaping pid: 14431 -- gameoverlayui 07:13:28 steamerrorrepor: file ''/tmp/dumps/assert_20220530071327_50.dmp'', upload yes: ''CrashID=bp-bce47298-6cd4-49aa-b884-774e02220530'' 07:13:27 steam: Installing breakpad exception handler for appid(steam)/version(1653101165) 07:13:24 gameoverlayui: g_object_unref: assertion 'G_IS_OBJECT (object)' failed 07:13:19 gnome-shell: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x561bb25806c0] is on because it needs an allocation. 07:13:19 steam: Game process updated : AppID 255710 "/home/username/.steam/debian-installation/ubuntu12_32/reaper SteamLaunch AppId=255710 -- '/home/username/.steam/debian-installation/steamapps/common/Cities_Skylines/dowser'", ProcID 14460, IP 0.0.0.0:0 07:13:18 Xorg: (--) AMDGPU(0): HDMI max TMDS frequency 170000KHz 07:13:17 gameoverlayui: Installing breakpad exception handler for appid(gameoverlayui)/version(1.0) 07:13:12 steam: Game process updated : AppID 255710 "/home/username/.steam/debian-installation/ubuntu12_32/reaper SteamLaunch AppId=255710 -- '/home/username/.steam/debian-installation/steamapps/common/Cities_Skylines/dowser'", ProcID 14282, IP 0.0.0.0:0 07:13:12 gameoverlayui: Installing breakpad exception handler for appid(gameoverlayui)/version(1.0) 07:13:11 steam: Trying to remove a child that doesn't believe we're it's parent. 07:13:11 dowser: ERROR: ld.so: object '/home/username/.steam/debian-installation/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored. 07:13:11 steam: GameAction [AppID 255710, ActionID 1] : LaunchApp changed task to Completed with "" 07:13:11 reaper: ERROR: ld.so: object '/home/username/.steam/debian-installation/ubuntu12_64/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored. 
07:13:11 steam: GameAction [AppID 255710, ActionID 1] : LaunchApp changed task to WaitingGameWindow with "" 07:13:11 sh: ERROR: ld.so: object '/home/username/.steam/debian-installation/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored. 07:13:11 steam: Game process added : AppID 255710 "/home/username/.steam/debian-installation/ubuntu12_32/reaper SteamLaunch AppId=255710 -- '/home/username/.steam/debian-installation/steamapps/common/Cities_Skylines/dowser'", ProcID 14256, IP 0.0.0.0:0
--- Comment #45 from Alex Deucher (alexdeucher@gmail.com) --- The "Failed to initialize parser -125!" error message is a generic symptom of a GPU hang and reset. The actual cause of the GPU hang is very likely different for everyone. The issue is most likely in Mesa (which handles the user mode side of graphics and video acceleration). An improperly set up command buffer from the user mode driver could cause a GPU hang. In that case the kernel is just the messenger. I would suggest trying a newer or older Mesa release to see if you can narrow down the issue. If there is a specific application that causes the issue consistently, I would suggest opening a Mesa bug report (https://gitlab.freedesktop.org/groups/mesa/-/issues?sort=updated_desc&st...).
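Since the -125 message is only a symptom (on Linux, errno 125 is ECANCELED, which fits the "The CS has been cancelled because the context is lost" lines quoted earlier in this thread), the more useful entry in a log is usually the ring timeout that precedes the reset. Below is a minimal sketch in Python of pulling that out of a saved kernel log. It assumes the message wording seen in the excerpts in this thread, which other kernel versions may not share; the function name summarize and the SAMPLE list are illustrative only and not part of any amdgpu tooling.

```python
import re
from collections import Counter

# Patterns follow the log excerpts quoted in this thread (an assumption:
# other kernel versions may word these messages differently).
TIMEOUT_RE = re.compile(r"amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* ring (\S+) timeout")
RESET_OK_RE = re.compile(r"GPU reset\(\d+\) succeeded")
PARSER_RE = re.compile(r"Failed to initialize parser (-\d+)")


def summarize(lines):
    """Collect timed-out rings, completed resets, and parser errors by code."""
    rings = []
    resets = 0
    parser_errors = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            rings.append(m.group(1))
        if RESET_OK_RE.search(line):
            resets += 1
        m = PARSER_RE.search(line)
        if m:
            parser_errors[int(m.group(1))] += 1
    return {
        "timed_out_rings": rings,
        "resets": resets,
        "parser_errors": dict(parser_errors),
    }


# A few lines copied from comment #42, used only as sample input.
SAMPLE = [
    "[ 229.233581] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, signaled seq=3, emitted seq=5",
    "[ 235.461890] amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!",
    "[ 235.462201] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!",
    "[ 235.462236] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The input lines could come from a file saved with dmesg or journalctl -k. Which ring timed out (for example uvd versus gfx) may hint at whether the video decode or the 3D side of the userspace stack is the better place to file the report.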
--- Comment #46 from Luke A. Guest (laguest@archeia.com) --- Can confirm: in my case, emerge -av @mesa (where @mesa is libdrm, mesa and mesa-tools from git HEAD) fixes it.
--- Comment #47 from Ryzen Buntu (birbwatcher@protonmail.com) --- I updated Mesa using the kisak-mesa PPA and didn't notice any change. But after disabling AMP/DOCP (which had been running my 3600 RAM at 3600 MHz), so that the auto setting in my BIOS set it to 2133 MHz, I can play Skylines without any issue at all.