https://bugs.freedesktop.org/show_bug.cgi?id=109819
Bug ID: 109819
Summary: Shadow of Mordor causes gpu freeze ryzen 2200g
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: dominic.letz@berlin.de
Created attachment 143516
  --> https://bugs.freedesktop.org/attachment.cgi?id=143516&action=edit
dumps from dmesg, glxinfo and xorg
With kernel 4.20.13 (on Ubuntu 18.04.2), the game Shadow of Mordor, installed from Steam, freezes the screen after 5 to 120 minutes. SSHing into the machine still works.
I'm attaching glxinfo, Xorg.log, and dmesg log from the crash for reference.
By the way, I've added "drm.debug=0x1e log_buf_len=1M" to the kernel command line in GRUB, but so far I haven't been able to catch anything being written to /sys/class/drm/card0/error.
Let me know if there is anything I can do to help debugging.
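One way to confirm that the parameters added in GRUB actually reached the running kernel is to look for them in /proc/cmdline. The helper below is only a sketch: the name cmdline_has is made up for illustration, and the optional second argument exists solely so the check can be tried against a sample string.

```shell
#!/bin/sh
# cmdline_has PARAM [CMDLINE]
# Succeeds if PARAM (for example "drm.debug") appears as a key on the
# kernel command line. CMDLINE defaults to the contents of /proc/cmdline.
# The function name is hypothetical, not part of any existing tool.
cmdline_has() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Usage on the affected machine:
#   cmdline_has drm.debug && echo "drm.debug is set"
#   cmdline_has log_buf_len && echo "log_buf_len is set"
```

Since drm.debug is a bitmask that turns on verbose DRM debug messages, a positive result here would also be a plausible explanation for a very noisy dmesg.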
--- Comment #1 from Dominic dominic.letz@berlin.de ---
Following the Linux kernel 5.0 release, here is an updated report with that kernel and the current head of git://anongit.freedesktop.org/mesa/drm.
With Linux kernel 5.0 the dmesg log is full of amdgpu spam that seems to repeat itself all the time, independent of what the machine is doing. I'm not sure whether it's related to the debug line added in GRUB.
The freeze still seems to happen the same way, but the error message in dmesg has changed and now shows only the following lines at the freeze point:
[ 2501.329358] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=300047, emitted seq=300049
[ 2501.329419] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ShadowOfMordor pid 3150 thread ShadowOfMo:cs0 pid 3152
[ 2501.329421] [drm] GPU recovery disabled.
The full dmesg log and glxinfo output are in the newly attached dumps_from_dmesg_and_glxinfo_2.
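For anyone sifting through a long, noisy dmesg log, the ring timeout lines can be pulled out with a small filter. This is a sketch that assumes the message format quoted above ("ring NAME timeout, signaled seq=N, emitted seq=M"). The function name ring_timeouts and the "pending" field are made up for illustration, and reading the difference between the two sequence numbers as the number of submitted but unfinished jobs is an assumption.

```shell
#!/bin/sh
# ring_timeouts: read kernel log text on stdin and print one summary line
# per amdgpu ring timeout message. Assumes the message format
# "ring NAME timeout, signaled seq=N, emitted seq=M" seen in this report.
ring_timeouts() {
    sed -n 's/.*ring \([a-z0-9_.]*\) timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2 \3/p' |
    while read -r ring signaled emitted; do
        # "pending" is simply emitted minus signaled.
        echo "$ring signaled=$signaled emitted=$emitted pending=$((emitted - signaled))"
    done
}

# Usage on the affected machine:
#   dmesg | ring_timeouts
```

Lines that do not match the assumed format are silently skipped, so the filter can be run over the whole log.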
--- Comment #2 from Dominic dominic.letz@berlin.de ---
Created attachment 143522
  --> https://bugs.freedesktop.org/attachment.cgi?id=143522&action=edit
New 5.0 Kernel Crashlog
--- Comment #3 from Dominic dominic.letz@berlin.de ---
I've created an apitrace and can reproduce the issue every time by replaying it with "apitrace replay ShadowOfMordor.trace". It's quite big (10 GB as an xz-compressed file), but here it is: https://letz.tw/ShadowOfMordor.trace.xz
Dominic dominic.letz@berlin.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Shadow of Mordor causes gpu |[APITRACE] Shadow of Mordor
                   |freeze ryzen 2200g          |causes gpu freeze ryzen
                   |                            |2200g
--- Comment #4 from Dominic dominic.letz@berlin.de ---
Created attachment 143579
  --> https://bugs.freedesktop.org/attachment.cgi?id=143579&action=edit
Photo of apitrace replay after freeze
I've added a photo of a verbose apitrace replay to show the last calls printed before the freeze. The last visible call is 14838798.
Dominic dominic.letz@berlin.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |https://letz.tw/ShadowOfMor
                   |                            |dor.trace.xz
--- Comment #5 from Dominic dominic.letz@berlin.de ---
Fun fact: by binary searching the apitrace, replaying up to different calls, I was able to identify that my GPU hangs every time on this call:
14840194 @5 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 60, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
I can safely replay up to the previous call, 14840193, but replaying up to 14840194 freezes every time.
Checking the OpenGL docs, I'm not sure whether indices and basevertex are allowed to be NULL/0. Could that be an issue?
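The binary search described above can be sketched as a small shell function. It is only an illustration of the technique, not the procedure actually used in this report: first_bad is a made-up name, and the predicate passed to it (replay_survives in the usage comment) is hypothetical. It would have to replay the trace up to the given call and report whether the GPU survived, which in practice is a manual step because a hang requires a reboot. The sketch also assumes that once a stopping point hangs, every later stopping point hangs too.

```shell
#!/bin/sh
# first_bad LO HI PREDICATE
# Prints the smallest call number n in (LO, HI] for which "PREDICATE n"
# fails. Assumes PREDICATE succeeds at LO and fails at HI, and that a
# failure at one call implies a failure at every later call.
# The body runs in a subshell so its variables stay local.
first_bad() (
    lo=$1
    hi=$2
    pred=$3
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$((lo + (hi - lo) / 2))
        if "$pred" "$mid"; then
            lo=$mid    # replay up to mid survived, look later
        else
            hi=$mid    # replay up to mid hung, look earlier
        fi
    done
    echo "$hi"
)

# Usage, with a hypothetical predicate that replays up to call $1:
#   first_bad 0 20000000 replay_survives
```

Each step halves the remaining range, so a trace with tens of millions of calls needs roughly 25 replays.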
--- Comment #6 from Dominic dominic.letz@berlin.de ---
Created attachment 143612
  --> https://bugs.freedesktop.org/attachment.cgi?id=143612&action=edit
Vertex & Fragment Shader per apitrace just before the crash
I've done some more software updates:
- Kernel 5.0.1
- Mesa 1.8.4
But the crash still happens at the very same OpenGL call.
So the last mentioned glDrawElementsBaseVertex() call is definitely the point of the crash but the damage that makes the gpu freeze seems to have been created by earlier calls. I found from more testing that the previous glUseProgram() seems to be required to trigger the crash. So I've attached the vertex & fragment shader as shown in apitrace.
--- Comment #7 from Dominic dominic.letz@berlin.de ---
Created attachment 143613
  --> https://bugs.freedesktop.org/attachment.cgi?id=143613&action=edit
UMR dump
Additionally, I saw UMR being used in another bug (https://bugs.freedesktop.org/show_bug.cgi?id=102322), so I've attached the output of:
sudo umr -O verbose -R gfx[.] &> umr-verbose-mar11.txt
--- Comment #8 from Dominic dominic.letz@berlin.de ---
Created attachment 143614
  --> https://bugs.freedesktop.org/attachment.cgi?id=143614&action=edit
sudo umr -lb sudo umr -R gfx[.] sudo umr -R sdma0[.] sudo umr -R sdma1[.]
I've also attached the output of the following script, run.sh:

#!/bin/bash
set -x
sudo umr -lb
sudo umr -R gfx[.]
sudo umr -R sdma0[.]
sudo umr -R sdma1[.]

It was run as:

./run.sh &> umr-mar11.txt
--- Comment #9 from Dominic dominic.letz@berlin.de ---
Created attachment 143615
  --> https://bugs.freedesktop.org/attachment.cgi?id=143615&action=edit
Attached screenshot of mentioned apitrace line 14840194 (last line)
--- Comment #10 from Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com ---
I could replay the trace 3 times without getting a GPU hang, using a recent kernel and Mesa master.
Can you still reproduce the problem?
--- Comment #11 from Dominic dominic.letz@berlin.de ---
I'm travelling right now, but I can check once I'm home again.
Martin Peres martin.peres@free.fr changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED
--- Comment #12 from Martin Peres martin.peres@free.fr ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/712.