https://bugs.freedesktop.org/show_bug.cgi?id=90481
Bug ID: 90481
Summary: Radeon R9 270X gpu lockup in game spec ops: the line.
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: vim@xmail.net
Created attachment 115834 --> https://bugs.freedesktop.org/attachment.cgi?id=115834&action=edit kernel log with drm.debug=1
Playing Spec Ops: The Line from Steam causes a GPU lockup.
The behaviour varies. First the system freezes, and after a few seconds the screen goes black. Sometimes only SysRq works; sometimes I can switch to a VT and log in (the screen stays black, but sudo reboot works). Sometimes it unfreezes and everything works again.
System: Fedora 22 x86_64
kernel: 4.0.2-300.fc22.x86_64
mesa: 10.5.4-1.20150505.fc22
xorg-server: 1.17.1-11.fc22
libdrm: 2.4.61-3.fc22
xorg-x11-drv-ati.x86_64: 7.5.0-3.fc22
window manager: kwin-5.3.0-2.fc22
video card: Radeon R9 270X
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD PITCAIRN
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.5.4
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 10.5.4
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 10.5.4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
Always reproducible, after 15-40 minutes of playing.
Ivan Viktorov vim@xmail.net changed:
What       |Removed      |Added
----------------------------------------------------------------------------
Severity   |normal       |major
--- Comment #1 from Ivan Viktorov vim@xmail.net --- Same situation with kernel 4.1.0-rc3.
--- Comment #2 from Ivan Viktorov vim@xmail.net --- Created attachment 115859 --> https://bugs.freedesktop.org/attachment.cgi?id=115859&action=edit kernel log 4.1-rc3
Michel Dänzer michel@daenzer.net changed:
What       |Removed      |Added
----------------------------------------------------------------------------
Component  |DRM/Radeon   |Drivers/Gallium/radeonsi
Version    |unspecified  |10.5
Product    |DRI          |Mesa
QA Contact |             |dri-devel@lists.freedesktop.org
--- Comment #3 from Ivan Viktorov vim@xmail.net --- The issue is also present with mesa 10.6.0-0.devel.6.5a55f68.fc23 and llvm 3.6.0-1.fc23.
xsellier@gmail.com changed:
What       |Removed                          |Added
----------------------------------------------------------------------------
Assignee   |dri-devel@lists.freedesktop.org  |xsellier@gmail.com
--- Comment #4 from xsellier@gmail.com --- Created attachment 121426 --> https://bugs.freedesktop.org/attachment.cgi?id=121426&action=edit kern.log 4.4.0-amd64
Xavier Sellier xsellier@gmail.com changed:
What       |Removed             |Added
----------------------------------------------------------------------------
Assignee   |xsellier@gmail.com  |dri-devel@lists.freedesktop.org
Aaron Paden aaronbpaden@gmail.com changed:
What       |Removed      |Added
----------------------------------------------------------------------------
Version    |11.0         |11.1
--- Comment #8 from Aaron Paden aaronbpaden@gmail.com --- Still an issue with Mesa 11.1.2 and Linux 4.5-rc3.
--- Comment #9 from Daniel Scharrer daniel@constexpr.org --- I'm also seeing frequent lockups with VI using git Mesa and LLVM (X unresponsive, radeontop showing everything at 100%). Nothing in dmesg, but that's probably just because (afaik) gpu reset is not implemented for amdgpu in 4.5.
GPU: R9 380X (tonga) Mesa 11.3.0-devel (git-715e97e) LLVM r265649
--- Comment #10 from Bas Nieuwenhuizen bas@basnieuwenhuizen.nl --- I tried the attached apitrace, but the segfault most likely occurs because the apitrace was trimmed: there is no index buffer bound in one of the glDrawRangeElements calls, so the offset is interpreted as a pointer and the call segfaults.
It is *very* unlikely that this is the bug from the original report, as it should have no side effects besides a segfault of the game.
Could you try to create an untrimmed apitrace which reproduces the issue and upload it somewhere?
--- Comment #11 from Daniel Scharrer daniel@constexpr.org --- Created attachment 124508 --> https://bugs.freedesktop.org/attachment.cgi?id=124508&action=edit GALLIUM_DDEBUG="800 noflush" dump
I tried to record an apitrace but could not get any lockups while recording or glretracing the traces. I also was not able to get a hang while using GALLIUM_DDEBUG="800" without noflush. Maybe the hang is framerate related, or at least much less likely to occur at really low framerates?
However, I did reproduce a hang while using GALLIUM_DDEBUG="800 noflush". Attached is the ddebug dump, not sure if it will be of any use.
Mesa 12.1.0-devel (git-a048047) LLVM r272544
--- Comment #12 from Nicolai Hähnle nhaehnle@gmail.com --- Based on the reported GRBM_STATUS registers, the hang is probably somewhere in the pixel pipe, since VGT_BUSY and PA_BUSY = 0. But it's difficult to say more. Framerate sensitivity is definitely possible.
--- Comment #13 from Daniel Scharrer daniel@constexpr.org --- Is there anything else I could try to help track this down?
I tried running the game with R600_DEBUG=nodma and while that seemed to fix the issue at first, I still got a lockup after a couple of runs. Perhaps nodma just made the lockup less likely by slowing things down (although perf was not *that* different).
Btw, jaycee1980 in #radeon seemed open to providing AMD Mesa devs keys to Virtual Programming games, so you could try to reproduce this on your end as well.
--- Comment #14 from at46n@t-online.de --- I'm also affected by this bug with my R7 260X on Ubuntu 16.04. If I'm able to switch to a tty, my system gives me an endless stream of messages saying "radeon 000:01:00.0: ring 0 stalled for more than 10004msec". Last time I also got:
[drm:ci_dpm_enable [radeon]] *ERROR* ci_start_dpm failed
[drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
[drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
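A minimal sketch, assuming the stall messages keep the format quoted in this comment, of how a saved kernel log could be scanned for stalled rings. The name find_ring_stalls is hypothetical and not part of any radeon tooling; the sample line is the one quoted above.

```python
import re

# Matches the stall message as quoted in this report, e.g.
# "ring 0 stalled for more than 10004msec".
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")


def find_ring_stalls(log_text):
    """Return a list of (ring_index, milliseconds) for each stall message."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in STALL_RE.finditer(log_text)]


sample = "radeon 000:01:00.0: ring 0 stalled for more than 10004msec"
print(find_ring_stalls(sample))
```

Feeding it the contents of a kern.log or saved dmesg output would show which ring stalled and for how long, which may help when comparing logs between reporters.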
--- Comment #15 from Marek Olšák maraeo@gmail.com --- You can try to test with:
GALLIUM_DDEBUG="pipelined 10000"
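For reference, a minimal Python sketch of launching a program with this variable set. The helper names debug_env and run_with_debug are hypothetical, and the command to run is left to the caller; only the GALLIUM_DDEBUG value itself comes from this comment.

```python
import os
import subprocess


def debug_env(ddebug="pipelined 10000", base=None):
    """Return a copy of the environment with GALLIUM_DDEBUG set.

    `base` defaults to the current process environment; it is a
    parameter only so the helper can be checked in isolation.
    """
    env = dict(os.environ if base is None else base)
    env["GALLIUM_DDEBUG"] = ddebug
    return env


def run_with_debug(command, ddebug="pipelined 10000"):
    """Run `command` (a list of arguments) with GALLIUM_DDEBUG set.

    Returns the exit status of the child process.
    """
    return subprocess.call(command, env=debug_env(ddebug))


# Show the value that would be passed to the child process.
print(debug_env(base={})["GALLIUM_DDEBUG"])
```

From a shell, the same effect is obtained by prefixing the command with the variable assignment.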
--- Comment #16 from Daniel Scharrer daniel@constexpr.org --- Unfortunately I am not able to get a lockup when using GALLIUM_DDEBUG="pipelined 10000" - it seems the perf impact is still too big on my PC. I also checked that it still hangs without GALLIUM_DDEBUG.
Kernel: 4.7.0-gentoo Mesa: git-6fb6201 LLVM: r277571
--- Comment #17 from Marek Olšák maraeo@gmail.com --- Does this fix it?
https://cgit.freedesktop.org/mesa/mesa/commit/?id=947e0614d091c260651e4f3d62...
In other words, does mesa/master work?
--- Comment #18 from Aaron Paden aaronbpaden@gmail.com --- Crashed for me again after about 30 minutes of play using the latest mesa-git and Linux 4.8rc1
--- Comment #19 from Daniel Scharrer daniel@constexpr.org --- I also still get hangs, but they seem to be less frequent than they used to be. However, the framerate seems to be a bit lower compared to the last time I tested - ~70 vs. (iirc) 80+ FPS in the menu - so maybe it's just that. Both the game and glamor were using the updated Mesa version.
Curiously, the first freeze I got when testing didn't look like a GPU lockup but rather a (partial) X server lockup: all blocks were at 0% in radeontop and I was able to switch to a different VT using Ctrl+Alt+F1, and while switching back to X blocked further VT switches I was able to restart the X server normally (the log indicated a clean shutdown) and everything including OpenGL seemed to work fine after that.
The other freeze I got was a proper GPU lockup though - Event Engine and Texture Addresser at 0%, everything else at 100%, unable to switch VTs even with chvt over ssh.
Kernel: 4.7.0-gentoo Mesa: git-50b49d2 LLVM: r278309
Daniel Scharrer daniel@constexpr.org changed:
What                           |Removed      |Added
----------------------------------------------------------------------------
Attachment #124508 is obsolete |0            |1
--- Comment #20 from Daniel Scharrer daniel@constexpr.org --- Created attachment 125752 --> https://bugs.freedesktop.org/attachment.cgi?id=125752&action=edit GALLIUM_DDEBUG="pipelined 10000" dump
I played more of the game with GALLIUM_DDEBUG="pipelined 10000" and was able to eventually catch a lockup. Fewer blocks busy this time.
--- Comment #21 from Daniel Scharrer daniel@constexpr.org --- The game also segfaulted a few times while playing - still need to get a backtrace of that.
--- Comment #22 from Daniel Scharrer daniel@constexpr.org --- Created attachment 125754 --> https://bugs.freedesktop.org/attachment.cgi?id=125754&action=edit Another GALLIUM_DDEBUG="pipelined 10000" dump
--- Comment #23 from Daniel Scharrer daniel@constexpr.org --- Created attachment 125765 --> https://bugs.freedesktop.org/attachment.cgi?id=125765&action=edit Crash information
One segfault I observed was due to sctx->b.dma.cs->current.buf being NULL in cik_sdma.c:377 (the first radeon_emit call in that block). Attached is the full stack trace and some additional info.
Another crash didn't have any Mesa stack frames. Not sure what's going on there.
I played a bit using amdgpu-pro 16.30.3.306809 (on top of the upstream 4.7.0 amdgpu kernel module), and there were no crashes or lockups. Also, the game runs noticeably faster on the blob :(
--- Comment #24 from Marek Olšák maraeo@gmail.com --- Does it hang with R600_DEBUG=nohyperz ?
--- Comment #25 from Daniel Scharrer daniel@constexpr.org --- I still get lockups with R600_DEBUG=nohyperz.
--- Comment #26 from theamazingjanet@googlemail.com --- Tested with recent mesa git, Ubuntu 16.04 Padoka PPA, kernel 4.8.11, no gpu lockup with ~2 hour play. Probably fixed by same commit that fixed Arkham Origins in Wine and XCOM:EU.
--- Comment #27 from Samuel Pitoiset samuel.pitoiset@gmail.com --- (In reply to Ryan Williams from comment #26)
> Tested with recent mesa git, Ubuntu 16.04 Padoka PPA, kernel 4.8.11, no gpu lockup with ~2 hour play. Probably fixed by same commit that fixed Arkham Origins in Wine and XCOM:EU.
If you have a VI+ card and this commit e490b7812cae778c61004971d86dc8299b6cd240 in your build, that would make sense. But the original ticket is for SI. Should probably be closed because mesa 10.5 is very old though.
Timothy Arceri t_arceri@yahoo.com.au changed:
What       |Removed      |Added
----------------------------------------------------------------------------
Status     |NEW          |RESOLVED
Resolution |---          |FIXED
--- Comment #28 from Timothy Arceri t_arceri@yahoo.com.au --- As per the previous comment, let's close this and file a new bug if this is still an issue.