https://bugs.freedesktop.org/show_bug.cgi?id=96731
Bug ID: 96731
Summary: [RADEONSI] [LLVM] [bisected] GPU lockups when running Alien: Isolation
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: arek.rusi@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 124784 --> https://bugs.freedesktop.org/attachment.cgi?id=124784&action=edit gpu lockups part from dmesg
Hi, the GPU tries to reset a few times but hangs in the end.
System: Arch Linux 64-bit, Radeon HD 7770, Mesa latest from git, kernel 4.7-rc, libdrm latest from git.
The first bad commit is:
r273467 | arsenm | 2016-06-22 22:15:28 +0200 |
AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking.
Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return.
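The core of the commit (no terminators in the middle of a block; split the block and keep successors/predecessors up to date) can be illustrated with a toy model. This is a minimal sketch, not LLVM's SILowerControlFlow code: the `Block` class, the `TERMINATORS` set and the helper names are hypothetical stand-ins for MachineBasicBlock/MachineInstr, and the new exec-masking pseudo instructions are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical: in this toy only these branch mnemonics count as terminators.
TERMINATORS = ("s_branch", "s_cbranch_execz", "s_cbranch_execnz")


@dataclass
class Block:
    """Toy basic block: a name, instruction strings and CFG edges by name."""
    name: str
    instrs: List[str]
    succs: List[str] = field(default_factory=list)
    preds: List[str] = field(default_factory=list)


def is_terminator(instr: str) -> bool:
    return instr.split()[0] in TERMINATORS


def branch_target(instr: str) -> Optional[str]:
    """Label operand of a terminator such as 's_cbranch_execz BB2', if any."""
    parts = instr.split()
    return parts[1] if is_terminator(instr) and len(parts) > 1 else None


def split_at_terminators(block: Block) -> List[Block]:
    """Split `block` so terminators only appear as a trailing run of each piece,
    and record the successor/predecessor edges between the new pieces."""
    pieces: List[List[str]] = [[]]
    for i, instr in enumerate(block.instrs):
        pieces[-1].append(instr)
        has_next = i + 1 < len(block.instrs)
        # A terminator followed by a non-terminator must end the current block.
        if is_terminator(instr) and has_next and not is_terminator(block.instrs[i + 1]):
            pieces.append([])

    blocks = [Block(block.name if k == 0 else f"{block.name}.split{k}", p)
              for k, p in enumerate(pieces)]
    blocks[0].preds = list(block.preds)
    for k, b in enumerate(blocks):
        for instr in b.instrs:
            target = branch_target(instr)
            if target is not None and target not in b.succs:
                b.succs.append(target)  # taken-branch edge
        if k + 1 < len(blocks):
            b.succs.append(blocks[k + 1].name)  # fall-through edge to the next piece
            blocks[k + 1].preds.append(b.name)
    for s in block.succs:  # successors already recorded on the original block
        if s not in blocks[-1].succs:
            blocks[-1].succs.append(s)
    return blocks


bb = Block("BB0", ["v_cmp_gt_f32 vcc, 0, v0", "s_cbranch_execz BB2",
                   "v_mov_b32 v1, 1.0", "s_branch BB3"], succs=["BB3"])
for piece in split_at_terminators(bb):
    print(piece.name, piece.succs, piece.preds)
# BB0 ['BB2', 'BB0.split1'] []
# BB0.split1 ['BB3'] ['BB0']
```

In the unsplit block, the `s_cbranch_execz BB2` edge is exactly the kind of successor the verifier would find missing; after splitting, every branch sits at the end of its own block and the edge is recorded.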
Michel Dänzer michel@daenzer.net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC | |arsenm2@gmail.com
--- Comment #1 from Michel Dänzer michel@daenzer.net --- Matt, any ideas offhand?
Arek, can you attach the stderr output from running the game with the environment variable
R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes
with and without the commit in question?
--- Comment #2 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 124796 --> https://bugs.freedesktop.org/attachment.cgi?id=124796&action=edit R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes ./AlienIsolation for r273466
--- Comment #3 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 124797 --> https://bugs.freedesktop.org/attachment.cgi?id=124797&action=edit R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes ./AlienIsolation for r273467
--- Comment #4 from Arek Ruśniak arek.rusi@gmail.com --- I didn't mention it before, but the intro, loading screen and main menu work. The game hangs right after everything is loaded.
--- Comment #5 from Matt Arsenault arsenm2@gmail.com --- r274275 fixes a problem I noticed while doing more work on this, although I wouldn't expect it to change much.
--- Comment #6 from Matt Arsenault arsenm2@gmail.com --- The only obvious difference I see in the dump diffs, without looking at any particular shader, is that the number of used registers changed. This is probably because previously the implicit uses of the super-registers were missing when the AsmPrinter counted them. If the dynamic index was out of bounds, it was more likely to also be out of bounds of the allocated VGPRs, in which case the hardware behavior is to return v0. If there are out-of-bounds accesses, they would now read an undefined register. I don't know if there are any actual out-of-bounds dynamic vector accesses.
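The out-of-bounds argument can be made concrete with a toy model. This is an illustration only, not a model of the real hardware: the register counts, the vector placement and the function name are hypothetical, and the model encodes just the two behaviours stated above (v0 outside the allocation, an unrelated register inside it).

```python
from typing import List


def read_vgpr_relative(vgprs: List[int], allocated: int, base: int, index: int) -> int:
    """Toy model of a dynamically indexed read of v[base + index].

    Outside the VGPRs allocated to the shader the read returns v0; inside the
    allocation it returns whatever that register holds, which may be unrelated.
    """
    if base < 0 or index < 0:
        raise ValueError("this toy model only handles non-negative indices")
    reg = base + index
    if reg >= allocated:
        return vgprs[0]  # beyond the allocation: v0 is returned
    return vgprs[reg]  # inside the allocation: an arbitrary ("undefined") register


# Hypothetical setup: registers v0..v15 hold 100..115, a 4-element vector
# lives in v4..v7, and the shader indexes it with an out-of-bounds index 5 (v9).
regs = list(range(100, 116))
print(read_vgpr_relative(regs, allocated=8, base=4, index=5))   # 100 (v0)
print(read_vgpr_relative(regs, allocated=12, base=4, index=5))  # 109 (v9)
```

The same bad index returns v0 with the smaller (under-counted) allocation, but an unrelated live register once the allocation grows, which is why a change in the register count alone could change behaviour.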
--- Comment #7 from Nicolai Hähnle nhaehnle@gmail.com --- There is nothing obviously wrong with the last shader(s) in the bad log - and unfortunately, the logs are not really comparable: the first genuine difference is in TGSI, which means that a different sequence of OpenGL calls happened in the two runs. This makes it basically impossible to figure out the problem.
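Finding the first genuine difference between the two logs, as described here, is easy to automate. A minimal sketch, assuming both stderr logs were saved to plain-text files; the function names and file names are hypothetical.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple


def first_difference(old_lines: List[str],
                     new_lines: List[str]) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (1-based line number, old line, new line) of the first mismatch.

    A missing line (one log is shorter) is reported as None; the function
    returns None if the logs are identical.
    """
    for lineno, (old, new) in enumerate(zip_longest(old_lines, new_lines), start=1):
        if old != new:
            return lineno, old, new
    return None


def first_difference_in_files(old_path: str, new_path: str):
    """Compare two saved stderr logs line by line."""
    with open(old_path, encoding="utf-8", errors="replace") as f_old, \
         open(new_path, encoding="utf-8", errors="replace") as f_new:
        return first_difference(f_old.read().splitlines(), f_new.read().splitlines())
```

For example, `first_difference_in_files("r273466.log", "r273467.log")`. If the reported line is TGSI rather than disassembly, the two runs issued different GL calls and the later shader diffs are not comparable.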
To make progress on this bug, could you please record an apitrace of the game, and see if you can reproduce the lockups by playing back the trace? If this works, please provide
1. the trace file itself (e.g. uploaded to Google Drive)
2. before and after logs of playing back the trace, like Michel asked for.
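The before/after logs can be captured the same way for each LLVM build. A minimal sketch, assuming the trace has already been recorded and that `apitrace` is on the PATH; switching between the r273466 and r273467 builds is not shown, and the helper and log names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Mapping

# Same shader-dump stages Michel asked for in comment #1.
SHADER_DEBUG = "fs,vs,gs,ps,cs,tcs,tes"


def replay_command(trace_path: str) -> List[str]:
    return ["apitrace", "replay", trace_path]


def replay_env(base_env: Mapping[str, str]) -> Dict[str, str]:
    env = dict(base_env)
    env["R600_DEBUG"] = SHADER_DEBUG  # radeonsi writes the shader dumps to stderr
    return env


def capture_replay_log(trace_path: str, log_path: str) -> int:
    """Replay `trace_path` once and write its stderr (the shader dumps) to `log_path`."""
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            replay_command(trace_path),
            env=replay_env(os.environ),
            stdout=subprocess.DEVNULL,
            stderr=log,
            check=False,
        )
    return result.returncode
```

Run `capture_replay_log("AlienIsolation.1.trace", "replay-r273466.log")` with the older LLVM, then repeat into a second file with the newer one. Because the trace replays the same GL calls each time, the two logs should line up, unlike the logs from playing the game.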
--- Comment #8 from Arek Ruśniak arek.rusi@gmail.com --- Hi guys, replaying the trace causes a GPU lockup as well. The apitrace is here: https://drive.google.com/open?id=0Bx3qMdwakiQMaTNxd0JsazA4ejA
--- Comment #9 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 124857 --> https://bugs.freedesktop.org/attachment.cgi?id=124857&action=edit R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes apitrace replay AlienIsolation.1.trace
Arek Ruśniak arek.rusi@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #124857 description|R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes apitrace replay AlienIsolation.1.trace |R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes apitrace replay AlienIsolation.1.trace r273466
--- Comment #10 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 124858 --> https://bugs.freedesktop.org/attachment.cgi?id=124858&action=edit R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes apitrace replay AlienIsolation.1.trace r273467
--- Comment #11 from Nicolai Hähnle nhaehnle@gmail.com --- Hi Arek, thanks for the trace and new logs!
Looking at the logs, the only diff is in branch instructions. Perhaps there is a bug in how kill instructions are lowered now? Since there are several shaders with differences, it's not clear yet. I'm going to try to narrow it down to a single shader using the trace.
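Checking whether two disassembly listings differ only in branches can be scripted. A minimal sketch, assuming each shader's disassembly has been extracted into a list of lines; the label heuristic and all helper names are hypothetical, and only the `s_branch`/`s_cbranch_*` mnemonics are taken from the GCN instruction set.

```python
import difflib
from typing import List

# GCN scalar branch mnemonics: s_branch and the s_cbranch_* family.
BRANCH_PREFIXES = ("s_branch", "s_cbranch_")


def is_control_flow(line: str) -> bool:
    text = line.strip()
    # Hypothetical heuristic: block labels such as "BB0_2:" also count, since
    # renumbered labels usually accompany changed branches.
    return text.startswith(BRANCH_PREFIXES) or text.endswith(":")


def changed_lines(old: List[str], new: List[str]) -> List[str]:
    """Lines removed from `old` or added in `new`, without the diff markers."""
    return [d[2:] for d in difflib.ndiff(old, new) if d.startswith(("- ", "+ "))]


def only_branches_differ(old: List[str], new: List[str]) -> bool:
    changes = changed_lines(old, new)
    return bool(changes) and all(is_control_flow(c) for c in changes)
```

Running this over each shader pair shortlists the shaders whose only change is in control flow, which is where a kill-lowering bug would show up.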
--- Comment #12 from Nicolai Hähnle nhaehnle@gmail.com --- The first bug that I noticed in the shaders was in return handling for non-monolithic shader parts. The fix for that bug is here: http://reviews.llvm.org/D21975
--- Comment #13 from Arek Ruśniak arek.rusi@gmail.com --- Nicolai, thanks for the fix. That did the job. The game now looks even better :) Really! I'll try reverting LLVM to the old revision and playing it again; maybe it's just my imagination.
Michel Dänzer michel@daenzer.net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC | |t.hirsch@web.de
--- Comment #14 from Michel Dänzer michel@daenzer.net --- *** Bug 96794 has been marked as a duplicate of this bug. ***
Nicolai Hähnle nhaehnle@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #15 from Nicolai Hähnle nhaehnle@gmail.com --- Fixed in LLVM r274612 "AMDGPU: Fix return of non-void-returning shaders".
--- Comment #16 from Marek Olšák maraeo@gmail.com ---
(In reply to Michel Dänzer from comment #1)
> Matt, any ideas offhand?
>
> Arek, can you attach the stderr output from running the game with the environment variable
>
> R600_DEBUG=fs,vs,gs,ps,cs,tcs,tes
>
> with and without the commit in question?
For obtaining the hanging shader, setting GALLIUM_DDEBUG=800 and attaching the created log file is better. The issue would have been pretty obvious from that.
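Following Marek's suggestion, a run can be wrapped so that GALLIUM_DDEBUG is set and the resulting report is easy to locate. A minimal sketch under two assumptions not stated in this bug: that the value 800 is read as a hang-detection timeout in milliseconds, and that the reports are written to `~/ddebug_dumps`. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional

# Assumption: ddebug writes its reports to ~/ddebug_dumps; adjust for your build.
DEFAULT_DUMP_DIR = Path.home() / "ddebug_dumps"


def ddebug_env(base_env: Mapping[str, str], timeout_ms: int = 800) -> Dict[str, str]:
    """Copy of `base_env` with GALLIUM_DDEBUG set (value assumed to be a timeout in ms)."""
    env = dict(base_env)
    env["GALLIUM_DDEBUG"] = str(timeout_ms)
    return env


def newest_dump(dump_dir: Path = DEFAULT_DUMP_DIR) -> Optional[Path]:
    """Most recently modified file in `dump_dir`, or None if there is none."""
    if not dump_dir.is_dir():
        return None
    files = [p for p in dump_dir.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

Pass `ddebug_env(os.environ)` as the `env` of the game or `apitrace replay` process. After a hang, `newest_dump()` points at the file to attach.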