https://bugs.freedesktop.org/show_bug.cgi?id=75005
Priority: medium
Bug ID: 75005
Assignee: dri-devel@lists.freedesktop.org
Summary: "Upvoid" segfault in radeonsi/llvm
Severity: normal
Classification: Unclassified
OS: All
Reporter: haagch.christoph@googlemail.com
Hardware: Other
Status: NEW
Version: git
Component: Drivers/Gallium/radeonsi
Product: Mesa
Created attachment 94105 --> https://bugs.freedesktop.org/attachment.cgi?id=94105&action=edit gdb "bt full" of the segfault
Software: Latest mesa git etc.
The program causing it is closed source: https://upvoid.com/
On intel ivy bridge it works.
On radeonsi it segfaults every time when starting the game.
A full gdb backtrace with most of the debug information is attached.
--- Comment #1 from Tom Stellard tstellar@gmail.com --- Can you also post the output produced with the environment variable: R600_DEBUG=ps,vs
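The request above amounts to running the game with R600_DEBUG set and saving what the driver prints to stderr. A minimal sketch of that, assuming the game is started from a launcher binary; the helper name `run_with_r600_debug` and any paths passed to it are hypothetical and not part of Mesa:

```python
import os
import subprocess
import sys


def run_with_r600_debug(command, flags, log_path):
    """Run `command` with R600_DEBUG set to the given flags.

    Everything the process writes to stderr (where the shader dumps
    were captured in this thread) is saved to `log_path`.
    Returns the exit code of the process.
    """
    env = dict(os.environ)                 # keep the caller's environment
    env["R600_DEBUG"] = ",".join(flags)    # e.g. "ps,vs" or "ps,vs,gs"
    with open(log_path, "wb") as log:
        result = subprocess.run(command, env=env, stderr=log)
    return result.returncode
```

For example, `run_with_r600_debug(["./UpvoidEngine"], ["ps", "vs"], "upvoid-stderr.log")` would produce a log file to attach, where `./UpvoidEngine` is an assumed launcher path.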
--- Comment #2 from Christoph Haag haagch.christoph@googlemail.com --- Created attachment 94106 --> https://bugs.freedesktop.org/attachment.cgi?id=94106&action=edit stderr with R600_DEBUG=ps,vs
Tom Stellard tstellar@gmail.com changed:
What      |Removed |Added
----------------------------------------------------------------------------
Depends on|        |75276
--- Comment #3 from Tom Stellard tstellar@gmail.com --- Can you try this patch: https://bugs.freedesktop.org/attachment.cgi?id=94675
--- Comment #4 from Christoph Haag haagch.christoph@googlemail.com --- Created attachment 94690 --> https://bugs.freedesktop.org/attachment.cgi?id=94690&action=edit GPU fault after it sort of works with patch from #3
Well.
At least initially it does not crash anymore. It does start now. Nice!
The problems begin shortly after gameplay starts, with the GPU faults shown in the attached dmesg. Amazingly, it keeps running with decent FPS for some time while these GPU faults are dumped into the log. But eventually the system locks up hard (the log reached 160 MB before the lockup, so I arbitrarily trimmed it after the first few GPU fault messages). This was with R600_DEBUG=nohyperz, by the way.
I have also once seen another segfault in radeonsi, but I didn't have debug symbols at that time and haven't reproduced it with debug symbols now, because the hard lockup after a few seconds of gameplay is a bit annoying when trying to reproduce something. :) Maybe I can give a better backtrace later, but maybe it's unrelated.
--- Comment #5 from Tom Stellard tstellar@gmail.com --- Can you try this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes
If you experience any crashes or hangs please post the output of R600_DEBUG=ps,vs,gs
Christoph Haag haagch.christoph@googlemail.com changed:
What             |Removed |Added
----------------------------------------------------------------------------
Attachment #94106|0       |1
is obsolete      |        |
--- Comment #6 from Christoph Haag haagch.christoph@googlemail.com --- Created attachment 95062 --> https://bugs.freedesktop.org/attachment.cgi?id=95062&action=edit stderr of upvoid with R600_DEBUG=ps,vs,gs that triggered GPU fault
(In reply to comment #5)
> Can you try this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes
> If you experience any crashes or hangs please post the output of R600_DEBUG=ps,vs,gs
I wasted some time compiling, clang was fixed for this branch version in 202737... Just in case anyone else is trying this.
Anyway, my very comprehensive tests of about 5 runs :) suggest that the GPU faults and hangs still happen, mostly when the game window is maximized.
If it is not maximized and only a small window, it runs longer and hangs the GPU much more rarely, and sometimes the game fails in several other ways. That might be partly due to it being an alpha release, but maybe sometimes it's because of the driver? Don't know. Maybe you can decide whether it is caused by the problems here or needs new bugs.
E.g. this results in SIGABRT and then(?) SIGILL: UpvoidEngine: r600_query.c:749: r600_suspend_nontimer_queries: Assertion `ctx->num_cs_dw_nontimer_queries_suspend == 0' failed. I can post a full backtrace if you want.
It's currently segfaulting too much in unrelated code to get the segfault backtrace I originally wanted that involves almost only radeonsi, so I'll just attach the stderr output of one of the gpu fault hangs with R600_DEBUG=ps,vs,gs and try again later.
--- Comment #7 from Tom Stellard tstellar@gmail.com --- Is there anything printed to your dmesg log when the game locks up?
Christoph Haag haagch.christoph@googlemail.com changed:
What             |Removed |Added
----------------------------------------------------------------------------
Attachment #94690|0       |1
is obsolete      |        |
--- Comment #8 from Christoph Haag haagch.christoph@googlemail.com --- Created attachment 97307 --> https://bugs.freedesktop.org/attachment.cgi?id=97307&action=edit dmesg with patched llvm
Sorry, I wasn't very active recently.
I'm not sure what you're asking for. In comment #4 there is a whole dmesg, but I can add another one...
I am using linux 3.14 by now and recent mesa git, but with your branch of llvm.
I can maybe add a few details about the behavior: When starting upvoid it displays a menu over a view of the game world. The GPU faults start appearing in dmesg right when it starts displaying this. It doesn't lock up immediately and keeps rendering relatively well. When starting the game I can walk and look around a bit, and I noticed: When looking at the ground the messages stop, but when looking into the distance the messages appear again, so I would think it's directly related to the complexity of the stuff it is rendering.
After a while the game window stops reacting. At this point the game will take up 100% "red" CPU time in htop, and shortly after that the whole machine will lock up hard.
I can't say whether the lockup is caused by excessive error logging or not. dmesg --follow | pv > /dev/null says it's about 35 kilobytes/second.
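The logging rate above was measured with pv. For systems without pv, the same average can be computed from any byte stream; this is a minimal sketch, and the function name `measure_throughput` and its parameters are illustrative assumptions rather than an existing tool:

```python
import time


def measure_throughput(stream, clock=time.monotonic, chunk_size=4096):
    """Read `stream` until EOF and return (total_bytes, bytes_per_second).

    `stream` is any binary file-like object, for example sys.stdin.buffer
    when kernel log output is piped into the script. `clock` can be
    replaced to make the function testable.
    """
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # empty read means EOF
            break
        total += len(chunk)
    elapsed = clock() - start
    rate = total / elapsed if elapsed > 0 else 0.0
    return total, rate
```

Called on `sys.stdin.buffer` in a small script, this could be fed by something like `timeout 10 dmesg --follow`, so that the stream ends after a fixed interval and the average is reported.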
The dmesg here is from starting the game, waiting a few seconds and then killing it. When killing it early enough it doesn't seem to cause any problems, seems to recover nicely.
Christoph Haag haagch.christoph@googlemail.com changed:
What             |Removed |Added
----------------------------------------------------------------------------
Attachment #95062|0       |1
is obsolete      |        |
--- Comment #9 from Christoph Haag haagch.christoph@googlemail.com --- Created attachment 97308 --> https://bugs.freedesktop.org/attachment.cgi?id=97308&action=edit more recent R600_DEBUG=ps,vs,gs that triggers gpu fault
Since there were some updates, maybe this should be updated too...
I have tried to take an apitrace but apitrace segfaults (?) when replaying it, so I can post that too when I can resolve that.
--- Comment #10 from Christoph Haag haagch.christoph@googlemail.com --- (In reply to comment #9)
> I have tried to take an apitrace but apitrace segfaults (?) when replaying it, so I can post that too when I can resolve that.
That was actually just my failure to use it correctly. With apitrace replay --core it works:
64 Megabyte download, 101 Megabyte uncompressed: http://w3studi.informatik.uni-stuttgart.de/~haagch/UpvoidEngine.trace.bz2
Replaying this renders on intel without major problems, but on my HD 7970M it creates GPU faults (though not so many that it causes a lockup).
--- Comment #11 from Tom Stellard tstellar@gmail.com --- Can you test this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v2
--- Comment #12 from Christoph Haag haagch.christoph@googlemail.com --- (In reply to comment #11)
> Can you test this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v2
I'm using 3.15-rc3 by now and I don't know whether it's fixes in linux or fixes in llvm, but there are no gpu faults and no crashes anymore.
At least for the few minutes I have "tested" for now it worked absolutely fine.
Awesome!
--- Comment #13 from Tom Stellard tstellar@gmail.com --- (In reply to comment #12)
> (In reply to comment #11)
> > Can you test this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v2
> I'm using 3.15-rc3 by now and I don't know whether it's fixes in linux or fixes in llvm, but there are no gpu faults and no crashes anymore.
> At least for the few minutes I have "tested" for now it worked absolutely fine.
> Awesome!
Thanks for testing. I made a few improvements, can you test this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=si-spill-fixes-v3
--- Comment #14 from Christoph Haag haagch.christoph@googlemail.com --- Seems to still work fine for Upvoid, at least when only running it for very few minutes.
Tom Stellard tstellar@gmail.com changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |FIXED
--- Comment #15 from Tom Stellard tstellar@gmail.com --- I have committed a fix for this.
Bug 75005 depends on bug 75276, which changed state.
Bug 75276 Summary: Implement VGPR Register Spilling https://bugs.freedesktop.org/show_bug.cgi?id=75276
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |FIXED