https://bugs.freedesktop.org/show_bug.cgi?id=98028
Bug ID: 98028
Summary: Guns of Icarus Online segfaults on startup since AMDGPU: Partially fix control flow at -O0
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: daniel@constexpr.org
QA Contact: dri-devel@lists.freedesktop.org

Created attachment 126970
  --> https://bugs.freedesktop.org/attachment.cgi?id=126970&action=edit
Short Guns of Icarus Online startup trace (truncated by segfault)
Guns of Icarus Online segfaults (or sometimes hangs) on startup with current Mesa and LLVM. I have bisected the segfault to LLVM r282667.
The backtraces for the segfaults vary. Some of the segfaults are inside malloc / free, indicating possible memory corruption.
I have attached an apitrace recorded using a bad LLVM revision. While the game consistently segfaults (or hangs), replaying the trace does not result in a segfault every time.
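Because the replay only crashes some of the time, a single run says little about a given LLVM revision. A minimal shell sketch for estimating how often a replay dies, assuming `apitrace replay` is on the PATH; the function name `count_crashes` and the trace file name in the usage comment are hypothetical placeholders:

```shell
# Run a command repeatedly and print how many runs were killed by a signal.
# Usage: count_crashes RUNS COMMAND [ARGS...]
count_crashes() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        # Shells report death by signal as a status above 128
        # (139 = 128 + 11 for SIGSEGV on Linux).
        if [ "$status" -gt 128 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}

# Hypothetical usage (the trace file name is a placeholder):
#   count_crashes 20 apitrace replay GoIO-startup.trace
```

This only counts crashes. A hang would block the loop, so wrapping the command in coreutils `timeout` may be needed when hangs are expected.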
Here is also a longer trace of the full startup sequence recorded using a good LLVM revision: http://constexpr.org/tmp/GoIO-radeonsi.2.trace.xz (82 MiB)
GPU: R9 380X
Kernel: 4.7.5-gentoo
Mesa: git-024c207
LLVM: r283076
--- Comment #1 from Michel Dänzer michel@daenzer.net ---
Please attach a backtrace of a segfault.
--- Comment #2 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 126992
  --> https://bugs.freedesktop.org/attachment.cgi?id=126992&action=edit
Backtraces recorded for the crashes
Here is a list of backtraces I have seen - it's probably not complete.
--- Comment #3 from Nicolai Hähnle nhaehnle@gmail.com ---
I haven't been able to reproduce this with Mesa master and LLVM r283219 so far. Does this happen with clean re-builds?
If this still happens with current LLVM and clean re-builds, please provide logs with R600_DEBUG=vs,tcs,tes,gs,ps,cs.
The wide range of different backtraces suggests that it might be random memory corruption, so running under Valgrind may also be worth a shot.
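One way to collect the requested log, as a minimal sketch: it assumes the radeonsi shader dumps are written to stderr, and the helper name `capture_shader_log`, the log file name, and the game launch command in the usage comment are hypothetical.

```shell
# Run a command with the requested R600_DEBUG flags and save its stderr.
# Usage: capture_shader_log LOGFILE COMMAND [ARGS...]
capture_shader_log() {
    logfile=$1
    shift
    # env sets the variable for this one command only, so the calling
    # shell's environment is left untouched; stderr goes to the log file.
    env R600_DEBUG=vs,tcs,tes,gs,ps,cs "$@" 2>"$logfile"
}

# Hypothetical usage (the launch command is a placeholder):
#   capture_shader_log r600_debug.log ./GunsOfIcarusOnline
```

The function returns the exit status of the wrapped command, so a crash still shows up as a non-zero status in the calling shell.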
--- Comment #4 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 127001
  --> https://bugs.freedesktop.org/attachment.cgi?id=127001&action=edit
R600_DEBUG=vs,tcs,tes,gs,ps,cs log

(In reply to Nicolai Hähnle from comment #3)
> I haven't been able to reproduce this with Mesa master and LLVM r283219 so far. Does this happen with clean re-builds?
Yes, all LLVM and Mesa builds were done through the package manager, starting with an empty build directory. And I don't use ccache.
I just re-checked with an updated LLVM & Mesa and the game still crashes:
Mesa: git-0e85ff3
LLVM: r283225
I also verified that it still starts properly with amdgpu-pro (running on top of the upstream 4.7.5 amdgpu module).
> If this still happens with current LLVM and clean re-builds, please provide logs with R600_DEBUG=vs,tcs,tes,gs,ps,cs.
>
> The wide range of different backtraces suggests that it might be random memory corruption, so running under Valgrind may also be worth a shot.
It does look like it. I'll get a valgrind memcheck log, but will first need to recompile a couple of libraries because valgrind still doesn't support all the instructions of my CPU :/
--- Comment #5 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 127002
  --> https://bugs.freedesktop.org/attachment.cgi?id=127002&action=edit
Another R600_DEBUG=vs,tcs,tes,gs,ps,cs log
Looks like the crashes don't always happen for the same shader.
--- Comment #6 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 127008
  --> https://bugs.freedesktop.org/attachment.cgi?id=127008&action=edit
Valgrind log
I managed to get a Valgrind log; the backtrace of the first invalid read seems consistent.
Here are the options I used; let me know if you want me to try any others:
valgrind --tool=memcheck --error-limit=no --log-file=valgrind-%p.log -v --trace-children=yes --track-origins=yes --read-var-info=yes --redzone-size=1024 --
I also noticed that the game's engine (Unity) overrides operator new and friends; maybe that's involved somehow.
--- Comment #7 from Nicolai Hähnle nhaehnle@gmail.com ---
Thanks for the additional info. Running llc on those shaders under Valgrind doesn't show anything either, but this may be a limitation of Valgrind in connection with LLVM's internal allocator.
That this is exposed by the game's operator overrides is curious. If the bisection result is solid, we can't put the blame on those overrides though.
--- Comment #8 from Nicolai Hähnle nhaehnle@gmail.com ---
Careful inspection of the commit you bisected this to has led me to a smoking gun. Could you please check whether the patch at https://reviews.llvm.org/D25306 fixes this for you?
--- Comment #9 from Daniel Scharrer daniel@constexpr.org ---
Your patch from D25306 fixes the crash for me. Thanks for looking into this.
Nicolai Hähnle nhaehnle@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #10 from Nicolai Hähnle nhaehnle@gmail.com ---
Fixed in LLVM r283528.