https://bugs.freedesktop.org/show_bug.cgi?id=90378
Bug ID: 90378
Summary: GPU lockups in Left 4 Dead 2
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: daniel@constexpr.org
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 115653 --> https://bugs.freedesktop.org/attachment.cgi?id=115653&action=edit dmesg output
While playing L4D2 today I got a lot of GPU lockups.
While the lockups seem to happen randomly, they are fairly easy to reproduce in the third chapter (The Mall) of the Dead Center campaign. I recorded an apitrace while encountering 3 lockups and there seem to be at least a couple of lockups each time I retrace it:
http://constexpr.org/tmp/L4D2-radeonsi.trace.xz (507 MiB)
At least the driver was able to successfully reset the GPU each time.
There also seem to be some infrequent rendering glitches.
Probably related to bug 89228, and possibly bug 90217, bug 90284 and/or bug 89954 (all reports of lockups with Source engine games).
GPU: Radeon HD 7950 (TAHITI)
Mesa 10.6.0-devel (git-3bdbc1e)
LLVM r236436
Linux 4.0.1-gentoo
The above logs and apitrace were recorded with unsafe-fp-math enabled (see bug 89069 comment 34), but the lockups also happen without it. I also noticed some VM fault messages in dmesg while running L4D2 without unsafe-fp-math.
Daniel Scharrer daniel@constexpr.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |daniel@constexpr.org
--- Comment #1 from Daniel Scharrer daniel@constexpr.org ---
The game stdout with R600_DEBUG=ps,vs,gs was too large to attach, so here it is instead:
http://constexpr.org/tmp/l4d.log (5 MiB)
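For anyone wanting to produce a similar log, here is a minimal Python sketch of running a command with R600_DEBUG set and writing its output to a file. The flag names (ps, vs, gs) and the glretrace / L4D2-radeonsi.trace names come from this report; the helper names `debug_env` and `run_logged` are hypothetical, and merging stderr into the log is an assumption about where the driver prints its dumps.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def debug_env(flags: Sequence[str],
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with R600_DEBUG set to the given flags.

    The base mapping (os.environ by default) is copied, never modified.
    """
    env = dict(os.environ if base is None else base)
    env["R600_DEBUG"] = ",".join(flags)
    return env


def run_logged(cmd: Sequence[str], flags: Sequence[str], log_path: str) -> int:
    """Run cmd with R600_DEBUG set and write its output to log_path.

    stderr is merged into stdout (an assumption) so that driver debug output
    ends up in the same file regardless of which stream it is printed to.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.run(cmd, env=debug_env(flags),
                              stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Example (not executed here): replay the trace with shader dumps enabled.
# run_logged(["glretrace", "L4D2-radeonsi.trace"], ["ps", "vs", "gs"], "l4d.log")
```

The same helper could be called with `["switch_on_eop"]` to test that option without changing the shell environment.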
Daniel Scharrer daniel@constexpr.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
See Also|                            |https://bugs.freedesktop.org/show_bug.cgi?id=89228
--- Comment #2 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 115951 --> https://bugs.freedesktop.org/attachment.cgi?id=115951&action=edit patch to revert LLVM r233366 (fixes lockups)
This seems to be a regression in LLVM:
Mesa git + LLVM svn is bad
Mesa 10.5.5 + LLVM svn is bad
Mesa git + LLVM 3.6.0 is good (no lockups, no glitches)
With Mesa git, the lockups in the L4D2 apitrace linked above bisect to LLVM r233366:
commit 9217916725713c00f17cb64123e8dffdae843eb7
Author: Andrew Trick atrick@apple.com
Date: Fri Mar 27 06:10:13 2015 +0000
Complete the MachineScheduler fix made way back in r210390.
"Fix the MachineScheduler's logic for updating ready times for in-order. Now the scheduler updates a node's ready time as soon as it is scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver. Francois de Ferriere reported the issue in the other bit of code where it was also needed. I never got around to coming up with a test case, but it's an obvious fix that shouldn't be delayed any longer. I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no negative impact with this fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233366 91177308-0d34-0410-b5e6-96231b3b80d8
I had to revert b8797a7 and a99a16a in Mesa for it to build against that LLVM revision.
Besides the arch-specific test files, r233366 only moves one line of code around. Reverting that on current LLVM (see attached patch) also fixes the lockups.
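The bisection described above can be sketched in Python as a binary search over svn revision numbers. This is only an illustration of the technique, not the procedure actually used: the function name `first_bad_revision` and the `is_bad` callback are hypothetical, and in practice `is_bad` would mean building LLVM and Mesa at that revision and replaying the apitrace. The endpoints are taken from this report (r231401 had no lockups, r236436 had lockups).

```python
from typing import Callable


def first_bad_revision(good: int, bad: int,
                       is_bad: Callable[[int], bool]) -> int:
    """Return the first revision in (good, bad] for which is_bad is true.

    Assumes `good` is known good, `bad` is known bad, and every revision
    after the first bad one is also bad (a single regression point).
    """
    if good >= bad:
        raise ValueError("good revision must be older than bad revision")
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression is after mid
    return bad


# Stand-in predicate: pretend lockups appear from r233366 onwards.
# A real predicate would build that revision and replay the trace.
culprit = first_bad_revision(231401, 236436, lambda rev: rev >= 233366)
```

Because each step halves the remaining range, a range of a few thousand revisions needs only about a dozen build-and-replay cycles.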
As with bug #90510, R600_DEBUG=switch_on_eop gets rid of the glitches and also prevents the crashes. I am not sure whether that means the bug could be in Mesa or whether the option just hides the LLVM bug.
While bisecting for the lockup, I noticed the glitches were also introduced in LLVM after 3.6.0, but not by the same revision - f74b5c6 (r231401) has no lockups but does have glitches. I'll bisect that for bug #88561 as the glitches in the latest Talos apitrace there also seem to come from that commit range. (The GPU faults - bug #87278 - seem to have yet another cause, being present even with LLVM 3.6.0.)
NB: I also noticed that with compositing enabled in KWin, the system is not able to recover from the GPU lockups (and eventually does not even respond to SSH or SysRq). With compositing disabled the GPU is almost always reset successfully and the game / glretrace can continue as if nothing happened.
Daniel Scharrer daniel@constexpr.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |GPU lockups in Left 4 Dead 2|[radeonsi][bisected] GPU lockups in Left 4 Dead 2
--- Comment #3 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 115952 --> https://bugs.freedesktop.org/attachment.cgi?id=115952&action=edit R600_DEBUG=ps,vs,gs output with r233365 (no lockups)
Daniel Scharrer daniel@constexpr.org changed:
What                        |Removed     |Added
----------------------------------------------------------------------------
Attachment #115952 mime type|text/plain  |application/x-xz
--- Comment #4 from Daniel Scharrer daniel@constexpr.org ---
Created attachment 115956 --> https://bugs.freedesktop.org/attachment.cgi?id=115956&action=edit R600_DEBUG=ps,vs,gs output with r233366 (lockups)
Michel Dänzer michel@daenzer.net changed:
What    |Removed                                          |Added
----------------------------------------------------------------------------
Summary |[radeonsi][bisected] GPU lockups in Left 4 Dead 2|[LLVM][bisected] GPU lockups in Left 4 Dead 2
--- Comment #5 from Daniel Scharrer daniel@constexpr.org ---
While the glitches come from an earlier revision than the GPU lockups, both are caused by the machine scheduler. Disabling the machine scheduler for SI fixes both the glitches and the GPU lockups. See bug 88978 for details.
Marek Olšák maraeo@gmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |tstellar@gmail.com
--- Comment #6 from Marek Olšák maraeo@gmail.com ---
Hi Daniel,
Adding Tom in case he has an opinion on the LLVM issue.
(In reply to Daniel Scharrer from comment #2)
> As with bug #90510, R600_DEBUG=switch_on_eop gets rid of the glitches, and also prevents the crashes as well. Not sure if that means it could be a bug in Mesa or if that just hides the LLVM bug.
Does this mean switch_on_eop fixes this bug completely?
--- Comment #7 from Daniel Scharrer daniel@constexpr.org ---
Yes, as far as I remember R600_DEBUG=switch_on_eop fixed all issues with L4D2. I'll re-test with a more up-to-date build of LLVM and Mesa.
> Adding Tom in case he has an opinion on the LLVM issue.
Looks like he already had a look at some of the LLVM parts in bug #88978.
--- Comment #8 from Daniel Scharrer daniel@constexpr.org ---
I have confirmed that the issue is still the same with Mesa git-ff0a41b + LLVM r241381: L4D2 has glitches and lockups with unpatched LLVM, and no glitches or lockups with unpatched LLVM and R600_DEBUG=switch_on_eop.
However, other Source engine games (Counter-Strike: Global Offensive and Team Fortress 2) still have similar-looking glitches even with patched LLVM *and* R600_DEBUG=switch_on_eop. No idea if those are related though.
--- Comment #9 from Daniel Scharrer daniel@constexpr.org ---
Hi Marek,
Looks like http://lists.freedesktop.org/archives/mesa-dev/2015-July/089950.html does help. With latest LLVM + Mesa + that patch series, glitches and lockups seem to be gone. I got one lockup when replaying L4D2-radeonsi.trace after rebuilding Mesa, but could not reproduce it after a reboot or in game (tested in L4D2, CS:GO and The Talos Principle).
Daniel Scharrer daniel@constexpr.org changed:
What       |Removed                  |Added
----------------------------------------------------------------------------
Status     |NEW                      |RESOLVED
Resolution |---                      |FIXED
--- Comment #10 from Daniel Scharrer daniel@constexpr.org ---
With that patch series merged I can no longer reproduce the GPU lockups.