https://bugs.freedesktop.org/show_bug.cgi?id=88183
Bug ID: 88183
Summary: radeonsi: R9 280X hangs with SuperTuxKart
Product: DRI
Version: DRI git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: alexandre.f.demers@gmail.com
In SuperTuxKart, upon loading the first track of the story mode, the display freezes. The GPU resets but when it comes back everything is messed up and it keeps resetting continuously.
I'm using latest kernel 3.19-rc3 with the "drm/radeon: fix VM flush..." patches (also tested without it), latest mesa from git, latest drm from git.
I'll see if a journald dump outputs something interesting.
--- Comment #1 from Alexandre Demers alexandre.f.demers@gmail.com --- Nothing interesting in there.
--- Comment #2 from Michel Dänzer michel@daenzer.net --- Can you create an apitrace which reproduces the problem?
--- Comment #3 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Michel Dänzer from comment #2)
> Can you create an apitrace which reproduces the problem?
I tried, but it was not conclusive. I'll give it another try tomorrow.
--- Comment #4 from Alexandre Demers alexandre.f.demers@gmail.com --- I have a trace available, but it's 170MB. Do you have a suggestion on where I should upload it?
--- Comment #5 from Alexandre Demers alexandre.f.demers@gmail.com --- Crashing trace: https://drive.google.com/file/d/0Bw_tZdWsNa4BeDN2c3VRZ014aW8/view?usp=sharin...
--- Comment #6 from Michel Dänzer michel@daenzer.net --- I can reproduce the hang with current Mesa Git master, but not with the Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing past the end of DXT5 SRGBA textures:
VM start=0x262F0000 end=0x26346800 | Texture 512x512x1, 10 levels, 1 samples, dxt5_srgba
[...]
Jan 15 16:52:47 kaveri kernel: [ 208.982506] radeon 0000:00:01.0: GPU fault detected: 146 0x09450014
Jan 15 16:52:47 kaveri kernel: [ 208.982511] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0002634A
Jan 15 16:52:47 kaveri kernel: [ 208.982513] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
Jan 15 16:52:47 kaveri kernel: [ 208.982514] VM fault (0x04, vmid 2) at page 156490, write from 'CB0' (0x43423000) (0)
Jan 15 16:52:47 kaveri kernel: [ 208.982519] radeon 0000:00:01.0: GPU fault detected: 146 0x09050014
Jan 15 16:52:47 kaveri kernel: [ 208.982520] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00026347
Jan 15 16:52:47 kaveri kernel: [ 208.982521] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05000014
Jan 15 16:52:47 kaveri kernel: [ 208.982522] VM fault (0x04, vmid 2) at page 156487, write from 'CB0' (0x43423000) (0)
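As a rough check of the "writing past the end" reading, the fault addresses in the log above can be compared against the texture's VM range. The sketch below is illustrative only: it assumes that VM_CONTEXT1_PROTECTION_FAULT_ADDR is a page number in 4 KiB units (which matches the decimal "at page" values in the log), and the helper name `fault_overshoot` is made up for this example.

```python
# Values copied from the log above.
BUFFER_START = 0x262F0000
BUFFER_END = 0x26346800
FAULT_PAGES = [0x0002634A, 0x00026347]  # VM_CONTEXT1_PROTECTION_FAULT_ADDR

# Assumption: the fault address register counts 4 KiB GPU pages.
GPU_PAGE_SIZE = 4096


def fault_overshoot(page, end=BUFFER_END, page_size=GPU_PAGE_SIZE):
    """Return how many bytes past `end` the faulting page starts."""
    return page * page_size - end


for page in FAULT_PAGES:
    addr = page * GPU_PAGE_SIZE
    print(f"page {page} (0x{page:05X}) -> 0x{addr:08X}, "
          f"{fault_overshoot(page)} bytes past the end of the texture")
```

Under that assumption both faulting pages start shortly after end=0x26346800 (2048 and 14336 bytes past it), and 0x2634A and 0x26347 are the same pages the kernel prints as 156490 and 156487.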
--- Comment #7 from smoki smoki00790@gmail.com ---
The game works fine for me if I disable texture compression in the options.
BTW, they have disabled compression for every Intel driver, which might mean this is not only a radeonsi driver issue:
https://github.com/supertuxkart/stk-code/blob/master/data/graphical_restrict...
--- Comment #8 from smoki smoki00790@gmail.com ---
Compiled latest supertuxkart git, nothing good.
Blah, I even tried it on Windows now, and there too it locks up the driver randomly... they really need to fix their new alpha engine.
--- Comment #9 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to smoki from comment #7)
> The game works fine for me if I disable texture compression in the options.
> BTW, they have disabled compression for every Intel driver, which might mean this is not only a radeonsi driver issue:
> https://github.com/supertuxkart/stk-code/blob/master/data/graphical_restrictions.xml
Which GPU are you using?
Disabling texture compression doesn't solve the bug.
--- Comment #10 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Michel Dänzer from comment #6)
> I can reproduce the hang with current Mesa Git master, but not with the Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
> Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing past the end of DXT5 SRGBA textures:
> [...]
I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly updated llvm back to 3.5 first.
--- Comment #11 from smoki smoki00790@gmail.com --- (In reply to Alexandre Demers from comment #9)
> Which GPU are you using?
> Disabling texture compression doesn't solve the bug.
Low-end Kabini. Yeah, disabling texture compression solves it for the 0.8.2-beta release, but not for the current game git I tried just now, so yeah, the bug is there.
I'm not sure it is a driver bug, as the game is really full of bugs and easily locked up the driver even on Windows for me. It locks up there even on the very minimum settings... but somewhat randomly, on practically any settings.
Frankly, the game is now real shit... sorry to say, but I don't know better words to describe it :) I only understand that it uses a new forked engine and that the game developers need to fix some driver incompatibilities.
I also read their forums and the issues on GitHub; there are plenty of unsolved issues.
--- Comment #12 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Alexandre Demers from comment #10)
> (In reply to Michel Dänzer from comment #6)
> > I can reproduce the hang with current Mesa Git master, but not with the Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
> > Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing past the end of DXT5 SRGBA textures:
> > [...]
> I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly updated llvm back to 3.5 first.
I'm confirming that 10.3.2 works fine.
--- Comment #13 from smoki smoki00790@gmail.com ---
I tried Mesa 10.2.9, 10.3.0, 10.3.2, 10.3.7, 10.4.2 and 10.5-devel. With the current game git, all of them lock up the GPU.
Not to mention that on Windows 7 32-bit and 64-bit there are also GPU lockups with the 0.8.2-beta release.
--- Comment #14 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to smoki from comment #13)
> I tried Mesa 10.2.9, 10.3.0, 10.3.2, 10.3.7, 10.4.2 and 10.5-devel. With the current game git, all of them lock up the GPU.
> Not to mention that on Windows 7 32-bit and 64-bit there are also GPU lockups with the 0.8.2-beta release.
IMO, an application should never be able to lock a GPU. But things are as they are.
I haven't updated STK since I did my apitrace. For me, that version is not crashing with 10.3.2 (and I'm bisecting, we will see where this ends).
--- Comment #15 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Alexandre Demers from comment #12)
> (In reply to Alexandre Demers from comment #10)
> > (In reply to Michel Dänzer from comment #6)
> > > I can reproduce the hang with current Mesa Git master, but not with the Debian 10.3.2 packages. Can you confirm that and if so, can you bisect?
> > > Even with 10.3.2 though, there are GPUVM faults, looks like the CB writing past the end of DXT5 SRGBA textures:
> > > [...]
> > I'll try 10.3.2 and launch a bisection, but I need to downgrade my newly updated llvm back to 3.5 first.
> I'm confirming that 10.3.2 works fine.
I'm also seeing the VM faults in dmesg.
--- Comment #16 from Alexandre Demers alexandre.f.demers@gmail.com --- While bisecting, using today's mesa from git, I don't have any crash/hang... But the VM errors are still there.
--- Comment #17 from Michel Dänzer michel@daenzer.net --- (In reply to Alexandre Demers from comment #16)
> While bisecting, using today's mesa from git, I don't have any crash/hang...
If that's still using LLVM 3.5, maybe it's actually an LLVM regression.
--- Comment #18 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Michel Dänzer from comment #17)
> (In reply to Alexandre Demers from comment #16)
> > While bisecting, using today's mesa from git, I don't have any crash/hang...
> If that's still using LLVM 3.5, maybe it's actually an LLVM regression.
I doubt it, since I've been using llvm-git only for the last couple of days. But I'm currently rebuilding llvm from git, so I'll know for sure later today.
--- Comment #19 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Michel Dänzer from comment #17)
> (In reply to Alexandre Demers from comment #16)
> > While bisecting, using today's mesa from git, I don't have any crash/hang...
> If that's still using LLVM 3.5, maybe it's actually an LLVM regression.
Tested with mesa recompiled with llvm 3.5+ (r226248) and it still doesn't crash.
Should we keep this bug open and focus on the VM faults?
--- Comment #20 from Marek Olšák maraeo@gmail.com --- Yes. VM faults can cause hangs too. Were you able to bisect the problematic commit?
--- Comment #21 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Marek Olšák from comment #20)
> Yes. VM faults can cause hangs too. Were you able to bisect the problematic commit?
Are you referring to the VM faults? If so, not yet. I've been busy with other things lately, but I could give it a go in the next week.
--- Comment #22 from Michel Dänzer michel@daenzer.net --- Has anyone found a Mesa commit yet where the VM faults *don't* occur? Otherwise, there's nothing to bisect for them.
--- Comment #23 from smoki smoki00790@gmail.com --- (In reply to Michel Dänzer from comment #22)
> Has anyone found a Mesa commit yet where the VM faults *don't* occur? Otherwise, there's nothing to bisect for them.
There is no good commit to bisect against. I went down to Mesa 10.1 (the game needs at least that for its GL3 renderer) and it still faults.
For me the issue is mostly about that texture compression option; I tried 0.8.2-beta and 0.8.2-beta2. Just don't start a race before the option has been disabled and applied and, most importantly, exit the game afterwards, because otherwise it does not really get applied :D ... blah, just do:
MESA_EXTENSION_OVERRIDE=-GL_EXT_texture_compression_s3tc ./supertuxkart
And it is fine.
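The same override can be set from a small launcher script. This is only a sketch: `env_without_extension` and `launch` are hypothetical helper names, and it assumes that MESA_EXTENSION_OVERRIDE takes a space-separated list in which a leading `-` disables an extension, as in the command above.

```python
import os
import subprocess

S3TC = "GL_EXT_texture_compression_s3tc"


def env_without_extension(extension, base_env=None):
    """Return a copy of the environment with `extension` disabled in Mesa."""
    env = dict(os.environ if base_env is None else base_env)
    # Keep any overrides that are already set and append ours once.
    entries = env.get("MESA_EXTENSION_OVERRIDE", "").split()
    flag = "-" + extension
    if flag not in entries:
        entries.append(flag)
    env["MESA_EXTENSION_OVERRIDE"] = " ".join(entries)
    return env


def launch(binary):
    """Start `binary` with S3TC texture compression hidden from it."""
    return subprocess.run([binary], env=env_without_extension(S3TC), check=False)


# Example (binary path as in the command above):
# launch("./supertuxkart")
```

Building the environment in a helper keeps any overrides already present in MESA_EXTENSION_OVERRIDE instead of replacing them.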
--- Comment #24 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Michel Dänzer from comment #22)
> Has anyone found a Mesa commit yet where the VM faults *don't* occur? Otherwise, there's nothing to bisect for them.
To my knowledge, since I've had this video card (a few months), I've been dealing with VM faults. Here they are triggered in SuperTuxKart, but I've also reported them in another bug about Serious Sam 3 (which hangs the GPU in a few seconds). However, are VM faults only related to mesa or can they come from somewhere else (drm)?
--- Comment #25 from Michel Dänzer michel@daenzer.net --- (In reply to Alexandre Demers from comment #24)
> To my knowledge, since I've had this video card (a few months), I've been dealing with VM faults.
Note that I'm referring specifically to the VM faults triggered by your apitrace from comment 5. A VM fault by itself is a generic symptom which can be caused by many different things; it's more or less the equivalent of a CPU segmentation fault.
> However, are VM faults only related to mesa or can they come from somewhere else (drm)?
The Mesa driver is the most likely culprit in general, though in this particular case it could also be e.g. libdrm_radeon calculating the surface parameters incorrectly.
--- Comment #26 from smoki smoki00790@gmail.com --- (In reply to Michel Dänzer from comment #25)
> Note that I'm referring specifically to the VM faults triggered by your apitrace from comment 5.
I can't even start that one, it just throws this:
0 6 glXCreateWindow(dpy = 0x1fd84a0, config = 0x2060a80, win = 58720258, attribList = {}) = 58720259
6: warning: unsupported glXCreateWindow call
X Error of failed request: GLXBadFBConfig
  Major opcode of failed request: 155 (GLX)
  Minor opcode of failed request: 34 ()
  Serial number of failed request: 22
  Current serial number in output stream: 20
Probably an issue with the shipped gcc libs.
Martin Peres martin.peres@free.fr changed:
      What |Removed |Added
----------------------------------------------------------------------------
    Status |NEW     |RESOLVED
Resolution |---     |MOVED
--- Comment #27 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/569.