https://bugs.freedesktop.org/show_bug.cgi?id=101731
Bug ID: 101731
Summary: System freeze with AMDGPU when playing The Witcher 3
Product: Mesa
Version: 17.1
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: major
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: murks@tuxfamily.org
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 132575 --> https://bugs.freedesktop.org/attachment.cgi?id=132575&action=edit Save Game to reproduce the bug
Hi there.
I get reproducible system freezes when playing The Witcher 3. The save game that lets me reproduce this quickly is attached (requires The Witcher 3 with all Add-Ons).
I reported this bug to Wine first, but as far as we could figure out it is more likely a bug in Mesa. You can find the Wine bug report here: https://bugs.winehq.org/show_bug.cgi?id=43273
I'm using an AMD RX 460 on Arch Linux with Mesa 17.1.4.
I don't know how to debug this further since I can't do anything as soon as the freeze happens. The game music keeps playing. Sometimes Ctrl+Alt+FX lets me see the TTY, but nothing reacts afterwards and the game music stops.
There is nothing possibly related in the journal or Xorg logs.
--- Comment #1 from Philipp Überbacher murks@tuxfamily.org --- Created attachment 132576 --> https://bugs.freedesktop.org/attachment.cgi?id=132576&action=edit glxinfo output
--- Comment #2 from Clément Guérin libcg@protonmail.com --- Try doing an apitrace and post it here. Like this:
WINEPREFIX=/path/to/prefix apitrace trace wine witcher3.exe
Then replaying the trace should hang your computer:
apitrace replay wine64-preloader.trace
--- Comment #3 from Philipp Überbacher murks@tuxfamily.org --- Do you have any suggestion on how to get this trace within reasonable time?
Normally it takes me just a few seconds to trigger the bug. Under apitrace, though, I get about two frames per minute, which means it will take me hours to get the trace.
I tried lowering resolution and all gfx settings as far as possible (I still get the bug), but that helped only a little bit.
--- Comment #4 from Philipp Überbacher murks@tuxfamily.org --- I've tried this now for about 2 hours and have a 50 GB trace file. No freeze though. I guess it was about 30 seconds of running around in in-game time, which is usually enough to trigger the freeze. I might have been unlucky or it might not happen in apitrace. Maybe someone else has more luck.
I might try once more to install the amdgpu-pro drivers and see whether it happens there as well.
I'm open to other suggestions.
--- Comment #5 from Philipp Überbacher murks@tuxfamily.org --- I've managed to install amdgpu-pro and that has brought me a bit closer to narrowing this down. Just for reference, the software versions are amdgpu-pro 17.10.401251-2 and related packages (https://aur.archlinux.org/packages/?O=0&K=amdgpu), mesa-noglvnd 17.1.4, xorg-server 1.18.4-1.
With amdgpu-pro I could narrow the freeze down to a specific option in the game: nvidia hairworks. With that option disabled I do not get the freeze. As soon as it is enabled and a game is loaded, the machine freezes.
I've used this to get an apitrace quickly and I have one of just 1.1 GB. However, replaying it does not produce the freeze. Maybe the actual freeze trigger didn't make it into the file. I'll provide you the file if you tell me how. I do get a lot of warnings and errors on the console when I replay that file (see console_out).
Nvidia hairworks does not trigger the freeze with amdgpu, but it does so immediately with amdgpu-pro. amdgpu triggers the freeze seemingly randomly, at least in Velen, not in White Orchard. amdgpu-pro does not trigger the freeze in Velen (unless hairworks is enabled of course).
Since both amdgpu and amdgpu-pro use mesa and the non-mesa proprietary nvidia driver does not trigger this bug it is likely something in mesa. I hope the above helps to track it down.
--- Comment #6 from Philipp Überbacher murks@tuxfamily.org --- Created attachment 132604 --> https://bugs.freedesktop.org/attachment.cgi?id=132604&action=edit console output when replaying the apitrace of a crash
--- Comment #7 from Shmerl shtetldik@gmail.com --- I have the freeze with hairworks disabled all the same.
--- Comment #8 from Shmerl shtetldik@gmail.com --- (In reply to Philipp Überbacher from comment #5)
> I've used this to get an apitrace quickly and I have one of just 1.1 GB. However, replaying it does not produce the freeze. Maybe the actual freeze trigger didn't make it into the file. I'll provide you the file if you tell me how.
You can try this service: https://uploadfiles.io
Uploads are time-limited, though; they should stay available for 30 days.
--- Comment #9 from Shmerl shtetldik@gmail.com --- I noticed, when I set graphics settings to minimum, this freeze doesn't happen (or at least didn't happen to me so far).
--- Comment #10 from Shmerl shtetldik@gmail.com --- Created attachment 132626 --> https://bugs.freedesktop.org/attachment.cgi?id=132626&action=edit The Witcher 3 crash save (GOG/GOTY version).
With latest Mesa built from source, it now consistently crashes for me on Velen checkpoint save, on max settings (hairworks disabled).
OpenGL renderer string: AMD Radeon RX 480 Graphics (AMD POLARIS10 / DRM 3.10.0 / 4.11.0-1-amd64, LLVM 4.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.0-devel (git-f7e78abdf4)
--- Comment #11 from Philipp Überbacher murks@tuxfamily.org --- (In reply to Shmerl from comment #10)
That's wonderful (in a way). Maybe you can get an apitrace from that?
--- Comment #12 from Shmerl shtetldik@gmail.com --- Maybe I'm doing something wrong. I tried to record a trace (using Mesa built from source, which I load using a script).
I recorded a small amount, starting with the menu, but when replaying it I get a black screen and output like this:
2127496 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1228
2127496: warning: unsupported glXSwapBuffersMscOML call
2128642 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1229
2128642: warning: unsupported glXSwapBuffersMscOML call
2130677 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1230
2130677: warning: unsupported glXSwapBuffersMscOML call
2131778 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1231
2131778: warning: unsupported glXSwapBuffersMscOML call
2133839 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1232
2133839: warning: unsupported glXSwapBuffersMscOML call
2136933 @3 glXCreateWindow(dpy = 0x7cb2f3b0, config = 0x7cc82380, win = 127926276, attribList = {}) = 121634992
2136933: warning: unsupported glXCreateWindow call
Rendered 0 frames in 6.86555 secs, average of 0 fps
So I'm not sure a full trace would be useful until the replay actually shows something.
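To see at a glance which calls the replay rejects, the warnings in output like the above can be tallied. This is a minimal Python sketch, not part of apitrace: it assumes the replay output has been saved as text with one entry per line, and that warnings have the form `<call number>: warning: unsupported <name> call` as seen in this comment. The function name `count_unsupported_calls` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "2127496: warning: unsupported glXSwapBuffersMscOML call".
# The pattern is inferred from the output pasted in this thread (assumption).
WARNING_RE = re.compile(r"^\d+: warning: unsupported (\w+) call$")


def count_unsupported_calls(replay_output: str) -> Counter:
    """Count how often each call name is reported as unsupported."""
    counts = Counter()
    for line in replay_output.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample taken from the replay output in this comment.
sample = """\
2127496 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1228
2127496: warning: unsupported glXSwapBuffersMscOML call
2136933 @3 glXCreateWindow(dpy = 0x7cb2f3b0, config = 0x7cc82380, win = 127926276, attribList = {}) = 121634992
2136933: warning: unsupported glXCreateWindow call
Rendered 0 frames in 6.86555 secs, average of 0 fps
"""

for name, count in count_unsupported_calls(sample).items():
    print(f"{name}: {count}")
```

Only the warning lines are counted; the call lines themselves start with a call number followed by a space and are skipped by the pattern.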
--- Comment #13 from Philipp Überbacher murks@tuxfamily.org --- (In reply to Shmerl from comment #12)
I've gotten the black screen in my replay-attempts too, but I guess that is normal. Otherwise the replay would require all the textures and whatnot. Does your replay trigger the freeze (mine did not)? Maybe you can upload the trace?
--- Comment #14 from Shmerl shtetldik@gmail.com --- I didn't get to the freeze point in the replay, but I remember in the past, when I recorded a trace and replayed it, it actually showed images (i.e. video like). So I suppose something is wrong with my tracing. But I can record a crash trace just in case anyway.
--- Comment #15 from Shmerl shtetldik@gmail.com --- Interestingly, when I record a trace, and it reaches the point where it's supposed to freeze, it doesn't. I.e. the tracing somehow prevents it from happening.
--- Comment #16 from Philipp Überbacher murks@tuxfamily.org --- I finally got around to uploading this trace (should be up for 30 days). Remember that it was recorded with amdgpu-pro and that replaying it did not cause the freeze. I hope it helps anyway.
--- Comment #17 from Shmerl shtetldik@gmail.com --- Just for reference, the freeze doesn't happen to me anymore in a newer configuration.
See https://bugs.winehq.org/show_bug.cgi?id=43273#c12
--- Comment #18 from Shmerl shtetldik@gmail.com --- Actually, I just experienced the freeze bug again. I guess it's somehow random, and it's not truly gone :(
Shmerl shtetldik@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|17.1                        |git
--- Comment #19 from Lennard lefl65@gmail.com --- I can confirm this happens with radeonsi too
--- Comment #20 from Shmerl shtetldik@gmail.com --- (In reply to Lennard from comment #19)
> I can confirm this happens with radeonsi too
Well, most previous reports were about radeonsi.
--- Comment #21 from Lennard lefl65@gmail.com --- Created attachment 133134 --> https://bugs.freedesktop.org/attachment.cgi?id=133134&action=edit dmesg when freeze almost happened
I was able to save my system by switching around TTYs somehow, checked dmesg and got this. Using an R7 260X with radeonsi
--- Comment #22 from Shmerl shtetldik@gmail.com --- Did anyone try to reproduce this bug with AMD kernel that supports display code (i.e. one with Vega support)?
--- Comment #23 from Philipp Überbacher murks@tuxfamily.org --- (In reply to Shmerl from comment #22)
> Did anyone try to reproduce this bug with AMD kernel that supports display code (i.e. one with Vega support)?
The latest kernel I tried this with is 4.12.3, does that qualify? (mesa 17.1.5, xf86-video-amdgpu 1.3.0).
--- Comment #24 from Shmerl shtetldik@gmail.com --- (In reply to Philipp Überbacher from comment #23)
> The latest kernel I tried this with is 4.12.3, does that qualify? (mesa 17.1.5, xf86-video-amdgpu 1.3.0).
Did you build it from here: https://cgit.freedesktop.org/~agd5f/linux/tree/ or use some other method?
--- Comment #25 from Shmerl shtetldik@gmail.com --- Just tested it with stock Linux kernel 4.12.2 (from Debian experimental) and latest Mesa 17.3.0-devel (git-293b3e0a3f). The freeze still happens.
--- Comment #26 from Shmerl shtetldik@gmail.com --- Is there anything else useful that can be done to help Mesa / kernel developers to nail it down?
--- Comment #27 from Samuel Pitoiset samuel.pitoiset@gmail.com --- An apitrace that reproduces the issue would be very useful.
--- Comment #28 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.
There is one example already from Philipp Überbacher above in the comments: https://bugs.freedesktop.org/show_bug.cgi?id=101731#c16
I tried to record this with apitrace too, but strangely, the freeze doesn't happen while it's recording. Somehow the act of recording itself prevents it.
I'll re-record it anyway, and will post here.
--- Comment #29 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.
See the trace here: https://ufile.io/i6czx
It's using Wine 2.14 with these patches: the dark ground patch and:
* ntdll-Grow_Virtual_Heap
* wined3d-buffer_create
* wined3d-sample_c_lz
* wined3d-Copy_Resource_Typeless
* xaudio2-get_al_format
I also commented out the portion that checks for GLX_OML_sync_control (as recommended by Józef Kucia in the Wine bug, since apitrace chokes on GLX_OML_sync_control).
However, while it freezes the system when the game is run on its own in the above configuration, when it's being traced, the freeze doesn't happen.
Anyway, this will probably be of interest to find some issue in Mesa / amdgpu, but otherwise, I figured out that the freeze is gone if Wine is built skipping this patchset: wined3d-Copy_Resource_Typeless.
--- Comment #30 from Shmerl shtetldik@gmail.com --- Actually, even though the above freeze is gone if Wine is built right, there is still a freeze happening around the Velen area (in Devil's Pit). I'm not sure if it's related to the above. I'll try reproducing it, but the above trace should at least give some idea of what to investigate in amdgpu / radeonsi already. Stuff shouldn't just freeze the system.
--- Comment #31 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.
I uploaded another trace: https://ufile.io/9z5yc
It's a problematic area (Devil's Pit) which hangs the game even when the Velen intro works. It doesn't hang when traced, but hangs quite reliably without it when you just turn camera around. Also, due to very intensive load, it's hard to record the trace - everything moves very slowly.
I compressed it with pixz, so you can decompress it faster as well (pixz -d). It's compatible with regular xz if anything.
--- Comment #32 from Shmerl shtetldik@gmail.com --- The problem still happens with kernel 4.13:
OpenGL renderer string: AMD Radeon (TM) RX 480 Graphics (POLARIS10 / DRM 3.18.0 / 4.13.0-rc5-amd64, LLVM 5.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.3.0-devel (git-f24cf82d6d)
I'm using latest Wine master with needed patches.
--- Comment #33 from Shmerl shtetldik@gmail.com --- Created attachment 133707 --> https://bugs.freedesktop.org/attachment.cgi?id=133707&action=edit Save file near freeze area (Devil's Pit, Velen)
Just turn around a bit; looking in the direction of the sun in particular seems to trigger the freeze.
--- Comment #34 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.
Hi Samuel. Any luck with reproducing or narrowing down this problem? The uploaded trace is going to expire soon. Let me know if you need another one, or anything else to help.
--- Comment #35 from Pablo Estigarribia pablodav@gmail.com --- Could it be related to dpm?
In my case I tried many combinations of mesa versions, libdrm and kernels, but after many tests I simply changed dpm to high performance and the freezes stopped. Then I disabled dpm, and I have had no freeze for weeks.
My report: https://bugs.freedesktop.org/show_bug.cgi?id=101976
--- Comment #36 from Shmerl shtetldik@gmail.com --- (In reply to Pablo Estigarribia from comment #35)
> Could it be related to dpm?
> In my case I tried many combinations of mesa versions, libdrm and kernels, but after many tests I simply changed dpm to high performance and the freezes stopped.
I tested your change, setting dpm to high. It didn't help, the freeze is still happening, so it must be something else.
--- Comment #37 from Samuel Pitoiset samuel.pitoiset@gmail.com --- (In reply to Shmerl from comment #34)
> (In reply to Samuel Pitoiset from comment #27)
> > An apitrace that reproduces the issue would be very useful.
> Hi Samuel. Any luck with reproducing or narrowing down this problem? The uploaded trace is going to expire soon. Let me know if you need another one, or anything else to help.
No, I can't reproduce the issue with the trace on my system. I should probably set up a wine install at some point.
--- Comment #38 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #37)
> No, I can't reproduce the issue with the trace on my system. I should probably set up a wine install at some point.
Let me know if you need a GOG key for TW3. I've spoken to GOG Linux folks, and they are willing to help Mesa developers with this.
--- Comment #39 from Shmerl shtetldik@gmail.com --- For reference, I just tested it with Linux 4.13.0 using the amdgpu display code branch from AMD. Unfortunately the freeze still happens with it.
--- Comment #40 from Samuel Pitoiset samuel.pitoiset@gmail.com --- How to load a save game file? Where are they stored?
--- Comment #41 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #40)
> How to load a save game file? Where are they stored?
They should be in:
"${WINEPREFIX}/drive_c/users/$USER/My Documents/The Witcher 3/gamesaves"
I.e. it depends on what prefix you used.
Note that a GOTY save file won't work with other versions, because of some minor incompatibilities. Even the Steam version with all expansions isn't the same as the GOG GOTY one. If you need the latter, let me know. Linux GOG developers said they can provide a key.
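For anyone scripting the copy of a save file into place, the path above can be assembled as follows. This is only a sketch in Python: the helper name `gamesaves_dir` is made up, and falling back to `~/.wine` when `WINEPREFIX` is unset is an assumption based on Wine's default prefix location.

```python
import getpass
import os
from pathlib import Path


def gamesaves_dir(wineprefix: str, user: str) -> Path:
    """Build the save game directory from the path layout given in this comment."""
    return (
        Path(wineprefix)
        / "drive_c"
        / "users"
        / user
        / "My Documents"
        / "The Witcher 3"
        / "gamesaves"
    )


# Assumption: use ~/.wine (Wine's default prefix) when WINEPREFIX is not set.
prefix = os.environ.get("WINEPREFIX", str(Path.home() / ".wine"))
print(gamesaves_dir(prefix, getpass.getuser()))
```

The function only builds the path; it does not check that the directory exists, so the result still depends on which prefix the game was installed into.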
--- Comment #42 from Samuel Pitoiset samuel.pitoiset@gmail.com --- What's your Steam AppID?
Samuel Pitoiset samuel.pitoiset@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|System freeze with AMDGPU   |System freeze with AMDGPU
                   |when playing The Witcher 3  |when playing The Witcher 3
                   |                            |(GOG GOTY)
--- Comment #43 from Lukas Jirkovsky l.jirkovsky@gmail.com --- I'm having the same problem during the initial cutscene in Velen.
Here is some additional information:
* While the computer seems frozen, it's not frozen completely. I can still connect over ssh and do stuff there as if nothing happened. Other services work uninterrupted, too.
* Locally, only SysRq helps. Even after killing everything using Alt+SysRq+i the computer doesn't react to anything apart from more SysRq shortcuts.
* dmesg doesn't contain anything useful
* Xorg.0.log doesn't contain anything useful either (on the wine bug there is a mention about input devices being removed, but that doesn't appear here unless forced using SysRq).
Happens with AMD RX 480 with mesa 17.2.0 and linux kernel 4.13.2
--- Comment #44 from Shmerl shtetldik@gmail.com --- (In reply to Lukas Jirkovsky from comment #43)
> Here is some additional information:
Yes, I observed that as well. You can access the box over ssh, but it doesn't react to any local input. Also, attempts to reboot it remotely hang (systemctl reboot). And the lack of any sensible info in the logs is just strange.
--- Comment #45 from Jan Vesely jan.vesely@rutgers.edu --- (In reply to Shmerl from comment #44)
> (In reply to Lukas Jirkovsky from comment #43)
> > Here is some additional information:
> Yes, I observed that as well. You can access the box over ssh, but it doesn't react to any local input. Also, attempts to reboot it remotely hang (systemctl reboot). And the lack of any sensible info in the logs is just strange.
Sounds like a hung GPU. AFAIK amdgpu.ko does not support GPU timeout/reset yet. You can try resetting the GPU manually via /sys/class/drm/cardX/device/reset
--- Comment #46 from Shmerl shtetldik@gmail.com --- (In reply to Jan Vesely from comment #45)
> Sounds like a hung GPU. AFAIK amdgpu.ko does not support GPU timeout/reset yet. You can try resetting the GPU manually via /sys/class/drm/cardX/device/reset
How exactly, by writing 1 there?
--- Comment #47 from Lukas Jirkovsky l.jirkovsky@gmail.com --- (In reply to Jan Vesely from comment #45)
> Sounds like a hung GPU. AFAIK amdgpu.ko does not support GPU timeout/reset yet. You can try resetting the GPU manually via /sys/class/drm/cardX/device/reset
There's no such file on my system. There is a reset file for other PCI busses, but not for the GPU.
--- Comment #48 from Shmerl shtetldik@gmail.com --- (In reply to Lukas Jirkovsky from comment #47)
> There's no such file on my system. There is a reset file for other PCI busses, but not for the GPU.
I don't have it either for RX 480 card.
--- Comment #49 from Alex Deucher alexdeucher@gmail.com --- You can force a reset by reading /sys/kernel/debug/dri/0/amdgpu_gpu_reset but very few if any applications currently use the GL robustness extensions to query if the context is lost and resubmit their state.
--- Comment #50 from Shmerl shtetldik@gmail.com --- (In reply to Alex Deucher from comment #49)
> You can force a reset by reading /sys/kernel/debug/dri/0/amdgpu_gpu_reset but very few if any applications currently use the GL robustness extensions to query if the context is lost and resubmit their state.
For a test, I tried doing
sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_reset
during normal desktop operation (in this setup it's card 1), and it just messes up KDE / sddm; even restarting sddm isn't enough after that (a soft reboot was enough).
Then I tested it after The Witcher 3 freeze above (remotely, over ssh). That caused a complete hang; even ssh stopped working, so it required a hard reboot.
--- Comment #51 from Samuel Pitoiset samuel.pitoiset@gmail.com --- Created attachment 134349 --> https://bugs.freedesktop.org/attachment.cgi?id=134349&action=edit special varying hack
Guys, can you apply the proposed special hacky patch and try to reproduce the hang? It should, at least, partially "fix" the issue in the Velen area (cf the savegame file).
To be sure the hack is enabled, please redirect stderr (wine witcher3.exe &> log) and look for "*** The Witcher 3 SPECIAL HACK ENABLED ***".
If the game exits with "Aborted! TFB varyings not correctly set!", there is something else, but I wouldn't be surprised as the patch is a huge hack just used to demonstrate the issue. Please report anyways. Thanks!
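The two checks described above can be automated once stderr has been redirected to a file. A minimal Python sketch, assuming the log contains the two marker strings exactly as quoted in this comment; the function name `classify_hack_log` and the returned labels are made up for illustration.

```python
# Marker strings copied from this comment.
HACK_MARKER = "*** The Witcher 3 SPECIAL HACK ENABLED ***"
ABORT_MARKER = "Aborted! TFB varyings not correctly set!"


def classify_hack_log(log_text: str) -> str:
    """Report whether the hack was active and whether the game aborted."""
    if HACK_MARKER not in log_text:
        return "hack not enabled"
    if ABORT_MARKER in log_text:
        return "hack enabled, game aborted"
    return "hack enabled"


# Hypothetical log content for demonstration.
example_log = (
    "ATTENTION: default value of option mesa_glthread overridden by environment.\n"
    "*** The Witcher 3 SPECIAL HACK ENABLED ***\n"
)
print(classify_hack_log(example_log))
```

With a real run, the text would come from the redirected file, for example `classify_hack_log(open("log", errors="replace").read())`.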
--- Comment #52 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #51)
> Guys, can you apply the proposed special hacky patch and try to reproduce the hang? It should, at least, partially "fix" the issue in the Velen area (cf the savegame file).
I applied your hack patch, and here is the output I got (with my other settings active):
ATTENTION: default value of option mesa_glthread overridden by environment.
*** The Witcher 3 SPECIAL HACK ENABLED ***
Aborted! TFB varyings not correctly set!
source->Id = 250
AL lib: (EE) alc_cleanup: 2 devices not closed
The game indeed aborts, rather than hangs there.
--- Comment #53 from Samuel Pitoiset samuel.pitoiset@gmail.com --- Okay, that's expected. Didn't you get some Mesa user errors as well?
But the fact that it no longer hangs is good news, somehow. :)
--- Comment #54 from Shmerl shtetldik@gmail.com --- Created attachment 134355 --> https://bugs.freedesktop.org/attachment.cgi?id=134355&action=edit Hack patch debug run log
Run with MESA_DEBUG=true and Wine logging enabled.
--- Comment #55 from Samuel Pitoiset samuel.pitoiset@gmail.com --- Okay, the hack doesn't work for you, Mesa fails to link because the varying name is not the same.
What version of wine are you using? FWIW, I'm building my local copy from bb16263fe1974851f495435fef9a3d57fa2d4aa9 with all wine-staging patches applied on top of that commit.
--- Comment #56 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #55)
> Okay, the hack doesn't work for you, Mesa fails to link because the varying name is not the same.
> What version of wine are you using? FWIW, I'm building my local copy from bb16263fe1974851f495435fef9a3d57fa2d4aa9 with all wine-staging patches applied on top of that commit.
Ah, I'm not using full staging, but regular Wine (relatively recent master build) with minimal patchsets required to run the game (as described here: https://appdb.winehq.org/objectManager.php?sClass=version&iId=34698#note... ).
Let me try it with full staging 2.17.
--- Comment #57 from Shmerl shtetldik@gmail.com --- Here is the run with Wine staging 2.17 (MESA_DEBUG set):
ATTENTION: default value of option mesa_glthread overridden by environment.
*** The Witcher 3 SPECIAL HACK ENABLED ***
Mesa: User error: GL_INVALID_OPERATION in glGetUniformLocation(program not linked)
Mesa: 244 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glUseProgram(program 662 not linked)
Mesa: 1 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glBeginTransformFeedback(no varyings to record)
Aborted! TFB varyings not correctly set!
source->Id = 250
AL lib: (EE) alc_cleanup: 2 devices not closed
It looks slightly different than before.
--- Comment #58 from Shmerl shtetldik@gmail.com --- I suppose I can also build Wine from that commit and apply all staging patches including past 2.17.
--- Comment #59 from Shmerl shtetldik@gmail.com --- Actually, looks like 2.17 is the last one, so their official build should be just that. It's based on commit bb16263fe1974851f495435fef9a3d57fa2d4aa9
--- Comment #60 from Samuel Pitoiset samuel.pitoiset@gmail.com --- Yeah, I built against the same commit and I'm able to reproduce the link-time error.
--- Comment #61 from Samuel Pitoiset samuel.pitoiset@gmail.com --- Created attachment 134356 --> https://bugs.freedesktop.org/attachment.cgi?id=134356&action=edit updated special varying hack
What about this updated patch? (the previous has to be reverted).
--- Comment #62 from Shmerl shtetldik@gmail.com --- I'll give it a try. Maybe game settings affect what's going on too. For reference, I set everything to max, except hairworks, which is off. Ambient occlusion: HBAO+.
Samuel Pitoiset samuel.pitoiset@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dark.shadow4@web.de
--- Comment #63 from Samuel Pitoiset samuel.pitoiset@gmail.com --- *** Bug 102797 has been marked as a duplicate of this bug. ***
--- Comment #64 from Samuel Pitoiset samuel.pitoiset@gmail.com --- See the attached trace from https://bugs.freedesktop.org/show_bug.cgi?id=102797, it reproduces the same issue.
So, basically the issue is that Wine fails to set the transform feedback varyings in some situations. This explains why the following message is reported: "fixme:d3d_shader:shader_glsl_generate_transform_feedback_varyings Unsupported component range 2-2.". The GPU then hangs later on because it reads garbage from a TFB buffer.
As for TW3, I think the game uses TFB in some scenarios. I don't know why or when; maybe it's based on some occlusion queries or some time constraints? Either way, this might explain why TFB is not used when tracing with apitrace or when using "GALLIUM_DDEBUG=800", which will flush and wait 800ms after every draw call.
The attached patches should work around both issues (TW3 and Superposition), but Wine has to be fixed here.
Please leave the bug open until it's really fixed.
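To check whether a given Wine run hits the same path, its log can be searched for the fixme message quoted above. A minimal Python sketch, assuming the message format is exactly as quoted in this comment; the function name `find_unsupported_ranges` is made up for illustration.

```python
import re

# Pattern built from the fixme line quoted in this comment (assumption:
# other occurrences use the same wording with different numbers).
FIXME_RE = re.compile(
    r"shader_glsl_generate_transform_feedback_varyings "
    r"Unsupported component range (\d+)-(\d+)\."
)


def find_unsupported_ranges(wine_log: str) -> list[tuple[int, int]]:
    """Return every (first, last) component range Wine reported as unsupported."""
    return [(int(a), int(b)) for a, b in FIXME_RE.findall(wine_log)]


wine_log = (
    "fixme:d3d_shader:shader_glsl_generate_transform_feedback_varyings "
    "Unsupported component range 2-2.\n"
)
print(find_unsupported_ranges(wine_log))
```

An empty list only means the message was not logged; it does not prove that the varyings were set correctly.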
--- Comment #65 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #61)
> Created attachment 134356 [details] [review] updated special varying hack
> What about this updated patch? (the previous has to be reverted).
Great! I can confirm, this patch helps both full staging, and regular + minimal patches Wine. Thanks! I'll point Wine developers to this.
--- Comment #66 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #64)
The attached patches should workaround both issues (TW3 and Superposition), but wine has to be fixed here.
Please, let the bug open until it's really fixed.
While Wine does something incorrect here, shouldn't amdgpu/radeonsi still handle these kinds of issues more gracefully? I.e. while Wine should be fixed, I think Mesa shouldn't cause a system freeze when that happens. Can your patch approach be generally useful for making Mesa more resilient, or would some other solution be needed?
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #67 from Lukas Jirkovsky l.jirkovsky@gmail.com --- I can confirm that it works fine here after applying the hack, too.
Anyway, I'm with Shmerl here. In my opinion a user process should never be able to make the system unusable, no matter what kind of stupid stuff it does. I'm fine with the application crashing or behaving incorrectly - it's that application's fault after all. Just don't take the system with it.
Also, great work, thank you!
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #68 from Samuel Pitoiset samuel.pitoiset@gmail.com --- Thanks for confirming that the hack actually works.
Yeah, it would be better not to hang in such a situation, but that's complicated. Though, you can try to boot with amdgpu.lockup_timeout=3000 (i.e. wait 3s) to recover the state when a lockup is detected; it might work.
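The lockup_timeout option mentioned above is an amdgpu module parameter passed on the kernel command line. As a rough sketch, the helper below (the name `lockup_timeout_from_cmdline` is hypothetical) prints the value the running kernel was booted with. How the parameter gets added depends on the boot loader; the GRUB line in the comment is an assumption about a typical setup, not something stated in this bug.

```shell
# Hypothetical helper: print the amdgpu.lockup_timeout value found in a
# kernel command line file (defaults to /proc/cmdline), or nothing if the
# option is absent.
lockup_timeout_from_cmdline() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^amdgpu\.lockup_timeout=//p'
}

# Assumed GRUB setup: add the option in /etc/default/grub, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.lockup_timeout=3000"
# then regenerate the GRUB configuration and reboot. Afterwards:
#   lockup_timeout_from_cmdline
```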
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #69 from aidan@jmad.org --- (In reply to Samuel Pitoiset from comment #61)
> Created attachment 134356
> updated special varying hack
> What about this updated patch? (the previous has to be reverted).
What commit should this patch be applied to? It fails when applying to mesa 17.2.1:
patching file src/mesa/main/transformfeedback.c
Hunk #1 succeeded at 421 with fuzz 1 (offset 14 lines).
Hunk #2 succeeded at 1117 with fuzz 2 (offset 256 lines).
Hunk #3 FAILED at 870.
Hunk #4 FAILED at 879.
2 out of 4 hunks FAILED -- saving rejects to file src/mesa/main/transformfeedback.c.rej
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #70 from Samuel Pitoiset samuel.pitoiset@gmail.com --- (In reply to aidan from comment #69)
> (In reply to Samuel Pitoiset from comment #61)
> > Created attachment 134356
> > updated special varying hack
> > What about this updated patch? (the previous has to be reverted).
> What commit should this patch be applied to? It fails when applying to mesa 17.2.1:
Against git master.
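Before applying an attachment like this to a local tree, a dry run shows whether the hunks still match, which is what failed on 17.2.1 above. A minimal sketch using GNU patch; the function name `patch_applies` and the file names in the usage comment are placeholders. Note that a dry run still reports success when hunks apply with fuzz or an offset.

```shell
# Hypothetical helper: succeeds if the patch file $1 would apply to the
# source tree in $2. Nothing is modified because of --dry-run.
patch_applies() {
    patch -d "$2" -p1 --dry-run --force < "$1" > /dev/null 2>&1
}

# Example use (paths are placeholders):
#   patch_applies special-varying-hack.patch ./mesa && echo "applies"
```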
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #71 from Lukas Jirkovsky l.jirkovsky@gmail.com --- Created attachment 134411 --> https://bugs.freedesktop.org/attachment.cgi?id=134411&action=edit special varying hack backport 17.2.1
Backported the patch to apply on 17.2.1
aidan: you can use this patch.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #72 from Shmerl shtetldik@gmail.com --- (In reply to Samuel Pitoiset from comment #68)
> Thanks for confirming that the hack actually works.
> Yeah, it would be better to not hang in such situation but that's complicated.
Would that require changes to the kernel driver?
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #73 from Shmerl shtetldik@gmail.com --- Józef Kucia made a hack patch for Wine to prevent the freeze: https://bugs.winehq.org/show_bug.cgi?id=43273#c43
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #74 from mirh mirh@protonmail.ch --- Wine's role should just be to keep its own.. stuff from misbehaving.
But as for the freeze itself, I'd expect a bug in amdgpu if the user-level bug was *allowed* to escalate to a kernel one.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #75 from Shmerl shtetldik@gmail.com --- I made a variant of this hack for Wine itself: https://bugs.winehq.org/attachment.cgi?id=59387
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #76 from Shmerl shtetldik@gmail.com --- Unlike the previous one, it's minimal and doesn't conflict with various staging patches that are also useful for TW3.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
Józef Kucia joseph.kucia@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.winehq.org/show_bug.cgi?id=43273
--- Comment #77 from Józef Kucia joseph.kucia@gmail.com --- This bug should now be fixed in the main Wine git tree. The fix will be included in the next development release.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
Józef Kucia joseph.kucia@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |NOTOURBUG
--- Comment #78 from Józef Kucia joseph.kucia@gmail.com --- Fixed in Wine 2.21
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #79 from Fabian Maurer dark.shadow4@web.de --- Nice to hear the bug is fixed in wine, but the mesa bug still exists, so the resolution is wrong. It's simply not acceptable for a driver to freeze the system if an application misbehaves.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
mirh mirh@protonmail.ch changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|NOTOURBUG                   |---
--- Comment #80 from mirh mirh@protonmail.ch --- I also agree with Fabian. An application going crazy with its own business is totally not a "problem of the driver".. But compromising system stability definitely is.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #81 from Józef Kucia joseph.kucia@gmail.com --- (In reply to mirh from comment #80)
> I also agree with Fabian. Application going crazy with its own business is totally not a "problem of the driver".. But compromising system stability definitively is.
If you want to make this bug about unreliable/unimplemented GPU resets in amdgpu.ko, then it is filed against the wrong component. AFAIK there is nothing to fix in Mesa. Other than that, the bug is full of comments about the source of the GPU hang. It may be better to file a new bug for implementing/fixing GPU resets in amdgpu.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
mirh mirh@protonmail.ch changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |NOTOURBUG
--- Comment #82 from mirh mirh@protonmail.ch --- Guess it makes sense. A thread per "actual issue to fix".
Even so, it should take nothing to just change the component from Mesa to DRI :p
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #83 from Samuel Pitoiset samuel.pitoiset@gmail.com --- I do agree with Jozef, it's really a different issue than the one initially filed here. Thanks again for fixing this!
https://bugs.freedesktop.org/show_bug.cgi?id=101731
Christian König ckoenig.leichtzumerken@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
--- Comment #84 from Christian König ckoenig.leichtzumerken@gmail.com --- Guys please keep in mind that GPUs are programmable processors.
So when an application sends a shader with an infinite loop to the driver, there is absolutely nothing the driver/Mesa stack can do about that.
As Jozef correctly pointed out, the best thing we can do is reset the GPU after a timeout, but that is really complex and doesn't work all the time.
Anyway closing this bug since the original issue is fixed.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #85 from Dmitry terapy-session@bk.ru --- This problem is still present with the new Mesa 18.3.1. With AMDVLK there is no such problem, but I would rather not use it.
https://bugs.freedesktop.org/show_bug.cgi?id=101731
--- Comment #86 from Shmerl shtetldik@gmail.com --- (In reply to Dmitry from comment #85)
> This problem is still there on the new Mesa 18.3.1. On AMDLK there is no such problem, but I would not like to use it.
This bug was about radeonsi, not about Vulkan drivers.