https://bugs.freedesktop.org/show_bug.cgi?id=93301
Bug ID: 93301 Summary: ns2_linux32: radeon VM fault / VM_CONTEXT1_PROTECTION_FAULT_ADDR Product: Mesa Version: git Hardware: Other OS: All Status: NEW Severity: normal Priority: medium Component: Drivers/Gallium/radeonsi Assignee: dri-devel@lists.freedesktop.org Reporter: dev@illwieckz.net QA Contact: dri-devel@lists.freedesktop.org
Created attachment 120427 --> https://bugs.freedesktop.org/attachment.cgi?id=120427&action=edit syslog (Linux 4.3, mesa git, radeon hawaii r9 390X)
Hi I (re) tried Natural Selection 2 yesterday, and I it crashed (as expected), and this time the syslog said:
``` Dec 8 07:20:25 gollum kernel: [16588.241364] radeon 0000:01:00.0: GPU fault detected: 146 0x0e65e014 Dec 8 07:20:25 gollum kernel: [16588.241368] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00007673 Dec 8 07:20:25 gollum kernel: [16588.241369] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050E0014 Dec 8 07:20:25 gollum kernel: [16588.241370] VM fault (0x04, vmid 2) at page 30323, write from 'CB1' (0x43423100) (224) ---8<--- 120 000+ lines --------------------- Dec 8 07:22:58 gollum kernel: [16741.758495] radeon 0000:01:00.0: GPU fault detected: 146 0x06a59014 Dec 8 07:22:58 gollum kernel: [16741.758506] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00007673 Dec 8 07:22:58 gollum kernel: [16741.758512] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05150014 Dec 8 07:22:58 gollum kernel: [16741.758518] VM fault (0x04, vmid 2) at page 30323, write from 'CB6' (0x43423600) (336) Dec 8 07:23:03 gollum kernel: [16746.694729] ns2_linux32[16192]: segfault at 1 ip 00000000f3bf8d12 sp 00000000f0b9adf8 error 4 in libc-2.21.so[f3acc000+1b4000] ```
Well, I'm not expecting to see this game working one day, I own this game since 2013-12-15 (it's almost two years) and it never worked, so these guys proved their product is a joke and I don't expect nothing from them. But perhaps this bug can help you to track things on your side, that's why I report it.
When the error happened, NS2 was loading assets since 20min to load a tutorial map.
I don't have more logs currently but if you have some ideas of things I can record, I can try to reproduce it. I agree the current log is a bit quiet.
I've already seen radeon errors like that before in a dpm issue (see #93288 ) but the current bug is unrelated to dpm (dpm was inactive in the current case).
I use a Radeon R9 390X with Mesa git on Linux 4.3 here.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Thomas DEBESSE dev@illwieckz.net changed:
What |Removed |Added ---------------------------------------------------------------------------- OS|All |Linux (All)
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #1 from Nicolai Hähnle nhaehnle@gmail.com --- Hi Thomas! The VM faults are in backend writes. Can you try running with R600_DEBUG=nohyperz and/or nodcc?
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #2 from Michel Dänzer michel@daenzer.net --- Note that the game crash probably isn't directly related to the GPUVM faults. We'd need to see backtrace of the crash (preferably from gdb) to say if the crash is due to a bug in the game or Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #3 from Thomas DEBESSE dev@illwieckz.net --- (In reply to Michel Dänzer from comment #2)
Note that the game crash probably isn't directly related to the GPUVM faults.
Perhaps, but they were spamed right before the crash and stopped with it.
We'd need to see backtrace of the crash (preferably from gdb) to say if the crash is due to a bug in the game or Mesa.
I will try that.
(In reply to Nicolai Hähnle from comment #1)
Hi Thomas! The VM faults are in backend writes. Can you try running with R600_DEBUG=nohyperz and/or nodcc?
Right now I started it with R600_DEBUG=nohyperz and I get the output I’ll join.
On stdout side I got a 'radeon: mmap failed, errno: 12' error and on the syslog side I got a standard segfault message. There was no other message in syslog.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #4 from Thomas DEBESSE dev@illwieckz.net --- Created attachment 120429 --> https://bugs.freedesktop.org/attachment.cgi?id=120429&action=edit ns2 stdout (nohyperz)
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #5 from Thomas DEBESSE dev@illwieckz.net --- Good to know, using R600_DEBUG=nodcc the game loaded fine. There was many 'radeon: mmap failed, errno: 1' lines printed but the games does not crashed on graphical rendering (it crashed while playing a weapon sound). I was able to load a map and move in it, so that nodcc option workarounds the graphical rendering issue.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #6 from Thomas DEBESSE dev@illwieckz.net --- Created attachment 120430 --> https://bugs.freedesktop.org/attachment.cgi?id=120430&action=edit ns2 stdout (nodcc)
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #7 from Thomas DEBESSE dev@illwieckz.net --- Created attachment 120431 --> https://bugs.freedesktop.org/attachment.cgi?id=120431&action=edit ns2 stdout (default)
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #8 from Thomas DEBESSE dev@illwieckz.net --- Created attachment 120432 --> https://bugs.freedesktop.org/attachment.cgi?id=120432&action=edit ns2 stdout + gdb backtrace
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Thomas DEBESSE dev@illwieckz.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #120432|0 |1 is obsolete| |
--- Comment #9 from Thomas DEBESSE dev@illwieckz.net --- Created attachment 120433 --> https://bugs.freedesktop.org/attachment.cgi?id=120433&action=edit ns2 stdout + gdb full backtrace + libdrm-radeon1-dbg symbols
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #10 from Thomas DEBESSE dev@illwieckz.net --- So, I added a gdb backtrace, I guess the interesting part is:
---8<------------- Thread 31 (Thread 0xe04e8b40 (LWP 26311)): #0 0xf7fd9be5 in __kernel_vsyscall () No symbol table info available. #1 0xf478025c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/i386-linux-gnu/libpthread.so.0 No symbol table info available. #2 0xe2e9cabc in radeon_drm_cs_emit_ioctl (param=0x8386300) at ../../../../../../../include/c11/threads_posix.h:159 ws = 0x8386300 cs = <optimized out> i = <optimized out> #3 0xe2e9c117 in impl_thrd_routine (p=0x856a708) at ../../../../../../../include/c11/threads_posix.h:87 pack = {func = 0xe2e9ca60 <radeon_drm_cs_emit_ioctl>, arg = 0x8386300} #4 0xf477b1aa in start_thread () from /lib/i386-linux-gnu/libpthread.so.0 No symbol table info available. #5 0xf44b5fde in clone () from /lib/i386-linux-gnu/libc.so.6 ------------->8---
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Nicolai Hähnle nhaehnle@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|ns2_linux32: radeon VM |ns2_linux32: radeon VM |fault / |fault on Hawaii w/ DCC |VM_CONTEXT1_PROTECTION_FAUL |enabled (+mmap errors) |T_ADDR |
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Nicolai Hähnle nhaehnle@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|ns2_linux32: radeon VM |ns2_linux32: radeon VM |fault on Hawaii w/ DCC |fault on Hawaii (+mmap |enabled (+mmap errors) |errors)
--- Comment #11 from Nicolai Hähnle nhaehnle@gmail.com --- Thanks for the followup. The last crash you reported
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #12 from Nicolai Hähnle nhaehnle@gmail.com --- Thank you for the followup.
Concerning your last backtrace: The crash is in a different thread than the one you mentioned, but the symbols in the backtrace suggest that it happens while updating a buffer, so the crash is likely a consequence of the failed mmap and the game not checking whether the map succeeded.
I'm noticing a lot of other error messages in stdout, though, such as the "Failed to launch child process" and something about allocation failures that is unrelated to Mesa. Perhaps the game's virtual address space is simply getting full? It does call itself a 32 bit program.
I'm a bit surprised in hindsight that nodcc had an effect. Now that I think about it, Hawaii doesn't actually have DCC, does it?
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #13 from Alex Deucher alexdeucher@gmail.com --- (In reply to Nicolai Hähnle from comment #12)
I'm a bit surprised in hindsight that nodcc had an effect. Now that I think about it, Hawaii doesn't actually have DCC, does it?
Right. Hawaii (CI) does not have DCC, only VI parts do.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #14 from Marek Olšák maraeo@gmail.com --- No, Hawaii doesn't have DCC.
The issue can indeed be that there is not enough address space. Mesa doesn't unmap most buffers, because mmap is a very costly operation. Such buffers are only unmapped when destroyed. This is obviously a problem on 32-bit. Possible solutions:
1) Always unmap buffers at appropriate places. This can have a significant performance cost in CPU-bound apps.
2) Clear pb_cache on mmap failure and try mmap again.
(pb_cache typically contains staging buffers, which are always mapped. Therefore, pb_cache completely hides allocation, GPU mmap, and CPU mmap overhead, but it also contributes to address space usage.)
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Nicolai Hähnle nhaehnle@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |nhaehnle@gmail.com
--- Comment #15 from Nicolai Hähnle nhaehnle@gmail.com --- Created attachment 120440 --> https://bugs.freedesktop.org/attachment.cgi?id=120440&action=edit 0001-XXX-winsys-radeon-limit-buffer-cache-size.patch
Following the hypothesis that your game is running out of address space, here is a quick patch for Mesa that limits the buffer cache size to 4MB as a simple test.
Try this (feel free to play around with the number in the patch), plus reducing any graphics quality options the game may have to the minimum. Don't be surprised if performance becomes worse.
Please also set the environment variable R600_DEBUG=info and provide the output, as this allows us to see the default buffer cache size.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #16 from Thomas DEBESSE dev@illwieckz.net --- OK. Thanks, I will try all these things. I just have one question, since this software is running in 32 bit mode, does it mean I have to recompile mesa in 32bit too ?
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #17 from Marek Olšák maraeo@gmail.com --- (In reply to Thomas DEBESSE from comment #16)
OK. Thanks, I will try all these things. I just have one question, since this software is running in 32 bit mode, does it mean I have to recompile mesa in 32bit too ?
You only have to recompile Mesa 32-bit, but not 64-bit.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #18 from Thomas DEBESSE dev@illwieckz.net --- OK, I patched and recompiled mesa, but it changes nothing (see incoming log).
Setting R600_DEBUG=info I get:
---8<------------- radeon_drm_winsys_create: create cache of 4194304 bytes pci_id = 0x67b0 family = 57 chip_class = 9 gart_size = 2043 MB vram_size = 8192 MB max_sclk = 1080 max_compute_units = 44 max_se = 4 max_sh_per_se = 1 drm = 2.43.0 has_uvd = 1 vce_fw_version = 671220224 r600_num_backends = 16 r600_clock_crystal_freq = 27000 r600_tiling_config = 0x1023 r600_num_tile_pipes = 16 r600_max_pipes = 11 r600_virtual_address = 1 r600_has_dma = 1 r600_backend_map = 0 r600_backend_map_valid = 1 si_tile_mode_array_valid = 1 cik_macrotile_mode_array_valid = 1 Num displays: 1 ------------->8---
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #19 from Thomas DEBESSE dev@illwieckz.net --- Created attachment 120443 --> https://bugs.freedesktop.org/attachment.cgi?id=120443&action=edit ns2 stdout + gdb full backtrace + libdrm-radeon1-dbg symbols / patched mesa
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Thomas DEBESSE dev@illwieckz.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #120443|ns2 stdout + gdb full |ns2 stdout + gdb full description|backtrace + |backtrace + |libdrm-radeon1-dbg symbols |libdrm-radeon1-dbg symbols |/ patched mesa |/ patched mesa + | |R600_DEBUG=info
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #20 from Marek Olšák maraeo@gmail.com --- (In reply to Thomas DEBESSE from comment #19)
Created attachment 120443 [details] ns2 stdout + gdb full backtrace + libdrm-radeon1-dbg symbols / patched mesa
- R600_DEBUG=info
This looks like the mmap error has been fixed.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #21 from Nicolai Hähnle nhaehnle@gmail.com --- Indeed. You'll also find that thread 1 commits suicide because TCMalloc cannot find more memory. So it's definitely still an out of memory condition, but without more evidence to the contrary I'm inclined to say it's probably not the OpenGL driver's fault.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #22 from Marek Olšák maraeo@gmail.com --- Yeah, it could be a memory leak of some kind. I recommend running the game with this environment variable: GALLIUM_HUD=VRAM-usage,GTT-usage You should see if it's leaking GPU buffers. You can also try running it with valgrind, but exit the game before it crashes.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
famo richard.llom@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |richard.llom@gmail.com
--- Comment #23 from famo richard.llom@gmail.com --- Created attachment 128951 --> https://bugs.freedesktop.org/attachment.cgi?id=128951&action=edit GTT and VRAM usage after 30 min play
Here is a screenshot of GTT and VRAM usage after 30min of play.
From these graphs alone I couldn't make out any memory leaks.
However the game crashed soon afterwards on an new map.
System Info: Card: Advanced Micro Devices [AMD/ATI] Bonaire XTX [Radeon R7 260X]
glxinfo | grep OpenGL OpenGL vendor string: X.Org OpenGL renderer string: Gallium 0.4 on AMD BONAIRE (DRM 2.46.0 / 4.8.6-1-CHAKRA, LLVM 3.9.1) OpenGL core profile version string: 4.3 (Core Profile) Mesa 13.0.2 OpenGL core profile shading language version string: 4.30 OpenGL core profile context flags: (none) OpenGL core profile profile mask: core profile OpenGL core profile extensions: OpenGL version string: 3.0 Mesa 13.0.2 OpenGL shading language version string: 1.30 OpenGL context flags: (none)
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #24 from famo richard.llom@gmail.com --- Created attachment 128953 --> https://bugs.freedesktop.org/attachment.cgi?id=128953&action=edit GTT and VRAM on crash
Here is another screenshot, taken right on a crash.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #25 from Kamil Páral kamil.paral@gmail.com --- Please note that more people report crashing on radeonsi after some time of playing: http://forums.unknownworlds.com/discussion/comment/2330666/#Comment_2330666 This is with "Failed to allocate" errors.
For me personally, NS2 didn't even start on older Mesa (mmap failed), now it starts, but as people report, seems to crash after some time. I personally never played that long (because loading the shaders takes an eternity).
I'd guess this is simply the game running out of 32bit space, and Mesa drivers being more memory hungry than proprietary drivers. The devs had OOM problems even with nvidia drivers in the past and had to cut down on memory usage. They said they can't even add more skins because they're that close to the RAM limit. They are working on a 64bit version of the game, which should hopefully fix all this.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #26 from famo richard.llom@gmail.com --- (In reply to Kamil Páral from comment #25)
Please note that more people report crashing on radeonsi after some time of playing: http://forums.unknownworlds.com/discussion/comment/2330666/#Comment_2330666 This is with "Failed to allocate" errors.
I know (and if you compare the system info, you'll find a lot similarities ... ;-)
For me personally, NS2 didn't even start on older Mesa (mmap failed), now it starts, but as people report, seems to crash after some time. I personally never played that long (because loading the shaders takes an eternity).
Yeah, same here, just go it to load wite mesa 13 and llvm 3.9.1. With regard to loading times, I guess shader-on-disk cache should help here...
I'd guess this is simply the game running out of 32bit space, and Mesa drivers being more memory hungry than proprietary drivers. The devs had OOM problems even with nvidia drivers in the past and had to cut down on memory usage. They said they can't even add more skins because they're that close to the RAM limit. They are working on a 64bit version of the game, which should hopefully fix all this.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #27 from Thomas DEBESSE dev@illwieckz.net --- On my end I re-tried to play ns2_linux32 with my R9 390X two days ago running Linux 4.9 (amdgpu), Mesa 17~git and LLVM 4~svn and I was able to play a complete alien tutorial without crashes and the rendering looked OK to me. By the way I haven't seen the game running on another platform so I can miss something, but everything looked acceptable, I don't know if it's the expected rendering, but it's a nice, acceptable and honest rendering. I've never played too much this game to be able to hit the 32bit memory limit.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #28 from famo richard.llom@gmail.com --- I just want to report that with new mesa and cache NS2 runs as smooth as butter, thanks to the devs!
System: Host: cray Kernel: 4.12.4-1-CHAKRA x86_64 (64 bit) Desktop: KDE Plasma 5.10.4 Distro: Chakra Machine: Device: desktop Mobo: ASUSTeK model: A88XM-PLUS v: Rev X.0x UEFI: American Megatrends v: 3003 date: 03/04/2017 CPU: Quad core AMD A10-7850K Radeon R7 12 Compute Cores 4C+8G (-MCP-) cache: 8192 KB clock speeds: max: 4200 MHz 1: 4200 MHz 2: 3000 MHz 3: 3000 MHz 4: 3000 MHz Graphics: Card: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] Display Server: x11 (X.Org 1.19.3) driver: amdgpu Resolution: 2560x1440@119.88hz OpenGL: renderer: Gallium 0.4 on AMD FIJI (DRM 3.15.0 / 4.12.4-1-CHAKRA, LLVM 4.0.1) version: 4.5 Mesa 17.1.5 Audio: Card-1 Advanced Micro Devices [AMD] FCH Azalia Controller driver: snd_hda_intel Card-2 Advanced Micro Devices [AMD/ATI] Fiji HDMI/DP Audio Controller driver: snd_hda_intel Sound: Advanced Linux Sound Architecture v: k4.12.4-1-CHAKRA Network: Card: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Sensors: System Temperatures: cpu: 44.0C mobo: 34.0C gpu: 41.0 Fan Speeds (in rpm): fan-1: 826 fan-2: 841 fan-3: 717 Info: Processes: 213 Uptime: 11:34 Memory: 4278.6/15994.4MB Client: Shell (bash) inxi: 2.3.23
https://bugs.freedesktop.org/show_bug.cgi?id=93301
--- Comment #29 from Kamil Páral kamil.paral@gmail.com --- Yes, I can confirm NS2 works well now. One thing to note is that when compiling the shaders, the memory usage of the game is much higher than when running from the cache (even *after* the compilation has ended), I'm not sure whether it's a game problem or driver problem of not freeing all the memory. In that case, it's still possible to hit the 32b memory limit. But when running from cache, I've seen no memory issue and rendering is fine.
https://bugs.freedesktop.org/show_bug.cgi?id=93301
Nicolai Hähnle nhaehnle@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution|--- |FIXED
--- Comment #30 from Nicolai Hähnle nhaehnle@gmail.com --- This seems to be fixed. If the high memory use is still an issue some place, it seems best to open a new bug.
dri-devel@lists.freedesktop.org