https://bugs.freedesktop.org/show_bug.cgi?id=65968
Priority: medium
Bug ID: 65968
Assignee: dri-devel@lists.freedesktop.org
Summary: Massive memory corruption in Planetary Annihilation Alpha
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: andreas.ringlstetter@gmail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: git
Component: Drivers/DRI/r300
Product: Mesa
Created attachment 81105 --> https://bugs.freedesktop.org/attachment.cgi?id=81105&action=edit Example of corruption in PA. The skybox texture has been completely overwritten, partly with textures from other programs; corruption in other textures is already starting.
Using the R300 driver (git version from 2013-06-19) on a Mobility Radeon X1400 (128MB dedicated ???), I get massive memory corruption which can be seen in the attached screenshot when running the Planetary Annihilation Alpha.
The game makes use of virtual texturing, that is, a mega texture which cannot possibly fit into RAM in one piece.
However, it appears that textures which are NOT part of the mega texture have been mapped into the same address space. I could see other textures, and even bitmaps from other applications.
In the screenshot, there are large grey stripes for example, however there is no such texture in the game. The color does match the color of the window border though. Performing further tests, I even managed to get parts of album covers from Banshee into PA.
This issue is not limited to Planetary Annihilation, though. The corruption also works the other way around, with applications overwriting the bitmaps of other applications.
The effects of the corruption are clearly visible in PA due to the large textures. They are not deterministic, but appear very reliably, most likely due to the high memory usage.
Using other applications which frequently allocate new textures (like Banshee with album covers) speeds up the corruption and even makes it visible in other applications like Firefox, Cinnamon etc., although not reliably.
Attached are:
- Screenshot of corruption
- Xorg log
- glxinfo output
--- Comment #1 from Andreas Ringlstetter andreas.ringlstetter@gmail.com --- Created attachment 81106 --> https://bugs.freedesktop.org/attachment.cgi?id=81106&action=edit Xorg log
--- Comment #2 from Andreas Ringlstetter andreas.ringlstetter@gmail.com --- Created attachment 81107 --> https://bugs.freedesktop.org/attachment.cgi?id=81107&action=edit glxinfo log
Andreas Ringlstetter andreas.ringlstetter@gmail.com changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #81105 is obsolete |0       |1
--- Comment #3 from Andreas Ringlstetter andreas.ringlstetter@gmail.com --- Created attachment 81108 --> https://bugs.freedesktop.org/attachment.cgi?id=81108&action=edit Example of corruption in PA. The skybox texture has been completely overwritten, partly with textures from other programs; corruption in other textures is already starting.
--- Comment #4 from Andreas Ringlstetter andreas.ringlstetter@gmail.com --- PS: I will not be able to test with 9.0 or 9.1, since one of the shaders causes a segfault while compiling in these versions. This has only recently (within the last 1-2 months) been fixed in git.
This was caused by a faulty implementation of peephole_mul_omod() in compiler/radeon_optimize.c. The SIGSEGV was raised in rc_variable_list_get_writers_one_reader because writer_list was NULL.
Alex Deucher agd5f@yahoo.com changed:
What                        |Removed    |Added
----------------------------------------------------------------------------
Attachment #81105 mime type |text/plain |image/jpeg
Andreas Boll andreas.boll.dev@gmail.com changed:
What      |Removed          |Added
----------------------------------------------------------------------------
Component |Drivers/DRI/r300 |Drivers/Gallium/r300
--- Comment #5 from Andreas Boll andreas.boll.dev@gmail.com --- You could try setting the env var RADEON_DEBUG=noopt, maybe it helps. Additionally you should be able to test 9.0 and 9.1 with this env var.
RADEON_DEBUG=help prints some other debug flags you could try, e.g. to disable hyper-z or msaa.
--- Comment #6 from Andreas Ringlstetter andreas.ringlstetter@gmail.com --- RADEON_DEBUG=noopt is not possible; the pixel shader programs are too big to be loaded without size optimizations. There is a hard limit of 512 instruction slots per pixel shader: http://developer.amd.com/wordpress/media/2012/10/Radeon_X1x00_Programming_Gu... (page 13). This limit is exceeded by far due to all the virtual texturing code; the optimized shader barely fits.
I did try it in 9.0 and 9.1 with noopt and got past the segfault in peephole_mul_omod() this way, but it then failed because the resulting shader program was too big.
Deactivating hyper-z has no measurable impact, and it didn't prevent the corruption either.
Antialiasing hasn't even been enabled in the application by default, so turning it off makes no difference at all.
--- Comment #7 from Timothy Arceri t_arceri@yahoo.com.au --- Planetary Annihilation is using compat profile. When I override the Mesa version with MESA_GL_VERSION_OVERRIDE=3.1COMPAT the corruptions are fixed but it later crashes.
--- Comment #8 from Timothy Arceri t_arceri@yahoo.com.au --- Actually no I take that back it is using core profile.
--- Comment #9 from Timothy Arceri t_arceri@yahoo.com.au --- (In reply to Timothy Arceri from comment #8)
> Actually no I take that back it is using core profile.
It's requesting a core profile and using compat features.
--- Comment #10 from Timothy Arceri t_arceri@yahoo.com.au --- (In reply to Timothy Arceri from comment #9)
> (In reply to Timothy Arceri from comment #8)
> > Actually no I take that back it is using core profile.
> It's requesting a core profile and using compat features.
Actually I'm not sure that's true either. Anyway, here is a trace (warning: it's 3.6GB).
https://drive.google.com/open?id=0B-f68fD4PtpBenBiekxITllIbzg
--- Comment #11 from Timothy Arceri t_arceri@yahoo.com.au --- The game runs (mostly) fine on i965, and a trace from i965 seems to run without issue on radeonsi.
However running the radeonsi trace on the nvidia blob results in the same corruptions.
Andreas Ringlstetter andreas.ringlstetter@gmail.com changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |INVALID
--- Comment #12 from Andreas Ringlstetter andreas.ringlstetter@gmail.com --- It's a bug in PA itself, not in Mesa.
The root cause is a race condition on the shared buffer which is used to transfer the rendered HTML UI from the Coherent host process back to PA.
There is a missing mutex inside PA when the buffer gets reallocated as a result of a window resize event. Effectively, this results in a use-after-free by the render thread of the PA process.
The faster the realloc, the lower the chance of this bug occurring. Whether it shows up also depends on whether protections against use-after-free on previously shared buffers are in place, and on the memory allocation strategy: reusing the same memory region without clearing it leads to the most visible effect.
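The missing synchronization described above can be sketched roughly as follows. This is only a minimal C++ illustration, not PA's or Coherent's actual code: the class UiFrameBuffer and all of its methods are hypothetical names, and an in-process std::mutex stands in for whatever cross-process mechanism the real shared buffer would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the buffer carrying the rendered UI frame.
class UiFrameBuffer {
public:
    UiFrameBuffer(std::size_t width, std::size_t height)
        : width_(width), height_(height), pixels_(width * height, 0) {}

    // Window resize: the storage is reallocated while holding the lock,
    // so no reader can be in the middle of reading the old storage.
    void resize(std::size_t width, std::size_t height) {
        std::lock_guard<std::mutex> guard(mutex_);
        width_ = width;
        height_ = height;
        pixels_.assign(width * height, 0);
    }

    // Producer side: publish a frame. A frame rendered for an outdated
    // size is rejected instead of being written past the new bounds.
    bool write(const std::vector<std::uint32_t>& frame) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (frame.size() != pixels_.size()) {
            return false;
        }
        pixels_ = frame;
        return true;
    }

    // Render thread: copy the frame out under the same lock rather than
    // keeping a raw pointer that a concurrent resize could invalidate.
    std::vector<std::uint32_t> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(pixels_.size() == width_ * height_);
        return pixels_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t width_;
    std::size_t height_;
    std::vector<std::uint32_t> pixels_;
};
```

The point of the sketch is that resize and every read take the same lock; without it, the reader could still be using the old storage after it has been freed.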
Unfortunately, various Mesa drivers do not wipe the video memory after a buffer has been returned to the global pool!
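To illustrate what wiping on return would mean, here is a small C++ sketch of a pool that clears a buffer before making it available for reuse. It is purely illustrative and makes no claim about how Mesa or any driver manages video memory: ScrubbingPool, acquire() and release() are made-up names, and plain host memory stands in for VRAM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical buffer pool; host memory stands in for video memory.
class ScrubbingPool {
public:
    using Buffer = std::vector<std::uint8_t>;

    // Hand out a buffer of the requested size, reusing a returned
    // buffer of the same size if one is available.
    Buffer acquire(std::size_t size) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->size() == size) {
                Buffer reused = std::move(*it);
                free_.erase(it);
                return reused;
            }
        }
        return Buffer(size, 0);
    }

    // Zero the contents before the buffer goes back into the pool, so
    // the next owner never sees the previous owner's data.
    void release(Buffer buffer) {
        std::fill(buffer.begin(), buffer.end(), 0);
        free_.push_back(std::move(buffer));
    }

private:
    std::vector<Buffer> free_;
};
```

With a pool like this, a stale reader would see zeroed memory rather than another application's bitmaps. The use-after-free itself would remain a bug in the application.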