https://bugs.freedesktop.org/show_bug.cgi?id=108824
Bug ID: 108824
Summary: Invalid handling when GL buffer is bound on one context and invalidated on another
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: baldurk@baldurk.org
QA Contact: dri-devel@lists.freedesktop.org

Created attachment 142556
  --> https://bugs.freedesktop.org/attachment.cgi?id=142556&action=edit
piglit test showing broken behaviour
I found some odd behaviour that I think I've tracked down to some incorrect handling of buffer invalidation in radeonsi.
The rough order of events is:
1. Create a buffer that's shared between two contexts. Ensure it's bound as a UBO on both.
2. Invalidate the buffer with e.g. glMapBufferRange(GL_MAP_INVALIDATE_BUFFER_BIT) on context A.
3. Context B's buffer bind is now in a bad state. Rendering will have unpredictable results, and invalidating the buffer again on context B may fail.
That's a bit vague but that's the general repro that I know for sure. This will then result in unpredictable reads/garbage data, and quite likely you'll eventually hit the assert on src/gallium/drivers/radeonsi/si_descriptors.c:1489 - assert(old_buf_va <= old_desc_va);
My understanding is that whenever an invalidate happens, the radeonsi code looks through all bound buffers and fixes up the descriptors: it subtracts the old buffer's outgoing VA from the descriptor's VA to get the offset, then adds that offset onto the incoming VA and updates the descriptor.
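The fixup arithmetic described above can be sketched in a few lines of C. This is only an illustration of my understanding, not the actual radeonsi code: the function name rebase_descriptor_va is made up, and only the variable names old_buf_va and old_desc_va and the assert come from si_descriptors.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the real radeonsi function): move a descriptor's
 * GPU virtual address from the old buffer storage onto the new storage,
 * keeping the same offset into the buffer. */
static uint64_t rebase_descriptor_va(uint64_t old_desc_va,
                                     uint64_t old_buf_va,
                                     uint64_t new_buf_va)
{
    /* Same check as the assert quoted in this report: the descriptor must
     * point at or after the start of the outgoing buffer storage. */
    assert(old_buf_va <= old_desc_va);

    /* Offset of the descriptor within the old storage. */
    uint64_t offset = old_desc_va - old_buf_va;

    /* Same offset, applied to the incoming storage. */
    return new_buf_va + offset;
}
```

The subtraction is only meaningful if the descriptor really does point into the outgoing storage, which is exactly what the assert guards.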
The problem seems to be that on a buffer invalidate it only checks the current context's bound buffers, so other contexts don't have their descriptors updated. That means their descriptors still point at the old VA, and if an invalidate then happens on the second context, its descriptor refers to an even older VA than the outgoing VA, so the subtraction no longer makes sense.
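A toy model of that failure, assuming the behaviour is as I describe it. All names here (toy_buffer, toy_context, toy_invalidate, toy_repro) and the addresses are hypothetical and only stand in for the driver state. Instead of asserting, the toy returns false where the driver's assert would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_buffer  { uint64_t va; };      /* shared buffer: current storage VA */
struct toy_context { uint64_t desc_va; }; /* per-context descriptor for that buffer */

/* Invalidate buf from ctx: the storage is replaced for everyone, but only
 * the calling context's descriptor is fixed up (the suspected bug).
 * Returns false where the driver would hit assert(old_buf_va <= old_desc_va). */
static bool toy_invalidate(struct toy_context *ctx, struct toy_buffer *buf,
                           uint64_t new_buf_va)
{
    assert(ctx && buf);

    uint64_t old_buf_va = buf->va;
    buf->va = new_buf_va;

    if (old_buf_va > ctx->desc_va)
        return false; /* stale descriptor: subtraction would underflow */

    ctx->desc_va = new_buf_va + (ctx->desc_va - old_buf_va);
    return true;
}

/* Returns true if the sequence from this report misbehaves as described. */
static bool toy_repro(void)
{
    struct toy_buffer  buf = { 0x3000 };
    struct toy_context a = { 0x3040 }, b = { 0x3040 };

    /* Context A invalidates; only A's descriptor is moved to 0x5040. */
    bool ok_a = toy_invalidate(&a, &buf, 0x5000);

    /* Context B still points at 0x3040, below the outgoing VA 0x5000. */
    bool ok_b = toy_invalidate(&b, &buf, 0x7000);

    return ok_a && !ok_b;
}
```

If the replacement storage happened to land at a lower address than the stale descriptor, the check would pass and the descriptor would silently get a meaningless offset, which would fit the garbage reads seen before the assert is ever hit.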
I've attached a piglit test which hopefully should drop right in. It runs through the steps above and does a pixel readback to ensure the rendering went correctly. If you remove the readback you can see flickering output. It runs fine with both the readback and the rendering if I switch to swrast.
I'm on an RX 480 and tested the bug with both git-61b535437e and 18.2.4 from padoka's PPA.
--- Comment #1 from olivier.jolly@laposte.net ---
Created attachment 143269
  --> https://bugs.freedesktop.org/attachment.cgi?id=143269&action=edit
backtrace of crash when hitting this assert (from 18.3.3/19.0.0-rc1)
--- Comment #2 from olivier.jolly@laposte.net ---
I also encounter what is most probably this same bug (same assertion at least), at random, when using Blender 2.80.
My setup is Debian unstable with a Radeon HD 7950 (and also a GeForce GTX 1060, for CUDA only).
I encountered this crash on Mesa 18.3.2 (packaged in Debian), 18.3.3 and 19.0.0-rc1 (compiled manually).
--- Comment #3 from dnicolas@gmail.com ---
I'm also finding the same problem with Blender 2.80. Sometimes it crashes **very** often, making it almost unusable.
Is there anyone who can take a look at this?
AMDGPU (Vega 56)
Kernel 4.20.15
Mesa 18.3.4
Fedora 29
--- Comment #4 from Marek Olšák <maraeo@gmail.com> ---
This is fixed by these patches:
https://patchwork.freedesktop.org/series/60491/
--- Comment #5 from Marek Olšák <maraeo@gmail.com> ---
Baldur, can I set the license of your piglit test to MIT? Thanks.
https://en.wikipedia.org/wiki/MIT_License
--- Comment #6 from Baldur Karlsson <baldurk@baldurk.org> ---
Yes, that's fine with me. I'll try to test the patches on my program soon.
--- Comment #7 from Baldur Karlsson <baldurk@baldurk.org> ---
I applied the patchset on top of latest mesa (aa040d3b3c7d068e1ece61c71770c16a54745f89) and I seem to get some rendering corruption that I don't get with the parent commit before applying the patches.
It seems to only appear in RenderDoc, or at least it doesn't happen when running tiny demo programs. I can't isolate a simpler test case just now but it seems reliably reproducible and only shows up when I build with the patches applied.
To repro with RenderDoc:
* Download or build RenderDoc 1.4
* Build gears3d from https://github.com/gears3d/gears3d
* Launch gears3d through RenderDoc, capture, open the frame
* Step back and forth through the drawcalls and the texture viewer will show up with some corruption.
Screenshot here: https://i.imgur.com/1Dk7diS.png
--- Comment #8 from LoneVVolf <lonewolf@xs4all.nl> ---
Baldur, I encounter similar visual corruption when running knetwalk.
See comment #12 in https://bugs.freedesktop.org/show_bug.cgi?id=110701#c12
Maybe these 2 bugs are related?
--- Comment #9 from LoneVVolf <lonewolf@xs4all.nl> ---
Reverting commit https://cgit.freedesktop.org/mesa/mesa/commit/?id=78e35df52aa2f7d770f929a086... solves the visual corruption and gets rid of the GPU fault messages in dmesg.
As that commit is 2/2 of the patchset referenced in comment #4, it does look like the patchset introduces new errors. See https://bugs.freedesktop.org/show_bug.cgi?id=110701
LoneVVolf lonewolf@xs4all.nl changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.org/show_bug.cgi?id=110701
--- Comment #10 from Pierre-Eric Pelloux-Prayer <pelloux@gmail.com> ---
(In reply to Baldur Karlsson from comment #7)
> To repro with RenderDoc:
>
> * Download or build RenderDoc 1.4
> * Build gears3d from https://github.com/gears3d/gears3d
> * Launch gears3d through RenderDoc, capture, open the frame
> * Step back and forth through the drawcalls and the texture viewer will show up with some corruption.
>
> Screenshot here: https://i.imgur.com/1Dk7diS.png
I tried to reproduce the issue and actually had 2 different issues:
- before 12bf7cfecf52083c484602f971738475edfe497e: the rendering is corrupted as described above. Reverting 78e35df52aa2f7d770f929a0866a0faa89c261a9 fixes the rendering.
- starting from 12bf7cfecf52083c484602f971738475edfe497e: the rendering is corrupted and wrong: I only see the red gear, the green/blue ones are never drawn.
--- Comment #11 from Pierre-Eric Pelloux-Prayer <pelloux@gmail.com> ---
Created attachment 144311
  --> https://bugs.freedesktop.org/attachment.cgi?id=144311&action=edit
wip patch

The attached patch (applied on top of the problematic commit 78e35df52a) seems to fix the corruption problem (but I don't know the code well enough to decide if it's a correct fix).
--- Comment #12 from Marek Olšák <maraeo@gmail.com> ---
Created attachment 144312
  --> https://bugs.freedesktop.org/attachment.cgi?id=144312&action=edit
likely fix
This patch should fix it. Thanks to Pierre-Eric for inspiring it.
--- Comment #13 from LoneVVolf <lonewolf@xs4all.nl> ---
Applying the "likely fix" patch in https://bugs.freedesktop.org/show_bug.cgi?id=108824#c12 solves the issue with plasma shell/knetwalk on my RX 580.
--- Comment #14 from raffarti@zoho.com ---
The patch fixes the corruption caused by 78e35df52aa2f7d770f929a0866a0faa89c261a9 but not the one from 12bf7cfecf52083c484602f971738475edfe497e, which still persists in the scroll bars of falkon and akregator. I'm using an RX 480.
GitLab Migration User gitlab-migration@fdo.invalid changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #15 from GitLab Migration User <gitlab-migration@fdo.invalid> ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1341.