https://bugs.freedesktop.org/show_bug.cgi?id=97305
Bug ID: 97305 Summary: Wrong values returned by GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT & GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT randomly breaks stuff Product: Mesa Version: git Hardware: All OS: All Status: NEW Severity: critical Priority: medium Component: Drivers/Gallium/radeonsi Assignee: dri-devel@lists.freedesktop.org Reporter: dark_sylinc@yahoo.com.ar QA Contact: dri-devel@lists.freedesktop.org
I'm the main developer of Ogre 2.1
We're trying our demos in latest Mesa: I've built Mesa from scratch. Tried commits git-edfc17a (current head of branch 12.0) and commit 17f1c49b9ad05af4f6482f6fa950e5dcc1a779d1 (current head of master branch)
Almost all of them work except for the compute shader based ones (Googling around it appears for Southern Island radeon it could be buggy so it was turned off. Not 100% sure but I think that's what's happening). Anyway, not a big deal on that. I understand the Mesa team considers that WIP. Move on.
However our Forward3D demo was glitching and I didn't know why. I've investigated the problem and located it: * glGetIntegerv w/ GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT is returning 4, should return 256 for my HW (GCN 1.0 AMD Radeon HD 7770 Southern Islands). * glGetIntegerv w/ GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT is returning 4, should return 256 for my HW.
In both cases "256" is what Windows drivers return btw, including fglrx on Linux.
When aligned to 4 bytes; the following worked correctly: glTexBufferRange( GL_TEXTURE_BUFFER, GL_R32UI, boName, offsetAlignedTo4ButNotTo256, sizeBytes ); uniform usamplerBuffer f3dGrid; uint numLightsInGrid = texelFetch( f3dGrid, int(sampleOffset) ).x;
But the following did not: glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, boName, offsetAlignedTo4ButNotTo256, sizeBytes ); uniform samplerBuffer f3dLightList; vec4 values = texelFetch( f3dLightList, int(sampleOffset) ).x;
The output of the latter was flickering garbage. When I forced 256 byte alignment, it worked as expected.
I consider this a critical issue because I suspect it's causing subtle problems with a lot of titles whose buffer offsets don't start at 0.
I believe NVIDIA GPUs also require buffer offsets of 256. I don't know about Intel cards.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #1 from Ilia Mirkin imirkin@alum.mit.edu --- (In reply to Matias N. Goldberg from comment #0)
But the following did not: glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, boName, offsetAlignedTo4ButNotTo256, sizeBytes ); uniform samplerBuffer f3dLightList; vec4 values = texelFetch( f3dLightList, int(sampleOffset) ).x;
The output of the latter was flickering garbage. When I forced 256 byte alignment, it worked as expected.
My guess is that 16 would have worked as well. It just has to be aligned to the texel. 4 works for GL_RGBA8 but not GL_RGBA32*. Unfortunately there's no provision for a per-format limit.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #2 from Nicolai Hähnle nhaehnle@gmail.com --- I agree with Ilia. There should be no requirement for an alignment greater than the "texel" size, that really wouldn't make sense given how the hardware works. Can you please try with 16-but-not-256 alignment?
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #3 from Matias N. Goldberg dark_sylinc@yahoo.com.ar ---
BEGIN PATCH
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 88f4f20..6a2d5bc 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -363,10 +363,11 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT: case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT: + return 256; case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return 4; case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: - return HAVE_LLVM >= 0x0309 ? 4 : 0; + return HAVE_LLVM >= 0x0309 ? 256 : 0;
case PIPE_CAP_GLSL_FEATURE_LEVEL: if (pscreen->get_shader_param(pscreen, PIPE_SHADER_COMPUTE,
<< END PATCH
I'm playing safe by going with 256 bytes with everyone, based on the overwhelming amount of reports (on the left top there's a dropdown, select "All"); unless someone wants to go through the trouble of identifying each card. Even in DX12 256 alignment is by spec: http://delphigl.de/glcapsviewer/gl_stats_caps_single.php?listreportsbycap=GL... http://delphigl.de/glcapsviewer/gl_stats_caps_single.php?listreportsbycap=GL... http://delphigl.de/glcapsviewer/gl_stats_caps_single.php?listreportsbycap=GL...
I believe this patch could potentially increase radeonsi's compatibility greatly and should be tested against popular game titles.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #4 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Sure I will try 16 but not 256
https://bugs.freedesktop.org/show_bug.cgi?id=97305
Marek Olšák maraeo@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |NOTABUG Status|NEW |RESOLVED
--- Comment #5 from Marek Olšák maraeo@gmail.com --- Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN cards only require 4 byte alignment there.
All other buffer binding points require 1, 2, or 4 bytes depending on the texel size. It's really min(texel_size, 4).
Whatever the closed OpenGL driver reports doesn't correspond to the reality.
The limits reported by our Vulkan driver are very similar to Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #6 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- I'm back.
I only tested texture and uniform buffers, not SSBOs.
Results: * Setting UBOs to 4 byte alignments didn't seem to matter in any way. Performance didn't seem to be affected either. Nonetheless given that AMD always sets it to 256; this is what I'd recommend (i.e. there could be undocumented hw errors for edge cases?)
* Setting TBOs to non-16 byte alignments breaks in a specific pattern (i.e. 4 byte alignments produce certain glitches, 8 byte alignments produce a different set of glitches, 12 byte alignments a different set, etc). Setting TBOs to 16 byte alignments fixes everything. I benchmarked 256-byte alignments vs 16-byte alignments-non-256-byte and couldn't see any performance difference (the sample kept around 46.50-47.00ms all the time).
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #7 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Our code is complex so if you keep insisting 4 byte alignment is correct, I won't deny the possibility it could be a bug in our codebase. I would have to create a simple as possible case to double check it.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #8 from Marek Olšák maraeo@gmail.com ---
Nonetheless given that AMD always sets it to 256 ...
Not true. The Mesa driver is also developed by us. Nicolai and me work for AMD. There is also the Vulkan driver which returns "1 byte" for the texture buffer alignment.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #9 from Ilia Mirkin imirkin@alum.mit.edu --- (In reply to Marek Olšák from comment #5)
Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN cards only require 4 byte alignment there.
All other buffer binding points require 1, 2, or 4 bytes depending on the texel size. It's really min(texel_size, 4).
That means that you have to return max(texel_size) for the GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture buffer can be any offset.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #10 from Marek Olšák maraeo@gmail.com --- (In reply to Ilia Mirkin from comment #9)
(In reply to Marek Olšák from comment #5)
Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN cards only require 4 byte alignment there.
All other buffer binding points require 1, 2, or 4 bytes depending on the texel size. It's really min(texel_size, 4).
That means that you have to return max(texel_size) for the GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture buffer can be any offset.
That's what we already do.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #11 from Ilia Mirkin imirkin@alum.mit.edu --- (In reply to Marek Olšák from comment #10)
(In reply to Ilia Mirkin from comment #9)
(In reply to Marek Olšák from comment #5)
Pre-GCN cards require an alignment of 256 bytes for uniform buffers. GCN cards only require 4 byte alignment there.
All other buffer binding points require 1, 2, or 4 bytes depending on the texel size. It's really min(texel_size, 4).
That means that you have to return max(texel_size) for the GL_TEXTURE_BUFFER_OFFSET_ALIGNMENT. You only get one cap, and a texture buffer can be any offset.
That's what we already do.
You return 4. However that's not enough if I set the format to GL_RGBA32I. There's no provision in the spec to disallow someone feeding you a aligned-to-4 buffer with GL_RGBA32I format if you return "4" as the value.
Oh, unless I totally misunderstood what you were saying, and that situation should work. In which case, apologies for the noise.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #12 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Created attachment 125723 --> https://bugs.freedesktop.org/attachment.cgi?id=125723&action=edit Proof of the bug
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #13 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- I've added a "Proof of the bug".
There IS a bug. I don't know if it's software or in the hardware. The attached sample does the following:
1. Create a 1MB buffer. 2. Fill the first 4kb with the pattern: data[i+0] = 1.0f; data[i+1] = 0.5f; data[i+2] = 0.15f; data[i+3] = 0.0f; 3. Attach it via: glActiveTexture( GL_TEXTURE0 ); glBindTexture( GL_TEXTURE_BUFFER, texName ); glTexBufferRange( GL_TEXTURE_BUFFER, GL_RGBA32F, vboName, elementStart*sizeof(float), 1024*4*sizeof(float) ); ++elementStart; 4. Render a triangle showing the colours.
If the alignment is not a problem, I should see Orange-Cream, then Carmine, then blue-violetish, then green; repeat.
Instead I **ALWAYS** see Orange-Cream; which is clearly incorrect. After running the sample long enough, elementStart will be > 1024 (thus 1024*4 exceeding 4kb) and begins to display black. (and if you leave it long enough running it could crash or throw an error, that's not the point)
https://bugs.freedesktop.org/show_bug.cgi?id=97305
Marek Olšák maraeo@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|Wrong values returned by |Gallium: TBOs and images |GL_UNIFORM_BUFFER_OFFSET_AL |set the offset in elements, |IGNMENT & |not bytes |GL_TEXTURE_BUFFER_OFFSET_AL | |IGNMENT randomly breaks | |stuff | Resolution|NOTABUG |--- Status|RESOLVED |REOPENED
--- Comment #14 from Marek Olšák maraeo@gmail.com --- Thanks. It's not a hardware issue. It's a driver design issue.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #15 from Marek Olšák maraeo@gmail.com --- I'm working on it.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
Vedran Miletić vedran@miletic.net changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |vedran@miletic.net
--- Comment #16 from Vedran Miletić vedran@miletic.net --- (In reply to Marek Olšák from comment #15)
I'm working on it.
Any news?
https://bugs.freedesktop.org/show_bug.cgi?id=97305
--- Comment #17 from Marek Olšák maraeo@gmail.com --- It should be fixed in master already.
https://bugs.freedesktop.org/show_bug.cgi?id=97305
Matias N. Goldberg dark_sylinc@yahoo.com.ar changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|REOPENED |RESOLVED
--- Comment #18 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- You're awesome!
I can confirm it works on both the sample repro I provided and my actual real case.
Small note: We discovered there was a potential for a crash because at certain points we assume SIMD alignment; so we're now enforcing 16-byte alignment as minimum in our code. I'm doing this remark because Mesa is the only (or among the very few) implementations that return alignments <16; which can reveal hidden bugs in program code that runs fine everywhere else. If you ever see a bug report of these sorts (e.g. crashing in movaps, movntdqa), this is something to keep in mind.
I appreciate the 4 byte alignment as 256 reported by others is excessively wasteful. Thanks.
For the record it was fixed in https://cgit.freedesktop.org/mesa/mesa/commit/?id=7cd256ce7e4bad680bb77d033c... and https://cgit.freedesktop.org/mesa/mesa/commit/?id=325379096f54dde39171d1b880...
dri-devel@lists.freedesktop.org