https://bugs.freedesktop.org/show_bug.cgi?id=49140
Bug #: 49140
Summary: r600_state_common.c:761:r600_draw_vbo: Assertion `0' failed
Classification: Unclassified
Product: Mesa
Version: git
Platform: All
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: bgz.marko@gmail.com

Created attachment 60568
  --> https://bugs.freedesktop.org/attachment.cgi?id=60568
R600_DUMP_SHADERS
Hello,
I have an application using a rather complex shader with some branching (while/if). The application fails with the r600 driver, giving the following error:

EE r600_shader.c:140 r600_pipe_shader_create - translation from TGSI failed !
r600_state_common.c:761:r600_draw_vbo: Assertion `0' failed.

It seems something goes wrong in the branching section, since the shader works if I comment that section out. The same shader works fine using either LIBGL_ALWAYS_SOFTWARE=1 or fglrx. Also, I remember it working fine with some older revision of r600; unfortunately I don't know which one exactly.
I have attached R600_DUMP_SHADERS output. If needed I can also provide links to source code or any other data that may be helpful in debugging.
Some relevant parts of glxinfo:
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD RV770
OpenGL version string: 2.1 Mesa 8.1-devel (git-1a33c1b precise-oibaf-ppa)
Best regards, Marko
--- Comment #1 from Vadim ptpzz@yandex.ru 2012-04-25 16:06:28 PDT ---
Probably the register limit. The shader uses 5 inputs + 8 outputs + 112 temps = 125 registers. I think it should work if you could make it less than 120.
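The arithmetic in this comment can be restated as a tiny sketch. The function name and the constant below are hypothetical, and the threshold of 120 is only the estimate given in the comment, not a documented hardware constant.

```python
# Hypothetical helper: sums the register classes quoted in comment #1.
def total_registers(inputs, outputs, temps):
    return inputs + outputs + temps

# Assumption: 120 is the rough threshold suggested in the comment,
# not a value taken from the driver source or hardware documentation.
ESTIMATED_LIMIT = 120

original = total_registers(inputs=5, outputs=8, temps=112)
print(original, original < ESTIMATED_LIMIT)
```

With the counts from the dump (5, 8 and 112) the total is above the estimated threshold, which is consistent with the translation failure reported above.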
--- Comment #2 from Marko bgz.marko@gmail.com 2012-04-26 00:27:46 PDT ---
Any suggestions on how to accomplish that? I tried turning my code around and around, but so far with no success. What should help with the register count?
On the other hand, why is the register limit set so low? I just tested the program with Mesa 7.11, same hardware and r600 drivers - it works. This seems like a regression.
If needed, there are binaries/source available for testing at http://thelarge.org
--- Comment #3 from Vadim ptpzz@yandex.ru 2012-04-26 03:43:45 PDT ---
(In reply to comment #2)
> Any suggestions on how to accomplish that? I tried turning my code around and around, but so far with no success. What should help with the register count?
>
> On the other hand, why is the register limit set so low? I just tested the program with Mesa 7.11, same hardware and r600 drivers - it works. This seems like a regression.
>
> If needed, there are binaries/source available for testing at http://thelarge.org
It's a hardware limit. The compiler in theory should optimize register allocation, but the problem is that r600g still lacks a real register allocator. And probably some changes since 7.11 increased register usage in the TGSI IR.
I'll see if I can help with that shader somehow, but generally r600g needs a better shader compiler. There is some work in progress on that, but I don't know when it will be completed.
Also there is some experimental code that probably could help with that, but currently it works only with evergreen GPUs. If you could use a gpu of the evergreen class (IIRC it's all of 5xxx, some of 6xxx cards), then you might want to try r600_shader_opt and r600_shader_opt_2 branches from the following repo: https://github.com/VadimGirlin/mesa
--- Comment #4 from Vadim ptpzz@yandex.ru 2012-04-26 17:38:49 PDT ---
Created attachment 60642
  --> https://bugs.freedesktop.org/attachment.cgi?id=60642
lorentzTransform function
It seems you could replace the following lines in the lorentzTransform function:
r.w = g*p.w - v.x*g*p.x - v.y*g*p.y - v.z*g*p.z;
r.x = -v.x*g*p.w + (1.0 + gm1*v.x*v.x/v2)*p.x + (gm1*v.x*v.y/v2)*p.y + (gm1*v.x*v.z/v2)*p.z;
r.y = -v.y*g*p.w + (gm1*v.y*v.x/v2)*p.x + (1.0 + gm1*v.y*v.y/v2)*p.y + (gm1*v.y*v.z/v2)*p.z,
r.z = -v.z*g*p.w + (gm1*v.z*v.x/v2)*p.x + (gm1*v.z*v.y/v2)*p.y + (1.0+gm1*v.z*v.z/v2)*p.z;
with
vec3 p3 = vec3(p.x, p.y, p.z);
float t = dot(v, p3);
float t2 = gm1*t/v2 - g*p.w;
r = vec4( v*t2 + p3, g * (p.w - t));
Attachment contains the complete text of the modified function with the separate steps of the transformation in the comments. Please check if all steps are correct. Anyway, it shows the direction.
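For readers without the attachment, the factoring can be reconstructed roughly as follows. This is a sketch derived only from the two code fragments quoted above, not the text of the attachment; the names g, gm1 and v2 are written as they appear in the shader.

```latex
\begin{aligned}
t &= v_x p_x + v_y p_y + v_z p_z, \qquad
t_2 = \frac{\mathrm{gm1}\, t}{\mathrm{v2}} - g\,p_w \\
r_w &= g\,p_w - g\,(v_x p_x + v_y p_y + v_z p_z) = g\,(p_w - t) \\
r_x &= -v_x g\,p_w + p_x + \frac{\mathrm{gm1}\, v_x}{\mathrm{v2}}\,(v_x p_x + v_y p_y + v_z p_z) \\
    &= p_x + v_x\left(\frac{\mathrm{gm1}\, t}{\mathrm{v2}} - g\,p_w\right) = p_x + v_x t_2
\end{aligned}
```

The y and z components follow the same pattern with v_y and v_z, so the three spatial components collapse to v*t2 + p3 and the fourth to g*(p.w - t).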
Original shader uses 130 regs, 1262 vliw alu instructions on my system. Modified version - 81 reg, 778 instructions.
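One way to check that the two forms agree is a small numerical comparison outside the shader. The sketch below is in Python rather than GLSL so it can be run anywhere; the function names are hypothetical, and the definitions v2 = dot(v, v), g = 1/sqrt(1 - v2) and gm1 = g - 1 are assumptions (the thread does not show them), although the identity itself does not depend on them.

```python
import math
import random

def lorentz_original(v, p, g, gm1, v2):
    # Component-wise form, transcribed from the original shader lines.
    vx, vy, vz = v
    px, py, pz, pw = p
    rw = g*pw - vx*g*px - vy*g*py - vz*g*pz
    rx = -vx*g*pw + (1.0 + gm1*vx*vx/v2)*px + (gm1*vx*vy/v2)*py + (gm1*vx*vz/v2)*pz
    ry = -vy*g*pw + (gm1*vy*vx/v2)*px + (1.0 + gm1*vy*vy/v2)*py + (gm1*vy*vz/v2)*pz
    rz = -vz*g*pw + (gm1*vz*vx/v2)*px + (gm1*vz*vy/v2)*py + (1.0 + gm1*vz*vz/v2)*pz
    return (rx, ry, rz, rw)

def lorentz_factored(v, p, g, gm1, v2):
    # Factored form from the suggested replacement.
    p3 = p[:3]
    t = sum(vi*pi for vi, pi in zip(v, p3))   # dot(v, p3)
    t2 = gm1*t/v2 - g*p[3]
    xyz = tuple(vi*t2 + pi for vi, pi in zip(v, p3))
    return xyz + (g*(p[3] - t),)

def max_difference(trials=1000, seed=0):
    # Largest absolute component difference over random inputs.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        v = tuple(rng.uniform(-0.5, 0.5) for _ in range(3))
        p = tuple(rng.uniform(-10.0, 10.0) for _ in range(4))
        v2 = sum(c*c for c in v)
        if v2 == 0.0:
            continue  # avoid division by zero for a zero velocity
        g = 1.0 / math.sqrt(1.0 - v2)   # assumed definition
        gm1 = g - 1.0                   # assumed definition
        a = lorentz_original(v, p, g, gm1, v2)
        b = lorentz_factored(v, p, g, gm1, v2)
        worst = max(worst, max(abs(x - y) for x, y in zip(a, b)))
    return worst

print(max_difference())
```

The two forms should differ only by floating-point rounding, so the printed value is expected to be very small compared with the magnitude of the inputs.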
--- Comment #5 from Marko bgz.marko@gmail.com 2012-05-02 22:42:10 PDT ---
> vec3 p3 = vec3(p.x, p.y, p.z);
> float t = dot(v, p3);
> float t2 = gm1*t/v2 - g*p.w;
> r = vec4( v*t2 + p3, g * (p.w - t));
Vadim, this is really great and much appreciated. I went through the steps and it all seems fine to me; I also tested some examples and everything works great. This also makes the advanced path in the shader work again without my quirky workarounds in the while loop. I never really took much care with this kind of optimisation, somehow blindly hoping that the compiler will automagically optimise everything. I'll try to be more careful in future.
I'd like to put some comments in the code, like "optimised by Vadim" if that's ok with you or perhaps should I use your real name? Thanks.
--- Comment #6 from Vadim Girlin ptpzz@yandex.ru 2012-05-03 03:16:19 PDT ---
(In reply to comment #5)
> I'd like to put some comments in the code, like "optimised by Vadim" if that's ok with you or perhaps should I use your real name? Thanks.
Ah, yes, I forgot to set my full name in the account here. Updated now, so you can use it if you want.
Andreas Boll andreas.boll.dev@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #60568|application/octet-stream    |text/plain
          mime type|                            |
GitLab Migration User gitlab-migration@fdo.invalid changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #7 from GitLab Migration User gitlab-migration@fdo.invalid ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/408.