https://bugs.freedesktop.org/show_bug.cgi?id=99349
--- Comment #7 from Gert Wollny gw.fossdev@gmail.com ---
Now, just multiplying two constants/uniforms does not necessarily trigger the bug. With a simple shader program like
    uniform vec4 base_color;
    uniform vec4 test;
    uniform vec4 test2;
    uniform vec4 test3;

    void main()
    {
        vec4 h1 = base_color * test;
        vec4 h2 = test2 * test3;
        gl_FragColor = h1 * h2;
    }
for both constant-constant multiplications, one of the two constants is always addressed via a GPR, i.e. I get:
    1: MUL TEMP[0], CONST[0], CONST[1]
    r600_shader.c:3986 tgsi_op2_s - About to multiply two constants
    r600_shader.c:4000 tgsi_op2_s - ctx->src[0]: sel:7   // this is a GPR address
        swizzle:0 1 2 3 neg:0 abs:0 rel:0 kc_bank:0 kc_rel:0 value:0 0 0 0
    r600_shader.c:4000 tgsi_op2_s - ctx->src[1]: sel:513 // this is a cfile address
        swizzle:0 1 2 3 neg:0 abs:0 rel:0 kc_bank:0 kc_rel:0 value:0 0 0 0
and then check_vector/reserve_cfile can successfully assign the cfile read ports, because only four constant values (the channels of the single cfile operand) need to be read.
However, for a more complicated shader I get the following:
    250: MUL TEMP[11], CONST[26], CONST[23]
    r600_shader.c:3986 tgsi_op2_s - About to multiply two constants
    r600_shader.c:4000 tgsi_op2_s - ctx->src[0]: sel:160 // cfile kcache after translation
        swizzle:0 1 2 3 neg:0 abs:0 rel:0 kc_bank:0 kc_rel:0 value:0 0 0 0
    r600_shader.c:4000 tgsi_op2_s - ctx->src[1]: sel:535 // cfile kcache before translation
        swizzle:0 1 2 3 neg:0 abs:0 rel:0 kc_bank:0 kc_rel:0 value:0 0 0 0
    r600_asm.c:472 check_vector - bs->hw_cfile_addr:[-1 -1] bs->hw_cfile_elem: [-1 -1] bank_swizzle:0 num_src:2
    r600_asm.c:494 check_vector - src 0: sel:160 elem:0
    r600_asm.c:423 reserve_cfile - res=0: bs->hw_cfile_addr:-1 bs->hw_cfile_elem:-1 sel:160 chan:0
    r600_asm.c:494 check_vector - src 1: sel:535 elem:0
    r600_asm.c:423 reserve_cfile - res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:535 chan:0
    r600_asm.c:423 reserve_cfile - res=1: bs->hw_cfile_addr:-1 bs->hw_cfile_elem:-1 sel:535 chan:0
    r600_asm.c:472 check_vector - bs->hw_cfile_addr:[160 535] bs->hw_cfile_elem: [0 0] bank_swizzle:0 num_src:2
    r600_asm.c:494 check_vector - src 0: sel:160 elem:1
    r600_asm.c:423 reserve_cfile - res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:160 chan:0
    r600_asm.c:494 check_vector - src 1: sel:535 elem:1
    r600_asm.c:423 reserve_cfile - res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:535 chan:0
    r600_asm.c:423 reserve_cfile - res=1: bs->hw_cfile_addr:535 bs->hw_cfile_elem:0 sel:535 chan:0
    r600_asm.c:472 check_vector - bs->hw_cfile_addr:[160 535] bs->hw_cfile_elem: [0 0] bank_swizzle:0 num_src:2
    r600_asm.c:494 check_vector - src 0: sel:160 elem:2
    r600_asm.c:423 reserve_cfile - res=0: bs->hw_cfile_addr:160 bs->hw_cfile_elem:0 sel:160 chan:1
    r600_asm.c:423 reserve_cfile - res=1: bs->hw_cfile_addr:535 bs->hw_cfile_elem:0 sel:160 chan:1
    r600_asm.c:436 reserve_cfile - All cfile read ports are used, cannot reference vector element.
In summary, allocating a read port for elem >= 2 fails. With both operands taken from the constant file (sel:160 and sel:535), the vec4 MUL would have to read eight constant values in one instruction group, and reading more than four values is not possible according to section 4.7.5 of the AMD Evergreen-Family instruction set manual.