https://bugs.freedesktop.org/show_bug.cgi?id=82828
Priority: medium
Bug ID: 82828
Assignee: dri-devel@lists.freedesktop.org
Summary: Regression: Crash in 3Dmark2001
Severity: normal
Classification: Unclassified
OS: All
Reporter: stefandoesinger@gmx.at
Hardware: Other
Status: NEW
Version: git
Component: Drivers/Gallium/r300
Product: Mesa
Created attachment 104921 --> https://bugs.freedesktop.org/attachment.cgi?id=104921&action=edit Backtrace
Since commit e78a01d5e6f77e075fe667a0f0ccb10d89c0dd58 3DMark2001 crashes in the Nature test when it is run in Wine with the ARB shader backend on r300g.
The 3DMark2001 download can be found here: http://www.futuremark.com/benchmarks/legacy
I used Wine 1.7.22 for testing, but I am certain that the bug can be reproduced with newer Wine releases because the ARB shader code hasn't been changed in a while. My GPU is a Radeon X1600.
To reproduce the bug you have to enable the ARB shader backend by starting Wine's regedit and setting HKEY_CURRENT_USER/Software/Wine/Direct3D/UseGLSL to disabled. Create the Direct3D key and UseGLSL string value if needed.
I do not see the crash on r600g (Tested with 9a071e33, Radeon HD 5770).
A backtrace is attached. The backtrace was generated with Mesa 1c4f141a.
Tom Stellard tstellar@gmail.com changed:
What    |Removed    |Added
----------------------------------------------------------------------------
CC      |           |andabata12@yahoo.it
--- Comment #1 from Tom Stellard tstellar@gmail.com --- *** Bug 82852 has been marked as a duplicate of this bug. ***
--- Comment #2 from José Jorge lists.jjorge@free.fr --- I confirm the same bug on an ATI X600 Mobile with Mesa 10.3.0 RC1. Flightgear, at least, triggers it.
Pavel Ondračka pavel.ondracka@email.cz changed:
What        |Removed    |Added
----------------------------------------------------------------------------
CC          |           |pavel.ondracka@email.cz
Keywords    |           |bisected, regression
--- Comment #3 from Pavel Ondračka pavel.ondracka@email.cz --- Yeah, it's affecting multiple apps, and I also see over 100 crashing piglit tests after this commit on my RV530, so this should be easy to reproduce even without Wine.
--- Comment #4 from Connor Abbott cwabbott0@gmail.com --- All the crashes are in the same place, right?
Can you run it under gdb and print out n2 and the contents of g->nodes[n].adjacency_list (it's an array with g->nodes[n].adjacency_count elements) after the segfault? How about the former before the ra_simplify() call in the ra_allocate() call that's segfaulting? (If you don't know how to do this, see http://stackoverflow.com/questions/2956889/how-to-set-a-counter-for-a-gdb-breakpoint)
I'm guessing that it's segfaulting because n2 is some bogus value. n2 comes from the adjacency_list, which is something generated before the allocator actually runs by code I didn't touch and then never modified afterward, and the code that's segfaulting wasn't modified by the commit in question, so the two most likely options I see are that either this is exposing a bug somewhere else (like in r300g) or the new ra_simplify() is somehow corrupting the adjacency_list. I don't know how r300g sets up the register conflicts and register classes, though, so I can't guess why it works fine on i965 but fails for r300g.
--- Comment #5 from Pavel Ondračka pavel.ondracka@email.cz --- Created attachment 105451 --> https://bugs.freedesktop.org/attachment.cgi?id=105451&action=edit full backtrace from piglit crash
(In reply to comment #4)
OK, so I'm not sure if I know what I'm doing, but I picked one of the crashing piglit tests at random:
/bin/shader_runner tests/shaders/glsl-fs-loop-continue.shader_test -auto
Program received signal SIGSEGV, Segmentation fault.
0xb76391a9 in ra_select (g=0x80c2058) at ../../src/mesa/program/register_allocate.c:525
525 BITSET_TEST(g->regs->regs[r].conflicts, g->nodes[n2].reg)) {
print n2
$2 = 0
print n
$7 = 1
print g->nodes[n].adjacency_count
$1 = 3
print g->nodes[n].adjacency_list
$3 = (unsigned int *) 0x80c1b58
print g->nodes[n].adjacency_list[0]
$4 = 1
print g->nodes[n].adjacency_list[1]
$5 = 0
print g->nodes[n].adjacency_list[2]
$6 = 2
Full backtrace attached.
--- Comment #6 from Connor Abbott cwabbott0@gmail.com --- (In reply to comment #5)
Can you print out the value of g->nodes[n2].reg? I think it may be NO_REG (0xffffffff), even though it shouldn't be (if a node is not on the stack, then it's supposed to be assigned a register already).
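To illustrate why a NO_REG value would be fatal at the crashing line, here is a minimal Python sketch of the index arithmetic behind a bitset test. It assumes the conflicts bitset is stored in 32-bit words, which is an assumption about the BITSET macros and not something stated in this thread; the helper name bitset_location is hypothetical.

```python
NO_REG = 0xFFFFFFFF  # value quoted above for a node without a register
WORD_BITS = 32       # ASSUMPTION: one bitset word holds 32 bits


def bitset_location(bit):
    """Return (word index, bit within that word) for a bit number."""
    return bit // WORD_BITS, bit % WORD_BITS


word, bit = bitset_location(NO_REG)
# Byte offset of that word from the start of the bitset storage.
byte_offset = word * (WORD_BITS // 8)
print(word, bit, byte_offset)
```

Under that assumption, testing bit NO_REG would read roughly 512 MiB past the start of the conflicts array, which would be consistent with a segfault in BITSET_TEST.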
--- Comment #7 from Connor Abbott cwabbott0@gmail.com --- Oh, and I forgot to mention:
If you do find that g->nodes[n2].reg is NO_REG, the next step would be to break at the end of ra_simplify() (but make sure to stop at the last time the breakpoint gets hit before the segfault, using the Stack Overflow post I linked to) and print out the values of all the nodes (g->nodes[0], g->nodes[1], ..., g->nodes[g->count - 1]). All the ones with .reg = NO_REG should also have .in_stack = true.
If one has .reg = NO_REG and .in_stack = false, then in ra_simplify() we should have reached line 468, in which case we either pushed it onto the stack (if pq_test() returns true) or considered it for optimistic coloring (if pq_test() returns false). So if we finished the loop, then progress == false, meaning no nodes were pushed onto the stack and no nodes were considered for optimistic coloring (see the places where we set progress = true), so no nodes should have .reg = NO_REG and .in_stack = false when we leave ra_simplify().
Then, in ra_select(), whenever we set .in_stack = false (line 536) we also set .reg to something else (line 541), unless we run out of registers, in which case we bail out and r300g will complain about running out of registers. So it seems strange to me that this would happen, but it is also the most likely explanation of why it's segfaulting.
--- Comment #8 from Pavel Ondračka pavel.ondracka@email.cz --- OK, so indeed I got NO_REG for g->nodes[n2].reg:
print g->nodes[n2].reg
$1 = 4294967295
Then I set a breakpoint at the end of ra_simplify() (it gets called just once before the crash):
Breakpoint 1, ra_simplify (g=0x80c2058) at ../../src/mesa/program/register_allocate.c:491
491 }
(gdb) print g->count
$2 = 3
print g->nodes[0]
$3 = {adjacency = 0x81b8968, adjacency_list = 0x80c1658, adjacency_list_size = 4, adjacency_count = 3, class = 0, reg = 4294967295, in_stack = false, q_total = 4294967295, spill_cost = 0}
print g->nodes[1]
$4 = {adjacency = 0x81b3d18, adjacency_list = 0x80c1b58, adjacency_list_size = 4, adjacency_count = 3, class = 2, reg = 4294967295, in_stack = true, q_total = 2, spill_cost = 0}
print g->nodes[2]
$5 = {adjacency = 0x81c0328, adjacency_list = 0x80c18d8, adjacency_list_size = 4, adjacency_count = 3, class = 3, reg = 4294967295, in_stack = true, q_total = 4, spill_cost = 0}
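The invariant described in comment #7 (every node without a register must be on the stack when ra_simplify() returns) can be checked mechanically against this dump. The following is only a small Python sketch over the reg and in_stack values printed above; the function name unreachable_nodes is hypothetical and nothing here is Mesa code.

```python
NO_REG = 4294967295  # 0xffffffff, as printed by gdb

# (reg, in_stack) for g->nodes[0], g->nodes[1], g->nodes[2],
# copied from the gdb output in this comment.
nodes = [
    (4294967295, False),
    (4294967295, True),
    (4294967295, True),
]


def unreachable_nodes(nodes):
    """Indices of nodes that have no register and are not on the stack."""
    return [i for i, (reg, in_stack) in enumerate(nodes)
            if reg == NO_REG and not in_stack]


violators = unreachable_nodes(nodes)
print(violators)
```

Only node 0 breaks the invariant, and it is also the only node whose q_total is 4294967295.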
--- Comment #9 from Connor Abbott cwabbott0@gmail.com --- Created attachment 105572 --> https://bugs.freedesktop.org/attachment.cgi?id=105572&action=edit debugging patch
--- Comment #10 from Connor Abbott cwabbott0@gmail.com --- Can you try the patch I attached and tell me what output you get between the last "--- begin simplify ---" and "--- end simplify ---" pair?
--- Comment #11 from Pavel Ondračka pavel.ondracka@email.cz --- Full test output with debugging patch:
$ bin/shader_runner tests/shaders/glsl-fs-loop-continue.shader_test -auto
r300: DRM version: 2.38.0, Name: ATI RV530, ID: 0x71c5, GB: 1, Z: 2
r300: GART size: 509 MB, VRAM size: 256 MB
r300: AA compression RAM: YES, Z compression RAM: YES, HiZ RAM: YES
--- begin simplify ---
got here with node 2
pushing node 2 onto the stack
got here with node 1
pushing node 1 onto the stack
got here with node 0
got here with node 0
--- end simplify ---
Segmentation fault (SIGSEGV)
--- Comment #12 from Connor Abbott cwabbott0@gmail.com --- Created attachment 105630 --> https://bugs.freedesktop.org/attachment.cgi?id=105630&action=edit another debugging patch
Ok, it looks like the problem is that node 0's q_total is bogus, which means it never even gets considered for optimistic coloring. To help me figure out why this is, can you apply this patch to master (not on top of the other patch) and tell me the output of the piglit test now?
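One way a bogus q_total could keep a node from ever being picked is sketched below in Python. This is a toy model, not the Mesa implementation: it assumes the optimistic candidate is the unpushed node with the lowest q_total, chosen by a strict less-than comparison against an initial sentinel of 0xffffffff, and it does not model q_total being decremented as neighbours are pushed. The thresholds in p are made up, and toy_simplify is a hypothetical name.

```python
UINT_MAX = 0xFFFFFFFF


def toy_simplify(q_total, p):
    """Toy stand-in for a simplify loop; returns the order nodes are pushed."""
    in_stack = [False] * len(q_total)
    stack = []
    progress = True
    while progress:
        progress = False
        best, lowest = None, UINT_MAX
        for i in reversed(range(len(q_total))):
            if in_stack[i]:
                continue
            if q_total[i] < p[i]:        # stand-in for pq_test()
                stack.append(i)
                in_stack[i] = True
                progress = True
            elif q_total[i] < lowest:    # ASSUMED strict comparison
                best, lowest = i, q_total[i]
        if not progress and best is not None:
            stack.append(best)           # optimistic push
            in_stack[best] = True
            progress = True
    return stack


p = [1, 3, 5]                            # hypothetical thresholds
healthy = toy_simplify([2, 2, 4], p)
broken = toy_simplify([UINT_MAX, 2, 4], p)
print(healthy, broken)
```

With a sane q_total, node 0 fails the test but is pushed optimistically on the next pass. With q_total equal to 0xffffffff it is visited twice and never pushed, which has the same shape as the "got here with node 0" lines in comment #11.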
--- Comment #13 from Tom Stellard tstellar@gmail.com --- (In reply to comment #12)
I'm not sure if this matters, but r300g pre-allocates the input registers before calling ra_allocate_no_spills().
--- Comment #14 from Connor Abbott cwabbott0@gmail.com --- (In reply to comment #13)
I think there are no input registers in this case (there's a NumInputs = 0 somewhere in the backtrace) so there aren't any pre-allocated nodes here.
--- Comment #15 from Tom Stellard tstellar@gmail.com --- Can you post the output of RADEON_DEBUG=ps,vs ?
--- Comment #16 from Marek Olšák maraeo@gmail.com --- (In reply to comment #14)
What Tom probably meant is that inputs are loaded to temps before the fragment shader starts, so inputs and temps pretty much share the temporary file. Not sure how relevant it is to this issue, but obviously you can't rename the temps which are supposed to contain inputs.
--- Comment #17 from Pavel Ondračka pavel.ondracka@email.cz --- Output with the second debug patch:
bin/shader_runner tests/shaders/glsl-fs-loop-continue.shader_test -auto
r300: DRM version: 2.38.0, Name: ATI RV530, ID: 0x71c5, GB: 1, Z: 2
r300: GART size: 509 MB, VRAM size: 256 MB
r300: AA compression RAM: YES, Z compression RAM: YES, HiZ RAM: YES
increasing q total, old q total = 0, n1 = 0, n2 = 1, value = 1
increasing q total, old q total = 0, n1 = 1, n2 = 0, value = 1
increasing q total, old q total = 1, n1 = 0, n2 = 2, value = 1
increasing q total, old q total = 0, n1 = 2, n2 = 0, value = 1
increasing q total, old q total = 1, n1 = 1, n2 = 2, value = 1
increasing q total, old q total = 1, n1 = 2, n2 = 1, value = 3
decreasing q total, old q total = 2, n = 2, n2 = 0, value = 0
decreasing q total, old q total = 2, n = 2, n2 = 1, value = 0
decreasing q total, old q total = 2, n = 1, n2 = 0, value = 3
Segmentation fault (SIGSEGV)
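The log can be replayed by hand to see where the bogus q_total comes from. The Python sketch below assumes that q_total is a 32-bit unsigned value, that "increasing" lines update node n1, and that "decreasing" lines update node n2. Those readings are inferred from the "old q total" column, not confirmed anywhere in the thread.

```python
MASK = 0xFFFFFFFF  # ASSUMPTION: q_total is a 32-bit unsigned int

# (target node, value) pairs read off the log above.
# ASSUMPTION: "increasing" lines target n1, "decreasing" lines target n2.
increases = [(0, 1), (1, 1), (0, 1), (2, 1), (1, 1), (2, 3)]
decreases = [(0, 0), (1, 0), (0, 3)]

q_total = [0, 0, 0]
for node, value in increases:
    q_total[node] = (q_total[node] + value) & MASK
after_setup = list(q_total)

for node, value in decreases:
    # Unsigned subtraction wraps around instead of going negative.
    q_total[node] = (q_total[node] - value) & MASK

print(after_setup, q_total)
```

Under those assumptions, node 0 ends up at 2 - 3, which wraps to 4294967295, while nodes 1 and 2 stay at 2 and 4. These are the q_total values in the gdb dump from comment #8. Node 0 gained only 1 from its edge to node 1 but lost 3 when node 1 was pushed, so the increase and decrease for that edge do not match.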
--- Comment #18 from Pavel Ondračka pavel.ondracka@email.cz --- Created attachment 105641 --> https://bugs.freedesktop.org/attachment.cgi?id=105641&action=edit RADEON_DEBUG=fp,vp output
(In reply to comment #15)
Can you post the output of RADEON_DEBUG=ps,vs ?
I suppose you mean RADEON_DEBUG=fp,vp?
--- Comment #19 from Connor Abbott cwabbott0@gmail.com --- Created attachment 105645 --> https://bugs.freedesktop.org/attachment.cgi?id=105645&action=edit proposed fix
Does this patch fix the piglit failures? For doing a full piglit run, I'd recommend comparing the commit before my series where the mess started (d72d67832bd7a5f2aa0c402333a7de6804ad35ef) and the last commit (e78a01d5e6f77e075fe667a0f0ccb10d89c0dd58) with my fix on top.
--- Comment #20 from Pavel Ondračka pavel.ondracka@email.cz --- Your patch does indeed fix the crashing tests. I still see some piglit regressions, but those should be either bug 82882 or bug 82978. Thanks for the fix.
--- Comment #21 from Connor Abbott cwabbott0@gmail.com --- FYI, I posted the fix I attached as http://lists.freedesktop.org/archives/mesa-dev/2014-September/067343.html along with a few other patches that clean up things I noticed while fixing this, but I don't have commit access, so I'm waiting for someone to push the series before I close this issue.
Fabio Pedretti fabio.ped@libero.it changed:
What        |Removed    |Added
----------------------------------------------------------------------------
Severity    |normal     |blocker
Priority    |medium     |high
--- Comment #22 from Fabio Pedretti fabio.ped@libero.it --- Can someone push Connor's patches and backport the fix in time for 10.3?
r300 is seriously broken without this fix, with many apps crashing, and it would be nice to have it fixed in time for 10.3.
Andreas Boll andreas.boll.dev@gmail.com changed:
What          |Removed    |Added
----------------------------------------------------------------------------
Status        |NEW        |RESOLVED
Resolution    |---        |FIXED
--- Comment #23 from Andreas Boll andreas.boll.dev@gmail.com --- Fixed with commit afd82dcad127b64381ca6d80d0e499368074f474
Fabio Pedretti fabio.ped@libero.it changed:
What      |Removed     |Added
----------------------------------------------------------------------------
Status    |RESOLVED    |VERIFIED