https://bugs.freedesktop.org/show_bug.cgi?id=53111
Bug #: 53111
Summary: [bisected] lockups since added support for virtual address space on cayman v11
Classification: Unclassified
Product: Mesa
Version: git
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: alexandre.f.demers@gmail.com
When running RendererFeatTest64, it always locks up at the same test. Lockups also happen when running piglit r600.test, always near the same test (the sanity tests pass). If we disable the virtual address space as explained in bug 45018, no lockups happen.
--- Comment #1 from Anthony Waters awaters1@gmail.com 2012-08-04 04:13:14 UTC ---
Created attachment 65108 --> https://bugs.freedesktop.org/attachment.cgi?id=65108
dmesg of piglit r600.test crash
I also have the same issue. Here is the dmesg of the crash I get when running the piglit test case r600.test, with virtual addressing enabled and the patches from bug 45018 applied.
--- Comment #2 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-04 04:32:01 UTC ---
A note for anyone arriving here who was not following bug 45018:
Bisecting identified the following commit as culprit:
bb1f0cf3508630a9a93512c79badf8c493c46743 is the first bad commit
commit bb1f0cf3508630a9a93512c79badf8c493c46743
Author: Jerome Glisse jglisse@redhat.com
Date: Fri Dec 2 10:20:29 2011 -0500
r600g: add support for virtual address space on cayman v11
--- Comment #3 from Michel Dänzer michel@daenzer.net 2012-08-06 13:27:48 UTC ---
FWIW, r600.tests should no longer be used in favour of quick-driver.tests. I assume it still happens with the latter, though, so here are some debugging tips:
For isolating a single piglit test that locks up, it may help to run piglit-run.py with -c 0 to prevent several tests from running in parallel.
For isolating the cause of a lockup, it may help to add some debugging output about virtual addresses to the r600g driver, and compare that to the fault address in the VM_CONTEXT1_PROTECTION_FAULT_ADDR register.
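One way to act on this tip, sketched under assumptions: suppose the added debugging output printed each buffer's GPU virtual address and size. A small helper can then report which logged buffer, if any, contains a given faulting byte address. The struct, function name and values below are hypothetical illustrations, not r600g code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one buffer's GPU virtual address range,
 * as it might be printed by debugging output added to the driver. */
struct va_range {
    const char *name;   /* label for the buffer, e.g. "colorbuffer" */
    uint64_t start;     /* first byte of the mapping */
    uint64_t size;      /* length of the mapping in bytes */
};

/* Return the index of the range containing addr, or -1 if none does.
 * Ranges are half-open, [start, start + size); comparing
 * addr - start against size avoids overflow near the top of the space. */
static int find_va_range(const struct va_range *ranges, size_t count,
                         uint64_t addr)
{
    for (size_t i = 0; i < count; ++i) {
        assert(ranges[i].size > 0);
        if (addr >= ranges[i].start &&
            addr - ranges[i].start < ranges[i].size)
            return (int)i;
    }
    return -1;
}
```

If the faulting address falls in no logged range (a result of -1), that would hint that the GPU touched memory no current buffer covers, for example a stale or miscomputed address.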
--- Comment #4 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-06 14:42:39 UTC ---
(In reply to comment #3)
> FWIW, r600.tests should no longer be used in favour of quick-driver.tests. I assume it still happens with the latter, though, so here are some debugging tips:
> For isolating a single piglit test that locks up, it may help to run piglit-run.py with -c 0 to prevent several tests from running in parallel.
> For isolating the cause of a lockup, it may help to add some debugging output about virtual addresses to the r600g driver, and compare that to the fault address in the VM_CONTEXT1_PROTECTION_FAULT_ADDR register.
Your tips will be helpful for the piglit tests; I'll try them later. As for the debugging output, I'll let someone else propose a patch so that it ends up in the right spot.
--- Comment #5 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-07 01:21:27 UTC ---
I tested running one piglit test at a time (thanks, Michel), and it always locks on "texturing/depthstencil-render-miplevels 146 s=z24_s8_d=z32f_s8". It locks hard: the GPU resets, stays locked, and the computer usually restarts.
--- Comment #6 from Anthony Waters awaters1@gmail.com 2012-08-09 03:07:19 UTC ---
The fault address in the VM_CONTEXT1_PROTECTION_FAULT_ADDR register is lower than the start of the virtual address area. Could that itself be due to the bug?
--- Comment #7 from Michel Dänzer michel@daenzer.net 2012-08-09 07:05:26 UTC ---
(In reply to comment #6)
> The fault address in the VM_CONTEXT1_PROTECTION_FAULT_ADDR register is lower than the start of the virtual address area. Could that itself be due to the bug?
Sorry, should have mentioned that the address in VM_CONTEXT1_PROTECTION_FAULT_ADDR is shifted right by 12 bits (i.e. it's the page frame number).
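Because the register holds a page frame number, the byte address is recovered by shifting it left by 12 bits (4 KiB pages), and a driver address is turned into the value the register would show by shifting right. A minimal sketch; the helper names are hypothetical, and the 32-bit register width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* VM_CONTEXT1_PROTECTION_FAULT_ADDR holds the faulting address
 * shifted right by 12 bits, i.e. the 4 KiB page frame number. */
#define GPU_PAGE_SHIFT 12u

/* Convert the page frame number read from the register back into
 * the byte address of the start of the faulting page. */
static uint64_t fault_pfn_to_addr(uint32_t pfn)
{
    return (uint64_t)pfn << GPU_PAGE_SHIFT;
}

/* Convert a byte address (e.g. printed by driver debugging output)
 * into the page frame number the register would report. */
static uint32_t addr_to_fault_pfn(uint64_t addr)
{
    uint64_t pfn = addr >> GPU_PAGE_SHIFT;
    assert(pfn <= UINT32_MAX); /* assumes a 32-bit register */
    return (uint32_t)pfn;
}
```

So a raw register value that looks lower than the start of the virtual address area may well fall inside it once converted; compare fault_pfn_to_addr(value) with the driver's addresses, not the raw value.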
--- Comment #8 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-17 03:49:31 UTC ---
Is there a way to use apitrace in combination with piglit? I'd like to trace the problematic test.
--- Comment #9 from Michel Dänzer michel@daenzer.net 2012-08-17 07:28:50 UTC ---
(In reply to comment #8)
> Is there a way to use apitrace in combination with piglit? I'd like to trace the problematic test.
The first step would be to reproduce the problem by manually running the problematic test from the piglit/bin directory. Then you should be able to apitrace it just like any other app.
--- Comment #10 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-18 05:08:00 UTC ---
Well, it seems running it through qapitrace doesn't lock. But running only this single test in a terminal does.
One thing, though: when using qapitrace and looking up the state, the framebuffer under Surfaces is pretty much garbage at whatever stage I look at. I don't know if this is expected from depthstencil-render-miplevels 146 s=z24_s8_d=z32f_s8.
--- Comment #11 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-18 05:09:01 UTC ---
Created attachment 65723 --> https://bugs.freedesktop.org/attachment.cgi?id=65723
apitrace
--- Comment #12 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-19 22:22:24 UTC ---
I tried to trace RenderFeatTest (one of the other applications that lock up my system). As with the piglit test, it didn't crash. However, the rendering is corrupted at the point where it locks when launched from a terminal. The trace is 75 MB compressed if you want me to upload it somewhere.
--- Comment #13 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-19 23:03:48 UTC ---
(In reply to comment #12)
> I tried to trace RenderFeatTest (one of the other applications that lock up my system). As with the piglit test, it didn't crash. However, the rendering is corrupted at the point where it locks when launched from a terminal. The trace is 75 MB compressed if you want me to upload it somewhere.
I forgot to say: it doesn't lock anymore at all. I should have written "... where it locked when launched from a terminal". It was locking at test 7. I'm attaching a screenshot from that test.
I'll bisect to see if I can find which commit "fixed" the lock.
--- Comment #14 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-19 23:04:33 UTC ---
Created attachment 65813 --> https://bugs.freedesktop.org/attachment.cgi?id=65813
bad rendering on test 7, where it used to lock
--- Comment #15 from Michel Dänzer michel@daenzer.net 2012-08-20 14:59:34 UTC ---
(In reply to comment #10)
> Well, it seems running it through qapitrace doesn't lock.
The apitrace looks incomplete: it doesn't contain any actual rendering operations.
--- Comment #16 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-20 15:05:03 UTC ---
(In reply to comment #15)
> (In reply to comment #10)
> > Well, it seems running it through qapitrace doesn't lock.
> The apitrace looks incomplete: it doesn't contain any actual rendering operations.
I'll rerun it at home tonight.
--- Comment #17 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-22 03:02:45 UTC ---
(In reply to comment #13)
> (In reply to comment #12)
> > I tried to trace RenderFeatTest (one of the other applications that lock up my system). As with the piglit test, it didn't crash. However, the rendering is corrupted at the point where it locks when launched from a terminal. The trace is 75 MB compressed if you want me to upload it somewhere.
> I forgot to say: it doesn't lock anymore at all. I should have written "... where it locked when launched from a terminal". It was locking at test 7. I'm attaching a screenshot from that test.
> I'll bisect to see if I can find which commit "fixed" the lock.
I was not able to figure out which combination fixed it. Well, let's focus on the piglit test that still locks up the beast.
--- Comment #18 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-22 05:32:58 UTC ---
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #10)
> > > Well, it seems running it through qapitrace doesn't lock.
> > The apitrace looks incomplete: it doesn't contain any actual rendering operations.
> I'll rerun it at home tonight.
You were right: I had missed a ";" between the arguments. Bam, locked. I was unable to retrieve a trace. I may try running it in debug mode later this week to see where it stops.
--- Comment #19 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-23 04:12:52 UTC ---
So about this locking piglit test (depthstencil-render-miplevels 146 s=z24_s8_d=z32f_s8), I've been able to track it down to line 218: piglit_report_result(PIGLIT_SKIP);
I don't know if we are supposed to hit this path, but either way, it seems piglit_report_result(PIGLIT_SKIP) locks up. I suppose this function must be releasing resources before exiting, and something goes wrong in there.
By the way, I'm now running kernel 3.6.0-rc3 with latest drm and mesa.
--- Comment #20 from Michel Dänzer michel@daenzer.net 2012-08-23 06:45:54 UTC ---
(In reply to comment #19)
> So about this locking piglit test (depthstencil-render-miplevels 146 s=z24_s8_d=z32f_s8), I've been able to track it down to line 218: piglit_report_result(PIGLIT_SKIP);
How did you determine that? It's weird, I wouldn't expect a skipped test to produce any actual GPU rendering.
--- Comment #21 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-23 13:13:25 UTC ---
(In reply to comment #20)
> (In reply to comment #19)
> > So about this locking piglit test (depthstencil-render-miplevels 146 s=z24_s8_d=z32f_s8), I've been able to track it down to line 218: piglit_report_result(PIGLIT_SKIP);
> How did you determine that? It's weird, I wouldn't expect a skipped test to produce any actual GPU rendering.
I used gdb and stepped into the code until it locked. It gets out at level 0, after going through:
/**
 * Attach the proper miplevel of each texture to the framebuffer
 */
void set_up_framebuffer_for_miplevel(int level)...
Before this call, there is a framebuffer initialization:
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
for (int level = 0; level <= max_miplevel; ++level) {
    set_up_framebuffer_for_miplevel(level);
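For context on what this loop iterates over: assuming the first argument of depthstencil-render-miplevels (146) is the texture's width and height, a complete mipmap chain follows the usual OpenGL rule that level n has size max(1, size >> n). The sketch below computes the highest level and each level's size; the helper names are hypothetical and not taken from piglit.

```c
#include <assert.h>

/* Size of mip level `level` for base size `size`, per the usual
 * OpenGL rule max(1, floor(size / 2^level)). The level < 31 guard
 * avoids an undefined shift for very large level numbers. */
static int miplevel_size(int size, int level)
{
    assert(size > 0 && level >= 0);
    int s = level < 31 ? size >> level : 0;
    return s > 0 ? s : 1;
}

/* Highest level of a complete chain, i.e. floor(log2(size)):
 * keep halving until the size reaches 1. */
static int max_miplevel_for(int size)
{
    assert(size > 0);
    int level = 0;
    while ((size >> level) > 1)
        ++level;
    return level;
}
```

Under that assumption, the loop would visit levels 0 through 7 (146, 73, 36, 18, 9, 4, 2, 1), whereas the gdb session in this comment left it at level 0.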
--- Comment #22 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-30 21:34:48 UTC ---
It seems Marek carries more weight than I do regarding lockups related to VM on Cayman (problem first reported as bug 45018). Patch by Marek to disable VM by default for Cayman: http://lists.freedesktop.org/archives/mesa-dev/2012-August/026590.html
If you have any news on the subject, feel free to add info to this bug. To Marek: are you experiencing the same first lockup in the piglit tests as reported in comment 10? I'm sure I have seen an earlier comment from another developer who was also experiencing lockups on Cayman, but I can't find who that was.
--- Comment #23 from Alex Deucher agd5f@yahoo.com 2012-08-30 21:42:20 UTC ---
(In reply to comment #22)
> It seems Marek carries more weight than I do regarding lockups related to VM on Cayman (problem first reported as bug 45018).
Well, we were hoping to get this resolved in time for 9.0, but as it's getting pretty close now, it's probably better to disable it at least for the 9.0 release. The problem is, when it's disabled, there's not much chance of anyone testing it, so it's not likely to ever get properly fixed. Also, SI only supports VM, so we can't disable VM for SI.
--- Comment #24 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-30 22:38:36 UTC ---
(In reply to comment #23)
> (In reply to comment #22)
> > It seems Marek carries more weight than I do regarding lockups related to VM on Cayman (problem first reported as bug 45018).
> Well, we were hoping to get this resolved in time for 9.0, but as it's getting pretty close now, it's probably better to disable it at least for the 9.0 release. The problem is, when it's disabled, there's not much chance of anyone testing it, so it's not likely to ever get properly fixed. Also, SI only supports VM, so we can't disable VM for SI.
Meanwhile, since the fixes committed for bug 45018 helped me a lot, I'll gladly keep VM enabled to test it. After all, my desktop is now usable: I've been running for 3 days without any lockup, whereas previously I could only run for a couple of hours before having to restart. So if you have any patches that could help and need testing, just ask me.
--- Comment #25 from Alexandre Demers alexandre.f.demers@gmail.com 2012-09-06 17:19:09 UTC ---
I'll have to confirm it later today by disabling VM, but I'm pretty sure I experienced a lockup related to VM (reproducible every time) when testing with Unigine Tropics. It loaded, the demo began, and then it locked when the island appeared on the horizon (I guess that's what it is, since it was the first time I was running this demo).
From the retrieved logs, I could only identify a GPU lockup followed by a reset that failed to reset the rings properly.
--- Comment #26 from Anthony Waters awaters1@gmail.com ---
As I mentioned in bug 55416, I hit a new lockup caused by VA being enabled; however, these lockups only started occurring after commit c8b06dccff9cb89e20378664f3cbc202876a180f. Disabling VA also prevents them, so this may be similar to what was mentioned in comment 25. I will check whether that piglit test still locks up for me.
--- Comment #27 from Alexandre Demers alexandre.f.demers@gmail.com ---
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > So about this locking piglit test (depthstencil-render-miplevels 146 s=z24_s8_d=z32f_s8), I've been able to track it down to line 218: piglit_report_result(PIGLIT_SKIP);
> > How did you determine that? It's weird, I wouldn't expect a skipped test to produce any actual GPU rendering.
> I used gdb and stepped into the code until it locked. It gets out at level 0, after going through:
> /**
>  * Attach the proper miplevel of each texture to the framebuffer
>  */
> void set_up_framebuffer_for_miplevel(int level)...
> Before this call, there is a framebuffer initialization:
> GLuint fbo;
> glGenFramebuffers(1, &fbo);
> glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
> glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
> for (int level = 0; level <= max_miplevel; ++level) {
>     set_up_framebuffer_for_miplevel(level);
It seems that with the latest mesa, drm, xf86 and kernel 3.7.0-rc7-71633-g3b6b59b from drm-next, it doesn't fail on this test anymore. It does, however, lock on a different one. I'll debug it and see where it locks.
--- Comment #28 from Alexandre Demers alexandre.f.demers@gmail.com ---
I'm closing this bug: the application that originally triggered it no longer does so, and many things have changed since then. I'll reopen it if I can still trace a problem back to this exact issue, but I think it was more of an umbrella bug, with many other bugs hidden under it.
Alexandre Demers alexandre.f.demers@gmail.com changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |FIXED