https://bugs.freedesktop.org/show_bug.cgi?id=50655
Bug #: 50655
Summary: ATI RV670 [Radeon HD 3870] Ioquake games causes GPU lockup (waiting for 0x00003039 last fence id 0x00003030)
Classification: Unclassified
Product: Mesa
Version: git
Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
Severity: major
Priority: medium
Component: Drivers/DRI/R600
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: BryanQuigley@Ubuntu.com
Created attachment 62474 --> https://bugs.freedesktop.org/attachment.cgi?id=62474 kern.log
Tested and reproducible with Urban Terror, Warsow, and World of Padman. I used the Phoronix Test Suite, and at some point during the run of each game it would either freeze or eventually display corrupted output on the screen (attached).
Occasionally a VT switch would let the game "appear" again. Other times you can hear the sound of the game continue.
I am using the 3.4 kernel and drivers/X, etc from Xorg Edgers PPA, which for the ati driver would be 6.14.99+git20120525.b1e9c308 and mesa is 8.1~git20120530.ff3eef1a.
You should be able to reproduce this by: Installing phoronix test suite (http://www.phoronix-test-suite.com/?k=downloads) and then running: phoronix-test-suite benchmark urbanterror (or warsow or padman)
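The reproduction steps above can be scripted so that each game is queued several times in a row. This is only a sketch: the `phoronix-test-suite benchmark <name>` form and the three test names come from the report, while the helper name `build_runs` and the default repeat count are illustrative assumptions.

```python
from itertools import product

# Test profile names as given in the report.
GAMES = ["urbanterror", "warsow", "padman"]


def build_runs(games=GAMES, repeats=3):
    """Return one phoronix-test-suite command line per (game, repeat) pair."""
    return [
        ["phoronix-test-suite", "benchmark", game]
        for game, _ in product(games, range(repeats))
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=False)`; `check=False` is the likely choice here because a GPU lockup may kill the game and the remaining runs should still be attempted.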
--- Comment #1 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-03 14:23:32 PDT --- Created attachment 62475 --> https://bugs.freedesktop.org/attachment.cgi?id=62475 syslog
--- Comment #2 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-03 14:23:51 PDT --- Created attachment 62476 --> https://bugs.freedesktop.org/attachment.cgi?id=62476 Xorg log
--- Comment #3 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-03 15:41:37 PDT --- Created attachment 62478 --> https://bugs.freedesktop.org/attachment.cgi?id=62478 weird screen
--- Comment #4 from Alexandre Demers alexandre.f.demers@gmail.com 2012-06-04 05:22:26 UTC --- Would it be possible to test the same thing, but with kernel 3.2? I'd like to know if we are experiencing the same problem that I reported some time ago.
--- Comment #5 from Alex Deucher agd5f@yahoo.com 2012-06-04 05:33:01 PDT --- Would it be possible to narrow down which component (kernel, ddx, or mesa) is causing the problem and bisect? I'd guess it's a mesa issue.
--- Comment #6 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-04 07:37:58 PDT --- I tested with both a 3.2 kernel and the same 3.4 kernel, using the stable mesa/X/drivers that came with Precise. Neither combination caused a crash.
I think I also tested 3.2 with the git mesa/X/drivers and that it does cause the crash; I'll confirm tonight.
--- Comment #7 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-04 18:50:08 UTC --- Just upgrading Mesa (which does pull in libdrm upgrades) causes the bug, even on the 3.2 kernel without Xorg/drivers upgraded. I think this confirms it is a Mesa bug.
Michel Dänzer michel@daenzer.net changed:
What     |Removed          |Added
----------------------------------------------------------------------------
Component|Drivers/DRI/R600 |Drivers/Gallium/r600
--- Comment #8 from Michel Dänzer michel@daenzer.net 2012-06-05 09:02:35 PDT --- (In reply to comment #7)
> I think this confirms it is mesa bug..
Would be great if you could bisect mesa Git then.
--- Comment #9 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-06 21:39:53 PDT --- I think I did everything right in this bisect (I didn't on the first attempt).
fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf is the first bad commit
commit fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf
Author: Marek Olšák maraeo@gmail.com
Date: Fri Feb 3 05:05:31 2012 +0100
r600g: move invariant register updates into start_cs for r6xx-r7xx
:040000 040000 dd9232a0c49e54e0cd536fa858dc131982dc2fbe 379e1d61c53d98a8706f32da5020dc22c0c0ee33 M src
--- Comment #10 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-06-06 21:42:42 PDT --- Created attachment 62689 --> https://bugs.freedesktop.org/attachment.cgi?id=62689 good+bad git bisects
Attached are both the good and the bad git bisect logs. For the good one I ran warsow, padman, and urbanterror at each step, looking for the bug. The bad one seems to have missed some occurrences.
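Because the lockup does not show up on every run, a commit can be marked "good" by mistake when too few runs are tried. A rough way to size the number of runs, assuming each run triggers the lockup independently with some probability `p_repro` (an assumption for illustration; no such rate was measured in this report), is that a bad commit survives `runs` clean runs with probability (1 - p_repro)^runs. The function names below are hypothetical.

```python
import math


def miss_probability(p_repro, runs):
    """Chance that a bad commit passes `runs` consecutive runs without locking up."""
    return (1.0 - p_repro) ** runs


def runs_needed(p_repro, max_miss=0.05):
    """Smallest run count that keeps the miss chance at or below max_miss."""
    if not 0.0 < p_repro < 1.0:
        raise ValueError("p_repro must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_repro))
```

For example, if a single run only locks up half of the time, `runs_needed(0.5)` asks for 5 runs per bisect step before calling a commit good.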
Michel Dänzer michel@daenzer.net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |maraeo@gmail.com
--- Comment #11 from Michel Dänzer michel@daenzer.net 2012-06-07 00:40:00 PDT --- Marek, any ideas? (bug 47116 might be related)
--- Comment #12 from Alex Deucher agd5f@yahoo.com 2012-06-07 07:16:57 PDT --- I think I know what's going on here. There's a hw bug on r6xx where you need to re-emit a CB register if some state further up the pipeline changes even if the CB state has not changed. I remember fixing it in r600c, but I can't find the commit...
--- Comment #13 from Alex Deucher agd5f@yahoo.com 2012-06-07 07:20:16 PDT --- IIRC, the fix is to always re-emit a CB reg between draw calls if some other state changed.
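The workaround described in comments #12 and #13 can be pictured with a toy dirty-state tracker. This is not Mesa's r600g code: the class, the two state names, and the list standing in for the command stream are all hypothetical, and the model only shows the idea of forcing a CB re-emit whenever some other state changed before a draw.

```python
class ToyContext:
    """Toy state tracker illustrating the idea; not Mesa's r600g code."""

    def __init__(self, workaround=True):
        self.workaround = workaround
        self.dirty = {"cb": True, "other": True}  # everything is dirty at start
        self.emitted = []  # what each draw sent to the pretend hardware

    def set_state(self, name):
        """Mark a piece of state as changed since the last draw."""
        self.dirty[name] = True

    def draw(self):
        # Workaround: if any non-CB state changed, re-emit CB as well,
        # even though the CB state itself is unchanged.
        if self.workaround and self.dirty["other"]:
            self.dirty["cb"] = True
        sent = [name for name in ("other", "cb") if self.dirty[name]]
        for name in sent:
            self.dirty[name] = False
        self.emitted.append(sent)
        return sent
```

With the workaround off, a change to `other` alone emits only `other`; with it on, the same change also re-emits `cb`.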
--- Comment #14 from Marek Olšák maraeo@gmail.com 2012-06-07 09:19:52 PDT --- (In reply to comment #11)
> Marek, any ideas? (bug 47116 might be related)
Sorry I've got none. All the regs were really invariant at the time I wrote the commit. A hardware bug like Alex suggested is one possible explanation...
--- Comment #15 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-08-15 14:49:33 UTC --- Bug still occurs in git from yesterday.
I'm willing to test patches or even do some basic programming (no graphics experience). I wasn't able to just revert the problem patch and am not sure which parts I should be trying to keep.
--- Comment #16 from Marek Olšák maraeo@gmail.com 2012-08-24 01:14:51 UTC --- Created attachment 66040 --> https://bugs.freedesktop.org/attachment.cgi?id=66040 possible fix
Could you please try this patch?
--- Comment #17 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-08-24 04:31:57 UTC --- The patch doesn't seem to work. It may have made the crash more likely to bring the system down, but I'd have to do more testing to confirm that.
Attaching 3 syslog results in 1 file containing:
Before the patch
After the patch
After the patch - broke so much it needed a restart
--- Comment #18 from Bryan Quigley BryanQuigley@Ubuntu.com 2012-08-24 04:32:48 UTC --- Created attachment 66047 --> https://bugs.freedesktop.org/attachment.cgi?id=66047 3 outputs of syslog: before the patch, after, and after really bad
--- Comment #19 from Bryan Quigley BryanQuigley@Ubuntu.com --- Would any other output help debug this? Register dumps using avivotool?
--- Comment #20 from Alex Deucher agd5f@yahoo.com --- Created attachment 71271 --> https://bugs.freedesktop.org/attachment.cgi?id=71271&action=edit possible fix
Does this patch help?
--- Comment #21 from Bryan Quigley BryanQuigley@Ubuntu.com --- Nope, but the patch didn't work as is, so I changed it to: rctx->framebuffer.atom.dirty = true;
Which may not be what the patch was actually trying to do...
--- Comment #22 from Marek Olšák maraeo@gmail.com --- (In reply to comment #21)
> Nope, but the patch didn't work as is, so I changed it to: rctx->framebuffer.atom.dirty = true;
> Which may not be what the patch was actually trying to do...
Your modification is correct. So did it work or not?
--- Comment #23 from Bryan Quigley BryanQuigley@Ubuntu.com --- No, the new patch doesn't fix it either.
Myckel Habets myckel@sdf.lonestar.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |myckel@sdf.lonestar.org
--- Comment #24 from Myckel Habets myckel@sdf.lonestar.org --- Some more info is in Bug 58058, because I think this is the same problem.
--- Comment #25 from Alex Deucher agd5f@yahoo.com --- Created attachment 71346 --> https://bugs.freedesktop.org/attachment.cgi?id=71346&action=edit possible fix
Try this patch. It re-emits most of the invariant state at draw time. If it helps, please try commenting out (change the #if 1 to #if 0) each new section until you are able to trigger the lock ups again so we can narrow down which state needs to be re-emitted at draw time.
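The narrowing procedure asked for in comment #25 (switch off one re-emitted section at a time until the lockups return) can be sketched as a simple elimination loop. Everything here is illustrative: the section names are placeholders, and `locks_up` is a hypothetical stand-in for "rebuild with only these sections enabled and test", which in practice would need several runs per configuration since the lockup is intermittent.

```python
def find_required_sections(sections, locks_up):
    """Disable sections one by one; keep those whose removal brings the lockup back.

    `locks_up(enabled)` must return True when a build with only the
    `enabled` sections re-emitted at draw time still locks up.
    """
    enabled = list(sections)
    required = []
    for section in sections:
        trial = [s for s in enabled if s != section]
        if locks_up(trial):
            required.append(section)  # removing it re-triggers the hang
        else:
            enabled = trial  # safe to leave this section disabled
    return required
```

The loop needs one build-and-test cycle per section, which matches flipping each `#if 1` to `#if 0` in turn.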
--- Comment #26 from Myckel Habets myckel@sdf.lonestar.org --- (In reply to comment #25)
> Created attachment 71346 possible fix
> Try this patch. It re-emits most of the invariant state at draw time. If it helps, please try commenting out (change the #if 1 to #if 0) each new section until you are able to trigger the lock ups again so we can narrow down which state needs to be re-emitted at draw time.
In my case it didn't help, although I had the impression it took longer before it hung (could also be random?)
--- Comment #27 from Myckel Habets myckel@sdf.lonestar.org --- The second time it took less time to lock up. Some observations: the screen gets distorted after one or more resets (wrong rendering order?). The screen keeps resetting, even after switching to the console (tty interface), until X is killed or shut down.
--- Comment #28 from Bryan Quigley BryanQuigley@Ubuntu.com --- I confirm that patch 71346 didn't help either.
--- Comment #29 from Bryan Quigley BryanQuigley@Ubuntu.com --- I get a similar lockup when starting Team Fortress 2 (native). It happens at startup, so it's much easier to reproduce.
--- Comment #30 from Andy Furniss lists@andyfurniss.entadsl.com --- I've just put my rv670 (HD3850) card back in my AGP box and can reliably get etqw to lock after a few seconds with waiting for fence.
My setup may be too different from the OP's for this to be relevant to this bug. The differences:
AGP, 32 bit, running drm-fixes kernel, no writebacks, and my bisect came up with a commit postdating the original report.
But for me -
1eedebc65b02130ef7a27062a1ed67972a317a08 is first bad commit
commit 1eedebc65b02130ef7a27062a1ed67972a317a08
Author: Marek Olšák maraeo@gmail.com
Date: Thu Nov 1 02:00:37 2012 +0100

    r600g: re-enable handling of DISCARD_RANGE, improving performance

    It seems to work for me now. Even the graphics corruption is gone.

    This also boosts performance in Reaction Quake.

Gives a reliable rv670 lock up with etqw.
This is testing with mesa built with --disable-llvm (as R600_LLVM doesn't work at all on this card)
It may (or may not) be worth anyone who is testing with mesa master resetting it to the commit before the one above, like this:
make distclean
git clean -dfx
git reset --hard fa58644855e44830e0b91dc627703c236fa6712a
--- Comment #31 from Bryan Quigley BryanQuigley@Ubuntu.com --- Andy,
How many times did you try it at that commit? I ask because I originally bisected it wrong, since it didn't always reproduce consistently for me. (It would take more than one run.)
I'll test it out though.
--- Comment #32 from Andy Furniss lists@andyfurniss.entadsl.com --- Looks like this was a separate issue - I've just managed to get openarena to lock GPU with mesa set to before
r600g: re-enable handling of DISCARD_RANGE
--- Comment #33 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #31)
> Andy,
> How many times did you try it at that commit? I ask because I originally bisected it wrong because it didn't always reproduce consistently for me. (Would take >1 run)
> I'll test it out though.
I am still testing - for etqw it looks good, but as I just posted I can after some time get openarena to lock.
--- Comment #34 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #9)
> I think I did everything right in this bisect (I didn't the first attempt).
> fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf is the first bad commit
> commit fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf
> Author: Marek Olšák maraeo@gmail.com
> Date: Fri Feb 3 05:05:31 2012 +0100
> r600g: move invariant register updates into start_cs for r6xx-r7xx
> :040000 040000 dd9232a0c49e54e0cd536fa858dc131982dc2fbe 379e1d61c53d98a8706f32da5020dc22c0c0ee33 M src
This seems correct, I can get a lock after a few minutes on this commit, but have so far failed to lock on the one before it.
--- Comment #35 from Myckel Habets myckel@sdf.lonestar.org --- (In reply to comment #30)
> make distclean
> git clean -dfx
> git reset --hard fa58644855e44830e0b91dc627703c236fa6712a
OK, I did this and rebuilt everything, but the problem remains in my case.
--- Comment #36 from Bryan Quigley BryanQuigley@Ubuntu.com --- I believe this bug is now triggered much faster (within 10 seconds of starting one of these games). But on the plus side it seems to usually just crash the game in question. (Running xorg-edgers (git) on Ubuntu Raring, kernel 3.8)
Excerpt from kern.log:
346488] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
346502] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000ced last fence id 0x0000000000000cea)
347653] radeon 0000:01:00.0: Saved 121 dwords of commands on ring 0.
347664] radeon 0000:01:00.0: GPU softreset: 0x00000003
348246] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xE7730130
348253] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00FF0103
348259] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200000C0
348265] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x02000000
348271] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00040804
348277] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00028284
348283] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80878645
348289] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
363165] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
378050] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
378056] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
378062] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200080C0
378068] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
378074] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
378079] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
378085] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
382501] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
400412] [drm] probing gen 2 caps for device 1022:9603 = 2/0
400422] [drm] PCIE gen 2 link speeds already enabled
405272] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
405369] radeon 0000:01:00.0: WB enabled
405379] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffdccc00
405388] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffdccc0c
436612] [drm] ring test on 0 succeeded in 0 usecs
436678] [drm] ring test on 3 succeeded in 1 usecs
438392] [drm] ib test on ring 0 succeeded in 0 usecs
438423] [drm] ib test on ring 3 succeeded in 1 usecs
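When comparing several such logs, it can help to pull the two fence ids out of the "GPU lockup" line mechanically. The sketch below matches the message format shown in the excerpt above; the function name is hypothetical, and reading the difference as "how many fences were still outstanding" is an interpretation, not something the log states.

```python
import re

# Matches the lockup message format seen in the kern.log excerpt.
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)\)"
)


def parse_lockup(line):
    """Return (waiting_for, last_fence_id) as ints, or None if no match."""
    match = FENCE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


sample = ("radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000ced "
          "last fence id 0x0000000000000cea)")
waiting, last = parse_lockup(sample)
print(waiting - last)  # prints 3
```

The same pattern also matches the shorter ids in the bug summary (0x00003039 and 0x00003030).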
End of an apitrace:
4700817 glClientActiveTextureARB(texture = GL_TEXTURE1)
4700818 glBindTexture(target = GL_TEXTURE_2D, texture = 0)
4700819 glActiveTextureARB(texture = GL_TEXTURE0)
4700820 glClientActiveTextureARB(texture = GL_TEXTURE0)
4700821 glBindTexture(target = GL_TEXTURE_2D, texture = 0)
4700822 glXMakeCurrent(dpy = 0xb7f7c80, drawable = 0, ctx = NULL) = True
4700823 glXDestroyContext(dpy = 0xb7f7c80, ctx = 0xb830e08)
4700339 glDrawElements(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_INT, indices = blob(72)) // incomplete
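The call marked `// incomplete` at the end of the trace is the interesting one, since it is listed out of order after later call numbers. A small sketch for picking such calls out of a dumped trace follows; it assumes one call per line in the `<number> <function>(...)` layout shown above, and the function name is hypothetical.

```python
import re

# "<call number> <function name>(" at the start of a dumped trace line.
CALL_RE = re.compile(r"^(\d+)\s+(\w+)\(")


def incomplete_calls(trace_lines):
    """Return (call_number, function) for lines marked '// incomplete'."""
    found = []
    for line in trace_lines:
        text = line.strip()
        match = CALL_RE.match(text)
        if match and text.endswith("// incomplete"):
            found.append((int(match.group(1)), match.group(2)))
    return found
```

Applied to the excerpt above, this reports call 4700339, the `glDrawElements` call.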
--- Comment #37 from Myckel Habets myckel@sdf.lonestar.org --- Some attempts from my side:
I've been going back in the tree to see if I could find a point where this bug doesn't show up. I've gone back as far as the end of 2011, but it still locks up (although it seems to take more time). At my last check (early 2011) I was unable to build the code; it seems to be incompatible going back that far. I'll see if I can find the spot where I can build it again and test it.
--- Comment #38 from Bryan Quigley BryanQuigley@Ubuntu.com --- @Myckel Habets in comment #37
What do you mean by a lot more time? I would test with 3 games, running 3 times each, automatically via the phoronix test suite.
With the latest git mesa does it crash very quickly for you?
--- Comment #39 from Erik Jørgensen eoj198+FOSS@gmail.com --- Created attachment 75272 --> https://bugs.freedesktop.org/attachment.cgi?id=75272&action=edit Possible fix for R600 hw deadlock
Patch has been tested on a system with AMD K8 CPU and Radeon AGP card (AMD RV670 / Radeon HD 3850) with both 3.6.11-030611-generic kernel (from Ubuntu kernel PPA mainline) and kernel built from recent drm-fixes git in the testing. This patch may also be relevant to reported Bug 47116 .
--- Comment #40 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #39)
> Created attachment 75272 Possible fix for R600 hw deadlock
> Patch has been tested on a system with AMD K8 CPU and Radeon AGP card (AMD RV670 / Radeon HD 3850) with both 3.6.11-030611-generic kernel (from Ubuntu kernel PPA mainline) and kernel built from recent drm-fixes git in the testing. This patch may also be relevant to reported Bug 47116 .
There is a lot of unrelated stuff going on in that patch. Can you narrow down what part fixes the issue?
--- Comment #41 from Alex Deucher agd5f@yahoo.com --- Created attachment 75274 --> https://bugs.freedesktop.org/attachment.cgi?id=75274&action=edit flush fix 1/4
Please try this patch series. The 4th patch is optional. It just enables CP DMA assuming that the previous flushing fixes fix the CP DMA issues.
--- Comment #42 from Alex Deucher agd5f@yahoo.com --- Created attachment 75275 --> https://bugs.freedesktop.org/attachment.cgi?id=75275&action=edit flush fix 2/4
patch 2 of 4.
--- Comment #43 from Alex Deucher agd5f@yahoo.com --- Created attachment 75276 --> https://bugs.freedesktop.org/attachment.cgi?id=75276&action=edit flush fix 3/4
patch 3 of 4.
--- Comment #44 from Alex Deucher agd5f@yahoo.com --- Created attachment 75277 --> https://bugs.freedesktop.org/attachment.cgi?id=75277&action=edit flush fix 4/4
Optional patch to enable CP DMA on 6xx.
Alex Deucher agd5f@yahoo.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |archon-123@hotmail.com
--- Comment #45 from Alex Deucher agd5f@yahoo.com --- *** Bug 47116 has been marked as a duplicate of this bug. ***
--- Comment #46 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #39)
> Created attachment 75272 Possible fix for R600 hw deadlock
> Patch has been tested on a system with AMD K8 CPU and Radeon AGP card (AMD RV670 / Radeon HD 3850) with both 3.6.11-030611-generic kernel (from Ubuntu kernel PPA mainline) and kernel built from recent drm-fixes git in the testing. This patch may also be relevant to reported Bug 47116 .
Testing AGP HD3850 - this patch regresses etqw which since my previous post in this bug had become stable. GPU lock within seconds with or without llvm. Testing on 3.7.6 (purely because I have a separate issue with gpu locks provoking oops with current kernels).
It does however seem to fix openarena and nexuiz which without this patch would gpu lock, or really hard lock respectively after a couple of minutes. Haven't had time to test really long runs yet though.
--- Comment #47 from Bryan Quigley BryanQuigley@Ubuntu.com --- The series of 4 patches by Alex (41-44) doesn't fix the issue for me.
The patch in Comment #39 does fix it for me! I tested it repeatedly with 6 runs of padman, urbanterror and openarena each. (using 3.8 kernel)
--- Comment #48 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #42)
> Created attachment 75275 flush fix 2/4
> patch 2 of 4.
This patch (patch 1 also applied) regresses etqw.
Alex Deucher agd5f@yahoo.com changed:
What                         |Removed |Added
----------------------------------------------------------------------------
Attachment #75274 is obsolete|0       |1
Attachment #75275 is obsolete|0       |1
Attachment #75276 is obsolete|0       |1
Attachment #75277 is obsolete|0       |1
--- Comment #49 from Alex Deucher agd5f@yahoo.com --- Created attachment 75317 --> https://bugs.freedesktop.org/attachment.cgi?id=75317&action=edit new attempt 1/5
Another attempt to fix the issue. Patch 5 is optional and not related to bug per se.
--- Comment #50 from Alex Deucher agd5f@yahoo.com --- Created attachment 75318 --> https://bugs.freedesktop.org/attachment.cgi?id=75318&action=edit new attempt 2/5
2/5
--- Comment #51 from Alex Deucher agd5f@yahoo.com --- Created attachment 75319 --> https://bugs.freedesktop.org/attachment.cgi?id=75319&action=edit new attempt 3/5
3/5
--- Comment #52 from Alex Deucher agd5f@yahoo.com --- Created attachment 75320 --> https://bugs.freedesktop.org/attachment.cgi?id=75320&action=edit new attempt 4/5
4/5
--- Comment #53 from Alex Deucher agd5f@yahoo.com --- Created attachment 75321 --> https://bugs.freedesktop.org/attachment.cgi?id=75321&action=edit new attempt 5/5
optional 5/5.
--- Comment #54 from Alex Deucher agd5f@yahoo.com --- Latest patches 1 and 4 alone are enough to fix the hangs for me on an rs780.
--- Comment #55 from Alex Deucher agd5f@yahoo.com --- actually just patch 4 alone seems to fix it.
Alex Deucher agd5f@yahoo.com changed:
What                         |Removed |Added
----------------------------------------------------------------------------
Attachment #75317 is obsolete|0       |1
Attachment #75318 is obsolete|0       |1
Attachment #75319 is obsolete|0       |1
Attachment #75320 is obsolete|0       |1
Attachment #75321 is obsolete|0       |1
--- Comment #56 from Alex Deucher agd5f@yahoo.com --- Created attachment 75331 --> https://bugs.freedesktop.org/attachment.cgi?id=75331&action=edit simple fix
Just this patch alone seems to fix the issue here.
--- Comment #57 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #51)
> Created attachment 75319 new attempt 3/5
> 3/5
FWIW, now that it's obsolete: this still regressed etqw.
Also tried 1+2+4 and 1+2+3+4 with openarena/nexuiz and still had lockups.
Will try 4 alone next.
--- Comment #58 from Bryan Quigley BryanQuigley@Ubuntu.com --- The simple patch appears to have fixed it for me. (comment 56). Just did 9 total runs, will test more later today.
--- Comment #59 from Myckel Habets myckel@sdf.lonestar.org --- (In reply to comment #56)
> Created attachment 75331 simple fix
> Just this patch alone seems to fix the issue here.
I just had a lock up after ~30min in the game (openarena).
--- Comment #60 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #59)
> (In reply to comment #56)
> > Created attachment 75331 simple fix
> > Just this patch alone seems to fix the issue here.
> I just had a lock up after ~30min in the game (openarena).
Can you try just the patch "new attempt 4/5" (attachment 75320) by itself?
--- Comment #61 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #57)
> (In reply to comment #51)
> > Created attachment 75319 new attempt 3/5
> > 3/5
> FWIW now it's obsolete this still regressed etqw.
> Also tried 1+2+4 and 1+2+3+4 with openarena/nexuiz and still had lockups.
> Will try 4 alone next.
Can you also try the simple fix (attachment 75331)?
--- Comment #62 from Alex Deucher agd5f@yahoo.com --- Created attachment 75363 --> https://bugs.freedesktop.org/attachment.cgi?id=75363&action=edit alternate simple fix
Another patch to try.
--- Comment #63 from Myckel Habets myckel@sdf.lonestar.org --- I tried the simple fix together with Erik's patch and haven't been able to get it to lock up yet after ~30 minutes.
I'll also try the alternate simple fix later.
--- Comment #64 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #56)
> Created attachment 75331 simple fix
> Just this patch alone seems to fix the issue here.
I can still lockup with this and 0004.
It took longer with 0004 and generally llvm seems to take longer to lock than R600_LLVM=0.
Will try the new patch.
--- Comment #65 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #64)
> (In reply to comment #56)
> > Created attachment 75331 simple fix
> > Just this patch alone seems to fix the issue here.
> I can still lockup with this
Ignore this - I messed up when testing simple fix and was testing unpatched - it's running now and hasn't locked yet.
--- Comment #66 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #65)
> (In reply to comment #64)
> > (In reply to comment #56)
> > > Created attachment 75331 simple fix
> > > Just this patch alone seems to fix the issue here.
> > I can still lockup with this
> Ignore this - I messed up when testing simple fix and was testing unpatched - it's running now and hasn't locked yet.
It eventually hard locked with nexuiz.
Alex Deucher agd5f@yahoo.com changed:
What                         |Removed |Added
----------------------------------------------------------------------------
Attachment #75363 is obsolete|0       |1
--- Comment #67 from Alex Deucher agd5f@yahoo.com --- Created attachment 75373 --> https://bugs.freedesktop.org/attachment.cgi?id=75373&action=edit better alternative fix
Please try this one instead of the previous one.
--- Comment #68 from Bryan Quigley BryanQuigley@Ubuntu.com --- The better alternative fix just worked fine for me running openarena, nexuiz, padman, tremulous, and urbanterror. I'm going to run it again to be sure and will report back if it breaks. (http://openbenchmarking.org/result/1302221-RA-BUGTESTIN78)
--- Comment #69 from Alex Deucher agd5f@yahoo.com --- I went ahead and pushed a split up version of attachment 75373 to mesa: http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d58441... http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd... 9.1 is supposed to be released today and even if the patch isn't perfect for everyone yet, it's a lot better than it was before. I'll keep this bug open and we can continue to work on this until we get it nailed.
--- Comment #70 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #67)
> Created attachment 75373 better alternative fix
> Please try this one instead of the previous one.
I can still hard lock with this and previous - nexuiz is easiest and it normally hard locks. openarena with vanilla is nicer and gpu locks and recovery is possible, but with these patches I did get a hard lock from it.
There is a difference to vanilla in that I am getting the locks after a level/timedemo has run rather than during.
With the patch before this I played 40 minutes of openarena got bored and typed disconnect then after it had exited the level it locked.
With nexuiz I just run the demos in order, and again the locks come after a demo has finished and the game has been showing a text screen for several seconds.
--- Comment #71 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #69)
> I went ahead and pushed a split up version of attachment 75373 to mesa:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
> 9.1 is supposed to be released today and even if the patch isn't perfect for everyone yet, it's a lot better than it was before. I'll keep this bug open and we can continue to work on this until we get it nailed.
That was quick - I've only just got to try with etqw and with v5 it quickly causes a GPU reset.
--- Comment #72 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #71)
> (In reply to comment #69)
> > I went ahead and pushed a split up version of attachment 75373 to mesa:
> > http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
> > http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
> > 9.1 is supposed to be released today and even if the patch isn't perfect for everyone yet, it's a lot better than it was before. I'll keep this bug open and we can continue to work on this until we get it nailed.
>
> That was quick - I've only just got to try with etqw and with v5 it quickly causes a GPU reset.
On vanilla master now. Can still get etqw to provoke a gpu reset but it seems like it's the initial use of the text console when on the main screen that provokes it. If I avoid using it then I can run without locks.
--- Comment #73 from Alex Deucher agd5f@yahoo.com --- Does disabling hyperZ help? Set env var R600_HYPERZ=0
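For anyone retrying this, here is a minimal sketch of one way to set that variable for a single game process only, so the rest of the session is left untouched. The variable name and value are taken from the comment above; the helper names and the example command are illustrative assumptions, not part of Mesa or of any game launcher.

```python
import os
import subprocess


def env_with_hyperz_disabled(base_env=None):
    """Return a copy of the environment with R600_HYPERZ set to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["R600_HYPERZ"] = "0"  # name and value as given in the comment above
    return env


def run_without_hyperz(command):
    """Run `command` (a list of arguments) with HyperZ disabled; return its exit code."""
    return subprocess.run(command, env=env_with_hyperz_disabled()).returncode


# Hypothetical usage; "etqw" stands in for whatever starts the game locally:
# run_without_hyperz(["etqw"])
```

Copying the environment instead of changing os.environ keeps the setting scoped to the child process, which makes it easier to compare runs with and without HyperZ back to back.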
--- Comment #74 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #73)
> Does disabling hyperZ help? Set env var R600_HYPERZ=0
No, that doesn't help.
I have just found another way to avoid it though: running with my card on "low", I can't get it to lock. Turning it up to "high", as I normally do, it locks on the first (but not subsequent) use of the text console every time.
--- Comment #75 from Myckel Habets myckel@sdf.lonestar.org --- (In reply to comment #72)
> (In reply to comment #71)
> > (In reply to comment #69)
> > > I went ahead and pushed a split up version of attachment 75373 to mesa:
> > > http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
> > > http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
> > > 9.1 is supposed to be released today and even if the patch isn't perfect for everyone yet, it's a lot better than it was before. I'll keep this bug open and we can continue to work on this until we get it nailed.
> >
> > That was quick - I've only just got to try with etqw and with v5 it quickly causes a GPU reset.
>
> On vanilla master now. Can still get etqw to provoke a gpu reset but it seems like it's the initial use of the text console when on the main screen that provokes it. If I avoid using it then I can run without locks.
I'm also on vanilla master now and just got a lockup in openarena (after ~40 min). I'm trying Erik's patch again, because I have yet to get it to lock up with that one (after ~2h of playing).
Jerome Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|ATI RV670 [Radeon HD 3870]  |[r600g][RV670 HD3870]
                   |Ioquake games causes GPU    |Ioquake games causes GPU
                   |lockup (waiting for         |lockup (waiting for
                   |0x00003039 last fence id    |0x00003039 last fence id
                   |0x00003030)                 |0x00003030)
--- Comment #76 from Bryan Quigley gquigs+bugs@gmail.com --- I haven't seen this bug since my last comment (and for the last month I have been on a different video card).
Does anyone else still see this issue, or shall I close it as Fix Released?
--- Comment #77 from Marek Olšák maraeo@gmail.com --- (In reply to comment #76)
> I haven't seen this bug since my last comment. (and for the last month been on a different video card).
> Does anyone else still see this issue or shall I close it Fix Released?
I tested RV670 with piglit and DOTA 2 in April this year and it worked fine.
--- Comment #78 from Myckel Habets myckel@sdf.lonestar.org --- (In reply to comment #76)
> I haven't seen this bug since my last comment. (and for the last month been on a different video card).
> Does anyone else still see this issue or shall I close it Fix Released?
Give me a few days to test (not so much spare time now) and see if I can still trigger the bug.
Bryan Quigley gquigs+bugs@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
--- Comment #79 from Bryan Quigley gquigs+bugs@gmail.com --- Per my comment on 2014-07-30, and with no other updates since that year, I'm going to go ahead and mark this Fixed. Thanks all!