On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer jwboyer@gmail.com wrote:
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher alexdeucher@gmail.com wrote:
On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer jwboyer@gmail.com wrote:
On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher alexdeucher@gmail.com wrote:
>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>
> So I don't think that's actually the cause of the problem. Or at least
> not that alone. I reverted it on top of Linus' latest tree and I still
> get the lockups.
Actually, git bisect does seem to have gotten it correct. Once I actually tested the revert of just that on top of Linus' tree (commit d895cb1af1), things seem to be working much better. I've rebooted a dozen times without a lockup. The most I've seen it take on a kernel with that commit included is 3 reboots, so that's definitely at least an improvement.
I give up. GPU issues are not my thing. 2 reboots after I sent that it gave me pretty rainbow static again. So it might have been an improvement, but reverting it is not a solution.
Looking at the rest of the commits, the whole GPU rework might be suspect, but I clearly have no clue.
GPUs are tricky beasts :)
Understatement ;).
ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the problem anyway since it only affects 6xx/7xx and your card is handled by the evergreen code. I'll put together some patches to help narrow down the problem.
Yeah, that's the biggest problem I have, not knowing which functions are actually being executed for this card. It looks like a combination of stuff in evergreen.c and ni.c, but I have no idea.
Patches would be great. If nothing else, I'm really good at building kernels and rebooting by now.
Two possible fixes attached. The first attempts a full reset of all blocks if the MC (memory controller) is hung. That may work better than just resetting the MC. The second just disables MC reset. I'm not sure we can reliably tell whether the MC is actually hung, since display requests hit it periodically and can make it look busy. That would lead to needlessly resetting it, possibly causing failures like the ones you are seeing.
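For readers following along, here is a rough sketch of how the two approaches differ. It is not the contents of either attached patch. Every name in it (the SKETCH_RESET_* flags and both functions) is a hypothetical stand-in, not an identifier from the radeon driver, and it assumes the driver tracks "which blocks look hung" as a bitmask.

```c
#include <stdint.h>

/* Hypothetical bit flags, one per GPU block that looks hung.
 * These are illustration-only names, not the driver's. */
#define SKETCH_RESET_GFX (1u << 0)
#define SKETCH_RESET_DMA (1u << 1)
#define SKETCH_RESET_MC  (1u << 2)
#define SKETCH_RESET_ALL (SKETCH_RESET_GFX | SKETCH_RESET_DMA | SKETCH_RESET_MC)

/* Approach 1 (assumed shape): if the MC looks hung, escalate to
 * resetting every block instead of the MC alone. */
static uint32_t sketch_full_reset_if_mc_busy(uint32_t reset_mask)
{
    if (reset_mask & SKETCH_RESET_MC)
        return SKETCH_RESET_ALL;
    return reset_mask;
}

/* Approach 2 (assumed shape): never reset the MC, on the theory that
 * "busy" is usually just periodic display traffic, not a hang. */
static uint32_t sketch_skip_mc_reset(uint32_t reset_mask)
{
    return reset_mask & ~SKETCH_RESET_MC;
}
```

With the second approach, a mask that flags only the MC ends up empty, so no reset would be attempted at all in that case.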
OK. I'll test them individually. It will probably take a bit because I'll want to do numerous reboots if things seem "fixed" with one or the other.
I'll let you know how things go.
I applied each individually on top of Linus' tree as of this morning (commit 2a7d2b96d5), then built, installed, and tested.
0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in two reboots.
0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone 21 reboots without a hang/rainbow static. You'll understand if I'm hesitant to declare success, but resetting the MC does indeed appear to be the issue. I'll keep rebooting for a while to make sure.
josh