Re: Radeon lockup on 3.8.5-201.fc18.x86_64

23 Apr 2013


On Tue, Apr 23, 2013 at 10:15 AM, Michel Dänzer <michel@daenzer.net> wrote:
...
On Tue, 2013-04-23 at 10:08 -0700, Andy Lutomirski wrote:
...
On Mon, Apr 22, 2013 at 10:55 PM, Michel Dänzer <michel@daenzer.net> wrote:
...
On Mon, 2013-04-22 at 16:19 -0700, Andy Lutomirski wrote:
...
I'm not convinced there's an actual hang.  40 seconds is a long time,
and I've only ever seen this when clicking something, and when this
happens, the screen goes blank immediately (not after a 40 second
delay).

Hmm, now that you mention this, I notice in your original report it
claims that the CP stalled for 'more than 5102593msec', which is clearly
bogus. Looks like something's wrong with the lockup detection.

Did this start after a kernel update or something like that?

It's recent.  It may have been when F18 switched from 3.7 to 3.8.

Can you reproduce it with an upstream kernel? Can you bisect? I realize
it'll probably take a long time, but unless someone has an idea which
change might have introduced the problem...

Yuck.  I can try, but it takes days to reproduce this, so it will take
forever (and may end up with a wrong answer if I get lucky and don't
crash).
...
...
I think there are bugs in the lockup detection and in the lockup
recovery.  Firefox, in particular, is *really* slow afterwards.  Are
interrupts possibly getting dropped or misconfigured during the reset?

Let's not get ahead of ourselves and focus on the lockup detection issue
for now.

I don't understand the r600_gpu_check_soft_reset code, but could this
be the sequence of events that triggers it?

1. radeon_ring_is_lockup is called just as the very last command on
the ring completes, so last_rptr gets set to the rptr.
2. Nothing happens for a while (i.e. > lockup_timeout).  rptr doesn't change.
3. A very slightly slow operation starts.
4. radeon_ring_is_lockup gets called before that command completes.
radeon_ring_test_lockup will not detect a jiffies wrap-around (because
there wasn't one), rptr will equal last_rptr (because there hasn't
been any progress since last time), and the elapsed time will be
really long, because the function hasn't been called for a long time.
So a lockup gets detected, even though nothing's wrong.
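
To make the sequence above concrete, here is a minimal user-space sketch of
it.  This is not the driver code: struct ring_model, model_is_lockup and
replay_idle_then_busy are hypothetical names, time is kept in plain
milliseconds instead of jiffies, and the check is reduced to "progress resets
the timer, otherwise compare the elapsed time against the timeout".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-ring tracking state. */
struct ring_model {
    uint32_t rptr;             /* current read pointer */
    uint32_t last_rptr;        /* read pointer recorded by the last check */
    uint64_t last_activity_ms; /* time at which progress was last recorded */
};

/*
 * Simplified model of the check: if the read pointer moved, record it and
 * report no lockup; otherwise report a lockup once the time since the last
 * recorded progress reaches the timeout.
 */
static bool model_is_lockup(struct ring_model *r, uint64_t now_ms,
                            uint64_t timeout_ms)
{
    assert(timeout_ms > 0);
    if (r->rptr != r->last_rptr) {
        r->last_rptr = r->rptr;
        r->last_activity_ms = now_ms;
        return false;
    }
    return (now_ms - r->last_activity_ms) >= timeout_ms;
}

/* Replays steps 1-4; returns true if the final check reports a lockup. */
static bool replay_idle_then_busy(uint64_t idle_ms, uint64_t timeout_ms)
{
    struct ring_model r = { .rptr = 100, .last_rptr = 0,
                            .last_activity_ms = 0 };
    uint64_t now = 1000;

    /* Step 1: the check runs just as the last command completes. */
    bool first = model_is_lockup(&r, now, timeout_ms);
    assert(!first); /* progress was seen, so last_rptr == rptr now */

    /* Step 2: the ring sits idle; rptr does not change. */
    now += idle_ms;

    /* Step 3: a new command starts but has not completed yet. */
    now += 1;

    /* Step 4: the check runs again before any new progress. */
    return model_is_lockup(&r, now, timeout_ms);
}
```

In this model, replay_idle_then_busy(60000, 10000) reports a lockup even
though the ring was merely idle, while replay_idle_then_busy(5, 10000) does
not.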

There's a comment above radeon_ring_test_lockup that says:

 * A possible false positivie is if we get call after while and last_cp_rptr ==
 * the current CP rptr, even if it's unlikely it might happen. To avoid this
 * if the elapsed time since last call is bigger than 2 second than we return
 * false and update the tracking information. Due to this the caller must call
 * radeon_ring_test_lockup several time in less than 2sec for lockup to be
 * reported
 * the fencing code should be cautious about that.

but the corresponding code doesn't appear to exist anywhere.
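
For illustration, the guard that comment describes might look roughly like the
sketch below.  Again, this is a user-space model under my own assumptions, not
a patch: struct ring_track, guarded_test_lockup and STALE_CHECK_MS are
hypothetical names, and time is in milliseconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The "2 second" window from the quoted comment (assumed value). */
#define STALE_CHECK_MS 2000u

/* Hypothetical tracking state; the real driver keeps its own fields. */
struct ring_track {
    uint32_t last_rptr;        /* read pointer recorded by the last check */
    uint64_t last_progress_ms; /* when the tracking info was last refreshed */
    uint64_t last_call_ms;     /* when the check was last called */
};

/*
 * Reports a lockup only if rptr has not moved for timeout_ms AND the check
 * has been called again within STALE_CHECK_MS of the previous call.  A call
 * that arrives after a long gap just refreshes the tracking information.
 */
static bool guarded_test_lockup(struct ring_track *t, uint32_t rptr,
                                uint64_t now_ms, uint64_t timeout_ms)
{
    assert(timeout_ms > 0);

    uint64_t since_last_call = now_ms - t->last_call_ms;
    t->last_call_ms = now_ms;

    if (rptr != t->last_rptr || since_last_call > STALE_CHECK_MS) {
        /* Either real progress or stale tracking data: start over. */
        t->last_rptr = rptr;
        t->last_progress_ms = now_ms;
        return false;
    }
    return (now_ms - t->last_progress_ms) >= timeout_ms;
}
```

With this guard, a single check after a long idle period only restarts the
measurement.  A lockup is reported only when the caller keeps polling at least
every two seconds and rptr stays put for the whole timeout, which is the
calling pattern the comment asks of the fencing code.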

Also, and unrelatedly, I revoke my comment about gmail issues being
fixed with hyperz off.  Gmail still draws incorrectly.  This may or
may not have anything to do with the radeon driver.

--Andy
