"Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2


J. Bruce Fields

3 Jan 2013

8:46 p.m.

I got a crash after a few minutes of running 3.8.0-rc2, was able to switch to a vt and look at dmesg:

[ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.

Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such problem.

dmesg, config, and i915_error_state available from:

http://fieldses.org/~bfields/3.8-hang/

--b.
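
The kernel log excerpt in the report can be read mechanically. What follows is a minimal sketch, assuming the `[ seconds] [source] message` layout seen in those lines; `parse_drm_lines` and `hang_times` are hypothetical helper names used only for illustration, not part of any tool mentioned in this thread. It extracts the hangcheck timestamps and the gap between them, which is the context for the "GPU hanging too fast" message.

```python
import re

# Kernel log lines copied from the report in this thread.
DMESG = """\
[ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
"""

# Assumed layout: "[ <seconds>] [<source>] <message>", source starting with "drm".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[(drm[^\]]*)\]\s+(.*)$")


def parse_drm_lines(text):
    """Return (timestamp_seconds, source, message) for each matching drm line."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2), match.group(3)))
    return events


def hang_times(events):
    """Timestamps of the lines emitted by i915_hangcheck_hung."""
    return [t for t, source, _ in events if source == "drm:i915_hangcheck_hung"]


events = parse_drm_lines(DMESG)
hangs = hang_times(events)
# Seconds between consecutive hang reports.
gaps = [round(later - earlier, 3) for earlier, later in zip(hangs, hangs[1:])]
# True if the driver gave up on resetting the GPU.
wedged = any("declaring wedged" in message for _, _, message in events)
print(hangs, gaps, wedged)
```

On this excerpt the two hang reports are roughly two seconds apart, and the second one is followed by the driver declaring the GPU wedged.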


Josh Boyer

3 Jan 2013

9:16 p.m.

On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields bfields@fieldses.org wrote:

> I got a crash after a few minutes of running 3.8.0-rc2, was able to switch to a vt and look at dmesg:
>
> [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
>
> Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such problem.

I'm not questioning that you haven't seen that error in F17, but we have had quite a few bug reports with similar error messages for a while now. Apparently there are lots of ways GPUs can get hung, so they might be different from what you're seeing. Just wanted to point out that it might not be a new 3.8 change that caused it.

josh

J. Bruce Fields

11:11 p.m.

On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:

> On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields bfields@fieldses.org wrote:
> > I got a crash after a few minutes of running 3.8.0-rc2, was able to switch to a vt and look at dmesg:
> >
> > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> >
> > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such problem.
>
> I'm not questioning that you haven't seen that error in F17, but we have had quite a few bug reports with similar error messages for a while now. Apparently there are lots of ways GPUs can get hung, so they might be different from what you're seeing. Just wanted to point out that it might not be a new 3.8 change that caused it.

OK, sure. It reproduced very quickly after the upgrade, so I assumed it was a regression.

I'm running 3.7.0 now which hasn't shown any problem.

I'll try a newer kernel again to see if it's really that easy for me to reproduce.

--b.

Daniel Vetter

6 Jan 2013

6:06 p.m.

On Thu, Jan 03, 2013 at 06:11:23PM -0500, J. Bruce Fields wrote:

> On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> > On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields bfields@fieldses.org wrote:
> > > I got a crash after a few minutes of running 3.8.0-rc2, was able to switch to a vt and look at dmesg:
> > >
> > > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> > >
> > > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such problem.
> >
> > I'm not questioning that you haven't seen that error in F17, but we have had quite a few bug reports with similar error messages for a while now. Apparently there are lots of ways GPUs can get hung, so they might be different from what you're seeing. Just wanted to point out that it might not be a new 3.8 change that caused it.
>
> OK, sure. It reproduced very quickly after the upgrade, so I assumed it was a regression.
>
> I'm running 3.7.0 now which hasn't shown any problem.
>
> I'll try a newer kernel again to see if it's really that easy for me to reproduce.

If you hit this again (even better if you have a way to reproduce) please grab the i915_error_state file from debugfs and file a bug on bugs.freedesktop.org against DRM - DRI (Intel). We do know of a few recent issues introduced around 3.7 kernels, preliminary patches are floating around. The error state should be good enough to decide whether you're hitting the same issues.

Thanks, Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
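
For anyone following the request above, this is a minimal sketch of saving the error state so it can be attached to a bug report. It assumes debugfs is mounted at `/sys/kernel/debug` (the kernel message in this thread prints the path as `/debug/dri/0/i915_error_state`, so the mount point may differ on your system) and that the script runs with enough privileges to read it. `save_error_state` is a hypothetical helper name, not an existing tool.

```python
import gzip
import shutil
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug and the GPU is DRM
# device 0. Adjust the path if your system mounts debugfs elsewhere.
DEFAULT_SRC = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=DEFAULT_SRC, dest_dir=".", name="i915_error_state.gz"):
    """Copy the error state file into dest_dir as a gzip-compressed file.

    Returns the path of the compressed copy. Raises OSError if the source
    cannot be read (for example when not running as root).
    """
    src = Path(src)
    dest = Path(dest_dir) / name
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Example call (typically needs root, and only has content after a hang):
#     save_error_state(dest_dir="/tmp")
```

Compressing the copy is a design choice made here because the error state can be large; an uncompressed copy would serve the same purpose.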

J. Bruce Fields

8 Jan 2013

2:37 p.m.

On Sun, Jan 06, 2013 at 07:06:52PM +0100, Daniel Vetter wrote:

> On Thu, Jan 03, 2013 at 06:11:23PM -0500, J. Bruce Fields wrote:
> > On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> > > On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields bfields@fieldses.org wrote:
> > > > I got a crash after a few minutes of running 3.8.0-rc2, was able to switch to a vt and look at dmesg:
> > > >
> > > > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > > > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > > > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > > > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> > > >
> > > > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such problem.
> > >
> > > I'm not questioning that you haven't seen that error in F17, but we have had quite a few bug reports with similar error messages for a while now. Apparently there are lots of ways GPUs can get hung, so they might be different from what you're seeing. Just wanted to point out that it might not be a new 3.8 change that caused it.
> >
> > OK, sure. It reproduced very quickly after the upgrade, so I assumed it was a regression.
> >
> > I'm running 3.7.0 now which hasn't shown any problem.
> >
> > I'll try a newer kernel again to see if it's really that easy for me to reproduce.
>
> If you hit this again (even better if you have a way to reproduce)

Unfortunately I wasn't able to reproduce after working a couple more hours on 3.8 again. However:

> please grab the i915_error_state file from debugfs

As I said in the original mail, I've already done that:

http://fieldses.org/~bfields/3.8-hang/

> and file a bug on bugs.freedesktop.org against DRM - DRI (Intel).

Would it still be useful for me to file a bug? (Just going through the new-account confirmation dance now.)

--b.

> We do know of a few recent issues introduced around 3.7 kernels, preliminary patches are floating around. The error state should be good enough to decide whether you're hitting the same issues.
>
> Thanks, Daniel
>
> Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Vetter

9 Jan 2013

11:27 a.m.

On Tue, Jan 8, 2013 at 3:37 PM, J. Bruce Fields bfields@fieldses.org wrote:

> > please grab the i915_error_state file from debugfs
>
> As I said in the original mail, I've already done that:
>
>     http://fieldses.org/~bfields/3.8-hang/

Sorry, missed that the first time around.

> > and file a bug on bugs.freedesktop.org against DRM - DRI (Intel).
>
> Would it still be useful for me to file a bug? (Just going through the new-account confirmation dance now.)

Looks like the ilk bug tracked at https://bugs.freedesktop.org/show_bug.cgi?id=55984

-Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

J. Bruce Fields

2:18 p.m.

On Wed, Jan 09, 2013 at 12:27:22PM +0100, Daniel Vetter wrote:

> On Tue, Jan 8, 2013 at 3:37 PM, J. Bruce Fields bfields@fieldses.org wrote:
> > > please grab the i915_error_state file from debugfs
> >
> > As I said in the original mail, I've already done that:
> >
> >     http://fieldses.org/~bfields/3.8-hang/
>
> Sorry, missed that the first time around.
>
> > > and file a bug on bugs.freedesktop.org against DRM - DRI (Intel).
> >
> > Would it still be useful for me to file a bug? (Just going through the new-account confirmation dance now.)
>
> Looks like the ilk bug tracked at https://bugs.freedesktop.org/show_bug.cgi?id=55984

OK, I'll add something there if I'm able to find a reproducer. Thanks!

--b.


dri-devel@lists.freedesktop.org
