On Thu, Jan 03, 2013 at 06:11:23PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
>> On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> I got a crash after a few minutes of running 3.8.0-rc2, was able to switch to a vt and look at dmesg:
>>>
>>> [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>> [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>>> [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>> [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>>> [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
>>>
>>> Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such problem.
>>
>> I'm not questioning that you haven't seen that error in F17, but we have had quite a few bug reports with similar error messages for a while now. Apparently there are lots of ways GPUs can get hung, so they might be different from what you're seeing. Just wanted to point out that it might not be a new 3.8 change that caused it.
>
> OK, sure. It reproduced very quickly after the upgrade, so I assumed it was a regression.
>
> I'm running 3.7.0 now, which hasn't shown any problem.
>
> I'll try a newer kernel again to see if it's really that easy for me to reproduce.
If you hit this again (even better if you have a way to reproduce) please grab the i915_error_state file from debugfs and file a bug on bugs.freedesktop.org against DRM - DRI (Intel). We do know of a few recent issues introduced around 3.7 kernels, preliminary patches are floating around. The error state should be good enough to decide whether you're hitting the same issues.
Thanks, Daniel