Hi,
With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of text scroll in an xterm, such as when extracting a tar archive. Such as this one (note the timestamps):
[22865.157750] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[22865.157763] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[22871.165992] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[22871.166008] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[22877.417902] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[22877.417915] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[22883.426132] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[22883.426146] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
These hangs did not occur with Linux 3.1, or they were so infrequent that I didn't notice them. The machine is an X220 laptop with an SNB CPU. The kernel command line contains:
i915.i915_enable_rc6=1 i915.i915_enable_fbc=1 i915.lvds_downclock=1
Note that VT-d is enabled in the BIOS, but the kernel is compiled without IOMMU support. Because the IOMMU is not enabled, I'm inclined to think that this problem is not VT-d related.
Cheers,
- Udo
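The spacing of the hangcheck entries in the log above can be checked mechanically. The following is a minimal sketch, not part of the original report: `hang_intervals` is a hypothetical helper name, and it assumes the usual dmesg format in which each line starts with a bracketed timestamp in seconds.

```sh
#!/bin/sh
# hang_intervals: read kernel log lines on stdin and print the gap, in
# seconds, between consecutive "Hangcheck timer elapsed" entries.
# (Hypothetical helper; assumes "[seconds.micros]" timestamps as in dmesg.)
hang_intervals() {
    awk '
        /Hangcheck timer elapsed/ && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and force a numeric value.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen) printf "%.3f\n", ts - prev
            prev = ts
            seen = 1
        }
    '
}
```

Running `dmesg | hang_intervals` on the affected machine should show gaps of roughly six seconds for the entries quoted above, which is what "note the timestamps" points at.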
On Wed, 21 Dec 2011 19:54:10 +0100, Udo Steinberg udo@hypervisor.org wrote:
> Hi,
> With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of text scroll in an xterm, such as when extracting a tar archive. Such as this one (note the timestamps):
Can you try with semaphores disabled?
i915.semaphores=0
And, you should be able to remove the explicit rc6 argument now; if it will work, the kernel should turn it on automatically.
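To confirm which i915 options are actually in effect after changing the kernel command line, the module parameters can be read back from sysfs. This is a hedged sketch: `show_i915_params` is a hypothetical helper, and it assumes the loaded module exposes its parameters under /sys/module/i915/parameters (reading them may require root).

```sh
#!/bin/sh
# show_i915_params [DIR]: print each module parameter as name=value.
# DIR defaults to the sysfs parameter directory of the loaded i915 module
# (an assumption; pass another directory to inspect a copy).
show_i915_params() {
    dir=${1:-/sys/module/i915/parameters}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

After booting with i915.semaphores=0, `show_i915_params | grep semaphores` would be expected to report the value 0 if the option was picked up.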
On Wed, 21 Dec 2011 12:00:37 -0800 Keith Packard (KP) wrote:
KP> On Wed, 21 Dec 2011 19:54:10 +0100, Udo Steinberg udo@hypervisor.org wrote:
KP> > Hi,
KP> >
KP> > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
KP> > text scroll in an xterm, such as when extracting a tar archive. Such as this
KP> > one (note the timestamps):
KP>
KP> Can you try with semaphores disabled?
KP>
KP> i915.semaphores=0
That makes the problem go away. If you need more help tracking down the problem, then let me know. I can reproduce it fairly easily with something as simple as:
while true; do dmesg; done
KP> And, you should be able to remove the explicit rc6 argument now; if it
KP> will work, the kernel should turn it on automatically.
Ok.
Cheers,
- Udo
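The one-line reproducer above runs forever. A bounded variant that also reports how many new hangcheck entries appeared might look like the sketch below. Both function names are hypothetical, and the count is only approximate because the kernel log is a ring buffer and old entries can drop out while the loop runs.

```sh
#!/bin/sh
# count_hangs: count hangcheck reports in a kernel log read from stdin.
count_hangs() {
    # grep -c exits non-zero when the count is 0; keep the exit status clean.
    grep -c 'Hangcheck timer elapsed' || true
}

# reproduce_hang [SECONDS]: scroll dmesg output for a bounded time
# (default 60 s), then report how many new hangcheck entries were logged.
reproduce_hang() {
    limit=${1:-60}
    before=$(dmesg | count_hangs)
    end=$(( $(date +%s) + limit ))
    while [ "$(date +%s)" -lt "$end" ]; do
        dmesg                            # the scrolling text is the trigger
    done
    after=$(dmesg | count_hangs)
    echo "new hangcheck reports: $((after - before))"
}
```

Run `reproduce_hang 120` inside an xterm, once with i915.semaphores=1 and once with i915.semaphores=0, to compare the two configurations.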
On Wed, 21 Dec 2011 22:26:26 +0100, Udo Steinberg udo@hypervisor.org wrote:
> That makes the problem go away. If you need more help tracking down the problem, then let me know. I can reproduce it fairly easily with something as simple as:
> while true; do dmesg; done
Are you using SNA?
On Wed, 21 Dec 2011 14:55:07 -0800 Keith Packard (KP) wrote:
KP> On Wed, 21 Dec 2011 22:26:26 +0100, Udo Steinberg udo@hypervisor.org wrote:
KP>
KP> > That makes the problem go away. If you need more help tracking down the
KP> > problem, then let me know. I can reproduce it fairly easily with something
KP> > as simple as:
KP> >
KP> > while true; do dmesg; done
KP>
KP> Are you using SNA?
I don't think so. I'm using the following packages:
xorg-server-1.9.5
xf86-video-intel-2.15.0
libdrm-2.4.25
mesa-7.10.2
A quick Google search suggests that at least some of them are too old to support SNA.
Cheers,
- Udo
On Thu, Dec 22, 2011 at 01:45:20AM +0100, Udo Steinberg wrote:
> On Wed, 21 Dec 2011 14:55:07 -0800 Keith Packard (KP) wrote:
> KP> On Wed, 21 Dec 2011 22:26:26 +0100, Udo Steinberg udo@hypervisor.org wrote:
> KP>
> KP> > That makes the problem go away. If you need more help tracking down the
> KP> > problem, then let me know. I can reproduce it fairly easily with something
> KP> > as simple as:
> KP> >
> KP> > while true; do dmesg; done
> KP>
> KP> Are you using SNA?
> I don't think so. I'm using the following packages:
> xorg-server-1.9.5
> xf86-video-intel-2.15.0
> libdrm-2.4.25
> mesa-7.10.2
> A quick Google search suggests that at least some of them are too old to support SNA.
Can you please apply the patch available at
http://cgit.freedesktop.org/~danvet/drm/patch/?id=0b3ecfa8c9b00f50d514fbcc12...
This one will let your gpu keep hanging in the "stuck on semaphore wait" condition. Later on the hangcheck should kick in and grab a gpu error state (in the file i915_error_state in debugfs). Can you please attach that one?
Thanks, Daniel
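Grabbing the error state mentioned above usually comes down to copying one debugfs file. The sketch below is an illustration under stated assumptions: `save_error_state` is a hypothetical helper, debugfs is assumed to be mounted at the conventional /sys/kernel/debug, and the card is assumed to be DRM minor 0.

```sh
#!/bin/sh
# save_error_state [SRC] [DEST]: copy the i915 error state to a regular file
# so it can be attached to a mail. The default SRC path is an assumption
# (debugfs at /sys/kernel/debug, card at dri/0).
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest=${2:-i915_error_state.txt}
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted, and are you root?)" >&2
        return 1
    fi
    cat "$src" > "$dest"
}
```

If debugfs is not mounted yet, `mount -t debugfs none /sys/kernel/debug` as root normally makes the file available. Capture it after the hang is reported and before rebooting.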
On Thu, 22 Dec 2011 09:22:33 +0100 Daniel Vetter (DV) wrote:
DV> Can you please apply the patch available at
DV>
DV> http://cgit.freedesktop.org/~danvet/drm/patch/?id=0b3ecfa8c9b00f50d514fbcc12...
DV>
DV> This one will let your gpu keep hanging in the "stuck on semaphore wait"
DV> condition. Later on the hangcheck should kick in and grab a gpu error
DV> state (in the file i915_error_state in debugfs). Can you please attach
DV> that one?
Hi,
i915_error_state and dmesg are attached. This was with Linux 3.2.0-rc6 and the aforementioned patch applied.
Cheers,
- Udo
On Mon, Jan 2, 2012 at 08:09, Udo Steinberg udo@hypervisor.org wrote:
> On Thu, 22 Dec 2011 09:22:33 +0100 Daniel Vetter (DV) wrote:
> DV> Can you please apply the patch available at
> DV>
> DV> http://cgit.freedesktop.org/~danvet/drm/patch/?id=0b3ecfa8c9b00f50d514fbcc12...
> DV>
> DV> This one will let your gpu keep hanging in the "stuck on semaphore wait"
> DV> condition. Later on the hangcheck should kick in and grab a gpu error
> DV> state (in the file i915_error_state in debugfs). Can you please attach
> DV> that one?
> Hi,
> i915_error_state and dmesg are attached. This was with Linux 3.2.0-rc6 and the aforementioned patch applied.
Thanks a lot for gathering the error state. It looks like the semaphore code deadlocked, which should not have happened. Can you please test with the my-next branch available at:
http://cgit.freedesktop.org/~danvet/drm/log/?h=my-next
It contains a few fixes for races (that might cause your issue) and dumps some additional information into the error state. You need to boot with i915.semaphores=1 because they're not enabled by default in that tree.
Again, please attach the error state if the gpu hangs.
Yours, Daniel
On Thu, 22 Dec 2011 01:45:20 +0100, Udo Steinberg udo@hypervisor.org wrote:
> A quick Google search suggests that at least some of them are too old to support SNA.
Sounds good. If you can capture the error as Daniel suggests, that would be great. In any case, I'll post a revert of the semaphore enable patch as it looks like that's still not working right...
On Thu, Dec 22, 2011 at 15:00, Keith Packard keithp@keithp.com wrote:
> On Thu, 22 Dec 2011 01:45:20 +0100, Udo Steinberg udo@hypervisor.org wrote:
> > A quick Google search suggests that at least some of them are too old to support SNA.
> Sounds good. If you can capture the error as Daniel suggests, that would be great. In any case, I'll post a revert of the semaphore enable patch as it looks like that's still not working right...
Could we revert it for SNB, and leave it enabled for IVB?
It would be just the chunk which checks for intel_iommu status in the patch.
On Fri, 23 Dec 2011 09:55:34 -0200, Eugeni Dodonov eugeni@dodonov.net wrote:
> Could we revert it for SNB, and leave it enabled for IVB?
Yes, that's my plan. I don't want to ever disable it for IVB.
It would also be nice to know if disabling VT-d in the BIOS resolves this issue, or if building the kernel with IOMMU support and then forcibly disabling it with 'intel_iommu=off' fixes the problem. Given that you can easily reproduce this, it would be good to know how your machine differs from dozens of others which work fine with semaphores turned on...
Oh, and the output of 'lspci -nn -vv' would be nice to see in case you've got older (or newer) chips...
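The information requested above can be gathered into a single file for the reply. This is a rough sketch rather than an established procedure: `cmdline_has` and `collect_report` are hypothetical helpers, and the grep patterns for IOMMU-related kernel messages are an assumption about what the log contains.

```sh
#!/bin/sh
# cmdline_has PARAM [FILE]: succeed if PARAM appears as a whole argument on
# the kernel command line (FILE defaults to /proc/cmdline).
cmdline_has() {
    file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$file" | grep -qxF -- "$1"
}

# collect_report [OUT]: write command line, IOMMU-related kernel messages
# and the verbose PCI listing to OUT (default report.txt).
collect_report() {
    out=${1:-report.txt}
    {
        echo "== kernel command line =="
        cat /proc/cmdline
        echo "== IOMMU/DMAR messages (pattern is an assumption) =="
        dmesg | grep -i -e dmar -e iommu || true
        echo "== lspci -nn -vv =="
        lspci -nn -vv
    } > "$out"
}
```

For the IOMMU test, `cmdline_has intel_iommu=off && echo "booted with IOMMU forced off"` confirms that the option reached the kernel before the reproducer is run again.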
dri-devel@lists.freedesktop.org