New subject: [mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.

18 Oct 2011


      Hi,
On 17 Oct 2011 at 2:34 PM, chenhc@lemote.com wrote:
...
If I start X but switch to the console and then suspend and resume, the "GPU
reset" hardly ever happens, but there is a new problem: the IRQ of the radeon
card gets disabled. Maybe the "GPU reset" has something to do with the "IRQ
disabled" condition?
I have tried "irqpoll"; it doesn't fix this problem.
[  571.914062] irq 6: nobody cared (try booting with the "irqpoll" option)
[  571.914062] Call Trace:
[  571.914062] [<ffffffff806f3248>] dump_stack+0x8/0x34
[  571.914062] [<ffffffff8027e1e4>] __report_bad_irq.clone.6+0x44/0x15c
[  571.914062] [<ffffffff8027e584>] note_interrupt+0x204/0x2a0
[  571.914062] [<ffffffff8027c7cc>] handle_irq_event_percpu+0x19c/0x1f8
[  571.914062] [<ffffffff8027c890>] handle_irq_event+0x68/0xa8
[  571.914062] [<ffffffff8027f038>] handle_level_irq+0xd8/0x13c
[  571.914062] [<ffffffff8027bec8>] generic_handle_irq+0x48/0x58
[  571.914062] [<ffffffff80204574>] do_IRQ+0x18/0x24
[  571.914062] [<ffffffff8020152c>] mach_irq_dispatch+0xf0/0x194
[  571.914062] [<ffffffff80202a40>] ret_from_irq+0x0/0x4
[  571.914062]
[  571.914062] handlers:
[  571.914062] [<ffffffff8053bba8>] radeon_driver_irq_handler_kms
P.S.: I am using the latest kernel from git, and IRQ 6 is not shared with other
devices.
Does fence_wait depend on the GPU's interrupt? If so, can I say that the "GPU
lockup" is caused by the unexpected disabling of the GPU's IRQ?
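For context on the "nobody cared" message above: the kernel's spurious-interrupt detector counts interrupts that no handler claimed and shuts the line off when nearly all of a large batch went unhandled. Below is a minimal user-space model of that accounting, not the kernel code itself. The struct, the function names and the thresholds (100000 per batch, 99900 unhandled, as I recall them from kernel/irq/spurious.c) are assumptions for illustration, and the real code has extra time-based details that are left out here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of per-IRQ spurious accounting. */
struct irq_model {
    unsigned int irq_count;       /* interrupts seen in the current batch */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* set once the line has been shut off */
};

#define BATCH_SIZE      100000u   /* assumed batch length */
#define UNHANDLED_LIMIT  99900u   /* assumed "nobody cared" threshold */

/* Record one interrupt; handled == false means every handler declined it. */
static void model_note_interrupt(struct irq_model *m, bool handled)
{
    if (m->disabled)
        return;
    if (!handled)
        m->irqs_unhandled++;
    if (++m->irq_count < BATCH_SIZE)
        return;
    /* End of batch: disable the line if almost nothing was handled. */
    if (m->irqs_unhandled > UNHANDLED_LIMIT)
        m->disabled = true;
    m->irq_count = 0;
    m->irqs_unhandled = 0;
}

/* Feed n interrupts with the same outcome and report whether the line
 * ended up disabled. */
static bool model_run(unsigned int n, bool handled)
{
    struct irq_model m = {0, 0, false};
    for (unsigned int i = 0; i < n; i++)
        model_note_interrupt(&m, handled);
    return m.disabled;
}
```

In this model, a device that keeps asserting a level-triggered line while the handler keeps declining it would fill a batch quickly and get the IRQ disabled. That would match the symptom in the trace, though it does not say why the handler declines the interrupt after resume.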
...
...
Hi Alex, Michel
2011/10/5 Alex Deucher alexdeucher@gmail.com
...
2011/10/5 Michel Dänzer michel@daenzer.net:
...
On Thu, 2011-09-29 at 17:17 +0800, Chen Jie wrote:
...
We occasionally get a "GPU lockup" after resuming from suspend (on a mipsel
platform with a mips64-compatible CPU and rs780e; the kernel is
3.1.0-rc8, 64-bit). Related kernel messages:
[...]
...
[  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[  177.089843] ------------[ cut here ]------------
[  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[  177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[  177.113281] Modules linked in: psmouse serio_raw
[  177.117187] Call Trace:
[  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
[  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
[  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
[  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
[  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
[  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
[  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
[  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
[  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
[  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
[  177.179687] ---[ end trace 92f63d998efe4c6d ]---
[  177.187500] radeon 0000:01:05.0: GPU softreset
[  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
[  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
...
...
[  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
[  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[...]
...
What may cause a "GPU lockup"?
Lots of things... The most common cause is an incorrect command stream
sent to the GPU by userspace or the kernel.
...
Why didn't the reset work?
Might be related to 'Wait for MC idle timedout !', but I don't know
offhand what could be up with that.
...
BTW, one question:
I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
need_dma32 to be set.
Is that correct? (drivers/char/agp is not available on mips; could that
be the reason?)
Not sure, Alex?
You don't need AGP for newer IGP cards (rs4xx+).  It gets set by default if
the card is not AGP or PCIE.  That should be changed as only the
legacy r1xx PCI GART block has that limitation.  I'll send a patch out
shortly.
Got it, thanks for the reply.
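
To make the need_dma32 point above concrete, here is a minimal sketch of the behaviour as described in this thread, next to one possible form of the restriction Alex mentions. This is not the radeon driver code or the actual patch: the `SK_*` flag bits, the `sk_family` ordering and all function names are hypothetical, and only the PCI path is modelled.

```c
#include <stdbool.h>

/* Hypothetical flag bits, standing in for the RADEON_IS_* flags. */
enum {
    SK_IS_AGP  = 1u << 0,
    SK_IS_PCI  = 1u << 1,
    SK_IS_PCIE = 1u << 2,
    SK_IS_IGP  = 1u << 3
};

/* Hypothetical family ordering: legacy parts sort before rs4xx. */
enum sk_family { SK_R100, SK_R200, SK_RS400, SK_RS780 };

/* As described in the thread: a card that is neither AGP nor PCIE
 * gets the PCI flag by default. */
static unsigned int sk_default_bus_flags(unsigned int flags)
{
    if (!(flags & (SK_IS_AGP | SK_IS_PCIE)))
        flags |= SK_IS_PCI;
    return flags;
}

/* Reported behaviour: the PCI flag alone forces 32-bit DMA. */
static bool sk_need_dma32_described(unsigned int flags)
{
    return (flags & SK_IS_PCI) != 0;
}

/* Assumed shape of the fix: only legacy parts (before rs4xx) with the
 * PCI flag keep the 32-bit limit. */
static bool sk_need_dma32_restricted(unsigned int flags, enum sk_family family)
{
    return (flags & SK_IS_PCI) != 0 && family < SK_RS400;
}
```

With an rs780-class IGP on a platform without AGP support, the first check would report that 32-bit DMA is needed, while the restricted check would not.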
