Hi,
On 2011-10-17 at 2:34 PM, chenhc@lemote.com wrote:
If I start X but switch to the console and then suspend and resume, "GPU reset" hardly ever happens. But there is a new problem: the IRQ of the radeon card gets disabled. Maybe "GPU reset" has something to do with "IRQ disabled"?
I have tried "irqpoll"; it doesn't fix this problem.
[ 571.914062] irq 6: nobody cared (try booting with the "irqpoll" option)
[ 571.914062] Call Trace:
[ 571.914062] [<ffffffff806f3248>] dump_stack+0x8/0x34
[ 571.914062] [<ffffffff8027e1e4>] __report_bad_irq.clone.6+0x44/0x15c
[ 571.914062] [<ffffffff8027e584>] note_interrupt+0x204/0x2a0
[ 571.914062] [<ffffffff8027c7cc>] handle_irq_event_percpu+0x19c/0x1f8
[ 571.914062] [<ffffffff8027c890>] handle_irq_event+0x68/0xa8
[ 571.914062] [<ffffffff8027f038>] handle_level_irq+0xd8/0x13c
[ 571.914062] [<ffffffff8027bec8>] generic_handle_irq+0x48/0x58
[ 571.914062] [<ffffffff80204574>] do_IRQ+0x18/0x24
[ 571.914062] [<ffffffff8020152c>] mach_irq_dispatch+0xf0/0x194
[ 571.914062] [<ffffffff80202a40>] ret_from_irq+0x0/0x4
[ 571.914062]
[ 571.914062] handlers:
[ 571.914062] [<ffffffff8053bba8>] radeon_driver_irq_handler_kms
P.S.: I am using the latest kernel from git, and irq 6 is not shared with other devices.
Does fence_wait depend on the GPU's interrupt? If yes, can I say that the "GPU lockup" is caused by the unexpected disabling of the GPU's IRQ?
Hi Alex, Michel
2011/10/5 Alex Deucher alexdeucher@gmail.com
2011/10/5 Michel Dänzer michel@daenzer.net:
On Thu, 2011-09-29 at 17:17 +0800, Chen Jie wrote:
We occasionally got a "GPU lockup" after resuming from suspend (on a mipsel platform with a mips64-compatible CPU and rs780e; the kernel is 3.1.0-rc8, 64-bit). Related kernel messages:
[...]
[ 177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[ 177.089843] ------------[ cut here ]------------
[ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[ 177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[ 177.113281] Modules linked in: psmouse serio_raw
[ 177.117187] Call Trace:
[ 177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
[ 177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
[ 177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
[ 177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
[ 177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
[ 177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
[ 177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
[ 177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
[ 177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
[ 177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
[ 177.179687] ---[ end trace 92f63d998efe4c6d ]---
[ 177.187500] radeon 0000:01:05.0: GPU softreset
[ 177.191406] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xF57C2030
[ 177.195312] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00111103
[ 177.203125] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20023040
[ 177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[...]
What may cause a "GPU lockup"?
Lots of things... The most common cause is an incorrect command stream sent to the GPU by userspace or the kernel.
Why didn't the reset work?
Might be related to 'Wait for MC idle timedout !', but I don't know offhand what could be up with that.
BTW, one question: I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes need_dma32 to be set. Is that correct? (drivers/char/agp is not available on mips; could that be the reason?)
Not sure, Alex?
You don't need AGP for newer IGP cards (rs4xx+). It gets set by default if the card is not AGP or PCIE. That should be changed, as only the legacy r1xx PCI GART block has that limitation. I'll send a patch out shortly.
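A rough sketch of the flag logic discussed above, for readers following along. It is not the driver's code: every name here (the SKETCH_* flags, the family enum, the helper functions) is hypothetical, and the "older than rs4xx" cut-off in the second helper is only one reading of the proposed change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real ones are defined in the radeon driver. */
enum {
    SKETCH_IS_AGP  = 1u << 0,
    SKETCH_IS_PCI  = 1u << 1,
    SKETCH_IS_PCIE = 1u << 2,
    SKETCH_IS_IGP  = 1u << 3,
};

/* Hypothetical family ordering: older chips compare lower. */
enum sketch_family { SKETCH_R100, SKETCH_RS400, SKETCH_RS780 };

/* "Set by default if the card is not AGP or PCIE": fall back to PCI. */
static uint32_t sketch_bus_flag(bool is_agp, bool is_pcie)
{
    if (is_agp)
        return SKETCH_IS_AGP;
    if (is_pcie)
        return SKETCH_IS_PCIE;
    return SKETCH_IS_PCI;
}

/* Behaviour reported in the thread: the PCI flag alone forces 32-bit DMA. */
static bool sketch_need_dma32_before(uint32_t flags)
{
    return (flags & SKETCH_IS_PCI) != 0;
}

/* Assumed shape of the fix: only pre-rs4xx PCI parts keep the limit. */
static bool sketch_need_dma32_after(uint32_t flags, enum sketch_family fam)
{
    return (flags & SKETCH_IS_PCI) != 0 && fam < SKETCH_RS400;
}
```

With these helpers, a card reporting PCI | IGP (as on the mips board above) needs 32-bit DMA under the first rule but not under the second, while an r1xx PCI card needs it under both.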
Got it, thanks for the reply.
On Tue, 2011-10-18 at 16:35 +0800, Chen Jie wrote:
Does fence_wait depend on the GPU's interrupt? If yes, can I say that the "GPU lockup" is caused by the unexpected disabling of the GPU's IRQ?
No, if the GPU didn't actually lock up, the fences should still signal eventually, as radeon_fence_signaled()->radeon_fence_poll_locked() is called after the wait for the SW interrupt times out.
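To illustrate the point, here is a small user-space sketch of the "wait for the interrupt, then poll anyway" pattern. It is a simplified model, not the radeon code: the struct, the helper names and the result enum are all made up for illustration, the timed sleep is replaced by a plain flag, and sequence-number wraparound is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state the waiter can observe. */
struct sketch_gpu {
    uint32_t last_seq;      /* last fence sequence written back by the GPU */
    bool     irq_delivered; /* did the interrupt path wake the waiter? */
};

enum sketch_result {
    SKETCH_SIGNALED_BY_IRQ,
    SKETCH_SIGNALED_BY_POLL,
    SKETCH_LOCKUP,
};

/* Stand-in for sleeping on a wait queue until the IRQ or a timeout. */
static bool sketch_wait_irq(const struct sketch_gpu *gpu)
{
    return gpu->irq_delivered;
}

/* Stand-in for reading the fence value directly (no wraparound handling). */
static bool sketch_fence_poll(const struct sketch_gpu *gpu, uint32_t wanted)
{
    return gpu->last_seq >= wanted;
}

static enum sketch_result sketch_fence_wait(const struct sketch_gpu *gpu,
                                            uint32_t wanted)
{
    if (sketch_wait_irq(gpu) && sketch_fence_poll(gpu, wanted))
        return SKETCH_SIGNALED_BY_IRQ;
    /* The interrupt wait timed out: poll the fence before giving up. */
    if (sketch_fence_poll(gpu, wanted))
        return SKETCH_SIGNALED_BY_POLL;
    /* The fence never advanced, so the GPU itself appears stuck. */
    return SKETCH_LOCKUP;
}
```

In this model a lost interrupt only delays completion: a lockup is reported only when the fence value itself has stopped short of the awaited one, as in the "waiting for 0x000013C3 last fence id 0x000013AD" message earlier in the thread.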
dri-devel@lists.freedesktop.org