New subject: [mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.

29 Sep 2011


Hi,
Adding more information.
We occasionally get a "GPU lockup" after resuming from suspend (on a mipsel
platform with a MIPS64-compatible CPU and an RS780E; the kernel is 3.1.0-rc8,
64-bit). Related kernel messages:
/* return from STR */
[  156.152343] radeon 0000:01:05.0: WB enabled
[  156.187500] [drm] ring test succeeded in 0 usecs
[  156.187500] [drm] ib test succeeded in 0 usecs
[  156.398437] ata2: SATA link down (SStatus 0 SControl 300)
[  156.398437] ata3: SATA link down (SStatus 0 SControl 300)
[  156.398437] ata4: SATA link down (SStatus 0 SControl 300)
[  156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  156.597656] ata1.00: configured for UDMA/133
[  156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
[  157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
[  157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
[  157.683593] r8169 0000:02:00.0: eth0: link up
[  165.621093] PM: resume of devices complete after 9679.556 msecs
[  165.628906] Restarting tasks ... done.
[  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[  177.089843] ------------[ cut here ]------------
[  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[  177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[  177.113281] Modules linked in: psmouse serio_raw
[  177.117187] Call Trace:
[  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
[  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
[  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
[  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
[  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
[  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
[  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
[  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
[  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
[  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
[  177.179687] ---[ end trace 92f63d998efe4c6d ]---
[  177.187500] radeon 0000:01:05.0: GPU softreset
[  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
[  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
[  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
[  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.367187] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[  177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[  177.414062] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[  177.417968] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[  177.425781] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2002B040
[  177.433593] radeon 0000:01:05.0: GPU reset succeed
[  177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.804687] radeon 0000:01:05.0: WB enabled
[  178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
[  178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
[  178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[  178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
...
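The "CP stall for more than 10019msec" line suggests that a lockup is declared when the command processor's read pointer stops advancing for roughly 10 seconds. A minimal sketch of such a heuristic, assuming a 10000 ms threshold (inferred from the log message) and using hypothetical names (`lockup_tracker`, `cp_is_lockup`) rather than the radeon driver's own code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring state; not the driver's actual structure. */
struct lockup_tracker {
	uint32_t last_rptr;      /* CP read pointer seen at the last check */
	uint64_t last_change_ms; /* time at which the read pointer last moved */
};

#define LOCKUP_TIMEOUT_MS 10000u /* assumed threshold of about 10 s */

/* Returns true when the CP read pointer has not advanced for more than
 * LOCKUP_TIMEOUT_MS since it last changed. The caller is assumed to call
 * this only while a fence is still outstanding, so an idle GPU is never
 * reported as locked up. */
static bool cp_is_lockup(struct lockup_tracker *t, uint32_t rptr,
			 uint64_t now_ms)
{
	if (rptr != t->last_rptr) {
		/* The CP made progress: remember where and when. */
		t->last_rptr = rptr;
		t->last_change_ms = now_ms;
		return false;
	}
	return now_ms - t->last_change_ms > LOCKUP_TIMEOUT_MS;
}
```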
What could cause a "GPU lockup"? Why didn't the reset work? Any ideas?
BTW, one question:
rdev->flags contains 'RADEON_IS_PCI | RADEON_IS_IGP', which causes
need_dma32 to be set.
Is that correct? (drivers/char/agp is not available on MIPS; could that be
the reason?)
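As a rough illustration of how a flags check could end up setting need_dma32 for an 'IS_PCI | IS_IGP' device, here is a sketch that assumes the rule is "AGP or plain PCI implies 32-bit DMA" and that other devices use a 40-bit mask. The flag values, the AGP clause, the 40-bit width and the function names are all assumptions for illustration, not the radeon driver's definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real RADEON_IS_* values may differ. */
#define FLAG_IS_AGP  (1u << 0)
#define FLAG_IS_PCI  (1u << 1)
#define FLAG_IS_IGP  (1u << 2)
#define FLAG_IS_PCIE (1u << 3)

/* Assumed rule: AGP or plain-PCI parts are restricted to 32-bit DMA.
 * In this sketch the IGP bit does not affect the decision. */
static bool needs_dma32(uint32_t flags)
{
	return (flags & (FLAG_IS_AGP | FLAG_IS_PCI)) != 0;
}

/* Assumed DMA address width: 32 bits when restricted, 40 bits otherwise. */
static uint64_t dma_mask(uint32_t flags)
{
	unsigned bits = needs_dma32(flags) ? 32 : 40;
	return (1ULL << bits) - 1;
}
```

Under this assumed rule the PCI bit alone decides the outcome, which is why an integrated part flagged as PCI would get a 32-bit mask.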
On September 28, 2011 at 3:23 PM, chenhc@lemote.com wrote:
...
Hi Alex,
When we do STR (S3) with an RS780E Radeon card on a MIPS platform, a "GPU
reset" may happen after resume (the probability is about 5%). After that,
X is unusable.
We know there is a "ring test" both at system resume time and at GPU reset
time. Whether or not a GPU reset happens, the ring test at system resume time
always succeeds, but the ring test at GPU reset time usually fails.
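The ring test can be modelled roughly as follows: the CPU writes a sentinel to a scratch register, queues a ring packet asking the CP to overwrite it, then polls for a bounded time. A minimal simulation, assuming 0xDEADBEEF as the CP-written value (0xCAFEDEAD is the sentinel visible in the log); `fake_gpu`, `cp_step`, `ring_emit_scratch_write` and `ring_test` are hypothetical names, not driver functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated hardware for illustration only: a scratch register plus a flag
 * saying whether the command processor (CP) is actually consuming the ring. */
struct fake_gpu {
	uint32_t scratch;
	bool cp_running;
	uint32_t pending_value; /* value the queued packet will write */
	bool packet_queued;
};

/* Hypothetical stand-in for queueing a packet that writes `value` to scratch. */
static void ring_emit_scratch_write(struct fake_gpu *gpu, uint32_t value)
{
	gpu->pending_value = value;
	gpu->packet_queued = true;
}

/* One "tick" of the simulated CP: executes the queued packet if running. */
static void cp_step(struct fake_gpu *gpu)
{
	if (gpu->cp_running && gpu->packet_queued) {
		gpu->scratch = gpu->pending_value;
		gpu->packet_queued = false;
	}
}

/* Ring test: seed scratch from the CPU, ask the CP to overwrite it,
 * then poll a bounded number of times. */
static bool ring_test(struct fake_gpu *gpu, unsigned max_polls)
{
	gpu->scratch = 0xCAFEDEAD;                /* CPU-side write */
	ring_emit_scratch_write(gpu, 0xDEADBEEF); /* CP-side write, queued */
	for (unsigned i = 0; i < max_polls; i++) {
		cp_step(gpu);
		if (gpu->scratch == 0xDEADBEEF)
			return true;
	}
	return false; /* scratch still holds 0xCAFEDEAD: the CP never ran it */
}
```

If the CP is not fetching from the ring, the scratch register keeps the CPU-written sentinel, which matches the `scratch(0x8504)=0xCAFEDEAD` failure in the log.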
We use the latest kernel (3.1.0-rc8 from git) and X.org 7.6.
Any ideas?
Best regards,
Huacai Chen
Regards,
- Chen Jie
