When "MC timeout" happens at GPU reset, we found the 12th and 13th bits of R_000E50_SRBM_STATUS is 1. From kernel code we found these two bits are like this: #define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1) #define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1)
Could you please tell me what does they mean? And if possible, I want to know the functionalities of these 5 registers in detail: #define R_000E60_SRBM_SOFT_RESET 0x0E60 #define R_000E50_SRBM_STATUS 0x0E50 #define R_008020_GRBM_SOFT_RESET 0x8020 #define R_008010_GRBM_STATUS 0x8010 #define R_008014_GRBM_STATUS2 0x8014
A bit more info: If I reset the MC after resetting CP (this is what Linux-2.6.34 does, but removed since 2.6.35), then "MC timeout" will disappear, but there is still "ring test failed".
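For reference, a stand-alone sketch of how those two macros decode the dumped status value (0x20023040 is the SRBM_STATUS value from the dmesg quoted later in this mail):

#include <stdio.h>

/* The two macros quoted above, from r600d.h. */
#define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1)
#define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1)

int main(void)
{
	/* SRBM_STATUS value dumped during the soft reset in the log below. */
	unsigned int srbm_status = 0x20023040;

	printf("MCDX_BUSY=%u MCDW_BUSY=%u\n",
	       G_000E50_MCDX_BUSY(srbm_status),
	       G_000E50_MCDW_BUSY(srbm_status));
	return 0;
}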
Huacai Chen
2011/11/8 chenhc@lemote.com:
And I want to know something: 1. Does the GPU use the MC to access the GTT?
Yes. All GPU clients (display, 3D, etc.) go through the MC to access memory (vram or gart).
2. What can cause an MC timeout?
Lots of things. Some GPU client still active, some GPU client hung or not properly initialized.
Alex
Hi,
Some status update. On 2011/9/29 at 5:17 PM, Chen Jie chenj@lemote.com wrote:
Hi. Adding more information: we occasionally get a "GPU lockup" after resuming from suspend (on a mipsel platform with a MIPS64-compatible CPU and an rs780e; the kernel is 3.1.0-rc8, 64-bit). Related kernel messages:
/* return from STR */
[ 156.152343] radeon 0000:01:05.0: WB enabled
[ 156.187500] [drm] ring test succeeded in 0 usecs
[ 156.187500] [drm] ib test succeeded in 0 usecs
[ 156.398437] ata2: SATA link down (SStatus 0 SControl 300)
[ 156.398437] ata3: SATA link down (SStatus 0 SControl 300)
[ 156.398437] ata4: SATA link down (SStatus 0 SControl 300)
[ 156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 156.597656] ata1.00: configured for UDMA/133
[ 156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
[ 157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
[ 157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
[ 157.683593] r8169 0000:02:00.0: eth0: link up
[ 165.621093] PM: resume of devices complete after 9679.556 msecs
[ 165.628906] Restarting tasks ... done.
[ 177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[ 177.089843] ------------[ cut here ]------------
[ 177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[ 177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[ 177.113281] Modules linked in: psmouse serio_raw
[ 177.117187] Call Trace:
[ 177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
[ 177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
[ 177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
[ 177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
[ 177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
[ 177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
[ 177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
[ 177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
[ 177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
[ 177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
[ 177.179687] ---[ end trace 92f63d998efe4c6d ]---
[ 177.187500] radeon 0000:01:05.0: GPU softreset
[ 177.191406] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xF57C2030
[ 177.195312] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00111103
[ 177.203125] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20023040
[ 177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 177.367187] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 177.414062] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
[ 177.417968] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 177.425781] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x2002B040
[ 177.433593] radeon 0000:01:05.0: GPU reset succeed
[ 177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 177.804687] radeon 0000:01:05.0: WB enabled
[ 178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
After pinning the ring in VRAM, it warned of an IB test failure. It seems something is wrong with accessing memory through the GTT.
We dumped the GART table just after stopping the CP, compared it with the one dumped just after r600_pcie_gart_enable(), and didn't find any difference.
Any idea?
[ 178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
[ 178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[ 178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[ 179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
...
Regards, -- Chen Jie
2011/12/7 chenhc@lemote.com:
When "MC timeout" happens at GPU reset, we found the 12th and 13th bits of R_000E50_SRBM_STATUS is 1. From kernel code we found these two bits are like this: #define G_000E50_MCDX_BUSY(x) (((x) >> 12) & 1) #define G_000E50_MCDW_BUSY(x) (((x) >> 13) & 1)
Could you please tell me what does they mean? And if possible,
They refer to sub-blocks in the memory controller. I don't really know offhand what the names mean.
I want to know the functionality of these 5 registers in detail:
#define R_000E60_SRBM_SOFT_RESET 0x0E60
#define R_000E50_SRBM_STATUS 0x0E50
#define R_008020_GRBM_SOFT_RESET 0x8020
#define R_008010_GRBM_STATUS 0x8010
#define R_008014_GRBM_STATUS2 0x8014
A bit more info: If I reset the MC after resetting the CP (this is what Linux 2.6.34 does, but it was removed in 2.6.35), then the "MC timeout" disappears, but there is still a "ring test failed".
The bits are defined in r600d.h. As to the acronyms:
BIF - Bus InterFace
CG - clocks
DC - Display Controller
GRBM - Graphics block (3D engine)
HDP - Host Data Path (CPU access to vram via the PCI BAR)
IH, RLC - Interrupt controller
MC - Memory controller
ROM - ROM
SEM - semaphore controller
When you reset the MC, you will probably have to reset just about everything else, since most blocks depend on the MC for access to memory. If you do reset the MC, you should do it prior to calling asic_init so that all the hw gets re-initialized properly. Additionally, you should probably reset the GRBM, either via SRBM_SOFT_RESET or the individual sub-blocks via GRBM_SOFT_RESET.
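A very rough sketch of that ordering, purely for illustration (the real reset path is more involved; SOFT_RESET_MC_BIT below is a placeholder, not a name from r600d.h, while S_008020_SOFT_RESET_CP() and WREG32/RREG32 are the usual driver helpers):

/* Sketch only: put the graphics block and the MC into soft reset, release
 * them, and only then re-run ASIC init so every block that depends on the
 * MC is re-programmed. */
#include <linux/delay.h>
#include "radeon.h"
#include "r600d.h"

#define SOFT_RESET_MC_BIT	(1 << 11)	/* placeholder value */

static void example_mc_soft_reset(struct radeon_device *rdev)
{
	/* Graphics block (CP etc.) first... */
	WREG32(R_008020_GRBM_SOFT_RESET, S_008020_SOFT_RESET_CP(1));
	RREG32(R_008020_GRBM_SOFT_RESET);
	udelay(50);

	/* ...then the MC via SRBM. */
	WREG32(R_000E60_SRBM_SOFT_RESET, SOFT_RESET_MC_BIT);
	RREG32(R_000E60_SRBM_SOFT_RESET);
	udelay(50);

	/* Release the resets. */
	WREG32(R_000E60_SRBM_SOFT_RESET, 0);
	WREG32(R_008020_GRBM_SOFT_RESET, 0);
	RREG32(R_008020_GRBM_SOFT_RESET);

	/* Only after this, re-run the ASIC init tables (the atombios
	 * asic_init path on resume) so the MC and the rest of the hw are
	 * set up again. */
}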
Alex
Hi,
Status update about the problem 'Occasionally "GPU lockup" after resuming from suspend.'
First, this can happen when the system returns from either STR (suspend to RAM) or STD (suspend to disk, aka hibernation). When returning from STD, the initialization process is very similar to a normal boot. Standby is OK; it is similar to STR, except that standby does not shut down the power of the CPU, GPU, etc.
We've dumped and compared the registers, and found something:
CP_STAT
  normal value: 0x00000000
  value when this problem occurred: 0x802100C1 or 0x802300C1
CP_ME_CNTL
  normal value: 0x000000FF
  value when this problem occurred: always 0x200000FF in our test
Questions: According to the manual, CP_STAT = 0x802100C1 means:
CSF_RING_BUSY (bit 0): The Ring fetcher still has command buffer data to fetch, or the PFP still has data left to process from the reorder queue.
CSF_BUSY (bit 6): The input FIFOs have command buffers to fetch, or one or more of the fetchers are busy, or the arbiter has a request to send to the MIU.
MIU_RDREQ_BUSY (bit 7): The read path logic inside the MIU is busy.
MEQ_BUSY (bit 16): The PFP-to-ME queue has valid data in it.
SURFACE_SYNC_BUSY (bit 21): The Surface Sync unit is busy.
CP_BUSY (bit 31): Any block in the CP is busy.
What does it suggest?
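For quick decoding, a small stand-alone sketch using the bit positions listed above (taken from the manual excerpt, not from driver headers):

#include <stdio.h>

static const struct { int bit; const char *name; } cp_stat_bits[] = {
	{  0, "CSF_RING_BUSY" },
	{  6, "CSF_BUSY" },
	{  7, "MIU_RDREQ_BUSY" },
	{ 16, "MEQ_BUSY" },
	{ 21, "SURFACE_SYNC_BUSY" },
	{ 31, "CP_BUSY" },
};

int main(void)
{
	unsigned int cp_stat = 0x802100C1;	/* value seen when the problem occurs */
	unsigned int i;

	for (i = 0; i < sizeof(cp_stat_bits) / sizeof(cp_stat_bits[0]); i++)
		if (cp_stat & (1u << cp_stat_bits[i].bit))
			printf("%s is set\n", cp_stat_bits[i].name);
	return 0;
}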
What does it mean if bit 29 of CP_ME_CNTL is set?
BTW, how does the dummy page work in GART?
Regards, -- Chen Jie
On Wed, Feb 15, 2012 at 05:32:35PM +0800, Chen Jie wrote:
Hi,
Status update about the problem 'Occasionally "GPU lockup" after resuming from suspend.'
First, this can happen when the system returns from either STR (suspend to RAM) or STD (suspend to disk, aka hibernation). When returning from STD, the initialization process is very similar to a normal boot. Standby is OK; it is similar to STR, except that standby does not shut down the power of the CPU, GPU, etc.
We've dumped and compared the registers, and found something:
CP_STAT
  normal value: 0x00000000
  value when this problem occurred: 0x802100C1 or 0x802300C1
CP_ME_CNTL
  normal value: 0x000000FF
  value when this problem occurred: always 0x200000FF in our test
Questions: According to the manual, CP_STAT = 0x802100C1 means:
CSF_RING_BUSY (bit 0): The Ring fetcher still has command buffer data to fetch, or the PFP still has data left to process from the reorder queue.
CSF_BUSY (bit 6): The input FIFOs have command buffers to fetch, or one or more of the fetchers are busy, or the arbiter has a request to send to the MIU.
MIU_RDREQ_BUSY (bit 7): The read path logic inside the MIU is busy.
MEQ_BUSY (bit 16): The PFP-to-ME queue has valid data in it.
SURFACE_SYNC_BUSY (bit 21): The Surface Sync unit is busy.
CP_BUSY (bit 31): Any block in the CP is busy.
What does it suggest?
What does it mean if bit 29 of CP_ME_CNTL is set?
BTW, how does the dummy page work in GART?
Regards, -- Chen Jie
To me it looks like the CP is trying to fetch memory but the GPU memory controller fails to fulfill the CP's request. Did you check the PCI configuration before & after (when things don't work)? My best guess is that PCI bus mastering is not working properly, or that the PCIe GPU GART table has wrong data.
Maybe one needs to drop bus mastering and re-enable it to work around some bug...
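A minimal sketch of that idea using the standard PCI helpers (pdev here is assumed to be the radeon device's struct pci_dev, i.e. rdev->pdev in the driver):

#include <linux/pci.h>
#include <linux/delay.h>

/* Sketch: clear the Bus Master bit in PCI_COMMAND and set it again, to see
 * whether a fresh enable unsticks DMA. */
static void toggle_bus_master(struct pci_dev *pdev)
{
	pci_clear_master(pdev);
	msleep(1);
	pci_set_master(pdev);
}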
Cheers, Jerome
Hi,
On 2012/2/15 at 11:53 PM, Jerome Glisse j.glisse@gmail.com wrote:
To me it looks like the CP is trying to fetch memory but the GPU memory controller fails to fulfill the CP's request. Did you check the PCI configuration before & after (when things don't work)? My best guess is that PCI bus mastering is not working properly, or that the PCIe GPU GART table has wrong data.
Maybe one needs to drop bus mastering and re-enable it to work around some bug...
Thanks for your suggestion. We've tried the 'drop and re-enable bus master' trick; unfortunately it doesn't work. The PCI configuration comparison will be done later.
Some additional information: the "GPU lockup" always seems to occur after tasks are restarted -- we inserted more ring tests, and none of them failed before the tasks were restarted.
BTW, I hacked the GART table to try to simulate the problem:
1. Changed the system memory address (bus address) of ring_obj to an arbitrary value, e.g. 0 or 128M.
2. Changed the system memory address of a BO in radeon_test to an arbitrary value, e.g. 0.
Neither of the above led to a GPU lockup: point 1 rendered a black screen; with point 2, only the test itself failed.
Any idea?
Regards, -- Chen Jie
On 2012/2/16 at 5:21 PM, Chen Jie chenj@lemote.com wrote:
Hi,
On 2012/2/15 at 11:53 PM, Jerome Glisse j.glisse@gmail.com wrote:
To me it looks like the CP is trying to fetch memory but the GPU memory controller fails to fulfill the CP's request. Did you check the PCI configuration before & after (when things don't work)? My best guess is that PCI bus mastering is not working properly, or that the PCIe GPU GART table has wrong data.
Maybe one needs to drop bus mastering and re-enable it to work around some bug...
Thanks for your suggestion. We've tried the 'drop and re-enable bus master' trick; unfortunately it doesn't work. The PCI configuration comparison will be done later.
Update: We've checked the first 64 bytes of PCI configuration space before & after, and didn't find any difference.
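For reference, a sketch of how such a dump can be taken from the driver side, so snapshots from before suspend and after resume can be diffed (the output format is just illustrative):

#include <linux/pci.h>

/* Sketch: dump the first 64 bytes of PCI config space. */
static void dump_pci_config(struct pci_dev *pdev)
{
	u32 val;
	int off;

	for (off = 0; off < 64; off += 4) {
		pci_read_config_dword(pdev, off, &val);
		dev_info(&pdev->dev, "config[0x%02x] = 0x%08x\n", off, val);
	}
}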
Regards, -- Chen Jie
Hi,
Status update: today we tried to analyze the GPU instruction stream at lockup time. The lockup always occurs after tasks are restarted, so the related instructions should reside in an IB, as indicated by dmesg:
[ 2456.585937] GPU lockup (waiting for 0x0002F98B last fence id 0x0002F98A)
Print instructions in related ib:
[ 2462.492187] PM4 block 10 has 115 instructions, with fence seq 2f98b
....
[ 2462.976562] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2462.984375] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2462.988281] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2462.992187] Type3:PACKET3_SET_ALU_CONST ref_addr <not interpreted>
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
[ 2463.003906] Type3:PACKET3_SET_RESOURCE ref_addr <not interpreted>
[ 2463.007812] Type3:PACKET3_SET_CONFIG_REG ref_addr <not interpreted>
[ 2463.011718] Type3:PACKET3_INDEX_TYPE ref_addr <not interpreted>
[ 2463.015625] Type3:PACKET3_NUM_INSTANCES ref_addr <not interpreted>
[ 2463.019531] Type3:PACKET3_DRAW_INDEX_AUTO ref_addr <not interpreted>
[ 2463.027343] Type3:PACKET3_EVENT_WRITE ref_addr <not interpreted>
[ 2463.031250] Type3:PACKET3_SET_CONFIG_REG ref_addr <not interpreted>
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680
[ 2463.039062] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2463.046875] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2463.050781] Type3:PACKET3_SET_CONTEXT_REG ref_addr <not interpreted>
[ 2463.054687] Type3:PACKET3_SET_BOOL_CONST ref_addr <not interpreted>
[ 2463.062500] Type3:PACKET3_SURFACE_SYNC ref_addr 10668e
CP_COHER_BASE was 0x0018C880, so the instruction which caused the lockup should be within this range:
[ 2462.996093] Type3:PACKET3_SURFACE_SYNC ref_addr 18c880
...
[ 2463.035156] Type3:PACKET3_SURFACE_SYNC ref_addr 10f680
Here, only SURFACE_SYNC, SET_RESOURCE and EVENT_WRITE will access GPU memory. We guess it may be SURFACE_SYNC?
BTW, when the lockup happens, if the CP ring is placed in VRAM, ring_test passes but ib_test fails -- which suggests the ME fails to feed the CP during the lockup? Could an earlier SURFACE_SYNC block the MC?
P.S. In today's debugging we hacked the driver to place the CP ring, IB and IH in VRAM and to disable WB (radeon_no_wb=1).
Any idea?
Regards, -- Chen Jie
On Thu, Feb 16, 2012 at 05:21:10PM +0800, Chen Jie wrote:
Hi,
On 2012/2/15 at 11:53 PM, Jerome Glisse j.glisse@gmail.com wrote:
To me it looks like the CP is trying to fetch memory but the GPU memory controller fails to fulfill the CP's request. Did you check the PCI configuration before & after (when things don't work)? My best guess is that PCI bus mastering is not working properly, or that the PCIe GPU GART table has wrong data.
Maybe one needs to drop bus mastering and re-enable it to work around some bug...
Thanks for your suggestion. We've tried the 'drop and re-enable bus master' trick; unfortunately it doesn't work. The PCI configuration comparison will be done later.
Some additional information: the "GPU lockup" always seems to occur after tasks are restarted -- we inserted more ring tests, and none of them failed before the tasks were restarted.
BTW, I hacked the GART table to try to simulate the problem:
1. Changed the system memory address (bus address) of ring_obj to an arbitrary value, e.g. 0 or 128M.
2. Changed the system memory address of a BO in radeon_test to an arbitrary value, e.g. 0.
Neither of the above led to a GPU lockup: point 1 rendered a black screen; with point 2, only the test itself failed.
Any idea?
OK, let's start from the beginning. I'm convinced it's related to the GPU memory controller failing to fulfill some request that hits system memory. So in another mail you wrote:
BTW, I found that radeon_gart_bind() calls pci_map_page(), which hooks into swiotlb_map_page() on our platform; that seems to allocate and return the dma_addr_t of a new page from the bounce pool if the original page does not meet the dma_mask. This seems like a bug, since the BO is backed by one set of pages, but what is mapped into the GART is another set of pages?
Is this still the case? As this is obviously wrong, we fixed that recently. What drm code are you using? The rs780 dma mask is something like 40 bits iirc, so you should never have an issue on your system with 1G of memory, right?
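For what it's worth, a small sketch of how one could re-check whether swiotlb is bouncing pages (illustrative only; it is meaningful only on a platform with a 1:1 bus/physical mapping, and is not code from the radeon driver):

#include <linux/pci.h>
#include <linux/mm.h>

/* Sketch: detect whether pci_map_page() bounced a page through swiotlb by
 * comparing the returned bus address with the page's physical address. */
static bool page_was_bounced(struct pci_dev *pdev, struct page *page)
{
	bool bounced;
	dma_addr_t dma = pci_map_page(pdev, page, 0, PAGE_SIZE,
				      PCI_DMA_BIDIRECTIONAL);

	if (pci_dma_mapping_error(pdev, dma))
		return true;	/* mapping failed outright */

	bounced = (dma != ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT));
	pci_unmap_page(pdev, dma, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
	return bounced;
}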
If you have an iommu, what happens on resume? Are all pages previously mapped with pci_map_page() still valid?
One good way to test the GART is to go over the GPU GART table and, using the GPU, write a dword at the end of each page -- something like 0xCAFEDEAD or some value that is unlikely to be already set. Then go over all the pages and check that the GPU writes succeeded. Abusing the scratch register write-back feature is the easiest way to try that.
Cheers, Jerome
On 2012/2/17 at 12:32 AM, Jerome Glisse j.glisse@gmail.com wrote:
OK, let's start from the beginning. I'm convinced it's related to the GPU memory controller failing to fulfill some request that hits system memory. So in another mail you wrote:
BTW, I found that radeon_gart_bind() calls pci_map_page(), which hooks into swiotlb_map_page() on our platform; that seems to allocate and return the dma_addr_t of a new page from the bounce pool if the original page does not meet the dma_mask. This seems like a bug, since the BO is backed by one set of pages, but what is mapped into the GART is another set of pages?
Is this still the case? As this is obviously wrong, we fixed that recently. What drm code are you using? The rs780 dma mask is something like 40 bits iirc, so you should never have an issue on your system with 1G of memory, right?
Right.
If you have an iommu, what happens on resume? Are all pages previously mapped with pci_map_page() still valid?
The physical address is directly mapped to the bus address, so the iommu does nothing on resume; the pages should still be valid?
One good way to test the GART is to go over the GPU GART table and, using the GPU, write a dword at the end of each page -- something like 0xCAFEDEAD or some value that is unlikely to be already set. Then go over all the pages and check that the GPU writes succeeded. Abusing the scratch register write-back feature is the easiest way to try that.
I'm planning to add a GART table check procedure on resume, which will go over the GPU GART table:
1. read (back up) the dword at the end of each GPU page
2. write a mark via the GPU and check it
3. restore the original dword
Hopefully this can be of some help.
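A rough sketch of what such a check could look like (this is not the actual validateGART.patch mentioned in the follow-up mail; gpu_write_dword() is a hypothetical helper standing in for the GPU-side write, e.g. the scratch write-back trick Jerome suggested, and CPU-side cache effects are ignored):

#include <linux/mm.h>
#include "radeon.h"

#define GART_CHECK_MARK 0xDEADBEEF

static int example_gart_table_check(struct radeon_device *rdev)
{
	unsigned i;

	for (i = 0; i < rdev->gart.num_cpu_pages; i++) {
		u32 *cpu_ptr;
		u64 gpu_addr;
		u32 saved;

		if (!rdev->gart.pages[i])
			continue;	/* entry points at the dummy page */

		/* Last dword of this page, via the CPU mapping and via the
		 * GART aperture. */
		cpu_ptr = (u32 *)((u8 *)page_address(rdev->gart.pages[i]) +
				  PAGE_SIZE - 4);
		gpu_addr = rdev->mc.gtt_start + (u64)i * PAGE_SIZE +
			   PAGE_SIZE - 4;

		saved = *cpu_ptr;				/* 1. back up */
		gpu_write_dword(rdev, gpu_addr, GART_CHECK_MARK); /* 2. hypothetical GPU write */
		if (*cpu_ptr != GART_CHECK_MARK) {
			DRM_ERROR("GART check failed at page %u\n", i);
			return -EIO;
		}
		*cpu_ptr = saved;				/* 3. restore */
	}
	return 0;
}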
On 2012/2/17 at 5:27 PM, Chen Jie chenj@lemote.com wrote:
One good way to test the GART is to go over the GPU GART table and, using the GPU, write a dword at the end of each page -- something like 0xCAFEDEAD or some value that is unlikely to be already set. Then go over all the pages and check that the GPU writes succeeded. Abusing the scratch register write-back feature is the easiest way to try that.
I'm planning to add a GART table check procedure on resume, which will go over the GPU GART table:
- read (back up) the dword at the end of each GPU page
- write a mark via the GPU and check it
- restore the original dword
The attached validateGART.patch does the job:
* It currently only works on the mips64 platform.
* To use it, apply all_in_vram.patch first, which will allocate the CP ring, IH and IB in VRAM and hard-code no_wb=1.
The GART test routine is invoked in r600_resume. We've tried it, and found that when the lockup happened, the GART table was still good before userspace restarted. The related dmesg follows:
[ 1521.820312] [drm] r600_gart_table_validate(): Validate GART Table at 9000000040040000, 32768 entries, Dummy Page[0x000000000e004000-0x000000000e007fff]
[ 1522.019531] [drm] r600_gart_table_validate(): Sweep 32768 entries(valid=8544, invalid=24224, total=32768).
...
[ 1531.156250] PM: resume of devices complete after 9396.588 msecs
[ 1532.152343] Restarting tasks ... done.
[ 1544.468750] radeon 0000:01:05.0: GPU lockup CP stall for more than 10003msec
[ 1544.472656] ------------[ cut here ]------------
[ 1544.480468] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:243 radeon_fence_wait+0x25c/0x314()
[ 1544.488281] GPU lockup (waiting for 0x0002136B last fence id 0x0002136A)
...
[ 1544.886718] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 1545.046875] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 1545.062500] radeon 0000:01:05.0: WB disabled
[ 1545.097656] [drm] ring test succeeded in 0 usecs
[ 1545.105468] [drm] ib test succeeded in 0 usecs
[ 1545.109375] [drm] Enabling audio support
[ 1545.113281] [drm] r600_gart_table_validate(): Validate GART Table at 9000000040040000, 32768 entries, Dummy Page[0x000000000e004000-0x000000000e007fff]
[ 1545.125000] [drm:r600_gart_table_validate] *ERROR* Iter=0: unexpected value 0x745aaad1(expect 0xDEADBEEF) entry=0x000000000e008067, orignal=0x745aaad1
...
/* System blocked here. */
Any idea?
BTW, we found the following in r600_pcie_gart_enable() (drivers/gpu/drm/radeon/r600.c):
WREG32(VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR, (u32)(rdev->dummy_page.addr >> 12));
On our platform PAGE_SIZE is 16K; does that cause any problem?
Also in radeon_gart_unbind() and radeon_gart_restore(), the logic should change to:
 for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
 	radeon_gart_set_page(rdev, t, page_base);
-	page_base += RADEON_GPU_PAGE_SIZE;
+	if (page_base != rdev->dummy_page.addr)
+		page_base += RADEON_GPU_PAGE_SIZE;
 }
???
Regards, -- Chen Jie
Hi,
For this occasional GPU lockup when returning from STR/STD, I found the following (when the problem happens):
The value of SRBM_STATUS is either 0x20002040 or 0x20003040, which means:
* HI_RQ_PENDING (there is a HI/BIF request pending in the SRBM)
* MCDW_BUSY (Memory Controller block is busy)
* BIF_BUSY (Bus Interface is busy)
* MCDX_BUSY (Memory Controller block is busy), if the value is 0x20003040
Are MCDW_BUSY and MCDX_BUSY two memory channels? What is the relationship among GART-mapped memory, on-board video memory, and MCDX/MCDW?
CP_STAT: the CSF_RING_BUSY is always set.
There are many CP_PACKET2 (0x80000000) entries in the CP ring (more than three hundred), e.g.:
r[131800]=0x00028000
r[131801]=0xc0016800
r[131802]=0x00000140
r[131803]=0x000079c5
r[131804]=0x0000304a
r[131805] ...
r[132143]=0x80000000
r[132144]=0xffff0000
After the first reset the GPU locks up again; this time there are typically 320 dwords in the CP ring -- 319 CP_PACKET2 with 0xc0033d00 at the end. Is this normal?
BTW, is there any way for X to switch to NoAccel mode when the problem happens? That way users would have a chance to save their documents and then reboot the machine.
Regards, -- Chen Jie
On Mon, 2012-02-27 at 10:44 +0800, Chen Jie wrote:
Hi,
For this occasional GPU lockup when returning from STR/STD, I found the following (when the problem happens):
The value of SRBM_STATUS is either 0x20002040 or 0x20003040, which means:
- HI_RQ_PENDING (there is a HI/BIF request pending in the SRBM)
- MCDW_BUSY (Memory Controller block is busy)
- BIF_BUSY (Bus Interface is busy)
- MCDX_BUSY (Memory Controller block is busy), if the value is 0x20003040
Are MCDW_BUSY and MCDX_BUSY two memory channels? What is the relationship among GART-mapped memory, on-board video memory, and MCDX/MCDW?
CP_STAT: the CSF_RING_BUSY is always set.
Once the memory controller fails to do a PCI transaction, the CP will be stuck -- at least if the ring is in system memory. If the ring is in vram, the CP might be stuck too, because everything goes through the MC anyway.
There are many CP_PACKET2 (0x80000000) entries in the CP ring (more than three hundred), e.g.:
r[131800]=0x00028000
r[131801]=0xc0016800
r[131802]=0x00000140
r[131803]=0x000079c5
r[131804]=0x0000304a
r[131805] ...
r[132143]=0x80000000
r[132144]=0xffff0000
After the first reset the GPU locks up again; this time there are typically 320 dwords in the CP ring -- 319 CP_PACKET2 with 0xc0033d00 at the end. Is this normal?
BTW, is there any way for X to switch to NoAccel mode when the problem happens? That way users would have a chance to save their documents and then reboot the machine.
I have been meaning to patch the ddx to fall back to sw after a GPU lockup. But this is useless in today's world, where everything is composited, i.e. the screen is updated using the 3D driver, for which there is no easy way to suddenly migrate to software rendering. I will still probably do the ddx patch at some point.
Cheers, Jerome
On Tue, 2012-02-21 at 18:37 +0800, Chen Jie wrote:
Any idea?
I know lockups are frustrating; my only idea is that the memory controller is locked up because of some failing pci <-> system ram transaction.
BTW, we found the following in r600_pcie_gart_enable() (drivers/gpu/drm/radeon/r600.c):
WREG32(VM_CONTEXT0_PROTECTION_FAULT_DEFAULT_ADDR, (u32)(rdev->dummy_page.addr >> 12));
On our platform PAGE_SIZE is 16K; does that cause any problem?
No this should be handled properly.
Also in radeon_gart_unbind() and radeon_gart_restore(), the logic should change to:
 for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
 	radeon_gart_set_page(rdev, t, page_base);
-	page_base += RADEON_GPU_PAGE_SIZE;
+	if (page_base != rdev->dummy_page.addr)
+		page_base += RADEON_GPU_PAGE_SIZE;
 }
???
No need to do so; the dummy page will be 16K too, so it's fine.
Cheers, Jerome
dri-devel@lists.freedesktop.org