On Wed, 2012-06-27 at 14:14 -0400, j.glisse@gmail.com wrote:
From: Jerome Glisse jglisse@redhat.com
After an unrecovered GPU lockup, avoid any GPU activity so that kernel segfaults and the like cannot happen in any of the paths that assume the hw is working.
The segfault is due to the PCIE vram gart table being unmapped after suspend in the GPU reset path. To avoid the segfault, and to avoid further GPU activity if resetting the GPU is unsuccessful, we use the accel_working boolean to turn ttm activities into noops. This does not impact the module load path, because in that path ttm has an empty schedule queue and accel_working is set to true as soon as the gart table is in a valid state. Because ttm might have work queued, it is better to use accel_working than to disable the radeon_bo ioctl.
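As a rough illustration of the gating idea described above, here is a minimal userspace sketch. It is not the driver code: struct fake_gpu and every fake_* function are hypothetical names invented for this example, and the error codes are assumptions. It only models the ordering, where the flag is dropped before the gart table goes away and raised again only after the table is valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; not the real struct radeon_device. */
struct fake_gpu {
    bool gart_ready;     /* models "PCIE gart table is mapped and valid" */
    bool accel_working;  /* gate checked before any work that touches the GPU */
};

/* Reset/suspend side: drop the gate before the table becomes unusable. */
static void fake_gart_unpin(struct fake_gpu *gpu)
{
    assert(gpu != NULL);
    gpu->accel_working = false;
    gpu->gart_ready = false;
}

/* Init/resume side: raise the gate only once the table is valid again. */
static void fake_gart_init(struct fake_gpu *gpu)
{
    assert(gpu != NULL);
    gpu->gart_ready = true;
    gpu->accel_working = true;
}

/* Models a reset attempt; hw_recovers decides whether it succeeds. */
static int fake_gpu_reset(struct fake_gpu *gpu, bool hw_recovers)
{
    fake_gart_unpin(gpu);
    if (!hw_recovers)
        return -EIO;     /* assumed error code; the gate stays closed */
    fake_gart_init(gpu);
    return 0;
}

/* A queued operation that would otherwise write to the gart table. */
static int fake_bo_bind(struct fake_gpu *gpu)
{
    assert(gpu != NULL);
    if (!gpu->accel_working)
        return -EINVAL;  /* assumed error code; refuse instead of touching hw */
    /* real code would write gart entries here */
    return 0;
}
```

With this model, a bind attempted after a failed reset is refused rather than reaching the unmapped table, while a bind after a successful reset goes through as before.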
To trigger the segfault, launch a program that repeatedly creates bo in ttm and let it run in the background, then trigger a gpu lockup from another process.
This patch also forces video mode restoring on r1xx, r2xx, r3xx, r4xx, r5xx, rs4xx and rs6xx GPUs even if the GPU reset fails. When a GPU reset fails it is very likely (so far I have never seen it not working) that the modesetting part of the GPU is still alive. So we have a chance to get a kernel backtrace or other debugging information on the screen if we always restore the video mode.
v2: Fix spelling error. Disable accel before suspend and reenable it after pcie gart initialization to be even more cautious about possible segfaults. Improve commit message.
v3: Improve commit message to describe the video mode restoring no matter what.
v4: Avoid issue after successful GPU lockup recovery. Don't do noop ttm move; instead report an error if the move needs bind or unbind, or fall back to memcpy. Don't restrict new bo domain; instead refuse to create a new bo if the gpu reset failed. Disable accel working in gart vram table unpin, so we don't change the behavior of the suspend path.
v5: Avoid having set domain also trigger noop bind/unbind; instead force it to wait for the GPU reset to go through, or return failure if the gpu reset fails.
cc: stable@vger.kernel.org
Signed-off-by: Jerome Glisse jglisse@redhat.com
[...]
/* try memcpy */
goto memcpy;
This comment is redundant. :)
Either way though,
Reviewed-by: Michel Dänzer michel.daenzer@amd.com