On Tue, Jun 29, 2021 at 09:34:55AM +0200, Boris Brezillon wrote:
The documentation is a bit vague and doesn't really describe what ->timedout_job() is expected to do. Let's add a few more details.
v5:
- New patch
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
 include/drm/gpu_scheduler.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 10225a0a35d0..65700511e074 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 	 * @timedout_job: Called when a job has taken too long to execute,
 	 * to trigger GPU recovery.
 	 *
+	 * This method is called in a workqueue context.
+	 *
+	 * Drivers typically issue a reset to recover from GPU hangs, and this
+	 * procedure usually follows the following workflow:
+	 *
+	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
+	 *    scheduler thread and cancel the timeout work, guaranteeing that
+	 *    nothing is queued while we reset the hardware queue
+	 * 2. Try to gracefully stop non-faulty jobs (optional)
+	 * 3. Issue a GPU reset (driver-specific)
+	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
+	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
+	 *    jobs can be queued, and the scheduler thread is unblocked
+	 *
 	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 	 * and the underlying driver has started or completed recovery.
--
2.31.1
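
For illustration, the five steps in the new comment could map onto a handler roughly like the one below. This is only a userspace sketch, not driver code: the struct definitions, the enum, and the three drm_sched_*() functions are simplified mock stand-ins (their parameter lists are assumptions and may differ from the kernel's), and my_gpu_stop_healthy_jobs(), my_gpu_reset() and my_gpu_timedout_job() are hypothetical driver hooks. Each mock only records its step number so the ordering can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for kernel types; NOT the real definitions. */
enum drm_gpu_sched_stat {
	DRM_GPU_SCHED_STAT_NOMINAL,	/* only the value named in the comment */
};

struct drm_gpu_scheduler {
	bool running;		/* true while new jobs may be queued */
	char trace[16];		/* records the order of the steps */
	size_t ntrace;
};

struct drm_sched_job {
	struct drm_gpu_scheduler *sched;
};

/* Append one step marker to the scheduler's trace, keeping it terminated. */
static void trace_step(struct drm_gpu_scheduler *s, char step)
{
	if (s->ntrace < sizeof(s->trace) - 1)
		s->trace[s->ntrace++] = step;
	s->trace[s->ntrace] = '\0';
}

/* Step 1 (mock): park the scheduler so nothing is queued during reset. */
static void drm_sched_stop(struct drm_gpu_scheduler *s,
			   struct drm_sched_job *bad)
{
	(void)bad;
	s->running = false;
	trace_step(s, '1');
}

/* Step 2 (hypothetical driver hook): let non-faulty jobs finish. */
static void my_gpu_stop_healthy_jobs(struct drm_gpu_scheduler *s)
{
	trace_step(s, '2');
}

/* Step 3 (hypothetical driver hook): reset the hardware. */
static void my_gpu_reset(struct drm_gpu_scheduler *s)
{
	assert(!s->running);	/* the reset must happen while stopped */
	trace_step(s, '3');
}

/* Step 4 (mock): hand the pending jobs back to the hardware. */
static void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *s)
{
	trace_step(s, '4');
}

/* Step 5 (mock): unblock the scheduler; new jobs can be queued again. */
static void drm_sched_start(struct drm_gpu_scheduler *s, bool full_recovery)
{
	(void)full_recovery;
	s->running = true;
	trace_step(s, '5');
}

/* Hypothetical ->timedout_job() implementation following the workflow. */
enum drm_gpu_sched_stat my_gpu_timedout_job(struct drm_sched_job *bad)
{
	struct drm_gpu_scheduler *sched = bad->sched;

	drm_sched_stop(sched, bad);
	my_gpu_stop_healthy_jobs(sched);	/* optional step */
	my_gpu_reset(sched);
	drm_sched_resubmit_jobs(sched);
	drm_sched_start(sched, true);

	return DRM_GPU_SCHED_STAT_NOMINAL;
}
```

Calling my_gpu_timedout_job() on a job whose scheduler is running leaves "12345" in sched->trace and marks the scheduler as running again. A real driver would also have to deal with locking and with resets that fail, which this sketch ignores.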