Re: [PATCH v5 01/16] drm/sched: Document what the timedout_job method should do

29 Jun 2021

On Tue, Jun 29, 2021 at 09:34:55AM +0200, Boris Brezillon wrote:
...
The documentation is a bit vague and doesn't really describe what the
->timedout_job() is expected to do. Let's add a few more details.
v5:
* New patch

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
...

include/drm/gpu_scheduler.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 10225a0a35d0..65700511e074 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 	 * @timedout_job: Called when a job has taken too long to execute,
 	 * to trigger GPU recovery.
 	 *
+	 * This method is called in a workqueue context.
+	 *
+	 * Drivers typically issue a reset to recover from GPU hangs, and this
+	 * procedure usually follows the following workflow:
+	 *
+	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
+	 *    scheduler thread and cancel the timeout work, guaranteeing that
+	 *    nothing is queued while we reset the hardware queue
+	 * 2. Try to gracefully stop non-faulty jobs (optional)
+	 * 3. Issue a GPU reset (driver-specific)
+	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
+	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
+	 *    jobs can be queued, and the scheduler thread is unblocked
+	 *
 	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 	 * and the underlying driver has started or completed recovery.
-- 
2.31.1
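
For illustration, the five-step workflow documented in the hunk above
could be sketched roughly as follows. This is a userspace mock, not
kernel code and not taken from any driver: the struct definitions and
the bodies of drm_sched_stop(), drm_sched_resubmit_jobs() and
drm_sched_start() are simplified stand-ins that only record the call
order, and mydrv_timedout_job(), mydrv_soft_stop_jobs() and
mydrv_gpu_reset() are hypothetical names. The stub signatures are
assumed to match the scheduler API as it looked around the time of
this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel definitions (assumption: simplified mocks). */
enum drm_gpu_sched_stat {
	DRM_GPU_SCHED_STAT_NONE,
	DRM_GPU_SCHED_STAT_NOMINAL,
	DRM_GPU_SCHED_STAT_ENODEV,
};

struct drm_gpu_scheduler { bool stopped; };
struct drm_sched_job { struct drm_gpu_scheduler *sched; };

/* Records which step ran, in order, so the sequence can be inspected. */
char call_log[8];
size_t call_log_len;

static void log_step(char step)
{
	if (call_log_len < sizeof(call_log) - 1)
		call_log[call_log_len++] = step;
}

/* Mock: the real function parks the scheduler thread and cancels the timeout work. */
static void drm_sched_stop(struct drm_gpu_scheduler *sched,
			   struct drm_sched_job *bad)
{
	(void)bad;
	sched->stopped = true;
	log_step('1');
}

/* Hypothetical driver helper: optional graceful stop of non-faulty jobs. */
static void mydrv_soft_stop_jobs(struct drm_gpu_scheduler *sched)
{
	(void)sched;
	log_step('2');
}

/* Hypothetical driver helper: the driver-specific GPU reset. */
static void mydrv_gpu_reset(struct drm_gpu_scheduler *sched)
{
	(void)sched;
	log_step('3');
}

/* Mock: the real function re-submits the jobs that were in flight. */
static void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
{
	(void)sched;
	log_step('4');
}

/* Mock: the real function unblocks the scheduler thread. */
static void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
{
	(void)full_recovery;
	sched->stopped = false;
	log_step('5');
}

/* Sketch of a ->timedout_job() handler following steps 1 to 5. */
enum drm_gpu_sched_stat mydrv_timedout_job(struct drm_sched_job *sched_job)
{
	struct drm_gpu_scheduler *sched = sched_job->sched;

	drm_sched_stop(sched, sched_job);	/* 1. stop the scheduler */
	mydrv_soft_stop_jobs(sched);		/* 2. optional graceful stop */
	mydrv_gpu_reset(sched);			/* 3. driver-specific reset */
	drm_sched_resubmit_jobs(sched);		/* 4. re-submit jobs */
	drm_sched_start(sched, true);		/* 5. restart the scheduler */

	return DRM_GPU_SCHED_STAT_NOMINAL;
}
```

A real driver would also need its own locking and error handling
around the reset, and would decide for itself whether step 2 applies.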
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

    
