Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

8 Nov 2021

I stumbled across this thread when I ran into the same issue, while
working out how to move drm/msm to use scheduler's retire +
timeout/recovery (and get rid of our own mirror list of in-flight
jobs).  We already have hw error detection enabled, and it can signal
quite fast, so assuming the first job on the list is the guilty job
just won't work.

But I was considering a slightly different approach to fixing this,
instead just handling it all in drm_sched_main() and getting rid of
the complicated kthread parking gymnastics.  Ie. something along the
lines of:
---------------------

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 67382621b429..4d6ce775c316 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -764,6 +764,45 @@ static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
        return false;
 }

+static bool handle_timeout(struct drm_gpu_scheduler *sched)
+{
+       struct drm_sched_job *bad;
+
+       if (!sched->has_timeout)
+               return false;
+
+       sched->has_timeout = false;
+
+       spin_lock(&sched->job_list_lock);
+       bad = list_first_entry_or_null(&sched->pending_list,
+                                      struct drm_sched_job, list);
+
+       if (!bad) {
+               spin_unlock(&sched->job_list_lock);
+               return false;
+       }
+
+       spin_unlock(&sched->job_list_lock);
+
+       if (sched->timeout_wq == system_wq) {
+               /*
+                * If driver has no specific requirements about serializing
+                * reset wrt. other engines, just call timedout_job() directly
+                */
+               sched->ops->timedout_job(bad);
+       } else {
+               /*
+                * Otherwise queue it on timeout_wq and wait for it to complete
+                */
+               ... more typing needed here ...
+       }
+
+       if (sched->free_guilty) {
+               sched->ops->free_job(bad);
+               sched->free_guilty = false;
+       }
+}
+
 /**
  * drm_sched_main - main scheduler thread
  *
@@ -787,6 +826,7 @@ static int drm_sched_main(void *param)
                wait_event_interruptible(sched->wake_up_worker,
                                         (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
+                                        handle_timeout(sched) ||
                                         (!drm_sched_blocked(sched) &&
                                          (entity = drm_sched_select_entity(sched))) ||
                                         kthread_should_stop());
---------------------

drm_sched_fault() and the sw timeout handler would just set
sched->has_timeout and kick sched->wake_up_worker.

And since we handle the timeout case after
drm_sched_get_cleanup_job(), we know that all of the successfully
completed jobs have already been popped off the list, and won't be
unfairly maligned.
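
To make the ordering argument concrete, here is a minimal userspace sketch, not kernel code. All of the model_* names and the fixed-size array are hypothetical stand-ins for the pending list, drm_sched_get_cleanup_job() and handle_timeout(); it only illustrates that retiring finished jobs before looking at the timeout flag changes which job ends up blamed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct model_job {
	int id;
	bool finished;			/* models a signaled hw fence */
};

struct model_sched {
	struct model_job *pending[8];	/* oldest job first */
	size_t npending;
	bool has_timeout;		/* set by the fault/timeout path */
};

/* Models drm_sched_get_cleanup_job(): pop the head only if it has finished. */
static struct model_job *model_get_cleanup_job(struct model_sched *s)
{
	struct model_job *job;
	size_t i;

	if (s->npending == 0 || !s->pending[0]->finished)
		return NULL;

	job = s->pending[0];
	for (i = 1; i < s->npending; i++)
		s->pending[i - 1] = s->pending[i];
	s->npending--;
	return job;
}

/* Models handle_timeout(): consume the flag and blame the current head. */
static struct model_job *model_handle_timeout(struct model_sched *s)
{
	if (!s->has_timeout)
		return NULL;

	s->has_timeout = false;
	return s->npending ? s->pending[0] : NULL;
}

/*
 * Job 1 has finished, job 2 is hung, and a timeout has been flagged.
 * Returns the id of the job that gets blamed, or -1 if none.
 */
int model_blamed_job(bool cleanup_first)
{
	struct model_job j1 = { .id = 1, .finished = true };
	struct model_job j2 = { .id = 2, .finished = false };
	struct model_sched s = {
		.pending = { &j1, &j2 },
		.npending = 2,
		.has_timeout = true,
	};
	struct model_job *bad;

	if (cleanup_first) {
		/* Retire every finished job before looking at the timeout. */
		while (model_get_cleanup_job(&s))
			;
	}

	bad = model_handle_timeout(&s);
	return bad ? bad->id : -1;
}
```

In this model, model_blamed_job(true) blames the hung job 2, while model_blamed_job(false) blames job 1, which had already completed.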

BR,
-R

On Tue, Aug 31, 2021 at 6:29 PM Liu, Monk <Monk.Liu@amd.com> wrote:
...
[AMD Official Use Only]

Okay, I will reprepare this patch

Thanks

Monk Liu | Cloud-GPU Core team

-----Original Message-----
From: Daniel Vetter <daniel@ffwll.ch>
Sent: Tuesday, August 31, 2021 9:02 PM
To: Liu, Monk <Monk.Liu@amd.com>
Cc: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Chen, Jingwen <Jingwen.Chen@amd.com>
Subject: Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

On Tue, Aug 31, 2021 at 02:59:02PM +0200, Daniel Vetter wrote:
...
Can we please have some actual commit message here, with detailed
explanation of the race/bug/whatever, how you fix it and why this is
the best option?
On Tue, Aug 31, 2021 at 06:35:39PM +0800, Monk Liu wrote:
...
tested-by: jingwen chen <jingwen.chen@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: jingwen chen <jingwen.chen@amd.com>

 drivers/gpu/drm/scheduler/sched_main.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ecf8140..894fdb24 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,19 +319,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
        sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);

        /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+       if (!__kthread_should_park(sched->thread))

This is a __ function, i.e. considered internal, and it's lockless
atomic, i.e. unordered. And you're not explaining why this works.

IOW it's probably buggy, and just unconditionally parking the
kthread is probably the right thing to do. If it's not the right thing
to do, there's a bug here for sure.

Also why don't we reuse the function drivers already have to stop a
scheduler thread? We seem to have two kthread_park now, that's probably
one too many.

-Daniel
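
One way to read the concern above, sketched as a single-threaded userspace model rather than kernel code: a pending park request does not by itself mean the thread has stopped. The model_* names and the two flags are hypothetical, and this is only one assumed interpretation of why the conditional park could be unsafe, not a confirmed analysis of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a parkable thread; not the kernel's struct kthread. */
struct model_thread {
	bool should_park;	/* a park has been requested */
	bool parked;		/* the thread has actually stopped */
};

/* Models another caller whose park request has not completed yet. */
static void model_request_park(struct model_thread *t)
{
	t->should_park = true;
}

/* Models the thread reaching its park point and stopping there. */
static void model_thread_reaches_park_point(struct model_thread *t)
{
	if (t->should_park)
		t->parked = true;
}

/* Models kthread_park(): request the park and wait until it has happened. */
static void model_park(struct model_thread *t)
{
	t->should_park = true;
	/* Stands in for blocking until the thread reports it is parked. */
	model_thread_reaches_park_point(t);
}

/* Models the conditional form: skip parking if a request is already pending. */
static void model_park_if_not_requested(struct model_thread *t)
{
	if (!t->should_park)
		model_park(t);
}

/* Returns true if the conditional form returns while the thread still runs. */
bool model_conditional_leaves_thread_running(void)
{
	struct model_thread t = { false, false };

	model_request_park(&t);		/* someone else asked, not parked yet */
	model_park_if_not_requested(&t);	/* skips the wait */
	return !t.parked;
}

/* Returns true if the unconditional form always ends with a parked thread. */
bool model_unconditional_parks(void)
{
	struct model_thread t = { false, false };

	model_request_park(&t);
	model_park(&t);
	return t.parked;
}
```

In the model, the conditional variant returns with the thread still running whenever a request is pending but not yet honoured, while the unconditional variant always waits for the park to complete.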
...
...

+               kthread_park(sched->thread);
+
        spin_lock(&sched->job_list_lock);
        job = list_first_entry_or_null(&sched->pending_list,
                                       struct drm_sched_job, list);

        if (job) {
-               /*
-                * Remove the bad job so it cannot be freed by concurrent
-                * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-                * is parked at which point it's safe.
-                */
-               list_del_init(&job->list);
                spin_unlock(&sched->job_list_lock);

+               /* vendor's timeout_job should call drm_sched_start() */
                status = job->sched->ops->timedout_job(job);

                /*

@@ -393,20 +391,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
        kthread_park(sched->thread);

        /*
-        * Reinsert back the bad job here - now it's safe as
-        * drm_sched_get_cleanup_job cannot race against us and release the
-        * bad job at this point - we parked (waited for) any in progress
-        * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-        * now until the scheduler thread is unparked.
-        */
-       if (bad && bad->sched == sched)
-               /*
-                * Add at the head of the queue to reflect it was the earliest
-                * job extracted.
-                */
-               list_add(&bad->list, &sched->pending_list);
-
-       /*
         * Iterate the job list from later to  earlier one and either deactive
         * their HW callbacks or remove them from pending list if they already
         * signaled.
--
2.7.4
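
For readers following the quoted patch, the remove-then-reinsert scheme that it deletes can be sketched in userspace as follows. This is a minimal illustration only: the node_* helpers are hypothetical stand-ins written for this example, loosely modeled on the kernel's list_del_init(), list_add() and list_first_entry_or_null(), and no locking or thread parking is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on struct list_head. */
struct node {
	struct node *prev, *next;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
}

/* Insert right after the list head, like list_add(). */
static void node_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Insert right before the list head, like list_add_tail(). */
static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and reinitialize, like list_del_init(). */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

static struct node *node_first_or_null(struct node *head)
{
	return head->next == head ? NULL : head->next;
}

/* Returns 1 if the bad job is hidden from cleanup and then restored at the head. */
int model_reinsert_demo(void)
{
	struct node pending, a, b;
	struct node *bad;

	node_init(&pending);
	node_init(&a);
	node_init(&b);
	node_add_tail(&a, &pending);	/* oldest job */
	node_add_tail(&b, &pending);

	/* Timeout handler: unlink the head so a concurrent cleanup cannot free it. */
	bad = node_first_or_null(&pending);
	node_del_init(bad);
	if (node_first_or_null(&pending) != &b)
		return 0;

	/* Stop path, after parking: put the bad job back at the head. */
	node_add_head(bad, &pending);
	return node_first_or_null(&pending) == &a && a.next == &b;
}
```

While the bad job is unlinked, anything walking the pending list sees only the later job; after the reinsert the original order is restored.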
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch/

    
