Re: [PATCH v3] drm: Don't free jobs in wait_event_interruptible()

26 Sep 2019

On 26/09/2019 14:37, Koenig, Christian wrote:
...
Am 26.09.19 um 14:31 schrieb Steven Price:
...
drm_sched_cleanup_jobs() attempts to free finished jobs, however because
it is called as the condition of wait_event_interruptible() it must not
sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep.
Instead let's rename drm_sched_cleanup_jobs() to
drm_sched_get_cleanup_job() and simply return a job for processing if
there is one. The caller can then call the free_job() callback outside
the wait_event_interruptible() where sleeping is possible before
re-checking and returning to sleep if necessary.
Signed-off-by: Steven Price steven.price@arm.com
Changes from v2:

Actually move list_first_entry_or_null() within the lock

drivers/gpu/drm/scheduler/sched_main.c | 42 ++++++++++++++------------
  1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..e4bd792f2b29 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
  }
/**


drm_sched_cleanup_jobs - destroy finished jobs





drm_sched_get_cleanup_job - fetch the next finished job to be destroyed

@sched: scheduler instance






Remove all finished jobs from the mirror list and destroy them.





Returns the next finished job from the mirror list (if there is one)



ready for it to be destroyed.

*/

-static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+static struct drm_sched_job *
+drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
  {

struct drm_sched_job *job = NULL;
unsigned long flags;
/* Don't destroy jobs while the timeout worker is running */
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
  !cancel_delayed_work(&sched->work_tdr))



return;





return NULL;





while (!list_empty(&sched->ring_mirror_list)) {
struct drm_sched_job *job;




spin_lock_irqsave(&sched->job_list_lock, flags);


job = list_first_entry(&sched->ring_mirror_list,




job = list_first_entry_or_null(&sched->ring_mirror_list,
		       struct drm_sched_job, node);


if (!dma_fence_is_signaled(&job->s_fence->finished))


	break;



spin_lock_irqsave(&sched->job_list_lock, flags);




if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from ring_mirror_list */
list_del_init(&job->node);


spin_unlock_irqrestore(&sched->job_list_lock, flags);



sched->ops->free_job(job);




} else {
job = NULL;


/* queue timeout for next job */


drm_sched_start_timeout(sched);



Sorry I've only seen this right now: This won't work.
See you always need to start the timeout, or cancel_delayed_work() will 
abort the whole thing after returning the first job.
Good spot!
My logic was that drm_sched_main() would call again and that would
re-arm the timeout after the job has been completed.
But looking at this more closely I see that isn't actually going to work
reliably - if the work has been cancelled the first time then
cancel_delayed_work() will return false causing an early out and the
timeout isn't re-armed.
Is it as simple as moving drm_sched_start_timeout() out of the else case
and unconditionally re-arming the timeout? I'm worried about races with
the job that is being returned, although it has already been removed
from ring_mirror_list so perhaps it's safe to do?
Alternatively the caller could manually re-arm the timeout after
handling the job free.
Steve
...
Christian.
...
}

/* queue timeout for next job */
spin_lock_irqsave(&sched->job_list_lock, flags);
drm_sched_start_timeout(sched);
spin_unlock_irqrestore(&sched->job_list_lock, flags);


return job;
}
/**


@@ -698,12 +696,18 @@ static int drm_sched_main(void *param)
   	struct drm_sched_fence *s_fence;
   	struct drm_sched_job *sched_job;
   	struct dma_fence *fence;

struct drm_sched_job *cleanup_job = NULL;


wait_event_interruptible(sched->wake_up_worker,



			 (drm_sched_cleanup_jobs(sched),




			 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
		 (!drm_sched_blocked(sched) &&
		  (entity = drm_sched_select_entity(sched))) ||




			 kthread_should_stop()));




			 kthread_should_stop());



while (cleanup_job) {


	sched->ops->free_job(cleanup_job);


	cleanup_job = drm_sched_get_cleanup_job(sched);


}


if (!entity)
	continue;



dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [PATCH v3] drm: Don't free jobs in wait_event_interruptible()

Signed-off-by: Steven Price steven.price@arm.com