On 26/09/2019 14:37, Koenig, Christian wrote:
Am 26.09.19 um 14:31 schrieb Steven Price:
drm_sched_cleanup_jobs() attempts to free finished jobs, however because it is called as the condition of wait_event_interruptible() it must not sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep.
Instead let's rename drm_sched_cleanup_jobs() to drm_sched_get_cleanup_job() and simply return a job for processing if there is one. The caller can then call the free_job() callback outside the wait_event_interruptible() where sleeping is possible before re-checking and returning to sleep if necessary.
Signed-off-by: Steven Price steven.price@arm.com
Changes from v2:
- Actually move list_first_entry_or_null() within the lock
drivers/gpu/drm/scheduler/sched_main.c | 42 ++++++++++++++------------ 1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 9a0ee74d82dc..e4bd792f2b29 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) }
/**
- drm_sched_cleanup_jobs - destroy finished jobs
- drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
- @sched: scheduler instance
- Remove all finished jobs from the mirror list and destroy them.
- Returns the next finished job from the mirror list (if there is one)
*/
- ready for it to be destroyed.
-static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) +static struct drm_sched_job * +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) {
struct drm_sched_job *job = NULL; unsigned long flags;
/* Don't destroy jobs while the timeout worker is running */ if (sched->timeout != MAX_SCHEDULE_TIMEOUT && !cancel_delayed_work(&sched->work_tdr))
return;
return NULL;
- while (!list_empty(&sched->ring_mirror_list)) {
struct drm_sched_job *job;
- spin_lock_irqsave(&sched->job_list_lock, flags);
job = list_first_entry(&sched->ring_mirror_list,
- job = list_first_entry_or_null(&sched->ring_mirror_list, struct drm_sched_job, node);
if (!dma_fence_is_signaled(&job->s_fence->finished))
break;
spin_lock_irqsave(&sched->job_list_lock, flags);
- if (job && dma_fence_is_signaled(&job->s_fence->finished)) { /* remove job from ring_mirror_list */ list_del_init(&job->node);
spin_unlock_irqrestore(&sched->job_list_lock, flags);
sched->ops->free_job(job);
- } else {
job = NULL;
/* queue timeout for next job */
drm_sched_start_timeout(sched);
Sorry I've only seen this right now: This won't work.
See you always need to start the timeout, or cancel_delayed_work() will abort the whole thing after returning the first job.
Good spot!
My logic was that drm_sched_main() would call again and that would re-arm the timeout after the job has been completed.
But looking at this more closely I see that isn't actually going to work reliably - if the work has been cancelled the first time then cancel_delayed_work() will return false causing an early out and the timeout isn't re-armed.
Is it as simple as moving drm_sched_start_timeout() out of the else case and unconditionally re-arming the timeout? I'm worried about races with the job that is being returned, although it has already been removed from ring_mirror_list so perhaps it's safe to do?
Alternatively the caller could manually re-arm the timeout after handling the job free.
Steve
Christian.
}
- /* queue timeout for next job */
- spin_lock_irqsave(&sched->job_list_lock, flags);
- drm_sched_start_timeout(sched); spin_unlock_irqrestore(&sched->job_list_lock, flags);
return job; }
/**
@@ -698,12 +696,18 @@ static int drm_sched_main(void *param) struct drm_sched_fence *s_fence; struct drm_sched_job *sched_job; struct dma_fence *fence;
struct drm_sched_job *cleanup_job = NULL;
wait_event_interruptible(sched->wake_up_worker,
(drm_sched_cleanup_jobs(sched),
(cleanup_job = drm_sched_get_cleanup_job(sched)) || (!drm_sched_blocked(sched) && (entity = drm_sched_select_entity(sched))) ||
kthread_should_stop()));
kthread_should_stop());
while (cleanup_job) {
sched->ops->free_job(cleanup_job);
cleanup_job = drm_sched_get_cleanup_job(sched);
}
if (!entity) continue;
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel