On 9/25/19 12:07 PM, Andrey Grodzovsky wrote:
On 9/25/19 12:00 PM, Steven Price wrote:
On 25/09/2019 16:56, Grodzovsky, Andrey wrote:
On 9/25/19 11:14 AM, Steven Price wrote:
drm_sched_cleanup_jobs() attempts to free finished jobs, however because it is called as the condition of wait_event_interruptible() it must not sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
Instead let's rename drm_sched_cleanup_jobs() to drm_sched_get_cleanup_job() and simply return a job for processing if there is one. The caller can then call the free_job() callback outside the wait_event_interruptible() where sleeping is possible before re-checking and returning to sleep if necessary.
Signed-off-by: Steven Price steven.price@arm.com
drivers/gpu/drm/scheduler/sched_main.c | 44 ++++++++++++++------------ 1 file changed, 24 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 9a0ee74d82dc..0ed4aaa4e6d1 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) } /**
- drm_sched_cleanup_jobs - destroy finished jobs
- drm_sched_get_cleanup_job - fetch the next finished job to be
destroyed * * @sched: scheduler instance *
- Remove all finished jobs from the mirror list and destroy them.
- Returns the next finished job from the mirror list (if there is
one)
- ready for it to be destroyed.
*/ -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) +static struct drm_sched_job * +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) { + struct drm_sched_job *job = NULL; unsigned long flags; /* Don't destroy jobs while the timeout worker is running */ if (sched->timeout != MAX_SCHEDULE_TIMEOUT && !cancel_delayed_work(&sched->work_tdr)) - return;
- while (!list_empty(&sched->ring_mirror_list)) { - struct drm_sched_job *job; + return NULL; - job = list_first_entry(&sched->ring_mirror_list, + job = list_first_entry_or_null(&sched->ring_mirror_list, struct drm_sched_job, node); - if (!dma_fence_is_signaled(&job->s_fence->finished)) - break; - spin_lock_irqsave(&sched->job_list_lock, flags); + spin_lock_irqsave(&sched->job_list_lock, flags);
+ if (job && dma_fence_is_signaled(&job->s_fence->finished)) { /* remove job from ring_mirror_list */ list_del_init(&job->node);
- spin_unlock_irqrestore(&sched->job_list_lock, flags);
- sched->ops->free_job(job); + } else { + job = NULL; + /* queue timeout for next job */ + drm_sched_start_timeout(sched); } - /* queue timeout for next job */ - spin_lock_irqsave(&sched->job_list_lock, flags); - drm_sched_start_timeout(sched); spin_unlock_irqrestore(&sched->job_list_lock, flags); + return job; } /** @@ -698,12 +696,18 @@ static int drm_sched_main(void *param) struct drm_sched_fence *s_fence; struct drm_sched_job *sched_job; struct dma_fence *fence; + struct drm_sched_job *cleanup_job = NULL; wait_event_interruptible(sched->wake_up_worker, - (drm_sched_cleanup_jobs(sched), + (cleanup_job = drm_sched_get_cleanup_job(sched)) || (!drm_sched_blocked(sched) && (entity = drm_sched_select_entity(sched))) || - kthread_should_stop())); + kthread_should_stop());
Can't we just call drm_sched_cleanup_jobs right here, remove all the conditions in wait_event_interruptible (make it always true) and after drm_sched_cleanup_jobs is called test for all those conditions and return to sleep if they evaluate to false ? drm_sched_cleanup_jobs is called unconditionally inside wait_event_interruptible anyway... This is more of a question to Christian.
Christian may know better than me, but I think those conditions need to be in wait_event_interruptible() to avoid race conditions. If we simply replace all the conditions with a literal "true" then wait_event_interruptible() will never actually sleep.
Steve
Yes you right, it won't work as I missed that condition is evaluated as first step in wait_event_interruptible before it sleeps.
Andrey
Another idea - what about still just relocating drm_sched_cleanup_jobs to after wait_event_interruptible and also call it in drm_sched_fini so for the case when it will not be called from drm_sched_main due to conditions not evaluating to true it eventually be called last time from drm_sched_fini. I mean - the refactor looks ok to me but the code becomes somewhat confusing this way to grasp.
Andrey