I would rather avoid taking the lock in the hot path.
How about this:
/* For killed process disable any more IBs enqueue right now */
last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
if ((!last_user || last_user == current->group_leader) &&
    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
	grab_lock();
	drm_sched_rq_remove_entity(entity->rq, entity);
	if (READ_ONCE(entity->last_user) != NULL)
		drm_sched_rq_add_entity(entity->rq, entity);
	drop_lock();
}
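[Editor's sketch] A minimal sketch of the idea above, assuming grab_lock()/drop_lock() stand for entity->rq_lock (the lock already used in the push_job code quoted further down); the function name is made up for illustration:

static void drm_sched_entity_kill_check(struct drm_sched_entity *entity)
{
	struct task_struct *last_user;

	/* Lockless fast path: clear last_user only if we were the last
	 * process to submit to this entity. */
	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
	if ((last_user && last_user != current->group_leader) ||
	    !(current->flags & PF_EXITING) ||
	    current->exit_code != SIGKILL)
		return;

	/* Slow path, only taken when a killed process is flushing. */
	spin_lock(&entity->rq_lock);
	drm_sched_rq_remove_entity(entity->rq, entity);
	/* A concurrent push_job may have republished last_user in the
	 * meantime; keep the entity runnable in that case. */
	if (READ_ONCE(entity->last_user) != NULL)
		drm_sched_rq_add_entity(entity->rq, entity);
	spin_unlock(&entity->rq_lock);
}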
Christian.
On 13.08.2018 18:43, Andrey Grodzovsky wrote:
Attached.
If the general idea in the patch is OK I can think of a test (and maybe add it to the libdrm amdgpu tests) to actually simulate this scenario: two forked
concurrent processes working on the same entity's job queue, one dying while the other keeps pushing to the same queue. For now I only tested it
with a normal boot and by running multiple glxgears instances concurrently - which doesn't really exercise this code path, since I think each of them works on its own FD.
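[Editor's sketch] A rough userspace outline of that test scenario; open_shared_context() and push_to_entity() are hypothetical placeholders, not the real libdrm amdgpu API:

#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the libdrm amdgpu calls that create a
 * context shared across fork() and submit IBs to its entity queue. */
extern void *open_shared_context(void);
extern void push_to_entity(void *ctx);

int main(void)
{
	void *ctx = open_shared_context();	/* shared before fork() */
	pid_t child = fork();

	if (child == 0) {
		/* Victim: pushes until it gets SIGKILLed below, so it
		 * dies while jobs are still being enqueued. */
		for (;;)
			push_to_entity(ctx);
	}

	/* Survivor: keeps pushing to the same entity's queue while the
	 * victim is being killed mid-submission. */
	for (int i = 0; i < 100000; i++) {
		if (i == 50000)
			kill(child, SIGKILL);
		push_to_entity(ctx);
	}

	waitpid(child, NULL, 0);
	return EXIT_SUCCESS;
}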
Andrey
On 08/10/2018 09:27 AM, Christian König wrote:
Crap, yeah indeed that needs to be protected by some lock.
Going to prepare a patch for that, Christian.
On 09.08.2018 21:49, Andrey Grodzovsky wrote:
Reviewed-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
But I still have questions about entity->last_user (didn't notice this before) -
Looks to me like there is a race condition with its current usage. Say process A is preempted right after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same (forked) entity, is inside drm_sched_entity_push_job: it writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes drm_sched_rq_remove_entity, inadvertently removing process B's entity
from its scheduler rq.
Looks to me like instead we should take a lock around the entity->last_user accesses together with the adds/removals of the entity to/from the rq.
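[Editor's sketch] Roughly, what that suggestion could look like, assuming entity->rq_lock is the lock reused for this (names follow the code quoted below); a sketch of the suggestion, not the actual patch:

/* drm_sched_entity_push_job(): publish last_user and add the entity to
 * the rq under one lock, so a flushing process cannot observe one
 * without the other. */
spin_lock(&entity->rq_lock);
entity->last_user = current->group_leader;
if (first)
	drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);

/* drm_sched_entity_flush(): check and remove under the same lock, so it
 * can no longer race with a concurrent push_job. */
spin_lock(&entity->rq_lock);
if (entity->last_user == current->group_leader)
	drm_sched_rq_remove_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);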
Andrey
On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios of processes and threads.
In any case, this check is redundant. Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Nayan
On Mon, Aug 6, 2018 at 7:43 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
Ping. Any objections to that?

Christian.

On 03.08.2018 13:08, Christian König wrote:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>   	if (first) {
>   		/* Add the entity to the run queue */
>   		spin_lock(&entity->rq_lock);
> -		if (!entity->rq) {
> -			DRM_ERROR("Trying to push to a killed entity\n");
> -			spin_unlock(&entity->rq_lock);
> -			return;
> -		}
>   		drm_sched_rq_add_entity(entity->rq, entity);
>   		spin_unlock(&entity->rq_lock);
>   		drm_sched_wakeup(entity->rq->sched);