From: Christian König christian.koenig@amd.com
Drop the lock before calling cancel_delayed_work_sync(). cancel_delayed_work_sync() may sleep while waiting for the timeout work to finish, so it must not be called with the spinlock held.
Signed-off-by: Christian König christian.koenig@amd.com
Tested-by: Nicolai Hähnle nicolai.haehnle@amd.com
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 74aa0b3..b1d49c5 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -339,7 +339,9 @@ static void amd_sched_job_finish(struct work_struct *work)
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
 		struct amd_sched_job *next;
 
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
 		cancel_delayed_work_sync(&s_job->work_tdr);
+		spin_lock_irqsave(&sched->job_list_lock, flags);
 
 		/* queue TDR for next job */
 		next = list_first_entry_or_null(&sched->ring_mirror_list,
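In isolation, the pattern the patch adopts looks like the following minimal sketch. The names (example_ctx, example_finish) are made up for illustration and are not part of the scheduler code:

#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct example_ctx {
	spinlock_t lock;
	struct delayed_work tdr;
};

static void example_finish(struct example_ctx *ctx)
{
	unsigned long flags;

	spin_lock_irqsave(&ctx->lock, flags);
	/* ... list manipulation under the lock ... */

	/* cancel_delayed_work_sync() may sleep waiting for a running
	 * work item, so the spinlock has to be dropped around the call. */
	spin_unlock_irqrestore(&ctx->lock, flags);
	cancel_delayed_work_sync(&ctx->tdr);
	spin_lock_irqsave(&ctx->lock, flags);

	/* ... queue the next timeout under the lock ... */
	spin_unlock_irqrestore(&ctx->lock, flags);
}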
From: Christian König christian.koenig@amd.com
A regular spin_lock/unlock should do here as well.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index b1d49c5..e13b140 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -331,17 +331,16 @@ static void amd_sched_job_finish(struct work_struct *work)
 	struct amd_sched_job *s_job = container_of(work, struct amd_sched_job,
 						   finish_work);
 	struct amd_gpu_scheduler *sched = s_job->sched;
-	unsigned long flags;
 
 	/* remove job from ring_mirror_list */
-	spin_lock_irqsave(&sched->job_list_lock, flags);
+	spin_lock(&sched->job_list_lock);
 	list_del_init(&s_job->node);
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
 		struct amd_sched_job *next;
 
-		spin_unlock_irqrestore(&sched->job_list_lock, flags);
+		spin_unlock(&sched->job_list_lock);
 		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock_irqsave(&sched->job_list_lock, flags);
+		spin_lock(&sched->job_list_lock);
 
 		/* queue TDR for next job */
 		next = list_first_entry_or_null(&sched->ring_mirror_list,
@@ -350,7 +349,7 @@ static void amd_sched_job_finish(struct work_struct *work)
 		if (next)
 			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
+	spin_unlock(&sched->job_list_lock);
 	sched->ops->free_job(s_job);
 }
 
@@ -364,15 +363,14 @@ static void amd_sched_job_finish_cb(struct fence *f, struct fence_cb *cb)
 static void amd_sched_job_begin(struct amd_sched_job *s_job)
 {
 	struct amd_gpu_scheduler *sched = s_job->sched;
-	unsigned long flags;
 
-	spin_lock_irqsave(&sched->job_list_lock, flags);
+	spin_lock(&sched->job_list_lock);
 	list_add_tail(&s_job->node, &sched->ring_mirror_list);
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    list_first_entry_or_null(&sched->ring_mirror_list,
 				     struct amd_sched_job, node) == s_job)
 		schedule_delayed_work(&s_job->work_tdr, sched->timeout);
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
+	spin_unlock(&sched->job_list_lock);
 }
 
 static void amd_sched_job_timedout(struct work_struct *work)
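Reconstructed from the second hunk above, amd_sched_job_begin() reads like this after the patch (plain code rather than a diff; the comment is added here for readability):

static void amd_sched_job_begin(struct amd_sched_job *s_job)
{
	struct amd_gpu_scheduler *sched = s_job->sched;

	spin_lock(&sched->job_list_lock);
	list_add_tail(&s_job->node, &sched->ring_mirror_list);

	/* Arm the timeout handler only when this job is at the head of
	 * the mirror list, i.e. the oldest job on the ring. */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    list_first_entry_or_null(&sched->ring_mirror_list,
				     struct amd_sched_job, node) == s_job)
		schedule_delayed_work(&s_job->work_tdr, sched->timeout);

	spin_unlock(&sched->job_list_lock);
}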
On Mon, Jun 13, 2016 at 04:12:43PM +0200, Christian König wrote:
From: Christian König christian.koenig@amd.com
A regular spin_lock/unlock should do here as well.
Signed-off-by: Christian König christian.koenig@amd.com
Just drive-by comment, but you can't mix up irq spinlocks with normal ones. And there are places where this is still irqsave, but some of these functions can be called directly from the scheduler kthread. You can only drop the _irqsave if you know you're always called from hardirq context (softirq isn't enough), or when you know someone already disabled interrupts. And you can simplify the _irqsave to just _irq when you are always in process context and irqs are always still enabled.
On a super-quick look seems fishy. -Daniel
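To illustrate the rule of thumb Daniel describes, here is a minimal sketch of the three locking variants; example_lock and example_list are hypothetical names, not part of the scheduler code:

#include <linux/list.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);
static LIST_HEAD(example_list);

/* The lock is also taken from irq context and this caller may itself run
 * with interrupts already disabled: save and restore the irq state. */
static void add_from_any_context(struct list_head *node)
{
	unsigned long flags;

	spin_lock_irqsave(&example_lock, flags);
	list_add_tail(node, &example_list);
	spin_unlock_irqrestore(&example_lock, flags);
}

/* The lock is taken from irq context elsewhere, but this caller is always
 * in process context with interrupts enabled: _irq is sufficient. */
static void add_from_process_context(struct list_head *node)
{
	spin_lock_irq(&example_lock);
	list_add_tail(node, &example_list);
	spin_unlock_irq(&example_lock);
}

/* The lock is never taken from irq context at all (e.g. only from a work
 * item or a kthread): a plain spin_lock is enough. */
static void add_never_from_irq(struct list_head *node)
{
	spin_lock(&example_lock);
	list_add_tail(node, &example_list);
	spin_unlock(&example_lock);
}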
On 13.06.2016 at 18:33, Daniel Vetter wrote:
On Mon, Jun 13, 2016 at 04:12:43PM +0200, Christian König wrote:
From: Christian König christian.koenig@amd.com
A regular spin_lock/unlock should do here as well.
Signed-off-by: Christian König christian.koenig@amd.com
Just drive-by comment, but you can't mix up irq spinlocks with normal ones. And there are places where this is still irqsave, but some of these functions can be called directly from the scheduler kthread. You can only drop the _irqsave if you know you're always called from hardirq context (softirq isn't enough), or when you know someone already disabled interrupts. And you can simplify the _irqsave to just _irq when you are always in process context and irqs are always still enabled.
On a super-quick look seems fishy.
The point is there isn't even any IRQ involved in all of this.
The spinlock is only ever taken from a work item or the kthread, never from irq context.
Christian.
On Mon, Jun 13, 2016 at 7:37 PM, Christian König deathsimple@vodafone.de wrote:
On 13.06.2016 at 18:33, Daniel Vetter wrote:
On Mon, Jun 13, 2016 at 04:12:43PM +0200, Christian König wrote:
From: Christian König christian.koenig@amd.com
A regular spin_lock/unlock should do here as well.
Signed-off-by: Christian König christian.koenig@amd.com
Just drive-by comment, but you can't mix up irq spinlocks with normal ones. And there are places where this is still irqsave, but some of these functions can be called directly from the scheduler kthread. You can only drop the _irqsave if you know you're always called from hardirq context (softirq isn't enough), or when you know someone already disabled interrupts. And you can simplify the _irqsave to just _irq when you are always in process context and irqs are always still enabled.
On a super-quick look seems fishy.
The point is there isn't even any IRQ involved in all of this.
The spinlock is only ever taken from a work item or the kthread, never from irq context.
In that case git grep job_list_lock finds more of those in sched_fence.c. -Daniel
On 06/13/16 23:12, Christian König wrote:
From: Christian König christian.koenig@amd.com
Drop the lock before calling cancel_delayed_work_sync().
Signed-off-by: Christian König christian.koenig@amd.com Tested-by: Nicolai Hähnle nicolai.haehnle@amd.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96445
On Mon, Jun 13, 2016 at 10:23 PM, Michel Dänzer michel@daenzer.net wrote:
On 06/13/16 23:12, Christian König wrote:
From: Christian König christian.koenig@amd.com
Drop the lock before calling cancel_delayed_work_sync().
Signed-off-by: Christian König christian.koenig@amd.com Tested-by: Nicolai Hähnle nicolai.haehnle@amd.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96445
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Applied with the bugzilla entry added.
Thanks,
Alex