On 2022-04-26 22:31, Hangyu Hua wrote:
On 2022/4/26 22:55, Andrey Grodzovsky wrote:
On 2022-04-25 22:54, Hangyu Hua wrote:
On 2022/4/25 23:42, Andrey Grodzovsky wrote:
On 2022-04-25 04:36, Hangyu Hua wrote:
When drm_sched_job_add_dependency() fails, dma_fence_put() will be called internally. Calling it again after drm_sched_job_add_dependency() finishes may result in a dangling pointer.
Fix this by removing redundant dma_fence_put().
Signed-off-by: Hangyu Hua hbh25y@gmail.com
drivers/gpu/drm/lima/lima_gem.c | 1 - drivers/gpu/drm/scheduler/sched_main.c | 1 - 2 files changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 55bb1ec3c4f7..99c8e7f6bb1c 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -291,7 +291,6 @@ static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit) err = drm_sched_job_add_dependency(&submit->task->base, fence); if (err) { - dma_fence_put(fence); return err;
Makes sense here
} } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index b81fceb0b8a2..ebab9eca37a8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -708,7 +708,6 @@ int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job, dma_fence_get(fence); ret = drm_sched_job_add_dependency(job, fence); if (ret) { - dma_fence_put(fence);
Not sure about this one since if you look at the relevant commits - 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies' and 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder' You will see that the dma_fence_put here balances the extra dma_fence_get above
Andrey
I don't think so. I checked the call chain and found no additional dma_fence_get(). But dma_fence_get() needs to be called before drm_sched_job_add_dependency() to keep the counter balanced.
I don't say there is an additional get, I just say that drm_sched_job_add_dependency doesn't grab an extra reference to the fences it stores so this needs to be done outside and for that drm_sched_job_add_implicit_dependencies->dma_fence_get is called and, if this addition fails you just call dma_fence_put to keep the counter balanced.
drm_sched_job_add_implicit_dependencies() will call drm_sched_job_add_dependency(). And drm_sched_job_add_dependency() already call dma_fence_put() when it fails. Calling dma_fence_put() twice doesn't make sense.
dma_fence_get() is in [2]. But dma_fence_put() will be called in [1] and [3] when xa_alloc() fails.
The way I see it, [2] and [3] are mat matching *get* and *put* respectively. [1] *put* is against the original dma_fence_init->kref_init of the fence which always set the refcount at 1. Also in support of this see commit 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder' - it says there "drm_sched_job_add_dependency() could drop the last ref" - this last ref is the original refcount set by dma_fence_init->kref
Andrey
int drm_sched_job_add_dependency(struct drm_sched_job *job, struct dma_fence *fence) { ... ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL); if (ret != 0) dma_fence_put(fence); <--- [1]
return ret; } EXPORT_SYMBOL(drm_sched_job_add_dependency);
int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job, struct drm_gem_object *obj, bool write) { struct dma_resv_iter cursor; struct dma_fence *fence; int ret;
dma_resv_for_each_fence(&cursor, obj->resv, write, fence) { /* Make sure to grab an additional ref on the added fence */ dma_fence_get(fence); <--- [2] ret = drm_sched_job_add_dependency(job, fence); if (ret) { dma_fence_put(fence); <--- [3] return ret; } } return 0; }
On the other hand, dma_fence_get() and dma_fence_put() are meaningless here if threre is an extra dma_fence_get() beacause counter will not decrease to 0 during drm_sched_job_add_dependency().
I check the call chain as follows:
msm_ioctl_gem_submit() -> submit_fence_sync() -> drm_sched_job_add_implicit_dependencies()
Can you maybe trace or print one such example of problematic refcount that you are trying to fix ? I still don't see where is the problem.
Andrey
I also wish I could. System logs can make this easy. But i don't have a corresponding GPU physical device. drm_sched_job_add_implicit_dependencies is only used in a few devices.
Thanks.
Thanks, Hangyu
return ret; } }