When allocating a resource in place, it is common to free the buffer's resource, then allocate a new resource in a different placement.
E.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls ttm_bo_mem_space.
In this situation, bo->resource will be NULL, as it is cleared when the previous resource is freed. If ttm_bo_mem_space then fails, its error path dereferences bo->resource, which leads to a NULL pointer dereference.
Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer")
Signed-off-by: Robert Beckett bob.beckett@collabora.com
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index db3dc7ef5382..62b29ee7d040 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -875,7 +875,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 	}
 
 error:
-	if (bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
+	if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
 		ttm_bo_move_to_lru_tail_unlocked(bo);
 
 	return ret;
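The effect of the one-line change can be illustrated outside the kernel. The following is a minimal user-space sketch, not TTM code: the model_* types, the MODEL_PL_* constants and the bumped_on_lru flag are hypothetical stand-ins for struct ttm_buffer_object, TTM_PL_SYSTEM and the call to ttm_bo_move_to_lru_tail_unlocked(). It only shows how the added bo->resource check short-circuits before the dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM placement types; not kernel values. */
enum { MODEL_PL_SYSTEM = 0, MODEL_PL_VRAM = 2 };

struct model_resource {
	int mem_type;
};

struct model_bo {
	struct model_resource *resource; /* NULL once the old resource is freed */
	unsigned int pin_count;
	bool bumped_on_lru;              /* records whether the LRU bump ran */
};

/*
 * Models only the error path of a failed placement attempt.
 * The failure itself is assumed; no allocation is attempted here.
 */
int model_mem_space_fail(struct model_bo *bo)
{
	int ret = -ENOMEM;

	assert(bo != NULL);

	/* The leading bo->resource test short-circuits the && chain,
	 * so mem_type is never read through a NULL pointer. */
	if (bo->resource && bo->resource->mem_type == MODEL_PL_SYSTEM &&
	    !bo->pin_count)
		bo->bumped_on_lru = true;

	return ret;
}
```

With bo->resource set to NULL, the function returns -ENOMEM and leaves bumped_on_lru untouched. Without the guard, the same call would read mem_type through a NULL pointer.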
On 18.03.22 at 20:50, Robert Beckett wrote:
> When allocating a resource in place, it is common to free the buffer's resource, then allocate a new resource in a different placement.
> E.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls ttm_bo_mem_space.
Well yes, I'm working the drivers towards this, but NAK at the moment. Currently bo->resource is never expected to be NULL.
And yes, I have been searching for this bug in amdgpu for quite a while. Where exactly does that happen?
Amdgpu is supposed to allocate a new resource first, then do a swap and then free the old one.
Thanks,
Christian.
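The ordering described above (allocate, swap, free) might look roughly like the sketch below. It is plain user-space C with hypothetical model_* names, and malloc()/free() stand in for the TTM resource manager, so it illustrates the ordering only, not amdgpu's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_resource {
	int mem_type;
};

struct model_bo {
	struct model_resource *resource; /* expected to be non-NULL at all times */
};

/* Hypothetical allocator; simulate_failure forces the out-of-memory case. */
struct model_resource *model_resource_alloc(int mem_type, int simulate_failure)
{
	struct model_resource *res;

	if (simulate_failure)
		return NULL;

	res = malloc(sizeof(*res));
	if (res)
		res->mem_type = mem_type;
	return res;
}

/* Allocate the new resource first, swap it in, then free the old one. */
int model_bo_replace_resource(struct model_bo *bo, int mem_type,
			      int simulate_failure)
{
	struct model_resource *new_res, *old_res;

	assert(bo != NULL && bo->resource != NULL);

	new_res = model_resource_alloc(mem_type, simulate_failure);
	if (!new_res)
		return -ENOMEM; /* bo->resource is still the old, valid one */

	old_res = bo->resource;
	bo->resource = new_res;
	free(old_res);

	return 0;
}
```

Because the old resource is released only after the swap, a failed allocation leaves the buffer object in its previous state and bo->resource is never observed as NULL.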
On 21/03/2022 09:51, Christian König wrote:
> On 18.03.22 at 20:50, Robert Beckett wrote:
>> When allocating a resource in place, it is common to free the buffer's resource, then allocate a new resource in a different placement.
>> E.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls ttm_bo_mem_space.
> Well yes, I'm working the drivers towards this, but NAK at the moment. Currently bo->resource is never expected to be NULL.
> And yes, I have been searching for this bug in amdgpu for quite a while. Where exactly does that happen?
In my case, I am writing new code for i915 that does this. I will switch it to allocate the new resource first, then free the old one if successful.
For the existing amd case, see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls ttm_bo_mem_space. If the ttm_bo_mem_space call fails (e.g. due to memory pressure), then the error path will try to deref bo->resource, which will be null at that point.
To fix this, I honestly don't see a reason not to also have the safety check for NULL there. It could check early and return an error if it is NULL. I think defensive programming makes sense here; it is better than a NULL deref if someone programs it wrong.
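As an illustration of the "check early and return an error" idea, here is a minimal user-space sketch. The model_* names are hypothetical, and -EINVAL is an assumed error code picked for the example; the thread does not say which error would be returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_resource {
	int mem_type;
};

struct model_bo {
	struct model_resource *resource;
};

/*
 * Hypothetical placement routine that rejects a buffer object without
 * a resource up front, instead of guarding each later dereference.
 * placement_ok simulates whether a placement could be found.
 */
int model_mem_space_checked(const struct model_bo *bo, int placement_ok)
{
	assert(bo != NULL);

	/* Early check: fail cleanly rather than dereference NULL later. */
	if (!bo->resource)
		return -EINVAL; /* assumed error code, for illustration only */

	if (!placement_ok)
		return -ENOMEM;

	return 0;
}
```

A caller that has already freed the resource gets -EINVAL back immediately, which turns a misuse of the API into a reportable error rather than a crash.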
> Amdgpu is supposed to allocate a new resource first, then do a swap and then free the old one.
> Thanks,
> Christian.
On 21.03.22 at 16:44, Robert Beckett wrote:
> On 21/03/2022 09:51, Christian König wrote:
>> On 18.03.22 at 20:50, Robert Beckett wrote:
>>> When allocating a resource in place, it is common to free the buffer's resource, then allocate a new resource in a different placement.
>>> E.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls ttm_bo_mem_space.
>> Well yes, I'm working the drivers towards this, but NAK at the moment. Currently bo->resource is never expected to be NULL.
>> And yes, I have been searching for this bug in amdgpu for quite a while. Where exactly does that happen?
> In my case, I am writing new code for i915 that does this. I will switch it to allocate the new resource first, then free the old one if successful.
> For the existing amd case, see https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel...
> amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls ttm_bo_mem_space. If the ttm_bo_mem_space call fails (e.g. due to memory pressure), then the error path will try to deref bo->resource, which will be null at that point.
Yeah, but that's special handling only used during driver startup. We somehow see this on systems with DMA-buf sharing as well.
> To fix this, I honestly don't see a reason not to also have the safety check for NULL there. It could check early and return an error if it is NULL. I think defensive programming makes sense here; it is better than a NULL deref if someone programs it wrong.
Having it here is fine; the problem is that you need to have it in tons of other places as well.
Maybe I should send you my WIP patch set for this? If you handle all the other cases as well, I'm perfectly fine with this.
Regards,
Christian.
>> Amdgpu is supposed to allocate a new resource first, then do a swap and then free the old one.
>> Thanks,
>> Christian.
dri-devel@lists.freedesktop.org