From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
* ttm_eu_reserve_buffers processes a list containing multiple BOs using the same reservation object, so it calls reservation_object_reserve_shared with that reservation object once for each such BO. * In reservation_object_reserve_shared, old->shared_count == old->shared_max - 1, so obj->staged is freed in preparation of an in-place update. * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence once for each of the BOs above, always with the same fence. * The first call adds the fence in the remaining free slot, after which old->shared_count == old->shared_max.
In the next call to reservation_object_add_shared_fence, the BUG_ON triggers. However, nothing bad would happen in reservation_object_add_shared_inplace, since the fence is already in the reservation object.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
Cc: stable@vger.kernel.org Signed-off-by: Michel Dänzer michel.daenzer@amd.com --- drivers/dma-buf/reservation.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 314eb1071cce..532545b9488e 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, if (signaled) { RCU_INIT_POINTER(fobj->shared[signaled_idx], fence); } else { + BUG_ON(fobj->shared_count >= fobj->shared_max); RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence); fobj->shared_count++; } @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj, old = reservation_object_get_list(obj); obj->staged = NULL;
- if (!fobj) { - BUG_ON(old->shared_count >= old->shared_max); + if (!fobj) reservation_object_add_shared_inplace(obj, old, fence); - } else + else reservation_object_add_shared_replace(obj, old, fobj, fence); } EXPORT_SYMBOL(reservation_object_add_shared_fence);
Quoting Michel Dänzer (2018-06-26 15:31:47)
From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
- ttm_eu_reserve_buffers processes a list containing multiple BOs using the same reservation object, so it calls reservation_object_reserve_shared with that reservation object once for each such BO.
- In reservation_object_reserve_shared, old->shared_count == old->shared_max - 1, so obj->staged is freed in preparation of an in-place update.
- ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence once for each of the BOs above, always with the same fence.
- The first call adds the fence in the remaining free slot, after which old->shared_count == old->shared_max.
In the next call to reservation_object_add_shared_fence, the BUG_ON triggers. However, nothing bad would happen in reservation_object_add_shared_inplace, since the fence is already in the reservation object.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
Cc: stable@vger.kernel.org Signed-off-by: Michel Dänzer michel.daenzer@amd.com
I've convinced myself (or rather have not found a valid argument against) that being able to call reserve_shared + add_shared multiple times for the same fence is an intended part of reservation_object API
I'd double check with Christian though.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
drivers/dma-buf/reservation.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 314eb1071cce..532545b9488e 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, if (signaled) { RCU_INIT_POINTER(fobj->shared[signaled_idx], fence); } else {
BUG_ON(fobj->shared_count >= fobj->shared_max);
Personally I would just let kasan detect this and throw away the BUG_ON or at least move it behind some DMABUF_BUG_ON(). -Chris
On 2018-06-27 01:50 PM, Chris Wilson wrote:
Quoting Michel Dänzer (2018-06-26 15:31:47)
From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
- ttm_eu_reserve_buffers processes a list containing multiple BOs using the same reservation object, so it calls reservation_object_reserve_shared with that reservation object once for each such BO.
- In reservation_object_reserve_shared, old->shared_count == old->shared_max - 1, so obj->staged is freed in preparation of an in-place update.
- ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence once for each of the BOs above, always with the same fence.
- The first call adds the fence in the remaining free slot, after which old->shared_count == old->shared_max.
In the next call to reservation_object_add_shared_fence, the BUG_ON triggers. However, nothing bad would happen in reservation_object_add_shared_inplace, since the fence is already in the reservation object.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
Cc: stable@vger.kernel.org Signed-off-by: Michel Dänzer michel.daenzer@amd.com
I've convinced myself (or rather have not found a valid argument against) that being able to call reserve_shared + add_shared multiple times for the same fence is an intended part of reservation_object API
I'd double check with Christian though.
Right, I'm interested in Christian's feedback.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
Thanks!
drivers/dma-buf/reservation.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 314eb1071cce..532545b9488e 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, if (signaled) { RCU_INIT_POINTER(fobj->shared[signaled_idx], fence); } else {
BUG_ON(fobj->shared_count >= fobj->shared_max);
Personally I would just let kasan detect this and throw away the BUG_ON or at least move it behind some DMABUF_BUG_ON().
Hmm. Normally, I'm not a fan of BUG(_ON) either. But in this case, it's clear that the caller is buggy, and proceeding to write beyond the end of the array could have far-reaching consequences. I'm leaving that to somebody else.
Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
- ttm_eu_reserve_buffers processes a list containing multiple BOs using the same reservation object, so it calls reservation_object_reserve_shared with that reservation object once for each such BO.
- In reservation_object_reserve_shared, old->shared_count == old->shared_max - 1, so obj->staged is freed in preparation of an in-place update.
- ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence once for each of the BOs above, always with the same fence.
- The first call adds the fence in the remaining free slot, after which old->shared_count == old->shared_max.
Well, the explanation here is not correct. For multiple BOs using the same reservation object we won't call reservation_object_add_shared_fence() multiple times because we move those to the duplicates list in ttm_eu_reserve_buffers().
But this bug can still happen because we call reservation_object_add_shared_fence() manually with fences for the same context in a couple of places.
One prominent case which comes to my mind are for the VM BOs during updates. Another possibility are VRAM BOs which need to be cleared.
In the next call to reservation_object_add_shared_fence, the BUG_ON triggers. However, nothing bad would happen in reservation_object_add_shared_inplace, since the fence is already in the reservation object.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
Cc: stable@vger.kernel.org Signed-off-by: Michel Dänzer michel.daenzer@amd.com
Reviewed-by: Christian König christian.koenig@amd.com.
Regards, Christian.
drivers/dma-buf/reservation.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 314eb1071cce..532545b9488e 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, if (signaled) { RCU_INIT_POINTER(fobj->shared[signaled_idx], fence); } else {
RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence); fobj->shared_count++; }BUG_ON(fobj->shared_count >= fobj->shared_max);
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj, old = reservation_object_get_list(obj); obj->staged = NULL;
- if (!fobj) {
BUG_ON(old->shared_count >= old->shared_max);
- if (!fobj) reservation_object_add_shared_inplace(obj, old, fence);
- } else
- else reservation_object_add_shared_replace(obj, old, fobj, fence); } EXPORT_SYMBOL(reservation_object_add_shared_fence);
On 2018-07-04 10:31 AM, Christian König wrote:
Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
- ttm_eu_reserve_buffers processes a list containing multiple BOs using
the same reservation object, so it calls reservation_object_reserve_shared with that reservation object once for each such BO.
- In reservation_object_reserve_shared, old->shared_count ==
old->shared_max - 1, so obj->staged is freed in preparation of an in-place update.
- ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
once for each of the BOs above, always with the same fence.
- The first call adds the fence in the remaining free slot, after which
old->shared_count == old->shared_max.
Well, the explanation here is not correct. For multiple BOs using the same reservation object we won't call reservation_object_add_shared_fence() multiple times because we move those to the duplicates list in ttm_eu_reserve_buffers().
But this bug can still happen because we call reservation_object_add_shared_fence() manually with fences for the same context in a couple of places.
One prominent case which comes to my mind are for the VM BOs during updates. Another possibility are VRAM BOs which need to be cleared.
Thanks. How about the following:
* ttm_eu_reserve_buffers calls reservation_object_reserve_shared. * In reservation_object_reserve_shared, shared_count == shared_max - 1, so obj->staged is freed in preparation of an in-place update. * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence, after which shared_count == shared_max. * The amdgpu driver also calls reservation_object_add_shared_fence for the same reservation object, and the BUG_ON triggers.
However, nothing bad would happen in reservation_object_add_shared_inplace, since all fences use the same context, so they can only occupy a single slot.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2, as I suspect this fix is necessary under the circumstances described there as well.
Am 04.07.2018 um 11:09 schrieb Michel Dänzer:
On 2018-07-04 10:31 AM, Christian König wrote:
Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
- ttm_eu_reserve_buffers processes a list containing multiple BOs using
the same reservation object, so it calls reservation_object_reserve_shared with that reservation object once for each such BO.
- In reservation_object_reserve_shared, old->shared_count ==
old->shared_max - 1, so obj->staged is freed in preparation of an in-place update.
- ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
once for each of the BOs above, always with the same fence.
- The first call adds the fence in the remaining free slot, after which
old->shared_count == old->shared_max.
Well, the explanation here is not correct. For multiple BOs using the same reservation object we won't call reservation_object_add_shared_fence() multiple times because we move those to the duplicates list in ttm_eu_reserve_buffers().
But this bug can still happen because we call reservation_object_add_shared_fence() manually with fences for the same context in a couple of places.
One prominent case which comes to my mind are for the VM BOs during updates. Another possibility are VRAM BOs which need to be cleared.
Thanks. How about the following:
- ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
- In reservation_object_reserve_shared, shared_count == shared_max - 1, so obj->staged is freed in preparation of an in-place update.
- ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence, after which shared_count == shared_max.
- The amdgpu driver also calls reservation_object_add_shared_fence for the same reservation object, and the BUG_ON triggers.
I would rather completely drop the reference to the ttm_eu_* functions, cause those wrappers are completely unrelated to the problem.
Instead let's just note something like the following:
* When reservation_object_reserve_shared is called with shared_count == shared_max - 1, so obj->staged is freed in preparation of an in-place update.
* Now reservation_object_add_shared_fence is called with the first fence and after that shared_count == shared_max.
* After that reservation_object_add_shared_fence can be called with follow up fences from the same context, but since shared_count == shared_max we would run into this BUG_ON.
However, nothing bad would happen in reservation_object_add_shared_inplace, since all fences use the same context, so they can only occupy a single slot.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2, as I suspect this fix is necessary under the circumstances described there as well.
The rest sounds good to me.
Regards, Christian.
From: Michel Dänzer michel.daenzer@amd.com
Fixes the BUG_ON spuriously triggering under the following circumstances:
* reservation_object_reserve_shared is called with shared_count == shared_max - 1, so obj->staged is freed in preparation of an in-place update.
* reservation_object_add_shared_fence is called with the first fence, after which shared_count == shared_max.
* reservation_object_add_shared_fence is called with a follow-up fence from the same context.
In the second reservation_object_add_shared_fence call, the BUG_ON triggers. However, nothing bad would happen in reservation_object_add_shared_inplace, since both fences are from the same context, so they only occupy a single slot.
Prevent this by moving the BUG_ON to where an overflow would actually happen (e.g. if a buggy caller didn't call reservation_object_reserve_shared before).
v2: * Fix description of breaking scenario (Christian König) * Add bugzilla reference
Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/106418 Reviewed-by: Chris Wilson chris@chris-wilson.co.uk # v1 Reviewed-by: Christian König christian.koenig@amd.com # v1 Signed-off-by: Michel Dänzer michel.daenzer@amd.com --- drivers/dma-buf/reservation.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 314eb1071cce..532545b9488e 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, if (signaled) { RCU_INIT_POINTER(fobj->shared[signaled_idx], fence); } else { + BUG_ON(fobj->shared_count >= fobj->shared_max); RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence); fobj->shared_count++; } @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj, old = reservation_object_get_list(obj); obj->staged = NULL;
- if (!fobj) { - BUG_ON(old->shared_count >= old->shared_max); + if (!fobj) reservation_object_add_shared_inplace(obj, old, fence); - } else + else reservation_object_add_shared_replace(obj, old, fobj, fence); } EXPORT_SYMBOL(reservation_object_add_shared_fence);
dri-devel@lists.freedesktop.org