New version of the series, with feedback from previous series added.
The first 11 patches are clean; some small fixes might still be required for all to pass.
Maarten Lankhorst (16):
  drm/i915: Remove unused bits of i915_vma/active api
  drm/i915: Change shrink ordering to use locking around unbinding.
  drm/i915: Remove pages_mutex and intel_gtt->vma_ops.set/clear_pages members, v2.
  drm/i915: Take object lock in i915_ggtt_pin if ww is not set
  drm/i915: Force ww lock for i915_gem_object_ggtt_pin_ww
  drm/i915: Ensure gem_contexts selftests work with unbind changes.
  drm/i915: Take trylock during eviction, v2.
  drm/i915: Pass trylock context to callers
  drm/i915: Ensure i915_vma tests do not get -ENOSPC with the locking changes.
  drm/i915: Make i915_gem_evict_vm work correctly for already locked objects
  drm/i915: Call i915_gem_evict_vm in vm_fault_gtt to prevent new ENOSPC errors
  drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind
  drm/i915: Require object lock when freeing pages during destruction
  drm/i915: Remove assert_object_held_shared
  drm/i915: Remove support for unlocked i915_vma unbind
  drm/i915: Remove short-term pins from execbuf, v5.
 drivers/gpu/drm/i915/display/intel_dpt.c      |   2 -
 drivers/gpu/drm/i915/display/intel_fb_pin.c   |   2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 250 ++++----
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  18 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |   9 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  22 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  12 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  44 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  |   2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |  54 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   6 +
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  15 -
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 450 ++------------
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |   1 -
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  13 -
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   7 -
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  12 -
 drivers/gpu/drm/i915/gt/mock_engine.c         |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c    |   2 +-
 drivers/gpu/drm/i915/gvt/aperture_gm.c        |   2 +-
 drivers/gpu/drm/i915/i915_active.c            |  28 +-
 drivers/gpu/drm/i915/i915_active.h            |  17 +-
 drivers/gpu/drm/i915/i915_drv.h               |  12 +-
 drivers/gpu/drm/i915/i915_gem.c               |  28 +-
 drivers/gpu/drm/i915/i915_gem_evict.c         |  64 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   8 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |   4 +
 drivers/gpu/drm/i915/i915_vgpu.c              |   2 +-
 drivers/gpu/drm/i915/i915_vma.c               | 580 +++++++++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |   6 +-
 drivers/gpu/drm/i915/i915_vma_types.h         |   1 -
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  27 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  48 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  19 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |   4 -
 40 files changed, 942 insertions(+), 841 deletions(-)
When reworking the code to move the eviction fence to the object, the best code is removed code.
Remove some functions that are unused, and change the definition of functions that are only used in one place.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
[mlankhorst: Remove new use of i915_active_has_exclusive]
---
 drivers/gpu/drm/i915/i915_active.c | 28 +++-------------------------
 drivers/gpu/drm/i915/i915_active.h | 17 +----------------
 drivers/gpu/drm/i915/i915_vma.c    | 24 ++++++++++--------------
 drivers/gpu/drm/i915/i915_vma.h    |  2 --
 4 files changed, 14 insertions(+), 57 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 3103c1e1fd14..ee2b3a375362 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -426,8 +426,9 @@ replace_barrier(struct i915_active *ref, struct i915_active_fence *active) return true; }
-int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence) +int i915_active_add_request(struct i915_active *ref, struct i915_request *rq) { + struct dma_fence *fence = &rq->fence; struct i915_active_fence *active; int err;
@@ -436,7 +437,7 @@ int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence) if (err) return err;
- active = active_instance(ref, idx); + active = active_instance(ref, i915_request_timeline(rq)->fence_context); if (!active) { err = -ENOMEM; goto out; @@ -477,29 +478,6 @@ __i915_active_set_fence(struct i915_active *ref, return prev; }
-static struct i915_active_fence * -__active_fence(struct i915_active *ref, u64 idx) -{ - struct active_node *it; - - it = __active_lookup(ref, idx); - if (unlikely(!it)) { /* Contention with parallel tree builders! */ - spin_lock_irq(&ref->tree_lock); - it = __active_lookup(ref, idx); - spin_unlock_irq(&ref->tree_lock); - } - GEM_BUG_ON(!it); /* slot must be preallocated */ - - return &it->base; -} - -struct dma_fence * -__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence) -{ - /* Only valid while active, see i915_active_acquire_for_context() */ - return __i915_active_set_fence(ref, __active_fence(ref, idx), fence); -} - struct dma_fence * i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f) { diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index 5fcdb0e2bc9e..7eb44132183a 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -164,26 +164,11 @@ void __i915_active_init(struct i915_active *ref, __i915_active_init(ref, active, retire, flags, &__mkey, &__wkey); \ } while (0)
-struct dma_fence * -__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence); -int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence); - -static inline int -i915_active_add_request(struct i915_active *ref, struct i915_request *rq) -{ - return i915_active_ref(ref, - i915_request_timeline(rq)->fence_context, - &rq->fence); -} +int i915_active_add_request(struct i915_active *ref, struct i915_request *rq);
struct dma_fence * i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f);
-static inline bool i915_active_has_exclusive(struct i915_active *ref) -{ - return rcu_access_pointer(ref->excl.fence); -} - int __i915_active_wait(struct i915_active *ref, int state); static inline int i915_active_wait(struct i915_active *ref) { diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 927f0d4f8e11..921d5b946c32 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -356,22 +356,18 @@ int i915_vma_wait_for_bind(struct i915_vma *vma) #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) static int i915_vma_verify_bind_complete(struct i915_vma *vma) { - int err = 0; - - if (i915_active_has_exclusive(&vma->active)) { - struct dma_fence *fence = - i915_active_fence_get(&vma->active.excl); + struct dma_fence *fence = i915_active_fence_get(&vma->active.excl); + int err;
- if (!fence) - return 0; + if (!fence) + return 0;
- if (dma_fence_is_signaled(fence)) - err = fence->error; - else - err = -EBUSY; + if (dma_fence_is_signaled(fence)) + err = fence->error; + else + err = -EBUSY;
- dma_fence_put(fence); - } + dma_fence_put(fence);
return err; } @@ -1249,7 +1245,7 @@ __i915_request_await_bind(struct i915_request *rq, struct i915_vma *vma) return __i915_request_await_exclusive(rq, &vma->active); }
-int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq) +static int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq) { int err;
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 4033aa08d5e4..9a931ecb09e5 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -55,8 +55,6 @@ static inline bool i915_vma_is_active(const struct i915_vma *vma) /* do not reserve memory to prevent deadlocks */ #define __EXEC_OBJECT_NO_RESERVE BIT(31)
-int __must_check __i915_vma_move_to_active(struct i915_vma *vma, - struct i915_request *rq); int __must_check _i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq, struct dma_fence *fence,
Call drop_pages with the gem object lock held, instead of the other way around. This will allow us to drop the vma bindings with the gem object lock held.
We plan to require the object lock for unpinning in the future, and this is an easy target.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 38 ++++++++++----------
 1 file changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 157a9765f483..eebff4735781 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -36,8 +36,8 @@ static bool can_release_pages(struct drm_i915_gem_object *obj) return swap_available() || obj->mm.madv == I915_MADV_DONTNEED; }
-static bool unsafe_drop_pages(struct drm_i915_gem_object *obj, - unsigned long shrink, bool trylock_vm) +static int drop_pages(struct drm_i915_gem_object *obj, + unsigned long shrink, bool trylock_vm) { unsigned long flags;
@@ -214,26 +214,24 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww,
spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
- err = 0; - if (unsafe_drop_pages(obj, shrink, trylock_vm)) { - /* May arrive from get_pages on another bo */ - if (!ww) { - if (!i915_gem_object_trylock(obj)) - goto skip; - } else { - err = i915_gem_object_lock(obj, ww); - if (err) - goto skip; - } - - if (!__i915_gem_object_put_pages(obj)) { - if (!try_to_writeback(obj, shrink)) - count += obj->base.size >> PAGE_SHIFT; - } - if (!ww) - i915_gem_object_unlock(obj); + /* May arrive from get_pages on another bo */ + if (!ww) { + if (!i915_gem_object_trylock(obj)) + goto skip; + } else { + err = i915_gem_object_lock(obj, ww); + if (err) + goto skip; }
+ if (drop_pages(obj, shrink, trylock_vm) && + !__i915_gem_object_put_pages(obj) && + !try_to_writeback(obj, shrink)) + count += obj->base.size >> PAGE_SHIFT; + + if (!ww) + i915_gem_object_unlock(obj); + scanned += obj->base.size >> PAGE_SHIFT; skip: i915_gem_object_put(obj);
Big delta, but it boils down to moving set_pages to i915_vma.c and removing the special handling; all callers use the defaults anyway. We only remap in ggtt, so the default case will fall through.
Because we still don't require locking in i915_vma_unpin(), handle this by using xchg in get_pages(), as it's locked with obj->mutex, and cmpxchg in unpin, which only fails if we race against a new pin.
Changes since v1:
- aliasing gtt sets ZERO_SIZE_PTR, not -ENODEV, remove special case from __i915_vma_get_pages(). (Matt)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_dpt.c      |   2 -
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  15 -
 drivers/gpu/drm/i915/gt/intel_ggtt.c          | 403 ----------------
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  13 -
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   7 -
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  12 -
 drivers/gpu/drm/i915/i915_vma.c               | 444 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_vma.h               |   3 +
 drivers/gpu/drm/i915/i915_vma_types.h         |   1 -
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  12 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |   4 -
 11 files changed, 424 insertions(+), 492 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index 56755788547d..f2ff70bcbac5 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -273,8 +273,6 @@ intel_dpt_create(struct intel_framebuffer *fb)
vm->vma_ops.bind_vma = dpt_bind_vma; vm->vma_ops.unbind_vma = dpt_unbind_vma; - vm->vma_ops.set_pages = ggtt_set_pages; - vm->vma_ops.clear_pages = clear_pages;
vm->pte_encode = gen8_ggtt_pte_encode;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index 4a166d25fe60..cfc9cc5880ec 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -269,19 +269,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm) free_pd(&ppgtt->base.vm, ppgtt->base.pd); }
-static int pd_vma_set_pages(struct i915_vma *vma) -{ - vma->pages = ERR_PTR(-ENODEV); - return 0; -} - -static void pd_vma_clear_pages(struct i915_vma *vma) -{ - GEM_BUG_ON(!vma->pages); - - vma->pages = NULL; -} - static void pd_vma_bind(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, struct i915_vma *vma, @@ -321,8 +308,6 @@ static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma) }
static const struct i915_vma_ops pd_vma_ops = { - .set_pages = pd_vma_set_pages, - .clear_pages = pd_vma_clear_pages, .bind_vma = pd_vma_bind, .unbind_vma = pd_vma_unbind, }; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 555111c3bee5..d9596087df08 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -22,9 +22,6 @@ #include "intel_gtt.h" #include "gen8_ppgtt.h"
-static int -i915_get_ggtt_vma_pages(struct i915_vma *vma); - static void i915_ggtt_color_adjust(const struct drm_mm_node *node, unsigned long color, u64 *start, @@ -892,21 +889,6 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) return 0; }
-int ggtt_set_pages(struct i915_vma *vma) -{ - int ret; - - GEM_BUG_ON(vma->pages); - - ret = i915_get_ggtt_vma_pages(vma); - if (ret) - return ret; - - vma->page_sizes = vma->obj->mm.page_sizes; - - return 0; -} - static void gen6_gmch_remove(struct i915_address_space *vm) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); @@ -967,8 +949,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.vma_ops.bind_vma = ggtt_bind_vma; ggtt->vm.vma_ops.unbind_vma = ggtt_unbind_vma; - ggtt->vm.vma_ops.set_pages = ggtt_set_pages; - ggtt->vm.vma_ops.clear_pages = clear_pages;
ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
@@ -1117,8 +1097,6 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.vma_ops.bind_vma = ggtt_bind_vma; ggtt->vm.vma_ops.unbind_vma = ggtt_unbind_vma; - ggtt->vm.vma_ops.set_pages = ggtt_set_pages; - ggtt->vm.vma_ops.clear_pages = clear_pages;
return ggtt_probe_common(ggtt, size); } @@ -1162,8 +1140,6 @@ static int i915_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.vma_ops.bind_vma = ggtt_bind_vma; ggtt->vm.vma_ops.unbind_vma = ggtt_unbind_vma; - ggtt->vm.vma_ops.set_pages = ggtt_set_pages; - ggtt->vm.vma_ops.clear_pages = clear_pages;
if (unlikely(ggtt->do_idle_maps)) drm_notice(&i915->drm, @@ -1333,382 +1309,3 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
intel_ggtt_restore_fences(ggtt); } - -static struct scatterlist * -rotate_pages(struct drm_i915_gem_object *obj, unsigned int offset, - unsigned int width, unsigned int height, - unsigned int src_stride, unsigned int dst_stride, - struct sg_table *st, struct scatterlist *sg) -{ - unsigned int column, row; - unsigned int src_idx; - - for (column = 0; column < width; column++) { - unsigned int left; - - src_idx = src_stride * (height - 1) + column + offset; - for (row = 0; row < height; row++) { - st->nents++; - /* - * We don't need the pages, but need to initialize - * the entries so the sg list can be happily traversed. - * The only thing we need are DMA addresses. - */ - sg_set_page(sg, NULL, I915_GTT_PAGE_SIZE, 0); - sg_dma_address(sg) = - i915_gem_object_get_dma_address(obj, src_idx); - sg_dma_len(sg) = I915_GTT_PAGE_SIZE; - sg = sg_next(sg); - src_idx -= src_stride; - } - - left = (dst_stride - height) * I915_GTT_PAGE_SIZE; - - if (!left) - continue; - - st->nents++; - - /* - * The DE ignores the PTEs for the padding tiles, the sg entry - * here is just a conenience to indicate how many padding PTEs - * to insert at this spot. - */ - sg_set_page(sg, NULL, left, 0); - sg_dma_address(sg) = 0; - sg_dma_len(sg) = left; - sg = sg_next(sg); - } - - return sg; -} - -static noinline struct sg_table * -intel_rotate_pages(struct intel_rotation_info *rot_info, - struct drm_i915_gem_object *obj) -{ - unsigned int size = intel_rotation_info_size(rot_info); - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct sg_table *st; - struct scatterlist *sg; - int ret = -ENOMEM; - int i; - - /* Allocate target SG list. */ - st = kmalloc(sizeof(*st), GFP_KERNEL); - if (!st) - goto err_st_alloc; - - ret = sg_alloc_table(st, size, GFP_KERNEL); - if (ret) - goto err_sg_alloc; - - st->nents = 0; - sg = st->sgl; - - for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++) - sg = rotate_pages(obj, rot_info->plane[i].offset, - rot_info->plane[i].width, rot_info->plane[i].height, - rot_info->plane[i].src_stride, - rot_info->plane[i].dst_stride, - st, sg); - - return st; - -err_sg_alloc: - kfree(st); -err_st_alloc: - - drm_dbg(&i915->drm, "Failed to create rotated mapping for object size %zu! (%ux%u tiles, %u pages)\n", - obj->base.size, rot_info->plane[0].width, - rot_info->plane[0].height, size); - - return ERR_PTR(ret); -} - -static struct scatterlist * -add_padding_pages(unsigned int count, - struct sg_table *st, struct scatterlist *sg) -{ - st->nents++; - - /* - * The DE ignores the PTEs for the padding tiles, the sg entry - * here is just a convenience to indicate how many padding PTEs - * to insert at this spot. - */ - sg_set_page(sg, NULL, count * I915_GTT_PAGE_SIZE, 0); - sg_dma_address(sg) = 0; - sg_dma_len(sg) = count * I915_GTT_PAGE_SIZE; - sg = sg_next(sg); - - return sg; -} - -static struct scatterlist * -remap_tiled_color_plane_pages(struct drm_i915_gem_object *obj, - unsigned int offset, unsigned int alignment_pad, - unsigned int width, unsigned int height, - unsigned int src_stride, unsigned int dst_stride, - struct sg_table *st, struct scatterlist *sg, - unsigned int *gtt_offset) -{ - unsigned int row; - - if (!width || !height) - return sg; - - if (alignment_pad) - sg = add_padding_pages(alignment_pad, st, sg); - - for (row = 0; row < height; row++) { - unsigned int left = width * I915_GTT_PAGE_SIZE; - - while (left) { - dma_addr_t addr; - unsigned int length; - - /* - * We don't need the pages, but need to initialize - * the entries so the sg list can be happily traversed. 
- * The only thing we need are DMA addresses. - */ - - addr = i915_gem_object_get_dma_address_len(obj, offset, &length); - - length = min(left, length); - - st->nents++; - - sg_set_page(sg, NULL, length, 0); - sg_dma_address(sg) = addr; - sg_dma_len(sg) = length; - sg = sg_next(sg); - - offset += length / I915_GTT_PAGE_SIZE; - left -= length; - } - - offset += src_stride - width; - - left = (dst_stride - width) * I915_GTT_PAGE_SIZE; - - if (!left) - continue; - - sg = add_padding_pages(left >> PAGE_SHIFT, st, sg); - } - - *gtt_offset += alignment_pad + dst_stride * height; - - return sg; -} - -static struct scatterlist * -remap_contiguous_pages(struct drm_i915_gem_object *obj, - unsigned int obj_offset, - unsigned int count, - struct sg_table *st, struct scatterlist *sg) -{ - struct scatterlist *iter; - unsigned int offset; - - iter = i915_gem_object_get_sg_dma(obj, obj_offset, &offset); - GEM_BUG_ON(!iter); - - do { - unsigned int len; - - len = min(sg_dma_len(iter) - (offset << PAGE_SHIFT), - count << PAGE_SHIFT); - sg_set_page(sg, NULL, len, 0); - sg_dma_address(sg) = - sg_dma_address(iter) + (offset << PAGE_SHIFT); - sg_dma_len(sg) = len; - - st->nents++; - count -= len >> PAGE_SHIFT; - if (count == 0) - return sg; - - sg = __sg_next(sg); - iter = __sg_next(iter); - offset = 0; - } while (1); -} - -static struct scatterlist * -remap_linear_color_plane_pages(struct drm_i915_gem_object *obj, - unsigned int obj_offset, unsigned int alignment_pad, - unsigned int size, - struct sg_table *st, struct scatterlist *sg, - unsigned int *gtt_offset) -{ - if (!size) - return sg; - - if (alignment_pad) - sg = add_padding_pages(alignment_pad, st, sg); - - sg = remap_contiguous_pages(obj, obj_offset, size, st, sg); - sg = sg_next(sg); - - *gtt_offset += alignment_pad + size; - - return sg; -} - -static struct scatterlist * -remap_color_plane_pages(const struct intel_remapped_info *rem_info, - struct drm_i915_gem_object *obj, - int color_plane, - struct sg_table *st, struct scatterlist *sg, - unsigned int *gtt_offset) -{ - unsigned int alignment_pad = 0; - - if (rem_info->plane_alignment) - alignment_pad = ALIGN(*gtt_offset, rem_info->plane_alignment) - *gtt_offset; - - if (rem_info->plane[color_plane].linear) - sg = remap_linear_color_plane_pages(obj, - rem_info->plane[color_plane].offset, - alignment_pad, - rem_info->plane[color_plane].size, - st, sg, - gtt_offset); - - else - sg = remap_tiled_color_plane_pages(obj, - rem_info->plane[color_plane].offset, - alignment_pad, - rem_info->plane[color_plane].width, - rem_info->plane[color_plane].height, - rem_info->plane[color_plane].src_stride, - rem_info->plane[color_plane].dst_stride, - st, sg, - gtt_offset); - - return sg; -} - -static noinline struct sg_table * -intel_remap_pages(struct intel_remapped_info *rem_info, - struct drm_i915_gem_object *obj) -{ - unsigned int size = intel_remapped_info_size(rem_info); - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct sg_table *st; - struct scatterlist *sg; - unsigned int gtt_offset = 0; - int ret = -ENOMEM; - int i; - - /* Allocate target SG list. 
*/ - st = kmalloc(sizeof(*st), GFP_KERNEL); - if (!st) - goto err_st_alloc; - - ret = sg_alloc_table(st, size, GFP_KERNEL); - if (ret) - goto err_sg_alloc; - - st->nents = 0; - sg = st->sgl; - - for (i = 0 ; i < ARRAY_SIZE(rem_info->plane); i++) - sg = remap_color_plane_pages(rem_info, obj, i, st, sg, >t_offset); - - i915_sg_trim(st); - - return st; - -err_sg_alloc: - kfree(st); -err_st_alloc: - - drm_dbg(&i915->drm, "Failed to create remapped mapping for object size %zu! (%ux%u tiles, %u pages)\n", - obj->base.size, rem_info->plane[0].width, - rem_info->plane[0].height, size); - - return ERR_PTR(ret); -} - -static noinline struct sg_table * -intel_partial_pages(const struct i915_ggtt_view *view, - struct drm_i915_gem_object *obj) -{ - struct sg_table *st; - struct scatterlist *sg; - unsigned int count = view->partial.size; - int ret = -ENOMEM; - - st = kmalloc(sizeof(*st), GFP_KERNEL); - if (!st) - goto err_st_alloc; - - ret = sg_alloc_table(st, count, GFP_KERNEL); - if (ret) - goto err_sg_alloc; - - st->nents = 0; - - sg = remap_contiguous_pages(obj, view->partial.offset, count, st, st->sgl); - - sg_mark_end(sg); - i915_sg_trim(st); /* Drop any unused tail entries. */ - - return st; - -err_sg_alloc: - kfree(st); -err_st_alloc: - return ERR_PTR(ret); -} - -static int -i915_get_ggtt_vma_pages(struct i915_vma *vma) -{ - int ret; - - /* - * The vma->pages are only valid within the lifespan of the borrowed - * obj->mm.pages. When the obj->mm.pages sg_table is regenerated, so - * must be the vma->pages. A simple rule is that vma->pages must only - * be accessed when the obj->mm.pages are pinned. - */ - GEM_BUG_ON(!i915_gem_object_has_pinned_pages(vma->obj)); - - switch (vma->ggtt_view.type) { - default: - GEM_BUG_ON(vma->ggtt_view.type); - fallthrough; - case I915_GGTT_VIEW_NORMAL: - vma->pages = vma->obj->mm.pages; - return 0; - - case I915_GGTT_VIEW_ROTATED: - vma->pages = - intel_rotate_pages(&vma->ggtt_view.rotated, vma->obj); - break; - - case I915_GGTT_VIEW_REMAPPED: - vma->pages = - intel_remap_pages(&vma->ggtt_view.remapped, vma->obj); - break; - - case I915_GGTT_VIEW_PARTIAL: - vma->pages = intel_partial_pages(&vma->ggtt_view, vma->obj); - break; - } - - ret = 0; - if (IS_ERR(vma->pages)) { - ret = PTR_ERR(vma->pages); - vma->pages = NULL; - drm_err(&vma->vm->i915->drm, - "Failed to get pages for VMA view type %u (%d)!\n", - vma->ggtt_view.type, ret); - } - return ret; -} diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 0dd254cb1f69..681162030cae 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -223,19 +223,6 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) INIT_LIST_HEAD(&vm->bound_list); }
-void clear_pages(struct i915_vma *vma) -{ - GEM_BUG_ON(!vma->pages); - - if (vma->pages != vma->obj->mm.pages) { - sg_free_table(vma->pages); - kfree(vma->pages); - } - vma->pages = NULL; - - memset(&vma->page_sizes, 0, sizeof(vma->page_sizes)); -} - void *__px_vaddr(struct drm_i915_gem_object *p) { enum i915_map_type type; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index dfeaef680aac..d0d0f19c7895 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -206,9 +206,6 @@ struct i915_vma_ops { */ void (*unbind_vma)(struct i915_address_space *vm, struct i915_vma *vma); - - int (*set_pages)(struct i915_vma *vma); - void (*clear_pages)(struct i915_vma *vma); };
struct i915_address_space { @@ -596,10 +593,6 @@ release_pd_entry(struct i915_page_directory * const pd, const struct drm_i915_gem_object * const scratch); void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
-int ggtt_set_pages(struct i915_vma *vma); -int ppgtt_set_pages(struct i915_vma *vma); -void clear_pages(struct i915_vma *vma); - void ppgtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, struct i915_vma *vma, diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 4396bfd630d8..083b3090c69c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -289,16 +289,6 @@ void i915_vm_free_pt_stash(struct i915_address_space *vm, } }
-int ppgtt_set_pages(struct i915_vma *vma) -{ - GEM_BUG_ON(vma->pages); - - vma->pages = vma->obj->mm.pages; - vma->page_sizes = vma->obj->mm.page_sizes; - - return 0; -} - void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt, unsigned long lmem_pt_obj_flags) { @@ -315,6 +305,4 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
ppgtt->vm.vma_ops.bind_vma = ppgtt_bind_vma; ppgtt->vm.vma_ops.unbind_vma = ppgtt_unbind_vma; - ppgtt->vm.vma_ops.set_pages = ppgtt_set_pages; - ppgtt->vm.vma_ops.clear_pages = clear_pages; } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 921d5b946c32..feb43b5334dd 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -109,7 +109,6 @@ vma_create(struct drm_i915_gem_object *obj, return ERR_PTR(-ENOMEM);
kref_init(&vma->ref); - mutex_init(&vma->pages_mutex); vma->vm = i915_vm_get(vm); vma->ops = &vm->vma_ops; vma->obj = obj; @@ -816,10 +815,398 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned int flags) return pinned; }
-static int vma_get_pages(struct i915_vma *vma) +static struct scatterlist * +rotate_pages(struct drm_i915_gem_object *obj, unsigned int offset, + unsigned int width, unsigned int height, + unsigned int src_stride, unsigned int dst_stride, + struct sg_table *st, struct scatterlist *sg) { - int err = 0; - bool pinned_pages = true; + unsigned int column, row; + unsigned int src_idx; + + for (column = 0; column < width; column++) { + unsigned int left; + + src_idx = src_stride * (height - 1) + column + offset; + for (row = 0; row < height; row++) { + st->nents++; + /* + * We don't need the pages, but need to initialize + * the entries so the sg list can be happily traversed. + * The only thing we need are DMA addresses. + */ + sg_set_page(sg, NULL, I915_GTT_PAGE_SIZE, 0); + sg_dma_address(sg) = + i915_gem_object_get_dma_address(obj, src_idx); + sg_dma_len(sg) = I915_GTT_PAGE_SIZE; + sg = sg_next(sg); + src_idx -= src_stride; + } + + left = (dst_stride - height) * I915_GTT_PAGE_SIZE; + + if (!left) + continue; + + st->nents++; + + /* + * The DE ignores the PTEs for the padding tiles, the sg entry + * here is just a conenience to indicate how many padding PTEs + * to insert at this spot. + */ + sg_set_page(sg, NULL, left, 0); + sg_dma_address(sg) = 0; + sg_dma_len(sg) = left; + sg = sg_next(sg); + } + + return sg; +} + +static noinline struct sg_table * +intel_rotate_pages(struct intel_rotation_info *rot_info, + struct drm_i915_gem_object *obj) +{ + unsigned int size = intel_rotation_info_size(rot_info); + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct sg_table *st; + struct scatterlist *sg; + int ret = -ENOMEM; + int i; + + /* Allocate target SG list. */ + st = kmalloc(sizeof(*st), GFP_KERNEL); + if (!st) + goto err_st_alloc; + + ret = sg_alloc_table(st, size, GFP_KERNEL); + if (ret) + goto err_sg_alloc; + + st->nents = 0; + sg = st->sgl; + + for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++) + sg = rotate_pages(obj, rot_info->plane[i].offset, + rot_info->plane[i].width, rot_info->plane[i].height, + rot_info->plane[i].src_stride, + rot_info->plane[i].dst_stride, + st, sg); + + return st; + +err_sg_alloc: + kfree(st); +err_st_alloc: + + drm_dbg(&i915->drm, "Failed to create rotated mapping for object size %zu! (%ux%u tiles, %u pages)\n", + obj->base.size, rot_info->plane[0].width, + rot_info->plane[0].height, size); + + return ERR_PTR(ret); +} + +static struct scatterlist * +add_padding_pages(unsigned int count, + struct sg_table *st, struct scatterlist *sg) +{ + st->nents++; + + /* + * The DE ignores the PTEs for the padding tiles, the sg entry + * here is just a convenience to indicate how many padding PTEs + * to insert at this spot. 
+ */ + sg_set_page(sg, NULL, count * I915_GTT_PAGE_SIZE, 0); + sg_dma_address(sg) = 0; + sg_dma_len(sg) = count * I915_GTT_PAGE_SIZE; + sg = sg_next(sg); + + return sg; +} + +static struct scatterlist * +remap_tiled_color_plane_pages(struct drm_i915_gem_object *obj, + unsigned int offset, unsigned int alignment_pad, + unsigned int width, unsigned int height, + unsigned int src_stride, unsigned int dst_stride, + struct sg_table *st, struct scatterlist *sg, + unsigned int *gtt_offset) +{ + unsigned int row; + + if (!width || !height) + return sg; + + if (alignment_pad) + sg = add_padding_pages(alignment_pad, st, sg); + + for (row = 0; row < height; row++) { + unsigned int left = width * I915_GTT_PAGE_SIZE; + + while (left) { + dma_addr_t addr; + unsigned int length; + + /* + * We don't need the pages, but need to initialize + * the entries so the sg list can be happily traversed. + * The only thing we need are DMA addresses. + */ + + addr = i915_gem_object_get_dma_address_len(obj, offset, &length); + + length = min(left, length); + + st->nents++; + + sg_set_page(sg, NULL, length, 0); + sg_dma_address(sg) = addr; + sg_dma_len(sg) = length; + sg = sg_next(sg); + + offset += length / I915_GTT_PAGE_SIZE; + left -= length; + } + + offset += src_stride - width; + + left = (dst_stride - width) * I915_GTT_PAGE_SIZE; + + if (!left) + continue; + + sg = add_padding_pages(left >> PAGE_SHIFT, st, sg); + } + + *gtt_offset += alignment_pad + dst_stride * height; + + return sg; +} + +static struct scatterlist * +remap_contiguous_pages(struct drm_i915_gem_object *obj, + unsigned int obj_offset, + unsigned int count, + struct sg_table *st, struct scatterlist *sg) +{ + struct scatterlist *iter; + unsigned int offset; + + iter = i915_gem_object_get_sg_dma(obj, obj_offset, &offset); + GEM_BUG_ON(!iter); + + do { + unsigned int len; + + len = min(sg_dma_len(iter) - (offset << PAGE_SHIFT), + count << PAGE_SHIFT); + sg_set_page(sg, NULL, len, 0); + sg_dma_address(sg) = + sg_dma_address(iter) + (offset << PAGE_SHIFT); + sg_dma_len(sg) = len; + + st->nents++; + count -= len >> PAGE_SHIFT; + if (count == 0) + return sg; + + sg = __sg_next(sg); + iter = __sg_next(iter); + offset = 0; + } while (1); +} + +static struct scatterlist * +remap_linear_color_plane_pages(struct drm_i915_gem_object *obj, + unsigned int obj_offset, unsigned int alignment_pad, + unsigned int size, + struct sg_table *st, struct scatterlist *sg, + unsigned int *gtt_offset) +{ + if (!size) + return sg; + + if (alignment_pad) + sg = add_padding_pages(alignment_pad, st, sg); + + sg = remap_contiguous_pages(obj, obj_offset, size, st, sg); + sg = sg_next(sg); + + *gtt_offset += alignment_pad + size; + + return sg; +} + +static struct scatterlist * +remap_color_plane_pages(const struct intel_remapped_info *rem_info, + struct drm_i915_gem_object *obj, + int color_plane, + struct sg_table *st, struct scatterlist *sg, + unsigned int *gtt_offset) +{ + unsigned int alignment_pad = 0; + + if (rem_info->plane_alignment) + alignment_pad = ALIGN(*gtt_offset, rem_info->plane_alignment) - *gtt_offset; + + if (rem_info->plane[color_plane].linear) + sg = remap_linear_color_plane_pages(obj, + rem_info->plane[color_plane].offset, + alignment_pad, + rem_info->plane[color_plane].size, + st, sg, + gtt_offset); + + else + sg = remap_tiled_color_plane_pages(obj, + rem_info->plane[color_plane].offset, + alignment_pad, + rem_info->plane[color_plane].width, + rem_info->plane[color_plane].height, + rem_info->plane[color_plane].src_stride, + 
rem_info->plane[color_plane].dst_stride, + st, sg, + gtt_offset); + + return sg; +} + +static noinline struct sg_table * +intel_remap_pages(struct intel_remapped_info *rem_info, + struct drm_i915_gem_object *obj) +{ + unsigned int size = intel_remapped_info_size(rem_info); + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct sg_table *st; + struct scatterlist *sg; + unsigned int gtt_offset = 0; + int ret = -ENOMEM; + int i; + + /* Allocate target SG list. */ + st = kmalloc(sizeof(*st), GFP_KERNEL); + if (!st) + goto err_st_alloc; + + ret = sg_alloc_table(st, size, GFP_KERNEL); + if (ret) + goto err_sg_alloc; + + st->nents = 0; + sg = st->sgl; + + for (i = 0 ; i < ARRAY_SIZE(rem_info->plane); i++) + sg = remap_color_plane_pages(rem_info, obj, i, st, sg, >t_offset); + + i915_sg_trim(st); + + return st; + +err_sg_alloc: + kfree(st); +err_st_alloc: + + drm_dbg(&i915->drm, "Failed to create remapped mapping for object size %zu! (%ux%u tiles, %u pages)\n", + obj->base.size, rem_info->plane[0].width, + rem_info->plane[0].height, size); + + return ERR_PTR(ret); +} + +static noinline struct sg_table * +intel_partial_pages(const struct i915_ggtt_view *view, + struct drm_i915_gem_object *obj) +{ + struct sg_table *st; + struct scatterlist *sg; + unsigned int count = view->partial.size; + int ret = -ENOMEM; + + st = kmalloc(sizeof(*st), GFP_KERNEL); + if (!st) + goto err_st_alloc; + + ret = sg_alloc_table(st, count, GFP_KERNEL); + if (ret) + goto err_sg_alloc; + + st->nents = 0; + + sg = remap_contiguous_pages(obj, view->partial.offset, count, st, st->sgl); + + sg_mark_end(sg); + i915_sg_trim(st); /* Drop any unused tail entries. */ + + return st; + +err_sg_alloc: + kfree(st); +err_st_alloc: + return ERR_PTR(ret); +} + +static int +__i915_vma_get_pages(struct i915_vma *vma) +{ + struct sg_table *pages; + int ret; + + /* + * The vma->pages are only valid within the lifespan of the borrowed + * obj->mm.pages. When the obj->mm.pages sg_table is regenerated, so + * must be the vma->pages. A simple rule is that vma->pages must only + * be accessed when the obj->mm.pages are pinned. + */ + GEM_BUG_ON(!i915_gem_object_has_pinned_pages(vma->obj)); + + switch (vma->ggtt_view.type) { + default: + GEM_BUG_ON(vma->ggtt_view.type); + fallthrough; + case I915_GGTT_VIEW_NORMAL: + pages = vma->obj->mm.pages; + break; + + case I915_GGTT_VIEW_ROTATED: + pages = + intel_rotate_pages(&vma->ggtt_view.rotated, vma->obj); + break; + + case I915_GGTT_VIEW_REMAPPED: + pages = + intel_remap_pages(&vma->ggtt_view.remapped, vma->obj); + break; + + case I915_GGTT_VIEW_PARTIAL: + pages = intel_partial_pages(&vma->ggtt_view, vma->obj); + break; + } + + ret = 0; + if (IS_ERR(pages)) { + ret = PTR_ERR(pages); + pages = NULL; + drm_err(&vma->vm->i915->drm, + "Failed to get pages for VMA view type %u (%d)!\n", + vma->ggtt_view.type, ret); + } + + pages = xchg(&vma->pages, pages); + + /* did we race against a put_pages? */ + if (pages && pages != vma->obj->mm.pages) { + sg_free_table(vma->pages); + kfree(vma->pages); + } + + return ret; +} + +I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma) +{ + int err;
if (atomic_add_unless(&vma->pages_count, 1, 0)) return 0; @@ -828,25 +1215,17 @@ static int vma_get_pages(struct i915_vma *vma) if (err) return err;
- /* Allocations ahoy! */ - if (mutex_lock_interruptible(&vma->pages_mutex)) { - err = -EINTR; - goto unpin; - } + err = __i915_vma_get_pages(vma); + if (err) + goto err_unpin;
- if (!atomic_read(&vma->pages_count)) { - err = vma->ops->set_pages(vma); - if (err) - goto unlock; - pinned_pages = false; - } + vma->page_sizes = vma->obj->mm.page_sizes; atomic_inc(&vma->pages_count);
-unlock: - mutex_unlock(&vma->pages_mutex); -unpin: - if (pinned_pages) - __i915_gem_object_unpin_pages(vma->obj); + return 0; + +err_unpin: + __i915_gem_object_unpin_pages(vma->obj);
return err; } @@ -854,18 +1233,22 @@ static int vma_get_pages(struct i915_vma *vma) static void __vma_put_pages(struct i915_vma *vma, unsigned int count) { /* We allocate under vma_get_pages, so beware the shrinker */ - mutex_lock_nested(&vma->pages_mutex, SINGLE_DEPTH_NESTING); + struct sg_table *pages = READ_ONCE(vma->pages); + GEM_BUG_ON(atomic_read(&vma->pages_count) < count); + if (atomic_sub_return(count, &vma->pages_count) == 0) { - vma->ops->clear_pages(vma); - GEM_BUG_ON(vma->pages); + if (pages == cmpxchg(&vma->pages, pages, NULL) && + pages != vma->obj->mm.pages) { + sg_free_table(pages); + kfree(pages); + }
i915_gem_object_unpin_pages(vma->obj); } - mutex_unlock(&vma->pages_mutex); }
-static void vma_put_pages(struct i915_vma *vma) +I915_SELFTEST_EXPORT void i915_vma_put_pages(struct i915_vma *vma) { if (atomic_add_unless(&vma->pages_count, -1, 1)) return; @@ -896,10 +1279,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, unsigned int bound; int err;
-#ifdef CONFIG_PROVE_LOCKING - if (debug_locks && !WARN_ON(!ww)) - assert_vma_held(vma); -#endif + assert_vma_held(vma); + GEM_BUG_ON(!ww);
BUILD_BUG_ON(PIN_GLOBAL != I915_VMA_GLOBAL_BIND); BUILD_BUG_ON(PIN_USER != I915_VMA_LOCAL_BIND); @@ -910,7 +1291,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, if (try_qad_pin(vma, flags & I915_VMA_BIND_MASK)) return 0;
- err = vma_get_pages(vma); + err = i915_vma_get_pages(vma); if (err) return err;
@@ -1038,10 +1419,11 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, err_rpm: if (wakeref) intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref); + if (moving) dma_fence_put(moving); - vma_put_pages(vma);
+ i915_vma_put_pages(vma); return err; }
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 9a931ecb09e5..32719431b3df 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -431,4 +431,7 @@ static inline int i915_vma_sync(struct i915_vma *vma) void i915_vma_module_exit(void); int i915_vma_module_init(void);
+I915_SELFTEST_DECLARE(int i915_vma_get_pages(struct i915_vma *vma)); +I915_SELFTEST_DECLARE(void i915_vma_put_pages(struct i915_vma *vma)); + #endif diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h index f03fa96a1701..ca575e129ced 100644 --- a/drivers/gpu/drm/i915/i915_vma_types.h +++ b/drivers/gpu/drm/i915/i915_vma_types.h @@ -270,7 +270,6 @@ struct i915_vma { #define I915_VMA_PAGES_BIAS 24 #define I915_VMA_PAGES_ACTIVE (BIT(24) | 1) atomic_t pages_count; /* number of active binds to the pages */ - struct mutex pages_mutex; /* protect acquire/release of backing pages */
/** * Support different GGTT views into the same object. diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 46f4236039a9..4574fb51b656 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -1953,7 +1953,9 @@ static int igt_cs_tlb(void *arg) goto end; }
- err = vma->ops->set_pages(vma); + i915_gem_object_lock(bbe, NULL); + err = i915_vma_get_pages(vma); + i915_gem_object_unlock(bbe); if (err) goto end;
@@ -1994,7 +1996,7 @@ static int igt_cs_tlb(void *arg) i915_request_put(rq); }
- vma->ops->clear_pages(vma); + i915_vma_put_pages(vma);
err = context_sync(ce); if (err) { @@ -2009,7 +2011,9 @@ static int igt_cs_tlb(void *arg) goto end; }
- err = vma->ops->set_pages(vma); + i915_gem_object_lock(act, NULL); + err = i915_vma_get_pages(vma); + i915_gem_object_unlock(act); if (err) goto end;
@@ -2047,7 +2051,7 @@ static int igt_cs_tlb(void *arg) } end_spin(batch, count - 1);
- vma->ops->clear_pages(vma); + i915_vma_put_pages(vma);
err = context_sync(ce); if (err) { diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c index cc047ec594f9..096679014d99 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c @@ -86,8 +86,6 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
ppgtt->vm.vma_ops.bind_vma = mock_bind_ppgtt; ppgtt->vm.vma_ops.unbind_vma = mock_unbind_ppgtt; - ppgtt->vm.vma_ops.set_pages = ppgtt_set_pages; - ppgtt->vm.vma_ops.clear_pages = clear_pages;
return ppgtt; } @@ -126,8 +124,6 @@ void mock_init_ggtt(struct drm_i915_private *i915, struct i915_ggtt *ggtt)
ggtt->vm.vma_ops.bind_vma = mock_bind_ggtt; ggtt->vm.vma_ops.unbind_vma = mock_unbind_ggtt; - ggtt->vm.vma_ops.set_pages = ggtt_set_pages; - ggtt->vm.vma_ops.clear_pages = clear_pages;
i915_address_space_init(&ggtt->vm, VM_CLASS_GGTT); i915->gt.ggtt = ggtt;
On Mon, 29 Nov 2021 at 13:57, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
<snip>
Does this emit a barrier? Or can the READ_ONCE(vma->pages) be moved past this, and does that matter?
try_cmpxchg? Also can pages be NULL here?
As an aside, is it somehow possible to re-order the series or something to avoid introducing the transient lockless trickery here? I know by the end of the series this all gets removed, but still just slightly worried here.
On 06-12-2021 14:13, Matthew Auld wrote:
It's not that tricky, and it's only there because we still have to support the unlocked path until patch 13; patch 15 removes it.
From the kernel doc:
- RMW operations that have a return value are fully ordered;
- RMW operations that are conditional are unordered on FAILURE, otherwise the above rules apply.
so a READ_ONCE followed by a bunch of stuff that only happens when the cmpxchg is successful is ok.
At the beginning of vma_put_pages(), we hold at least 1 reference to vma->pages, and we assume vma->pages is set to something sane.
We use READ_ONCE to read vma->pages before decreasing refcount on vma->pages_count, after which we attempt to clear vma->pages.
HOWEVER, as we are not guaranteed to hold the lock, we are careful. New pages may have been set by __i915_vma_get_pages(), using xchg.
In that case, we fail, and _get_pages() cleans up instead.
After that, we drop the reference to the object's page pin, which we needed for the pages != vma->obj->mm.pages comparison.
cmpxchg is correct here. We don't need to loop, and only need to try once. The only time we can fail is after at least one get_pages() call, and that would have otherwise freed it for us.
The locked version would actually be identical in this case.
I removed the locking because it didn't add anything. The same ops would be required, only with additional locking for something that is using atomic ops for a refcount anyway.
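A condensed sketch of the two sides of that handoff (field names as in the patch, but build_view_pages() is a stand-in for the view-specific sg_table construction; this illustrates the pattern rather than copying the diff verbatim):

/* Publish side: called with the object lock held and obj->mm pages pinned. */
static int sketch_get_pages(struct i915_vma *vma)
{
	struct sg_table *pages = build_view_pages(vma); /* may alias obj->mm.pages */

	if (IS_ERR(pages))
		return PTR_ERR(pages);

	/* Publish the new table; xchg hands any stale one back to us. */
	pages = xchg(&vma->pages, pages);
	if (pages && pages != vma->obj->mm.pages) {
		sg_free_table(pages);
		kfree(pages);
	}

	return 0;
}

/* Release side: may still run without the object lock at this point in the series. */
static void sketch_put_pages(struct i915_vma *vma, unsigned int count)
{
	/* Snapshot the published table before dropping our count. */
	struct sg_table *pages = READ_ONCE(vma->pages);

	/*
	 * atomic_sub_return() is a value-returning RMW and therefore fully
	 * ordered, so the READ_ONCE() above cannot be reordered past it.
	 */
	if (atomic_sub_return(count, &vma->pages_count) == 0) {
		/*
		 * Only free what we snapshotted if it is still published; if
		 * a racing get_pages() swapped in new pages, the cmpxchg()
		 * fails and that get_pages() has already cleaned up for us.
		 */
		if (pages == cmpxchg(&vma->pages, pages, NULL) &&
		    pages != vma->obj->mm.pages) {
			sg_free_table(pages);
			kfree(pages);
		}

		i915_gem_object_unpin_pages(vma->obj);
	}
}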
On Mon, 6 Dec 2021 at 15:18, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Ok, I can buy that.
On Mon, Dec 06, 2021 at 05:00:46PM +0000, Matthew Auld wrote:
Maybe I'm late, but this stuff needs to be documented in a comment right above the barrier, e.g.
/* atomic_sub_return provides *exact type of barrier, e.g. if it's
 * conditional or whatever* which orders thing_A against thing_B. The
 * counterpart barrier is found in function_C()
 */
Ofc function_C() then needs to have a similar comment. Ideally the comment explains anything else that needs explaining, like why we need to order thing_A against thing_B; the above is just the absolute bare minimum.
This is required (and yes we've been extremely bad at not doing this) per kernel coding style, so not just i915 rules.
If we don't have it (because the patch landed already, I'm horribly buried) please add these comments in fixup patches. -Daniel
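For this particular pairing, a possible comment (assuming the counterpart really is the xchg() publishing vma->pages in __i915_vma_get_pages(); exact wording left to the author) might look like:

	/*
	 * atomic_sub_return() is a value-returning RMW and hence fully
	 * ordered: the READ_ONCE(vma->pages) above cannot be reordered past
	 * the refcount decrement. The counterpart is the xchg() that
	 * publishes new vma->pages in __i915_vma_get_pages(); if that wins
	 * the race, the cmpxchg() below fails and __i915_vma_get_pages()
	 * frees the stale table instead.
	 */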
On Mon, 29 Nov 2021 at 13:57, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
<snip>
So should this one rather be:
sg_free_table(pages);
kfree(pages);
Or am I missing something?
On Tue, 7 Dec 2021 at 10:06, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
r-b with that.
i915_vma_wait_for_bind needs the vma lock held, fix the caller.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 40 +++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index feb43b5334dd..ecf97b3bf64f 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1438,23 +1438,15 @@ static void flush_idle_contexts(struct intel_gt *gt) intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT); }
-int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, - u32 align, unsigned int flags) +static int __i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, + u32 align, unsigned int flags) { struct i915_address_space *vm = vma->vm; int err;
- GEM_BUG_ON(!i915_vma_is_ggtt(vma)); - -#ifdef CONFIG_LOCKDEP - WARN_ON(!ww && dma_resv_held(vma->obj->base.resv)); -#endif - do { - if (ww) - err = i915_vma_pin_ww(vma, ww, 0, align, flags | PIN_GLOBAL); - else - err = i915_vma_pin(vma, 0, align, flags | PIN_GLOBAL); + err = i915_vma_pin_ww(vma, ww, 0, align, flags | PIN_GLOBAL); + if (err != -ENOSPC) { if (!err) { err = i915_vma_wait_for_bind(vma); @@ -1473,6 +1465,30 @@ int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, } while (1); }
+int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, + u32 align, unsigned int flags) +{ + struct i915_gem_ww_ctx _ww; + int err; + + GEM_BUG_ON(!i915_vma_is_ggtt(vma)); + + if (ww) + return __i915_ggtt_pin(vma, ww, align, flags); + +#ifdef CONFIG_LOCKDEP + WARN_ON(dma_resv_held(vma->obj->base.resv)); +#endif + + for_i915_gem_ww(&_ww, err, true) { + err = i915_gem_object_lock(vma->obj, &_ww); + if (!err) + err = __i915_ggtt_pin(vma, &_ww, align, flags); + } + + return err; +} + static void __vma_close(struct i915_vma *vma, struct intel_gt *gt) { /*
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
i915_vma_wait_for_bind needs the vma lock held, fix the caller.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Reviewed-by: Matthew Auld matthew.auld@intel.com
We will need the lock to unbind the vma, and wait for bind to complete. Remove the special casing for the !ww path, and force ww locking for all.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  7 ++-----
 drivers/gpu/drm/i915/i915_gem.c | 26 ++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1bfadd9127fc..8322abe194da 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1842,13 +1842,10 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, const struct i915_ggtt_view *view, u64 size, u64 alignment, u64 flags);
-static inline struct i915_vma * __must_check +struct i915_vma * __must_check i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, const struct i915_ggtt_view *view, - u64 size, u64 alignment, u64 flags) -{ - return i915_gem_object_ggtt_pin_ww(obj, NULL, view, size, alignment, flags); -} + u64 size, u64 alignment, u64 flags);
int i915_gem_object_unbind(struct drm_i915_gem_object *obj, unsigned long flags); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 527228d4da7e..e83522953cde 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -877,6 +877,8 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, struct i915_vma *vma; int ret;
+ GEM_WARN_ON(!ww); + if (flags & PIN_MAPPABLE && (!view || view->type == I915_GGTT_VIEW_NORMAL)) { /* @@ -936,10 +938,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, return ERR_PTR(ret); }
- if (ww) - ret = i915_vma_pin_ww(vma, ww, size, alignment, flags | PIN_GLOBAL); - else - ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL); + ret = i915_vma_pin_ww(vma, ww, size, alignment, flags | PIN_GLOBAL);
if (ret) return ERR_PTR(ret); @@ -959,6 +958,25 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, return vma; }
+struct i915_vma * __must_check +i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, + const struct i915_ggtt_view *view, + u64 size, u64 alignment, u64 flags) +{ + struct i915_gem_ww_ctx ww; + struct i915_vma *ret; + int err; + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (!err) + ret = i915_gem_object_ggtt_pin_ww(obj, &ww, view, size, + alignment, flags); + } + + return err ? ERR_PTR(err) : ret; +} + int i915_gem_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv)
We will need the lock to unbind the vma, and wait for bind to complete. Remove the special casing for the !ww path, and force ww locking for all.
Changes since v1:
- Pass err to for_i915_gem_ww handling for -EDEADLK handling.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  7 ++-----
 drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++----
 2 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1bfadd9127fc..8322abe194da 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1842,13 +1842,10 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, const struct i915_ggtt_view *view, u64 size, u64 alignment, u64 flags);
-static inline struct i915_vma * __must_check +struct i915_vma * __must_check i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, const struct i915_ggtt_view *view, - u64 size, u64 alignment, u64 flags) -{ - return i915_gem_object_ggtt_pin_ww(obj, NULL, view, size, alignment, flags); -} + u64 size, u64 alignment, u64 flags);
int i915_gem_object_unbind(struct drm_i915_gem_object *obj, unsigned long flags); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 527228d4da7e..6002045bd194 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -877,6 +877,8 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, struct i915_vma *vma; int ret;
+ GEM_WARN_ON(!ww); + if (flags & PIN_MAPPABLE && (!view || view->type == I915_GGTT_VIEW_NORMAL)) { /* @@ -936,10 +938,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, return ERR_PTR(ret); }
- if (ww) - ret = i915_vma_pin_ww(vma, ww, size, alignment, flags | PIN_GLOBAL); - else - ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL); + ret = i915_vma_pin_ww(vma, ww, size, alignment, flags | PIN_GLOBAL);
if (ret) return ERR_PTR(ret); @@ -959,6 +958,29 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, return vma; }
+struct i915_vma * __must_check +i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, + const struct i915_ggtt_view *view, + u64 size, u64 alignment, u64 flags) +{ + struct i915_gem_ww_ctx ww; + struct i915_vma *ret; + int err; + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + ret = i915_gem_object_ggtt_pin_ww(obj, &ww, view, size, + alignment, flags); + if (IS_ERR(ret)) + err = PTR_ERR(ret); + } + + return err ? ERR_PTR(err) : ret; +} + int i915_gem_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv)
On Tue, 30 Nov 2021 at 09:21, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Return something here, or GEM_BUG_ON()? I assume it's going to crash and burn anyway?
Maybe s/ret/vma/ ? Seeing ret makes me think it's an integer.
Anyway, Reviewed-by: Matthew Auld matthew.auld@intel.com
In the next commit, we don't evict when refcount = 0.
igt_vm_isolation() continuously tries to pin/unpin at the same address, but also calls put() on the object, which means the object may not be unpinned in time.
Instead of this, re-use the same objects over and over, so they can be unbound as required.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 .../drm/i915/gem/selftests/i915_gem_context.c | 54 +++++++++++--------
 1 file changed, 32 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index b32f7fed2d9c..3fc595b57cf4 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1481,10 +1481,10 @@ static int check_scratch(struct i915_address_space *vm, u64 offset)
static int write_to_scratch(struct i915_gem_context *ctx, struct intel_engine_cs *engine, + struct drm_i915_gem_object *obj, u64 offset, u32 value) { struct drm_i915_private *i915 = ctx->i915; - struct drm_i915_gem_object *obj; struct i915_address_space *vm; struct i915_request *rq; struct i915_vma *vma; @@ -1497,15 +1497,9 @@ static int write_to_scratch(struct i915_gem_context *ctx, if (err) return err;
- obj = i915_gem_object_create_internal(i915, PAGE_SIZE); - if (IS_ERR(obj)) - return PTR_ERR(obj); - cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); - if (IS_ERR(cmd)) { - err = PTR_ERR(cmd); - goto out; - } + if (IS_ERR(cmd)) + return PTR_ERR(cmd);
*cmd++ = MI_STORE_DWORD_IMM_GEN4; if (GRAPHICS_VER(i915) >= 8) { @@ -1569,17 +1563,19 @@ static int write_to_scratch(struct i915_gem_context *ctx, i915_vma_unpin(vma); out_vm: i915_vm_put(vm); -out: - i915_gem_object_put(obj); + + if (!err) + err = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT); + return err; }
static int read_from_scratch(struct i915_gem_context *ctx, struct intel_engine_cs *engine, + struct drm_i915_gem_object *obj, u64 offset, u32 *value) { struct drm_i915_private *i915 = ctx->i915; - struct drm_i915_gem_object *obj; struct i915_address_space *vm; const u32 result = 0x100; struct i915_request *rq; @@ -1594,10 +1590,6 @@ static int read_from_scratch(struct i915_gem_context *ctx, if (err) return err;
- obj = i915_gem_object_create_internal(i915, PAGE_SIZE); - if (IS_ERR(obj)) - return PTR_ERR(obj); - if (GRAPHICS_VER(i915) >= 8) { const u32 GPR0 = engine->mmio_base + 0x600;
@@ -1615,7 +1607,7 @@ static int read_from_scratch(struct i915_gem_context *ctx, cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); - goto out; + goto err_unpin; }
memset(cmd, POISON_INUSE, PAGE_SIZE); @@ -1651,7 +1643,7 @@ static int read_from_scratch(struct i915_gem_context *ctx, cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); - goto out; + goto err_unpin; }
memset(cmd, POISON_INUSE, PAGE_SIZE); @@ -1722,8 +1714,10 @@ static int read_from_scratch(struct i915_gem_context *ctx, i915_vma_unpin(vma); out_vm: i915_vm_put(vm); -out: - i915_gem_object_put(obj); + + if (!err) + err = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT); + return err; }
@@ -1765,6 +1759,7 @@ static int igt_vm_isolation(void *arg) u64 vm_total; u32 expected; int err; + struct drm_i915_gem_object *obj_a, *obj_b;
if (GRAPHICS_VER(i915) < 7) return 0; @@ -1810,6 +1805,18 @@ static int igt_vm_isolation(void *arg) vm_total = ctx_a->vm->total; GEM_BUG_ON(ctx_b->vm->total != vm_total);
+ obj_a = i915_gem_object_create_internal(i915, PAGE_SIZE); + if (IS_ERR(obj_a)) { + err = PTR_ERR(obj_a); + goto out_file; + } + + obj_b = i915_gem_object_create_internal(i915, PAGE_SIZE); + if (IS_ERR(obj_b)) { + err = PTR_ERR(obj_b); + goto put_a; + } + count = 0; num_engines = 0; for_each_uabi_engine(engine, i915) { @@ -1832,10 +1839,10 @@ static int igt_vm_isolation(void *arg) I915_GTT_PAGE_SIZE, vm_total, sizeof(u32), alignof_dword);
- err = write_to_scratch(ctx_a, engine, + err = write_to_scratch(ctx_a, engine, obj_a, offset, 0xdeadbeef); if (err == 0) - err = read_from_scratch(ctx_b, engine, + err = read_from_scratch(ctx_b, engine, obj_b, offset, &value); if (err) goto out_file; @@ -1858,6 +1865,9 @@ static int igt_vm_isolation(void *arg) pr_info("Checked %lu scratch offsets across %lu engines\n", count, num_engines);
+ i915_gem_object_put(obj_b); +put_a: + i915_gem_object_put(obj_a); out_file: if (igt_live_test_end(&t)) err = -EIO;
On Mon, 29 Nov 2021 at 13:57, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Is this something to be worried about in the real world, outside of the selftests?
Nit: Christmas tree-ish
goto put_b; below also?
Otherwise, Reviewed-by: Matthew Auld matthew.auld@intel.com
On 07-12-2021 11:44, Matthew Auld wrote:
I don't think userspace could hit it, because the race window is small; it would need to free an object, then immediately try to softpin a new object in the same place.
It could be fixed, but that would require a massive rework of eviction and of the entire vm locking, so it can only be done eventually. I don't think userspace
will hit it, except if it tried deliberately. If it does turn out to be a problem, a workaround would be to only call i915_gem_evict_vm() without locks, so it can call drain_freed_objects as needed. This requires some surgery
to make execbuf handle the case where we may drop all locks when evicting.
Thanks, will fixup both!
~Maarten
Now that freeing objects takes the object lock when destroying the backing pages, we can confidently take the object lock even for dead objects.
Use this fact to take the object lock in the shrinker, without requiring a reference to the object, so all calls to unbind take the object lock.
This is the last step to requiring the object lock for vma_unbind.
Changes since v1:
- No longer require the refcount, as every freed object now holds the lock when unbinding VMAs.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 6 ++++ drivers/gpu/drm/i915/i915_gem_evict.c | 34 +++++++++++++++++--- 2 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index eebff4735781..ad2123369e0d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -405,12 +405,18 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr list_for_each_entry_safe(vma, next, &i915->ggtt.vm.bound_list, vm_link) { unsigned long count = vma->node.size >> PAGE_SHIFT; + struct drm_i915_gem_object *obj = vma->obj;
if (!vma->iomap || i915_vma_is_active(vma)) continue;
+ if (!i915_gem_object_trylock(obj)) + continue; + if (__i915_vma_unbind(vma) == 0) freed_pages += count; + + i915_gem_object_unlock(obj); } mutex_unlock(&i915->ggtt.vm.mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 2b73ddb11c66..286efa462eca 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -58,6 +58,9 @@ mark_free(struct drm_mm_scan *scan, if (i915_vma_is_pinned(vma)) return false;
+ if (!i915_gem_object_trylock(vma->obj)) + return false; + list_add(&vma->evict_link, unwind); return drm_mm_scan_add_block(scan, &vma->node); } @@ -178,6 +181,7 @@ i915_gem_evict_something(struct i915_address_space *vm, list_for_each_entry_safe(vma, next, &eviction_list, evict_link) { ret = drm_mm_scan_remove_block(&scan, &vma->node); BUG_ON(ret); + i915_gem_object_unlock(vma->obj); }
/* @@ -222,10 +226,12 @@ i915_gem_evict_something(struct i915_address_space *vm, * of any of our objects, thus corrupting the list). */ list_for_each_entry_safe(vma, next, &eviction_list, evict_link) { - if (drm_mm_scan_remove_block(&scan, &vma->node)) + if (drm_mm_scan_remove_block(&scan, &vma->node)) { __i915_vma_pin(vma); - else + } else { list_del(&vma->evict_link); + i915_gem_object_unlock(vma->obj); + } }
/* Unbinding will emit any required flushes */ @@ -234,16 +240,22 @@ i915_gem_evict_something(struct i915_address_space *vm, __i915_vma_unpin(vma); if (ret == 0) ret = __i915_vma_unbind(vma); + + i915_gem_object_unlock(vma->obj); }
while (ret == 0 && (node = drm_mm_scan_color_evict(&scan))) { vma = container_of(node, struct i915_vma, node);
+ /* If we find any non-objects (!vma), we cannot evict them */ - if (vma->node.color != I915_COLOR_UNEVICTABLE) + if (vma->node.color != I915_COLOR_UNEVICTABLE && + i915_gem_object_trylock(vma->obj)) { ret = __i915_vma_unbind(vma); - else - ret = -ENOSPC; /* XXX search failed, try again? */ + i915_gem_object_unlock(vma->obj); + } else { + ret = -ENOSPC; + } }
return ret; @@ -333,6 +345,11 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, break; }
+ if (!i915_gem_object_trylock(vma->obj)) { + ret = -ENOSPC; + break; + } + /* * Never show fear in the face of dragons! * @@ -350,6 +367,8 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, __i915_vma_unpin(vma); if (ret == 0) ret = __i915_vma_unbind(vma); + + i915_gem_object_unlock(vma->obj); }
return ret; @@ -393,6 +412,9 @@ int i915_gem_evict_vm(struct i915_address_space *vm) if (i915_vma_is_pinned(vma)) continue;
+ if (!i915_gem_object_trylock(vma->obj)) + continue; + __i915_vma_pin(vma); list_add(&vma->evict_link, &eviction_list); } @@ -406,6 +428,8 @@ int i915_gem_evict_vm(struct i915_address_space *vm) ret = __i915_vma_unbind(vma); if (ret != -EINTR) /* "Get me out of here!" */ ret = 0; + + i915_gem_object_unlock(vma->obj); } } while (ret == 0);
On Mon, 29 Nov 2021 at 13:57, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
That looks to be a future patch, at least with non-TTM backend? Does something need to be re-ordered in the series?
Use this fact to take the object lock in the shrinker, without requiring a reference to the object, so all calls to unbind take the object lock.
Hmm, the previous patch was talking about "we don't evict when refcount = 0", but this looks to be doing something else?
On 07-12-2021 12:01, Matthew Auld wrote:
Yeah, a careful inspection of the eviction code does allow evicting when refcount = 0 using trylocks, iff we take the object lock during destruction, in order to free all remaining backing storage.
However, you can't do a blocking lock on dead objects, because the VM lock is the inner lock in the hierarchy. Lockdep will complain (rightfully).
To free dead objects to reclaim memory, you would need to call drain_freed_objects(), which will deadlock if you hold either the vm lock or any resv locks.
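As a minimal illustration of that ordering (a sketch only, not code from the series; the helper name evict_one_example is made up), the only pattern that works while vm->mutex is held is a trylock on the object:

/*
 * Illustrative sketch of the lock ordering discussed above: obj->base.resv
 * is the outer lock and vm->mutex the inner one, so while holding the vm
 * mutex we may only trylock the object. A blocking object lock here would
 * invert the order and lockdep would rightfully complain.
 */
static int evict_one_example(struct i915_address_space *vm,
			     struct i915_vma *vma)
{
	int err = -ENOSPC;

	mutex_lock(&vm->mutex);

	if (i915_gem_object_trylock(vma->obj)) {
		err = __i915_vma_unbind(vma);
		i915_gem_object_unlock(vma->obj);
	}

	mutex_unlock(&vm->mutex);
	return err;
}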
We are moving away from short-term pinning, and need a way to evict objects locked by the current context. Pass the ww context to all eviction functions, so that they can evict objects that are already locked by the current ww context.
On top of that, this slightly improves ww handling because the locked objects are marked as locked by the correct ww.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 ++++-- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 4 +-- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 2 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 +- drivers/gpu/drm/i915/gt/selftest_migrate.c | 2 +- drivers/gpu/drm/i915/gvt/aperture_gm.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 5 +++- drivers/gpu/drm/i915/i915_gem_evict.c | 15 ++++++----- drivers/gpu/drm/i915/i915_gem_gtt.c | 8 +++--- drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +++ drivers/gpu/drm/i915/i915_vgpu.c | 2 +- drivers/gpu/drm/i915/i915_vma.c | 13 +++++---- .../gpu/drm/i915/selftests/i915_gem_evict.c | 27 ++++++++++++------- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 14 +++++----- 18 files changed, 70 insertions(+), 45 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 9f7c6ecadb90..91ab43d67f47 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -761,7 +761,7 @@ static int eb_reserve(struct i915_execbuffer *eb) case 1: /* Too fragmented, unbind everything and retry */ mutex_lock(&eb->context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm); + err = i915_gem_evict_vm(eb->context->vm, &eb->ww); mutex_unlock(&eb->context->vm->mutex); if (err) return err; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 66f20b803b01..f66d46882ea7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -210,9 +210,13 @@ static inline int i915_gem_object_lock_interruptible(struct drm_i915_gem_object return __i915_gem_object_lock(obj, ww, true); }
-static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj) +static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww) { - return dma_resv_trylock(obj->base.resv); + if (!ww) + return dma_resv_trylock(obj->base.resv); + else + return ww_mutex_trylock(&obj->base.resv->lock, &ww->ctx); }
static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index ad2123369e0d..85ff182ddbc2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -216,7 +216,7 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww,
/* May arrive from get_pages on another bo */ if (!ww) { - if (!i915_gem_object_trylock(obj)) + if (!i915_gem_object_trylock(obj, NULL)) goto skip; } else { err = i915_gem_object_lock(obj, ww); @@ -410,7 +410,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr if (!vma->iomap || i915_vma_is_active(vma)) continue;
- if (!i915_gem_object_trylock(obj)) + if (!i915_gem_object_trylock(obj, NULL)) continue;
if (__i915_vma_unbind(vma) == 0) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 80680395bb3b..42b0b770929c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -653,7 +653,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem, cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE; i915_gem_object_set_cache_coherency(obj, cache_level);
- if (WARN_ON(!i915_gem_object_trylock(obj))) + if (WARN_ON(!i915_gem_object_trylock(obj, NULL))) return -EBUSY;
i915_gem_object_init_memory_region(obj, mem); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index a1334b48dde7..cd19f4afe823 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -26,7 +26,7 @@ static void dbg_poison_ce(struct intel_context *ce) int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map;
- if (!i915_gem_object_trylock(obj)) + if (!i915_gem_object_trylock(obj, NULL)) return;
map = i915_gem_object_pin_map(obj, type); diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index d9596087df08..17e2e01b8d25 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -504,7 +504,7 @@ static int ggtt_reserve_guc_top(struct i915_ggtt *ggtt) GEM_BUG_ON(ggtt->vm.total <= GUC_GGTT_TOP); size = ggtt->vm.total - GUC_GGTT_TOP;
- ret = i915_gem_gtt_reserve(&ggtt->vm, &ggtt->uc_fw, size, + ret = i915_gem_gtt_reserve(&ggtt->vm, NULL, &ggtt->uc_fw, size, GUC_GGTT_TOP, I915_COLOR_UNEVICTABLE, PIN_NOEVICT); if (ret) diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index bb99fc03f503..240265354274 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -17,7 +17,7 @@ static int mock_timeline_pin(struct intel_timeline *tl) { int err;
- if (WARN_ON(!i915_gem_object_trylock(tl->hwsp_ggtt->obj))) + if (WARN_ON(!i915_gem_object_trylock(tl->hwsp_ggtt->obj, NULL))) return -EBUSY;
err = intel_timeline_pin_map(tl); diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index e5ad4d5a91c0..a3597a6bb805 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1382,7 +1382,7 @@ static int evict_vma(void *data) complete(&arg->completion);
mutex_lock(&vm->mutex); - err = i915_gem_evict_for_node(vm, &evict, 0); + err = i915_gem_evict_for_node(vm, NULL, &evict, 0); mutex_unlock(&vm->mutex);
return err; diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c index 12ef2837c89b..78993c760f12 100644 --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c @@ -464,7 +464,7 @@ create_init_lmem_internal(struct intel_gt *gt, size_t sz, bool try_lmem) return obj; }
- i915_gem_object_trylock(obj); + i915_gem_object_trylock(obj, NULL); err = i915_gem_object_pin_pages(obj); if (err) { i915_gem_object_unlock(obj); diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c index 0d6d59871308..c08098a167e9 100644 --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c @@ -63,7 +63,7 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
mutex_lock(>->ggtt->vm.mutex); mmio_hw_access_pre(gt); - ret = i915_gem_gtt_insert(>->ggtt->vm, node, + ret = i915_gem_gtt_insert(>->ggtt->vm, NULL, node, size, I915_GTT_PAGE_SIZE, I915_COLOR_UNEVICTABLE, start, end, flags); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8322abe194da..14aa66030836 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1908,14 +1908,17 @@ i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
/* i915_gem_evict.c */ int __must_check i915_gem_evict_something(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, u64 min_size, u64 alignment, unsigned long color, u64 start, u64 end, unsigned flags); int __must_check i915_gem_evict_for_node(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, unsigned int flags); -int i915_gem_evict_vm(struct i915_address_space *vm); +int i915_gem_evict_vm(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww);
/* i915_gem_internal.c */ struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 286efa462eca..24f5e3345e43 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -51,6 +51,7 @@ static int ggtt_flush(struct intel_gt *gt)
static bool mark_free(struct drm_mm_scan *scan, + struct i915_gem_ww_ctx *ww, struct i915_vma *vma, unsigned int flags, struct list_head *unwind) @@ -58,7 +59,7 @@ mark_free(struct drm_mm_scan *scan, if (i915_vma_is_pinned(vma)) return false;
- if (!i915_gem_object_trylock(vma->obj)) + if (!i915_gem_object_trylock(vma->obj, ww)) return false;
list_add(&vma->evict_link, unwind); @@ -101,6 +102,7 @@ static bool defer_evict(struct i915_vma *vma) */ int i915_gem_evict_something(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, u64 min_size, u64 alignment, unsigned long color, u64 start, u64 end, @@ -173,7 +175,7 @@ i915_gem_evict_something(struct i915_address_space *vm, continue; }
- if (mark_free(&scan, vma, flags, &eviction_list)) + if (mark_free(&scan, ww, vma, flags, &eviction_list)) goto found; }
@@ -250,7 +252,7 @@ i915_gem_evict_something(struct i915_address_space *vm,
/* If we find any non-objects (!vma), we cannot evict them */ if (vma->node.color != I915_COLOR_UNEVICTABLE && - i915_gem_object_trylock(vma->obj)) { + i915_gem_object_trylock(vma->obj, ww)) { ret = __i915_vma_unbind(vma); i915_gem_object_unlock(vma->obj); } else { @@ -273,6 +275,7 @@ i915_gem_evict_something(struct i915_address_space *vm, * memory in e.g. the shrinker. */ int i915_gem_evict_for_node(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, struct drm_mm_node *target, unsigned int flags) { @@ -345,7 +348,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, break; }
- if (!i915_gem_object_trylock(vma->obj)) { + if (!i915_gem_object_trylock(vma->obj, ww)) { ret = -ENOSPC; break; } @@ -386,7 +389,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, * To clarify: This is for freeing up virtual address space, not for freeing * memory in e.g. the shrinker. */ -int i915_gem_evict_vm(struct i915_address_space *vm) +int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww) { int ret = 0;
@@ -412,7 +415,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm) if (i915_vma_is_pinned(vma)) continue;
- if (!i915_gem_object_trylock(vma->obj)) + if (!i915_gem_object_trylock(vma->obj, ww)) continue;
__i915_vma_pin(vma); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index cd5f2348a187..c810da476c2b 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -93,6 +93,7 @@ void i915_gem_gtt_finish_pages(struct drm_i915_gem_object *obj, * asked to wait for eviction and interrupted. */ int i915_gem_gtt_reserve(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, u64 size, u64 offset, unsigned long color, unsigned int flags) @@ -117,7 +118,7 @@ int i915_gem_gtt_reserve(struct i915_address_space *vm, if (flags & PIN_NOEVICT) return -ENOSPC;
- err = i915_gem_evict_for_node(vm, node, flags); + err = i915_gem_evict_for_node(vm, ww, node, flags); if (err == 0) err = drm_mm_reserve_node(&vm->mm, node);
@@ -184,6 +185,7 @@ static u64 random_offset(u64 start, u64 end, u64 len, u64 align) * asked to wait for eviction and interrupted. */ int i915_gem_gtt_insert(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, u64 size, u64 alignment, unsigned long color, u64 start, u64 end, unsigned int flags) @@ -269,7 +271,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, */ offset = random_offset(start, end, size, alignment ?: I915_GTT_MIN_ALIGNMENT); - err = i915_gem_gtt_reserve(vm, node, size, offset, color, flags); + err = i915_gem_gtt_reserve(vm, ww, node, size, offset, color, flags); if (err != -ENOSPC) return err;
@@ -277,7 +279,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, return -ENOSPC;
/* Randomly selected placement is pinned, do a search */ - err = i915_gem_evict_something(vm, size, alignment, color, + err = i915_gem_evict_something(vm, ww, size, alignment, color, start, end, flags); if (err) return err; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index c9b0ee5e1d23..e4938aba3fe9 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -16,6 +16,7 @@
struct drm_i915_gem_object; struct i915_address_space; +struct i915_gem_ww_ctx;
int __must_check i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj, struct sg_table *pages); @@ -23,11 +24,13 @@ void i915_gem_gtt_finish_pages(struct drm_i915_gem_object *obj, struct sg_table *pages);
int i915_gem_gtt_reserve(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, u64 size, u64 offset, unsigned long color, unsigned int flags);
int i915_gem_gtt_insert(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, u64 size, u64 alignment, unsigned long color, u64 start, u64 end, unsigned int flags); diff --git a/drivers/gpu/drm/i915/i915_vgpu.c b/drivers/gpu/drm/i915/i915_vgpu.c index 31a105bc1792..c97323973f9b 100644 --- a/drivers/gpu/drm/i915/i915_vgpu.c +++ b/drivers/gpu/drm/i915/i915_vgpu.c @@ -197,7 +197,7 @@ static int vgt_balloon_space(struct i915_ggtt *ggtt, drm_info(&dev_priv->drm, "balloon space: range [ 0x%lx - 0x%lx ] %lu KiB.\n", start, end, size / 1024); - ret = i915_gem_gtt_reserve(&ggtt->vm, node, + ret = i915_gem_gtt_reserve(&ggtt->vm, NULL, node, size, start, I915_COLOR_UNEVICTABLE, 0); if (!ret) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index ecf97b3bf64f..74b88b130cd5 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -650,7 +650,8 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color) * 0 on success, negative error code otherwise. */ static int -i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) +i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, + u64 size, u64 alignment, u64 flags) { unsigned long color; u64 start, end; @@ -702,7 +703,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) range_overflows(offset, size, end)) return -EINVAL;
- ret = i915_gem_gtt_reserve(vma->vm, &vma->node, + ret = i915_gem_gtt_reserve(vma->vm, ww, &vma->node, size, offset, color, flags); if (ret) @@ -741,7 +742,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) size = round_up(size, I915_GTT_PAGE_SIZE_2M); }
- ret = i915_gem_gtt_insert(vma->vm, &vma->node, + ret = i915_gem_gtt_insert(vma->vm, ww, &vma->node, size, alignment, color, start, end, flags); if (ret) @@ -1379,7 +1380,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, goto err_unlock;
if (!(bound & I915_VMA_BIND_MASK)) { - err = i915_vma_insert(vma, size, alignment, flags); + err = i915_vma_insert(vma, ww, size, alignment, flags); if (err) goto err_active;
@@ -1459,7 +1460,9 @@ static int __i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, /* Unlike i915_vma_pin, we don't take no for an answer! */ flush_idle_contexts(vm->gt); if (mutex_lock_interruptible(&vm->mutex) == 0) { - i915_gem_evict_vm(vm); + + /* We pass NULL ww here, as we don't want to unbind locked objects */ + i915_gem_evict_vm(vm, NULL); mutex_unlock(&vm->mutex); } } while (1); diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c index 7e0658a77659..c80e1483d603 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c @@ -117,7 +117,7 @@ static int igt_evict_something(void *arg)
/* Everything is pinned, nothing should happen */ mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_something(&ggtt->vm, + err = i915_gem_evict_something(&ggtt->vm, NULL, I915_GTT_PAGE_SIZE, 0, 0, 0, U64_MAX, 0); @@ -132,11 +132,12 @@ static int igt_evict_something(void *arg)
/* Everything is unpinned, we should be able to evict something */ mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_something(&ggtt->vm, + err = i915_gem_evict_something(&ggtt->vm, NULL, I915_GTT_PAGE_SIZE, 0, 0, 0, U64_MAX, 0); mutex_unlock(&ggtt->vm.mutex); + if (err) { pr_err("i915_gem_evict_something failed on a full GGTT with err=%d\n", err); @@ -204,7 +205,7 @@ static int igt_evict_for_vma(void *arg)
/* Everything is pinned, nothing should happen */ mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_for_node(&ggtt->vm, &target, 0); + err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0); mutex_unlock(&ggtt->vm.mutex); if (err != -ENOSPC) { pr_err("i915_gem_evict_for_node on a full GGTT returned err=%d\n", @@ -216,7 +217,7 @@ static int igt_evict_for_vma(void *arg)
/* Everything is unpinned, we should be able to evict the node */ mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_for_node(&ggtt->vm, &target, 0); + err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0); mutex_unlock(&ggtt->vm.mutex); if (err) { pr_err("i915_gem_evict_for_node returned err=%d\n", @@ -297,7 +298,7 @@ static int igt_evict_for_cache_color(void *arg)
/* Remove just the second vma */ mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_for_node(&ggtt->vm, &target, 0); + err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0); mutex_unlock(&ggtt->vm.mutex); if (err) { pr_err("[0]i915_gem_evict_for_node returned err=%d\n", err); @@ -310,7 +311,7 @@ static int igt_evict_for_cache_color(void *arg) target.color = I915_CACHE_L3_LLC;
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_for_node(&ggtt->vm, &target, 0); + err = i915_gem_evict_for_node(&ggtt->vm, NULL, &target, 0); mutex_unlock(&ggtt->vm.mutex); if (!err) { pr_err("[1]i915_gem_evict_for_node returned err=%d\n", err); @@ -331,6 +332,7 @@ static int igt_evict_vm(void *arg) { struct intel_gt *gt = arg; struct i915_ggtt *ggtt = gt->ggtt; + struct i915_gem_ww_ctx ww; LIST_HEAD(objects); int err;
@@ -342,7 +344,7 @@ static int igt_evict_vm(void *arg)
/* Everything is pinned, nothing should happen */ mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_vm(&ggtt->vm); + err = i915_gem_evict_vm(&ggtt->vm, NULL); mutex_unlock(&ggtt->vm.mutex); if (err) { pr_err("i915_gem_evict_vm on a full GGTT returned err=%d]\n", @@ -352,9 +354,14 @@ static int igt_evict_vm(void *arg)
unpin_ggtt(ggtt);
+ i915_gem_ww_ctx_init(&ww, false); mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_vm(&ggtt->vm); + err = i915_gem_evict_vm(&ggtt->vm, &ww); mutex_unlock(&ggtt->vm.mutex); + + /* no -EDEADLK handling; can't happen with vm.mutex in place */ + i915_gem_ww_ctx_fini(&ww); + if (err) { pr_err("i915_gem_evict_vm on a full GGTT returned err=%d]\n", err); @@ -402,7 +409,7 @@ static int igt_evict_contexts(void *arg) /* Reserve a block so that we know we have enough to fit a few rq */ memset(&hole, 0, sizeof(hole)); mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &hole, + err = i915_gem_gtt_insert(&ggtt->vm, NULL, &hole, PRETEND_GGTT_SIZE, 0, I915_COLOR_UNEVICTABLE, 0, ggtt->vm.total, PIN_NOEVICT); @@ -422,7 +429,7 @@ static int igt_evict_contexts(void *arg) goto out_locked; }
- if (i915_gem_gtt_insert(&ggtt->vm, &r->node, + if (i915_gem_gtt_insert(&ggtt->vm, NULL, &r->node, 1ul << 20, 0, I915_COLOR_UNEVICTABLE, 0, ggtt->vm.total, PIN_NOEVICT)) { diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 4574fb51b656..6b1db83119b3 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -1378,7 +1378,7 @@ static int igt_gtt_reserve(void *arg) }
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, + err = i915_gem_gtt_reserve(&ggtt->vm, NULL, &vma->node, obj->base.size, total, obj->cache_level, @@ -1430,7 +1430,7 @@ static int igt_gtt_reserve(void *arg) }
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, + err = i915_gem_gtt_reserve(&ggtt->vm, NULL, &vma->node, obj->base.size, total, obj->cache_level, @@ -1477,7 +1477,7 @@ static int igt_gtt_reserve(void *arg) I915_GTT_MIN_ALIGNMENT);
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, + err = i915_gem_gtt_reserve(&ggtt->vm, NULL, &vma->node, obj->base.size, offset, obj->cache_level, @@ -1552,7 +1552,7 @@ static int igt_gtt_insert(void *arg) /* Check a couple of obviously invalid requests */ for (ii = invalid_insert; ii->size; ii++) { mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &tmp, + err = i915_gem_gtt_insert(&ggtt->vm, NULL, &tmp, ii->size, ii->alignment, I915_COLOR_UNEVICTABLE, ii->start, ii->end, @@ -1594,7 +1594,7 @@ static int igt_gtt_insert(void *arg) }
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, + err = i915_gem_gtt_insert(&ggtt->vm, NULL, &vma->node, obj->base.size, 0, obj->cache_level, 0, ggtt->vm.total, 0); @@ -1654,7 +1654,7 @@ static int igt_gtt_insert(void *arg) }
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, + err = i915_gem_gtt_insert(&ggtt->vm, NULL, &vma->node, obj->base.size, 0, obj->cache_level, 0, ggtt->vm.total, 0); @@ -1703,7 +1703,7 @@ static int igt_gtt_insert(void *arg) }
mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, + err = i915_gem_gtt_insert(&ggtt->vm, NULL, &vma->node, obj->base.size, 0, obj->cache_level, 0, ggtt->vm.total, 0);
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Reviewed-by: Matthew Auld matthew.auld@intel.com
Now that we require locking to evict, multiple vmas from the same object might not be evicted. This is expected and required, because execbuf will move away from short-term pinning, relying on the object lock only. This will cause these tests to fail, because they create a ton of vmas for the same object.
Unbind manually to prevent spurious -ENOSPC in those mock tests.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/i915/selftests/i915_vma.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c index 1f10fe36619b..5c5809dfe9b2 100644 --- a/drivers/gpu/drm/i915/selftests/i915_vma.c +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c @@ -691,7 +691,11 @@ static int igt_vma_rotate_remap(void *arg) }
i915_vma_unpin(vma); - + err = i915_vma_unbind(vma); + if (err) { + pr_err("Unbinding returned %i\n", err); + goto out_object; + } cond_resched(); } } @@ -848,6 +852,11 @@ static int igt_vma_partial(void *arg)
i915_vma_unpin(vma); nvma++; + err = i915_vma_unbind(vma); + if (err) { + pr_err("Unbinding returned %i\n", err); + goto out_object; + }
cond_resched(); } @@ -882,6 +891,12 @@ static int igt_vma_partial(void *arg)
i915_vma_unpin(vma);
+ err = i915_vma_unbind(vma); + if (err) { + pr_err("Unbinding returned %i\n", err); + goto out_object; + } + count = 0; list_for_each_entry(vma, &obj->vma.list, obj_link) count++;
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Hmm, do we need this? It looks like we should be able to handle such scenarios, with already locked objects sharing the same dma-resv? Or is something else going on here?
On Wed, 8 Dec 2021 at 11:49, Matthew Auld matthew.william.auld@gmail.com wrote:
Oh, because trylock still "fails" in such cases? Do the later changes to evict_vm, where it is able to handle already locked objects, fix this? Do we not need similar treatment for things like evict_for_node?
i915_gem_execbuf will call i915_gem_evict_vm() after failing to pin all objects in the first round. We are about to remove those short-term pins, but even without those the objects are still locked. Add a special case to allow i915_gem_evict_vm to evict locked objects as well.
This might also allow multiple objects sharing the same resv to be evicted.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/i915/i915_gem_evict.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 24f5e3345e43..f502a617b35c 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -410,21 +410,42 @@ int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww) do { struct i915_vma *vma, *vn; LIST_HEAD(eviction_list); + LIST_HEAD(locked_eviction_list);
list_for_each_entry(vma, &vm->bound_list, vm_link) { if (i915_vma_is_pinned(vma)) continue;
+ /* + * If we already own the lock, trylock fails. In case the resv + * is shared among multiple objects, we still need the object ref. + */ + if (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == &ww->ctx)) { + __i915_vma_pin(vma); + list_add(&vma->evict_link, &locked_eviction_list); + continue; + } + if (!i915_gem_object_trylock(vma->obj, ww)) continue;
__i915_vma_pin(vma); list_add(&vma->evict_link, &eviction_list); } - if (list_empty(&eviction_list)) + if (list_empty(&eviction_list) && list_empty(&locked_eviction_list)) break;
ret = 0; + /* Unbind locked objects first, before unlocking the eviction_list */ + list_for_each_entry_safe(vma, vn, &locked_eviction_list, evict_link) { + __i915_vma_unpin(vma); + + if (ret == 0) + ret = __i915_vma_unbind(vma); + if (ret != -EINTR) /* "Get me out of here!" */ + ret = 0; + } + list_for_each_entry_safe(vma, vn, &eviction_list, evict_link) { __i915_vma_unpin(vma); if (ret == 0)
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Reviewed-by: Matthew Auld matthew.auld@intel.com
Do we need similar treatment for stuff like evict_for_node etc?
On 08-12-2021 13:07, Matthew Auld wrote:
Unfortunately not; we would risk evicting newly bound objects once we completely drop short-term pinning.
evict_vm is the exception, because you expect it to clean up the entire vm as much as possible, and it is called explicitly, not as a part of pinning. :)
Now that we cannot unbind kill the currently locked object directly because we're removing short term pinning, we may have to unbind the object from gtt manually, using a i915_gem_evict_vm() call.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 65fc6ff5f59d..6d557bb9926f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -357,8 +357,22 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf) vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags); }
- /* The entire mappable GGTT is pinned? Unexpected! */ - GEM_BUG_ON(vma == ERR_PTR(-ENOSPC)); + /* + * The entire mappable GGTT is pinned? Unexpected! + * Try to evict the object we locked too, as normally we skip it + * due to lack of short term pinning inside execbuf. + */ + if (vma == ERR_PTR(-ENOSPC)) { + ret = mutex_lock_interruptible(&ggtt->vm.mutex); + if (!ret) { + ret = i915_gem_evict_vm(&ggtt->vm, &ww); + mutex_unlock(&ggtt->vm.mutex); + } + if (ret) + goto err_reset; + vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags); + } + GEM_WARN_ON(vma == ERR_PTR(-ENOSPC)); } if (IS_ERR(vma)) { ret = PTR_ERR(vma);
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Now that we cannot unbind kill the currently locked object directly
Can this be reworded slightly? Not sure what is meant by "unbind kill" here.
Would it make sense to pass an extra flag for the above ggtt_pin (maybe PIN_EVICT_SHARED)? Such that evict_for_something can handle the already locked object, and then also any vma sharing the same dma-resv object here? Or at least, trying to nuke the entire vm just for the mappable portion seems maybe overkill? Or perhaps we never expect to hit this in the real world?
Reviewed-by: Matthew Auld matthew.auld@intel.com
On 09-12-2021 13:17, Matthew Auld wrote:
Oops, the word 'kill' doesn't belong here.
Yeah, the effect would be the same. When fully reworking eviction and the vm locks, it might be better to do so, though.
We want to remove more members of i915_vma, which requires the locking to be held more often.
Start requiring the gem object lock for i915_vma_unbind, as it's one of the callers that may unpin pages.
Some special care is needed when evicting, because the last reference to the object may be held by the VMA, so after __i915_vma_unbind, vma may be garbage, and we need to cache vma->obj before unlocking.
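As a rough sketch of that pattern (illustration only, not the literal patch; the helper name unbind_and_unlock is made up), the object pointer is cached before the unbind and the vma is never touched again afterwards:

/*
 * Sketch of the "cache vma->obj before unbinding" rule described above.
 * After __i915_vma_unbind() the vma may already be freed, so the object is
 * unlocked through the cached pointer and the vma is not dereferenced again.
 */
static int unbind_and_unlock(struct i915_vma *vma)
{
	struct drm_i915_gem_object *obj = vma->obj;
	int ret;

	lockdep_assert_held(&vma->vm->mutex);
	assert_object_held(obj);

	ret = __i915_vma_unbind(vma);
	/* vma may be garbage from here on; only use obj */

	i915_gem_object_unlock(obj);
	return ret;
}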
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/i915/display/intel_fb_pin.c | 2 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- .../i915/gem/selftests/i915_gem_client_blt.c | 2 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 6 +++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 45 ++++++++++++++++--- drivers/gpu/drm/i915/i915_gem.c | 2 + drivers/gpu/drm/i915/i915_vma.c | 27 ++++++++++- drivers/gpu/drm/i915/i915_vma.h | 1 + drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 22 ++++----- drivers/gpu/drm/i915/selftests/i915_vma.c | 8 ++-- 10 files changed, 92 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c index 3b20f69e0240..3aec972d9382 100644 --- a/drivers/gpu/drm/i915/display/intel_fb_pin.c +++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c @@ -47,7 +47,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb, goto err;
if (i915_vma_misplaced(vma, 0, alignment, 0)) { - ret = i915_vma_unbind(vma); + ret = i915_vma_unbind_unlocked(vma); if (ret) { vma = ERR_PTR(ret); goto err; diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index c69c7d45aabc..0db62652e3ba 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -647,7 +647,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) * pages. */ for (offset = 4096; offset < page_size; offset += 4096) { - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) goto out_unpin;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index 8402ed925a69..8fb5be799b3c 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -318,7 +318,7 @@ static int pin_buffer(struct i915_vma *vma, u64 addr) int err;
if (drm_mm_node_allocated(&vma->node) && vma->node.start != addr) { - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) return err; } diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 6d30cdfa80f3..e69e8861352d 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -165,7 +165,9 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, kunmap(p);
out: + i915_gem_object_lock(obj, NULL); __i915_vma_put(vma); + i915_gem_object_unlock(obj); return err; }
@@ -259,7 +261,9 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, if (err) return err;
+ i915_gem_object_lock(obj, NULL); __i915_vma_put(vma); + i915_gem_object_unlock(obj);
if (igt_timeout(end_time, "%s: timed out after tiling=%d stride=%d\n", @@ -1349,7 +1353,9 @@ static int __igt_mmap_revoke(struct drm_i915_private *i915, * for other objects. Ergo we have to revoke the previous mmap PTE * access as it no longer points to the same object. */ + i915_gem_object_lock(obj, NULL); err = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); + i915_gem_object_unlock(obj); if (err) { pr_err("Failed to unbind object!\n"); goto out_unmap; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 17e2e01b8d25..f9790f45a492 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -129,22 +129,47 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
+retry: + i915_gem_drain_freed_objects(vm->i915); + mutex_lock(&vm->mutex);
/* Skip rewriting PTE on VMA unbind. */ open = atomic_xchg(&vm->open, 0);
list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) { + struct drm_i915_gem_object *obj = vma->obj; + GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); + i915_vma_wait_for_bind(vma);
- if (i915_vma_is_pinned(vma)) + if (i915_vma_is_pinned(vma) || !i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) continue;
- if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) { - __i915_vma_evict(vma); - drm_mm_remove_node(&vma->node); + /* unlikely to race when GPU is idle, so no worry about slowpath.. */ + if (!i915_gem_object_trylock(obj, NULL)) { + atomic_set(&vm->open, open); + + i915_gem_object_get(obj); + mutex_unlock(&vm->mutex); + + i915_gem_object_lock(obj, NULL); + open = i915_vma_unbind(vma); + i915_gem_object_unlock(obj); + + GEM_WARN_ON(open); + + i915_gem_object_put(obj); + goto retry; } + + i915_vma_wait_for_bind(vma); + + __i915_vma_evict(vma); + drm_mm_remove_node(&vma->node); + + i915_gem_object_unlock(obj); }
vm->clear_range(vm, 0, vm->total); @@ -742,11 +767,21 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt) atomic_set(&ggtt->vm.open, 0);
flush_workqueue(ggtt->vm.i915->wq); + i915_gem_drain_freed_objects(ggtt->vm.i915);
mutex_lock(&ggtt->vm.mutex);
- list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) + list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) { + struct drm_i915_gem_object *obj = vma->obj; + bool trylock; + + trylock = i915_gem_object_trylock(obj, NULL); + WARN_ON(!trylock); + WARN_ON(__i915_vma_unbind(vma)); + if (trylock) + i915_gem_object_unlock(obj); + }
if (drm_mm_node_allocated(&ggtt->error_capture)) drm_mm_remove_node(&ggtt->error_capture); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e83522953cde..5ce38027e0fd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -118,6 +118,8 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, struct i915_vma *vma; int ret;
+ assert_object_held(obj); + if (list_empty(&obj->vma.list)) return 0;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 74b88b130cd5..7261d82162af 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1598,8 +1598,16 @@ void i915_vma_parked(struct intel_gt *gt) struct drm_i915_gem_object *obj = vma->obj; struct i915_address_space *vm = vma->vm;
- INIT_LIST_HEAD(&vma->closed_link); - __i915_vma_put(vma); + if (i915_gem_object_trylock(obj, NULL)) { + INIT_LIST_HEAD(&vma->closed_link); + __i915_vma_put(vma); + i915_gem_object_unlock(obj); + } else { + /* back you go.. */ + spin_lock_irq(>->closed_lock); + list_add(&vma->closed_link, >->closed_vma); + spin_unlock_irq(>->closed_lock); + }
i915_gem_object_put(obj); i915_vm_close(vm); @@ -1715,6 +1723,7 @@ int _i915_vma_move_to_active(struct i915_vma *vma, void __i915_vma_evict(struct i915_vma *vma) { GEM_BUG_ON(i915_vma_is_pinned(vma)); + assert_object_held_shared(vma->obj);
if (i915_vma_is_map_and_fenceable(vma)) { /* Force a pagefault for domain tracking on next user access */ @@ -1760,6 +1769,7 @@ int __i915_vma_unbind(struct i915_vma *vma) int ret;
lockdep_assert_held(&vma->vm->mutex); + assert_object_held_shared(vma->obj);
if (!drm_mm_node_allocated(&vma->node)) return 0; @@ -1791,6 +1801,8 @@ int i915_vma_unbind(struct i915_vma *vma) intel_wakeref_t wakeref = 0; int err;
+ assert_object_held_shared(vma->obj); + /* Optimistic wait before taking the mutex */ err = i915_vma_sync(vma); if (err) @@ -1821,6 +1833,17 @@ int i915_vma_unbind(struct i915_vma *vma) return err; }
+int i915_vma_unbind_unlocked(struct i915_vma *vma) +{ + int err; + + i915_gem_object_lock(vma->obj, NULL); + err = i915_vma_unbind(vma); + i915_gem_object_unlock(vma->obj); + + return err; +} + struct i915_vma *i915_vma_make_unshrinkable(struct i915_vma *vma) { i915_gem_object_make_unshrinkable(vma->obj); diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 32719431b3df..da69ecb1b860 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -214,6 +214,7 @@ void i915_vma_revoke_mmap(struct i915_vma *vma); void __i915_vma_evict(struct i915_vma *vma); int __i915_vma_unbind(struct i915_vma *vma); int __must_check i915_vma_unbind(struct i915_vma *vma); +int __must_check i915_vma_unbind_unlocked(struct i915_vma *vma); void i915_vma_unlink_ctx(struct i915_vma *vma); void i915_vma_close(struct i915_vma *vma); void i915_vma_reopen(struct i915_vma *vma); diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 6b1db83119b3..de16897d3d9a 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -385,7 +385,7 @@ static void close_object_list(struct list_head *objects,
vma = i915_vma_instance(obj, vm, NULL); if (!IS_ERR(vma)) - ignored = i915_vma_unbind(vma); + ignored = i915_vma_unbind_unlocked(vma);
list_del(&obj->st_link); i915_gem_object_put(obj); @@ -496,7 +496,7 @@ static int fill_hole(struct i915_address_space *vm, goto err; }
- err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("%s(%s) (forward) unbind of vma.node=%llx + %llx failed with err=%d\n", __func__, p->name, vma->node.start, vma->node.size, @@ -569,7 +569,7 @@ static int fill_hole(struct i915_address_space *vm, goto err; }
- err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("%s(%s) (backward) unbind of vma.node=%llx + %llx failed with err=%d\n", __func__, p->name, vma->node.start, vma->node.size, @@ -655,7 +655,7 @@ static int walk_hole(struct i915_address_space *vm, goto err_put; }
- err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("%s unbind failed at %llx + %llx with err=%d\n", __func__, addr, vma->size, err); @@ -732,13 +732,13 @@ static int pot_hole(struct i915_address_space *vm, pr_err("%s incorrect at %llx + %llx\n", __func__, addr, vma->size); i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); err = -EINVAL; goto err_obj; }
i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); GEM_BUG_ON(err); }
@@ -832,13 +832,13 @@ static int drunk_hole(struct i915_address_space *vm, pr_err("%s incorrect at %llx + %llx\n", __func__, addr, BIT_ULL(size)); i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); err = -EINVAL; goto err_obj; }
i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); GEM_BUG_ON(err);
if (igt_timeout(end_time, @@ -906,7 +906,7 @@ static int __shrink_hole(struct i915_address_space *vm, pr_err("%s incorrect at %llx + %llx\n", __func__, addr, size); i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); err = -EINVAL; break; } @@ -1465,7 +1465,7 @@ static int igt_gtt_reserve(void *arg) goto out; }
- err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("i915_vma_unbind failed with err=%d!\n", err); goto out; @@ -1647,7 +1647,7 @@ static int igt_gtt_insert(void *arg) GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); offset = vma->node.start;
- err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("i915_vma_unbind failed with err=%d!\n", err); goto out; diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c index 5c5809dfe9b2..2c73f7448df7 100644 --- a/drivers/gpu/drm/i915/selftests/i915_vma.c +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c @@ -340,7 +340,7 @@ static int igt_vma_pin1(void *arg)
if (!err) { i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("Failed to unbind single page from GGTT, err=%d\n", err); goto out; @@ -691,7 +691,7 @@ static int igt_vma_rotate_remap(void *arg) }
i915_vma_unpin(vma); - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("Unbinding returned %i\n", err); goto out_object; @@ -852,7 +852,7 @@ static int igt_vma_partial(void *arg)
i915_vma_unpin(vma); nvma++; - err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("Unbinding returned %i\n", err); goto out_object; @@ -891,7 +891,7 @@ static int igt_vma_partial(void *arg)
i915_vma_unpin(vma);
- err = i915_vma_unbind(vma); + err = i915_vma_unbind_unlocked(vma); if (err) { pr_err("Unbinding returned %i\n", err); goto out_object;
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
<snip>
Does this need a comment about barriers?
Should this not be kref_get_unless_zero? Assuming the vm->mutex is the only thing keeping the object alive here, won't this lead to potential uaf/double-free or something? Also should we not plonk this before the trylock? Or maybe I'm missing something here?
We also call wait_for_bind above; is that intentional?
On 09-12-2021 14:05, Matthew Auld wrote:
Not sure, it's guarded by vm->mutex.
Normally you'd be correct, but this path should only be run during s/r, at which point userspace should be dead, the GPU idle, and we just drained all freed objects above.
It would be a bug if we still found a dead object, as nothing should be running.
Should be harmless, but first one should probably be removed. :)
On Thu, 9 Dec 2021 at 13:25, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Hmm, Ok. So why do we expect the trylock to ever fail here? Who else can grab it at this stage?
On Thu, 9 Dec 2021 at 13:46, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
r-b with that then.
TTM already requires this, and we require it for delayed destroy.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 5fac9b560b73..39cd563544a5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -262,6 +262,8 @@ static void __i915_gem_object_free_mmaps(struct drm_i915_gem_object *obj) */ void __i915_gem_object_pages_fini(struct drm_i915_gem_object *obj) { + assert_object_held(obj); + if (!list_empty(&obj->vma.list)) { struct i915_vma *vma;
@@ -328,7 +330,10 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, obj->ops->delayed_free(obj); continue; } + + i915_gem_object_lock(obj, NULL); __i915_gem_object_pages_fini(obj); + i915_gem_object_unlock(obj); __i915_gem_free_object(obj);
/* But keep the pointer alive for RCU-protected lookups */
This duct-tape workaround is no longer required: unbind and destroy are fixed to take the obj->resv mutex before destroying, and obj->mm.lock has been removed, always requiring obj->resv as well.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_object.h | 14 -------------- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 12 ++++++------ drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +- drivers/gpu/drm/i915/i915_vma.c | 6 +++--- 5 files changed, 12 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 39cd563544a5..0ae86812ed66 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -544,7 +544,7 @@ bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj) #ifdef CONFIG_LOCKDEP if (IS_DGFX(to_i915(obj->base.dev)) && i915_gem_object_evictable((void __force *)obj)) - assert_object_held_shared(obj); + assert_object_held(obj); #endif return obj->mem_flags & I915_BO_FLAG_STRUCT_PAGE; } @@ -563,7 +563,7 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj) #ifdef CONFIG_LOCKDEP if (IS_DGFX(to_i915(obj->base.dev)) && i915_gem_object_evictable((void __force *)obj)) - assert_object_held_shared(obj); + assert_object_held(obj); #endif return obj->mem_flags & I915_BO_FLAG_IOMEM; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index f66d46882ea7..09cf1c92373a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -157,20 +157,6 @@ i915_gem_object_put(struct drm_i915_gem_object *obj)
#define assert_object_held(obj) dma_resv_assert_held((obj)->base.resv)
-/* - * If more than one potential simultaneous locker, assert held. - */ -static inline void assert_object_held_shared(const struct drm_i915_gem_object *obj) -{ - /* - * Note mm list lookup is protected by - * kref_get_unless_zero(). - */ - if (IS_ENABLED(CONFIG_LOCKDEP) && - kref_read(&obj->base.refcount) > 0) - assert_object_held(obj); -} - static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, struct i915_gem_ww_ctx *ww, bool intr) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 49c6e55c68ce..6805b46170be 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -19,7 +19,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, bool shrinkable; int i;
- assert_object_held_shared(obj); + assert_object_held(obj);
if (i915_gem_object_is_volatile(obj)) obj->mm.madv = I915_MADV_DONTNEED; @@ -95,7 +95,7 @@ int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj) struct drm_i915_private *i915 = to_i915(obj->base.dev); int err;
- assert_object_held_shared(obj); + assert_object_held(obj);
if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) { drm_dbg(&i915->drm, @@ -122,7 +122,7 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
assert_object_held(obj);
- assert_object_held_shared(obj); + assert_object_held(obj);
if (unlikely(!i915_gem_object_has_pages(obj))) { GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj)); @@ -171,7 +171,7 @@ int i915_gem_object_truncate(struct drm_i915_gem_object *obj) /* Try to discard unwanted pages */ void i915_gem_object_writeback(struct drm_i915_gem_object *obj) { - assert_object_held_shared(obj); + assert_object_held(obj); GEM_BUG_ON(i915_gem_object_has_pages(obj));
if (obj->ops->writeback) @@ -202,7 +202,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) { struct sg_table *pages;
- assert_object_held_shared(obj); + assert_object_held(obj);
pages = fetch_and_zero(&obj->mm.pages); if (IS_ERR_OR_NULL(pages)) @@ -233,7 +233,7 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) return -EBUSY;
/* May be called by shrinker from within get_pages() (on another bo) */ - assert_object_held_shared(obj); + assert_object_held(obj);
i915_gem_object_release_mmap_offset(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 3173c9f9a040..a315c010f635 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -109,7 +109,7 @@ static void i915_gem_object_userptr_drop_ref(struct drm_i915_gem_object *obj) { struct page **pvec = NULL;
- assert_object_held_shared(obj); + assert_object_held(obj);
if (!--obj->userptr.page_ref) { pvec = obj->userptr.pvec; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 7261d82162af..100739b3e084 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1723,7 +1723,7 @@ int _i915_vma_move_to_active(struct i915_vma *vma, void __i915_vma_evict(struct i915_vma *vma) { GEM_BUG_ON(i915_vma_is_pinned(vma)); - assert_object_held_shared(vma->obj); + assert_object_held(vma->obj);
if (i915_vma_is_map_and_fenceable(vma)) { /* Force a pagefault for domain tracking on next user access */ @@ -1769,7 +1769,7 @@ int __i915_vma_unbind(struct i915_vma *vma) int ret;
lockdep_assert_held(&vma->vm->mutex); - assert_object_held_shared(vma->obj); + assert_object_held(vma->obj);
if (!drm_mm_node_allocated(&vma->node)) return 0; @@ -1801,7 +1801,7 @@ int i915_vma_unbind(struct i915_vma *vma) intel_wakeref_t wakeref = 0; int err;
- assert_object_held_shared(vma->obj); + assert_object_held(vma->obj);
/* Optimistic wait before taking the mutex */ err = i915_vma_sync(vma);
On Mon, 29 Nov 2021 at 13:57, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Reviewed-by: Matthew Auld matthew.auld@intel.com
Now that we require the object lock for all ops, some code handling race conditions can be removed.
This is required to not take short-term pins inside execbuf.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Acked-by: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com --- drivers/gpu/drm/i915/i915_vma.c | 40 +++++---------------------------- 1 file changed, 5 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 100739b3e084..3236cf1c1400 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -776,7 +776,6 @@ i915_vma_detach(struct i915_vma *vma) static bool try_qad_pin(struct i915_vma *vma, unsigned int flags) { unsigned int bound; - bool pinned = true;
bound = atomic_read(&vma->flags); do { @@ -786,34 +785,10 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned int flags) if (unlikely(bound & (I915_VMA_OVERFLOW | I915_VMA_ERROR))) return false;
- if (!(bound & I915_VMA_PIN_MASK)) - goto unpinned; - GEM_BUG_ON(((bound + 1) & I915_VMA_PIN_MASK) == 0); } while (!atomic_try_cmpxchg(&vma->flags, &bound, bound + 1));
return true; - -unpinned: - /* - * If pin_count==0, but we are bound, check under the lock to avoid - * racing with a concurrent i915_vma_unbind(). - */ - mutex_lock(&vma->vm->mutex); - do { - if (unlikely(bound & (I915_VMA_OVERFLOW | I915_VMA_ERROR))) { - pinned = false; - break; - } - - if (unlikely(flags & ~bound)) { - pinned = false; - break; - } - } while (!atomic_try_cmpxchg(&vma->flags, &bound, bound + 1)); - mutex_unlock(&vma->vm->mutex); - - return pinned; }
static struct scatterlist * @@ -1194,13 +1169,7 @@ __i915_vma_get_pages(struct i915_vma *vma) vma->ggtt_view.type, ret); }
- pages = xchg(&vma->pages, pages); - - /* did we race against a put_pages? */ - if (pages && pages != vma->obj->mm.pages) { - sg_free_table(vma->pages); - kfree(vma->pages); - } + vma->pages = pages;
return ret; } @@ -1234,13 +1203,14 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma) static void __vma_put_pages(struct i915_vma *vma, unsigned int count) { /* We allocate under vma_get_pages, so beware the shrinker */ - struct sg_table *pages = READ_ONCE(vma->pages); + struct sg_table *pages = vma->pages;
GEM_BUG_ON(atomic_read(&vma->pages_count) < count);
if (atomic_sub_return(count, &vma->pages_count) == 0) { - if (pages == cmpxchg(&vma->pages, pages, NULL) && - pages != vma->obj->mm.pages) { + vma->pages = NULL; + + if (pages != vma->obj->mm.pages) { sg_free_table(pages); kfree(pages); }
Add a PIN_VALIDATE flag, to indicate that we don't need to pin and are only protected by the object lock.
This removes the need to unpin; just releasing the lock is enough.
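A minimal sketch of the intended caller pattern (obj, vma and ww are placeholders, not lifted from the patch):

	err = i915_gem_object_lock(obj, &ww);
	if (err)
		return err;

	/*
	 * PIN_VALIDATE: bind if needed, but take no pin reference; the
	 * object lock is what keeps the binding from being torn down.
	 */
	err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER | PIN_VALIDATE);
	if (err)
		return err;

	/* No matching i915_vma_unpin(); releasing the ww lock is enough. */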
eb_reserve() is slightly reworked for readability, but the same steps are still performed (a rough sketch of the control flow follows):
- The first pass pins with PIN_NONBLOCK.
- The second pass unbinds all objects first, then pins.
- The third pass is only run when not all objects are softpinned; it unbinds all objects, calls i915_gem_evict_vm(), then pins.
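Illustrative pseudocode of that loop (helper names mirror the patch below; this is a sketch, not the diff verbatim):

	for (pass = 0; pass <= 2; pass++) {
		int pin_flags = PIN_USER | PIN_VALIDATE;

		if (pass == 0)
			pin_flags |= PIN_NONBLOCK;	/* opportunistic first pass */

		if (pass >= 1)
			eb_unbind(eb);			/* drop everything that did not fit */

		if (pass == 2) {
			err = eb_evict_vm(eb);		/* last resort: evict the whole vm */
			if (err)
				return err;
		}

		list_for_each_entry(ev, &eb->unbound, bind_link) {
			err = eb_reserve_vma(eb, ev, pin_flags);
			if (err)
				break;
		}

		if (err != -ENOSPC)
			break;
	}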
When evicting the entire vm in eb_reserve() we do temporarily pin objects that are marked with EXEC_OBJECT_PINNED. This is because they are already at their destination, and i915_gem_evict_vm() would otherwise unbind them.
However, we reduce the visibility of those pins by limiting them to the i915_gem_evict_vm() call only, taken with vm->mutex held, rather than for the entire duration of the execbuf.
Not sure the latter matters, but one can hope. In theory we could kill the pinning by adding an extra flag to the vma that temporarily prevents unbinding during i915_gem_evict_vm() only, but I think that might be overkill. We're still holding the object lock, and we don't have blocking eviction yet. It's likely sufficient to simply enforce EXEC_OBJECT_PINNED for all objects on gen12 and newer.
Changes since v1:
- Split out eb_reserve() into separate functions for readability.
Changes since v2:
- Make the batch buffer mappable on platforms where only GGTT is available, to prevent moving the batch buffer during relocations.
Changes since v3:
- Preserve current behavior for the batch buffer; instead be cautious when calling i915_gem_object_ggtt_pin_ww, and re-use the current batch vma if it's inside ggtt and map-and-fenceable.
- Remove impossible condition check from eb_reserve. (Matt)
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 250 ++++++++++-------- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 1 - drivers/gpu/drm/i915/i915_gem_gtt.h | 1 + drivers/gpu/drm/i915/i915_vma.c | 24 +- 4 files changed, 158 insertions(+), 118 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 91ab43d67f47..b6a50631c21b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -436,7 +436,7 @@ eb_pin_vma(struct i915_execbuffer *eb, else pin_flags = entry->offset & PIN_OFFSET_MASK;
- pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED; + pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED | PIN_VALIDATE; if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_GTT)) pin_flags |= PIN_GLOBAL;
@@ -454,17 +454,15 @@ eb_pin_vma(struct i915_execbuffer *eb, entry->pad_to_size, entry->alignment, eb_pin_flags(entry, ev->flags) | - PIN_USER | PIN_NOEVICT); + PIN_USER | PIN_NOEVICT | PIN_VALIDATE); if (unlikely(err)) return err; }
if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) { err = i915_vma_pin_fence(vma); - if (unlikely(err)) { - i915_vma_unpin(vma); + if (unlikely(err)) return err; - }
if (vma->fence) ev->flags |= __EXEC_OBJECT_HAS_FENCE; @@ -480,13 +478,9 @@ eb_pin_vma(struct i915_execbuffer *eb, static inline void eb_unreserve_vma(struct eb_vma *ev) { - if (!(ev->flags & __EXEC_OBJECT_HAS_PIN)) - return; - if (unlikely(ev->flags & __EXEC_OBJECT_HAS_FENCE)) __i915_vma_unpin_fence(ev->vma);
- __i915_vma_unpin(ev->vma); ev->flags &= ~__EXEC_OBJECT_RESERVED; }
@@ -679,10 +673,8 @@ static int eb_reserve_vma(struct i915_execbuffer *eb,
if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) { err = i915_vma_pin_fence(vma); - if (unlikely(err)) { - i915_vma_unpin(vma); + if (unlikely(err)) return err; - }
if (vma->fence) ev->flags |= __EXEC_OBJECT_HAS_FENCE; @@ -694,85 +686,125 @@ static int eb_reserve_vma(struct i915_execbuffer *eb, return 0; }
-static int eb_reserve(struct i915_execbuffer *eb) +static int eb_evict_vm(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; - unsigned int pin_flags = PIN_USER | PIN_NONBLOCK; + unsigned int i; + int err; + + err = mutex_lock_interruptible(&eb->context->vm->mutex); + if (err) + return err; + + /* pin to protect against i915_gem_evict_vm evicting below */ + for (i = 0; i < count; i++) { + struct eb_vma *ev = &eb->vma[i]; + + if (ev->flags & __EXEC_OBJECT_HAS_PIN) + __i915_vma_pin(ev->vma); + } + + /* Too fragmented, unbind everything and retry */ + err = i915_gem_evict_vm(eb->context->vm, &eb->ww); + + /* unpin objects.. */ + for (i = 0; i < count; i++) { + struct eb_vma *ev = &eb->vma[i]; + + if (ev->flags & __EXEC_OBJECT_HAS_PIN) + i915_vma_unpin(ev->vma); + } + + mutex_unlock(&eb->context->vm->mutex); + + return err; +} + +static bool eb_unbind(struct i915_execbuffer *eb) +{ + const unsigned int count = eb->buffer_count; + unsigned int i; struct list_head last; + bool unpinned = false; + + /* Resort *all* the objects into priority order */ + INIT_LIST_HEAD(&eb->unbound); + INIT_LIST_HEAD(&last); + + for (i = 0; i < count; i++) { + struct eb_vma *ev = &eb->vma[i]; + unsigned int flags = ev->flags; + + if (flags & EXEC_OBJECT_PINNED && + flags & __EXEC_OBJECT_HAS_PIN) + continue; + + unpinned = true; + eb_unreserve_vma(ev); + + if (flags & EXEC_OBJECT_PINNED) + /* Pinned must have their slot */ + list_add(&ev->bind_link, &eb->unbound); + else if (flags & __EXEC_OBJECT_NEEDS_MAP) + /* Map require the lowest 256MiB (aperture) */ + list_add_tail(&ev->bind_link, &eb->unbound); + else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)) + /* Prioritise 4GiB region for restricted bo */ + list_add(&ev->bind_link, &last); + else + list_add_tail(&ev->bind_link, &last); + } + + list_splice_tail(&last, &eb->unbound); + return unpinned; +} + +static int eb_reserve(struct i915_execbuffer *eb) +{ struct eb_vma *ev; - unsigned int i, pass; + unsigned int pass; int err = 0; + bool unpinned;
/* * Attempt to pin all of the buffers into the GTT. - * This is done in 3 phases: + * This is done in 2 phases: * - * 1a. Unbind all objects that do not match the GTT constraints for - * the execbuffer (fenceable, mappable, alignment etc). - * 1b. Increment pin count for already bound objects. - * 2. Bind new objects. - * 3. Decrement pin count. + * 1. Unbind all objects that do not match the GTT constraints for + * the execbuffer (fenceable, mappable, alignment etc). + * 2. Bind new objects. * * This avoid unnecessary unbinding of later objects in order to make * room for the earlier objects *unless* we need to defragment. + * + * Defragmenting is skipped if all objects are pinned at a fixed location. */ - pass = 0; - do { - list_for_each_entry(ev, &eb->unbound, bind_link) { - err = eb_reserve_vma(eb, ev, pin_flags); - if (err) - break; - } - if (err != -ENOSPC) - return err; + for (pass = 0; pass <= 2; pass++) { + int pin_flags = PIN_USER | PIN_VALIDATE;
- /* Resort *all* the objects into priority order */ - INIT_LIST_HEAD(&eb->unbound); - INIT_LIST_HEAD(&last); - for (i = 0; i < count; i++) { - unsigned int flags; + if (pass == 0) + pin_flags |= PIN_NONBLOCK;
- ev = &eb->vma[i]; - flags = ev->flags; - if (flags & EXEC_OBJECT_PINNED && - flags & __EXEC_OBJECT_HAS_PIN) - continue; + if (pass >= 1) + unpinned = eb_unbind(eb);
- eb_unreserve_vma(ev); - - if (flags & EXEC_OBJECT_PINNED) - /* Pinned must have their slot */ - list_add(&ev->bind_link, &eb->unbound); - else if (flags & __EXEC_OBJECT_NEEDS_MAP) - /* Map require the lowest 256MiB (aperture) */ - list_add_tail(&ev->bind_link, &eb->unbound); - else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)) - /* Prioritise 4GiB region for restricted bo */ - list_add(&ev->bind_link, &last); - else - list_add_tail(&ev->bind_link, &last); - } - list_splice_tail(&last, &eb->unbound); - - switch (pass++) { - case 0: - break; - - case 1: - /* Too fragmented, unbind everything and retry */ - mutex_lock(&eb->context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm, &eb->ww); - mutex_unlock(&eb->context->vm->mutex); + if (pass == 2) { + err = eb_evict_vm(eb); if (err) return err; - break; + }
- default: - return -ENOSPC; + list_for_each_entry(ev, &eb->unbound, bind_link) { + err = eb_reserve_vma(eb, ev, pin_flags); + if (err) + break; }
- pin_flags = PIN_USER; - } while (1); + if (err != -ENOSPC) + break; + } + + return err; }
static int eb_select_context(struct i915_execbuffer *eb) @@ -1178,10 +1210,11 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj, return vaddr; }
-static void *reloc_iomap(struct drm_i915_gem_object *obj, +static void *reloc_iomap(struct i915_vma *batch, struct i915_execbuffer *eb, unsigned long page) { + struct drm_i915_gem_object *obj = batch->obj; struct reloc_cache *cache = &eb->reloc_cache; struct i915_ggtt *ggtt = cache_to_ggtt(cache); unsigned long offset; @@ -1191,7 +1224,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(ggtt->vm.gt); io_mapping_unmap_atomic((void __force __iomem *) unmask_page(cache->vaddr)); } else { - struct i915_vma *vma; + struct i915_vma *vma = ERR_PTR(-ENODEV); int err;
if (i915_gem_object_is_tiled(obj)) @@ -1204,10 +1237,23 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj, if (err) return ERR_PTR(err);
- vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0, - PIN_MAPPABLE | - PIN_NONBLOCK /* NOWARN */ | - PIN_NOEVICT); + /* + * i915_gem_object_ggtt_pin_ww may attempt to remove the batch + * VMA from the object list because we no longer pin. + * + * Only attempt to pin the batch buffer to ggtt if the current batch + * is not inside ggtt, or the batch buffer is not misplaced. + */ + if (!i915_is_ggtt(batch->vm)) { + vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0, + PIN_MAPPABLE | + PIN_NONBLOCK /* NOWARN */ | + PIN_NOEVICT); + } else if (i915_vma_is_map_and_fenceable(batch)) { + __i915_vma_pin(batch); + vma = batch; + } + if (vma == ERR_PTR(-EDEADLK)) return vma;
@@ -1245,7 +1291,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj, return vaddr; }
-static void *reloc_vaddr(struct drm_i915_gem_object *obj, +static void *reloc_vaddr(struct i915_vma *vma, struct i915_execbuffer *eb, unsigned long page) { @@ -1257,9 +1303,9 @@ static void *reloc_vaddr(struct drm_i915_gem_object *obj, } else { vaddr = NULL; if ((cache->vaddr & KMAP) == 0) - vaddr = reloc_iomap(obj, eb, page); + vaddr = reloc_iomap(vma, eb, page); if (!vaddr) - vaddr = reloc_kmap(obj, cache, page); + vaddr = reloc_kmap(vma->obj, cache, page); }
return vaddr; @@ -1300,7 +1346,7 @@ relocate_entry(struct i915_vma *vma, void *vaddr;
repeat: - vaddr = reloc_vaddr(vma->obj, eb, + vaddr = reloc_vaddr(vma, eb, offset >> PAGE_SHIFT); if (IS_ERR(vaddr)) return PTR_ERR(vaddr); @@ -2076,7 +2122,7 @@ shadow_batch_pin(struct i915_execbuffer *eb, if (IS_ERR(vma)) return vma;
- err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, flags); + err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, flags | PIN_VALIDATE); if (err) return ERR_PTR(err);
@@ -2090,7 +2136,7 @@ static struct i915_vma *eb_dispatch_secure(struct i915_execbuffer *eb, struct i9 * batch" bit. Hence we need to pin secure batches into the global gtt. * hsw should have this fixed, but bdw mucks it up again. */ if (eb->batch_flags & I915_DISPATCH_SECURE) - return i915_gem_object_ggtt_pin_ww(vma->obj, &eb->ww, NULL, 0, 0, 0); + return i915_gem_object_ggtt_pin_ww(vma->obj, &eb->ww, NULL, 0, 0, PIN_VALIDATE);
return NULL; } @@ -2141,13 +2187,12 @@ static int eb_parse(struct i915_execbuffer *eb)
err = i915_gem_object_lock(pool->obj, &eb->ww); if (err) - goto err; + return err;
shadow = shadow_batch_pin(eb, pool->obj, eb->context->vm, PIN_USER); - if (IS_ERR(shadow)) { - err = PTR_ERR(shadow); - goto err; - } + if (IS_ERR(shadow)) + return PTR_ERR(shadow); + intel_gt_buffer_pool_mark_used(pool); i915_gem_object_set_readonly(shadow->obj); shadow->private = pool; @@ -2159,25 +2204,21 @@ static int eb_parse(struct i915_execbuffer *eb) shadow = shadow_batch_pin(eb, pool->obj, &eb->gt->ggtt->vm, PIN_GLOBAL); - if (IS_ERR(shadow)) { - err = PTR_ERR(shadow); - shadow = trampoline; - goto err_shadow; - } + if (IS_ERR(shadow)) + return PTR_ERR(shadow); + shadow->private = pool;
eb->batch_flags |= I915_DISPATCH_SECURE; }
batch = eb_dispatch_secure(eb, shadow); - if (IS_ERR(batch)) { - err = PTR_ERR(batch); - goto err_trampoline; - } + if (IS_ERR(batch)) + return PTR_ERR(batch);
err = dma_resv_reserve_shared(shadow->obj->base.resv, 1); if (err) - goto err_trampoline; + return err;
err = intel_engine_cmd_parser(eb->context->engine, eb->batches[0]->vma, @@ -2185,7 +2226,7 @@ static int eb_parse(struct i915_execbuffer *eb) eb->batch_len[0], shadow, trampoline); if (err) - goto err_unpin_batch; + return err;
eb->batches[0] = &eb->vma[eb->buffer_count++]; eb->batches[0]->vma = i915_vma_get(shadow); @@ -2204,17 +2245,6 @@ static int eb_parse(struct i915_execbuffer *eb) eb->batches[0]->vma = i915_vma_get(batch); } return 0; - -err_unpin_batch: - if (batch) - i915_vma_unpin(batch); -err_trampoline: - if (trampoline) - i915_vma_unpin(trampoline); -err_shadow: - i915_vma_unpin(shadow); -err: - return err; }
static int eb_request_submit(struct i915_execbuffer *eb, @@ -3330,8 +3360,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
err_vma: eb_release_vmas(&eb, true); - if (eb.trampoline) - i915_vma_unpin(eb.trampoline); WARN_ON(err == -EDEADLK); i915_gem_ww_ctx_fini(&eb.ww);
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index f8948de72036..bbcd0522f68e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -425,7 +425,6 @@ int i915_vma_pin_fence(struct i915_vma *vma) * must keep the device awake whilst using the fence. */ assert_rpm_wakelock_held(vma->vm->gt->uncore->rpm); - GEM_BUG_ON(!i915_vma_is_pinned(vma)); GEM_BUG_ON(!i915_vma_is_ggtt(vma));
err = mutex_lock_interruptible(&vma->vm->mutex); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index e4938aba3fe9..8c2f57eb5dda 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -44,6 +44,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, #define PIN_HIGH BIT_ULL(5) #define PIN_OFFSET_BIAS BIT_ULL(6) #define PIN_OFFSET_FIXED BIT_ULL(7) +#define PIN_VALIDATE BIT_ULL(8) /* validate placement only, no need to call unpin() */
#define PIN_GLOBAL BIT_ULL(10) /* I915_VMA_GLOBAL_BIND */ #define PIN_USER BIT_ULL(11) /* I915_VMA_LOCAL_BIND */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 3236cf1c1400..c4b7c0b203df 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -778,6 +778,15 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned int flags) unsigned int bound;
bound = atomic_read(&vma->flags); + + if (flags & PIN_VALIDATE) { + flags &= I915_VMA_BIND_MASK; + + return (flags & bound) == flags; + } + + /* with the lock mandatory for unbind, we don't race here */ + flags &= I915_VMA_BIND_MASK; do { if (unlikely(flags & ~bound)) return false; @@ -1259,7 +1268,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, GEM_BUG_ON(!(flags & (PIN_USER | PIN_GLOBAL)));
/* First try and grab the pin without rebinding the vma */ - if (try_qad_pin(vma, flags & I915_VMA_BIND_MASK)) + if (try_qad_pin(vma, flags)) return 0;
err = i915_vma_get_pages(vma); @@ -1341,7 +1350,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, }
if (unlikely(!(flags & ~bound & I915_VMA_BIND_MASK))) { - __i915_vma_pin(vma); + if (!(flags & PIN_VALIDATE)) + __i915_vma_pin(vma); goto err_unlock; }
@@ -1370,8 +1380,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count); list_move_tail(&vma->vm_link, &vma->vm->bound_list);
- __i915_vma_pin(vma); - GEM_BUG_ON(!i915_vma_is_pinned(vma)); + if (!(flags & PIN_VALIDATE)) { + __i915_vma_pin(vma); + GEM_BUG_ON(!i915_vma_is_pinned(vma)); + } GEM_BUG_ON(!i915_vma_is_bound(vma, flags)); GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
@@ -1628,8 +1640,6 @@ static int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request * { int err;
- GEM_BUG_ON(!i915_vma_is_pinned(vma)); - /* Wait for the vma to be bound before we start! */ err = __i915_request_await_bind(rq, vma); if (err) @@ -1648,6 +1658,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
assert_object_held(obj);
+ GEM_BUG_ON(!vma->pages); + err = __i915_vma_move_to_active(vma, rq); if (unlikely(err)) return err;
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
Reviewed-by: Matthew Auld matthew.auld@intel.com
Hi,
On 29/11/2021 13:47, Maarten Lankhorst wrote:
New version of the series, with feedback from previous series added.
If there was a cover letter sent for this work in the past, could you please keep attaching it? Or if there wasn't, could you please write one?
I am worried about two things. The first is that we need to have a high-level overview of the rules/design changes documented, so that third-party people have any hope of getting code right after this lands. (Where we are, where we are going, how we will get there, how far we have got, and when we will get to the end.)
The second is that when parts of the series land piecemeal (which they have in this case, right?), it gets very hard to write up a maintainer-level changelog.
But in any case, even at the mundane process level, the conclusion from some time ago was that we need cover letters for any non-trivial work.
Regards,
Tvrtko
On 30-11-2021 09:54, Tvrtko Ursulin wrote:
The preparation part is to ensure we always hold vma->obj->resv when unbinding.
The first preparation series ensured vma->obj always existed. This was not the case for mock gtt and gen6 aliasing gtt. This allowed us to remove all the special handling for those uncommon cases, and actually enforce that we can always take that lock. This part is merged.
Patches 2-11 in this series add the vma->obj->resv lock to eviction and the shrinker. Those are the only parts where we don't take the lock yet.
After that, we always hold the lock when required, and we can start requiring the obj->resv lock when unbinding. This is completed in patch 15.
With that fixed, short-term pins can be removed: unbind now always takes obj->resv, so holding obj->resv during execbuf submission is sufficient to keep bindings in place, and all short-term pinning can be dropped.
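A sketch of the invariant (illustrative, not lifted verbatim from the patches): the unbind side asserts the object lock, so any path holding that same lock, like execbuf through its ww context, knows the binding cannot disappear underneath it.

	/* unbind side (after patch 15), from __i915_vma_evict()/__i915_vma_unbind(): */
	assert_object_held(vma->obj);

	/*
	 * execbuf side: the ww context already holds every object's resv,
	 * so no short-term pin is needed to keep the vma bound while the
	 * request is built and submitted.
	 */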
We only pin temporarily when calling i915_gem_evict_vm in execbuf, which in theory could also be handled by just marking all objects as unpinned.
As a bonus, using TTM for delayed eviction on all objects becomes easy; we just need to get rid of i915_active in i915_vma, as it keeps the object refcount alive.
The remainder is removing the refcount from i915_vma, to make it a real
But in any case, even at the mundane process level, the conclusion from some time ago was that we need cover letters for any non-trivial work.
Here you go! I hope it explains the reasoning.
~Maarten
On 30/11/2021 11:17, Maarten Lankhorst wrote:
Sounds good. But please also mention in the introduction the high-level motivation for why we always want to hold vma->obj->resv when unbinding.
I'd also like the cover letter to contain a high-level description of _why_ removing short-term pins is needed or beneficial.
What were the flow and object lifetimes so far, and what will they be going forward, etc.
Sounds like it's on the right track, maybe with a bit more text so that readers can easily understand it at a higher level.
But in any case, even at the mundane process level, the conclusion from some time ago was that we need cover letters for any non-trivial work.
Here you go! I hope it explains the reasoning.
It is on the right track. I think it just needs to be expanded a bit with the high-level direction and plan, pointing out where this series sits in the grand scheme. And then don't forget to add the improved text as the cover letter when sending next time, please.
Regards,
Tvrtko
On 30-11-2021 19:38, Tvrtko Ursulin wrote:
Previously, short-term pinning in execbuf was required because i915_vma was effectively independent from objects, and had its own refcount, locking, lifetime rules and pinning. This series removes the separate locking by requiring vma->obj->resv to be held when pinning and unbinding. This will also be required for the VM_BIND work. With the lock required for both pinning and unbinding, holding it is enough to prevent unbinding while trying to pin, like in execbuf. This makes binding/unbinding similar to ttm_bo_validate()'s use, which just cares that an object is in a certain place, without pinning it there.
Making it part of the gem bo removes a lot of the vma refcounting, and makes i915_vma more a part of the bo, instead of its own floating object that just happens to be associated with a bo. This is also required to make it more compatible with TTM, and migration in general.
For future work, it makes things a lot simpler and clearer. We want to end up with i915_vma just being a specific mapping of the BO, as is the case in other drivers. Removing i915_vma->active is the next step there; it means that when the object is destroyed, the bindings are destroyed (after idle), instead of the object being destroyed when the bindings are idle.
With the obj->resv being the master lock, pinning for execbuf becomes obsolete and can be removed.
On 01/12/2021 11:15, Maarten Lankhorst wrote:
Excellent, that's exactly the level needed for the cover letter. So that as the intro, plus some text about the series details, like which part of the overall plan it implements, and it will be good.
Out of personal curiosity about the long-term plans: what will the flow be on request retirement in terms of unbinding?
Regards,
Tvrtko