On 09-12-2021 14:40, Matthew Auld wrote:
On Thu, 9 Dec 2021 at 13:25, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
On 09-12-2021 14:05, Matthew Auld wrote:
On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst maarten.lankhorst@linux.intel.com wrote:
We want to remove more members of i915_vma, which requires the locking to be held more often.
Start requiring gem object lock for i915_vma_unbind, as it's one of the callers that may unpin pages.
Some special care is needed when evicting, because the last reference to the object may be held by the VMA, so after __i915_vma_unbind, vma may be garbage, and we need to cache vma->obj before unlocking.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com
<snip>
@@ -129,22 +129,47 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
+retry:
i915_gem_drain_freed_objects(vm->i915);
mutex_lock(&vm->mutex); /* Skip rewriting PTE on VMA unbind. */ open = atomic_xchg(&vm->open, 0); list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) {
struct drm_i915_gem_object *obj = vma->obj;
GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
i915_vma_wait_for_bind(vma);
if (i915_vma_is_pinned(vma))
if (i915_vma_is_pinned(vma) || !i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) continue;
if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {
__i915_vma_evict(vma);
drm_mm_remove_node(&vma->node);
/* unlikely to race when GPU is idle, so no worry about slowpath.. */
if (!i915_gem_object_trylock(obj, NULL)) {
atomic_set(&vm->open, open);
Does this need a comment about barriers?
Not sure, it's guarded by vm->mutex.
i915_gem_object_get(obj);
Should this not be kref_get_unless_zero? Assuming the vm->mutex is the only thing keeping the object alive here, won't this lead to potential uaf/double-free or something? Also should we not plonk this before the trylock? Or maybe I'm missing something here?
Normally you're correct, this is normally the case, but we drain freed objects and this path should only be run during s/r, at which point userspace should be dead, GPU idle, and we just drained all freed objects above.
It would be a bug if we still found a dead object, as nothing should be running.
Hmm, Ok. So why do we expect the trylock to ever fail here? Who else can grab it at this stage?
It probably shouldn't, should probably be a WARN if it happens.
mutex_unlock(&vm->mutex);
i915_gem_object_lock(obj, NULL);
open = i915_vma_unbind(vma);
i915_gem_object_unlock(obj);
GEM_WARN_ON(open);
i915_gem_object_put(obj);
goto retry; }
i915_vma_wait_for_bind(vma);
We also call wait_for_bind above, is that intentional?
Should be harmless, but first one should probably be removed. :)