gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference.
This tripped up Jason when reimplementing the single timeline feature in
commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d Author: Jason Ekstrand jason@jlekstrand.net Date: Thu Jul 8 10:48:12 2021 -0500
drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
We could fix the bug by holding ctx->mutex, but it's cleaner to just make the context object actually invariant over its _entire_ lifetime.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand jason@jlekstrand.net Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: "Thomas Hellström" thomas.hellstrom@intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Dave Airlie airlied@redhat.com --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 754b9b8d4981..93ba0197d70a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) trace_i915_context_free(ctx); GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
+ if (ctx->syncobj) + drm_syncobj_put(ctx->syncobj); + mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex);
@@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm);
- if (ctx->syncobj) - drm_syncobj_put(ctx->syncobj); - ctx->file_priv = ERR_PTR(-EBADF);
/*
On August 6, 2021 15:18:59 Daniel Vetter daniel.vetter@ffwll.ch wrote:
gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference.
This tripped up Jason when reimplementing the single timeline feature in
commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d Author: Jason Ekstrand jason@jlekstrand.net Date: Thu Jul 8 10:48:12 2021 -0500
drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
We could fix the bug by holding ctx->mutex, but it's cleaner to just
What bug is this fixing, exactly?
--Jason
make the context object actually invariant over its _entire_ lifetime.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand jason@jlekstrand.net Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: "Thomas Hellström" thomas.hellstrom@intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Dave Airlie airlied@redhat.com
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 754b9b8d4981..93ba0197d70a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) trace_i915_context_free(ctx); GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
- if (ctx->syncobj)
- drm_syncobj_put(ctx->syncobj);
mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex);
@@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm);
- if (ctx->syncobj)
- drm_syncobj_put(ctx->syncobj);
ctx->file_priv = ERR_PTR(-EBADF);
/*
2.32.0
On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand jason@jlekstrand.net wrote:
On August 6, 2021 15:18:59 Daniel Vetter daniel.vetter@ffwll.ch wrote:
gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference.
This tripped up Jason when reimplementing the single timeline feature in
commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d Author: Jason Ekstrand jason@jlekstrand.net Date: Thu Jul 8 10:48:12 2021 -0500
drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
We could fix the bug by holding ctx->mutex, but it's cleaner to just
What bug is this fixing, exactly?
Oh lol I was all busy ranting and not explaining :-) I found it while auditing other context stuff, so that other patch has the longer commit message with more history, but that patch is also now tied into the vm-dercuify, so short summary: You put the syncobj in context close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final kref_put. Which means you're open to a use-after-free if you race against an execbuf. ctx->vm is equally broken (but for other ioctl), once this fix here is merged I send out the ctx->vm fix because that's tied into the vm-dercuify now due to conflicts. -Daniel
--Jason
make the context object actually invariant over its _entire_ lifetime.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand jason@jlekstrand.net Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: "Thomas Hellström" thomas.hellstrom@intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Dave Airlie airlied@redhat.com
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 754b9b8d4981..93ba0197d70a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) trace_i915_context_free(ctx); GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
- if (ctx->syncobj)
- drm_syncobj_put(ctx->syncobj);
- mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex);
@@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm);
if (ctx->syncobj)
drm_syncobj_put(ctx->syncobj);
ctx->file_priv = ERR_PTR(-EBADF);
/*
-- 2.32.0
On Mon, Aug 09, 2021 at 08:47:14PM +0200, Daniel Vetter wrote:
On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand jason@jlekstrand.net wrote:
On August 6, 2021 15:18:59 Daniel Vetter daniel.vetter@ffwll.ch wrote:
gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference.
This tripped up Jason when reimplementing the single timeline feature in
commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d Author: Jason Ekstrand jason@jlekstrand.net Date: Thu Jul 8 10:48:12 2021 -0500
drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
We could fix the bug by holding ctx->mutex, but it's cleaner to just
What bug is this fixing, exactly?
Oh lol I was all busy ranting and not explaining :-) I found it while auditing other context stuff, so that other patch has the longer commit message with more history, but that patch is also now tied into the vm-dercuify, so short summary: You put the syncobj in context close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final kref_put. Which means you're open to a use-after-free if you race against an execbuf. ctx->vm is equally broken (but for other ioctl), once this fix here is merged I send out the ctx->vm fix because that's tied into the vm-dercuify now due to conflicts.
CI caught more, so just explaining what I'm fixing here isn't going to be enough.
The troubel is that drm_syncobj_put is now called from very awkward places, and I need to see whether we can fix that. Or whether we need more work_struct (like we have already for i915_address_space) for the final release. -Daniel
-Daniel
--Jason
make the context object actually invariant over its _entire_ lifetime.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand jason@jlekstrand.net Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: "Thomas Hellström" thomas.hellstrom@intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Dave Airlie airlied@redhat.com
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 754b9b8d4981..93ba0197d70a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) trace_i915_context_free(ctx); GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
- if (ctx->syncobj)
- drm_syncobj_put(ctx->syncobj);
- mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex);
@@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm);
if (ctx->syncobj)
drm_syncobj_put(ctx->syncobj);
ctx->file_priv = ERR_PTR(-EBADF);
/*
-- 2.32.0
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
dri-devel@lists.freedesktop.org