Full audit of everyone:
- i915, radeon, amdgpu should be clean per their maintainers.
- vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
- panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
- v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
- vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too.
- virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
- qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
- A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/dma-buf/dma-resv.c | 42 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..29988b1564c1 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/** * DOC: Reservation Object Overview @@ -95,6 +96,28 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +struct lockdep_work { + struct work_struct work; + struct dma_resv obj; + struct mm_struct *mm; +} lockdep_work; + +void lockdep_work_fn(struct work_struct *work) +{ + dma_resv_init(&lockdep_work.obj); + + down_read(&lockdep_work.mm->mmap_sem); + ww_mutex_lock(&lockdep_work.obj.lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&lockdep_work.obj.lock); + up_read(&lockdep_work.mm->mmap_sem); + + mmput(lockdep_work.mm); +} +#endif + /** * dma_resv_init - initialize a reservation object * @obj: the reservation object @@ -107,6 +130,25 @@ void dma_resv_init(struct dma_resv *obj) &reservation_seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); + +#if IS_ENABLED(CONFIG_LOCKDEP) + if (current->mm) { + static atomic_t lockdep_primed; + + /* + * This gets called from all kinds of places, launch a worker. + * Usual init sections don't work for kernel threads lack an + * ->mm. + */ + if (atomic_cmpxchg(&lockdep_primed, 0, 1) == 0) { + INIT_WORK(&lockdep_work.work, lockdep_work_fn); + lockdep_work.mm = current->mm; + mmget(lockdep_work.mm); + + schedule_work(&lockdep_work.work); + } + } +#endif } EXPORT_SYMBOL(dma_resv_init);
We can't copy_*_user while holding reservations, that will (soon even for nouveau) lead to deadlocks. And it breaks the cross-driver contract around dma_resv.
Fix this by adding a slowpath for when we need relocations, and by pushing the writeback of the new presumed offsets to the very end.
Aside from "it compiles" entirely untested unfortunately.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Ben Skeggs bskeggs@redhat.com Cc: nouveau@lists.freedesktop.org --- drivers/gpu/drm/nouveau/nouveau_gem.c | 57 ++++++++++++++++++--------- 1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index c77302f969e8..60309b997951 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -482,12 +482,9 @@ validate_init(struct nouveau_channel *chan, struct drm_file *file_priv,
static int validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli, - struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo, - uint64_t user_pbbo_ptr) + struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo) { struct nouveau_drm *drm = chan->drm; - struct drm_nouveau_gem_pushbuf_bo __user *upbbo = - (void __force __user *)(uintptr_t)user_pbbo_ptr; struct nouveau_bo *nvbo; int ret, relocs = 0;
@@ -531,10 +528,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli, b->presumed.offset = nvbo->bo.offset; b->presumed.valid = 0; relocs++; - - if (copy_to_user(&upbbo[nvbo->pbbo_index].presumed, - &b->presumed, sizeof(b->presumed))) - return -EFAULT; } }
@@ -545,8 +538,8 @@ static int nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, struct drm_file *file_priv, struct drm_nouveau_gem_pushbuf_bo *pbbo, - uint64_t user_buffers, int nr_buffers, - struct validate_op *op, int *apply_relocs) + int nr_buffers, + struct validate_op *op, bool *apply_relocs) { struct nouveau_cli *cli = nouveau_cli(file_priv); int ret; @@ -563,7 +556,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, return ret; }
- ret = validate_list(chan, cli, &op->list, pbbo, user_buffers); + ret = validate_list(chan, cli, &op->list, pbbo); if (unlikely(ret < 0)) { if (ret != -ERESTARTSYS) NV_PRINTK(err, cli, "validating bo list\n"); @@ -603,16 +596,12 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size) static int nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, struct drm_nouveau_gem_pushbuf *req, + struct drm_nouveau_gem_pushbuf_reloc *reloc, struct drm_nouveau_gem_pushbuf_bo *bo) { - struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; int ret = 0; unsigned i;
- reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc)); - if (IS_ERR(reloc)) - return PTR_ERR(reloc); - for (i = 0; i < req->nr_relocs; i++) { struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i]; struct drm_nouveau_gem_pushbuf_bo *b; @@ -691,11 +680,13 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, struct nouveau_drm *drm = nouveau_drm(dev); struct drm_nouveau_gem_pushbuf *req = data; struct drm_nouveau_gem_pushbuf_push *push; + struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; struct drm_nouveau_gem_pushbuf_bo *bo; struct nouveau_channel *chan = NULL; struct validate_op op; struct nouveau_fence *fence = NULL; - int i, j, ret = 0, do_reloc = 0; + int i, j, ret = 0; + bool do_reloc = false;
if (unlikely(!abi16)) return -ENOMEM; @@ -753,7 +744,8 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, }
/* Validate buffer list */ - ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->buffers, +revalidate: + ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->nr_buffers, &op, &do_reloc); if (ret) { if (ret != -ERESTARTSYS) @@ -763,7 +755,18 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
/* Apply any relocations that are required */ if (do_reloc) { - ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo); + if (!reloc) { + validate_fini(&op, chan, NULL, bo); + reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc)); + if (IS_ERR(reloc)) { + ret = PTR_ERR(reloc); + goto out_prevalid; + } + + goto revalidate; + } + + ret = nouveau_gem_pushbuf_reloc_apply(cli, req, reloc, bo); if (ret) { NV_PRINTK(err, cli, "reloc apply: %d\n", ret); goto out; @@ -849,6 +852,22 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, validate_fini(&op, chan, fence, bo); nouveau_fence_unref(&fence);
+ if (do_reloc) { + struct drm_nouveau_gem_pushbuf_bo __user *upbbo = + u64_to_user_ptr(req->buffers); + + for (i = 0; i < req->nr_buffers; i++) { + if (bo[i].presumed.valid) + continue; + + if (copy_to_user(&upbbo[i].presumed, &bo[i].presumed, + sizeof(bo[i].presumed))) { + ret = -EFAULT; + break; + } + } + u_free(reloc); + } out_prevalid: u_free(bo); u_free(push);
On Wed, Aug 21, 2019 at 11:50:29PM +0200, Daniel Vetter wrote:
We can't copy_*_user while holding reservations, that will (soon even for nouveau) lead to deadlocks. And it breaks the cross-driver contract around dma_resv.
Fix this by adding a slowpath for when we need relocations, and by pushing the writeback of the new presumed offsets to the very end.
Aside from "it compiles" entirely untested unfortunately.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Ben Skeggs bskeggs@redhat.com Cc: nouveau@lists.freedesktop.org
Ping for some review/testing (apparently needs pre-nv50). I'd really like to land this series here, it should help a lot in making sure everyone uses dma_resv in a compatible way across drivers.
Thanks, Daniel
drivers/gpu/drm/nouveau/nouveau_gem.c | 57 ++++++++++++++++++--------- 1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index c77302f969e8..60309b997951 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -482,12 +482,9 @@ validate_init(struct nouveau_channel *chan, struct drm_file *file_priv,
static int validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo,
uint64_t user_pbbo_ptr)
struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo)
{ struct nouveau_drm *drm = chan->drm;
- struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
struct nouveau_bo *nvbo; int ret, relocs = 0;(void __force __user *)(uintptr_t)user_pbbo_ptr;
@@ -531,10 +528,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli, b->presumed.offset = nvbo->bo.offset; b->presumed.valid = 0; relocs++;
if (copy_to_user(&upbbo[nvbo->pbbo_index].presumed,
&b->presumed, sizeof(b->presumed)))
} }return -EFAULT;
@@ -545,8 +538,8 @@ static int nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, struct drm_file *file_priv, struct drm_nouveau_gem_pushbuf_bo *pbbo,
uint64_t user_buffers, int nr_buffers,
struct validate_op *op, int *apply_relocs)
int nr_buffers,
struct validate_op *op, bool *apply_relocs)
{ struct nouveau_cli *cli = nouveau_cli(file_priv); int ret; @@ -563,7 +556,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, return ret; }
- ret = validate_list(chan, cli, &op->list, pbbo, user_buffers);
- ret = validate_list(chan, cli, &op->list, pbbo); if (unlikely(ret < 0)) { if (ret != -ERESTARTSYS) NV_PRINTK(err, cli, "validating bo list\n");
@@ -603,16 +596,12 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size) static int nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, struct drm_nouveau_gem_pushbuf *req,
struct drm_nouveau_gem_pushbuf_reloc *reloc, struct drm_nouveau_gem_pushbuf_bo *bo)
{
struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; int ret = 0; unsigned i;
reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc));
if (IS_ERR(reloc))
return PTR_ERR(reloc);
for (i = 0; i < req->nr_relocs; i++) { struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i]; struct drm_nouveau_gem_pushbuf_bo *b;
@@ -691,11 +680,13 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, struct nouveau_drm *drm = nouveau_drm(dev); struct drm_nouveau_gem_pushbuf *req = data; struct drm_nouveau_gem_pushbuf_push *push;
- struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; struct drm_nouveau_gem_pushbuf_bo *bo; struct nouveau_channel *chan = NULL; struct validate_op op; struct nouveau_fence *fence = NULL;
- int i, j, ret = 0, do_reloc = 0;
int i, j, ret = 0;
bool do_reloc = false;
if (unlikely(!abi16)) return -ENOMEM;
@@ -753,7 +744,8 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, }
/* Validate buffer list */
- ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->buffers,
+revalidate:
- ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->nr_buffers, &op, &do_reloc); if (ret) { if (ret != -ERESTARTSYS)
@@ -763,7 +755,18 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
/* Apply any relocations that are required */ if (do_reloc) {
ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo);
if (!reloc) {
validate_fini(&op, chan, NULL, bo);
reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc));
if (IS_ERR(reloc)) {
ret = PTR_ERR(reloc);
goto out_prevalid;
}
goto revalidate;
}
if (ret) { NV_PRINTK(err, cli, "reloc apply: %d\n", ret); goto out;ret = nouveau_gem_pushbuf_reloc_apply(cli, req, reloc, bo);
@@ -849,6 +852,22 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, validate_fini(&op, chan, fence, bo); nouveau_fence_unref(&fence);
- if (do_reloc) {
struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
u64_to_user_ptr(req->buffers);
for (i = 0; i < req->nr_buffers; i++) {
if (bo[i].presumed.valid)
continue;
if (copy_to_user(&upbbo[i].presumed, &bo[i].presumed,
sizeof(bo[i].presumed))) {
ret = -EFAULT;
break;
}
}
u_free(reloc);
- }
out_prevalid: u_free(bo); u_free(push); -- 2.23.0.rc1
On Tue, Sep 03, 2019 at 10:17:14AM +0200, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 11:50:29PM +0200, Daniel Vetter wrote:
We can't copy_*_user while holding reservations, that will (soon even for nouveau) lead to deadlocks. And it breaks the cross-driver contract around dma_resv.
Fix this by adding a slowpath for when we need relocations, and by pushing the writeback of the new presumed offsets to the very end.
Aside from "it compiles" entirely untested unfortunately.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Ben Skeggs bskeggs@redhat.com Cc: nouveau@lists.freedesktop.org
Ping for some review/testing (apparently needs pre-nv50). I'd really like to land this series here, it should help a lot in making sure everyone uses dma_resv in a compatible way across drivers.
Now that the gem/ttm fallout is fixed, ping for testing on this one here ... Also need some r-b to get this landed.
Thanks, Daniel
drivers/gpu/drm/nouveau/nouveau_gem.c | 57 ++++++++++++++++++--------- 1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index c77302f969e8..60309b997951 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -482,12 +482,9 @@ validate_init(struct nouveau_channel *chan, struct drm_file *file_priv,
static int validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo,
uint64_t user_pbbo_ptr)
struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo)
{ struct nouveau_drm *drm = chan->drm;
- struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
struct nouveau_bo *nvbo; int ret, relocs = 0;(void __force __user *)(uintptr_t)user_pbbo_ptr;
@@ -531,10 +528,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli, b->presumed.offset = nvbo->bo.offset; b->presumed.valid = 0; relocs++;
if (copy_to_user(&upbbo[nvbo->pbbo_index].presumed,
&b->presumed, sizeof(b->presumed)))
} }return -EFAULT;
@@ -545,8 +538,8 @@ static int nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, struct drm_file *file_priv, struct drm_nouveau_gem_pushbuf_bo *pbbo,
uint64_t user_buffers, int nr_buffers,
struct validate_op *op, int *apply_relocs)
int nr_buffers,
struct validate_op *op, bool *apply_relocs)
{ struct nouveau_cli *cli = nouveau_cli(file_priv); int ret; @@ -563,7 +556,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, return ret; }
- ret = validate_list(chan, cli, &op->list, pbbo, user_buffers);
- ret = validate_list(chan, cli, &op->list, pbbo); if (unlikely(ret < 0)) { if (ret != -ERESTARTSYS) NV_PRINTK(err, cli, "validating bo list\n");
@@ -603,16 +596,12 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size) static int nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, struct drm_nouveau_gem_pushbuf *req,
struct drm_nouveau_gem_pushbuf_reloc *reloc, struct drm_nouveau_gem_pushbuf_bo *bo)
{
struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; int ret = 0; unsigned i;
reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc));
if (IS_ERR(reloc))
return PTR_ERR(reloc);
for (i = 0; i < req->nr_relocs; i++) { struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i]; struct drm_nouveau_gem_pushbuf_bo *b;
@@ -691,11 +680,13 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, struct nouveau_drm *drm = nouveau_drm(dev); struct drm_nouveau_gem_pushbuf *req = data; struct drm_nouveau_gem_pushbuf_push *push;
- struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; struct drm_nouveau_gem_pushbuf_bo *bo; struct nouveau_channel *chan = NULL; struct validate_op op; struct nouveau_fence *fence = NULL;
- int i, j, ret = 0, do_reloc = 0;
int i, j, ret = 0;
bool do_reloc = false;
if (unlikely(!abi16)) return -ENOMEM;
@@ -753,7 +744,8 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, }
/* Validate buffer list */
- ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->buffers,
+revalidate:
- ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->nr_buffers, &op, &do_reloc); if (ret) { if (ret != -ERESTARTSYS)
@@ -763,7 +755,18 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
/* Apply any relocations that are required */ if (do_reloc) {
ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo);
if (!reloc) {
validate_fini(&op, chan, NULL, bo);
reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc));
if (IS_ERR(reloc)) {
ret = PTR_ERR(reloc);
goto out_prevalid;
}
goto revalidate;
}
if (ret) { NV_PRINTK(err, cli, "reloc apply: %d\n", ret); goto out;ret = nouveau_gem_pushbuf_reloc_apply(cli, req, reloc, bo);
@@ -849,6 +852,22 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, validate_fini(&op, chan, fence, bo); nouveau_fence_unref(&fence);
- if (do_reloc) {
struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
u64_to_user_ptr(req->buffers);
for (i = 0; i < req->nr_buffers; i++) {
if (bo[i].presumed.valid)
continue;
if (copy_to_user(&upbbo[i].presumed, &bo[i].presumed,
sizeof(bo[i].presumed))) {
ret = -EFAULT;
break;
}
}
u_free(reloc);
- }
out_prevalid: u_free(bo); u_free(push); -- 2.23.0.rc1
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2: - Dont forget wu_mutex (Christian König) - Keep the mmap_sem-less wait optimization (Thomas) - Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com --- drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv); - mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); } @@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru); - mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages; @@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ - int ret; - - /* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. - */ - ret = mutex_lock_interruptible(&bo->wu_mutex); - if (unlikely(ret != 0)) - return -ERESTARTSYS; - if (!dma_resv_is_locked(bo->base.resv)) - goto out_unlock; - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); - if (ret == -EINTR) - ret = -ERESTARTSYS; - if (unlikely(ret != 0)) - goto out_unlock; - dma_resv_unlock(bo->base.resv); - -out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru); - mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0); diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem); - (void) ttm_bo_wait_unreserved(bo); + if (!dma_resv_lock_interruptible(bo->base.resv, + NULL)) + dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); }
return VM_FAULT_RETRY; }
- /* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ - return VM_FAULT_NOPAGE; + if (dma_resv_lock_interruptible(bo->base.resv, NULL)) + return VM_FAULT_NOPAGE; }
/* diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt; * @offset: The current GPU offset, which can have different meanings * depending on the memory type. For SYSTEM type memory, it should be 0. * @cur_placement: Hint of current placement. - * @wu_mutex: Wait unreserved mutex. * * Base class for TTM buffer object, that deals with data placement and CPU * mappings. GPU mappings are really up to the driver, but for simpler GPUs @@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg; - - struct mutex wu_mutex; };
/** @@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
Quoting Daniel Vetter (2019-08-21 22:50:28)
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
drivers/dma-buf/dma-resv.c | 42 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..29988b1564c1 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/**
- DOC: Reservation Object Overview
@@ -95,6 +96,28 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +struct lockdep_work {
struct work_struct work;
struct dma_resv obj;
struct mm_struct *mm;
+} lockdep_work;
+void lockdep_work_fn(struct work_struct *work) +{
dma_resv_init(&lockdep_work.obj);
down_read(&lockdep_work.mm->mmap_sem);
ww_mutex_lock(&lockdep_work.obj.lock, NULL);
fs_reclaim_acquire(GFP_KERNEL);
fs_reclaim_release(GFP_KERNEL);
ww_mutex_unlock(&lockdep_work.obj.lock);
up_read(&lockdep_work.mm->mmap_sem);
mmput(lockdep_work.mm);
+} +#endif
#if IS_ENABLED(CONFIG_LOCKDEP) static void dma_resv_lockmap(void) { struct mm_struct *mm = alloc_mm(); struct dma_resv obj;
dma_resv_init(&obj);
down_read(&mm->mmap_sem); ww_mutex_lock(&obj.lock, NULL); fs_reclaim_acquire(GFP_KERNEL); fs_reclaim_release(GFP_KERNEL); ww_mutex_unlock(&obj.lock); up_read(&mm->mmap_sem);
mmput(mm); } core_initcall(dma_resv_lockmap); #endif
as a thought. -Chris
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2: - Dont forget wu_mutex (Christian König) - Keep the mmap_sem-less wait optimization (Thomas) - Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com --- drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv); - mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); } @@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru); - mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages; @@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ - int ret; - - /* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. - */ - ret = mutex_lock_interruptible(&bo->wu_mutex); - if (unlikely(ret != 0)) - return -ERESTARTSYS; - if (!dma_resv_is_locked(bo->base.resv)) - goto out_unlock; - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); - if (ret == -EINTR) - ret = -ERESTARTSYS; - if (unlikely(ret != 0)) - goto out_unlock; - dma_resv_unlock(bo->base.resv); - -out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru); - mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0); diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem); - (void) ttm_bo_wait_unreserved(bo); + if (!dma_resv_lock_interruptible(bo->base.resv, + NULL)) + dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); }
return VM_FAULT_RETRY; }
- /* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ - return VM_FAULT_NOPAGE; + if (dma_resv_lock_interruptible(bo->base.resv, NULL)) + return VM_FAULT_NOPAGE; }
/* diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt; * @offset: The current GPU offset, which can have different meanings * depending on the memory type. For SYSTEM type memory, it should be 0. * @cur_placement: Hint of current placement. - * @wu_mutex: Wait unreserved mutex. * * Base class for TTM buffer object, that deals with data placement and CPU * mappings. GPU mappings are really up to the driver, but for simpler GPUs @@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg; - - struct mutex wu_mutex; };
/** @@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv);
- mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); }
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru);
- mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all);
-/**
- ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- unreserved
- @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{
- int ret;
- /*
* In the absense of a wait_unlocked API,
* Use the bo::wu_mutex to avoid triggering livelocks due to
* concurrent use of this function. Note that this use of
* bo::wu_mutex can go away if we change locking order to
* mmap_sem -> bo::reserve.
*/
- ret = mutex_lock_interruptible(&bo->wu_mutex);
- if (unlikely(ret != 0))
return -ERESTARTSYS;
- if (!dma_resv_is_locked(bo->base.resv))
goto out_unlock;
- ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
- if (ret == -EINTR)
ret = -ERESTARTSYS;
- if (unlikely(ret != 0))
goto out_unlock;
- dma_resv_unlock(bo->base.resv);
-out_unlock:
- mutex_unlock(&bo->wu_mutex);
- return ret;
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
- mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {*/
Not an expert on fault handling, but shouldn't this now be one if?
E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY.
But really take that with a grain of salt, Christian.
ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
if (!dma_resv_lock_interruptible(bo->base.resv,
NULL))
dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); } return VM_FAULT_RETRY;
}
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
return VM_FAULT_NOPAGE;
if (dma_resv_lock_interruptible(bo->base.resv, NULL))
return VM_FAULT_NOPAGE;
}
/*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt;
- @offset: The current GPU offset, which can have different meanings
- depending on the memory type. For SYSTEM type memory, it should be 0.
- @cur_placement: Hint of current placement.
- @wu_mutex: Wait unreserved mutex.
- Base class for TTM buffer object, that deals with data placement and CPU
- mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg;
struct mutex wu_mutex; };
/**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/**
- ttm_bo_uses_embedded_gem_object - check if the given bo uses the
On Thu, Aug 22, 2019 at 9:56 AM Koenig, Christian Christian.Koenig@amd.com wrote:
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv);
}mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size);
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru);
mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all);
-/**
- ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- unreserved
- @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{
int ret;
/*
* In the absense of a wait_unlocked API,
* Use the bo::wu_mutex to avoid triggering livelocks due to
* concurrent use of this function. Note that this use of
* bo::wu_mutex can go away if we change locking order to
* mmap_sem -> bo::reserve.
*/
ret = mutex_lock_interruptible(&bo->wu_mutex);
if (unlikely(ret != 0))
return -ERESTARTSYS;
if (!dma_resv_is_locked(bo->base.resv))
goto out_unlock;
ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
if (ret == -EINTR)
ret = -ERESTARTSYS;
if (unlikely(ret != 0))
goto out_unlock;
dma_resv_unlock(bo->base.resv);
-out_unlock:
mutex_unlock(&bo->wu_mutex);
return ret;
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
/*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
*/ if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
Not an expert on fault handling, but shouldn't this now be one if?
E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY.
Honestly I have no idea at all about this stuff. I just learned about the mmap_sem less retry now that Thomas pointed it out, and I have no idea how anything else here works. My approach has always been to just throw ridiculous amounts of really nasty tests at fault handling (including handling our own gtt mmaps to copy*user in relocs or gup for userptr and all that), and leave it at that :-)
But really take that with a grain of salt,
No idea either. It should be functionally equivalent to what was there before, except we now have the full blocking wait for the mutex instead of busy-spinning on NO_PAGE (with the wait_unreserved mixed in every odd fault I guess?). All over my head I'd say ...
Cheers, Daniel
Christian.
ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
if (!dma_resv_lock_interruptible(bo->base.resv,
NULL))
dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); } return VM_FAULT_RETRY; }
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
return VM_FAULT_NOPAGE;
if (dma_resv_lock_interruptible(bo->base.resv, NULL))
return VM_FAULT_NOPAGE; } /*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt;
- @offset: The current GPU offset, which can have different meanings
- depending on the memory type. For SYSTEM type memory, it should be 0.
- @cur_placement: Hint of current placement.
- @wu_mutex: Wait unreserved mutex.
- Base class for TTM buffer object, that deals with data placement and CPU
- mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg;
struct mutex wu_mutex;
};
/**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/**
- ttm_bo_uses_embedded_gem_object - check if the given bo uses the
On 8/22/19 10:47 AM, Daniel Vetter wrote:
On Thu, Aug 22, 2019 at 9:56 AM Koenig, Christian Christian.Koenig@amd.com wrote:
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv);
}mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size);
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru);
mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all);
-/**
- ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- unreserved
- @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{
int ret;
/*
* In the absense of a wait_unlocked API,
* Use the bo::wu_mutex to avoid triggering livelocks due to
* concurrent use of this function. Note that this use of
* bo::wu_mutex can go away if we change locking order to
* mmap_sem -> bo::reserve.
*/
ret = mutex_lock_interruptible(&bo->wu_mutex);
if (unlikely(ret != 0))
return -ERESTARTSYS;
if (!dma_resv_is_locked(bo->base.resv))
goto out_unlock;
ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
if (ret == -EINTR)
ret = -ERESTARTSYS;
if (unlikely(ret != 0))
goto out_unlock;
dma_resv_unlock(bo->base.resv);
-out_unlock:
mutex_unlock(&bo->wu_mutex);
return ret;
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
/*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
*/ if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
Not an expert on fault handling, but shouldn't this now be one if?
E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY.
Honestly I have no idea at all about this stuff. I just learned about the mmap_sem less retry now that Thomas pointed it out, and I have no idea how anything else here works. My approach has always been to just throw ridiculous amounts of really nasty tests at fault handling (including handling our own gtt mmaps to copy*user in relocs or gup for userptr and all that), and leave it at that :-)
But really take that with a grain of salt,
No idea either. It should be functionally equivalent to what was there before, except we now have the full blocking wait for the mutex instead of busy-spinning on NO_PAGE (with the wait_unreserved mixed in every odd fault I guess?). All over my head I'd say ...
To be honest I don't remember the difference about VM_FAULT_RETRY with !FAULT_FLAG_RETRY_NOWAIT and just returning VM_FAULT_NOPAGE. It appears most users and TTM use the former, while shmem uses the latter.
The detailed FAULT_RETRY semantics are pretty undocumented and requires diving into the mm system to get the full picture.
BTW it looks to me like vgem and vmks has got VM_FAULT_RETRY wrong, since they might return it without ALLOW_RETRY, and if FAULT_FLAG_RETRY_NOWAIT is true they, should drop the mmap_sem, otherwise things will go really bad.
/Thomas
Cheers, Daniel
Christian.
ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
if (!dma_resv_lock_interruptible(bo->base.resv,
NULL))
dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); } return VM_FAULT_RETRY; }
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
return VM_FAULT_NOPAGE;
if (dma_resv_lock_interruptible(bo->base.resv, NULL))
return VM_FAULT_NOPAGE; } /*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt; * @offset: The current GPU offset, which can have different meanings * depending on the memory type. For SYSTEM type memory, it should be 0. * @cur_placement: Hint of current placement.
- @wu_mutex: Wait unreserved mutex.
- Base class for TTM buffer object, that deals with data placement and CPU
- mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg;
struct mutex wu_mutex;
};
/**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
On Thu, Aug 22, 2019 at 07:56:56AM +0000, Koenig, Christian wrote:
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com
btw I realized I didn't remove your r-b, since v1 was broken.
For formality, can you pls reaffirm, or still something broken?
Also from the other thread: Reviewed-by: Thomas Hellström thellstrom@vmware.com
Thanks, Daniel
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv);
- mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); }
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru);
- mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all);
-/**
- ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- unreserved
- @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{
- int ret;
- /*
* In the absense of a wait_unlocked API,
* Use the bo::wu_mutex to avoid triggering livelocks due to
* concurrent use of this function. Note that this use of
* bo::wu_mutex can go away if we change locking order to
* mmap_sem -> bo::reserve.
*/
- ret = mutex_lock_interruptible(&bo->wu_mutex);
- if (unlikely(ret != 0))
return -ERESTARTSYS;
- if (!dma_resv_is_locked(bo->base.resv))
goto out_unlock;
- ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
- if (ret == -EINTR)
ret = -ERESTARTSYS;
- if (unlikely(ret != 0))
goto out_unlock;
- dma_resv_unlock(bo->base.resv);
-out_unlock:
- mutex_unlock(&bo->wu_mutex);
- return ret;
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
- mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {*/
Not an expert on fault handling, but shouldn't this now be one if?
E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY.
But really take that with a grain of salt, Christian.
ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
if (!dma_resv_lock_interruptible(bo->base.resv,
NULL))
dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); } return VM_FAULT_RETRY;
}
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
return VM_FAULT_NOPAGE;
if (dma_resv_lock_interruptible(bo->base.resv, NULL))
return VM_FAULT_NOPAGE;
}
/*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt;
- @offset: The current GPU offset, which can have different meanings
- depending on the memory type. For SYSTEM type memory, it should be 0.
- @cur_placement: Hint of current placement.
- @wu_mutex: Wait unreserved mutex.
- Base class for TTM buffer object, that deals with data placement and CPU
- mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg;
struct mutex wu_mutex; };
/**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/**
- ttm_bo_uses_embedded_gem_object - check if the given bo uses the
Am 22.08.19 um 15:06 schrieb Daniel Vetter:
On Thu, Aug 22, 2019 at 07:56:56AM +0000, Koenig, Christian wrote:
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com
btw I realized I didn't remove your r-b, since v1 was broken.
For formality, can you pls reaffirm, or still something broken?
My r-b is still valid.
Only problem I see is that neither of us seems to have a good idea about the different VM_FAULT_* replies.
But that worked before so it should still work now, Christian.
Also from the other thread: Reviewed-by: Thomas Hellström thellstrom@vmware.com
Thanks, Daniel
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv);
- mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); }
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru);
- mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all);
-/**
- ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- unreserved
- @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{
- int ret;
- /*
* In the absense of a wait_unlocked API,
* Use the bo::wu_mutex to avoid triggering livelocks due to
* concurrent use of this function. Note that this use of
* bo::wu_mutex can go away if we change locking order to
* mmap_sem -> bo::reserve.
*/
- ret = mutex_lock_interruptible(&bo->wu_mutex);
- if (unlikely(ret != 0))
return -ERESTARTSYS;
- if (!dma_resv_is_locked(bo->base.resv))
goto out_unlock;
- ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
- if (ret == -EINTR)
ret = -ERESTARTSYS;
- if (unlikely(ret != 0))
goto out_unlock;
- dma_resv_unlock(bo->base.resv);
-out_unlock:
- mutex_unlock(&bo->wu_mutex);
- return ret;
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
- mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {*/
Not an expert on fault handling, but shouldn't this now be one if?
E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY.
But really take that with a grain of salt, Christian.
ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
if (!dma_resv_lock_interruptible(bo->base.resv,
NULL))
dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); } return VM_FAULT_RETRY; }
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
return VM_FAULT_NOPAGE;
if (dma_resv_lock_interruptible(bo->base.resv, NULL))
return VM_FAULT_NOPAGE;
}
/*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt; * @offset: The current GPU offset, which can have different meanings * depending on the memory type. For SYSTEM type memory, it should be 0. * @cur_placement: Hint of current placement.
- @wu_mutex: Wait unreserved mutex.
- Base class for TTM buffer object, that deals with data placement and CPU
- mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg;
struct mutex wu_mutex; };
/**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
On 8/22/19 4:02 PM, Koenig, Christian wrote:
Am 22.08.19 um 15:06 schrieb Daniel Vetter:
On Thu, Aug 22, 2019 at 07:56:56AM +0000, Koenig, Christian wrote:
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com
btw I realized I didn't remove your r-b, since v1 was broken.
For formality, can you pls reaffirm, or still something broken?
My r-b is still valid.
Only problem I see is that neither of us seems to have a good idea about the different VM_FAULT_* replies.
I took a look in mm/gup.c. It seems like when using get_user_pages, VM_FAULT_RETRY will retry to a requesting caller telling it that a long wait was expected and not performed, whereas VM_FAULT_NOPAGE will just keep get_user_pages to spin. So the proposed patch should be correct from my understanding.
If the fault originates from user-space, I guess either is fine.
/Thomas
But that worked before so it should still work now, Christian.
Also from the other thread: Reviewed-by: Thomas Hellström thellstrom@vmware.com
Thanks, Daniel
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv);
- mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); }
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru);
- mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all);
-/**
- ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- unreserved
- @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{
- int ret;
- /*
* In the absense of a wait_unlocked API,
* Use the bo::wu_mutex to avoid triggering livelocks due to
* concurrent use of this function. Note that this use of
* bo::wu_mutex can go away if we change locking order to
* mmap_sem -> bo::reserve.
*/
- ret = mutex_lock_interruptible(&bo->wu_mutex);
- if (unlikely(ret != 0))
return -ERESTARTSYS;
- if (!dma_resv_is_locked(bo->base.resv))
goto out_unlock;
- ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
- if (ret == -EINTR)
ret = -ERESTARTSYS;
- if (unlikely(ret != 0))
goto out_unlock;
- dma_resv_unlock(bo->base.resv);
-out_unlock:
- mutex_unlock(&bo->wu_mutex);
- return ret;
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
- mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
*/ if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
Not an expert on fault handling, but shouldn't this now be one if?
E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY.
But really take that with a grain of salt, Christian.
ttm_bo_get(bo); up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
if (!dma_resv_lock_interruptible(bo->base.resv,
NULL))
dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); } return VM_FAULT_RETRY; }
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
return VM_FAULT_NOPAGE;
if (dma_resv_lock_interruptible(bo->base.resv, NULL))
return VM_FAULT_NOPAGE; } /*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..21c7d0d28757 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -155,7 +155,6 @@ struct ttm_tt; * @offset: The current GPU offset, which can have different meanings * depending on the memory type. For SYSTEM type memory, it should be 0. * @cur_placement: Hint of current placement.
- @wu_mutex: Wait unreserved mutex.
- Base class for TTM buffer object, that deals with data placement and CPU
- mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object { uint64_t offset; /* GPU address space is independent of CPU word size */
struct sg_table *sg;
struct mutex wu_mutex; };
/**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
/** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On 8/22/19 4:24 PM, Thomas Hellström (VMware) wrote:
On 8/22/19 4:02 PM, Koenig, Christian wrote:
Am 22.08.19 um 15:06 schrieb Daniel Vetter:
On Thu, Aug 22, 2019 at 07:56:56AM +0000, Koenig, Christian wrote:
Am 22.08.19 um 08:49 schrieb Daniel Vetter:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Dont forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com
btw I realized I didn't remove your r-b, since v1 was broken.
For formality, can you pls reaffirm, or still something broken?
My r-b is still valid.
Only problem I see is that neither of us seems to have a good idea about the different VM_FAULT_* replies.
I took a look in mm/gup.c. It seems like when using get_user_pages, VM_FAULT_RETRY will retry
s/retry/return/
to a requesting caller telling it that a long wait was expected and not performed, whereas VM_FAULT_NOPAGE will just keep get_user_pages to spin. So the proposed patch should be correct from my understanding.
If the fault originates from user-space, I guess either is fine.
/Thomas
Full audit of everyone:
- i915, radeon, amdgpu should be clean per their maintainers.
- vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
- panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
- v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
- vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too.
- virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
- qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
- A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..d233ef4cf0d7 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/** * DOC: Reservation Object Overview @@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void dma_resv_lockdep(void) +{ + struct mm_struct *mm = mm_alloc(); + struct dma_resv obj; + + if (!mm) + return; + + dma_resv_init(&obj); + + down_read(&mm->mmap_sem); + ww_mutex_lock(&obj.lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&obj.lock); + up_read(&mm->mmap_sem); + + mmput(mm); +} +subsys_initcall(dma_resv_lockdep); +#endif + /** * dma_resv_init - initialize a reservation object * @obj: the reservation object
Quoting Daniel Vetter (2019-08-22 07:54:57)
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void dma_resv_lockdep(void) +{
struct mm_struct *mm = mm_alloc();
struct dma_resv obj;
if (!mm)
return;
dma_resv_init(&obj);
down_read(&mm->mmap_sem);
ww_mutex_lock(&obj.lock, NULL);
fs_reclaim_acquire(GFP_KERNEL);
fs_reclaim_release(GFP_KERNEL);
ww_mutex_unlock(&obj.lock);
up_read(&mm->mmap_sem);
mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
Adding a
dma_resv_lock(); might_lock(&i915->drm.struct_mutex); dma_resv_unlock();
yielded
[ 18.513633] ====================================================== [ 18.513636] WARNING: possible circular locking dependency detected [ 18.513639] 5.3.0-rc5+ #76 Not tainted [ 18.513640] ------------------------------------------------------ [ 18.513643] insmod/655 is trying to acquire lock: [ 18.513645] 00000000877909e7 (&dev->struct_mutex){+.+.}, at: i915_driver_probe+0x89c/0x1470 [i915] [ 18.513671] [ 18.513671] but task is already holding lock: [ 18.513673] 00000000a85ba8ec (reservation_ww_class_mutex){+.+.}, at: i915_driver_probe+0x8e1/0x1470 [i915] [ 18.513698] [ 18.513698] which lock already depends on the new lock. [ 18.513698] [ 18.513701] [ 18.513701] the existing dependency chain (in reverse order) is: [ 18.513703] [ 18.513703] -> #1 (reservation_ww_class_mutex){+.+.}: [ 18.513708] __ww_mutex_lock.constprop.17+0xbc/0xf90 [ 18.513739] i915_gem_init+0x518/0x750 [i915] [ 18.513762] i915_driver_probe+0x891/0x1470 [i915] [ 18.513785] i915_pci_probe+0x2f/0x110 [i915] [ 18.513789] pci_device_probe+0x99/0x110 [ 18.513792] really_probe+0xd1/0x360 [ 18.513794] driver_probe_device+0xaf/0xf0 [ 18.513796] device_driver_attach+0x4a/0x50 [ 18.513799] __driver_attach+0x80/0x140 [ 18.513801] bus_for_each_dev+0x5e/0x90 [ 18.513804] bus_add_driver+0x148/0x1e0 [ 18.513806] driver_register+0x66/0xb0 [ 18.513809] do_one_initcall+0x45/0x29f [ 18.513812] do_init_module+0x55/0x200 [ 18.513814] load_module+0x2519/0x2690 [ 18.513816] __do_sys_finit_module+0x8f/0xd0 [ 18.513818] do_syscall_64+0x4f/0x220 [ 18.513822] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 18.513824] [ 18.513824] -> #0 (&dev->struct_mutex){+.+.}: [ 18.513828] __lock_acquire+0xcb9/0x1520 [ 18.513831] lock_acquire+0x90/0x170 [ 18.513853] i915_driver_probe+0x8fd/0x1470 [i915] [ 18.513876] i915_pci_probe+0x2f/0x110 [i915] [ 18.513879] pci_device_probe+0x99/0x110 [ 18.513881] really_probe+0xd1/0x360 [ 18.513883] driver_probe_device+0xaf/0xf0 [ 18.513886] device_driver_attach+0x4a/0x50 [ 18.513888] __driver_attach+0x80/0x140 [ 18.513891] bus_for_each_dev+0x5e/0x90 [ 18.513893] bus_add_driver+0x148/0x1e0 [ 18.513895] driver_register+0x66/0xb0 [ 18.513897] do_one_initcall+0x45/0x29f [ 18.513899] do_init_module+0x55/0x200 [ 18.513902] load_module+0x2519/0x2690 [ 18.513904] __do_sys_finit_module+0x8f/0xd0 [ 18.513906] do_syscall_64+0x4f/0x220 [ 18.513909] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 18.513911] [ 18.513911] other info that might help us debug this: [ 18.513911] [ 18.513914] Possible unsafe locking scenario: [ 18.513914] [ 18.513916] CPU0 CPU1 [ 18.513918] ---- ---- [ 18.513920] lock(reservation_ww_class_mutex); [ 18.513922] lock(&dev->struct_mutex); [ 18.513924] lock(reservation_ww_class_mutex); [ 18.513927] lock(&dev->struct_mutex); [ 18.513929] [ 18.513929] *** DEADLOCK *** [ 18.513929] [ 18.513932] 3 locks held by insmod/655: [ 18.513933] #0: 000000004dccb591 (&dev->mutex){....}, at: device_driver_attach+0x18/0x50 [ 18.513938] #1: 000000009118ecae (&mm->mmap_sem#2){++++}, at: i915_driver_probe+0x8c8/0x1470 [i915] [ 18.513962] #2: 00000000a85ba8ec (reservation_ww_class_mutex){+.+.}, at: i915_driver_probe+0x8e1/0x1470 [i915]
so
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Tested-by: Chris Wilson chris@chris-wilson.co.uk -Chris
Am 22.08.19 um 08:54 schrieb Daniel Vetter:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Reviewed-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..d233ef4cf0d7 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/**
- DOC: Reservation Object Overview
@@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void dma_resv_lockdep(void) +{
- struct mm_struct *mm = mm_alloc();
- struct dma_resv obj;
- if (!mm)
return;
- dma_resv_init(&obj);
- down_read(&mm->mmap_sem);
- ww_mutex_lock(&obj.lock, NULL);
- fs_reclaim_acquire(GFP_KERNEL);
- fs_reclaim_release(GFP_KERNEL);
- ww_mutex_unlock(&obj.lock);
- up_read(&mm->mmap_sem);
- mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
- /**
- dma_resv_init - initialize a reservation object
- @obj: the reservation object
On Thu, Aug 22, 2019 at 07:53:53AM +0000, Koenig, Christian wrote:
Am 22.08.19 um 08:54 schrieb Daniel Vetter:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Reviewed-by: Christian König christian.koenig@amd.com
Did you get a chance to give this a spin on the amd CI? -Daniel
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..d233ef4cf0d7 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/**
- DOC: Reservation Object Overview
@@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void dma_resv_lockdep(void) +{
- struct mm_struct *mm = mm_alloc();
- struct dma_resv obj;
- if (!mm)
return;
- dma_resv_init(&obj);
- down_read(&mm->mmap_sem);
- ww_mutex_lock(&obj.lock, NULL);
- fs_reclaim_acquire(GFP_KERNEL);
- fs_reclaim_release(GFP_KERNEL);
- ww_mutex_unlock(&obj.lock);
- up_read(&mm->mmap_sem);
- mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
- /**
- dma_resv_init - initialize a reservation object
- @obj: the reservation object
Am 03.09.19 um 10:16 schrieb Daniel Vetter:
On Thu, Aug 22, 2019 at 07:53:53AM +0000, Koenig, Christian wrote:
Am 22.08.19 um 08:54 schrieb Daniel Vetter:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Reviewed-by: Christian König christian.koenig@amd.com
Did you get a chance to give this a spin on the amd CI?
No and sorry totally forgot to ask about that.
Going to try to bring this up tomorrow once more, but don't expect that I can get this tested anytime soon.
Christian.
-Daniel
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..d233ef4cf0d7 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/** * DOC: Reservation Object Overview @@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void dma_resv_lockdep(void) +{
- struct mm_struct *mm = mm_alloc();
- struct dma_resv obj;
- if (!mm)
return;
- dma_resv_init(&obj);
- down_read(&mm->mmap_sem);
- ww_mutex_lock(&obj.lock, NULL);
- fs_reclaim_acquire(GFP_KERNEL);
- fs_reclaim_release(GFP_KERNEL);
- ww_mutex_unlock(&obj.lock);
- up_read(&mm->mmap_sem);
- mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
- /**
- dma_resv_init - initialize a reservation object
- @obj: the reservation object
On Thu, Aug 22, 2019 at 1:55 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..d233ef4cf0d7 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/**
- DOC: Reservation Object Overview
@@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void dma_resv_lockdep(void)
__init
+{
struct mm_struct *mm = mm_alloc();
struct dma_resv obj;
if (!mm)
return;
dma_resv_init(&obj);
down_read(&mm->mmap_sem);
ww_mutex_lock(&obj.lock, NULL);
fs_reclaim_acquire(GFP_KERNEL);
fs_reclaim_release(GFP_KERNEL);
ww_mutex_unlock(&obj.lock);
up_read(&mm->mmap_sem);
mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
/**
- dma_resv_init - initialize a reservation object
- @obj: the reservation object
-- 2.23.0.rc1
Full audit of everyone:
- i915, radeon, amdgpu should be clean per their maintainers.
- vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
- panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
- v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
- vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too.
- virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
- qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
- A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
v4: Annotate with __init (Rob Herring)
Cc: Rob Herring robh@kernel.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Tested-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..97c4c4812d08 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/** * DOC: Reservation Object Overview @@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void __init dma_resv_lockdep(void) +{ + struct mm_struct *mm = mm_alloc(); + struct dma_resv obj; + + if (!mm) + return; + + dma_resv_init(&obj); + + down_read(&mm->mmap_sem); + ww_mutex_lock(&obj.lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&obj.lock); + up_read(&mm->mmap_sem); + + mmput(mm); +} +subsys_initcall(dma_resv_lockdep); +#endif + /** * dma_resv_init - initialize a reservation object * @obj: the reservation object
On 8/22/19 3:07 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
v4: Annotate with __init (Rob Herring)
Cc: Rob Herring robh@kernel.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Tested-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter daniel.vetter@intel.com
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..97c4c4812d08 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/**
- DOC: Reservation Object Overview
@@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void __init dma_resv_lockdep(void) +{
- struct mm_struct *mm = mm_alloc();
- struct dma_resv obj;
- if (!mm)
return;
- dma_resv_init(&obj);
- down_read(&mm->mmap_sem);
I took a quick look into using lockdep macros replacing the actual locks: Something along the lines of
lock_acquire(mm->mmap_sem.dep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
- ww_mutex_lock(&obj.lock, NULL);
lock_acquire(obj.lock.dep_map, 0, 0, 0, 1, NULL, _THIS_IP_);
- fs_reclaim_acquire(GFP_KERNEL);
- fs_reclaim_release(GFP_KERNEL);
- ww_mutex_unlock(&obj.lock);
lock_release(obj.lock.dep_map, 0, _THIS_IP_);
- up_read(&mm->mmap_sem);
lock_release(obj.lock.dep_map, 0, _THIS_IP_);
Either way is fine with me, though.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
- mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
- /**
- dma_resv_init - initialize a reservation object
- @obj: the reservation object
On Thu, Aug 22, 2019 at 3:30 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/22/19 3:07 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
v4: Annotate with __init (Rob Herring)
Cc: Rob Herring robh@kernel.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Tested-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter daniel.vetter@intel.com
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..97c4c4812d08 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/**
- DOC: Reservation Object Overview
@@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void __init dma_resv_lockdep(void) +{
struct mm_struct *mm = mm_alloc();
struct dma_resv obj;
if (!mm)
return;
dma_resv_init(&obj);
down_read(&mm->mmap_sem);
I took a quick look into using lockdep macros replacing the actual locks: Something along the lines of
lock_acquire(mm->mmap_sem.dep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
Yeah I'm not a fan of the magic numbers this nees :-/ And now this is run once at startup, so the taking the fake locks for real, once, shouldn't hurt. Lockdep updating it's data structures is going to be 100x more cpu cycles anyway :-)
ww_mutex_lock(&obj.lock, NULL);
lock_acquire(obj.lock.dep_map, 0, 0, 0, 1, NULL, _THIS_IP_);
fs_reclaim_acquire(GFP_KERNEL);
fs_reclaim_release(GFP_KERNEL);
ww_mutex_unlock(&obj.lock);
lock_release(obj.lock.dep_map, 0, _THIS_IP_);
up_read(&mm->mmap_sem);
lock_release(obj.lock.dep_map, 0, _THIS_IP_);
Either way is fine with me, though.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Thanks for your review comments.
Can you pls also run this in some test cycles, if that's easily possible? I'd like to have a tested-by from at least the big drivers - i915, amd, nouveau, vmwgfx and is definitely using ttm to its fullest too, so best chances for hitting an oversight.
Cheers, Daniel
mmput(mm);
+} +subsys_initcall(dma_resv_lockdep); +#endif
- /**
- dma_resv_init - initialize a reservation object
- @obj: the reservation object
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On 8/22/19 3:36 PM, Daniel Vetter wrote:
On Thu, Aug 22, 2019 at 3:30 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/22/19 3:07 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a dma_resv lock of a different object already. Christian mentioned that ttm core does this too for ghost objects. intel-gfx-ci highlighted that i915 has similar issues.
Unfortunately we can't do this in the usual module init functions, because kernel threads don't have an ->mm - we have to wait around for some user thread to do this.
Solution is to spawn a worker (but only once). It's horrible, but it works.
v3: We can allocate mm! (Chris). Horrible worker hack out, clean initcall solution in.
v4: Annotate with __init (Rob Herring)
Cc: Rob Herring robh@kernel.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Tested-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter daniel.vetter@intel.com
drivers/dma-buf/dma-resv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..97c4c4812d08 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@
#include <linux/dma-resv.h> #include <linux/export.h> +#include <linux/sched/mm.h>
/** * DOC: Reservation Object Overview @@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list) kfree_rcu(list, rcu); }
+#if IS_ENABLED(CONFIG_LOCKDEP) +static void __init dma_resv_lockdep(void) +{
struct mm_struct *mm = mm_alloc();
struct dma_resv obj;
if (!mm)
return;
dma_resv_init(&obj);
down_read(&mm->mmap_sem);
I took a quick look into using lockdep macros replacing the actual locks: Something along the lines of
lock_acquire(mm->mmap_sem.dep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
Yeah I'm not a fan of the magic numbers this nees :-/ And now this is run once at startup, so the taking the fake locks for real, once, shouldn't hurt. Lockdep updating it's data structures is going to be 100x more cpu cycles anyway :-)
ww_mutex_lock(&obj.lock, NULL);
lock_acquire(obj.lock.dep_map, 0, 0, 0, 1, NULL, _THIS_IP_);
fs_reclaim_acquire(GFP_KERNEL);
fs_reclaim_release(GFP_KERNEL);
ww_mutex_unlock(&obj.lock);
lock_release(obj.lock.dep_map, 0, _THIS_IP_);
up_read(&mm->mmap_sem);
lock_release(obj.lock.dep_map, 0, _THIS_IP_);
Either way is fine with me, though.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Thanks for your review comments.
Can you pls also run this in some test cycles, if that's easily possible? I'd like to have a tested-by from at least the big drivers - i915, amd, nouveau, vmwgfx and is definitely using ttm to its fullest too, so best chances for hitting an oversight.
Cheers, Daniel
Tested vmwgfx with a decent OpenGL / rendercheck stress test and no lockdep trips.
/Thomas
Tested-by: Thomas Hellström thellstrom@vmware.com
dri-devel@lists.freedesktop.org