Hi all,
As part of all the recent discussions around ttm and dma_resv I started looking into this. The goal (at least sometime in the near future) is to have it all documented and the cross-driver semantics locked down as much as possible.
One of the biggest issues there is how the dma_resv ww_mutex fits into the overall locking hierarchy, especially wrt core mm locks. This patch series should fix that.
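To make the constraint concrete, here is a minimal sketch of the ioctl pattern this is all about (illustrative only; the example_* names are made up and not from any driver): user copies must happen outside the dma_resv critical section, because copy_*_user can fault and end up taking mmap_sem, which has to nest outside the reservation lock, while GFP_KERNEL allocations under the reservation lock remain fine.

#include <linux/dma-resv.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct example_bo { struct dma_resv resv; };	/* illustrative */
struct example_args { u32 flags; };		/* illustrative */

/* Broken: the copy can fault and pull mmap_sem in under the resv lock. */
static int broken_submit(struct example_bo *bo, void __user *uptr,
			 struct example_args *args)
{
	int ret = 0;

	dma_resv_lock(&bo->resv, NULL);
	if (copy_from_user(args, uptr, sizeof(*args)))	/* may take mmap_sem */
		ret = -EFAULT;
	/* ... validate and submit ... */
	dma_resv_unlock(&bo->resv);
	return ret;
}

/* OK: all user copies outside the reservation critical section. */
static int safe_submit(struct example_bo *bo, void __user *uptr,
		       struct example_args *args)
{
	if (copy_from_user(args, uptr, sizeof(*args)))
		return -EFAULT;

	dma_resv_lock(&bo->resv, NULL);
	/* GFP_KERNEL allocations are fine here; taking mmap_sem is not. */
	dma_resv_unlock(&bo->resv);
	return 0;
}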
Comments, review and especially testing (there's only so much you can do with auditing) very much appreciated.
Cheers, Daniel
Daniel Vetter (3):
  dma_resv: prime lockdep annotations
  drm/nouveau: slowpath for pushbuf ioctl
  drm/ttm: remove ttm_bo_wait_unreserved
 drivers/dma-buf/dma-resv.c            | 12 ++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c | 57 ++++++++++++++++++---------
 drivers/gpu/drm/ttm/ttm_bo.c          | 34 ----------------
 drivers/gpu/drm/ttm/ttm_bo_vm.c       | 26 +-----------
 include/drm/ttm/ttm_bo_api.h          |  1 -
 5 files changed, 51 insertions(+), 79 deletions(-)
Full audit of everyone:
- i915, radeon, amdgpu should be clean per their maintainers.
- vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
- panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
- v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
- vmwgfx has a bunch of ioctls that do their own copy_*_user:
  - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
  - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
  - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
  Summary: vmwgfx seems to be fine too.
- virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
- qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
- A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: "VMware Graphics" <linux-graphics-maintainer@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/dma-buf/dma-resv.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 42a8f3f11681..3edca10d3faf 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -34,6 +34,7 @@
 
 #include <linux/dma-resv.h>
 #include <linux/export.h>
+#include <linux/sched/mm.h>
 
 /**
  * DOC: Reservation Object Overview
@@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj)
 			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
+
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		if (current->mm)
+			down_read(&current->mm->mmap_sem);
+		ww_mutex_lock(&obj->lock, NULL);
+		fs_reclaim_acquire(GFP_KERNEL);
+		fs_reclaim_release(GFP_KERNEL);
+		ww_mutex_unlock(&obj->lock);
+		if (current->mm)
+			up_read(&current->mm->mmap_sem);
+	}
 }
 EXPORT_SYMBOL(dma_resv_init);
On 20.08.19 at 16:53, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Reviewed-by: Christian König christian.koenig@amd.com
On Tue, Aug 20, 2019 at 02:56:36PM +0000, Koenig, Christian wrote:
On 20.08.19 at 16:53, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Reviewed-by: Christian König christian.koenig@amd.com
Does this r-b just apply to radeon/amdgpu or to the full audit? -Daniel
Quoting Daniel Vetter (2019-08-20 15:53:34)
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
 drivers/dma-buf/dma-resv.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 42a8f3f11681..3edca10d3faf 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -34,6 +34,7 @@
 
 #include <linux/dma-resv.h>
 #include <linux/export.h>
+#include <linux/sched/mm.h>
 
 /**
  * DOC: Reservation Object Overview
@@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj)
 			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
+
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		if (current->mm)
+			down_read(&current->mm->mmap_sem);
+		ww_mutex_lock(&obj->lock, NULL);
+		fs_reclaim_acquire(GFP_KERNEL);
+		fs_reclaim_release(GFP_KERNEL);
A candidate for might_alloc(GFP_KERNEL), we've repeated this often enough. -Chris
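There is no might_alloc() helper in the tree at this point; presumably what Chris has in mind is something along these lines (a sketch, open-coding the same fs_reclaim annotation the patch uses):

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/sched/mm.h>

/*
 * Sketch of the suggested helper: annotate "this code may allocate memory
 * with these gfp flags" so lockdep records the fs_reclaim dependency,
 * without actually allocating anything.
 */
static inline void might_alloc(gfp_t gfp_mask)
{
	fs_reclaim_acquire(gfp_mask);
	fs_reclaim_release(gfp_mask);

	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
}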
On 8/20/19 4:53 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
 drivers/dma-buf/dma-resv.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 42a8f3f11681..3edca10d3faf 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -34,6 +34,7 @@
 
 #include <linux/dma-resv.h>
 #include <linux/export.h>
+#include <linux/sched/mm.h>
 
 /**
  * DOC: Reservation Object Overview
@@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj)
 			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
+
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		if (current->mm)
+			down_read(&current->mm->mmap_sem);
+		ww_mutex_lock(&obj->lock, NULL);
+		fs_reclaim_acquire(GFP_KERNEL);
+		fs_reclaim_release(GFP_KERNEL);
+		ww_mutex_unlock(&obj->lock);
+		if (current->mm)
+			up_read(&current->mm->mmap_sem);
+	}
 }
 EXPORT_SYMBOL(dma_resv_init);
I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done?
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
/Thomas
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done?
There's might_lock(), plus a pile of macros, but they don't map obviously, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead.
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel
On 8/21/19 6:34 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done?
There's might_lock(), plus a pile of macros, but they don't map obviuosly, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead.
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel
One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem.
/Thomas
On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 6:34 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done?
There's might_lock(), plus a pile of macros, but they don't map obviuosly, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead.
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel
One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem.
Hm yeah ... the trouble is I need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the reservation lock doesn't exist, so you can't hold it. And you've probably just allocated it, so fs_reclaim is going to be fine. And if you allocate dma_resv objects from your fault handlers I have questions anyway :-)
So I think this should be safe. -Daniel
On 8/21/19 8:11 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 6:34 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done?
There's might_lock(), plus a pile of macros, but they don't map obviuosly, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead.
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel
One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem.
Hm yeah ... the trouble is a need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the reservation lock doesn't exist, so you can't hold it. And you've probably just allocated it, so fs_reclaim is going to be fine. And if you allocate dma_resv objects from your fault handlers I have questions anyway :-)
Coming to think of it, I think vmwgfx sometimes creates bos with another bo's reservation lock held. I guess that would trip both the mmap_sem check and the ww_mutex check?
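To spell the problematic pattern out (a sketch with made-up names, not the actual vmwgfx or TTM call sites): initializing a new object's reservation object while another one is already locked means the priming in dma_resv_init() runs with a resv lock held, so its down_read() records resv -> mmap_sem (the inverse of the order being primed) and its ww_mutex_lock() nests two locks of the reservation class without an acquire context.

#include <linux/dma-resv.h>
#include <linux/slab.h>

struct example_obj {			/* illustrative */
	struct dma_resv resv;
};

static struct example_obj *create_child_locked(struct example_obj *parent)
{
	struct example_obj *child;

	dma_resv_lock(&parent->resv, NULL);

	child = kzalloc(sizeof(*child), GFP_KERNEL);
	if (child)
		dma_resv_init(&child->resv);	/* priming runs under parent->resv */

	dma_resv_unlock(&parent->resv);
	return child;
}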
/Thomas
/Thomas
So I think this should be safe. -Daniel
On Wed, Aug 21, 2019 at 08:27:59PM +0200, Thomas Hellström (VMware) wrote:
On 8/21/19 8:11 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 6:34 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
Full audit of everyone:
i915, radeon, amdgpu should be clean per their maintainers.
vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all.
panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean.
v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section.
vmwgfx has a bunch of ioctls that do their own copy_*_user:
- vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe.
- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out.
- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there.
Summary: vmwgfx seems to be fine too.
virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe.
A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up.
Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Eric Anholt eric@anholt.net Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Ben Skeggs bskeggs@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done?
There's might_lock(), plus a pile of macros, but they don't map obviuosly, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead.
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel
One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem.
Hm yeah ... the trouble is a need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the reservation lock doesn't exist, so you can't hold it. And you've probably just allocated it, so fs_reclaim is going to be fine. And if you allocate dma_resv objects from your fault handlers I have questions anyway :-)
Coming to think of it, I think vmwgfx sometimes create bos with other bo's reservation lock held. I guess that would trip both the mmap_sem check the ww_mutex check?
If you do that, yes we're busted. Do you do that?
I guess we need a new idea for where to put this ... while making sure everyone gets it. So some evil trick like putting it in drm_open() won't work, since I also want to make sure everyone else using dma-buf follows these rules.
Ideas? -Daniel
On 8/21/19 9:51 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 08:27:59PM +0200, Thomas Hellström (VMware) wrote:
On 8/21/19 8:11 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 6:34 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
> [...]
There's might_lock(), plus a pile of macros, but they don't map obviuosly, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead.
Otherwise LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel
One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem.
Hm yeah ... the trouble is a need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the reservation lock doesn't exist, so you can't hold it. And you've probably just allocated it, so fs_reclaim is going to be fine. And if you allocate dma_resv objects from your fault handlers I have questions anyway :-)
Coming to think of it, I think vmwgfx sometimes create bos with other bo's reservation lock held. I guess that would trip both the mmap_sem check the ww_mutex check?
If you do that, yes we're busted. Do you do that?
Yes, we do, in a couple of places it seems, and it also appears like TTM is doing it according to Christian.
I guess needs a new idea for where to put this ... while making sure everyone gets it. So some evil trick like putting it in drm_open() won't work, since I also want to make sure everyone else using dma-buf follows these rules.
IMO it should be sufficient to establish this locking order once, but I guess dma-buf module init time won't work because we might not have an mm structure?
/Thomas
Ideas? -Daniel
On Thu, Aug 22, 2019 at 8:42 AM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 9:51 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 08:27:59PM +0200, Thomas Hellström (VMware) wrote:
On 8/21/19 8:11 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 6:34 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
> On 8/20/19 4:53 PM, Daniel Vetter wrote:
>> [...]
> Otherwise LGTM.
>
> Reviewed-by: Thomas Hellström thellstrom@vmware.com
>
> Will test this and let you know if it trips on vmwgfx, but it really
> shouldn't.

Thanks, Daniel
One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem.
Hm yeah ... the trouble is a need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the reservation lock doesn't exist, so you can't hold it. And you've probably just allocated it, so fs_reclaim is going to be fine. And if you allocate dma_resv objects from your fault handlers I have questions anyway :-)
Coming to think of it, I think vmwgfx sometimes create bos with other bo's reservation lock held. I guess that would trip both the mmap_sem check the ww_mutex check?
If you do that, yes we're busted. Do you do that?
Yes, we do, in a couple of places it seems, and it also appears like TTM is doing it according to Christian.
I guess needs a new idea for where to put this ... while making sure everyone gets it. So some evil trick like putting it in drm_open() won't work, since I also want to make sure everyone else using dma-buf follows these rules.
IMO it should be sufficient to establish this locking order once, but I guess dma-buf module init time won't work because we might not have an mm structure?
mm_alloc() is a thing as Chris pointed out, and it works. v3 on its way.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
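For the record, a sketch of what moving the priming to a one-time init call with mm_alloc() could look like (my guess at the direction v3 takes, not the actual patch; it assumes the IS_ENABLED(CONFIG_LOCKDEP) block is dropped from dma_resv_init() and that the whole function is compiled out without lockdep):

#include <linux/dma-resv.h>
#include <linux/init.h>
#include <linux/sched/mm.h>

static int __init dma_resv_lockdep(void)
{
	struct mm_struct *mm = mm_alloc();
	struct dma_resv obj;

	if (!mm)
		return -ENOMEM;

	dma_resv_init(&obj);

	/* Prime mmap_sem -> dma_resv -> fs_reclaim once, against a dummy mm. */
	down_read(&mm->mmap_sem);
	ww_mutex_lock(&obj.lock, NULL);
	fs_reclaim_acquire(GFP_KERNEL);
	fs_reclaim_release(GFP_KERNEL);
	ww_mutex_unlock(&obj.lock);
	up_read(&mm->mmap_sem);

	mmput(mm);

	return 0;
}
subsys_initcall(dma_resv_lockdep);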
We can't copy_*_user while holding reservations; that will (soon even for nouveau) lead to deadlocks. And it breaks the cross-driver contract around dma_resv.
Fix this by adding a slowpath for when we need relocations, and by pushing the writeback of the new presumed offsets to the very end.
Aside from "it compiles" entirely untested unfortunately.
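In flow form, what the diff below does (my paraphrase of the patch, not additional code):

/*
 * Rough flow of the reworked pushbuf ioctl:
 *
 * revalidate:
 *	reserve + validate all buffers			(dma_resv locks held)
 *	if (relocations are needed && relocs not copied in yet) {
 *		drop all reservations;
 *		copy the relocs from userspace;		(no resv locks held)
 *		goto revalidate;
 *	}
 *	apply relocations, submit, fence, unreserve;
 *	copy the updated presumed offsets back to userspace  (no resv locks held)
 */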
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Ben Skeggs bskeggs@redhat.com Cc: nouveau@lists.freedesktop.org --- drivers/gpu/drm/nouveau/nouveau_gem.c | 57 ++++++++++++++++++--------- 1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index c77302f969e8..60309b997951 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -482,12 +482,9 @@ validate_init(struct nouveau_channel *chan, struct drm_file *file_priv,
static int validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli, - struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo, - uint64_t user_pbbo_ptr) + struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo) { struct nouveau_drm *drm = chan->drm; - struct drm_nouveau_gem_pushbuf_bo __user *upbbo = - (void __force __user *)(uintptr_t)user_pbbo_ptr; struct nouveau_bo *nvbo; int ret, relocs = 0;
@@ -531,10 +528,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli, b->presumed.offset = nvbo->bo.offset; b->presumed.valid = 0; relocs++; - - if (copy_to_user(&upbbo[nvbo->pbbo_index].presumed, - &b->presumed, sizeof(b->presumed))) - return -EFAULT; } }
@@ -545,8 +538,8 @@ static int nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, struct drm_file *file_priv, struct drm_nouveau_gem_pushbuf_bo *pbbo, - uint64_t user_buffers, int nr_buffers, - struct validate_op *op, int *apply_relocs) + int nr_buffers, + struct validate_op *op, bool *apply_relocs) { struct nouveau_cli *cli = nouveau_cli(file_priv); int ret; @@ -563,7 +556,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan, return ret; }
- ret = validate_list(chan, cli, &op->list, pbbo, user_buffers); + ret = validate_list(chan, cli, &op->list, pbbo); if (unlikely(ret < 0)) { if (ret != -ERESTARTSYS) NV_PRINTK(err, cli, "validating bo list\n"); @@ -603,16 +596,12 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size) static int nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, struct drm_nouveau_gem_pushbuf *req, + struct drm_nouveau_gem_pushbuf_reloc *reloc, struct drm_nouveau_gem_pushbuf_bo *bo) { - struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; int ret = 0; unsigned i;
- reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc)); - if (IS_ERR(reloc)) - return PTR_ERR(reloc); - for (i = 0; i < req->nr_relocs; i++) { struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i]; struct drm_nouveau_gem_pushbuf_bo *b; @@ -691,11 +680,13 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, struct nouveau_drm *drm = nouveau_drm(dev); struct drm_nouveau_gem_pushbuf *req = data; struct drm_nouveau_gem_pushbuf_push *push; + struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL; struct drm_nouveau_gem_pushbuf_bo *bo; struct nouveau_channel *chan = NULL; struct validate_op op; struct nouveau_fence *fence = NULL; - int i, j, ret = 0, do_reloc = 0; + int i, j, ret = 0; + bool do_reloc = false;
if (unlikely(!abi16)) return -ENOMEM; @@ -753,7 +744,8 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, }
/* Validate buffer list */ - ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->buffers, +revalidate: + ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->nr_buffers, &op, &do_reloc); if (ret) { if (ret != -ERESTARTSYS) @@ -763,7 +755,18 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
/* Apply any relocations that are required */ if (do_reloc) { - ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo); + if (!reloc) { + validate_fini(&op, chan, NULL, bo); + reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc)); + if (IS_ERR(reloc)) { + ret = PTR_ERR(reloc); + goto out_prevalid; + } + + goto revalidate; + } + + ret = nouveau_gem_pushbuf_reloc_apply(cli, req, reloc, bo); if (ret) { NV_PRINTK(err, cli, "reloc apply: %d\n", ret); goto out; @@ -849,6 +852,22 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, validate_fini(&op, chan, fence, bo); nouveau_fence_unref(&fence);
+ if (do_reloc) { + struct drm_nouveau_gem_pushbuf_bo __user *upbbo = + u64_to_user_ptr(req->buffers); + + for (i = 0; i < req->nr_buffers; i++) { + if (bo[i].presumed.valid) + continue; + + if (copy_to_user(&upbbo[i].presumed, &bo[i].presumed, + sizeof(bo[i].presumed))) { + ret = -EFAULT; + break; + } + } + u_free(reloc); + } out_prevalid: u_free(bo); u_free(push);
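For readers skimming the flattened diff, here is a condensed, purely illustrative sketch of the resulting control flow; every name below is a hypothetical stand-in, not an actual nouveau function:

/* All names here are hypothetical placeholders for illustration only; see
 * the real functions in nouveau_gem.c above. Assumes <linux/err.h> for
 * IS_ERR()/PTR_ERR(). */
struct fake_job {
	bool need_relocs;
	void *relocs;			/* NULL until copied from userspace */
};

int fake_reserve_and_validate(struct fake_job *job);	/* takes dma_resv locks */
void fake_unreserve_all(struct fake_job *job);
void *fake_copy_relocs_from_user(struct fake_job *job);
int fake_apply_relocs_and_submit(struct fake_job *job);
void fake_writeback_presumed_offsets(struct fake_job *job);	/* copy_to_user() */

int fake_submit(struct fake_job *job)
{
	int ret;

revalidate:
	ret = fake_reserve_and_validate(job);
	if (ret)
		return ret;

	if (job->need_relocs && !job->relocs) {
		/* Slowpath: drop every reservation before touching
		 * userspace memory, then start over. */
		fake_unreserve_all(job);
		job->relocs = fake_copy_relocs_from_user(job);
		if (IS_ERR(job->relocs))
			return PTR_ERR(job->relocs);
		goto revalidate;
	}

	ret = fake_apply_relocs_and_submit(job);
	fake_unreserve_all(job);

	/* Writeback of the new presumed offsets to userspace happens only
	 * here, after all dma_resv locks have been dropped. */
	fake_writeback_presumed_offsets(job);

	return ret;
}

The key property is that copy_from_user()/copy_to_user() only ever run while no dma_resv lock is held.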
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com --- drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 20ff56f27aa4..a952dd624b06 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
 	;
 }
 EXPORT_SYMBOL(ttm_bo_swapout_all);
-
-/**
- * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- * unreserved
- *
- * @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
-{
-	int ret;
-
-	/*
-	 * In the absense of a wait_unlocked API,
-	 * Use the bo::wu_mutex to avoid triggering livelocks due to
-	 * concurrent use of this function. Note that this use of
-	 * bo::wu_mutex can go away if we change locking order to
-	 * mmap_sem -> bo::reserve.
-	 */
-	ret = mutex_lock_interruptible(&bo->wu_mutex);
-	if (unlikely(ret != 0))
-		return -ERESTARTSYS;
-	if (!dma_resv_is_locked(bo->base.resv))
-		goto out_unlock;
-	ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
-	if (ret == -EINTR)
-		ret = -ERESTARTSYS;
-	if (unlikely(ret != 0))
-		goto out_unlock;
-	dma_resv_unlock(bo->base.resv);
-
-out_unlock:
-	mutex_unlock(&bo->wu_mutex);
-	return ret;
-}
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 76eedb963693..505e1787aeea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 		&bdev->man[bo->mem.mem_type];
 	struct vm_area_struct cvma;
 
-	/*
-	 * Work around locking order reversal in fault / nopfn
-	 * between mmap_sem and bo_reserve: Perform a trylock operation
-	 * for reserve, and if it fails, retry the fault after waiting
-	 * for the buffer to become unreserved.
-	 */
-	if (unlikely(!dma_resv_trylock(bo->base.resv))) {
-		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
-			if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
-				ttm_bo_get(bo);
-				up_read(&vmf->vma->vm_mm->mmap_sem);
-				(void) ttm_bo_wait_unreserved(bo);
-				ttm_bo_put(bo);
-			}
-
-			return VM_FAULT_RETRY;
-		}
-
-		/*
-		 * If we'd want to change locking order to
-		 * mmap_sem -> bo::reserve, we'd use a blocking reserve here
-		 * instead of retrying the fault...
-		 */
-		return VM_FAULT_NOPAGE;
-	}
+	dma_resv_lock(bo->base.resv, NULL);
 
 	/*
 	 * Refuse to fault imported pages. This should be handled
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 43c4929a2171..6b50e624e3e2 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -765,7 +765,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp,
 int ttm_bo_swapout(struct ttm_bo_global *glob,
 			struct ttm_operation_ctx *ctx);
 void ttm_bo_swapout_all(struct ttm_bo_device *bdev);
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
 
 /**
  * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
On 20.08.19 at 16:53, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
Yes, please. But one more point: you can now remove bo->wu_mutex as well!
Apart from that Reviewed-by: Christian König christian.koenig@amd.com
On Tue, Aug 20, 2019 at 5:16 PM Koenig, Christian Christian.Koenig@amd.com wrote:
On 20.08.19 at 16:53, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
Yes, please. But one more point: you can now remove bo->wu_mutex as well!
Ah right totally forgot about that in my enthusiasm after all the auditing and fixing nouveau.
Apart from that Reviewed-by: Christian König christian.koenig@amd.com
Thanks, I already respun the patches, so will be in the next version. Can you pls also give this a spin on the amdgpu CI, just to make sure it's all fine? With full lockdep ofc. And then reply with a t-b.
Thanks, Daniel
On 20.08.19 at 17:21, Daniel Vetter wrote:
On Tue, Aug 20, 2019 at 5:16 PM Koenig, Christian Christian.Koenig@amd.com wrote:
On 20.08.19 at 16:53, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
Yes, please. But one more point: you can now remove bo->wu_mutex as well!
Ah right totally forgot about that in my enthusiasm after all the auditing and fixing nouveau.
Apart from that Reviewed-by: Christian König christian.koenig@amd.com
Thanks, I already respun the patches, so will be in the next version. Can you pls also give this a spin on the amdgpu CI, just to make sure it's all fine? With full lockdep ofc. And then reply with a t-b.
I can ask for this on our call tomorrow, but I fear our CI infrastructure is not ready yet.
Christian.
Thanks, Daniel
On Tue, Aug 20, 2019 at 5:34 PM Koenig, Christian Christian.Koenig@amd.com wrote:
On 20.08.19 at 17:21, Daniel Vetter wrote:
On Tue, Aug 20, 2019 at 5:16 PM Koenig, Christian Christian.Koenig@amd.com wrote:
On 20.08.19 at 16:53, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
Yes, please. But one more point: you can now remove bo->wu_mutex as well!
Ah right totally forgot about that in my enthusiasm after all the auditing and fixing nouveau.
Apart from that Reviewed-by: Christian König christian.koenig@amd.com
Thanks, I already respun the patches, so will be in the next version. Can you pls also give this a spin on the amdgpu CI, just to make sure it's all fine? With full lockdep ofc. And then reply with a t-b.
I can ask for this on our call tomorrow, but I fear our CI infrastructure is not ready yet.
I thought you have some internal branch you all commit amdgpu stuff to, and then Alex goes and collects the pieces that are ready? Or does that all blow up if you push a patch which touches code outside of the dkms? -Daniel
Christian.
Thanks, Daniel
On 20.08.19 at 17:41, Daniel Vetter wrote:
On Tue, Aug 20, 2019 at 5:34 PM Koenig, Christian Christian.Koenig@amd.com wrote:
On 20.08.19 at 17:21, Daniel Vetter wrote:
On Tue, Aug 20, 2019 at 5:16 PM Koenig, Christian Christian.Koenig@amd.com wrote:
On 20.08.19 at 16:53, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
Yes, please. But one more point: you can now remove bo->wu_mutex as well!
Ah right totally forgot about that in my enthusiasm after all the auditing and fixing nouveau.
Apart from that Reviewed-by: Christian König christian.koenig@amd.com
Thanks, I already respun the patches, so will be in the next version. Can you pls also give this a spin on the amdgpu CI, just to make sure it's all fine? With full lockdep ofc. And then reply with a t-b.
I can ask for this on our call tomorrow, but I fear our CI infrastructure is not ready yet.
I thought you have some internal branch you all commit amdgpu stuff for, and then Alex goes and collects the pieces that are ready?
No, that part is correct. The problem is that this branch is not QA tested regularly as far as I know.
Or does that all blow up if you push a patch which touches code outside of the dkms?
No, but the problem is related to that.
See the release branches for dkms are separate and indeed QA tested regularly.
But changes from amd-staging-drm-next are only cherry picked into those in certain intervals.
Well going to discuss that tomorrow, Christian.
-Daniel
Christian.
Thanks, Daniel
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-)
-} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma;
- /*
* Work around locking order reversal in fault / nopfn
* between mmap_sem and bo_reserve: Perform a trylock operation
* for reserve, and if it fails, retry the fault after waiting
* for the buffer to become unreserved.
*/
- if (unlikely(!dma_resv_trylock(bo->base.resv))) {
if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
ttm_bo_get(bo);
up_read(&vmf->vma->vm_mm->mmap_sem);
(void) ttm_bo_wait_unreserved(bo);
ttm_bo_put(bo);
}
return VM_FAULT_RETRY;
}
/*
* If we'd want to change locking order to
* mmap_sem -> bo::reserve, we'd use a blocking reserve here
* instead of retrying the fault...
*/
I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve, which in turn might be waiting for the GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
/Thomas
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
-out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; - /* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ - if (unlikely(!dma_resv_trylock(bo->base.resv))) { - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { - if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { - ttm_bo_get(bo);
- up_read(&vmf->vma->vm_mm->mmap_sem);
- (void) ttm_bo_wait_unreserved(bo); - ttm_bo_put(bo); - }
- return VM_FAULT_RETRY; - }
- /* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */
I think you should justify why the above code is removed, since the comments actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
And this last sentence also means we can still remove the wu_mutex when the locking order is changed, since the chance of livelock goes away. IIRC only a single RETRY spin is allowed by the mm core.
/Thomas
/Thomas
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
I think you should justify why the above code is removed, since the comments actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
And this last sentence also means we still can remove the wu-mutex when the locking order is changed, since the chance of livelock is goes away. IIRC only a single RETRY spin is allowed by the mm core.
Christian already noticed that one too; I simply forgot. It's fixed in the v2 I have here. -Daniel
/Thomas
/Thomas
return VM_FAULT_NOPAGE;
- }
- dma_resv_lock(bo->base.resv, NULL); /*
- Refuse to fault imported pages. This should be handled
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..6b50e624e3e2 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -765,7 +765,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo); /**
- ttm_bo_uses_embedded_gem_object - check if the given bo uses the
-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
I think you should justify why the above code is removed, since the comments actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau, not ttm? I mean this patch should look something along the lines of the below (based on an older tree): implement the new locking order mmap_sem -> reservation,
but keep the mm latency optimization using the RETRY functionality:
Thanks, Thomas
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 85f5bcbe0c76..68482c67b9f7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -125,30 +125,20 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 		&bdev->man[bo->mem.mem_type];
 	struct vm_area_struct cvma;
 
-	/*
-	 * Work around locking order reversal in fault / nopfn
-	 * between mmap_sem and bo_reserve: Perform a trylock operation
-	 * for reserve, and if it fails, retry the fault after waiting
-	 * for the buffer to become unreserved.
-	 */
+	/* Avoid blocking on reservation with mmap_sem held, if possible */
 	if (unlikely(!reservation_object_trylock(bo->base.resv))) {
-		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
-			if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
-				ttm_bo_get(bo);
-				up_read(&vmf->vma->vm_mm->mmap_sem);
-				(void) ttm_bo_wait_unreserved(bo);
-				ttm_bo_put(bo);
-			}
+		if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
+		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
+			ttm_bo_get(bo);
+			up_read(&vmf->vma->vm_mm->mmap_sem);
+			(void) ttm_bo_wait_unreserved(bo);
+			ttm_bo_put(bo);
 
 			return VM_FAULT_RETRY;
 		}
 
-		/*
-		 * If we'd want to change locking order to
-		 * mmap_sem -> bo::reserve, we'd use a blocking reserve here
-		 * instead of retrying the fault...
-		 */
-		return VM_FAULT_NOPAGE;
+		if (reservation_object_lock_interruptible(bo->base.resv, NULL))
+			return VM_FAULT_NOPAGE;
 	}
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
I think you should justify why the above code is removed, since the comments actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking this, doing copy_*_user or get_user_pages while holding dma_resv locks; no one else. So nothing else needed to be changed. But patch 1 contains the full audit, I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games with retry loops here; previously that was necessary because the old ttm_bo_vm_fault had a busy spin, and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, and I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top are obviously all fine with me. -Daniel
Thanks, Thomas
On 8/21/19 4:47 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
I think you should justify why the above code is removed, since the comments actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking was doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games here and retry loops
- previously that was necessary because the old ttm_bo_vm_fault had a
busy spin and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top is obviously all fine with me. -Daniel
Yes, but there are two different things you remove here. One is the workaround for the locking reversal, which is obviously correct.
One is TTM's implementation of the mmap_sem latency optimization, which looks like an oversight.
That optimization is why the VM_FAULT_RETRY functionality was added to mm in the first place and is intended to be used when drivers expect a substantial sleep to avoid keeping the pretty globalish mmap_sem held while that sleep is taking place. We do exactly the same while waiting for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect long waits in the fault handler do the same.
To avoid this accidentally happening there was even this comment:
/* * If we'd want to change locking order to * mmap_sem -> bo::reserve, we'd use a blocking reserve here * instead of retrying the fault... */ return VM_FAULT_NOPAGE;
And I do really think we should avoid accidentally removing this just to re-add it in a later patch, particularly when I pointed it out at review time.
/Thomas
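For readers following along: the optimization Thomas refers to is the pattern below in a fault handler - drop mmap_sem before a potentially long sleep and tell the core mm to retry the fault, instead of sleeping with mmap_sem held. This is a simplified sketch with placeholder lock helpers (example_*), using the pre-5.8 mmap_sem naming, not the literal TTM code:

        if (!example_trylock(obj)) {
                if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
                        if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
                                /* Drop mmap_sem, then do the long wait. */
                                example_get(obj);
                                up_read(&vmf->vma->vm_mm->mmap_sem);
                                example_lock_slow(obj);
                                example_unlock(obj);
                                example_put(obj);
                        }
                        return VM_FAULT_RETRY; /* core mm re-runs the fault */
                }
                /* Retry not allowed: block with mmap_sem held. */
                example_lock_slow(obj);
        }

On the retried fault the lock is usually uncontended, so the trylock succeeds and the fault completes normally.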
On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:47 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
[SNIP]
I think you should justify why the above code is removed, since the comment actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking this by doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games with retry loops here - previously that was necessary because the old ttm_bo_vm_fault had a busy spin, and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, and I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top are obviously all fine with me. -Daniel
Yes, but there are two different things you remove here. One is the workaround for the locking reversal, which is obviously correct.
One is TTM's implementation of the mmap_sem latency optimization, which looks like an oversight.
That optimization is why the VM_FAULT_RETRY functionality was added to mm in the first place and is intended to be used when drivers expect a substantial sleep to avoid keeping the pretty globalish mmap_sem held while that sleep is taking place. We do exactly the same while waiting for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect long waits in the fault handler do the same.
Hm, are we guaranteed that core mm will only call us once with ALLOW_RETRY? Just to make sure that we're not live-locking here. I'd also like to get rid of the wu_mutex, that just looks really strange (and I thought it was to duct-tape over the inversion, not as an optimization). If the livelock due to locking inversion is gone I have no idea anymore why we even need the wu_mutex.
To avoid this accidentally happening there was even this comment:
/* * If we'd want to change locking order to * mmap_sem -> bo::reserve, we'd use a blocking reserve here * instead of retrying the fault... */ return VM_FAULT_NOPAGE;
And I do really think we should avoid accidentally removing this just to re-add it in a later patch, particularly when I pointed it out at review time.
Yeah I read that, but I didn't read this comment the way you now explained it. -Daniel
-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 8/21/19 5:14 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:47 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
[SNIP]
I think you should justify why the above code is removed, since the comment actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking this by doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games with retry loops here - previously that was necessary because the old ttm_bo_vm_fault had a busy spin, and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, and I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top are obviously all fine with me. -Daniel
Yes, but there are two different things you remove here. One is the workaround for the locking reversal, which is obviously correct.
One is TTM's implementation of the mmap_sem latency optimization, which looks like an oversight.
That optimization is why the VM_FAULT_RETRY functionality was added to mm in the first place and is intended to be used when drivers expect a substantial sleep to avoid keeping the pretty globalish mmap_sem held while that sleep is taking place. We do exactly the same while waiting for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect long waits in the fault handler do the same.
Hm, are we guaranteed that core mm will only call us once with ALLOW_RETRY?
Last time I looked in the implementation, yes. The ALLOW_RETRY was there to specifically allow making progress in the locking.
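For reference, the arch fault handlers that call handle_mm_fault() implemented that guarantee along these lines at the time (simplified from the x86 code of that era, not verbatim):

        fault = handle_mm_fault(vma, address, flags);
        if (unlikely(fault & VM_FAULT_RETRY)) {
                /* mmap_sem was dropped before VM_FAULT_RETRY was returned */
                if (flags & FAULT_FLAG_ALLOW_RETRY) {
                        flags &= ~FAULT_FLAG_ALLOW_RETRY;
                        flags |= FAULT_FLAG_TRIED;
                        goto retry;     /* retakes mmap_sem; second pass may block */
                }
        }

So a fault handler returning VM_FAULT_RETRY is re-invoked at most once more, this time without FAULT_FLAG_ALLOW_RETRY set.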
Just to make sure that we're not live-locking here. I'd also like to get rid of the wu_mutex, that just looks really strange (and I thought it was to duct-tape over the inversion, not as an optimization). If the livelock due to locking inversion is gone I have no idea anymore why we even need the wu_mutex.
Yes, my interpretation of this is that wu_mutex definitely can be ditched.
/Thomas
On Wed, Aug 21, 2019 at 5:19 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 5:14 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:47 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
[SNIP]
I think you should justify why the above code is removed, since the comment actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking this by doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games with retry loops here - previously that was necessary because the old ttm_bo_vm_fault had a busy spin, and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, and I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top are obviously all fine with me. -Daniel
Yes, but there are two different things you remove here. One is the workaround for the locking reversal, which is obviously correct.
One is TTM's implementation of the mmap_sem latency optimization, which looks like an oversight.
That optimization is why the VM_FAULT_RETRY functionality was added to mm in the first place and is intended to be used when drivers expect a substantial sleep to avoid keeping the pretty globalish mmap_sem held while that sleep is taking place. We do exactly the same while waiting for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect long waits in the fault handler do the same.
Hm, are we guaranteed that core mm will only call us once with ALLOW_RETRY?
Last time I looked in the implementation, yes. The ALLOW_RETRY was there to specifically allow making progress in the locking.
Just to make sure that we're not live-locking here. I'd also like to get rid of the wu_mutex, that just looks really strange (and I thought it was to duct-tape over the inversion, not as an optimization). If the livelock due to locking inversion is gone I have no idea anymore why we even need the wu_mutex.
Yes, my interpretation of this is that wu_mutex definitely can be ditched.
Ok I'll respin and just do normal interruptible locks, keeping the mmap_sem-less path. I think it's perfectly ok to leave the optimization in, as long as we can remove the wu_mutex.
btw r-b/t-b on patch 1 from vmwgfx would be very much appreciated. That one is the real trick in this series here I think. -Daniel
On 8/21/19 5:22 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 5:19 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 5:14 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:47 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
[SNIP]
I think you should justify why the above code is removed, since the comment actually outlines what to do if we change locking order.
The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification.
Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
That would be patches 1&2 in this series.
Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking this by doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games with retry loops here - previously that was necessary because the old ttm_bo_vm_fault had a busy spin, and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, and I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top are obviously all fine with me. -Daniel
Yes, but there are two different things you remove here. One is the workaround for the locking reversal, which is obviously correct.
One is TTM's implementation of the mmap_sem latency optimization, which looks like an oversight.
That optimization is why the VM_FAULT_RETRY functionality was added to mm in the first place and is intended to be used when drivers expect a substantial sleep to avoid keeping the pretty globalish mmap_sem held while that sleep is taking place. We do exactly the same while waiting for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect long waits in the fault handler do the same.
Hm, are we guaranteed that core mm will only call us once with ALLOW_RETRY?
Last time I looked in the implementation, yes. The ALLOW_RETRY was there to specifically allow making progress in the locking.
Just to make sure that we're not live-locking here. I'd also like to get rid of the wu_mutex, that just looks really strange (and I thought it was to duct-tape over the inversion, not as an optimization). If the livelock due to locking inversion is gone I have no idea anymore why we even need the wu_mutex.
Yes, my interpretation of this is that wu_mutex definitely can be ditched.
Ok I'll respin and just do normal interruptible locks, keeping the mmap_sem-less path. I think it's perfectly ok to leave the optimization in, as long as we can remove the wu_mutex.
btw r-b/t-b on patch 1 from vmwgfx would be very much appreciated. That one is the real trick in this series here I think.
Thanks!
I'll look into patch 1 as well.
/Thomas
-Daniel
On 21.08.19 at 16:47, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote: [SNIP]
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games with retry loops here - previously that was necessary because the old ttm_bo_vm_fault had a busy spin, and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, and I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top are obviously all fine with me.
I think this is just an optimization to not hold the mmap_sem while waiting for the dma_resv lock.
I agree that it shouldn't be necessary, but maybe it's a good idea for performance. I'm also OK with removing it, cause I'm not sure if it's worth it.
But Thomas noted correctly that we should probably do it in a separate patch so that when somebody points out "Hey my system is slower now!" he's able to bisect to the change.
Christian.
-Daniel
Thanks, Thomas
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-)
dma_resv_lock(bo->base.resv, NULL);
interruptible, or at least killable. (IIRC I think killable is the right choice in fault code, even if TTM initially implemented interruptible to get reasonable Xorg "silken mouse" latency).
Thanks, /Thomas
On Wed, Aug 21, 2019 at 3:16 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-)
dma_resv_lock(bo->base.resv, NULL);
interruptible, or at least killable. (IIRC I think killable is the right choice in fault code, even if TTM initially implemented interruptible to get reasonable Xorg "silken mouse" latency).
I think interruptible is fine. I chickened out of that for v1 because I always mix up the return code for _lock_interruptible() :-)
I'll add that for the next version too. -Daniel
On 8/21/19 4:10 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 3:16 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-)
dma_resv_lock(bo->base.resv, NULL);
interruptible, or at least killable. (IIRC I think killable is the right choice in fault code, even if TTM initially implemented interruptible to get reasonable Xorg "silken mouse" latency).
I think interruptible is fine. I chickened out of that for v1 because I always mix up the return code for _lock_interruptible() :-)
:). IIRC I think the in-kernel users of fault() were unhappy with interruptible. (GUP?), but I guess it's better to use a bulk change at some point if necessary.
/Thomas
I'll add that for the next version too. -Daniel
On Wed, Aug 21, 2019 at 4:30 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:10 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 3:16 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/20/19 4:53 PM, Daniel Vetter wrote:
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-)
dma_resv_lock(bo->base.resv, NULL);
interruptible, or at least killable. (IIRC I think killable is the right choice in fault code, even if TTM initially implemented interruptible to get reasonable Xorg "silken mouse" latency).
I think interruptible is fine. I chickened out of that for v1 because I always mix up the return code for _lock_interruptible() :-)
:). IIRC I think the in-kernel users of fault() were unhappy with interruptible. (GUP?), but I guess it's better to use a bulk change at some point if necessary.
We've been using interruptible since forever, returning VM_FAULT_NOPAGE to get the signal handled and re-run. Seems to not have pissed off anyone thus far. I think if we do this I'll do it as a follow-up. -Daniel
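Spelled out, the interruptible variant in the fault handler maps a signal to VM_FAULT_NOPAGE roughly like this (a sketch only, matching the shape of the v2 patch below):

        if (dma_resv_lock_interruptible(bo->base.resv, NULL)) {
                /*
                 * A signal interrupted the wait: return without installing a
                 * PTE. The signal gets handled on return to userspace and
                 * the access simply faults again afterwards.
                 */
                return VM_FAULT_NOPAGE;
        }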
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Don't forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com --- drivers/gpu/drm/ttm/ttm_bo.c | 36 ------------------------------- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +++++----------- include/drm/ttm/ttm_bo_api.h | 4 ---- 4 files changed, 5 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 20ff56f27aa4..d1ce5d315d5b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref)
         dma_fence_put(bo->moving);
         if (!ttm_bo_uses_embedded_gem_object(bo))
                 dma_resv_fini(&bo->base._resv);
-        mutex_destroy(&bo->wu_mutex);
         bo->destroy(bo);
         ttm_mem_global_free(bdev->glob->mem_glob, acc_size);
 }
@@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
         INIT_LIST_HEAD(&bo->ddestroy);
         INIT_LIST_HEAD(&bo->swap);
         INIT_LIST_HEAD(&bo->io_reserve_lru);
-        mutex_init(&bo->wu_mutex);
         bo->bdev = bdev;
         bo->type = type;
         bo->num_pages = num_pages;
@@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
         ;
 }
 EXPORT_SYMBOL(ttm_bo_swapout_all);
-
-/**
- * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
- * unreserved
- *
- * @bo: Pointer to buffer
- */
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
-{
-        int ret;
-
-        /*
-         * In the absense of a wait_unlocked API,
-         * Use the bo::wu_mutex to avoid triggering livelocks due to
-         * concurrent use of this function. Note that this use of
-         * bo::wu_mutex can go away if we change locking order to
-         * mmap_sem -> bo::reserve.
-         */
-        ret = mutex_lock_interruptible(&bo->wu_mutex);
-        if (unlikely(ret != 0))
-                return -ERESTARTSYS;
-        if (!dma_resv_is_locked(bo->base.resv))
-                goto out_unlock;
-        ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
-        if (ret == -EINTR)
-                ret = -ERESTARTSYS;
-        if (unlikely(ret != 0))
-                goto out_unlock;
-        dma_resv_unlock(bo->base.resv);
-
-out_unlock:
-        mutex_unlock(&bo->wu_mutex);
-        return ret;
-}
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index fe81c565e7ef..82ea26a49959 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
         INIT_LIST_HEAD(&fbo->base.lru);
         INIT_LIST_HEAD(&fbo->base.swap);
         INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
-        mutex_init(&fbo->base.wu_mutex);
         fbo->base.moving = NULL;
         drm_vma_node_reset(&fbo->base.base.vma_node);
         atomic_set(&fbo->base.cpu_writers, 0);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 76eedb963693..a61a35e57d1c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
                 &bdev->man[bo->mem.mem_type];
         struct vm_area_struct cvma;
 
-        /*
-         * Work around locking order reversal in fault / nopfn
-         * between mmap_sem and bo_reserve: Perform a trylock operation
-         * for reserve, and if it fails, retry the fault after waiting
-         * for the buffer to become unreserved.
-         */
         if (unlikely(!dma_resv_trylock(bo->base.resv))) {
                 if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
                         if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
                                 ttm_bo_get(bo);
                                 up_read(&vmf->vma->vm_mm->mmap_sem);
-                                (void) ttm_bo_wait_unreserved(bo);
+                                if (!dma_resv_lock_interruptible(bo->base.resv,
+                                                                 NULL))
+                                        dma_resv_unlock(bo->base.resv);
                                 ttm_bo_put(bo);
                         }
 
                         return VM_FAULT_RETRY;
                 }
 
-                /*
-                 * If we'd want to change locking order to
-                 * mmap_sem -> bo::reserve, we'd use a blocking reserve here
-                 * instead of retrying the fault...
-                 */
-                return VM_FAULT_NOPAGE;
+                if (dma_resv_lock_interruptible(bo->base.resv, NULL))
+                        return VM_FAULT_NOPAGE;
         }
 
         /*
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 43c4929a2171..21c7d0d28757 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -155,7 +155,6 @@ struct ttm_tt;
  * @offset: The current GPU offset, which can have different meanings
  * depending on the memory type. For SYSTEM type memory, it should be 0.
  * @cur_placement: Hint of current placement.
- * @wu_mutex: Wait unreserved mutex.
  *
  * Base class for TTM buffer object, that deals with data placement and CPU
  * mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -229,8 +228,6 @@ struct ttm_buffer_object {
         uint64_t offset; /* GPU address space is independent of CPU word size */
 
         struct sg_table *sg;
-
-        struct mutex wu_mutex;
 };
 
 /**
@@ -765,7 +762,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp,
 int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx);
 void ttm_bo_swapout_all(struct ttm_bo_device *bdev);
-int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
 
 /**
  * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
On 8/21/19 5:33 PM, Daniel Vetter wrote:
With nouveau fixed, all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer.
Assuming I didn't screw up anything with my audit of course.
v2:
- Don't forget wu_mutex (Christian König)
- Keep the mmap_sem-less wait optimization (Thomas)
- Use _lock_interruptible to be good citizens (Thomas)
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com Cc: Thomas Hellstrom thellstrom@vmware.com
[SNIP]
Thanks. LGTM.
Reviewed-by: Thomas Hellström thellstrom@vmware.com
dri-devel@lists.freedesktop.org