On 8/21/19 5:22 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 5:19 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 5:14 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:47 PM, Daniel Vetter wrote:
On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) thomas_os@shipmail.org wrote:
On 8/21/19 4:09 PM, Daniel Vetter wrote: > On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) > thomas_os@shipmail.org wrote: >> On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote: >>> On 8/20/19 4:53 PM, Daniel Vetter wrote: >>>> With nouveau fixed all ttm-using drives have the correct nesting of >>>> mmap_sem vs dma_resv, and we can just lock the buffer. >>>> >>>> Assuming I didn't screw up anything with my audit of course. >>>> >>>> Signed-off-by: Daniel Vetter daniel.vetter@intel.com >>>> Cc: Christian Koenig christian.koenig@amd.com >>>> Cc: Huang Rui ray.huang@amd.com >>>> Cc: Gerd Hoffmann kraxel@redhat.com >>>> Cc: "VMware Graphics" linux-graphics-maintainer@vmware.com >>>> Cc: Thomas Hellstrom thellstrom@vmware.com >>>> --- >>>> drivers/gpu/drm/ttm/ttm_bo.c | 34 --------------------------------- >>>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------ >>>> include/drm/ttm/ttm_bo_api.h | 1 - >>>> 3 files changed, 1 insertion(+), 60 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c >>>> index 20ff56f27aa4..a952dd624b06 100644 >>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>>> @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device >>>> *bdev) >>>> ; >>>> } >>>> EXPORT_SYMBOL(ttm_bo_swapout_all); >>>> - >>>> -/** >>>> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object >>>> to become >>>> - * unreserved >>>> - * >>>> - * @bo: Pointer to buffer >>>> - */ >>>> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) >>>> -{ >>>> - int ret; >>>> - >>>> - /* >>>> - * In the absense of a wait_unlocked API, >>>> - * Use the bo::wu_mutex to avoid triggering livelocks due to >>>> - * concurrent use of this function. Note that this use of >>>> - * bo::wu_mutex can go away if we change locking order to >>>> - * mmap_sem -> bo::reserve. >>>> - */ >>>> - ret = mutex_lock_interruptible(&bo->wu_mutex); >>>> - if (unlikely(ret != 0)) >>>> - return -ERESTARTSYS; >>>> - if (!dma_resv_is_locked(bo->base.resv)) >>>> - goto out_unlock; >>>> - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); >>>> - if (ret == -EINTR) >>>> - ret = -ERESTARTSYS; >>>> - if (unlikely(ret != 0)) >>>> - goto out_unlock; >>>> - dma_resv_unlock(bo->base.resv); >>>> - >>>> -out_unlock: >>>> - mutex_unlock(&bo->wu_mutex); >>>> - return ret; >>>> -} >>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c >>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c >>>> index 76eedb963693..505e1787aeea 100644 >>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c >>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c >>>> @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct >>>> vm_fault *vmf) >>>> &bdev->man[bo->mem.mem_type]; >>>> struct vm_area_struct cvma; >>>> - /* >>>> - * Work around locking order reversal in fault / nopfn >>>> - * between mmap_sem and bo_reserve: Perform a trylock operation >>>> - * for reserve, and if it fails, retry the fault after waiting >>>> - * for the buffer to become unreserved. >>>> - */ >>>> - if (unlikely(!dma_resv_trylock(bo->base.resv))) { >>>> - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { >>>> - if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { >>>> - ttm_bo_get(bo); >>>> - up_read(&vmf->vma->vm_mm->mmap_sem); >>>> - (void) ttm_bo_wait_unreserved(bo); >>>> - ttm_bo_put(bo); >>>> - } >>>> - >>>> - return VM_FAULT_RETRY; >>>> - } >>>> - >>>> - /* >>>> - * If we'd want to change locking order to >>>> - * mmap_sem -> bo::reserve, we'd use a blocking reserve here >>>> - * instead of retrying the fault... >>>> - */ >>> I think you should justify why the above code is removed, since the >>> comments actually outlines what to do if we change locking order. >>> >>> The code that's removed above is not for adjusting locking orders but >>> to decrease the mm latency by releasing the mmap_sem while waiting for >>> bo reserve which in turn might be waiting for GPU. At a minimum we >>> should have a separate patch with justification. >>> >>> Note that the caller here ensures locking progress by adjusting the >>> RETRY flags after a retry. > That would be patches 1&2 in this series. > Hmm? Those seem to touch only dma-buf and nouveau not ttm? I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation,
Only nouveau was breaking was doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something.
but to keep the mm latency optimization using the RETRY functionality:
Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games here and retry loops
- previously that was necessary because the old ttm_bo_vm_fault had a
busy spin and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, I'd like to make sure I didn't miss anything before we start to make this super fancy again. Further patches on top is obviously all fine with me. -Daniel
Yes, but there are two different things you remove here. One is the workaround for the locking reversal, which is obviously correct.
One is TTM's implementation of the mmap_sem latency optimization, which looks like an oversight.
That optimization is why the VM_FAULT_RETRY functionality was added to mm in the first place and is intended to be used when drivers expect a substantial sleep to avoid keeping the pretty globalish mmap_sem held while that sleep is taking place. We do exactly the same while waiting for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect long waits in the fault handler do the same.
Hm, are we guaranteed that core mm will only call us once with ALLOW_RETRY?
Last time I looked in the implementation, yes. The ALLOW_RETRY was there to specifically allow making progress in the locking.
Just to make sure that we're not live-locking here. I'd also like to get rid of the wu_mutex, that just looks really strange (and I thought it was to duct-tape over the inversion, not as an optimization). If the livelock due to locking inversion is gone I have no idea anymore why we even needs the wu_mutex.
Yes, my interpretation of this is that wu_mutex definitely can be ditched.
Ok I'll respin and just do normal interruptible locks, keeping the mmap_sem-less path. I think perfectly ok to leave the optimization in, as long as we can remove the wu_mutex.
btw r-b/t-b on patch 1 from vmwgfx would be very much appreciated. That one is the real trick in this series here I think.
Thanks!
I'll look into patch 1 as well.
/Thomas
-Daniel