In commit:
commit 09ac4fcb3f255e9225967c75f5893325c116cdbe Author: Felix Kuehling Felix.Kuehling@amd.com Date: Thu Jul 13 17:01:16 2017 -0400
drm/ttm: Implement vm_operations_struct.access v2
we added the vm_access hook, where we also directly call tt_swapin for some reason. If something is swapped-out then the ttm_tt must also be unpopulated, and since access_kmap should also call tt_populate, if needed, then swapping-in will already be handled there.
If anything, calling tt_swapin directly here would likely always fail since the tt->pages won't yet be populated, or worse since the tt->pages array is never actually cleared in unpopulate this might lead to a nasty uaf.
Fixes: 09ac4fcb3f25 ("drm/ttm: Implement vm_operations_struct.access v2") Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index f56be5bc0861..5b9b7fd01a69 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -519,11 +519,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
switch (bo->resource->mem_type) { case TTM_PL_SYSTEM: - if (unlikely(bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) { - ret = ttm_tt_swapin(bo->ttm); - if (unlikely(ret != 0)) - return ret; - } fallthrough; case TTM_PL_TT: ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
In commit:
commit 58aa6622d32af7d2c08d45085f44c54554a16ed7 Author: Thomas Hellstrom thellstrom@vmware.com Date: Fri Jan 3 11:47:23 2014 +0100
drm/ttm: Correctly set page mapping and -index members
we started setting the page->mapping and page->index to point to the virtual address space, if the pages were faulted with TTM. Apparently this was needed for core-mm to able to reverse lookup the virtual address given the struct page, and potentially unmap it from the page tables. However as pointed out by Thomas, since we are now using PFN_MAP, instead of say PFN_MIXED, this should no longer be the case.
There was also apparently some usecase in vmwgfx which needed this for dirty tracking, but that also doesn't appear to be the case anymore, as pointed out by Thomas.
We still need keep the page->mapping for now, since that is still needed for different reasons, but we try to address that in the next patch.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com Reviewed-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 -- drivers/gpu/drm/ttm/ttm_tt.c | 4 +--- 2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 5b9b7fd01a69..9a2119fe4bdd 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -346,8 +346,6 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, } else if (unlikely(!page)) { break; } - page->index = drm_vma_node_start(&bo->base.vma_node) + - page_offset; pfn = page_to_pfn(page); }
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index dae52433beeb..1cc04c224988 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -367,10 +367,8 @@ static void ttm_tt_clear_mapping(struct ttm_tt *ttm) if (ttm->page_flags & TTM_PAGE_FLAG_SG) return;
- for (i = 0; i < ttm->num_pages; ++i) { + for (i = 0; i < ttm->num_pages; ++i) (*page)->mapping = NULL; - (*page++)->index = 0; - } }
void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
Now that setting page->index shouldn't be needed anymore, we are just left with setting page->mapping, and here it looks like amdgpu is the only user, where pointing the page->mapping at the dev_mapping is used to verify that the pages do indeed belong to the device, if userspace later tries to touch them.
v2(Christian): - Drop the functions altogether and just inline modifying the page->mapping
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 15 ++++++++++++++- drivers/gpu/drm/ttm/ttm_tt.c | 25 ------------------------- 2 files changed, 14 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 820fcb24231f..438377a89aa3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1119,6 +1119,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev, { struct amdgpu_device *adev = amdgpu_ttm_adev(bdev); struct amdgpu_ttm_tt *gtt = (void *)ttm; + pgoff_t i; + int ret;
/* user pages are bound by amdgpu_ttm_tt_pin_userptr() */ if (gtt->userptr) { @@ -1131,7 +1133,14 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev, if (ttm->page_flags & TTM_PAGE_FLAG_SG) return 0;
- return ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx); + ret = ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx); + if (ret) + return ret; + + for (i = 0; i < ttm->num_pages; ++i) + ttm->pages[i]->mapping = bdev->dev_mapping; + + return 0; }
/* @@ -1145,6 +1154,7 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, { struct amdgpu_ttm_tt *gtt = (void *)ttm; struct amdgpu_device *adev; + pgoff_t i;
amdgpu_ttm_backend_unbind(bdev, ttm);
@@ -1158,6 +1168,9 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, if (ttm->page_flags & TTM_PAGE_FLAG_SG) return;
+ for (i = 0; i < ttm->num_pages; ++i) + ttm->pages[i]->mapping = NULL; + adev = amdgpu_ttm_adev(bdev); return ttm_pool_free(&adev->mman.bdev.pool, ttm); } diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 1cc04c224988..980ecb079b2c 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -289,17 +289,6 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm, return ret; }
-static void ttm_tt_add_mapping(struct ttm_device *bdev, struct ttm_tt *ttm) -{ - pgoff_t i; - - if (ttm->page_flags & TTM_PAGE_FLAG_SG) - return; - - for (i = 0; i < ttm->num_pages; ++i) - ttm->pages[i]->mapping = bdev->dev_mapping; -} - int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm, struct ttm_operation_ctx *ctx) { @@ -336,7 +325,6 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ret) goto error;
- ttm_tt_add_mapping(bdev, ttm); ttm->page_flags |= TTM_PAGE_FLAG_PRIV_POPULATED; if (unlikely(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) { ret = ttm_tt_swapin(ttm); @@ -359,24 +347,11 @@ int ttm_tt_populate(struct ttm_device *bdev, } EXPORT_SYMBOL(ttm_tt_populate);
-static void ttm_tt_clear_mapping(struct ttm_tt *ttm) -{ - pgoff_t i; - struct page **page = ttm->pages; - - if (ttm->page_flags & TTM_PAGE_FLAG_SG) - return; - - for (i = 0; i < ttm->num_pages; ++i) - (*page)->mapping = NULL; -} - void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) { if (!ttm_tt_is_populated(ttm)) return;
- ttm_tt_clear_mapping(ttm); if (bdev->funcs->ttm_tt_unpopulate) bdev->funcs->ttm_tt_unpopulate(bdev, ttm); else
Am 21.09.21 um 13:01 schrieb Matthew Auld:
Now that setting page->index shouldn't be needed anymore, we are just left with setting page->mapping, and here it looks like amdgpu is the only user, where pointing the page->mapping at the dev_mapping is used to verify that the pages do indeed belong to the device, if userspace later tries to touch them.
v2(Christian):
- Drop the functions altogether and just inline modifying the page->mapping
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 15 ++++++++++++++- drivers/gpu/drm/ttm/ttm_tt.c | 25 ------------------------- 2 files changed, 14 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 820fcb24231f..438377a89aa3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1119,6 +1119,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev, { struct amdgpu_device *adev = amdgpu_ttm_adev(bdev); struct amdgpu_ttm_tt *gtt = (void *)ttm;
pgoff_t i;
int ret;
/* user pages are bound by amdgpu_ttm_tt_pin_userptr() */ if (gtt->userptr) {
@@ -1131,7 +1133,14 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev, if (ttm->page_flags & TTM_PAGE_FLAG_SG) return 0;
- return ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);
ret = ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);
if (ret)
return ret;
for (i = 0; i < ttm->num_pages; ++i)
ttm->pages[i]->mapping = bdev->dev_mapping;
return 0; }
/*
@@ -1145,6 +1154,7 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, { struct amdgpu_ttm_tt *gtt = (void *)ttm; struct amdgpu_device *adev;
pgoff_t i;
amdgpu_ttm_backend_unbind(bdev, ttm);
@@ -1158,6 +1168,9 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, if (ttm->page_flags & TTM_PAGE_FLAG_SG) return;
- for (i = 0; i < ttm->num_pages; ++i)
ttm->pages[i]->mapping = NULL;
- adev = amdgpu_ttm_adev(bdev); return ttm_pool_free(&adev->mman.bdev.pool, ttm); }
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 1cc04c224988..980ecb079b2c 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -289,17 +289,6 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm, return ret; }
-static void ttm_tt_add_mapping(struct ttm_device *bdev, struct ttm_tt *ttm) -{
- pgoff_t i;
- if (ttm->page_flags & TTM_PAGE_FLAG_SG)
return;
- for (i = 0; i < ttm->num_pages; ++i)
ttm->pages[i]->mapping = bdev->dev_mapping;
-}
- int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm, struct ttm_operation_ctx *ctx) {
@@ -336,7 +325,6 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ret) goto error;
- ttm_tt_add_mapping(bdev, ttm); ttm->page_flags |= TTM_PAGE_FLAG_PRIV_POPULATED; if (unlikely(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) { ret = ttm_tt_swapin(ttm);
@@ -359,24 +347,11 @@ int ttm_tt_populate(struct ttm_device *bdev, } EXPORT_SYMBOL(ttm_tt_populate);
-static void ttm_tt_clear_mapping(struct ttm_tt *ttm) -{
- pgoff_t i;
- struct page **page = ttm->pages;
- if (ttm->page_flags & TTM_PAGE_FLAG_SG)
return;
- for (i = 0; i < ttm->num_pages; ++i)
(*page)->mapping = NULL;
-}
void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) { if (!ttm_tt_is_populated(ttm)) return;
ttm_tt_clear_mapping(ttm); if (bdev->funcs->ttm_tt_unpopulate) bdev->funcs->ttm_tt_unpopulate(bdev, ttm); else
No longer used it seems.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com Reviewed-by: Christian König christian.koenig@amd.com --- include/drm/ttm/ttm_tt.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 89b15d673b22..842ce756213c 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -41,7 +41,6 @@ struct ttm_operation_ctx; #define TTM_PAGE_FLAG_SWAPPED (1 << 4) #define TTM_PAGE_FLAG_ZERO_ALLOC (1 << 6) #define TTM_PAGE_FLAG_SG (1 << 8) -#define TTM_PAGE_FLAG_NO_RETRY (1 << 9)
#define TTM_PAGE_FLAG_PRIV_POPULATED (1 << 31)
It covers more than just ttm_bo_type_sg usage, like with say dma-buf, since one other user is userptr in amdgpu, and in the future we might have some more. Hence EXTERNAL is likely a more suitable name.
v2(Christian): - Rename these to TTM_TT_FLAGS_* - Fix up all the holes in the flag values
Suggested-by: Christian König christian.koenig@amd.com Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++++----- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 +++--- drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++-- drivers/gpu/drm/radeon/radeon_ttm.c | 8 ++++---- drivers/gpu/drm/ttm/ttm_bo.c | 4 ++-- drivers/gpu/drm/ttm/ttm_bo_util.c | 4 ++-- drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 24 ++++++++++++------------ include/drm/ttm/ttm_device.h | 2 +- include/drm/ttm/ttm_tt.h | 18 +++++++++--------- 11 files changed, 42 insertions(+), 42 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 438377a89aa3..0cf94421665f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -894,7 +894,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev, DRM_ERROR("failed to pin userptr\n"); return r; } - } else if (ttm->page_flags & TTM_PAGE_FLAG_SG) { + } else if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) { if (!ttm->sg) { struct dma_buf_attachment *attach; struct sg_table *sgt; @@ -1130,7 +1130,7 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev, return 0; }
- if (ttm->page_flags & TTM_PAGE_FLAG_SG) + if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) return 0;
ret = ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx); @@ -1165,7 +1165,7 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, return; }
- if (ttm->page_flags & TTM_PAGE_FLAG_SG) + if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) return;
for (i = 0; i < ttm->num_pages; ++i) @@ -1198,8 +1198,8 @@ int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo, return -ENOMEM; }
- /* Set TTM_PAGE_FLAG_SG before populate but after create. */ - bo->ttm->page_flags |= TTM_PAGE_FLAG_SG; + /* Set TTM_TT_FLAG_EXTERNAL before populate but after create. */ + bo->ttm->page_flags |= TTM_TT_FLAG_EXTERNAL;
gtt = (void *)bo->ttm; gtt->userptr = addr; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 2f672f06b169..fd5b925b27c5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -182,7 +182,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && man->use_tt) - page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; + page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, i915_ttm_select_tt_caching(obj)); @@ -546,7 +546,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, }
/* Populate ttm with pages if needed. Typically system memory. */ - if (ttm && (dst_man->use_tt || (ttm->page_flags & TTM_PAGE_FLAG_SWAPPED))) { + if (ttm && (dst_man->use_tt || (ttm->page_flags & TTM_TT_FLAG_SWAPPED))) { ret = ttm_tt_populate(bo->bdev, ttm, ctx); if (ret) return ret; @@ -557,7 +557,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, return PTR_ERR(dst_st);
clear = !cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm)); - if (!(clear && ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC))) + if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) __i915_ttm_move(bo, clear, dst_mem, dst_st);
ttm_bo_move_sync_cleanup(bo, dst_mem); diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index d3b21d318b42..12b107acb6ee 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1250,7 +1250,7 @@ nouveau_ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm_dma = (void *)ttm; struct nouveau_drm *drm; struct device *dev; - bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG); + bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
if (ttm_tt_is_populated(ttm)) return 0; @@ -1273,7 +1273,7 @@ nouveau_ttm_tt_unpopulate(struct ttm_device *bdev, { struct nouveau_drm *drm; struct device *dev; - bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG); + bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
if (slave) return; diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 7793249bc549..11b21d605584 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -545,14 +545,14 @@ static int radeon_ttm_tt_populate(struct ttm_device *bdev, { struct radeon_device *rdev = radeon_get_rdev(bdev); struct radeon_ttm_tt *gtt = radeon_ttm_tt_to_gtt(rdev, ttm); - bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG); + bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
if (gtt && gtt->userptr) { ttm->sg = kzalloc(sizeof(struct sg_table), GFP_KERNEL); if (!ttm->sg) return -ENOMEM;
- ttm->page_flags |= TTM_PAGE_FLAG_SG; + ttm->page_flags |= TTM_TT_FLAG_EXTERNAL; return 0; }
@@ -569,13 +569,13 @@ static void radeon_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm { struct radeon_device *rdev = radeon_get_rdev(bdev); struct radeon_ttm_tt *gtt = radeon_ttm_tt_to_gtt(rdev, ttm); - bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG); + bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
radeon_ttm_tt_unbind(bdev, ttm);
if (gtt && gtt->userptr) { kfree(ttm->sg); - ttm->page_flags &= ~TTM_PAGE_FLAG_SG; + ttm->page_flags &= ~TTM_TT_FLAG_EXTERNAL; return; }
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3b22c0013dbf..d62b2013c367 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1115,8 +1115,8 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, return -EBUSY;
if (!bo->ttm || !ttm_tt_is_populated(bo->ttm) || - bo->ttm->page_flags & TTM_PAGE_FLAG_SG || - bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED || + bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL || + bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED || !ttm_bo_get_unless_zero(bo)) { if (locked) dma_resv_unlock(bo->base.resv); diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index c893c3db2623..a342d701c91c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -147,7 +147,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, bool clear; int ret = 0;
- if (ttm && ((ttm->page_flags & TTM_PAGE_FLAG_SWAPPED) || + if (ttm && ((ttm->page_flags & TTM_TT_FLAG_SWAPPED) || dst_man->use_tt)) { ret = ttm_tt_populate(bdev, ttm, ctx); if (ret) @@ -169,7 +169,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, }
clear = src_iter->ops->maps_tt && (!ttm || !ttm_tt_is_populated(ttm)); - if (!(clear && ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC))) + if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) ttm_move_memcpy(clear, dst_mem->num_pages, dst_iter, src_iter);
if (!src_iter->ops->maps_tt) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 9a2119fe4bdd..950f4f132802 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -162,7 +162,7 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo, * Refuse to fault imported pages. This should be handled * (if at all) by redirecting mmap to the exporter. */ - if (bo->ttm && (bo->ttm->page_flags & TTM_PAGE_FLAG_SG)) { + if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { dma_resv_unlock(bo->base.resv); return VM_FAULT_SIGBUS; } diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index c961a788b519..1bba0a0ed3f9 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -371,7 +371,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, WARN_ON(!num_pages || ttm_tt_is_populated(tt)); WARN_ON(dma_addr && !pool->dev);
- if (tt->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC) + if (tt->page_flags & TTM_TT_FLAG_ZERO_ALLOC) gfp_flags |= __GFP_ZERO;
if (ctx->gfp_retry_mayfail) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 980ecb079b2c..86f31fde6e35 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -68,12 +68,12 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc) switch (bo->type) { case ttm_bo_type_device: if (zero_alloc) - page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; + page_flags |= TTM_TT_FLAG_ZERO_ALLOC; break; case ttm_bo_type_kernel: break; case ttm_bo_type_sg: - page_flags |= TTM_PAGE_FLAG_SG; + page_flags |= TTM_TT_FLAG_EXTERNAL; break; default: pr_err("Illegal buffer object type\n"); @@ -156,7 +156,7 @@ EXPORT_SYMBOL(ttm_tt_init);
void ttm_tt_fini(struct ttm_tt *ttm) { - WARN_ON(ttm->page_flags & TTM_PAGE_FLAG_PRIV_POPULATED); + WARN_ON(ttm->page_flags & TTM_TT_FLAG_PRIV_POPULATED);
if (ttm->swap_storage) fput(ttm->swap_storage); @@ -178,7 +178,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
ttm_tt_init_fields(ttm, bo, page_flags, caching);
- if (page_flags & TTM_PAGE_FLAG_SG) + if (page_flags & TTM_TT_FLAG_EXTERNAL) ret = ttm_sg_tt_alloc_page_directory(ttm); else ret = ttm_dma_tt_alloc_page_directory(ttm); @@ -224,7 +224,7 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
fput(swap_storage); ttm->swap_storage = NULL; - ttm->page_flags &= ~TTM_PAGE_FLAG_SWAPPED; + ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
return 0;
@@ -279,7 +279,7 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
ttm_tt_unpopulate(bdev, ttm); ttm->swap_storage = swap_storage; - ttm->page_flags |= TTM_PAGE_FLAG_SWAPPED; + ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
return ttm->num_pages;
@@ -300,7 +300,7 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ttm_tt_is_populated(ttm)) return 0;
- if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { atomic_long_add(ttm->num_pages, &ttm_pages_allocated); if (bdev->pool.use_dma32) atomic_long_add(ttm->num_pages, @@ -325,8 +325,8 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ret) goto error;
- ttm->page_flags |= TTM_PAGE_FLAG_PRIV_POPULATED; - if (unlikely(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) { + ttm->page_flags |= TTM_TT_FLAG_PRIV_POPULATED; + if (unlikely(ttm->page_flags & TTM_TT_FLAG_SWAPPED)) { ret = ttm_tt_swapin(ttm); if (unlikely(ret != 0)) { ttm_tt_unpopulate(bdev, ttm); @@ -337,7 +337,7 @@ int ttm_tt_populate(struct ttm_device *bdev, return 0;
error: - if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); if (bdev->pool.use_dma32) atomic_long_sub(ttm->num_pages, @@ -357,14 +357,14 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) else ttm_pool_free(&bdev->pool, ttm);
- if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); if (bdev->pool.use_dma32) atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated); }
- ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED; + ttm->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED; }
#ifdef CONFIG_DEBUG_FS diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h index cbe03d45e883..0a4ddec78d8f 100644 --- a/include/drm/ttm/ttm_device.h +++ b/include/drm/ttm/ttm_device.h @@ -65,7 +65,7 @@ struct ttm_device_funcs { * ttm_tt_create * * @bo: The buffer object to create the ttm for. - * @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags. + * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags. * * Create a struct ttm_tt to back data with system memory pages. * No pages are actually allocated. diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 842ce756213c..b023cd58ff38 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -38,17 +38,17 @@ struct ttm_resource; struct ttm_buffer_object; struct ttm_operation_ctx;
-#define TTM_PAGE_FLAG_SWAPPED (1 << 4) -#define TTM_PAGE_FLAG_ZERO_ALLOC (1 << 6) -#define TTM_PAGE_FLAG_SG (1 << 8) +#define TTM_TT_FLAG_SWAPPED (1 << 0) +#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) +#define TTM_TT_FLAG_EXTERNAL (1 << 2)
-#define TTM_PAGE_FLAG_PRIV_POPULATED (1 << 31) +#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31)
/** * struct ttm_tt * * @pages: Array of pages backing the data. - * @page_flags: see TTM_PAGE_FLAG_* + * @page_flags: see TTM_TT_FLAG_* * @num_pages: Number of pages in the page array. * @sg: for SG objects via dma-buf * @dma_address: The DMA (bus) addresses of the pages @@ -84,7 +84,7 @@ struct ttm_kmap_iter_tt {
static inline bool ttm_tt_is_populated(struct ttm_tt *tt) { - return tt->page_flags & TTM_PAGE_FLAG_PRIV_POPULATED; + return tt->page_flags & TTM_TT_FLAG_PRIV_POPULATED; }
/** @@ -103,7 +103,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc); * * @ttm: The struct ttm_tt. * @bo: The buffer object we create the ttm for. - * @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags. + * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags. * @caching: the desired caching state of the pages * * Create a struct ttm_tt to back data with system memory pages. @@ -178,7 +178,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm); */ static inline void ttm_tt_mark_for_clear(struct ttm_tt *ttm) { - ttm->page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; + ttm->page_flags |= TTM_TT_FLAG_ZERO_ALLOC; }
void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages); @@ -194,7 +194,7 @@ struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt, * * @bo: Buffer object we allocate the ttm for. * @bridge: The agp bridge this device is sitting on. - * @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags. + * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags. * * * Create a TTM backend that uses the indicated AGP bridge as an aperture
Am 21.09.21 um 13:01 schrieb Matthew Auld:
It covers more than just ttm_bo_type_sg usage, like with say dma-buf, since one other user is userptr in amdgpu, and in the future we might have some more. Hence EXTERNAL is likely a more suitable name.
v2(Christian):
- Rename these to TTM_TT_FLAGS_*
- Fix up all the holes in the flag values
Suggested-by: Christian König christian.koenig@amd.com Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
Acked-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++++----- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 +++--- drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++-- drivers/gpu/drm/radeon/radeon_ttm.c | 8 ++++---- drivers/gpu/drm/ttm/ttm_bo.c | 4 ++-- drivers/gpu/drm/ttm/ttm_bo_util.c | 4 ++-- drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 24 ++++++++++++------------ include/drm/ttm/ttm_device.h | 2 +- include/drm/ttm/ttm_tt.h | 18 +++++++++--------- 11 files changed, 42 insertions(+), 42 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 438377a89aa3..0cf94421665f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -894,7 +894,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev, DRM_ERROR("failed to pin userptr\n"); return r; }
- } else if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
- } else if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) { if (!ttm->sg) { struct dma_buf_attachment *attach; struct sg_table *sgt;
@@ -1130,7 +1130,7 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev, return 0; }
- if (ttm->page_flags & TTM_PAGE_FLAG_SG)
if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) return 0;
ret = ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);
@@ -1165,7 +1165,7 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, return; }
- if (ttm->page_flags & TTM_PAGE_FLAG_SG)
if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL) return;
for (i = 0; i < ttm->num_pages; ++i)
@@ -1198,8 +1198,8 @@ int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo, return -ENOMEM; }
- /* Set TTM_PAGE_FLAG_SG before populate but after create. */
- bo->ttm->page_flags |= TTM_PAGE_FLAG_SG;
/* Set TTM_TT_FLAG_EXTERNAL before populate but after create. */
bo->ttm->page_flags |= TTM_TT_FLAG_EXTERNAL;
gtt = (void *)bo->ttm; gtt->userptr = addr;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 2f672f06b169..fd5b925b27c5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -182,7 +182,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
if (obj->flags & I915_BO_ALLOC_CPU_CLEAR && man->use_tt)
page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC;
page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, i915_ttm_select_tt_caching(obj));
@@ -546,7 +546,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, }
/* Populate ttm with pages if needed. Typically system memory. */
- if (ttm && (dst_man->use_tt || (ttm->page_flags & TTM_PAGE_FLAG_SWAPPED))) {
- if (ttm && (dst_man->use_tt || (ttm->page_flags & TTM_TT_FLAG_SWAPPED))) { ret = ttm_tt_populate(bo->bdev, ttm, ctx); if (ret) return ret;
@@ -557,7 +557,7 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, return PTR_ERR(dst_st);
clear = !cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm));
- if (!(clear && ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)))
if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) __i915_ttm_move(bo, clear, dst_mem, dst_st);
ttm_bo_move_sync_cleanup(bo, dst_mem);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index d3b21d318b42..12b107acb6ee 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1250,7 +1250,7 @@ nouveau_ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm_dma = (void *)ttm; struct nouveau_drm *drm; struct device *dev;
- bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
if (ttm_tt_is_populated(ttm)) return 0;
@@ -1273,7 +1273,7 @@ nouveau_ttm_tt_unpopulate(struct ttm_device *bdev, { struct nouveau_drm *drm; struct device *dev;
- bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
if (slave) return;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 7793249bc549..11b21d605584 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -545,14 +545,14 @@ static int radeon_ttm_tt_populate(struct ttm_device *bdev, { struct radeon_device *rdev = radeon_get_rdev(bdev); struct radeon_ttm_tt *gtt = radeon_ttm_tt_to_gtt(rdev, ttm);
- bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
if (gtt && gtt->userptr) { ttm->sg = kzalloc(sizeof(struct sg_table), GFP_KERNEL); if (!ttm->sg) return -ENOMEM;
ttm->page_flags |= TTM_PAGE_FLAG_SG;
return 0; }ttm->page_flags |= TTM_TT_FLAG_EXTERNAL;
@@ -569,13 +569,13 @@ static void radeon_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm { struct radeon_device *rdev = radeon_get_rdev(bdev); struct radeon_ttm_tt *gtt = radeon_ttm_tt_to_gtt(rdev, ttm);
- bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
bool slave = !!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
radeon_ttm_tt_unbind(bdev, ttm);
if (gtt && gtt->userptr) { kfree(ttm->sg);
ttm->page_flags &= ~TTM_PAGE_FLAG_SG;
return; }ttm->page_flags &= ~TTM_TT_FLAG_EXTERNAL;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3b22c0013dbf..d62b2013c367 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1115,8 +1115,8 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, return -EBUSY;
if (!bo->ttm || !ttm_tt_is_populated(bo->ttm) ||
bo->ttm->page_flags & TTM_PAGE_FLAG_SG ||
bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED ||
bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL ||
if (locked) dma_resv_unlock(bo->base.resv);bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED || !ttm_bo_get_unless_zero(bo)) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index c893c3db2623..a342d701c91c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -147,7 +147,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, bool clear; int ret = 0;
- if (ttm && ((ttm->page_flags & TTM_PAGE_FLAG_SWAPPED) ||
- if (ttm && ((ttm->page_flags & TTM_TT_FLAG_SWAPPED) || dst_man->use_tt)) { ret = ttm_tt_populate(bdev, ttm, ctx); if (ret)
@@ -169,7 +169,7 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, }
clear = src_iter->ops->maps_tt && (!ttm || !ttm_tt_is_populated(ttm));
- if (!(clear && ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)))
if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) ttm_move_memcpy(clear, dst_mem->num_pages, dst_iter, src_iter);
if (!src_iter->ops->maps_tt)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 9a2119fe4bdd..950f4f132802 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -162,7 +162,7 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo, * Refuse to fault imported pages. This should be handled * (if at all) by redirecting mmap to the exporter. */
- if (bo->ttm && (bo->ttm->page_flags & TTM_PAGE_FLAG_SG)) {
- if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { dma_resv_unlock(bo->base.resv); return VM_FAULT_SIGBUS; }
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index c961a788b519..1bba0a0ed3f9 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -371,7 +371,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt, WARN_ON(!num_pages || ttm_tt_is_populated(tt)); WARN_ON(dma_addr && !pool->dev);
- if (tt->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)
if (tt->page_flags & TTM_TT_FLAG_ZERO_ALLOC) gfp_flags |= __GFP_ZERO;
if (ctx->gfp_retry_mayfail)
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 980ecb079b2c..86f31fde6e35 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -68,12 +68,12 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc) switch (bo->type) { case ttm_bo_type_device: if (zero_alloc)
page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC;
break; case ttm_bo_type_kernel: break; case ttm_bo_type_sg:page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
page_flags |= TTM_PAGE_FLAG_SG;
break; default: pr_err("Illegal buffer object type\n");page_flags |= TTM_TT_FLAG_EXTERNAL;
@@ -156,7 +156,7 @@ EXPORT_SYMBOL(ttm_tt_init);
void ttm_tt_fini(struct ttm_tt *ttm) {
- WARN_ON(ttm->page_flags & TTM_PAGE_FLAG_PRIV_POPULATED);
WARN_ON(ttm->page_flags & TTM_TT_FLAG_PRIV_POPULATED);
if (ttm->swap_storage) fput(ttm->swap_storage);
@@ -178,7 +178,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
ttm_tt_init_fields(ttm, bo, page_flags, caching);
- if (page_flags & TTM_PAGE_FLAG_SG)
- if (page_flags & TTM_TT_FLAG_EXTERNAL) ret = ttm_sg_tt_alloc_page_directory(ttm); else ret = ttm_dma_tt_alloc_page_directory(ttm);
@@ -224,7 +224,7 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
fput(swap_storage); ttm->swap_storage = NULL;
- ttm->page_flags &= ~TTM_PAGE_FLAG_SWAPPED;
ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
return 0;
@@ -279,7 +279,7 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
ttm_tt_unpopulate(bdev, ttm); ttm->swap_storage = swap_storage;
- ttm->page_flags |= TTM_PAGE_FLAG_SWAPPED;
ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
return ttm->num_pages;
@@ -300,7 +300,7 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ttm_tt_is_populated(ttm)) return 0;
- if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
- if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { atomic_long_add(ttm->num_pages, &ttm_pages_allocated); if (bdev->pool.use_dma32) atomic_long_add(ttm->num_pages,
@@ -325,8 +325,8 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ret) goto error;
- ttm->page_flags |= TTM_PAGE_FLAG_PRIV_POPULATED;
- if (unlikely(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) {
- ttm->page_flags |= TTM_TT_FLAG_PRIV_POPULATED;
- if (unlikely(ttm->page_flags & TTM_TT_FLAG_SWAPPED)) { ret = ttm_tt_swapin(ttm); if (unlikely(ret != 0)) { ttm_tt_unpopulate(bdev, ttm);
@@ -337,7 +337,7 @@ int ttm_tt_populate(struct ttm_device *bdev, return 0;
error:
- if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
- if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); if (bdev->pool.use_dma32) atomic_long_sub(ttm->num_pages,
@@ -357,14 +357,14 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) else ttm_pool_free(&bdev->pool, ttm);
- if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
- if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); if (bdev->pool.use_dma32) atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated); }
- ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
ttm->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED; }
#ifdef CONFIG_DEBUG_FS
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h index cbe03d45e883..0a4ddec78d8f 100644 --- a/include/drm/ttm/ttm_device.h +++ b/include/drm/ttm/ttm_device.h @@ -65,7 +65,7 @@ struct ttm_device_funcs { * ttm_tt_create * * @bo: The buffer object to create the ttm for.
* @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags.
* @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
- Create a struct ttm_tt to back data with system memory pages.
- No pages are actually allocated.
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index 842ce756213c..b023cd58ff38 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -38,17 +38,17 @@ struct ttm_resource; struct ttm_buffer_object; struct ttm_operation_ctx;
-#define TTM_PAGE_FLAG_SWAPPED (1 << 4) -#define TTM_PAGE_FLAG_ZERO_ALLOC (1 << 6) -#define TTM_PAGE_FLAG_SG (1 << 8) +#define TTM_TT_FLAG_SWAPPED (1 << 0) +#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) +#define TTM_TT_FLAG_EXTERNAL (1 << 2)
-#define TTM_PAGE_FLAG_PRIV_POPULATED (1 << 31) +#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31)
/**
- struct ttm_tt
- @pages: Array of pages backing the data.
- @page_flags: see TTM_PAGE_FLAG_*
- @page_flags: see TTM_TT_FLAG_*
- @num_pages: Number of pages in the page array.
- @sg: for SG objects via dma-buf
- @dma_address: The DMA (bus) addresses of the pages
@@ -84,7 +84,7 @@ struct ttm_kmap_iter_tt {
static inline bool ttm_tt_is_populated(struct ttm_tt *tt) {
- return tt->page_flags & TTM_PAGE_FLAG_PRIV_POPULATED;
return tt->page_flags & TTM_TT_FLAG_PRIV_POPULATED; }
/**
@@ -103,7 +103,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
- @ttm: The struct ttm_tt.
- @bo: The buffer object we create the ttm for.
- @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags.
- @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
- @caching: the desired caching state of the pages
- Create a struct ttm_tt to back data with system memory pages.
@@ -178,7 +178,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm); */ static inline void ttm_tt_mark_for_clear(struct ttm_tt *ttm) {
- ttm->page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC;
ttm->page_flags |= TTM_TT_FLAG_ZERO_ALLOC; }
void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages);
@@ -194,7 +194,7 @@ struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
- @bo: Buffer object we allocate the ttm for.
- @bridge: The agp bridge this device is sitting on.
- @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags.
- @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
- Create a TTM backend that uses the indicated AGP bridge as an aperture
Move it to inline kernel-doc, otherwise we can't add empty lines it seems. Also drop the kernel-doc for pages_list, which doesn't seem to exist.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- include/drm/ttm/ttm_tt.h | 57 ++++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index b023cd58ff38..f3ef568da651 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -38,35 +38,54 @@ struct ttm_resource; struct ttm_buffer_object; struct ttm_operation_ctx;
-#define TTM_TT_FLAG_SWAPPED (1 << 0) -#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) -#define TTM_TT_FLAG_EXTERNAL (1 << 2) - -#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31) - /** - * struct ttm_tt - * - * @pages: Array of pages backing the data. - * @page_flags: see TTM_TT_FLAG_* - * @num_pages: Number of pages in the page array. - * @sg: for SG objects via dma-buf - * @dma_address: The DMA (bus) addresses of the pages - * @swap_storage: Pointer to shmem struct file for swap storage. - * @pages_list: used by some page allocation backend - * @caching: The current caching state of the pages, see enum ttm_caching. - * - * This is a structure holding the pages, caching- and aperture binding - * status for a buffer object that isn't backed by fixed (VRAM / AGP) + * struct ttm_tt - This is a structure holding the pages, caching- and aperture + * binding status for a buffer object that isn't backed by fixed (VRAM / AGP) * memory. */ struct ttm_tt { + /** @pages: Array of pages backing the data. */ struct page **pages; + /** + * @page_flags: The page flags. + * + * Supported values: + * + * TTM_TT_FLAG_SWAPPED: Set if the pages have been swapped out. + * Calling ttm_tt_populate() will swap the pages back in, and unset the + * flag. + * + * TTM_TT_FLAG_ZERO_ALLOC: Set if the pages will be zeroed on + * allocation. + * + * TTM_TT_FLAG_EXTERNAL: Set if the underlying pages were allocated + * externally, like with dma-buf or userptr. This effectively disables + * TTM swapping out such pages. Also important is to prevent TTM from + * ever directly mapping these pages. + * + * Note that enum ttm_bo_type.ttm_bo_type_sg objects will always enable + * this flag. + * + * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. + */ +#define TTM_TT_FLAG_SWAPPED (1 << 0) +#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) +#define TTM_TT_FLAG_EXTERNAL (1 << 2) + +#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31) uint32_t page_flags; + /** @num_pages: Number of pages in the page array. */ uint32_t num_pages; + /** @sg: for SG objects via dma-buf. */ struct sg_table *sg; + /** @dma_address: The DMA (bus) addresses of the pages. */ dma_addr_t *dma_address; + /** @swap_storage: Pointer to shmem struct file for swap storage. */ struct file *swap_storage; + /** + * @caching: The current caching state of the pages, see enum + * ttm_caching. + */ enum ttm_caching caching; };
Am 21.09.21 um 13:01 schrieb Matthew Auld:
Move it to inline kernel-doc, otherwise we can't add empty lines it seems. Also drop the kernel-doc for pages_list, which doesn't seem to exist.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
One comment below, with that fixed Reviewed-by: Christian König christian.koenig@amd.com
include/drm/ttm/ttm_tt.h | 57 ++++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index b023cd58ff38..f3ef568da651 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -38,35 +38,54 @@ struct ttm_resource; struct ttm_buffer_object; struct ttm_operation_ctx;
-#define TTM_TT_FLAG_SWAPPED (1 << 0) -#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) -#define TTM_TT_FLAG_EXTERNAL (1 << 2)
-#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31)
- /**
- struct ttm_tt
- @pages: Array of pages backing the data.
- @page_flags: see TTM_TT_FLAG_*
- @num_pages: Number of pages in the page array.
- @sg: for SG objects via dma-buf
- @dma_address: The DMA (bus) addresses of the pages
- @swap_storage: Pointer to shmem struct file for swap storage.
- @pages_list: used by some page allocation backend
- @caching: The current caching state of the pages, see enum ttm_caching.
- This is a structure holding the pages, caching- and aperture binding
- status for a buffer object that isn't backed by fixed (VRAM / AGP)
- struct ttm_tt - This is a structure holding the pages, caching- and aperture
*/ struct ttm_tt {
- binding status for a buffer object that isn't backed by fixed (VRAM / AGP)
- memory.
- /** @pages: Array of pages backing the data. */ struct page **pages;
- /**
* @page_flags: The page flags.
*
* Supported values:
*
* TTM_TT_FLAG_SWAPPED: Set if the pages have been swapped out.
* Calling ttm_tt_populate() will swap the pages back in, and unset the
* flag.
*
* TTM_TT_FLAG_ZERO_ALLOC: Set if the pages will be zeroed on
* allocation.
*
* TTM_TT_FLAG_EXTERNAL: Set if the underlying pages were allocated
* externally, like with dma-buf or userptr. This effectively disables
* TTM swapping out such pages. Also important is to prevent TTM from
* ever directly mapping these pages.
*
* Note that enum ttm_bo_type.ttm_bo_type_sg objects will always enable
* this flag.
*
* TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE.
The swapped flag should probably not be touched by the drivers either.
Better just describe what this is good for.
Christian.
*/
+#define TTM_TT_FLAG_SWAPPED (1 << 0) +#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) +#define TTM_TT_FLAG_EXTERNAL (1 << 2)
+#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31) uint32_t page_flags;
- /** @num_pages: Number of pages in the page array. */ uint32_t num_pages;
- /** @sg: for SG objects via dma-buf. */ struct sg_table *sg;
- /** @dma_address: The DMA (bus) addresses of the pages. */ dma_addr_t *dma_address;
- /** @swap_storage: Pointer to shmem struct file for swap storage. */ struct file *swap_storage;
- /**
* @caching: The current caching state of the pages, see enum
* ttm_caching.
enum ttm_caching caching; };*/
In commit:
commit 667a50db0477d47fdff01c666f5ee1ce26b5264c Author: Thomas Hellstrom thellstrom@vmware.com Date: Fri Jan 3 11:17:18 2014 +0100
drm/ttm: Refuse to fault (prime-) imported pages
we introduced the restriction that imported pages should not be directly mappable through TTM(this also extends to userptr). In the next patch we want to introduce a shmem_tt backend, which should follow all the existing rules with TTM_PAGE_FLAG_EXTERNAL, since it will need to handle swapping itself, but with the above mapping restriction lifted.
v2(Christian): - Don't OR together EXTERNAL and EXTERNAL_MAPPABLE in the definition of EXTERNAL_MAPPABLE, just leave it the caller to handle this correctly, otherwise we might encounter subtle issues.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 6 ++++-- drivers/gpu/drm/ttm/ttm_tt.c | 3 +++ include/drm/ttm/ttm_tt.h | 19 ++++++++++++++++--- 3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 950f4f132802..33680c94127c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -163,8 +163,10 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo, * (if at all) by redirecting mmap to the exporter. */ if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) { - dma_resv_unlock(bo->base.resv); - return VM_FAULT_SIGBUS; + if (!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) { + dma_resv_unlock(bo->base.resv); + return VM_FAULT_SIGBUS; + } }
return 0; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 86f31fde6e35..7e83c00a3f48 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -84,6 +84,9 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc) if (unlikely(bo->ttm == NULL)) return -ENOMEM;
+ WARN_ON(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE && + !(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)); + return 0; }
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index f3ef568da651..5995b8ee1667 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -66,11 +66,24 @@ struct ttm_tt { * Note that enum ttm_bo_type.ttm_bo_type_sg objects will always enable * this flag. * + * TTM_TT_FLAG_EXTERNAL_MAPPABLE: Same behaviour as + * TTM_TT_FLAG_EXTERNAL, but with the reduced restriction that it is + * still valid to use TTM to map the pages directly. This is useful when + * implementing a ttm_tt backend which still allocates driver owned + * pages underneath(say with shmem). + * + * Note that since this also implies TTM_TT_FLAG_EXTERNAL, the usage + * here should always be: + * + * page_flags = TTM_TT_FLAG_EXTERNAL | + * TTM_TT_FLAG_EXTERNAL_MAPPABLE; + * * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. */ -#define TTM_TT_FLAG_SWAPPED (1 << 0) -#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) -#define TTM_TT_FLAG_EXTERNAL (1 << 2) +#define TTM_TT_FLAG_SWAPPED (1 << 0) +#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1) +#define TTM_TT_FLAG_EXTERNAL (1 << 2) +#define TTM_TT_FLAG_EXTERNAL_MAPPABLE (1 << 3)
#define TTM_TT_FLAG_PRIV_POPULATED (1 << 31) uint32_t page_flags;
From: Thomas Hellström thomas.hellstrom@linux.intel.com
Break out some shmem backend utils for future reuse by the TTM backend: shmem_alloc_st(), shmem_free_st() and __shmem_writeback() which we can use to provide a shmem-backed TTM page pool for cached-only TTM buffer objects.
Main functional change here is that we now compute the page sizes using the dma segments rather than using the physical page address segments.
v2(Reported-by: kernel test robot lkp@intel.com) - Make sure we initialise the mapping on the error path in shmem_get_pages()
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 181 +++++++++++++--------- 1 file changed, 106 insertions(+), 75 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 11f072193f3b..36b711ae9e28 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -25,46 +25,61 @@ static void check_release_pagevec(struct pagevec *pvec) cond_resched(); }
-static int shmem_get_pages(struct drm_i915_gem_object *obj) +static void shmem_free_st(struct sg_table *st, struct address_space *mapping, + bool dirty, bool backup) { - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct intel_memory_region *mem = obj->mm.region; - const unsigned long page_count = obj->base.size / PAGE_SIZE; + struct sgt_iter sgt_iter; + struct pagevec pvec; + struct page *page; + + mapping_clear_unevictable(mapping); + + pagevec_init(&pvec); + for_each_sgt_page(page, sgt_iter, st) { + if (dirty) + set_page_dirty(page); + + if (backup) + mark_page_accessed(page); + + if (!pagevec_add(&pvec, page)) + check_release_pagevec(&pvec); + } + if (pagevec_count(&pvec)) + check_release_pagevec(&pvec); + + sg_free_table(st); + kfree(st); +} + +static struct sg_table *shmem_alloc_st(struct drm_i915_private *i915, + size_t size, struct intel_memory_region *mr, + struct address_space *mapping, + unsigned int max_segment) +{ + const unsigned long page_count = size / PAGE_SIZE; unsigned long i; - struct address_space *mapping; struct sg_table *st; struct scatterlist *sg; - struct sgt_iter sgt_iter; struct page *page; unsigned long last_pfn = 0; /* suppress gcc warning */ - unsigned int max_segment = i915_sg_segment_size(); - unsigned int sg_page_sizes; gfp_t noreclaim; int ret;
- /* - * Assert that the object is not currently in any GPU domain. As it - * wasn't in the GTT, there shouldn't be any way it could have been in - * a GPU cache - */ - GEM_BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS); - GEM_BUG_ON(obj->write_domain & I915_GEM_GPU_DOMAINS); - /* * If there's no chance of allocating enough pages for the whole * object, bail early. */ - if (obj->base.size > resource_size(&mem->region)) - return -ENOMEM; + if (size > resource_size(&mr->region)) + return ERR_PTR(-ENOMEM);
st = kmalloc(sizeof(*st), GFP_KERNEL); if (!st) - return -ENOMEM; + return ERR_PTR(-ENOMEM);
-rebuild_st: if (sg_alloc_table(st, page_count, GFP_KERNEL)) { kfree(st); - return -ENOMEM; + return ERR_PTR(-ENOMEM); }
/* @@ -73,14 +88,12 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) * * Fail silently without starting the shrinker */ - mapping = obj->base.filp->f_mapping; mapping_set_unevictable(mapping); noreclaim = mapping_gfp_constraint(mapping, ~__GFP_RECLAIM); noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
sg = st->sgl; st->nents = 0; - sg_page_sizes = 0; for (i = 0; i < page_count; i++) { const unsigned int shrink[] = { I915_SHRINK_BOUND | I915_SHRINK_UNBOUND, @@ -135,10 +148,9 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) if (!i || sg->length >= max_segment || page_to_pfn(page) != last_pfn + 1) { - if (i) { - sg_page_sizes |= sg->length; + if (i) sg = sg_next(sg); - } + st->nents++; sg_set_page(sg, page, PAGE_SIZE, 0); } else { @@ -149,14 +161,65 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) /* Check that the i965g/gm workaround works. */ GEM_BUG_ON(gfp & __GFP_DMA32 && last_pfn >= 0x00100000UL); } - if (sg) { /* loop terminated early; short sg table */ - sg_page_sizes |= sg->length; + if (sg) /* loop terminated early; short sg table */ sg_mark_end(sg); - }
/* Trim unused sg entries to avoid wasting memory. */ i915_sg_trim(st);
+ return st; +err_sg: + sg_mark_end(sg); + if (sg != st->sgl) { + shmem_free_st(st, mapping, false, false); + } else { + mapping_clear_unevictable(mapping); + sg_free_table(st); + kfree(st); + } + + /* + * shmemfs first checks if there is enough memory to allocate the page + * and reports ENOSPC should there be insufficient, along with the usual + * ENOMEM for a genuine allocation failure. + * + * We use ENOSPC in our driver to mean that we have run out of aperture + * space and so want to translate the error from shmemfs back to our + * usual understanding of ENOMEM. + */ + if (ret == -ENOSPC) + ret = -ENOMEM; + + return ERR_PTR(ret); +} + +static int shmem_get_pages(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct intel_memory_region *mem = obj->mm.region; + struct address_space *mapping = obj->base.filp->f_mapping; + const unsigned long page_count = obj->base.size / PAGE_SIZE; + unsigned int max_segment = i915_sg_segment_size(); + struct sg_table *st; + struct sgt_iter sgt_iter; + struct page *page; + int ret; + + /* + * Assert that the object is not currently in any GPU domain. As it + * wasn't in the GTT, there shouldn't be any way it could have been in + * a GPU cache + */ + GEM_BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS); + GEM_BUG_ON(obj->write_domain & I915_GEM_GPU_DOMAINS); + +rebuild_st: + st = shmem_alloc_st(i915, obj->base.size, mem, mapping, max_segment); + if (IS_ERR(st)) { + ret = PTR_ERR(st); + goto err_st; + } + ret = i915_gem_gtt_prepare_pages(obj, st); if (ret) { /* @@ -168,6 +231,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) for_each_sgt_page(page, sgt_iter, st) put_page(page); sg_free_table(st); + kfree(st);
max_segment = PAGE_SIZE; goto rebuild_st; @@ -200,28 +264,12 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) if (IS_JSL_EHL(i915) && obj->flags & I915_BO_ALLOC_USER) obj->cache_dirty = true;
- __i915_gem_object_set_pages(obj, st, sg_page_sizes); + __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
return 0;
-err_sg: - sg_mark_end(sg); err_pages: - mapping_clear_unevictable(mapping); - if (sg != st->sgl) { - struct pagevec pvec; - - pagevec_init(&pvec); - for_each_sgt_page(page, sgt_iter, st) { - if (!pagevec_add(&pvec, page)) - check_release_pagevec(&pvec); - } - if (pagevec_count(&pvec)) - check_release_pagevec(&pvec); - } - sg_free_table(st); - kfree(st); - + shmem_free_st(st, mapping, false, false); /* * shmemfs first checks if there is enough memory to allocate the page * and reports ENOSPC should there be insufficient, along with the usual @@ -231,6 +279,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) * space and so want to translate the error from shmemfs back to our * usual understanding of ENOMEM. */ +err_st: if (ret == -ENOSPC) ret = -ENOMEM;
@@ -251,10 +300,8 @@ shmem_truncate(struct drm_i915_gem_object *obj) obj->mm.pages = ERR_PTR(-EFAULT); }
-static void -shmem_writeback(struct drm_i915_gem_object *obj) +static void __shmem_writeback(size_t size, struct address_space *mapping) { - struct address_space *mapping; struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE, .nr_to_write = SWAP_CLUSTER_MAX, @@ -270,10 +317,9 @@ shmem_writeback(struct drm_i915_gem_object *obj) * instead of invoking writeback so they are aged and paged out * as normal. */ - mapping = obj->base.filp->f_mapping;
/* Begin writeback on each dirty page */ - for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) { + for (i = 0; i < size >> PAGE_SHIFT; i++) { struct page *page;
page = find_lock_page(mapping, i); @@ -296,6 +342,12 @@ shmem_writeback(struct drm_i915_gem_object *obj) } }
+static void +shmem_writeback(struct drm_i915_gem_object *obj) +{ + __shmem_writeback(obj->base.size, obj->base.filp->f_mapping); +} + void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages, @@ -316,11 +368,6 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages) { - struct sgt_iter sgt_iter; - struct pagevec pvec; - struct page *page; - - GEM_WARN_ON(IS_DGFX(to_i915(obj->base.dev))); __i915_gem_object_release_shmem(obj, pages, true);
i915_gem_gtt_finish_pages(obj, pages); @@ -328,25 +375,9 @@ void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_ if (i915_gem_object_needs_bit17_swizzle(obj)) i915_gem_object_save_bit_17_swizzle(obj, pages);
- mapping_clear_unevictable(file_inode(obj->base.filp)->i_mapping); - - pagevec_init(&pvec); - for_each_sgt_page(page, sgt_iter, pages) { - if (obj->mm.dirty) - set_page_dirty(page); - - if (obj->mm.madv == I915_MADV_WILLNEED) - mark_page_accessed(page); - - if (!pagevec_add(&pvec, page)) - check_release_pagevec(&pvec); - } - if (pagevec_count(&pvec)) - check_release_pagevec(&pvec); + shmem_free_st(pages, file_inode(obj->base.filp)->i_mapping, + obj->mm.dirty, obj->mm.madv == I915_MADV_WILLNEED); obj->mm.dirty = false; - - sg_free_table(pages); - kfree(pages); }
static void
For cached objects we can allocate our pages directly in shmem. This should make it possible(in a later patch) to utilise the existing i915-gem shrinker code for such objects. For now this is still disabled.
v2(Thomas): - Add optional try_to_writeback hook for objects. Importantly we need to check if the object is even still shrinkable; in between us dropping the shrinker LRU lock and acquiring the object lock it could for example have been moved. Also we need to differentiate between "lazy" shrinking and the immediate writeback mode. Also later we need to handle objects which don't even have mm.pages, so bundling this into put_pages() would require somehow handling that edge case, hence just letting the ttm backend handle everything in try_to_writeback doesn't seem too bad.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 14 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 17 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 252 ++++++++++++++++-- 5 files changed, 258 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 48112b9d76df..561d6bd0a5c9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -618,6 +618,14 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj, bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj, enum intel_memory_type type);
+struct sg_table *shmem_alloc_st(struct drm_i915_private *i915, + size_t size, struct intel_memory_region *mr, + struct address_space *mapping, + unsigned int max_segment); +void shmem_free_st(struct sg_table *st, struct address_space *mapping, + bool dirty, bool backup); +void __shmem_writeback(size_t size, struct address_space *mapping); + #ifdef CONFIG_MMU_NOTIFIER static inline bool i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 2471f36aaff3..1b7ba859cf5e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -56,6 +56,8 @@ struct drm_i915_gem_object_ops { struct sg_table *pages); void (*truncate)(struct drm_i915_gem_object *obj); void (*writeback)(struct drm_i915_gem_object *obj); + int (*try_to_writeback)(struct drm_i915_gem_object *obj, + bool do_writeback);
int (*pread)(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pread *arg); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 36b711ae9e28..19e55cc29a15 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -25,8 +25,8 @@ static void check_release_pagevec(struct pagevec *pvec) cond_resched(); }
-static void shmem_free_st(struct sg_table *st, struct address_space *mapping, - bool dirty, bool backup) +void shmem_free_st(struct sg_table *st, struct address_space *mapping, + bool dirty, bool backup) { struct sgt_iter sgt_iter; struct pagevec pvec; @@ -52,10 +52,10 @@ static void shmem_free_st(struct sg_table *st, struct address_space *mapping, kfree(st); }
-static struct sg_table *shmem_alloc_st(struct drm_i915_private *i915, - size_t size, struct intel_memory_region *mr, - struct address_space *mapping, - unsigned int max_segment) +struct sg_table *shmem_alloc_st(struct drm_i915_private *i915, + size_t size, struct intel_memory_region *mr, + struct address_space *mapping, + unsigned int max_segment) { const unsigned long page_count = size / PAGE_SIZE; unsigned long i; @@ -300,7 +300,7 @@ shmem_truncate(struct drm_i915_gem_object *obj) obj->mm.pages = ERR_PTR(-EFAULT); }
-static void __shmem_writeback(size_t size, struct address_space *mapping) +void __shmem_writeback(size_t size, struct address_space *mapping) { struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index e382b7f2353b..478663dc42b4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -56,19 +56,24 @@ static bool unsafe_drop_pages(struct drm_i915_gem_object *obj, return false; }
-static void try_to_writeback(struct drm_i915_gem_object *obj, - unsigned int flags) +static int try_to_writeback(struct drm_i915_gem_object *obj, unsigned int flags) { + if (obj->ops->try_to_writeback) + return obj->ops->try_to_writeback(obj, + flags & I915_SHRINK_WRITEBACK); + switch (obj->mm.madv) { case I915_MADV_DONTNEED: i915_gem_object_truncate(obj); - return; + return 0; case __I915_MADV_PURGED: - return; + return 0; }
if (flags & I915_SHRINK_WRITEBACK) i915_gem_object_writeback(obj); + + return 0; }
/** @@ -222,8 +227,8 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, }
if (!__i915_gem_object_put_pages(obj)) { - try_to_writeback(obj, shrink); - count += obj->base.size >> PAGE_SHIFT; + if (!try_to_writeback(obj, shrink)) + count += obj->base.size >> PAGE_SHIFT; } if (!ww) i915_gem_object_unlock(obj); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index fd5b925b27c5..174aebe11264 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -37,6 +37,10 @@ * @ttm: The base TTM page vector. * @dev: The struct device used for dma mapping and unmapping. * @cached_st: The cached scatter-gather table. + * @obj: The GEM object. Should be valid while we have a valid bo->ttm. + * @filp: The shmem file, if using shmem backend. + * @is_shmem: Set if using shmem. + * @do_backup: If using shmem, do the swap step when unpopulating. * * Note that DMA may be going on right up to the point where the page- * vector is unpopulated in delayed destroy. Hence keep the @@ -48,6 +52,10 @@ struct i915_ttm_tt { struct ttm_tt ttm; struct device *dev; struct sg_table *cached_st; + struct drm_i915_gem_object *obj; + struct file *filp; + bool is_shmem; + bool do_backup; };
static const struct ttm_place sys_placement_flags = { @@ -167,12 +175,104 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, placement->busy_placement = busy; }
+static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev, + struct ttm_tt *ttm, + struct ttm_operation_ctx *ctx) +{ + struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev); + struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM]; + struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); + const unsigned int max_segment = i915_sg_segment_size(); + const size_t size = ttm->num_pages << PAGE_SHIFT; + struct drm_i915_gem_object *obj = i915_tt->obj; + struct file *filp = i915_tt->filp; + struct sgt_iter sgt_iter; + struct sg_table *st; + struct page *page; + unsigned long i; + int err; + + GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); + + if (!filp) { + struct address_space *mapping; + gfp_t mask; + + filp = shmem_file_setup("i915-shmem-tt", size, VM_NORESERVE); + if (IS_ERR(filp)) + return PTR_ERR(filp); + + mask = GFP_HIGHUSER | __GFP_RECLAIMABLE; + + mapping = filp->f_mapping; + mapping_set_gfp_mask(mapping, mask); + GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM)); + + i915_tt->filp = filp; + } + + st = shmem_alloc_st(i915, size, mr, filp->f_mapping, max_segment); + if (IS_ERR(st)) + return PTR_ERR(st); + + err = dma_map_sg_attrs(i915_tt->dev, + st->sgl, st->nents, + PCI_DMA_BIDIRECTIONAL, + DMA_ATTR_SKIP_CPU_SYNC | + DMA_ATTR_NO_KERNEL_MAPPING | + DMA_ATTR_NO_WARN); + if (err <= 0) { + err = -EINVAL; + goto err_free_st; + } + + i = 0; + for_each_sgt_page(page, sgt_iter, st) + ttm->pages[i++] = page; + + if (ttm->page_flags & TTM_TT_FLAG_SWAPPED) + ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED; + + i915_tt->cached_st = st; + return 0; + +err_free_st: + shmem_free_st(st, filp->f_mapping, false, false); + return err; +} + +static void i915_ttm_tt_shmem_unpopulate(struct ttm_tt *ttm) +{ + struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); + struct drm_i915_gem_object *obj = i915_tt->obj; + bool backup = i915_tt->do_backup; + + if (obj->mm.madv == I915_MADV_DONTNEED) { + obj->mm.dirty = false; + GEM_BUG_ON(backup); + } + + dma_unmap_sg(i915_tt->dev, i915_tt->cached_st->sgl, + i915_tt->cached_st->nents, + PCI_DMA_BIDIRECTIONAL); + + shmem_free_st(fetch_and_zero(&i915_tt->cached_st), + file_inode(i915_tt->filp)->i_mapping, + obj->mm.dirty, backup); + + obj->mm.dirty = false; + + if (backup) + ttm->page_flags |= TTM_TT_FLAG_SWAPPED; +} + static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, uint32_t page_flags) { struct ttm_resource_manager *man = ttm_manager_type(bo->bdev, bo->resource->mem_type); struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); + enum ttm_caching caching = i915_ttm_select_tt_caching(obj); struct i915_ttm_tt *i915_tt; int ret;
@@ -184,36 +284,63 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, man->use_tt) page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
- ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, - i915_ttm_select_tt_caching(obj)); - if (ret) { - kfree(i915_tt); - return NULL; + if (i915_gem_object_is_shrinkable(obj) && caching == ttm_cached) { + page_flags |= TTM_TT_FLAG_EXTERNAL | + TTM_TT_FLAG_EXTERNAL_MAPPABLE; + i915_tt->is_shmem = true; }
+ ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching); + if (ret) + goto err_free; + i915_tt->dev = obj->base.dev->dev; + i915_tt->obj = obj;
return &i915_tt->ttm; + +err_free: + kfree(i915_tt); + return NULL; +} + +static int i915_ttm_tt_populate(struct ttm_device *bdev, + struct ttm_tt *ttm, + struct ttm_operation_ctx *ctx) +{ + struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); + + if (i915_tt->is_shmem) + return i915_ttm_tt_shmem_populate(bdev, ttm, ctx); + + return ttm_pool_alloc(&bdev->pool, ttm, ctx); }
static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) { struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
- if (i915_tt->cached_st) { - dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st, - DMA_BIDIRECTIONAL, 0); - sg_free_table(i915_tt->cached_st); - kfree(i915_tt->cached_st); - i915_tt->cached_st = NULL; + if (i915_tt->is_shmem) { + i915_ttm_tt_shmem_unpopulate(ttm); + } else { + if (i915_tt->cached_st) { + dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st, + DMA_BIDIRECTIONAL, 0); + sg_free_table(i915_tt->cached_st); + kfree(i915_tt->cached_st); + i915_tt->cached_st = NULL; + } + ttm_pool_free(&bdev->pool, ttm); } - ttm_pool_free(&bdev->pool, ttm); }
static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm) { struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
+ if (i915_tt->filp) + fput(i915_tt->filp); + ttm_tt_fini(ttm); kfree(i915_tt); } @@ -223,6 +350,13 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo, { struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+ /* + * EXTERNAL objects should never be swapped out by TTM, instead we need + * to handle that ourselves. + */ + GEM_BUG_ON(place->mem_type == TTM_PL_SYSTEM && + bo->ttm && bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL); + /* Will do for now. Our pinned objects are still on TTM's LRU lists */ return i915_gem_object_evictable(obj); } @@ -316,9 +450,11 @@ static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj) i915_gem_object_set_cache_coherency(obj, cache_level); }
-static void i915_ttm_purge(struct drm_i915_gem_object *obj) +static int __i915_ttm_purge(struct drm_i915_gem_object *obj) { struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); + struct i915_ttm_tt *i915_tt = + container_of(bo->ttm, typeof(*i915_tt), ttm); struct ttm_operation_ctx ctx = { .interruptible = true, .no_wait_gpu = false, @@ -327,17 +463,81 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj) int ret;
if (obj->mm.madv == __I915_MADV_PURGED) - return; + return 0;
- /* TTM's purge interface. Note that we might be reentering. */ ret = ttm_bo_validate(bo, &place, &ctx); - if (!ret) { - obj->write_domain = 0; - obj->read_domains = 0; - i915_ttm_adjust_gem_after_move(obj); - i915_ttm_free_cached_io_st(obj); - obj->mm.madv = __I915_MADV_PURGED; + if (ret) + return ret; + + if (bo->ttm && i915_tt->is_shmem) { + /* + * The below fput(which eventually calls shmem_truncate) might + * be delayed by worker, so when directly called to purge the + * pages(like by the shrinker) we should try to be more + * aggressive and release the pages immediately. + */ + if (i915_tt->filp) { + shmem_truncate_range(file_inode(i915_tt->filp), + 0, (loff_t)-1); + fput(fetch_and_zero(&i915_tt->filp)); + } + } + + obj->write_domain = 0; + obj->read_domains = 0; + i915_ttm_adjust_gem_after_move(obj); + i915_ttm_free_cached_io_st(obj); + obj->mm.madv = __I915_MADV_PURGED; + return 0; +} + +static void i915_ttm_purge(struct drm_i915_gem_object *obj) +{ + __i915_ttm_purge(obj); +} + +static int i915_ttm_try_to_writeback(struct drm_i915_gem_object *obj, + bool do_writeback) +{ + struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); + struct i915_ttm_tt *i915_tt = + container_of(bo->ttm, typeof(*i915_tt), ttm); + struct ttm_operation_ctx ctx = { + .interruptible = true, + .no_wait_gpu = false, + }; + struct ttm_placement place = {}; + int ret; + + if (bo->resource->mem_type != TTM_PL_SYSTEM) + return -EBUSY; + + if (!bo->ttm || !i915_tt->is_shmem) + return -EBUSY; + + if (!i915_tt->filp) + return -EBUSY; + + if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED) + return 0; + + switch (obj->mm.madv) { + case I915_MADV_DONTNEED: + return __i915_ttm_purge(obj); + case __I915_MADV_PURGED: + return 0; } + + i915_tt->do_backup = true; + ret = ttm_bo_validate(bo, &place, &ctx); + i915_tt->do_backup = false; + if (ret) + return ret; + + if (do_writeback) + __shmem_writeback(obj->base.size, i915_tt->filp->f_mapping); + + return 0; }
static void i915_ttm_swap_notify(struct ttm_buffer_object *bo) @@ -602,6 +802,7 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
static struct ttm_device_funcs i915_ttm_bo_driver = { .ttm_tt_create = i915_ttm_tt_create, + .ttm_tt_populate = i915_ttm_tt_populate, .ttm_tt_unpopulate = i915_ttm_tt_unpopulate, .ttm_tt_destroy = i915_ttm_tt_destroy, .eviction_valuable = i915_ttm_eviction_valuable, @@ -669,12 +870,17 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj, }
if (!i915_gem_object_has_pages(obj)) { + struct i915_ttm_tt *i915_tt = + container_of(bo->ttm, typeof(*i915_tt), ttm); + /* Object either has a page vector or is an iomem object */ st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st; if (IS_ERR(st)) return PTR_ERR(st);
__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl)); + if (!bo->ttm || !i915_tt->is_shmem) + i915_gem_object_make_unshrinkable(obj); }
return ret; @@ -871,9 +1077,12 @@ static const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = { .get_pages = i915_ttm_get_pages, .put_pages = i915_ttm_put_pages, .truncate = i915_ttm_purge, + .try_to_writeback = i915_ttm_try_to_writeback, + .adjust_lru = i915_ttm_adjust_lru, .delayed_free = i915_ttm_delayed_free, .migrate = i915_ttm_migrate, + .mmap_offset = i915_ttm_mmap_offset, .mmap_ops = &vm_ops_ttm, }; @@ -919,7 +1128,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem, drm_gem_private_object_init(&i915->drm, &obj->base, size); i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags); i915_gem_object_init_memory_region(obj, mem); - i915_gem_object_make_unshrinkable(obj); INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN); mutex_init(&obj->ttm.get_io_page.lock); bo_type = (obj->flags & I915_BO_ALLOC_USER) ? ttm_bo_type_device :
Hi, Matthew,
On 9/21/21 1:01 PM, Matthew Auld wrote:
For cached objects we can allocate our pages directly in shmem. This should make it possible(in a later patch) to utilise the existing i915-gem shrinker code for such objects. For now this is still disabled.
v2(Thomas):
- Add optional try_to_writeback hook for objects. Importantly we need to check if the object is even still shrinkable; in between us dropping the shrinker LRU lock and acquiring the object lock it could for example have been moved. Also we need to differentiate between "lazy" shrinking and the immediate writeback mode. Also later we need to handle objects which don't even have mm.pages, so bundling this into put_pages() would require somehow handling that edge case, hence just letting the ttm backend handle everything in try_to_writeback doesn't seem too bad.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 14 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 17 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 252 ++++++++++++++++-- 5 files changed, 258 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 48112b9d76df..561d6bd0a5c9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -618,6 +618,14 @@ int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj, bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj, enum intel_memory_type type);
+struct sg_table *shmem_alloc_st(struct drm_i915_private *i915,
size_t size, struct intel_memory_region *mr,
struct address_space *mapping,
unsigned int max_segment);
+void shmem_free_st(struct sg_table *st, struct address_space *mapping,
bool dirty, bool backup);
+void __shmem_writeback(size_t size, struct address_space *mapping);
- #ifdef CONFIG_MMU_NOTIFIER static inline bool i915_gem_object_is_userptr(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 2471f36aaff3..1b7ba859cf5e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -56,6 +56,8 @@ struct drm_i915_gem_object_ops { struct sg_table *pages); void (*truncate)(struct drm_i915_gem_object *obj); void (*writeback)(struct drm_i915_gem_object *obj);
- int (*try_to_writeback)(struct drm_i915_gem_object *obj,
bool do_writeback);
Sounds like a reasonable approach. Perhaps a different name of the callback since this is a two-step process - releasing the pages to shmem and then optionally writeback to swapcache.
int (*pread)(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pread *arg); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 36b711ae9e28..19e55cc29a15 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -25,8 +25,8 @@ static void check_release_pagevec(struct pagevec *pvec) cond_resched(); }
-static void shmem_free_st(struct sg_table *st, struct address_space *mapping,
bool dirty, bool backup)
+void shmem_free_st(struct sg_table *st, struct address_space *mapping,
{ struct sgt_iter sgt_iter; struct pagevec pvec;bool dirty, bool backup)
@@ -52,10 +52,10 @@ static void shmem_free_st(struct sg_table *st, struct address_space *mapping, kfree(st); }
-static struct sg_table *shmem_alloc_st(struct drm_i915_private *i915,
size_t size, struct intel_memory_region *mr,
struct address_space *mapping,
unsigned int max_segment)
+struct sg_table *shmem_alloc_st(struct drm_i915_private *i915,
size_t size, struct intel_memory_region *mr,
struct address_space *mapping,
{ const unsigned long page_count = size / PAGE_SIZE; unsigned long i;unsigned int max_segment)
@@ -300,7 +300,7 @@ shmem_truncate(struct drm_i915_gem_object *obj) obj->mm.pages = ERR_PTR(-EFAULT); }
-static void __shmem_writeback(size_t size, struct address_space *mapping) +void __shmem_writeback(size_t size, struct address_space *mapping) { struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index e382b7f2353b..478663dc42b4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -56,19 +56,24 @@ static bool unsafe_drop_pages(struct drm_i915_gem_object *obj, return false; }
-static void try_to_writeback(struct drm_i915_gem_object *obj,
unsigned int flags)
+static int try_to_writeback(struct drm_i915_gem_object *obj, unsigned int flags) {
- if (obj->ops->try_to_writeback)
return obj->ops->try_to_writeback(obj,
flags & I915_SHRINK_WRITEBACK);
- switch (obj->mm.madv) { case I915_MADV_DONTNEED: i915_gem_object_truncate(obj);
return;
case __I915_MADV_PURGED:return 0;
return;
return 0;
}
if (flags & I915_SHRINK_WRITEBACK) i915_gem_object_writeback(obj);
return 0; }
/**
@@ -222,8 +227,8 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, }
if (!__i915_gem_object_put_pages(obj)) {
try_to_writeback(obj, shrink);
count += obj->base.size >> PAGE_SHIFT;
if (!try_to_writeback(obj, shrink))
count += obj->base.size >> PAGE_SHIFT; } if (!ww) i915_gem_object_unlock(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index fd5b925b27c5..174aebe11264 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -37,6 +37,10 @@
- @ttm: The base TTM page vector.
- @dev: The struct device used for dma mapping and unmapping.
- @cached_st: The cached scatter-gather table.
- @obj: The GEM object. Should be valid while we have a valid bo->ttm.
- @filp: The shmem file, if using shmem backend.
- @is_shmem: Set if using shmem.
- @do_backup: If using shmem, do the swap step when unpopulating.
- Note that DMA may be going on right up to the point where the page-
- vector is unpopulated in delayed destroy. Hence keep the
@@ -48,6 +52,10 @@ struct i915_ttm_tt { struct ttm_tt ttm; struct device *dev; struct sg_table *cached_st;
- struct drm_i915_gem_object *obj;
- struct file *filp;
- bool is_shmem;
- bool do_backup;
IMHO we need to clean this up a bit since tts might (although perhaps not while swapping) might have a different lifetime than the object, for example if it's moved to a ghost object during async migration, alternatively group members together that are only parameters to the unpopulate() function; obj and do_backup, and document that they are only valid while the object lock is held. filp and is_shmem should be OK as they are. Perhaps obj could be replaced with bool dirty, and instead of do_backup perhaps we could look at whether the TTM_TT_FLAG_SWAPPED was set before unpopulate?
};
static const struct ttm_place sys_placement_flags = { @@ -167,12 +175,104 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, placement->busy_placement = busy; }
+static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev,
struct ttm_tt *ttm,
struct ttm_operation_ctx *ctx)
+{
- struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
- struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM];
- struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
- const unsigned int max_segment = i915_sg_segment_size();
- const size_t size = ttm->num_pages << PAGE_SHIFT;
- struct drm_i915_gem_object *obj = i915_tt->obj;
- struct file *filp = i915_tt->filp;
- struct sgt_iter sgt_iter;
- struct sg_table *st;
- struct page *page;
- unsigned long i;
- int err;
- GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED);
- if (!filp) {
struct address_space *mapping;
gfp_t mask;
filp = shmem_file_setup("i915-shmem-tt", size, VM_NORESERVE);
if (IS_ERR(filp))
return PTR_ERR(filp);
mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
mapping = filp->f_mapping;
mapping_set_gfp_mask(mapping, mask);
GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
i915_tt->filp = filp;
- }
- st = shmem_alloc_st(i915, size, mr, filp->f_mapping, max_segment);
- if (IS_ERR(st))
return PTR_ERR(st);
- err = dma_map_sg_attrs(i915_tt->dev,
st->sgl, st->nents,
PCI_DMA_BIDIRECTIONAL,
DMA_ATTR_SKIP_CPU_SYNC |
DMA_ATTR_NO_KERNEL_MAPPING |
DMA_ATTR_NO_WARN);
- if (err <= 0) {
err = -EINVAL;
goto err_free_st;
- }
- i = 0;
- for_each_sgt_page(page, sgt_iter, st)
ttm->pages[i++] = page;
- if (ttm->page_flags & TTM_TT_FLAG_SWAPPED)
ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
- i915_tt->cached_st = st;
- return 0;
+err_free_st:
- shmem_free_st(st, filp->f_mapping, false, false);
- return err;
+}
+static void i915_ttm_tt_shmem_unpopulate(struct ttm_tt *ttm) +{
- struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
- struct drm_i915_gem_object *obj = i915_tt->obj;
- bool backup = i915_tt->do_backup;
- if (obj->mm.madv == I915_MADV_DONTNEED) {
obj->mm.dirty = false;
GEM_BUG_ON(backup);
- }
- dma_unmap_sg(i915_tt->dev, i915_tt->cached_st->sgl,
i915_tt->cached_st->nents,
PCI_DMA_BIDIRECTIONAL);
- shmem_free_st(fetch_and_zero(&i915_tt->cached_st),
file_inode(i915_tt->filp)->i_mapping,
obj->mm.dirty, backup);
- obj->mm.dirty = false;
- if (backup)
ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
+}
- static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, uint32_t page_flags) { struct ttm_resource_manager *man = ttm_manager_type(bo->bdev, bo->resource->mem_type); struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
- enum ttm_caching caching = i915_ttm_select_tt_caching(obj); struct i915_ttm_tt *i915_tt; int ret;
@@ -184,36 +284,63 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, man->use_tt) page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
- ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags,
i915_ttm_select_tt_caching(obj));
- if (ret) {
kfree(i915_tt);
return NULL;
if (i915_gem_object_is_shrinkable(obj) && caching == ttm_cached) {
page_flags |= TTM_TT_FLAG_EXTERNAL |
TTM_TT_FLAG_EXTERNAL_MAPPABLE;
i915_tt->is_shmem = true;
}
ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
if (ret)
goto err_free;
i915_tt->dev = obj->base.dev->dev;
i915_tt->obj = obj;
return &i915_tt->ttm;
+err_free:
- kfree(i915_tt);
- return NULL;
+}
+static int i915_ttm_tt_populate(struct ttm_device *bdev,
struct ttm_tt *ttm,
struct ttm_operation_ctx *ctx)
+{
struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
if (i915_tt->is_shmem)
return i915_ttm_tt_shmem_populate(bdev, ttm, ctx);
return ttm_pool_alloc(&bdev->pool, ttm, ctx); }
static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) { struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
- if (i915_tt->cached_st) {
dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st,
DMA_BIDIRECTIONAL, 0);
sg_free_table(i915_tt->cached_st);
kfree(i915_tt->cached_st);
i915_tt->cached_st = NULL;
- if (i915_tt->is_shmem) {
i915_ttm_tt_shmem_unpopulate(ttm);
- } else {
if (i915_tt->cached_st) {
dma_unmap_sgtable(i915_tt->dev, i915_tt->cached_st,
DMA_BIDIRECTIONAL, 0);
sg_free_table(i915_tt->cached_st);
kfree(i915_tt->cached_st);
i915_tt->cached_st = NULL;
}
}ttm_pool_free(&bdev->pool, ttm);
ttm_pool_free(&bdev->pool, ttm); }
static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm) { struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
- if (i915_tt->filp)
fput(i915_tt->filp);
- ttm_tt_fini(ttm); kfree(i915_tt); }
@@ -223,6 +350,13 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo, { struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
- /*
* EXTERNAL objects should never be swapped out by TTM, instead we need
* to handle that ourselves.
*/
- GEM_BUG_ON(place->mem_type == TTM_PL_SYSTEM &&
bo->ttm && bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL);
I guess this needs fixing following Christian's email unless we put all EXTERNAL objects on a bdev->external list.
/Thomas
This is probably a NAK. But ideally we need to somehow prevent TTM from seeing shmem objects when doing its LRU swap walk. Since these are EXTERNAL they are ignored anyway, but keeping them in the LRU seems pretty wasteful. Trying to use bo_pin() for this is all kinds of nasty since we need to be able to do the bo_unpin() from the unpopulate hook, but since that can be called from the BO destroy path we will likely go down in flames.
An alternative is to maybe just add EXTERNAL objects to some bdev->external LRU in TTM, or just don't add them at all?
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 174aebe11264..b438ddb52764 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -800,6 +800,22 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo, return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs; }
+static void i915_ttm_del_from_lru_notify(struct ttm_buffer_object *bo) +{ + struct i915_ttm_tt *i915_tt = + container_of(bo->ttm, typeof(*i915_tt), ttm); + + /* Idealy we need to prevent TTM from seeing shmem objects when doing + * its LRU swap walk. Since these are EXTERNAL they are ignored anyway, + * but keeping them in the LRU is pretty waseful. Trying to use bo_pin() + * for this is very nasty since we need to be able to do the bo_unpin() + * from the unpopulate hook, but since that can be called from the BO + * destroy path we will go down in flames. + */ + if (bo->ttm && ttm_tt_is_populated(bo->ttm) && i915_tt->is_shmem) + list_del_init(&bo->lru); +} + static struct ttm_device_funcs i915_ttm_bo_driver = { .ttm_tt_create = i915_ttm_tt_create, .ttm_tt_populate = i915_ttm_tt_populate, @@ -810,6 +826,7 @@ static struct ttm_device_funcs i915_ttm_bo_driver = { .move = i915_ttm_move, .swap_notify = i915_ttm_swap_notify, .delete_mem_notify = i915_ttm_delete_mem_notify, + .del_from_lru_notify = i915_ttm_del_from_lru_notify, .io_mem_reserve = i915_ttm_io_mem_reserve, .io_mem_pfn = i915_ttm_io_mem_pfn, };
Am 21.09.21 um 13:01 schrieb Matthew Auld:
This is probably a NAK. But ideally we need to somehow prevent TTM from seeing shmem objects when doing its LRU swap walk. Since these are EXTERNAL they are ignored anyway, but keeping them in the LRU seems pretty wasteful. Trying to use bo_pin() for this is all kinds of nasty since we need to be able to do the bo_unpin() from the unpopulate hook, but since that can be called from the BO destroy path we will likely go down in flames.
An alternative is to maybe just add EXTERNAL objects to some bdev->external LRU in TTM, or just don't add them at all?
Yeah, that goes into the same direction as why I want to push the LRU into the resource for some time.
The problem is that the LRU is needed for multiple things. E.g. swapping, GART management, resource constrains, IOMMU teardown etc..
So for now I think that everything should be on the LRU even if it isn't valid to be there for some use case.
Regards, Christian.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 174aebe11264..b438ddb52764 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -800,6 +800,22 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo, return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs; }
+static void i915_ttm_del_from_lru_notify(struct ttm_buffer_object *bo) +{
- struct i915_ttm_tt *i915_tt =
container_of(bo->ttm, typeof(*i915_tt), ttm);
- /* Idealy we need to prevent TTM from seeing shmem objects when doing
* its LRU swap walk. Since these are EXTERNAL they are ignored anyway,
* but keeping them in the LRU is pretty waseful. Trying to use bo_pin()
* for this is very nasty since we need to be able to do the bo_unpin()
* from the unpopulate hook, but since that can be called from the BO
* destroy path we will go down in flames.
*/
- if (bo->ttm && ttm_tt_is_populated(bo->ttm) && i915_tt->is_shmem)
list_del_init(&bo->lru);
+}
- static struct ttm_device_funcs i915_ttm_bo_driver = { .ttm_tt_create = i915_ttm_tt_create, .ttm_tt_populate = i915_ttm_tt_populate,
@@ -810,6 +826,7 @@ static struct ttm_device_funcs i915_ttm_bo_driver = { .move = i915_ttm_move, .swap_notify = i915_ttm_swap_notify, .delete_mem_notify = i915_ttm_delete_mem_notify,
- .del_from_lru_notify = i915_ttm_del_from_lru_notify, .io_mem_reserve = i915_ttm_io_mem_reserve, .io_mem_pfn = i915_ttm_io_mem_pfn, };
On 21/09/2021 12:48, Christian König wrote:
Am 21.09.21 um 13:01 schrieb Matthew Auld:
This is probably a NAK. But ideally we need to somehow prevent TTM from seeing shmem objects when doing its LRU swap walk. Since these are EXTERNAL they are ignored anyway, but keeping them in the LRU seems pretty wasteful. Trying to use bo_pin() for this is all kinds of nasty since we need to be able to do the bo_unpin() from the unpopulate hook, but since that can be called from the BO destroy path we will likely go down in flames.
An alternative is to maybe just add EXTERNAL objects to some bdev->external LRU in TTM, or just don't add them at all?
Yeah, that goes into the same direction as why I want to push the LRU into the resource for some time.
The problem is that the LRU is needed for multiple things. E.g. swapping, GART management, resource constrains, IOMMU teardown etc..
So for now I think that everything should be on the LRU even if it isn't valid to be there for some use case.
Ok. Is it a no-go to keep TT_FLAG_EXTERNAL on say bdev->external?
Regards, Christian.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 174aebe11264..b438ddb52764 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -800,6 +800,22 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo, return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs; } +static void i915_ttm_del_from_lru_notify(struct ttm_buffer_object *bo) +{ + struct i915_ttm_tt *i915_tt = + container_of(bo->ttm, typeof(*i915_tt), ttm);
+ /* Idealy we need to prevent TTM from seeing shmem objects when doing + * its LRU swap walk. Since these are EXTERNAL they are ignored anyway, + * but keeping them in the LRU is pretty waseful. Trying to use bo_pin() + * for this is very nasty since we need to be able to do the bo_unpin() + * from the unpopulate hook, but since that can be called from the BO + * destroy path we will go down in flames. + */ + if (bo->ttm && ttm_tt_is_populated(bo->ttm) && i915_tt->is_shmem) + list_del_init(&bo->lru); +}
static struct ttm_device_funcs i915_ttm_bo_driver = { .ttm_tt_create = i915_ttm_tt_create, .ttm_tt_populate = i915_ttm_tt_populate, @@ -810,6 +826,7 @@ static struct ttm_device_funcs i915_ttm_bo_driver = { .move = i915_ttm_move, .swap_notify = i915_ttm_swap_notify, .delete_mem_notify = i915_ttm_delete_mem_notify, + .del_from_lru_notify = i915_ttm_del_from_lru_notify, .io_mem_reserve = i915_ttm_io_mem_reserve, .io_mem_pfn = i915_ttm_io_mem_pfn, };
Am 22.09.21 um 15:34 schrieb Matthew Auld:
On 21/09/2021 12:48, Christian König wrote:
Am 21.09.21 um 13:01 schrieb Matthew Auld:
This is probably a NAK. But ideally we need to somehow prevent TTM from seeing shmem objects when doing its LRU swap walk. Since these are EXTERNAL they are ignored anyway, but keeping them in the LRU seems pretty wasteful. Trying to use bo_pin() for this is all kinds of nasty since we need to be able to do the bo_unpin() from the unpopulate hook, but since that can be called from the BO destroy path we will likely go down in flames.
An alternative is to maybe just add EXTERNAL objects to some bdev->external LRU in TTM, or just don't add them at all?
Yeah, that goes into the same direction as why I want to push the LRU into the resource for some time.
The problem is that the LRU is needed for multiple things. E.g. swapping, GART management, resource constrains, IOMMU teardown etc..
So for now I think that everything should be on the LRU even if it isn't valid to be there for some use case.
Ok. Is it a no-go to keep TT_FLAG_EXTERNAL on say bdev->external?
We could add that as a workaround, but I would rather aim for cleaning that up more thoughtfully.
Regards, Christian.
Regards, Christian.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 174aebe11264..b438ddb52764 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -800,6 +800,22 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo, return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs; } +static void i915_ttm_del_from_lru_notify(struct ttm_buffer_object *bo) +{ + struct i915_ttm_tt *i915_tt = + container_of(bo->ttm, typeof(*i915_tt), ttm);
+ /* Idealy we need to prevent TTM from seeing shmem objects when doing + * its LRU swap walk. Since these are EXTERNAL they are ignored anyway, + * but keeping them in the LRU is pretty waseful. Trying to use bo_pin() + * for this is very nasty since we need to be able to do the bo_unpin() + * from the unpopulate hook, but since that can be called from the BO + * destroy path we will go down in flames. + */ + if (bo->ttm && ttm_tt_is_populated(bo->ttm) && i915_tt->is_shmem) + list_del_init(&bo->lru); +}
static struct ttm_device_funcs i915_ttm_bo_driver = { .ttm_tt_create = i915_ttm_tt_create, .ttm_tt_populate = i915_ttm_tt_populate, @@ -810,6 +826,7 @@ static struct ttm_device_funcs i915_ttm_bo_driver = { .move = i915_ttm_move, .swap_notify = i915_ttm_swap_notify, .delete_mem_notify = i915_ttm_delete_mem_notify, + .del_from_lru_notify = i915_ttm_del_from_lru_notify, .io_mem_reserve = i915_ttm_io_mem_reserve, .io_mem_pfn = i915_ttm_io_mem_pfn, };
This should let us do an accelerated copy directly to the shmem pages when temporarily moving lmem-only objects, where the i915-gem shrinker can later kick in to swap out the pages, if needed.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index b438ddb52764..f4f7010e1c15 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -125,11 +125,11 @@ static enum ttm_caching i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj) { /* - * Objects only allowed in system get cached cpu-mappings. - * Other objects get WC mapping for now. Even if in system. + * Objects only allowed in system get cached cpu-mappings, or when + * evicting lmem-only buffers to system for swapping. Other objects get + * WC mapping for now. Even if in system. */ - if (obj->mm.region->type == INTEL_MEMORY_SYSTEM && - obj->mm.n_placements <= 1) + if (obj->mm.n_placements <= 1) return ttm_cached;
return ttm_write_combined;
Drop the atomic shrink_pin stuff, and just have make_{un}shrinkable update the shrinker visible lists immediately. This at least simplifies the next patch, and does make the behaviour more obvious. The potential downside is that make_unshrinkable now grabs a global lock even when the object itself is no longer shrinkable(transitioning from purgeable <-> shrinkable doesn't seem to be a thing), for example in the ppGTT insertion paths we should now be careful not to needlessly call make_unshrinkable multiple times. Outside of that there is some fallout in intel_context which relies on nesting calls to shrink_pin.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 ---- .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 16 +----- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 52 +++++++++++++------ drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 1 - drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 1 - drivers/gpu/drm/i915/gt/intel_context.c | 9 +--- 7 files changed, 41 insertions(+), 50 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 6fb9afb65034..e8265a432fcb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -305,15 +305,6 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj) */ atomic_inc(&i915->mm.free_count);
- /* - * This serializes freeing with the shrinker. Since the free - * is delayed, first by RCU then by the workqueue, we want the - * shrinker to be able to free pages of unreferenced objects, - * or else we may oom whilst there are plenty of deferred - * freed objects. - */ - i915_gem_object_make_unshrinkable(obj); - /* * Since we require blocking on struct_mutex to unbind the freed * object from the GPU before releasing resources back to the diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 1b7ba859cf5e..a0069c643e3d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -455,7 +455,6 @@ struct drm_i915_gem_object { * instead go through the pin/unpin interfaces. */ atomic_t pages_pin_count; - atomic_t shrink_pin;
/** * Priority list of potential placements for this object. @@ -516,7 +515,7 @@ struct drm_i915_gem_object { struct i915_gem_object_page_iter get_dma_page;
/** - * Element within i915->mm.unbound_list or i915->mm.bound_list, + * Element within i915->mm.shrink_list or i915->mm.purge_list, * locked by i915->mm.obj_lock. */ struct list_head link; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 8eb1c3a6fc9c..f0df1394d7f6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -64,28 +64,16 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(i915_gem_object_has_tiling_quirk(obj)); i915_gem_object_set_tiling_quirk(obj); GEM_BUG_ON(!list_empty(&obj->mm.link)); - atomic_inc(&obj->mm.shrink_pin); shrinkable = false; }
if (shrinkable) { - struct list_head *list; - unsigned long flags; - assert_object_held(obj); - spin_lock_irqsave(&i915->mm.obj_lock, flags); - - i915->mm.shrink_count++; - i915->mm.shrink_memory += obj->base.size;
if (obj->mm.madv != I915_MADV_WILLNEED) - list = &i915->mm.purge_list; + i915_gem_object_make_purgeable(obj); else - list = &i915->mm.shrink_list; - list_add_tail(&obj->mm.link, list); - - atomic_set(&obj->mm.shrink_pin, 0); - spin_unlock_irqrestore(&i915->mm.obj_lock, flags); + i915_gem_object_make_shrinkable(obj); } }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 478663dc42b4..4f5d49ada9e6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -460,23 +460,26 @@ void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
+/** + * i915_gem_object_make_unshrinkable - Hide the object from the shrinker. By + * default all object types that support shrinking(see IS_SHRINKABLE), will also + * make the object visible to the shrinker after allocating the system memory + * pages. + * @obj: The GEM object. + * + * This is typically used for special kernel internal objects that can't be + * easily processed by the shrinker, like if they are perma-pinned. + */ void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj) { struct drm_i915_private *i915 = obj_to_i915(obj); unsigned long flags;
- /* - * We can only be called while the pages are pinned or when - * the pages are released. If pinned, we should only be called - * from a single caller under controlled conditions; and on release - * only one caller may release us. Neither the two may cross. - */ - if (atomic_add_unless(&obj->mm.shrink_pin, 1, 0)) + if (!i915_gem_object_is_shrinkable(obj)) return;
spin_lock_irqsave(&i915->mm.obj_lock, flags); - if (!atomic_fetch_inc(&obj->mm.shrink_pin) && - !list_empty(&obj->mm.link)) { + if (!list_empty(&obj->mm.link)) { list_del_init(&obj->mm.link); i915->mm.shrink_count--; i915->mm.shrink_memory -= obj->base.size; @@ -494,28 +497,45 @@ static void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj, if (!i915_gem_object_is_shrinkable(obj)) return;
- if (atomic_add_unless(&obj->mm.shrink_pin, -1, 1)) - return; - spin_lock_irqsave(&i915->mm.obj_lock, flags); + GEM_BUG_ON(!kref_read(&obj->base.refcount)); - if (atomic_dec_and_test(&obj->mm.shrink_pin)) { - GEM_BUG_ON(!list_empty(&obj->mm.link));
- list_add_tail(&obj->mm.link, head); + if (list_empty(&obj->mm.link)) { i915->mm.shrink_count++; i915->mm.shrink_memory += obj->base.size; - + list_add_tail(&obj->mm.link, head); + } else { + list_move_tail(&obj->mm.link, head); } + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); }
+ +/** + * i915_gem_object_make_shrinkable - Move the object to the tail of the + * shrinkable list. Objects on this list might be swapped out. Used with + * WILLNEED objects. + * @obj: The GEM object. + * + * Should only be called on objects which have backing pages. + */ void i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj) { __i915_gem_object_make_shrinkable(obj, &obj_to_i915(obj)->mm.shrink_list); }
+/** + * i915_gem_object_make_purgeable - Move the object to the tail of the purgeable + * list. Used with DONTNEED objects. Unlike with shrinkable objects, the + * shrinker will attempt to discard the backing pages, instead of trying to swap + * them out. + * @obj: The GEM object. + * + * Should only be called on objects which have backing pages. + */ void i915_gem_object_make_purgeable(struct drm_i915_gem_object *obj) { __i915_gem_object_make_shrinkable(obj, diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index 1aee5e6b1b23..dd5ebc1659cf 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -185,7 +185,6 @@ static void gen6_alloc_va_range(struct i915_address_space *vm,
pt = stash->pt[0]; __i915_gem_object_pin_pages(pt->base); - i915_gem_object_make_unshrinkable(pt->base);
fill32_px(pt, vm->scratch[0]->encode);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 6a5af995f5b1..1b946978e377 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -301,7 +301,6 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
pt = stash->pt[!!lvl]; __i915_gem_object_pin_pages(pt->base); - i915_gem_object_make_unshrinkable(pt->base);
fill_px(pt, vm->scratch[lvl]->encode);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff637147b1a9..1b7dc57e6ec1 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -111,11 +111,6 @@ static int __context_pin_state(struct i915_vma *vma, struct i915_gem_ww_ctx *ww) if (err) goto err_unpin;
- /* - * And mark it as a globally pinned object to let the shrinker know - * it cannot reclaim the object until we release it. - */ - i915_vma_make_unshrinkable(vma); vma->obj->mm.dirty = true;
return 0; @@ -127,7 +122,6 @@ static int __context_pin_state(struct i915_vma *vma, struct i915_gem_ww_ctx *ww)
static void __context_unpin_state(struct i915_vma *vma) { - i915_vma_make_shrinkable(vma); i915_active_release(&vma->active); __i915_vma_unpin(vma); } @@ -180,7 +174,6 @@ static int intel_context_pre_pin(struct intel_context *ce, if (err) goto err_timeline;
- return 0;
err_timeline: @@ -338,6 +331,8 @@ static void __intel_context_retire(struct i915_active *active)
set_bit(CONTEXT_VALID_BIT, &ce->flags); intel_context_post_unpin(ce); + if (ce->state) + i915_vma_make_shrinkable(ce->state); intel_context_put(ce); }
We currently just evict lmem objects to system memory when under memory pressure. For this case we lack the usual object mm.pages, which effectively hides the pages from the i915-gem shrinker, until we actually "attach" the TT to the object, or in the case of lmem-only objects it just gets migrated back to lmem when touched again. For such cases we can make the object visible as soon as we populate the TT with shmem pages, and then hide it again when doing the unpopulate.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 1 + drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 29 +++++++++++++++----- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 11 ++++++++ 3 files changed, 34 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 561d6bd0a5c9..28b831c78c47 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -540,6 +540,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj); void i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj); +void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj); void i915_gem_object_make_purgeable(struct drm_i915_gem_object *obj);
static inline bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 4f5d49ada9e6..a69d7c207558 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -487,13 +487,12 @@ void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj) spin_unlock_irqrestore(&i915->mm.obj_lock, flags); }
-static void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj, - struct list_head *head) +static void ___i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj, + struct list_head *head) { struct drm_i915_private *i915 = obj_to_i915(obj); unsigned long flags;
- GEM_BUG_ON(!i915_gem_object_has_pages(obj)); if (!i915_gem_object_is_shrinkable(obj)) return;
@@ -512,6 +511,21 @@ static void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj, spin_unlock_irqrestore(&i915->mm.obj_lock, flags); }
+/** + * __i915_gem_object_make_shrinkable - Move the object to the tail of the + * shrinkable list. Objects on this list might be swapped out. Used with + * WILLNEED objects. + * @obj: The GEM object. + * + * DO NOT USE. This is intended to be called on very special objects that don't + * yet have mm.pages, but are guaranteed to have potentially reclaimable pages + * underneath. + */ +void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj) +{ + ___i915_gem_object_make_shrinkable(obj, + &obj_to_i915(obj)->mm.shrink_list); +}
/** * i915_gem_object_make_shrinkable - Move the object to the tail of the @@ -523,8 +537,8 @@ static void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj, */ void i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj) { - __i915_gem_object_make_shrinkable(obj, - &obj_to_i915(obj)->mm.shrink_list); + GEM_BUG_ON(!i915_gem_object_has_pages(obj)); + __i915_gem_object_make_shrinkable(obj); }
/** @@ -538,6 +552,7 @@ void i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj) */ void i915_gem_object_make_purgeable(struct drm_i915_gem_object *obj) { - __i915_gem_object_make_shrinkable(obj, - &obj_to_i915(obj)->mm.purge_list); + GEM_BUG_ON(!i915_gem_object_has_pages(obj)); + ___i915_gem_object_make_shrinkable(obj, + &obj_to_i915(obj)->mm.purge_list); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index f4f7010e1c15..f68481b852ff 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -233,6 +233,15 @@ static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev, if (ttm->page_flags & TTM_TT_FLAG_SWAPPED) ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
+ /* + * Even if we lack mm.pages for this object(which will be the case when + * something is evicted to system memory by TTM), we still want to make + * this object visible to the shrinker, since the underlying ttm_tt + * still has the real shmem pages. When unpopulating the tt(possibly due + * to shrinking) we hide it again from the shrinker. + */ + __i915_gem_object_make_shrinkable(obj); + i915_tt->cached_st = st; return 0;
@@ -247,6 +256,8 @@ static void i915_ttm_tt_shmem_unpopulate(struct ttm_tt *ttm) struct drm_i915_gem_object *obj = i915_tt->obj; bool backup = i915_tt->do_backup;
+ i915_gem_object_make_unshrinkable(obj); + if (obj->mm.madv == I915_MADV_DONTNEED) { obj->mm.dirty = false; GEM_BUG_ON(backup);
Enable shmem tt backend, and enable shrinking.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index f68481b852ff..33c1c0a50fdf 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1101,6 +1101,7 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
static const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = { .name = "i915_gem_object_ttm", + .flags = I915_GEM_OBJECT_IS_SHRINKABLE,
.get_pages = i915_ttm_get_pages, .put_pages = i915_ttm_put_pages,
Am 21.09.21 um 13:01 schrieb Matthew Auld:
In commit:
commit 09ac4fcb3f255e9225967c75f5893325c116cdbe Author: Felix Kuehling Felix.Kuehling@amd.com Date: Thu Jul 13 17:01:16 2017 -0400
drm/ttm: Implement vm_operations_struct.access v2
we added the vm_access hook, where we also directly call tt_swapin for some reason. If something is swapped-out then the ttm_tt must also be unpopulated, and since access_kmap should also call tt_populate, if needed, then swapping-in will already be handled there.
Sounds like you completely misunderstand what that is good for.
This is for debugger attaching to a process and peek/poke into the VMA and completely unrelated to kmap.
If anything, calling tt_swapin directly here would likely always fail since the tt->pages won't yet be populated, or worse since the tt->pages array is never actually cleared in unpopulate this might lead to a nasty uaf.
That's indeed true, but we just need to unconditionally call ttm_tt_populate() here instead.
Regards, Christian.
Fixes: 09ac4fcb3f25 ("drm/ttm: Implement vm_operations_struct.access v2") Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index f56be5bc0861..5b9b7fd01a69 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -519,11 +519,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
switch (bo->resource->mem_type) { case TTM_PL_SYSTEM:
if (unlikely(bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) {
ret = ttm_tt_swapin(bo->ttm);
if (unlikely(ret != 0))
return ret;
fallthrough; case TTM_PL_TT: ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);}
HI, Christian, On Tue, 2021-09-21 at 13:28 +0200, Christian König wrote:
Am 21.09.21 um 13:01 schrieb Matthew Auld:
In commit:
commit 09ac4fcb3f255e9225967c75f5893325c116cdbe Author: Felix Kuehling Felix.Kuehling@amd.com Date: Thu Jul 13 17:01:16 2017 -0400
drm/ttm: Implement vm_operations_struct.access v2
we added the vm_access hook, where we also directly call tt_swapin for some reason. If something is swapped-out then the ttm_tt must also be unpopulated, and since access_kmap should also call tt_populate, if needed, then swapping-in will already be handled there.
Sounds like you completely misunderstand what that is good for.
This is for debugger attaching to a process and peek/poke into the VMA and completely unrelated to kmap.
I think what Matthew is saying is that there is a fallthrough to TTM_PL_TT which calls
ttm_bo_vm_access_kmap
which calls
ttm_tt_populate().
So from my pow, unless there are other concerns, this is
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
If anything, calling tt_swapin directly here would likely always fail since the tt->pages won't yet be populated, or worse since the tt-
pages
array is never actually cleared in unpopulate this might lead to a nasty uaf.
That's indeed true, but we just need to unconditionally call ttm_tt_populate() here instead.
Regards, Christian.
Fixes: 09ac4fcb3f25 ("drm/ttm: Implement vm_operations_struct.access v2") Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index f56be5bc0861..5b9b7fd01a69 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -519,11 +519,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, switch (bo->resource->mem_type) { case TTM_PL_SYSTEM: - if (unlikely(bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) { - ret = ttm_tt_swapin(bo->ttm); - if (unlikely(ret != 0)) - return ret; - } fallthrough; case TTM_PL_TT: ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
Am 21.09.21 um 13:37 schrieb Thomas Hellström:
HI, Christian, On Tue, 2021-09-21 at 13:28 +0200, Christian König wrote:
Am 21.09.21 um 13:01 schrieb Matthew Auld:
In commit:
commit 09ac4fcb3f255e9225967c75f5893325c116cdbe Author: Felix Kuehling Felix.Kuehling@amd.com Date: Thu Jul 13 17:01:16 2017 -0400
drm/ttm: Implement vm_operations_struct.access v2
we added the vm_access hook, where we also directly call tt_swapin for some reason. If something is swapped-out then the ttm_tt must also be unpopulated, and since access_kmap should also call tt_populate, if needed, then swapping-in will already be handled there.
Sounds like you completely misunderstand what that is good for.
This is for debugger attaching to a process and peek/poke into the VMA and completely unrelated to kmap.
I think what Matthew is saying is that there is a fallthrough to TTM_PL_TT which calls
ttm_bo_vm_access_kmap
Ah, good point. Now that makes much more sense.
which calls
ttm_tt_populate().
So from my pow, unless there are other concerns, this is
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
In that case the patch is Reviewed-by: Christian König christian.koenig@amd.com as well.
When this is gone we can also make ttm_tt_swapin() static since that here is the only user.
And BTW replacing the switch/case with a check for use_tt in the resource_manager is probably a good idea as well.
Regards, Christian.
If anything, calling tt_swapin directly here would likely always fail since the tt->pages won't yet be populated, or worse since the tt-
pages
array is never actually cleared in unpopulate this might lead to a nasty uaf.
That's indeed true, but we just need to unconditionally call ttm_tt_populate() here instead.
Regards, Christian.
Fixes: 09ac4fcb3f25 ("drm/ttm: Implement vm_operations_struct.access v2") Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Christian König christian.koenig@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index f56be5bc0861..5b9b7fd01a69 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -519,11 +519,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
switch (bo->resource->mem_type) { case TTM_PL_SYSTEM: - if (unlikely(bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) { - ret = ttm_tt_swapin(bo->ttm); - if (unlikely(ret != 0)) - return ret; - } fallthrough; case TTM_PL_TT: ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
dri-devel@lists.freedesktop.org