On 1/29/20 3:55 PM, Christian König wrote:
On 24.01.20 at 10:09, Thomas Hellström (VMware) wrote:
From: Thomas Hellstrom thellstrom@vmware.com
Support huge (PMD-size and PUD-size) page-table entries by providing a huge_fault() callback. We still support private mappings and write-notify by splitting the huge page-table entries on write-access.
Note that for huge page-faults to occur, either the kernel needs to be compiled with trans-huge-pages always enabled, or the kernel needs to be compiled with trans-huge-pages enabled using madvise, and the user-space app needs to call madvise() to enable trans-huge pages on a per-mapping basis.
Furthermore, huge page-faults will not succeed unless buffer objects and user-space addresses are aligned on huge page size boundaries.
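As a user-space illustration of the two requirements above (a per-mapping madvise() hint and huge-page alignment), here is a minimal sketch. It assumes a 2 MiB PMD size (x86-64 with 4 KiB base pages) and uses anonymous memory as a stand-in for a buffer-object mapping; `HUGE_PMD_SIZE`, `align_up()` and `map_pmd_aligned()` are hypothetical names and not part of the patch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD size (x86-64, 4 KiB base pages). Other setups differ. */
#define HUGE_PMD_SIZE ((size_t)2 << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/*
 * Reserve len bytes at a PMD-aligned address and request transparent
 * huge pages for that range. Anonymous memory stands in for a
 * buffer-object mapping. Returns the aligned address, or NULL on failure.
 * The unaligned head and tail of the over-allocation are simply left mapped.
 */
static void *map_pmd_aligned(size_t len)
{
	size_t span = len + HUGE_PMD_SIZE;	/* over-allocate so we can align */
	void *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *aligned;

	if (raw == MAP_FAILED)
		return NULL;

	aligned = (void *)align_up((uintptr_t)raw, HUGE_PMD_SIZE);
#ifdef MADV_HUGEPAGE
	/* Advisory only: the call fails if the kernel lacks THP support. */
	(void)madvise(aligned, len, MADV_HUGEPAGE);
#endif
	return aligned;
}
```

The over-allocation is the simplest way to obtain an aligned address from mmap(); a real client would more likely pass an aligned hint address or unmap the unused head and tail.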
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Michal Hocko mhocko@suse.com
Cc: "Matthew Wilcox (Oracle)" willy@infradead.org
Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
Cc: Ralph Campbell rcampbell@nvidia.com
Cc: "Jérôme Glisse" jglisse@redhat.com
Cc: "Christian König" christian.koenig@amd.com
Cc: Dan Williams dan.j.williams@intel.com
Signed-off-by: Thomas Hellstrom thellstrom@vmware.com
Reviewed-by: Roland Scheidegger sroland@vmware.com
 drivers/gpu/drm/ttm/ttm_bo_vm.c            | 145 ++++++++++++++++++++-
 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |   2 +-
 include/drm/ttm/ttm_bo_api.h               |   3 +-
 3 files changed, 145 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 389128b8c4dd..49704261a00d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -156,6 +156,89 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_vm_reserve);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/**
+ * ttm_bo_vm_insert_huge - Insert a pfn for PUD or PMD faults
+ * @vmf: Fault data
+ * @bo: The buffer object
+ * @page_offset: Page offset from bo start
+ * @fault_page_size: The size of the fault in pages.
+ * @pgprot: The page protections.
+ * Does additional checking whether it's possible to insert a PUD or PMD
+ * pfn and performs the insertion.
+ *
+ * Return: VM_FAULT_NOPAGE on successful insertion, VM_FAULT_FALLBACK if
+ * a huge fault was not possible, and a VM_FAULT_ERROR code otherwise.
+ */
+static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
+					struct ttm_buffer_object *bo,
+					pgoff_t page_offset,
+					pgoff_t fault_page_size,
+					pgprot_t pgprot)
+{
+	pgoff_t i;
+	vm_fault_t ret;
+	unsigned long pfn;
+	pfn_t pfnt;
+	struct ttm_tt *ttm = bo->ttm;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+
+	/* Fault should not cross bo boundary. */
+	page_offset &= ~(fault_page_size - 1);
+	if (page_offset + fault_page_size > bo->num_pages)
+		goto out_fallback;
+
+	if (bo->mem.bus.is_iomem)
+		pfn = ttm_bo_io_mem_pfn(bo, page_offset);
+	else
+		pfn = page_to_pfn(ttm->pages[page_offset]);
+
+	/* pfn must be fault_page_size aligned. */
+	if ((pfn & (fault_page_size - 1)) != 0)
+		goto out_fallback;
+
+	/* Check that memory is contiguous. */
+	if (!bo->mem.bus.is_iomem)
+		for (i = 1; i < fault_page_size; ++i) {
+			if (page_to_pfn(ttm->pages[page_offset + i]) != pfn + i)
+				goto out_fallback;
+		}
+	/* IO mem without the io_mem_pfn callback is always contiguous. */
+	else if (bo->bdev->driver->io_mem_pfn)
+		for (i = 1; i < fault_page_size; ++i) {
+			if (ttm_bo_io_mem_pfn(bo, page_offset + i) != pfn + i)
+				goto out_fallback;
+		}
Maybe add {} to the if to make clear where things start/end.
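To make the suggestion concrete, here is a self-contained user-space sketch of the same contiguity check with explicit braces on every branch. It is an illustration only, not the revised patch: `pfn_lookup_fn`, `array_lookup()`, `pfns_contiguous()` and `huge_range_ok()` are hypothetical stand-ins for the `ttm->pages[]` and `ttm_bo_io_mem_pfn()` lookups in the quoted code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page pfn lookup, standing in for the kernel helpers. */
typedef unsigned long (*pfn_lookup_fn)(const void *ctx, size_t page);

/* Example lookup: ctx is an array of pfns indexed by page number. */
static unsigned long array_lookup(const void *ctx, size_t page)
{
	return ((const unsigned long *)ctx)[page];
}

/* True if pages first..first+count-1 map to consecutive pfns. */
static bool pfns_contiguous(pfn_lookup_fn lookup, const void *ctx,
			    size_t first, size_t count)
{
	unsigned long base = lookup(ctx, first);
	size_t i;

	for (i = 1; i < count; ++i) {
		if (lookup(ctx, first + i) != base + i)
			return false;
	}
	return true;
}

/* Same branch structure as the quoted hunk, with braces on each arm. */
static bool huge_range_ok(bool is_iomem, pfn_lookup_fn sys_lookup,
			  pfn_lookup_fn io_lookup, const void *ctx,
			  size_t first, size_t count)
{
	if (!is_iomem) {
		return pfns_contiguous(sys_lookup, ctx, first, count);
	} else if (io_lookup) {
		return pfns_contiguous(io_lookup, ctx, first, count);
	}

	/* IO memory without a per-page lookup is treated as contiguous. */
	return true;
}
```

With the braces in place, the comment about IO memory no longer sits between an unbraced `for` body and the `else` that belongs to the outer `if`, which is the part that is hard to read in the quoted hunk.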
+	pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
+	if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
+		ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	else if (fault_page_size == (HPAGE_PUD_SIZE >> PAGE_SHIFT))
+		ret = vmf_insert_pfn_pud_prot(vmf, pfnt, pgprot, write);
+#endif
+	else
+		WARN_ON_ONCE(ret = VM_FAULT_FALLBACK);
+
+	if (ret != VM_FAULT_NOPAGE)
+		goto out_fallback;
+
+	return VM_FAULT_NOPAGE;
+out_fallback:
+	count_vm_event(THP_FAULT_FALLBACK);
+	return VM_FAULT_FALLBACK;
This doesn't seem to match the function documentation since we never return ret here as far as I can see.
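The mismatch can be modelled outside the kernel. The sketch below contrasts what the quoted tail does (every result other than NOPAGE collapses to FALLBACK) with one way of matching the kernel-doc as written (error codes are returned unchanged). The enum values and function names are hypothetical stand-ins for `vm_fault_t` codes, and nothing here is meant to predict which way the next version of the patch resolves it; updating the comment instead would be equally valid.

```c
/* Hypothetical stand-ins for vm_fault_t codes; values are illustrative. */
enum fault_result {
	FAULT_NOPAGE,
	FAULT_FALLBACK,
	FAULT_SIGBUS,
	FAULT_OOM,
};

/* Stand-in for count_vm_event(THP_FAULT_FALLBACK). */
static unsigned long thp_fallback_events;

/* Quoted patch: anything other than NOPAGE is reported as FALLBACK. */
static enum fault_result finish_collapse(enum fault_result ret)
{
	if (ret != FAULT_NOPAGE) {
		thp_fallback_events++;
		return FAULT_FALLBACK;
	}
	return FAULT_NOPAGE;
}

/* Alternative matching the kernel-doc: error codes pass through. */
static enum fault_result finish_propagate(enum fault_result ret)
{
	if (ret == FAULT_FALLBACK)
		thp_fallback_events++;
	return ret;
}
```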
Apart from those comments, it looks like that should work.

Christian.
Thanks for reviewing, Christian. I'll update the next version with your feedback.
/Thomas