From: Thomas Hellstrom <thellstrom@vmware.com>
The drm/ttm module is using a modified on-stack copy of the struct vm_area_struct to be able to set page protection with customized caching. Fix that by adding a vmf_insert_mixed_prot() function, similar to the existing vmf_insert_pfn_prot(), for use with drm/ttm.
I'd like to merge this through a drm tree.
Changes since v1:
*) Formatting fixes in patch 1
*) Updated commit message of patch 2.
Thomas Hellstrom (2):
  mm: Add and export vmf_insert_mixed_prot()
  drm/ttm: Fix vm page protection handling
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 14 +++++++-------
 include/linux/mm.h              |  2 ++
 mm/memory.c                     | 15 +++++++++++----
 3 files changed, 20 insertions(+), 11 deletions(-)
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
From: Thomas Hellstrom <thellstrom@vmware.com>
The TTM module today uses a hack to be able to set a different page protection than struct vm_area_struct::vm_page_prot. To be able to do this properly, add and export vmf_insert_mixed_prot().
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cc292273e6ba..29575d3c1e47 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2548,6 +2548,8 @@ vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn, pgprot_t pgprot);
 vm_fault_t vmf_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 			pfn_t pfn);
+vm_fault_t vmf_insert_mixed_prot(struct vm_area_struct *vma, unsigned long addr,
+			pfn_t pfn, pgprot_t pgprot);
 vm_fault_t vmf_insert_mixed_mkwrite(struct vm_area_struct *vma,
 		unsigned long addr, pfn_t pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index b1ca51a079f2..4c26c27afb0a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1719,9 +1719,9 @@ static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn)
 }
 
 static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
-		unsigned long addr, pfn_t pfn, bool mkwrite)
+		unsigned long addr, pfn_t pfn, pgprot_t pgprot,
+		bool mkwrite)
 {
-	pgprot_t pgprot = vma->vm_page_prot;
 	int err;
 
 	BUG_ON(!vm_mixed_ok(vma, pfn));
@@ -1764,10 +1764,17 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
 	return VM_FAULT_NOPAGE;
 }
 
+vm_fault_t vmf_insert_mixed_prot(struct vm_area_struct *vma, unsigned long addr,
+				 pfn_t pfn, pgprot_t pgprot)
+{
+	return __vm_insert_mixed(vma, addr, pfn, pgprot, false);
+}
+EXPORT_SYMBOL(vmf_insert_mixed_prot);
+
 vm_fault_t vmf_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 		pfn_t pfn)
 {
-	return __vm_insert_mixed(vma, addr, pfn, false);
+	return __vm_insert_mixed(vma, addr, pfn, vma->vm_page_prot, false);
 }
 EXPORT_SYMBOL(vmf_insert_mixed);
 
@@ -1779,7 +1786,7 @@ EXPORT_SYMBOL(vmf_insert_mixed);
 vm_fault_t vmf_insert_mixed_mkwrite(struct vm_area_struct *vma,
 		unsigned long addr, pfn_t pfn)
 {
-	return __vm_insert_mixed(vma, addr, pfn, true);
+	return __vm_insert_mixed(vma, addr, pfn, vma->vm_page_prot, true);
 }
 EXPORT_SYMBOL(vmf_insert_mixed_mkwrite);
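As an illustration only (not part of the patch), a driver fault handler could use the new helper roughly as below to insert a pfn with a caching mode that differs from vm_page_prot; my_drv_fault() and my_drv_lookup_pfn() are made-up names:

static vm_fault_t my_drv_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	pgprot_t prot = vma->vm_page_prot;
	unsigned long pfn;

	/* Hypothetical driver-specific pfn lookup. */
	pfn = my_drv_lookup_pfn(vma, vmf->address);

	/* Apply a caching mode that may differ from vm_page_prot. */
	prot = pgprot_writecombine(prot);

	return vmf_insert_mixed_prot(vma, vmf->address,
				     __pfn_to_pfn_t(pfn, PFN_DEV), prot);
}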
From: Thomas Hellstrom <thellstrom@vmware.com>
TTM graphics buffer objects may, transparently to user-space, move between IO and system memory. When that happens, all PTEs pointing to the old location are zapped before the move, and the pages are faulted in again if needed. When they are faulted in again, the page protection caching-mode and encryption bits may change and be different from those of struct vm_area_struct::vm_page_prot.
We were using an ugly hack to set the page protection correctly. Fix that and instead use vmf_insert_mixed_prot() and/or vmf_insert_pfn_prot(). Also get the default page protection from struct vm_area_struct::vm_page_prot rather than from vm_get_page_prot(). This way we catch modifications done by the vm system for drivers that want write-notification.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index e6495ca2630b..2098f8d4dfc5 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -173,7 +173,6 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 				    pgoff_t num_prefault)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct vm_area_struct cvma = *vma;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
 	struct ttm_bo_device *bdev = bo->bdev;
 	unsigned long page_offset;
@@ -244,7 +243,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 		goto out_io_unlock;
 	}
 
-	cvma.vm_page_prot = ttm_io_prot(bo->mem.placement, prot);
+	prot = ttm_io_prot(bo->mem.placement, prot);
 	if (!bo->mem.bus.is_iomem) {
 		struct ttm_operation_ctx ctx = {
 			.interruptible = false,
@@ -260,7 +259,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 		}
 	} else {
 		/* Iomem should not be marked encrypted */
-		cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
+		prot = pgprot_decrypted(prot);
 	}
 
 	/*
@@ -284,10 +283,11 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 		}
 
 		if (vma->vm_flags & VM_MIXEDMAP)
-			ret = vmf_insert_mixed(&cvma, address,
-					       __pfn_to_pfn_t(pfn, PFN_DEV));
+			ret = vmf_insert_mixed_prot(vma, address,
+						    __pfn_to_pfn_t(pfn, PFN_DEV),
+						    prot);
 		else
-			ret = vmf_insert_pfn(&cvma, address, pfn);
+			ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
 
 		/* Never error on prefaulted PTEs */
 		if (unlikely((ret & VM_FAULT_ERROR))) {
@@ -319,7 +319,7 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 	if (ret)
 		return ret;
 
-	prot = vm_get_page_prot(vma->vm_flags);
+	prot = vma->vm_page_prot;
 	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
 		return ret;
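To illustrate the last paragraph of the commit message: vma->vm_page_prot is not necessarily equal to vm_get_page_prot(vma->vm_flags), because the core vm may have removed the write bit for drivers that want write-notification. A simplified sketch of that adjustment (illustrative only, not the exact mm code):

	pgprot_t prot = vm_get_page_prot(vma->vm_flags);

	if (vma_wants_writenotify(vma, prot))
		/* Drop the write bit so the first write faults into the vm. */
		prot = vm_get_page_prot(vma->vm_flags & ~VM_SHARED);

	WRITE_ONCE(vma->vm_page_prot, prot);

Reading the default protection from vma->vm_page_prot therefore picks up such adjustments, whereas vm_get_page_prot(vma->vm_flags) would not.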
On Tue 03-12-19 11:48:53, Thomas Hellström (VMware) wrote:
So essentially this shouldn't have any new side effects on functionality; it is just making hacky/ugly code less so? In other words, what are the consequences of having page protection inconsistent with the vma's?
On 12/4/19 2:52 PM, Michal Hocko wrote:
Functionality is unchanged. The use of an on-stack vma copy was severely frowned upon in an earlier thread, which also points to another similar example using vmf_insert_pfn_prot().
https://lore.kernel.org/lkml/20190905103541.4161-2-thomas_os@shipmail.org/
> In other words, what are the consequences of having page protection inconsistent with the vma's?
Over the years, it looks like the caching and encryption flags of vma::vm_page_prot have largely fallen out of use. From what I can tell, there are no places left that can affect TTM. We discussed __split_huge_pmd_locked() towards the end of that thread, but that doesn't affect TTM even with huge page-table entries.
/Thomas
On Wed 04-12-19 15:36:58, Thomas Hellström (VMware) wrote:
And thinking about it, this also begs for a comment in the code to explain that some (which?) mappings might have a mismatch and that generic code has to be careful. Because as things stand now this seems really subtle: it happens to work _now_ but might break in the future. Or what prevents generic code from stumbling over this discrepancy?
On 12/4/19 3:42 PM, Michal Hocko wrote:
Yes, we had that discussion in the thread I pointed to. I initially suggested and argued for updating vma::vm_page_prot using a WRITE_ONCE() (we only hold the mmap_sem in read mode); there seem to be other places in generic code that do the same.
But Andy convinced me that the approach taken here was the right way, and that it is also used elsewhere.
(See also https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/vdso/vma.c#L11...)
I guess, to formulate this properly: what's required is that generic code doesn't build page-table entries from vma::vm_page_prot for VM_PFNMAP and VM_MIXEDMAP mappings outside of driver control.
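For completeness, a rough sketch of what that WRITE_ONCE() alternative would have looked like in the TTM fault path (illustrative only; it is not what these patches do):

	/* mmap_sem is only held in read mode, hence WRITE_ONCE(). */
	WRITE_ONCE(vma->vm_page_prot,
		   ttm_io_prot(bo->mem.placement, vma->vm_page_prot));
	ret = vmf_insert_mixed(vma, address, __pfn_to_pfn_t(pfn, PFN_DEV));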
/Thomas
On Wed 04-12-19 16:19:27, Thomas Hellström (VMware) wrote:
Let me repeat that this belongs in the code, somewhere everybody can see it, rather than in a "random" discussion on the mailing list.
Thanks!