On Tue, Sep 3, 2019 at 9:38 PM Dave Hansen <dave.hansen@intel.com> wrote:
This whole thing looks like a fascinating collection of hacks. :)
ttm is taking a stack-allocated "VMA" and handing it to vmf_insert_*(), which obviously expect "real" VMAs that are linked into the mm. It's extracting some pgprot_t information from the real VMA, making a pseudo-temporary VMA, then passing the temporary one back into the insertion functions:
static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
{
	...
	struct vm_area_struct cvma;
	...
		if (vma->vm_flags & VM_MIXEDMAP)
			ret = vmf_insert_mixed(&cvma, address,
					       __pfn_to_pfn_t(pfn, PFN_DEV));
		else
			ret = vmf_insert_pfn(&cvma, address, pfn);
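For reference, the temporary VMA being described is built roughly like this (a trimmed sketch of the ttm_bo_vm_fault() flow; exact field handling may differ in the tree under discussion):

	/* clone the real VMA on the stack and override its page protection
	 * based on where the buffer currently lives */
	cvma = *vma;
	cvma.vm_page_prot = vm_get_page_prot(cvma.vm_flags);
	cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
					cvma.vm_page_prot);
	...
	/* cvma, not vma, is what gets handed to vmf_insert_*() above */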
I can totally see why this needs new exports. But, man, it doesn't seem like something we want to keep *feeding*.
The real problem here is that the encryption bits from the device VMA's "true" vma->vm_page_prot don't match the ones that actually get inserted, probably because the device PTEs need the encryption bits cleared while the system memory PTEs need them set, *and* both are mixed under one VMA.
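To make that concrete: on x86 with SME/SEV the difference is the _PAGE_ENC bit (sme_me_mask), which the generic pgprot helpers set or clear. A minimal illustration:

	/* system RAM mappings keep the encryption bit set ... */
	pgprot_t sys_prot = pgprot_encrypted(vma->vm_page_prot);
	/* ... while device (iomem) mappings must have it cleared, since
	 * the BAR is not behind the memory encryption engine */
	pgprot_t dev_prot = pgprot_decrypted(vma->vm_page_prot);

With both kinds of memory faulted in under a single VMA, a single vma->vm_page_prot cannot describe both.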
The thing we need to stop is having mixed encryption rules under one VMA.
The point here is that we want this. We need to be able to move the buffer between device PTEs and system memory PTEs, transparently, behind userspace's back, without races. And the fast path (which is "no pte exists for this vma") must be really fast, so taking mmap_sem and replacing the vma is a no-go.

-Daniel
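A rough sketch of the direction this points at, assuming insertion helpers that take an explicit pgprot: vmf_insert_pfn_prot() already exists in mm/memory.c, and a vmf_insert_mixed_prot() counterpart is assumed here. The fault path could then pick the protection per buffer placement without cloning or replacing the real VMA:

	/* hypothetical fault-path fragment, helper names are illustrative */
	pgprot_t prot = ttm_io_prot(bo->mem.placement,
				    vm_get_page_prot(vma->vm_flags));

	if (vma->vm_flags & VM_MIXEDMAP)
		ret = vmf_insert_mixed_prot(vma, address,
					    __pfn_to_pfn_t(pfn, PFN_DEV),
					    prot);
	else
		ret = vmf_insert_pfn_prot(vma, address, pfn, prot);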