On Tue, Feb 15, 2022 at 05:49:07PM -0500, Felix Kuehling wrote:
Userspace does
- mmap(MAP_PRIVATE) to allocate anon memory
- something to trigger migration to install a ZONE_DEVICE page
- munmap()
Who decrements the refcount on the munmap?
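A minimal userspace sketch of that sequence, assuming Linux and glibc. The helper name mmap_migrate_munmap() is hypothetical. The migration step is left as a placeholder comment because the trigger is driver-specific, so as written this only exercises mmap()/munmap() on ordinary anonymous memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper tracing the userspace steps above.  Nothing here
 * installs a ZONE_DEVICE page; the migration step is only a placeholder.
 * Returns 0 on success, -1 on failure.
 */
static int mmap_migrate_munmap(size_t len)
{
	assert(len > 0);

	/* 1. mmap(MAP_PRIVATE) to allocate anonymous memory. */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch the range so the pages are actually faulted in. */
	memset(buf, 0xa5, len);

	/*
	 * 2. Placeholder: some driver-specific operation would migrate these
	 *    pages to device memory here, installing ZONE_DEVICE pages in
	 *    the PTEs.
	 */

	/* 3. munmap(): the kernel zaps the PTEs covering this range. */
	return munmap(buf, len);
}
```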
When a ZONE_DEVICE page is installed in the PTE, it is supposed to be marked as pte_devmap, and that disables all the normal page refcounting during munmap().
fsdax makes this work by running the refcounts backwards: the page is refcounted while it exists in the driver, and when the driver decides to remove it, unmap_mapping_range() is called to purge it from all PTEs and then the refcount is decremented. munmap/fork/etc. don't change the refcount.
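To make the fsdax model concrete, here is a toy userspace model. It is not kernel code, and every name in it (toy_page, toy_driver_add(), toy_driver_remove(), etc.) is hypothetical. The driver holds the only reference, installing or zapping a PTE never touches the refcount, and the page is freed only after the driver purges all mappings (standing in for unmap_mapping_range()) and drops its own reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: not struct page, just enough state to show the refcount flow. */
struct toy_page {
	int refcount;	/* references held (only the driver's, in this model) */
	int nr_ptes;	/* number of PTEs currently mapping the page */
	bool freed;
};

static void toy_get(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* The page is refcounted while it exists in the driver. */
static void toy_driver_add(struct toy_page *p)
{
	toy_get(p);
}

/* Fault path: install a devmap-style PTE without taking a reference. */
static void toy_map_pte(struct toy_page *p)
{
	p->nr_ptes++;
}

/* munmap/fork/etc.: the mapping goes away, the refcount does not change. */
static void toy_zap_pte(struct toy_page *p)
{
	assert(p->nr_ptes > 0);
	p->nr_ptes--;
}

/* Driver removal: purge every PTE (the unmap_mapping_range() step), then put. */
static void toy_driver_remove(struct toy_page *p)
{
	p->nr_ptes = 0;
	toy_put(p);
}
```

In this model, zapping the last mapping leaves refcount at 1; only the driver's removal path can free the page.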
Hmm, that just means that whether or not there are PTEs doesn't really matter.
Yes, that is the FSDAX model.
It should work the same as it does for DEVICE_PRIVATE pages. I'm not sure where DEVICE_PRIVATE pages' refcounts are decremented on unmap, TBH. But I can't find it in our driver, or in the test_hmm driver for that matter.
It is not the same as DEVICE_PRIVATE because DEVICE_PRIVATE uses swap entries. The put_page for that case is here:
static unsigned long zap_pte_range(struct mmu_gather *tlb,
				struct vm_area_struct *vma, pmd_t *pmd,
				unsigned long addr, unsigned long end,
				struct zap_details *details)
{
[..]
		if (is_device_private_entry(entry) ||
		    is_device_exclusive_entry(entry)) {
			struct page *page = pfn_swap_entry_to_page(entry);

			if (unlikely(zap_skip_check_mapping(details, page)))
				continue;
			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
			rss[mm_counter(page)]--;

			if (is_device_private_entry(entry))
				page_remove_rmap(page, false);

			put_page(page);
However, in the devmap case vm_normal_page() returns NULL, so the put_page() embedded inside __tlb_remove_page(), in the pte_present() block of the same function, is never reached.
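A toy model of that asymmetry, assuming a much simplified view of zap_pte_range(). The enum and function names are hypothetical, and in the real code the put for a normal page is batched through the mmu_gather rather than done inline. Zapping a device-private swap entry or a normal present PTE drops a reference; zapping a pte_devmap PTE drops nothing, because no struct page comes back from vm_normal_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PTE kinds, loosely mirroring the branches in zap_pte_range(). */
enum toy_pte_kind {
	TOY_PTE_NORMAL,		/* present; vm_normal_page() returns the page */
	TOY_PTE_DEVMAP,		/* present, pte_devmap; vm_normal_page() returns NULL */
	TOY_PTE_DEVICE_PRIVATE,	/* non-present device-private swap entry */
};

struct toy_page {
	int refcount;
};

/*
 * Toy stand-in for what zap does with each kind of entry.
 * Returns true if a page reference was dropped.  Not kernel code.
 */
static bool toy_zap(enum toy_pte_kind kind, struct toy_page *page)
{
	switch (kind) {
	case TOY_PTE_NORMAL:		/* put reached via the __tlb_remove_page() path */
	case TOY_PTE_DEVICE_PRIVATE:	/* explicit put_page() in the swap-entry branch */
		assert(page->refcount > 0);
		page->refcount--;
		return true;
	case TOY_PTE_DEVMAP:		/* no struct page found, so no put */
		return false;
	}
	return false;
}
```

In this model, a refcount-correct ZONE_DEVICE page mapped without pte_devmap would take the TOY_PTE_NORMAL branch and have its reference dropped on munmap().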
After reflecting for a while, I think Christoph's idea is quite good: just don't set pte_devmap() on your pages, and then let's avoid pte_devmap for all refcount-correct ZONE_DEVICE pages.
Jason