On Monday, 15 March 2021 6:42:45 PM AEDT Christoph Hellwig wrote:
+Not all devices support atomic access to system memory. To support atomic
+operations to a shared virtual memory page such a device needs access to that
+page which is exclusive of any userspace access from the CPU. The
+``make_device_exclusive_range()`` function can be used to make a memory range
+inaccessible from userspace.
s/Not all devices/Some devices/ ?
I will reword this. What I was trying to convey is that devices may have features which allow for atomics to be implemented with SW assistance.
 static inline int mm_has_notifiers(struct mm_struct *mm)
@@ -528,7 +534,17 @@ static inline void mmu_notifier_range_init_migrate(
 {
 	mmu_notifier_range_init(range, MMU_NOTIFY_MIGRATE, flags, vma, mm,
 				start, end);
-	range->migrate_pgmap_owner = pgmap;
+	range->owner = pgmap;
+}
+
+static inline void mmu_notifier_range_init_exclusive(
+			struct mmu_notifier_range *range, unsigned int flags,
+			struct vm_area_struct *vma, struct mm_struct *mm,
+			unsigned long start, unsigned long end, void *owner)
+{
+	mmu_notifier_range_init(range, MMU_NOTIFY_EXCLUSIVE, flags, vma, mm,
+				start, end);
+	range->owner = owner;
Maybe just replace mmu_notifier_range_init_migrate with a mmu_notifier_range_init_owner helper that takes the owner but does not hard code a type?
Ok. That does result in a function which takes a fair number of arguments, but I guess that's no worse than multiple functions hard coding the different types and it does result in less code overall.
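To make the trade-off concrete, here is a rough userspace sketch of what a single owner-taking helper could look like. The struct and enum below are cut-down stand-ins for the kernel types, the vma and mm arguments are left out for brevity, and the exact signature is only an assumption about how the next revision might look.

```c
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel definitions (assumption). */
enum mmu_notifier_event {
	MMU_NOTIFY_MIGRATE,
	MMU_NOTIFY_EXCLUSIVE,
};

struct mmu_notifier_range {
	enum mmu_notifier_event event;
	unsigned int flags;
	unsigned long start;
	unsigned long end;
	void *owner;
};

/* Simplified base initialiser; the real one also takes vma and mm. */
static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
					   enum mmu_notifier_event event,
					   unsigned int flags,
					   unsigned long start,
					   unsigned long end)
{
	range->event = event;
	range->flags = flags;
	range->start = start;
	range->end = end;
	range->owner = NULL;
}

/*
 * One helper for every event type that carries an owner: the caller
 * passes the event instead of each wrapper hard coding it.
 */
static inline void mmu_notifier_range_init_owner(
			struct mmu_notifier_range *range,
			enum mmu_notifier_event event, unsigned int flags,
			unsigned long start, unsigned long end, void *owner)
{
	mmu_notifier_range_init(range, event, flags, start, end);
	range->owner = owner;
}
```

A migrate caller would pass MMU_NOTIFY_MIGRATE with its pgmap as the owner, and an exclusive caller would pass MMU_NOTIFY_EXCLUSIVE with its own owner pointer.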
}
+	} else if (is_device_exclusive_entry(entry)) {
+		page = pfn_swap_entry_to_page(entry);
+		get_page(page);
+		rss[mm_counter(page)]++;
+		if (is_writable_device_exclusive_entry(entry) &&
+		    is_cow_mapping(vm_flags)) {
+			/*
+			 * COW mappings require pages in both
+			 * parent and child to be set to read.
+			 */
+			entry = make_readable_device_exclusive_entry(
+							swp_offset(entry));
+			pte = swp_entry_to_pte(entry);
+			if (pte_swp_soft_dirty(*src_pte))
+				pte = pte_swp_mksoft_dirty(pte);
+			if (pte_swp_uffd_wp(*src_pte))
+				pte = pte_swp_mkuffd_wp(pte);
+			set_pte_at(src_mm, addr, src_pte, pte);
+		}
Just cosmetic, but I wonder if we should factor this code block into a little helper.
In that case there are arguably other bits of this function which should be refactored into helpers as well. Unless you feel strongly about it I would like to leave this as is and put together a future series to fix this and a couple of other areas I've noticed that could do with some refactoring/cleanups.
+static bool try_to_protect_one(struct page *page, struct vm_area_struct *vma,
+			unsigned long address, void *arg)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct page_vma_mapped_walk pvmw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
+	struct ttp_args *ttp = (struct ttp_args *) arg;
This cast should not be needed.
+	return ttp.valid && (!page_mapcount(page) ? true : false);
This can be simplified to:
return ttp.valid && !page_mapcount(page);
+	npages = get_user_pages_remote(mm, start, npages,
+				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
+				       pages, NULL, NULL);
+	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
+		if (!trylock_page(pages[i])) {
+			put_page(pages[i]);
+			pages[i] = NULL;
+			continue;
+		}
+		if (!try_to_protect(pages[i], mm, start, arg)) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+			pages[i] = NULL;
+		}
Should the trylock_page go into try_to_protect to simplify the loop a little? Also I wonder if we need make_device_exclusive_range or should just open code the get_user_pages_remote + try_to_protect loop in the callers, as that might allow them to also deduce other information about the found pages.
This function has evolved over time and putting the trylock_page into try_to_protect does simplify things nicely. I'm not sure what other information a caller could deduce through open coding though, but I guess in some circumstances it might be possible for callers to skip get_user_pages_remote() which might be a future improvement.
The main reason it looks like this was simply to keep it looking fairly similar to how hmm_range_fault() and migrate_vma() are used with an array of pages (or pfns) which are filled out from the given address range.
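For illustration, the loop with the trylock folded into try_to_protect might end up shaped roughly like the userspace sketch below. Everything here is a stand-in: struct page is a toy type, can_protect is a hypothetical flag standing in for the rmap walk result, make_exclusive is a hypothetical name for the loop, and the convention that a page stays locked on success is an assumption rather than what the next revision will necessarily do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace stand-in for struct page (assumption, not the kernel type). */
struct page {
	bool locked;
	int refcount;
	bool can_protect;	/* hypothetical: would the rmap walk succeed? */
};

static bool trylock_page(struct page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

static void unlock_page(struct page *page)
{
	page->locked = false;
}

static void put_page(struct page *page)
{
	page->refcount--;
}

/*
 * Takes the page lock itself. On success the page is returned locked;
 * on failure any lock taken here has been dropped again.
 */
static bool try_to_protect(struct page *page)
{
	if (!trylock_page(page))
		return false;
	if (!page->can_protect) {
		unlock_page(page);
		return false;
	}
	return true;
}

/* The caller's loop now has a single failure path per page. */
static int make_exclusive(struct page **pages, int npages)
{
	int i, nr_protected = 0;

	for (i = 0; i < npages; i++) {
		if (!try_to_protect(pages[i])) {
			put_page(pages[i]);
			pages[i] = NULL;
			continue;
		}
		nr_protected++;
	}
	return nr_protected;
}
```

Pages that could not be locked or protected are dropped and replaced with NULL in the array, which keeps the result looking like the page arrays filled out for hmm_range_fault() and migrate_vma() callers.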
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Thanks.