On Fri, Feb 26, 2021 at 06:18:27PM +1100, Alistair Popple wrote:
> The behaviour of try_to_unmap_one() is difficult to follow because it
> performs different operations based on a fairly large set of flags used
> in different combinations.
>
> TTU_MUNLOCK is one such flag. However it is exclusively used by
> try_to_munlock() which specifies no other flags. Therefore rather than
> overload try_to_unmap_one() with unrelated behaviour split this out
> into its own function and remove the flag.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>
> Given the comments on not needing to hold mmap_lock it was not 100%
> clear to me if it is safe to check vma->vm_flags & VM_LOCKED and if
> re-checking under the ptl was significant. I left the extra check in
> case it was, but it seems one of the checks is redundant: either the
> first check is racy or the second check is unnecessary.
The rmap walk doesn't hold the mmap_lock, so I think both of these checks are racy.
eg in apply_vma_lock_flags():

	vma = find_vma(current->mm, start);
	if (!vma || vma->vm_start > start)
		return -ENOMEM;

	prev = vma->vm_prev;
	if (start > vma->vm_start)
		prev = vma;

	for (nstart = start ; ; ) {
		vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;

		newflags |= flags;
		[...]

and then in mlock_fixup():

	/*
	 * vm_flags is protected by the mmap_lock held in write mode.
	 * It's okay if try_to_unmap_one unmaps a page just after we
	 * set VM_LOCKED, populate_vma_page_range will bring it back.
	 */
	if (lock)
		vma->vm_flags = newflags;
	else
		vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
Which is only done under the mmap_sem
> +static bool try_to_munlock_one(struct page *page, struct vm_area_struct *vma,
> +		     unsigned long address, void *arg)
> +{
> +	struct page_vma_mapped_walk pvmw = {
> +		.page = page,
> +		.vma = vma,
> +		.address = address,
> +	};
> +	bool ret = true;
> +
> +	/* munlock has nothing to gain from examining un-locked vmas */
> +	if (!(vma->vm_flags & VM_LOCKED))
> +		return true;
The mmap_sem can't be obtained in the rmap walkers due to lock ordering; the various rmap locks are nested under the mmap_sem.

So, when reading data that is not locked it should be written as:

	READ_ONCE(vma->vm_flags) & VM_LOCKED
> +	while (page_vma_mapped_walk(&pvmw)) {
> +		/*
> +		 * If the page is mlock()d, we cannot swap it out.
> +		 * If it's recently referenced (perhaps page_referenced
> +		 * skipped over this mm) then we should reactivate it.
> +		 */
> +		if (vma->vm_flags & VM_LOCKED) {
And since we write vm_flags without holding the PTLs, re-checking it here looks pointless, unless there is some other VM_LOCKED manipulation.
Jason