Re: [PATCH V4 4/6] mm: mlock: Introduce VM_LOCKONFAULT and add mlock flags to enable it

23 Jul 2015

On Thu, 23 Jul 2015, Vlastimil Babka wrote:
...
On 07/22/2015 08:43 PM, Eric B Munson wrote:
...
On Wed, 22 Jul 2015, Vlastimil Babka wrote:
...
Hi,
I think you should include a complete description of which
transitions for vma states and mlock2/munlock2 flags applied on them
are valid and what they do. It will also help with the manpages.
You explained some to Jon in the last thread, but I think there
should be a canonical description in changelog (if not also
Documentation, if mlock is covered there).
For example, the scenario Jon asked about: what happens after an
mlock2(MLOCK_ONFAULT) followed by an mlock2(MLOCK_LOCKED), and that the
answer is "nothing". Your promised code comment for
apply_vma_flags() doesn't suffice IMHO (and I'm not sure it's there
anyway?).
I missed adding that comment to the code; it will be there in V5 along with
the description in the changelog.
Thanks!
...
...
But the more I think about the scenario and your new VM_LOCKONFAULT
vma flag, it seems awkward to me. Why should munlocking at all care
if the vma was mlocked with MLOCK_LOCKED or MLOCK_ONFAULT? In either
case the result is that all pages currently populated are munlocked.
So the flags for munlock2 should be unnecessary.
Say a user has a large area of interleaved MLOCK_LOCK and MLOCK_ONFAULT
mappings and they want to unlock only the ones with MLOCK_LOCK.  With
the current implementation, this is possible in a single system call
that spans the entire region.  With your suggestion, the user would have
to know which regions were locked with MLOCK_LOCK and call munlock() on
each of them.  IMO, the way munlock2() works better mirrors the way
munlock() currently works when called on a large area of interleaved
locked and unlocked areas.
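To make the interleaved-region argument concrete, here is a small userspace model of V4-style munlock2() semantics. It assumes, as the thread implies, that VM_LOCKED and VM_LOCKONFAULT are separate lock modes in V4 and that each munlock2 flag clears only its own mode. Everything here is hypothetical: the bit values, struct fake_vma and munlock2_range() are illustrative stand-ins, not kernel code, and VMA splitting at range boundaries is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for illustration only; not the kernel's. */
#define VM_LOCKED       0x1u
#define VM_LOCKONFAULT  0x2u

/* Hypothetical munlock2() flag values modeled on the thread. */
#define MLOCK_LOCKED    0x1u
#define MLOCK_ONFAULT   0x2u

struct fake_vma {
    unsigned long start, end;   /* covers [start, end) */
    unsigned int flags;
};

/* V4-style model: each munlock2 flag clears only the matching lock mode,
 * so one call over a large range leaves the other mode's VMAs alone.
 * Simplification: VMAs are never split at the range boundaries. */
static void munlock2_range(struct fake_vma *vmas, size_t n,
                           unsigned long start, unsigned long end,
                           unsigned int mflags)
{
    unsigned int clear = 0;

    assert(start <= end);
    if (mflags & MLOCK_LOCKED)
        clear |= VM_LOCKED;
    if (mflags & MLOCK_ONFAULT)
        clear |= VM_LOCKONFAULT;

    for (size_t i = 0; i < n; i++) {
        /* Skip VMAs that do not overlap the requested range. */
        if (vmas[i].end <= start || vmas[i].start >= end)
            continue;
        vmas[i].flags &= ~clear;
    }
}
```

A single munlock2_range(..., MLOCK_LOCKED) call over the whole area then unlocks only the VMAs locked in MLOCK_LOCK mode, which is the one-call behaviour Eric wants to keep.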
Um OK, that scenario is possible in theory. But I have a hard time imagining
that somebody would really want to do that. I think many more people would
benefit from a simpler API.
It wasn't about imagining a scenario, more about keeping parity with
something that currently works (unlocking a large area of interleaved
locked and unlocked regions).  However, there is no reason we can't add
the new munlock2 later if it is desired.
...
...
...
I also think VM_LOCKONFAULT is unnecessary. VM_LOCKED should be
enough - see how you had to handle the new flag in all places that
had to handle the old flag? I think the information whether mlock
was supposed to fault the whole vma is obsolete at the moment mlock
returns. VM_LOCKED should be enough for both modes, and the flag to
mlock2 could just control whether the pre-faulting is done.
So what should be IMHO enough:

- munlock can stay without flags
- mlock2 has only one new flag, MLOCK_ONFAULT. If specified, pre-faulting is
  not done; just set VM_LOCKED and mlock the pages already present.
- same with mmap(MAP_LOCKONFAULT) (need to define what happens when both
  MAP_LOCKED and MAP_LOCKONFAULT are specified).
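A minimal userspace model of this first proposal, assuming VM_LOCKED stays the only per-VMA lock state and MLOCK_ONFAULT merely suppresses pre-faulting. The flag values, struct fake_vma and the fake_* helpers are hypothetical; the "present" array stands in for pages that have already been faulted in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_LOCKED     0x1u   /* hypothetical bit value */
#define MLOCK_ONFAULT 0x1u   /* hypothetical flag value */
#define NPAGES 8

struct fake_vma {
    unsigned int flags;
    bool present[NPAGES];    /* which pages are currently faulted in */
};

/* VM_LOCKED is the only per-VMA state; MLOCK_ONFAULT only decides
 * whether mlock2 pre-faults the range. */
static int fake_mlock2(struct fake_vma *vma, unsigned int flags)
{
    if (flags & ~MLOCK_ONFAULT)
        return -1;                    /* unknown flag */
    vma->flags |= VM_LOCKED;
    if (!(flags & MLOCK_ONFAULT))
        for (size_t i = 0; i < NPAGES; i++)
            vma->present[i] = true;   /* pre-fault ("populate") */
    return 0;
}

/* Present pages in a VM_LOCKED VMA are the ones kept mlocked. */
static size_t locked_pages(const struct fake_vma *vma)
{
    size_t n = 0;

    if (!(vma->flags & VM_LOCKED))
        return 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += vma->present[i];
    return n;
}

/* Simulated page fault: later faults in a VM_LOCKED VMA get locked too. */
static void fake_fault(struct fake_vma *vma, size_t page)
{
    assert(page < NPAGES);
    vma->present[page] = true;
}

/* munlock needs no flags: it just drops VM_LOCKED. */
static void fake_munlock(struct fake_vma *vma)
{
    vma->flags &= ~VM_LOCKED;
}
```

The point of the model is that once the call returns, nothing else needs to remember which mode was used: locking is driven by VM_LOCKED plus whatever is present.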
Now mlockall(MCL_FUTURE) muddles the situation in that it stores the
information for future VMAs in current->mm->def_flags, and this
def_flags would need to distinguish VM_LOCKED with population and
without. But that could still be solvable without introducing a new
vma flag everywhere.
I'm with you right up until that last paragraph.  I have been staring at
this a while and I cannot come up with a way to handle
mlockall(MCL_ONFAULT) without introducing a new vm flag.  It doesn't
have to be VM_LOCKONFAULT; we could use the model that Michal Hocko
suggested with something like VM_FAULTPOPULATE.  However, we can't
really use this flag anywhere except the mlock code because we have to
be able to distinguish a caller that wants to use MLOCK_LOCK with
whatever control VM_FAULTPOPULATE might grant outside of mlock and a
caller that wants MLOCK_ONFAULT.  That was a long way of saying we need
an extra vma flag regardless.  However, if that flag only controls whether
mlock pre-populates, it would work and it would do away with most of the
places I had to touch to handle VM_LOCKONFAULT properly.
Yes, it would be a good way. Adding a new vma flag is probably cleanest after
all, but the flag would be set *in addition* to VM_LOCKED, *just* to prevent
pre-faulting. The places that check VM_LOCKED for the actual page mlocking (i.e.
try_to_unmap_one) would just keep checking VM_LOCKED. The places where VM_LOCKED
is checked to trigger prepopulation would skip that if VM_LOCKONFAULT is also
set. Having VM_LOCKONFAULT set without VM_LOCKED would be an invalid state.
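The invariant and the two kinds of checks can be sketched as plain predicates on a flags word. The bit values and helper names are hypothetical and chosen only to show that reclaim-side checks keep looking at VM_LOCKED alone, while the population-side check also consults VM_LOCKONFAULT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; not the kernel's actual definitions. */
#define VM_LOCKED       0x1u
#define VM_LOCKONFAULT  0x2u

/* VM_LOCKONFAULT without VM_LOCKED is an invalid state. */
static bool vma_flags_valid(unsigned int flags)
{
    return !(flags & VM_LOCKONFAULT) || (flags & VM_LOCKED);
}

/* Reclaim-side check (the kind of test try_to_unmap_one makes):
 * only VM_LOCKED matters for keeping a present page mlocked. */
static bool page_stays_mlocked(unsigned int vma_flags)
{
    assert(vma_flags_valid(vma_flags));
    return vma_flags & VM_LOCKED;
}

/* Population-side check: prefault only if locked and not on-fault. */
static bool should_prepopulate(unsigned int vma_flags)
{
    assert(vma_flags_valid(vma_flags));
    return (vma_flags & (VM_LOCKED | VM_LOCKONFAULT)) == VM_LOCKED;
}
```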
This should work fine with the simplified API as I proposed, so let me reiterate
and try to fill in the blanks:

- mlock2 has only one new flag, MLOCK_ONFAULT. If specified, VM_LOCKONFAULT is
  set in addition to VM_LOCKED and no prefaulting is done.
- the old mlock syscall naturally behaves as mlock2 without MLOCK_ONFAULT
- calling mlock/mlock2 on an already-mlocked area (if that's permitted
  already?) will add/remove VM_LOCKONFAULT as needed. If it's removing,
  prepopulate the whole range. Of course adding VM_LOCKONFAULT to a vma that was
  already prefaulted doesn't make any difference, but it's consistent with the rest.
- munlock removes both VM_LOCKED and VM_LOCKONFAULT
- mmap could treat MAP_LOCKONFAULT as a modifier to MAP_LOCKED to be consistent?
  Or not? I'm not sure here; either way it subtly differs from the mlock API
  anyway. I just wish MAP_LOCKED never existed...
- mlockall(MCL_CURRENT) sets or clears VM_LOCKONFAULT depending on
  MCL_LOCKONFAULT; mlockall(MCL_FUTURE) does the same on mm->def_flags
- munlockall2 removes both, like munlock. munlockall2(MCL_FUTURE) does that to
  def_flags.
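The per-VMA transitions in the list above can be modeled as pure functions on a VMA's flags. Under this proposal, Jon's scenario no longer does "nothing": mlock2(MLOCK_ONFAULT) followed by a plain mlock2() drops VM_LOCKONFAULT and triggers population. This is a sketch assuming VM_LOCKONFAULT is additive to VM_LOCKED; the flag values and function names are hypothetical, and mmap/mlockall are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values, for illustration only. */
#define VM_LOCKED       0x1u
#define VM_LOCKONFAULT  0x2u
#define MLOCK_ONFAULT   0x1u

/* VM_LOCKONFAULT is only valid together with VM_LOCKED. */
static bool lock_flags_valid(unsigned int f)
{
    return !(f & VM_LOCKONFAULT) || (f & VM_LOCKED);
}

/* mlock()/mlock2() on a VMA: returns the new flags and reports in
 * *populate whether the range has to be prefaulted by this call. */
static unsigned int mlock2_transition(unsigned int old, unsigned int mflags,
                                      bool *populate)
{
    unsigned int new_flags = old | VM_LOCKED;

    assert(lock_flags_valid(old));
    if (mflags & MLOCK_ONFAULT) {
        /* Adding VM_LOCKONFAULT never faults anything in. */
        new_flags |= VM_LOCKONFAULT;
        *populate = false;
    } else {
        new_flags &= ~VM_LOCKONFAULT;
        /* Populate unless the VMA was already fully locked (VM_LOCKED
         * without VM_LOCKONFAULT), i.e. already prefaulted. */
        *populate = (old & (VM_LOCKED | VM_LOCKONFAULT)) != VM_LOCKED;
    }
    return new_flags;
}

/* munlock() needs no flags: it clears both bits. */
static unsigned int munlock_transition(unsigned int old)
{
    assert(lock_flags_valid(old));
    return old & ~(VM_LOCKED | VM_LOCKONFAULT);
}
```

One choice here is an assumption: re-mlocking a VMA that is already VM_LOCKED without VM_LOCKONFAULT reports no population work, because the per-VMA flag is what records that the range was already prefaulted.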
...
I picked VM_LOCKONFAULT because it is explicit about what it is for and
there is little risk of someone coming along in 5 years and saying "why
not overload this flag to do this other thing completely unrelated to
mlock?".  A flag for controlling speculative population is more likely to
be overloaded outside of mlock().
Sure, let's make clear the name is related to mlock, but the behavior could
still be additive to MAP_LOCKED.
...
If you have a sane way of handling mlockall(MCL_ONFAULT) without a new
VMA flag, I am happy to give it a try, but I haven't been able to come
up with one that doesn't have its own gremlins.
Well, we could store the MCL_FUTURE | MCL_ONFAULT bit elsewhere in mm_struct than
the def_flags field. The VM_LOCKED bit is already treated specially compared to
all the other def_flags. We are nearing the full 32-bit space for vma flags. I
think all I've proposed above wouldn't change much if we removed the per-vma
VM_LOCKONFAULT flag from the equation. Just that re-mlocking an area already
mlocked *without* MLOCK_ONFAULT wouldn't know that it was already prepopulated,
and would have to re-populate in either case (I'm not sure; maybe it's already
done by the current implementation anyway, so it's not a potential performance
regression).
Only mlockall(MCL_FUTURE | MCL_ONFAULT) should really need the ONFAULT info to
"stick" somewhere in mm_struct, but it doesn't have to be def_flags?
This all sounds fine and should still cover the usecase that started
this adventure.  I will include this change in the V5 spin.
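Vlastimil's variant keeps the on-fault information for mlockall(MCL_FUTURE | MCL_ONFAULT) somewhere in mm_struct other than def_flags, without a per-VMA VM_LOCKONFAULT bit. The sketch below assumes such a separate field; struct fake_mm, future_onfault, the helper names and all flag values are hypothetical and purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define VM_LOCKED    0x1u   /* hypothetical bit value */
#define MCL_FUTURE   0x2u   /* illustrative values */
#define MCL_ONFAULT  0x4u

/* Hypothetical mm: the on-fault bit for future mappings lives in its own
 * field instead of consuming a def_flags / vma flag bit. */
struct fake_mm {
    unsigned int def_flags;   /* VM_LOCKED here => lock future mappings */
    bool future_onfault;      /* only meaningful with VM_LOCKED set */
};

static void fake_mlockall_future(struct fake_mm *mm, unsigned int flags)
{
    if (!(flags & MCL_FUTURE))
        return;
    mm->def_flags |= VM_LOCKED;
    mm->future_onfault = (flags & MCL_ONFAULT) != 0;
}

/* For a new mapping: returns its vma flags and whether to prefault it. */
static unsigned int fake_new_vma_flags(const struct fake_mm *mm,
                                       bool *populate)
{
    unsigned int vm_flags = mm->def_flags & VM_LOCKED;

    assert(!mm->future_onfault || (mm->def_flags & VM_LOCKED));
    *populate = (vm_flags & VM_LOCKED) && !mm->future_onfault;
    return vm_flags;
}

static void fake_munlockall(struct fake_mm *mm)
{
    mm->def_flags &= ~VM_LOCKED;
    mm->future_onfault = false;
}
```

New mappings only ever receive VM_LOCKED here; the on-fault choice is consumed when deciding whether to prefault, which is why it need not "stick" in the VMA itself.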
