CC: Hugh Dickins
On Wed, Mar 31, 2021 at 9:37 PM Alistair Popple apopple@nvidia.com wrote:
On Wednesday, 31 March 2021 10:57:46 PM AEDT Jason Gunthorpe wrote:
On Wed, Mar 31, 2021 at 03:15:47PM +1100, Alistair Popple wrote:
On Wednesday, 31 March 2021 2:56:38 PM AEDT John Hubbard wrote:
On 3/30/21 3:56 PM, Alistair Popple wrote: ...
+1 for renaming "munlock*" items to "mlock*", where applicable. good grief.
At least the situation was weird enough to prompt further investigation :)
Renaming to mlock* doesn't feel like the right solution to me either though. I am not sure if you saw me responding to myself earlier, but I am thinking renaming try_to_munlock() -> page_mlocked() and try_to_munlock_one() -> page_mlock_one() might be better. Thoughts?
Quite confused by this naming idea. Because: try_to_munlock() returns void, so a boolean-style name such as "page_mlocked()" is already not a good fit.
Even more important, though, is that try_to_munlock() is mlock-ing the page, right? Is there some subtle point I'm missing? It really is doing an mlock to the best of my knowledge here. Although the kerneldoc comment for try_to_munlock() seems questionable too:
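As of the v5.12-era mm/rmap.c it reads (quoting from memory, so the wording may be slightly off):

/**
 * try_to_munlock - try to munlock a page
 * @page: the page to be munlocked
 *
 * Called from munlock code.  Checks all of the VMAs mapping the page
 * to make sure nobody else has this page mlocked. The page will be
 * returned with PG_mlocked cleared if no other vmas have it mlocked.
 */

The name and the comment promise a munlock, while the walk itself only ever *sets* PG_mlocked; the clearing happens earlier, in the munlock path, before the walk is started.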
It's mlocking the page if it turns out the page still needs to be mlocked after it was unlocked. But I don't think you're missing anything.
It is really searching all VMAs to see if the VM_LOCKED flag is set and, if any are found, it mlocks the page.
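In the v5.12 tree that boils down to roughly this fragment of try_to_unmap_one() (a simplified sketch of the TTU_MUNLOCK case; error handling and the exact flag tests are elided):

        /* Inside try_to_unmap_one(), for the TTU_MUNLOCK case: */
        while (page_vma_mapped_walk(&pvmw)) {
                if (vma->vm_flags & VM_LOCKED) {        /* the VMA flag read */
                        /* PTE-mapped THP are never mlocked */
                        if (!PageTransCompound(page))
                                /* Holding pte lock, we do *not* need mmap_lock here */
                                mlock_vma_page(page);   /* sets PG_mlocked */
                        page_vma_mapped_walk_done(&pvmw);
                        break;
                }
        }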
But presenting this routine in its simplified form raises lots of questions:
- What locking is being used to read the VMA flag?
- Why do we need to manipulate global struct page flags under the page table locks of a single VMA?
I was wondering that and questioned it in an earlier version of this series. I have done some digging and the commit log for b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set PageMlocked") provides the original justification.
It's fairly long, so I won't quote it here, but the summary seems to be that, among other things, the combination of the page lock and the ptl makes this safe. I have yet to verify whether everything there still holds and is sensible, but the last paragraph certainly is :-)
"Stopped short of separating try_to_munlock_one() from try_to_munmap_one() on this occasion, but that's probably the sensible next step - with a rename, given that try_to_munlock()'s business is to try to set Mlocked."
- Why do we need to check for huge pages inside the VMA loop, not before going to the rmap? PageTransCompoundHead() is not sensitive to the PTEs. (and what happens if the huge page breaks up concurrently?)
- Why do we clear the mlock bit then run around to try and set it?
I don't have an answer for that as I'm not (yet) across all the mlock code paths, but I'm hoping this patch at least won't change anything.
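For reference, the clear-then-set dance looks roughly like this (heavily simplified from v5.12 mm/mlock.c; the LRU isolation and THP handling are elided):

void munlock_vma_page(struct page *page)
{
        /* For try_to_munlock() and to serialize with page migration */
        BUG_ON(!PageLocked(page));

        if (TestClearPageMlocked(page)) {
                /*
                 * The bit is cleared optimistically; if any *other*
                 * VM_LOCKED VMA still maps the page, the rmap walk
                 * below sets PG_mlocked again.
                 */
                if (page_mapcount(page) > 1)
                        try_to_munlock(page);

                if (!PageMlocked(page))
                        count_vm_event(UNEVICTABLE_PGMUNLOCKED);
        }
}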
It would be good to ask the person who has the most answers?
Hugh, the thread started at https://lore.kernel.org/dri-devel/20210326000805.2518-4-apopple@nvidia.com/