On Mon, Mar 16, 2020 at 10:05:03AM +0100, Christoph Hellwig wrote:
On Wed, Mar 11, 2020 at 03:35:01PM -0300, Jason Gunthorpe wrote:
From: Jason Gunthorpe jgg@mellanox.com
This eventually calls into handle_mm_fault() which is a sleeping function. Release the lock first.
hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not need the lock.
So how did this manage to not be noticed before?
I'm still struggling a bit with how this PUD stuff works.. However AFAICT:
1) The first hunk around pud_none() is actualy covering a race. In the non-race case the page walker will directly call hmm_vma_walk_hole() for pud_none so it would be very hard to hit this
2) I'm not 100% sure, but I think pud_present() == pud_none(), as there is no swap entry for PUD I don't know what a non-present but non-none entry is or how to create one. This is possibly dead code due to #1
3) To hit hmm_range_need_fault() requesting fault you would need a read-only huge PUD and a fault requesting write. I suspect creating a read only huge PUD entry is very rare - not something someone would deliberately construct.
4) To even trigger the PUD flow at all you need the 1G THP or the special 1G DAX pages which I strongly suspect people are not testing with.
The fix looks fine assuming we want something backportable before starting the cleanups:
I found it easier to make the other cleanup patches make sense if they didn't introduce all kinds of new logic too..
Thanks, Jason