On Thu, Sep 26, 2019 at 03:20:42PM -0700, Linus Torvalds wrote:
On Thu, Sep 26, 2019 at 1:55 PM Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
Well, we're working on supporting huge puds and pmds in the graphics VMAs, although in the write-notify cases we're looking at here, we would probably want to split them down to PTE level.
Well, that's what the existing walker code does if you don't have that "pud_entry()" callback.
That said, I assume you would *not* want to do that if the huge pud/pmd is already clean and read-only, but just continue.
So you may want to have a special pud_entry() that handles that case. Eventually. Maybe. Although honestly, if you're doing dirty tracking, I doubt it makes much sense to use largepages.
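A rough userspace sketch of the decision such a special pud_entry() might make, assuming the walker only needs to tell "clean and read-only" apart from everything else. The struct, the enum, and model_pud_entry() are all hypothetical names made up for illustration; none of them are kernel API, and a real callback would operate on pud_t under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PUD/PMD-level entry; not a kernel type. */
struct model_entry {
	bool huge;      /* entry maps a huge page directly */
	bool dirty;     /* entry has been written through */
	bool writable;  /* entry currently permits writes */
};

/* What the (hypothetical) callback tells the walker to do next. */
enum walk_action {
	WALK_SKIP,      /* leave the huge entry intact and continue */
	WALK_SPLIT,     /* split down to PTE level, then descend */
	WALK_DESCEND,   /* not huge: walk the lower level as usual */
};

static enum walk_action model_pud_entry(const struct model_entry *e)
{
	assert(e != NULL);

	if (!e->huge)
		return WALK_DESCEND;

	/* Already clean and read-only: nothing to write-protect or clean. */
	if (!e->dirty && !e->writable)
		return WALK_SKIP;

	/* Dirty or writable huge entry: track it at PTE granularity. */
	return WALK_SPLIT;
}
```

Without such a callback, the model collapses to "always split a huge entry", which is the existing walker behaviour described above.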
Looking at zap_pud_range() which when called from unmap_mapping_pages() uses identical locking (no mmap_sem), it seems we should be able to get away with i_mmap_lock(), making sure the whole page table doesn't disappear under us. So it's not clear to me why the mmap_sem is strictly needed here. Better to sort those restrictions out now rather than when huge entries start appearing.
zap_pud_range() actually does have that
VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);
The VM_BUG is a blind copy from the PMD layer and it's bogus. i_mmap_lock() works fine for file mappings.
The PMD check was intended for the THP case, at a time when there was only anon-THP. It was relaxed and later dropped for file-THP at the PMD level, and it has to be dropped at the PUD level too. We don't have anon-THP at the PUD level at all; only DAX has played with them.