On 31.05.22 22:00, Alex Sierra wrote:
With DEVICE_COHERENT, we'll soon have vm_normal_page() return device-managed anonymous pages that are not LRU pages. Although they behave like normal pages for purposes of mapping in CPU page tables and for COW, they do not support LRU lists, NUMA migration or THP.
We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
 fs/proc/task_mmu.c | 2 +-
 include/linux/mm.h | 3 ++-
 mm/gup.c           | 6 +++++-
 mm/huge_memory.c   | 2 +-
 mm/khugepaged.c    | 9 ++++++---
 mm/ksm.c           | 6 +++---
 mm/madvise.c       | 4 ++--
 mm/memory.c        | 9 ++++++++-
 mm/mempolicy.c     | 2 +-
 mm/migrate.c       | 4 ++--
 mm/mlock.c         | 2 +-
 mm/mprotect.c      | 2 +-
 12 files changed, 33 insertions(+), 18 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4c..2dd8c8a66924 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1792,7 +1792,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 		return NULL;

 	page = vm_normal_page(vma, addr, pte);
-	if (!page)
+	if (!page || is_zone_device_page(page))
 		return NULL;

 	if (PageReserved(page))
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..d3f43908ff8d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -601,7 +601,7 @@ struct vm_operations_struct {
 #endif
 	/*
 	 * Called by vm_normal_page() for special PTEs to find the
-	 * page for @addr.  This is useful if the default behavior
+	 * page for @addr. This is useful if the default behavior
 	 * (using pte_page()) would not find the correct page.
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
@@ -2934,6 +2934,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_NUMA	0x200	/* force NUMA hinting page fault */
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
+#define FOLL_LRU	0x1000	/* return only LRU (anon or page cache) */
Does that statement hold for special pages like the shared zeropage?
Also, this flag is only valid for in-kernel follow_page() but not for the ordinary GUP interfaces. What are the semantics there? Is it fenced?
I really wonder if you should instead simply teach the handful of users of follow_page() to special-case these pages ... that sounds cleaner to me than adding flags with unclear semantics. Alternatively, properly document what that flag is actually doing and where it applies.
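To make the two options concrete, here is a rough userspace model (not kernel code). All model_* names and the on_lru field are made up for illustration. Only the 0x1000 value comes from the quoted hunk, and treating the flag as a plain zone-device check is an assumption, since the mm/gup.c hunk is not quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag; value copied from the quoted hunk. */
#define MODEL_FOLL_LRU 0x1000u

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
	bool zone_device;	/* models is_zone_device_page() */
	bool on_lru;		/* models "can live on an LRU list" */
};

bool model_is_zone_device_page(const struct model_page *page)
{
	return page->zone_device;
}

/*
 * Option A: the lookup itself filters when the caller passes the flag.
 * Assumption: the filter is only a zone-device check.
 */
struct model_page *model_follow_page(struct model_page *mapped,
				     unsigned int flags)
{
	if (!mapped)
		return NULL;
	if ((flags & MODEL_FOLL_LRU) && model_is_zone_device_page(mapped))
		return NULL;
	return mapped;
}

/*
 * Option B: the lookup stays unfiltered and each caller that needs an
 * LRU page rejects device-managed pages itself.
 */
bool model_caller_can_use_lru(struct model_page *mapped)
{
	struct model_page *page = model_follow_page(mapped, 0);

	if (!page || model_is_zone_device_page(page))
		return false;
	return true;
}
```

In this model a page with zone_device == false and on_lru == false (a stand-in for something like the shared zeropage) is still returned under MODEL_FOLL_LRU, which is the kind of mismatch with the "return only LRU" comment that the questions above are about.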
I know, there was discussion on ... sorry for jumping in now, but this doesn't look clean to me yet.