Until now we have allowed migration only for LRU pages, and that was enough to make high-order pages. But recent embedded systems (e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen several reports about trouble with small high-order allocations. To fix the problem there have been several efforts (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages, reserved memory, vmalloc and so on), but if there are lots of non-movable pages in the system, those solutions are void in the long run.
So this patch supports a facility to turn non-movable pages into movable ones. For that, it introduces migration-related functions in address_space_operations as well as some page flags.
Basically, this patch introduces two page flags and two page-migration functions. The flags and page->mapping stability are protected by PG_lock.
PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);
The duties of a subsystem that wants to make its pages migratable are as follows:

1. It should register an address_space to page->mapping, then mark the page as PG_movable via __SetPageMovable.
2. It should mark the page as PG_isolated via SetPageIsolated if isolation is successful, and return true.
3. If migration is successful, it should clear PG_isolated and PG_movable of the page to prepare it for freeing, then release the reference of the page to free it.
4. If migration fails, the putback function of the subsystem should clear PG_isolated via ClearPageIsolated.
5. If a subsystem wants to release an isolated page, it should clear PG_isolated but not PG_movable; instead, the VM will do it. (A minimal driver-side sketch of these duties follows.)
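To make these duties concrete, here is a minimal driver-side sketch. The dummy_* names are hypothetical and not part of this patch; real locking of the driver's own structures and error handling are omitted:

#include <linux/fs.h>
#include <linux/migrate.h>
#include <linux/page-flags.h>
#include <linux/pagemap.h>

/* Duty 2: isolate the page from the driver's own bookkeeping. */
static bool dummy_isolate_page(struct page *page, isolate_mode_t mode)
{
	/*
	 * Unlink the page from the driver's internal lists here; nothing
	 * in page->lru should be expected to survive after this returns.
	 */
	SetPageIsolated(page);
	return true;
}

/* Duty 3: copy contents, clear both flags, drop the owner's reference. */
static int dummy_migratepage(struct address_space *mapping,
			     struct page *newpage, struct page *page,
			     enum migrate_mode mode)
{
	/* ...copy driver-private contents from page to newpage... */
	ClearPageIsolated(page);
	__ClearPageMovable(page);
	put_page(page);			/* release the owner's reference */
	return MIGRATEPAGE_SUCCESS;
}

/* Duty 4: migration failed, put the page back and clear PG_isolated. */
static void dummy_putback_page(struct page *page)
{
	/* ...re-insert the page into the driver's internal lists... */
	ClearPageIsolated(page);
}

static const struct address_space_operations dummy_aops = {
	.isolate_page	= dummy_isolate_page,
	.migratepage	= dummy_migratepage,
	.putback_page	= dummy_putback_page,
};

/* Duty 1: register the mapping and mark the page movable under PG_lock. */
static void dummy_mark_movable(struct page *page,
			       struct address_space *mapping)
{
	lock_page(page);
	__SetPageMovable(page, mapping);
	unlock_page(page);
}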
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: dri-devel@lists.freedesktop.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Gioh Kim <gurugio@hanmail.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/Locking      |   4 +
 Documentation/filesystems/vfs.txt      |   5 +
 fs/proc/page.c                         |   3 +
 include/linux/fs.h                     |   2 +
 include/linux/migrate.h                |   2 +
 include/linux/page-flags.h             |  31 ++++++
 include/uapi/linux/kernel-page-flags.h |   1 +
 mm/compaction.c                        |  14 ++-
 mm/migrate.c                           | 174 +++++++++++++++++++++++++++++----
 9 files changed, 217 insertions(+), 19 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 619af9bfdcb3..0bb79560abb3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:		yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index b02a7d598258..4c1b6c3b4bc8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put the page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 3ecd445e830d..ce3d08a4ad8d 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
 	if (page_is_idle(page))
 		u |= 1 << KPF_IDLE;
 
+	if (PageMovable(page))
+		u |= 1 << KPF_MOVABLE;
+
 	u |= kpf_copy_bit(k, KPF_LOCKED,	PG_locked);
 
 	u |= kpf_copy_bit(k, KPF_SLAB,		PG_slab);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index da9e67d937e5..36f2d610e7a8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -401,6 +401,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f4ed4f1b0c77..77ebf8fdbc6e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,10 @@ enum pageflags {
 
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru movable pages */
+	PG_movable = PG_reclaim,
+	PG_isolated = PG_owner_priv_1,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -614,6 +618,33 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }
 
+#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
+
+static inline int PageMovable(struct page *page)
+{
+	return ((test_bit(PG_movable, &(page)->flags) &&
+		atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
+		|| PageBalloon(page));
+}
+
+/* Caller should hold a PG_lock */
+static inline void __SetPageMovable(struct page *page,
+				struct address_space *mapping)
+{
+	page->mapping = mapping;
+	__set_bit(PG_movable, &page->flags);
+	atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+	atomic_set(&page->_mapcount, -1);
+	__clear_bit(PG_movable, &(page)->flags);
+	page->mapping = NULL;
+}
+
+PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index 5da5f8751ce7..a184fd2434fa 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -34,6 +34,7 @@
 #define KPF_BALLOON		23
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
+#define KPF_MOVABLE		26
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index ccf97b02b85f..7557aedddaee 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU pages and balloon pages
+		 * It's possible to migrate LRU and movable kernel pages.
 		 * Skip any other type of page
 		 */
 		is_lru = PageLRU(page);
@@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 					goto isolate_success;
 				}
 			}
+
+			if (unlikely(PageMovable(page)) &&
+					!PageIsolated(page)) {
+				if (locked) {
+					spin_unlock_irqrestore(&zone->lru_lock,
+									flags);
+					locked = false;
+				}
+
+				if (isolate_movable_page(page, isolate_mode))
+					goto isolate_success;
+			}
 		}
 
 		/*
diff --git a/mm/migrate.c b/mm/migrate.c
index 53529c805752..b56bf2b3fe8c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -73,6 +73,85 @@ int migrate_prep_local(void)
 	return 0;
 }
 
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+	bool ret = false;
+
+	/*
+	 * Avoid burning cycles with pages that are yet under __free_pages(),
+	 * or just got freed under us.
+	 *
+	 * In case we 'win' a race for a movable page being freed under us and
+	 * raise its refcount preventing __free_pages() from doing its job
+	 * the put_page() at the end of this block will take care of
+	 * release this page, thus avoiding a nasty leakage.
+	 */
+	if (unlikely(!get_page_unless_zero(page)))
+		goto out;
+
+	/*
+	 * Check PG_movable before holding a PG_lock because page's owner
+	 * assumes anybody doesn't touch PG_lock of newly allocated page.
+	 */
+	if (unlikely(!PageMovable(page)))
+		goto out_putpage;
+	/*
+	 * As movable pages are not isolated from LRU lists, concurrent
+	 * compaction threads can race against page migration functions
+	 * as well as race against the releasing a page.
+	 *
+	 * In order to avoid having an already isolated movable page
+	 * being (wrongly) re-isolated while it is under migration,
+	 * or to avoid attempting to isolate pages being released,
+	 * lets be sure we have the page lock
+	 * before proceeding with the movable page isolation steps.
+	 */
+	if (unlikely(!trylock_page(page)))
+		goto out_putpage;
+
+	if (!PageMovable(page) || PageIsolated(page))
+		goto out_no_isolated;
+
+	ret = page->mapping->a_ops->isolate_page(page, mode);
+	if (!ret)
+		goto out_no_isolated;
+
+	WARN_ON_ONCE(!PageIsolated(page));
+	unlock_page(page);
+	return ret;
+
+out_no_isolated:
+	unlock_page(page);
+out_putpage:
+	put_page(page);
+out:
+	return ret;
+}
+
+/* It should be called on page which is PG_movable */
+void putback_movable_page(struct page *page)
+{
+	/*
+	 * 'lock_page()' stabilizes the page and prevents races against
+	 * concurrent isolation threads attempting to re-isolate it.
+	 */
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+
+	lock_page(page);
+	if (PageIsolated(page)) {
+		struct address_space *mapping;
+
+		mapping = page_mapping(page);
+		mapping->a_ops->putback_page(page);
+		WARN_ON_ONCE(PageIsolated(page));
+	} else {
+		__ClearPageMovable(page);
+	}
+	unlock_page(page);
+	/* drop the extra ref count taken for movable page isolation */
+	put_page(page);
+}
+
 /*
  * Put previously isolated pages back onto the appropriate lists
  * from where they were once taken off for compaction/migration.
@@ -94,10 +173,18 @@ void putback_movable_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (unlikely(isolated_balloon_page(page)))
+		if (unlikely(isolated_balloon_page(page))) {
 			balloon_page_putback(page);
-		else
+		} else if (unlikely(PageMovable(page))) {
+			if (PageIsolated(page)) {
+				putback_movable_page(page);
+			} else {
+				__ClearPageMovable(page);
+				put_page(page);
+			}
+		} else {
 			putback_lru_page(page);
+		}
 	}
 }
 
@@ -592,7 +679,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
  ***********************************************************/
 
 /*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
  * pages that do not use PagePrivate/PagePrivate2.
  *
  * Pages are locked upon entry and exit.
@@ -755,24 +842,54 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 				enum migrate_mode mode)
 {
 	struct address_space *mapping;
-	int rc;
+	int rc = -EAGAIN;
+	bool lru_movable = true;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	mapping = page_mapping(page);
-	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, mode);
-	else if (mapping->a_ops->migratepage)
-		/*
-		 * Most pages have a mapping and most filesystems provide a
-		 * migratepage callback. Anonymous pages are part of swap
-		 * space which also has its own migratepage callback. This
-		 * is the most common path for page migration.
-		 */
-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+	/*
+	 * In case of non-lru page, it could be released after
+	 * isolation step. In that case, we shouldn't try
+	 * fallback migration which was designed for LRU pages.
+	 *
+	 * The rule for such case is that subsystem should clear
+	 * PG_isolated but remains PG_movable so VM should catch
+	 * it and clear PG_movable for it.
+	 */
+	if (unlikely(PageMovable(page))) {
+		lru_movable = false;
+		VM_BUG_ON_PAGE(!mapping, page);
+		if (!PageIsolated(page)) {
+			rc = MIGRATEPAGE_SUCCESS;
+			__ClearPageMovable(page);
+			goto out;
+		}
+	}
+
+	if (likely(lru_movable)) {
+		if (!mapping)
+			rc = migrate_page(mapping, newpage, page, mode);
+		else if (mapping->a_ops->migratepage)
+			/*
+			 * Most pages have a mapping and most filesystems
+			 * provide a migratepage callback. Anonymous pages
+			 * are part of swap space which also has its own
+			 * migratepage callback. This is the most common path
+			 * for page migration.
+			 */
+			rc = mapping->a_ops->migratepage(mapping, newpage,
+							page, mode);
+		else
+			rc = fallback_migrate_page(mapping, newpage,
							page, mode);
+	} else {
+		rc = mapping->a_ops->migratepage(mapping, newpage,
+						page, mode);
+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+				PageIsolated(page));
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before
@@ -782,6 +899,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 		if (!PageAnon(page))
 			page->mapping = NULL;
 	}
+out:
 	return rc;
 }
 
@@ -960,6 +1078,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 			put_new_page(newpage, private);
 		else
 			put_page(newpage);
+		if (PageMovable(page))
+			__ClearPageMovable(page);
 		goto out;
 	}
 
@@ -1000,8 +1120,26 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 			num_poisoned_pages_inc();
 		}
 	} else {
-		if (rc != -EAGAIN)
-			putback_lru_page(page);
+		if (rc != -EAGAIN) {
+			/*
+			 * subsystem couldn't remove PG_movable since page is
+			 * isolated so PageMovable check is not racy in here.
+			 * But PageIsolated check can be racy but it's okay
+			 * because putback_movable_page checks it under PG_lock
+			 * again.
+			 */
+			if (unlikely(PageMovable(page))) {
+				if (PageIsolated(page))
+					putback_movable_page(page);
+				else {
+					__ClearPageMovable(page);
+					put_page(page);
+				}
+			} else {
+				putback_lru_page(page);
+			}
+		}
+
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
Might have been better as a separate migration patch and then a compaction patch. It's prefixed mm/compaction, but most changes are in mm/migrate.c.
On 03/30/2016 09:12 AM, Minchan Kim wrote:
Until now we have allowed migration only for LRU pages, and that was enough to make high-order pages. But recent embedded systems (e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen several reports about trouble with small high-order allocations. To fix the problem there have been several efforts (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages, reserved memory, vmalloc and so on), but if there are lots of non-movable pages in the system, those solutions are void in the long run.
So this patch supports a facility to turn non-movable pages into movable ones. For that, it introduces migration-related functions in address_space_operations as well as some page flags.
Basically, this patch introduces two page flags and two page-migration functions. The flags and page->mapping stability are protected by PG_lock.
PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);
The duties of a subsystem that wants to make its pages migratable are as follows:
- It should register an address_space to page->mapping, then mark
the page as PG_movable via __SetPageMovable.
- It should mark the page as PG_isolated via SetPageIsolated
if isolation is successful and return true.
Ah another thing to document (especially in the comments/Doc) is that the subsystem must not expect anything to survive in page.lru (or fields that union it) after having isolated successfully.
- If migration is successful, it should clear PG_isolated and
PG_movable of the page to prepare it for freeing, then release the reference of the page to free it.
- If migration fails, the putback function of the subsystem should
clear PG_isolated via ClearPageIsolated.
- If a subsystem wants to release an isolated page, it should
clear PG_isolated but not PG_movable. Instead, VM will do it.
Under lock? Or just with ClearPageIsolated?
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: dri-devel@lists.freedesktop.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Gioh Kim <gurugio@hanmail.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>

 Documentation/filesystems/Locking      |   4 +
 Documentation/filesystems/vfs.txt      |   5 +
 fs/proc/page.c                         |   3 +
 include/linux/fs.h                     |   2 +
 include/linux/migrate.h                |   2 +
 include/linux/page-flags.h             |  31 ++++++
 include/uapi/linux/kernel-page-flags.h |   1 +
 mm/compaction.c                        |  14 ++-
 mm/migrate.c                           | 174 +++++++++++++++++++++++++++++----
 9 files changed, 217 insertions(+), 19 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 619af9bfdcb3..0bb79560abb3 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -195,7 +195,9 @@ unlocks and drops the reference. int (*releasepage) (struct page *, int); void (*freepage)(struct page *); int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
- bool (*isolate_page) (struct page *, isolate_mode_t); int (*migratepage)(struct address_space *, struct page *, struct page *);
- void (*putback_page) (struct page *); int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes releasepage: yes freepage: yes direct_IO: +isolate_page: yes migratepage: yes (both) +putback_page: yes launder_page: yes is_partially_uptodate: yes error_remove_page: yes diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index b02a7d598258..4c1b6c3b4bc8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -592,9 +592,14 @@ struct address_space_operations { int (*releasepage) (struct page *, int); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
- /* isolate a page for migration */
- bool (*isolate_page) (struct page *, isolate_mode_t); /* migrate the contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *);
- /* put the page back to right list */
... "after a failed migration" ?
- void (*putback_page) (struct page *); int (*launder_page) (struct page *);
- int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long); void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c index 3ecd445e830d..ce3d08a4ad8d 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page) if (page_is_idle(page)) u |= 1 << KPF_IDLE;
if (PageMovable(page))
u |= 1 << KPF_MOVABLE;
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
diff --git a/include/linux/fs.h b/include/linux/fs.h index da9e67d937e5..36f2d610e7a8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -401,6 +401,8 @@ struct address_space_operations { */ int (*migratepage) (struct address_space *, struct page *, struct page *, enum migrate_mode);
- bool (*isolate_page)(struct page *, isolate_mode_t);
- void (*putback_page)(struct page *); int (*launder_page) (struct page *); int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 9b50325e4ddf..404fbfefeb33 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *, struct page *, struct page *, enum migrate_mode); extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, unsigned long private, enum migrate_mode mode, int reason); +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode); +extern void putback_movable_page(struct page *page);
extern int migrate_prep(void); extern int migrate_prep_local(void); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f4ed4f1b0c77..77ebf8fdbc6e 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -129,6 +129,10 @@ enum pageflags {
/* Compound pages. Stored in first tail page's flags */ PG_double_map = PG_private_2,
- /* non-lru movable pages */
- PG_movable = PG_reclaim,
- PG_isolated = PG_owner_priv_1,
Documentation should probably state that these flags alias and that a subsystem supporting movable pages shouldn't use them elsewhere.
Also I'm a bit uncomfortable how isolate_movable_page() blindly expects that page->mapping->a_ops->isolate_page exists for PageMovable() pages. What if it's a false positive on a PG_reclaim page? Can we rely on PG_reclaim always (and without races) implying PageLRU() so that we don't even attempt isolate_movable_page()?
};
#ifndef __GENERATING_BOUNDS_H @@ -614,6 +618,33 @@ static inline void __ClearPageBalloon(struct page *page) atomic_set(&page->_mapcount, -1); }
+#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
IIRC this was what Gioh's previous attempts used instead of PG_movable? Is it still needed? Doesn't it prevent a driver providing movable *and* mapped pages? If it's to distinguish the PG_reclaim alias that I mention above, it seems like an overkill to me. Why would we need both a special mapcount value and a flag? Checking that page->mapping->a_ops->isolate_page exists before calling it should be enough to resolve the ambiguity?
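For illustration, the defensive check being suggested could look like this (a sketch only, not code from this patch):

	/* guard against a PG_movable false positive (e.g. the PG_reclaim alias) */
	if (page->mapping && page->mapping->a_ops &&
			page->mapping->a_ops->isolate_page)
		ret = page->mapping->a_ops->isolate_page(page, mode);
	else
		goto out_no_isolated;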
+static inline int PageMovable(struct page *page) +{
- return ((test_bit(PG_movable, &(page)->flags) &&
atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
|| PageBalloon(page));
+}
+/* Caller should hold a PG_lock */ +static inline void __SetPageMovable(struct page *page,
struct address_space *mapping)
+{
- page->mapping = mapping;
- __set_bit(PG_movable, &page->flags);
- atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
+}
+static inline void __ClearPageMovable(struct page *page) +{
- atomic_set(&page->_mapcount, -1);
- __clear_bit(PG_movable, &(page)->flags);
- page->mapping = NULL;
+}
+PAGEFLAG(Isolated, isolated, PF_ANY);
- /*
- If network-based swap is enabled, sl*b must keep track of whether pages
- were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h index 5da5f8751ce7..a184fd2434fa 100644 --- a/include/uapi/linux/kernel-page-flags.h +++ b/include/uapi/linux/kernel-page-flags.h @@ -34,6 +34,7 @@ #define KPF_BALLOON 23 #define KPF_ZERO_PAGE 24 #define KPF_IDLE 25 +#define KPF_MOVABLE 26
#endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */ diff --git a/mm/compaction.c b/mm/compaction.c index ccf97b02b85f..7557aedddaee 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/*
 * Check may be lockless but that's ok as we recheck later.
- * It's possible to migrate LRU pages and balloon pages
+ * It's possible to migrate LRU and movable kernel pages.
 * Skip any other type of page
 */
is_lru = PageLRU(page);
@@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_success; } }
if (unlikely(PageMovable(page)) &&
!PageIsolated(page)) {
if (locked) {
spin_unlock_irqrestore(&zone->lru_lock,
flags);
locked = false;
}
if (isolate_movable_page(page, isolate_mode))
goto isolate_success;
}
}
/*
diff --git a/mm/migrate.c b/mm/migrate.c index 53529c805752..b56bf2b3fe8c 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -73,6 +73,85 @@ int migrate_prep_local(void) return 0; }
+bool isolate_movable_page(struct page *page, isolate_mode_t mode) +{
- bool ret = false;
Maintaining "ret" seems useless here. All the "goto out*" statements are executed only when ret is false, and ret == true is returned by a different return.
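For illustration, the simplified control flow could look like this (a sketch of the suggestion above, not the patch itself):

bool isolate_movable_page(struct page *page, isolate_mode_t mode)
{
	if (unlikely(!get_page_unless_zero(page)))
		return false;

	if (unlikely(!PageMovable(page)))
		goto out_putpage;

	if (unlikely(!trylock_page(page)))
		goto out_putpage;

	if (!PageMovable(page) || PageIsolated(page))
		goto out_no_isolated;

	if (!page->mapping->a_ops->isolate_page(page, mode))
		goto out_no_isolated;

	WARN_ON_ONCE(!PageIsolated(page));
	unlock_page(page);
	return true;			/* the single success path */

out_no_isolated:
	unlock_page(page);
out_putpage:
	put_page(page);
	return false;			/* every failure path ends here */
}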
- /*
* Avoid burning cycles with pages that are yet under __free_pages(),
* or just got freed under us.
*
* In case we 'win' a race for a movable page being freed under us and
* raise its refcount preventing __free_pages() from doing its job
* the put_page() at the end of this block will take care of
* release this page, thus avoiding a nasty leakage.
*/
- if (unlikely(!get_page_unless_zero(page)))
goto out;
- /*
* Check PG_movable before holding a PG_lock because page's owner
* assumes anybody doesn't touch PG_lock of newly allocated page.
*/
- if (unlikely(!PageMovable(page)))
goto out_putpage;
- /*
* As movable pages are not isolated from LRU lists, concurrent
* compaction threads can race against page migration functions
* as well as race against the releasing a page.
*
* In order to avoid having an already isolated movable page
* being (wrongly) re-isolated while it is under migration,
* or to avoid attempting to isolate pages being released,
* lets be sure we have the page lock
* before proceeding with the movable page isolation steps.
*/
- if (unlikely(!trylock_page(page)))
goto out_putpage;
- if (!PageMovable(page) || PageIsolated(page))
goto out_no_isolated;
- ret = page->mapping->a_ops->isolate_page(page, mode);
- if (!ret)
goto out_no_isolated;
- WARN_ON_ONCE(!PageIsolated(page));
- unlock_page(page);
- return ret;
+out_no_isolated:
- unlock_page(page);
+out_putpage:
- put_page(page);
+out:
- return ret;
+}
+/* It should be called on page which is PG_movable */ +void putback_movable_page(struct page *page) +{
- /*
* 'lock_page()' stabilizes the page and prevents races against
* concurrent isolation threads attempting to re-isolate it.
*/
- VM_BUG_ON_PAGE(!PageMovable(page), page);
- lock_page(page);
- if (PageIsolated(page)) {
struct address_space *mapping;
mapping = page_mapping(page);
mapping->a_ops->putback_page(page);
WARN_ON_ONCE(PageIsolated(page));
- } else {
__ClearPageMovable(page);
- }
- unlock_page(page);
- /* drop the extra ref count taken for movable page isolation */
- put_page(page);
+}
- /*
- Put previously isolated pages back onto the appropriate lists
- from where they were once taken off for compaction/migration.
@@ -94,10 +173,18 @@ void putback_movable_pages(struct list_head *l) list_del(&page->lru); dec_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
if (unlikely(isolated_balloon_page(page)))
if (unlikely(isolated_balloon_page(page))) { balloon_page_putback(page);
else
} else if (unlikely(PageMovable(page))) {
if (PageIsolated(page)) {
putback_movable_page(page);
} else {
__ClearPageMovable(page);
We don't do lock_page() here, so what prevents parallel compaction isolating the same page?
put_page(page);
}
} else {
	putback_lru_page(page);
}
}
}
@@ -592,7 +679,7 @@ void migrate_page_copy(struct page *newpage, struct page *page) ***********************************************************/
/*
- Common logic to directly migrate a single page suitable for
- Common logic to directly migrate a single LRU page suitable for
- pages that do not use PagePrivate/PagePrivate2.
- Pages are locked upon entry and exit.
@@ -755,24 +842,54 @@ static int move_to_new_page(struct page *newpage, struct page *page, enum migrate_mode mode) { struct address_space *mapping;
- int rc;
int rc = -EAGAIN;
bool lru_movable = true;
VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
mapping = page_mapping(page);
- if (!mapping)
rc = migrate_page(mapping, newpage, page, mode);
- else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems provide a
* migratepage callback. Anonymous pages are part of swap
* space which also has its own migratepage callback. This
* is the most common path for page migration.
*/
rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
- else
rc = fallback_migrate_page(mapping, newpage, page, mode);
- /*
* In case of non-lru page, it could be released after
* isolation step. In that case, we shouldn't try
* fallback migration which was designed for LRU pages.
*
* The rule for such case is that subsystem should clear
* PG_isolated but remains PG_movable so VM should catch
* it and clear PG_movable for it.
*/
- if (unlikely(PageMovable(page))) {
Can a false positive from PG_reclaim occur here?
lru_movable = false;
VM_BUG_ON_PAGE(!mapping, page);
if (!PageIsolated(page)) {
rc = MIGRATEPAGE_SUCCESS;
__ClearPageMovable(page);
goto out;
}
}
if (likely(lru_movable)) {
if (!mapping)
rc = migrate_page(mapping, newpage, page, mode);
else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems
* provide a migratepage callback. Anonymous pages
* are part of swap space which also has its own
* migratepage callback. This is the most common path
* for page migration.
*/
rc = mapping->a_ops->migratepage(mapping, newpage,
page, mode);
else
rc = fallback_migrate_page(mapping, newpage,
page, mode);
} else {
rc = mapping->a_ops->migratepage(mapping, newpage,
page, mode);
WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
PageIsolated(page));
}
/*
- When successful, old pagecache page->mapping must be cleared before
@@ -782,6 +899,7 @@ static int move_to_new_page(struct page *newpage, struct page *page, if (!PageAnon(page)) page->mapping = NULL; } +out: return rc; }
@@ -960,6 +1078,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, put_new_page(newpage, private); else put_page(newpage);
if (PageMovable(page))
	__ClearPageMovable(page);
goto out;
}
@@ -1000,8 +1120,26 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, num_poisoned_pages_inc(); } } else {
if (rc != -EAGAIN)
putback_lru_page(page);
if (rc != -EAGAIN) {
/*
* subsystem couldn't remove PG_movable since page is
* isolated so PageMovable check is not racy in here.
* But PageIsolated check can be racy but it's okay
* because putback_movable_page checks it under PG_lock
* again.
*/
if (unlikely(PageMovable(page))) {
if (PageIsolated(page))
putback_movable_page(page);
else {
__ClearPageMovable(page);
Again, we don't do lock_page() here, so what prevents parallel compaction isolating the same page?
Sorry for so many questions, hope they all have good answers and this series is a success :) Thanks for picking it up.
put_page(page);
}
} else {
putback_lru_page(page);
}
}
- if (put_new_page) put_new_page(newpage, private); else
On Fri, Apr 01, 2016 at 11:29:14PM +0200, Vlastimil Babka wrote:
Might have been better as a separate migration patch and then a compaction patch. It's prefixed mm/compaction, but most changes are in mm/migrate.c.
Indeed. The title is rather misleading, but I'm not sure it's a good idea to separate the compaction and migration parts. I will just resend to change the title from "mm/compaction" to "mm/migration".
On 03/30/2016 09:12 AM, Minchan Kim wrote:
Until now we have allowed migration only for LRU pages, and that was enough to make high-order pages. But recent embedded systems (e.g., webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen several reports about trouble with small high-order allocations. To fix the problem there have been several efforts (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages, reserved memory, vmalloc and so on), but if there are lots of non-movable pages in the system, those solutions are void in the long run.
So this patch supports a facility to turn non-movable pages into movable ones. For that, it introduces migration-related functions in address_space_operations as well as some page flags.
Basically, this patch introduces two page flags and two page-migration functions. The flags and page->mapping stability are protected by PG_lock.
PG_movable
PG_isolated

bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);
The duties of a subsystem that wants to make its pages migratable are as follows:
- It should register an address_space to page->mapping, then mark
the page as PG_movable via __SetPageMovable.
- It should mark the page as PG_isolated via SetPageIsolated
if isolation is successful and return true.
Ah another thing to document (especially in the comments/Doc) is that the subsystem must not expect anything to survive in page.lru (or fields that union it) after having isolated successfully.
Indeed. I'm surprised it's missing, because I wrote it down somewhere; it might have been lost during rebase. I will fix it.
- If migration is successful, it should clear PG_isolated and
PG_movable of the page to prepare it for freeing, then release the reference of the page to free it.
- If migration fails, the putback function of the subsystem should
clear PG_isolated via ClearPageIsolated.
- If a subsystem wants to release an isolated page, it should
clear PG_isolated but not PG_movable. Instead, VM will do it.
Under lock? Or just with ClearPageIsolated?
Both: ClearPageIsolated under PG_lock.
Yes, it's better to change ClearPageIsolated to __ClearPageIsolated.
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: dri-devel@lists.freedesktop.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Gioh Kim <gurugio@hanmail.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>

 Documentation/filesystems/Locking      |   4 +
 Documentation/filesystems/vfs.txt      |   5 +
 fs/proc/page.c                         |   3 +
 include/linux/fs.h                     |   2 +
 include/linux/migrate.h                |   2 +
 include/linux/page-flags.h             |  31 ++++++
 include/uapi/linux/kernel-page-flags.h |   1 +
 mm/compaction.c                        |  14 ++-
 mm/migrate.c                           | 174 +++++++++++++++++++++++++++++----
 9 files changed, 217 insertions(+), 19 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 619af9bfdcb3..0bb79560abb3 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -195,7 +195,9 @@ unlocks and drops the reference. int (*releasepage) (struct page *, int); void (*freepage)(struct page *); int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
- bool (*isolate_page) (struct page *, isolate_mode_t); int (*migratepage)(struct address_space *, struct page *, struct page *);
- void (*putback_page) (struct page *); int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes releasepage: yes freepage: yes direct_IO: +isolate_page: yes migratepage: yes (both) +putback_page: yes launder_page: yes is_partially_uptodate: yes error_remove_page: yes diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index b02a7d598258..4c1b6c3b4bc8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -592,9 +592,14 @@ struct address_space_operations { int (*releasepage) (struct page *, int); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
- /* isolate a page for migration */
- bool (*isolate_page) (struct page *, isolate_mode_t); /* migrate the contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *);
- /* put the page back to right list */
... "after a failed migration" ?
Better.
- void (*putback_page) (struct page *); int (*launder_page) (struct page *);
- int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long); void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c index 3ecd445e830d..ce3d08a4ad8d 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page) if (page_is_idle(page)) u |= 1 << KPF_IDLE;
if (PageMovable(page))
u |= 1 << KPF_MOVABLE;
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
diff --git a/include/linux/fs.h b/include/linux/fs.h index da9e67d937e5..36f2d610e7a8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -401,6 +401,8 @@ struct address_space_operations { */ int (*migratepage) (struct address_space *, struct page *, struct page *, enum migrate_mode);
- bool (*isolate_page)(struct page *, isolate_mode_t);
- void (*putback_page)(struct page *); int (*launder_page) (struct page *); int (*is_partially_uptodate) (struct page *, unsigned long, unsigned long);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 9b50325e4ddf..404fbfefeb33 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *, struct page *, struct page *, enum migrate_mode); extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, unsigned long private, enum migrate_mode mode, int reason); +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode); +extern void putback_movable_page(struct page *page);
extern int migrate_prep(void); extern int migrate_prep_local(void); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f4ed4f1b0c77..77ebf8fdbc6e 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -129,6 +129,10 @@ enum pageflags {
/* Compound pages. Stored in first tail page's flags */ PG_double_map = PG_private_2,
- /* non-lru movable pages */
- PG_movable = PG_reclaim,
- PG_isolated = PG_owner_priv_1,
Documentation should probably state that these flags alias and that a subsystem supporting movable pages shouldn't use them elsewhere.
Yeb.
Also I'm a bit uncomfortable how isolate_movable_page() blindly expects that page->mapping->a_ops->isolate_page exists for PageMovable() pages. What if it's a false positive on a PG_reclaim page? Can we rely on PG_reclaim always (and without races) implying PageLRU() so that we don't even attempt isolate_movable_page()?
For now, we shouldn't have such a false positive because PageMovable checks page->_mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE as well as PG_movable under PG_lock.
But I read your question about user-mapped driver pages, so we cannot use _mapcount anymore; I will find another approach. An option is this.
static inline int PageMovable(struct page *page)
{
	int ret = 0;
	struct address_space *mapping;
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &(page->flags)))
		goto out;

	mapping = page->mapping;
	if (!mapping)
		goto out;

	a_ops = mapping->a_ops;
	if (!a_ops)
		goto out;
	if (a_ops->isolate_page)
		ret = 1;
out:
	return ret;
}
It works under PG_lock, but with this we need trylock_page to peek whether it's a movable non-lru page or not when scanning pfns. To avoid that, we need another function that peeks by checking just the PG_movable bit instead of all those things.
/*
 * If @page_locked is false, we cannot guarantee page->mapping's stability,
 * so the function only checks PG_movable, which can be a false positive.
 * The caller should then check again under PG_lock to look at
 * a_ops->isolate_page.
 */
static inline int PageMovable(struct page *page, bool page_locked)
{
	int ret = 0;
	struct address_space *mapping;
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &(page->flags)))
		goto out;

	if (!page_locked) {
		ret = 1;
		goto out;
	}

	mapping = page->mapping;
	if (!mapping)
		goto out;

	a_ops = mapping->a_ops;
	if (!a_ops)
		goto out;
	if (a_ops->isolate_page)
		ret = 1;
out:
	return ret;
}
};
#ifndef __GENERATING_BOUNDS_H @@ -614,6 +618,33 @@ static inline void __ClearPageBalloon(struct page *page) atomic_set(&page->_mapcount, -1); }
+#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
IIRC this was what Gioh's previous attempts used instead of PG_movable? Is it still needed?

It's needed to avoid false positives, as I said.

Doesn't it prevent a driver providing movable *and* mapped pages?

Absolutely true. I will rethink about it.
If it's to distinguish the PG_reclaim alias that I mention above, it seems like an overkill to me. Why would we need both a special mapcount value and a flag? Checking that page->mapping->a_ops->isolate_page exists before calling it should be enough to resolve the ambiguity?
As I mentioned, using a_ops->isolate_page needs to be done under PG_lock. And the idea I suggested above will work, I guess. I will try it.
+static inline int PageMovable(struct page *page) +{
- return ((test_bit(PG_movable, &(page)->flags) &&
atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
|| PageBalloon(page));
+}
+/* Caller should hold a PG_lock */ +static inline void __SetPageMovable(struct page *page,
struct address_space *mapping)
+{
- page->mapping = mapping;
- __set_bit(PG_movable, &page->flags);
- atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
+}
+static inline void __ClearPageMovable(struct page *page) +{
- atomic_set(&page->_mapcount, -1);
- __clear_bit(PG_movable, &(page)->flags);
- page->mapping = NULL;
+}
+PAGEFLAG(Isolated, isolated, PF_ANY);
/*
- If network-based swap is enabled, sl*b must keep track of whether pages
- were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h index 5da5f8751ce7..a184fd2434fa 100644 --- a/include/uapi/linux/kernel-page-flags.h +++ b/include/uapi/linux/kernel-page-flags.h @@ -34,6 +34,7 @@ #define KPF_BALLOON 23 #define KPF_ZERO_PAGE 24 #define KPF_IDLE 25 +#define KPF_MOVABLE 26
#endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */ diff --git a/mm/compaction.c b/mm/compaction.c index ccf97b02b85f..7557aedddaee 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/*
 * Check may be lockless but that's ok as we recheck later.
- * It's possible to migrate LRU pages and balloon pages
+ * It's possible to migrate LRU and movable kernel pages.
 * Skip any other type of page
 */
is_lru = PageLRU(page);
@@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_success; } }
if (unlikely(PageMovable(page)) &&
!PageIsolated(page)) {
if (locked) {
spin_unlock_irqrestore(&zone->lru_lock,
flags);
locked = false;
}
if (isolate_movable_page(page, isolate_mode))
goto isolate_success;
}
}
/*
diff --git a/mm/migrate.c b/mm/migrate.c index 53529c805752..b56bf2b3fe8c 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -73,6 +73,85 @@ int migrate_prep_local(void) return 0; }
+bool isolate_movable_page(struct page *page, isolate_mode_t mode) +{
- bool ret = false;
Maintaining "ret" seems useless here. All the "goto out*" statements are executed only when ret is false, and ret == true is returned by a different return.
Yeb. Will change.
- /*
* Avoid burning cycles with pages that are yet under __free_pages(),
* or just got freed under us.
*
* In case we 'win' a race for a movable page being freed under us and
* raise its refcount preventing __free_pages() from doing its job
* the put_page() at the end of this block will take care of
* release this page, thus avoiding a nasty leakage.
*/
- if (unlikely(!get_page_unless_zero(page)))
goto out;
- /*
* Check PG_movable before holding a PG_lock because page's owner
* assumes anybody doesn't touch PG_lock of newly allocated page.
*/
- if (unlikely(!PageMovable(page)))
goto out_putpage;
- /*
* As movable pages are not isolated from LRU lists, concurrent
* compaction threads can race against page migration functions
* as well as race against the releasing a page.
*
* In order to avoid having an already isolated movable page
* being (wrongly) re-isolated while it is under migration,
* or to avoid attempting to isolate pages being released,
* lets be sure we have the page lock
* before proceeding with the movable page isolation steps.
*/
- if (unlikely(!trylock_page(page)))
goto out_putpage;
- if (!PageMovable(page) || PageIsolated(page))
goto out_no_isolated;
- ret = page->mapping->a_ops->isolate_page(page, mode);
- if (!ret)
goto out_no_isolated;
- WARN_ON_ONCE(!PageIsolated(page));
- unlock_page(page);
- return ret;
+out_no_isolated:
- unlock_page(page);
+out_putpage:
- put_page(page);
+out:
- return ret;
+}
+/* It should be called on page which is PG_movable */ +void putback_movable_page(struct page *page) +{
- /*
* 'lock_page()' stabilizes the page and prevents races against
* concurrent isolation threads attempting to re-isolate it.
*/
- VM_BUG_ON_PAGE(!PageMovable(page), page);
- lock_page(page);
- if (PageIsolated(page)) {
struct address_space *mapping;
mapping = page_mapping(page);
mapping->a_ops->putback_page(page);
WARN_ON_ONCE(PageIsolated(page));
- } else {
__ClearPageMovable(page);
- }
- unlock_page(page);
- /* drop the extra ref count taken for movable page isolation */
- put_page(page);
+}
/*
- Put previously isolated pages back onto the appropriate lists
- from where they were once taken off for compaction/migration.
@@ -94,10 +173,18 @@ void putback_movable_pages(struct list_head *l) list_del(&page->lru); dec_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
if (unlikely(isolated_balloon_page(page)))
if (unlikely(isolated_balloon_page(page))) { balloon_page_putback(page);
else
} else if (unlikely(PageMovable(page))) {
if (PageIsolated(page)) {
putback_movable_page(page);
} else {
__ClearPageMovable(page);
We don't do lock_page() here, so what prevents parallel compaction isolating the same page?
Need PG_lock.
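A minimal sketch of taking PG_lock there (an assumption about the eventual fix, not code from this patch):

	} else if (unlikely(PageMovable(page))) {
		if (PageIsolated(page)) {
			putback_movable_page(page);
		} else {
			/*
			 * Take PG_lock so a parallel compactor cannot
			 * re-isolate the page while PG_movable is cleared.
			 */
			lock_page(page);
			__ClearPageMovable(page);
			unlock_page(page);
			put_page(page);
		}
	}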
put_page(page);
}
} else {
	putback_lru_page(page);
}
}
}
@@ -592,7 +679,7 @@ void migrate_page_copy(struct page *newpage, struct page *page) ***********************************************************/
/*
- Common logic to directly migrate a single page suitable for
- Common logic to directly migrate a single LRU page suitable for
- pages that do not use PagePrivate/PagePrivate2.
- Pages are locked upon entry and exit.
@@ -755,24 +842,54 @@ static int move_to_new_page(struct page *newpage, struct page *page, enum migrate_mode mode) { struct address_space *mapping;
- int rc;
int rc = -EAGAIN;
bool lru_movable = true;
VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
mapping = page_mapping(page);
- if (!mapping)
rc = migrate_page(mapping, newpage, page, mode);
- else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems provide a
* migratepage callback. Anonymous pages are part of swap
* space which also has its own migratepage callback. This
* is the most common path for page migration.
*/
rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
- else
rc = fallback_migrate_page(mapping, newpage, page, mode);
- /*
* In case of non-lru page, it could be released after
* isolation step. In that case, we shouldn't try
* fallback migration which was designed for LRU pages.
*
* The rule for such case is that subsystem should clear
* PG_isolated but remains PG_movable so VM should catch
* it and clear PG_movable for it.
*/
- if (unlikely(PageMovable(page))) {
Can a false positive from PG_reclaim occur here?
PageMovable has the _mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE check.
lru_movable = false;
VM_BUG_ON_PAGE(!mapping, page);
if (!PageIsolated(page)) {
rc = MIGRATEPAGE_SUCCESS;
__ClearPageMovable(page);
goto out;
}
}
if (likely(lru_movable)) {
if (!mapping)
rc = migrate_page(mapping, newpage, page, mode);
else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems
* provide a migratepage callback. Anonymous pages
* are part of swap space which also has its own
* migratepage callback. This is the most common path
* for page migration.
*/
rc = mapping->a_ops->migratepage(mapping, newpage,
page, mode);
else
rc = fallback_migrate_page(mapping, newpage,
page, mode);
} else {
rc = mapping->a_ops->migratepage(mapping, newpage,
page, mode);
WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
PageIsolated(page));
}
/*
- When successful, old pagecache page->mapping must be cleared before
@@ -782,6 +899,7 @@ static int move_to_new_page(struct page *newpage, struct page *page, if (!PageAnon(page)) page->mapping = NULL; } +out: return rc; }
@@ -960,6 +1078,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, put_new_page(newpage, private); else put_page(newpage);
if (PageMovable(page))
	__ClearPageMovable(page);
goto out;
}
@@ -1000,8 +1120,26 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, num_poisoned_pages_inc(); } } else {
if (rc != -EAGAIN)
putback_lru_page(page);
if (rc != -EAGAIN) {
/*
* subsystem couldn't remove PG_movable since page is
* isolated so PageMovable check is not racy in here.
* But PageIsolated check can be racy but it's okay
* because putback_movable_page checks it under PG_lock
* again.
*/
if (unlikely(PageMovable(page))) {
if (PageIsolated(page))
putback_movable_page(page);
else {
__ClearPageMovable(page);
Again, we don't do lock_page() here, so what prevents parallel compaction isolating the same page?
It seems to need PG_lock in there, too. Thanks for catching it.
Sorry for so many questions, hope they all have good answers and this series is a success :) Thanks for picking it up.
No problem at all. Many questions mean the code and/or documentation are not clear and still need to be improved.
Thanks for the detailed review, Vlastimil! I will resend a new version after my vacation this week.
On 04/04/2016 07:12 AM, Minchan Kim wrote:
On Fri, Apr 01, 2016 at 11:29:14PM +0200, Vlastimil Babka wrote:
Might have been better as a separate migration patch and then a compaction patch. It's prefixed mm/compaction, but most changes are in mm/migrate.c.
Indeed. The title is rather misleading, but I'm not sure it's a good idea to separate the compaction and migration parts.
Guess it's better to see the new functions together with its user after all, OK.
I will just resend to change the title from "mm/compaction" to "mm/migration".
OK!
Also I'm a bit uncomfortable how isolate_movable_page() blindly expects that page->mapping->a_ops->isolate_page exists for PageMovable() pages. What if it's a false positive on a PG_reclaim page? Can we rely on PG_reclaim always (and without races) implying PageLRU() so that we don't even attempt isolate_movable_page()?
For now, we shouldn't have such a false positive because PageMovable checks page->_mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE as well as PG_movable under PG_lock.
But I read your question about user-mapped driver pages, so we cannot use _mapcount anymore; I will find another approach. An option is this.
static inline int PageMovable(struct page *page)
{
	int ret = 0;
	struct address_space *mapping;
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &(page->flags)))
		goto out;

	mapping = page->mapping;
	if (!mapping)
		goto out;

	a_ops = mapping->a_ops;
	if (!a_ops)
		goto out;
	if (a_ops->isolate_page)
		ret = 1;
out:
	return ret;
}
It works under PG_lock, but with this we need trylock_page to peek whether it's a movable non-lru page or not when scanning pfns.
Hm I hoped that with READ_ONCE() we could do the peek safely without trylock_page, if we use it only as a heuristic. But I guess it would require at least RCU-level protection of the page->mapping->a_op->isolate_page chain.
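To illustrate the hazard (a sketch only; nothing here is from the patch): even with READ_ONCE(), the pointer chain can be freed between loads unless something like RCU pins it:

	/* NOT safe as-is: 'mapping' may be freed between the loads below */
	struct address_space *mapping = READ_ONCE(page->mapping);

	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page) {
		/* by this point 'mapping' may already point to freed memory */
	}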
To avoid that, we need another function that peeks by checking just the PG_movable bit instead of all those things.
/*
 * If @page_locked is false, we cannot guarantee page->mapping's stability,
 * so the function only checks PG_movable, which can be a false positive.
 * The caller should then check again under PG_lock to look at
 * a_ops->isolate_page.
 */
static inline int PageMovable(struct page *page, bool page_locked)
{
	int ret = 0;
	struct address_space *mapping;
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &(page->flags)))
		goto out;

	if (!page_locked) {
		ret = 1;
		goto out;
	}

	mapping = page->mapping;
	if (!mapping)
		goto out;

	a_ops = mapping->a_ops;
	if (!a_ops)
		goto out;
	if (a_ops->isolate_page)
		ret = 1;
out:
	return ret;
}
I wouldn't put everything into a single function, but create something like __PageMovable() just for the unlocked peek. Unlike the zone->lru_lock, we don't keep page_lock() across iterations in isolate_migratepages_block(), as obviously each page has a different lock. So the page_locked parameter would always be passed as a constant, and at that point it's better to have separate functions.
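A sketch of that split (the __PageMovable() name is an assumption for illustration; the authoritative variant requires the page lock):

/* Unlocked peek for the scanner; can be a false positive via PG_reclaim. */
static inline int __PageMovable(struct page *page)
{
	return test_bit(PG_movable, &page->flags);
}

/* Authoritative check; caller holds PG_lock so page->mapping is stable. */
static inline int PageMovable(struct page *page)
{
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &page->flags))
		return 0;
	if (!page->mapping)
		return 0;
	a_ops = page->mapping->a_ops;
	return a_ops && a_ops->isolate_page;
}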
So I guess the question is how many false positives from the overlap with PG_reclaim the scanner will hit if we give up on PAGE_MOVABLE_MAPCOUNT_VALUE, as that will increase the number of page locks taken just to realize that it's not an actual PageMovable() page...
Thanks for the detailed review, Vlastimil! I will resend a new version after my vacation this week.
You're welcome, great!
On Mon, Apr 04, 2016 at 03:24:34PM +0200, Vlastimil Babka wrote:
On 04/04/2016 07:12 AM, Minchan Kim wrote:
On Fri, Apr 01, 2016 at 11:29:14PM +0200, Vlastimil Babka wrote:
Might have been better as a separate migration patch and then a compaction patch. It's prefixed mm/compaction, but most changes are in mm/migrate.c.
Indeed. The title is rather misleading, but I'm not sure it's a good idea to separate the compaction and migration parts.
Guess it's better to see the new functions together with its user after all, OK.
I will just resend to change the title from "mm/compaction" to "mm/migration".
OK!
Also I'm a bit uncomfortable how isolate_movable_page() blindly expects that page->mapping->a_ops->isolate_page exists for PageMovable() pages. What if it's a false positive on a PG_reclaim page? Can we rely on PG_reclaim always (and without races) implying PageLRU() so that we don't even attempt isolate_movable_page()?
For now, we shouldn't have such a false positive because PageMovable checks page->_mapcount == PAGE_MOVABLE_MAPCOUNT_VALUE as well as PG_movable under PG_lock.
But I read your question about user-mapped driver pages, so we cannot use _mapcount anymore; I will find another approach. An option is this.
static inline int PageMovable(struct page *page)
{
	int ret = 0;
	struct address_space *mapping;
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &(page->flags)))
		goto out;

	mapping = page->mapping;
	if (!mapping)
		goto out;

	a_ops = mapping->a_ops;
	if (!a_ops)
		goto out;
	if (a_ops->isolate_page)
		ret = 1;
out:
	return ret;
}
It works under PG_lock, but with this we need trylock_page to peek whether it's a movable non-lru page or not when scanning pfns.
Hm I hoped that with READ_ONCE() we could do the peek safely without trylock_page, if we use it only as a heuristic. But I guess it would require at least RCU-level protection of the page->mapping->a_op->isolate_page chain.
To avoid that, we need another function that peeks by checking just the PG_movable bit instead of all those things.
/*
 * If @page_locked is false, we cannot guarantee page->mapping's stability,
 * so the function only checks PG_movable, which can be a false positive.
 * The caller should then check again under PG_lock to look at
 * a_ops->isolate_page.
 */
static inline int PageMovable(struct page *page, bool page_locked)
{
	int ret = 0;
	struct address_space *mapping;
	const struct address_space_operations *a_ops;

	if (!test_bit(PG_movable, &(page->flags)))
		goto out;

	if (!page_locked) {
		ret = 1;
		goto out;
	}

	mapping = page->mapping;
	if (!mapping)
		goto out;

	a_ops = mapping->a_ops;
	if (!a_ops)
		goto out;
	if (a_ops->isolate_page)
		ret = 1;
out:
	return ret;
}
I wouldn't put everything into a single function, but create something like __PageMovable() just for the unlocked peek. Unlike the zone->lru_lock, we don't keep page_lock() across iterations in isolate_migratepages_block(), as obviously each page has a different lock. So the page_locked parameter would always be passed as a constant, and at that point it's better to have separate functions.
Agree.
So I guess the question is how many false positives from the overlap with PG_reclaim the scanner will hit if we give up on PAGE_MOVABLE_MAPCOUNT_VALUE, as that will increase the number of page locks taken just to realize that it's not an actual PageMovable() page...
I don't think it will be too many, because the PG_reclaim bit is set only on LRU pages at the moment, and we can check PageMovable after the !PageLRU check.
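For instance, the scanner could only attempt the movable peek for !PageLRU pages (an illustrative sketch, reusing the hypothetical __PageMovable() peek from above):

	/* in isolate_migratepages_block(), sketch only */
	if (!PageLRU(page)) {
		/*
		 * PG_reclaim is set only on LRU pages today, so a set
		 * PG_movable bit on a !LRU page is rarely a false positive.
		 */
		if (__PageMovable(page) && !PageIsolated(page)) {
			if (locked) {
				spin_unlock_irqrestore(&zone->lru_lock, flags);
				locked = false;
			}
			if (isolate_movable_page(page, isolate_mode))
				goto isolate_success;
		}
		continue;	/* skip any other type of non-LRU page */
	}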
Thanks.