From: John Hubbard jhubbard@nvidia.com
Hi,
These are best characterized as miscellaneous conversions: many (though not all) of the call sites that don't involve biovec, iov_iter, or mm/. It also leaves out a few call sites that require more work; the ones included here are mostly simple.
It's probably best to send all of these via Andrew's -mm tree, assuming that there are no significant merge conflicts with ongoing work in other trees (which I doubt, given that these are small changes).
These patches apply to the latest linux.git. Patch #1 is also already in Andrew's tree, but given the broad non-linux-mm Cc list, I thought it would be more convenient to just include that patch here, so that people can use linux.git as the base--even though these are probably destined for linux-mm.
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). That commit has an extensive description of the problem and the planned steps to solve it, but the highlights are:
1) Provide put_user_page*() routines, intended to be used for releasing pages that were pinned via get_user_pages*().
2) Convert all of the call sites for get_user_pages*() to invoke put_user_page*() instead of put_page(). This involves dozens of call sites, and will take some time; a minimal sketch of the conversion pattern follows this list.
3) After (2) is complete, use get_user_pages*() and put_user_page*() to implement tracking of these pages. This tracking will be separate from the existing struct page refcounting.
4) Use the tracking and identification of these pages, to implement special handling (especially in writeback paths) when the pages are backed by a filesystem.
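To make step (2) concrete, here is a minimal sketch of the conversion pattern. The my_driver_release_*() routines are hypothetical, made up purely for illustration; put_user_pages() is the wrapper introduced by fc1d8e7cca2d:

#include <linux/mm.h>

/* Before the conversion: a hypothetical driver releases its
 * gup-pinned pages one at a time with put_page(), which cannot
 * distinguish gup-pinned pages from ordinary page references.
 */
static void my_driver_release_old(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		put_page(pages[i]);
}

/* After the conversion: the same pages go through the dedicated
 * wrapper, so that step (3) can later add separate tracking for
 * gup-pinned pages.
 */
static void my_driver_release_new(struct page **pages, unsigned long npages)
{
	put_user_pages(pages, npages);
}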
And a few references, also from that commit:
[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
Ira Weiny (1):
  fs/binfmt_elf: convert put_page() to put_user_page*()

John Hubbard (33):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  net/rds: convert put_page() to put_user_page*()
  net/ceph: convert put_page() to put_user_page*()
  x86/kvm: convert put_page() to put_user_page*()
  drm/etnaviv: convert release_pages() to put_user_pages()
  drm/i915: convert put_page() to put_user_page*()
  drm/radeon: convert put_page() to put_user_page*()
  media/ivtv: convert put_page() to put_user_page*()
  media/v4l2-core/mm: convert put_page() to put_user_page*()
  genwqe: convert put_page() to put_user_page*()
  scif: convert put_page() to put_user_page*()
  vmci: convert put_page() to put_user_page*()
  rapidio: convert put_page() to put_user_page*()
  oradax: convert put_page() to put_user_page*()
  staging/vc04_services: convert put_page() to put_user_page*()
  drivers/tee: convert put_page() to put_user_page*()
  vfio: convert put_page() to put_user_page*()
  fbdev/pvr2fb: convert put_page() to put_user_page*()
  fsl_hypervisor: convert put_page() to put_user_page*()
  xen: convert put_page() to put_user_page*()
  fs/exec.c: convert put_page() to put_user_page*()
  orangefs: convert put_page() to put_user_page*()
  uprobes: convert put_page() to put_user_page*()
  futex: convert put_page() to put_user_page*()
  mm/frame_vector.c: convert put_page() to put_user_page*()
  mm/gup_benchmark.c: convert put_page() to put_user_page*()
  mm/memory.c: convert put_page() to put_user_page*()
  mm/madvise.c: convert put_page() to put_user_page*()
  mm/process_vm_access.c: convert put_page() to put_user_page*()
  crypt: convert put_page() to put_user_page*()
  nfs: convert put_page() to put_user_page*()
  goldfish_pipe: convert put_page() to put_user_page*()
  kernel/events/core.c: convert put_page() to put_user_page*()
 arch/x86/kvm/svm.c | 4 +-
 crypto/af_alg.c | 7 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 4 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 9 +-
 drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
 drivers/infiniband/core/umem.c | 5 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c | 5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +-
 drivers/infiniband/sw/siw/siw_mem.c | 10 +-
 drivers/media/pci/ivtv/ivtv-udma.c | 14 +--
 drivers/media/pci/ivtv/ivtv-yuv.c | 10 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +-
 drivers/misc/genwqe/card_utils.c | 17 +--
 drivers/misc/mic/scif/scif_rma.c | 17 ++-
 drivers/misc/vmw_vmci/vmci_context.c | 2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 11 +-
 drivers/platform/goldfish/goldfish_pipe.c | 9 +-
 drivers/rapidio/devices/rio_mport_cdev.c | 9 +-
 drivers/sbus/char/oradax.c | 2 +-
 .../interface/vchiq_arm/vchiq_2835_arm.c | 10 +-
 drivers/tee/tee_shm.c | 10 +-
 drivers/vfio/vfio_iommu_type1.c | 8 +-
 drivers/video/fbdev/pvr2fb.c | 3 +-
 drivers/virt/fsl_hypervisor.c | 7 +-
 drivers/xen/gntdev.c | 5 +-
 drivers/xen/privcmd.c | 7 +-
 fs/binfmt_elf.c | 2 +-
 fs/binfmt_elf_fdpic.c | 2 +-
 fs/exec.c | 2 +-
 fs/nfs/direct.c | 4 +-
 fs/orangefs/orangefs-bufmap.c | 7 +-
 include/linux/mm.h | 5 +-
 kernel/events/core.c | 2 +-
 kernel/events/uprobes.c | 6 +-
 kernel/futex.c | 10 +-
 mm/frame_vector.c | 4 +-
 mm/gup.c | 115 ++++++++----
 mm/gup_benchmark.c | 2 +-
 mm/madvise.c | 2 +-
 mm/memory.c | 2 +-
 mm/process_vm_access.c | 18 +--
 net/ceph/pagevec.c | 8 +-
 net/rds/info.c | 5 +-
 net/rds/message.c | 2 +-
 net/rds/rdma.c | 15 ++-
 virt/kvm/kvm_main.c | 4 +-
 47 files changed, 151 insertions(+), 266 deletions(-)
From: John Hubbard jhubbard@nvidia.com
Provide a more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following:
1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call.
2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain.
This leads to the following API choices:
* put_user_pages_dirty_lock(page, npages, make_dirty)
* There is no put_user_pages_dirty(). You have to hand-code that, in the rare case that it's required; a sketch follows.
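For that rare case, here is a sketch of the hand-coded path. It mirrors the loop being deleted from mm/gup.c below; my_put_user_pages_dirty() is a hypothetical name, and the caller must first verify that the non-locking set_page_dirty() really is safe for its pages:

#include <linux/mm.h>

/* Hand-coded replacement for the deleted put_user_pages_dirty():
 * only valid once the caller has verified that the non-locking
 * set_page_dirty() is safe for these pages.
 */
static void my_put_user_pages_dirty(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page = compound_head(pages[i]);

		if (!PageDirty(page))
			set_page_dirty(page);
		put_user_page(page);
	}
}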
Cc: Matthew Wilcox willy@infradead.org
Cc: Jan Kara jack@suse.cz
Cc: Christoph Hellwig hch@lst.de
Cc: Ira Weiny ira.weiny@intel.com
Cc: Jason Gunthorpe jgg@ziepe.ca
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/infiniband/core/umem.c | 5 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c | 5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +-
 drivers/infiniband/sw/siw/siw_mem.c | 10 +-
 include/linux/mm.h | 5 +-
 mm/gup.c | 115 +++++++++-------
 7 files changed, 58 insertions(+), 92 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..965cf9dea71a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 
 	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
 		page = sg_page_iter_page(&sg_iter);
-		if (umem->writable && dirty)
-			put_user_pages_dirty_lock(&page, 1);
-		else
-			put_user_page(page);
+		put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
 	}
 
 	sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index b89a9b9aef7a..469acb961fbd 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 			     size_t npages, bool dirty)
 {
-	if (dirty)
-		put_user_pages_dirty_lock(p, npages);
-	else
-		put_user_pages(p, npages);
+	put_user_pages_dirty_lock(p, npages, dirty);
 
 	if (mm) { /* during close after signal, mm can be NULL */
 		atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index bfbfbb7e0ff4..6bf764e41891 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,10 +40,7 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
 			 int dirty)
 {
-	if (dirty)
-		put_user_pages_dirty_lock(p, num_pages);
-	else
-		put_user_pages(p, num_pages);
+	put_user_pages_dirty_lock(p, num_pages, dirty);
 }
 
 /**
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..62e6ffa9ad78 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
 		for_each_sg(chunk->page_list, sg, chunk->nents, i) {
 			page = sg_page(sg);
 			pa = sg_phys(sg);
-			if (dirty)
-				put_user_pages_dirty_lock(&page, 1);
-			else
-				put_user_page(page);
+			put_user_pages_dirty_lock(&page, 1, dirty);
 			usnic_dbg("pa: %pa\n", &pa);
 		}
 		kfree(chunk);
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index 67171c82b0c4..ab83a9cec562 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -63,15 +63,7 @@ struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int stag_index)
 static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages,
 			   bool dirty)
 {
-	struct page **p = chunk->plist;
-
-	while (num_pages--) {
-		if (!PageDirty(*p) && dirty)
-			put_user_pages_dirty_lock(p, 1);
-		else
-			put_user_page(*p);
-		p++;
-	}
+	put_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
 }
 
 void siw_umem_release(struct siw_umem *umem, bool dirty)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..9759b6a24420 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1057,8 +1057,9 @@ static inline void put_user_page(struct page *page)
 	put_page(page);
 }
 
-void put_user_pages_dirty(struct page **pages, unsigned long npages);
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
+			       bool make_dirty);
+
 void put_user_pages(struct page **pages, unsigned long npages);
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..7fefd7ab02c4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,85 +29,70 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
-typedef int (*set_dirty_func_t)(struct page *page);
-
-static void __put_user_pages_dirty(struct page **pages,
-				   unsigned long npages,
-				   set_dirty_func_t sdf)
-{
-	unsigned long index;
-
-	for (index = 0; index < npages; index++) {
-		struct page *page = compound_head(pages[index]);
-
-		/*
-		 * Checking PageDirty at this point may race with
-		 * clear_page_dirty_for_io(), but that's OK. Two key cases:
-		 *
-		 * 1) This code sees the page as already dirty, so it skips
-		 * the call to sdf(). That could happen because
-		 * clear_page_dirty_for_io() called page_mkclean(),
-		 * followed by set_page_dirty(). However, now the page is
-		 * going to get written back, which meets the original
-		 * intention of setting it dirty, so all is well:
-		 * clear_page_dirty_for_io() goes on to call
-		 * TestClearPageDirty(), and write the page back.
-		 *
-		 * 2) This code sees the page as clean, so it calls sdf().
-		 * The page stays dirty, despite being written back, so it
-		 * gets written back again in the next writeback cycle.
-		 * This is harmless.
-		 */
-		if (!PageDirty(page))
-			sdf(page);
-
-		put_user_page(page);
-	}
-}
-
 /**
- * put_user_pages_dirty() - release and dirty an array of gup-pinned pages
- * @pages:  array of pages to be marked dirty and released.
+ * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
+ * @pages:  array of pages to be maybe marked dirty, and definitely released.
  * @npages: number of pages in the @pages array.
+ * @make_dirty: whether to mark the pages dirty
  *
  * "gup-pinned page" refers to a page that has had one of the get_user_pages()
  * variants called on that page.
  *
  * For each page in the @pages array, make that page (or its head page, if a
- * compound page) dirty, if it was previously listed as clean. Then, release
- * the page using put_user_page().
+ * compound page) dirty, if @make_dirty is true, and if the page was previously
+ * listed as clean. In any case, releases all pages using put_user_page(),
+ * possibly via put_user_pages(), for the non-dirty case.
  *
  * Please see the put_user_page() documentation for details.
  *
- * set_page_dirty(), which does not lock the page, is used here.
- * Therefore, it is the caller's responsibility to ensure that this is
- * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is
+ * required, then the caller should a) verify that this is really correct,
+ * because _lock() is usually required, and b) hand code it:
+ * set_page_dirty_lock(), put_user_page().
  *
  */
-void put_user_pages_dirty(struct page **pages, unsigned long npages)
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
+			       bool make_dirty)
 {
-	__put_user_pages_dirty(pages, npages, set_page_dirty);
-}
-EXPORT_SYMBOL(put_user_pages_dirty);
+	unsigned long index;
 
-/**
- * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages
- * @pages:  array of pages to be marked dirty and released.
- * @npages: number of pages in the @pages array.
- *
- * For each page in the @pages array, make that page (or its head page, if a
- * compound page) dirty, if it was previously listed as clean. Then, release
- * the page using put_user_page().
- *
- * Please see the put_user_page() documentation for details.
- *
- * This is just like put_user_pages_dirty(), except that it invokes
- * set_page_dirty_lock(), instead of set_page_dirty().
- *
- */
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
-{
-	__put_user_pages_dirty(pages, npages, set_page_dirty_lock);
+	/*
+	 * TODO: this can be optimized for huge pages: if a series of pages is
+	 * physically contiguous and part of the same compound page, then a
+	 * single operation to the head page should suffice.
+	 */
+
+	if (!make_dirty) {
+		put_user_pages(pages, npages);
+		return;
+	}
+
+	for (index = 0; index < npages; index++) {
+		struct page *page = compound_head(pages[index]);
+		/*
+		 * Checking PageDirty at this point may race with
+		 * clear_page_dirty_for_io(), but that's OK. Two key
+		 * cases:
+		 *
+		 * 1) This code sees the page as already dirty, so it
+		 * skips the call to set_page_dirty(). That could happen
+		 * because clear_page_dirty_for_io() called
+		 * page_mkclean(), followed by set_page_dirty().
+		 * However, now the page is going to get written back,
+		 * which meets the original intention of setting it
+		 * dirty, so all is well: clear_page_dirty_for_io() goes
+		 * on to call TestClearPageDirty(), and write the page
+		 * back.
+		 *
+		 * 2) This code sees the page as clean, so it calls
+		 * set_page_dirty(). The page stays dirty, despite being
+		 * written back, so it gets written back again in the
+		 * next writeback cycle. This is harmless.
+		 */
+		if (!PageDirty(page))
+			set_page_dirty_lock(page);
+		put_user_page(page);
+	}
 }
 EXPORT_SYMBOL(put_user_pages_dirty_lock);
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Santosh Shilimkar santosh.shilimkar@oracle.com
Cc: David S. Miller davem@davemloft.net
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: rds-devel@oss.oracle.com
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 net/rds/info.c | 5 ++---
 net/rds/message.c | 2 +-
 net/rds/rdma.c | 15 +++++++--------
 3 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/net/rds/info.c b/net/rds/info.c
index 03f6fd56d237..ca6af2889adf 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
@@ -162,7 +162,6 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 	struct rds_info_lengths lens;
 	unsigned long nr_pages = 0;
 	unsigned long start;
-	unsigned long i;
 	rds_info_func func;
 	struct page **pages = NULL;
 	int ret;
@@ -235,8 +234,8 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 		ret = -EFAULT;
 
 out:
-	for (i = 0; pages && i < nr_pages; i++)
-		put_page(pages[i]);
+	if (pages)
+		put_user_pages(pages, nr_pages);
 	kfree(pages);
 
 	return ret;
diff --git a/net/rds/message.c b/net/rds/message.c
index 50f13f1d4ae0..d7b0d266c437 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -404,7 +404,7 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter *
 			int i;
 
 			for (i = 0; i < rm->data.op_nents; i++)
-				put_page(sg_page(&rm->data.op_sg[i]));
+				put_user_page(sg_page(&rm->data.op_sg[i]));
 			mmp = &rm->data.op_mmp_znotifier->z_mmp;
 			mm_unaccount_pinned_pages(mmp);
 			ret = -EFAULT;
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 916f5ec373d8..6762e8696b99 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -162,8 +162,7 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
 			pages);
 
 	if (ret >= 0 && ret < nr_pages) {
-		while (ret--)
-			put_page(pages[ret]);
+		put_user_pages(pages, ret);
 		ret = -EFAULT;
 	}
 
@@ -276,7 +275,7 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
 
 	if (IS_ERR(trans_private)) {
 		for (i = 0 ; i < nents; i++)
-			put_page(sg_page(&sg[i]));
+			put_user_page(sg_page(&sg[i]));
 		kfree(sg);
 		ret = PTR_ERR(trans_private);
 		goto out;
@@ -464,9 +463,10 @@ void rds_rdma_free_op(struct rm_rdma_op *ro)
 		 * to local memory */
 		if (!ro->op_write) {
 			WARN_ON(!page->mapping && irqs_disabled());
-			set_page_dirty(page);
+			put_user_pages_dirty_lock(&page, 1, true);
+		} else {
+			put_user_page(page);
 		}
-		put_page(page);
 	}
 
 	kfree(ro->op_notifier);
@@ -481,8 +481,7 @@ void rds_atomic_free_op(struct rm_atomic_op *ao)
 	/* Mark page dirty if it was possibly modified, which
 	 * is the case for a RDMA_READ which copies from remote
 	 * to local memory */
-	set_page_dirty(page);
-	put_page(page);
+	put_user_pages_dirty_lock(&page, 1, true);
 
 	kfree(ao->op_notifier);
 	ao->op_notifier = NULL;
@@ -867,7 +866,7 @@ int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm,
 	return ret;
 err:
 	if (page)
-		put_page(page);
+		put_user_page(page);
 	rm->atomic.op_active = 0;
 	kfree(rm->atomic.op_notifier);
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Ilya Dryomov idryomov@gmail.com
Cc: Sage Weil sage@redhat.com
Cc: David S. Miller davem@davemloft.net
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 net/ceph/pagevec.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 64305e7056a1..c88fff2ab9bd 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -12,13 +12,7 @@
 
 void ceph_put_page_vector(struct page **pages, int num_pages, bool dirty)
 {
-	int i;
-
-	for (i = 0; i < num_pages; i++) {
-		if (dirty)
-			set_page_dirty_lock(pages[i]);
-		put_page(pages[i]);
-	}
+	put_user_pages_dirty_lock(pages, num_pages, dirty);
 	kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_put_page_vector);
On Thu, 2019-08-01 at 19:19 -0700, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Ilya Dryomov idryomov@gmail.com
Cc: Sage Weil sage@redhat.com
Cc: David S. Miller davem@davemloft.net
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 net/ceph/pagevec.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 64305e7056a1..c88fff2ab9bd 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -12,13 +12,7 @@
 
 void ceph_put_page_vector(struct page **pages, int num_pages, bool dirty)
 {
-	int i;
-
-	for (i = 0; i < num_pages; i++) {
-		if (dirty)
-			set_page_dirty_lock(pages[i]);
-		put_page(pages[i]);
-	}
+	put_user_pages_dirty_lock(pages, num_pages, dirty);
 	kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_put_page_vector);
This patch looks sane enough. Assuming that the earlier patches are OK:
Acked-by: Jeff Layton jlayton@kernel.org
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Joerg Roedel joro@8bytes.org
Cc: Paolo Bonzini pbonzini@redhat.com
Cc: "Radim Krčmář" rkrcmar@redhat.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@redhat.com
Cc: "H. Peter Anvin" hpa@zytor.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 arch/x86/kvm/svm.c | 4 ++--
 virt/kvm/kvm_main.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7eafc6907861..ff93c923ed36 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1827,7 +1827,7 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
 
 err:
 	if (npinned > 0)
-		release_pages(pages, npinned);
+		put_user_pages(pages, npinned);
 
 	kvfree(pages);
 	return NULL;
@@ -1838,7 +1838,7 @@ static void sev_unpin_memory(struct kvm *kvm, struct page **pages,
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 
-	release_pages(pages, npages);
+	put_user_pages(pages, npages);
 	kvfree(pages);
 	sev->pages_locked -= npages;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887f3b0c2b60..4b6a596ea8e9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1499,7 +1499,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 
 		if (__get_user_pages_fast(addr, 1, 1, &wpage) == 1) {
 			*writable = true;
-			put_page(page);
+			put_user_page(page);
 			page = wpage;
 		}
 	}
@@ -1831,7 +1831,7 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
-		put_page(pfn_to_page(pfn));
+		put_user_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Joerg Roedel joro@8bytes.org
Cc: Paolo Bonzini pbonzini@redhat.com
Cc: "Radim Krčmář" rkrcmar@redhat.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@redhat.com
Cc: Borislav Petkov bp@alien8.de
Cc: "H. Peter Anvin" hpa@zytor.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index e8778ebb72e6..a0144a5ee325 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -686,7 +686,7 @@ static int etnaviv_gem_userptr_get_pages(struct etnaviv_gem_object *etnaviv_obj)
 		ret = get_user_pages_fast(ptr, num_pages,
 					  !userptr->ro ? FOLL_WRITE : 0, pages);
 		if (ret < 0) {
-			release_pages(pvec, pinned);
+			put_user_pages(pvec, pinned);
 			kvfree(pvec);
 			return ret;
 		}
@@ -710,7 +710,7 @@ static void etnaviv_gem_userptr_release(struct etnaviv_gem_object *etnaviv_obj)
 	if (etnaviv_obj->pages) {
 		int npages = etnaviv_obj->base.size >> PAGE_SHIFT;
 
-		release_pages(etnaviv_obj->pages, npages);
+		put_user_pages(etnaviv_obj->pages, npages);
 		kvfree(etnaviv_obj->pages);
 	}
 }
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Note that this effectively changes the code's behavior in i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate.
As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1]
[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de
Cc: Jani Nikula jani.nikula@linux.intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: David Airlie airlied@linux.ie
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..c18008d3cc2a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -527,7 +527,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 	}
 	mutex_unlock(&obj->mm.lock);
 
-	release_pages(pvec, pinned);
+	put_user_pages(pvec, pinned);
 	kvfree(pvec);
 
 	i915_gem_object_put(obj);
@@ -640,7 +640,7 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 		__i915_gem_userptr_set_active(obj, true);
 
 	if (IS_ERR(pages))
-		release_pages(pvec, pinned);
+		put_user_pages(pvec, pinned);
 	kvfree(pvec);
 
 	return PTR_ERR_OR_ZERO(pages);
@@ -663,11 +663,8 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
 	i915_gem_gtt_finish_pages(obj, pages);
 
 	for_each_sgt_page(page, sgt_iter, pages) {
-		if (obj->mm.dirty)
-			set_page_dirty(page);
-		mark_page_accessed(page);
-		put_page(page);
+		put_user_pages_dirty_lock(&page, 1, obj->mm.dirty);
 	}
 	obj->mm.dirty = false;
Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Note that this effectively changes the code's behavior in i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate.
We've already fixed this in drm-tip where the current code uses set_page_dirty_lock().
This would conflict with our tree. Rodrigo is handling drm-intel-next for 5.4, so you guys will want to coordinate how to merge.
Regards, Joonas
On 8/2/19 2:19 AM, Joonas Lahtinen wrote:
Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Note that this effectively changes the code's behavior in i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate.
We've already fixed this in drm-tip where the current code uses set_page_dirty_lock().
This would conflict with our tree. Rodrigo is handling drm-intel-next for 5.4, so you guys will want to coordinate how to merge.
Hi Joonas, Rodrigo,
First of all, I apologize for the API breakage: put_user_pages_dirty_lock() now takes an additional "make_dirty" parameter.
In order to deal with the merge problem, I'll drop this patch from my series, and I'd recommend that drm-intel-next take the following approach:
1) For now, s/put_page/put_user_page/ in i915_gem_userptr_put_pages(), and fix up the set_page_dirty() --> set_page_dirty_lock() issue, like this (based against linux.git):
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..94721cc0093b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -664,10 +664,10 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
 
 	for_each_sgt_page(page, sgt_iter, pages) {
 		if (obj->mm.dirty)
-			set_page_dirty(page);
+			set_page_dirty_lock(page);
 
 		mark_page_accessed(page);
-		put_page(page);
+		put_user_page(page);
 	}
 	obj->mm.dirty = false;
That will leave you with your original set_page_dirty_lock() calls and everything works properly.
2) Next cycle, move to the new put_user_pages_dirty_lock().
thanks,
On 8/2/19 11:48 AM, John Hubbard wrote:
On 8/2/19 2:19 AM, Joonas Lahtinen wrote:
Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
From: John Hubbard jhubbard@nvidia.com
...
In order to deal with the merge problem, I'll drop this patch from my series, and I'd recommend that drm-intel-next take the following approach:
Actually, I just pulled the latest linux.git, and there are a few changes:
1) For now, s/put_page/put_user_page/ in i915_gem_userptr_put_pages(), and fix up the set_page_dirty() --> set_page_dirty_lock() issue, like this (based against linux.git):
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..94721cc0093b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -664,10 +664,10 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
 
 	for_each_sgt_page(page, sgt_iter, pages) {
 		if (obj->mm.dirty)
-			set_page_dirty(page);
+			set_page_dirty_lock(page);

I see you've already applied this fix to your tree; it's in linux.git now.

 		mark_page_accessed(page);
-		put_page(page);
+		put_user_page(page);
But this conversion still needs doing. So I'll repost a patch that only does this (plus the other call sites).
That can go in via either your tree, or Andrew's -mm tree, without generating any conflicts.
thanks,
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Christian König christian.koenig@amd.com
Cc: David (ChunMing) Zhou David1.Zhou@amd.com
Cc: David Airlie airlied@linux.ie
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index fb3696bc616d..4c9943fa10df 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -540,7 +540,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 	kfree(ttm->sg);
 
 release_pages:
-	release_pages(ttm->pages, pinned);
+	put_user_pages(ttm->pages, pinned);
 	return r;
 }
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Andy Walls awalls@md.metrocast.net
Cc: Mauro Carvalho Chehab mchehab@kernel.org
Cc: linux-media@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/media/pci/ivtv/ivtv-udma.c | 14 ++++----------
 drivers/media/pci/ivtv/ivtv-yuv.c | 10 +++-------
 2 files changed, 7 insertions(+), 17 deletions(-)
diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
index 5f8883031c9c..7c7f33c2412b 100644
--- a/drivers/media/pci/ivtv/ivtv-udma.c
+++ b/drivers/media/pci/ivtv/ivtv-udma.c
@@ -92,7 +92,7 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 {
 	struct ivtv_dma_page_info user_dma;
 	struct ivtv_user_dma *dma = &itv->udma;
-	int i, err;
+	int err;
 
 	IVTV_DEBUG_DMA("ivtv_udma_setup, dst: 0x%08x\n", (unsigned int)ivtv_dest_addr);
 
@@ -119,8 +119,7 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
 			   err, user_dma.page_count);
 		if (err >= 0) {
-			for (i = 0; i < err; i++)
-				put_page(dma->map[i]);
+			put_user_pages(dma->map, err);
 			return -EINVAL;
 		}
 		return err;
@@ -130,9 +129,7 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 
 	/* Fill SG List with new values */
 	if (ivtv_udma_fill_sg_list(dma, &user_dma, 0) < 0) {
-		for (i = 0; i < dma->page_count; i++) {
-			put_page(dma->map[i]);
-		}
+		put_user_pages(dma->map, dma->page_count);
 		dma->page_count = 0;
 		return -ENOMEM;
 	}
@@ -153,7 +150,6 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 void ivtv_udma_unmap(struct ivtv *itv)
 {
 	struct ivtv_user_dma *dma = &itv->udma;
-	int i;
 
 	IVTV_DEBUG_INFO("ivtv_unmap_user_dma\n");
 
@@ -170,9 +166,7 @@ void ivtv_udma_unmap(struct ivtv *itv)
 	ivtv_udma_sync_for_cpu(itv);
 
 	/* Release User Pages */
-	for (i = 0; i < dma->page_count; i++) {
-		put_page(dma->map[i]);
-	}
+	put_user_pages(dma->map, dma->page_count);
 	dma->page_count = 0;
 }
 
diff --git a/drivers/media/pci/ivtv/ivtv-yuv.c b/drivers/media/pci/ivtv/ivtv-yuv.c
index cd2fe2d444c0..9465a7d450b6 100644
--- a/drivers/media/pci/ivtv/ivtv-yuv.c
+++ b/drivers/media/pci/ivtv/ivtv-yuv.c
@@ -81,8 +81,7 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct ivtv_user_dma *dma,
 			uv_pages, uv_dma.page_count);
 
 		if (uv_pages >= 0) {
-			for (i = 0; i < uv_pages; i++)
-				put_page(dma->map[y_pages + i]);
+			put_user_pages(&dma->map[y_pages], uv_pages);
 			rc = -EFAULT;
 		} else {
 			rc = uv_pages;
@@ -93,8 +92,7 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct ivtv_user_dma *dma,
 				y_pages, y_dma.page_count);
 		}
 		if (y_pages >= 0) {
-			for (i = 0; i < y_pages; i++)
-				put_page(dma->map[i]);
+			put_user_pages(dma->map, y_pages);
 			/*
 			 * Inherit the -EFAULT from rc's
 			 * initialization, but allow it to be
@@ -112,9 +110,7 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct ivtv_user_dma *dma,
 	/* Fill & map SG List */
 	if (ivtv_udma_fill_sg_list (dma, &uv_dma, ivtv_udma_fill_sg_list (dma, &y_dma, 0)) < 0) {
 		IVTV_DEBUG_WARN("could not allocate bounce buffers for highmem userspace buffers\n");
-		for (i = 0; i < dma->page_count; i++) {
-			put_page(dma->map[i]);
-		}
+		put_user_pages(dma->map, dma->page_count);
 		dma->page_count = 0;
 		return -ENOMEM;
 	}
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Mauro Carvalho Chehab mchehab@kernel.org
Cc: Kees Cook keescook@chromium.org
Cc: Hans Verkuil hans.verkuil@cisco.com
Cc: Sakari Ailus sakari.ailus@linux.intel.com
Cc: Jan Kara jack@suse.cz
Cc: Robin Murphy robin.murphy@arm.com
Cc: Souptick Joarder jrdr.linux@gmail.com
Cc: Dan Williams dan.j.williams@intel.com
Cc: linux-media@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 66a6c6c236a7..d6eeb437ec19 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,7 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
 	BUG_ON(dma->sglen);
 
 	if (dma->pages) {
-		for (i = 0; i < dma->nr_pages; i++)
-			put_page(dma->pages[i]);
+		put_user_pages(dma->pages, dma->nr_pages);
 		kfree(dma->pages);
 		dma->pages = NULL;
 	}
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
This changes the release code slightly, because each page slot in the page_list[] array is no longer checked for NULL. However, that check was wrong anyway, because the get_user_pages() pattern of usage here never allowed for NULL entries within a range of pinned pages.
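To illustrate why the per-slot NULL check was unnecessary: get_user_pages*() returns the number of pages it actually pinned, and fills the array contiguously from index 0, so entries 0..ret-1 are always valid. A minimal sketch of that pattern (my_pin_all_or_fail() is a made-up name, not the genwqe code):

#include <linux/errno.h>
#include <linux/mm.h>

/* Illustrative only: get_user_pages_fast() fills pages[0..ret-1]
 * contiguously and never leaves NULL holes inside that range, so
 * releasing exactly 'ret' entries is safe without NULL checks.
 */
static int my_pin_all_or_fail(unsigned long start, int nr_pages,
			      struct page **pages)
{
	int ret = get_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);

	if (ret < 0)
		return ret;
	if (ret < nr_pages) {
		/* Partial pin: release exactly the pages that were pinned. */
		put_user_pages(pages, ret);
		return -EFAULT;
	}
	return 0;
}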
Cc: Frank Haverkamp haver@linux.vnet.ibm.com
Cc: "Guilherme G. Piccoli" gpiccoli@linux.vnet.ibm.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/misc/genwqe/card_utils.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
index 2e1c4d2905e8..2a888f31d2c5 100644
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -517,24 +517,13 @@ int genwqe_free_sync_sgl(struct genwqe_dev *cd, struct genwqe_sgl *sgl)
 /**
  * genwqe_free_user_pages() - Give pinned pages back
  *
- * Documentation of get_user_pages is in mm/gup.c:
- *
- * If the page is written to, set_page_dirty (or set_page_dirty_lock,
- * as appropriate) must be called after the page is finished with, and
- * before put_page is called.
+ * The pages may have been written to, so we call put_user_pages_dirty_lock(),
+ * rather than put_user_pages().
  */
 static int genwqe_free_user_pages(struct page **page_list,
 				  unsigned int nr_pages, int dirty)
 {
-	unsigned int i;
-
-	for (i = 0; i < nr_pages; i++) {
-		if (page_list[i] != NULL) {
-			if (dirty)
-				set_page_dirty_lock(page_list[i]);
-			put_page(page_list[i]);
-		}
-	}
+	put_user_pages_dirty_lock(page_list, nr_pages, dirty);
 	return 0;
 }
On Thu, Aug 01, 2019 at 07:19:41PM -0700, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
This changes the release code slightly, because each page slot in the page_list[] array is no longer checked for NULL. However, that check was wrong anyway, because the get_user_pages() pattern of usage here never allowed for NULL entries within a range of pinned pages.
Cc: Frank Haverkamp haver@linux.vnet.ibm.com
Cc: "Guilherme G. Piccoli" gpiccoli@linux.vnet.ibm.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/misc/genwqe/card_utils.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)
Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Sudeep Dutt sudeep.dutt@intel.com
Cc: Ashutosh Dixit ashutosh.dixit@intel.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: Joerg Roedel jroedel@suse.de
Cc: Robin Murphy robin.murphy@arm.com
Cc: Zhen Lei thunder.leizhen@huawei.com
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/misc/mic/scif/scif_rma.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
index 01e27682ea30..d84ed9466920 100644
--- a/drivers/misc/mic/scif/scif_rma.c
+++ b/drivers/misc/mic/scif/scif_rma.c
@@ -113,13 +113,14 @@ static int scif_destroy_pinned_pages(struct scif_pinned_pages *pin)
 	int writeable = pin->prot & SCIF_PROT_WRITE;
 	int kernel = SCIF_MAP_KERNEL & pin->map_flags;
 
-	for (j = 0; j < pin->nr_pages; j++) {
-		if (pin->pages[j] && !kernel) {
+	if (kernel) {
+		for (j = 0; j < pin->nr_pages; j++) {
 			if (writeable)
-				SetPageDirty(pin->pages[j]);
+				set_page_dirty_lock(pin->pages[j]);
 			put_page(pin->pages[j]);
 		}
-	}
+	} else
+		put_user_pages_dirty_lock(pin->pages, pin->nr_pages, writeable);
 
 	scif_free(pin->pages, pin->nr_pages * sizeof(*pin->pages));
@@ -1385,11 +1386,9 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
 				if (ulimit)
 					__scif_dec_pinned_vm_lock(mm, nr_pages);
 				/* Roll back any pinned pages */
-				for (i = 0; i < pinned_pages->nr_pages; i++) {
-					if (pinned_pages->pages[i])
-						put_page(
-						pinned_pages->pages[i]);
-				}
+				put_user_pages(pinned_pages->pages,
+					       pinned_pages->nr_pages);
+
 				prot &= ~SCIF_PROT_WRITE;
 				try_upgrade = false;
 				goto retry;
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Note that this effectively changes the code's behavior in qp_release_pages(): it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate.
As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1]
[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de
Cc: Arnd Bergmann arnd@arndb.de
Cc: Al Viro viro@zeniv.linux.org.uk
Cc: "Gustavo A. R. Silva" gustavo@embeddedor.com
Cc: Kees Cook keescook@chromium.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/misc/vmw_vmci/vmci_context.c | 2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 11 ++---------
 2 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/misc/vmw_vmci/vmci_context.c b/drivers/misc/vmw_vmci/vmci_context.c
index 16695366ec92..9daa52ee63b7 100644
--- a/drivers/misc/vmw_vmci/vmci_context.c
+++ b/drivers/misc/vmw_vmci/vmci_context.c
@@ -587,7 +587,7 @@ void vmci_ctx_unset_notify(struct vmci_ctx *context)
 
 	if (notify_page) {
 		kunmap(notify_page);
-		put_page(notify_page);
+		put_user_page(notify_page);
 	}
 }
 
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index 8531ae781195..e5434551d0ef 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -626,15 +626,8 @@ static void qp_release_queue_mutex(struct vmci_queue *queue)
 static void qp_release_pages(struct page **pages,
 			     u64 num_pages, bool dirty)
 {
-	int i;
-
-	for (i = 0; i < num_pages; i++) {
-		if (dirty)
-			set_page_dirty(pages[i]);
-
-		put_page(pages[i]);
-		pages[i] = NULL;
-	}
+	put_user_pages_dirty_lock(pages, num_pages, dirty);
+	memset(pages, 0, num_pages * sizeof(struct page *));
 }
 
 /*
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Matt Porter mporter@kernel.crashing.org
Cc: Alexandre Bounine alex.bou9@gmail.com
Cc: Al Viro viro@zeniv.linux.org.uk
Cc: Logan Gunthorpe logang@deltatee.com
Cc: Christophe JAILLET christophe.jaillet@wanadoo.fr
Cc: Ioan Nicu ioan.nicu.ext@nokia.com
Cc: Kees Cook keescook@chromium.org
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/rapidio/devices/rio_mport_cdev.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index 8155f59ece38..0e8ea0e5a89e 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -572,14 +572,12 @@ static void dma_req_free(struct kref *ref)
 	struct mport_dma_req *req = container_of(ref, struct mport_dma_req,
 			refcount);
 	struct mport_cdev_priv *priv = req->priv;
-	unsigned int i;
 
 	dma_unmap_sg(req->dmach->device->dev,
 		     req->sgt.sgl, req->sgt.nents, req->dir);
 	sg_free_table(&req->sgt);
 	if (req->page_list) {
-		for (i = 0; i < req->nr_pages; i++)
-			put_page(req->page_list[i]);
+		put_user_pages(req->page_list, req->nr_pages);
 		kfree(req->page_list);
 	}
 
@@ -815,7 +813,7 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
 	struct mport_dma_req *req;
 	struct mport_dev *md = priv->md;
 	struct dma_chan *chan;
-	int i, ret;
+	int ret;
 	int nents;
 
 	if (xfer->length == 0)
@@ -946,8 +944,7 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
 
 err_pg:
 	if (!req->page_list) {
-		for (i = 0; i < nr_pages; i++)
-			put_page(page_list[i]);
+		put_user_pages(page_list, nr_pages);
 		kfree(page_list);
 	}
 err_req:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: David S. Miller davem@davemloft.net
Cc: Jonathan Helman jonathan.helman@oracle.com
Cc: Rob Gardner rob.gardner@oracle.com
Cc: Andy Shevchenko andy.shevchenko@gmail.com
Cc: Jonathan Corbet corbet@lwn.net
Cc: Wei Yongjun weiyongjun1@huawei.com
Cc: Mauro Carvalho Chehab mchehab+samsung@kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/sbus/char/oradax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/sbus/char/oradax.c b/drivers/sbus/char/oradax.c
index 8af216287a84..029e619992fc 100644
--- a/drivers/sbus/char/oradax.c
+++ b/drivers/sbus/char/oradax.c
@@ -412,7 +412,7 @@ static void dax_unlock_pages(struct dax_ctx *ctx, int ccb_index, int nelem)
 				dax_dbg("freeing page %p", p);
 				if (j == OUT)
 					set_page_dirty(p);
-				put_page(p);
+				put_user_page(p);
 				ctx->pages[i][j] = NULL;
 			}
 		}
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Eric Anholt eric@anholt.net
Cc: Stefan Wahren stefan.wahren@i2se.com
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Mihaela Muraru mihaela.muraru21@gmail.com
Cc: Suniel Mahesh sunil.m@techveda.org
Cc: Al Viro viro@zeniv.linux.org.uk
Cc: Sidong Yang realwakka@gmail.com
Cc: Kishore KP kishore.p@techveda.org
Cc: linux-rpi-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: devel@driverdev.osuosl.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 .../vc04_services/interface/vchiq_arm/vchiq_2835_arm.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
index 61c69f353cdb..ec92b4c50e95 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
@@ -336,10 +336,7 @@ cleanup_pagelistinfo(struct vchiq_pagelist_info *pagelistinfo)
 	}
 
 	if (pagelistinfo->pages_need_release) {
-		unsigned int i;
-
-		for (i = 0; i < pagelistinfo->num_pages; i++)
-			put_page(pagelistinfo->pages[i]);
+		put_user_pages(pagelistinfo->pages, pagelistinfo->num_pages);
 	}
 
 	dma_free_coherent(g_dev, pagelistinfo->pagelist_buffer_size,
@@ -454,10 +451,7 @@ create_pagelist(char __user *buf, size_t count, unsigned short type)
 			       __func__, actual_pages, num_pages);
 
 		/* This is probably due to the process being killed */
-		while (actual_pages > 0) {
-			actual_pages--;
-			put_page(pages[actual_pages]);
-		}
+		put_user_pages(pages, actual_pages);
 		cleanup_pagelistinfo(pagelistinfo);
 		return NULL;
 	}
On Thu, Aug 01, 2019 at 07:19:46PM -0700, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Eric Anholt eric@anholt.net
Cc: Stefan Wahren stefan.wahren@i2se.com
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Mihaela Muraru mihaela.muraru21@gmail.com
Cc: Suniel Mahesh sunil.m@techveda.org
Cc: Al Viro viro@zeniv.linux.org.uk
Cc: Sidong Yang realwakka@gmail.com
Cc: Kishore KP kishore.p@techveda.org
Cc: linux-rpi-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: devel@driverdev.osuosl.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 .../vc04_services/interface/vchiq_arm/vchiq_2835_arm.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Jens Wiklander jens.wiklander@linaro.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/tee/tee_shm.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 2da026fd12c9..c967d0420b67 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -31,16 +31,13 @@ static void tee_shm_release(struct tee_shm *shm)
 
 		poolm->ops->free(poolm, shm);
 	} else if (shm->flags & TEE_SHM_REGISTER) {
-		size_t n;
 		int rc = teedev->desc->ops->shm_unregister(shm->ctx, shm);
 
 		if (rc)
 			dev_err(teedev->dev.parent,
 				"unregister shm %p failed: %d", shm, rc);
 
-		for (n = 0; n < shm->num_pages; n++)
-			put_page(shm->pages[n]);
-
+		put_user_pages(shm->pages, shm->num_pages);
 		kfree(shm->pages);
 	}
 
@@ -313,16 +310,13 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr,
 	return shm;
 err:
 	if (shm) {
-		size_t n;
-
 		if (shm->id >= 0) {
 			mutex_lock(&teedev->mutex);
 			idr_remove(&teedev->idr, shm->id);
 			mutex_unlock(&teedev->mutex);
 		}
 		if (shm->pages) {
-			for (n = 0; n < shm->num_pages; n++)
-				put_page(shm->pages[n]);
+			put_user_pages(shm->pages, shm->num_pages);
 			kfree(shm->pages);
 		}
 	}
On Fri, Aug 2, 2019 at 4:20 AM john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Jens Wiklander jens.wiklander@linaro.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/tee/tee_shm.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
Acked-by: Jens Wiklander jens.wiklander@linaro.org
I suppose you're taking this via your own tree or such.
Thanks, Jens
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 2da026fd12c9..c967d0420b67 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -31,16 +31,13 @@ static void tee_shm_release(struct tee_shm *shm)
 
 		poolm->ops->free(poolm, shm);
 	} else if (shm->flags & TEE_SHM_REGISTER) {
-		size_t n;
 		int rc = teedev->desc->ops->shm_unregister(shm->ctx, shm);
 
 		if (rc)
 			dev_err(teedev->dev.parent,
 				"unregister shm %p failed: %d", shm, rc);
 
-		for (n = 0; n < shm->num_pages; n++)
-			put_page(shm->pages[n]);
-
+		put_user_pages(shm->pages, shm->num_pages);
 		kfree(shm->pages);
 	}
 
@@ -313,16 +310,13 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr,
 	return shm;
 err:
 	if (shm) {
-		size_t n;
-
 		if (shm->id >= 0) {
 			mutex_lock(&teedev->mutex);
 			idr_remove(&teedev->idr, shm->id);
 			mutex_unlock(&teedev->mutex);
 		}
 		if (shm->pages) {
-			for (n = 0; n < shm->num_pages; n++)
-				put_page(shm->pages[n]);
+			put_user_pages(shm->pages, shm->num_pages);
 			kfree(shm->pages);
 		}
 	}
--
2.22.0
On 8/1/19 11:29 PM, Jens Wiklander wrote:
On Fri, Aug 2, 2019 at 4:20 AM john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Jens Wiklander jens.wiklander@linaro.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/tee/tee_shm.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
Acked-by: Jens Wiklander jens.wiklander@linaro.org
I suppose you're taking this via your own tree or such.
Hi Jens,
Thanks for the ACK! I'm expecting that Andrew will take this through his -mm tree, unless he pops up and says otherwise.
thanks,
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Note that this effectively changes the code's behavior in put_pfn(): it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate.
As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1]
[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de
Cc: Alex Williamson alex.williamson@redhat.com
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/vfio/vfio_iommu_type1.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f30fa8..5a5461a14299 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -320,9 +320,9 @@ static int put_pfn(unsigned long pfn, int prot)
 {
 	if (!is_invalid_reserved_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
-		if (prot & IOMMU_WRITE)
-			SetPageDirty(page);
-		put_page(page);
+		bool dirty = prot & IOMMU_WRITE;
+
+		put_user_pages_dirty_lock(&page, 1, dirty);
 		return 1;
 	}
 	return 0;
@@ -356,7 +356,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 		 */
 		if (ret > 0 && vma_is_fsdax(vmas[0])) {
 			ret = -EOPNOTSUPP;
-			put_page(page[0]);
+			put_user_page(page[0]);
 		}
 	}
 	up_read(&mm->mmap_sem);
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com
Cc: Kees Cook keescook@chromium.org
Cc: Al Viro viro@zeniv.linux.org.uk
Cc: Bhumika Goyal bhumirks@gmail.com
Cc: Arvind Yadav arvind.yadav.cs@gmail.com
Cc: dri-devel@lists.freedesktop.org
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: John Hubbard jhubbard@nvidia.com
---
 drivers/video/fbdev/pvr2fb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index 7ff4b6b84282..0e4f9aa6444d 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -700,8 +700,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
 	ret = count;
 
 out_unmap:
-	for (i = 0; i < nr_pages; i++)
-		put_page(pages[i]);
+	put_user_pages(pages, nr_pages);
 
 	kfree(pages);
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
This changes the release code slightly, because each page slot in the page_list[] array is no longer checked for NULL. However, that check was wrong anyway, because the get_user_pages() pattern of usage here never allowed for NULL entries within a range of pinned pages.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: Kees Cook keescook@chromium.org Cc: Rob Herring robh@kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- drivers/virt/fsl_hypervisor.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/virt/fsl_hypervisor.c b/drivers/virt/fsl_hypervisor.c index 93d5bebf9572..a8f78d572c45 100644 --- a/drivers/virt/fsl_hypervisor.c +++ b/drivers/virt/fsl_hypervisor.c @@ -292,11 +292,8 @@ static long ioctl_memcpy(struct fsl_hv_ioctl_memcpy __user *p) virt_to_phys(sg_list), num_pages);
exit: - if (pages) { - for (i = 0; i < num_pages; i++) - if (pages[i]) - put_page(pages[i]); - } + if (pages) + put_user_pages(pages, num_pages);
kfree(sg_list_unaligned); kfree(pages);
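To illustrate the pinning contract that the commit message relies on: get_user_pages*() fills the array contiguously from index 0 and returns the number of pages actually pinned, so there are never NULL holes within the pinned range. A minimal sketch of the expected usage (illustrative names, not this driver's code):

#include <linux/mm.h>

/* Illustrative only: after a successful pin, entries [0, ret) are valid. */
static int pin_use_release(unsigned long uaddr, int nr, struct page **pages)
{
	int ret;

	ret = get_user_pages_fast(uaddr, nr, FOLL_WRITE, pages);
	if (ret <= 0)
		return ret ? ret : -EFAULT;

	/* ... use pages[0 .. ret-1] ... */

	/* Release only what was pinned; no per-entry NULL checks needed. */
	put_user_pages(pages, ret);
	return 0;
}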
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- drivers/xen/gntdev.c | 5 +---- drivers/xen/privcmd.c | 7 +------ 2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 4c339c7e66e5..2586b3df2bb6 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -864,10 +864,7 @@ static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
static void gntdev_put_pages(struct gntdev_copy_batch *batch) { - unsigned int i; - - for (i = 0; i < batch->nr_pages; i++) - put_page(batch->pages[i]); + put_user_pages(batch->pages, batch->nr_pages); batch->nr_pages = 0; }
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 2f5ce7230a43..29e461dbee2d 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -611,15 +611,10 @@ static int lock_pages(
static void unlock_pages(struct page *pages[], unsigned int nr_pages) { - unsigned int i; - if (!pages) return;
- for (i = 0; i < nr_pages; i++) { - if (pages[i]) - put_page(pages[i]); - } + put_user_pages(pages, nr_pages); }
static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
On 02.08.19 04:19, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Signed-off-by: John Hubbard jhubbard@nvidia.com
drivers/xen/gntdev.c | 5 +---- drivers/xen/privcmd.c | 7 +------ 2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 4c339c7e66e5..2586b3df2bb6 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -864,10 +864,7 @@ static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
static void gntdev_put_pages(struct gntdev_copy_batch *batch) {
- unsigned int i;
- for (i = 0; i < batch->nr_pages; i++)
- put_page(batch->pages[i]);
+ put_user_pages(batch->pages, batch->nr_pages); batch->nr_pages = 0; }
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 2f5ce7230a43..29e461dbee2d 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -611,15 +611,10 @@ static int lock_pages(
static void unlock_pages(struct page *pages[], unsigned int nr_pages) {
- unsigned int i;
-
if (!pages) return;
- for (i = 0; i < nr_pages; i++) {
- if (pages[i])
- put_page(pages[i]);
- }
+ put_user_pages(pages, nr_pages);
You are not handling the case where pages[i] is NULL here. Or am I missing a pending patch to put_user_pages() here?
Juergen
On 8/1/19 9:36 PM, Juergen Gross wrote:
On 02.08.19 04:19, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
...
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 2f5ce7230a43..29e461dbee2d 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -611,15 +611,10 @@ static int lock_pages( static void unlock_pages(struct page *pages[], unsigned int nr_pages) { - unsigned int i;
if (!pages) return; - for (i = 0; i < nr_pages; i++) { - if (pages[i]) - put_page(pages[i]); - } + put_user_pages(pages, nr_pages);
You are not handling the case where pages[i] is NULL here. Or am I missing a pending patch to put_user_pages() here?
Hi Juergen,
You are correct--this no longer handles the cases where pages[i] is NULL. It's intentional, though possibly wrong. :)
I see that I should have added my standard blurb to this commit description. I missed this one, but some of the other patches have it. It makes the following, possibly incorrect claim:
"This changes the release code slightly, because each page slot in the page_list[] array is no longer checked for NULL. However, that check was wrong anyway, because the get_user_pages() pattern of usage here never allowed for NULL entries within a range of pinned pages."
The way I've seen these page arrays used with get_user_pages(), things are either done one page at a time, or with a contiguous range. So unless I'm missing a case where someone is either
a) releasing individual pages within a range (and thus likely messing up their count of pinned pages), or
b) allocating two gup ranges within the same pages[] array, with a gap between the allocations,
...then it should be correct. If so, then I'll add the above blurb to this patch's commit description.
If that's not the case (both here, and in 3 or 4 other patches in this series), then as you said, I should add NULL checks to put_user_pages() and put_user_pages_dirty_lock().
thanks,
On 02.08.19 07:48, John Hubbard wrote:
On 8/1/19 9:36 PM, Juergen Gross wrote:
On 02.08.19 04:19, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
...
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 2f5ce7230a43..29e461dbee2d 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -611,15 +611,10 @@ static int lock_pages( static void unlock_pages(struct page *pages[], unsigned int nr_pages) { - unsigned int i;
if (!pages) return; - for (i = 0; i < nr_pages; i++) { - if (pages[i]) - put_page(pages[i]); - } + put_user_pages(pages, nr_pages);
You are not handling the case where pages[i] is NULL here. Or am I missing a pending patch to put_user_pages() here?
Hi Juergen,
You are correct--this no longer handles the cases where pages[i] is NULL. It's intentional, though possibly wrong. :)
I see that I should have added my standard blurb to this commit description. I missed this one, but some of the other patches have it. It makes the following, possibly incorrect claim:
"This changes the release code slightly, because each page slot in the page_list[] array is no longer checked for NULL. However, that check was wrong anyway, because the get_user_pages() pattern of usage here never allowed for NULL entries within a range of pinned pages."
The way I've seen these page arrays used with get_user_pages(), things are either done one page at a time, or with a contiguous range. So unless I'm missing a case where someone is either
a) releasing individual pages within a range (and thus likely messing up their count of pinned pages), or
b) allocating two gup ranges within the same pages[] array, with a gap between the allocations,
...then it should be correct. If so, then I'll add the above blurb to this patch's commit description.
If that's not the case (both here, and in 3 or 4 other patches in this series), then as you said, I should add NULL checks to put_user_pages() and put_user_pages_dirty_lock().
In this case it is not correct, but can easily be handled. The NULL case can occur only in an error case with the pages array filled partially or not at all.
I'd prefer something like the attached patch here.
Juergen
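The attachment itself is not reproduced in this thread. A plausible sketch of the approach it describes, under that assumption: have lock_pages() report how many pages were actually pinned, so that unlock_pages() releases exactly that many, even after a partial pin in an error path.

#include <linux/errno.h>
#include <linux/mm.h>

/*
 * Plausible shape of the fix (assumption; the actual attachment is not
 * shown): track the number of pages pinned so far, and release only those.
 */
static int lock_pages(unsigned long uaddr, unsigned int nr_pages,
		      struct page *pages[], unsigned int *pinned)
{
	int rc = get_user_pages_fast(uaddr, nr_pages, FOLL_WRITE, pages);

	if (rc < 0)
		return rc;

	*pinned = rc;			/* may be a partial pin on error */
	return rc == nr_pages ? 0 : -ENOSPC;
}

static void unlock_pages(struct page *pages[], unsigned int pinned)
{
	/* Entries [0, pinned) are all valid: no NULL checks needed. */
	put_user_pages(pages, pinned);
}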
On 02.08.19 07:48, John Hubbard wrote:
On 8/1/19 9:36 PM, Juergen Gross wrote:
On 02.08.19 04:19, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
...
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 2f5ce7230a43..29e461dbee2d 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -611,15 +611,10 @@ static int lock_pages( static void unlock_pages(struct page *pages[], unsigned int nr_pages) { - unsigned int i;
if (!pages) return; - for (i = 0; i < nr_pages; i++) { - if (pages[i]) - put_page(pages[i]); - } + put_user_pages(pages, nr_pages);
You are not handling the case where pages[i] is NULL here. Or am I missing a pending patch to put_user_pages() here?
Hi Juergen,
You are correct--this no longer handles the cases where pages[i] is NULL. It's intentional, though possibly wrong. :)
I see that I should have added my standard blurb to this commit description. I missed this one, but some of the other patches have it. It makes the following, possibly incorrect claim:
"This changes the release code slightly, because each page slot in the page_list[] array is no longer checked for NULL. However, that check was wrong anyway, because the get_user_pages() pattern of usage here never allowed for NULL entries within a range of pinned pages."
The way I've seen these page arrays used with get_user_pages(), things are either done one page at a time, or with a contiguous range. So unless I'm missing a case where someone is either
a) releasing individual pages within a range (and thus likely messing up their count of pinned pages), or
b) allocating two gup ranges within the same pages[] array, with a gap between the allocations,
...then it should be correct. If so, then I'll add the above blurb to this patch's commit description.
If that's not the case (both here, and in 3 or 4 other patches in this series), then as you said, I should add NULL checks to put_user_pages() and put_user_pages_dirty_lock().
In this case it is not correct, but can easily be handled. The NULL case can occur only in an error case with the pages array filled partially or not at all.
I'd prefer something like the attached patch here.
I'm not an expert in this code and have not looked at it carefully, but that patch does seem to be a better fix than forcing NULL checks on everyone.
Ira
On 8/2/19 9:09 AM, Weiny, Ira wrote:
On 02.08.19 07:48, John Hubbard wrote:
On 8/1/19 9:36 PM, Juergen Gross wrote:
On 02.08.19 04:19, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
... If that's not the case (both here, and in 3 or 4 other patches in this series), then as you said, I should add NULL checks to put_user_pages() and put_user_pages_dirty_lock().
In this case it is not correct, but can easily be handled. The NULL case can occur only in an error case with the pages array filled partially or not at all.
I'd prefer something like the attached patch here.
I'm not an expert in this code and have not looked at it carefully, but that patch does seem to be a better fix than forcing NULL checks on everyone.
OK, I'll use Juergen's approach, and also check for that pattern in the other patches.
thanks,
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- fs/exec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index f7f6a140856a..ee442151582f 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -227,7 +227,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
static void put_arg_page(struct page *page) { - put_page(page); + put_user_page(page); }
static void free_arg_pages(struct linux_binprm *bprm)
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Mike Marshall hubcap@omnibond.com Cc: Martin Brandenburg martin@omnibond.com Cc: devel@lists.orangefs.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- fs/orangefs/orangefs-bufmap.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c index 2bb916d68576..f2f33a16d604 100644 --- a/fs/orangefs/orangefs-bufmap.c +++ b/fs/orangefs/orangefs-bufmap.c @@ -168,10 +168,7 @@ static DEFINE_SPINLOCK(orangefs_bufmap_lock); static void orangefs_bufmap_unmap(struct orangefs_bufmap *bufmap) { - int i; - - for (i = 0; i < bufmap->page_count; i++) - put_page(bufmap->page_array[i]); + put_user_pages(bufmap->page_array, bufmap->page_count); }
static void @@ -280,7 +277,7 @@ orangefs_bufmap_map(struct orangefs_bufmap *bufmap,
for (i = 0; i < ret; i++) { SetPageError(bufmap->page_array[i]); - put_page(bufmap->page_array[i]); + put_user_page(bufmap->page_array[i]); } return -ENOMEM; }
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@redhat.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Namhyung Kim namhyung@kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- kernel/events/uprobes.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 84fa00497c49..4a575de8cec8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -397,7 +397,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d) ret = 0; out: kunmap_atomic(kaddr); - put_page(page); + put_user_page(page); return ret; }
@@ -504,7 +504,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, ret = __replace_page(vma, vaddr, old_page, new_page); put_page(new_page); put_old: - put_page(old_page); + put_user_page(old_page);
if (unlikely(ret == -EAGAIN)) goto retry; @@ -1981,7 +1981,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr) return result;
copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); - put_page(page); + put_user_page(page); out: /* This needs to return true for any variant of the trap insn */ return is_trap_insn(&opcode);
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Darren Hart dvhart@infradead.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- kernel/futex.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c index 6d50728ef2e7..4b4cae58ec57 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -623,7 +623,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a lock_page(page); shmem_swizzled = PageSwapCache(page) || page->mapping; unlock_page(page); - put_page(page); + put_user_page(page);
if (shmem_swizzled) goto again; @@ -675,7 +675,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
if (READ_ONCE(page->mapping) != mapping) { rcu_read_unlock(); - put_page(page); + put_user_page(page);
goto again; } @@ -683,7 +683,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a inode = READ_ONCE(mapping->host); if (!inode) { rcu_read_unlock(); - put_page(page); + put_user_page(page);
goto again; } @@ -702,7 +702,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a */ if (!atomic_inc_not_zero(&inode->i_count)) { rcu_read_unlock(); - put_page(page); + put_user_page(page);
goto again; } @@ -723,7 +723,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a }
out: - put_page(page); + put_user_page(page); return err; }
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Dan Williams dan.j.williams@intel.com Cc: Jan Kara jack@suse.cz Cc: Mel Gorman mgorman@suse.de Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: John Hubbard jhubbard@nvidia.com --- mm/frame_vector.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/frame_vector.c b/mm/frame_vector.c index c64dca6e27c2..f590badac776 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -120,7 +120,6 @@ EXPORT_SYMBOL(get_vaddr_frames); */ void put_vaddr_frames(struct frame_vector *vec) { - int i; struct page **pages;
if (!vec->got_ref) @@ -133,8 +132,7 @@ void put_vaddr_frames(struct frame_vector *vec) */ if (WARN_ON(IS_ERR(pages))) goto out; - for (i = 0; i < vec->nr_frames; i++) - put_page(pages[i]); + put_user_pages(pages, vec->nr_frames); vec->got_ref = false; out: vec->nr_frames = 0;
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Dan Carpenter dan.carpenter@oracle.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Keith Busch keith.busch@intel.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Michael S. Tsirkin mst@redhat.com Cc: YueHaibing yuehaibing@huawei.com Signed-off-by: John Hubbard jhubbard@nvidia.com --- mm/gup_benchmark.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 7dd602d7f8db..515ac8eeb6ee 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -79,7 +79,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd, for (i = 0; i < nr_pages; i++) { if (!pages[i]) break; - put_page(pages[i]); + put_user_page(pages[i]); } end_time = ktime_get(); gup->put_delta_usec = ktime_us_delta(end_time, start_time);
On Thu, Aug 01, 2019 at 07:19:57PM -0700, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Dan Carpenter dan.carpenter@oracle.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Keith Busch keith.busch@intel.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Michael S. Tsirkin mst@redhat.com Cc: YueHaibing yuehaibing@huawei.com Signed-off-by: John Hubbard jhubbard@nvidia.com
Looks fine.
Reviewed-by: Keith Busch keith.busch@intel.com
mm/gup_benchmark.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c index 7dd602d7f8db..515ac8eeb6ee 100644 --- a/mm/gup_benchmark.c +++ b/mm/gup_benchmark.c @@ -79,7 +79,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd, for (i = 0; i < nr_pages; i++) { if (!pages[i]) break;
- put_page(pages[i]);
+ put_user_page(pages[i]); } end_time = ktime_get(); gup->put_delta_usec = ktime_us_delta(end_time, start_time);
--
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Huang Ying ying.huang@intel.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Peter Zijlstra peterz@infradead.org Cc: Rik van Riel riel@surriel.com Cc: Souptick Joarder jrdr.linux@gmail.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: John Hubbard jhubbard@nvidia.com --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c index e2bb51b6242e..8870968496ea 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4337,7 +4337,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, buf, maddr + offset, bytes); } kunmap(page); - put_page(page); + put_user_page(page); } len -= bytes; buf += bytes;
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Dan Williams dan.j.williams@intel.com Cc: Daniel Black daniel@linux.ibm.com Cc: Jan Kara jack@suse.cz Cc: Jérôme Glisse jglisse@redhat.com Cc: Matthew Wilcox willy@infradead.org Cc: Mike Kravetz mike.kravetz@oracle.com Signed-off-by: John Hubbard jhubbard@nvidia.com --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c index 968df3aa069f..1c6881a761a5 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -672,7 +672,7 @@ static int madvise_inject_error(int behavior, * routine is responsible for pinning the page to prevent it * from being released back to the page allocator. */ - put_page(page); + put_user_page(page); ret = memory_failure(pfn, 0); if (ret) return ret;
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Al Viro viro@zeniv.linux.org.uk Cc: Andrea Arcangeli aarcange@redhat.com Cc: Christopher Yeoh cyeoh@au1.ibm.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: Ingo Molnar mingo@kernel.org Cc: Jann Horn jann@thejh.net Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: Rashika Kheria rashika.kheria@gmail.com Signed-off-by: John Hubbard jhubbard@nvidia.com --- mm/process_vm_access.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c index 357aa7bef6c0..4d29d54ec93f 100644 --- a/mm/process_vm_access.c +++ b/mm/process_vm_access.c @@ -96,7 +96,7 @@ static int process_vm_rw_single_vec(unsigned long addr, flags |= FOLL_WRITE;
while (!rc && nr_pages && iov_iter_count(iter)) { - int pages = min(nr_pages, max_pages_per_loop); + int pinned_pages = min(nr_pages, max_pages_per_loop); int locked = 1; size_t bytes;
@@ -106,14 +106,15 @@ static int process_vm_rw_single_vec(unsigned long addr, * current/current->mm */ down_read(&mm->mmap_sem); - pages = get_user_pages_remote(task, mm, pa, pages, flags, - process_pages, NULL, &locked); + pinned_pages = get_user_pages_remote(task, mm, pa, pinned_pages, + flags, process_pages, NULL, + &locked); if (locked) up_read(&mm->mmap_sem); - if (pages <= 0) + if (pinned_pages <= 0) return -EFAULT;
- bytes = pages * PAGE_SIZE - start_offset; + bytes = pinned_pages * PAGE_SIZE - start_offset; if (bytes > len) bytes = len;
@@ -122,10 +123,9 @@ static int process_vm_rw_single_vec(unsigned long addr, vm_write); len -= bytes; start_offset = 0; - nr_pages -= pages; - pa += pages * PAGE_SIZE; - while (pages) - put_page(process_pages[--pages]); + nr_pages -= pinned_pages; + pa += pinned_pages * PAGE_SIZE; + put_user_pages(process_pages, pinned_pages); }
return rc;
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Herbert Xu herbert@gondor.apana.org.au Cc: David S. Miller davem@davemloft.net Cc: linux-crypto@vger.kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- crypto/af_alg.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 879cf23f7489..edd358ea64da 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -428,10 +428,7 @@ static void af_alg_link_sg(struct af_alg_sgl *sgl_prev,
void af_alg_free_sg(struct af_alg_sgl *sgl) { - int i; - - for (i = 0; i < sgl->npages; i++) - put_page(sgl->pages[i]); + put_user_pages(sgl->pages, sgl->npages); } EXPORT_SYMBOL_GPL(af_alg_free_sg);
@@ -668,7 +665,7 @@ static void af_alg_free_areq_sgls(struct af_alg_async_req *areq) for_each_sg(tsgl, sg, areq->tsgl_entries, i) { if (!sg_page(sg)) continue; - put_page(sg_page(sg)); + put_user_page(sg_page(sg)); }
sock_kfree_s(sk, tsgl, areq->tsgl_entries * sizeof(*tsgl));
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Trond Myklebust trond.myklebust@hammerspace.com Cc: Anna Schumaker anna.schumaker@netapp.com Cc: linux-nfs@vger.kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- fs/nfs/direct.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 0cb442406168..b00b89dda3c5 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -278,9 +278,7 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
static void nfs_direct_release_pages(struct page **pages, unsigned int npages) { - unsigned int i; - for (i = 0; i < npages; i++) - put_page(pages[i]); + put_user_pages(pages, npages); }
void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
On 02/08/2019 3:20 am, john.hubbard@gmail.com wrote:
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Trond Myklebust trond.myklebust@hammerspace.com Cc: Anna Schumaker anna.schumaker@netapp.com Cc: linux-nfs@vger.kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com
fs/nfs/direct.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 0cb442406168..b00b89dda3c5 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -278,9 +278,7 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
static void nfs_direct_release_pages(struct page **pages, unsigned int npages) {
- unsigned int i;
- for (i = 0; i < npages; i++)
- put_page(pages[i]);
+ put_user_pages(pages, npages); }
Since it's static, and only called twice, might it be better to change its two callers [nfs_direct_{read,write}_schedule_iovec()] to call put_user_pages() directly, and remove nfs_direct_release_pages() entirely?
thanks, calum.
void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
On 8/2/19 6:27 PM, Calum Mackay wrote:
On 02/08/2019 3:20 am, john.hubbard@gmail.com wrote:
...
Since it's static, and only called twice, might it be better to change its two callers [nfs_direct_{read,write}_schedule_iovec()] to call put_user_pages() directly, and remove nfs_direct_release_pages() entirely?
thanks, calum.
void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
Hi Calum,
Absolutely! Is it OK to add your reviewed-by, with the following incremental patch made to this one?
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index b00b89dda3c5..c0c1b9f2c069 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -276,11 +276,6 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return nfs_file_direct_write(iocb, iter); }
-static void nfs_direct_release_pages(struct page **pages, unsigned int npages) -{ - put_user_pages(pages, npages); -} - void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo, struct nfs_direct_req *dreq) { @@ -510,7 +505,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; } - nfs_direct_release_pages(pagevec, npages); + put_user_pages(pagevec, npages); kvfree(pagevec); if (result < 0) break; @@ -933,7 +928,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; } - nfs_direct_release_pages(pagevec, npages); + put_user_pages(pagevec, npages); kvfree(pagevec); if (result < 0) break;
thanks,
On 03/08/2019 2:41 am, John Hubbard wrote:
On 8/2/19 6:27 PM, Calum Mackay wrote:
On 02/08/2019 3:20 am, john.hubbard@gmail.com wrote:
...
Since it's static, and only called twice, might it be better to change its two callers [nfs_direct_{read,write}_schedule_iovec()] to call put_user_pages() directly, and remove nfs_direct_release_pages() entirely?
thanks, calum.
void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
Hi Calum,
Absolutely! Is it OK to add your reviewed-by, with the following incremental patch made to this one?
Thanks John; looks good.
Reviewed-by: Calum Mackay calum.mackay@oracle.com
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index b00b89dda3c5..c0c1b9f2c069 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -276,11 +276,6 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return nfs_file_direct_write(iocb, iter); }
-static void nfs_direct_release_pages(struct page **pages, unsigned int npages) -{
- put_user_pages(pages, npages);
-}
- void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo, struct nfs_direct_req *dreq) {
@@ -510,7 +505,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; }
- nfs_direct_release_pages(pagevec, npages);
+ put_user_pages(pagevec, npages); kvfree(pagevec); if (result < 0) break;
@@ -933,7 +928,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; }
- nfs_direct_release_pages(pagevec, npages);
+ put_user_pages(pagevec, npages); kvfree(pagevec); if (result < 0) break;
thanks,
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Note that this effectively changes the code's behavior in release_user_pages(): it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate.
As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1]
[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Roman Kiryanov rkir@google.com Signed-off-by: John Hubbard jhubbard@nvidia.com --- drivers/platform/goldfish/goldfish_pipe.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index cef0133aa47a..2bd21020e288 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c @@ -288,15 +288,12 @@ static int pin_user_pages(unsigned long first_page, static void release_user_pages(struct page **pages, int pages_count, int is_write, s32 consumed_size) { - int i; + bool dirty = !is_write && consumed_size > 0;
- for (i = 0; i < pages_count; i++) { - if (!is_write && consumed_size > 0) - set_page_dirty(pages[i]); - put_page(pages[i]); - } + put_user_pages_dirty_lock(pages, pages_count, dirty); }
+ /* Populate the call parameters, merging adjacent pages together */ static void populate_rw_params(struct page **pages, int pages_count,
From: John Hubbard jhubbard@nvidia.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@redhat.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Namhyung Kim namhyung@kernel.org Signed-off-by: John Hubbard jhubbard@nvidia.com --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 0463c1151bae..7be52bbbfe87 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6426,7 +6426,7 @@ static u64 perf_virt_to_phys(u64 virt) phys_addr = page_to_phys(p) + virt % PAGE_SIZE;
if (p) - put_page(p); + put_user_page(p); }
return phys_addr;
From: Ira Weiny ira.weiny@intel.com
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages().
This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions").
get_dump_page() calls get_user_pages(), so put_user_page() must be used to match.
Signed-off-by: Ira Weiny ira.weiny@intel.com Signed-off-by: John Hubbard jhubbard@nvidia.com --- fs/binfmt_elf.c | 2 +- fs/binfmt_elf_fdpic.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index d4e11b2e04f6..92e4a5ca99d8 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -2377,7 +2377,7 @@ static int elf_core_dump(struct coredump_params *cprm) void *kaddr = kmap(page); stop = !dump_emit(cprm, kaddr, PAGE_SIZE); kunmap(page); - put_page(page); + put_user_page(page); } else stop = !dump_skip(cprm, PAGE_SIZE); if (stop) diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index d86ebd0dcc3d..321724b3be22 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1511,7 +1511,7 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm) void *kaddr = kmap(page); res = dump_emit(cprm, kaddr, PAGE_SIZE); kunmap(page); - put_page(page); + put_user_page(page); } else { res = dump_skip(cprm, PAGE_SIZE); }
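Sketched as a minimal core-dump helper (illustrative, not the exact fs/binfmt_elf.c code), the pairing looks like this: get_dump_page() pins through get_user_pages() internally, so the matching release must be put_user_page():

#include <linux/coredump.h>
#include <linux/highmem.h>
#include <linux/mm.h>

/* Illustrative sketch of the dump-loop body after this conversion. */
static int dump_one_user_page(struct coredump_params *cprm, unsigned long addr)
{
	struct page *page = get_dump_page(addr);
	int ok;

	if (!page)
		return dump_skip(cprm, PAGE_SIZE);

	ok = dump_emit(cprm, kmap(page), PAGE_SIZE);
	kunmap(page);
	put_user_page(page);	/* matches the gup-based pin */
	return ok;
}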
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...]
2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...]
2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
Honza
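A minimal sketch of the suggested wrapper pair (the names are from the suggestion above; the trivial bodies are an assumption, since today they would just forward to gup and put_user_page()):

#include <linux/mm.h>

/*
 * Sketch of the suggested wrapper pair. The point is greppability:
 * converted call sites use these names, so a vaddr_pin_pages() site
 * that drops references with plain put_page() stands out in review.
 */
static inline int vaddr_pin_pages(unsigned long start, int nr_pages,
				  unsigned int gup_flags, struct page **pages)
{
	return get_user_pages_fast(start, nr_pages, gup_flags, pages);
}

static inline void unpin_page(struct page *page)
{
	put_user_page(page);
}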
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...]
2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
I think the API we really need is get_user_bvec() / put_user_bvec(), and I know Christoph has been putting some work into that. That avoids doing refcount operations on hundreds of pages if the page in question is a huge page. Once people are switched over to that, they won't be tempted to manually call put_page() on the individual constituent pages of a bvec.
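Neither function exists yet; purely as a guess at the shape of such an API (the signatures below are assumptions, for illustration only):

#include <linux/bvec.h>

/*
 * Hypothetical declarations; these do not exist and the signatures are
 * guesses. A pinned region comes back as bio_vecs, each possibly
 * spanning a compound page, so pin/unpin touches one refcount per
 * extent instead of one per PAGE_SIZE page.
 */
int get_user_bvec(unsigned long start, size_t len, unsigned int gup_flags,
		  struct bio_vec *bvec, unsigned int *nr_segs);
void put_user_bvec(struct bio_vec *bvec, unsigned int nr_segs);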
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...]
2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
I think the API we really need is get_user_bvec() / put_user_bvec(), and I know Christoph has been putting some work into that. That avoids doing refcount operations on hundreds of pages if the page in question is a huge page. Once people are switched over to that, they won't be tempted to manually call put_page() on the individual constituent pages of a bvec.
Well, get_user_bvec() is certainly a good API for one class of users but just looking at the above series, you'll see there are *many* places that just don't work with bvecs at all and you need something for those.
Honza
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...]
2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
Hi Michal,
Yes, I've thought about it, and coccinelle falls a bit short (it's not smart enough to know which put_page()'s to convert). However, there is a debug option planned: a yet-to-be-posted commit [1] uses struct page extensions (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add a redundant counter. That allows:
void __put_page(struct page *page) { ... /* Someone called put_page() instead of put_user_page() */ WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
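For illustration, the counterpart operations might look like the sketch below. This is an assumption: the pin_count field and the page_ext plumbing come from the yet-to-be-posted commit [1], not from mainline.

#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/page_ext.h>

/*
 * Sketch only: page_ext->pin_count is added by the yet-to-be-posted
 * patch [1]; it is not in mainline. The pin side increments, the
 * release side decrements and warns on imbalance.
 */
static void gup_track_pin(struct page *page)
{
	struct page_ext *page_ext = lookup_page_ext(page);

	if (page_ext)
		atomic_inc(&page_ext->pin_count);
}

static void gup_track_unpin(struct page *page)
{
	struct page_ext *page_ext = lookup_page_ext(page);

	if (page_ext)
		WARN_ON_ONCE(atomic_dec_return(&page_ext->pin_count) < 0);
}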
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
The debug option above is still a bit simplistic in its implementation (and maybe not taking full advantage of the data it has), but I think it's preferable, because it monitors the "core" and WARNs.
Instead of the wrapper, I'm thinking: documentation and the passage of time, plus the debug option (perhaps enhanced--probably once I post it someone will notice opportunities), yes?
I think the API we really need is get_user_bvec() / put_user_bvec(), and I know Christoph has been putting some work into that. That avoids doing refcount operations on hundreds of pages if the page in question is a huge page. Once people are switched over to that, they won't be tempted to manually call put_page() on the individual constituent pages of a bvec.
Well, get_user_bvec() is certainly a good API for one class of users but just looking at the above series, you'll see there are *many* places that just don't work with bvecs at all and you need something for those.
Yes, there are quite a few places that don't involve _bvec, as we can see right here. So we need something. Andrew asked for a debug option some time ago, and several people (Dave Hansen, Dan Williams, Jerome) had the idea of vmap-ing gup pages separately, so you can definitely tell where each page came from. I'm hoping not to have to go to that level of complexity though.
[1] "mm/gup: debug tracking of get_user_pages() references" : https://github.com/johnhubbard/linux/commit/21ff7d6161ec2a14d3f9d17c98abb00c...
thanks,
On Fri 02-08-19 12:14:09, John Hubbard wrote:
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...]
2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
Hi Michal,
Yes, I've thought about it, and coccinelle falls a bit short (it's not smart enough to know which put_page()'s to convert). However, there is a debug option planned: a yet-to-be-posted commit [1] uses struct page extensions (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add a redundant counter. That allows:
void __put_page(struct page *page) { ... /* Someone called put_page() instead of put_user_page() */ WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
The debug option above is still a bit simplistic in its implementation (and maybe not taking full advantage of the data it has), but I think it's preferable, because it monitors the "core" and WARNs.
Instead of the wrapper, I'm thinking: documentation and the passage of time, plus the debug option (perhaps enhanced--probably once I post it someone will notice opportunities), yes?
So I think your debug option and my suggested renaming serve a bit different purposes (and thus both make sense). If you do the renaming, you can just grep to see unconverted sites. Also when someone merges new GUP user (unaware of the new rules) while you switch GUP to use pins instead of ordinary references, you'll get compilation error in case of renaming instead of hard to debug refcount leak without the renaming. And such conflict is almost bound to happen given the size of GUP patch set... Also the renaming serves against the "coding inertia" - i.e., GUP is around for ages so people just use it without checking any documentation or comments. After switching how GUP works, what used to be correct isn't anymore so renaming the function serves as a warning that something has really changed.
Your refcount debug patches are good to catch bugs in the conversions done but that requires you to be able to exercise the code path in the first place, which may require particular HW or so, and you also have to enable the debug option which means you already aim at verifying the GUP references are treated properly.
Honza
On Wed 07-08-19 10:37:26, Jan Kara wrote:
On Fri 02-08-19 12:14:09, John Hubbard wrote:
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: [...] > 2) Convert all of the call sites for get_user_pages*(), to > invoke put_user_page*(), instead of put_page(). This involves dozens of > call sites, and will take some time.
How do we make sure this is the case and it will remain the case in the future? There must be some automagic to enforce/check that. It is simply not manageable to do it every now and then because then 3) will simply never be safe.
Have you considered coccinelle or some other scripted way to do the transition? I have no idea how to deal with future changes that would break the balance though.
Hi Michal,
Yes, I've thought about it, and coccinelle falls a bit short (it's not smart enough to know which put_page()'s to convert). However, there is a debug option planned: a yet-to-be-posted commit [1] uses struct page extensions (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add a redundant counter. That allows:
void __put_page(struct page *page) { ... /* Someone called put_page() instead of put_user_page() */ WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
The debug option above is still a bit simplistic in its implementation (and maybe not taking full advantage of the data it has), but I think it's preferable, because it monitors the "core" and WARNs.
Instead of the wrapper, I'm thinking: documentation and the passage of time, plus the debug option (perhaps enhanced--probably once I post it someone will notice opportunities), yes?
So I think your debug option and my suggested renaming serve a bit different purposes (and thus both make sense). If you do the renaming, you can just grep to see unconverted sites. Also when someone merges new GUP user (unaware of the new rules) while you switch GUP to use pins instead of ordinary references, you'll get compilation error in case of renaming instead of hard to debug refcount leak without the renaming. And such conflict is almost bound to happen given the size of GUP patch set... Also the renaming serves against the "coding inertia" - i.e., GUP is around for ages so people just use it without checking any documentation or comments. After switching how GUP works, what used to be correct isn't anymore so renaming the function serves as a warning that something has really changed.
Fully agreed!
Your refcount debug patches are good to catch bugs in the conversions done but that requires you to be able to exercise the code path in the first place, which may require particular HW or so, and you also have to enable the debug option which means you already aim at verifying the GUP references are treated properly.
Honza
-- Jan Kara jack@suse.com SUSE Labs, CR
On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
On Wed 07-08-19 10:37:26, Jan Kara wrote:
On Fri 02-08-19 12:14:09, John Hubbard wrote:
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote: > On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote: > [...] > > 2) Convert all of the call sites for get_user_pages*(), to > > invoke put_user_page*(), instead of put_page(). This involves dozens of > > call sites, and will take some time. > > How do we make sure this is the case and it will remain the case in the > future? There must be some automagic to enforce/check that. It is simply > not manageable to do it every now and then because then 3) will simply > never be safe. > > Have you considered coccinelle or some other scripted way to do the > transition? I have no idea how to deal with future changes that would > break the balance though.
Hi Michal,
Yes, I've thought about it, and coccinelle falls a bit short (it's not smart enough to know which put_page()'s to convert). However, there is a debug option planned: a yet-to-be-posted commit [1] uses struct page extensions (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add a redundant counter. That allows:
void __put_page(struct page *page) { ... /* Someone called put_page() instead of put_user_page() */ WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
Yeah, that's why I've been suggesting at LSF/MM that we may need to create a gup wrapper - say vaddr_pin_pages() - and track which sites dropping references got converted by using this wrapper instead of gup. The counterpart would then be more logically named as unpin_page() or whatever instead of put_user_page(). Sure this is not completely foolproof (you can create new callsite using vaddr_pin_pages() and then just drop refs using put_page()) but I suppose it would be a high enough barrier for missed conversions... Thoughts?
The debug option above is still a bit simplistic in its implementation (and maybe not taking full advantage of the data it has), but I think it's preferable, because it monitors the "core" and WARNs.
Instead of the wrapper, I'm thinking: documentation and the passage of time, plus the debug option (perhaps enhanced--probably once I post it someone will notice opportunities), yes?
So I think your debug option and my suggested renaming serve a bit different purposes (and thus both make sense). If you do the renaming, you can just grep to see unconverted sites. Also when someone merges new GUP user (unaware of the new rules) while you switch GUP to use pins instead of ordinary references, you'll get compilation error in case of renaming instead of hard to debug refcount leak without the renaming. And such conflict is almost bound to happen given the size of GUP patch set... Also the renaming serves against the "coding inertia" - i.e., GUP is around for ages so people just use it without checking any documentation or comments. After switching how GUP works, what used to be correct isn't anymore so renaming the function serves as a warning that something has really changed.
Fully agreed!
OK. Prior to this, I've been basing all my work for the RDMA/FS DAX stuff on John's put_user_pages()... (including when I proposed failing truncate with a lease in June [1])
However, based on the suggestions in that thread, it became clear that a new interface would need to be added to pass the "RDMA file" information to GUP, to associate file pins with the correct processes...
I have many drawings on my whiteboard with "a whole lot of lines" on them to make sure that if a process opens a file, mmaps it, pins it with RDMA, _closes_ it, and unmaps it, the resulting file pin can still be traced back to the RDMA context and all the processes which may have access to it... no matter where the original context may have come from. I believe I have accomplished that.
Before I go on, I would like to say that the "imbalance" of get_user_pages() and put_page() bothers me from a purist standpoint... However, since this discussion cropped up I went ahead and ported my work to Linus' current master (5.3-rc3+) and in doing so I only had to steal a bit of John's code... Sorry John... :-(
I don't have the commit messages all cleaned up and I know there may be some discussion on these new interfaces but I wanted to throw this series out there because I think it may be what Jan and Michal are driving at (or at least in that direction).
Right now only RDMA and DAX FS's are supported. Other users of GUP will still fail on a DAX file and regular files will still be at risk.[2]
I've pushed this work (based on 5.3-rc3+ (33920f1ec5bf)) here[3]:
https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
I think the most relevant patch to this conversation is:
https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057b...
I stole Jan's suggestion for a name, as the name I used while prototyping was pretty bad... So thanks, Jan... ;-)
Also thanks to John for his contribution on some of this. I'm still tweaking put_user_pages under the hood on the DAX path.
Ira
[1] https://lwn.net/Articles/790544/
[2] I've been looking into how to support io_uring next, but I've had some issues getting a test program to actually call GUP in that code path... :-(
[3] If it would be easier, I can just throw an RFC on the list, but right now the cover letter and some of the commit messages are full of the old stuff and various ideas I have had...
Your refcount debug patches are good to catch bugs in the conversions done, but that requires you to be able to exercise the code path in the first place, which may require particular HW or so, and you also have to enable the debug option, which means you already aim at verifying the GUP references are treated properly.
Honza
-- Jan Kara jack@suse.com SUSE Labs, CR
-- Michal Hocko SUSE Labs
On 8/7/19 7:36 PM, Ira Weiny wrote:
On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
On Wed 07-08-19 10:37:26, Jan Kara wrote:
On Fri 02-08-19 12:14:09, John Hubbard wrote:
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
[...]
Before I go on, I would like to say that the "imbalance" of get_user_pages() and put_page() bothers me from a purist standpoint... However, since this discussion cropped up, I went ahead and ported my work to Linus' current master (5.3-rc3+) and in doing so I only had to steal a bit of John's code... Sorry John... :-(
I don't have the commit messages all cleaned up, and I know there may be some discussion on these new interfaces, but I wanted to throw this series out there because I think it may be what Jan and Michal are driving at (or at least in that direction).
Right now only RDMA and DAX filesystems are supported. Other users of GUP will still fail on a DAX file, and regular files will still be at risk.[2]
I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
I think the most relevant patch to this conversation is:
https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057b...
ohhh...can you please avoid using the old __put_user_pages_dirty() function? I thought I'd caught things early enough to get away with the rename and deletion of that. You could either:
a) open code an implementation of vaddr_put_pages_dirty_lock() that doesn't call any of the *put_user_pages_dirty*() variants (see the sketch after this list), or
b) include my first patch ("") as part of your series, or
c) base this on Andrew's tree, which already has my first patch merged in.
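For (a), the open-coded version would be roughly this (illustrative sketch only):

    static void vaddr_put_pages_dirty_lock(struct page **pages,
                                           unsigned long npages)
    {
            unsigned long i;

            for (i = 0; i < npages; i++) {
                    set_page_dirty_lock(pages[i]);
                    put_user_page(pages[i]);
            }
    }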
thanks,
On 8/7/19 7:36 PM, Ira Weiny wrote:
On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
On Wed 07-08-19 10:37:26, Jan Kara wrote:
On Fri 02-08-19 12:14:09, John Hubbard wrote:
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
[...]
Before I go on, I would like to say that the "imbalance" of get_user_pages() and put_page() bothers me from a purist standpoint... However, since this discussion cropped up, I went ahead and ported my work to Linus' current master (5.3-rc3+) and in doing so I only had to steal a bit of John's code... Sorry John... :-(
I don't have the commit messages all cleaned up, and I know there may be some discussion on these new interfaces, but I wanted to throw this series out there because I think it may be what Jan and Michal are driving at (or at least in that direction).
Right now only RDMA and DAX filesystems are supported. Other users of GUP will still fail on a DAX file, and regular files will still be at risk.[2]
I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
I think the most relevant patch to this conversation is:
https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057bee6641aaf6
ohhh...can you please avoid using the old __put_user_pages_dirty() function?
Agreed... I did not like that. Part of the reason I did not post this is that I'm still trying to figure out what has landed and what I can and can't depend on.
For example, Christoph H. was proposing changes to some of the GUP calls which may conflict. But I'm not sure his changes are moving forward. So rather than waiting for the dust to settle, I decided to see how hard it would be to get this rebased against mainline and working. Turns out it was not too hard.
I think that is because, as time has moved on, it has become clear that for some users, such as RDMA, a simple put_user_page() is not going to be sufficient. We need something else to allow GUP to keep track of the file pins, as we discussed. So I'm starting to think some of this could go in at the same time.
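The shape of what I'm experimenting with is roughly the following (the struct, field, and parameter names here are placeholders for illustration, not necessarily what is in the tree):

    struct vaddr_pin {
            struct file *file;      /* the "RDMA file" that owns the pin */
    };

    long vaddr_pin_pages(unsigned long start, unsigned long nr_pages,
                         unsigned int gup_flags, struct page **pages,
                         struct vaddr_pin *pin);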
I thought I'd caught things early enough to get away with the rename and deletion of that. You could either:
a) open code an implementation of vaddr_put_pages_dirty_lock() that doesn't call any of the *put_user_pages_dirty*() variants, or
b) include my first patch ("") as part of your series, or
c) base this on Andrew's tree, which already has my first patch merged in.
Yep I can do this. I did not realize that Andrew had accepted any of this work. I'll check out his tree. But I don't think he is going to accept this series through his tree. So what is the ETA on that landing in Linus' tree?
To that point, I'm still not sure who would take all this, as I am now touching mm, procfs, rdma, ext4, and xfs.
I just thought I would chime in with my progress, because I'm at a point where things are working and I can submit the code, but I'm not sure what I can/should depend on landing... Also, now that 0day has run overnight, it has found issues with this rebase, so I need to clean those up... Perhaps I will base on Andrew's tree prior to doing that...
Thanks, Ira
thanks,
John Hubbard NVIDIA
On 8/8/19 9:25 AM, Weiny, Ira wrote:
On 8/7/19 7:36 PM, Ira Weiny wrote:
On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
On Wed 07-08-19 10:37:26, Jan Kara wrote:
On Fri 02-08-19 12:14:09, John Hubbard wrote:
On 8/2/19 7:52 AM, Jan Kara wrote:
On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
On Fri 02-08-19 11:12:44, Michal Hocko wrote:
On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
[...]
Yep I can do this. I did not realize that Andrew had accepted any of this work. I'll check out his tree. But I don't think he is going to accept this series through his tree. So what is the ETA on that landing in Linus' tree?
I'd expect it to go into 5.4, according to my understanding of how the release cycles are arranged.
To that point, I'm still not sure who would take all this, as I am now touching mm, procfs, rdma, ext4, and xfs.
I just thought I would chime in with my progress, because I'm at a point where things are working and I can submit the code, but I'm not sure what I can/should depend on landing... Also, now that 0day has run overnight, it has found issues with this rebase, so I need to clean those up... Perhaps I will base on Andrew's tree prior to doing that...
I'm certainly not the right person to answer, but in spite of that, I'd think Andrew's tree is a reasonable place for it. Sort of.
thanks,
On Thu 08-08-19 16:25:04, Weiny, Ira wrote:
I thought I'd caught things early enough to get away with the rename and deletion of that. You could either:
a) open code an implementation of vaddr_put_pages_dirty_lock() that doesn't call any of the *put_user_pages_dirty*() variants, or
b) include my first patch ("") as part of your series, or
c) base this on Andrew's tree, which already has my first patch merged in.
Yep I can do this. I did not realize that Andrew had accepted any of this work. I'll check out his tree. But I don't think he is going to accept this series through his tree. So what is the ETA on that landing in Linus' tree?
To that point, I'm still not sure who would take all this, as I am now touching mm, procfs, rdma, ext4, and xfs.
The MM tree would be one candidate for routing, but there are other options that would make sense as well - Dan's tree, the VFS tree, or I can even pick up the patches into my tree if needed. But let's worry about the routing after we have working and reviewed patches...
Honza
On Wed 07-08-19 19:36:37, Ira Weiny wrote:
On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
So I think your debug option and my suggested renaming serve a bit different purposes (and thus both make sense). If you do the renaming, you can just grep to see unconverted sites. Also, when someone merges a new GUP user (unaware of the new rules) while you switch GUP to use pins instead of ordinary references, you'll get a compilation error with the renaming, instead of a hard-to-debug refcount leak without it. And such a conflict is almost bound to happen given the size of the GUP patch set... The renaming also works against "coding inertia" - i.e., GUP has been around for ages, so people just use it without checking any documentation or comments. After switching how GUP works, what used to be correct isn't anymore, so renaming the function serves as a warning that something has really changed.
Fully agreed!
Ok. Prior to this, I've been basing all my work for the RDMA/FS DAX stuff on John's put_user_pages()... (including when I proposed failing truncate with a lease in June [1])
However, based on the suggestions in that thread it became clear that a new interface was going to need to be added to pass in the "RDMA file" information to GUP to associate file pins with the correct processes...
I have many drawings on my white board with "a whole lot of lines" on them to make sure that if a process opens a file, mmaps it, pins it with RDMA, _closes_ it, and unmaps it, the resulting file pin can still be traced back to the RDMA context and all the processes which may have access to it... no matter where the original context may have come from. I believe I have accomplished that.
Before I go on, I would like to say that the "imbalance" of get_user_pages() and put_page() bothers me from a purist standpoint... However, since this discussion cropped up, I went ahead and ported my work to Linus' current master (5.3-rc3+) and in doing so I only had to steal a bit of John's code... Sorry John... :-(
I don't have the commit messages all cleaned up, and I know there may be some discussion on these new interfaces, but I wanted to throw this series out there because I think it may be what Jan and Michal are driving at (or at least in that direction).
Right now only RDMA and DAX filesystems are supported. Other users of GUP will still fail on a DAX file, and regular files will still be at risk.[2]
I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
I think the most relevant patch to this conversation is:
https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057b...
I stole Jan's suggestion for a name, as the name I used while prototyping was pretty bad... So thanks, Jan... ;-)
For your function, I'd choose a name like vaddr_pin_leased_pages() so that the association with a lease is clear from the name :) Also, I'd choose the counterpart to be vaddr_unpin_leased_page[s](). In particular, having put_page in the name looks confusing to me...
Honza
On Wed 07-08-19 19:36:37, Ira Weiny wrote:
On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
So I think your debug option and my suggested renaming serve a bit different purposes (and thus both make sense). If you do the renaming, you can just grep to see unconverted sites. Also, when someone merges a new GUP user (unaware of the new rules) while you switch GUP to use pins instead of ordinary references, you'll get a compilation error with the renaming, instead of a hard-to-debug refcount leak without it. And such a conflict is almost bound to happen given the size of the GUP patch set... The renaming also works against "coding inertia" - i.e., GUP has been around for ages, so people just use it without checking any documentation or comments. After switching how GUP works, what used to be correct isn't anymore, so renaming the function serves as a warning that something has really changed.
Fully agreed!
Ok. Prior to this, I've been basing all my work for the RDMA/FS DAX stuff on John's put_user_pages()... (including when I proposed failing truncate with a lease in June [1])
However, based on the suggestions in that thread it became clear that a new interface was going to need to be added to pass in the "RDMA file" information to GUP to associate file pins with the correct processes...
I have many drawings on my white board with "a whole lot of lines" on them to make sure that if a process opens a file, mmaps it, pins it with RDMA, _closes_ it, and unmaps it, the resulting file pin can still be traced back to the RDMA context and all the processes which may have access to it... no matter where the original context may have come from. I believe I have accomplished that.
Before I go on, I would like to say that the "imbalance" of get_user_pages() and put_page() bothers me from a purist standpoint... However, since this discussion cropped up, I went ahead and ported my work to Linus' current master (5.3-rc3+) and in doing so I only had to steal a bit of John's code... Sorry John... :-(
I don't have the commit messages all cleaned up, and I know there may be some discussion on these new interfaces, but I wanted to throw this series out there because I think it may be what Jan and Michal are driving at (or at least in that direction).
Right now only RDMA and DAX filesystems are supported. Other users of GUP will still fail on a DAX file, and regular files will still be at risk.[2]
I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
I think the most relevant patch to this conversation is:
https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057bee6641aaf6
I stole Jan's suggestion for a name, as the name I used while prototyping was pretty bad... So thanks, Jan... ;-)
For your function, I'd choose a name like vaddr_pin_leased_pages() so that the association with a lease is clear from the name :)
My gut was to just change this as you suggested. But the fact is that these calls can get used on anonymous pages as well, so the "leased" semantic may not apply... OTOH, if a file is encountered, it will fail the pin... :-/ I'm going to leave it for now and get the patches submitted to the list...
Also, I'd choose the counterpart to be vaddr_unpin_leased_page[s](). In particular, having put_page in the name looks confusing to me...
Ah yes, totally agree with the "pin/unpin" symmetry. I've changed from "put" to "unpin"...
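So the pair ends up looking roughly like this (signatures are still in flux; the ones below are only a sketch):

    long vaddr_pin_pages(unsigned long start, unsigned long nr_pages,
                         unsigned int gup_flags, struct page **pages,
                         struct vaddr_pin *pin);

    void vaddr_unpin_pages(struct page **pages, unsigned long npages,
                           struct vaddr_pin *pin);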
Thanks, Ira
Honza
-- Jan Kara jack@suse.com SUSE Labs, CR