On 2021-09-02 at 4:18 a.m., Christoph Hellwig wrote:
On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
It looks like I'm totally misunderstanding what you are adding here then. Why do we need any special treatment at all for memory that has normal struct pages and is part of the direct kernel map?
The pages are like normal memory for purposes of mapping them in CPU page tables and for coherent access from the CPU.
That's the user page tables. What about the kernel direct map? If there is a normal kernel struct page backing it, there really should be no need for the pgmap.
I'm not sure. The physical address ranges are in the UEFI system address map as special-purpose memory. Does Linux create the struct pages and kernel direct map for that without a pgmap call? I didn't see that last time I went digging through that code.
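For illustration, a minimal sketch of how a driver would create struct pages for such a range itself via a pgmap. The names here are placeholders: the type would be MEMORY_DEVICE_PRIVATE today (or whatever DEVICE_PUBLIC type the series under discussion introduces), and the ops table is hypothetical:

    #include <linux/device.h>
    #include <linux/err.h>
    #include <linux/memremap.h>

    /* Hypothetical ops table; a real driver must at least provide
     * page_free() (and migrate_to_ram() for DEVICE_PRIVATE). */
    static const struct dev_pagemap_ops my_pagemap_ops;

    static struct dev_pagemap my_pgmap;

    static int register_device_memory(struct device *dev, u64 base, u64 size)
    {
            void *addr;

            my_pgmap.type = MEMORY_DEVICE_PRIVATE; /* or a DEVICE_PUBLIC type */
            my_pgmap.range.start = base;
            my_pgmap.range.end = base + size - 1;
            my_pgmap.nr_range = 1;
            my_pgmap.ops = &my_pagemap_ops;

            /* Creates the struct pages and hotplugs the range as ZONE_DEVICE */
            addr = devm_memremap_pages(dev, &my_pgmap);
            return IS_ERR(addr) ? PTR_ERR(addr) : 0;
    }

Without a call like this, I would not expect struct pages for the range to exist.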
So some googling finds a patch from Dan that claims to hand EFI special-purpose memory to the device dax driver. But when I try to follow the version that got merged, it looks like it is treated simply as an MMIO region to be claimed by drivers, which would not get a struct page.
Dan, did I misunderstand how E820_TYPE_SOFT_RESERVED works?
From an application perspective, we want file-backed and anonymous mappings to be able to use DEVICE_PUBLIC pages with coherent CPU access. The goal is to optimize performance for GPU-heavy workloads while minimizing the need to migrate data back and forth between system memory and device memory.
I don't really understand that part. File-backed pages are always allocated by the file system using the pagecache helpers, that is, using the page allocator. Anonymous memory also always comes from the page allocator.
I'm coming at this from my experience with DEVICE_PRIVATE. Both anonymous and file-backed pages should be migratable to DEVICE_PRIVATE memory by the migrate_vma_* helpers for more efficient access by our GPU. It's part of the basic premise of HMM as I understand it. I would expect the same thing to work for DEVICE_PUBLIC memory.
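For illustration, a sketch of that migration flow for a single page, assuming the current migrate_vma_* semantics. alloc_device_page() is a hypothetical VRAM allocator, and error handling, page locking and the actual data copy (normally done by the device's DMA engine between setup and pages) are omitted:

    #include <linux/migrate.h>
    #include <linux/mm.h>

    static int migrate_one_page_to_device(struct vm_area_struct *vma,
                                          unsigned long addr, void *owner)
    {
            unsigned long src = 0, dst = 0;
            struct migrate_vma args = {
                    .vma         = vma,
                    .start       = addr,
                    .end         = addr + PAGE_SIZE,
                    .src         = &src,
                    .dst         = &dst,
                    .pgmap_owner = owner,
                    .flags       = MIGRATE_VMA_SELECT_SYSTEM,
            };
            int ret = migrate_vma_setup(&args);

            if (ret)
                    return ret;

            if (src & MIGRATE_PFN_MIGRATE) {
                    struct page *dpage = alloc_device_page(); /* hypothetical */

                    /* ...copy the old page's contents into dpage here... */
                    dst = migrate_pfn(page_to_pfn(dpage));
            }

            migrate_vma_pages(&args);
            migrate_vma_finalize(&args);
            return 0;
    }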
Ok, so you want to migrate to and from them, not use DEVICE_PUBLIC for the actual page cache pages. That makes a lot more sense.
I see DEVICE_PUBLIC as an improved version of DEVICE_PRIVATE that allows the CPU to map the device memory coherently, minimizing the need for migrations when CPU and GPU access the same memory concurrently or alternately. But we're not going as far as putting that memory entirely under the management of the Linux memory manager and VM subsystem. Our (and HPE's) system architects decided that this memory is not suitable to be used like regular NUMA system memory by the Linux memory manager.
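To illustrate the difference: with DEVICE_PRIVATE the CPU cannot access the page at all, so any CPU touch faults into the pgmap's migrate_to_ram() callback and forces the data back to system memory; a coherent DEVICE_PUBLIC page can simply stay where it is. A sketch, filling in the hypothetical ops table from the earlier example:

    #include <linux/memremap.h>

    /* DEVICE_PRIVATE only: every CPU fault on such a page lands here
     * and must migrate the data back to a system memory page. */
    static vm_fault_t my_migrate_to_ram(struct vm_fault *vmf)
    {
            /* allocate a system page, copy the data back, remap;
             * my_copy_back_to_system_ram() is a hypothetical helper
             * standing in for that whole sequence */
            return my_copy_back_to_system_ram(vmf);
    }

    static const struct dev_pagemap_ops my_pagemap_ops = {
            .page_free      = my_page_free,         /* hypothetical */
            .migrate_to_ram = my_migrate_to_ram,
    };

    /* For DEVICE_PUBLIC, the CPU maps the page coherently, so no
     * migrate_to_ram round trip is needed for CPU access. */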
So yes. It is a memory-mapped I/O region, which, unlike the PCIe BARs that people typically deal with, is fully cache-coherent. I think this does make more sense as a description.
But to go back to what started this discussion: if these are memory-mapped I/O, pfn_valid should generally not return true for them.
As I understand it, pfn_valid should be true for any pfn that's part of the kernel's physical memory map, i.e. any pfn that is returned by page_to_pfn or works with pfn_to_page. Both the hmm_range_fault and the migrate_vma_* APIs use pfns to refer to regular system memory and ZONE_DEVICE pages (even DEVICE_PRIVATE). Therefore I believe pfn_valid should be true for ZONE_DEVICE pages as well.
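A sketch of the distinction being drawn here: pfn_valid() only promises that a struct page exists for the pfn; it does not mean the pfn is ordinary system RAM, which is why ZONE_DEVICE pages can satisfy it:

    #include <linux/mm.h>

    /* true if the pfn has a struct page that belongs to a ZONE_DEVICE
     * pgmap (DEVICE_PRIVATE, DEVICE_PUBLIC, ...), false otherwise */
    static bool pfn_is_zone_device(unsigned long pfn)
    {
            if (!pfn_valid(pfn))    /* no struct page at all */
                    return false;

            return is_zone_device_page(pfn_to_page(pfn));
    }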
Regards, Felix
And as you already pointed out in reply to Alex, we need to tighten the selection criteria one way or another.