On Wed, Feb 19, 2020 at 2:00 AM Tian, Kevin kevin.tian@intel.com wrote:
From: Chia-I Wu Sent: Saturday, February 15, 2020 5:15 AM
On Fri, Feb 14, 2020 at 2:26 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 13/02/20 23:18, Chia-I Wu wrote:
The bug you mentioned was probably this one
Yes, indeed.
From what I can tell, the commit allowed the guests to create cached mappings to MMIO regions and caused MCEs. That is different than what I need, which is to allow guests to create uncached mappings to system ram (i.e., !kvm_is_mmio_pfn) when the host userspace also has uncached mappings. But it is true that this still allows the userspace & guest kernel to create conflicting memory types.
Right, the question is whether the MCEs were tied to MMIO regions specifically and if so why.
An interesting remark is in the footnote of table 11-7 in the SDM. There, for the MTRR (EPT for us) memory type UC you can read:
The UC attribute comes from the MTRRs and the processors are not required to snoop their caches since the data could never have been cached. This attribute is preferred for performance reasons.
There are two possibilities:

- the footnote doesn't apply to UC mode coming from EPT page tables. That would make your change safe.

- the footnote also applies when the UC attribute comes from the EPT page tables rather than the MTRRs. In that case, the host should use UC as the EPT page attribute if and only if it's consistent with the host MTRRs; it would be more or less impossible to honor UC in the guest MTRRs. In that case, something like the patch below would be needed.
It is not clear from the manual why the footnote would not apply to WC; that is, the manual doesn't say explicitly that the processor does not do snooping for accesses to WC memory. But I guess that must be the case, which is why I used MTRR_TYPE_WRCOMB in the patch below.
Either way, we would have an explanation of why creating cached mappings to MMIO regions would cause MCEs, and why in practice we're not seeing MCEs for guest RAM (the guest would have set WB for that memory in its MTRRs, not UC).
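For illustration, a self-contained sketch of that kind of check (not the actual patch referenced above; the enum, the function name, and the exact policy of "grant a non-WB guest request only when it matches the host MTRR type" are assumptions) might look like this. In KVM itself the host type would come from the MTRR lookup code and the result would be encoded into the EPT memory-type bits:

/*
 * Illustrative sketch only -- not the patch referenced above.  It models
 * "honor the guest's UC/WC request only when that is consistent with the
 * host MTRR type for the range"; in KVM the host type would come from an
 * MTRR lookup and the result would feed the EPT memory-type bits.
 */
#include <stdbool.h>

/* x86 memory-type encodings as used by the MTRRs and EPT. */
enum mem_type {
	MT_UC = 0,	/* uncacheable   (MTRR_TYPE_UNCACHABLE) */
	MT_WC = 1,	/* write-combine (MTRR_TYPE_WRCOMB)     */
	MT_WB = 6,	/* write-back    (MTRR_TYPE_WRBACK)     */
};

static enum mem_type ept_memtype(enum mem_type guest_want,
				 enum mem_type host_mtrr,
				 bool is_mmio)
{
	if (is_mmio)
		return MT_UC;		/* MMIO stays uncached */

	/* Grant UC/WC only when the host MTRRs agree for this range. */
	if (guest_want != MT_WB && guest_want == host_mtrr)
		return guest_want;

	return MT_WB;			/* otherwise fall back to WB */
}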
One thing you didn't say: how would userspace use KVM_MEM_DMA? On which regions would it be set?
It will be set for shmems that are mapped WC.
GPU/DRM drivers allocate shmems as DMA-able gpu buffers and allow the userspace to map them cached or WC (I915_MMAP_WC or AMDGPU_GEM_CREATE_CPU_GTT_USWC for example). When a shmem is mapped WC and is made available to the guest, we would like the ability to map the region WC in the guest.
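As a rough usage sketch, the flag would simply be set on the memslot that covers the WC mapping when it is registered. The KVM_MEM_DMA value below is a placeholder (the flag only exists in the proposed patch), and wc_ptr/wc_size stand in for a buffer the DRM driver has already mapped write-combined:

/*
 * Hypothetical usage sketch: register a WC-mapped GPU buffer as a guest
 * memslot carrying the proposed KVM_MEM_DMA flag.  The flag's value is a
 * placeholder, not a real uapi value; wc_ptr/wc_size come from a buffer
 * the DRM driver already mapped WC (e.g. via I915_MMAP_WC).
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

#ifndef KVM_MEM_DMA
#define KVM_MEM_DMA (1UL << 2)	/* placeholder bit for illustration */
#endif

static int add_wc_memslot(int vm_fd, __u32 slot, __u64 guest_phys_addr,
			  void *wc_ptr, __u64 wc_size)
{
	struct kvm_userspace_memory_region region = {
		.slot            = slot,
		.flags           = KVM_MEM_DMA,	/* host mapping is WC */
		.guest_phys_addr = guest_phys_addr,
		.memory_size     = wc_size,
		.userspace_addr  = (__u64)(unsigned long)wc_ptr,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}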
Curious... How is such a slot exposed to the guest? A reserved memory region? Is it static, or might it be added dynamically?
The plan is for virtio-gpu device to reserve a huge memory region in the guest. Memslots may be added dynamically or statically to back the region.
Dynamic: the host adds a 16MB GPU allocation as a memslot at a time. The guest kernel suballocates from the 16MB pool.
Static: the host creates a huge PROT_NONE memfd and adds it as a memslot. GPU allocations are mremap()ed into the memfd region to provide the real mapping (sketched below).
These options are considered because the number of memslots is limited: 32 on ARM and 509 on x86. If the number of memslots could be made larger (4096 or more), we would also consider adding each individual GPU allocation as a memslot.
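A minimal sketch of that static option, assuming glibc's memfd_create() and mremap() wrappers are enough, with error handling omitted and the reservation size and names chosen arbitrarily:

/*
 * Sketch of the "static" option: one big PROT_NONE, memfd-backed reservation
 * registered once as a memslot, with individual GPU mappings mremap()ed in
 * later.  Size is arbitrary and error handling is omitted.
 */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

#define APERTURE_SIZE (8ULL << 30)	/* example: 8 GiB reservation */

/* Reserve the region; the returned address is what gets registered with KVM. */
static void *reserve_aperture(void)
{
	int memfd = memfd_create("virtio-gpu-hostmem", 0);

	ftruncate(memfd, APERTURE_SIZE);
	return mmap(NULL, APERTURE_SIZE, PROT_NONE, MAP_SHARED, memfd, 0);
}

/* Move an existing WC mapping of a GPU allocation to a fixed offset inside it. */
static void *plug_allocation(void *aperture, size_t offset,
			     void *wc_mapping, size_t size)
{
	return mremap(wc_mapping, size, size,
		      MREMAP_MAYMOVE | MREMAP_FIXED,
		      (char *)aperture + offset);
}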
These are actually questions we need feedback on. Besides, GPU allocations can be assumed to be kernel dma-bufs in this context. I wonder if it makes sense to have a variation of KVM_SET_USER_MEMORY_REGION that takes dma-bufs.
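Purely to make that question concrete, a dma-buf variant might look something like the struct below; nothing like this exists in the KVM uapi, and every name and field is hypothetical:

/*
 * Hypothetical only: one possible shape for "a variation of
 * KVM_SET_USER_MEMORY_REGION that takes dma-bufs".  All names and
 * fields here are made up for illustration.
 */
#include <linux/types.h>

struct kvm_dmabuf_memory_region {
	__u32 slot;
	__u32 flags;		/* e.g. the proposed KVM_MEM_DMA */
	__u64 guest_phys_addr;
	__u64 size;		/* could also be taken from the dma-buf itself */
	__s32 dmabuf_fd;	/* fd exported by the GPU driver */
	__u32 pad;
};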
Thanks,
Kevin