On Wed, Feb 19, 2020 at 2:00 AM Tian, Kevin kevin.tian@intel.com wrote:
From: Chia-I Wu Sent: Saturday, February 15, 2020 5:15 AM
On Fri, Feb 14, 2020 at 2:26 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 13/02/20 23:18, Chia-I Wu wrote:
The bug you mentioned was probably this one
Yes, indeed.
From what I can tell, the commit allowed the guests to create cached mappings to MMIO regions and caused MCEs. That is different than what I need, which is to allow guests to create uncached mappings to system ram (i.e., !kvm_is_mmio_pfn) when the host userspace also has uncached mappings. But it is true that this still allows the userspace & guest kernel to create conflicting memory types.
Right, the question is whether the MCEs were tied to MMIO regions specifically and if so why.
An interesting remark is in the footnote of table 11-7 in the SDM. There, for the MTRR (EPT for us) memory type UC you can read:
The UC attribute comes from the MTRRs and the processors are not required to snoop their caches since the data could never have been cached. This attribute is preferred for performance reasons.
There are two possibilities:

- the footnote doesn't apply to UC mode coming from EPT page tables. That would make your change safe.

- the footnote also applies when the UC attribute comes from the EPT page tables rather than the MTRRs. In that case, the host should use UC as the EPT page attribute if and only if it's consistent with the host MTRRs; it would be more or less impossible to honor UC in the guest MTRRs. In that case, something like the patch below would be needed.
It is not clear from the manual why the footnote would not apply to WC; that is, the manual doesn't say explicitly that the processor does not do snooping for accesses to WC memory. But I guess that must be the case, which is why I used MTRR_TYPE_WRCOMB in the patch below.
Either way, we would have an explanation of why creating cached mappings to MMIO regions would cause MCEs, and why in practice we're not seeing MCEs for guest RAM (the guest would have set WB for that memory in its MTRRs, not UC).
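For illustration, a self-contained sketch of that kind of check (not the actual patch referenced above; the enum, the function name, and the exact policy of "grant a non-WB guest request only when it matches the host MTRR type" are assumptions) might look like this. In KVM itself the host type would come from the MTRR lookup code and the result would be encoded into the EPT memory-type bits:

/*
 * Illustrative sketch only -- not the patch referenced above.  It models
 * "honor the guest's UC/WC request only when that is consistent with the
 * host MTRR type for the range"; in KVM the host type would come from an
 * MTRR lookup and the result would feed the EPT memory-type bits.
 */
#include <stdbool.h>

/* x86 memory-type encodings as used by the MTRRs and EPT. */
enum mem_type {
	MT_UC = 0,	/* uncacheable   (MTRR_TYPE_UNCACHABLE) */
	MT_WC = 1,	/* write-combine (MTRR_TYPE_WRCOMB)     */
	MT_WB = 6,	/* write-back    (MTRR_TYPE_WRBACK)     */
};

static enum mem_type ept_memtype(enum mem_type guest_want,
				 enum mem_type host_mtrr,
				 bool is_mmio)
{
	if (is_mmio)
		return MT_UC;		/* MMIO stays uncached */

	/* Grant UC/WC only when the host MTRRs agree for this range. */
	if (guest_want != MT_WB && guest_want == host_mtrr)
		return guest_want;

	return MT_WB;			/* otherwise fall back to WB */
}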
One thing you didn't say: how would userspace use KVM_MEM_DMA? On which regions would it be set?
It will be set for shmems that are mapped WC.
GPU/DRM drivers allocate shmems as DMA-able gpu buffers and allow the userspace to map them cached or WC (I915_MMAP_WC or AMDGPU_GEM_CREATE_CPU_GTT_USWC for example). When a shmem is mapped WC and is made available to the guest, we would like the ability to map the region WC in the guest.
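As a rough usage sketch, the flag would simply be set on the memslot that covers the WC mapping when it is registered. The KVM_MEM_DMA value below is a placeholder (the flag only exists in the proposed patch), and wc_ptr/wc_size stand in for a buffer the DRM driver has already mapped write-combined:

/*
 * Hypothetical usage sketch: register a WC-mapped GPU buffer as a guest
 * memslot carrying the proposed KVM_MEM_DMA flag.  The flag's value is a
 * placeholder, not a real uapi value; wc_ptr/wc_size come from a buffer
 * the DRM driver already mapped WC (e.g. via I915_MMAP_WC).
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

#ifndef KVM_MEM_DMA
#define KVM_MEM_DMA (1UL << 2)	/* placeholder bit for illustration */
#endif

static int add_wc_memslot(int vm_fd, __u32 slot, __u64 guest_phys_addr,
			  void *wc_ptr, __u64 wc_size)
{
	struct kvm_userspace_memory_region region = {
		.slot            = slot,
		.flags           = KVM_MEM_DMA,	/* host mapping is WC */
		.guest_phys_addr = guest_phys_addr,
		.memory_size     = wc_size,
		.userspace_addr  = (__u64)(unsigned long)wc_ptr,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}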
Curious... How is such a slot exposed to the guest? A reserved memory region? Is it static, or might it be added dynamically?
The plan is for virtio-gpu device to reserve a huge memory region in the guest. Memslots may be added dynamically or statically to back the region.
Dynamic: the host adds a 16MB GPU allocation as a memslot at a time. The guest kernel suballocates from the 16MB pool.
Static: the host creates a huge PROT_NONE memfd and adds it as a memslot. GPU allocations are mremap()ed into the memfd region to provide the real mapping (sketched below).
These options are considered because the number of memslots is limited: 32 on ARM and 509 on x86. If the number of memslots could be made larger (4096 or more), we would also consider adding each individual GPU allocation as a memslot.
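A minimal sketch of that static option, assuming glibc's memfd_create() and mremap() wrappers are enough, with error handling omitted and the reservation size and names chosen arbitrarily:

/*
 * Sketch of the "static" option: one big PROT_NONE, memfd-backed reservation
 * registered once as a memslot, with individual GPU mappings mremap()ed in
 * later.  Size is arbitrary and error handling is omitted.
 */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

#define APERTURE_SIZE (8ULL << 30)	/* example: 8 GiB reservation */

/* Reserve the region; the returned address is what gets registered with KVM. */
static void *reserve_aperture(void)
{
	int memfd = memfd_create("virtio-gpu-hostmem", 0);

	ftruncate(memfd, APERTURE_SIZE);
	return mmap(NULL, APERTURE_SIZE, PROT_NONE, MAP_SHARED, memfd, 0);
}

/* Move an existing WC mapping of a GPU allocation to a fixed offset inside it. */
static void *plug_allocation(void *aperture, size_t offset,
			     void *wc_mapping, size_t size)
{
	return mremap(wc_mapping, size, size,
		      MREMAP_MAYMOVE | MREMAP_FIXED,
		      (char *)aperture + offset);
}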
These are actually questions we need feedback on. Besides, GPU allocations can be assumed to be kernel dma-bufs in this context. I wonder if it makes sense to have a variation of KVM_SET_USER_MEMORY_REGION that takes dma-bufs.
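Purely to make that question concrete, a dma-buf variant might look something like the struct below; nothing like this exists in the KVM uapi, and every name and field is hypothetical:

/*
 * Hypothetical only: one possible shape for "a variation of
 * KVM_SET_USER_MEMORY_REGION that takes dma-bufs".  All names and
 * fields here are made up for illustration.
 */
#include <linux/types.h>

struct kvm_dmabuf_memory_region {
	__u32 slot;
	__u32 flags;		/* e.g. the proposed KVM_MEM_DMA */
	__u64 guest_phys_addr;
	__u64 size;		/* could also be taken from the dma-buf itself */
	__s32 dmabuf_fd;	/* fd exported by the GPU driver */
	__u32 pad;
};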
Thanks,
Kevin