From: Chia-I Wu olvaffe@gmail.com Sent: Saturday, February 22, 2020 2:21 AM
On Fri, Feb 21, 2020 at 7:59 AM Sean Christopherson sean.j.christopherson@intel.com wrote:
On Thu, Feb 20, 2020 at 09:39:05PM -0800, Tian, Kevin wrote:
From: Chia-I Wu olvaffe@gmail.com Sent: Friday, February 21, 2020 12:51 PM
If you think it is best for KVM to inspect the hva to determine the memory type with page granularity, that is reasonable and should work for us too. The userspace can do something (e.g., add a GPU driver dependency to the hypervisor such that the dma-buf is imported as GPU memory and mapped using vkMapMemory), or I can work with the dma-buf maintainers to see if dma-buf's semantics can be changed.
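For reference, a minimal sketch of that vkMapMemory path, assuming the import goes through VK_KHR_external_memory_fd / VK_EXT_external_memory_dma_buf (the device, fd, size, and memory type index here are placeholders, not anything specified in this thread):

    #include <vulkan/vulkan.h>

    /* Import a dma-buf fd as Vulkan device memory and map it, so the CPU
     * mapping uses whatever memory type the GPU driver picked for the buffer.
     * memory_type_index would normally come from vkGetMemoryFdPropertiesKHR
     * plus the usual memory-type selection; it is a placeholder here.
     */
    static void *map_dmabuf_via_vk(VkDevice device, int dmabuf_fd,
                                   VkDeviceSize size, uint32_t memory_type_index)
    {
        VkImportMemoryFdInfoKHR import_info = {
            .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR,
            .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
            .fd = dmabuf_fd,    /* the fd is consumed on successful import */
        };
        VkMemoryAllocateInfo alloc_info = {
            .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
            .pNext = &import_info,
            .allocationSize = size,
            .memoryTypeIndex = memory_type_index,
        };
        VkDeviceMemory mem;
        void *hva = NULL;

        if (vkAllocateMemory(device, &alloc_info, NULL, &mem) != VK_SUCCESS)
            return NULL;
        if (vkMapMemory(device, mem, 0, VK_WHOLE_SIZE, 0, &hva) != VK_SUCCESS)
            return NULL;
        return hva;
    }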
I think you need to consider the live migration requirement, as Paolo pointed out. The migration thread needs to read/write the region, so it must use the same memory type as the GPU process and the guest when it accesses the region. In that case, the hva mapped by Qemu should carry the same type that the guest desires. However, adding a GPU driver dependency to Qemu might trigger some concern. I'm not sure whether there is a generic mechanism, though, to share the dma-buf fd between the GPU process and Qemu while allowing Qemu to follow the desired type w/o using vkMapMemory...
Alternatively, KVM could make KVM_MEM_DMA and KVM_MEM_LOG_DIRTY_PAGES mutually exclusive, i.e. force a transition to WB memtype for the guest (with appropriate zapping) when migration is activated. I think that would work?
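If KVM went that route, the simple "reject the combination" half could live in the existing memslot flag validation, roughly like below (a sketch only, assuming KVM_MEM_DMA is the flag proposed in this thread; the forced transition to WB with zapping when dirty logging is enabled later would need more than this):

    /* Sketch, not an actual patch: refuse memslots that ask for both the
     * proposed KVM_MEM_DMA and dirty logging, so honoring the guest memory
     * type and live migration never mix on the same slot.
     */
    static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
    {
        u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY | KVM_MEM_DMA;

        if (mem->flags & ~valid_flags)
            return -EINVAL;

        /* hypothetical: guest-controlled memtype and dirty logging are
         * mutually exclusive
         */
        if ((mem->flags & KVM_MEM_DMA) &&
            (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
            return -EINVAL;

        return 0;
    }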
Hm, virtio-gpu does not allow live migration when the 3D function (virgl=on) is enabled. This is the relevant code in qemu:
    if (virtio_gpu_virgl_enabled(g->conf)) {
        error_setg(&g->migration_blocker, "virgl is not yet migratable");
        ...
    }
Although we (virtio-gpu and virglrenderer projects) plan to make host GPU buffers available to the guest via memslots, those buffers should be considered a part of the "GPU state". The migration thread should work with virglrenderer and let virglrenderer save/restore them, if live migration is to be supported.
Thanks for your explanation. Your RFC makes more sense now.
One remaining open question: although for live migration we can explicitly state that the migration thread itself should not access the dma-buf region, how can we warn other users which may simply walk every memslot and access its content through the mmap-ed virtual address? Possibly we need a flag to indicate that a memslot is mmap-ed only for KVM to retrieve its page table mapping, not for direct access from Qemu, as in the sketch below.
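For illustration only (the flag name and bit are invented here, not part of any proposal): a memslot flag that tells userspace tooling the hva is registered solely so KVM can build its page tables, which anything walking memslots would check before dereferencing the hva:

    /* Illustrative only; the flag name and bit are made up. */
    #define KVM_MEM_NO_HOST_ACCESS  (1UL << 2)

    /* Anything that iterates memslots and touches guest memory through the
     * mmap-ed hva (migration helpers, debug dumps, ...) would skip slots
     * carrying this flag.
     */
    static bool slot_is_host_accessible(const struct kvm_userspace_memory_region *mem)
    {
        return !(mem->flags & KVM_MEM_NO_HOST_ACCESS);
    }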
QEMU already depends on GPU drivers when configured with --enable-virglrenderer. There is vhost-user-gpu, which can move that dependency into a separate GPU process. But there are still going to be cases (e.g., NVIDIA's proprietary driver does not support dma-buf) where QEMU cannot avoid a GPU driver dependency.
Note this is orthogonal to whether we introduce a new uapi or implicitly check the hva to favor the guest memory type. It's purely about Qemu itself. Ideally, anyone who wants to access a dma-buf object should follow its expected semantics.
It's interesting that the dma-buf sub-system doesn't provide centralized synchronization of the memory type across multiple mmap paths.
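For illustration (not a proposal), the same dma-buf can end up with two CPU mappings whose memory types are chosen by different components, with nothing in the dma-buf core keeping them consistent; dmabuf_fd, size, and memory_type_index are placeholders, and map_dmabuf_via_vk() is the hypothetical helper sketched earlier in this thread:

    #include <sys/mman.h>
    #include <unistd.h>

    static void map_same_dmabuf_twice(VkDevice device, int dmabuf_fd,
                                      size_t size, uint32_t memory_type_index)
    {
        /* Path 1: mmap the dma-buf fd directly; the exporting driver decides
         * the caching attributes of this mapping.
         */
        void *direct = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                            dmabuf_fd, 0);

        /* Path 2: import into Vulkan and map via vkMapMemory; the GPU driver
         * decides the caching attributes.  dup() because the import consumes
         * the fd on success.
         */
        void *via_vk = map_dmabuf_via_vk(device, dup(dmabuf_fd), size,
                                         memory_type_index);

        /* Nothing in the dma-buf core guarantees that 'direct' and 'via_vk'
         * end up with the same memory type.
         */
        (void)direct;
        (void)via_vk;
    }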