On Fri, Feb 21, 2020 at 7:59 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote:
On Thu, Feb 20, 2020 at 09:39:05PM -0800, Tian, Kevin wrote:
From: Chia-I Wu <olvaffe@gmail.com>
Sent: Friday, February 21, 2020 12:51 PM

If you think it is best for KVM to inspect the hva to determine the memory type with page granularity, that is reasonable and should work for us too. Userspace can do something about it (e.g., add a GPU driver dependency to the hypervisor so that the dma-buf is imported as GPU memory and mapped using vkMapMemory), or I can work with the dma-buf maintainers to see if dma-buf's semantics can be changed.
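(For reference, the vkMapMemory path mentioned above would look roughly like the sketch below, assuming the VK_EXT_external_memory_dma_buf extension; device, dmabuf_fd, size, and mem_type are placeholders for state set up elsewhere, not names from this thread:)

    /* Sketch: import a dma-buf fd as Vulkan device memory and map it, so
     * the CPU mapping gets whatever memory type the GPU driver chose
     * (e.g. WC) instead of the WB mapping a plain mmap() would give. */
    VkImportMemoryFdInfoKHR import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
        .fd = dmabuf_fd,             /* fd received from the GPU process */
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &import_info,
        .allocationSize = size,
        .memoryTypeIndex = mem_type, /* must be host-visible and import-compatible */
    };
    VkDeviceMemory mem;
    void *ptr;

    vkAllocateMemory(device, &alloc_info, NULL, &mem);
    vkMapMemory(device, mem, 0, VK_WHOLE_SIZE, 0, &ptr);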
I think you need to consider the live migration requirement, as Paolo pointed out. The migration thread needs to read/write the region, so it must use the same memory type as the GPU process and the guest when accessing it. In that case, the hva mapped by QEMU should have the same type as the guest's. However, adding a GPU driver dependency to QEMU might trigger some concern. I'm not sure whether there is a generic mechanism, though, to share a dma-buf fd between the GPU process and QEMU while allowing QEMU to follow the desired memory type without using vkMapMemory...
Alternatively, KVM could make KVM_MEM_DMA and KVM_MEM_LOG_DIRTY_PAGES mutually exclusive, i.e. force a transition to WB memtype for the guest (with appropriate zapping) when migration is activated. I think that would work?
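(For concreteness, assuming KVM_MEM_DMA is the new memslot flag proposed in this thread, the mutual exclusion could be enforced in the flag validation done for KVM_SET_USER_MEMORY_REGION. The sketch below mirrors the shape of KVM's existing check_memory_region_flags(), but is illustrative, not merged code:)

    static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
    {
            u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_DMA;

    #ifdef __KVM_HAVE_READONLY_MEM
            valid_flags |= KVM_MEM_READONLY;
    #endif

            if (mem->flags & ~valid_flags)
                    return -EINVAL;

            /* Dirty logging implies migration, which needs WB.  Reject the
             * combination so userspace must clear KVM_MEM_DMA (and KVM zaps
             * back to WB memtype) before enabling dirty logging. */
            if ((mem->flags & KVM_MEM_DMA) &&
                (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
                    return -EINVAL;

            return 0;
    }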
Hm, virtio-gpu does not allow live migration when the 3D function (virgl=on) is enabled. This is the relevant code in QEMU:
    if (virtio_gpu_virgl_enabled(g->conf)) {
        error_setg(&g->migration_blocker, "virgl is not yet migratable");
        ...
    }
Although we (the virtio-gpu and virglrenderer projects) plan to make host GPU buffers available to the guest via memslots, those buffers should be considered part of the "GPU state". If live migration is to be supported, the migration thread should work with virglrenderer and let virglrenderer save/restore them.
QEMU already depends on GPU drivers when configured with --enable-virglrenderer. There is vhost-user-gpu, which can move the dependency into a separate GPU process. But there will still be cases (e.g., NVIDIA's proprietary driver does not support dma-buf) where QEMU cannot avoid a GPU driver dependency.
Note this is orthogonal to whether we introduce a new uAPI or implicitly check the hva to honor the guest memory type. It's purely about QEMU itself. Ideally, anyone who wants to access a dma-buf object should follow its expected semantics. It's interesting that the dma-buf subsystem doesn't provide centralized synchronization of memory type across multiple mmap paths.
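(To illustrate that last point: each exporter implements its own mmap op, and nothing in the dma-buf core forces all mapping paths to agree on a memory type. A hypothetical exporter, with example_buffer and its pfn field made up for illustration, might do:)

    /* A typical exporter forces a write-combined CPU mapping in its
     * dma-buf mmap op.  Another path to the same pages (a driver ioctl,
     * or KVM following the hva) can end up with a different memory type;
     * the dma-buf core does not coordinate this. */
    static int example_dmabuf_mmap(struct dma_buf *dmabuf,
                                   struct vm_area_struct *vma)
    {
            struct example_buffer *buf = dmabuf->priv;  /* hypothetical */

            vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
            return remap_pfn_range(vma, vma->vm_start, buf->pfn,
                                   vma->vm_end - vma->vm_start,
                                   vma->vm_page_prot);
    }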