Hi Christoph:
Thanks so much for the patches and the testing.
The abstraction between i915 and the KVMGT module exists so that our customers can easily port GVT-g to their own hypervisors. As you can see, all the hypervisor-related functions were put into the "hypervisor" module. The GVT-g module talks to the "hypervisor" module through the MPT layer. Customers just need to focus on their "hypervisor" module: implement it and attach it to the MPT layer, without touching the GVT-g core logic. This greatly reduces the porting effort, since the core logic of GVT-g in i915 is pretty vendor-specific and customers aren't motivated to touch it unless they have to.
The boundary between the GVT-g core logic and the "hypervisor" module also helps a lot in narrowing down problems when supporting our customers. In our support experience, the less a customer touches the core logic, the fewer problems are introduced.
We have many customers across the industry who are using commercial hypervisors like QNX, or their own private hypervisors. A reference implementation of a Xen "hypervisor" module, which demonstrates to our customers how to port GVT-g to a type-1 hypervisor instead of KVM, can be found here: https://github.com/intel/gvt-linux/blob/topic/gvt-xengt/drivers/gpu/drm/i915...
But it's hard for some customers to contribute their own "hypervisor" modules to the upstream Linux kernel, so I am wondering what a better solution would be. The MPT layer in the kernel helps customers a lot, but only one open-source "hypervisor" module is in the kernel, which can confuse people who don't know the story. One thing I was thinking of is adding a document about the background, and more description in the MPT headers, so it won't confuse more people.
Feel free to drop more comments. 😊
Thanks, Zhi.
-----Original Message-----
From: Christoph Hellwig <hch@lst.de>
Sent: Wednesday, July 21, 2021 6:54 PM
To: Jani Nikula <jani.nikula@linux.intel.com>; Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Zhenyu Wang <zhenyuw@linux.intel.com>; Wang, Zhi A <zhi.a.wang@intel.com>
Cc: intel-gfx@lists.freedesktop.org; intel-gvt-dev@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-kernel@vger.kernel.org
Subject: refactor the i915 GVT support
Hi all,
the GVT code in the i915 is a bit of a mess right now due to strange abstractions and lots of indirect calls. This series refactors various bits to clean that up. The main user visible change is that almost all of the GVT code moves out of the main i915 driver and into the kvmgt module.
Tested on my Thinkpad with a Kaby Lake CPU and integrated graphics.
Git tree:
git://git.infradead.org/users/hch/misc.git i915-gvt
Gitweb:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/i915-gvt
Diffstat:
 b/drivers/gpu/drm/i915/Kconfig                          |   31
 b/drivers/gpu/drm/i915/Makefile                         |   30
 b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c  |    4
 b/drivers/gpu/drm/i915/gvt/cfg_space.c                  |   89 --
 b/drivers/gpu/drm/i915/gvt/cmd_parser.c                 |    4
 b/drivers/gpu/drm/i915/gvt/dmabuf.c                     |   36
 b/drivers/gpu/drm/i915/gvt/execlist.c                   |   12
 b/drivers/gpu/drm/i915/gvt/gtt.c                        |   55 -
 b/drivers/gpu/drm/i915/gvt/gvt.c                        |  100 --
 b/drivers/gpu/drm/i915/gvt/gvt.h                        |  132 ++-
 b/drivers/gpu/drm/i915/gvt/interrupt.c                  |   38 -
 b/drivers/gpu/drm/i915/gvt/kvmgt.c                      |  634 ++++-------------
 b/drivers/gpu/drm/i915/gvt/mmio.c                       |    4
 b/drivers/gpu/drm/i915/gvt/opregion.c                   |  148 ---
 b/drivers/gpu/drm/i915/gvt/page_track.c                 |    8
 b/drivers/gpu/drm/i915/gvt/scheduler.c                  |   37
 b/drivers/gpu/drm/i915/gvt/trace.h                      |    2
 b/drivers/gpu/drm/i915/gvt/vgpu.c                       |   22
 b/drivers/gpu/drm/i915/i915_drv.h                       |    7
 b/drivers/gpu/drm/i915/i915_params.c                    |    2
 b/drivers/gpu/drm/i915/intel_gvt.c                      |   64 +
 b/drivers/gpu/drm/i915/intel_gvt.h                      |    4
 drivers/gpu/drm/i915/gvt/Makefile                       |    9
 drivers/gpu/drm/i915/gvt/hypercall.h                    |   82 --
 drivers/gpu/drm/i915/gvt/mpt.h                          |  400 ----
 25 files changed, 541 insertions(+), 1413 deletions(-)
Hi,
https://github.com/intel/gvt-linux/blob/topic/gvt-xengt/drivers/gpu/drm/i915...
But it's hard for some customers to contribute their own "hypervisor" modules to the upstream Linux kernel, so I am wondering what a better solution would be. The MPT layer in the kernel helps customers a lot, but only one open-source "hypervisor" module is in the kernel, which can confuse people who don't know the story. One thing I was thinking of is adding a document about the background, and more description in the MPT headers, so it won't confuse more people.
Getting the xengt module linked above merged into mainline would also nicely explain why there are hypervisor modules.
take care, Gerd
On Thu, Jul 22, 2021 at 01:26:36PM +0200, Gerd Hoffmann wrote:
Hi,
https://github.com/intel/gvt-linux/blob/topic/gvt-xengt/drivers/gpu/drm/i915...
But it's hard for some customers to contribute their own "hypervisor" modules to the upstream Linux kernel, so I am wondering what a better solution would be. The MPT layer in the kernel helps customers a lot, but only one open-source "hypervisor" module is in the kernel, which can confuse people who don't know the story. One thing I was thinking of is adding a document about the background, and more description in the MPT headers, so it won't confuse more people.
Getting the xengt module linked above merged into mainline would also nicely explain why there are hypervisor modules.
It would also be nice to explain why a GPU driver needs a hypervisor specific shim like this in the first place.
enum hypervisor_type type;
int (*host_init)(struct device *dev, void *gvt, const void *ops);
void (*host_exit)(struct device *dev, void *gvt);
int (*attach_vgpu)(void *vgpu, unsigned long *handle);
void (*detach_vgpu)(void *vgpu);
Doesn't vfio provide all this generically with notifiers?
int (*inject_msi)(unsigned long handle, u32 addr, u16 data);
Isn't this one just an eventfd?
unsigned long (*from_virt_to_mfn)(void *p);
int (*read_gpa)(unsigned long handle, unsigned long gpa, void *buf, unsigned long len);
int (*write_gpa)(unsigned long handle, unsigned long gpa, void *buf, unsigned long len);
unsigned long (*gfn_to_mfn)(unsigned long handle, unsigned long gfn);
int (*dma_map_guest_page)(unsigned long handle, unsigned long gfn, unsigned long size, dma_addr_t *dma_addr);
void (*dma_unmap_guest_page)(unsigned long handle, dma_addr_t dma_addr);
int (*dma_pin_guest_page)(unsigned long handle, dma_addr_t dma_addr);
int (*map_gfn_to_mfn)(unsigned long handle, unsigned long gfn, unsigned long mfn, unsigned int nr, bool map);
bool (*is_valid_gfn)(unsigned long handle, unsigned long gfn);
Shouldn't the vfio page SW IOMMU do all of this generically?
int (*enable_page_track)(unsigned long handle, u64 gfn);
int (*disable_page_track)(unsigned long handle, u64 gfn);
int (*set_trap_area)(unsigned long handle, u64 start, u64 end, bool map);
int (*set_opregion)(void *vgpu);
int (*set_edid)(void *vgpu, int port_num);
edid depends on hypervisor??
int (*get_vfio_device)(void *vgpu);
void (*put_vfio_device)(void *vgpu);
Jason
Hi Jason:
I guess the APIs you were talking about are KVM-only. Other hypervisors, e.g. Xen and ACRN, cannot use the APIs you mentioned. Not sure if you have already noticed that VFIO is KVM-only right now.
GVT-g is designed for many hypervisors, not only KVM. In the design, we implemented an abstraction layer for different hypervisors. You can check the link in the previous email, which has an example of how the MPT module "xengt" supports GVT-g running under Xen. For example, injecting an MSI in VFIO/KVM is done by signaling an eventfd, but in Xen we need to issue a hypercall from Dom0. The same goes for other operations, like querying mappings between GFN and HFN. Some GPU-related emulation logic might be implemented differently under different hypervisors, because different hypervisors might not provide exactly the APIs we want. That's the reason why they get a callback in the MPT (which is still not perfect).
As you can see, to survive this situation we have to rely on an abstraction layer, so that we can avoid introducing code blocks like the following into the core logic:
if (in_hypervisor_xen)
        issue hypercalls
else if (in_hypervisor_kvm)
        play with eventfds
else if (in_hypervisor_other)
        xxxx
Thus some of the APIs have to be wrapped in the MPT module in the GVT-g design.
Sadly, not all customers are motivated or allowed to get their hypervisor-specific modules into the kernel. We have a customer who runs GVT-g with their private hypervisor. In this case, they don't want to submit their "xxxgt" MPT module upstream, since their hypervisor isn't in the kernel anyway. Also, we have customers who ported GVT-g to QNX, another widely used commercial hypervisor in the industry. They can't submit the "qnxgt" MPT module upstream since it's not allowed.
We do understand the situation and are trying to figure out a solution that can fulfill the expectations of different people in the community as well as our customers.
Following Greg KH's comments, we are collecting the requirements for MPT modules from other open-source hypervisors in the kernel, e.g. ACRN, to see if they can get another MPT module into the kernel, which would show an example of how the MPT abstraction can be of benefit. Also, we are evaluating the impact on our customers if we have to remove the MPT abstraction from the kernel because there is only one MPT module.
Thanks so much for the comments.
Thanks, Zhi.
-----Original Message-----
From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Tuesday, July 27, 2021 3:12 PM
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: Wang, Zhi A <zhi.a.wang@intel.com>; Christoph Hellwig <hch@lst.de>; Jani Nikula <jani.nikula@linux.intel.com>; Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Zhenyu Wang <zhenyuw@linux.intel.com>; intel-gfx@lists.freedesktop.org; intel-gvt-dev@lists.freedesktop.org; linux-kernel@vger.kernel.org; dri-devel@lists.freedesktop.org
Subject: Re: refactor the i915 GVT support
https://github.com/intel/gvt-linux/blob/topic/gvt-xengt/drivers/gpu/drm/i915/gvt/xengt.c
On Wed, Jul 28, 2021 at 01:38:58PM +0000, Wang, Zhi A wrote:
Hi Jason:
I guess the APIs you were talking about are KVM-only. Other hypervisors, e.g. Xen and ACRN, cannot use the APIs you mentioned. Not sure if you have already noticed that VFIO is KVM-only right now.
Please wrap your lines properly :(
GVT-g is designed for many hypervisors, not only KVM. In the design, we implemented an abstraction layer for different hypervisors. You can check the link in the previous email, which has an example of how the MPT module "xengt" supports GVT-g running under Xen. For example, injecting an MSI in VFIO/KVM is done by signaling an eventfd, but in Xen we need to issue a hypercall from Dom0. The same goes for other operations, like querying mappings between GFN and HFN. Some GPU-related emulation logic might be implemented differently under different hypervisors, because different hypervisors might not provide exactly the APIs we want. That's the reason why they get a callback in the MPT (which is still not perfect).
As you can see, to survive this situation we have to rely on an abstraction layer, so that we can avoid introducing code blocks like the following into the core logic:
if (in_hypervisor_xen)
        issue hypercalls
else if (in_hypervisor_kvm)
        play with eventfds
else if (in_hypervisor_other)
        xxxx
That's horrid, and slow, please do this properly.
Thus some of the APIs have to be wrapped in the MPT module in the GVT-g design.
Sadly, not all customers are motivated or allowed to get their hypervisor-specific modules into the kernel. We have a customer who runs GVT-g with their private hypervisor. In this case, they don't want to submit their "xxxgt" MPT module upstream, since their hypervisor isn't in the kernel anyway. Also, we have customers who ported GVT-g to QNX, another widely used commercial hypervisor in the industry. They can't submit the "qnxgt" MPT module upstream since it's not allowed.
Why is it not allowed?
We do understand the situation and are trying to figure out a solution that can fulfill the expectations of different people in the community as well as our customers.
Following Greg KH's comments, we are collecting the requirements for MPT modules from other open-source hypervisors in the kernel, e.g. ACRN, to see if they can get another MPT module into the kernel, which would show an example of how the MPT abstraction can be of benefit. Also, we are evaluating the impact on our customers if we have to remove the MPT abstraction from the kernel because there is only one MPT module.
Until that happens, can we please just remove this unneeded layer, as no one is using it? Then, when you have a real user for this type of middle layer, you can add it back. We have no need for it now.
thanks,
greg k-h
On Wed, Jul 28, 2021 at 01:38:58PM +0000, Wang, Zhi A wrote:
I guess the APIs you were talking about are KVM-only. Other hypervisors, e.g. Xen and ACRN, cannot use the APIs you mentioned. Not sure if you have already noticed that VFIO is KVM-only right now.
There is very little hard connection between VFIO and KVM, so no, I don't think that is completely true.
In any event, an in-tree version of other hypervisor support for GVT needs to go through enabling VFIO support, so that the existing API multiplexers we have can be used properly, rather than adding a shim layer that tries to recreate VFIO inside a GPU driver.
GVT-g is designed for many hypervisors, not only KVM. In the design, we implemented an abstraction layer for different hypervisors. You can check the link in the previous email, which has an example of how the MPT module "xengt" supports GVT-g running under Xen. For example, injecting an MSI in VFIO/KVM is done by signaling an eventfd, but in Xen we need to issue a hypercall from Dom0.
This is obviously bad design; Xen should plug into the standardized eventfd scheme as well and trigger its hypercall that way. Then it can integrate with the existing VFIO interrupt abstraction infrastructure.
others, like querying mappings between GFN and HFN.
This should be done through VFIO containers, there is nothing KVM specific there.
As you can see, to survive this situation we have to rely on an abstraction layer, so that we can avoid introducing code blocks like the following into the core logic:
No, you have to fix the abstractions we already have to support the matrix of things you care about. If this can't be done then maybe we can add new abstractions, but abstractions like this absolutely should not be done inside drivers.
Jason
On 7/28/2021 8:59 PM, Jason Gunthorpe wrote:
On Wed, Jul 28, 2021 at 01:38:58PM +0000, Wang, Zhi A wrote:
I guess the APIs you were talking about are KVM-only. Other hypervisors, e.g. Xen and ACRN, cannot use the APIs you mentioned. Not sure if you have already noticed that VFIO is KVM-only right now.
There is very little hard connection between VFIO and KVM, so no, I don't think that is completely true.
In any event, an in-tree version of other hypervisor support for GVT needs to go through enabling VFIO support, so that the existing API multiplexers we have can be used properly, rather than adding a shim layer that tries to recreate VFIO inside a GPU driver.
We delivered a presentation on GVT-g at Xen Summit 2018, where we discussed supporting VFIO in Xen (the video can be found on YouTube). But we didn't see any motivation from the Xen community to adopt it.
If people take a look at the PCI-passthrough code in QEMU, Xen is actually not using VFIO even today. We would be glad to see someone influence that part, especially getting all the in-kernel hypervisors to use VFIO for PCI passthrough and to support mdev. That would be a huge benefit for all users.
GVT-g is designed for many hypervisors, not only KVM. In the design, we implemented an abstraction layer for different hypervisors. You can check the link in the previous email, which has an example of how the MPT module "xengt" supports GVT-g running under Xen. For example, injecting an MSI in VFIO/KVM is done by signaling an eventfd, but in Xen we need to issue a hypercall from Dom0.
This is obviously bad design; Xen should plug into the standardized eventfd scheme as well and trigger its hypercall that way. Then it can integrate with the existing VFIO interrupt abstraction infrastructure.
others, like querying mappings between GFN and HFN.
This should be done through VFIO containers, there is nothing KVM specific there.
As you can see, to survive this situation we have to rely on an abstraction layer, so that we can avoid introducing code blocks like the following into the core logic:
No, you have to fix the abstractions we already have to support the matrix of things you care about. If this can't be done then maybe we can add new abstractions, but abstractions like this absolutely should not be done inside drivers.
Jason
That's a good point, and we were actually thinking about this before; I believe that's the correct direction. But just like the situation mentioned above, it would be nice if people could really influence all the in-kernel hypervisors to use VFIO, which would benefit all users.
For now, we are just going to take Christoph's patches.
Zhi
On Thu, Jul 22, 2021 at 10:49:47AM +0000, Wang, Zhi A wrote:
But it's hard for some customers to contribute their own "hypervisor" modules to the upstream Linux kernel.
What prevents them from doing this? We will take any code that meets our standards. What format is this external code in?
I am wondering what a better solution would be. The MPT layer in the kernel helps customers a lot, but only one open-source "hypervisor" module is in the kernel, which can confuse people who don't know the story. One thing I was thinking of is adding a document about the background, and more description in the MPT headers, so it won't confuse more people.
If no one is using it in the kernel, it needs to be removed. No abstractions should be added that are not required by the in-tree code.
So this series should be accepted, _or_ the external code needs to be submitted for inclusion.
thanks,
greg k-h