Users attempting to enable vfio PCI device assignment with a GPU will often block the default PCI driver from the device to avoid conflicts with the device initialization or release path. This means that vfio-pci is sometimes the first PCI driver to bind to the device. In the case of assigning the primary graphics device, low-level console drivers may still generate resource conflicts. Users often employ kernel command line arguments to disable conflicting drivers or perform unbinding in userspace to avoid this, but the actual solution is often distribution/kernel config specific based on the included drivers.
We can instead allow vfio-pci to copy the behavior of drm_aperture_remove_conflicting_pci_framebuffers() in order to remove these low-level drivers with conflicting resources. vfio-pci is not, however, a DRM driver, nor does it depend on DRM config options. We therefore split out and export the necessary DRM aperture support and mirror the framebuffer and VGA support.
I'd be happy to pull this series in through the vfio branch if approved by the DRM maintainers. Thanks,
Alex
---
Alex Williamson (2):
  drm/aperture: Split conflicting platform driver removal
  vfio/pci: Remove console drivers

 drivers/gpu/drm/drm_aperture.c   | 33 +++++++++++++++++++++++---------
 drivers/vfio/pci/vfio_pci_core.c | 17 ++++++++++++++++
 include/drm/drm_aperture.h       |  2 ++
 3 files changed, 43 insertions(+), 9 deletions(-)
Split the removal of platform drivers conflicting with PCI resources from drm_aperture_remove_conflicting_pci_framebuffers() to support both non-DRM callers and better modularity. This is intended to support the vfio-pci driver, which can acquire ownership of PCI graphics devices, but is not itself a DRM or FB driver, and therefore has no Kconfig dependencies. The remaining actions of drm_aperture_remove_conflicting_pci_framebuffers() are already exported from their respective subsystems, therefore this allows vfio-pci to separate each set of conflicts independently based on the configured features.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/gpu/drm/drm_aperture.c | 33 ++++++++++++++++++++++++---------
 include/drm/drm_aperture.h     |  2 ++
 2 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/drm_aperture.c b/drivers/gpu/drm/drm_aperture.c
index 74bd4a76b253..5b2500f7fb8b 100644
--- a/drivers/gpu/drm/drm_aperture.c
+++ b/drivers/gpu/drm/drm_aperture.c
@@ -313,6 +313,28 @@ int drm_aperture_remove_conflicting_framebuffers(resource_size_t base, resource_
 }
 EXPORT_SYMBOL(drm_aperture_remove_conflicting_framebuffers);
 
+/**
+ * drm_aperture_detach_platform_drivers - detach platform drivers conflicting with PCI device
+ * @pdev: PCI device
+ *
+ * This function detaches platform drivers with resource conflicts to the memory
+ * bars of the provided @pdev.
+ */
+void drm_aperture_detach_platform_drivers(struct pci_dev *pdev)
+{
+	resource_size_t base, size;
+	int bar;
+
+	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
+		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+			continue;
+		base = pci_resource_start(pdev, bar);
+		size = pci_resource_len(pdev, bar);
+		drm_aperture_detach_drivers(base, size);
+	}
+}
+EXPORT_SYMBOL(drm_aperture_detach_platform_drivers);
+
 /**
  * drm_aperture_remove_conflicting_pci_framebuffers - remove existing framebuffers for PCI devices
  * @pdev: PCI device
@@ -328,16 +350,9 @@ EXPORT_SYMBOL(drm_aperture_remove_conflicting_framebuffers);
 int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
 						     const struct drm_driver *req_driver)
 {
-	resource_size_t base, size;
-	int bar, ret = 0;
+	int ret = 0;
 
-	for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
-		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
-			continue;
-		base = pci_resource_start(pdev, bar);
-		size = pci_resource_len(pdev, bar);
-		drm_aperture_detach_drivers(base, size);
-	}
+	drm_aperture_detach_platform_drivers(pdev);
 
 	/*
 	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
diff --git a/include/drm/drm_aperture.h b/include/drm/drm_aperture.h
index 7096703c3949..53fd36fa258e 100644
--- a/include/drm/drm_aperture.h
+++ b/include/drm/drm_aperture.h
@@ -15,6 +15,8 @@ int devm_aperture_acquire_from_firmware(struct drm_device *dev, resource_size_t
 int drm_aperture_remove_conflicting_framebuffers(resource_size_t base, resource_size_t size,
 						 bool primary, const struct drm_driver *req_driver);
 
+void drm_aperture_detach_platform_drivers(struct pci_dev *pdev);
+
 int drm_aperture_remove_conflicting_pci_framebuffers(struct pci_dev *pdev,
 						     const struct drm_driver *req_driver);
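The BAR walk that patch 1 splits out can be modelled outside the kernel. The following is a minimal userspace sketch, not kernel code: `struct model_bar`, the `MODEL_*` constants, `model_detach_fn`, `model_detach_platform_drivers()` and `model_sum_sizes()` are hypothetical names invented for illustration. The sketch only assumes what the patch shows: iterate over the standard BARs, skip anything that is not a memory BAR, and pass the base and size of each remaining BAR to a detach routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
#define MODEL_NUM_BARS 6	/* mirrors PCI_STD_NUM_BARS */
#define MODEL_RES_MEM  0x1u	/* stands in for IORESOURCE_MEM */
#define MODEL_RES_IO   0x2u	/* stands in for an I/O port BAR */

struct model_bar {
	uint64_t start;
	uint64_t len;
	unsigned int flags;
};

/* Called once per memory BAR, in the role of drm_aperture_detach_drivers(). */
typedef void (*model_detach_fn)(uint64_t base, uint64_t size, void *ctx);

/*
 * Walk all standard BARs, skip those that are not memory BARs, and hand the
 * range of each remaining BAR to the detach callback.
 * Returns the number of BARs passed to the callback.
 */
static int model_detach_platform_drivers(const struct model_bar *bars,
					 model_detach_fn detach, void *ctx)
{
	int bar, visited = 0;

	assert(bars && detach);

	for (bar = 0; bar < MODEL_NUM_BARS; ++bar) {
		if (!(bars[bar].flags & MODEL_RES_MEM))
			continue;
		detach(bars[bar].start, bars[bar].len, ctx);
		visited++;
	}
	return visited;
}

/* Example callback: accumulates the sizes of the ranges it was given. */
static void model_sum_sizes(uint64_t base, uint64_t size, void *ctx)
{
	(void)base;
	*(uint64_t *)ctx += size;
}
```

Passing the detach step as a callback is only a convenience for the model. In the patch the loop calls drm_aperture_detach_drivers() directly.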
Console drivers can create conflicts with PCI resources resulting in userspace getting mmap failures to memory BARs. This is especially evident when trying to re-use the system primary console for userspace drivers. Attempt to remove all nature of conflicting drivers as part of our VGA initialization.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a0d69ddaf90d..e0cbcbc2aee1 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -13,6 +13,7 @@
 #include <linux/device.h>
 #include <linux/eventfd.h>
 #include <linux/file.h>
+#include <linux/fb.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
@@ -29,6 +30,8 @@

 #include <linux/vfio_pci_core.h>

+#include <drm/drm_aperture.h>
+
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC "core driver for VFIO based PCI devices"

@@ -1793,6 +1796,20 @@ static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
 	if (!vfio_pci_is_vga(pdev))
 		return 0;

+#if IS_REACHABLE(CONFIG_DRM)
+	drm_aperture_detach_platform_drivers(pdev);
+#endif
+
+#if IS_REACHABLE(CONFIG_FB)
+	ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
+	if (ret)
+		return ret;
+#endif
+
+	ret = vga_remove_vgacon(pdev);
+	if (ret)
+		return ret;
+
 	ret = vga_client_register(pdev, vfio_pci_set_decode);
 	if (ret)
 		return ret;
Hi Alex
On 06.06.22 at 19:53, Alex Williamson wrote:
Console drivers can create conflicts with PCI resources resulting in userspace getting mmap failures to memory BARs. This is especially evident when trying to re-use the system primary console for userspace drivers. Attempt to remove all nature of conflicting drivers as part of our VGA initialization.
First a dumb question about your use case. You want to assign a PCI graphics card to a virtual machine and need to remove the generic driver from the framebuffer?
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a0d69ddaf90d..e0cbcbc2aee1 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -13,6 +13,7 @@
 #include <linux/device.h>
 #include <linux/eventfd.h>
 #include <linux/file.h>
+#include <linux/fb.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
@@ -29,6 +30,8 @@

 #include <linux/vfio_pci_core.h>

+#include <drm/drm_aperture.h>
+
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC "core driver for VFIO based PCI devices"

@@ -1793,6 +1796,20 @@ static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
 	if (!vfio_pci_is_vga(pdev))
 		return 0;

+#if IS_REACHABLE(CONFIG_DRM)
+	drm_aperture_detach_platform_drivers(pdev);
+#endif
+
+#if IS_REACHABLE(CONFIG_FB)
+	ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
+	if (ret)
+		return ret;
+#endif
+
+	ret = vga_remove_vgacon(pdev);
+	if (ret)
+		return ret;
+
You shouldn't have to copy any of the implementation of the aperture helpers.
If you call drm_aperture_remove_conflicting_pci_framebuffers() it should work correctly. The only reason why it requires a DRM driver structure as second argument is for the driver's name. [1] And that name is only used for printing an info message. [2]
The plan forward would be to drop patch 1 entirely.
For patch 2, the most trivial workaround is to instantiate struct drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the longer term, the aperture helpers will be moved out of DRM and into a more prominent location. That workaround will be cleaned up then.
Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could be changed to accept the name string as second argument, but that's quite a bit of churn within the DRM code.
Best regards Thomas
[1] https://elixir.bootlin.com/linux/v5.18.2/source/drivers/gpu/drm/drm_aperture...
[2] https://elixir.bootlin.com/linux/v5.18.2/source/drivers/video/fbdev/core/fbm...
 	ret = vga_client_register(pdev, vfio_pci_set_decode);
 	if (ret)
 		return ret;
Hi Thomas,
On Wed, 8 Jun 2022 13:11:21 +0200 Thomas Zimmermann tzimmermann@suse.de wrote:
Hi Alex
On 06.06.22 at 19:53, Alex Williamson wrote:
Console drivers can create conflicts with PCI resources resulting in userspace getting mmap failures to memory BARs. This is especially evident when trying to re-use the system primary console for userspace drivers. Attempt to remove all nature of conflicting drivers as part of our VGA initialization.
First a dumb question about your use case. You want to assign a PCI graphics card to a virtual machine and need to remove the generic driver from the framebuffer?
Exactly.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a0d69ddaf90d..e0cbcbc2aee1 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -13,6 +13,7 @@
 #include <linux/device.h>
 #include <linux/eventfd.h>
 #include <linux/file.h>
+#include <linux/fb.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
@@ -29,6 +30,8 @@

 #include <linux/vfio_pci_core.h>

+#include <drm/drm_aperture.h>
+
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC "core driver for VFIO based PCI devices"

@@ -1793,6 +1796,20 @@ static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
 	if (!vfio_pci_is_vga(pdev))
 		return 0;

+#if IS_REACHABLE(CONFIG_DRM)
+	drm_aperture_detach_platform_drivers(pdev);
+#endif
+
+#if IS_REACHABLE(CONFIG_FB)
+	ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
+	if (ret)
+		return ret;
+#endif
+
+	ret = vga_remove_vgacon(pdev);
+	if (ret)
+		return ret;
+
You shouldn't have to copy any of the implementation of the aperture helpers.
If you call drm_aperture_remove_conflicting_pci_framebuffers() it should work correctly. The only reason why it requires a DRM driver structure as second argument is for the driver's name. [1] And that name is only used for printing an info message. [2]
vfio-pci is not dependent on CONFIG_DRM, therefore we need to open code this regardless. The only difference if we were to use the existing function would be something like:
#if IS_REACHABLE(CONFIG_DRM)
	ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &dummy_drm_driver);
	if (ret)
		return ret;
#else
#if IS_REACHABLE(CONFIG_FB)
	ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
	if (ret)
		return ret;
#endif
	ret = vga_remove_vgacon(pdev);
	if (ret)
		return ret;
#endif
It's also bad practice to create a dummy DRM driver struct based on an assumption about which fields are used. If the usage is later expanded, we'd only discover it by users getting segfaults. If DRM wanted to make such an API guarantee, then we shouldn't have commit 97c9bfe3f660 ("drm/aperture: Pass DRM driver structure instead of driver name").
The plan forward would be to drop patch 1 entirely.
For patch 2, the most trivial workaround is to instantiate struct drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the longer term, the aperture helpers will be moved out of DRM and into a more prominent location. That workaround will be cleaned up then.
Trivial in execution, but as above, this is poor practice and should be avoided.
Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could be changed to accept the name string as second argument, but that's quite a bit of churn within the DRM code.
The series as presented was exactly meant to provide the most correct solution with the least churn to the DRM code. The above referenced commit precludes us from calling the existing DRM function directly without resorting to poor practices of assuming the usage of the DRM driver struct. Using the existing DRM function also does not prevent us from open coding the remainder of the function to avoid a CONFIG_DRM dependency. Thanks,
Alex
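The two API shapes debated above can be contrasted in a small userspace sketch. Everything here is hypothetical and for illustration only: `struct model_driver`, `model_remove_conflicting_by_name()` and `model_remove_conflicting_by_driver()` are not kernel interfaces, and the message text is invented. The sketch assumes only what the thread states, namely that the helper currently uses the driver structure for nothing but its name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a large descriptor such as struct drm_driver. */
struct model_driver {
	const char *name;
	int (*open)(void *priv);	/* one of many fields a dummy leaves NULL */
};

/*
 * Name-based variant: a non-DRM caller can use this without fabricating a
 * descriptor. Returns 0 on success, -1 on invalid arguments.
 */
static int model_remove_conflicting_by_name(const char *name,
					    char *msg, size_t len)
{
	if (!name || !msg || len == 0)
		return -1;
	snprintf(msg, len, "%s: removing conflicting framebuffers", name);
	return 0;
}

/*
 * Struct-based variant: today it reads only ->name, but nothing in the
 * signature promises that. A caller passing a dummy with just .name set
 * would break as soon as this body starts using another field.
 */
static int model_remove_conflicting_by_driver(const struct model_driver *drv,
					      char *msg, size_t len)
{
	if (!drv)
		return -1;
	return model_remove_conflicting_by_name(drv->name, msg, len);
}
```

With the name-based entry point the contract is visible in the signature, which is the property the dummy-struct workaround lacks.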
Hi
On 08.06.22 at 16:04, Alex Williamson wrote:
You shouldn't have to copy any of the implementation of the aperture helpers.
If you call drm_aperture_remove_conflicting_pci_framebuffers() it should work correctly. The only reason why it requires a DRM driver structure as second argument is for the driver's name. [1] And that name is only used for printing an info message. [2]
vfio-pci is not dependent on CONFIG_DRM, therefore we need to open code this regardless. The only difference if we were to use the existing function would be something like:
#if IS_REACHABLE(CONFIG_DRM)
	ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &dummy_drm_driver);
	if (ret)
		return ret;
#else
#if IS_REACHABLE(CONFIG_FB)
	ret = remove_conflicting_pci_framebuffers(pdev, vdev->vdev.ops->name);
	if (ret)
		return ret;
#endif
	ret = vga_remove_vgacon(pdev);
	if (ret)
		return ret;
#endif

It's also bad practice to create a dummy DRM driver struct based on an assumption about which fields are used. If the usage is later expanded, we'd only discover it by users getting segfaults. If DRM wanted to make such an API guarantee, then we shouldn't have commit 97c9bfe3f660 ("drm/aperture: Pass DRM driver structure instead of driver name").
What you're open coding within vfio is legacy code and we want to remove it as much as possible from the aperture helpers.
Tying the helpers to DRM was a mistake in retrospect. We wanted something tailored to the needs of DRM. Now that we've seen quite a few corner cases in the interaction among graphics subsystems, we need something else. The order of creating devices and loading driver modules is crucial to making the hand-over between drivers work reliably. So far, this luckily has only been a problem in theory, but not in practice.
The plan forward would be to drop patch 1 entirely.
For patch 2, the most trivial workaround is to instantiate struct drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the longer term, the aperture helpers will be moved out of DRM and into a more prominent location. That workaround will be cleaned up then.
Trivial in execution, but as above, this is poor practice and should be avoided.
Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could be changed to accept the name string as second argument, but that's quite a bit of churn within the DRM code.
The series as presented was exactly meant to provide the most correct solution with the least churn to the DRM code. The above referenced commit precludes us from calling the existing DRM function directly without resorting to poor practices of assuming the usage of the DRM driver struct. Using the existing DRM function also does not prevent us from open coding the remainder of the function to avoid a CONFIG_DRM dependency. Thanks,
Please have a look at the attached patch. It moves the aperture helpers to a location common to the various possible users (DRM, fbdev, vfio). The DRM interfaces remain untouched for now. The patch should provide what you need in vfio and also serve our future use cases for graphics drivers. If possible, please create your patch on top of it.
Best regards Thomas
Alex
On Thu, 9 Jun 2022 11:13:22 +0200 Thomas Zimmermann tzimmermann@suse.de wrote:
Please have a look at the attached patch. It moves the aperture helpers to a location common to the various possible users (DRM, fbdev, vfio). The DRM interfaces remain untouched for now. The patch should provide what you need in vfio and also serve our future use cases for graphics drivers. If possible, please create your patch on top of it.
Looks good to me, this of course makes the vfio change quite trivial. One change I'd request:
diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 40c50fa2dd70..7f3c44e1538b 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -10,6 +10,7 @@ config VGA_CONSOLE
 	depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
 		(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
 		!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
+	select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
 	default y
 	help
 	  Saying Y here will allow you to use Linux in text mode through a
This should be VFIO_PCI_CORE. Thanks,
Alex
On Thu, 9 Jun 2022 15:41:02 -0600 Alex Williamson alex.williamson@redhat.com wrote:
On Thu, 9 Jun 2022 11:13:22 +0200 Thomas Zimmermann tzimmermann@suse.de wrote:
Please have a look at the attached patch. It moves the aperture helpers to a location common to the various possible users (DRM, fbdev, vfio). The DRM interfaces remain untouched for now. The patch should provide what you need in vfio and also serve our future use cases for graphics drivers. If possible, please create your patch on top of it.
Looks good to me, this of course makes the vfio change quite trivial. One change I'd request:
diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 40c50fa2dd70..7f3c44e1538b 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -10,6 +10,7 @@ config VGA_CONSOLE
 	depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
 		(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
 		!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
+	select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
 	default y
 	help
 	  Saying Y here will allow you to use Linux in text mode through a
This should be VFIO_PCI_CORE. Thanks,
Also, whatever tree this lands in, I'd appreciate a topic branch being made available so I can more easily get the vfio change in on the same release. Thanks,
Alex
Hi
On 09.06.22 at 23:44, Alex Williamson wrote:
On Thu, 9 Jun 2022 15:41:02 -0600 Alex Williamson alex.williamson@redhat.com wrote:
On Thu, 9 Jun 2022 11:13:22 +0200 Thomas Zimmermann tzimmermann@suse.de wrote:
Please have a look at the attached patch. It moves the aperture helpers to a location common to the various possible users (DRM, fbdev, vfio). The DRM interfaces remain untouched for now. The patch should provide what you need in vfio and also serve our future use cases for graphics drivers. If possible, please create your patch on top of it.
Looks good to me, this of course makes the vfio change quite trivial. One change I'd request:
diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 40c50fa2dd70..7f3c44e1538b 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -10,6 +10,7 @@ config VGA_CONSOLE
 	depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
 		(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
 		!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
+	select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
 	default y
 	help
 	  Saying Y here will allow you to use Linux in text mode through a
This should be VFIO_PCI_CORE. Thanks,
I attached an updated patch to this email.
Also, whatever tree this lands in, I'd appreciate a topic branch being made available so I can more easily get the vfio change in on the same release. Thanks,
You can add my patch to your series and merge it through vfio. You'd only have to cc dri-devel for the patch's review. I guess it's more important for vfio than DRM. We are in no hurry on the DRM side, but v5.20 would be nice.
Best regards Thomas
Alex
On Fri, 10 Jun 2022 09:03:15 +0200 Thomas Zimmermann tzimmermann@suse.de wrote:
Hi
On 09.06.22 at 23:44, Alex Williamson wrote:
On Thu, 9 Jun 2022 15:41:02 -0600 Alex Williamson alex.williamson@redhat.com wrote:
On Thu, 9 Jun 2022 11:13:22 +0200 Thomas Zimmermann tzimmermann@suse.de wrote:
Please have a look at the attached patch. It moves the aperture helpers to a location common to the various possible users (DRM, fbdev, vfio). The DRM interfaces remain untouched for now. The patch should provide what you need in vfio and also serve our future use cases for graphics drivers. If possible, please create your patch on top of it.
Looks good to me, this of course makes the vfio change quite trivial. One change I'd request:
diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 40c50fa2dd70..7f3c44e1538b 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -10,6 +10,7 @@ config VGA_CONSOLE
 	depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
 		(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
 		!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !S390 && !UML
+	select APERTURE_HELPERS if (DRM || FB || VFIO_PCI)
 	default y
 	help
 	  Saying Y here will allow you to use Linux in text mode through a
This should be VFIO_PCI_CORE. Thanks,
I attached an updated patch to this email.
Also, whatever tree this lands in, I'd appreciate a topic branch being made available so I can more easily get the vfio change in on the same release. Thanks,
You can add my patch to your series and merge it through vfio. You'd only have to cc dri-devel for the patch's review. I guess it's more important for vfio than DRM. We are in no hurry on the DRM side, but v5.20 would be nice.
Ok, I didn't realize you were offering the patch for me to post and merge. I'll do that. Thanks!
Alex
Hi,
You shouldn't have to copy any of the implementation of the aperture helpers.
That comes from the aperture helpers being part of drm ...
For patch 2, the most trivial workaround is to instantiate struct drm_driver here and set the name field to 'vdev->vdev.ops->name'. In the longer term, the aperture helpers will be moved out of DRM and into a more prominent location. That workaround will be cleaned up then.
... but if the long-term plan is to clean that up properly anyway I don't see the point in bike shedding too much on the details of some temporary solution.
Alternatively, drm_aperture_remove_conflicting_pci_framebuffers() could be changed to accept the name string as second argument, but that's quite a bit of churn within the DRM code.
Also pointless churn because you'll have the very same churn again when moving the aperture helpers out of drm.
take care, Gerd
Hello Alex,
On 6/6/22 19:53, Alex Williamson wrote:
Users attempting to enable vfio PCI device assignment with a GPU will often block the default PCI driver from the device to avoid conflicts with the device initialization or release path. This means that vfio-pci is sometimes the first PCI driver to bind to the device. In the case of assigning the primary graphics device, low-level console drivers may still generate resource conflicts. Users often employ kernel command line arguments to disable conflicting drivers or perform unbinding in userspace to avoid this, but the actual solution is often distribution/kernel config specific based on the included drivers.
We can instead allow vfio-pci to copy the behavior of drm_aperture_remove_conflicting_pci_framebuffers() in order to remove these low-level drivers with conflicting resources. vfio-pci is not, however, a DRM driver, nor does it depend on DRM config options. We therefore split out and export the necessary DRM aperture support and mirror the framebuffer and VGA support.
I'd be happy to pull this series in through the vfio branch if approved by the DRM maintainers. Thanks,
I understand your issue, but I really don't think that using this helper is the correct thing to do. We already have some races with the current aperture infrastructure. As an example, you can look at [0].
The agreement on the mentioned thread is that we want to unify the fbdev and DRM drivers apertures into a single list, and ideally moving all to the Linux device model to handle the removal of conflicting devices.
That's why I don't feel that leaking the DRM aperture helper to another subsystem is desirable, since it would make it even harder to clean this up later.
But also, this issue isn't something that only affects graphics devices, right? AFAIU from [1] and [2], the same issue happens if a PCI device has to be bound to vfio-pci but was already bound to a host driver.
The fact that DRM happens to have some infrastructure to remove devices that conflict with an aperture is just a coincidence, since this is used to remove devices bound to drivers that make use of the firmware-provided system framebuffer.
The series [0] mentioned above adds a sysfb_disable() helper that disables the Generic System Framebuffer logic, which is what registers the framebuffer devices that are bound to these generic video drivers. On disable, the devices registered by sysfb are also unregistered.
Would it be enough for your use case to use that helper function if it lands, or do you really need to look at the apertures? That is, do you want to remove the {vesa,efi,simple}fb and simpledrm drivers or is there a need to also remove real fbdev and DRM drivers?
[0]: https://lore.kernel.org/lkml/YnvrxICnisXU6I1y@ravnborg.org/T/
[1]: https://www.ibm.com/docs/en/linux-on-systems?topic=through-pci
[2]: https://www.kernel.org/doc/Documentation/vfio.txt
On Tue, 7 Jun 2022 19:40:40 +0200 Javier Martinez Canillas javierm@redhat.com wrote:
Hello Alex,
On 6/6/22 19:53, Alex Williamson wrote:
Users attempting to enable vfio PCI device assignment with a GPU will often block the default PCI driver from the device to avoid conflicts with the device initialization or release path. This means that vfio-pci is sometimes the first PCI driver to bind to the device. In the case of assigning the primary graphics device, low-level console drivers may still generate resource conflicts. Users often employ kernel command line arguments to disable conflicting drivers or perform unbinding in userspace to avoid this, but the actual solution is often distribution/kernel config specific based on the included drivers.
We can instead allow vfio-pci to copy the behavior of drm_aperture_remove_conflicting_pci_framebuffers() in order to remove these low-level drivers with conflicting resources. vfio-pci is not, however, a DRM driver, nor does it depend on DRM config options. We therefore split out and export the necessary DRM aperture support and mirror the framebuffer and VGA support.
I'd be happy to pull this series in through the vfio branch if approved by the DRM maintainers. Thanks,
I understand your issue, but I really don't think that using this helper is the correct thing to do. We already have some races with the current aperture infrastructure. As an example, you can look at [0].
The agreement on the mentioned thread is that we want to unify the fbdev and DRM drivers apertures into a single list, and ideally moving all to the Linux device model to handle the removal of conflicting devices.
That's why I don't feel that leaking the DRM aperture helper to another subsystem is desirable, since it would make it even harder to clean this up later.
OTOH, this doesn't really make the problem worse and it identifies another stakeholder to a full solution.
But also, this issue isn't something that only affects graphics devices, right? AFAIU from [1] and [2], the same issue happens if a PCI device has to be bound to vfio-pci but was already bound to a host driver.
When we're shuffling between PCI drivers, we expect the unbind of the previous driver to have released all the claimed resources. If there were a previously attached PCI graphics driver, then the code added in patch 2/2 is simply redundant, since that PCI graphics driver must have already performed similar actions. The primary use case of this series is where there is no previous PCI graphics driver and we have no visibility into platform drivers carving chunks of the PCI resources into different subsystems. AFAIK, this is unique to graphics devices.
The fact that DRM happens to have some infrastructure to remove devices that conflict with an aperture is just a coincidence, since this is used to remove devices bound to drivers that make use of the firmware-provided system framebuffer.
It seems not so much a coincidence as an artifact of the exact problem both PCI graphics drivers and now vfio-pci face. We've created platform devices to manage sub-ranges of resources, where the actual parent of those resources is only discovered later and we don't automatically resolve the resource conflict between that parent device and these platform devices when binding the parent driver.
The series [0] mentioned above adds a sysfb_disable() helper that disables the Generic System Framebuffer logic, which is what registers the framebuffer devices that are bound to these generic video drivers. On disable, the devices registered by sysfb are also unregistered.
Would it be enough for your use case to use that helper function if it lands, or do you really need to look at the apertures? That is, do you want to remove the {vesa,efi,simple}fb and simpledrm drivers or is there a need to also remove real fbdev and DRM drivers?
It's not clear to me how this helps. I infer that sysfb_disable() is intended to be used by, for example, a PCI console driver which would be taking over the console and can therefore make a clear decision to end sysfb support. The vfio-pci driver is not a console driver so we can't make any sort of blind assertion regarding sysfb. We might be binding to a secondary graphics card which has no sysfb drivers attached. I'm a lot more comfortable wielding an interface that intends to disable drivers/devices relative to the resources of a given device rather than a blanket means to disable a subsystem.
I wonder if the race issues aren't better solved by not creating platform devices that expose resource conflicts with known devices, especially when those existing devices have drivers attached. Thanks,
Alex
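The difference between removing drivers relative to one device's resources and disabling a whole subsystem can be shown with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_range`, `struct model_fw_fb`, `model_ranges_overlap()` and `model_remove_conflicting()` are hypothetical names, and address arithmetic is assumed not to wrap around. A firmware framebuffer is represented only by its memory range, which matches the point made above that nothing else links it to the parent PCI device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; not kernel API. */
struct model_range {
	uint64_t base;
	uint64_t size;
};

/* A firmware framebuffer platform device, known only by its memory range. */
struct model_fw_fb {
	struct model_range aperture;
	bool registered;
};

/* True when the two half-open ranges intersect; empty ranges never do. */
static bool model_ranges_overlap(struct model_range a, struct model_range b)
{
	if (a.size == 0 || b.size == 0)
		return false;
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/*
 * Unregister only the firmware framebuffers that overlap one of the given
 * device's memory BARs. Framebuffers backed by other devices are untouched.
 * Returns the number of framebuffers removed.
 */
static int model_remove_conflicting(struct model_fw_fb *fbs, size_t nfbs,
				    const struct model_range *bars,
				    size_t nbars)
{
	int removed = 0;

	for (size_t i = 0; i < nfbs; i++) {
		if (!fbs[i].registered)
			continue;
		for (size_t j = 0; j < nbars; j++) {
			if (model_ranges_overlap(fbs[i].aperture, bars[j])) {
				fbs[i].registered = false;
				removed++;
				break;
			}
		}
	}
	return removed;
}
```

In this model, binding to a secondary card whose BARs do not cover the boot framebuffer removes nothing, whereas a blanket disable would take the console away regardless of which device was being assigned.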
Hi,
But also, this issue isn't something that only affects graphics devices, right? AFAIU from [1] and [2], the same issue happens if a PCI device has to be bound to vfio-pci but was already bound to a host driver.
Nope. There is a standard procedure to bind and unbind pci drivers via sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.
The fact that DRM happens to have some infrastructure to remove devices that conflict with an aperture is just a coincidence.
No. It's a consequence of firmware framebuffers not being linked to the pci device actually backing them, so some other way is needed to find and solve conflicts.
The series [0] mentioned above adds a sysfb_disable() helper that disables the Generic System Framebuffer logic, which is what registers the framebuffer devices that are bound to these generic video drivers. On disable, the devices registered by sysfb are also unregistered.
As Alex already mentioned this might not have the desired effect on systems with multiple GPUs (I think even without considering vfio-pci).
That is, do you want to remove the {vesa,efi,simple}fb and simpledrm drivers or is there a need to also remove real fbdev and DRM drivers?
Boot framebuffers are the problem because they are neither visible nor manageable in /sys/bus/pci. For real fbdev/drm drivers the standard pci unbind can be used.
take care, Gerd
Hello Gerd and Alex,
On 6/8/22 09:43, Gerd Hoffmann wrote:
Hi,
But also, this issue isn't something that only affects graphic devices, right? AFAIU from [1] and [2], the same issue happens if a PCI device has to be bound to vfio-pci but already was bound to a host driver.
Nope. There is a standard procedure to bind and unbind pci drivers via sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.
Yes, but the cover letter says:
"Users often employ kernel command line arguments to disable conflicting drivers or perform unbinding in userspace to avoid this"
So I misunderstood and thought the goal was to avoid the need to do this via sysfs in user-space. I understand now that the problem is that for real PCI devices bound to a driver, you know the PCI device ID and bus so that you can use it, but with platform devices bound to drivers that just use firmware-provided framebuffers you don't have the information needed to unbind them.
Because you could use the standard sysfs bind/unbind interface for this too, but don't have a way to know if the "simple-framebuffer" or "efi-framebuffer" is associated with a PCI device that you want to pass through or another one.
The only information that could tell you that is the I/O memory resource that is associated with the platform device registered and that's why you want to use the drm_aperture_remove_conflicting_pci_framebuffers() helper.
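To make that concrete, the matching boils down to a range-overlap test between the framebuffer's memory resource and the PCI device's BARs. The following is a user-space sketch of the idea only, not the code in drm_aperture.c; struct mem_range and both function names are assumptions invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type: a physical address range (base, size in bytes). */
struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* True if the two ranges share at least one byte. */
bool ranges_overlap(struct mem_range a, struct mem_range b)
{
    if (a.size == 0 || b.size == 0)
        return false;
    /* Compare inclusive end addresses to avoid overflow at the top. */
    return a.base <= b.base + (b.size - 1) &&
           b.base <= a.base + (a.size - 1);
}

/*
 * True if the firmware framebuffer range overlaps any memory BAR of the
 * device we are about to take over.
 */
bool fb_conflicts_with_device(struct mem_range fb,
                              const struct mem_range *bars, size_t nbars)
{
    size_t i;

    assert(bars || nbars == 0);
    for (i = 0; i < nbars; i++) {
        if (ranges_overlap(fb, bars[i]))
            return true;
    }
    return false;
}
```

With this check, a framebuffer that lives inside the primary GPU's BAR is flagged as a conflict, while binding a secondary card whose BARs do not contain the framebuffer leaves it alone.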
The fact that DRM happens to have some infrastructure to remove devices that conflict with an aperture is just a coincidence.
No. It's a consequence of firmware framebuffers not being linked to the pci device actually backing them, so some other way is needed to find and solve conflicts.
Right, it's clear to me now. As mentioned I misunderstood your problem.
The series [0] mentioned above, adds a sysfb_disable() that disables the Generic System Framebuffer logic that is what registers the framebuffer devices that are bound to these generic video drivers. On disable, the devices registered by sysfb are also unregistered.
As Alex already mentioned this might not have the desired effect on systems with multiple GPUs (I think even without considering vfio-pci).
That's correct, although the firmware framebuffer drivers are just a best effort to allow having some display output even if there's no real video driver (or if the user prevented them from loading with "nomodeset").
We have talked about improving this by unifying fbdev and DRM apertures in a single list that could track all the devices registered and their requested aperture, so that all subsystems could use it. The reason why I was pushing back on using the DRM aperture helper is that the more subsystems use the current API, the more complicated this refactoring becomes later.
But as Alex said, it wouldn't make the problem worse so I'm OK with this if others agree that's the correct thing to do.
That is, do you want to remove the {vesa,efi,simple}fb and simpledrm drivers or is there a need to also remove real fbdev and DRM drivers?
Boot framebuffers are the problem because they are neither visible nor manageable in /sys/bus/pci. For real fbdev/drm drivers the standard pci unbind can be used.
Yes. Honestly I believe all this should be handled by the Linux device model.
That is, drivers could just do pci_request_region() / request_mem_region() and drivers that want to unbind another bound device could do something like pci_request_region_force() / request_mem_region_force() to kick them out.
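As a rough illustration of that idea, here is a toy user-space model. Neither pci_request_region_force() nor request_mem_region_force() exists in the kernel; they are only proposed above, and every name in this sketch (toy_request_region(), toy_request_region_force(), struct region_table) is likewise hypothetical. A real implementation would also have to unbind the evicted driver, which this model reduces to dropping its table entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REGIONS 8

/* Toy model of a claimed address range; start and end are inclusive. */
struct region {
    uint64_t start, end;
    const char *owner;
    bool used;
};

struct region_table {
    struct region slot[MAX_REGIONS];
};

static bool overlaps(const struct region *r, uint64_t start, uint64_t end)
{
    return r->used && r->start <= end && start <= r->end;
}

/* Claim [start, end]; returns 0 on success, -1 if busy or table full. */
int toy_request_region(struct region_table *t, uint64_t start, uint64_t end,
                       const char *owner)
{
    size_t i;

    assert(t && owner && start <= end);
    for (i = 0; i < MAX_REGIONS; i++) {
        if (overlaps(&t->slot[i], start, end))
            return -1;
    }
    for (i = 0; i < MAX_REGIONS; i++) {
        if (!t->slot[i].used) {
            t->slot[i] = (struct region){ start, end, owner, true };
            return 0;
        }
    }
    return -1;
}

/*
 * Claim [start, end], evicting every overlapping owner first.
 * Returns the number of evicted owners, or -1 if the table is full.
 */
int toy_request_region_force(struct region_table *t, uint64_t start,
                             uint64_t end, const char *owner)
{
    int evicted = 0;
    size_t i;

    assert(t && owner && start <= end);
    for (i = 0; i < MAX_REGIONS; i++) {
        if (overlaps(&t->slot[i], start, end)) {
            t->slot[i].used = false; /* stand-in for unbinding the driver */
            evicted++;
        }
    }
    if (toy_request_region(t, start, end, owner) != 0)
        return -1;
    return evicted;
}

/* Return the owner of the region containing addr, or NULL if unclaimed. */
const char *toy_region_owner(const struct region_table *t, uint64_t addr)
{
    size_t i;

    assert(t);
    for (i = 0; i < MAX_REGIONS; i++) {
        if (overlaps(&t->slot[i], addr, addr))
            return t->slot[i].owner;
    }
    return NULL;
}
```

In this model a plain request by "vfio-pci" over a range already held by "efifb" fails, while the forced variant evicts "efifb" and then succeeds, which is the behavior the proposal describes.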
Hi,
But also, this issue isn't something that only affects graphic devices, right? AFAIU from [1] and [2], the same issue happens if a PCI device has to be bound to vfio-pci but already was bound to a host driver.
Nope. There is a standard procedure to bind and unbind pci drivers via sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.
Yes, but the cover letter says:
"Users often employ kernel command line arguments to disable conflicting drivers or perform unbinding in userspace to avoid this"
That's helpful at times to deal with driver and/or hardware quirks. Example: Years ago drm drivers used to be horrible when it came to unbind, leaving oopses and panics left & right when you tried (luckily it works much better these days).
[ leaving this here for completeness, snipping the remaining reply, noting that we are on the same page now ]
thanks & take care, Gerd
dri-devel@lists.freedesktop.org