This is in support of Max's series to split vfio-pci. For that to work, the reflck concept embedded in vfio-pci needs to be shareable across all of the new VFIO PCI drivers, which motivated re-examining how it is implemented.
Another significant issue is how the VFIO PCI core includes code like:
if (pci_dev_driver(pdev) != &vfio_pci_driver)
which does not scale once there are multiple different driver types.
This series takes the approach of moving the "reflck" mechanism into the core code as a "device set". Each vfio_device driver can specify how vfio_devices are grouped into a set using a key, and the set comes with a set-global mutex. The core code manages creating the per-device-set memory and associating it with each vfio_device.
In turn this allows the core code to provide an open/close_device() operation that is called only for the first/last FD, and is called under the global device set lock.
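To make this concrete, a converted driver ends up looking roughly like the sketch below. This is not lifted from any single driver in the series; the my_*() names, struct my_device and shared_key are illustrative only. vfio_assign_device_set(), vfio_uninit_group_dev() and the open_device/close_device ops are the interfaces this series introduces, while vfio_init_group_dev()/vfio_register_group_dev() already exist:

struct my_device {
	struct vfio_device vdev;
	/* driver private state */
};

static int my_open_device(struct vfio_device *vdev)
{
	/* Runs only when the first FD for this device is opened,
	 * with vdev->dev_set->lock held by the core.
	 */
	return 0;
}

static void my_close_device(struct vfio_device *vdev)
{
	/* Runs only when the last FD is closed, under the same lock. */
}

static const struct vfio_device_ops my_ops = {
	.name = "my-vfio-driver",
	.open_device = my_open_device,
	.close_device = my_close_device,
};

static int my_probe(struct device *dev, struct my_device *mydev, void *shared_key)
{
	int ret;

	vfio_init_group_dev(&mydev->vdev, dev, &my_ops);

	/* All vfio_devices registered with the same key join one set. */
	ret = vfio_assign_device_set(&mydev->vdev, shared_key);
	if (ret)
		goto err_uninit;

	ret = vfio_register_group_dev(&mydev->vdev);
	if (ret)
		goto err_uninit;
	return 0;

err_uninit:
	vfio_uninit_group_dev(&mydev->vdev);
	return ret;
}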
Review of all the drivers shows that they either already open code the first/last semantic or are buggy and miss it. All drivers are migrated/fixed to the new open/close_device ops and the unused per-FD open()/release() ops are deleted.
The special behavior of PCI around the bus/slot "reset group" is recast in terms of the device set, which consolidates the reflck, eliminates two touches of pci_dev_driver(), and allows the reset mechanism to be shared across all VFIO PCI drivers. PCI is changed to acquire devices directly from the device set instead of trying to work backwards from the struct pci_dev.
Overall a few minor bugs are squashed and quite a bit of code is removed through consolidation.
v3:
 - Atomic conversion of mbochs_used_mbytes
 - Add missing vfio_uninit_group_dev in error unwind of mbochs
 - Reorganize vfio_assign_device_set()
 - Move the dev_set_list hunks to the introduction of the dev_set
 - Use if instead of ?: in fsl
 - Add a comment about the whole bus reset in vfio_pci_probe()
 - Rename vfio_pci_check_all_devices_bound() to vfio_pci_is_device_in_set()
 - Move logic from vfio_pci_try_bus_reset() into vfio_pci_find_reset_target()
v2: https://lore.kernel.org/r/0-v2-b6a5582525c9+ff96-vfio_reflck_jgg@nvidia.com
 - Reorder fsl and mbochs vfio_uninit_group_dev
 - Fix missing error unwind in mbochs
 - Return 0 from mdev open_device if there is no op
 - Fix style for else {}
 - Spelling fix for singleton
 - Acquire cur_mem under lock
 - Always use error unwind flow for vfio_pci_check_all_devices_bound()
v1: https://lore.kernel.org/r/0-v1-eaf3ccbba33c+1add0-vfio_reflck_jgg@nvidia.com
Jason Gunthorpe (12):
  vfio/samples: Remove module get/put
  vfio/mbochs: Fix missing error unwind of mbochs_used_mbytes
  vfio: Provide better generic support for open/release vfio_device_ops
  vfio/samples: Delete useless open/close
  vfio/fsl: Move to the device set infrastructure
  vfio/platform: Use open_device() instead of open coding a refcnt scheme
  vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set
  vfio/pci: Reorganize VFIO_DEVICE_PCI_HOT_RESET to use the device set
  vfio/mbochs: Fix close when multiple device FDs are open
  vfio/ap,ccw: Fix open/close when multiple device FDs are open
  vfio/gvt: Fix open/close when multiple device FDs are open
  vfio: Remove struct vfio_device_ops open/release
Max Gurtovoy (1):
  vfio: Introduce a vfio_uninit_group_dev() API call
Yishai Hadas (1):
  vfio/pci: Move to the device set infrastructure
 Documentation/driver-api/vfio.rst             |   4 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c              |   8 +-
 drivers/s390/cio/vfio_ccw_ops.c               |   8 +-
 drivers/s390/crypto/vfio_ap_ops.c             |   8 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             | 159 +-----
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c        |   6 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc_private.h     |   7 -
 drivers/vfio/mdev/vfio_mdev.c                 |  31 +-
 drivers/vfio/pci/vfio_pci.c                   | 487 +++++++-----------
 drivers/vfio/pci/vfio_pci_private.h           |   7 -
 drivers/vfio/platform/vfio_platform_common.c  |  86 ++--
 drivers/vfio/platform/vfio_platform_private.h |   1 -
 drivers/vfio/vfio.c                           | 144 +++++-
 include/linux/mdev.h                          |   9 +-
 include/linux/vfio.h                          |  26 +-
 samples/vfio-mdev/mbochs.c                    |  40 +-
 samples/vfio-mdev/mdpy.c                      |  40 +-
 samples/vfio-mdev/mtty.c                      |  40 +-
 18 files changed, 468 insertions(+), 643 deletions(-)
The patch to move the get/put to core and the patch to convert the samples to use vfio_device crossed in a way that this was missed. When both patches are together the samples do not need their own get/put.
Fixes: 437e41368c01 ("vfio/mdpy: Convert to use vfio_register_group_dev()")
Fixes: 681c1615f891 ("vfio/mbochs: Convert to use vfio_register_group_dev()")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 samples/vfio-mdev/mbochs.c | 4 ----
 samples/vfio-mdev/mdpy.c   | 4 ----
 2 files changed, 8 deletions(-)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c index 6c0f229db36a1a..e81b875b4d87b4 100644 --- a/samples/vfio-mdev/mbochs.c +++ b/samples/vfio-mdev/mbochs.c @@ -1274,9 +1274,6 @@ static long mbochs_ioctl(struct vfio_device *vdev, unsigned int cmd,
static int mbochs_open(struct vfio_device *vdev) { - if (!try_module_get(THIS_MODULE)) - return -ENODEV; - return 0; }
@@ -1300,7 +1297,6 @@ static void mbochs_close(struct vfio_device *vdev) mbochs_put_pages(mdev_state);
mutex_unlock(&mdev_state->ops_lock); - module_put(THIS_MODULE); }
static ssize_t diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c index 393c9df6f6a010..a7d4ed28d66411 100644 --- a/samples/vfio-mdev/mdpy.c +++ b/samples/vfio-mdev/mdpy.c @@ -611,15 +611,11 @@ static long mdpy_ioctl(struct vfio_device *vdev, unsigned int cmd,
static int mdpy_open(struct vfio_device *vdev) { - if (!try_module_get(THIS_MODULE)) - return -ENODEV; - return 0; }
static void mdpy_close(struct vfio_device *vdev) { - module_put(THIS_MODULE); }
static ssize_t
Convert mbochs to use an atomic scheme for tracking the available memory, like the one mtty was converted to. The atomic scheme fixes various race conditions with probing. Add the missing error unwind, and also add the missing kfree of mdev_state->pages.
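For reference, the reservation pattern the patch open codes in mbochs_probe()/mbochs_remove() looks like this in isolation (a sketch only; the reserve/unreserve helpers do not exist in the patch, which inlines the loop):

static atomic_t mbochs_avail_mbytes;

static int mbochs_reserve_mbytes(int want)
{
	int avail = atomic_read(&mbochs_avail_mbytes);

	do {
		if (avail < want)
			return -ENOSPC;
		/* On failure atomic_try_cmpxchg() reloads 'avail'. */
	} while (!atomic_try_cmpxchg(&mbochs_avail_mbytes, &avail,
				     avail - want));
	return 0;
}

static void mbochs_unreserve_mbytes(int want)
{
	atomic_add(want, &mbochs_avail_mbytes);
}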
Fixes: 681c1615f891 ("vfio/mbochs: Convert to use vfio_register_group_dev()")
Reported-by: Cornelia Huck <cohuck@redhat.com>
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 samples/vfio-mdev/mbochs.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c index e81b875b4d87b4..a12b1fc94ed402 100644 --- a/samples/vfio-mdev/mbochs.c +++ b/samples/vfio-mdev/mbochs.c @@ -129,7 +129,7 @@ static dev_t mbochs_devt; static struct class *mbochs_class; static struct cdev mbochs_cdev; static struct device mbochs_dev; -static int mbochs_used_mbytes; +static atomic_t mbochs_avail_mbytes; static const struct vfio_device_ops mbochs_dev_ops;
struct vfio_region_info_ext { @@ -507,18 +507,22 @@ static int mbochs_reset(struct mdev_state *mdev_state)
static int mbochs_probe(struct mdev_device *mdev) { + int avail_mbytes = atomic_read(&mbochs_avail_mbytes); const struct mbochs_type *type = &mbochs_types[mdev_get_type_group_id(mdev)]; struct device *dev = mdev_dev(mdev); struct mdev_state *mdev_state; int ret = -ENOMEM;
- if (type->mbytes + mbochs_used_mbytes > max_mbytes) - return -ENOMEM; + do { + if (avail_mbytes < type->mbytes) + return -ENOSPC; + } while (!atomic_try_cmpxchg(&mbochs_avail_mbytes, &avail_mbytes, + avail_mbytes - type->mbytes));
mdev_state = kzalloc(sizeof(struct mdev_state), GFP_KERNEL); if (mdev_state == NULL) - return -ENOMEM; + goto err_avail; vfio_init_group_dev(&mdev_state->vdev, &mdev->dev, &mbochs_dev_ops);
mdev_state->vconfig = kzalloc(MBOCHS_CONFIG_SPACE_SIZE, GFP_KERNEL); @@ -549,17 +553,17 @@ static int mbochs_probe(struct mdev_device *mdev) mbochs_create_config_space(mdev_state); mbochs_reset(mdev_state);
- mbochs_used_mbytes += type->mbytes; - ret = vfio_register_group_dev(&mdev_state->vdev); if (ret) goto err_mem; dev_set_drvdata(&mdev->dev, mdev_state); return 0; - err_mem: + kfree(mdev_state->pages); kfree(mdev_state->vconfig); kfree(mdev_state); +err_avail: + atomic_add(mdev_state->type->mbytes, &mbochs_avail_mbytes); return ret; }
@@ -567,8 +571,8 @@ static void mbochs_remove(struct mdev_device *mdev) { struct mdev_state *mdev_state = dev_get_drvdata(&mdev->dev);
- mbochs_used_mbytes -= mdev_state->type->mbytes; vfio_unregister_group_dev(&mdev_state->vdev); + atomic_add(mdev_state->type->mbytes, &mbochs_avail_mbytes); kfree(mdev_state->pages); kfree(mdev_state->vconfig); kfree(mdev_state); @@ -1351,7 +1355,7 @@ static ssize_t available_instances_show(struct mdev_type *mtype, { const struct mbochs_type *type = &mbochs_types[mtype_get_type_group_id(mtype)]; - int count = (max_mbytes - mbochs_used_mbytes) / type->mbytes; + int count = atomic_read(&mbochs_avail_mbytes) / type->mbytes;
return sprintf(buf, "%d\n", count); } @@ -1433,6 +1437,8 @@ static int __init mbochs_dev_init(void) { int ret = 0;
+ atomic_set(&mbochs_avail_mbytes, max_mbytes); + ret = alloc_chrdev_region(&mbochs_devt, 0, MINORMASK + 1, MBOCHS_NAME); if (ret < 0) { pr_err("Error: failed to register mbochs_dev, err: %d\n", ret);
Hi Jason,
url:    https://github.com/0day-ci/linux/commits/Jason-Gunthorpe/Provide-core-infras...
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-randconfig-m001-20210728 (attached as .config)
compiler: gcc-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
smatch warnings:
samples/vfio-mdev/mbochs.c:566 mbochs_probe() error: we previously assumed 'mdev_state' could be null (see line 524)
samples/vfio-mdev/mbochs.c:566 mbochs_probe() error: dereferencing freed memory 'mdev_state'
vim +/mdev_state +566 samples/vfio-mdev/mbochs.c
681c1615f89144 Jason Gunthorpe 2021-06-17 508 static int mbochs_probe(struct mdev_device *mdev) a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 509 { 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 510 int avail_mbytes = atomic_read(&mbochs_avail_mbytes); 3d3a360e570616 Jason Gunthorpe 2021-04-06 511 const struct mbochs_type *type = 3d3a360e570616 Jason Gunthorpe 2021-04-06 512 &mbochs_types[mdev_get_type_group_id(mdev)]; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 513 struct device *dev = mdev_dev(mdev); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 514 struct mdev_state *mdev_state; 681c1615f89144 Jason Gunthorpe 2021-06-17 515 int ret = -ENOMEM; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 516 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 517 do { 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 518 if (avail_mbytes < type->mbytes) 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 519 return -ENOSPC; 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 520 } while (!atomic_try_cmpxchg(&mbochs_avail_mbytes, &avail_mbytes, 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 521 avail_mbytes - type->mbytes)); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 522 a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 523 mdev_state = kzalloc(sizeof(struct mdev_state), GFP_KERNEL); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 @524 if (mdev_state == NULL) 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 525 goto err_avail;
This goto leads to a NULL deref
681c1615f89144 Jason Gunthorpe 2021-06-17 526 vfio_init_group_dev(&mdev_state->vdev, &mdev->dev, &mbochs_dev_ops); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 527 a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 528 mdev_state->vconfig = kzalloc(MBOCHS_CONFIG_SPACE_SIZE, GFP_KERNEL); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 529 if (mdev_state->vconfig == NULL) a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 530 goto err_mem; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 531 a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 532 mdev_state->memsize = type->mbytes * 1024 * 1024; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 533 mdev_state->pagecount = mdev_state->memsize >> PAGE_SHIFT; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 534 mdev_state->pages = kcalloc(mdev_state->pagecount, a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 535 sizeof(struct page *), a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 536 GFP_KERNEL); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 537 if (!mdev_state->pages) a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 538 goto err_mem; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 539 a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 540 dev_info(dev, "%s: %s, %d MB, %ld pages\n", __func__, 3d3a360e570616 Jason Gunthorpe 2021-04-06 541 type->name, type->mbytes, mdev_state->pagecount); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 542 a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 543 mutex_init(&mdev_state->ops_lock); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 544 mdev_state->mdev = mdev; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 545 INIT_LIST_HEAD(&mdev_state->dmabufs); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 546 mdev_state->next_id = 1; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 547 a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 548 mdev_state->type = type; 104c7405a64d93 Gerd Hoffmann 2018-09-21 549 mdev_state->edid_regs.max_xres = type->max_x; 104c7405a64d93 Gerd Hoffmann 2018-09-21 550 mdev_state->edid_regs.max_yres = type->max_y; 104c7405a64d93 Gerd Hoffmann 2018-09-21 551 mdev_state->edid_regs.edid_offset = MBOCHS_EDID_BLOB_OFFSET; 104c7405a64d93 Gerd Hoffmann 2018-09-21 552 mdev_state->edid_regs.edid_max_size = sizeof(mdev_state->edid_blob); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 553 mbochs_create_config_space(mdev_state); 681c1615f89144 Jason Gunthorpe 2021-06-17 554 mbochs_reset(mdev_state); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 555 681c1615f89144 Jason Gunthorpe 2021-06-17 556 ret = vfio_register_group_dev(&mdev_state->vdev); 681c1615f89144 Jason Gunthorpe 2021-06-17 557 if (ret) 681c1615f89144 Jason Gunthorpe 2021-06-17 558 goto err_mem; 681c1615f89144 Jason Gunthorpe 2021-06-17 559 dev_set_drvdata(&mdev->dev, mdev_state); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 560 return 0; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 561 err_mem: 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 562 kfree(mdev_state->pages); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 563 kfree(mdev_state->vconfig); a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 564 kfree(mdev_state); ^^^^^^^^^^ Freed
909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 565 err_avail: 909fe1e3ec15f4 Jason Gunthorpe 2021-07-28 @566 atomic_add(mdev_state->type->mbytes, &mbochs_avail_mbytes); ^^^^^^^^^^
This should just be: atomic_add(type->mbytes, &mbochs_avail_mbytes);
681c1615f89144 Jason Gunthorpe 2021-06-17 567 return ret; a5e6e6505f38f7 Gerd Hoffmann 2018-05-11 568 }
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Thu, Jul 29, 2021 at 12:38:12PM +0300, Dan Carpenter wrote:
This should just be: atomic_add(type->mbytes, &mbochs_avail_mbytes);
Arg, yes, thanks Dan - I thought I got all of these.
Jason
From: Max Gurtovoy <mgurtovoy@nvidia.com>
This pairs with vfio_init_group_dev() and allows undoing any state that is stored in the vfio_device unrelated to registration. Add appropriately placed calls to all the drivers.
The following patch will use this to add pre-registration state for the device set.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 Documentation/driver-api/vfio.rst            |  4 ++-
 drivers/vfio/fsl-mc/vfio_fsl_mc.c            |  7 ++---
 drivers/vfio/mdev/vfio_mdev.c                | 13 +++++++---
 drivers/vfio/pci/vfio_pci.c                  |  6 +++--
 drivers/vfio/platform/vfio_platform_common.c |  7 +++--
 drivers/vfio/vfio.c                          |  5 ++++
 include/linux/vfio.h                         |  1 +
 samples/vfio-mdev/mbochs.c                   |  2 ++
 samples/vfio-mdev/mdpy.c                     | 25 ++++++++++--------
 samples/vfio-mdev/mtty.c                     | 27 ++++++++++++--------
 10 files changed, 64 insertions(+), 33 deletions(-)
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst index 606eed8823ceab..c663b6f978255b 100644 --- a/Documentation/driver-api/vfio.rst +++ b/Documentation/driver-api/vfio.rst @@ -255,11 +255,13 @@ vfio_unregister_group_dev() respectively:: void vfio_init_group_dev(struct vfio_device *device, struct device *dev, const struct vfio_device_ops *ops); + void vfio_uninit_group_dev(struct vfio_device *device); int vfio_register_group_dev(struct vfio_device *device); void vfio_unregister_group_dev(struct vfio_device *device);
The driver should embed the vfio_device in its own structure and call -vfio_init_group_dev() to pre-configure it before going to registration. +vfio_init_group_dev() to pre-configure it before going to registration +and call vfio_uninit_group_dev() after completing the un-registration. vfio_register_group_dev() indicates to the core to begin tracking the iommu_group of the specified dev and register the dev as owned by a VFIO bus driver. Once vfio_register_group_dev() returns it is possible for userspace to diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c index 90cad109583b80..122997c61ba450 100644 --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c @@ -627,7 +627,7 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
ret = vfio_fsl_mc_reflck_attach(vdev); if (ret) - goto out_kfree; + goto out_uninit;
ret = vfio_fsl_mc_init_device(vdev); if (ret) @@ -657,7 +657,8 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev) vfio_fsl_uninit_device(vdev); out_reflck: vfio_fsl_mc_reflck_put(vdev->reflck); -out_kfree: +out_uninit: + vfio_uninit_group_dev(&vdev->vdev); kfree(vdev); out_group_put: vfio_iommu_group_put(group, dev); @@ -675,7 +676,7 @@ static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev) dprc_remove_devices(mc_dev, NULL, 0); vfio_fsl_uninit_device(vdev); vfio_fsl_mc_reflck_put(vdev->reflck); - + vfio_uninit_group_dev(&vdev->vdev); kfree(vdev); vfio_iommu_group_put(mc_dev->dev.iommu_group, dev);
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c index 39ef7489fe4719..a5c77ccb24f70a 100644 --- a/drivers/vfio/mdev/vfio_mdev.c +++ b/drivers/vfio/mdev/vfio_mdev.c @@ -120,12 +120,16 @@ static int vfio_mdev_probe(struct mdev_device *mdev)
vfio_init_group_dev(vdev, &mdev->dev, &vfio_mdev_dev_ops); ret = vfio_register_group_dev(vdev); - if (ret) { - kfree(vdev); - return ret; - } + if (ret) + goto out_uninit; + dev_set_drvdata(&mdev->dev, vdev); return 0; + +out_uninit: + vfio_uninit_group_dev(vdev); + kfree(vdev); + return ret; }
static void vfio_mdev_remove(struct mdev_device *mdev) @@ -133,6 +137,7 @@ static void vfio_mdev_remove(struct mdev_device *mdev) struct vfio_device *vdev = dev_get_drvdata(&mdev->dev);
vfio_unregister_group_dev(vdev); + vfio_uninit_group_dev(vdev); kfree(vdev); }
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 318864d5283782..fab3715d60d4ba 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -2022,7 +2022,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
ret = vfio_pci_reflck_attach(vdev); if (ret) - goto out_free; + goto out_uninit; ret = vfio_pci_vf_init(vdev); if (ret) goto out_reflck; @@ -2059,7 +2059,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) vfio_pci_vf_uninit(vdev); out_reflck: vfio_pci_reflck_put(vdev->reflck); -out_free: +out_uninit: + vfio_uninit_group_dev(&vdev->vdev); kfree(vdev->pm_save); kfree(vdev); out_group_put: @@ -2077,6 +2078,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
vfio_pci_vf_uninit(vdev); vfio_pci_reflck_put(vdev->reflck); + vfio_uninit_group_dev(&vdev->vdev); vfio_pci_vga_uninit(vdev);
vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev); diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index 703164df7637db..bdde8605178cd2 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -667,7 +667,7 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev, ret = vfio_platform_of_probe(vdev, dev);
if (ret) - return ret; + goto out_uninit;
vdev->device = dev;
@@ -675,7 +675,7 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev, if (ret && vdev->reset_required) { dev_err(dev, "No reset function found for device %s\n", vdev->name); - return ret; + goto out_uninit; }
group = vfio_iommu_group_get(dev); @@ -698,6 +698,8 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev, vfio_iommu_group_put(group, dev); put_reset: vfio_platform_put_reset(vdev); +out_uninit: + vfio_uninit_group_dev(&vdev->vdev); return ret; } EXPORT_SYMBOL_GPL(vfio_platform_probe_common); @@ -708,6 +710,7 @@ void vfio_platform_remove_common(struct vfio_platform_device *vdev)
pm_runtime_disable(vdev->device); vfio_platform_put_reset(vdev); + vfio_uninit_group_dev(&vdev->vdev); vfio_iommu_group_put(vdev->vdev.dev->iommu_group, vdev->vdev.dev); } EXPORT_SYMBOL_GPL(vfio_platform_remove_common); diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 02cc51ce6891a9..cc375df0fd5dda 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -749,6 +749,11 @@ void vfio_init_group_dev(struct vfio_device *device, struct device *dev, } EXPORT_SYMBOL_GPL(vfio_init_group_dev);
+void vfio_uninit_group_dev(struct vfio_device *device) +{ +} +EXPORT_SYMBOL_GPL(vfio_uninit_group_dev); + int vfio_register_group_dev(struct vfio_device *device) { struct vfio_device *existing_device; diff --git a/include/linux/vfio.h b/include/linux/vfio.h index a2c5b30e1763ba..b0875cf8e496bb 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -61,6 +61,7 @@ extern void vfio_iommu_group_put(struct iommu_group *group, struct device *dev);
void vfio_init_group_dev(struct vfio_device *device, struct device *dev, const struct vfio_device_ops *ops); +void vfio_uninit_group_dev(struct vfio_device *device); int vfio_register_group_dev(struct vfio_device *device); void vfio_unregister_group_dev(struct vfio_device *device); extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c index a12b1fc94ed402..3d02e8e5964be7 100644 --- a/samples/vfio-mdev/mbochs.c +++ b/samples/vfio-mdev/mbochs.c @@ -559,6 +559,7 @@ static int mbochs_probe(struct mdev_device *mdev) dev_set_drvdata(&mdev->dev, mdev_state); return 0; err_mem: + vfio_uninit_group_dev(&mdev_state->vdev); kfree(mdev_state->pages); kfree(mdev_state->vconfig); kfree(mdev_state); @@ -572,6 +573,7 @@ static void mbochs_remove(struct mdev_device *mdev) struct mdev_state *mdev_state = dev_get_drvdata(&mdev->dev);
vfio_unregister_group_dev(&mdev_state->vdev); + vfio_uninit_group_dev(&mdev_state->vdev); atomic_add(mdev_state->type->mbytes, &mbochs_avail_mbytes); kfree(mdev_state->pages); kfree(mdev_state->vconfig); diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c index a7d4ed28d66411..57334034cde6dd 100644 --- a/samples/vfio-mdev/mdpy.c +++ b/samples/vfio-mdev/mdpy.c @@ -235,17 +235,16 @@ static int mdpy_probe(struct mdev_device *mdev)
mdev_state->vconfig = kzalloc(MDPY_CONFIG_SPACE_SIZE, GFP_KERNEL); if (mdev_state->vconfig == NULL) { - kfree(mdev_state); - return -ENOMEM; + ret = -ENOMEM; + goto err_state; }
fbsize = roundup_pow_of_two(type->width * type->height * type->bytepp);
mdev_state->memblk = vmalloc_user(fbsize); if (!mdev_state->memblk) { - kfree(mdev_state->vconfig); - kfree(mdev_state); - return -ENOMEM; + ret = -ENOMEM; + goto err_vconfig; } dev_info(dev, "%s: %s (%dx%d)\n", __func__, type->name, type->width, type->height); @@ -260,13 +259,18 @@ static int mdpy_probe(struct mdev_device *mdev) mdpy_count++;
ret = vfio_register_group_dev(&mdev_state->vdev); - if (ret) { - kfree(mdev_state->vconfig); - kfree(mdev_state); - return ret; - } + if (ret) + goto err_mem; dev_set_drvdata(&mdev->dev, mdev_state); return 0; +err_mem: + vfree(mdev_state->memblk); +err_vconfig: + kfree(mdev_state->vconfig); +err_state: + vfio_uninit_group_dev(&mdev_state->vdev); + kfree(mdev_state); + return ret; }
static void mdpy_remove(struct mdev_device *mdev) @@ -278,6 +282,7 @@ static void mdpy_remove(struct mdev_device *mdev) vfio_unregister_group_dev(&mdev_state->vdev); vfree(mdev_state->memblk); kfree(mdev_state->vconfig); + vfio_uninit_group_dev(&mdev_state->vdev); kfree(mdev_state);
mdpy_count--; diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c index 8b26fecc4afedd..37cc9067e1601d 100644 --- a/samples/vfio-mdev/mtty.c +++ b/samples/vfio-mdev/mtty.c @@ -718,8 +718,8 @@ static int mtty_probe(struct mdev_device *mdev)
mdev_state = kzalloc(sizeof(struct mdev_state), GFP_KERNEL); if (mdev_state == NULL) { - atomic_add(nr_ports, &mdev_avail_ports); - return -ENOMEM; + ret = -ENOMEM; + goto err_nr_ports; }
vfio_init_group_dev(&mdev_state->vdev, &mdev->dev, &mtty_dev_ops); @@ -732,9 +732,8 @@ static int mtty_probe(struct mdev_device *mdev) mdev_state->vconfig = kzalloc(MTTY_CONFIG_SPACE_SIZE, GFP_KERNEL);
if (mdev_state->vconfig == NULL) { - kfree(mdev_state); - atomic_add(nr_ports, &mdev_avail_ports); - return -ENOMEM; + ret = -ENOMEM; + goto err_state; }
mutex_init(&mdev_state->ops_lock); @@ -743,14 +742,19 @@ static int mtty_probe(struct mdev_device *mdev) mtty_create_config_space(mdev_state);
ret = vfio_register_group_dev(&mdev_state->vdev); - if (ret) { - kfree(mdev_state); - atomic_add(nr_ports, &mdev_avail_ports); - return ret; - } - + if (ret) + goto err_vconfig; dev_set_drvdata(&mdev->dev, mdev_state); return 0; + +err_vconfig: + kfree(mdev_state->vconfig); +err_state: + vfio_uninit_group_dev(&mdev_state->vdev); + kfree(mdev_state); +err_nr_ports: + atomic_add(nr_ports, &mdev_avail_ports); + return ret; }
static void mtty_remove(struct mdev_device *mdev) @@ -761,6 +765,7 @@ static void mtty_remove(struct mdev_device *mdev) vfio_unregister_group_dev(&mdev_state->vdev);
kfree(mdev_state->vconfig); + vfio_uninit_group_dev(&mdev_state->vdev); kfree(mdev_state); atomic_add(nr_ports, &mdev_avail_ports); }
Currently the driver ops have an open/release pair that is called once each time a device FD is opened or closed. Add an additional set of open/close_device() ops which are called when the device FD is opened for the first time and closed for the last time.
An analysis shows that all of the drivers require this semantic. Some are open coding it as part of their reflck implementation, and some are just buggy and miss it completely.
To retain the current semantics PCI and FSL depend on, introduce the idea of a "device set" which is a grouping of vfio_device's that share the same lock around opening.
The device set is established by providing a 'set_id' pointer. All vfio_device's that provide the same pointer will be joined to the same singleton memory and lock across the whole set. This effectively replaces the oddly named reflck.
After conversion the set_id will be sourced from:
 - A struct device from a fsl_mc_device (fsl)
 - A struct pci_slot (pci)
 - A struct pci_bus (pci)
 - The struct vfio_device (everything)
The design ensures that the above pointers are live as long as the vfio_device is registered, so they form reliable unique keys to group vfio_devices into sets.
This implementation uses xarray instead of searching through the driver core structures, which simplifies the somewhat tricky locking in this area.
Following patches convert all the drivers.
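For mdev drivers the new pair is forwarded through mdev_parent_ops, so a parent driver wires it up roughly as below (a sketch; the my_mdev_*() names are illustrative and the other mandatory ops are omitted). If a parent does not provide open_device(), the core wrapper simply treats it as returning 0:

static int my_mdev_open_device(struct mdev_device *mdev)
{
	/* First FD for this mdev; the core holds dev_set->lock. */
	return 0;
}

static void my_mdev_close_device(struct mdev_device *mdev)
{
	/* Last FD for this mdev has been closed. */
}

static const struct mdev_parent_ops my_mdev_ops = {
	.open_device = my_mdev_open_device,
	.close_device = my_mdev_close_device,
	/* .create, .remove, .read, .write, etc. omitted for brevity */
};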
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/mdev/vfio_mdev.c |  22 +++++
 drivers/vfio/vfio.c           | 149 +++++++++++++++++++++++++++++-----
 include/linux/mdev.h          |   2 +
 include/linux/vfio.h          |  21 +++++
 4 files changed, 172 insertions(+), 22 deletions(-)
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c index a5c77ccb24f70a..725cd2fe675190 100644 --- a/drivers/vfio/mdev/vfio_mdev.c +++ b/drivers/vfio/mdev/vfio_mdev.c @@ -17,6 +17,26 @@
#include "mdev_private.h"
+static int vfio_mdev_open_device(struct vfio_device *core_vdev) +{ + struct mdev_device *mdev = to_mdev_device(core_vdev->dev); + struct mdev_parent *parent = mdev->type->parent; + + if (unlikely(!parent->ops->open_device)) + return 0; + + return parent->ops->open_device(mdev); +} + +static void vfio_mdev_close_device(struct vfio_device *core_vdev) +{ + struct mdev_device *mdev = to_mdev_device(core_vdev->dev); + struct mdev_parent *parent = mdev->type->parent; + + if (likely(parent->ops->close_device)) + parent->ops->close_device(mdev); +} + static int vfio_mdev_open(struct vfio_device *core_vdev) { struct mdev_device *mdev = to_mdev_device(core_vdev->dev); @@ -100,6 +120,8 @@ static void vfio_mdev_request(struct vfio_device *core_vdev, unsigned int count)
static const struct vfio_device_ops vfio_mdev_dev_ops = { .name = "vfio-mdev", + .open_device = vfio_mdev_open_device, + .close_device = vfio_mdev_close_device, .open = vfio_mdev_open, .release = vfio_mdev_release, .ioctl = vfio_mdev_unlocked_ioctl, diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index cc375df0fd5dda..9cc17768c42554 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -96,6 +96,79 @@ module_param_named(enable_unsafe_noiommu_mode, MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode. This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel. If you do not know what this is for, step away. (default: false)"); #endif
+static DEFINE_XARRAY(vfio_device_set_xa); + +int vfio_assign_device_set(struct vfio_device *device, void *set_id) +{ + unsigned long idx = (unsigned long)set_id; + struct vfio_device_set *new_dev_set; + struct vfio_device_set *dev_set; + + if (WARN_ON(!set_id)) + return -EINVAL; + + /* + * Atomically acquire a singleton object in the xarray for this set_id + */ + xa_lock(&vfio_device_set_xa); + dev_set = xa_load(&vfio_device_set_xa, idx); + if (dev_set) + goto found_get_ref; + xa_unlock(&vfio_device_set_xa); + + new_dev_set = kzalloc(sizeof(*new_dev_set), GFP_KERNEL); + if (!new_dev_set) + return -ENOMEM; + mutex_init(&new_dev_set->lock); + INIT_LIST_HEAD(&new_dev_set->device_list); + new_dev_set->set_id = set_id; + + xa_lock(&vfio_device_set_xa); + dev_set = __xa_cmpxchg(&vfio_device_set_xa, idx, NULL, new_dev_set, + GFP_KERNEL); + if (!dev_set) { + dev_set = new_dev_set; + goto found_get_ref; + } + + kfree(new_dev_set); + if (xa_is_err(dev_set)) { + xa_unlock(&vfio_device_set_xa); + return xa_err(dev_set); + } + +found_get_ref: + dev_set->device_count++; + xa_unlock(&vfio_device_set_xa); + mutex_lock(&dev_set->lock); + device->dev_set = dev_set; + list_add_tail(&device->dev_set_list, &dev_set->device_list); + mutex_unlock(&dev_set->lock); + return 0; +} +EXPORT_SYMBOL_GPL(vfio_assign_device_set); + +static void vfio_release_device_set(struct vfio_device *device) +{ + struct vfio_device_set *dev_set = device->dev_set; + + if (!dev_set) + return; + + mutex_lock(&dev_set->lock); + list_del(&device->dev_set_list); + mutex_unlock(&dev_set->lock); + + xa_lock(&vfio_device_set_xa); + if (!--dev_set->device_count) { + __xa_erase(&vfio_device_set_xa, + (unsigned long)dev_set->set_id); + mutex_destroy(&dev_set->lock); + kfree(dev_set); + } + xa_unlock(&vfio_device_set_xa); +} + /* * vfio_iommu_group_{get,put} are only intended for VFIO bus driver probe * and remove functions, any use cases other than acquiring the first @@ -751,6 +824,7 @@ EXPORT_SYMBOL_GPL(vfio_init_group_dev);
void vfio_uninit_group_dev(struct vfio_device *device) { + vfio_release_device_set(device); } EXPORT_SYMBOL_GPL(vfio_uninit_group_dev);
@@ -760,6 +834,13 @@ int vfio_register_group_dev(struct vfio_device *device) struct iommu_group *iommu_group; struct vfio_group *group;
+ /* + * If the driver doesn't specify a set then the device is added to a + * singleton set just for itself. + */ + if (!device->dev_set) + vfio_assign_device_set(device, device); + iommu_group = iommu_group_get(device->dev); if (!iommu_group) return -EINVAL; @@ -1361,7 +1442,8 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf) { struct vfio_device *device; struct file *filep; - int ret; + int fdno; + int ret = 0;
if (0 == atomic_read(&group->container_users) || !group->container->iommu_driver || !vfio_group_viable(group)) @@ -1375,38 +1457,38 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf) return PTR_ERR(device);
if (!try_module_get(device->dev->driver->owner)) { - vfio_device_put(device); - return -ENODEV; + ret = -ENODEV; + goto err_device_put; }
- ret = device->ops->open(device); - if (ret) { - module_put(device->dev->driver->owner); - vfio_device_put(device); - return ret; + mutex_lock(&device->dev_set->lock); + device->open_count++; + if (device->open_count == 1 && device->ops->open_device) { + ret = device->ops->open_device(device); + if (ret) + goto err_undo_count; + } + mutex_unlock(&device->dev_set->lock); + + if (device->ops->open) { + ret = device->ops->open(device); + if (ret) + goto err_close_device; }
/* * We can't use anon_inode_getfd() because we need to modify * the f_mode flags directly to allow more than just ioctls */ - ret = get_unused_fd_flags(O_CLOEXEC); - if (ret < 0) { - device->ops->release(device); - module_put(device->dev->driver->owner); - vfio_device_put(device); - return ret; - } + fdno = ret = get_unused_fd_flags(O_CLOEXEC); + if (ret < 0) + goto err_release;
filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops, device, O_RDWR); if (IS_ERR(filep)) { - put_unused_fd(ret); ret = PTR_ERR(filep); - device->ops->release(device); - module_put(device->dev->driver->owner); - vfio_device_put(device); - return ret; + goto err_fd; }
/* @@ -1418,12 +1500,28 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
atomic_inc(&group->container_users);
- fd_install(ret, filep); + fd_install(fdno, filep);
if (group->noiommu) dev_warn(device->dev, "vfio-noiommu device opened by user " "(%s:%d)\n", current->comm, task_pid_nr(current)); + return fdno;
+err_fd: + put_unused_fd(fdno); +err_release: + if (device->ops->release) + device->ops->release(device); +err_close_device: + mutex_lock(&device->dev_set->lock); + if (device->open_count == 1 && device->ops->close_device) + device->ops->close_device(device); +err_undo_count: + device->open_count--; + mutex_unlock(&device->dev_set->lock); + module_put(device->dev->driver->owner); +err_device_put: + vfio_device_put(device); return ret; }
@@ -1561,7 +1659,13 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) { struct vfio_device *device = filep->private_data;
- device->ops->release(device); + if (device->ops->release) + device->ops->release(device); + + mutex_lock(&device->dev_set->lock); + if (!--device->open_count && device->ops->close_device) + device->ops->close_device(device); + mutex_unlock(&device->dev_set->lock);
module_put(device->dev->driver->owner);
@@ -2364,6 +2468,7 @@ static void __exit vfio_cleanup(void) class_destroy(vfio.class); vfio.class = NULL; misc_deregister(&vfio_dev); + xa_destroy(&vfio_device_set_xa); }
module_init(vfio_init); diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 3a38598c260559..cb5b7ed1d7c30d 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -111,6 +111,8 @@ struct mdev_parent_ops {
int (*create)(struct mdev_device *mdev); int (*remove)(struct mdev_device *mdev); + int (*open_device)(struct mdev_device *mdev); + void (*close_device)(struct mdev_device *mdev); int (*open)(struct mdev_device *mdev); void (*release)(struct mdev_device *mdev); ssize_t (*read)(struct mdev_device *mdev, char __user *buf, diff --git a/include/linux/vfio.h b/include/linux/vfio.h index b0875cf8e496bb..f0e6a72875e471 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -15,13 +15,28 @@ #include <linux/poll.h> #include <uapi/linux/vfio.h>
+/* + * VFIO devices can be placed in a set, this allows all devices to share this + * structure and the VFIO core will provide a lock that is held around + * open_device()/close_device() for all devices in the set. + */ +struct vfio_device_set { + void *set_id; + struct mutex lock; + struct list_head device_list; + unsigned int device_count; +}; + struct vfio_device { struct device *dev; const struct vfio_device_ops *ops; struct vfio_group *group; + struct vfio_device_set *dev_set; + struct list_head dev_set_list;
/* Members below here are private, not for driver use */ refcount_t refcount; + unsigned int open_count; struct completion comp; struct list_head group_next; }; @@ -29,6 +44,8 @@ struct vfio_device { /** * struct vfio_device_ops - VFIO bus driver device callbacks * + * @open_device: Called when the first file descriptor is opened for this device + * @close_device: Opposite of open_device * @open: Called when userspace creates new file descriptor for device * @release: Called when userspace releases file descriptor for device * @read: Perform read(2) on device file descriptor @@ -43,6 +60,8 @@ struct vfio_device { */ struct vfio_device_ops { char *name; + int (*open_device)(struct vfio_device *vdev); + void (*close_device)(struct vfio_device *vdev); int (*open)(struct vfio_device *vdev); void (*release)(struct vfio_device *vdev); ssize_t (*read)(struct vfio_device *vdev, char __user *buf, @@ -67,6 +86,8 @@ void vfio_unregister_group_dev(struct vfio_device *device); extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); extern void vfio_device_put(struct vfio_device *device);
+int vfio_assign_device_set(struct vfio_device *device, void *set_id); + /* events for the backend driver notify callback */ enum vfio_iommu_notify_type { VFIO_IOMMU_CONTAINER_CLOSE = 0,
The core code no longer requires these ops to be defined, so delete these empty functions and leave the ops as NULL. mtty's functions only log a pointless message; delete that entirely.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 samples/vfio-mdev/mbochs.c |  6 ------
 samples/vfio-mdev/mdpy.c   | 11 -----------
 samples/vfio-mdev/mtty.c   | 13 -------------
 3 files changed, 30 deletions(-)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c index 3d02e8e5964be7..5ac65894fcd38c 100644 --- a/samples/vfio-mdev/mbochs.c +++ b/samples/vfio-mdev/mbochs.c @@ -1278,11 +1278,6 @@ static long mbochs_ioctl(struct vfio_device *vdev, unsigned int cmd, return -ENOTTY; }
-static int mbochs_open(struct vfio_device *vdev) -{ - return 0; -} - static void mbochs_close(struct vfio_device *vdev) { struct mdev_state *mdev_state = @@ -1401,7 +1396,6 @@ static struct attribute_group *mdev_type_groups[] = { };
static const struct vfio_device_ops mbochs_dev_ops = { - .open = mbochs_open, .release = mbochs_close, .read = mbochs_read, .write = mbochs_write, diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c index 57334034cde6dd..8d1a80a0722aa9 100644 --- a/samples/vfio-mdev/mdpy.c +++ b/samples/vfio-mdev/mdpy.c @@ -614,15 +614,6 @@ static long mdpy_ioctl(struct vfio_device *vdev, unsigned int cmd, return -ENOTTY; }
-static int mdpy_open(struct vfio_device *vdev) -{ - return 0; -} - -static void mdpy_close(struct vfio_device *vdev) -{ -} - static ssize_t resolution_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -717,8 +708,6 @@ static struct attribute_group *mdev_type_groups[] = { };
static const struct vfio_device_ops mdpy_dev_ops = { - .open = mdpy_open, - .release = mdpy_close, .read = mdpy_read, .write = mdpy_write, .ioctl = mdpy_ioctl, diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c index 37cc9067e1601d..5983cdb16e3d1d 100644 --- a/samples/vfio-mdev/mtty.c +++ b/samples/vfio-mdev/mtty.c @@ -1207,17 +1207,6 @@ static long mtty_ioctl(struct vfio_device *vdev, unsigned int cmd, return -ENOTTY; }
-static int mtty_open(struct vfio_device *vdev) -{ - pr_info("%s\n", __func__); - return 0; -} - -static void mtty_close(struct vfio_device *mdev) -{ - pr_info("%s\n", __func__); -} - static ssize_t sample_mtty_dev_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -1325,8 +1314,6 @@ static struct attribute_group *mdev_type_groups[] = {
static const struct vfio_device_ops mtty_dev_ops = { .name = "vfio-mtty", - .open = mtty_open, - .release = mtty_close, .read = mtty_read, .write = mtty_write, .ioctl = mtty_ioctl,
FSL uses its internal reflck to implement the open_device() functionality, so the conversion to the core code is straightforward.
The decision on which set to join is based trivially on is_fsl_mc_bus_dprc(), and a 'struct device *' pointer is used as the set_id.
The dev_set lock protects the interrupt setup. The FSL MC devices use MSIs, and only the DPRC device allocates the MSIs from the MSI domain; the other devices just take interrupts from a pool. The lock protects access to this pool.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Tested-by: Diana Craciun OSS <diana.craciun@oss.nxp.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c         | 154 ++++------------------
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c    |   6 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc_private.h |   7 -
 3 files changed, 28 insertions(+), 139 deletions(-)
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c index 122997c61ba450..0ead91bfa83867 100644 --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c @@ -19,81 +19,10 @@
static struct fsl_mc_driver vfio_fsl_mc_driver;
-static DEFINE_MUTEX(reflck_lock); - -static void vfio_fsl_mc_reflck_get(struct vfio_fsl_mc_reflck *reflck) -{ - kref_get(&reflck->kref); -} - -static void vfio_fsl_mc_reflck_release(struct kref *kref) -{ - struct vfio_fsl_mc_reflck *reflck = container_of(kref, - struct vfio_fsl_mc_reflck, - kref); - - mutex_destroy(&reflck->lock); - kfree(reflck); - mutex_unlock(&reflck_lock); -} - -static void vfio_fsl_mc_reflck_put(struct vfio_fsl_mc_reflck *reflck) -{ - kref_put_mutex(&reflck->kref, vfio_fsl_mc_reflck_release, &reflck_lock); -} - -static struct vfio_fsl_mc_reflck *vfio_fsl_mc_reflck_alloc(void) -{ - struct vfio_fsl_mc_reflck *reflck; - - reflck = kzalloc(sizeof(*reflck), GFP_KERNEL); - if (!reflck) - return ERR_PTR(-ENOMEM); - - kref_init(&reflck->kref); - mutex_init(&reflck->lock); - - return reflck; -} - -static int vfio_fsl_mc_reflck_attach(struct vfio_fsl_mc_device *vdev) -{ - int ret = 0; - - mutex_lock(&reflck_lock); - if (is_fsl_mc_bus_dprc(vdev->mc_dev)) { - vdev->reflck = vfio_fsl_mc_reflck_alloc(); - ret = PTR_ERR_OR_ZERO(vdev->reflck); - } else { - struct device *mc_cont_dev = vdev->mc_dev->dev.parent; - struct vfio_device *device; - struct vfio_fsl_mc_device *cont_vdev; - - device = vfio_device_get_from_dev(mc_cont_dev); - if (!device) { - ret = -ENODEV; - goto unlock; - } - - cont_vdev = - container_of(device, struct vfio_fsl_mc_device, vdev); - if (!cont_vdev || !cont_vdev->reflck) { - vfio_device_put(device); - ret = -ENODEV; - goto unlock; - } - vfio_fsl_mc_reflck_get(cont_vdev->reflck); - vdev->reflck = cont_vdev->reflck; - vfio_device_put(device); - } - -unlock: - mutex_unlock(&reflck_lock); - return ret; -} - -static int vfio_fsl_mc_regions_init(struct vfio_fsl_mc_device *vdev) +static int vfio_fsl_mc_open_device(struct vfio_device *core_vdev) { + struct vfio_fsl_mc_device *vdev = + container_of(core_vdev, struct vfio_fsl_mc_device, vdev); struct fsl_mc_device *mc_dev = vdev->mc_dev; int count = mc_dev->obj_desc.region_count; int i; @@ -136,58 +65,30 @@ static void vfio_fsl_mc_regions_cleanup(struct vfio_fsl_mc_device *vdev) kfree(vdev->regions); }
-static int vfio_fsl_mc_open(struct vfio_device *core_vdev) -{ - struct vfio_fsl_mc_device *vdev = - container_of(core_vdev, struct vfio_fsl_mc_device, vdev); - int ret = 0; - - mutex_lock(&vdev->reflck->lock); - if (!vdev->refcnt) { - ret = vfio_fsl_mc_regions_init(vdev); - if (ret) - goto out; - } - vdev->refcnt++; -out: - mutex_unlock(&vdev->reflck->lock);
- return ret; -} - -static void vfio_fsl_mc_release(struct vfio_device *core_vdev) +static void vfio_fsl_mc_close_device(struct vfio_device *core_vdev) { struct vfio_fsl_mc_device *vdev = container_of(core_vdev, struct vfio_fsl_mc_device, vdev); + struct fsl_mc_device *mc_dev = vdev->mc_dev; + struct device *cont_dev = fsl_mc_cont_dev(&mc_dev->dev); + struct fsl_mc_device *mc_cont = to_fsl_mc_device(cont_dev); int ret;
- mutex_lock(&vdev->reflck->lock); - - if (!(--vdev->refcnt)) { - struct fsl_mc_device *mc_dev = vdev->mc_dev; - struct device *cont_dev = fsl_mc_cont_dev(&mc_dev->dev); - struct fsl_mc_device *mc_cont = to_fsl_mc_device(cont_dev); - - vfio_fsl_mc_regions_cleanup(vdev); + vfio_fsl_mc_regions_cleanup(vdev);
- /* reset the device before cleaning up the interrupts */ - ret = dprc_reset_container(mc_cont->mc_io, 0, - mc_cont->mc_handle, - mc_cont->obj_desc.id, - DPRC_RESET_OPTION_NON_RECURSIVE); + /* reset the device before cleaning up the interrupts */ + ret = dprc_reset_container(mc_cont->mc_io, 0, mc_cont->mc_handle, + mc_cont->obj_desc.id, + DPRC_RESET_OPTION_NON_RECURSIVE);
- if (ret) { - dev_warn(&mc_cont->dev, "VFIO_FLS_MC: reset device has failed (%d)\n", - ret); - WARN_ON(1); - } + if (WARN_ON(ret)) + dev_warn(&mc_cont->dev, + "VFIO_FLS_MC: reset device has failed (%d)\n", ret);
- vfio_fsl_mc_irqs_cleanup(vdev); + vfio_fsl_mc_irqs_cleanup(vdev);
- fsl_mc_cleanup_irq_pool(mc_cont); - } - - mutex_unlock(&vdev->reflck->lock); + fsl_mc_cleanup_irq_pool(mc_cont); }
static long vfio_fsl_mc_ioctl(struct vfio_device *core_vdev, @@ -504,8 +405,8 @@ static int vfio_fsl_mc_mmap(struct vfio_device *core_vdev,
static const struct vfio_device_ops vfio_fsl_mc_ops = { .name = "vfio-fsl-mc", - .open = vfio_fsl_mc_open, - .release = vfio_fsl_mc_release, + .open_device = vfio_fsl_mc_open_device, + .close_device = vfio_fsl_mc_close_device, .ioctl = vfio_fsl_mc_ioctl, .read = vfio_fsl_mc_read, .write = vfio_fsl_mc_write, @@ -625,13 +526,16 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev) vdev->mc_dev = mc_dev; mutex_init(&vdev->igate);
- ret = vfio_fsl_mc_reflck_attach(vdev); + if (is_fsl_mc_bus_dprc(mc_dev)) + ret = vfio_assign_device_set(&vdev->vdev, &mc_dev->dev); + else + ret = vfio_assign_device_set(&vdev->vdev, mc_dev->dev.parent); if (ret) goto out_uninit;
ret = vfio_fsl_mc_init_device(vdev); if (ret) - goto out_reflck; + goto out_uninit;
ret = vfio_register_group_dev(&vdev->vdev); if (ret) { @@ -639,12 +543,6 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev) goto out_device; }
- /* - * This triggers recursion into vfio_fsl_mc_probe() on another device - * and the vfio_fsl_mc_reflck_attach() must succeed, which relies on the - * vfio_add_group_dev() above. It has no impact on this vdev, so it is - * safe to be after the vfio device is made live. - */ ret = vfio_fsl_mc_scan_container(mc_dev); if (ret) goto out_group_dev; @@ -655,8 +553,6 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev) vfio_unregister_group_dev(&vdev->vdev); out_device: vfio_fsl_uninit_device(vdev); -out_reflck: - vfio_fsl_mc_reflck_put(vdev->reflck); out_uninit: vfio_uninit_group_dev(&vdev->vdev); kfree(vdev); @@ -675,7 +571,7 @@ static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev)
dprc_remove_devices(mc_dev, NULL, 0); vfio_fsl_uninit_device(vdev); - vfio_fsl_mc_reflck_put(vdev->reflck); + vfio_uninit_group_dev(&vdev->vdev); kfree(vdev); vfio_iommu_group_put(mc_dev->dev.iommu_group, dev); diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c index 0d9f3002df7f51..77e584093a233d 100644 --- a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c @@ -120,7 +120,7 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev, if (start != 0 || count != 1) return -EINVAL;
- mutex_lock(&vdev->reflck->lock); + mutex_lock(&vdev->vdev.dev_set->lock); ret = fsl_mc_populate_irq_pool(mc_cont, FSL_MC_IRQ_POOL_MAX_TOTAL_IRQS); if (ret) @@ -129,7 +129,7 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev, ret = vfio_fsl_mc_irqs_allocate(vdev); if (ret) goto unlock; - mutex_unlock(&vdev->reflck->lock); + mutex_unlock(&vdev->vdev.dev_set->lock);
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { s32 fd = *(s32 *)data; @@ -154,7 +154,7 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev, return 0;
unlock: - mutex_unlock(&vdev->reflck->lock); + mutex_unlock(&vdev->vdev.dev_set->lock); return ret;
} diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h b/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h index 89700e00e77d10..4ad63ececb914b 100644 --- a/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h @@ -22,11 +22,6 @@ struct vfio_fsl_mc_irq { char *name; };
-struct vfio_fsl_mc_reflck { - struct kref kref; - struct mutex lock; -}; - struct vfio_fsl_mc_region { u32 flags; u32 type; @@ -39,9 +34,7 @@ struct vfio_fsl_mc_device { struct vfio_device vdev; struct fsl_mc_device *mc_dev; struct notifier_block nb; - int refcnt; struct vfio_fsl_mc_region *regions; - struct vfio_fsl_mc_reflck *reflck; struct mutex igate; struct vfio_fsl_mc_irq *mc_irqs; };
Platform simply wants to run some code when the device is first opened/last closed. Use the core framework and locking for this. Aside from removing a bit of code, this narrows the locking scope from a single global lock to the per-device-set lock.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/vfio/platform/vfio_platform_common.c  | 79 ++++++++-----------
 drivers/vfio/platform/vfio_platform_private.h |  1 -
 2 files changed, 32 insertions(+), 48 deletions(-)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index bdde8605178cd2..6af7ce7d619c25 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -218,65 +218,52 @@ static int vfio_platform_call_reset(struct vfio_platform_device *vdev, return -EINVAL; }
-static void vfio_platform_release(struct vfio_device *core_vdev) +static void vfio_platform_close_device(struct vfio_device *core_vdev) { struct vfio_platform_device *vdev = container_of(core_vdev, struct vfio_platform_device, vdev); + const char *extra_dbg = NULL; + int ret;
- mutex_lock(&driver_lock); - - if (!(--vdev->refcnt)) { - const char *extra_dbg = NULL; - int ret; - - ret = vfio_platform_call_reset(vdev, &extra_dbg); - if (ret && vdev->reset_required) { - dev_warn(vdev->device, "reset driver is required and reset call failed in release (%d) %s\n", - ret, extra_dbg ? extra_dbg : ""); - WARN_ON(1); - } - pm_runtime_put(vdev->device); - vfio_platform_regions_cleanup(vdev); - vfio_platform_irq_cleanup(vdev); + ret = vfio_platform_call_reset(vdev, &extra_dbg); + if (WARN_ON(ret && vdev->reset_required)) { + dev_warn( + vdev->device, + "reset driver is required and reset call failed in release (%d) %s\n", + ret, extra_dbg ? extra_dbg : ""); } - - mutex_unlock(&driver_lock); + pm_runtime_put(vdev->device); + vfio_platform_regions_cleanup(vdev); + vfio_platform_irq_cleanup(vdev); }
-static int vfio_platform_open(struct vfio_device *core_vdev) +static int vfio_platform_open_device(struct vfio_device *core_vdev) { struct vfio_platform_device *vdev = container_of(core_vdev, struct vfio_platform_device, vdev); + const char *extra_dbg = NULL; int ret;
- mutex_lock(&driver_lock); - - if (!vdev->refcnt) { - const char *extra_dbg = NULL; - - ret = vfio_platform_regions_init(vdev); - if (ret) - goto err_reg; + ret = vfio_platform_regions_init(vdev); + if (ret) + return ret;
- ret = vfio_platform_irq_init(vdev); - if (ret) - goto err_irq; + ret = vfio_platform_irq_init(vdev); + if (ret) + goto err_irq;
- ret = pm_runtime_get_sync(vdev->device); - if (ret < 0) - goto err_rst; + ret = pm_runtime_get_sync(vdev->device); + if (ret < 0) + goto err_rst;
- ret = vfio_platform_call_reset(vdev, &extra_dbg); - if (ret && vdev->reset_required) { - dev_warn(vdev->device, "reset driver is required and reset call failed in open (%d) %s\n", - ret, extra_dbg ? extra_dbg : ""); - goto err_rst; - } + ret = vfio_platform_call_reset(vdev, &extra_dbg); + if (ret && vdev->reset_required) { + dev_warn( + vdev->device, + "reset driver is required and reset call failed in open (%d) %s\n", + ret, extra_dbg ? extra_dbg : ""); + goto err_rst; } - - vdev->refcnt++; - - mutex_unlock(&driver_lock); return 0;
err_rst: @@ -284,8 +271,6 @@ static int vfio_platform_open(struct vfio_device *core_vdev) vfio_platform_irq_cleanup(vdev); err_irq: vfio_platform_regions_cleanup(vdev); -err_reg: - mutex_unlock(&driver_lock); return ret; }
@@ -616,8 +601,8 @@ static int vfio_platform_mmap(struct vfio_device *core_vdev, struct vm_area_stru
static const struct vfio_device_ops vfio_platform_ops = { .name = "vfio-platform", - .open = vfio_platform_open, - .release = vfio_platform_release, + .open_device = vfio_platform_open_device, + .close_device = vfio_platform_close_device, .ioctl = vfio_platform_ioctl, .read = vfio_platform_read, .write = vfio_platform_write, diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h index dfb834c1365946..520d2a8e8375b2 100644 --- a/drivers/vfio/platform/vfio_platform_private.h +++ b/drivers/vfio/platform/vfio_platform_private.h @@ -48,7 +48,6 @@ struct vfio_platform_device { u32 num_regions; struct vfio_platform_irq *irqs; u32 num_irqs; - int refcnt; struct mutex igate; const char *compat; const char *acpihid;
Hi Jason,
On 7/29/21 2:49 AM, Jason Gunthorpe wrote:
Platform simply wants to run some code when the device is first opened/last closed. Use the core framework and locking for this. Aside from removing a bit of code this narrows the locking scope from a global lock.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Yishai Hadas yishaih@nvidia.com Reviewed-by: Cornelia Huck cohuck@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Thanks
Eric
drivers/vfio/platform/vfio_platform_common.c | 79 ++++++++----------- drivers/vfio/platform/vfio_platform_private.h | 1 - 2 files changed, 32 insertions(+), 48 deletions(-)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index bdde8605178cd2..6af7ce7d619c25 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -218,65 +218,52 @@ static int vfio_platform_call_reset(struct vfio_platform_device *vdev, return -EINVAL; }
-static void vfio_platform_release(struct vfio_device *core_vdev) +static void vfio_platform_close_device(struct vfio_device *core_vdev) { struct vfio_platform_device *vdev = container_of(core_vdev, struct vfio_platform_device, vdev);
- const char *extra_dbg = NULL;
- int ret;
- mutex_lock(&driver_lock);
- if (!(--vdev->refcnt)) {
const char *extra_dbg = NULL;
int ret;
ret = vfio_platform_call_reset(vdev, &extra_dbg);
if (ret && vdev->reset_required) {
dev_warn(vdev->device, "reset driver is required and reset call failed in release (%d) %s\n",
ret, extra_dbg ? extra_dbg : "");
WARN_ON(1);
}
pm_runtime_put(vdev->device);
vfio_platform_regions_cleanup(vdev);
vfio_platform_irq_cleanup(vdev);
- ret = vfio_platform_call_reset(vdev, &extra_dbg);
- if (WARN_ON(ret && vdev->reset_required)) {
dev_warn(
vdev->device,
"reset driver is required and reset call failed in release (%d) %s\n",
}ret, extra_dbg ? extra_dbg : "");
- mutex_unlock(&driver_lock);
- pm_runtime_put(vdev->device);
- vfio_platform_regions_cleanup(vdev);
- vfio_platform_irq_cleanup(vdev);
}
-static int vfio_platform_open(struct vfio_device *core_vdev) +static int vfio_platform_open_device(struct vfio_device *core_vdev) { struct vfio_platform_device *vdev = container_of(core_vdev, struct vfio_platform_device, vdev);
- const char *extra_dbg = NULL; int ret;
- mutex_lock(&driver_lock);
- if (!vdev->refcnt) {
const char *extra_dbg = NULL;
ret = vfio_platform_regions_init(vdev);
if (ret)
goto err_reg;
- ret = vfio_platform_regions_init(vdev);
- if (ret)
return ret;
ret = vfio_platform_irq_init(vdev);
if (ret)
goto err_irq;
- ret = vfio_platform_irq_init(vdev);
- if (ret)
goto err_irq;
ret = pm_runtime_get_sync(vdev->device);
if (ret < 0)
goto err_rst;
- ret = pm_runtime_get_sync(vdev->device);
- if (ret < 0)
goto err_rst;
ret = vfio_platform_call_reset(vdev, &extra_dbg);
if (ret && vdev->reset_required) {
dev_warn(vdev->device, "reset driver is required and reset call failed in open (%d) %s\n",
ret, extra_dbg ? extra_dbg : "");
goto err_rst;
}
- ret = vfio_platform_call_reset(vdev, &extra_dbg);
- if (ret && vdev->reset_required) {
dev_warn(
vdev->device,
"reset driver is required and reset call failed in open (%d) %s\n",
ret, extra_dbg ? extra_dbg : "");
goto err_rst;
}
- vdev->refcnt++;
- mutex_unlock(&driver_lock); return 0;
err_rst: @@ -284,8 +271,6 @@ static int vfio_platform_open(struct vfio_device *core_vdev) vfio_platform_irq_cleanup(vdev); err_irq: vfio_platform_regions_cleanup(vdev); -err_reg:
- mutex_unlock(&driver_lock); return ret;
}
@@ -616,8 +601,8 @@ static int vfio_platform_mmap(struct vfio_device *core_vdev, struct vm_area_stru
static const struct vfio_device_ops vfio_platform_ops = { .name = "vfio-platform",
- .open = vfio_platform_open,
- .release = vfio_platform_release,
- .open_device = vfio_platform_open_device,
- .close_device = vfio_platform_close_device, .ioctl = vfio_platform_ioctl, .read = vfio_platform_read, .write = vfio_platform_write,
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h index dfb834c1365946..520d2a8e8375b2 100644 --- a/drivers/vfio/platform/vfio_platform_private.h +++ b/drivers/vfio/platform/vfio_platform_private.h @@ -48,7 +48,6 @@ struct vfio_platform_device { u32 num_regions; struct vfio_platform_irq *irqs; u32 num_irqs;
- int refcnt; struct mutex igate; const char *compat; const char *acpihid;
From: Yishai Hadas yishaih@nvidia.com
PCI wants to have the usual open/close_device() logic with the slight twist that the open/close_device() must be done under a singleton lock shared by all of the vfio_devices that are in the PCI "reset group".
The reset group, and thus the device set, is determined by what devices pci_reset_bus() touches, which is either the entire bus or only the slot.
Rely on the core code to do everything reflck was doing and delete reflck entirely.
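Restated as a standalone helper for clarity; sketch_pci_assign_set() is a made-up wrapper name, but the calls and fields are the ones the probe hunk below uses:

/*
 * The set key is whatever pci_reset_bus() would operate on: the slot if
 * a slot reset is possible, otherwise the bus. Root-bus devices can
 * never be reset this way, so they get a singleton set keyed on the
 * device itself.
 */
static int sketch_pci_assign_set(struct vfio_pci_device *vdev)
{
	struct pci_dev *pdev = vdev->pdev;

	if (pci_is_root_bus(pdev->bus))
		return vfio_assign_device_set(&vdev->vdev, vdev);
	if (!pci_probe_reset_slot(pdev->slot))
		return vfio_assign_device_set(&vdev->vdev, pdev->slot);
	return vfio_assign_device_set(&vdev->vdev, pdev->bus);
}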
Signed-off-by: Yishai Hadas yishaih@nvidia.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/vfio/pci/vfio_pci.c | 162 +++++++--------------------- drivers/vfio/pci/vfio_pci_private.h | 7 -- 2 files changed, 37 insertions(+), 132 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index fab3715d60d4ba..5d6db93d6c680f 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -530,53 +530,40 @@ static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val) vfio_device_put(&pf_vdev->vdev); }
-static void vfio_pci_release(struct vfio_device *core_vdev) +static void vfio_pci_close_device(struct vfio_device *core_vdev) { struct vfio_pci_device *vdev = container_of(core_vdev, struct vfio_pci_device, vdev);
- mutex_lock(&vdev->reflck->lock); - - if (!(--vdev->refcnt)) { - vfio_pci_vf_token_user_add(vdev, -1); - vfio_spapr_pci_eeh_release(vdev->pdev); - vfio_pci_disable(vdev); + vfio_pci_vf_token_user_add(vdev, -1); + vfio_spapr_pci_eeh_release(vdev->pdev); + vfio_pci_disable(vdev);
- mutex_lock(&vdev->igate); - if (vdev->err_trigger) { - eventfd_ctx_put(vdev->err_trigger); - vdev->err_trigger = NULL; - } - if (vdev->req_trigger) { - eventfd_ctx_put(vdev->req_trigger); - vdev->req_trigger = NULL; - } - mutex_unlock(&vdev->igate); + mutex_lock(&vdev->igate); + if (vdev->err_trigger) { + eventfd_ctx_put(vdev->err_trigger); + vdev->err_trigger = NULL; } - - mutex_unlock(&vdev->reflck->lock); + if (vdev->req_trigger) { + eventfd_ctx_put(vdev->req_trigger); + vdev->req_trigger = NULL; + } + mutex_unlock(&vdev->igate); }
-static int vfio_pci_open(struct vfio_device *core_vdev) +static int vfio_pci_open_device(struct vfio_device *core_vdev) { struct vfio_pci_device *vdev = container_of(core_vdev, struct vfio_pci_device, vdev); int ret = 0;
- mutex_lock(&vdev->reflck->lock); - - if (!vdev->refcnt) { - ret = vfio_pci_enable(vdev); - if (ret) - goto error; + ret = vfio_pci_enable(vdev); + if (ret) + return ret;
- vfio_spapr_pci_eeh_open(vdev->pdev); - vfio_pci_vf_token_user_add(vdev, 1); - } - vdev->refcnt++; -error: - mutex_unlock(&vdev->reflck->lock); - return ret; + vfio_spapr_pci_eeh_open(vdev->pdev); + vfio_pci_vf_token_user_add(vdev, 1); + return 0; }
static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) @@ -1870,8 +1857,8 @@ static int vfio_pci_match(struct vfio_device *core_vdev, char *buf)
static const struct vfio_device_ops vfio_pci_ops = { .name = "vfio-pci", - .open = vfio_pci_open, - .release = vfio_pci_release, + .open_device = vfio_pci_open_device, + .close_device = vfio_pci_close_device, .ioctl = vfio_pci_ioctl, .read = vfio_pci_read, .write = vfio_pci_write, @@ -1880,9 +1867,6 @@ static const struct vfio_device_ops vfio_pci_ops = { .match = vfio_pci_match, };
-static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev); -static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck); - static int vfio_pci_bus_notifier(struct notifier_block *nb, unsigned long action, void *data) { @@ -2020,12 +2004,23 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&vdev->vma_list); init_rwsem(&vdev->memory_lock);
- ret = vfio_pci_reflck_attach(vdev); + if (pci_is_root_bus(pdev->bus)) { + ret = vfio_assign_device_set(&vdev->vdev, vdev); + } else if (!pci_probe_reset_slot(pdev->slot)) { + ret = vfio_assign_device_set(&vdev->vdev, pdev->slot); + } else { + /* + * If there is no slot reset support for this device, the whole + * bus needs to be grouped together to support bus-wide resets. + */ + ret = vfio_assign_device_set(&vdev->vdev, pdev->bus); + } + if (ret) goto out_uninit; ret = vfio_pci_vf_init(vdev); if (ret) - goto out_reflck; + goto out_uninit; ret = vfio_pci_vga_init(vdev); if (ret) goto out_vf; @@ -2057,8 +2052,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) vfio_pci_set_power_state(vdev, PCI_D0); out_vf: vfio_pci_vf_uninit(vdev); -out_reflck: - vfio_pci_reflck_put(vdev->reflck); out_uninit: vfio_uninit_group_dev(&vdev->vdev); kfree(vdev->pm_save); @@ -2077,7 +2070,6 @@ static void vfio_pci_remove(struct pci_dev *pdev) vfio_unregister_group_dev(&vdev->vdev);
vfio_pci_vf_uninit(vdev); - vfio_pci_reflck_put(vdev->reflck); vfio_uninit_group_dev(&vdev->vdev); vfio_pci_vga_uninit(vdev);
@@ -2153,86 +2145,6 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static DEFINE_MUTEX(reflck_lock); - -static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) -{ - struct vfio_pci_reflck *reflck; - - reflck = kzalloc(sizeof(*reflck), GFP_KERNEL); - if (!reflck) - return ERR_PTR(-ENOMEM); - - kref_init(&reflck->kref); - mutex_init(&reflck->lock); - - return reflck; -} - -static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck) -{ - kref_get(&reflck->kref); -} - -static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_reflck **preflck = data; - struct vfio_device *device; - struct vfio_pci_device *vdev; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return 0; - - if (pci_dev_driver(pdev) != &vfio_pci_driver) { - vfio_device_put(device); - return 0; - } - - vdev = container_of(device, struct vfio_pci_device, vdev); - - if (vdev->reflck) { - vfio_pci_reflck_get(vdev->reflck); - *preflck = vdev->reflck; - vfio_device_put(device); - return 1; - } - - vfio_device_put(device); - return 0; -} - -static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) -{ - bool slot = !pci_probe_reset_slot(vdev->pdev->slot); - - mutex_lock(&reflck_lock); - - if (pci_is_root_bus(vdev->pdev->bus) || - vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find, - &vdev->reflck, slot) <= 0) - vdev->reflck = vfio_pci_reflck_alloc(); - - mutex_unlock(&reflck_lock); - - return PTR_ERR_OR_ZERO(vdev->reflck); -} - -static void vfio_pci_reflck_release(struct kref *kref) -{ - struct vfio_pci_reflck *reflck = container_of(kref, - struct vfio_pci_reflck, - kref); - - kfree(reflck); - mutex_unlock(&reflck_lock); -} - -static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) -{ - kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock); -} - static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; @@ -2254,7 +2166,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) vdev = container_of(device, struct vfio_pci_device, vdev);
/* Fault if the device is not unused */ - if (vdev->refcnt) { + if (device->open_count) { vfio_device_put(device); return -EBUSY; } @@ -2303,7 +2215,7 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) * - At least one of the affected devices is marked dirty via * needs_reset (such as by lack of FLR support) * Then attempt to perform that bus or slot reset. Callers are required - * to hold vdev->reflck->lock, protecting the bus/slot reset group from + * to hold vdev->dev_set->lock, protecting the bus/slot reset group from * concurrent opens. A vfio_device reference is acquired for each device * to prevent unbinds during the reset operation. * diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index bbc56c857ef081..70414b6c904d89 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -83,11 +83,6 @@ struct vfio_pci_dummy_resource { struct list_head res_next; };
-struct vfio_pci_reflck { - struct kref kref; - struct mutex lock; -}; - struct vfio_pci_vf_token { struct mutex lock; uuid_t uuid; @@ -130,8 +125,6 @@ struct vfio_pci_device { bool needs_pm_restore; struct pci_saved_state *pci_saved_state; struct pci_saved_state *pm_save; - struct vfio_pci_reflck *reflck; - int refcnt; int ioeventfds_nr; struct eventfd_ctx *err_trigger; struct eventfd_ctx *req_trigger;
Keep track of all the vfio_devices that have been added to the device set and use this list in vfio_pci_try_bus_reset() instead of trying to work backwards from the pci_device.
The dev_set->lock directly prevents devices from joining/leaving the set, which further implies that the pci_device cannot change drivers and that the vfio_device cannot be freed, eliminating the need for get/put's.
Completeness of the device set can be directly measured by checking if every PCI device in the reset group is also in the device set - which proves that VFIO drivers are attached to everything.
This restructuring corrects a call to pci_dev_driver() without holding the device_lock() and removes a hard wiring to &vfio_pci_driver.
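Restated from the hunk below, the membership test that backs the completeness claim is just a walk of the dev_set list (same code as in the diff, pulled out here on its own):

static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
{
	struct vfio_device_set *dev_set = data;
	struct vfio_device *cur;

	lockdep_assert_held(&dev_set->lock);

	/* every pci_dev the reset touches must have a vfio_device in the set */
	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
		if (cur->dev == &pdev->dev)
			return 0;
	return -EBUSY;
}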
Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/vfio/pci/vfio_pci.c | 148 +++++++++++++++--------------------- 1 file changed, 62 insertions(+), 86 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 5d6db93d6c680f..a1ae9a83a38621 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -404,6 +404,9 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; int i, bar;
+ /* For needs_reset */ + lockdep_assert_held(&vdev->vdev.dev_set->lock); + /* Stop the device from further DMA */ pci_clear_master(pdev);
@@ -2145,7 +2148,7 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) +static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; struct vfio_device *device; @@ -2165,8 +2168,11 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
vdev = container_of(device, struct vfio_pci_device, vdev);
- /* Fault if the device is not unused */ - if (device->open_count) { + /* + * Locking multiple devices is prone to deadlock, runaway and + * unwind if we hit contention. + */ + if (!vfio_pci_zap_and_vma_lock(vdev, true)) { vfio_device_put(device); return -EBUSY; } @@ -2175,112 +2181,82 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) return 0; }
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) +static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) { - struct vfio_devices *devs = data; - struct vfio_device *device; - struct vfio_pci_device *vdev; + struct vfio_device_set *dev_set = data; + struct vfio_device *cur;
- if (devs->cur_index == devs->max_index) - return -ENOSPC; + lockdep_assert_held(&dev_set->lock);
- device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return -EINVAL; - - if (pci_dev_driver(pdev) != &vfio_pci_driver) { - vfio_device_put(device); - return -EBUSY; - } - - vdev = container_of(device, struct vfio_pci_device, vdev); + list_for_each_entry(cur, &dev_set->device_list, dev_set_list) + if (cur->dev == &pdev->dev) + return 0; + return -EBUSY; +}
- /* - * Locking multiple devices is prone to deadlock, runaway and - * unwind if we hit contention. - */ - if (!vfio_pci_zap_and_vma_lock(vdev, true)) { - vfio_device_put(device); - return -EBUSY; +/* + * vfio-core considers a group to be viable and will create a vfio_device even + * if some devices are bound to drivers like pci-stub or pcieport. Here we + * require all PCI devices to be inside our dev_set since that ensures they stay + * put and that every driver controlling the device can co-ordinate with the + * device reset. + */ +static struct pci_dev *vfio_pci_find_reset_target(struct vfio_pci_device *vdev) +{ + struct vfio_device_set *dev_set = vdev->vdev.dev_set; + struct vfio_pci_device *cur; + bool needs_reset = false; + + /* No VFIO device has an open device FD */ + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + if (cur->vdev.open_count) + return NULL; + needs_reset |= cur->needs_reset; } + if (!needs_reset) + return NULL;
- devs->devices[devs->cur_index++] = vdev; - return 0; + /* All PCI devices in the group to be reset need to be in our dev_set */ + if (vfio_pci_for_each_slot_or_bus( + vdev->pdev, vfio_pci_is_device_in_set, dev_set, + !pci_probe_reset_slot(vdev->pdev->slot))) + return NULL; + return cur->pdev; }
/* * If a bus or slot reset is available for the provided device and: * - All of the devices affected by that bus or slot reset are unused - * (!refcnt) * - At least one of the affected devices is marked dirty via * needs_reset (such as by lack of FLR support) - * Then attempt to perform that bus or slot reset. Callers are required - * to hold vdev->dev_set->lock, protecting the bus/slot reset group from - * concurrent opens. A vfio_device reference is acquired for each device - * to prevent unbinds during the reset operation. - * - * NB: vfio-core considers a group to be viable even if some devices are - * bound to drivers like pci-stub or pcieport. Here we require all devices - * to be bound to vfio_pci since that's the only way we can be sure they - * stay put. + * Then attempt to perform that bus or slot reset. */ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) { - struct vfio_devices devs = { .cur_index = 0 }; - int i = 0, ret = -EINVAL; - bool slot = false; - struct vfio_pci_device *tmp; - - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return; + struct vfio_device_set *dev_set = vdev->vdev.dev_set; + struct pci_dev *to_reset; + struct vfio_pci_device *cur; + int ret;
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs, - &i, slot) || !i) - return; + lockdep_assert_held(&vdev->vdev.dev_set->lock);
- devs.max_index = i; - devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL); - if (!devs.devices) + if (pci_probe_reset_slot(vdev->pdev->slot) && + pci_probe_reset_bus(vdev->pdev->bus)) return;
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_get_unused_devs, - &devs, slot)) - goto put_devs; - - /* Does at least one need a reset? */ - for (i = 0; i < devs.cur_index; i++) { - tmp = devs.devices[i]; - if (tmp->needs_reset) { - ret = pci_reset_bus(vdev->pdev); - break; - } - } - -put_devs: - for (i = 0; i < devs.cur_index; i++) { - tmp = devs.devices[i]; - - /* - * If reset was successful, affected devices no longer need - * a reset and we should return all the collateral devices - * to low power. If not successful, we either didn't reset - * the bus or timed out waiting for it, so let's not touch - * the power state. - */ - if (!ret) { - tmp->needs_reset = false; + to_reset = vfio_pci_find_reset_target(vdev); + if (!to_reset) + return;
- if (tmp != vdev && !disable_idle_d3) - vfio_pci_set_power_state(tmp, PCI_D3hot); - } + ret = pci_reset_bus(to_reset); + if (ret) + return;
- vfio_device_put(&tmp->vdev); + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + cur->needs_reset = false; + if (cur->pdev != to_reset && !disable_idle_d3) + vfio_pci_set_power_state(cur, PCI_D3hot); } - - kfree(devs.devices); }
static void __exit vfio_pci_cleanup(void)
On Wed, 28 Jul 2021 21:49:18 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
Keep track of all the vfio_devices that have been added to the device set and use this list in vfio_pci_try_bus_reset() instead of trying to work backwards from the pci_device.
The dev_set->lock directly prevents devices from joining/leaving the set, which further implies the pci_device cannot change drivers or that the vfio_device be freed, eliminating the need for get/put's.
Completeness of the device set can be directly measured by checking if every PCI device in the reset group is also in the device set - which proves that VFIO drivers are attached to everything.
This restructuring corrects a call to pci_dev_driver() without holding the device_lock() and removes a hard wiring to &vfio_pci_driver.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
drivers/vfio/pci/vfio_pci.c | 148 +++++++++++++++--------------------- 1 file changed, 62 insertions(+), 86 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 5d6db93d6c680f..a1ae9a83a38621 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -404,6 +404,9 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; int i, bar;
- /* For needs_reset */
- lockdep_assert_held(&vdev->vdev.dev_set->lock);
- /* Stop the device from further DMA */ pci_clear_master(pdev);
@@ -2145,7 +2148,7 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) +static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; struct vfio_device *device; @@ -2165,8 +2168,11 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
vdev = container_of(device, struct vfio_pci_device, vdev);
- /* Fault if the device is not unused */
- if (device->open_count) {
- /*
* Locking multiple devices is prone to deadlock, runaway and
* unwind if we hit contention.
*/
- if (!vfio_pci_zap_and_vma_lock(vdev, true)) { vfio_device_put(device); return -EBUSY; }
@@ -2175,112 +2181,82 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) return 0; }
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) +static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) {
- struct vfio_devices *devs = data;
- struct vfio_device *device;
- struct vfio_pci_device *vdev;
- struct vfio_device_set *dev_set = data;
- struct vfio_device *cur;
- if (devs->cur_index == devs->max_index)
return -ENOSPC;
- lockdep_assert_held(&dev_set->lock);
- device = vfio_device_get_from_dev(&pdev->dev);
- if (!device)
return -EINVAL;
- if (pci_dev_driver(pdev) != &vfio_pci_driver) {
vfio_device_put(device);
return -EBUSY;
- }
- vdev = container_of(device, struct vfio_pci_device, vdev);
- list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
if (cur->dev == &pdev->dev)
return 0;
- return -EBUSY;
+}
- /*
* Locking multiple devices is prone to deadlock, runaway and
* unwind if we hit contention.
*/
- if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
vfio_device_put(device);
return -EBUSY;
+/*
- vfio-core considers a group to be viable and will create a vfio_device even
- if some devices are bound to drivers like pci-stub or pcieport. Here we
- require all PCI devices to be inside our dev_set since that ensures they stay
- put and that every driver controlling the device can co-ordinate with the
- device reset.
- */
+static struct pci_dev *vfio_pci_find_reset_target(struct vfio_pci_device *vdev) +{
- struct vfio_device_set *dev_set = vdev->vdev.dev_set;
- struct vfio_pci_device *cur;
- bool needs_reset = false;
- /* No VFIO device has an open device FD */
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
if (cur->vdev.open_count)
return NULL;
needs_reset |= cur->needs_reset;
}
- if (!needs_reset)
return NULL;
- devs->devices[devs->cur_index++] = vdev;
- return 0;
- /* All PCI devices in the group to be reset need to be in our dev_set */
- if (vfio_pci_for_each_slot_or_bus(
vdev->pdev, vfio_pci_is_device_in_set, dev_set,
!pci_probe_reset_slot(vdev->pdev->slot)))
return NULL;
- return cur->pdev;
I don't understand the "reset target" aspect of this, cur->pdev is simply the last entry in the dev_set->devices_list...
}
/*
- If a bus or slot reset is available for the provided device and:
- All of the devices affected by that bus or slot reset are unused
- (!refcnt)
- At least one of the affected devices is marked dirty via
- needs_reset (such as by lack of FLR support)
- Then attempt to perform that bus or slot reset. Callers are required
- to hold vdev->dev_set->lock, protecting the bus/slot reset group from
- concurrent opens. A vfio_device reference is acquired for each device
- to prevent unbinds during the reset operation.
- NB: vfio-core considers a group to be viable even if some devices are
- bound to drivers like pci-stub or pcieport. Here we require all devices
- to be bound to vfio_pci since that's the only way we can be sure they
- stay put.
*/
- Then attempt to perform that bus or slot reset.
static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) {
- struct vfio_devices devs = { .cur_index = 0 };
- int i = 0, ret = -EINVAL;
- bool slot = false;
- struct vfio_pci_device *tmp;
- if (!pci_probe_reset_slot(vdev->pdev->slot))
slot = true;
- else if (pci_probe_reset_bus(vdev->pdev->bus))
return;
- struct vfio_device_set *dev_set = vdev->vdev.dev_set;
- struct pci_dev *to_reset;
- struct vfio_pci_device *cur;
- int ret;
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
&i, slot) || !i)
return;
- lockdep_assert_held(&vdev->vdev.dev_set->lock);
- devs.max_index = i;
- devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL);
- if (!devs.devices)
- if (pci_probe_reset_slot(vdev->pdev->slot) &&
pci_probe_reset_bus(vdev->pdev->bus))
return;
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
vfio_pci_get_unused_devs,
&devs, slot))
goto put_devs;
- /* Does at least one need a reset? */
- for (i = 0; i < devs.cur_index; i++) {
tmp = devs.devices[i];
if (tmp->needs_reset) {
ret = pci_reset_bus(vdev->pdev);
break;
}
- }
-put_devs:
- for (i = 0; i < devs.cur_index; i++) {
tmp = devs.devices[i];
/*
* If reset was successful, affected devices no longer need
* a reset and we should return all the collateral devices
* to low power. If not successful, we either didn't reset
* the bus or timed out waiting for it, so let's not touch
* the power state.
*/
if (!ret) {
tmp->needs_reset = false;
- to_reset = vfio_pci_find_reset_target(vdev);
- if (!to_reset)
return;
if (tmp != vdev && !disable_idle_d3)
vfio_pci_set_power_state(tmp, PCI_D3hot);
}
- ret = pci_reset_bus(to_reset);
- if (ret)
return;
vfio_device_put(&tmp->vdev);
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
cur->needs_reset = false;
if (cur->pdev != to_reset && !disable_idle_d3)
vfio_pci_set_power_state(cur, PCI_D3hot);
}
...which means that here, I think we're putting all but whichever random device was last in the list into D3. The intention was that all the devices except for the one we're operating on should already be in D3, the bus reset will put them back in D0, so we want to force them back to D3.
I think the vfio_pci_find_reset_target() function needs to be re-worked to just tell us true/false that it's ok to reset the provided device, not to anoint an arbitrary target device. Thanks,
Alex
- kfree(devs.devices);
}
static void __exit vfio_pci_cleanup(void)
On Tue, Aug 03, 2021 at 10:34:06AM -0600, Alex Williamson wrote:
On Wed, 28 Jul 2021 21:49:18 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
Keep track of all the vfio_devices that have been added to the device set and use this list in vfio_pci_try_bus_reset() instead of trying to work backwards from the pci_device.
The dev_set->lock directly prevents devices from joining/leaving the set, which further implies the pci_device cannot change drivers or that the vfio_device be freed, eliminating the need for get/put's.
Completeness of the device set can be directly measured by checking if every PCI device in the reset group is also in the device set - which proves that VFIO drivers are attached to everything.
This restructuring corrects a call to pci_dev_driver() without holding the device_lock() and removes a hard wiring to &vfio_pci_driver.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com drivers/vfio/pci/vfio_pci.c | 148 +++++++++++++++--------------------- 1 file changed, 62 insertions(+), 86 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 5d6db93d6c680f..a1ae9a83a38621 100644 +++ b/drivers/vfio/pci/vfio_pci.c @@ -404,6 +404,9 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; int i, bar;
- /* For needs_reset */
- lockdep_assert_held(&vdev->vdev.dev_set->lock);
- /* Stop the device from further DMA */ pci_clear_master(pdev);
@@ -2145,7 +2148,7 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) +static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; struct vfio_device *device; @@ -2165,8 +2168,11 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
vdev = container_of(device, struct vfio_pci_device, vdev);
- /* Fault if the device is not unused */
- if (device->open_count) {
- /*
* Locking multiple devices is prone to deadlock, runaway and
* unwind if we hit contention.
*/
- if (!vfio_pci_zap_and_vma_lock(vdev, true)) { vfio_device_put(device); return -EBUSY; }
@@ -2175,112 +2181,82 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) return 0; }
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) +static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) {
- struct vfio_devices *devs = data;
- struct vfio_device *device;
- struct vfio_pci_device *vdev;
- struct vfio_device_set *dev_set = data;
- struct vfio_device *cur;
- if (devs->cur_index == devs->max_index)
return -ENOSPC;
- lockdep_assert_held(&dev_set->lock);
- device = vfio_device_get_from_dev(&pdev->dev);
- if (!device)
return -EINVAL;
- if (pci_dev_driver(pdev) != &vfio_pci_driver) {
vfio_device_put(device);
return -EBUSY;
- }
- vdev = container_of(device, struct vfio_pci_device, vdev);
- list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
if (cur->dev == &pdev->dev)
return 0;
- return -EBUSY;
+}
- /*
* Locking multiple devices is prone to deadlock, runaway and
* unwind if we hit contention.
*/
- if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
vfio_device_put(device);
return -EBUSY;
+/*
- vfio-core considers a group to be viable and will create a vfio_device even
- if some devices are bound to drivers like pci-stub or pcieport. Here we
- require all PCI devices to be inside our dev_set since that ensures they stay
- put and that every driver controlling the device can co-ordinate with the
- device reset.
- */
+static struct pci_dev *vfio_pci_find_reset_target(struct vfio_pci_device *vdev) +{
- struct vfio_device_set *dev_set = vdev->vdev.dev_set;
- struct vfio_pci_device *cur;
- bool needs_reset = false;
- /* No VFIO device has an open device FD */
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
if (cur->vdev.open_count)
return NULL;
needs_reset |= cur->needs_reset;
}
- if (!needs_reset)
return NULL;
- devs->devices[devs->cur_index++] = vdev;
- return 0;
- /* All PCI devices in the group to be reset need to be in our dev_set */
- if (vfio_pci_for_each_slot_or_bus(
vdev->pdev, vfio_pci_is_device_in_set, dev_set,
!pci_probe_reset_slot(vdev->pdev->slot)))
return NULL;
- return cur->pdev;
I don't understand the "reset target" aspect of this, cur->pdev is simply the last entry in the dev_set->devices_list...
Oh, hum, this got messed up someplace along the way, the original code was just:
/* Does at least one need a reset? */ for (i = 0; i < devs.cur_index; i++) { tmp = devs.devices[i]; if (tmp->needs_reset) { ret = pci_reset_bus(vdev->pdev); break;
So should this, I'll fix it up, thanks
I think the vfio_pci_find_reset_target() function needs to be re-worked to just tell us true/false that it's ok to reset the provided device, not to anoint an arbitrary target device. Thanks,
Yes, though this logic is confusing, why do we need to check if any device needs a reset at this point? If we are being asked to reset vdev shouldn't vdev needs_reset?
Or is the function more of a 'synchronize pending reset' kind of thing?
Jason
On Tue, 3 Aug 2021 13:41:52 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Tue, Aug 03, 2021 at 10:34:06AM -0600, Alex Williamson wrote:
I think the vfio_pci_find_reset_target() function needs to be re-worked to just tell us true/false that it's ok to reset the provided device, not to anoint an arbitrary target device. Thanks,
Yes, though this logic is confusing, why do we need to check if any device needs a reset at this point? If we are being asked to reset vdev shouldn't vdev needs_reset?
Or is the function more of a 'synchronize pending reset' kind of thing?
Yes, the latter. For instance think about a multi-function PCI device such as a GPU. The functions have dramatically different capabilities, some might have function level reset abilities and others not. We want to be able to trigger a bus reset as the last device of the set is released, no matter the order they're released and no matter the capabilities of the device we're currently processing. Thanks,
Alex
On Tue, Aug 03, 2021 at 10:52:25AM -0600, Alex Williamson wrote:
On Tue, 3 Aug 2021 13:41:52 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Tue, Aug 03, 2021 at 10:34:06AM -0600, Alex Williamson wrote:
I think the vfio_pci_find_reset_target() function needs to be re-worked to just tell us true/false that it's ok to reset the provided device, not to anoint an arbitrary target device. Thanks,
Yes, though this logic is confusing, why do we need to check if any device needs a reset at this point? If we are being asked to reset vdev shouldn't vdev needs_reset?
Or is the function more of a 'synchronize pending reset' kind of thing?
Yes, the latter. For instance think about a multi-function PCI device such as a GPU. The functions have dramatically different capabilities, some might have function level reset abilities and others not. We want to be able to trigger a bus reset as the last device of the set is released, no matter the order they're released and no matter the capabilities of the device we're currently processing. Thanks,
I worked on this for a while; I think this is much clearer about what this algorithm is trying to do:
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 5d6db93d6c680f..e418bcbb68facc 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -223,7 +223,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) } }
-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); +static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set); static void vfio_pci_disable(struct vfio_pci_device *vdev); static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
@@ -404,6 +404,9 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; int i, bar;
+ /* For needs_reset */ + lockdep_assert_held(&vdev->vdev.dev_set->lock); + /* Stop the device from further DMA */ pci_clear_master(pdev);
@@ -487,9 +490,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) out: pci_disable_device(pdev);
- vfio_pci_try_bus_reset(vdev); - - if (!disable_idle_d3) + if (!vfio_pci_dev_set_try_reset(vdev->vdev.dev_set) && !disable_idle_d3) vfio_pci_set_power_state(vdev, PCI_D3hot); }
@@ -2145,36 +2146,6 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_devices *devs = data; - struct vfio_device *device; - struct vfio_pci_device *vdev; - - if (devs->cur_index == devs->max_index) - return -ENOSPC; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return -EINVAL; - - if (pci_dev_driver(pdev) != &vfio_pci_driver) { - vfio_device_put(device); - return -EBUSY; - } - - vdev = container_of(device, struct vfio_pci_device, vdev); - - /* Fault if the device is not unused */ - if (device->open_count) { - vfio_device_put(device); - return -EBUSY; - } - - devs->devices[devs->cur_index++] = vdev; - return 0; -} - static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; @@ -2208,79 +2179,86 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) return 0; }
+static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) +{ + struct vfio_device_set *dev_set = data; + struct vfio_device *cur; + + lockdep_assert_held(&dev_set->lock); + + list_for_each_entry(cur, &dev_set->device_list, dev_set_list) + if (cur->dev == &pdev->dev) + return 0; + return -EBUSY; +} + +static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set) +{ + struct vfio_pci_device *cur; + bool needs_reset = false; + + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + /* No VFIO device in the set can have an open device FD */ + if (cur->vdev.open_count) + return false; + needs_reset |= cur->needs_reset; + } + return needs_reset; +} + /* - * If a bus or slot reset is available for the provided device and: + * If a bus or slot reset is available for the provided dev_set and: * - All of the devices affected by that bus or slot reset are unused - * (!refcnt) * - At least one of the affected devices is marked dirty via * needs_reset (such as by lack of FLR support) - * Then attempt to perform that bus or slot reset. Callers are required - * to hold vdev->dev_set->lock, protecting the bus/slot reset group from - * concurrent opens. A vfio_device reference is acquired for each device - * to prevent unbinds during the reset operation. - * - * NB: vfio-core considers a group to be viable even if some devices are - * bound to drivers like pci-stub or pcieport. Here we require all devices - * to be bound to vfio_pci since that's the only way we can be sure they - * stay put. + * Then attempt to perform that bus or slot reset. + * Returns true if the dev_set was reset. */ -static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) +static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set) { - struct vfio_devices devs = { .cur_index = 0 }; - int i = 0, ret = -EINVAL; - bool slot = false; - struct vfio_pci_device *tmp; + struct vfio_pci_device *cur; + struct pci_dev *pdev; + int ret;
- if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return; + lockdep_assert_held(&dev_set->lock);
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs, - &i, slot) || !i) - return; + /* + * By definition all PCI devices in the dev_set share the same PCI + * reset, so any pci_dev will have the same outcomes for + * pci_probe_reset_*() and pci_reset_bus(). + */ + pdev = list_first_entry(&dev_set->device_list, struct vfio_pci_device, + vdev.dev_set_list)->pdev;
- devs.max_index = i; - devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL); - if (!devs.devices) - return; + /* Reset of the dev_set is possible */ + if (pci_probe_reset_slot(pdev->slot) && pci_probe_reset_bus(pdev->bus)) + return false;
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_get_unused_devs, - &devs, slot)) - goto put_devs; + if (!vfio_pci_dev_set_needs_reset(dev_set)) + return false;
- /* Does at least one need a reset? */ - for (i = 0; i < devs.cur_index; i++) { - tmp = devs.devices[i]; - if (tmp->needs_reset) { - ret = pci_reset_bus(vdev->pdev); - break; - } + /* + * vfio-core considers a group to be viable and will create a + * vfio_device even if some devices are bound to drivers like pci-stub + * or pcieport. Here we require all PCI devices to be inside our dev_set + * since that ensures they stay put and that every driver controlling + * the device can co-ordinate with the device reset. + */ + if (vfio_pci_for_each_slot_or_bus(pdev, vfio_pci_is_device_in_set, + dev_set, + !pci_probe_reset_slot(pdev->slot))) + return false; + + ret = pci_reset_bus(pdev); + if (ret) + return false; + + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + cur->needs_reset = false; + if (!disable_idle_d3) + vfio_pci_set_power_state(cur, PCI_D3hot); } - -put_devs: - for (i = 0; i < devs.cur_index; i++) { - tmp = devs.devices[i]; - - /* - * If reset was successful, affected devices no longer need - * a reset and we should return all the collateral devices - * to low power. If not successful, we either didn't reset - * the bus or timed out waiting for it, so let's not touch - * the power state. - */ - if (!ret) { - tmp->needs_reset = false; - - if (tmp != vdev && !disable_idle_d3) - vfio_pci_set_power_state(tmp, PCI_D3hot); - } - - vfio_device_put(&tmp->vdev); - } - - kfree(devs.devices); + return true; }
static void __exit vfio_pci_cleanup(void)
On Thu, 5 Aug 2021 08:47:01 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Tue, Aug 03, 2021 at 10:52:25AM -0600, Alex Williamson wrote:
On Tue, 3 Aug 2021 13:41:52 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Tue, Aug 03, 2021 at 10:34:06AM -0600, Alex Williamson wrote:
I think the vfio_pci_find_reset_target() function needs to be re-worked to just tell us true/false that it's ok to reset the provided device, not to anoint an arbitrary target device. Thanks,
Yes, though this logic is confusing, why do we need to check if any device needs a reset at this point? If we are being asked to reset vdev shouldn't vdev needs_reset?
Or is the function more of a 'synchronize pending reset' kind of thing?
Yes, the latter. For instance think about a multi-function PCI device such as a GPU. The functions have dramatically different capabilities, some might have function level reset abilities and others not. We want to be able to trigger a bus reset as the last device of the set is released, no matter the order they're released and no matter the capabilities of the device we're currently processing. Thanks,
I worked on this for awhile, I think this is much clearer about what this algorithm is trying to do:
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 5d6db93d6c680f..e418bcbb68facc 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -223,7 +223,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) } }
-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); +static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set); static void vfio_pci_disable(struct vfio_pci_device *vdev); static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
@@ -404,6 +404,9 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; int i, bar;
- /* For needs_reset */
- lockdep_assert_held(&vdev->vdev.dev_set->lock);
- /* Stop the device from further DMA */ pci_clear_master(pdev);
@@ -487,9 +490,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) out: pci_disable_device(pdev);
- vfio_pci_try_bus_reset(vdev);
- if (!disable_idle_d3)
- if (!vfio_pci_dev_set_try_reset(vdev->vdev.dev_set) && !disable_idle_d3) vfio_pci_set_power_state(vdev, PCI_D3hot);
}
@@ -2145,36 +2146,6 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) -{
- struct vfio_devices *devs = data;
- struct vfio_device *device;
- struct vfio_pci_device *vdev;
- if (devs->cur_index == devs->max_index)
return -ENOSPC;
- device = vfio_device_get_from_dev(&pdev->dev);
- if (!device)
return -EINVAL;
- if (pci_dev_driver(pdev) != &vfio_pci_driver) {
vfio_device_put(device);
return -EBUSY;
- }
- vdev = container_of(device, struct vfio_pci_device, vdev);
- /* Fault if the device is not unused */
- if (device->open_count) {
vfio_device_put(device);
return -EBUSY;
- }
- devs->devices[devs->cur_index++] = vdev;
- return 0;
-}
static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; @@ -2208,79 +2179,86 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) return 0; }
+static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) +{
- struct vfio_device_set *dev_set = data;
- struct vfio_device *cur;
- lockdep_assert_held(&dev_set->lock);
- list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
if (cur->dev == &pdev->dev)
return 0;
- return -EBUSY;
+}
+static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
Slight nit on the name here since we're essentially combining needs_reset along with the notion of the device being unused. I'm not sure, maybe "should_reset"? Otherwise it looks ok. Thanks,
Alex
+{
- struct vfio_pci_device *cur;
- bool needs_reset = false;
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
/* No VFIO device in the set can have an open device FD */
if (cur->vdev.open_count)
return false;
needs_reset |= cur->needs_reset;
- }
- return needs_reset;
+}
/*
- If a bus or slot reset is available for the provided device and:
- If a bus or slot reset is available for the provided dev_set and:
- All of the devices affected by that bus or slot reset are unused
- (!refcnt)
- At least one of the affected devices is marked dirty via
- needs_reset (such as by lack of FLR support)
- Then attempt to perform that bus or slot reset. Callers are required
- to hold vdev->dev_set->lock, protecting the bus/slot reset group from
- concurrent opens. A vfio_device reference is acquired for each device
- to prevent unbinds during the reset operation.
- NB: vfio-core considers a group to be viable even if some devices are
- bound to drivers like pci-stub or pcieport. Here we require all devices
- to be bound to vfio_pci since that's the only way we can be sure they
- stay put.
- Then attempt to perform that bus or slot reset.
*/
- Returns true if the dev_set was reset.
-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) +static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set) {
- struct vfio_devices devs = { .cur_index = 0 };
- int i = 0, ret = -EINVAL;
- bool slot = false;
- struct vfio_pci_device *tmp;
- struct vfio_pci_device *cur;
- struct pci_dev *pdev;
- int ret;
- if (!pci_probe_reset_slot(vdev->pdev->slot))
slot = true;
- else if (pci_probe_reset_bus(vdev->pdev->bus))
return;
- lockdep_assert_held(&dev_set->lock);
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
&i, slot) || !i)
return;
- /*
* By definition all PCI devices in the dev_set share the same PCI
* reset, so any pci_dev will have the same outcomes for
* pci_probe_reset_*() and pci_reset_bus().
*/
- pdev = list_first_entry(&dev_set->device_list, struct vfio_pci_device,
vdev.dev_set_list)->pdev;
- devs.max_index = i;
- devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL);
- if (!devs.devices)
return;
- /* Reset of the dev_set is possible */
- if (pci_probe_reset_slot(pdev->slot) && pci_probe_reset_bus(pdev->bus))
return false;
- if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
vfio_pci_get_unused_devs,
&devs, slot))
goto put_devs;
- if (!vfio_pci_dev_set_needs_reset(dev_set))
return false;
- /* Does at least one need a reset? */
- for (i = 0; i < devs.cur_index; i++) {
tmp = devs.devices[i];
if (tmp->needs_reset) {
ret = pci_reset_bus(vdev->pdev);
break;
}
- /*
* vfio-core considers a group to be viable and will create a
* vfio_device even if some devices are bound to drivers like pci-stub
* or pcieport. Here we require all PCI devices to be inside our dev_set
* since that ensures they stay put and that every driver controlling
* the device can co-ordinate with the device reset.
*/
- if (vfio_pci_for_each_slot_or_bus(pdev, vfio_pci_is_device_in_set,
dev_set,
!pci_probe_reset_slot(pdev->slot)))
return false;
- ret = pci_reset_bus(pdev);
- if (ret)
return false;
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
cur->needs_reset = false;
if (!disable_idle_d3)
vfio_pci_set_power_state(cur, PCI_D3hot);
}
-put_devs:
- for (i = 0; i < devs.cur_index; i++) {
tmp = devs.devices[i];
/*
* If reset was successful, affected devices no longer need
* a reset and we should return all the collateral devices
* to low power. If not successful, we either didn't reset
* the bus or timed out waiting for it, so let's not touch
* the power state.
*/
if (!ret) {
tmp->needs_reset = false;
if (tmp != vdev && !disable_idle_d3)
vfio_pci_set_power_state(tmp, PCI_D3hot);
}
vfio_device_put(&tmp->vdev);
- }
- kfree(devs.devices);
- return true;
}
static void __exit vfio_pci_cleanup(void)
On Thu, Aug 05, 2021 at 11:33:11AM -0600, Alex Williamson wrote:
+static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) +{
- struct vfio_device_set *dev_set = data;
- struct vfio_device *cur;
- lockdep_assert_held(&dev_set->lock);
- list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
if (cur->dev == &pdev->dev)
return 0;
- return -EBUSY;
+}
+static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
Slight nit on the name here since we're essentially combining needs_reset along with the notion of the device being unused. I'm not sure, maybe "should_reset"? Otherwise it looks ok. Thanks,
What I did is add a new function vfio_pci_dev_set_resettable() which pulls in three parts of logic that can be shared with the VFIO_DEVICE_PCI_HOT_RESET change in the next patch. That leaves this function as purely needs_reset.
In turn the VFIO_DEVICE_PCI_HOT_RESET patch gets the same treatment where it becomes a dev_set centric API just like this.
I'll send it as a v4.
Thanks, Jason
Like vfio_pci_try_bus_reset() this code wants to reset all of the devices in the "reset group" which is the same membership as the device set.
Instead of trying to reconstruct the device set from the PCI list go directly from the device set's device list to execute the reset.
The same basic structure as vfio_pci_try_bus_reset() is used. The 'vfio_devices' struct is replaced with the device set linked list and we simply sweep it multiple times under the lock.
This eliminates a memory allocation and get/put traffic and another improperly locked test of pci_dev_driver().
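One of the sweeps mentioned above is the group-ownership check; restated from the hunk below (same code as in the diff, shown on its own for clarity):

static bool vfio_dev_in_groups(struct vfio_pci_device *vdev,
			       struct vfio_pci_group_info *groups)
{
	unsigned int i;

	/* the user's group-fd list must cover every device in the dev_set */
	for (i = 0; i < groups->count; i++)
		if (groups->groups[i] == vdev->vdev.group)
			return true;
	return false;
}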
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/vfio/pci/vfio_pci.c | 215 +++++++++++++++--------------------- 1 file changed, 91 insertions(+), 124 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index a1ae9a83a38621..721dcc99aaa042 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -223,9 +223,11 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) } }
+struct vfio_pci_group_info; static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); static void vfio_pci_disable(struct vfio_pci_device *vdev); -static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data); +static int vfio_hot_reset_device_set(struct vfio_pci_device *vdev, + struct vfio_pci_group_info *groups);
/* * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND @@ -645,37 +647,11 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data) return 0; }
-struct vfio_pci_group_entry { - struct vfio_group *group; - int id; -}; - struct vfio_pci_group_info { int count; - struct vfio_pci_group_entry *groups; + struct vfio_group **groups; };
-static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_group_info *info = data; - struct iommu_group *group; - int id, i; - - group = iommu_group_get(&pdev->dev); - if (!group) - return -EPERM; - - id = iommu_group_id(group); - - for (i = 0; i < info->count; i++) - if (info->groups[i].id == id) - break; - - iommu_group_put(group); - - return (i == info->count) ? -EINVAL : 0; -} - static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot) { for (; pdev; pdev = pdev->bus->self) @@ -753,12 +729,6 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, return 0; }
-struct vfio_devices { - struct vfio_pci_device **devices; - int cur_index; - int max_index; -}; - static long vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd, unsigned long arg) { @@ -1127,11 +1097,10 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev, } else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) { struct vfio_pci_hot_reset hdr; int32_t *group_fds; - struct vfio_pci_group_entry *groups; + struct vfio_group **groups; struct vfio_pci_group_info info; - struct vfio_devices devs = { .cur_index = 0 }; bool slot = false; - int i, group_idx, mem_idx = 0, count = 0, ret = 0; + int group_idx, count = 0, ret = 0;
minsz = offsetofend(struct vfio_pci_hot_reset, count);
@@ -1198,9 +1167,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev, break; }
- groups[group_idx].group = group; - groups[group_idx].id = - vfio_external_user_iommu_id(group); + groups[group_idx] = group; }
kfree(group_fds); @@ -1212,64 +1179,11 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev, info.count = hdr.count; info.groups = groups;
- /* - * Test whether all the affected devices are contained - * by the set of groups provided by the user. - */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_validate_devs, - &info, slot); - if (ret) - goto hot_reset_release; - - devs.max_index = count; - devs.devices = kcalloc(count, sizeof(struct vfio_device *), - GFP_KERNEL); - if (!devs.devices) { - ret = -ENOMEM; - goto hot_reset_release; - } - - /* - * We need to get memory_lock for each device, but devices - * can share mmap_lock, therefore we need to zap and hold - * the vma_lock for each device, and only then get each - * memory_lock. - */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_try_zap_and_vma_lock_cb, - &devs, slot); - if (ret) - goto hot_reset_release; - - for (; mem_idx < devs.cur_index; mem_idx++) { - struct vfio_pci_device *tmp = devs.devices[mem_idx]; - - ret = down_write_trylock(&tmp->memory_lock); - if (!ret) { - ret = -EBUSY; - goto hot_reset_release; - } - mutex_unlock(&tmp->vma_lock); - } - - /* User has access, do the reset */ - ret = pci_reset_bus(vdev->pdev); + ret = vfio_hot_reset_device_set(vdev, &info);
hot_reset_release: - for (i = 0; i < devs.cur_index; i++) { - struct vfio_pci_device *tmp = devs.devices[i]; - - if (i < mem_idx) - up_write(&tmp->memory_lock); - else - mutex_unlock(&tmp->vma_lock); - vfio_device_put(&tmp->vdev); - } - kfree(devs.devices); - for (group_idx--; group_idx >= 0; group_idx--) - vfio_group_put_external_user(groups[group_idx].group); + vfio_group_put_external_user(groups[group_idx]);
kfree(groups); return ret; @@ -2148,37 +2062,15 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, };
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) +static bool vfio_dev_in_groups(struct vfio_pci_device *vdev, + struct vfio_pci_group_info *groups) { - struct vfio_devices *devs = data; - struct vfio_device *device; - struct vfio_pci_device *vdev; - - if (devs->cur_index == devs->max_index) - return -ENOSPC; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return -EINVAL; - - if (pci_dev_driver(pdev) != &vfio_pci_driver) { - vfio_device_put(device); - return -EBUSY; - } - - vdev = container_of(device, struct vfio_pci_device, vdev); + unsigned int i;
- /* - * Locking multiple devices is prone to deadlock, runaway and - * unwind if we hit contention. - */ - if (!vfio_pci_zap_and_vma_lock(vdev, true)) { - vfio_device_put(device); - return -EBUSY; - } - - devs->devices[devs->cur_index++] = vdev; - return 0; + for (i = 0; i < groups->count; i++) + if (groups->groups[i] == vdev->vdev.group) + return true; + return false; }
static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) @@ -2194,6 +2086,81 @@ static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) return -EBUSY; }
+/* + * We need to get memory_lock for each device, but devices can share mmap_lock, + * therefore we need to zap and hold the vma_lock for each device, and only then + * get each memory_lock. + */ +static int vfio_hot_reset_device_set(struct vfio_pci_device *vdev, + struct vfio_pci_group_info *groups) +{ + struct vfio_device_set *dev_set = vdev->vdev.dev_set; + struct vfio_pci_device *cur_mem; + struct vfio_pci_device *cur_vma; + struct vfio_pci_device *cur; + bool is_mem = true; + int ret; + + mutex_lock(&dev_set->lock); + cur_mem = list_first_entry(&dev_set->device_list, + struct vfio_pci_device, vdev.dev_set_list); + + /* All devices in the group to be reset need VFIO devices */ + if (vfio_pci_for_each_slot_or_bus( + vdev->pdev, vfio_pci_is_device_in_set, dev_set, + !pci_probe_reset_slot(vdev->pdev->slot))) { + ret = -EINVAL; + goto err_unlock; + } + + list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) { + /* + * Test whether all the affected devices are contained by the + * set of groups provided by the user. + */ + if (!vfio_dev_in_groups(cur_vma, groups)) { + ret = -EINVAL; + goto err_undo; + } + + /* + * Locking multiple devices is prone to deadlock, runaway and + * unwind if we hit contention. + */ + if (!vfio_pci_zap_and_vma_lock(cur_vma, true)) { + ret = -EBUSY; + goto err_undo; + } + } + cur_vma = NULL; + + list_for_each_entry(cur_mem, &dev_set->device_list, vdev.dev_set_list) { + if (!down_write_trylock(&cur_mem->memory_lock)) { + ret = -EBUSY; + goto err_undo; + } + mutex_unlock(&cur_mem->vma_lock); + } + cur_mem = NULL; + + ret = pci_reset_bus(vdev->pdev); + +err_undo: + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + if (cur == cur_mem) + is_mem = false; + if (cur == cur_vma) + break; + if (is_mem) + up_write(&cur->memory_lock); + else + mutex_unlock(&cur->vma_lock); + } +err_unlock: + mutex_unlock(&dev_set->lock); + return ret; +} + /* * vfio-core considers a group to be viable and will create a vfio_device even * if some devices are bound to drivers like pci-stub or pcieport. Here we
mbochs_close() iterates over global device state and frees it. Currently this is done every time a device FD is closed, but if multiple device FDs are open this could corrupt other still active FDs.
Change this to use close_device() so it only runs on the last close.
Reviewed-by: Cornelia Huck cohuck@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- samples/vfio-mdev/mbochs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c index 5ac65894fcd38c..6974626ec1c5d0 100644 --- a/samples/vfio-mdev/mbochs.c +++ b/samples/vfio-mdev/mbochs.c @@ -1278,7 +1278,7 @@ static long mbochs_ioctl(struct vfio_device *vdev, unsigned int cmd, return -ENOTTY; }
-static void mbochs_close(struct vfio_device *vdev) +static void mbochs_close_device(struct vfio_device *vdev) { struct mdev_state *mdev_state = container_of(vdev, struct mdev_state, vdev); @@ -1396,7 +1396,7 @@ static struct attribute_group *mdev_type_groups[] = { };
static const struct vfio_device_ops mbochs_dev_ops = { - .release = mbochs_close, + .close_device = mbochs_close_device, .read = mbochs_read, .write = mbochs_write, .ioctl = mbochs_ioctl,
The user can open multiple device FDs if it likes, however these open() functions call vfio_register_notifier() on some device global state. Calling vfio_register_notifier() twice will trigger a WARN_ON from notifier_chain_register() and the first close will wrongly delete the notifier and more.
Since these really want the new open/close_device() semantics just change the functions over.
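To illustrate the pattern being fixed, a hypothetical mdev driver (example_state and the function name are invented; the vfio_register_notifier() call matches the one these drivers make) registering its device-global notifier from the new hook:

#include <linux/mdev.h>
#include <linux/notifier.h>
#include <linux/vfio.h>

/* Hypothetical per-device state standing in for ap_matrix_mdev / vfio_ccw_private */
struct example_state {
	struct notifier_block group_notifier;	/* device-global, not per-FD */
};

/*
 * As a per-FD open() this would re-register the same notifier_block on a
 * second open (notifier_chain_register() WARNs) and the first release
 * would unregister it under the still-open FD. As open_device() it runs
 * exactly once, on the first FD.
 */
static int example_open_device(struct mdev_device *mdev)
{
	struct example_state *state = mdev_get_drvdata(mdev);
	unsigned long events = VFIO_GROUP_NOTIFY_SET_KVM;

	return vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
				      &events, &state->group_notifier);
}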
Reviewed-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/s390/cio/vfio_ccw_ops.c | 8 ++++---- drivers/s390/crypto/vfio_ap_ops.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index c57d2a7f091975..7f540ad0b568bc 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -159,7 +159,7 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
 	return 0;
 }
 
-static int vfio_ccw_mdev_open(struct mdev_device *mdev)
+static int vfio_ccw_mdev_open_device(struct mdev_device *mdev)
 {
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
@@ -194,7 +194,7 @@ static int vfio_ccw_mdev_open(struct mdev_device *mdev)
 	return ret;
 }
 
-static void vfio_ccw_mdev_release(struct mdev_device *mdev)
+static void vfio_ccw_mdev_close_device(struct mdev_device *mdev)
 {
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
@@ -638,8 +638,8 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.supported_type_groups = mdev_type_groups,
 	.create = vfio_ccw_mdev_create,
 	.remove = vfio_ccw_mdev_remove,
-	.open = vfio_ccw_mdev_open,
-	.release = vfio_ccw_mdev_release,
+	.open_device = vfio_ccw_mdev_open_device,
+	.close_device = vfio_ccw_mdev_close_device,
 	.read = vfio_ccw_mdev_read,
 	.write = vfio_ccw_mdev_write,
 	.ioctl = vfio_ccw_mdev_ioctl,
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 122c85c224695e..cee5626fe0a4ef 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1315,7 +1315,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 	return rc;
 }
 
-static int vfio_ap_mdev_open(struct mdev_device *mdev)
+static int vfio_ap_mdev_open_device(struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long events;
@@ -1348,7 +1348,7 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 	return ret;
 }
 
-static void vfio_ap_mdev_release(struct mdev_device *mdev)
+static void vfio_ap_mdev_close_device(struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
@@ -1427,8 +1427,8 @@ static const struct mdev_parent_ops vfio_ap_matrix_ops = {
 	.mdev_attr_groups = vfio_ap_mdev_attr_groups,
 	.create = vfio_ap_mdev_create,
 	.remove = vfio_ap_mdev_remove,
-	.open = vfio_ap_mdev_open,
-	.release = vfio_ap_mdev_release,
+	.open_device = vfio_ap_mdev_open_device,
+	.close_device = vfio_ap_mdev_close_device,
 	.ioctl = vfio_ap_mdev_ioctl,
 };
The user can open multiple device FDs if it likes, however the open function calls vfio_register_notifier() on device-global state. Calling vfio_register_notifier() twice will trigger a WARN_ON from notifier_chain_register(), and the first close will wrongly delete the notifier and other shared state while another FD is still open.
Since these really want the new open/close_device() semantics just change the function over.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 1ac98f8aba31e6..7efa386449d104 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -885,7 +885,7 @@ static int intel_vgpu_group_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static int intel_vgpu_open(struct mdev_device *mdev)
+static int intel_vgpu_open_device(struct mdev_device *mdev)
 {
 	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
 	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
@@ -1004,7 +1004,7 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 	vgpu->handle = 0;
 }
 
-static void intel_vgpu_release(struct mdev_device *mdev)
+static void intel_vgpu_close_device(struct mdev_device *mdev)
 {
 	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
 
@@ -1753,8 +1753,8 @@ static struct mdev_parent_ops intel_vgpu_ops = {
 	.create = intel_vgpu_create,
 	.remove = intel_vgpu_remove,
 
-	.open = intel_vgpu_open,
-	.release = intel_vgpu_release,
+	.open_device = intel_vgpu_open_device,
+	.close_device = intel_vgpu_close_device,
 
 	.read = intel_vgpu_read,
 	.write = intel_vgpu_write,
Nothing uses this anymore, delete it.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/mdev/vfio_mdev.c | 22 ----------------------
 drivers/vfio/vfio.c           | 14 +-------------
 include/linux/mdev.h          |  7 -------
 include/linux/vfio.h          |  4 ----
 4 files changed, 1 insertion(+), 46 deletions(-)
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index 725cd2fe675190..5174974e5fb5f9 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -37,26 +37,6 @@ static void vfio_mdev_close_device(struct vfio_device *core_vdev)
 		parent->ops->close_device(mdev);
 }
 
-static int vfio_mdev_open(struct vfio_device *core_vdev)
-{
-	struct mdev_device *mdev = to_mdev_device(core_vdev->dev);
-	struct mdev_parent *parent = mdev->type->parent;
-
-	if (unlikely(!parent->ops->open))
-		return -EINVAL;
-
-	return parent->ops->open(mdev);
-}
-
-static void vfio_mdev_release(struct vfio_device *core_vdev)
-{
-	struct mdev_device *mdev = to_mdev_device(core_vdev->dev);
-	struct mdev_parent *parent = mdev->type->parent;
-
-	if (likely(parent->ops->release))
-		parent->ops->release(mdev);
-}
-
 static long vfio_mdev_unlocked_ioctl(struct vfio_device *core_vdev,
 				     unsigned int cmd, unsigned long arg)
 {
@@ -122,8 +102,6 @@ static const struct vfio_device_ops vfio_mdev_dev_ops = {
 	.name = "vfio-mdev",
 	.open_device = vfio_mdev_open_device,
 	.close_device = vfio_mdev_close_device,
-	.open = vfio_mdev_open,
-	.release = vfio_mdev_release,
 	.ioctl = vfio_mdev_unlocked_ioctl,
 	.read = vfio_mdev_read,
 	.write = vfio_mdev_write,
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 9cc17768c42554..3c034fe14ccb03 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1470,19 +1470,13 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	}
 	mutex_unlock(&device->dev_set->lock);
 
-	if (device->ops->open) {
-		ret = device->ops->open(device);
-		if (ret)
-			goto err_close_device;
-	}
-
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	fdno = ret = get_unused_fd_flags(O_CLOEXEC);
 	if (ret < 0)
-		goto err_release;
+		goto err_close_device;
 
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
 				   device, O_RDWR);
@@ -1509,9 +1503,6 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 
 err_fd:
 	put_unused_fd(fdno);
-err_release:
-	if (device->ops->release)
-		device->ops->release(device);
 err_close_device:
 	mutex_lock(&device->dev_set->lock);
 	if (device->open_count == 1 && device->ops->close_device)
@@ -1659,9 +1650,6 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
 	struct vfio_device *device = filep->private_data;
 
-	if (device->ops->release)
-		device->ops->release(device);
-
 	mutex_lock(&device->dev_set->lock);
 	if (!--device->open_count && device->ops->close_device)
 		device->ops->close_device(device);
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index cb5b7ed1d7c30d..68427e8fadebd6 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -72,11 +72,6 @@ struct device *mtype_get_parent_dev(struct mdev_type *mtype);
  * @mdev: mdev_device device structure which is being
  *	  destroyed
  * Returns integer: success (0) or error (< 0)
- * @open: Open mediated device.
- *	@mdev: mediated device.
- *	Returns integer: success (0) or error (< 0)
- * @release: release mediated device
- *	@mdev: mediated device.
  * @read: Read emulation callback
  *	@mdev: mediated device structure
  *	@buf: read buffer
@@ -113,8 +108,6 @@ struct mdev_parent_ops {
 	int (*remove)(struct mdev_device *mdev);
 	int (*open_device)(struct mdev_device *mdev);
 	void (*close_device)(struct mdev_device *mdev);
-	int (*open)(struct mdev_device *mdev);
-	void (*release)(struct mdev_device *mdev);
 	ssize_t (*read)(struct mdev_device *mdev, char __user *buf,
 			size_t count, loff_t *ppos);
 	ssize_t (*write)(struct mdev_device *mdev, const char __user *buf,
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index f0e6a72875e471..b53a9557884ada 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -46,8 +46,6 @@ struct vfio_device {
  *
  * @open_device: Called when the first file descriptor is opened for this device
  * @close_device: Opposite of open_device
- * @open: Called when userspace creates new file descriptor for device
- * @release: Called when userspace releases file descriptor for device
  * @read: Perform read(2) on device file descriptor
  * @write: Perform write(2) on device file descriptor
  * @ioctl: Perform ioctl(2) on device file descriptor, supporting VFIO_DEVICE_*
@@ -62,8 +60,6 @@ struct vfio_device_ops {
 	char *name;
 	int (*open_device)(struct vfio_device *vdev);
 	void (*close_device)(struct vfio_device *vdev);
-	int (*open)(struct vfio_device *vdev);
-	void (*release)(struct vfio_device *vdev);
 	ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos);
 	ssize_t (*write)(struct vfio_device *vdev, const char __user *buf,