On Thu, Sep 26, 2019 at 10:26:08AM -0600, Alex Williamson wrote:
On Thu, 26 Sep 2019 11:46:55 -0400 "Michael S. Tsirkin" mst@redhat.com wrote:
On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote:
On Wed, 25 Sep 2019 10:11:00 -0400 Rob Miller rob.miller@broadcom.com wrote:
On Tue, 24 Sep 2019 21:53:29 +0800 Jason Wang jasowang@redhat.com wrote: > diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c > index 891cf83a2d9a..95efa054442f 100644 > --- a/drivers/vfio/mdev/vfio_mdev.c > +++ b/drivers/vfio/mdev/vfio_mdev.c > @@ -14,6 +14,7 @@ > #include <linux/slab.h> > #include <linux/vfio.h> > #include <linux/mdev.h> > +#include <linux/vfio_mdev.h> > > #include "mdev_private.h" > > @@ -24,16 +25,16 @@ > static int vfio_mdev_open(void *device_data) > { > struct mdev_device *mdev = device_data; > - struct mdev_parent *parent = mdev->parent; > + const struct vfio_mdev_device_ops *ops = mdev_get_dev_ops(mdev); > int ret; > > - if (unlikely(!parent->ops->open)) > + if (unlikely(!ops->open)) > return -EINVAL; > > if (!try_module_get(THIS_MODULE)) > return -ENODEV;
RJM>] My understanding lately is that this call to try_module_get(THIS_MODULE) is no longer needed as is considered as a latent bug. Quote from https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use... : There are a number of uses of try_module_get(THIS_MODULE) in the kernel source but most if not all of them are latent bugs that should be cleaned up.
This use seems to fall exactly into the case where it is necessary, the open here is not a direct VFS call, it's an internal interface between modules. The user is interacting with filesystem objects from the vfio module and the module reference we're trying to acquire here is to the vfio-mdev module. Thanks,
Alex
I think the latent bug refers not to module get per se, but to the module_put tied to it. E.g.:
static void vfio_mdev_release(void *device_data) { struct mdev_device *mdev = device_data; struct mdev_parent *parent = mdev->parent;
if (likely(parent->ops->release)) parent->ops->release(mdev); module_put(THIS_MODULE);
Does anything prevent the module from unloading at this point? if not then ...
}
it looks like the implicit return (with instructions for argument pop and functuon return) here can get overwritten on module unload, causing a crash when executed.
IOW there's generally no way for module to keep a reference to itself: it can take a reference but it needs someone else to keep it and put.
I'd always assumed this would exit cleanly, but perhaps there is a latent race there. In any case, taking a module reference within the module in this case is better than not doing so, as the latter would potentially allow the module to be removed at any point in time, while the former only seems to expose acquire and release gaps. Add it to the todo list. Thanks,
Alex
Right. I agree with the stack overflow quote: as this example seems to show this is a latent bug.
But I also agree that just removing the reference isn't the right way to clean it up.