Currently, kernel/module annotates module dependencies when request_symbol is used, but it doesn't cover more complex inter-driver dependencies that are subsystem and/or driver-specific.
In the case of HDMI sound, depending on the CPU/GPU, the snd_hda_driver can sometimes talk directly with the hardware, but sometimes it uses the i915 driver. When the snd_hda_driver uses i915, it should be unbound/removed first, as otherwise trying to unbind/rmmod the i915 driver causes driver issues, as reported by CI tools with different GPU models:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_6415/fi-tgl-1115g4/igt@core_hot...
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11495/bat-adlm-1/igt@i915_mo...
In the past, just a few CPUs were doing such bindings, but this issue now applies to all "modern" Intel CPUs that have onboard graphics, as well as to the newer discrete GPUs.
In the discrete GPU case, the HDA controller is physically separate and requires i915 to power on the hardware for all hardware access. In this case, the issue is hit basically 100% of the time.
With on-board graphics, the i915 driver is needed only when the display codec is accessed. If i915 is unbound during runtime suspend while snd-hda-intel is still bound, nothing bad happens, but unbinding i915 in other situations may also cause issues.
So, add support at kernel/module to allow snd-hda drivers to properly annotate when a dependency on a DRM driver exists, and add a call to the new function in the snd-hda driver when it successfully binds to the DRM driver.
This would allow userspace tools to check and properly remove the audio driver before trying to remove or unbind the GPU driver.
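As an illustration of the userspace side, a tool could look for the audio driver in the "used by" field of the GPU driver's /proc/modules entry before attempting removal. The sketch below is not taken from any existing tool: the function name is made up for this example, and it assumes a kernel built with CONFIG_MODULE_UNLOAD, where the third field is a numeric refcount and the fourth is either "-" or a comma-terminated list of users.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any existing tool).
 *
 * Return 1 if `holder` appears in the "used by" field of one /proc/modules
 * line, 0 if it does not, and -1 if the line cannot be parsed.
 *
 * Assumed line layout: "<name> <size> <refcount> <users> <state> <address>",
 * where <users> is "-" or a comma-terminated list such as "a,b,".
 */
int proc_modules_line_has_holder(const char *line, const char *holder)
{
	char name[64], users[512];
	unsigned long size;
	int refcnt;
	const char *p;
	size_t hlen;

	if (!line || !holder)
		return -1;
	if (sscanf(line, "%63s %lu %d %511s", name, &size, &refcnt, users) != 4)
		return -1;
	if (strcmp(users, "-") == 0)
		return 0;

	/* Walk the comma-separated entries and compare each one exactly. */
	hlen = strlen(holder);
	for (p = users; *p; ) {
		size_t len = strcspn(p, ",");

		if (len == hlen && strncmp(p, holder, len) == 0)
			return 1;
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

A caller would read /proc/modules line by line, pick the line whose first field is the GPU driver, and unbind or remove every listed user first. Very long user lists would be truncated by the fixed-size buffer in this sketch.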
It should be noted that this series only conveys the hidden module dependencies. Other changes are needed in order to allow removing or unbinding the i915 driver while keeping the snd-hda-intel driver loaded/bound. In that regard, there were some discussions on alsa-devel a while back on how to improve this:
https://mailman.alsa-project.org/pipermail/alsa-devel/2021-September/190099....
So, future improvements in both i915 and the audio drivers could be made. E.g. with discrete GPUs, it's the only codec of the card, so it seems feasible to detach the ALSA card if i915 is bound (using infra made for VGA switcheroo). But until these improvements are done and land upstream, audio drivers need to be unbound if the i915 driver gets unbound.
Yet, even if such fixes got merged, this series is still needed, as it makes such dependencies more explicit and easier to debug.
PS.: This series was generated against next-20220428.
Luis/Takashi/Daniel/David,
If it is OK for you, I would prefer to have these patches applied via the drm-tip tree once reviewed, to make it easier for some patches I'm preparing, which improve the CI tests that use the i915 unbind logic, to build on them.
Mauro Carvalho Chehab (2):
  module: add a function to add module references
  ALSA: hda - identify when audio is provided by a video driver

 include/linux/module.h     |  7 +++++++
 kernel/module/main.c       | 31 +++++++++++++++++++++++++++++++
 sound/hda/hdac_component.c |  8 ++++++++
 3 files changed, 46 insertions(+)
Sometimes, device drivers are bound using indirect references, which are not visible when looking at /proc/modules or lsmod.
Add a function to allow setting up module references for such cases.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---
See [PATCH 0/2] at: https://lore.kernel.org/all/cover.1651212016.git.mchehab@kernel.org/
 include/linux/module.h |  7 +++++++
 kernel/module/main.c   | 31 +++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
diff --git a/include/linux/module.h b/include/linux/module.h
index 46d4d5f2516e..be74f807e41d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -596,6 +596,8 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
 /* Search for module by name: must be in a RCU-sched critical section. */
 struct module *find_module(const char *name);
 
+int module_add_named_dependency(const char *name, struct module *this);
+
 /* Returns 0 and fills in value, defined and namebuf, or -ERANGE if
    symnum out of range. */
 int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
@@ -772,6 +774,11 @@ static inline int lookup_module_symbol_attrs(unsigned long addr, unsigned long *
 	return -ERANGE;
 }
 
+static inline int module_add_named_dependency(const char *name, struct module *this)
+{
+	return 0;
+}
+
 static inline int module_get_kallsym(unsigned int symnum, unsigned long *value,
 					char *type, char *name,
 					char *module_name, int *exported)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 05a42d8fcd7a..dbd577ccc38c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -631,6 +631,37 @@ static int ref_module(struct module *a, struct module *b)
 	return 0;
 }
 
+int module_add_named_dependency(const char *name, struct module *this)
+{
+	struct module *mod;
+	int ret;
+
+	if (!name || !this || !this->name) {
+		return -EINVAL;
+	}
+
+	mutex_lock(&module_mutex);
+	mod = find_module(name);
+	if (!mod) {
+		ret = -EINVAL;
+		goto ret;
+	}
+
+	ret = ref_module(this, mod);
+	if (ret)
+		goto ret;
+
+#ifdef CONFIG_MODULE_UNLOAD
+	ret = sysfs_create_link(mod->holders_dir,
+				&this->mkobj.kobj, this->name);
+#endif
+
+ret:
+	mutex_unlock(&module_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(module_add_named_dependency);
+
 /* Clear the unload stuff of the module. */
 static void module_unload_free(struct module *mod)
 {
On Fri, Apr 29, 2022 at 07:31:15AM +0100, Mauro Carvalho Chehab wrote:
This sounds like duct tape at the wrong level. We should have a device_link connecting these devices, and maybe device_link internally needs to make sure the respective driver modules stay around for long enough too. But open-coding this all over the place into every driver that has some kind of cross-driver dependency sounds terrible.
Or maybe the bug is that the snd driver keeps accessing the hw/component side when that is just plain gone. Iirc there's still fundamental issues there on the sound side of things, which have been attempted to paper over by timeouts and stuff like that in the past instead of enforcing a hard link between the snd and i915 side.
Adding Greg for device model questions like this. -Daniel
Hi Daniel,
On Fri, 29 Apr 2022 09:54:10 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
I agree with you that the device link between snd-hda and the DRM driver should properly handle unbinding in both directions. This is something that requires further discussion with the ALSA and DRM people, and we should keep working on it.
Yet, the binding between those drivers does exist, but, despite other similar inter-driver bindings being properly reported by lsmod, this one is invisible to userspace.
What this series does is to make such binding visible. As simple as that.
On Fri, Apr 29, 2022 at 09:07:57AM +0100, Mauro Carvalho Chehab wrote:
It also increases the reference count, and creates a user/kernel api with the symlinks, right? Will the reference count increase prevent the modules from now being unloadable?
This feels like a very "weak" link between modules that should not be needed if reference counting is implemented properly (so that things are cleaned up in the correct order.)
Please don't paper over the real issue with this hack.
thanks,
greg k-h
Hi Greg,
On Fri, 29 Apr 2022 10:30:33 +0200, Greg KH <gregkh@linuxfoundation.org> wrote:
The refcount increment exists even without this patch, as hdac_component_master_bind() at sound/hda/hdac_component.c uses try_module_get() when it creates the device link.
This series won't change anything in that regard. The only difference is that it will now properly report to userspace that snd-hda will be using something inside the DRM driver (basically, it uses the DRM driver to power-control the HDA hardware on modern CPU/GPUs).
-
Btw, this is not the only case where userspace-invisible bindings between two drivers happen within the kernel. On media, DVB drivers attach the frontend/tuner drivers using the I2C bus in a way where lsmod doesn't currently report any dependencies. See, for instance, the PCTV 290e driver (partial) dependency chain:
$ lsmod
Module                  Size  Used by
rc_pinnacle_pctv_hd    16384  0
em28xx_rc              20480  0
tda18271               53248  1
cxd2820r               45056  1
em28xx_dvb             36864  0
dvb_core              155648  2 cxd2820r,em28xx_dvb
em28xx                106496  2 em28xx_rc,em28xx_dvb
tveeprom               28672  1 em28xx
videobuf2_vmalloc      20480  1 uvcvideo
videobuf2_memops       20480  1 videobuf2_vmalloc
videobuf2_common       69632  4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
videodev              266240  4 videobuf2_v4l2,uvcvideo,videobuf2_common,em28xx
mc                     65536  6 videodev,videobuf2_v4l2,uvcvideo,dvb_core,videobuf2_common,em28xx
In the above example, tda18271 is an I2C tuner driver which talks to the hardware via the I2C bus registered by the em28xx driver. It is loaded during em28xx probing time.
Again, lsmod doesn't show such dependencies. One can't remove the tuner driver without first removing the em28xx driver, which is the one that increments its refcount.
-
Back to the snd-hda issue, the problem is not with refcount. It is, instead, to provide a way for userspace to know what's the correct order to remove/unbind modules.
Regards, Mauro
On Fri, Apr 29, 2022 at 10:15:03AM +0100, Mauro Carvalho Chehab wrote:
Ok, then why shouldn't try_module_get() be creating this link instead of having to manually do it this way again? You don't want to have to go around and add this call to all users of that function, right?
thanks,
greg k-h
On Fri, 29 Apr 2022 12:10:07 +0200, Greg KH <gregkh@linuxfoundation.org> wrote:
Works for me, but this is not a trivial change, as the new try_module_get() function will require two parameters instead of one:

- the module to be referenced;
- the module which will reference it.

In trivial cases, the latter will be THIS_MODULE. In the specific case of snd_hda, however, the binding is done via an ancillary routine under snd_hda_core, while the actual binding happens at snd_hda_intel.
Ok, we could add a __try_module_get() (or whatever other name that would properly express what it does) with two parameters, and then define try_module_get() as:
#define try_module_get(mod) __try_module_get(mod, THIS_MODULE)
Would that work for you?
Regards, Mauro
On Fri, Apr 29, 2022 at 11:23:51AM +0100, Mauro Carvalho Chehab wrote:
For calls that want to increment a module reference on behalf of a different code segment than is calling it, create a new function so we can audit-the-heck out of those code paths as odds are, they are unsafe.
For the normal code path, just turn try_module_get() into a macro that includes THIS_MODULE as part of it like we do for the driver register functions (see usb_register_driver() in include/linux/usb.h as an example of how to do that.)
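For reference, include/linux/usb.h wraps usb_register_driver() in a usb_register() macro that passes THIS_MODULE and KBUILD_MODNAME on behalf of the caller. The same pattern can be modelled in plain userspace C as below. This is only a sketch of the idea: struct module here is a stand-in and not the kernel structure, try_module_get_by() is a hypothetical name for the two-argument variant, and the holder bookkeeping is an assumption about what such a function might record.

```c
#include <stddef.h>

#define MAX_HOLDERS 8

/* Userspace stand-in for the kernel's struct module (illustration only). */
struct module {
	const char *name;
	int refcnt;
	const struct module *holders[MAX_HOLDERS];
	size_t nr_holders;
};

/*
 * In the kernel, THIS_MODULE is provided by the build system. In this model,
 * the translation unit simply points it at its own object.
 */
struct module this_module_model = { .name = "snd_hda_intel" };
#define THIS_MODULE (&this_module_model)

/*
 * Hypothetical two-argument variant: take a reference on @mod and record
 * @holder as the module responsible for it. Returns 1 on success and 0 on
 * failure, mirroring the bool-like result of try_module_get().
 */
int try_module_get_by(struct module *mod, const struct module *holder)
{
	if (!mod || !holder || mod->nr_holders >= MAX_HOLDERS)
		return 0;
	mod->holders[mod->nr_holders++] = holder;
	mod->refcnt++;
	return 1;
}

/* Normal code path: the caller's own module is filled in by the macro. */
#define try_module_get(mod) try_module_get_by((mod), THIS_MODULE)
```

With this split, existing callers keep the one-argument form unchanged, while any caller that takes a reference on behalf of another module has to use the explicit function and is easy to find when auditing.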
Yes, that's the way forward.
thanks,
greg k-h
On Fri, Apr 29, 2022 at 12:36:01PM +0200, Greg KH wrote:
This might result in some complaints because it can create module refcount loops. Well, seemingly module refcount loops: you can break them by first unloading the driver, and then unloading the module. Currently a lot of folks rely on just unloading the module to tear down the drivers (and all in the right order through device links and component and all that). So maybe we want to make this some kind of weak reference, i.e. it shows up in lsmod, but the magic teardown still works and the module isn't actually pinned.
Either way I agree that something like this sounds like the right approach. -Daniel
On Fri, Apr 29, 2022 at 11:23:51AM +0100, Mauro Carvalho Chehab wrote:
I agree that this should be done at this level rather than open-coding it in every driver. The main improvement here, regardless of the snd-hda-intel issue, is to properly annotate what is holding a module.
Right now we have 1) symbol module dependencies; 2) kernel references; 3) userspace references, with (2) and (3) being invisible to the user from lsmod's point of view. Handling this any time try_module_get() is called would make (2) visible to lsmod.
Paired with fixes to the (unreleased) kmod 30[1], this allows `modprobe -r --remove-holders <module>` to also try removing the holders before removing the module itself.
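The ordering idea behind removing holders first can be sketched as a depth-first walk over the holders relation. The code below is only an illustration and is not how kmod implements it: the mod_info table is a hypothetical in-memory stand-in for what a tool would read from /sys/module/<name>/holders, and all names are made up for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HOLDERS 4

/* Hypothetical in-memory view of one module and its holders. */
struct mod_info {
	const char *name;
	const char *holders[MAX_HOLDERS];	/* unused slots are NULL */
	int visited;
};

static struct mod_info *find_mod(struct mod_info *mods, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(mods[i].name, name) == 0)
			return &mods[i];
	return NULL;
}

/*
 * Append to @order the names to remove, holders first, ending with @name.
 * Returns the new number of entries in @order, or -1 if a module is unknown
 * or @order is full. The visited flag stops the walk on reference loops.
 */
int removal_order(struct mod_info *mods, size_t n, const char *name,
		  const char **order, int count, int cap)
{
	struct mod_info *m = find_mod(mods, n, name);

	if (!m || count < 0)
		return -1;
	if (m->visited)
		return count;
	m->visited = 1;

	/* Everything holding this module has to go before the module itself. */
	for (size_t i = 0; i < MAX_HOLDERS && m->holders[i]; i++) {
		count = removal_order(mods, n, m->holders[i], order, count, cap);
		if (count < 0)
			return -1;
	}
	if (count >= cap)
		return -1;
	order[count++] = m->name;
	return count;
}
```

For a table where i915 lists snd_hda_intel as a holder, asking for the removal order of i915 yields snd_hda_intel followed by i915, which is the order the cover letter asks userspace to follow.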
thanks
Lucas De Marchi
[1] https://lore.kernel.org/linux-modules/20220329090912.geymr6xk7taq4rtq@ldmart...
On some devices, the hda driver needs to hook into a video driver, in order to be able to properly access the audio hardware and/or the power management function.
That's the case for several snd_hda_intel devices that depend on the i915 driver.
Currently, this dependency is hidden from /proc/modules and from lsmod, making it harder to identify the need to unbind the audio driver before being able to unbind the i915 driver.
Add a reference for it at the module dependency, in order to allow userspace to identify it, and print a message on such cases.
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---
See [PATCH 0/2] at: https://lore.kernel.org/all/cover.1651212016.git.mchehab@kernel.org/
 sound/hda/hdac_component.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/sound/hda/hdac_component.c b/sound/hda/hdac_component.c
index bb37e7e0bd79..103c6be8be1e 100644
--- a/sound/hda/hdac_component.c
+++ b/sound/hda/hdac_component.c
@@ -211,6 +211,14 @@ static int hdac_component_master_bind(struct device *dev)
 	}
 
 	complete_all(&acomp->master_bind_complete);
+
+	if (acomp->ops->owner && acomp->ops->owner->name) {
+		dev_info(dev, "audio component provided by %s driver\n",
+			 acomp->ops->owner->name);
+		module_add_named_dependency(acomp->ops->owner->name,
+					    dev->driver->owner);
+	}
+
 	return 0;
 
 module_put: