We should tear down in the opposite order we set up.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Fixes: 537f9c84a427 ("drm/i915/pmu: Fix CPU hotplug with multiple GPUs") Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/i915/i915_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 67696d7522718..50ed93b03e582 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1244,8 +1244,8 @@ static void __exit i915_exit(void)
i915_perf_sysctl_unregister(); pci_unregister_driver(&i915_pci_driver); - i915_globals_exit(); i915_pmu_exit(); + i915_globals_exit(); }
module_init(i915_init);
In i915_exit(), we check i915_pci_driver.driver.owner to detect if i915_init exited early and don't tear anything down. However, we didn't have proper tear-down paths for early exits in i915_init().
Most of the time, you would never notice this as driver init failures are extremely rare and generally the sign of a bigger bug. However, when the mock self-tests are run, they run as part of i915_init() and exit early once they complete. They run after i915_globals_init() and before we set up anything else. The IGT test then unloads the module, invoking i915_exit() which, thanks to our i915_pci_driver.driver.owner check, doesn't actually tear anything down. Importantly, this means i915_globals_exit() never gets called even though i915_globals_init() was and we leak the globals.
The most annoying part is that you don't actually notice the failure as part of the self-tests since leaking a bit of memory, while bad, doesn't result in anything observable from userspace. Instead, the next time we load the driver (usually for next IGT test), i915_globals_init() gets invoked again, we go to allocate a bunch of new memory slabs, those implicitly create debugfs entries, and debugfs warns that we're trying to create directories and files that already exist. Since this all happens as part of the next driver load, it shows up in the dmesg-warn of whatever IGT test ran after the mock selftests.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/i915/i915_globals.c | 4 ++-- drivers/gpu/drm/i915/i915_pci.c | 23 +++++++++++++++++------ 2 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c index 77f1911c463b8..87267e1d2ad92 100644 --- a/drivers/gpu/drm/i915/i915_globals.c +++ b/drivers/gpu/drm/i915/i915_globals.c @@ -138,7 +138,7 @@ void i915_globals_unpark(void) atomic_inc(&active); }
-static void __exit __i915_globals_flush(void) +static void __i915_globals_flush(void) { atomic_inc(&active); /* skip shrinking */
@@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void) atomic_dec(&active); }
-void __exit i915_globals_exit(void) +void i915_globals_exit(void) { GEM_BUG_ON(atomic_read(&active));
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 50ed93b03e582..783f547be0990 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1199,13 +1199,20 @@ static int __init i915_init(void) bool use_kms = true; int err;
+ /* We use this to detect early returns from i915_init() so we don't + * tear anything down in i915_exit() + */ + i915_pci_driver.driver.owner = NULL; + err = i915_globals_init(); if (err) return err;
err = i915_mock_selftests(); - if (err) - return err > 0 ? 0 : err; + if (err) { + err = err > 0 ? 0 : err; + goto globals_exit; + }
/* * Enable KMS by default, unless explicitly overriden by @@ -1228,13 +1235,17 @@ static int __init i915_init(void) i915_pmu_init();
err = pci_register_driver(&i915_pci_driver); - if (err) { - i915_pmu_exit(); - return err; - } + if (err) + goto pmu_exit;
i915_perf_sysctl_register(); return 0; + +pmu_exit: + i915_pmu_exit(); +globals_exit: + i915_globals_exit(); + return err; }
static void __exit i915_exit(void)
On Sat, Jul 17, 2021 at 12:48 AM Jason Ekstrand jason@jlekstrand.net wrote:
In i915_exit(), we check i915_pci_driver.driver.owner to detect if i915_init exited early and don't tear anything down. However, we didn't have proper tear-down paths for early exits in i915_init().
Most of the time, you would never notice this as driver init failures are extremely rare and generally the sign of a bigger bug. However, when the mock self-tests are run, they run as part of i915_init() and exit early once they complete. They run after i915_globals_init() and before we set up anything else. The IGT test then unloads the module, invoking i915_exit() which, thanks to our i915_pci_driver.driver.owner check, doesn't actually tear anything down. Importantly, this means i915_globals_exit() never gets called even though i915_globals_init() was and we leak the globals.
The most annoying part is that you don't actually notice the failure as part of the self-tests since leaking a bit of memory, while bad, doesn't result in anything observable from userspace. Instead, the next time we load the driver (usually for next IGT test), i915_globals_init() gets invoked again, we go to allocate a bunch of new memory slabs, those implicitly create debugfs entries, and debugfs warns that we're trying to create directories and files that already exist. Since this all happens as part of the next driver load, it shows up in the dmesg-warn of whatever IGT test ran after the mock selftests.
My idea was to onion-unwind in i915_exit, but that means we need to carry state over or have checks for every step, which is a bit annoying.
Yours unwinds even if i915_init returns 0, i.e. success, if we had some selftests, which is most unusual and I think deserves an explainer here in the commit message and maybe somewhere in the code.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/i915/i915_globals.c | 4 ++-- drivers/gpu/drm/i915/i915_pci.c | 23 +++++++++++++++++------ 2 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c index 77f1911c463b8..87267e1d2ad92 100644 --- a/drivers/gpu/drm/i915/i915_globals.c +++ b/drivers/gpu/drm/i915/i915_globals.c @@ -138,7 +138,7 @@ void i915_globals_unpark(void) atomic_inc(&active); }
-static void __exit __i915_globals_flush(void) +static void __i915_globals_flush(void) { atomic_inc(&active); /* skip shrinking */
@@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void) atomic_dec(&active); }
-void __exit i915_globals_exit(void) +void i915_globals_exit(void) { GEM_BUG_ON(atomic_read(&active));
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 50ed93b03e582..783f547be0990 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1199,13 +1199,20 @@ static int __init i915_init(void) bool use_kms = true; int err;
/* We use this to detect early returns from i915_init() so we don't
* tear anything down in i915_exit()
*/
i915_pci_driver.driver.owner = NULL;
Setting this seems redundant? Or if you want to make it explicit, just have a dedicated bool with a big comment explaining that only when we load the full pci driver do we tear down stuff in i915_exit. You could then set after pci_register_driver was successful. Some screaming name like driver_fully_loaded or something like that ...
err = i915_globals_init(); if (err) return err; err = i915_mock_selftests();
if (err)
return err > 0 ? 0 : err;
if (err) {
err = err > 0 ? 0 : err;
goto globals_exit;
} /* * Enable KMS by default, unless explicitly overriden by
Imo move this up, but if you want I can send out my diff so you score an r-b: tag :-)
@@ -1228,13 +1235,17 @@ static int __init i915_init(void) i915_pmu_init();
err = pci_register_driver(&i915_pci_driver);
if (err) {
i915_pmu_exit();
return err;
}
if (err)
goto pmu_exit; i915_perf_sysctl_register(); return 0;
We unwind even on success, which is most unusual. I think that deserves a comment.
+pmu_exit:
i915_pmu_exit();
+globals_exit:
i915_globals_exit();
return err;
}
static void __exit i915_exit(void)
2.31.1
On Sat, Jul 17, 2021 at 12:48 AM Jason Ekstrand jason@jlekstrand.net wrote:
We should tear down in the opposite order we set up.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Fixes: 537f9c84a427 ("drm/i915/pmu: Fix CPU hotplug with multiple GPUs") Cc: Daniel Vetter daniel@ffwll.ch
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 67696d7522718..50ed93b03e582 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1244,8 +1244,8 @@ static void __exit i915_exit(void)
i915_perf_sysctl_unregister(); pci_unregister_driver(&i915_pci_driver);
i915_globals_exit(); i915_pmu_exit();
i915_globals_exit();
}
module_init(i915_init);
2.31.1
Noticed PMU being mentioned..
On 16/07/2021 23:47, Jason Ekstrand wrote:
We should tear down in the opposite order we set up.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Fixes: 537f9c84a427 ("drm/i915/pmu: Fix CPU hotplug with multiple GPUs")
1) You can use 'dim fixes <sha>' to get you the correct cc list when using the fixes tag. But:
2) Fixes tag looks like should be removed to avoid potential needless backporting since I can't see that there is any inter-dependency between i915_pmu_exit and i915_globals_exit, hence nothing is getting fixes really, just tidying of the order.
With fixes removed:
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/i915/i915_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 67696d7522718..50ed93b03e582 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1244,8 +1244,8 @@ static void __exit i915_exit(void)
i915_perf_sysctl_unregister(); pci_unregister_driver(&i915_pci_driver);
- i915_globals_exit(); i915_pmu_exit();
i915_globals_exit(); }
module_init(i915_init);
dri-devel@lists.freedesktop.org