On Fri, 2022-01-14 at 15:30 -0600, Stephen Boyd wrote:
Quoting Yong Wu (2022-01-14 01:06:31)
On Wed, 2022-01-12 at 20:25 -0800, Stephen Boyd wrote:
[ 2.654526] ------------[ cut here ]------------ [ 2.655558] refcount_t: addition on 0; use-after-free.
After this patch, the aggregate_driver flow looks ok. But our driver still aborts like this:
[ 2.721316] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [ 2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138 [ 2.732434] lr : mtk_smi_larb_resume+0x54/0x98 ... [ 2.742457] Call trace: [ 2.742768] mtk_smi_larb_config_port_gen2_general+0xa4/0x13 8 [ 2.743496] pm_generic_runtime_resume+0x2c/0x48 [ 2.744090] __genpd_runtime_resume+0x30/0xa8 [ 2.744648] genpd_runtime_resume+0x94/0x2c8 [ 2.745191] __rpm_callback+0x44/0x150 [ 2.745669] rpm_callback+0x6c/0x78 [ 2.746114] rpm_resume+0x314/0x558 [ 2.746559] __pm_runtime_resume+0x3c/0x88 [ 2.747080] pm_runtime_get_suppliers+0x7c/0x110 [ 2.747668] __driver_probe_device+0x4c/0xe8 [ 2.748212] driver_probe_device+0x44/0x130 [ 2.748745] __device_attach_driver+0x98/0xd0 [ 2.749300] bus_for_each_drv+0x68/0xd0 [ 2.749787] __device_attach+0xec/0x148 [ 2.750277] device_attach+0x14/0x20 [ 2.750733] bus_rescan_devices_helper+0x50/0x90 [ 2.751319] bus_for_each_dev+0x7c/0xd8 [ 2.751806] bus_rescan_devices+0x20/0x30 [ 2.752315] __component_add+0x7c/0xa0 [ 2.752795] component_add+0x14/0x20 [ 2.753253] mtk_smi_larb_probe+0xe0/0x120
This is because the device runtime_resume is called before the bind operation(In our case this detailed function is mtk_smi_larb_bind). The issue doesn't happen without this patchset. I'm not sure the right sequence. If we should fix in mediatek driver, the patch could be:
Oh, the runtime PM is moved around with these patches. The aggregate device is runtime PM enabled before the probe is called,
In our case, the component device may probe before the aggregate device. thus the component device runtime PM has already been enabled when aggregate device probe.
This is always the case. The component device always probes before the aggregate device because the component device registers with the component framework. But the component device can decide to enable runtime PM during driver probe or during component bind.
and there are supplier links made to each component, so each component is runtime resumed before the aggregate probe function is called.
Yes. This is the current flow.
Got it.
It means that all the component drivers need to have their resources ready to power on before their component_bind() callback is made.
Sorry, I don't understand here well. In this case, The component drivers prepare the resource for power on in the component_bind since the resource comes from the aggregate driver. Thus, we expect the component_bind run before the runtime resume callback.
What resource comes from the aggregate device that the component device manages in runtime PM?
Here the aggregate device is the iommu device. It knows who enabled the iommu from the dtsi, then transfers this information to the component(smi_larb) devices. smi_larb will configure the registers to enable iommu for these masters in the runtime resume.
Another solution is moving the component's pm_runtime_enable into the component_bind(It's mtk_smi_larb_bind here), then the runtime callback is called after component_bind in which the resource for power on is ready.
This sounds more correct to me. I'm not an expert on runtime PM though as I always have to read the code to remember how it works. if the device isn't ready for runtime PM until the component bind function is called then runtime PM shouldn't be enabled on the component device.
Anyway, We should update the SMI driver for this case. I prepare a patch into this patchset or I send it independently? which way is better?
The patch could be:
diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index b883dcc0bbfa..88854c2270a1 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -175,6 +175,8 @@ mtk_smi_larb_bind(struct device *dev, struct device *master, void *data) larb->larbid = i; larb->mmu = &larb_mmu[i].mmu; larb->bank = larb_mmu[i].bank; + + pm_runtime_enable(dev); return 0; } } @@ -450,7 +452,6 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) if (ret < 0) return ret;
- pm_runtime_enable(dev); platform_set_drvdata(pdev, larb); ret = component_add(dev, &mtk_smi_larb_component_ops); if (ret)
Thanks.