On Wednesday, January 16, 2019 7:42:45 PM CET Daniel Vetter wrote:
On Tue, Jan 15, 2019 at 11:47 PM Rafael J. Wysocki rjw@rjwysocki.net wrote:
[CC Greg]
On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote:
[CC Lukas and linux-pm]
On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:
[cut]
This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.
Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.
We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.
However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.
We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.
Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.
Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.
The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:
supplier consumer
probe() alloc probe() publish obtain supplier's resource return
Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.
It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).
Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.
If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.
So if I'm not mistaken it can be made work with the help of the (completely untested) attached patch (of course, the documentation would need to be updated too).
The key observation here is that it should be fine to create a link from the consumer driver's probe while the supplier is still probing if the consumer has some way to find out that the supplier is functional at that point (like when it has published itself already in your example above). The initial state of the link can be "consumer probe" in that case and I don't see a reason why that might not work.
Below is a more complete patch that should take all of the corner cases into account unless I have missed anything. Testing it would be appreciated. :-)
From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers
Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.
To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.
With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.
However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".
Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Pulling in a bunch of drm_bridge and drm_panel people. See huge discussion upthread, this here is Rafael's idea for making device links more useful. Would be great if someone could test this out with panel/bridge dependencies.
I guess this here isn't yet solving the reprobe issue where the provider was unloaded and reloaded (which should cause the consumer to reprobe too, with the EPROBE_DEFERRED logic). I think that was the other issue we've hit.
I don't think it is addressed here.
Can anyone please explain to me what happens in more detail?
Cheers, Rafael