On Thursday, January 17, 2019 6:26:25 PM CET Russell King - ARM Linux admin wrote:
On Tue, Jan 15, 2019 at 11:47:00PM +0100, Rafael J. Wysocki wrote:
From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers
Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.
To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.
With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.
However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".
Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Hi Rafael,
Hi,
I've tried this with Armada DRM without using the component helper for bridges, and it seems to work up to a point (I have a vga bridge here to allow further testing):
[ 2.323441] armada-drm display-subsystem: assigned reserved memory node framebuffer [ 2.332001] armada-drm display-subsystem: bound f1820000.lcd-controller (ops armada_lcd_ops) [ 2.340790] armada-drm display-subsystem: bound f1810000.lcd-controller (ops armada_lcd_ops) [ 2.349345] armada-drm display-subsystem: node=/i2c-mux/i2c@0/hdmi-encoder@70 [ 2.356719] armada-drm display-subsystem: panel=fffffdfb bridge= (null) ret=0 [ 2.364038] armada-drm display-subsystem: Linked as a consumer to 1-0070 [ 2.370818] armada-drm display-subsystem: bridge=ee8cda2c ret=0 [ 2.376894] armada-drm display-subsystem: node=/vga-bridge [ 2.382453] armada-drm display-subsystem: panel=fffffdfb bridge=(null) ret=0 [ 2.389762] armada-drm display-subsystem: Linked as a consumer to vga-bridge [ 2.396883] armada-drm display-subsystem: bridge=ef3bec40 ret=0
When I remove the HDMI encoder:
root@cubox:/sys/bus/i2c/drivers/tda998x# echo 1-0070 > unbind
then I get:
[ 1013.824860] Console: switching to colour dummy device 80x30 [ 1013.866785] armada-drm display-subsystem: Dropping the link to vga-bridge [ 1013.867126] armada-drm display-subsystem: Dropping the link to 1-0070
which looks like it did what was expected - the DRM device is indeed unbound, the nodes in /dev/dri are gone.
OK, thanks!
This tells me that the patch is an improvement, so I'm going to move on to integrate it.
When rebinding the HDMI encoder:
[ 1015.864703] tda998x 1-0070: found TDA19988 [ 1015.898078] tda9950 1-0034: TDA9950 CEC interface, hardware version 3.3 [ 1015.941374] Registered IR keymap rc-cec [ 1015.941684] rc rc0: tda9950 as /devices/platform/mbus/mbus:internal-regs/f1011000.i2c/i2c-0/i2c-1/1-0070/rc/rc0 [ 1015.942439] input: tda9950 as /devices/platform/mbus/mbus:internal-regs/f1011000.i2c/i2c-0/i2c-1/1-0070/rc/rc0/input2 [ 1016.982921] alloc_contig_range: [10390, 10490) PFNs busy [ 1016.987400] kirkwood-spdif-audio audio-subsystem: snd-soc-dummy-dai <-> kirkwood-fe mapping ok [ 1016.987591] kirkwood-spdif-audio audio-subsystem: multicodec <-> kirkwood-spdif mapping ok
but the DRM stuff doesn't come back - this is what Daniel was talking about further down this thread.
Right, it's that case AFAICS.
I guess the kernel can't know that it should come back because the device links were dropped at the unbind stage, which means the kernel has lost the information necessary to know that the display subsystem is dependent on the presence of the HDMI encoder. I don't see an easy way around that.
If the link is defined as "persistent", it will survive the removal of drivers, but it will become "dormant".
Currently, if such a link is present, the consumer probe fill fail with -EPROBE_DEFER, but it might also trigger device_attach() on the supplier. It would make sense to do that anyway IMO.
If we keep the device links after an unbind event, then, because we create them during probe, we'll be attempting to recreate the link when we reattach the supplier to the consumer. If we only allow one instance, then what does device_link_add() return.
It returns the existing link, but it will reference-count it too.
However, that arguably is an overlooked use case too, because "persistent" links created during consumer or supplier probe should not be reference counted if the same probe runs again. Extra flags may be needed to handle that, though.
Maybe it is going to be less painful to push everything bridge-related to use the component helper after all. Dunno. Problems either way.
Well, this is a fairly complicated use case and I'm glad that we have it, because it shows what's missing.