Armada DRM: bridge with componentized devices

List overview All Threads
Download

newer

older

[PATCH v8 0/3] Add support...

[PATCH] drm/modes: Prevent...

Lubomir Rintel

3 Jan 2019 3 Jan '19

9:47 a.m.

Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

* The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

* If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

Thanks, Lubo

Show replies by date

Russell King - ARM Linux

3 Jan 3 Jan

1:11 p.m.

On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote:

...

Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

7 Jan 7 Jan

10:45 a.m.

On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...

On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote:

...
Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

...

The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

Wrt tda988x: I think it really shouldn't create a drm_encoder, nor register as a component. Fixing that is probably a bit more work. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Andrzej Hajda

11:26 a.m.

On 07.01.2019 11:45, Daniel Vetter wrote:

...

On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote:

...
Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

...
The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

[1]: https://patchwork.freedesktop.org/patch/218878/

[2]: https://www.kernel.org/doc/html/latest/driver-api/device_link.html#usage

Regards

Andrzej

...

Wrt tda988x: I think it really shouldn't create a drm_encoder, nor register as a component. Fixing that is probably a bit more work. -Daniel

Daniel Vetter

4:08 p.m.

On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote:

...

On 07.01.2019 11:45, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote:

...
Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

...
The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully? -Daniel

...

Regards

Andrzej

...
Wrt tda988x: I think it really shouldn't create a drm_encoder, nor register as a component. Fixing that is probably a bit more work. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Andrzej Hajda

4:27 p.m.

On 07.01.2019 17:08, Daniel Vetter wrote:

...

On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote:

...
On 07.01.2019 11:45, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote:

...
Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

...
The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully?

drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

Regards

Andrzej

...

-Daniel

...
Regards

Andrzej

...
Wrt tda988x: I think it really shouldn't create a drm_encoder, nor register as a component. Fixing that is probably a bit more work. -Daniel

Daniel Vetter

9:56 p.m.

On Mon, Jan 7, 2019 at 5:27 PM Andrzej Hajda a.hajda@samsung.com wrote:

...

On 07.01.2019 17:08, Daniel Vetter wrote:

...
On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote:

...
On 07.01.2019 11:45, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote:

...
Hello,

lately I've been trying to make the Himax HX8837 chip that drives the OLPC LCD display work with Armada DRM driver. I've been advised to create a bridge driver and not an encoder driver since the silicon is separate from the LCDC.

The Armada DRM driver (and, I think, the i.MX one) creates the drm_device once the component infrastructure sees the necessary sub-devices appear. The sub-devices being the LCDCs and the encoders (not bridges) that it expects to be created externally.

Currently, it seems, the only driver that can actually work with this (that is -- creates a drm_encoder for a drm_device when the component is bound) is the tda998x. All other similar drivers create a drm_bridge instead and not use the component infrastructure at all. (In fact, tilcdc driver contains a hack to handle tda998x specially.)

I'm wondering how to reconcile the two?

The tda998x driver has recently been modified to create a bridge on probe and eventually encoder on component bind. Is this an okay thing to do in a new driver? (this probably means the tilcdc hack can be removed...)

If a non-componentized bridge were to be used (along with a dummy encoder), at what point would it make sense to look for the bridge? Would it be a good idea to defer the probe of crtc until a bridge can be looked up and the attach it on component bind? What if the bridge goes away (a module is unloaded, etc.) in between?

I'd be thankful for opintions and advice before I move ahead with this.

This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

...
The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully?

drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug. drm_dev_register publishes the drm device instance to the world, if you're not ready to handle userspace requests at that point (because not everything is loaded yet) then things will go boom in very colorful ways. And from my (admittedly very rough) understanding we should be able to register the the device links as the very last step in the master bind function (and drm_dev_register should be about the last thing you do in the master bind).

So not clear on why this won't work? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Andrzej Hajda

8 Jan 8 Jan

8:35 a.m.

On 07.01.2019 22:56, Daniel Vetter wrote:

...

On Mon, Jan 7, 2019 at 5:27 PM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 17:08, Daniel Vetter wrote:

...
On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote:

...
On 07.01.2019 11:45, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote: > Hello, > > lately I've been trying to make the Himax HX8837 chip that drives the OLPC > LCD display work with Armada DRM driver. I've been advised to create a > bridge driver and not an encoder driver since the silicon is separate from > the LCDC. > > The Armada DRM driver (and, I think, the i.MX one) creates the drm_device > once the component infrastructure sees the necessary sub-devices appear. > The sub-devices being the LCDCs and the encoders (not bridges) that it > expects to be created externally. > > Currently, it seems, the only driver that can actually work with this (that > is -- creates a drm_encoder for a drm_device when the component is bound) > is the tda998x. All other similar drivers create a drm_bridge instead and > not use the component infrastructure at all. (In fact, tilcdc driver > contains a hack to handle tda998x specially.) > > I'm wondering how to reconcile the two? > > * The tda998x driver has recently been modified to create a bridge on probe > and eventually encoder on component bind. Is this an okay thing to do in > a new driver? (this probably means the tilcdc hack can be removed...) > > * If a non-componentized bridge were to be used (along with a dummy > encoder), at what point would it make sense to look for the bridge? > Would it be a good idea to defer the probe of crtc until a bridge can be > looked up and the attach it on component bind? What if the bridge goes > away (a module is unloaded, etc.) in between? > > I'd be thankful for opintions and advice before I move ahead with this. This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

...
The issue about the encoders is something that I've tried to discuss, and I've pointed out that moving it into the DRM driver adds additional complexity there, and I'd hoped that my patch set I posted would've generated discussion, but alas not.

What I'm not prepared to do is to introduce _known_ bugs into any driver that I maintain.

I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully?

drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

On the other side provider is registered (drm_bridge_add in case of bridges) during its probe so it becomes available to the consumers BEFORE end of its probe and again it can happen that device link will be added to early.

Regards

Andrzej

...

drm_dev_register publishes the drm device instance to the world, if you're not ready to handle userspace requests at that point (because not everything is loaded yet) then things will go boom in very colorful ways. And from my (admittedly very rough) understanding we should be able to register the the device links as the very last step in the master bind function (and drm_dev_register should be about the last thing you do in the master bind).

So not clear on why this won't work?

...

-Daniel

Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Vetter

8:47 a.m.

On Tue, Jan 8, 2019 at 9:35 AM Andrzej Hajda a.hajda@samsung.com wrote:

...

On 07.01.2019 22:56, Daniel Vetter wrote:

...
On Mon, Jan 7, 2019 at 5:27 PM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 17:08, Daniel Vetter wrote:

...
On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote:

...
On 07.01.2019 11:45, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote: > On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote: >> Hello, >> >> lately I've been trying to make the Himax HX8837 chip that drives the OLPC >> LCD display work with Armada DRM driver. I've been advised to create a >> bridge driver and not an encoder driver since the silicon is separate from >> the LCDC. >> >> The Armada DRM driver (and, I think, the i.MX one) creates the drm_device >> once the component infrastructure sees the necessary sub-devices appear. >> The sub-devices being the LCDCs and the encoders (not bridges) that it >> expects to be created externally. >> >> Currently, it seems, the only driver that can actually work with this (that >> is -- creates a drm_encoder for a drm_device when the component is bound) >> is the tda998x. All other similar drivers create a drm_bridge instead and >> not use the component infrastructure at all. (In fact, tilcdc driver >> contains a hack to handle tda998x specially.) >> >> I'm wondering how to reconcile the two? >> >> * The tda998x driver has recently been modified to create a bridge on probe >> and eventually encoder on component bind. Is this an okay thing to do in >> a new driver? (this probably means the tilcdc hack can be removed...) >> >> * If a non-componentized bridge were to be used (along with a dummy >> encoder), at what point would it make sense to look for the bridge? >> Would it be a good idea to defer the probe of crtc until a bridge can be >> looked up and the attach it on component bind? What if the bridge goes >> away (a module is unloaded, etc.) in between? >> >> I'd be thankful for opintions and advice before I move ahead with this. > This is the long-standing problem with the conflict between bridge > support and component support, and I'm not sure that there is really > any answer to it. > > I've gone into the details of the two several times on the list, > particularly about the short-comings of the bridge approach, but it > seems no one cares to fix those short-comings. > > You are re-identifying some of the issues that I've already pointed > out - such as what happens to DRM drives when the bridge driver is > unbound (it's really not about modules being unloaded, and the problem > can't be solved by taking a module reference count - all that the > module reference count does is ensure that the module doesn't go > away unexpected, there is no way to ensure that the device isn't > unbound.) > > The issue of unbinding is precisely the issue which the component > support was created to solve - but everyone seems to prefer the buggy > bridge approach, and no one seems willing to do anything about the > bugs or even acknowledge that it's a problem. It's strange - if one > identifies bugs that result in kernel oops in other kernel subsystems, > one is generally taken seriously and the problem is solved. Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

> The issue about the encoders is something that I've tried to discuss, > and I've pointed out that moving it into the DRM driver adds additional > complexity there, and I'd hoped that my patch set I posted would've > generated discussion, but alas not. > > What I'm not prepared to do is to introduce _known_ bugs into any > driver that I maintain. I thought last time around the idea was to use device links (and teach drm_bridge about them), so that the unloading works correctly.

With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully?

drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

On the other side provider is registered (drm_bridge_add in case of bridges) during its probe so it becomes available to the consumers BEFORE end of its probe and again it can happen that device link will be added to early.

Hm I read that thread again. Is the reason why we can't add the device link only the exynos behaviour to allow drm_panel to be hot-added/removed?

Note that for bridge allowing this is 100% buggy: drm core does not allow you to hotplug/hotunplug bridges.

In theory drm_panel can be hotplugged (because we can hotplug drm_connector), but I'm pretty sure exynos gets it all wrong since hotplugging panels is no easy thing to do. I don't think this is something we want to support, forcing the entire driver to be unload sounds like what we want to do here. -Daniel

...

Regards

Andrzej

...
drm_dev_register publishes the drm device instance to the world, if you're not ready to handle userspace requests at that point (because not everything is loaded yet) then things will go boom in very colorful ways. And from my (admittedly very rough) understanding we should be able to register the the device links as the very last step in the master bind function (and drm_dev_register should be about the last thing you do in the master bind).

So not clear on why this won't work?

...
-Daniel

Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Andrzej Hajda

9:22 a.m.

On 08.01.2019 09:47, Daniel Vetter wrote:

...

On Tue, Jan 8, 2019 at 9:35 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 22:56, Daniel Vetter wrote:

...
On Mon, Jan 7, 2019 at 5:27 PM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 17:08, Daniel Vetter wrote:

...
On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote:

...
On 07.01.2019 11:45, Daniel Vetter wrote: > On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote: >> On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote: >>> Hello, >>> >>> lately I've been trying to make the Himax HX8837 chip that drives the OLPC >>> LCD display work with Armada DRM driver. I've been advised to create a >>> bridge driver and not an encoder driver since the silicon is separate from >>> the LCDC. >>> >>> The Armada DRM driver (and, I think, the i.MX one) creates the drm_device >>> once the component infrastructure sees the necessary sub-devices appear. >>> The sub-devices being the LCDCs and the encoders (not bridges) that it >>> expects to be created externally. >>> >>> Currently, it seems, the only driver that can actually work with this (that >>> is -- creates a drm_encoder for a drm_device when the component is bound) >>> is the tda998x. All other similar drivers create a drm_bridge instead and >>> not use the component infrastructure at all. (In fact, tilcdc driver >>> contains a hack to handle tda998x specially.) >>> >>> I'm wondering how to reconcile the two? >>> >>> * The tda998x driver has recently been modified to create a bridge on probe >>> and eventually encoder on component bind. Is this an okay thing to do in >>> a new driver? (this probably means the tilcdc hack can be removed...) >>> >>> * If a non-componentized bridge were to be used (along with a dummy >>> encoder), at what point would it make sense to look for the bridge? >>> Would it be a good idea to defer the probe of crtc until a bridge can be >>> looked up and the attach it on component bind? What if the bridge goes >>> away (a module is unloaded, etc.) in between? >>> >>> I'd be thankful for opintions and advice before I move ahead with this. >> This is the long-standing problem with the conflict between bridge >> support and component support, and I'm not sure that there is really >> any answer to it. >> >> I've gone into the details of the two several times on the list, >> particularly about the short-comings of the bridge approach, but it >> seems no one cares to fix those short-comings. >> >> You are re-identifying some of the issues that I've already pointed >> out - such as what happens to DRM drives when the bridge driver is >> unbound (it's really not about modules being unloaded, and the problem >> can't be solved by taking a module reference count - all that the >> module reference count does is ensure that the module doesn't go >> away unexpected, there is no way to ensure that the device isn't >> unbound.) >> >> The issue of unbinding is precisely the issue which the component >> support was created to solve - but everyone seems to prefer the buggy >> bridge approach, and no one seems willing to do anything about the >> bugs or even acknowledge that it's a problem. It's strange - if one >> identifies bugs that result in kernel oops in other kernel subsystems, >> one is generally taken seriously and the problem is solved. > Unbinding is really not the most important feature, especially for SoC. If > you feel different, working together with others, getting some agreement, > getting the patches reviewed and finding someone to get them merged is > very much appreciated. But just complaining won't move this forward. > >> The issue about the encoders is something that I've tried to discuss, >> and I've pointed out that moving it into the DRM driver adds additional >> complexity there, and I'd hoped that my patch set I posted would've >> generated discussion, but alas not. >> >> What I'm not prepared to do is to introduce _known_ bugs into any >> driver that I maintain. > I thought last time around the idea was to use device links (and teach > drm_bridge about them), so that the unloading works correctly. With current device_links implementation registering links in probe of the consumer is just incorrect - it can happen that neither consumer neither provider is fully probed and creating device links in such state is wrong according to docs, and my experiments. See [1] for discussion and [2] for docs.

We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully?

drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

On the other side provider is registered (drm_bridge_add in case of bridges) during its probe so it becomes available to the consumers BEFORE end of its probe and again it can happen that device link will be added to early.

Hm I read that thread again. Is the reason why we can't add the device link only the exynos behaviour to allow drm_panel to be hot-added/removed?

Not only. What I have described above is common to all drivers it has nothing specific to Exynos.

Of course it was tested on Exynos as this is HW I work on. Linus Walleij has also reported in this thread that device links have different issue - circular dependencies (last few emails in this lengthy thread). My response explains it in detail.

...

Note that for bridge allowing this is 100% buggy: drm core does not allow you to hotplug/hotunplug bridges.

What part of drm core disallows it? As I remember discussions about drm_bridge design there were voices that they should be hot(un)pluggable, and they are IMO, of course if they are not active.

...

In theory drm_panel can be hotplugged (because we can hotplug drm_connector), but I'm pretty sure exynos gets it all wrong since hotplugging panels is no easy thing to do. I don't think this is something we want to support, forcing the entire driver to be unload sounds like what we want to do here.

I do not know if it is 'all wrong', but tests shows it is hot(un)pluggable. In both cases of panel and bridge :)

Regards

Andrzej

...

-Daniel

...
Regards

Andrzej

...
drm_dev_register publishes the drm device instance to the world, if you're not ready to handle userspace requests at that point (because not everything is loaded yet) then things will go boom in very colorful ways. And from my (admittedly very rough) understanding we should be able to register the the device links as the very last step in the master bind function (and drm_dev_register should be about the last thing you do in the master bind).

So not clear on why this won't work?

...
-Daniel

Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Russell King - ARM Linux

10:23 a.m.

On Tue, Jan 08, 2019 at 10:22:06AM +0100, Andrzej Hajda wrote:

...

What part of drm core disallows it? As I remember discussions about drm_bridge design there were voices that they should be hot(un)pluggable, and they are IMO, of course if they are not active.

Even if they are not active, once the DRM master device has used of_drm_find_bridge(), it has a reference on struct drm_bridge. Normally, that is allocated using something like devm_kzalloc() in the bridge drivers probe function.

When the bridge driver is unbound, that memory will be freed, but there is no notification to the DRM master that this structure is no longer valid (unless you have something implemented in exynos between the exynos core and the bridge driver(s) that specifically deals with that.) Note that there is nothing in drm_bridge_remove() that does anything beyond removing the bridge from the global list of bridges.

Any further accesses by the DRM master to that struct drm_bridge will be a use-after-free of that memory.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Andrzej Hajda

10:32 a.m.

On 08.01.2019 11:23, Russell King - ARM Linux wrote:

...

On Tue, Jan 08, 2019 at 10:22:06AM +0100, Andrzej Hajda wrote:

...
What part of drm core disallows it? As I remember discussions about drm_bridge design there were voices that they should be hot(un)pluggable, and they are IMO, of course if they are not active.

Even if they are not active, once the DRM master device has used of_drm_find_bridge(), it has a reference on struct drm_bridge. Normally, that is allocated using something like devm_kzalloc() in the bridge drivers probe function.

When the bridge driver is unbound, that memory will be freed, but there is no notification to the DRM master that this structure is no longer valid (unless you have something implemented in exynos between the exynos core and the bridge driver(s) that specifically deals with that.) Note that there is nothing in drm_bridge_remove() that does anything beyond removing the bridge from the global list of bridges.

Any further accesses by the DRM master to that struct drm_bridge will be a use-after-free of that memory.

This is fortunate case of mipi-dsi bus, where master is notified upon child removal (mipi_dsi_host_ops::detach), so it can perform proper cleanup.

Regards

Andrzej

Daniel Vetter

10:24 a.m.

On Tue, Jan 8, 2019 at 10:22 AM Andrzej Hajda a.hajda@samsung.com wrote:

...

On 08.01.2019 09:47, Daniel Vetter wrote:

...
On Tue, Jan 8, 2019 at 9:35 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 22:56, Daniel Vetter wrote:

...
On Mon, Jan 7, 2019 at 5:27 PM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 17:08, Daniel Vetter wrote:

...
On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote: > On 07.01.2019 11:45, Daniel Vetter wrote: >> On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote: >>> On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote: >>>> Hello, >>>> >>>> lately I've been trying to make the Himax HX8837 chip that drives the OLPC >>>> LCD display work with Armada DRM driver. I've been advised to create a >>>> bridge driver and not an encoder driver since the silicon is separate from >>>> the LCDC. >>>> >>>> The Armada DRM driver (and, I think, the i.MX one) creates the drm_device >>>> once the component infrastructure sees the necessary sub-devices appear. >>>> The sub-devices being the LCDCs and the encoders (not bridges) that it >>>> expects to be created externally. >>>> >>>> Currently, it seems, the only driver that can actually work with this (that >>>> is -- creates a drm_encoder for a drm_device when the component is bound) >>>> is the tda998x. All other similar drivers create a drm_bridge instead and >>>> not use the component infrastructure at all. (In fact, tilcdc driver >>>> contains a hack to handle tda998x specially.) >>>> >>>> I'm wondering how to reconcile the two? >>>> >>>> * The tda998x driver has recently been modified to create a bridge on probe >>>> and eventually encoder on component bind. Is this an okay thing to do in >>>> a new driver? (this probably means the tilcdc hack can be removed...) >>>> >>>> * If a non-componentized bridge were to be used (along with a dummy >>>> encoder), at what point would it make sense to look for the bridge? >>>> Would it be a good idea to defer the probe of crtc until a bridge can be >>>> looked up and the attach it on component bind? What if the bridge goes >>>> away (a module is unloaded, etc.) in between? >>>> >>>> I'd be thankful for opintions and advice before I move ahead with this. >>> This is the long-standing problem with the conflict between bridge >>> support and component support, and I'm not sure that there is really >>> any answer to it. >>> >>> I've gone into the details of the two several times on the list, >>> particularly about the short-comings of the bridge approach, but it >>> seems no one cares to fix those short-comings. >>> >>> You are re-identifying some of the issues that I've already pointed >>> out - such as what happens to DRM drives when the bridge driver is >>> unbound (it's really not about modules being unloaded, and the problem >>> can't be solved by taking a module reference count - all that the >>> module reference count does is ensure that the module doesn't go >>> away unexpected, there is no way to ensure that the device isn't >>> unbound.) >>> >>> The issue of unbinding is precisely the issue which the component >>> support was created to solve - but everyone seems to prefer the buggy >>> bridge approach, and no one seems willing to do anything about the >>> bugs or even acknowledge that it's a problem. It's strange - if one >>> identifies bugs that result in kernel oops in other kernel subsystems, >>> one is generally taken seriously and the problem is solved. >> Unbinding is really not the most important feature, especially for SoC. If >> you feel different, working together with others, getting some agreement, >> getting the patches reviewed and finding someone to get them merged is >> very much appreciated. But just complaining won't move this forward. >> >>> The issue about the encoders is something that I've tried to discuss, >>> and I've pointed out that moving it into the DRM driver adds additional >>> complexity there, and I'd hoped that my patch set I posted would've >>> generated discussion, but alas not. >>> >>> What I'm not prepared to do is to introduce _known_ bugs into any >>> driver that I maintain. >> I thought last time around the idea was to use device links (and teach >> drm_bridge about them), so that the unloading works correctly. > With current device_links implementation registering links in probe of > the consumer is just incorrect - it can happen that neither consumer > neither provider is fully probed and creating device links in such state > is wrong according to docs, and my experiments. See [1] for discussion > and [2] for docs. We could set up the device link only at drm_dev_register time. At that point we know that driver loading has fully succeeded. We do have a list of bridges at hand, so that's not going to be an issue.

The only case where this breaks is if a driver is still using the deprecated ->load callback. But that ->load callback doesn't mix well with EDEFER_PROBE/component framework anyway, so I think not going to be a problem hopefully?

drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

On the other side provider is registered (drm_bridge_add in case of bridges) during its probe so it becomes available to the consumers BEFORE end of its probe and again it can happen that device link will be added to early.

Hm I read that thread again. Is the reason why we can't add the device link only the exynos behaviour to allow drm_panel to be hot-added/removed?

Not only. What I have described above is common to all drivers it has nothing specific to Exynos.

I'm pretty sure that most drivers do not hot-add/remove drm things after drm_dev_register (except drm_connector for dp mst support). It would be buggy. Do you have more examples? I haven't reviewed them all in detail, and things might have changed, so could very well be there's exceptions.

...

Of course it was tested on Exynos as this is HW I work on. Linus Walleij has also reported in this thread that device links have different issue

circular dependencies (last few emails in this lengthy thread). My

response explains it in detail.

...
Note that for bridge allowing this is 100% buggy: drm core does not allow you to hotplug/hotunplug bridges.

What part of drm core disallows it? As I remember discussions about drm_bridge design there were voices that they should be hot(un)pluggable, and they are IMO, of course if they are not active.

There's no locking at all to handle bridges appearing/disappearing while the drm_device can be accessed. There's also no lifetime management/refcounting. So it "works" but it's racy like mad.

...

...
In theory drm_panel can be hotplugged (because we can hotplug drm_connector), but I'm pretty sure exynos gets it all wrong since hotplugging panels is no easy thing to do. I don't think this is something we want to support, forcing the entire driver to be unload sounds like what we want to do here.

I do not know if it is 'all wrong', but tests shows it is hot(un)pluggable. In both cases of panel and bridge :)

There's no drm_connector_get/put in exynos (except the one in hdmi_mode_fixup, which looks rather fishy tbh). And drm_connector are the only things in drm you can hotplug/unplug (except the overall drm_device ofc). So again it maybe "works", but you're guaranteed to have lots of races all over the place. -Daniel

...

Regards

Andrzej

...
-Daniel

...
Regards

Andrzej

...
drm_dev_register publishes the drm device instance to the world, if you're not ready to handle userspace requests at that point (because not everything is loaded yet) then things will go boom in very colorful ways. And from my (admittedly very rough) understanding we should be able to register the the device links as the very last step in the master bind function (and drm_dev_register should be about the last thing you do in the master bind).

So not clear on why this won't work?

...
-Daniel

Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Andrzej Hajda

11:25 a.m.

On 08.01.2019 11:24, Daniel Vetter wrote:

...

On Tue, Jan 8, 2019 at 10:22 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 08.01.2019 09:47, Daniel Vetter wrote:

...
On Tue, Jan 8, 2019 at 9:35 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 22:56, Daniel Vetter wrote:

...
On Mon, Jan 7, 2019 at 5:27 PM Andrzej Hajda a.hajda@samsung.com wrote:

...
On 07.01.2019 17:08, Daniel Vetter wrote: > On Mon, Jan 07, 2019 at 12:26:58PM +0100, Andrzej Hajda wrote: >> On 07.01.2019 11:45, Daniel Vetter wrote: >>> On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote: >>>> On Thu, Jan 03, 2019 at 10:47:27AM +0100, Lubomir Rintel wrote: >>>>> Hello, >>>>> >>>>> lately I've been trying to make the Himax HX8837 chip that drives the OLPC >>>>> LCD display work with Armada DRM driver. I've been advised to create a >>>>> bridge driver and not an encoder driver since the silicon is separate from >>>>> the LCDC. >>>>> >>>>> The Armada DRM driver (and, I think, the i.MX one) creates the drm_device >>>>> once the component infrastructure sees the necessary sub-devices appear. >>>>> The sub-devices being the LCDCs and the encoders (not bridges) that it >>>>> expects to be created externally. >>>>> >>>>> Currently, it seems, the only driver that can actually work with this (that >>>>> is -- creates a drm_encoder for a drm_device when the component is bound) >>>>> is the tda998x. All other similar drivers create a drm_bridge instead and >>>>> not use the component infrastructure at all. (In fact, tilcdc driver >>>>> contains a hack to handle tda998x specially.) >>>>> >>>>> I'm wondering how to reconcile the two? >>>>> >>>>> * The tda998x driver has recently been modified to create a bridge on probe >>>>> and eventually encoder on component bind. Is this an okay thing to do in >>>>> a new driver? (this probably means the tilcdc hack can be removed...) >>>>> >>>>> * If a non-componentized bridge were to be used (along with a dummy >>>>> encoder), at what point would it make sense to look for the bridge? >>>>> Would it be a good idea to defer the probe of crtc until a bridge can be >>>>> looked up and the attach it on component bind? What if the bridge goes >>>>> away (a module is unloaded, etc.) in between? >>>>> >>>>> I'd be thankful for opintions and advice before I move ahead with this. >>>> This is the long-standing problem with the conflict between bridge >>>> support and component support, and I'm not sure that there is really >>>> any answer to it. >>>> >>>> I've gone into the details of the two several times on the list, >>>> particularly about the short-comings of the bridge approach, but it >>>> seems no one cares to fix those short-comings. >>>> >>>> You are re-identifying some of the issues that I've already pointed >>>> out - such as what happens to DRM drives when the bridge driver is >>>> unbound (it's really not about modules being unloaded, and the problem >>>> can't be solved by taking a module reference count - all that the >>>> module reference count does is ensure that the module doesn't go >>>> away unexpected, there is no way to ensure that the device isn't >>>> unbound.) >>>> >>>> The issue of unbinding is precisely the issue which the component >>>> support was created to solve - but everyone seems to prefer the buggy >>>> bridge approach, and no one seems willing to do anything about the >>>> bugs or even acknowledge that it's a problem. It's strange - if one >>>> identifies bugs that result in kernel oops in other kernel subsystems, >>>> one is generally taken seriously and the problem is solved. >>> Unbinding is really not the most important feature, especially for SoC. If >>> you feel different, working together with others, getting some agreement, >>> getting the patches reviewed and finding someone to get them merged is >>> very much appreciated. But just complaining won't move this forward. >>> >>>> The issue about the encoders is something that I've tried to discuss, >>>> and I've pointed out that moving it into the DRM driver adds additional >>>> complexity there, and I'd hoped that my patch set I posted would've >>>> generated discussion, but alas not. >>>> >>>> What I'm not prepared to do is to introduce _known_ bugs into any >>>> driver that I maintain. >>> I thought last time around the idea was to use device links (and teach >>> drm_bridge about them), so that the unloading works correctly. >> With current device_links implementation registering links in probe of >> the consumer is just incorrect - it can happen that neither consumer >> neither provider is fully probed and creating device links in such state >> is wrong according to docs, and my experiments. See [1] for discussion >> and [2] for docs. > We could set up the device link only at drm_dev_register time. At that point > we know that driver loading has fully succeeded. We do have a list of > bridges at hand, so that's not going to be an issue. > > The only case where this breaks is if a driver is still using the > deprecated ->load callback. But that ->load callback doesn't mix well with > EDEFER_PROBE/component framework anyway, so I think not going to be a > problem hopefully? drm_dev_register usually is called from bind callback, which is called from probe callback of one of the components or master (depending on particular probe order). If you want to register device link in this function it is possible that the bad scenario will happen - there will be attempt to create device link between not-yet-probed consumer and not-yet-probed provider.

If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

On the other side provider is registered (drm_bridge_add in case of bridges) during its probe so it becomes available to the consumers BEFORE end of its probe and again it can happen that device link will be added to early.

Hm I read that thread again. Is the reason why we can't add the device link only the exynos behaviour to allow drm_panel to be hot-added/removed?

Not only. What I have described above is common to all drivers it has nothing specific to Exynos.

I'm pretty sure that most drivers do not hot-add/remove drm things after drm_dev_register (except drm_connector for dp mst support). It would be buggy. Do you have more examples? I haven't reviewed them all in detail, and things might have changed, so could very well be there's exceptions.

Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

...

...
Of course it was tested on Exynos as this is HW I work on. Linus Walleij has also reported in this thread that device links have different issue

circular dependencies (last few emails in this lengthy thread). My

response explains it in detail.

...
Note that for bridge allowing this is 100% buggy: drm core does not allow you to hotplug/hotunplug bridges.

What part of drm core disallows it? As I remember discussions about drm_bridge design there were voices that they should be hot(un)pluggable, and they are IMO, of course if they are not active.

There's no locking at all to handle bridges appearing/disappearing while the drm_device can be accessed. There's also no lifetime management/refcounting. So it "works" but it's racy like mad.

I have not recently examined the code, but as I remember core uses the bridge only if it is attached to encoder which is attached to pipeline and the connector is in connected state (or in transition phase to/from this state).

...

...
...
In theory drm_panel can be hotplugged (because we can hotplug drm_connector), but I'm pretty sure exynos gets it all wrong since hotplugging panels is no easy thing to do. I don't think this is something we want to support, forcing the entire driver to be unload sounds like what we want to do here.

I do not know if it is 'all wrong', but tests shows it is hot(un)pluggable. In both cases of panel and bridge :)

There's no drm_connector_get/put in exynos (except the one in hdmi_mode_fixup, which looks rather fishy tbh). And drm_connector are the only things in drm you can hotplug/unplug (except the overall drm_device ofc). So again it maybe "works", but you're guaranteed to have lots of races all over the place.

Personally I have implemented only panel hotplug support and I have it tested/analyzed quite thoroughly - it does not play with connector removal it just makes connector disconnected in case panel is removed, much simpler case. In case of bridge I have helped in design but I have not tested it thoroughly, and as you pointed it out there are fishy places, but I guess the design should be OK.

The design is as follows (just bridge removal scenario and tc358764 bridge):

1. User requests bridge unbind - tc358764_remove is called.

2. tc358764_remove calls mipi_dsi_detach - encoder can perform pipeline cleanup, including connector removal.

3. Now the bridge is detached, so it can be removed - tc358764_remove calls drm_bridge_remove.

As I see in the code it looks like there are missing pieces/bugs but it is just something which can be fixed.

BTW the approach that last element of the pipeline should create connector complicates things a lot, as Laurent (and me) argued multiple times moving connector creation out of bridges would simplify things.

Regards

Andrzej

...

-Daniel

...
Regards

Andrzej

...
-Daniel

...
Regards

Andrzej

...
drm_dev_register publishes the drm device instance to the world, if you're not ready to handle userspace requests at that point (because not everything is loaded yet) then things will go boom in very colorful ways. And from my (admittedly very rough) understanding we should be able to register the the device links as the very last step in the master bind function (and drm_dev_register should be about the last thing you do in the master bind).

So not clear on why this won't work?

...
-Daniel

Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Russell King - ARM Linux

11:38 a.m.

On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...

Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

I don't think the links should be created when the bridge is attached, they should be created when the main drm driver gains its first reference to the drm_bridge (via of_drm_find_bridge()) - since that's the point where things may explode if the drm_bridge goes away. Calling drm_bridge_attach() for a bridge that has been unbound will be a use-after-free bug, so using device links at that point is way too late.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Andrzej Hajda

12:27 p.m.

On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...

On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

...

I don't think the links should be created when the bridge is attached, they should be created when the main drm driver gains its first reference to the drm_bridge (via of_drm_find_bridge()) - since that's the point where things may explode if the drm_bridge goes away. Calling drm_bridge_attach() for a bridge that has been unbound will be a use-after-free bug, so using device links at that point is way too late.

If you want to create device_link you should access bridge->dev, if you do it outside bridge_lock it can explode as well. So link should be created under bridge_lock then (inside of_drm_find_bridge? via another getter? ).

Regards

Andrzej

Russell King - ARM Linux

1:21 p.m.

On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...

On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Secondly, device links are created by the regulator support whenever a regulator is "got" - between the supplier and the consumer. I'm not sure that regulator uses this in a safe way since it looks to me like there could be a race between the point where the regulator has been found and the point that the device link is created, and the regulator supplier being unbound. Regulator uses stateless device links to order PM, but not to remove consumers when the supplier goes away.

Finally, I believe that CCF is looking at using device links as well, which will mean a link is established by a clk_get() type operation and released in a clk_put() operation. There hasn't been any code merged for this, but I have seen it discussed.

These all have in common one thing - a device link is created at the point that the resource is obtained and removed when the resource is released.

I don't see how DRM bridges are any different from any other resource in the system, and why you think that device links wouldn't work for DRM bridges.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

1:34 p.m.

On Tue, Jan 8, 2019 at 2:22 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Secondly, device links are created by the regulator support whenever a regulator is "got" - between the supplier and the consumer. I'm not sure that regulator uses this in a safe way since it looks to me like there could be a race between the point where the regulator has been found and the point that the device link is created, and the regulator supplier being unbound. Regulator uses stateless device links to order PM, but not to remove consumers when the supplier goes away.

Finally, I believe that CCF is looking at using device links as well, which will mean a link is established by a clk_get() type operation and released in a clk_put() operation. There hasn't been any code merged for this, but I have seen it discussed.

These all have in common one thing - a device link is created at the point that the resource is obtained and removed when the resource is released.

I don't see how DRM bridges are any different from any other resource in the system, and why you think that device links wouldn't work for DRM bridges.

Yeah, for PM ordering device links should work for bridges and panels. I think all the questions/confusion is whether they also work for load/unload ordering, which they atm don't anyway because if you unload a provider and then reload it, the consumer isn't added to the reprobe list (unlikely what component achieves, which does call master bind again in this case).

I also think the entire discussion around whether you can hotadd bridges/panels after drm_dev_register is orthogonal, or at least should be. If you want to support hot-adding, they neither can you use device links nor component, because they both create a more static connection between producer and consumer. Aside from hotadding anything aside from drm_connector isn't really supported, and you need to roll your own locking and everything (which exynos really doesn't seem to do - the fact it works if you're careful doesn't really change that). -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Andrzej Hajda

2:33 p.m.

On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...

On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists

some kind of notifications about resource appearance (MIPI-DSI case).

Regards

Andrzej

...

Secondly, device links are created by the regulator support whenever a regulator is "got" - between the supplier and the consumer. I'm not sure that regulator uses this in a safe way since it looks to me like there could be a race between the point where the regulator has been found and the point that the device link is created, and the regulator supplier being unbound. Regulator uses stateless device links to order PM, but not to remove consumers when the supplier goes away.

Finally, I believe that CCF is looking at using device links as well, which will mean a link is established by a clk_get() type operation and released in a clk_put() operation. There hasn't been any code merged for this, but I have seen it discussed.

These all have in common one thing - a device link is created at the point that the resource is obtained and removed when the resource is released.

I don't see how DRM bridges are any different from any other resource in the system, and why you think that device links wouldn't work for DRM bridges.

Russell King - ARM Linux

3:07 p.m.

On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...

On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

Taking DRM bridge, drm_bridge_add() returns void - it never fails. Most bridge drivers do drm_bridge_add() as the very last step before returning zero from their probe function. There are a few exceptions.

megachips-stdpxxxx-ge-b850v3-fw.c is already buggy - it calls devm_request_threaded_irq(), which if it fails, the bridge is left added to the global list of bridges. Since this structure was allocated with devm_kzalloc(), it will be freed when the probe function fails. So, any failure here (eg a deferred probe because the IRQ controller is not available) already creates a latent bug just waiting to bite.

tc358764.c is another case where the bridge is published prior to initialisation completion, but it looks like that's under the control of the "host" (consumer) driver.

mtk_hdmi.c enables clocks for audio after registering the bridge, which may fail, but are unlikely to. However, moving them prior to drm_bridge_add() probably won't hurt and avoids the "published interfaces which then vanish" problem.

Then we have exynos_drm_mic.c and tda998x_drv.c, but I think the latter's error path after drm_bridge_add() can be eliminated if we transitioned to device links instead of the component helper.

Outside DRM, take regulators - at some point during a regulator supplier's probe function, the resource will be published, and as soon as it is, it's available for regulator_get() to find it and setup a device link before the probe function has finished.

...

From what I can tell, in the situation where a supplier makes resources

available, the consumer binds to the resource, and then the supplier goes away, the device link will remain, and the consumer will not be unbound, which leads to an unexpected state. The solution to this is the age old kernel rule of "don't publish your interfaces until you've completed initialisation". IOW, publish at a point where you aren't going to fail.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

6:07 p.m.

On Tue, Jan 8, 2019 at 4:07 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

Taking DRM bridge, drm_bridge_add() returns void - it never fails. Most bridge drivers do drm_bridge_add() as the very last step before returning zero from their probe function. There are a few exceptions.

Yup, that's how it should be. And this is a general rule: Only ever publish your interface (whether userspace dev node, or some kernel internal thing) once setup is 100% done and nothing can fail anymore. If you call drm_bridge_add before your bridge is fully set up, that's a bug. Same way that calling drm_dev_register before all the sub-parts (except drm_connector, which can be hotplugged) is a bug. That ordering issue is why the ->load callback is deprecated, since it's fundamentally unsafe (but we did paper over the worst races with opening the chardev using some global locking).

If you want more examples from within drm look at gem bo or drm_framebuffer: These also only get registered once fully set up, and the registration is the very last thing addfb or dumb_create do. Heck even for hotplugged drm_connector the same rule holds: First set it up (including panel behind it and everything), then register it.

...

megachips-stdpxxxx-ge-b850v3-fw.c is already buggy - it calls devm_request_threaded_irq(), which if it fails, the bridge is left added to the global list of bridges. Since this structure was allocated with devm_kzalloc(), it will be freed when the probe function fails. So, any failure here (eg a deferred probe because the IRQ controller is not available) already creates a latent bug just waiting to bite.

tc358764.c is another case where the bridge is published prior to initialisation completion, but it looks like that's under the control of the "host" (consumer) driver.

mtk_hdmi.c enables clocks for audio after registering the bridge, which may fail, but are unlikely to. However, moving them prior to drm_bridge_add() probably won't hurt and avoids the "published interfaces which then vanish" problem.

Then we have exynos_drm_mic.c and tda998x_drv.c, but I think the latter's error path after drm_bridge_add() can be eliminated if we transitioned to device links instead of the component helper.

Outside DRM, take regulators - at some point during a regulator supplier's probe function, the resource will be published, and as soon as it is, it's available for regulator_get() to find it and setup a device link before the probe function has finished.

From what I can tell, in the situation where a supplier makes resources available, the consumer binds to the resource, and then the supplier goes away, the device link will remain, and the consumer will not be unbound, which leads to an unexpected state. The solution to this is the age old kernel rule of "don't publish your interfaces until you've completed initialisation". IOW, publish at a point where you aren't going to fail.

Fully agreed. Any driver that does something else is suspect, and definitely shouldn't be held up as an example of what we should support in generic/shared code. There's just too many surprises lurking otherwise (and so if you insist on doing things like that, you get to hand-roll a lot more of your driver code). -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Andrzej Hajda

9 Jan 9 Jan

9:12 a.m.

+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote:

...

On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote:

...
Issues with device links have nothing to do with hotplugging, they are generic - lifetime of the objects (drm_bridge, drm_panel) is just slightly different of lifetime of device links, and this is racy even if you do not want hotplugging. Moreover since drm_dev is not device (has no associated struct device) assuming we can reuse its parent to create device link results in circular dependencies.

How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

...

Taking DRM bridge, drm_bridge_add() returns void - it never fails. Most bridge drivers do drm_bridge_add() as the very last step before returning zero from their probe function. There are a few exceptions.

megachips-stdpxxxx-ge-b850v3-fw.c is already buggy - it calls devm_request_threaded_irq(), which if it fails, the bridge is left added to the global list of bridges. Since this structure was allocated with devm_kzalloc(), it will be freed when the probe function fails. So, any failure here (eg a deferred probe because the IRQ controller is not available) already creates a latent bug just waiting to bite.

tc358764.c is another case where the bridge is published prior to initialisation completion, but it looks like that's under the control of the "host" (consumer) driver.

mtk_hdmi.c enables clocks for audio after registering the bridge, which may fail, but are unlikely to. However, moving them prior to drm_bridge_add() probably won't hurt and avoids the "published interfaces which then vanish" problem.

Then we have exynos_drm_mic.c and tda998x_drv.c, but I think the latter's error path after drm_bridge_add() can be eliminated if we transitioned to device links instead of the component helper.

Outside DRM, take regulators - at some point during a regulator supplier's probe function, the resource will be published, and as soon as it is, it's available for regulator_get() to find it and setup a device link before the probe function has finished.

...
From what I can tell, in the situation where a supplier makes resources

available, the consumer binds to the resource, and then the supplier goes away, the device link will remain, and the consumer will not be unbound, which leads to an unexpected state. The solution to this is the age old kernel rule of "don't publish your interfaces until you've completed initialisation". IOW, publish at a point where you aren't going to fail.

What about complex devices which wants to publish multiple interfaces? I can imagine in some cases it could be impossible to stick to this rule.

Regards

Andrzej

Rafael J. Wysocki

9:24 a.m.

On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...

+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: > Issues with device links have nothing to do with hotplugging, they are > generic - lifetime of the objects (drm_bridge, drm_panel) is just > slightly different of lifetime of device links, and this is racy even if > you do not want hotplugging. Moreover since drm_dev is not device (has > no associated struct device) assuming we can reuse its parent to create > device link results in circular dependencies. How about having the device links created depending on whether the main drm driver wants them or not - that would mean that Exynos could continue avoiding them, but others that want them can have the links?

That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

Russell King - ARM Linux

9:30 a.m.

On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...

On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 12:38, Russell King - ARM Linux wrote: > On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: >> Issues with device links have nothing to do with hotplugging, they are >> generic - lifetime of the objects (drm_bridge, drm_panel) is just >> slightly different of lifetime of device links, and this is racy even if >> you do not want hotplugging. Moreover since drm_dev is not device (has >> no associated struct device) assuming we can reuse its parent to create >> device link results in circular dependencies. > How about having the device links created depending on whether the > main drm driver wants them or not - that would mean that Exynos > could continue avoiding them, but others that want them can have > the links? That should be OK for Exynos. But regardless of Exynos device_links at the current state will not work correctly with bridges/panels as I described earlier.

However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

11 Jan 11 Jan

2:20 p.m.

On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 14:21, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: > On 08.01.2019 12:38, Russell King - ARM Linux wrote: >> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: >>> Issues with device links have nothing to do with hotplugging, they are >>> generic - lifetime of the objects (drm_bridge, drm_panel) is just >>> slightly different of lifetime of device links, and this is racy even if >>> you do not want hotplugging. Moreover since drm_dev is not device (has >>> no associated struct device) assuming we can reuse its parent to create >>> device link results in circular dependencies. >> How about having the device links created depending on whether the >> main drm driver wants them or not - that would mean that Exynos >> could continue avoiding them, but others that want them can have >> the links? > That should be OK for Exynos. But regardless of Exynos device_links at > the current state will not work correctly with bridges/panels as I > described earlier. However, I'm not sure you're correct with your interpretation of the documentation. Firstly, the documentation says:

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's ->probe callback while the supplier hasn't probed yet: Had the driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on non-presence.

This is fine - if we add the device link from of_drm_find_bridge(), we will be in the consumer's ->probe function, and the supplier must have been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

This would mean device links aren't really useful for componentized drivers (whether they use the component framework or something subsystem specific like drm_bridge, regulator, ccf or whatever), which seems to defeat the purpose of them. Or is there some other way to set up device links, which doesn't hit these cases?

Note that there's some other issues with device links (discussed earlier in this thread round reprobing), but that issue should be easy to fix and not something fundamental like the above one. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Rafael J. Wysocki

2:26 p.m.

On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...

On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote:

...
On 08.01.2019 14:21, Russell King - ARM Linux wrote: > On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: >> On 08.01.2019 12:38, Russell King - ARM Linux wrote: >>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: >>>> Issues with device links have nothing to do with hotplugging, they are >>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just >>>> slightly different of lifetime of device links, and this is racy even if >>>> you do not want hotplugging. Moreover since drm_dev is not device (has >>>> no associated struct device) assuming we can reuse its parent to create >>>> device link results in circular dependencies. >>> How about having the device links created depending on whether the >>> main drm driver wants them or not - that would mean that Exynos >>> could continue avoiding them, but others that want them can have >>> the links? >> That should be OK for Exynos. But regardless of Exynos device_links at >> the current state will not work correctly with bridges/panels as I >> described earlier. > However, I'm not sure you're correct with your interpretation of the > documentation. Firstly, the documentation says: > > Another example for an inconsistent state would be a device link that > represents a driver presence dependency, yet is added from the consumer's > ->probe callback while the supplier hasn't probed yet: Had the driver core > known about the device link earlier, it wouldn't have probed the consumer > in the first place. The onus is thus on the consumer to check presence of > the supplier after adding the link, and defer probing on non-presence. > > This is fine - if we add the device link from of_drm_find_bridge(), we > will be in the consumer's ->probe function, and the supplier must have > been probed for us to find the struct drm_bridge.

Supplier usually is registered in it's probe time, so there is short period of time when supplier is available, but the probe is not yet finished - quite unsafe, but not impossible, especially if there exists some kind of notifications about resource appearance (MIPI-DSI case).

At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

Russell King - ARM Linux

2:32 p.m.

On Fri, Jan 11, 2019 at 03:26:44PM +0100, Rafael J. Wysocki wrote:

...

On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote:

...
On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote: > On 08.01.2019 14:21, Russell King - ARM Linux wrote: >> On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: >>> On 08.01.2019 12:38, Russell King - ARM Linux wrote: >>>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: >>>>> Issues with device links have nothing to do with hotplugging, they are >>>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just >>>>> slightly different of lifetime of device links, and this is racy even if >>>>> you do not want hotplugging. Moreover since drm_dev is not device (has >>>>> no associated struct device) assuming we can reuse its parent to create >>>>> device link results in circular dependencies. >>>> How about having the device links created depending on whether the >>>> main drm driver wants them or not - that would mean that Exynos >>>> could continue avoiding them, but others that want them can have >>>> the links? >>> That should be OK for Exynos. But regardless of Exynos device_links at >>> the current state will not work correctly with bridges/panels as I >>> described earlier. >> However, I'm not sure you're correct with your interpretation of the >> documentation. Firstly, the documentation says: >> >> Another example for an inconsistent state would be a device link that >> represents a driver presence dependency, yet is added from the consumer's >> ->probe callback while the supplier hasn't probed yet: Had the driver core >> known about the device link earlier, it wouldn't have probed the consumer >> in the first place. The onus is thus on the consumer to check presence of >> the supplier after adding the link, and defer probing on non-presence. >> >> This is fine - if we add the device link from of_drm_find_bridge(), we >> will be in the consumer's ->probe function, and the supplier must have >> been probed for us to find the struct drm_bridge. > > Supplier usually is registered in it's probe time, so there is short > period of time when supplier is available, but the probe is not yet > finished - quite unsafe, but not impossible, especially if there exists > some kind of notifications about resource appearance (MIPI-DSI case). At some point during the supplier probe, the resource becomes available to consumers. At that point, device links can be setup before the supplier has finished probing. So any driver that provides resources to another driver will, at some point during the provider's probe, make resources available, and therefore be a candidate for device links to be created _before_ the probe function has returned.

What is a problem is if the provider publishes resources, and then fails its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

That is not what we're discussing. We are discussing using device links to solve the problem with DRM bridges which may be removed today from DRM without the rest of the DRM system being aware.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

2:36 p.m.

On Fri, Jan 11, 2019 at 3:32 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Fri, Jan 11, 2019 at 03:26:44PM +0100, Rafael J. Wysocki wrote:

...
On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote: > On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote: >> On 08.01.2019 14:21, Russell King - ARM Linux wrote: >>> On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: >>>> On 08.01.2019 12:38, Russell King - ARM Linux wrote: >>>>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: >>>>>> Issues with device links have nothing to do with hotplugging, they are >>>>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just >>>>>> slightly different of lifetime of device links, and this is racy even if >>>>>> you do not want hotplugging. Moreover since drm_dev is not device (has >>>>>> no associated struct device) assuming we can reuse its parent to create >>>>>> device link results in circular dependencies. >>>>> How about having the device links created depending on whether the >>>>> main drm driver wants them or not - that would mean that Exynos >>>>> could continue avoiding them, but others that want them can have >>>>> the links? >>>> That should be OK for Exynos. But regardless of Exynos device_links at >>>> the current state will not work correctly with bridges/panels as I >>>> described earlier. >>> However, I'm not sure you're correct with your interpretation of the >>> documentation. Firstly, the documentation says: >>> >>> Another example for an inconsistent state would be a device link that >>> represents a driver presence dependency, yet is added from the consumer's >>> ->probe callback while the supplier hasn't probed yet: Had the driver core >>> known about the device link earlier, it wouldn't have probed the consumer >>> in the first place. The onus is thus on the consumer to check presence of >>> the supplier after adding the link, and defer probing on non-presence. >>> >>> This is fine - if we add the device link from of_drm_find_bridge(), we >>> will be in the consumer's ->probe function, and the supplier must have >>> been probed for us to find the struct drm_bridge. >> >> Supplier usually is registered in it's probe time, so there is short >> period of time when supplier is available, but the probe is not yet >> finished - quite unsafe, but not impossible, especially if there exists >> some kind of notifications about resource appearance (MIPI-DSI case). > At some point during the supplier probe, the resource becomes available > to consumers. At that point, device links can be setup before the > supplier has finished probing. So any driver that provides resources > to another driver will, at some point during the provider's probe, > make resources available, and therefore be a candidate for device links > to be created _before_ the probe function has returned. > > What is a problem is if the provider publishes resources, and then fails > its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

That is not what we're discussing. We are discussing using device links to solve the problem with DRM bridges which may be removed today from DRM without the rest of the DRM system being aware.

Yeah, we want device links not just for the suspend/resume ordering, but also as full dependencies. Atm that's solved with component (for the full generic case), but that needs more hand-rolled code. device_links would be much easier to integrate into all these subsystems (and you really want to unload the drm driver if clocks/transcoders/whatever disappears). For most drivers there's really no difference between the runtime dependencies for suspend/resume and the load/unload dependencies. Having two use 2 completely different solutions for the common case where they match exactly seems suboptimal. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Rafael J. Wysocki

2:40 p.m.

On Fri, Jan 11, 2019 at 3:36 PM Daniel Vetter daniel@ffwll.ch wrote:

...

On Fri, Jan 11, 2019 at 3:32 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Fri, Jan 11, 2019 at 03:26:44PM +0100, Rafael J. Wysocki wrote:

...
On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote: > > +CC: Rafael J. Wysocki rafael@kernel.org > > On 08.01.2019 16:07, Russell King - ARM Linux wrote: > > On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote: > >> On 08.01.2019 14:21, Russell King - ARM Linux wrote: > >>> On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: > >>>> On 08.01.2019 12:38, Russell King - ARM Linux wrote: > >>>>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: > >>>>>> Issues with device links have nothing to do with hotplugging, they are > >>>>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just > >>>>>> slightly different of lifetime of device links, and this is racy even if > >>>>>> you do not want hotplugging. Moreover since drm_dev is not device (has > >>>>>> no associated struct device) assuming we can reuse its parent to create > >>>>>> device link results in circular dependencies. > >>>>> How about having the device links created depending on whether the > >>>>> main drm driver wants them or not - that would mean that Exynos > >>>>> could continue avoiding them, but others that want them can have > >>>>> the links? > >>>> That should be OK for Exynos. But regardless of Exynos device_links at > >>>> the current state will not work correctly with bridges/panels as I > >>>> described earlier. > >>> However, I'm not sure you're correct with your interpretation of the > >>> documentation. Firstly, the documentation says: > >>> > >>> Another example for an inconsistent state would be a device link that > >>> represents a driver presence dependency, yet is added from the consumer's > >>> ->probe callback while the supplier hasn't probed yet: Had the driver core > >>> known about the device link earlier, it wouldn't have probed the consumer > >>> in the first place. The onus is thus on the consumer to check presence of > >>> the supplier after adding the link, and defer probing on non-presence. > >>> > >>> This is fine - if we add the device link from of_drm_find_bridge(), we > >>> will be in the consumer's ->probe function, and the supplier must have > >>> been probed for us to find the struct drm_bridge. > >> > >> Supplier usually is registered in it's probe time, so there is short > >> period of time when supplier is available, but the probe is not yet > >> finished - quite unsafe, but not impossible, especially if there exists > >> some kind of notifications about resource appearance (MIPI-DSI case). > > At some point during the supplier probe, the resource becomes available > > to consumers. At that point, device links can be setup before the > > supplier has finished probing. So any driver that provides resources > > to another driver will, at some point during the provider's probe, > > make resources available, and therefore be a candidate for device links > > to be created _before_ the probe function has returned. > > > > What is a problem is if the provider publishes resources, and then fails > > its probe function, causing the resource to disappear. > > > But creating link to not-probed provider is still incorrect usage from > device_links framework PoV, and my tests showed indeed that device link > created before end of provider's probe does not work at all - and since > it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

That is not what we're discussing. We are discussing using device links to solve the problem with DRM bridges which may be removed today from DRM without the rest of the DRM system being aware.

Yeah, we want device links not just for the suspend/resume ordering, but also as full dependencies. Atm that's solved with component (for the full generic case), but that needs more hand-rolled code. device_links would be much easier to integrate into all these subsystems (and you really want to unload the drm driver if clocks/transcoders/whatever disappears). For most drivers there's really no difference between the runtime dependencies for suspend/resume and the load/unload dependencies. Having two use 2 completely different solutions for the common case where they match exactly seems suboptimal.

Well, the state tracking in device links needs the initial state to be supplier driver present and functional. If you need full dependencies without that, device links are probably not for you.

Rafael J. Wysocki

2:36 p.m.

On Fri, Jan 11, 2019 at 3:32 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Fri, Jan 11, 2019 at 03:26:44PM +0100, Rafael J. Wysocki wrote:

...
On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote:

...
+CC: Rafael J. Wysocki rafael@kernel.org

On 08.01.2019 16:07, Russell King - ARM Linux wrote: > On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote: >> On 08.01.2019 14:21, Russell King - ARM Linux wrote: >>> On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: >>>> On 08.01.2019 12:38, Russell King - ARM Linux wrote: >>>>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: >>>>>> Issues with device links have nothing to do with hotplugging, they are >>>>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just >>>>>> slightly different of lifetime of device links, and this is racy even if >>>>>> you do not want hotplugging. Moreover since drm_dev is not device (has >>>>>> no associated struct device) assuming we can reuse its parent to create >>>>>> device link results in circular dependencies. >>>>> How about having the device links created depending on whether the >>>>> main drm driver wants them or not - that would mean that Exynos >>>>> could continue avoiding them, but others that want them can have >>>>> the links? >>>> That should be OK for Exynos. But regardless of Exynos device_links at >>>> the current state will not work correctly with bridges/panels as I >>>> described earlier. >>> However, I'm not sure you're correct with your interpretation of the >>> documentation. Firstly, the documentation says: >>> >>> Another example for an inconsistent state would be a device link that >>> represents a driver presence dependency, yet is added from the consumer's >>> ->probe callback while the supplier hasn't probed yet: Had the driver core >>> known about the device link earlier, it wouldn't have probed the consumer >>> in the first place. The onus is thus on the consumer to check presence of >>> the supplier after adding the link, and defer probing on non-presence. >>> >>> This is fine - if we add the device link from of_drm_find_bridge(), we >>> will be in the consumer's ->probe function, and the supplier must have >>> been probed for us to find the struct drm_bridge. >> >> Supplier usually is registered in it's probe time, so there is short >> period of time when supplier is available, but the probe is not yet >> finished - quite unsafe, but not impossible, especially if there exists >> some kind of notifications about resource appearance (MIPI-DSI case). > At some point during the supplier probe, the resource becomes available > to consumers. At that point, device links can be setup before the > supplier has finished probing. So any driver that provides resources > to another driver will, at some point during the provider's probe, > make resources available, and therefore be a candidate for device links > to be created _before_ the probe function has returned. > > What is a problem is if the provider publishes resources, and then fails > its probe function, causing the resource to disappear.

But creating link to not-probed provider is still incorrect usage from device_links framework PoV, and my tests showed indeed that device link created before end of provider's probe does not work at all - and since it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

That is not what we're discussing. We are discussing using device links to solve the problem with DRM bridges which may be removed today from DRM without the rest of the DRM system being aware.

Still, I think, the DL_FLAG_STATELESS flag should work for you as it turns the state tracking off entirely.

Russell King - ARM Linux

2:49 p.m.

On Fri, Jan 11, 2019 at 03:36:28PM +0100, Rafael J. Wysocki wrote:

...

On Fri, Jan 11, 2019 at 3:32 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Fri, Jan 11, 2019 at 03:26:44PM +0100, Rafael J. Wysocki wrote:

...
On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote:

...
On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote: > > +CC: Rafael J. Wysocki rafael@kernel.org > > On 08.01.2019 16:07, Russell King - ARM Linux wrote: > > On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote: > >> On 08.01.2019 14:21, Russell King - ARM Linux wrote: > >>> On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: > >>>> On 08.01.2019 12:38, Russell King - ARM Linux wrote: > >>>>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: > >>>>>> Issues with device links have nothing to do with hotplugging, they are > >>>>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just > >>>>>> slightly different of lifetime of device links, and this is racy even if > >>>>>> you do not want hotplugging. Moreover since drm_dev is not device (has > >>>>>> no associated struct device) assuming we can reuse its parent to create > >>>>>> device link results in circular dependencies. > >>>>> How about having the device links created depending on whether the > >>>>> main drm driver wants them or not - that would mean that Exynos > >>>>> could continue avoiding them, but others that want them can have > >>>>> the links? > >>>> That should be OK for Exynos. But regardless of Exynos device_links at > >>>> the current state will not work correctly with bridges/panels as I > >>>> described earlier. > >>> However, I'm not sure you're correct with your interpretation of the > >>> documentation. Firstly, the documentation says: > >>> > >>> Another example for an inconsistent state would be a device link that > >>> represents a driver presence dependency, yet is added from the consumer's > >>> ->probe callback while the supplier hasn't probed yet: Had the driver core > >>> known about the device link earlier, it wouldn't have probed the consumer > >>> in the first place. The onus is thus on the consumer to check presence of > >>> the supplier after adding the link, and defer probing on non-presence. > >>> > >>> This is fine - if we add the device link from of_drm_find_bridge(), we > >>> will be in the consumer's ->probe function, and the supplier must have > >>> been probed for us to find the struct drm_bridge. > >> > >> Supplier usually is registered in it's probe time, so there is short > >> period of time when supplier is available, but the probe is not yet > >> finished - quite unsafe, but not impossible, especially if there exists > >> some kind of notifications about resource appearance (MIPI-DSI case). > > At some point during the supplier probe, the resource becomes available > > to consumers. At that point, device links can be setup before the > > supplier has finished probing. So any driver that provides resources > > to another driver will, at some point during the provider's probe, > > make resources available, and therefore be a candidate for device links > > to be created _before_ the probe function has returned. > > > > What is a problem is if the provider publishes resources, and then fails > > its probe function, causing the resource to disappear. > > > But creating link to not-probed provider is still incorrect usage from > device_links framework PoV, and my tests showed indeed that device link > created before end of provider's probe does not work at all - and since > it is stated in the documentation I guess it is by design.

Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

That is not what we're discussing. We are discussing using device links to solve the problem with DRM bridges which may be removed today from DRM without the rest of the DRM system being aware.

Still, I think, the DL_FLAG_STATELESS flag should work for you as it turns the state tracking off entirely.

Let me try again.

This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It seems that stateful device links can only be created by bus layer code, which limits their usefulness - in the case we have here, DT doesn't always know ahead of time about these links.

It sounds from what you're saying that you don't want to entertain any changes to device links - in which case, DRM can't use them to solve this problem, even though they would be an elegant solution. So, I think we're going to have to find a way to retrofit component stuff into DRM in a way that makes it optional for any DRM bridge consumer.

I hope this makes the discussion and what we're trying to achieve a little clearer.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Rafael J. Wysocki

14 Jan 14 Jan

12:32 p.m.

On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Fri, Jan 11, 2019 at 03:36:28PM +0100, Rafael J. Wysocki wrote:

...
On Fri, Jan 11, 2019 at 3:32 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Fri, Jan 11, 2019 at 03:26:44PM +0100, Rafael J. Wysocki wrote:

...
On Fri, Jan 11, 2019 at 3:20 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, Jan 9, 2019 at 10:31 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Wed, Jan 09, 2019 at 10:24:01AM +0100, Rafael J. Wysocki wrote: > On Wed, Jan 9, 2019 at 10:13 AM Andrzej Hajda a.hajda@samsung.com wrote: > > > > +CC: Rafael J. Wysocki rafael@kernel.org > > > > On 08.01.2019 16:07, Russell King - ARM Linux wrote: > > > On Tue, Jan 08, 2019 at 03:33:54PM +0100, Andrzej Hajda wrote: > > >> On 08.01.2019 14:21, Russell King - ARM Linux wrote: > > >>> On Tue, Jan 08, 2019 at 01:27:56PM +0100, Andrzej Hajda wrote: > > >>>> On 08.01.2019 12:38, Russell King - ARM Linux wrote: > > >>>>> On Tue, Jan 08, 2019 at 12:25:34PM +0100, Andrzej Hajda wrote: > > >>>>>> Issues with device links have nothing to do with hotplugging, they are > > >>>>>> generic - lifetime of the objects (drm_bridge, drm_panel) is just > > >>>>>> slightly different of lifetime of device links, and this is racy even if > > >>>>>> you do not want hotplugging. Moreover since drm_dev is not device (has > > >>>>>> no associated struct device) assuming we can reuse its parent to create > > >>>>>> device link results in circular dependencies. > > >>>>> How about having the device links created depending on whether the > > >>>>> main drm driver wants them or not - that would mean that Exynos > > >>>>> could continue avoiding them, but others that want them can have > > >>>>> the links? > > >>>> That should be OK for Exynos. But regardless of Exynos device_links at > > >>>> the current state will not work correctly with bridges/panels as I > > >>>> described earlier. > > >>> However, I'm not sure you're correct with your interpretation of the > > >>> documentation. Firstly, the documentation says: > > >>> > > >>> Another example for an inconsistent state would be a device link that > > >>> represents a driver presence dependency, yet is added from the consumer's > > >>> ->probe callback while the supplier hasn't probed yet: Had the driver core > > >>> known about the device link earlier, it wouldn't have probed the consumer > > >>> in the first place. The onus is thus on the consumer to check presence of > > >>> the supplier after adding the link, and defer probing on non-presence. > > >>> > > >>> This is fine - if we add the device link from of_drm_find_bridge(), we > > >>> will be in the consumer's ->probe function, and the supplier must have > > >>> been probed for us to find the struct drm_bridge. > > >> > > >> Supplier usually is registered in it's probe time, so there is short > > >> period of time when supplier is available, but the probe is not yet > > >> finished - quite unsafe, but not impossible, especially if there exists > > >> some kind of notifications about resource appearance (MIPI-DSI case). > > > At some point during the supplier probe, the resource becomes available > > > to consumers. At that point, device links can be setup before the > > > supplier has finished probing. So any driver that provides resources > > > to another driver will, at some point during the provider's probe, > > > make resources available, and therefore be a candidate for device links > > > to be created _before_ the probe function has returned. > > > > > > What is a problem is if the provider publishes resources, and then fails > > > its probe function, causing the resource to disappear. > > > > > > But creating link to not-probed provider is still incorrect usage from > > device_links framework PoV, and my tests showed indeed that device link > > created before end of provider's probe does not work at all - and since > > it is stated in the documentation I guess it is by design. > > Yes, it is.

So is the regulator support and the use of it being proposed for the CCF all going against the design of device links? In both those cases, device links _can_ be created while the supplier is still in the probe function by a consumer finding the resource.

This seems fragile by design.

Rafael, can you confirm?

Let me quote from https://www.kernel.org/doc/html/latest/driver-api/device_link.html :

"When driver presence on the supplier is irrelevant and only correct suspend/resume and shutdown ordering is needed, the device link may simply be set up with the DL_FLAG_STATELESS flag. In other words, enforcing driver presence on the supplier is optional."

Which is exactly the case at hand here AFAICS.

That is not what we're discussing. We are discussing using device links to solve the problem with DRM bridges which may be removed today from DRM without the rest of the DRM system being aware.

Still, I think, the DL_FLAG_STATELESS flag should work for you as it turns the state tracking off entirely.

Let me try again.

This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).

Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.

If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.

Moreover, there seems to be is an additional complication here, which is that the supplier probe may finish and the status update for its links may run before the extra link is actually created by the consumer probe. Or is there any reason why that cannot happen in this particular case?

...

It seems that stateful device links can only be created by bus layer code, which limits their usefulness - in the case we have here, DT doesn't always know ahead of time about these links.

That's fair enough.

...

It sounds from what you're saying that you don't want to entertain any changes to device links

Why not?

If they can be made work for this use case, they would become more useful in general I suppose, but I'm just not sure how to do that ATM.

...

in which case, DRM can't use them to

solve this problem, even though they would be an elegant solution. So, I think we're going to have to find a way to retrofit component stuff into DRM in a way that makes it optional for any DRM bridge consumer.

I hope this makes the discussion and what we're trying to achieve a little clearer.

Yes, it does, thank you!

Cheers, Rafael

Rafael J. Wysocki

15 Jan 15 Jan

12:04 a.m.

[CC Lukas and linux-pm]

On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki rafael@kernel.org wrote:

...

On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

[cut]

...

...
This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).

Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.

If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.

So if I'm not mistaken it can be made work with the help of the (completely untested) attached patch (of course, the documentation would need to be updated too).

The key observation here is that it should be fine to create a link from the consumer driver's probe while the supplier is still probing if the consumer has some way to find out that the supplier is functional at that point (like when it has published itself already in your example above). The initial state of the link can be "consumer probe" in that case and I don't see a reason why that might not work.

Rafael J. Wysocki

10:47 p.m.

[CC Greg]

On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote:

...

[CC Lukas and linux-pm]

On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

[cut]

...
...
This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).

Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.

If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.

So if I'm not mistaken it can be made work with the help of the (completely untested) attached patch (of course, the documentation would need to be updated too).

The key observation here is that it should be fine to create a link from the consumer driver's probe while the supplier is still probing if the consumer has some way to find out that the supplier is functional at that point (like when it has published itself already in your example above). The initial state of the link can be "consumer probe" in that case and I don't see a reason why that might not work.

Below is a more complete patch that should take all of the corner cases into account unless I have missed anything. Testing it would be appreciated. :-)

--- From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers

Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.

To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.

With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.

However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".

Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]

Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com --- Documentation/driver-api/device_link.rst | 10 ++-- drivers/base/core.c | 74 +++++++++++++++++++++++++++---- 2 files changed, 73 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/base/core.c =================================================================== --- linux-pm.orig/drivers/base/core.c +++ linux-pm/drivers/base/core.c @@ -260,17 +260,26 @@ struct device_link *device_link_add(stru link->status = DL_STATE_NONE; } else { switch (supplier->links.status) { - case DL_DEV_DRIVER_BOUND: + case DL_DEV_PROBING: switch (consumer->links.status) { case DL_DEV_PROBING: /* - * Some callers expect the link creation during - * consumer driver probe to resume the supplier - * even without DL_FLAG_RPM_ACTIVE. + * A consumer driver can create a link to a + * supplier that has not completed its probing + * yet as long as it knows that the supplier is + * already functional (for example, it has just + * acquired some resources from the supplier). */ - if (flags & DL_FLAG_PM_RUNTIME) - pm_runtime_resume(supplier); - + link->status = DL_STATE_CONSUMER_PROBE; + break; + default: + link->status = DL_STATE_DORMANT; + break; + } + break; + case DL_DEV_DRIVER_BOUND: + switch (consumer->links.status) { + case DL_DEV_PROBING: link->status = DL_STATE_CONSUMER_PROBE; break; case DL_DEV_DRIVER_BOUND: @@ -291,6 +300,14 @@ struct device_link *device_link_add(stru }

/* + * Some callers expect the link creation during consumer driver probe to + * resume the supplier even without DL_FLAG_RPM_ACTIVE. + */ + if (link->status == DL_STATE_CONSUMER_PROBE && + flags & DL_FLAG_PM_RUNTIME) + pm_runtime_resume(supplier); + + /* * Move the consumer and all of the devices depending on it to the end * of dpm_list and the devices_kset list. * @@ -474,6 +491,16 @@ void device_links_driver_bound(struct de if (link->flags & DL_FLAG_STATELESS) continue;

+ /* + * Links created during consumer probe may be in the "consumer + * probe" state to start with if the supplier is still probing + * when they are created and they may become "active" if the + * consumer probe returns first. Skip them here. + */ + if (link->status == DL_STATE_CONSUMER_PROBE || + link->status == DL_STATE_ACTIVE) + continue; + WARN_ON(link->status != DL_STATE_DORMANT); WRITE_ONCE(link->status, DL_STATE_AVAILABLE); } @@ -513,17 +540,48 @@ static void __device_links_no_driver(str

if (link->flags & DL_FLAG_AUTOREMOVE_CONSUMER) kref_put(&link->kref, __device_link_del); - else if (link->status != DL_STATE_SUPPLIER_UNBIND) + else if (link->status == DL_STATE_CONSUMER_PROBE || + link->status == DL_STATE_ACTIVE) WRITE_ONCE(link->status, DL_STATE_AVAILABLE); }

dev->links.status = DL_DEV_NO_DRIVER; }

+/** + * device_links_no_driver - Update links after failing driver probe. + * @dev: Device whose driver has just failed to probe. + * + * Clean up leftover links to consumers for @dev and invoke + * %__device_links_no_driver() to update links to suppliers for it as + * appropriate. + * + * Links with the DL_FLAG_STATELESS flag set are ignored. + */ void device_links_no_driver(struct device *dev) { + struct device_link *link; + device_links_write_lock(); + + list_for_each_entry(link, &dev->links.consumers, s_node) { + if (link->flags & DL_FLAG_STATELESS) + continue; + + /* + * The probe has failed, so if the status of the link is + * "consumer probe" or "active", it must have been added by + * a probing consumer while this device was still probing. + * Change its state to "dormant", as it represents a valid + * relationship, but it is not functionally meaningful. + */ + if (link->status == DL_STATE_CONSUMER_PROBE || + link->status == DL_STATE_ACTIVE) + WRITE_ONCE(link->status, DL_STATE_DORMANT); + } + __device_links_no_driver(dev); + device_links_write_unlock(); }

Index: linux-pm/Documentation/driver-api/device_link.rst =================================================================== --- linux-pm.orig/Documentation/driver-api/device_link.rst +++ linux-pm/Documentation/driver-api/device_link.rst @@ -59,11 +59,15 @@ device ``->probe`` callback or a boot-ti

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's -``->probe`` callback while the supplier hasn't probed yet: Had the driver -core known about the device link earlier, it wouldn't have probed the +``->probe`` callback while the supplier hasn't started to probe yet: Had the +driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on -non-presence. +non-presence. [Note that it is valid to create a link from the consumer's +``->probe`` callback while the supplier is still probing, but the consumer must +know that the supplier is functional already at the link creation time (that is +the case, for instance, if the consumer has just acquired some resources that +would not have been available had the supplier not been functional then).]

If a device link is added in the ``->probe`` callback of the supplier or consumer driver, it is typically deleted in its ``->remove`` callback for

Daniel Vetter

16 Jan 16 Jan

6:42 p.m.

On Tue, Jan 15, 2019 at 11:47 PM Rafael J. Wysocki rjw@rjwysocki.net wrote:

...

[CC Greg]

On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote:

...
[CC Lukas and linux-pm]

On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

[cut]

...
...
This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).

Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.

If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.

So if I'm not mistaken it can be made work with the help of the (completely untested) attached patch (of course, the documentation would need to be updated too).

The key observation here is that it should be fine to create a link from the consumer driver's probe while the supplier is still probing if the consumer has some way to find out that the supplier is functional at that point (like when it has published itself already in your example above). The initial state of the link can be "consumer probe" in that case and I don't see a reason why that might not work.

Below is a more complete patch that should take all of the corner cases into account unless I have missed anything. Testing it would be appreciated. :-)

From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers

Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.

To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.

With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.

However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".

Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]

Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com

Pulling in a bunch of drm_bridge and drm_panel people. See huge discussion upthread, this here is Rafael's idea for making device links more useful. Would be great if someone could test this out with panel/bridge dependencies.

I guess this here isn't yet solving the reprobe issue where the provider was unloaded and reloaded (which should cause the consumer to reprobe too, with the EPROBE_DEFERRED logic). I think that was the other issue we've hit. -Daniel

...

Documentation/driver-api/device_link.rst | 10 ++-- drivers/base/core.c | 74 +++++++++++++++++++++++++++---- 2 files changed, 73 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/base/core.c

--- linux-pm.orig/drivers/base/core.c +++ linux-pm/drivers/base/core.c @@ -260,17 +260,26 @@ struct device_link *device_link_add(stru link->status = DL_STATE_NONE; } else { switch (supplier->links.status) {
          case DL_DEV_DRIVER_BOUND:
          case DL_DEV_PROBING:
                  switch (consumer->links.status) {
                  case DL_DEV_PROBING:
                          /*
                           * Some callers expect the link creation during
                           * consumer driver probe to resume the supplier
                           * even without DL_FLAG_RPM_ACTIVE.
                           * A consumer driver can create a link to a
                           * supplier that has not completed its probing
                           * yet as long as it knows that the supplier is
                           * already functional (for example, it has just
                           * acquired some resources from the supplier).
                           */
                          if (flags & DL_FLAG_PM_RUNTIME)
                                  pm_runtime_resume(supplier);
                          link->status = DL_STATE_CONSUMER_PROBE;
                          break;
                  default:
                          link->status = DL_STATE_DORMANT;
                          break;
                  }
                  break;
          case DL_DEV_DRIVER_BOUND:
                  switch (consumer->links.status) {
                  case DL_DEV_PROBING:
                          link->status = DL_STATE_CONSUMER_PROBE;
                          break;
                  case DL_DEV_DRIVER_BOUND:
@@ -291,6 +300,14 @@ struct device_link *device_link_add(stru }
    /*
   * Some callers expect the link creation during consumer driver probe to
   * resume the supplier even without DL_FLAG_RPM_ACTIVE.
   */
  if (link->status == DL_STATE_CONSUMER_PROBE &&
      flags & DL_FLAG_PM_RUNTIME)
          pm_runtime_resume(supplier);
  /*
   * Move the consumer and all of the devices depending on it to the end
   * of dpm_list and the devices_kset list.
   *
@@ -474,6 +491,16 @@ void device_links_driver_bound(struct de if (link->flags & DL_FLAG_STATELESS) continue;
          /*
           * Links created during consumer probe may be in the "consumer
           * probe" state to start with if the supplier is still probing
           * when they are created and they may become "active" if the
           * consumer probe returns first.  Skip them here.
           */
          if (link->status == DL_STATE_CONSUMER_PROBE ||
              link->status == DL_STATE_ACTIVE)
                  continue;
          WARN_ON(link->status != DL_STATE_DORMANT);
          WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
  }
@@ -513,17 +540,48 @@ static void __device_links_no_driver(str
            if (link->flags & DL_FLAG_AUTOREMOVE_CONSUMER)
                    kref_put(&link->kref, __device_link_del);
          else if (link->status != DL_STATE_SUPPLIER_UNBIND)
          else if (link->status == DL_STATE_CONSUMER_PROBE ||
                   link->status == DL_STATE_ACTIVE)
                  WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
  }

  dev->links.status = DL_DEV_NO_DRIVER;
}

+/**

device_links_no_driver - Update links after failing driver probe.

@dev: Device whose driver has just failed to probe.

Clean up leftover links to consumers for @dev and invoke

%__device_links_no_driver() to update links to suppliers for it as

appropriate.

Links with the DL_FLAG_STATELESS flag set are ignored.

*/

void device_links_no_driver(struct device *dev) {
  struct device_link *link;
  device_links_write_lock();
  list_for_each_entry(link, &dev->links.consumers, s_node) {
          if (link->flags & DL_FLAG_STATELESS)
                  continue;
          /*
           * The probe has failed, so if the status of the link is
           * "consumer probe" or "active", it must have been added by
           * a probing consumer while this device was still probing.
           * Change its state to "dormant", as it represents a valid
           * relationship, but it is not functionally meaningful.
           */
          if (link->status == DL_STATE_CONSUMER_PROBE ||
              link->status == DL_STATE_ACTIVE)
                  WRITE_ONCE(link->status, DL_STATE_DORMANT);
  }
  __device_links_no_driver(dev);
  device_links_write_unlock();
}

Index: linux-pm/Documentation/driver-api/device_link.rst

--- linux-pm.orig/Documentation/driver-api/device_link.rst +++ linux-pm/Documentation/driver-api/device_link.rst @@ -59,11 +59,15 @@ device ``->probe`` callback or a boot-ti

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's -``->probe`` callback while the supplier hasn't probed yet: Had the driver -core known about the device link earlier, it wouldn't have probed the +``->probe`` callback while the supplier hasn't started to probe yet: Had the +driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on -non-presence. +non-presence. [Note that it is valid to create a link from the consumer's +``->probe`` callback while the supplier is still probing, but the consumer must +know that the supplier is functional already at the link creation time (that is +the case, for instance, if the consumer has just acquired some resources that +would not have been available had the supplier not been functional then).]

If a device link is added in the ``->probe`` callback of the supplier or consumer driver, it is typically deleted in its ``->remove`` callback for

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Rafael J. Wysocki

10:43 p.m.

On Wednesday, January 16, 2019 7:42:45 PM CET Daniel Vetter wrote:

...

On Tue, Jan 15, 2019 at 11:47 PM Rafael J. Wysocki rjw@rjwysocki.net wrote:

...
[CC Greg]

On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote:

...
[CC Lukas and linux-pm]

On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

[cut]

...
...
This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).

Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.

If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.

So if I'm not mistaken it can be made work with the help of the (completely untested) attached patch (of course, the documentation would need to be updated too).

The key observation here is that it should be fine to create a link from the consumer driver's probe while the supplier is still probing if the consumer has some way to find out that the supplier is functional at that point (like when it has published itself already in your example above). The initial state of the link can be "consumer probe" in that case and I don't see a reason why that might not work.

Below is a more complete patch that should take all of the corner cases into account unless I have missed anything. Testing it would be appreciated. :-)

From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers

Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.

To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.

With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.

However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".

Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]

Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com

Pulling in a bunch of drm_bridge and drm_panel people. See huge discussion upthread, this here is Rafael's idea for making device links more useful. Would be great if someone could test this out with panel/bridge dependencies.

I guess this here isn't yet solving the reprobe issue where the provider was unloaded and reloaded (which should cause the consumer to reprobe too, with the EPROBE_DEFERRED logic). I think that was the other issue we've hit.

I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

Cheers, Rafael

Daniel Vetter

17 Jan 17 Jan

12:20 p.m.

On Wed, Jan 16, 2019 at 11:44 PM Rafael J. Wysocki rjw@rjwysocki.net wrote:

...

On Wednesday, January 16, 2019 7:42:45 PM CET Daniel Vetter wrote:

...
On Tue, Jan 15, 2019 at 11:47 PM Rafael J. Wysocki rjw@rjwysocki.net wrote:

...
[CC Greg]

On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote:

...
[CC Lukas and linux-pm]

On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

[cut]

...
...
This thread is discussing how to deal with Armada DRM, its use of the component helper, TDA998x's hybrid use of the component helper and DRM bridges.

Currently, DRM bridges register into the DRM system and are added to a global list of bridges. When a DRM driver binds, it looks up the DRM bridge and attaches to it. There is no way in generic DRM code for the DRM driver to know if the DRM bridge is unbound from DRM, consequently a DRM driver may continue trying to call functions in the DRM bridge driver using memory that has already been freed.

We had similar issues with imx-drm, and a number of years ago at a kernel summit, this was discussed, and I proposed a system which is now known as the component helper. This handles the problem of a multi-component system.

However, DRM bridge was already established, and there appears to be no desire to convert DRM bridges and DRM drivers to use the component helper.

We are presently in the situation where Armada DRM is incompatible with the DRM bridges way of doing things, and making it compatible essentially means introducing potential use-after-free bugs into the code.

Device links in their stateful form appear to provide an alternative to the component helper, in that a stateful device link will remove consumers of a resource when the supplier is going away - which is exactly the problem which the component helper is solving. The difference is that device links look like being a cleaner solution.

Just like the component helper, a stateful link would unbind the consumer of a resource when the supplier goes away - which is exactly the behaviour we're wanting.

The problem is that the connection between various drivers is only really known when the drivers obtain their resources, and the following can happen:

supplier consumer

probe() alloc probe() publish obtain supplier's resource return

Where things go wrong is if a stateful link is created when the consumer obtains the suppliers resource before the supplier has finished probing - which from what's been said is illegal.

It just doesn't work (which means "invalid" rather than "illegal" I guess, but whatever :-)).

Admittedly, the original design didn't take this particular case into account and I'm not actually sure how it can be taken into account either.

If the link is created by the consumer in the scenario above, its status will be updated twice in a row after the consumer probe returns and after the supplier probe returns. It looks like this update would need to work regardless of the order it which this happened which sounds somewhat challenging. I would need to think about that.

So if I'm not mistaken it can be made work with the help of the (completely untested) attached patch (of course, the documentation would need to be updated too).

The key observation here is that it should be fine to create a link from the consumer driver's probe while the supplier is still probing if the consumer has some way to find out that the supplier is functional at that point (like when it has published itself already in your example above). The initial state of the link can be "consumer probe" in that case and I don't see a reason why that might not work.

Below is a more complete patch that should take all of the corner cases into account unless I have missed anything. Testing it would be appreciated. :-)

From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers

Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.

To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.

With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.

However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".

Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]

Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com

Pulling in a bunch of drm_bridge and drm_panel people. See huge discussion upthread, this here is Rafael's idea for making device links more useful. Would be great if someone could test this out with panel/bridge dependencies.

I guess this here isn't yet solving the reprobe issue where the provider was unloaded and reloaded (which should cause the consumer to reprobe too, with the EPROBE_DEFERRED logic). I think that was the other issue we've hit.

I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

- componentized driver (both producer and consumer) fully loaded&working - you unbind the producer/one of the components - component framework or driver core through device_link also unbinds the master/consumer - you reload/rebind the component - with the component framework this results in the master being rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

Thanks, Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Lucas Stach

18 Jan 18 Jan

9:36 a.m.

Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...]

...

...
I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

componentized driver (both producer and consumer) fully loaded&working

you unbind the producer/one of the components

component framework or driver core through device_link also unbinds

the master/consumer

you reload/rebind the component

with the component framework this results in the master being

rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

I can repost if this is finally deemed a good idea.

Regards, Lucas

Rafael J. Wysocki

10:03 a.m.

On Fri, Jan 18, 2019 at 10:37 AM Lucas Stach l.stach@pengutronix.de wrote:

...

Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...]

...
...
I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

componentized driver (both producer and consumer) fully loaded&working

you unbind the producer/one of the components

component framework or driver core through device_link also unbinds

the master/consumer

you reload/rebind the component

with the component framework this results in the master being

rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

I can repost if this is finally deemed a good idea.

I don't think this will help.

Two things appear to be needed here: missing suppliers need to be probed automatically at a consumer probe time and "persistent" links created by ->probe() callbacks should not be reference-counted when those callbacks run again. I'm going to cut a patch to do that (early next week if all goes well).

Daniel Vetter

11:06 a.m.

On Fri, Jan 18, 2019 at 11:03 AM Rafael J. Wysocki rafael@kernel.org wrote:

...

On Fri, Jan 18, 2019 at 10:37 AM Lucas Stach l.stach@pengutronix.de wrote:

...
Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...]

...
...
I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

componentized driver (both producer and consumer) fully loaded&working

you unbind the producer/one of the components

component framework or driver core through device_link also unbinds

the master/consumer

you reload/rebind the component

with the component framework this results in the master being

rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

Thanks for digging it out, those are the patches I had in mind.

...

...
I can repost if this is finally deemed a good idea.

I don't think this will help.

Two things appear to be needed here: missing suppliers need to be probed automatically at a consumer probe time and "persistent" links created by ->probe() callbacks should not be reference-counted when those callbacks run again. I'm going to cut a patch to do that (early next week if all goes well).

I thought this should work, since it makes an unbind of a consumer act the same way as if that consumer's bind function has hit an EPROBE_DEFER. Example:

1. consumer's bind is called, realizes (through a subsystem function like of_drm_find_panel) that the producer isn't there yet, fails with EPROBE DEFER.

2. driver core puts the consumer onto the deferred probe list.

3. producer gets loaded, registers the panel

4. driver core re-runs the consumer's bind function

5. consumer's bind function finds the produced and sets up the device link.

Now for the unbind case:

1. producer is unbound, driver core unbinds consumer due to the device_link

2. (Only with Lucas' patch) driver core puts the consumer onto the deferred probe list.

3. Developer (this really is useful for development) rebinds/reloads producer driver, which re-registers the panel

4 & 5 work exactly the same as with the initial load sequence.

With this initial loading and reloading would be very similar, and I think that's a good thing. It also solves the device_link lifetime/refcounting issue, since the device_link gets torn down in/restored completely, and reloading isn't a special case like in your proposed solution - there's no device_link left over from a previous loading of the driver, the dependency is only handled by putting the consumer (back) onto the deferred probe list.

Cheers, Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Rafael J. Wysocki

11:17 a.m.

On Fri, Jan 18, 2019 at 12:06 PM Daniel Vetter daniel@ffwll.ch wrote:

...

On Fri, Jan 18, 2019 at 11:03 AM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 10:37 AM Lucas Stach l.stach@pengutronix.de wrote:

...
Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...]

...
...
I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

componentized driver (both producer and consumer) fully loaded&working

you unbind the producer/one of the components

component framework or driver core through device_link also unbinds

the master/consumer

you reload/rebind the component

with the component framework this results in the master being

rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

Thanks for digging it out, those are the patches I had in mind.

...
...
I can repost if this is finally deemed a good idea.

I don't think this will help.

Two things appear to be needed here: missing suppliers need to be probed automatically at a consumer probe time and "persistent" links created by ->probe() callbacks should not be reference-counted when those callbacks run again. I'm going to cut a patch to do that (early next week if all goes well).

I thought this should work, since it makes an unbind of a consumer act the same way as if that consumer's bind function has hit an EPROBE_DEFER. Example:

consumer's bind is called, realizes (through a subsystem function

like of_drm_find_panel) that the producer isn't there yet, fails with EPROBE DEFER.

driver core puts the consumer onto the deferred probe list.

producer gets loaded, registers the panel

driver core re-runs the consumer's bind function

consumer's bind function finds the produced and sets up the device link.

Now for the unbind case:

producer is unbound, driver core unbinds consumer due to the device_link

(Only with Lucas' patch) driver core puts the consumer onto the

deferred probe list.

Developer (this really is useful for development) rebinds/reloads

producer driver, which re-registers the panel

4 & 5 work exactly the same as with the initial load sequence.

I misunderstood the Russell's description, sorry.

My understanding was that the consumer driver would be rebound at this point and that the (missing) producer was expected to rebind as well in response.

...

With this initial loading and reloading would be very similar, and I think that's a good thing. It also solves the device_link lifetime/refcounting issue, since the device_link gets torn down in/restored completely, and reloading isn't a special case like in your proposed solution - there's no device_link left over from a previous loading of the driver, the dependency is only handled by putting the consumer (back) onto the deferred probe list.

My idea was based on incorrect understanding of the problem, so scratch it. :-)

The Lukas' patch would indeed work here, but I'm not sure if doing that for all links unconditionally is a good idea.

I'd rather have a link flag to indicate that this behavior is desirable.

Rafael J. Wysocki

11:37 a.m.

On Fri, Jan 18, 2019 at 12:17 PM Rafael J. Wysocki rafael@kernel.org wrote:

...

On Fri, Jan 18, 2019 at 12:06 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Fri, Jan 18, 2019 at 11:03 AM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 10:37 AM Lucas Stach l.stach@pengutronix.de wrote:

...
Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...]

...
...
I don't think it is addressed here.

Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

componentized driver (both producer and consumer) fully loaded&working

you unbind the producer/one of the components

component framework or driver core through device_link also unbinds

the master/consumer

you reload/rebind the component

with the component framework this results in the master being

rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

Thanks for digging it out, those are the patches I had in mind.

...
...
I can repost if this is finally deemed a good idea.

I don't think this will help.

Two things appear to be needed here: missing suppliers need to be probed automatically at a consumer probe time and "persistent" links created by ->probe() callbacks should not be reference-counted when those callbacks run again. I'm going to cut a patch to do that (early next week if all goes well).

I thought this should work, since it makes an unbind of a consumer act the same way as if that consumer's bind function has hit an EPROBE_DEFER. Example:

consumer's bind is called, realizes (through a subsystem function

like of_drm_find_panel) that the producer isn't there yet, fails with EPROBE DEFER.

driver core puts the consumer onto the deferred probe list.

producer gets loaded, registers the panel

driver core re-runs the consumer's bind function

consumer's bind function finds the produced and sets up the device link.

Now for the unbind case:

producer is unbound, driver core unbinds consumer due to the device_link

(Only with Lucas' patch) driver core puts the consumer onto the

deferred probe list.

Developer (this really is useful for development) rebinds/reloads

producer driver, which re-registers the panel

4 & 5 work exactly the same as with the initial load sequence.

I misunderstood the Russell's description, sorry.

My understanding was that the consumer driver would be rebound at this point and that the (missing) producer was expected to rebind as well in response.

...
With this initial loading and reloading would be very similar, and I think that's a good thing. It also solves the device_link lifetime/refcounting issue, since the device_link gets torn down in/restored completely, and reloading isn't a special case like in your proposed solution - there's no device_link left over from a previous loading of the driver, the dependency is only handled by putting the consumer (back) onto the deferred probe list.

My idea was based on incorrect understanding of the problem, so scratch it. :-)

The Lukas' patch would indeed work here, but I'm not sure if doing that for all links unconditionally is a good idea.

I'd rather have a link flag to indicate that this behavior is desirable.

That said, the creation of a device link is a heavy-weight operation in general as it triggers a list reordering that may turn out to be recursive etc., so I'd rather avoid doing too much of that.

Moreover, creating a link between two devices once should be sufficient, so I'd prefer "persistent" links to be used in that case, but they need to be fixed to avoid the extra reference counting. That and a new link flag to indicate that the consumer should be probed automatically after binding the supplier driver.

Daniel Vetter

12:57 p.m.

On Fri, Jan 18, 2019 at 12:38 PM Rafael J. Wysocki rafael@kernel.org wrote:

...

On Fri, Jan 18, 2019 at 12:17 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 12:06 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Fri, Jan 18, 2019 at 11:03 AM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 10:37 AM Lucas Stach l.stach@pengutronix.de wrote:

...
Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...]

...
> I don't think it is addressed here. > > Can anyone please explain to me what happens in more detail?

My understanding (and this might be wrong, Russell, Andrzej, ... pls correct) is that the following sequence goes wrong:

componentized driver (both producer and consumer) fully loaded&working

you unbind the producer/one of the components

component framework or driver core through device_link also unbinds

the master/consumer

you reload/rebind the component

with the component framework this results in the master being

rebound, and the overall driver working again. With device_links nothing happens.

I think there was a patch floating around once to put drivers unbound due to device_links onto the deferred probe list, so that the next time something is bound they have a chance to rebind. But I can't find that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

Thanks for digging it out, those are the patches I had in mind.

...
...
I can repost if this is finally deemed a good idea.

I don't think this will help.

Two things appear to be needed here: missing suppliers need to be probed automatically at a consumer probe time and "persistent" links created by ->probe() callbacks should not be reference-counted when those callbacks run again. I'm going to cut a patch to do that (early next week if all goes well).

I thought this should work, since it makes an unbind of a consumer act the same way as if that consumer's bind function has hit an EPROBE_DEFER. Example:

consumer's bind is called, realizes (through a subsystem function

like of_drm_find_panel) that the producer isn't there yet, fails with EPROBE DEFER.

driver core puts the consumer onto the deferred probe list.

producer gets loaded, registers the panel

driver core re-runs the consumer's bind function

consumer's bind function finds the produced and sets up the device link.

Now for the unbind case:

producer is unbound, driver core unbinds consumer due to the device_link

(Only with Lucas' patch) driver core puts the consumer onto the

deferred probe list.

Developer (this really is useful for development) rebinds/reloads

producer driver, which re-registers the panel

4 & 5 work exactly the same as with the initial load sequence.

I misunderstood the Russell's description, sorry.

My understanding was that the consumer driver would be rebound at this point and that the (missing) producer was expected to rebind as well in response.

...
With this initial loading and reloading would be very similar, and I think that's a good thing. It also solves the device_link lifetime/refcounting issue, since the device_link gets torn down in/restored completely, and reloading isn't a special case like in your proposed solution - there's no device_link left over from a previous loading of the driver, the dependency is only handled by putting the consumer (back) onto the deferred probe list.

My idea was based on incorrect understanding of the problem, so scratch it. :-)

The Lukas' patch would indeed work here, but I'm not sure if doing that for all links unconditionally is a good idea.

I'd rather have a link flag to indicate that this behavior is desirable.

That said, the creation of a device link is a heavy-weight operation in general as it triggers a list reordering that may turn out to be recursive etc., so I'd rather avoid doing too much of that.

Moreover, creating a link between two devices once should be sufficient, so I'd prefer "persistent" links to be used in that case, but they need to be fixed to avoid the extra reference counting. That and a new link flag to indicate that the consumer should be probed automatically after binding the supplier driver.

Yeah if persistent links make more sense from an implementation pov then that's all fine, as long as it results in the same behaviour for from the pov of all the involved bind functions. So if the core handles all the refcount trickery, that's good.

Also an explicit flag sounds like a good idea, since for some other problems we want to make the device_link opt-in. There's otherwise some loops in some cases apparently. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Rafael J. Wysocki

24 Jan 24 Jan

11 a.m.

On Friday, January 18, 2019 1:57:47 PM CET Daniel Vetter wrote:

...

On Fri, Jan 18, 2019 at 12:38 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 12:17 PM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 12:06 PM Daniel Vetter daniel@ffwll.ch wrote:

...
On Fri, Jan 18, 2019 at 11:03 AM Rafael J. Wysocki rafael@kernel.org wrote:

...
On Fri, Jan 18, 2019 at 10:37 AM Lucas Stach l.stach@pengutronix.de wrote:

...
Am Donnerstag, den 17.01.2019, 13:20 +0100 schrieb Daniel Vetter: [...] > > > I don't think it is addressed here. > > > > Can anyone please explain to me what happens in more detail? > > My understanding (and this might be wrong, Russell, Andrzej, ... pls > correct) is that the following sequence goes wrong: > > - componentized driver (both producer and consumer) fully loaded&working > - you unbind the producer/one of the components > - component framework or driver core through device_link also unbinds > the master/consumer > - you reload/rebind the component > - with the component framework this results in the master being > rebound, and the overall driver working again. With device_links > nothing happens. > > I think there was a patch floating around once to put drivers unbound > due to device_links onto the deferred probe list, so that the next > time something is bound they have a chance to rebind. But I can't find > that patch anymore, maybe someone else has the link still?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1348023.html https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1355243.html

Thanks for digging it out, those are the patches I had in mind.

...
...
I can repost if this is finally deemed a good idea.

I don't think this will help.

Two things appear to be needed here: missing suppliers need to be probed automatically at a consumer probe time and "persistent" links created by ->probe() callbacks should not be reference-counted when those callbacks run again. I'm going to cut a patch to do that (early next week if all goes well).

I thought this should work, since it makes an unbind of a consumer act the same way as if that consumer's bind function has hit an EPROBE_DEFER. Example:

consumer's bind is called, realizes (through a subsystem function

like of_drm_find_panel) that the producer isn't there yet, fails with EPROBE DEFER.

driver core puts the consumer onto the deferred probe list.

producer gets loaded, registers the panel

driver core re-runs the consumer's bind function

consumer's bind function finds the produced and sets up the device link.

Now for the unbind case:

producer is unbound, driver core unbinds consumer due to the device_link

(Only with Lucas' patch) driver core puts the consumer onto the

deferred probe list.

Developer (this really is useful for development) rebinds/reloads

producer driver, which re-registers the panel

4 & 5 work exactly the same as with the initial load sequence.

I misunderstood the Russell's description, sorry.

My understanding was that the consumer driver would be rebound at this point and that the (missing) producer was expected to rebind as well in response.

...
With this initial loading and reloading would be very similar, and I think that's a good thing. It also solves the device_link lifetime/refcounting issue, since the device_link gets torn down in/restored completely, and reloading isn't a special case like in your proposed solution - there's no device_link left over from a previous loading of the driver, the dependency is only handled by putting the consumer (back) onto the deferred probe list.

My idea was based on incorrect understanding of the problem, so scratch it. :-)

The Lukas' patch would indeed work here, but I'm not sure if doing that for all links unconditionally is a good idea.

I'd rather have a link flag to indicate that this behavior is desirable.

That said, the creation of a device link is a heavy-weight operation in general as it triggers a list reordering that may turn out to be recursive etc., so I'd rather avoid doing too much of that.

Moreover, creating a link between two devices once should be sufficient, so I'd prefer "persistent" links to be used in that case, but they need to be fixed to avoid the extra reference counting. That and a new link flag to indicate that the consumer should be probed automatically after binding the supplier driver.

Yeah if persistent links make more sense from an implementation pov then that's all fine, as long as it results in the same behaviour for from the pov of all the involved bind functions. So if the core handles all the refcount trickery, that's good.

Also an explicit flag sounds like a good idea, since for some other problems we want to make the device_link opt-in. There's otherwise some loops in some cases apparently.

OK, I think I know how to make device links support this use case, but that will require some rework of the framework, mostly consisting of fixes and cleanups.

I'll send the first part of it in a minute.

Cheers, Rafael

Russell King - ARM Linux admin

17 Jan 17 Jan

5:26 p.m.

On Tue, Jan 15, 2019 at 11:47:00PM +0100, Rafael J. Wysocki wrote:

...

From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers

Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.

To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.

With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.

However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".

Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]

Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com

Hi Rafael,

I've tried this with Armada DRM without using the component helper for bridges, and it seems to work up to a point (I have a vga bridge here to allow further testing):

[ 2.323441] armada-drm display-subsystem: assigned reserved memory node framebuffer [ 2.332001] armada-drm display-subsystem: bound f1820000.lcd-controller (ops armada_lcd_ops) [ 2.340790] armada-drm display-subsystem: bound f1810000.lcd-controller (ops armada_lcd_ops) [ 2.349345] armada-drm display-subsystem: node=/i2c-mux/i2c@0/hdmi-encoder@70 [ 2.356719] armada-drm display-subsystem: panel=fffffdfb bridge= (null) ret=0 [ 2.364038] armada-drm display-subsystem: Linked as a consumer to 1-0070 [ 2.370818] armada-drm display-subsystem: bridge=ee8cda2c ret=0 [ 2.376894] armada-drm display-subsystem: node=/vga-bridge [ 2.382453] armada-drm display-subsystem: panel=fffffdfb bridge=(null) ret=0 [ 2.389762] armada-drm display-subsystem: Linked as a consumer to vga-bridge [ 2.396883] armada-drm display-subsystem: bridge=ef3bec40 ret=0

When I remove the HDMI encoder:

root@cubox:/sys/bus/i2c/drivers/tda998x# echo 1-0070 > unbind

then I get:

[ 1013.824860] Console: switching to colour dummy device 80x30 [ 1013.866785] armada-drm display-subsystem: Dropping the link to vga-bridge [ 1013.867126] armada-drm display-subsystem: Dropping the link to 1-0070

which looks like it did what was expected - the DRM device is indeed unbound, the nodes in /dev/dri are gone. When rebinding the HDMI encoder:

[ 1015.864703] tda998x 1-0070: found TDA19988 [ 1015.898078] tda9950 1-0034: TDA9950 CEC interface, hardware version 3.3 [ 1015.941374] Registered IR keymap rc-cec [ 1015.941684] rc rc0: tda9950 as /devices/platform/mbus/mbus:internal-regs/f1011000.i2c/i2c-0/i2c-1/1-0070/rc/rc0 [ 1015.942439] input: tda9950 as /devices/platform/mbus/mbus:internal-regs/f1011000.i2c/i2c-0/i2c-1/1-0070/rc/rc0/input2 [ 1016.982921] alloc_contig_range: [10390, 10490) PFNs busy [ 1016.987400] kirkwood-spdif-audio audio-subsystem: snd-soc-dummy-dai <-> kirkwood-fe mapping ok [ 1016.987591] kirkwood-spdif-audio audio-subsystem: multicodec <-> kirkwood-spdif mapping ok

but the DRM stuff doesn't come back - this is what Daniel was talking about further down this thread.

I guess the kernel can't know that it should come back because the device links were dropped at the unbind stage, which means the kernel has lost the information necessary to know that the display subsystem is dependent on the presence of the HDMI encoder. I don't see an easy way around that.

If we keep the device links after an unbind event, then, because we create them during probe, we'll be attempting to recreate the link when we reattach the supplier to the consumer. If we only allow one instance, then what does device_link_add() return.

Maybe it is going to be less painful to push everything bridge-related to use the component helper after all. Dunno. Problems either way.

...

Documentation/driver-api/device_link.rst | 10 ++-- drivers/base/core.c | 74 +++++++++++++++++++++++++++---- 2 files changed, 73 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/base/core.c

--- linux-pm.orig/drivers/base/core.c +++ linux-pm/drivers/base/core.c @@ -260,17 +260,26 @@ struct device_link *device_link_add(stru link->status = DL_STATE_NONE; } else { switch (supplier->links.status) {
case DL_DEV_DRIVER_BOUND:
case DL_DEV_PROBING:
switch (consumer->links.status) {
case DL_DEV_PROBING:
	/*
		 * Some callers expect the link creation during
		 * consumer driver probe to resume the supplier
		 * even without DL_FLAG_RPM_ACTIVE.
		 * A consumer driver can create a link to a
		 * supplier that has not completed its probing
		 * yet as long as it knows that the supplier is
		 * already functional (for example, it has just
		 * acquired some resources from the supplier).
	 */
		if (flags & DL_FLAG_PM_RUNTIME)
			pm_runtime_resume(supplier);
		link->status = DL_STATE_CONSUMER_PROBE;
		break;
	default:
		link->status = DL_STATE_DORMANT;
		break;
	}
	break;
case DL_DEV_DRIVER_BOUND:
	switch (consumer->links.status) {
	case DL_DEV_PROBING:
	link->status = DL_STATE_CONSUMER_PROBE;
	break;
case DL_DEV_DRIVER_BOUND:
@@ -291,6 +300,14 @@ struct device_link *device_link_add(stru }

/*
* Some callers expect the link creation during consumer driver probe to
* resume the supplier even without DL_FLAG_RPM_ACTIVE.
*/
if (link->status == DL_STATE_CONSUMER_PROBE &&
   flags & DL_FLAG_PM_RUNTIME)
pm_runtime_resume(supplier);
/*

Move the consumer and all of the devices depending on it to the end

of dpm_list and the devices_kset list.
@@ -474,6 +491,16 @@ void device_links_driver_bound(struct de if (link->flags & DL_FLAG_STATELESS) continue;
/*
 * Links created during consumer probe may be in the "consumer
 * probe" state to start with if the supplier is still probing
 * when they are created and they may become "active" if the
 * consumer probe returns first.  Skip them here.
 */
if (link->status == DL_STATE_CONSUMER_PROBE ||
    link->status == DL_STATE_ACTIVE)
	continue;
WARN_ON(link->status != DL_STATE_DORMANT); WRITE_ONCE(link->status, DL_STATE_AVAILABLE); }
@@ -513,17 +540,48 @@ static void __device_links_no_driver(str
if (link->flags & DL_FLAG_AUTOREMOVE_CONSUMER)
	kref_put(&link->kref, __device_link_del);
else if (link->status != DL_STATE_SUPPLIER_UNBIND)
else if (link->status == DL_STATE_CONSUMER_PROBE ||
	 link->status == DL_STATE_ACTIVE)
WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
}

dev->links.status = DL_DEV_NO_DRIVER;
}

+/**

device_links_no_driver - Update links after failing driver probe.

@dev: Device whose driver has just failed to probe.

Clean up leftover links to consumers for @dev and invoke

%__device_links_no_driver() to update links to suppliers for it as

appropriate.

Links with the DL_FLAG_STATELESS flag set are ignored.

*/

void device_links_no_driver(struct device *dev) {
struct device_link *link;

device_links_write_lock();

list_for_each_entry(link, &dev->links.consumers, s_node) {
if (link->flags & DL_FLAG_STATELESS)
	continue;
/*
 * The probe has failed, so if the status of the link is
 * "consumer probe" or "active", it must have been added by
 * a probing consumer while this device was still probing.
 * Change its state to "dormant", as it represents a valid
 * relationship, but it is not functionally meaningful.
 */
if (link->status == DL_STATE_CONSUMER_PROBE ||
    link->status == DL_STATE_ACTIVE)
	WRITE_ONCE(link->status, DL_STATE_DORMANT);
}

__device_links_no_driver(dev);

device_links_write_unlock();
}

Index: linux-pm/Documentation/driver-api/device_link.rst

--- linux-pm.orig/Documentation/driver-api/device_link.rst +++ linux-pm/Documentation/driver-api/device_link.rst @@ -59,11 +59,15 @@ device ``->probe`` callback or a boot-ti

Another example for an inconsistent state would be a device link that represents a driver presence dependency, yet is added from the consumer's -``->probe`` callback while the supplier hasn't probed yet: Had the driver -core known about the device link earlier, it wouldn't have probed the +``->probe`` callback while the supplier hasn't started to probe yet: Had the +driver core known about the device link earlier, it wouldn't have probed the consumer in the first place. The onus is thus on the consumer to check presence of the supplier after adding the link, and defer probing on -non-presence. +non-presence. [Note that it is valid to create a link from the consumer's +``->probe`` callback while the supplier is still probing, but the consumer must +know that the supplier is functional already at the link creation time (that is +the case, for instance, if the consumer has just acquired some resources that +would not have been available had the supplier not been functional then).]

If a device link is added in the ``->probe`` callback of the supplier or consumer driver, it is typically deleted in its ``->remove`` callback for

-- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Rafael J. Wysocki

10:43 p.m.

On Thursday, January 17, 2019 6:26:25 PM CET Russell King - ARM Linux admin wrote:

...

On Tue, Jan 15, 2019 at 11:47:00PM +0100, Rafael J. Wysocki wrote:

...
From: Rafael J. Wysocki rafael.j.wysocki@intel.com Subject: [PATCH] driver core: Fix adding device links to probing suppliers

Currently, it is not valid to add a device link from a consumer driver ->probe() callback to a supplier that is still probing too, but generally this is a valid use case. For example, if the consumer has just acquired a resource that can only be available when the supplier is functional, adding a device link to that supplier right away should be safe (and even desirable arguably), but device_link_add() doesn't handle that case correctly and the initial state of the link created by it is wrong then.

To address this problem, change the initial state of device links added between a probing supplier and a probing consumer to DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to skip such links on the supplier side.

With this change, if the supplier probe completes first, device_links_driver_bound() called for it will skip the link state update and when it is called for the consumer, the link state will be updated to "active". In turn, if the consumer probe completes first, device_links_driver_bound() called for it will change the state of the link to "active" and when it is called for the supplier, the link status update will be skipped.

However, in principle the supplier or consumer probe may still fail after the link has been added, so modify device_links_no_driver() to change device links in the "active" or "consumer probe" state to "dormant" on the supplier side and update __device_links_no_driver() to change the link state to "available" only if it is "consumer probe" or "active".

Then, if the supplier probe fails first, the leftover link to the probing consumer will become "dormant" and device_links_no_driver() called for the consumer (when its probe fails) will clean it up. In turn, if the consumer probe fails first, it will either drop the link, or change its state to "available" and, in the latter case, when device_links_no_driver() is called for the supplier, it will update the link state to "dormant". [If the supplier probe fails, but the consumer probe succeeds, which should not happen as long as the consumer driver is correct, the link still will be around, but it will be "dormant" until the supplier is probed again.]

Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com

Hi Rafael,

Hi,

...

I've tried this with Armada DRM without using the component helper for bridges, and it seems to work up to a point (I have a vga bridge here to allow further testing):

[ 2.323441] armada-drm display-subsystem: assigned reserved memory node framebuffer [ 2.332001] armada-drm display-subsystem: bound f1820000.lcd-controller (ops armada_lcd_ops) [ 2.340790] armada-drm display-subsystem: bound f1810000.lcd-controller (ops armada_lcd_ops) [ 2.349345] armada-drm display-subsystem: node=/i2c-mux/i2c@0/hdmi-encoder@70 [ 2.356719] armada-drm display-subsystem: panel=fffffdfb bridge= (null) ret=0 [ 2.364038] armada-drm display-subsystem: Linked as a consumer to 1-0070 [ 2.370818] armada-drm display-subsystem: bridge=ee8cda2c ret=0 [ 2.376894] armada-drm display-subsystem: node=/vga-bridge [ 2.382453] armada-drm display-subsystem: panel=fffffdfb bridge=(null) ret=0 [ 2.389762] armada-drm display-subsystem: Linked as a consumer to vga-bridge [ 2.396883] armada-drm display-subsystem: bridge=ef3bec40 ret=0

When I remove the HDMI encoder:

root@cubox:/sys/bus/i2c/drivers/tda998x# echo 1-0070 > unbind

then I get:

[ 1013.824860] Console: switching to colour dummy device 80x30 [ 1013.866785] armada-drm display-subsystem: Dropping the link to vga-bridge [ 1013.867126] armada-drm display-subsystem: Dropping the link to 1-0070

which looks like it did what was expected - the DRM device is indeed unbound, the nodes in /dev/dri are gone.

OK, thanks!

This tells me that the patch is an improvement, so I'm going to move on to integrate it.

...

When rebinding the HDMI encoder:

[ 1015.864703] tda998x 1-0070: found TDA19988 [ 1015.898078] tda9950 1-0034: TDA9950 CEC interface, hardware version 3.3 [ 1015.941374] Registered IR keymap rc-cec [ 1015.941684] rc rc0: tda9950 as /devices/platform/mbus/mbus:internal-regs/f1011000.i2c/i2c-0/i2c-1/1-0070/rc/rc0 [ 1015.942439] input: tda9950 as /devices/platform/mbus/mbus:internal-regs/f1011000.i2c/i2c-0/i2c-1/1-0070/rc/rc0/input2 [ 1016.982921] alloc_contig_range: [10390, 10490) PFNs busy [ 1016.987400] kirkwood-spdif-audio audio-subsystem: snd-soc-dummy-dai <-> kirkwood-fe mapping ok [ 1016.987591] kirkwood-spdif-audio audio-subsystem: multicodec <-> kirkwood-spdif mapping ok

but the DRM stuff doesn't come back - this is what Daniel was talking about further down this thread.

Right, it's that case AFAICS.

...

I guess the kernel can't know that it should come back because the device links were dropped at the unbind stage, which means the kernel has lost the information necessary to know that the display subsystem is dependent on the presence of the HDMI encoder. I don't see an easy way around that.

If the link is defined as "persistent", it will survive the removal of drivers, but it will become "dormant".

Currently, if such a link is present, the consumer probe fill fail with -EPROBE_DEFER, but it might also trigger device_attach() on the supplier. It would make sense to do that anyway IMO.

...

If we keep the device links after an unbind event, then, because we create them during probe, we'll be attempting to recreate the link when we reattach the supplier to the consumer. If we only allow one instance, then what does device_link_add() return.

It returns the existing link, but it will reference-count it too.

However, that arguably is an overlooked use case too, because "persistent" links created during consumer or supplier probe should not be reference counted if the same probe runs again. Extra flags may be needed to handle that, though.

...

Maybe it is going to be less painful to push everything bridge-related to use the component helper after all. Dunno. Problems either way.

Well, this is a fairly complicated use case and I'm glad that we have it, because it shows what's missing.

Linus Walleij

18 Jan 18 Jan

11:07 a.m.

On Tue, Jan 8, 2019 at 10:22 AM Andrzej Hajda a.hajda@samsung.com wrote:

...

Of course it was tested on Exynos as this is HW I work on. Linus Walleij has also reported in this thread that device links have different issue

circular dependencies (last few emails in this lengthy thread). My

response explains it in detail.

Indeed, it was detailed in commit d6a77ba0eb92d8ffa4b05a442fc20d0a9b11c4c4

AFAICT the problem was that struct drm_device does not contain a struct device itself, just a pointer struct device * ->dev to the bus device which is unorthodox: most other device type have a full struct device inside them and the bus device would rather be that device's parent.

I had this situation with struct gpio_chip for a long time but eventually fixed it, which proved immensely helpful.

Yours, Linus Walleij

Russell King - ARM Linux

8 Jan 8 Jan

10:16 a.m.

On Tue, Jan 08, 2019 at 09:35:21AM +0100, Andrzej Hajda wrote:

...

On 07.01.2019 22:56, Daniel Vetter wrote:

...
If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

I wonder if device links will create more problems with the component stuff, because we'll end up with two different systems fighting to achieve the same aim.

1) a component device is probed, and is added to the component helper as a component. (Repeat for as many component devices as are required.) 2) the main device is probed, added as a master. The system is complete, so the master bind is called, and device links are created - the component device is a provider, and the main device is a consumer. 3) the component device is unbound, which triggers the device links to unbind the provider device. This undoes everything, removing the device links and removing the main device from the component helper. 4) the component device is re-probed, and added to the component helper again.

At this point, there is nothing that causes the main device to be re-probed. As far as userspace is concerned, that's confusing because userspace never asked for any change of status of the provider device.

Without device links, the main device would not be unbound, and there would be an additional stage:

5) the component helper notices the re-appearance of the component device, notices that the system is again complete, and re-binds the master device bring the system back up.

This seems to be a disadvantage with device links, but there is an advantage that device links offer - the ability to deal with other resources (such as clocks, regulators, etc) going away more gracefully than is possible today - but folk have to understand that a consumer device will only ever be unbound, it will never be re-bound automatically when the resources re-appear.

I suppose that could be fixed by moving the consumer devices onto the deferred probe list, so when a device is re-bound, they are retried. At that point, the component helper becomes entirely redundant, since its functionality can be implemented completely using device links.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

10:31 a.m.

On Tue, Jan 8, 2019 at 11:16 AM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Tue, Jan 08, 2019 at 09:35:21AM +0100, Andrzej Hajda wrote:

...
On 07.01.2019 22:56, Daniel Vetter wrote:

...
If you call drm_dev_register before you have all the components assembled (i.e. anywhere else than in master bind) that sounds like a driver bug.

This is how components work, every component calls component_add usually in probe, and this function checks if all components are present, if yes (ie. during probe of the last component) master bind is called and it creates drm device and then registers it. If you add device link at this moment, it is still before end of probe of the last component.

I wonder if device links will create more problems with the component stuff, because we'll end up with two different systems fighting to achieve the same aim.

a component device is probed, and is added to the component helper as a component. (Repeat for as many component devices as are required.)

the main device is probed, added as a master. The system is complete, so the master bind is called, and device links are created - the component device is a provider, and the main device is a consumer.

the component device is unbound, which triggers the device links to unbind the provider device. This undoes everything, removing the device links and removing the main device from the component helper.

the component device is re-probed, and added to the component helper again.

At this point, there is nothing that causes the main device to be re-probed. As far as userspace is concerned, that's confusing because userspace never asked for any change of status of the provider device.

Without device links, the main device would not be unbound, and there would be an additional stage:

the component helper notices the re-appearance of the component device, notices that the system is again complete, and re-binds the master device bring the system back up.

This seems to be a disadvantage with device links, but there is an advantage that device links offer - the ability to deal with other resources (such as clocks, regulators, etc) going away more gracefully than is possible today - but folk have to understand that a consumer device will only ever be unbound, it will never be re-bound automatically when the resources re-appear.

Hm, I thought there was patches floating around to remedy that, I guess they never landed? Without that device links aren't really useful beyond fixing suspend/resume issues.

...

I suppose that could be fixed by moving the consumer devices onto the deferred probe list, so when a device is re-bound, they are retried. At that point, the component helper becomes entirely redundant, since its functionality can be implemented completely using device links.

There's still the benefit that component doesn't have opinions about what exactly the things are you're collecting together - device links only work if you have a struct device. That's generally ok for DT platforms, but a notch more work on x86 (since you need to create platform devices, which then creates lifetime issues since the platform device is managed by the driver bound to the pci device, so if you unload that again the platform device itself has to disappear).

I guess if device links really don't work plan B would be to have a standard wrapper for bridg-as-component: drm_bridge_add would also register a component, and a new of_drm_bridge_component_add would add it to the component_match. We could probably even do standard bind/unbind functions for that bridge component, which links it up to the right encoder (or just to a given bridge pointer, whether that's in drm_encoder of in a drm_bridge for chaining). -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Russell King - ARM Linux

7 Jan 7 Jan

4:12 p.m.

On Mon, Jan 07, 2019 at 11:45:32AM +0100, Daniel Vetter wrote:

...

On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

Sorry, I disagree. Unbinding is important if the current state results in crashes and oops - the lack of unbinding support in bridge makes it harder to develop without constantly rebooting the target machine.

If all you care about is the end user who probably never removes a module, then yes, it's low priority, but if you care about efficient development, then the story is rather different.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Daniel Vetter

9:55 p.m.

On Mon, Jan 7, 2019 at 5:13 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...

On Mon, Jan 07, 2019 at 11:45:32AM +0100, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

Sorry, I disagree. Unbinding is important if the current state results in crashes and oops - the lack of unbinding support in bridge makes it harder to develop without constantly rebooting the target machine.

If all you care about is the end user who probably never removes a module, then yes, it's low priority, but if you care about efficient development, then the story is rather different.

Unloading i915 needs a very careful script, or you'll get a rather bright fireworks. Afaik all other drm drivers (except maybe udl) are the same. At least if you do anything fancy, where fancy includes: fbdev emulation, prime buffer sharing, shared dma fences, or well anything really that goes beyond a dummy boot splash. The lifetimes of all these things are flat-out broken. udl tries to at least wrap some duct-tape around it, and Noralf greatly improved the situation in the past year at least.

So still not seeing what exactly the massive blocker here is. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Russell King - ARM Linux

8 Jan 8 Jan

12:39 a.m.

On Mon, Jan 07, 2019 at 10:55:15PM +0100, Daniel Vetter wrote:

...

On Mon, Jan 7, 2019 at 5:13 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Mon, Jan 07, 2019 at 11:45:32AM +0100, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

Sorry, I disagree. Unbinding is important if the current state results in crashes and oops - the lack of unbinding support in bridge makes it harder to develop without constantly rebooting the target machine.

If all you care about is the end user who probably never removes a module, then yes, it's low priority, but if you care about efficient development, then the story is rather different.

Unloading i915 needs a very careful script, or you'll get a rather bright fireworks. Afaik all other drm drivers (except maybe udl) are the same. At least if you do anything fancy, where fancy includes: fbdev emulation, prime buffer sharing, shared dma fences, or well anything really that goes beyond a dummy boot splash. The lifetimes of all these things are flat-out broken. udl tries to at least wrap some duct-tape around it, and Noralf greatly improved the situation in the past year at least.

So still not seeing what exactly the massive blocker here is.

The fact that I can unload armada drm/tda998x modules without incident today, and have done many times through development, and I don't wish to regress from that position. As far as I'm concerned, this is a solved problem, but the pressure I'm under to convert tda998x to a bridge driver is causing bugs that I've already solved by _not_ using that to be introduced.

You've mentioned in your previous mail about me not trying to solve the situation - I have tried, I've proposed a way around the component vs bridge issue but I don't think it went anywhere.

If I can't get traction on issues, then I can only state what the current state of affairs is to folk asking about it. That is _not_ "whinging" about it, that is informing people and being helpful.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

Liviu Dudau

2:29 p.m.

On Tue, Jan 08, 2019 at 12:39:36AM +0000, Russell King - ARM Linux wrote:

...

On Mon, Jan 07, 2019 at 10:55:15PM +0100, Daniel Vetter wrote:

...
On Mon, Jan 7, 2019 at 5:13 PM Russell King - ARM Linux linux@armlinux.org.uk wrote:

...
On Mon, Jan 07, 2019 at 11:45:32AM +0100, Daniel Vetter wrote:

...
On Thu, Jan 03, 2019 at 01:11:47PM +0000, Russell King - ARM Linux wrote:

...
This is the long-standing problem with the conflict between bridge support and component support, and I'm not sure that there is really any answer to it.

I've gone into the details of the two several times on the list, particularly about the short-comings of the bridge approach, but it seems no one cares to fix those short-comings.

You are re-identifying some of the issues that I've already pointed out - such as what happens to DRM drives when the bridge driver is unbound (it's really not about modules being unloaded, and the problem can't be solved by taking a module reference count - all that the module reference count does is ensure that the module doesn't go away unexpected, there is no way to ensure that the device isn't unbound.)

The issue of unbinding is precisely the issue which the component support was created to solve - but everyone seems to prefer the buggy bridge approach, and no one seems willing to do anything about the bugs or even acknowledge that it's a problem. It's strange - if one identifies bugs that result in kernel oops in other kernel subsystems, one is generally taken seriously and the problem is solved.

Unbinding is really not the most important feature, especially for SoC. If you feel different, working together with others, getting some agreement, getting the patches reviewed and finding someone to get them merged is very much appreciated. But just complaining won't move this forward.

Sorry, I disagree. Unbinding is important if the current state results in crashes and oops - the lack of unbinding support in bridge makes it harder to develop without constantly rebooting the target machine.

If all you care about is the end user who probably never removes a module, then yes, it's low priority, but if you care about efficient development, then the story is rather different.

Unloading i915 needs a very careful script, or you'll get a rather bright fireworks. Afaik all other drm drivers (except maybe udl) are the same. At least if you do anything fancy, where fancy includes: fbdev emulation, prime buffer sharing, shared dma fences, or well anything really that goes beyond a dummy boot splash. The lifetimes of all these things are flat-out broken. udl tries to at least wrap some duct-tape around it, and Noralf greatly improved the situation in the past year at least.

So still not seeing what exactly the massive blocker here is.

The fact that I can unload armada drm/tda998x modules without incident today, and have done many times through development, and I don't wish to regress from that position. As far as I'm concerned, this is a solved problem, but the pressure I'm under to convert tda998x to a bridge driver is causing bugs that I've already solved by _not_ using that to be introduced.

And hdlcd and mali-dp have been unloaded and loaded back multiple times on my dev boards. I can't comment on other Arm SoC drm drivers, but I'll be prepared to bet that at least another one is capable of doing the same thing.

Now, the fact that you might have an fbcon that binds to the framebuffer and makes life a bit harder ... that's another thing.

Best regards, Liviu

...

You've mentioned in your previous mail about me not trying to solve the situation - I have tried, I've proposed a way around the component vs bridge issue but I don't think it went anywhere.

If I can't get traction on issues, then I can only state what the current state of affairs is to folk asking about it. That is _not_ "whinging" about it, that is informing people and being helpful.

-- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up

2306

Age (days ago)

2327

Last active (days ago)

dri-devel@lists.freedesktop.org

52 comments

10 participants

tags (0)

participants (10)

Andrzej Hajda
Daniel Vetter
Linus Walleij
Liviu Dudau
Lubomir Rintel
Lucas Stach
Rafael J. Wysocki
Rafael J. Wysocki
Russell King - ARM Linux
Russell King - ARM Linux admin