[PATCH 0/3] Use implicit kref infra

List overview All Threads
Download

newer

older

[radeon-alex:amd-staging-drm-next...

[PATCH] FROMLIST: udmabuf: Add...

Luben Tuikov

2 Sep 2020 2 Sep '20

1:06 a.m.

Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn when a DRM driver hasn't opted for managed.final_kfree, but instead check if the driver has provided its own "release" function callback in the DRM driver structure. If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the kref "release" callback when all refs are down to 0, via drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be suspended always exists, so long as the underlying PCI and DRM devices exist.

Luben Tuikov (3): drm: No warn for drivers who provide release drm/amdgpu: Remove drmm final free drm/amdgpu: Remove superfluous NULL check

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- drivers/gpu/drm/drm_drv.c | 3 ++- 3 files changed, 2 insertions(+), 6 deletions(-)

-- 2.28.0.394.ge197136389

Show replies by date

Luben Tuikov

2 Sep 2 Sep

1:06 a.m.

New subject: [PATCH 1/3] drm: No warn for drivers who provide release

Drivers usually allocate their container struct at PCI probe time, then call drm_dev_init(), which initializes the contained DRM dev kref to 1.

A DRM driver may provide their own kref release method, which frees the container object, the container of the DRM device, on the last "put" which usually comes after the PCI device has been freed with PCI and with DRM.

If a driver has provided their own "release" method in the drm_driver structure, then do not check "managed.final_kfree", and thus do not splat a WARN_ON in the kernel log when a driver which implements "release" is loaded.

This patch adds this one-line check.

Signed-off-by: Luben Tuikov luben.tuikov@amd.com --- drivers/gpu/drm/drm_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..952455dedb8c 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -935,7 +935,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags) if (!driver->load) drm_mode_config_validate(dev);

- WARN_ON(!dev->managed.final_kfree); + if (!driver->release) + WARN_ON(!dev->managed.final_kfree);

if (drm_dev_needs_global_mutex(dev)) mutex_lock(&drm_global_mutex);

-- 2.28.0.394.ge197136389

Luben Tuikov

1:06 a.m.

New subject: [PATCH 2/3] drm/amdgpu: Remove drmm final free

The amdgpu driver implements its own DRM driver release function which naturally frees the container struct amdgpu_device of the DRM device, on a "final" kref-put, i.e. when the kref transitions from non-zero to 0.

Signed-off-by: Luben Tuikov luben.tuikov@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 459cf13e76fe..17d49f1d86e7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -1153,8 +1153,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, if (ret) goto err_free;

- drmm_add_final_kfree(ddev, ddev); - if (!supports_atomic) ddev->driver_features &= ~DRIVER_ATOMIC;

-- 2.28.0.394.ge197136389

Luben Tuikov

1:06 a.m.

New subject: [PATCH 3/3] drm/amdgpu: Remove superfluous NULL check

The DRM device is a static member of the amdgpu device structure and as such always exists, so long as the PCI and thus the amdgpu device exist.

Signed-off-by: Luben Tuikov luben.tuikov@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c4900471beb0..6dcc256b9ebc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3471,9 +3471,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) struct drm_connector_list_iter iter; int r;

- if (!dev) - return -ENODEV; - adev = drm_to_adev(dev);

if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)

-- 2.28.0.394.ge197136389

Daniel Vetter

6:59 a.m.

New subject: [PATCH 3/3] drm/amdgpu: Remove superfluous NULL check

On Tue, Sep 01, 2020 at 09:06:45PM -0400, Luben Tuikov wrote:

...

The DRM device is a static member of the amdgpu device structure and as such always exists, so long as the PCI and thus the amdgpu device exist.

Signed-off-by: Luben Tuikov luben.tuikov@amd.com

On this patch, but not the other two earlier in this series:

Acked-by: Daniel Vetter daniel.vetter@ffwll.ch

...

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c4900471beb0..6dcc256b9ebc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3471,9 +3471,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) struct drm_connector_list_iter iter; int r;
if (!dev)
return -ENODEV;
adev = drm_to_adev(dev);

if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
-- 2.28.0.394.ge197136389

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Christian König

7:21 a.m.

New subject: [PATCH 3/3] drm/amdgpu: Remove superfluous NULL check

Am 02.09.20 um 08:59 schrieb Daniel Vetter:

...

On Tue, Sep 01, 2020 at 09:06:45PM -0400, Luben Tuikov wrote:

...
The DRM device is a static member of the amdgpu device structure and as such always exists, so long as the PCI and thus the amdgpu device exist.

Signed-off-by: Luben Tuikov luben.tuikov@amd.com

On this patch, but not the other two earlier in this series:

Acked-by: Daniel Vetter daniel.vetter@ffwll.ch

...
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index c4900471beb0..6dcc256b9ebc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3471,9 +3471,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon) struct drm_connector_list_iter iter; int r;
if (!dev)
return -ENODEV;
adev = drm_to_adev(dev);

Maybe this could now even fit into the declaration of adev.

But either way Acked-by: Christian König christian.koenig@amd.com.

Christian.

...

...
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)

2.28.0.394.ge197136389

Pan, Xinhui

1:42 a.m.

If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)

drm_managed_release(dev);

- kfree(dev->managed.final_kfree); + if (dev->driver->final_release) + dev->driver->final_release(dev); }

And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra

Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device.

Remove drmm_add_final_kfree(). We take care of that, in the kref "release" callback when all refs are down to 0, via drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be suspended always exists, so long as the underlying PCI and DRM devices exist.

Luben Tuikov (3): drm: No warn for drivers who provide release drm/amdgpu: Remove drmm final free drm/amdgpu: Remove superfluous NULL check

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- drivers/gpu/drm/drm_drv.c | 3 ++- 3 files changed, 2 insertions(+), 6 deletions(-)

-- 2.28.0.394.ge197136389

Luben Tuikov

3:46 a.m.

On 2020-09-01 21:42, Pan, Xinhui wrote:

...

If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...

809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/** * drm_dev_put - Drop reference of a DRM device * @dev: device to drop reference of or NULL * * This decreases the ref-count of @dev by one. The device is destroyed if the * ref-count drops to zero. */ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

1. It is us, who kzalloc the amdgpu device, which contains the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

2. Using the kref infrastructure, when the ref goes to 0, drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...

You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
    drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}

No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.

The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.

...

And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...

Use the implicit kref infrastructure to free the container
struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn
when a DRM driver hasn't opted for managed.final_kfree,
but instead check if the driver has provided its own
"release" function callback in the DRM driver structure.
If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the
kref "release" callback when all refs are down to 0, via
drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be
suspended always exists, so long as the underlying PCI and
DRM devices exist.

Luben Tuikov (3):
  drm: No warn for drivers who provide release
  drm/amdgpu: Remove drmm final free
  drm/amdgpu: Remove superfluous NULL check

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
 drivers/gpu/drm/drm_drv.c                  | 3 ++-
 3 files changed, 2 insertions(+), 6 deletions(-)

-- 
2.28.0.394.ge197136389

Pan, Xinhui

4:43 a.m.

...

2020年9月2日 11:46，Tuikov, Luben Luben.Tuikov@amd.com 写道：

On 2020-09-01 21:42, Pan, Xinhui wrote:

...
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...
809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/**

drm_dev_put - Drop reference of a DRM device

@dev: device to drop reference of or NULL

This decreases the ref-count of @dev by one. The device is destroyed if the

ref-count drops to zero.

*/ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

It is us, who kzalloc the amdgpu device, which contains

the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

that is true. My understanding of the drm core code is like something below. struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end. But yes, practice is more complex. if B has nothing to be destroyed. we can destory A directly, otherwise destroy B firstly.

in this case, we can do something below in our release() //some cleanup work of B drm_dev_fini(dev);//destroy A kfree(adev)

...

Using the kref infrastructure, when the ref goes to 0,

drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...
You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
   drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}
No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.

I just take an example here. add another release in the end. then no one could touch us. IOW, final_release.

A destroy B by a callback, then A destroy itself. It assumes B just free its own resource. but that makes trouble if some resource of A is allocated by B. Because B must take care of these common resource shared between A and B.

yes, that logical is more complex. So I think we can revert drm_dev_release to its previous version.

...

The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

I am not taking about kref. what we are discussing is all about the release sequence.

...

Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.

yep, that make sense. But you need more changes to fix this issue. this patchset is insufficient.

...

...
And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...
-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra

Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn when a DRM driver hasn't opted for managed.final_kfree, but instead check if the driver has provided its own "release" function callback in the DRM driver structure. If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the kref "release" callback when all refs are down to 0, via drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be suspended always exists, so long as the underlying PCI and DRM devices exist.

Luben Tuikov (3): drm: No warn for drivers who provide release drm/amdgpu: Remove drmm final free drm/amdgpu: Remove superfluous NULL check
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
drivers/gpu/drm/drm_drv.c                  | 3 ++-
3 files changed, 2 insertions(+), 6 deletions(-)
-- 2.28.0.394.ge197136389

Luben Tuikov

2:50 p.m.

On 2020-09-02 00:43, Pan, Xinhui wrote:

...

...
2020年9月2日 11:46，Tuikov, Luben Luben.Tuikov@amd.com 写道：

On 2020-09-01 21:42, Pan, Xinhui wrote:

...
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...
809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/**

drm_dev_put - Drop reference of a DRM device

@dev: device to drop reference of or NULL

This decreases the ref-count of @dev by one. The device is destroyed if the

ref-count drops to zero.

*/ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

It is us, who kzalloc the amdgpu device, which contains

the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

that is true.

Of course it's true--good morning!

...

My understanding of the drm core code is like something below.

Let me stop you right there--just read the documentation I pointed to you at.

...

struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

No! B, which is the amdgpu_device struct "exists" before A, which is the DRM struct. This is why DRM recommends to _embed_ it into the driver's own device struct, as the documentation I pointed you to at.

A, the DRM struct, is an abstraction, and is "created" last, and "undone" first, since the DRM layer may finish with a device, but the device may still exists with the driver and as well as with PCI. This is very VERY common, with kernels, devices, device abstractions, device layers: DRM dev <-- amdgpu dev <-- PCI dev.

...

But yes, practice is more complex. if B has nothing to be destroyed. we can destory A directly, otherwise destroy B firstly.

I'm sorry, that doesn't make sense. There is no such thing as "destroy directly" and "otherwise"--this is absolutely not how this works.

A good architecture doesn't have if-then-else--it's just a pure single-branch path.

...

in this case, we can do something below in our release() //some cleanup work of B drm_dev_fini(dev);//destroy A kfree(adev)

...
Using the kref infrastructure, when the ref goes to 0,

drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...
You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
   drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}
No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.
I just take an example here. add another release in the end. then no one could touch us. IOW, final_release.

No, that's horrible.

...

A destroy B by a callback, then A destroy itself. It assumes B just free its own resource. but that makes trouble if some resource of A is allocated by B. Because B must take care of these common resource shared between A and B.

No, that's horrible.

...

yes, that logical is more complex. So I think we can revert drm_dev_release to its previous version.

drm_dev_release() in its original form, was pure:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

...

...
The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

I am not taking about kref. what we are discussing is all about the release sequence.

You need to understand how the kref infrastructure works in the kernel. I've said it a million times: it's implicit. The "release sequence" as you like to call it, is implicit in the kref infrastructure.

...

...
Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.

yep, that make sense. But you need more changes to fix this issue. this patchset is insufficient.

Or LESS. Less changes. Less is better. Basically revert and redo all this "managed resources".

Regards, Luben

...

...
...
And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...
-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra

Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn when a DRM driver hasn't opted for managed.final_kfree, but instead check if the driver has provided its own "release" function callback in the DRM driver structure. If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the kref "release" callback when all refs are down to 0, via drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be suspended always exists, so long as the underlying PCI and DRM devices exist.

Luben Tuikov (3): drm: No warn for drivers who provide release drm/amdgpu: Remove drmm final free drm/amdgpu: Remove superfluous NULL check
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
drivers/gpu/drm/drm_drv.c                  | 3 ++-
3 files changed, 2 insertions(+), 6 deletions(-)
-- 2.28.0.394.ge197136389

Daniel Stone

3 p.m.

Hi Luben,

On Wed, 2 Sep 2020 at 15:51, Luben Tuikov luben.tuikov@amd.com wrote:

...

Of course it's true--good morning!

Let me stop you right there--just read the documentation I pointed to you at.

No!

I'm sorry, that doesn't make sense.

No, that's horrible.

No, that's horrible.

You need to understand how the kref infrastructure works in the kernel. I've said it a million times: it's implicit.

Or LESS. Less changes. Less is better. Basically revert and redo all this "managed resources".

There are many better ways to make your point. At the moment it's just getting lost in shouting.

Cheers, Daniel

Luben Tuikov

3:16 p.m.

On 2020-09-02 11:00, Daniel Stone wrote:

...

Hi Luben,

On Wed, 2 Sep 2020 at 15:51, Luben Tuikov luben.tuikov@amd.com wrote:

...
Of course it's true--good morning!

Let me stop you right there--just read the documentation I pointed to you at.

No!

I'm sorry, that doesn't make sense.

No, that's horrible.

No, that's horrible.

You need to understand how the kref infrastructure works in the kernel. I've said it a million times: it's implicit.

Or LESS. Less changes. Less is better. Basically revert and redo all this "managed resources".

There are many better ways to make your point. At the moment it's just getting lost in shouting.

Hi Daniel,

Not sure how I can do this when someone doesn't want to read up on the kref infrastructure. Can you help?

When someone starts off with "My understanding of ..." (as in the OP) you know you're in trouble and in for a rough times.

Such is the nature of world-wide open-to-everyone mailing lists where anyone can put forth an argument, regardless of their level of understanding. The more obfuscated an argument, the more uncertainty.

If one knows the kref infrastructure, it just clicks, no explanation necessary.

Regards, Luben

...

Cheers, Daniel

Daniel Stone

3:51 p.m.

Hi Luben,

On Wed, 2 Sep 2020 at 16:16, Luben Tuikov luben.tuikov@amd.com wrote:

...

Not sure how I can do this when someone doesn't want to read up on the kref infrastructure. Can you help?

When someone starts off with "My understanding of ..." (as in the OP) you know you're in trouble and in for a rough times.

Such is the nature of world-wide open-to-everyone mailing lists where anyone can put forth an argument, regardless of their level of understanding. The more obfuscated an argument, the more uncertainty.

If one knows the kref infrastructure, it just clicks, no explanation necessary.

Evidently there are more points of view than yours. Evidently your method of persuasion is also not working, because this thread is now getting quite long and not converging on your point of view (which you are holding to be absolutely objectively correct).

I think you need to re-evaluate the way in which you speak to people, considering that it costs nothing to be polite and considerate, and also takes effort to be rude and dismissive.

Cheers, Daniel

Luben Tuikov

7:04 p.m.

On 2020-09-02 11:51 a.m., Daniel Stone wrote:

...

Hi Luben,

On Wed, 2 Sep 2020 at 16:16, Luben Tuikov luben.tuikov@amd.com wrote:

...
Not sure how I can do this when someone doesn't want to read up on the kref infrastructure. Can you help?

When someone starts off with "My understanding of ..." (as in the OP) you know you're in trouble and in for a rough times.

Such is the nature of world-wide open-to-everyone mailing lists where anyone can put forth an argument, regardless of their level of understanding. The more obfuscated an argument, the more uncertainty.

If one knows the kref infrastructure, it just clicks, no explanation necessary.

Evidently there are more points of view than yours. Evidently your method of persuasion is also not working, because this thread is now getting quite long and not converging on your point of view (which you are holding to be absolutely objectively correct).

I think you need to re-evaluate the way in which you speak to people, considering that it costs nothing to be polite and considerate, and also takes effort to be rude and dismissive.

Not sure how to help this:

...

My understanding of the drm core code is like something below. struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

Regards, Luben

Alex Deucher

7:16 p.m.

On Wed, Sep 2, 2020 at 3:04 PM Luben Tuikov luben.tuikov@amd.com wrote:

...

On 2020-09-02 11:51 a.m., Daniel Stone wrote:

...
Hi Luben,

On Wed, 2 Sep 2020 at 16:16, Luben Tuikov luben.tuikov@amd.com wrote:

...
Not sure how I can do this when someone doesn't want to read up on the kref infrastructure. Can you help?

When someone starts off with "My understanding of ..." (as in the OP) you know you're in trouble and in for a rough times.

Such is the nature of world-wide open-to-everyone mailing lists where anyone can put forth an argument, regardless of their level of understanding. The more obfuscated an argument, the more uncertainty.

If one knows the kref infrastructure, it just clicks, no explanation necessary.

Evidently there are more points of view than yours. Evidently your method of persuasion is also not working, because this thread is now getting quite long and not converging on your point of view (which you are holding to be absolutely objectively correct).

I think you need to re-evaluate the way in which you speak to people, considering that it costs nothing to be polite and considerate, and also takes effort to be rude and dismissive.

Not sure how to help this:

...
My understanding of the drm core code is like something below. struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

Luben, please tone it down a bit. You are coming across very harshly. You do make a good point though. What is the point of having the drm release callback if it's ostensibly useless? We should either use it as intended to release the structures allocated by the driver or the drm core should handle it all. With the managed resources there is an incongruity between allocation and freeing which leads to confusion. Even with the proposed updated documentation, it's not clear to me who should use the managed resources or not. My understanding was that it was optional for drivers that wanted it.

Alex

Daniel Vetter

7:55 p.m.

On Wed, Sep 2, 2020 at 9:17 PM Alex Deucher alexdeucher@gmail.com wrote:

...

On Wed, Sep 2, 2020 at 3:04 PM Luben Tuikov luben.tuikov@amd.com wrote:

...
On 2020-09-02 11:51 a.m., Daniel Stone wrote:

...
Hi Luben,

On Wed, 2 Sep 2020 at 16:16, Luben Tuikov luben.tuikov@amd.com wrote:

...
Not sure how I can do this when someone doesn't want to read up on the kref infrastructure. Can you help?

When someone starts off with "My understanding of ..." (as in the OP) you know you're in trouble and in for a rough times.

Such is the nature of world-wide open-to-everyone mailing lists where anyone can put forth an argument, regardless of their level of understanding. The more obfuscated an argument, the more uncertainty.

If one knows the kref infrastructure, it just clicks, no explanation necessary.

Evidently there are more points of view than yours. Evidently your method of persuasion is also not working, because this thread is now getting quite long and not converging on your point of view (which you are holding to be absolutely objectively correct).

I think you need to re-evaluate the way in which you speak to people, considering that it costs nothing to be polite and considerate, and also takes effort to be rude and dismissive.

Not sure how to help this:

...
My understanding of the drm core code is like something below. struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

Luben, please tone it down a bit. You are coming across very harshly. You do make a good point though. What is the point of having the drm release callback if it's ostensibly useless? We should either use it as intended to release the structures allocated by the driver or the drm core should handle it all. With the managed resources there is an incongruity between allocation and freeing which leads to confusion. Even with the proposed updated documentation, it's not clear to me who should use the managed resources or not. My understanding was that it was optional for drivers that wanted it.

In an ideal world this would all be perfectly clean. In reality we have huge existing drivers which, if at all feasible, can only be converted over step by step.

So with that there's a few ways we can make this happen: - drmm resources are cleaned up before ->release is called. This means doing auto-cleanup of the final steps like cleanup up drm_device resources is gated on all drivers first being converted completely over to drmm, which is never going to happen. And it's holding up removing all the fairly simple cleanup code from small driver, which is where managed resources (whether drmm or devm) have the most benefit, often they completely eliminate the need for any explicit teardown code. - start in the middle. That just oopses because the unwind order isn't the inverse of the setup order anymore, and generally that's required. - start at the end. Unfortunately this means that the drm_device cannot be freed in the driver's ->release callback, therefore for transition purposes I had to sprinkle drmm_add_final_kfree all over the place. But if you use devm_drm_dev_alloc (like the updated docs recommend) that's not needed anymore, so really not an eyesore for driver developers.

Yes there's mildly tricky code in the core as a result, but given that you guys wont volunteer to fix up the entire subsystem either we just have to live with that I think. Also, the commit adding the drm_managed stuff does explain these implementation details and the reasons why.

Cheers, Daniel

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Daniel Vetter

7:59 p.m.

On Wed, Sep 2, 2020 at 9:55 PM Daniel Vetter daniel@ffwll.ch wrote:

...

On Wed, Sep 2, 2020 at 9:17 PM Alex Deucher alexdeucher@gmail.com wrote:

...
On Wed, Sep 2, 2020 at 3:04 PM Luben Tuikov luben.tuikov@amd.com wrote:

...
On 2020-09-02 11:51 a.m., Daniel Stone wrote:

...
Hi Luben,

On Wed, 2 Sep 2020 at 16:16, Luben Tuikov luben.tuikov@amd.com wrote:

...
Not sure how I can do this when someone doesn't want to read up on the kref infrastructure. Can you help?

When someone starts off with "My understanding of ..." (as in the OP) you know you're in trouble and in for a rough times.

Such is the nature of world-wide open-to-everyone mailing lists where anyone can put forth an argument, regardless of their level of understanding. The more obfuscated an argument, the more uncertainty.

If one knows the kref infrastructure, it just clicks, no explanation necessary.

Evidently there are more points of view than yours. Evidently your method of persuasion is also not working, because this thread is now getting quite long and not converging on your point of view (which you are holding to be absolutely objectively correct).

I think you need to re-evaluate the way in which you speak to people, considering that it costs nothing to be polite and considerate, and also takes effort to be rude and dismissive.

Not sure how to help this:

...
My understanding of the drm core code is like something below. struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

Luben, please tone it down a bit. You are coming across very harshly. You do make a good point though. What is the point of having the drm release callback if it's ostensibly useless? We should either use it as intended to release the structures allocated by the driver or the drm core should handle it all. With the managed resources there is an incongruity between allocation and freeing which leads to confusion. Even with the proposed updated documentation, it's not clear to me who should use the managed resources or not. My understanding was that it was optional for drivers that wanted it.

In an ideal world this would all be perfectly clean. In reality we have huge existing drivers which, if at all feasible, can only be converted over step by step.

So with that there's a few ways we can make this happen:

drmm resources are cleaned up before ->release is called. This means

doing auto-cleanup of the final steps like cleanup up drm_device resources is gated on all drivers first being converted completely over to drmm, which is never going to happen. And it's holding up removing all the fairly simple cleanup code from small driver, which is where managed resources (whether drmm or devm) have the most benefit, often they completely eliminate the need for any explicit teardown code.

start in the middle. That just oopses because the unwind order isn't

the inverse of the setup order anymore, and generally that's required.

start at the end. Unfortunately this means that the drm_device

cannot be freed in the driver's ->release callback, therefore for transition purposes I had to sprinkle drmm_add_final_kfree all over the place. But if you use devm_drm_dev_alloc (like the updated docs recommend) that's not needed anymore, so really not an eyesore for driver developers.

Yes there's mildly tricky code in the core as a result, but given that you guys wont volunteer to fix up the entire subsystem either we just have to live with that I think. Also, the commit adding the drm_managed stuff does explain these implementation details and the reasons why.

Also note that tons of stuff in drm doesn't yet provide drmm_ versions, teardown-less drivers really only works for really simple ones. So completely getting rid of the ->release callback will also need lots of core work, like the currently in-flight series to add more drmm_ helpers for kms objects:

https://lore.kernel.org/dri-devel/20200827160545.1146-1-p.zabel@pengutronix....

Help obviously very much welcome.

Cheers, Daniel

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Pan, Xinhui

3 Sep 3 Sep

1:57 a.m.

...

2020年9月2日 22:50，Tuikov, Luben Luben.Tuikov@amd.com 写道：

On 2020-09-02 00:43, Pan, Xinhui wrote:

...
...
2020年9月2日 11:46，Tuikov, Luben Luben.Tuikov@amd.com 写道：

On 2020-09-01 21:42, Pan, Xinhui wrote:

...
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...
809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/**

drm_dev_put - Drop reference of a DRM device

@dev: device to drop reference of or NULL

This decreases the ref-count of @dev by one. The device is destroyed if the

ref-count drops to zero.

*/ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

It is us, who kzalloc the amdgpu device, which contains

the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

that is true.

Of course it's true--good morning!

...
My understanding of the drm core code is like something below.

Let me stop you right there--just read the documentation I pointed to you at.

...
struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

No! B, which is the amdgpu_device struct "exists" before A, which is the DRM struct. This is why DRM recommends to _embed_ it into the driver's own device struct, as the documentation I pointed you to at.

I think you are misleading me here. A pci dev as you said below can act as many roles, a drm dev, a tty dev, etc. say, struct B{ struct A; struct TTY; struct printer; ... } but TTY or other members has nothing to do with our discussion.

B of course exists before A. but the code logic is not that. code below is really rare in drm world. create_B() { init B members return create_A() } So usually B have more work to do after it initialize A. then code should like below create_B() { init B base members create_A() init B extended members }

For release part. release B extended member release A release B base member

a good design should not have the so-called extended and base members existing in the release process. Now have a look at the drm core code. it expects driver to do release process like below. release B cleanup work of A

as long as the cleanup work of A exists, we can not do a pure release of B.

So if you want to follow the ruls of kref, you have to rework the drm core code first. only after that, we can do a pure release of B.

What I am confused is that, kfer sits in drm dev. why adev must be destroyed too when drm dev is going to be destroyed. adev is not equal to drm dev. I think struct below is more fit for the logic. struct adev { struct drm * ddev_p = &adev.ddev struct type *odev_p = &adev.odev struct drm ddev struct type odev }

...

"undone" first, since the DRM layer may finish with a device, but the device may still exists with the driver and as well as with PCI. This is very VERY common, with kernels, devices, device abstractions, device layers: DRM dev <-- amdgpu dev <-- PCI dev.

...
But yes, practice is more complex. if B has nothing to be destroyed. we can destory A directly, otherwise destroy B firstly.

I'm sorry, that doesn't make sense. There is no such thing as "destroy directly" and "otherwise"--this is absolutely not how this works.

A good architecture doesn't have if-then-else--it's just a pure single-branch path.

well, there is code below everywhere. if (fops->release) fops->release else default_release

...

...
in this case, we can do something below in our release() //some cleanup work of B drm_dev_fini(dev);//destroy A kfree(adev)

...
Using the kref infrastructure, when the ref goes to 0,

drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...
You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
  drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}
No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.
I just take an example here. add another release in the end. then no one could touch us. IOW, final_release.
No, that's horrible.

...
A destroy B by a callback, then A destroy itself. It assumes B just free its own resource. but that makes trouble if some resource of A is allocated by B. Because B must take care of these common resource shared between A and B.

No, that's horrible.

...
yes, that logical is more complex. So I think we can revert drm_dev_release to its previous version.

drm_dev_release() in its original form, was pure:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

...
...
The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

I am not taking about kref. what we are discussing is all about the release sequence.

You need to understand how the kref infrastructure works in the kernel. I've said it a million times: it's implicit. The "release sequence" as you like to call it, is implicit in the kref infrastructure.

...
...
Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.

yep, that make sense. But you need more changes to fix this issue. this patchset is insufficient.

Or LESS. Less changes. Less is better. Basically revert and redo all this "managed resources".

Regards, Luben

...
...
...
And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...
-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra

Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn when a DRM driver hasn't opted for managed.final_kfree, but instead check if the driver has provided its own "release" function callback in the DRM driver structure. If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the kref "release" callback when all refs are down to 0, via drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be suspended always exists, so long as the underlying PCI and DRM devices exist.

Luben Tuikov (3): drm: No warn for drivers who provide release drm/amdgpu: Remove drmm final free drm/amdgpu: Remove superfluous NULL check

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- drivers/gpu/drm/drm_drv.c | 3 ++- 3 files changed, 2 insertions(+), 6 deletions(-)

-- 2.28.0.394.ge197136389

Luben Tuikov

6:52 p.m.

On 2020-09-02 9:57 p.m., Pan, Xinhui wrote:

...

...
2020年9月2日 22:50，Tuikov, Luben Luben.Tuikov@amd.com 写道：

On 2020-09-02 00:43, Pan, Xinhui wrote:

...
...
2020年9月2日 11:46，Tuikov, Luben Luben.Tuikov@amd.com 写道：

On 2020-09-01 21:42, Pan, Xinhui wrote:

...
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...
809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/**

drm_dev_put - Drop reference of a DRM device

@dev: device to drop reference of or NULL

This decreases the ref-count of @dev by one. The device is destroyed if the

ref-count drops to zero.

*/ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

It is us, who kzalloc the amdgpu device, which contains

the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

that is true.

Of course it's true--good morning!

...
My understanding of the drm core code is like something below.

Let me stop you right there--just read the documentation I pointed to you at.

...
struct B { strcut A } we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.

No! B, which is the amdgpu_device struct "exists" before A, which is the DRM struct. This is why DRM recommends to _embed_ it into the driver's own device struct, as the documentation I pointed you to at.

I think you are misleading me here. A pci dev as you said below can act as many roles, a drm dev, a tty dev, etc. say, struct B{ struct A; struct TTY; struct printer; ... } but TTY or other members has nothing to do with our discussion.

B of course exists before A. but the code logic is not that. code below is really rare in drm world. create_B() { init B members return create_A() } So usually B have more work to do after it initialize A. then code should like below create_B() { init B base members create_A() init B extended members }

For release part. release B extended member release A release B base member

a good design should not have the so-called extended and base members existing in the release process. Now have a look at the drm core code. it expects driver to do release process like below. release B cleanup work of A

as long as the cleanup work of A exists, we can not do a pure release of B.

So if you want to follow the ruls of kref, you have to rework the drm core code first. only after that, we can do a pure release of B.

What I am confused is that, kfer sits in drm dev. why adev must be destroyed too when drm dev is going to be destroyed. adev is not equal to drm dev. I think struct below is more fit for the logic. struct adev { struct drm * ddev_p = &adev.ddev struct type *odev_p = &adev.odev struct drm ddev struct type odev }

I've not much idea what you said above.

Consider that DRM dev is embedded into amdgpu_device, as opposed to be a pointer to an allocated DRM dev (which used to be the case before my patch making the driver device struct a container and DRM a member). And as such, DRM dev embedded into amdgpu_device, then either both exist or neither exists, allocated in memory.

DRM layer can finish with a DRM device, fine. amdgpu driver can finish with an amdgpu device, fine. They can happen out of order.

But the beauty of kref, is that regardless of how a driver or layer or a third entity uses and later relinquishes a device it is working on, only when the kref goes to 0, is the memory of, in our case both DRM device struct and amdgpu_device struct both, freed.

And as such, this happens implicitly, without explicit logic or tracking. All that is required is that if an entity wants to use a struct, it calls kref-get, and when it is done using it, it calls kref-put.

The new half-baked managed resources, ported from devres, obviates this a bit, and smudges this clean separation, because now with it, the release method isn't terminal (final).

...

...
"undone" first, since the DRM layer may finish with a device, but the device may still exists with the driver and as well as with PCI. This is very VERY common, with kernels, devices, device abstractions, device layers: DRM dev <-- amdgpu dev <-- PCI dev.

...
But yes, practice is more complex. if B has nothing to be destroyed. we can destory A directly, otherwise destroy B firstly.

I'm sorry, that doesn't make sense. There is no such thing as "destroy directly" and "otherwise"--this is absolutely not how this works.

A good architecture doesn't have if-then-else--it's just a pure single-branch path.

well, there is code below everywhere. if (fops->release) fops->release else default_release

Yep, exactly! And this is the 3rd or 4th time I'm pasting this, lets go for one more:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Is how drm_dev_release() used to look like, before "managed resources" came in. When you look at it, it is clear what is happening.

You can have managed resources in DRM, but it should be a bit more elegant and implicit, and possibly preserve this finality of the release method.

For instance, one could simply decree that drivers would allocate and free their container structure, using new functions (as "DRM managed resources" does right now), but without changing the release method, thus leaving the finality to the driver.

At this point the driver may call a helper function in DRM or, DRM can add a function to be called in drm_dev_release() _before_ calling the driver's release method.

Right now what we have in amdgpu is non-symmetric allocation scheme, since with your patch, release is missing.

Regards, Luben

...

...
...
in this case, we can do something below in our release() //some cleanup work of B drm_dev_fini(dev);//destroy A kfree(adev)

...
Using the kref infrastructure, when the ref goes to 0,

drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...
You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
  drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}
No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.
I just take an example here. add another release in the end. then no one could touch us. IOW, final_release.
No, that's horrible.

...
A destroy B by a callback, then A destroy itself. It assumes B just free its own resource. but that makes trouble if some resource of A is allocated by B. Because B must take care of these common resource shared between A and B.

No, that's horrible.

...
yes, that logical is more complex. So I think we can revert drm_dev_release to its previous version.

drm_dev_release() in its original form, was pure:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

...
...
The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

I am not taking about kref. what we are discussing is all about the release sequence.

You need to understand how the kref infrastructure works in the kernel. I've said it a million times: it's implicit. The "release sequence" as you like to call it, is implicit in the kref infrastructure.

...
...
Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.

yep, that make sense. But you need more changes to fix this issue. this patchset is insufficient.

Or LESS. Less changes. Less is better. Basically revert and redo all this "managed resources".

Regards, Luben

...
...
...
And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...
-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra

Use the implicit kref infrastructure to free the container struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn when a DRM driver hasn't opted for managed.final_kfree, but instead check if the driver has provided its own "release" function callback in the DRM driver structure. If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the kref "release" callback when all refs are down to 0, via drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be suspended always exists, so long as the underlying PCI and DRM devices exist.

Luben Tuikov (3): drm: No warn for drivers who provide release drm/amdgpu: Remove drmm final free drm/amdgpu: Remove superfluous NULL check

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- drivers/gpu/drm/drm_drv.c | 3 ++- 3 files changed, 2 insertions(+), 6 deletions(-)

-- 2.28.0.394.ge197136389

Daniel Vetter

2 Sep 2 Sep

6:52 a.m.

On Tue, Sep 01, 2020 at 11:46:18PM -0400, Luben Tuikov wrote:

...

On 2020-09-01 21:42, Pan, Xinhui wrote:

...
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...
809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/**

drm_dev_put - Drop reference of a DRM device

@dev: device to drop reference of or NULL

This decreases the ref-count of @dev by one. The device is destroyed if the

ref-count drops to zero.

*/ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

It is us, who kzalloc the amdgpu device, which contains

the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

Using the kref infrastructure, when the ref goes to 0,

drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...
You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
    drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}
No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.

The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.

Unfortunately some part of that drm managed series is stuck still waiting for review (I guess I need to resubmit), but with that the docs would tell you to use devm_drm_dev_alloc instead of hand-rolling all this.

Definitely not any of the ideas proposed in this discussion or patches :-)

I'll cc you on that series when I send it out again.

Cheers, Daniel

...

...
And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...
-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra
Use the implicit kref infrastructure to free the container
struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn
when a DRM driver hasn't opted for managed.final_kfree,
but instead check if the driver has provided its own
"release" function callback in the DRM driver structure.
If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the
kref "release" callback when all refs are down to 0, via
drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be
suspended always exists, so long as the underlying PCI and
DRM devices exist.

Luben Tuikov (3):
  drm: No warn for drivers who provide release
  drm/amdgpu: Remove drmm final free
  drm/amdgpu: Remove superfluous NULL check

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
 drivers/gpu/drm/drm_drv.c                  | 3 ++-
 3 files changed, 2 insertions(+), 6 deletions(-)

-- 
2.28.0.394.ge197136389

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Luben Tuikov

2:59 p.m.

On 2020-09-02 02:52, Daniel Vetter wrote:

...

On Tue, Sep 01, 2020 at 11:46:18PM -0400, Luben Tuikov wrote:

...
On 2020-09-01 21:42, Pan, Xinhui wrote:

...
If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb? "below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

...
809 static void drm_dev_release(struct kref *ref) 810 { 811 struct drm_device *dev = container_of(ref, struct drm_device, ref); 812 813 if (dev->driver->release) 814 dev->driver->release(dev); 815 816 drm_managed_release(dev); 817 818 kfree(dev->managed.final_kfree); 819 }

That's simple--this comes from change c6603c740e0e3 and it should be reverted. Simple as that.

The version before this change was absolutely correct:

static void drm_dev_release(struct kref *ref) { if (dev->driver->release) dev->driver->release(dev); else drm_dev_fini(dev); }

Meaning, "the kref is now 0"--> if the driver has a release, call it, else use our own. But note that nothing can be assumed after this point, about the existence of "dev".

It is exactly because struct drm_device is statically embedded into a container, struct amdgpu_device, that this change above should be reverted.

This is very similar to how fops has open/release but no close. That is, the "release" is called only when the last kref is released, i.e. when kref goes from non-zero to zero.

This uses the kref infrastructure which has been around for about 20 years in the Linux kernel.

I suggest reading the comments in drm_dev.c mostly, "DOC: driver instance overview" starting at line 240 onwards. This is right above drm_put_dev(). There is actually an example of a driver in the comment. Also the comment to drm_dev_init().

Now, take a look at this:

/**

drm_dev_put - Drop reference of a DRM device

@dev: device to drop reference of or NULL

This decreases the ref-count of @dev by one. The device is destroyed if the

ref-count drops to zero.

*/ void drm_dev_put(struct drm_device *dev) { if (dev) kref_put(&dev->ref, drm_dev_release); } EXPORT_SYMBOL(drm_dev_put);

Two things:

It is us, who kzalloc the amdgpu device, which contains

the drm_device (you'll see this discussed in the reading material I pointed to above). We do this because we're probing the PCI device whether we'll work it it or not.

Using the kref infrastructure, when the ref goes to 0,

drm_dev_release is called. And here's the KEY: Because WE allocated the container, we should free it--after the release method is called, DRM cannot assume anything about the drm device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop having knowledge of it--that's fine, but as long as the ref is not 0, the amdgpu device and thus the contained DRM device, cannot be freed.

...
You have to make another change something like diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 13068fdf4331..2aabd2b4c63b 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
    drm_managed_release(dev);
  kfree(dev->managed.final_kfree);
  if (dev->driver->final_release)
          dev->driver->final_release(dev);
}
No. What's this? There is no such thing as "final" release, nor is there a "partial" release. When the kref goes to 0, the device disappears. Simple. If someone is using it, they should kref-get it, and when they're done with it, they should kref-put it.

The whole point is that this is done implicitly, via the kref infrastructure. drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all as per the documentation I pointed you to above.

Another point is that we can do some other stuff in the release function, notify someone, write some registers, free memory we use for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register(). It shouldn't have to be so out-of-place like in patch 2/3 of this series, where the drmm_add_final_kfree() is smack-dab in the middle of our PCI discovery function, surrounded on top and bottom by drm_dev_init() and drm_dev_register(). The "managed resources" infra should be non-invasive and drivers shouldn't have to change to use it--it should be invisible to them. Then our kref would just work.
Unfortunately some part of that drm managed series is stuck still waiting for review (I guess I need to resubmit), but with that the docs would tell you to use devm_drm_dev_alloc instead of hand-rolling all this.

That's very very misleading--the documentation DOES NOT tell you this. It tells you exactly how to do it using the "release" method folded into the kref infrastructure. It is your changes to the documentation which tell you to use this new alloc.

...

Definitely not any of the ideas proposed in this discussion or patches :-)

These are nothing new--it's just standard DRM stuff described in the documentation in drm_drv.c.

There are NO NEW IDEAS or anything "proposed" in this discussion or patches--it's just standard stuff from DRM, which had been the case for the last 10 years, until your "managed resources" came along. It's the managed resources which are new.

Regards, Luben

...

I'll cc you on that series when I send it out again.

Cheers, Daniel

...
...
And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree. Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards, Luben

...
-----原始邮件----- 发件人: "Tuikov, Luben" Luben.Tuikov@amd.com 日期: 2020年9月2日星期三 09:07 收件人: "amd-gfx@lists.freedesktop.org" amd-gfx@lists.freedesktop.org, "dri-devel@lists.freedesktop.org" dri-devel@lists.freedesktop.org 抄送: "Deucher, Alexander" Alexander.Deucher@amd.com, Daniel Vetter daniel@ffwll.ch, "Pan, Xinhui" Xinhui.Pan@amd.com, "Tuikov, Luben" Luben.Tuikov@amd.com 主题: [PATCH 0/3] Use implicit kref infra
Use the implicit kref infrastructure to free the container
struct amdgpu_device, container of struct drm_device.

First, in drm_dev_register(), do not indiscriminately warn
when a DRM driver hasn't opted for managed.final_kfree,
but instead check if the driver has provided its own
"release" function callback in the DRM driver structure.
If that is the case, no warning.

Remove drmm_add_final_kfree(). We take care of that, in the
kref "release" callback when all refs are down to 0, via
drm_dev_put(), i.e. the free is implicit.

Remove superfluous NULL check, since the DRM device to be
suspended always exists, so long as the underlying PCI and
DRM devices exist.

Luben Tuikov (3):
  drm: No warn for drivers who provide release
  drm/amdgpu: Remove drmm final free
  drm/amdgpu: Remove superfluous NULL check

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
 drivers/gpu/drm/drm_drv.c                  | 3 ++-
 3 files changed, 2 insertions(+), 6 deletions(-)

-- 
2.28.0.394.ge197136389

1699

Age (days ago)

1700

Last active (days ago)

dri-devel@lists.freedesktop.org

20 comments

6 participants

tags (0)

participants (6)

Alex Deucher
Christian König
Daniel Stone
Daniel Vetter
Luben Tuikov
Pan, Xinhui