This RFC is more of a proof of concept than a fully working solution, as there are a few unresolved issues we are hoping to get advice on from people on the mailing list. Until now, extracting a card, either by physical extraction (e.g. an eGPU with a Thunderbolt connection) or by emulation through sysfs (/sys/bus/pci/devices/device_id/remove), would cause random crashes in user apps. These crashes happened mostly because an app that had mapped a device-backed BO into its address space would still try to access the BO while the backing device was gone. To address this first problem, Christian suggested fixing the handling of mapped memory in the clients when the device goes away: forcibly unmap all buffers the user processes have by clearing the respective VMAs mapping the device BOs. When the VMAs then try to fill in the page tables again, the fault handler checks whether the device has been removed and, if so, returns an error. This generates a SIGBUS to the application, which can then terminate cleanly. This was done, but it in turn created a problem of kernel OOPSes: while the app was terminating because of the SIGBUS, it would trigger use-after-free in the driver through calls that access device structures already released by the PCI remove sequence. This was handled by introducing a 'flush' sequence during device removal, where we wait for the drm file reference to drop to 0, meaning all user clients directly using this device have terminated. With this I was able to cleanly emulate device unplug with X and glxgears running, and later emulate plugging the device back and restarting X and glxgears.
v2: Based on discussions on the mailing list with Daniel and Pekka [1], and on the document Pekka produced from those discussions [2], the whole approach of returning SIGBUS and waiting for all user clients with CPU mappings of device BOs to die was dropped. Instead, as the document suggests, the device structures are kept alive until the last reference to the device is dropped by a user client, and in the meanwhile all existing and new CPU mappings of the BOs belonging to the device, directly or via dma-buf import, are rerouted to a per-user-process dummy rw page. Also, I skipped the 'Requirements for KMS UAPI' section of [2], since I am trying to get the minimal set of requirements that still gives a useful solution to work, and that is the 'Requirements for Render and Cross-Device UAPI' section; my test case is therefore removing a secondary device, which is render-only and not involved in KMS.
This iteration is still more of a draft, as I am still facing a few unsolved issues, such as a crash in the user client when trying to CPU map an imported BO if the map happens after the device was removed, and a HW failure to plug back a removed device. Also, since I don't have a real-life setup with an external GPU connected through TB, I am using sysfs to emulate PCI remove, and I expect to encounter more issues once I try this on a real-life case. I am also expecting some help on this from a user who volunteered to test in the related gitlab ticket [3]. So basically this is more of a way to get feedback on whether I am moving in the right direction.
[1] - Discussions during v1 of the patchset
https://lists.freedesktop.org/archives/dri-devel/2020-May/265386.html
[2] - drm/doc: device hot-unplug for userspace
https://www.spinics.net/lists/dri-devel/msg259755.html
[3] - Related gitlab ticket
https://gitlab.freedesktop.org/drm/amd/-/issues/1081
Andrey Grodzovsky (8):
  drm: Add dummy page per device or GEM object
  drm/ttm: Remap all page faults to per process dummy page.
  drm/ttm: Add unmapping of the entire device address space
  drm/amdgpu: Split amdgpu_device_fini into early and late
  drm/amdgpu: Refactor sysfs removal
  drm/amdgpu: Unmap entire device address space on device remove.
  drm/amdgpu: Fix sdma code crash post device unplug
  drm/amdgpu: Prevent any job recoveries after device is unplugged.
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 19 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 50 +++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c      | 23 ++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c      | 24 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |  8 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c      | 23 +++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c      |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c  | 21 ++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 +++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++--
 drivers/gpu/drm/drm_file.c                   |  8 ++++
 drivers/gpu/drm/drm_prime.c                  | 10 +++++
 drivers/gpu/drm/ttm/ttm_bo.c                 |  8 +++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c              | 65 ++++++++++++++++++++++++----
 include/drm/drm_file.h                       |  2 +
 include/drm/drm_gem.h                        |  2 +
 include/drm/ttm/ttm_bo_driver.h              |  7 +++
 22 files changed, 286 insertions(+), 55 deletions(-)
Will be used to reroute page faults on CPU-mapped BOs once the device is removed.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/drm_file.c  |  8 ++++++++
 drivers/gpu/drm/drm_prime.c | 10 ++++++++++
 include/drm/drm_file.h      |  2 ++
 include/drm/drm_gem.h       |  2 ++
 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index c4c704e..67c0770 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
 		goto out_prime_destroy;
 	}
 
+	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!file->dummy_page) {
+		ret = -ENOMEM;
+		goto out_prime_destroy;
+	}
+
 	return file;
 
 out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
 	if (dev->driver->postclose)
 		dev->driver->postclose(dev, file);
 
+	__free_page(file->dummy_page);
+
 	drm_prime_destroy_file_private(&file->prime);
 
 	WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1de2cde..c482e9c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 
 	ret = drm_prime_add_buf_handle(&file_priv->prime,
 			dma_buf, *handle);
+
+	if (!ret) {
+		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!obj->dummy_page)
+			ret = -ENOMEM;
+	}
+
 	mutex_unlock(&file_priv->prime.lock);
 	if (ret)
 		goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
 		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
 	dma_buf = attach->dmabuf;
 	dma_buf_detach(attach->dmabuf, attach);
+
+	__free_page(obj->dummy_page);
+
 	/* remove the reference */
 	dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 19df802..349a658 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -335,6 +335,8 @@ struct drm_file {
 	 */
 	struct drm_prime_file_private prime;
 
+	struct page *dummy_page;
+
 	/* private: */
#if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 0b37506..47460d1 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -310,6 +310,8 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	struct page *dummy_page;
 };
 
 /**
On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index c4c704e..67c0770 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) goto out_prime_destroy; }
+ file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!file->dummy_page) {
+ 	ret = -ENOMEM;
+ 	goto out_prime_destroy;
+ }
+
 return file;
out_prime_destroy: @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) if (dev->driver->postclose) dev->driver->postclose(dev, file);
__free_page(file->dummy_page);
drm_prime_destroy_file_private(&file->prime);
WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 1de2cde..c482e9c 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
+ if (!ret) {
+ 	obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ 	if (!obj->dummy_page)
+ 		ret = -ENOMEM;
+ }
+
 mutex_unlock(&file_priv->prime.lock);
 if (ret)
 	goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach);
+ __free_page(obj->dummy_page);
+
 /* remove the reference */
 dma_buf_put(dma_buf);
} diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 19df802..349a658 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -335,6 +335,8 @@ struct drm_file { */ struct drm_prime_file_private prime;
Kerneldoc for these please, including why we need them and when. E.g. the one in gem_bo should say it's only for exported buffers, so that we're not colliding security spaces.
+ struct page *dummy_page;
+
 /* private: */
#if IS_ENABLED(CONFIG_DRM_LEGACY) unsigned long lock_count; /* DRI1 legacy lock count */ diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page;
};
I think amdgpu doesn't care, but everyone else still might care somewhat about flink. That also shares buffers, so also needs to allocate the per-bo dummy page.
I also wonder whether we shouldn't have a helper to look up the dummy page, just to encode in core code how it's supposed to cascade. -Daniel
/**
2.7.4
On Mon, 22 Jun 2020 11:35:01 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
...
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page;
};
I think amdgpu doesn't care, but everyone else still might care somewhat about flink. That also shares buffers, so also needs to allocate the per-bo dummy page.
Do you really care about making flink not explode on device hot-unplug? Why not just leave flink users die in a fire? It's not a regression.
Thanks, pq
On Mon, Jun 22, 2020 at 4:22 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Mon, 22 Jun 2020 11:35:01 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
...
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page;
};
I think amdgpu doesn't care, but everyone else still might care somewhat about flink. That also shares buffers, so also needs to allocate the per-bo dummy page.
Do you really care about making flink not explode on device hot-unplug? Why not just leave flink users die in a fire? It's not a regression.
It's not about exploding, they won't. With flink you can pass a buffer from one address space to the other, so imo we should avoid false sharing. E.g. if you happen to write something $secret into a private buffer, but only $non-secret stuff into shared buffers. Then if you unplug, your well-kept $secret might suddenly be visible to lots of other processes you never intended to share it with.
Just feels safer to plug that hole completely. -Daniel
On Mon, 22 Jun 2020 16:24:38 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Jun 22, 2020 at 4:22 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Mon, 22 Jun 2020 11:35:01 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
...
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page;
};
I think amdgpu doesn't care, but everyone else still might care somewhat about flink. That also shares buffers, so also needs to allocate the per-bo dummy page.
Do you really care about making flink not explode on device hot-unplug? Why not just leave flink users die in a fire? It's not a regression.
It's not about exploding, they won't. With flink you can pass a buffer from one address space to the other, so imo we should avoid false sharing. E.g. if you happen to write something $secret into a private buffer, but only $non-secret stuff into shared buffers. Then if you unplug, your well-kept $secret might suddenly be visible by lots of other processes you never intended to share it with.
Just feels safer to plug that hole completely.
Ah! Ok, I clearly didn't understand the consequences.
Thanks, pq
On 6/22/20 5:35 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index c4c704e..67c0770 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) goto out_prime_destroy; }
file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!file->dummy_page) {
ret = -ENOMEM;
goto out_prime_destroy;
}
return file;
out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) if (dev->driver->postclose) dev->driver->postclose(dev, file);
__free_page(file->dummy_page);
drm_prime_destroy_file_private(&file->prime);
WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 1de2cde..c482e9c 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
+ if (!ret) {
+ 	obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ 	if (!obj->dummy_page)
+ 		ret = -ENOMEM;
+ }
+
 mutex_unlock(&file_priv->prime.lock);
 if (ret)
 	goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach);
+ __free_page(obj->dummy_page);
+
 /* remove the reference */
 dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 19df802..349a658 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -335,6 +335,8 @@ struct drm_file { */ struct drm_prime_file_private prime;
Kerneldoc for these please, including why we need them and when. E.g. the one in gem_bo should say it's only for exported buffers, so that we're not colliding security spaces.
+ struct page *dummy_page;
+
 /* private: */
#if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page;
 };
I think amdgpu doesn't care, but everyone else still might care somewhat about flink. That also shares buffers, so also needs to allocate the per-bo dummy page.
Hi, back to this topic after a long context switch for internal project.
I don't see why for FLINK we can't use the same dummy page from struct drm_gem_object - looking at drm_gem_flink_ioctl I see that the underlying object we look up is still of type drm_gem_object. Why do we need a per-BO (TTM BO I assume?) dummy page for this?
Andrey
I also wonder whether we shouldn't have a helper to look up the dummy page, just to encode in core code how it's supposed to cascade. -Daniel
/**
2.7.4
On 6/22/20 5:35 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index c4c704e..67c0770 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) goto out_prime_destroy; }
file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!file->dummy_page) {
ret = -ENOMEM;
goto out_prime_destroy;
}
return file;
out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) if (dev->driver->postclose) dev->driver->postclose(dev, file);
__free_page(file->dummy_page);
drm_prime_destroy_file_private(&file->prime);
WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 1de2cde..c482e9c 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
+ if (!ret) {
+ 	obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ 	if (!obj->dummy_page)
+ 		ret = -ENOMEM;
+ }
+
 mutex_unlock(&file_priv->prime.lock);
 if (ret)
 	goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach);
+ __free_page(obj->dummy_page);
+
 /* remove the reference */
 dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 19df802..349a658 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -335,6 +335,8 @@ struct drm_file { */ struct drm_prime_file_private prime;
Kerneldoc for these please, including why we need them and when. E.g. the one in gem_bo should say it's only for exported buffers, so that we're not colliding security spaces.
+ struct page *dummy_page;
+
 /* private: */
#if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page;
 };
I think amdgpu doesn't care, but everyone else still might care somewhat about flink. That also shares buffers, so also needs to allocate the per-bo dummy page.
Not familiar with FLINK, so I read a bit here https://lwn.net/Articles/283798/ (sections 3 and 4 about FLINK naming and later mapping). I don't see a difference between FLINK and local BO mapping, as opening by FLINK name returns a handle to the same BO as the original. Why then do we need special handling for FLINK?
Andrey
I also wonder whether we shouldn't have a helper to look up the dummy page, just to encode in core code how it's supposed to cascade. -Daniel
/**
2.7.4
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index c4c704e..67c0770 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) goto out_prime_destroy; }
file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!file->dummy_page) {
ret = -ENOMEM;
goto out_prime_destroy;
}
return file;
out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) if (dev->driver->postclose) dev->driver->postclose(dev, file);
__free_page(file->dummy_page);
drm_prime_destroy_file_private(&file->prime);
WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 1de2cde..c482e9c 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
+ if (!ret) {
+ 	obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ 	if (!obj->dummy_page)
+ 		ret = -ENOMEM;
+ }
While the per-file case still looks acceptable, this is a clear NAK since it will massively increase the memory needed for a prime exported object.
I think that this is quite overkill in the first place and for the hot unplug case we can just use the global dummy page as well.
Christian.
mutex_unlock(&file_priv->prime.lock); if (ret) goto fail; @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach);
+ __free_page(obj->dummy_page);
+
 /* remove the reference */
 dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 19df802..349a658 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -335,6 +335,8 @@ struct drm_file { */ struct drm_prime_file_private prime;
+ struct page *dummy_page;
+
 /* private: */
#if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
struct page *dummy_page; };
/**
On Mon, Jun 22, 2020 at 3:18 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index c4c704e..67c0770 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) goto out_prime_destroy; }
file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!file->dummy_page) {
ret = -ENOMEM;
goto out_prime_destroy;
}
return file;
out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) if (dev->driver->postclose) dev->driver->postclose(dev, file);
__free_page(file->dummy_page);
drm_prime_destroy_file_private(&file->prime); WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 1de2cde..c482e9c 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
if (!ret) {
obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!obj->dummy_page)
ret = -ENOMEM;
}
While the per file case still looks acceptable this is a clear NAK since it will massively increase the memory needed for a prime exported object.
I think that this is quite overkill in the first place and for the hot unplug case we can just use the global dummy page as well.
Imo we either don't bother with per-file dummy page, or we need this. Half-way doesn't make much sense, since for anything you dma-buf exported you have no idea whether it left a sandbox or not.
E.g. anything that's shared between client/compositor has a different security context, so picking the dummy page of either is the wrong thing.
If you're worried about the overhead we can also allocate the dummy page on demand, and SIGBUS if we can't allocate the right one. Then we just need to track whether a buffer has ever been exported. -Daniel
Christian.
mutex_unlock(&file_priv->prime.lock); if (ret) goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach);
__free_page(obj->dummy_page);
/* remove the reference */
dma_buf_put(dma_buf);
}
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 19df802..349a658 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -335,6 +335,8 @@ struct drm_file { */ struct drm_prime_file_private prime;
struct page *dummy_page;
/* private: */
#if IS_ENABLED(CONFIG_DRM_LEGACY)
	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
struct page *dummy_page;
};
/**
On 6/22/20 9:18 AM, Christian König wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/drm_file.c | 8 ++++++++ drivers/gpu/drm/drm_prime.c | 10 ++++++++++ include/drm/drm_file.h | 2 ++ include/drm/drm_gem.h | 2 ++ 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index c4c704e..67c0770 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) goto out_prime_destroy; } + file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!file->dummy_page) { + ret = -ENOMEM; + goto out_prime_destroy; + }
return file; out_prime_destroy: @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) if (dev->driver->postclose) dev->driver->postclose(dev, file); + __free_page(file->dummy_page);
drm_prime_destroy_file_private(&file->prime); WARN_ON(!list_empty(&file->event_list)); diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 1de2cde..c482e9c 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev, ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
+ if (!ret) { + obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!obj->dummy_page) + ret = -ENOMEM; + }
While the per file case still looks acceptable this is a clear NAK since it will massively increase the memory needed for a prime exported object.
I think that this is quite overkill in the first place and for the hot unplug case we can just use the global dummy page as well.
Christian.
Global dummy page is good for read access, but what do you do on write access? My first approach was indeed to map the global dummy page at first as read-only and mark vma->vm_flags as !VM_SHARED, assuming that this would trigger the copy-on-write flow in core mm (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977) on the next page fault to the same address triggered by a write access. But then I realized a new COW page would be allocated for each such mapping, and this is much more wasteful than having a dedicated page per GEM object. We can indeed optimize by allocating this dummy page on the first page fault after device disconnect instead of at GEM object creation.
Andrey
mutex_unlock(&file_priv->prime.lock); if (ret) goto fail; @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach);
+ __free_page(obj->dummy_page);
/* remove the reference */ dma_buf_put(dma_buf); } diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 19df802..349a658 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -335,6 +335,8 @@ struct drm_file { */ struct drm_prime_file_private prime; + struct page *dummy_page;
/* private: */ #if IS_ENABLED(CONFIG_DRM_LEGACY) unsigned long lock_count; /* DRI1 legacy lock count */ diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 0b37506..47460d1 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -310,6 +310,8 @@ struct drm_gem_object { * */ const struct drm_gem_object_funcs *funcs;
+ struct page *dummy_page; }; /**
Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
On 6/22/20 9:18 AM, Christian König wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Will be used to reroute CPU mapped BO's page faults once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/drm_file.c  |  8 ++++++++
 drivers/gpu/drm/drm_prime.c | 10 ++++++++++
 include/drm/drm_file.h      |  2 ++
 include/drm/drm_gem.h       |  2 ++
 4 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index c4c704e..67c0770 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
 		goto out_prime_destroy;
 	}
 
+	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!file->dummy_page) {
+		ret = -ENOMEM;
+		goto out_prime_destroy;
+	}
+
 	return file;
 
 out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
 	if (dev->driver->postclose)
 		dev->driver->postclose(dev, file);
 
+	__free_page(file->dummy_page);
+
 	drm_prime_destroy_file_private(&file->prime);
 
 	WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1de2cde..c482e9c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 	ret = drm_prime_add_buf_handle(&file_priv->prime,
 			dma_buf, *handle);
+
+	if (!ret) {
+		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!obj->dummy_page)
+			ret = -ENOMEM;
+	}
+
While the per file case still looks acceptable this is a clear NAK since it will massively increase the memory needed for a prime exported object.
I think that this is quite overkill in the first place and for the hot unplug case we can just use the global dummy page as well.
Christian.
A global dummy page is good for read access, but what do you do on write access? My first approach was indeed to map the global dummy page as read-only at first and clear VM_SHARED from vma->vm_flags, assuming this would trigger the copy-on-write flow in core mm (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977) on the next page fault to the same address triggered by a write access. But then I realized a new COW page will be allocated for each such mapping, which is much more wasteful than having a dedicated page per GEM object.
Yeah, but this only matters for a very small corner case. What we need to prevent is increasing the memory usage too much during normal operation.
Using memory during the unplug is completely unproblematic because we just released quite a bunch of it by releasing all those system memory buffers.
And I'm pretty sure that COWed pages are correctly accounted towards the used memory of a process.
So I think if that approach works as intended and the COW pages are released again on unmapping it would be the perfect solution to the problem.
Daniel what do you think?
Regards, Christian.
We can indeed optimize by allocating this dummy page on the first page fault after device disconnect instead of at GEM object creation.
Andrey
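Christian's point about COW pages being accounted to the process, and Andrey's point that each private mapping COWs its own copy, can be seen in a small userspace sketch (plain POSIX C, nothing kernel-specific; the temp file stands in for the global dummy page, and the file name is of course just an illustration):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one backing page twice with MAP_PRIVATE and write through both
 * mappings: each write fault COWs a separate anonymous page, so the two
 * mappings diverge and the backing page itself is never modified. */
int cow_copies_per_mapping(void)
{
    long psz = sysconf(_SC_PAGESIZE);

    char path[] = "/tmp/cowdemoXXXXXX";   /* stand-in for the global dummy page */
    int fd = mkstemp(path);
    if (fd < 0 || ftruncate(fd, psz) != 0)
        return -1;

    char *a = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    char *b = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        return -1;

    a[0] = 'A';                           /* COW: 'a' gets its own page */
    b[0] = 'B';                           /* COW: 'b' gets another page */

    char backing;
    int ok = a[0] == 'A' && b[0] == 'B' &&          /* mappings diverged */
             pread(fd, &backing, 1, 0) == 1 &&
             backing == '\0';                       /* backing untouched */

    munmap(a, psz);
    munmap(b, psz);
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

This is exactly the trade-off under discussion: the COW machinery gives writable, correctly accounted pages for free, but at the cost of one anonymous page per mapping that writes.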
 	mutex_unlock(&file_priv->prime.lock);
 
 	if (ret)
 		goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
 	dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
 	dma_buf = attach->dmabuf;
 	dma_buf_detach(attach->dmabuf, attach);
+
+	__free_page(obj->dummy_page);
+
 	/* remove the reference */
 	dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 19df802..349a658 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -335,6 +335,8 @@ struct drm_file {
 	 */
 	struct drm_prime_file_private prime;
 
+	struct page *dummy_page;
+
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 0b37506..47460d1 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -310,6 +310,8 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	struct page *dummy_page;
 };
 
 /**
On Mon, Jun 22, 2020 at 7:45 PM Christian König christian.koenig@amd.com wrote:
If COW works, sure sounds reasonable. And if we can make sure we managed to drop all the system allocations (otherwise suddenly 2x memory usage, worst case). But I have no idea whether we can retroshoehorn that into an established vma, you might have fun stuff like a mkwrite handler there (which I thought is the COW handler thing, but really no idea).
If we need to massively change stuff then I think rw dummy page, allocated on first fault after hotunplug (maybe just make it one per object, that's simplest) seems like the much safer option. Much less code that can go wrong. -Daniel
On 6/22/20 1:50 PM, Daniel Vetter wrote:
Can you clarify your concern here? I see no DRM driver besides vmwgfx that installs a handler for vm_operations_struct.page_mkwrite, and in any case, since I will be turning off the VM_SHARED flag for the faulting vm_area_struct, making it a COW mapping, page_mkwrite will not be called on any subsequent vm fault.
Andrey
On 6/22/20 1:50 PM, Daniel Vetter wrote:
> If we need to massively change stuff then I think rw dummy page, allocated on first fault after hotunplug (maybe just make it one per object, that's simplest) seems like the much safer option. Much less code that can go wrong. -Daniel
Regarding COW, I was looking into how to properly implement it from within the fault handler (i.e. ttm_bo_vm_fault), and the main obstacle I hit is exclusive access to the vm_area_struct: I need to be able to modify vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can be triggered on a subsequent write-access fault (here https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128), but core mm takes only the read side of mm_sem (here for example https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#...), so I am not supposed to modify the vm_area_struct in this case. I am not sure if it's legitimate to write-lock the mm_sem from this point. I found some discussion about this here http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it wasn't really clear to me what the solution is.
In any case, it seems to me that an easier and more memory-saving solution would be to just switch to a per-TTM-BO dummy rw page that is allocated on demand, as you suggested here. This should also take care of imported BOs and flink cases. Then I can drop the per-device-FD and per-GEM-object dummy pages and the ugly loop I am using in patch 2 to match the faulting BO to the right dummy page.
Does this make sense?
Andrey
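A minimal sketch of that on-demand scheme, in kernel-flavoured pseudocode (the dummy_page field on the BO, the function name, and the error handling are assumptions of this sketch, not code from the series; the real fault path would also have to handle ranges larger than one page):

```
/* sketch: allocate the per-BO dummy rw page lazily, on the first
 * fault after the device has gone away */
static vm_fault_t ttm_bo_vm_dummy_page_fault(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;

	if (!bo->dummy_page) {                  /* first fault: allocate */
		bo->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!bo->dummy_page)
			return VM_FAULT_OOM;
	}

	/* Route the faulting address to the shared per-BO dummy page. */
	return vmf_insert_pfn(vmf->vma, vmf->address,
			      page_to_pfn(bo->dummy_page));
}
```

With this shape, the matching __free_page would move from the object-creation paths into the object's release function, guarded for the never-faulted case, and no memory is spent before the unplug actually happens.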
On 13.11.20 at 21:52, Andrey Grodzovsky wrote:
I still don't see the information leak as much of a problem, but if Daniel insists we should probably do this.
But could we at least have only one page per client instead of per BO?
Thanks, Christian.
amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
On Sat, Nov 14, 2020 at 9:41 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Well, amdgpu doesn't clear buffers by default, so indeed you guys are a lot more laissez-faire here. But in general we really don't do that kind of leaking. Iirc there are even radeonsi bugs because of exactly this, and radeonsi happily displays gunk :-)
> But could we at least have only one page per client instead of per BO?
I think you can do one page per file descriptor or something like that. But it gets annoying with shared BOs, especially with dma_buf_mmap forwarding. -Daniel
amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
On Sat, Nov 14, 2020 at 10:51 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Sat, Nov 14, 2020 at 9:41 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
On 13.11.20 at 21:52, Andrey Grodzovsky wrote:
On 6/22/20 1:50 PM, Daniel Vetter wrote:
On Mon, Jun 22, 2020 at 7:45 PM Christian König christian.koenig@amd.com wrote:
On 22.06.20 at 16:32, Andrey Grodzovsky wrote:
On 6/22/20 9:18 AM, Christian König wrote:
> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
>> Will be used to reroute CPU mapped BO's page faults once
>> device is removed.
>>
>> Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
>> ---
>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>   include/drm/drm_file.h      |  2 ++
>>   include/drm/drm_gem.h       |  2 ++
>>   4 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index c4c704e..67c0770 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>               goto out_prime_destroy;
>>       }
>>
>> +     file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +     if (!file->dummy_page) {
>> +             ret = -ENOMEM;
>> +             goto out_prime_destroy;
>> +     }
>> +
>>       return file;
>>
>>   out_prime_destroy:
>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>       if (dev->driver->postclose)
>>               dev->driver->postclose(dev, file);
>>
>> +     __free_page(file->dummy_page);
>> +
>>       drm_prime_destroy_file_private(&file->prime);
>>
>>       WARN_ON(!list_empty(&file->event_list));
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 1de2cde..c482e9c 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>       ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
>> +
>> +     if (!ret) {
>> +             obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +             if (!obj->dummy_page)
>> +                     ret = -ENOMEM;
>> +     }
>> +
> While the per file case still looks acceptable this is a clear NAK
> since it will massively increase the memory needed for a prime
> exported object.
>
> I think that this is quite overkill in the first place and for the
> hot unplug case we can just use the global dummy page as well.
>
> Christian.
Global dummy page is good for read access, but what do you do on write access? My first approach was indeed to map the global dummy page as read only at first and clear VM_SHARED from vma->vm_flags, assuming that this would trigger the Copy On Write flow in core mm (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.boo...) on the next page fault to the same address triggered by a write access. But then I realized a new COW page will be allocated for each such mapping, and this is much more wasteful than having a dedicated page per GEM object.
Yeah, but this is only for very, very small corner cases. What we need to prevent is increasing the memory usage too much during normal operation.
Using memory during the unplug is completely unproblematic because we just released quite a bunch of it by releasing all those system memory buffers.
And I'm pretty sure that COWed pages are correctly accounted towards the used memory of a process.
So I think if that approach works as intended and the COW pages are released again on unmapping it would be the perfect solution to the problem.
Daniel what do you think?
If COW works, sure sounds reasonable. And if we can make sure we managed to drop all the system allocations (otherwise suddenly 2x memory usage, worst case). But I have no idea whether we can retroshoehorn that into an established vma, you might have fun stuff like a mkwrite handler there (which I thought is the COW handler thing, but really no idea).
If we need to massively change stuff then I think rw dummy page, allocated on first fault after hotunplug (maybe just make it one per object, that's simplest) seems like the much safer option. Much less code that can go wrong. -Daniel
Regarding COW, I was looking into how to properly implement it from within the fault handler (i.e. ttm_bo_vm_fault), and the main obstacle I hit is that of exclusive access to the vm_area_struct: I need to be able to modify vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can be triggered on a subsequent write access fault (here https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128), but core mm takes only the read side of mm_sem (here for example https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#...), and so I am not supposed to modify the vm_area_struct in this case. I am not sure if it's legit to write lock the mm_sem from this point. I found some discussions about this here http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it wasn't really clear to me what the solution is.
In any case, it seems to me that the easier and more memory-saving solution would be to just switch to a per-TTM-BO dummy rw page allocated on demand, as you suggested here. This should also take care of imported BOs and flink cases. Then I can drop the per-device-FD and per-GEM-object dummy page, and the ugly loop I am using in patch 2 to match the faulting BO to the right dummy page.

Does this make sense?
I still don't see the information leak as much of a problem, but if Daniel insists we should probably do this.
Well amdgpu doesn't clear buffers by default, so indeed you guys are a lot more laissez-faire here. But in general we really don't do that kind of leaking. Iirc there are even radeonsi bugs because nothing else clears, and radeonsi happily displays gunk :-)
btw I think not clearing at alloc breaks the render node model a bit. Without that this was all fine, since system pages still got cleared by alloc_page(), and we only leaked vram. And for the legacy node model with authentication of clients against the X server, leaking that all around was ok. With render nodes no leaking should happen, with no knob for userspace to opt out of the forced clearing. -Daniel
But could we at least have only one page per client instead of per BO?
I think you can do one page per file descriptor or something like that. But gets annoying with shared bo, especially with dma_buf_mmap forwarding. -Daniel
Thanks, Christian.
Andrey
Regards, Christian.
We can indeed optimize by allocating this dummy page on the first page fault after device disconnect instead of at GEM object creation.
Andrey
>>       mutex_unlock(&file_priv->prime.lock);
>>       if (ret)
>>               goto fail;
>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>>       dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>       dma_buf = attach->dmabuf;
>>       dma_buf_detach(attach->dmabuf, attach);
>> +
>> +     __free_page(obj->dummy_page);
>> +
>>       /* remove the reference */
>>       dma_buf_put(dma_buf);
>> }
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 19df802..349a658 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -335,6 +335,8 @@ struct drm_file {
>>        */
>>       struct drm_prime_file_private prime;
>>
>> +     struct page *dummy_page;
>> +
>>       /* private: */
>> #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>       unsigned long lock_count; /* DRI1 legacy lock count */
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 0b37506..47460d1 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>        *
>>        */
>>       const struct drm_gem_object_funcs *funcs;
>> +
>> +     struct page *dummy_page;
>> };
>> /**
amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On 2020-11-14 10:57 a.m., Daniel Vetter wrote:
btw I think not clearing at alloc breaks the render node model a bit. Without that this was all fine, since system pages still got cleared by alloc_page(), and we only leaked vram. And for the legacy node model with authentication of clients against the X server, leaking that all around was ok. With render nodes no leaking should happen, with no knob for userspace to opt out of the forced clearing.
Seconded.
On 11/14/20 4:51 AM, Daniel Vetter wrote:
But could we at least have only one page per client instead of per BO?
I think you can do one page per file descriptor or something like that. But gets annoying with shared bo, especially with dma_buf_mmap forwarding. -Daniel
Christian, is your concern more with too many page allocations or with the extra pointer member cluttering the TTM BO struct? Because we can allocate the dummy page on demand, only when needed. It just seems to me that keeping it per BO streamlines the code, as I don't need different handling for local vs imported BOs.
Andrey
On 15.11.20 at 07:34, Andrey Grodzovsky wrote:
Christian, is your concern more with too many page allocations or with the extra pointer member cluttering the TTM BO struct?
Yes, that is one problem.
Because we can allocate the dummy page on demand, only when needed. It just seems to me that keeping it per BO streamlines the code, as I don't need different handling for local vs imported BOs.
Why should you have a difference between local vs imported BOs?
Christian.
On 11/16/20 4:48 AM, Christian König wrote:
Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
On 11/14/20 4:51 AM, Daniel Vetter wrote:
On Sat, Nov 14, 2020 at 9:41 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
On 6/22/20 1:50 PM, Daniel Vetter wrote:
On Mon, Jun 22, 2020 at 7:45 PM Christian König christian.koenig@amd.com wrote: > Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky: >> On 6/22/20 9:18 AM, Christian König wrote: >>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky: >>>> Will be used to reroute CPU mapped BO's page faults once >>>> device is removed. >>>> >>>> Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com >>>> --- >>>> drivers/gpu/drm/drm_file.c | 8 ++++++++ >>>> drivers/gpu/drm/drm_prime.c | 10 ++++++++++ >>>> include/drm/drm_file.h | 2 ++ >>>> include/drm/drm_gem.h | 2 ++ >>>> 4 files changed, 22 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >>>> index c4c704e..67c0770 100644 >>>> --- a/drivers/gpu/drm/drm_file.c >>>> +++ b/drivers/gpu/drm/drm_file.c >>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct >>>> drm_minor *minor) >>>> goto out_prime_destroy; >>>> } >>>> + file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO); >>>> + if (!file->dummy_page) { >>>> + ret = -ENOMEM; >>>> + goto out_prime_destroy; >>>> + } >>>> + >>>> return file; >>>> out_prime_destroy: >>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file) >>>> if (dev->driver->postclose) >>>> dev->driver->postclose(dev, file); >>>> + __free_page(file->dummy_page); >>>> + >>>> drm_prime_destroy_file_private(&file->prime); >>>> WARN_ON(!list_empty(&file->event_list)); >>>> diff --git a/drivers/gpu/drm/drm_prime.c >>>> b/drivers/gpu/drm/drm_prime.c >>>> index 1de2cde..c482e9c 100644 >>>> --- a/drivers/gpu/drm/drm_prime.c >>>> +++ b/drivers/gpu/drm/drm_prime.c >>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct >>>> drm_device *dev, >>>> ret = drm_prime_add_buf_handle(&file_priv->prime, >>>> dma_buf, *handle); >>>> + >>>> + if (!ret) { >>>> + obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO); >>>> + if (!obj->dummy_page) >>>> + ret = -ENOMEM; >>>> + } >>>> + >>> While the per file case still looks acceptable this is a clear NAK >>> since it will 
massively increase the memory needed for a prime >>> exported object. >>> >>> I think that this is quite overkill in the first place and for the >>> hot unplug case we can just use the global dummy page as well. >>> >>> Christian. >> Global dummy page is good for read access, what do you do on write >> access ? My first approach was indeed to map at first global dummy >> page as read only and mark the vma->vm_flags as !VM_SHARED assuming >> that this would trigger Copy On Write flow in core mm >> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.boo...) >> >> >> on the next page fault to same address triggered by a write access but >> then i realized a new COW page will be allocated for each such mapping >> and this is much more wasteful then having a dedicated page per GEM >> object. > Yeah, but this is only for a very very small corner cases. What we need > to prevent is increasing the memory usage during normal operation to > much. > > Using memory during the unplug is completely unproblematic because we > just released quite a bunch of it by releasing all those system memory > buffers. > > And I'm pretty sure that COWed pages are correctly accounted towards > the > used memory of a process. > > So I think if that approach works as intended and the COW pages are > released again on unmapping it would be the perfect solution to the > problem. > > Daniel what do you think? If COW works, sure sounds reasonable. And if we can make sure we managed to drop all the system allocations (otherwise suddenly 2x memory usage, worst case). But I have no idea whether we can retroshoehorn that into an established vma, you might have fun stuff like a mkwrite handler there (which I thought is the COW handler thing, but really no idea).
If we need to massively change stuff then I think rw dummy page, allocated on first fault after hotunplug (maybe just make it one per object, that's simplest) seems like the much safer option. Much less code that can go wrong. -Daniel
Regarding COW, I was looking into how to properly implement it from within the fault handler (i.e. ttm_bo_vm_fault), and the main obstacle I hit is that of exclusive access to the vm_area_struct: I need to be able to modify vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can be triggered on a subsequent write-access fault (here https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.boo...)
but core mm takes only the read side of mm_sem (here for example https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.boo...)
and so I am not supposed to modify the vm_area_struct in this case. I am not sure if it's legit to write lock the mm_sem from this point. I found some discussions about this here https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.iu.edu... but it wasn't really clear to me what the solution is.
In any case, it seems to me that an easier and more memory-saving solution would be to just switch to a per TTM BO dummy rw page that would be allocated on demand, as you suggested here. This should also take care of imported BOs and flink cases. Then I can drop the per device FD and per GEM object dummy pages and the ugly loop I am using in patch 2 to match the faulting BO to the right dummy page.
Does this make sense?
I still don't see the information leak as much of a problem, but if Daniel insists we should probably do this.
Well amdgpu doesn't clear buffers by default, so indeed you guys are a lot more laissez-faire here. But in general we really don't do that kind of leaking. Iirc there's even radeonsi bugs because nothing else clears, and radeonsi happily displays gunk :-)
But could we at least have only one page per client instead of per BO?
I think you can do one page per file descriptor or something like that. But gets annoying with shared bo, especially with dma_buf_mmap forwarding. -Daniel
Christian - is your concern more with too many page allocations or with the extra pointer member cluttering the TTM BO struct ?
Yes, that is one problem.
Because we can allocate the dummy page on demand only when needed. It just seems to me that keeping it per BO streamlines the code, as I don't need to have different handling for local vs imported BOs.
Why should you have a difference between local vs imported BOs?
For local BOs it seems like Daniel's suggestion to use vm_area_struct->vm_file->private_data should work, as this points to the drm_file. For imported BOs private_data will point to a dma_buf structure, since each imported BO is backed by a pseudo file (created in dma_buf_getfile). If so, where should we store the dummy RW page in this case? In the current implementation it's stored in drm_gem_object.
P.S. For the FLINK case it seems to me the handling should be no different than with a local BO, as the FD used for mmap in this case is still the same one associated with the DRM file.
Andrey
Christian.
Andrey
Am 16.11.20 um 20:00 schrieb Andrey Grodzovsky:
On 11/16/20 4:48 AM, Christian König wrote:
Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
On 11/14/20 4:51 AM, Daniel Vetter wrote:
On Sat, Nov 14, 2020 at 9:41 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
[snip]
For local BO seems like Daniel's suggestion to use vm_area_struct->vm_file->private_data should work as this points to drm_file. For imported BOs private_data will point to dma_buf structure since each imported BO is backed by a pseudo file (created in dma_buf_getfile).
Oh, good point. But we could easily fix that now. That should make the mapping code less complex as well.
Regards, Christian.
If so,where should we store the dummy RW BO in this case ? In current implementation it's stored in drm_gem_object.
P.S For FLINK case it seems to me the handling should be no different then with local BO as the FD used for mmap in this case is still the same one associated with the DRM file.
Andrey
Christian.
Andrey
On 11/16/20 3:36 PM, Christian König wrote:
Am 16.11.20 um 20:00 schrieb Andrey Grodzovsky:
On 11/16/20 4:48 AM, Christian König wrote:
Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
On 11/14/20 4:51 AM, Daniel Vetter wrote:
On Sat, Nov 14, 2020 at 9:41 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
[snip]
For local BO seems like Daniel's suggestion to use vm_area_struct->vm_file->private_data should work as this points to drm_file. For imported BOs private_data will point to dma_buf structure since each imported BO is backed by a pseudo file (created in dma_buf_getfile).
Oh, good point. But we could easily fix that now. That should make the mapping code less complex as well.
Can you clarify what fix you have in mind? I assume it's not by altering file->private_data to point to something else, as we need to retrieve the dma_buf there (e.g. dma_buf_mmap_internal)
Andrey
Regards, Christian.
If so,where should we store the dummy RW BO in this case ? In current implementation it's stored in drm_gem_object.
P.S For FLINK case it seems to me the handling should be no different then with local BO as the FD used for mmap in this case is still the same one associated with the DRM file.
Andrey
Christian.
Andrey
Am 16.11.20 um 21:42 schrieb Andrey Grodzovsky:
On 11/16/20 3:36 PM, Christian König wrote:
Am 16.11.20 um 20:00 schrieb Andrey Grodzovsky:
On 11/16/20 4:48 AM, Christian König wrote:
Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
On 11/14/20 4:51 AM, Daniel Vetter wrote:
[snip]
Oh, good point. But we could easily fix that now. That should make the mapping code less complex as well.
Can you clarify what fix u have in mind ? I assume it's not by altering file->private_data to point to something else as we need to retrieve dmabuf (e.g. dma_buf_mmap_internal)
Ah, crap. You are right that is really tricky because vma->vm_file doesn't point to something useful in this situation.
I was talking about the new vma_set_file() function I've just pushed to drm-misc-next, but that stuff can't be used here.
I still don't see the need to use more than the global dummy page even if that means information leak between processes on unplug.
Christian.
Andrey
Regards, Christian.
If so,where should we store the dummy RW BO in this case ? In current implementation it's stored in drm_gem_object.
P.S For FLINK case it seems to me the handling should be no different then with local BO as the FD used for mmap in this case is still the same one associated with the DRM file.
Andrey
Christian.
Andrey
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 389128b..2f8bf5e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -35,6 +35,8 @@
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/rbtree.h>
@@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 	pgprot_t prot;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
 	vm_fault_t ret;
+	int idx;
+	struct drm_device *ddev = bo->base.dev;

-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
+	if (drm_dev_enter(ddev, &idx)) {
+		ret = ttm_bo_vm_reserve(bo, vmf);
+		if (ret)
+			goto exit;
+
+		prot = vma->vm_page_prot;

-	prot = vma->vm_page_prot;
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
+		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+			goto exit;
+
+		dma_resv_unlock(bo->base.resv);
+
+exit:
+		drm_dev_exit(idx);
 		return ret;
+	} else {

-	dma_resv_unlock(bo->base.resv);
+		struct drm_file *file = NULL;
+		struct page *dummy_page = NULL;
+		int handle;

-	return ret;
+		/* We are faulting on imported BO from dma_buf */
+		if (bo->base.dma_buf && bo->base.import_attach) {
+			dummy_page = bo->base.dummy_page;
+		/* We are faulting on non imported BO, find drm_file owning the BO*/
+		} else {
+			struct drm_gem_object *gobj;
+
+			mutex_lock(&ddev->filelist_mutex);
+			list_for_each_entry(file, &ddev->filelist, lhead) {
+				spin_lock(&file->table_lock);
+				idr_for_each_entry(&file->object_idr, gobj, handle) {
+					if (gobj == &bo->base) {
+						dummy_page = file->dummy_page;
+						break;
+					}
+				}
+				spin_unlock(&file->table_lock);
+			}
+			mutex_unlock(&ddev->filelist_mutex);
+		}
+
+		if (dummy_page) {
+			/*
+			 * Let do_fault complete the PTE install e.t.c using vmf->page
+			 *
+			 * TODO - should i call free_page somewhere ?
+			 */
+			get_page(dummy_page);
+			vmf->page = dummy_page;
+			return 0;
+		} else {
+			return VM_FAULT_SIGSEGV;
+		}
+	}
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);
On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 57 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 389128b..2f8bf5e 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -35,6 +35,8 @@ #include <drm/ttm/ttm_bo_driver.h> #include <drm/ttm/ttm_placement.h> #include <drm/drm_vma_manager.h> +#include <drm/drm_drv.h> +#include <drm/drm_file.h> #include <linux/mm.h> #include <linux/pfn_t.h> #include <linux/rbtree.h> @@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
Hm I think diff and code flow look a bit bad now. What about renaming the current function to __ttm_bo_vm_fault and then having something like the below:
ttm_bo_vm_fault(args)
{
	if (drm_dev_enter()) {
		__ttm_bo_vm_fault(args);
		drm_dev_exit();
	} else {
		drm_gem_insert_dummy_pfn();
	}
}
I think drm_gem_insert_dummy_pfn(); should be portable across drivers, so another nice point to try to unifiy drivers as much as possible. -Daniel
 	pgprot_t prot;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
 	vm_fault_t ret;
+	int idx;
+	struct drm_device *ddev = bo->base.dev;

-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
+	if (drm_dev_enter(ddev, &idx)) {
+		ret = ttm_bo_vm_reserve(bo, vmf);
+		if (ret)
+			goto exit;
+
+		prot = vma->vm_page_prot;

-	prot = vma->vm_page_prot;
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
+		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+			goto exit;
+
+		dma_resv_unlock(bo->base.resv);
+
+exit:
+		drm_dev_exit(idx);
 		return ret;
+	} else {

-	dma_resv_unlock(bo->base.resv);
+		struct drm_file *file = NULL;
+		struct page *dummy_page = NULL;
+		int handle;

-	return ret;
+		/* We are faulting on imported BO from dma_buf */
+		if (bo->base.dma_buf && bo->base.import_attach) {
+			dummy_page = bo->base.dummy_page;
+		/* We are faulting on non imported BO, find drm_file owning the BO*/
Uh, we can't fish that out of the vma->vm_file pointer somehow? Or is that one all wrong? Doing this kind of list walk looks pretty horrible.
If the vma doesn't have the right pointer I guess next option is that we store the drm_file page in gem_bo->dummy_page, and replace it on first export. But that's going to be tricky to track ...
} else {
struct drm_gem_object *gobj;
mutex_lock(&ddev->filelist_mutex);
list_for_each_entry(file, &ddev->filelist, lhead) {
spin_lock(&file->table_lock);
idr_for_each_entry(&file->object_idr, gobj, handle) {
if (gobj == &bo->base) {
dummy_page = file->dummy_page;
break;
}
}
spin_unlock(&file->table_lock);
}
mutex_unlock(&ddev->filelist_mutex);
}
if (dummy_page) {
/*
* Let do_fault complete the PTE install e.t.c using vmf->page
*
* TODO - should i call free_page somewhere ?
Nah, instead don't call get_page. The page will be around as long as there's a reference for the drm_file or gem_bo, which is longer than any mmap. Otherwise yes this would like really badly.
*/
get_page(dummy_page);
vmf->page = dummy_page;
return 0;
} else {
return VM_FAULT_SIGSEGV;
Hm that would be a kernel bug, wouldn't it? WARN_ON() required here imo. -Daniel
}
- }
} EXPORT_SYMBOL(ttm_bo_vm_fault);
-- 2.7.4
On 6/22/20 5:41 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
TODO - should I call free_page somewhere?

Nah, instead don't call get_page. The page will be around as long as there's a reference for the drm_file or gem_bo, which is longer than any mmap. Otherwise yes this would leak really badly.

So actually that was my thinking in the first place, and I indeed avoided taking a reference, and this ended up with multiple BUG_ONs, as seen below, where refcount:-63 mapcount:-48 for a page are deep into negative values... Those warnings were gone once I added get_page(dummy), which in my opinion implies that there is a page reference per each PTE, and that when the process address space is unmapped and the PTEs are deleted there is also a put_page somewhere in mm core, so the get_page per mapping keeps it balanced.

Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762929] BUG: Bad page map in process glxgear:disk$0  pte:8000000132284867 pmd:15aaec067
[   98.762931] page:ffffe63384c8a100 refcount:-63 mapcount:-48 mapping:0000000000000000 index:0x0
[   98.762932] flags: 0x17fff8000000008(dirty)
[   98.762933] raw: 017fff8000000008 dead000000000100 dead000000000122 0000000000000000
[   98.762934] raw: 0000000000000000 0000000000000000 ffffffc1ffffffcf 0000000000000000
[   98.762935] page dumped because: bad pte
[   98.762937] addr:00007fe086263000 vm_flags:1c0440fb anon_vma:0000000000000000 mapping:ffff9b5cd42db268 index:1008b3
[   98.762981] file:renderD129 fault:ttm_bo_vm_fault [ttm] mmap:amdgpu_mmap [amdgpu] readpage:0x0
[   98.762984] CPU: 5 PID: 2619 Comm: glxgear:disk$0 Tainted: G    B      OE 5.6.0-dev+ #51
[   98.762985] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
[   98.762985] Call Trace:
[   98.762988]  dump_stack+0x68/0x9b
[   98.762990]  print_bad_pte+0x19f/0x270
[   98.762992]  ? lock_page_memcg+0x5/0xf0
[   98.762995]  unmap_page_range+0x777/0xbe0
[   98.763000]  unmap_vmas+0xcc/0x160
[   98.763004]  exit_mmap+0xb5/0x1b0
[   98.763009]  mmput+0x65/0x140
[   98.763010]  do_exit+0x362/0xc40
[   98.763013]  do_group_exit+0x47/0xb0
[   98.763016]  get_signal+0x18b/0xc30
[   98.763019]  do_signal+0x36/0x6a0
[   98.763021]  ? __set_task_comm+0x62/0x120
[   98.763024]  ? __x64_sys_futex+0x88/0x180
[   98.763028]  exit_to_usermode_loop+0x6f/0xc0
[   98.763030]  do_syscall_64+0x149/0x1c0
[   98.763032]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   98.763034] RIP: 0033:0x7fe091bd9360
[   98.763037] Code: Bad RIP value.

Andrey
On Tue, Jun 23, 2020 at 11:31:45PM -0400, Andrey Grodzovsky wrote:
On 6/22/20 5:41 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Nah, instead don't call get_page. The page will be around as long as there's a reference for the drm_file or gem_bo, which is longer than any mmap. Otherwise yes this would leak really badly.

So actually that was my thinking in the first place, and I indeed avoided taking a reference, and this ended up with multiple BUG_ONs where refcount:-63 mapcount:-48 for a page are deep into negative values... Those warnings were gone once I added get_page(dummy), which in my opinion implies that there is a page reference per each PTE, and that when the process address space is unmapped and the PTEs are deleted there is also a put_page somewhere in mm core, so the get_page per mapping keeps it balanced.

Uh, I guess that just shows how little I understand how this all works. But yeah if we set vmf->page then I guess core mm takes care of everything, but apparently expects a page reference. -Daniel
On 6/22/20 5:41 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Uh, we can't fish that out of the vma->vm_file pointer somehow? Or is that one all wrong? Doing this kind of list walk looks pretty horrible.

If the vma doesn't have the right pointer I guess next option is that we store the drm_file page in gem_bo->dummy_page, and replace it on first export. But that's going to be tricky to track ...

For this one I hope to make all of this obsolete if Christian's suggestion from patch 1/8 about mapping a global RO dummy page for read and COW on write turns out to be possible to implement (testing that memory usage indeed doesn't explode).

Andrey
On 21.06.20 08:03, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page per drm_file instance or imported GEM object.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
+	if (drm_dev_enter(ddev, &idx)) {

Better do this like

	if (!drm_dev_enter(...))
		return ttm_bo_vm_dummy(..);

This way you can move all the dummy fault handling into a separate function without cluttering this one here too much.

Christian.
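A sketch of the split Christian suggests, folding in Daniel's WARN_ON() point from above (ttm_bo_vm_dummy and ttm_bo_lookup_dummy_page are hypothetical names; the lookup helper stands in for the per-file/dma-buf dummy-page search in the patch):

```c
/*
 * Hypothetical restructuring: everything that runs when the device is
 * already gone lives in its own function, and the regular fault handler
 * bails into it early.
 */
static vm_fault_t ttm_bo_vm_dummy(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
	/* Stand-in for the dma-buf / filelist lookup from the patch above. */
	struct page *dummy_page = ttm_bo_lookup_dummy_page(bo);

	/* Not finding a dummy page here would be a kernel bug. */
	if (WARN_ON(!dummy_page))
		return VM_FAULT_SIGSEGV;

	get_page(dummy_page);
	vmf->page = dummy_page;
	return 0;
}

vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
	struct drm_device *ddev = bo->base.dev;
	vm_fault_t ret;
	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return ttm_bo_vm_dummy(vmf);

	/* Normal fault path, as in the patch. */
	ret = ttm_bo_vm_reserve(bo, vmf);
	if (ret)
		goto exit;

	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
				       TTM_BO_VM_NUM_PREFAULT);
	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
		goto exit;

	dma_resv_unlock(bo->base.resv);
exit:
	drm_dev_exit(idx);
	return ret;
}
```

The early return keeps the hot path free of dead-device clutter, which is the point of Christian's comment.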
Helper function to invalidate all BOs' CPU mappings once the device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
 include/drm/ttm/ttm_bo_driver.h | 7 +++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c5b516f..926a365 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
 	ttm_bo_unmap_virtual_locked(bo);
 	ttm_mem_io_unlock(man);
 }
-
 EXPORT_SYMBOL(ttm_bo_unmap_virtual);
 
+void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
+{
+	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
+}
+EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
+
 int ttm_bo_wait(struct ttm_buffer_object *bo,
 		bool interruptible, bool no_wait)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index c9e0fd0..39ea44f 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
 
 /**
+ * ttm_bo_unmap_virtual_address_space
+ *
+ * @bdev: tear down all the virtual mappings for this device
+ */
+void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
+
+/**
  * ttm_bo_unmap_virtual
  *
  * @bo: tear down the virtual mappings for this BO
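A driver's PCI remove callback could use the new helper roughly like this (a sketch: amdgpu_pci_remove and drm_dev_unplug exist, but the exact call site and the adev->mman.bdev access shown here are illustrative, not taken from the patch set):

```c
/*
 * Illustrative use of the new helper on device removal: after unplugging
 * the drm_device, zap every CPU mapping of every BO on this device in one
 * call. Later faults then hit ttm_bo_vm_fault with drm_dev_enter()
 * failing, which reroutes them to the dummy page.
 */
static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = dev->dev_private;

	drm_dev_unplug(dev);
	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);

	amdgpu_driver_unload_kms(dev);
	drm_dev_put(dev);
}
```

unmap_mapping_range(mapping, 0, 0, 1) with a zero length zaps the whole address_space, so a single call covers every BO mapped through the device file.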
On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
Helper function to be used to invalidate all BOs CPU mappings once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
This seems to be missing the code to invalidate all the dma-buf mmaps?
Probably needs more testcases if you're not yet catching this. Or am I missing something, and we're exchanging the address space also for dma-buf? -Daniel
On 6/22/20 5:45 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
Helper function to be used to invalidate all BOs CPU mappings once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
This seems to be missing the code to invalidate all the dma-buf mmaps?
Probably needs more testcases if you're not yet catching this. Or am I missing something, and we're exchanging the address space also for dma-buf? -Daniel
IMHO the device address space includes all user clients having a CPU view of the BO, either from a direct mapping through the drm file or by mapping through an imported BO's FD.
Andrey
On Tue, Jun 23, 2020 at 01:00:02AM -0400, Andrey Grodzovsky wrote:
On 6/22/20 5:45 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
Helper function to be used to invalidate all BOs CPU mappings once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
This seems to be missing the code to invalidate all the dma-buf mmaps?
Probably needs more testcases if you're not yet catching this. Or am I missing something, and we're exchanging the address space also for dma-buf? -Daniel
IMHO the device address space includes all user clients having a CPU view of the BO either from direct mapping though drm file or by mapping through imported BO's FD.
Uh this is all very confusing and very much midlayer-y thanks to ttm.
I think a much better solution would be to have a core gem helper for this (well not even gem really, this is core drm), which directly uses drm_device->anon_inode->i_mapping.
Then a) it clearly matches what drm_prime.c does on export b) can be reused across all drivers, not just ttm
So much better.
What's more, we could then very easily make the generic drm_dev_unplug_and_unmap helper I've talked about for the amdgpu patch, which I think would be really neat&pretty.
Thoughts? -Daniel
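A minimal sketch of the generic helper Daniel proposes (drm_dev_unplug_and_unmap does not exist; drm_device->anon_inode is the backing inode that drm_prime.c also uses for exported buffers, which is why this would cover dma-buf mmaps too):

```c
/*
 * Hypothetical core-DRM helper: unplug the device and invalidate every
 * userspace CPU mapping backed by it in one go. Driver-agnostic, because
 * all DRM mmaps on this device (including dma-buf exports set up by
 * drm_prime.c) share the per-device anonymous inode's address_space.
 */
void drm_dev_unplug_and_unmap(struct drm_device *dev)
{
	drm_dev_unplug(dev);
	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
}
```

This would replace the TTM-specific ttm_bo_unmap_virtual_address_space with something any driver can call from its remove path.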
On 23.06.20 12:25, Daniel Vetter wrote:
On Tue, Jun 23, 2020 at 01:00:02AM -0400, Andrey Grodzovsky wrote:
On 6/22/20 5:45 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
Helper function to be used to invalidate all BOs CPU mappings once device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
This seems to be missing the code to invalidate all the dma-buf mmaps?
Probably needs more testcases if you're not yet catching this. Or am I missing something, and we're exchanging the address space also for dma-buf? -Daniel
IMHO the device address space includes all user clients having a CPU view of the BO either from direct mapping though drm file or by mapping through imported BO's FD.
Uh this is all very confusing and very much midlayer-y thanks to ttm.
I think a much better solution would be to have a core gem helper for this (well not even gem really, this is core drm), which directly uses drm_device->anon_inode->i_mapping.
Then a) it clearly matches what drm_prime.c does on export b) can be reused across all drivers, not just ttm
So much better.
What's more, we could then very easily make the generic drm_dev_unplug_and_unmap helper I've talked about for the amdgpu patch, which I think would be really neat&pretty.
Good point, that is indeed a rather nice idea.
Christian.
Thoughts? -Daniel
Andrey
 drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
 include/drm/ttm/ttm_bo_driver.h | 7 +++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c5b516f..926a365 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
 	ttm_bo_unmap_virtual_locked(bo);
 	ttm_mem_io_unlock(man);
 }
-
 EXPORT_SYMBOL(ttm_bo_unmap_virtual);
 
+void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
+{
+	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
+}
+EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
+
 int ttm_bo_wait(struct ttm_buffer_object *bo,
 		bool interruptible, bool no_wait)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index c9e0fd0..39ea44f 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
 
 /**
+ * ttm_bo_unmap_virtual_address_space
+ *
+ * @bdev: tear down all the virtual mappings for this device
+ */
+void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
+
+/**
  * ttm_bo_unmap_virtual
  *
  * @bo: tear down the virtual mappings for this BO
--
2.7.4
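For context on what happens after the unmap above: the v2 approach from the cover letter reroutes subsequent faults on device BO mappings to a per-process dummy page, so clients keep running instead of receiving SIGBUS. A toy model of that behavior — all names here are hypothetical, and this is not the actual TTM fault path:

```python
class BOMapping:
    """Toy model of a CPU view of a device BO that is rerouted to a
    per-process dummy page once the device is unplugged (v2 approach)."""

    PAGE_SIZE = 4096

    def __init__(self, device_pages):
        self.device_pages = device_pages          # real device-backed pages
        self.dummy_page = bytearray(self.PAGE_SIZE)  # per-process scratch page
        self.unplugged = False

    def _backing(self, page):
        # After unplug, every fault is satisfied from the dummy page.
        return self.dummy_page if self.unplugged else self.device_pages[page]

    def read(self, page, off):
        return self._backing(page)[off]

    def write(self, page, off, val):
        self._backing(page)[off] = val
```

Reads and writes after unplug land harmlessly in the dummy page while the real (now gone) device memory is never touched — the application survives, it just sees garbage, which is the documented trade-off.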
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Helper function to be used to invalidate all BOs' CPU mappings once the device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
On Sun, Jun 21, 2020 at 2:05 AM Andrey Grodzovsky andrey.grodzovsky@amd.com wrote:
Helper function to be used to invalidate all BOs' CPU mappings once the device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Typo in the subject: unampping -> unmapping
Alex
Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the stuff that relates to finalizing and releasing driver data structures can be deferred until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  6 ++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 23 +++++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
 7 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb..604a681 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1003,7 +1003,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 		       struct drm_device *ddev,
 		       struct pci_dev *pdev,
 		       uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_early(struct amdgpu_device *adev);
+void amdgpu_device_fini_late(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1188,6 +1190,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index cc41e8f..e7b9065 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2309,6 +2309,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	amdgpu_ras_pre_fini(adev);
 
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
@@ -3304,10 +3306,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  * Tear down the driver info (all asics).
  * Called at driver shutdown.
  */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_early(struct amdgpu_device *adev)
 {
-	int r;
-
 	DRM_INFO("amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	adev->shutdown = true;
@@ -3330,7 +3330,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	if (adev->pm_sysfs_en)
 		amdgpu_pm_sysfs_fini(adev);
 	amdgpu_fbdev_fini(adev);
-	r = amdgpu_device_ip_fini(adev);
+
+	amdgpu_irq_fini_early(adev);
+}
+
+void amdgpu_device_fini_late(struct amdgpu_device *adev)
+{
+	amdgpu_device_ip_fini(adev);
 	if (adev->firmware.gpu_info_fw) {
 		release_firmware(adev->firmware.gpu_info_fw);
 		adev->firmware.gpu_info_fw = NULL;
@@ -3368,6 +3374,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
 		amdgpu_discovery_fini(adev);
+
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 9e5afa5..43592dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1134,12 +1134,9 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 
-#ifdef MODULE
-	if (THIS_MODULE->state != MODULE_STATE_GOING)
-#endif
-		DRM_ERROR("Hotplug removal is not supported\n");
 	drm_dev_unplug(dev);
 	amdgpu_driver_unload_kms(dev);
+
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
 	drm_dev_put(dev);
@@ -1445,6 +1442,7 @@ static struct drm_driver kms_driver = {
 	.dumb_create = amdgpu_mode_dumb_create,
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
+	.release = &amdgpu_driver_release_kms,
 
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 0cc4c67..1697655 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -49,6 +49,7 @@
 #include <drm/drm_irq.h>
 #include <drm/drm_vblank.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "atom.h"
@@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+
+void amdgpu_irq_fini_early(struct amdgpu_device *adev)
+{
+	if (adev->irq.installed) {
+		drm_irq_uninstall(adev->ddev);
+		adev->irq.installed = false;
+		if (adev->irq.msi_enabled)
+			pci_free_irq_vectors(adev->pdev);
+
+		if (!amdgpu_device_has_dc_support(adev))
+			flush_work(&adev->hotplug_work);
+	}
+}
+
 /**
  * amdgpu_irq_fini - shut down interrupt handling
  *
@@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 
-	if (adev->irq.installed) {
-		drm_irq_uninstall(adev->ddev);
-		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
-			pci_free_irq_vectors(adev->pdev);
-		if (!amdgpu_device_has_dc_support(adev))
-			flush_work(&adev->hotplug_work);
-	}
-
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index c718e94..718c70f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
 
 int amdgpu_irq_init(struct amdgpu_device *adev);
 void amdgpu_irq_fini(struct amdgpu_device *adev);
+void amdgpu_irq_fini_early(struct amdgpu_device *adev);
 int amdgpu_irq_add_id(struct amdgpu_device *adev,
 		      unsigned client_id, unsigned src_id,
 		      struct amdgpu_irq_src *source);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index c0b1904..9d0af22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -29,6 +29,7 @@
 #include "amdgpu.h"
 #include <drm/drm_debugfs.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -86,7 +87,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	amdgpu_unregister_gpu_instance(adev);
 
 	if (adev->rmmio == NULL)
-		goto done_free;
+		return;
 
 	if (adev->runpm) {
 		pm_runtime_get_sync(dev->dev);
@@ -95,11 +96,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 
 	amdgpu_acpi_fini(adev);
 
-	amdgpu_device_fini(adev);
-
-done_free:
-	kfree(adev);
-	dev->dev_private = NULL;
+	amdgpu_device_fini_early(adev);
 }
 
 void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
@@ -1108,6 +1105,20 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	pm_runtime_put_autosuspend(dev->dev);
 }
 
+
+void amdgpu_driver_release_kms(struct drm_device *dev)
+{
+	struct amdgpu_device *adev = dev->dev_private;
+
+	amdgpu_device_fini_late(adev);
+
+	kfree(adev);
+	dev->dev_private = NULL;
+
+	drm_dev_fini(dev);
+	kfree(dev);
+}
+
 /*
  * VBlank related functions.
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7348619..169c2239 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2056,9 +2056,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
 
+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	if (!con)
 		return 0;
 
+	/* Need disable ras on all IPs here before ip [hw/sw]fini */
 	amdgpu_ras_disable_all_features(adev, 0);
 	amdgpu_ras_recovery_fini(adev);
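The split this patch introduces amounts to a reference-counted two-phase teardown: hardware quiescing happens immediately at pci_remove, while data-structure teardown waits for the last device reference to drop. A toy model of that pattern — hypothetical names, not the actual amdgpu code:

```python
class Device:
    """Toy model of two-phase device teardown: fini_early runs at unplug
    time, fini_late runs when the last reference is dropped."""

    def __init__(self):
        self.refs = 1          # the driver's own reference
        self.log = []

    def get(self):
        self.refs += 1         # e.g. a user client opens the drm file

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.fini_late()   # safe now: no one can touch the structs

    def unplug(self):
        self.fini_early()      # stop HW access right away (irqs, fences)
        self.put()             # drop the driver's reference

    def fini_early(self):
        self.log.append("fini_early")

    def fini_late(self):
        self.log.append("fini_late")
```

The key property: after unplug, a client still holding a reference can keep calling into the (software-only) device state without use-after-free, and the final free happens only on the last put — mirroring drm_dev_unplug plus the drm_driver.release hook.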
On Sun, Jun 21, 2020 at 02:03:04AM -0400, Andrey Grodzovsky wrote:
Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the stuff that relates to finalizing and releasing driver data structures can be deferred until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Long term I think best if as much of this code is converted over to devm (for hw stuff) and drmm (for sw stuff and allocations). Doing this all manually is very error prone.
I've started various such patches and others followed, but thus far only very simple drivers tackled. But it should be doable step by step at least, so you should have incremental benefits in code complexity right away I hope. -Daniel
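The devm/drmm model Daniel refers to is essentially a stack of release actions that run automatically, in reverse registration order, when the owning object is released. A language-neutral sketch of that pattern (mirroring drmm_add_action in spirit only, not its API):

```python
class ManagedResources:
    """Toy model of drmm-style managed resources: cleanup callbacks are
    registered at setup time and run in reverse order on release."""

    def __init__(self):
        self.actions = []

    def add_action(self, fn):
        # Register a cleanup callback; later registrations clean up first,
        # which matches typical init/teardown symmetry.
        self.actions.append(fn)

    def release(self):
        while self.actions:
            self.actions.pop()()   # LIFO: undo in reverse setup order
```

The point Daniel makes is that teardown ordering then falls out of setup ordering automatically, instead of being maintained by hand in a fini function that must mirror init exactly.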
On 6/22/20 5:48 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:04AM -0400, Andrey Grodzovsky wrote:
Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the stuff that relates to finalizing and releasing driver data structures can be deferred until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Long term I think best if as much of this code is converted over to devm (for hw stuff) and drmm (for sw stuff and allocations). Doing this all manually is very error prone.
I've started various such patches and others followed, but thus far only very simple drivers tackled. But it should be doable step by step at least, so you should have incremental benefits in code complexity right away I hope. -Daniel
Sure, I will definitely add this to my TODOs for after landing (hopefully) this patch set (after a few more iterations), as the changes required for using devm and drmm are indeed non-trivial, and I prefer to avoid diverging into multiple directions at once.
Andrey
On Wed, Nov 11, 2020 at 11:19:04PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:48 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:04AM -0400, Andrey Grodzovsky wrote:
Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the stuff that relates to finalizing and releasing driver data structures can be deferred until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Long term I think best if as much of this code is converted over to devm (for hw stuff) and drmm (for sw stuff and allocations). Doing this all manually is very error prone.
I've started various such patches and others followed, but thus far only very simple drivers tackled. But it should be doable step by step at least, so you should have incremental benefits in code complexity right away I hope. -Daniel
Sure, I will definitely add this to my TODOs for after landing (hopefully) this patch set (after a few more iterations) as indeed the required changes for using devm and drmm are non trivial and I prefer to avoid diverging here into multiple directions at once.
For the display side there's a very nice patch series from Philipp Zabel pending:
https://lore.kernel.org/dri-devel/20200911135724.25833-1-p.zabel@pengutronix...
I think you'll want to use this. It's not landed yet, so a nudge from someone else also using it would help I think.
Cheers, Daniel
Andrey
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +++++- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++---- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++---- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 24 +++++++++++++++--------- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 23 +++++++++++++++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +++ 7 files changed, 54 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb..604a681 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1003,7 +1003,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 		       struct drm_device *ddev,
 		       struct pci_dev *pdev,
 		       uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_early(struct amdgpu_device *adev);
+void amdgpu_device_fini_late(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1188,6 +1190,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index cc41e8f..e7b9065 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2309,6 +2309,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
 	int i, r;

+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	amdgpu_ras_pre_fini(adev);

 	if (adev->gmc.xgmi.num_physical_nodes > 1)
@@ -3304,10 +3306,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 * Tear down the driver info (all asics).
 * Called at driver shutdown.
 */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_early(struct amdgpu_device *adev)
 {
-	int r;
-
 	DRM_INFO("amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	adev->shutdown = true;
@@ -3330,7 +3330,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	if (adev->pm_sysfs_en)
 		amdgpu_pm_sysfs_fini(adev);
 	amdgpu_fbdev_fini(adev);
-	r = amdgpu_device_ip_fini(adev);
+
+	amdgpu_irq_fini_early(adev);
+}
+
+void amdgpu_device_fini_late(struct amdgpu_device *adev)
+{
+	amdgpu_device_ip_fini(adev);
 	if (adev->firmware.gpu_info_fw) {
 		release_firmware(adev->firmware.gpu_info_fw);
 		adev->firmware.gpu_info_fw = NULL;
@@ -3368,6 +3374,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
 		amdgpu_discovery_fini(adev);
+
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 9e5afa5..43592dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1134,12 +1134,9 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);

-#ifdef MODULE
-	if (THIS_MODULE->state != MODULE_STATE_GOING)
-#endif
-		DRM_ERROR("Hotplug removal is not supported\n");
 	drm_dev_unplug(dev);
 	amdgpu_driver_unload_kms(dev);
+
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
 	drm_dev_put(dev);
 }
@@ -1445,6 +1442,7 @@ static struct drm_driver kms_driver = {
 	.dumb_create = amdgpu_mode_dumb_create,
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
+	.release = &amdgpu_driver_release_kms,
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 0cc4c67..1697655 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -49,6 +49,7 @@
 #include <drm/drm_irq.h>
 #include <drm/drm_vblank.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "atom.h"
@@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	return 0;
 }

+void amdgpu_irq_fini_early(struct amdgpu_device *adev)
+{
+	if (adev->irq.installed) {
+		drm_irq_uninstall(adev->ddev);
+		adev->irq.installed = false;
+
+		if (adev->irq.msi_enabled)
+			pci_free_irq_vectors(adev->pdev);
+
+		if (!amdgpu_device_has_dc_support(adev))
+			flush_work(&adev->hotplug_work);
+	}
+}
+
 /**
 * amdgpu_irq_fini - shut down interrupt handling
 *
@@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 {
 	unsigned i, j;

-	if (adev->irq.installed) {
-		drm_irq_uninstall(adev->ddev);
-		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
-			pci_free_irq_vectors(adev->pdev);
-		if (!amdgpu_device_has_dc_support(adev))
-			flush_work(&adev->hotplug_work);
-	}
-
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index c718e94..718c70f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);

 int amdgpu_irq_init(struct amdgpu_device *adev);
 void amdgpu_irq_fini(struct amdgpu_device *adev);
+void amdgpu_irq_fini_early(struct amdgpu_device *adev);
 int amdgpu_irq_add_id(struct amdgpu_device *adev,
 		      unsigned client_id, unsigned src_id,
 		      struct amdgpu_irq_src *source);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index c0b1904..9d0af22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -29,6 +29,7 @@
 #include "amdgpu.h"
 #include <drm/drm_debugfs.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -86,7 +87,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	amdgpu_unregister_gpu_instance(adev);

 	if (adev->rmmio == NULL)
-		goto done_free;
+		return;

 	if (adev->runpm) {
 		pm_runtime_get_sync(dev->dev);
@@ -95,11 +96,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)

 	amdgpu_acpi_fini(adev);

-	amdgpu_device_fini(adev);
-
-done_free:
-	kfree(adev);
-	dev->dev_private = NULL;
+	amdgpu_device_fini_early(adev);
 }

 void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
@@ -1108,6 +1105,20 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	pm_runtime_put_autosuspend(dev->dev);
 }

+void amdgpu_driver_release_kms (struct drm_device *dev)
+{
+	struct amdgpu_device *adev = dev->dev_private;
+
+	amdgpu_device_fini_late(adev);
+	kfree(adev);
+	dev->dev_private = NULL;
+
+	drm_dev_fini(dev);
+	kfree(dev);
+}
+
 /*
 * VBlank related functions.
 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7348619..169c2239 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2056,9 +2056,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);

+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	if (!con)
 		return 0;

+
 	/* Need disable ras on all IPs here before ip [hw/sw]fini */
 	amdgpu_ras_disable_all_features(adev, 0);
 	amdgpu_ras_recovery_fini(adev);
-- 2.7.4
Track sysfs files in a list so they can all be removed during PCI remove, since otherwise removing them afterwards crashes because the parent folder was already removed during PCI remove.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
 8 files changed, 99 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 604a681..ba3775f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -726,6 +726,15 @@ struct amd_powerplay {
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4

+struct amdgpu_sysfs_list_node {
+	struct list_head head;
+	struct device_attribute *attr;
+};
+
+#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
+	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
+
 struct amdgpu_device {
 	struct device			*dev;
 	struct drm_device		*ddev;
@@ -992,6 +1001,10 @@ struct amdgpu_device {
 	char				product_number[16];
 	char				product_name[32];
 	char				serial[16];
+
+	struct list_head		sysfs_files_list;
+	struct mutex			sysfs_files_list_lock;
+
 };
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index fdd52d8..c1549ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
 	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
 }

+
 static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
 		   NULL);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);

 /**
 * amdgpu_atombios_fini - free the driver info and callbacks for atombios
@@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
 	adev->mode_info.atom_context = NULL;
 	kfree(adev->mode_info.atom_card_info);
 	adev->mode_info.atom_card_info = NULL;
-	device_remove_file(adev->dev, &dev_attr_vbios_version);
 }

 /**
@@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e7b9065..3173046 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 	NULL
 };

+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
+
+
 /**
 * amdgpu_device_init - initialize the driver
 *
@@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	INIT_LIST_HEAD(&adev->shadow_list);
 	mutex_init(&adev->shadow_list_lock);

+	INIT_LIST_HEAD(&adev->sysfs_files_list);
+	mutex_init(&adev->sysfs_files_list_lock);
+
 	INIT_DELAYED_WORK(&adev->delayed_init_work,
 			  amdgpu_device_delayed_init_work_handler);
 	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
@@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (r) {
 		dev_err(adev->dev, "Could not create amdgpu device attr\n");
 		return r;
+	} else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
 	}

 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
@@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	return r;
 }

+static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
+{
+	struct amdgpu_sysfs_list_node *node;
+
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_for_each_entry(node, &adev->sysfs_files_list, head)
+		device_remove_file(adev->dev, node->attr);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+}
+
 /**
 * amdgpu_device_fini - tear down the driver
 *
@@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
 	amdgpu_fbdev_fini(adev);

 	amdgpu_irq_fini_early(adev);
+
+	amdgpu_sysfs_remove_files(adev);
+
+	if (adev->ucode_sysfs_en)
+		amdgpu_ucode_sysfs_fini(adev);
 }

 void amdgpu_device_fini_late(struct amdgpu_device *adev)
@@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
 	adev->rmmio = NULL;
 	amdgpu_device_doorbell_fini(adev);

-	if (adev->ucode_sysfs_en)
-		amdgpu_ucode_sysfs_fini(adev);
-
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
 		amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 6271044..e7b6c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
 static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
 		   amdgpu_mem_info_gtt_used_show, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
+
 /**
 * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
 *
@@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }

@@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
 */
 static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_gtt_mgr *mgr = man->priv;
 	spin_lock(&mgr->lock);
 	drm_mm_takedown(&mgr->mm);
@@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 	kfree(mgr);
 	man->priv = NULL;

-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
-
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ddb4af0c..554fec0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
 		   psp_usbc_pd_fw_sysfs_read,
 		   psp_usbc_pd_fw_sysfs_write);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
+
 const struct amd_ip_funcs psp_ip_funcs = {
@@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
 	if (ret)
 		DRM_ERROR("Failed to create USBC PD FW control file!");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	return ret;
 }

 static void psp_sysfs_fini(struct amdgpu_device *adev)
 {
-	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
 }
 const struct amdgpu_ip_block_version psp_v3_1_ip_block =
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 7723937..39c400c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
 static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
 		   amdgpu_mem_info_vram_vendor, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
+
 static const struct attribute *amdgpu_vram_mgr_attributes[] = {
 	&dev_attr_mem_info_vram_total.attr,
 	&dev_attr_mem_info_vis_vram_total.attr,
@@ -184,6 +190,15 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
 	ret = sysfs_create_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
 	if (ret)
 		DRM_ERROR("Failed to register sysfs\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_mem_info_vram_total.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vis_vram_total.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vram_used.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vis_vram_used.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vram_vendor.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	return 0;
 }
@@ -198,7 +213,6 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
 */
 static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_vram_mgr *mgr = man->priv;

 	spin_lock(&mgr->lock);
@@ -206,7 +220,6 @@ static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
 	spin_unlock(&mgr->lock);
 	kfree(mgr);
 	man->priv = NULL;
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 90610b4..455eaa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -272,6 +272,9 @@ static ssize_t amdgpu_xgmi_show_error(struct device *dev,
 static DEVICE_ATTR(xgmi_device_id, S_IRUGO, amdgpu_xgmi_show_device_id, NULL);
 static DEVICE_ATTR(xgmi_error, S_IRUGO, amdgpu_xgmi_show_error, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_device_id);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_error);
+
 static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 					  struct amdgpu_hive_info *hive)
 {
@@ -285,10 +288,19 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_xgmi_device_id.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	/* Create xgmi error file */
 	ret = device_create_file(adev->dev, &dev_attr_xgmi_error);
 	if (ret)
 		pr_err("failed to create xgmi_error\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_xgmi_error.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	/* Create sysfs link to hive info folder on the first device */
@@ -325,7 +337,6 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 static void amdgpu_xgmi_sysfs_rem_dev_info(struct amdgpu_device *adev,
 					   struct amdgpu_hive_info *hive)
 {
-	device_remove_file(adev->dev, &dev_attr_xgmi_device_id);
 	sysfs_remove_link(&adev->dev->kobj, adev->ddev->unique);
 	sysfs_remove_link(hive->kobj, adev->ddev->unique);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index a7b8292..f95b0b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -265,6 +265,8 @@ static ssize_t df_v3_6_get_df_cntr_avail(struct device *dev,
 /* device attr for available perfmon counters */
 static DEVICE_ATTR(df_cntr_avail, S_IRUGO, df_v3_6_get_df_cntr_avail, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(df_cntr_avail);
+
 static void df_v3_6_query_hashes(struct amdgpu_device *adev)
 {
 	u32 tmp;
@@ -299,6 +301,11 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
 	ret = device_create_file(adev->dev, &dev_attr_df_cntr_avail);
 	if (ret)
 		DRM_ERROR("failed to create file for available df counters\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_df_cntr_avail.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	for (i = 0; i < AMDGPU_MAX_DF_PERFMONS; i++)
 		adev->df_perfmon_config_assign_mask[i] = 0;
@@ -308,9 +315,6 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)

 static void df_v3_6_sw_fini(struct amdgpu_device *adev)
 {
-
-	device_remove_file(adev->dev, &dev_attr_df_cntr_avail);
-
 }

 static void df_v3_6_enable_broadcast_mode(struct amdgpu_device *adev,
On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
Track sysfs files in a list so they all can be removed during pci remove since otherwise their removal after that causes crash because parent folder was already removed during pci remove.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Uh I thought sysfs just gets yanked completely. Please check with Greg KH whether hand-rolling all this really is the right solution here ... Feels very wrong. I thought this was all supposed to work by adding attributes before publishing the sysfs node, and then letting sysfs clean up everything. Not by cleaning up manually yourself.
Adding Greg for an authoritative answer. -Daniel
On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
Track sysfs files in a list so they can all be removed during pci remove; otherwise, removing them later crashes because the parent folder was already removed during pci remove.
Huh? That should not happen, do you have a backtrace of that crash?
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Uh I thought sysfs just gets yanked completely. Please check with Greg KH whether hand-rolling all this really is the right solution here ... Feels very wrong. I thought this was all supposed to work by adding attributes before publishing the sysfs node, and then letting sysfs clean up everything. Not by cleaning up manually yourself.
Yes, that is supposed to be the correct thing to do.
Adding Greg for an authoritative answer. -Daniel
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
 8 files changed, 99 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 604a681..ba3775f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -726,6 +726,15 @@ struct amd_powerplay {
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4

+struct amdgpu_sysfs_list_node {
+	struct list_head head;
+	struct device_attribute *attr;
+};
You know we have lists of attributes already, called attribute groups, if you really wanted to do something like this. But, I don't think so.
Either way, don't hand-roll your own stuff that the driver core has provided for you for a decade or more, that's just foolish :)
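For readers unfamiliar with the pattern Greg is referring to: an attribute group collects related device attributes so they can be created and removed in a single call each. A rough sketch of what the conversion could look like (the attribute names are taken from the patch, but the group itself is illustrative and not part of any posted code):

    static struct attribute *amdgpu_sysfs_attrs[] = {
    	&dev_attr_vbios_version.attr,
    	&dev_attr_pcie_replay_count.attr,
    	NULL,	/* array must be NULL-terminated */
    };

    static const struct attribute_group amdgpu_sysfs_attr_group = {
    	.attrs = amdgpu_sysfs_attrs,
    };

    /* create all attributes in one call ... */
    ret = sysfs_create_group(&adev->dev->kobj, &amdgpu_sysfs_attr_group);

    /* ... and remove them all in one call during teardown */
    sysfs_remove_group(&adev->dev->kobj, &amdgpu_sysfs_attr_group);

This removes the need for a hand-rolled list and lock entirely, since the group is a single static object.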
+#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
+	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
+
 struct amdgpu_device {
 	struct device			*dev;
 	struct drm_device		*ddev;
@@ -992,6 +1001,10 @@ struct amdgpu_device {
 	char				product_number[16];
 	char				product_name[32];
 	char				serial[16];
+
+	struct list_head		sysfs_files_list;
+	struct mutex			sysfs_files_list_lock;
 };
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index fdd52d8..c1549ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
 	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
 }

 static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version, NULL);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);

 /**
  * amdgpu_atombios_fini - free the driver info and callbacks for atombios
@@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
 	adev->mode_info.atom_context = NULL;
 	kfree(adev->mode_info.atom_card_info);
 	adev->mode_info.atom_card_info = NULL;
-	device_remove_file(adev->dev, &dev_attr_vbios_version);
 }

 /**
@@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e7b9065..3173046 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 	NULL
 };

+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
+
 /**
  * amdgpu_device_init - initialize the driver
@@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	INIT_LIST_HEAD(&adev->shadow_list);
 	mutex_init(&adev->shadow_list_lock);

+	INIT_LIST_HEAD(&adev->sysfs_files_list);
+	mutex_init(&adev->sysfs_files_list_lock);
+
 	INIT_DELAYED_WORK(&adev->delayed_init_work,
 			  amdgpu_device_delayed_init_work_handler);
 	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
@@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (r) {
 		dev_err(adev->dev, "Could not create amdgpu device attr\n");
 		return r;
+	} else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
 	}

 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
@@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	return r;
 }

+static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
+{
+	struct amdgpu_sysfs_list_node *node;
+
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_for_each_entry(node, &adev->sysfs_files_list, head)
+		device_remove_file(adev->dev, node->attr);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+}
+
 /**
  * amdgpu_device_fini - tear down the driver
@@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
 	amdgpu_fbdev_fini(adev);

 	amdgpu_irq_fini_early(adev);

+	amdgpu_sysfs_remove_files(adev);
+
+	if (adev->ucode_sysfs_en)
+		amdgpu_ucode_sysfs_fini(adev);
 }

 void amdgpu_device_fini_late(struct amdgpu_device *adev)
@@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
 	adev->rmmio = NULL;
 	amdgpu_device_doorbell_fini(adev);

-	if (adev->ucode_sysfs_en)
-		amdgpu_ucode_sysfs_fini(adev);
-
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
 		amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 6271044..e7b6c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
 static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
 		   amdgpu_mem_info_gtt_used_show, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
+
 /**
  * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
@@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }

@@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
  */
 static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_gtt_mgr *mgr = man->priv;
 	spin_lock(&mgr->lock);
 	drm_mm_takedown(&mgr->mm);
@@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 	kfree(mgr);
 	man->priv = NULL;
-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
-
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ddb4af0c..554fec0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
 		   psp_usbc_pd_fw_sysfs_read,
 		   psp_usbc_pd_fw_sysfs_write);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
+
 const struct amd_ip_funcs psp_ip_funcs = {
@@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
 	if (ret)
 		DRM_ERROR("Failed to create USBC PD FW control file!");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	return ret;
 }

 static void psp_sysfs_fini(struct amdgpu_device *adev)
 {
-	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
 }

 const struct amdgpu_ip_block_version psp_v3_1_ip_block =

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 7723937..39c400c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
 static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
 		   amdgpu_mem_info_vram_vendor, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
Converting all of these individual attributes to an attribute group would be a nice thing to do anyway. Makes your logic much simpler and less error-prone.
But again, the driver core should do all of the device file removal stuff automatically for you when your PCI device is removed from the system _UNLESS_ you are doing crazy things like creating child devices or messing with raw kobjects or other horrible things that I haven't read the code to see if you are, but hopefully not :)
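Greg's point about the driver core handling removal automatically can be made concrete. Since `struct device_driver` gained a `dev_groups` field, a driver can hand its attribute groups to the core, which creates them when the device is bound and removes them automatically at unbind or device removal, with no manual cleanup in the driver at all. A hypothetical sketch (the field assignments are illustrative, not from the posted patch):

    static struct attribute *amdgpu_dev_attrs[] = {
    	&dev_attr_product_name.attr,
    	&dev_attr_serial_number.attr,
    	NULL,
    };
    ATTRIBUTE_GROUPS(amdgpu_dev);	/* generates amdgpu_dev_groups */

    static struct pci_driver amdgpu_kms_pci_driver = {
    	.name			= "amdgpu",
    	.probe			= amdgpu_pci_probe,
    	.remove			= amdgpu_pci_remove,
    	/* driver core creates/removes these along with the device */
    	.driver.dev_groups	= amdgpu_dev_groups,
    };

With this shape there is no ordering problem to solve: the attributes can never outlive the device's sysfs directory.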
thanks,
greg k-h
On 6/22/20 7:21 AM, Greg KH wrote:
On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
Track sysfs files in a list so they all can be removed during pci remove since otherwise their removal after that causes crash because parent folder was already removed during pci remove.
Huh? That should not happen, do you have a backtrace of that crash?
2 examples in the attached trace.
Andrey
On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
On 6/22/20 7:21 AM, Greg KH wrote:
On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
Track sysfs files in a list so they all can be removed during pci remove since otherwise their removal after that causes crash because parent folder was already removed during pci remove.
Huh? That should not happen, do you have a backtrace of that crash?
2 examples in the attached trace.
Odd, how did you trigger these?
[ 925.738225 < 0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090 [ 925.738232 < 0.000007>] #PF: supervisor read access in kernel mode [ 925.738236 < 0.000004>] #PF: error_code(0x0000) - not-present page [ 925.738240 < 0.000004>] PGD 0 P4D 0 [ 925.738245 < 0.000005>] Oops: 0000 [#1] SMP PTI [ 925.738249 < 0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G W OE 5.5.0-rc7-dev-kfd+ #50 [ 925.738256 < 0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013 [ 925.738266 < 0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110 [ 925.738270 < 0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41 [ 925.738282 < 0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246 [ 925.738287 < 0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e [ 925.738292 < 0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000 [ 925.738297 < 0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000 [ 925.738302 < 0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000 [ 925.738307 < 0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130 [ 925.738313 < 0.000006>] FS: 00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000 [ 925.738319 < 0.000006>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 925.738323 < 0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0 [ 925.738329 < 0.000006>] Call Trace: [ 925.738334 < 0.000005>] kernfs_find_and_get_ns+0x2e/0x50 [ 925.738339 < 0.000005>] sysfs_remove_group+0x25/0x80 [ 925.738344 < 0.000005>] sysfs_remove_groups+0x29/0x40 [ 925.738350 < 0.000006>] free_msi_irqs+0xf5/0x190 [ 925.738354 < 0.000004>] pci_disable_msi+0xe9/0x120
So the PCI core is trying to clean up attributes that it had registered, which is fine. But we can't seem to find the attributes? Were they already removed somewhere else?
that's odd.
[ 925.738406 < 0.000052>] amdgpu_irq_fini+0xe3/0xf0 [amdgpu] [ 925.738453 < 0.000047>] tonga_ih_sw_fini+0xe/0x30 [amdgpu] [ 925.738490 < 0.000037>] amdgpu_device_fini_late+0x14b/0x440 [amdgpu] [ 925.738529 < 0.000039>] amdgpu_driver_release_kms+0x16/0x40 [amdgpu] [ 925.738548 < 0.000019>] drm_dev_put+0x5b/0x80 [drm] [ 925.738558 < 0.000010>] drm_release+0xc6/0xd0 [drm] [ 925.738563 < 0.000005>] __fput+0xc6/0x260 [ 925.738568 < 0.000005>] task_work_run+0x79/0xb0 [ 925.738573 < 0.000005>] do_exit+0x3d0/0xc60 [ 925.738578 < 0.000005>] do_group_exit+0x47/0xb0 [ 925.738583 < 0.000005>] get_signal+0x18b/0xc30 [ 925.738589 < 0.000006>] do_signal+0x36/0x6a0 [ 925.738593 < 0.000004>] ? force_sig_info_to_task+0xbc/0xd0 [ 925.738597 < 0.000004>] ? signal_wake_up_state+0x15/0x30 [ 925.738603 < 0.000006>] exit_to_usermode_loop+0x6f/0xc0 [ 925.738608 < 0.000005>] prepare_exit_to_usermode+0xc7/0x110 [ 925.738613 < 0.000005>] ret_from_intr+0x25/0x35 [ 925.738617 < 0.000004>] RIP: 0033:0x417369 [ 925.738621 < 0.000004>] Code: Bad RIP value. [ 925.738625 < 0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246 [ 925.738629 < 0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260 [ 925.738634 < 0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000 [ 925.738639 < 0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700 [ 925.738645 < 0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170 [ 925.738650 < 0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000
[ 40.880899 < 0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090 [ 40.880906 < 0.000007>] #PF: supervisor read access in kernel mode [ 40.880910 < 0.000004>] #PF: error_code(0x0000) - not-present page [ 40.880915 < 0.000005>] PGD 0 P4D 0 [ 40.880920 < 0.000005>] Oops: 0000 [#1] SMP PTI [ 40.880924 < 0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G W OE 5.5.0-rc7-dev-kfd+ #50 [ 40.880932 < 0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013 [ 40.880941 < 0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110 [ 40.880945 < 0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41 [ 40.880957 < 0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246 [ 40.880963 < 0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e [ 40.880968 < 0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000 [ 40.880973 < 0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000 [ 40.880979 < 0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000 [ 40.880984 < 0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130 [ 40.880990 < 0.000006>] FS: 00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000 [ 40.880996 < 0.000006>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.881001 < 0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0 [ 40.881006 < 0.000005>] Call Trace: [ 40.881011 < 0.000005>] kernfs_find_and_get_ns+0x2e/0x50 [ 40.881016 < 0.000005>] sysfs_remove_group+0x25/0x80 [ 40.881055 < 0.000039>] amdgpu_device_fini_late+0x3eb/0x440 [amdgpu] [ 40.881095 < 0.000040>] amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
Here, it is your driver doing the same thing: removing attributes it created. But again, they are not there.
So something went through and wiped the tree clean, which if I'm reading this correctly, your patch would not solve as you would try to also remove attributes that were already removed, right?
And 5.5-rc7 is a bit old (6 months and many thousands of changes ago), does this still happen on a modern, released, kernel?
thanks,
greg k-h
On 6/22/20 12:45 PM, Greg KH wrote:
On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
On 6/22/20 7:21 AM, Greg KH wrote:
On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
Track sysfs files in a list so they all can be removed during pci remove since otherwise their removal after that causes crash because parent folder was already removed during pci remove.
Huh? That should not happen, do you have a backtrace of that crash?
2 examples in the attached trace.
Odd, how did you trigger these?
By manually triggering PCI remove from sysfs
cd /sys/bus/pci/devices/0000:05:00.0 && echo 1 > remove
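For reference, the full emulation cycle used in this work (unplug, then replug, as described in the cover letter) can be driven from sysfs. The BDF below is an example; these commands need root and act on real hardware, so they are shown as an illustration rather than something to run blindly:

    # emulate hot-unplug of the device at 0000:05:00.0
    echo 1 > /sys/bus/pci/devices/0000:05:00.0/remove
    # emulate plugging it back by rescanning the bus
    echo 1 > /sys/bus/pci/rescan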
[ 925.738225 < 0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090 [ 925.738232 < 0.000007>] #PF: supervisor read access in kernel mode [ 925.738236 < 0.000004>] #PF: error_code(0x0000) - not-present page [ 925.738240 < 0.000004>] PGD 0 P4D 0 [ 925.738245 < 0.000005>] Oops: 0000 [#1] SMP PTI [ 925.738249 < 0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G W OE 5.5.0-rc7-dev-kfd+ #50 [ 925.738256 < 0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013 [ 925.738266 < 0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110 [ 925.738270 < 0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41 [ 925.738282 < 0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246 [ 925.738287 < 0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e [ 925.738292 < 0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000 [ 925.738297 < 0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000 [ 925.738302 < 0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000 [ 925.738307 < 0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130 [ 925.738313 < 0.000006>] FS: 00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000 [ 925.738319 < 0.000006>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 925.738323 < 0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0 [ 925.738329 < 0.000006>] Call Trace: [ 925.738334 < 0.000005>] kernfs_find_and_get_ns+0x2e/0x50 [ 925.738339 < 0.000005>] sysfs_remove_group+0x25/0x80 [ 925.738344 < 0.000005>] sysfs_remove_groups+0x29/0x40 [ 925.738350 < 0.000006>] free_msi_irqs+0xf5/0x190 [ 925.738354 < 0.000004>] pci_disable_msi+0xe9/0x120
So the PCI core is trying to clean up attributes that it had registered, which is fine. But we can't seem to find the attributes? Were they already removed somewhere else?
that's odd.
Yes. As I pointed out above, I am emulating device removal from sysfs, and this triggers the PCI device remove sequence; as part of that, my specific device folder (05:00.0) is removed from the sysfs tree.
[ 925.738406 < 0.000052>] amdgpu_irq_fini+0xe3/0xf0 [amdgpu] [ 925.738453 < 0.000047>] tonga_ih_sw_fini+0xe/0x30 [amdgpu] [ 925.738490 < 0.000037>] amdgpu_device_fini_late+0x14b/0x440 [amdgpu] [ 925.738529 < 0.000039>] amdgpu_driver_release_kms+0x16/0x40 [amdgpu] [ 925.738548 < 0.000019>] drm_dev_put+0x5b/0x80 [drm] [ 925.738558 < 0.000010>] drm_release+0xc6/0xd0 [drm] [ 925.738563 < 0.000005>] __fput+0xc6/0x260 [ 925.738568 < 0.000005>] task_work_run+0x79/0xb0 [ 925.738573 < 0.000005>] do_exit+0x3d0/0xc60 [ 925.738578 < 0.000005>] do_group_exit+0x47/0xb0 [ 925.738583 < 0.000005>] get_signal+0x18b/0xc30 [ 925.738589 < 0.000006>] do_signal+0x36/0x6a0 [ 925.738593 < 0.000004>] ? force_sig_info_to_task+0xbc/0xd0 [ 925.738597 < 0.000004>] ? signal_wake_up_state+0x15/0x30 [ 925.738603 < 0.000006>] exit_to_usermode_loop+0x6f/0xc0 [ 925.738608 < 0.000005>] prepare_exit_to_usermode+0xc7/0x110 [ 925.738613 < 0.000005>] ret_from_intr+0x25/0x35 [ 925.738617 < 0.000004>] RIP: 0033:0x417369 [ 925.738621 < 0.000004>] Code: Bad RIP value. [ 925.738625 < 0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246 [ 925.738629 < 0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260 [ 925.738634 < 0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000 [ 925.738639 < 0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700 [ 925.738645 < 0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170 [ 925.738650 < 0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000
[ 40.880899 < 0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090 [ 40.880906 < 0.000007>] #PF: supervisor read access in kernel mode [ 40.880910 < 0.000004>] #PF: error_code(0x0000) - not-present page [ 40.880915 < 0.000005>] PGD 0 P4D 0 [ 40.880920 < 0.000005>] Oops: 0000 [#1] SMP PTI [ 40.880924 < 0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G W OE 5.5.0-rc7-dev-kfd+ #50 [ 40.880932 < 0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013 [ 40.880941 < 0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110 [ 40.880945 < 0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41 [ 40.880957 < 0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246 [ 40.880963 < 0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e [ 40.880968 < 0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000 [ 40.880973 < 0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000 [ 40.880979 < 0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000 [ 40.880984 < 0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130 [ 40.880990 < 0.000006>] FS: 00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000 [ 40.880996 < 0.000006>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.881001 < 0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0 [ 40.881006 < 0.000005>] Call Trace: [ 40.881011 < 0.000005>] kernfs_find_and_get_ns+0x2e/0x50 [ 40.881016 < 0.000005>] sysfs_remove_group+0x25/0x80 [ 40.881055 < 0.000039>] amdgpu_device_fini_late+0x3eb/0x440 [amdgpu] [ 40.881095 < 0.000040>] amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
Here it is your driver doing the same thing, removing attributes it created. But again, they are not there.
So something went through and wiped the tree clean, which if I'm reading this correctly, your patch would not solve as you would try to also remove attributes that were already removed, right?
I don't think so, the stack here is from a later stage (after pci remove) where the last user process holding a reference to the device file decides to die, thus triggering the drm_dev_release sequence once the drm dev refcount drops to zero. And this is why my patch helps: I am expediting all amdgpu sysfs attribute removal to the pci remove stage, when the device folder is still present in the sysfs hierarchy. At least this is my understanding of why it helped. I admit I am not an expert on sysfs internals.
And 5.5-rc7 is a bit old (6 months and many thousands of changes ago), does this still happen on a modern, released, kernel?
I will give it a try with the latest and greatest but it might take some time as I have to make a temporary context switch to some urgent task.
Andrey
thanks,
greg k-h
On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
On 6/22/20 12:45 PM, Greg KH wrote:
On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
On 6/22/20 7:21 AM, Greg KH wrote:
On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
Track sysfs files in a list so they can all be removed during pci remove, since otherwise their removal after that point causes a crash because the parent folder was already removed during pci remove.
Huh? That should not happen, do you have a backtrace of that crash?
2 examples in the attached trace.
Odd, how did you trigger these?
By manually triggering PCI remove from sysfs
cd /sys/bus/pci/devices/0000:05:00.0 && echo 1 > remove
For some reason, I didn't think that video/drm devices could handle hot-remove like this. The "old" PCI hotplug specification explicitly said that video devices were not supported, has that changed?
And this whole issue is probably tied to the larger issue that Daniel was asking me about, when it came to device lifetimes and the drm layer, so odds are we need to fix that up first before worrying about trying to support this crazy request, right? :)
[ 925.738225 < 0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
[ 925.738232 < 0.000007>] #PF: supervisor read access in kernel mode
[ 925.738236 < 0.000004>] #PF: error_code(0x0000) - not-present page
[ 925.738240 < 0.000004>] PGD 0 P4D 0
[ 925.738245 < 0.000005>] Oops: 0000 [#1] SMP PTI
[ 925.738249 < 0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G W OE 5.5.0-rc7-dev-kfd+ #50
[ 925.738256 < 0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
[ 925.738266 < 0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
[ 925.738270 < 0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
[ 925.738282 < 0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
[ 925.738287 < 0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
[ 925.738292 < 0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
[ 925.738297 < 0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
[ 925.738302 < 0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
[ 925.738307 < 0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
[ 925.738313 < 0.000006>] FS: 00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
[ 925.738319 < 0.000006>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 925.738323 < 0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
[ 925.738329 < 0.000006>] Call Trace:
[ 925.738334 < 0.000005>] kernfs_find_and_get_ns+0x2e/0x50
[ 925.738339 < 0.000005>] sysfs_remove_group+0x25/0x80
[ 925.738344 < 0.000005>] sysfs_remove_groups+0x29/0x40
[ 925.738350 < 0.000006>] free_msi_irqs+0xf5/0x190
[ 925.738354 < 0.000004>] pci_disable_msi+0xe9/0x120
So the PCI core is trying to clean up attributes that it had registered, which is fine. But we can't seem to find the attributes? Were they already removed somewhere else?
that's odd.
Yes, as I pointed out above, I am emulating device removal from sysfs; this triggers the pci device remove sequence, and as part of that my specific device folder (05:00.0) is removed from the sysfs tree.
But why are things being removed twice?
Ok, yeah, I think this is back to the drm lifecycle issues mentioned above.
{sigh}, I'll get to that once I deal with the -rc1/-rc2 merge fallout, that will take me a week or so, sorry...
thanks,
greg k-h
On 6/23/20 2:05 AM, Greg KH wrote:
But why are things being removed twice?
Not sure I understand — what is removed twice? I remove each sysfs attribute only once.
Andrey
On Tue, Jun 23, 2020 at 11:04:30PM -0400, Andrey Grodzovsky wrote:
Not sure I understand — what is removed twice? I remove each sysfs attribute only once.
This code path shows that the kernel is trying to remove a file that is not present, so someone removed it already...
thanks,
greg k-h
On 6/24/20 2:11 AM, Greg KH wrote:
This code path shows that the kernel is trying to remove a file that is not present, so someone removed it already...
That's a mystery for me too...
Andrey
Hi, back to this after a long context switch for some higher priority stuff.
So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-... was enough for me. It seems that while device_remove_file can handle the case where the file and the parent directory are already gone, sysfs_remove_group goes down in flames in that case due to kobj->sd being unset on device removal.
Andrey
On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
A driver shouldn't ever have to remove individual sysfs groups, the driver core/bus logic should do it for them automatically.
And whenever a driver calls a sysfs_* call, that's a hint that something is not working properly.
Also, run your patch above through checkpatch.pl before submitting it :)
thanks,
greg k-h
On 11/10/20 12:59 PM, Greg KH wrote:
Do you mean that while the driver creates the groups and files explicitly from its different subsystems, it should not explicitly remove each one of them, because all of them should be removed at once (and recursively) when the device is being removed?
Andrey
On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
Individual drivers should never add groups/files in sysfs, the driver core should do it properly for you if you have everything set up properly. And yes, the driver core will automatically remove them as well.
Please use the default groups attribute for your bus/subsystem and this will happen automagically.
thanks,
greg k-h
On 11/11/20 10:34 AM, Greg KH wrote:
Please use the default groups attribute for your bus/subsystem and this will happen automagically.
Googling for default groups attributes I found this - https://www.linuxfoundation.org/blog/2013/06/how-to-create-a-sysfs-file-corr... Would this be what you suggest for us? Specifically for our case, the struct device's groups seems the right solution, as different devices might have slightly different sysfs attributes.
Andrey
thanks,
greg k-h
On Wed, Nov 11, 2020 at 10:45:53AM -0500, Andrey Grodzovsky wrote:
Googling for default groups attributes I found this - https://www.linuxfoundation.org/blog/2013/06/how-to-create-a-sysfs-file-corr...
Odd, mirror of the original article: http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/
Would this be what you suggest for us? Specifically for our case, struct device's groups seems the right solution, as different devices might have slightly different sysfs attributes.
That's what the is_visible() callback in your attribute group is for, to tell the kernel if an individual sysfs attribute should be created or not.
thanks,
greg k-h
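To make Greg's suggestion concrete, here is a minimal sketch of a default attribute group with an is_visible() callback. All names (the attributes, the show callbacks, and the my_device_has_serial() helper) are illustrative, not actual amdgpu code:

/*
 * Sketch only: the default-groups + is_visible() pattern described above.
 * Attribute names and my_device_has_serial() are hypothetical.
 */
static ssize_t vbios_version_show(struct device *dev,
				  struct device_attribute *attr, char *buf)
{
	return snprintf(buf, PAGE_SIZE, "...\n"); /* real driver reads HW */
}
static DEVICE_ATTR_RO(vbios_version);

static ssize_t serial_number_show(struct device *dev,
				  struct device_attribute *attr, char *buf)
{
	return snprintf(buf, PAGE_SIZE, "...\n");
}
static DEVICE_ATTR_RO(serial_number);

static umode_t my_attr_is_visible(struct kobject *kobj,
				  struct attribute *attr, int idx)
{
	struct device *dev = kobj_to_dev(kobj);

	/* Suppress creation of serial_number on devices lacking one. */
	if (attr == &dev_attr_serial_number.attr && !my_device_has_serial(dev))
		return 0;

	return attr->mode;
}

static struct attribute *my_dev_attrs[] = {
	&dev_attr_vbios_version.attr,
	&dev_attr_serial_number.attr,
	NULL
};

static const struct attribute_group my_dev_group = {
	.attrs = my_dev_attrs,
	.is_visible = my_attr_is_visible,
};

Once such a group is registered through the driver core's default-groups mechanism, creation and removal happen automatically with the device lifetime, and is_visible() absorbs per-device variation, so no explicit device_remove_file() calls are needed anywhere.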
On 11/11/20 11:06 AM, Greg KH wrote:
I see, this looks like a good improvement to our current way of managing sysfs. Since this change is somewhat fundamental and requires good testing, I prefer to deal with it separately from my current work on device unplug, so I will put it on my TODO list right after finishing this work.
Andrey
On 11/11/20 10:34 AM, Greg KH wrote:
Hi Greg, I tried your suggestion to hang amdgpu's sysfs attributes on the default attributes in struct device.groups, but it turns out that's not usable, since by the time I have access to struct device from amdgpu code it has already been initialized by the PCI core (i.e. past the point where device_add->device_add_attrs->device_add_groups is called with dev->groups), so I can't really use it.
The only thing I can think of is creating my own struct attribute_group ** array in amdgpu where I aggregate all amdgpu sysfs attributes, calling device_add_groups at the end of the amdgpu PCI probe with that array, and on device remove calling device_remove_groups with the same array.
Do you maybe have a better suggestion for me?
Andrey
On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
That's odd, why can't you just set the groups pointer in your pci_driver structure? That's what it is there for, right?
The only thing I can think of is creating my own struct attribute_group ** array in amdgpu where I aggregate all amdgpu sysfs attributes, calling device_add_groups at the end of the amdgpu PCI probe with that array, and on device remove calling device_remove_groups with the same array.
Horrid, no, see above :)
thanks,
greg k-h
On 12/2/20 12:34 PM, Greg KH wrote:
I am probably missing something, but amdgpu sysfs attrs are per device, not per driver; their life cycle is bound to the device, and their location in the sysfs topology is under each device. Putting them as driver default attrs will not put them in their current per-device location and won't make them automatically be destroyed once a particular device goes away, no?
Andrey
On Wed, Dec 02, 2020 at 01:02:06PM -0500, Andrey Grodzovsky wrote:
Oops, you are right, you want the 'dev_groups' field. Looks like PCI doesn't expose that directly, so you can do .driver = { .dev_groups = my_device_groups } in your pci_driver structure.
Or I'm sure the PCI driver maintainer would take a patch like 7d9c1d2f7aca ("USB: add support for dev_groups to struct usb_device_driver") was done for the USB subsystem, as diving into the "raw" .driver pointer isn't really that clean or nice in my opinion.
thanks,
greg k-h
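For reference, the shape of what Greg is suggesting might look roughly like this. This is only a sketch: my_dev_groups, my_pci_ids, and the probe/remove callbacks are illustrative names assumed to be defined elsewhere in the driver:

/*
 * Sketch only: wiring a driver's default device attribute groups through
 * struct pci_driver's embedded struct device_driver, as suggested above.
 * my_dev_groups is a NULL-terminated array of struct attribute_group
 * pointers defined elsewhere.
 */
static struct pci_driver my_pci_driver = {
	.name     = "my_driver",
	.id_table = my_pci_ids,
	.probe    = my_pci_probe,
	.remove   = my_pci_remove,
	.driver = {
		.dev_groups = my_dev_groups,
	},
};

With this in place the driver core creates the groups under each bound device after probe() succeeds and removes them before the device goes away, which is exactly the per-device lifetime Andrey was missing from the per-driver 'groups' field.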
On 12/2/20 1:20 PM, Greg KH wrote:
On Wed, Dec 02, 2020 at 01:02:06PM -0500, Andrey Grodzovsky wrote:
On 12/2/20 12:34 PM, Greg KH wrote:
On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
On 11/11/20 10:34 AM, Greg KH wrote:
On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
On 11/10/20 12:59 PM, Greg KH wrote: > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote: >> Hi, back to this after a long context switch for some higher priority stuff. >> >> So here I was able eventually to drop all this code and this change here https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedes... >> was enough for me. Seems like while device_remove_file can handle the use >> case where the file and the parent directory already gone, >> sysfs_remove_group goes down in flames in that case >> due to kobj->sd being unset on device removal. > A driver shouldn't ever have to remove individual sysfs groups, the > driver core/bus logic should do it for them automatically. > > And whenever a driver calls a sysfs_* call, that's a hint that something > is not working properly. Do you mean that while the driver creates the groups and files explicitly from it's different subsystems it should not explicitly remove each one of them because all of them should be removed at once (and recursively) when the device is being removed ?
Individual drivers should never add groups/files in sysfs, the driver core should do it properly for you if you have everything set up properly. And yes, the driver core will automatically remove them as well.
Please use the default groups attribute for your bus/subsystem and this will happen automagically.
Hi Greg, I tried your suggestion to hang amdgpu's sysfs attributes on default attributes in struct device.groups but turns out it's not usable since by the time i have access to struct device from amdgpu code it has already been initialized by pci core (i.e. past the point where device_add->device_add_attrs->device_add_groups with dev->groups is called) and so i can't really use it.
That's odd, why can't you just set the groups pointer in your pci_driver structure? That's what it is there for, right?
I am probably missing something but amdgpu sysfs attrs are per device not per driver
Oops, you are right, you want the 'dev_groups' field. Looks like pci doesn't export that directly, so you can do: .driver { .dev_groups = my_device_groups; }, in your pci_driver structure.
Or I'm sure the PCI driver maintainer would take a patch like 7d9c1d2f7aca ("USB: add support for dev_groups to struct usb_device_driver") was done for the USB subsystem, as diving into the "raw" .driver pointer isn't really that clean or nice in my opinion.
Looks like exactly what I need. I will probably start by assigning the raw pointer just to push my work ahead, and in parallel submit a patch like yours for PCI subsystem review, since the rework to switch over is really minimal.
Andrey
On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
Track sysfs files in a list so they can all be removed during PCI remove, since removing them after that point causes a crash because the parent directory has already been removed as part of PCI remove.
That looks extremely fishy to me.
It sounds like we just don't remove stuff in the right order.
Christian.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
 8 files changed, 99 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 604a681..ba3775f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -726,6 +726,15 @@ struct amd_powerplay {
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4

+struct amdgpu_sysfs_list_node {
+	struct list_head head;
+	struct device_attribute *attr;
+};
+
+#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
+	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
+
 struct amdgpu_device {
 	struct device			*dev;
 	struct drm_device		*ddev;
@@ -992,6 +1001,10 @@ struct amdgpu_device {
 	char				product_number[16];
 	char				product_name[32];
 	char				serial[16];
+
+	struct list_head		sysfs_files_list;
+	struct mutex			sysfs_files_list_lock;
 };

 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index fdd52d8..c1549ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
 	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
 }

 static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version, NULL);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);

 /**
  * amdgpu_atombios_fini - free the driver info and callbacks for atombios
@@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
 	adev->mode_info.atom_context = NULL;
 	kfree(adev->mode_info.atom_card_info);
 	adev->mode_info.atom_card_info = NULL;
-	device_remove_file(adev->dev, &dev_attr_vbios_version);
 }

 /**
@@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e7b9065..3173046 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 	NULL
 };

+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
+
 /**
  * amdgpu_device_init - initialize the driver
@@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	INIT_LIST_HEAD(&adev->shadow_list);
 	mutex_init(&adev->shadow_list_lock);

+	INIT_LIST_HEAD(&adev->sysfs_files_list);
+	mutex_init(&adev->sysfs_files_list_lock);
+
 	INIT_DELAYED_WORK(&adev->delayed_init_work,
 			  amdgpu_device_delayed_init_work_handler);
 	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
@@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (r) {
 		dev_err(adev->dev, "Could not create amdgpu device attr\n");
 		return r;
-	}
+	} else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
@@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	return r;
 }

+static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
+{
+	struct amdgpu_sysfs_list_node *node;
+
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_for_each_entry(node, &adev->sysfs_files_list, head)
+		device_remove_file(adev->dev, node->attr);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+}
+
 /**
  * amdgpu_device_fini - tear down the driver
@@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
 	amdgpu_fbdev_fini(adev);

 	amdgpu_irq_fini_early(adev);

+	amdgpu_sysfs_remove_files(adev);
+
+	if (adev->ucode_sysfs_en)
+		amdgpu_ucode_sysfs_fini(adev);
 }

 void amdgpu_device_fini_late(struct amdgpu_device *adev)
@@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
 	adev->rmmio = NULL;
 	amdgpu_device_doorbell_fini(adev);

-	if (adev->ucode_sysfs_en)
-		amdgpu_ucode_sysfs_fini(adev);
-
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
 		amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 6271044..e7b6c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
 static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
 		   amdgpu_mem_info_gtt_used_show, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
+
 /**
  * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
@@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }

@@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
  */
 static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_gtt_mgr *mgr = man->priv;
 	spin_lock(&mgr->lock);
 	drm_mm_takedown(&mgr->mm);
@@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 	kfree(mgr);
 	man->priv = NULL;

-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
-
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ddb4af0c..554fec0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
 		   psp_usbc_pd_fw_sysfs_read,
 		   psp_usbc_pd_fw_sysfs_write);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
+
 const struct amd_ip_funcs psp_ip_funcs = {
@@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
 	if (ret)
 		DRM_ERROR("Failed to create USBC PD FW control file!");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	return ret;
 }

 static void psp_sysfs_fini(struct amdgpu_device *adev)
 {
-	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
 }

 const struct amdgpu_ip_block_version psp_v3_1_ip_block =
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 7723937..39c400c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
 static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
 		   amdgpu_mem_info_vram_vendor, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
+
 static const struct attribute *amdgpu_vram_mgr_attributes[] = {
 	&dev_attr_mem_info_vram_total.attr,
 	&dev_attr_mem_info_vis_vram_total.attr,
@@ -184,6 +190,15 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
 	ret = sysfs_create_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
 	if (ret)
 		DRM_ERROR("Failed to register sysfs\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_mem_info_vram_total.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vis_vram_total.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vram_used.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vis_vram_used.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vram_vendor.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	return 0;
 }
@@ -198,7 +213,6 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
  */
 static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_vram_mgr *mgr = man->priv;

 	spin_lock(&mgr->lock);
@@ -206,7 +220,6 @@ static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
 	spin_unlock(&mgr->lock);
 	kfree(mgr);
 	man->priv = NULL;
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 90610b4..455eaa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -272,6 +272,9 @@ static ssize_t amdgpu_xgmi_show_error(struct device *dev,
 static DEVICE_ATTR(xgmi_device_id, S_IRUGO, amdgpu_xgmi_show_device_id, NULL);
 static DEVICE_ATTR(xgmi_error, S_IRUGO, amdgpu_xgmi_show_error, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_device_id);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_error);
+
 static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 					  struct amdgpu_hive_info *hive)
 {
@@ -285,10 +288,19 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 		return ret;
 	}

+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_xgmi_device_id.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	/* Create xgmi error file */
 	ret = device_create_file(adev->dev, &dev_attr_xgmi_error);
 	if (ret)
 		pr_err("failed to create xgmi_error\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_xgmi_error.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	/* Create sysfs link to hive info folder on the first device */
@@ -325,7 +337,6 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 static void amdgpu_xgmi_sysfs_rem_dev_info(struct amdgpu_device *adev,
 					   struct amdgpu_hive_info *hive)
 {
-	device_remove_file(adev->dev, &dev_attr_xgmi_device_id);
 	sysfs_remove_link(&adev->dev->kobj, adev->ddev->unique);
 	sysfs_remove_link(hive->kobj, adev->ddev->unique);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index a7b8292..f95b0b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -265,6 +265,8 @@ static ssize_t df_v3_6_get_df_cntr_avail(struct device *dev,
 /* device attr for available perfmon counters */
 static DEVICE_ATTR(df_cntr_avail, S_IRUGO, df_v3_6_get_df_cntr_avail, NULL);

+static AMDGPU_DEVICE_ATTR_LIST_NODE(df_cntr_avail);
+
 static void df_v3_6_query_hashes(struct amdgpu_device *adev)
 {
 	u32 tmp;
@@ -299,6 +301,11 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
 	ret = device_create_file(adev->dev, &dev_attr_df_cntr_avail);
 	if (ret)
 		DRM_ERROR("failed to create file for available df counters\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_df_cntr_avail.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}

 	for (i = 0; i < AMDGPU_MAX_DF_PERFMONS; i++)
 		adev->df_perfmon_config_assign_mask[i] = 0;
@@ -308,9 +315,6 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)

 static void df_v3_6_sw_fini(struct amdgpu_device *adev)
 {
-	device_remove_file(adev->dev, &dev_attr_df_cntr_avail);
 }

 static void df_v3_6_enable_broadcast_mode(struct amdgpu_device *adev,
Use the new TTM interface to invalidate all existing BO CPU mappings from all user processes.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 43592dc..6932d75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 	struct drm_device *dev = pci_get_drvdata(pdev);

 	drm_dev_unplug(dev);
+	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);

 	pci_disable_device(pdev);
On Sun, Jun 21, 2020 at 02:03:06AM -0400, Andrey Grodzovsky wrote:
Hm, a TTM or maybe even a vram helper function which wraps drm_dev_unplug + the TTM unmapping into one would be nice, I think? I suspect there's going to be more in the future here. -Daniel
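One possible shape for the helper Daniel suggests, purely as a sketch: the wrapper name is made up here, and ttm_bo_unmap_virtual_address_space() is the new interface introduced earlier in this series, not an existing upstream TTM function:

/*
 * Sketch of a combined helper: mark the DRM device unplugged and tear
 * down all userspace CPU mappings of its TTM BOs in one call. The
 * helper name is hypothetical; the TTM call follows this patch series.
 */
static inline void drm_dev_unplug_ttm(struct drm_device *dev,
				      struct ttm_bo_device *bdev)
{
	/* Mark the device unplugged so new ioctls bail out early... */
	drm_dev_unplug(dev);

	/* ...then invalidate existing CPU mappings, so the next fault
	 * takes the device-gone path instead of touching dead hardware. */
	ttm_bo_unmap_virtual_address_space(bdev);
}

Keeping the two steps in one helper would make it harder for a driver to unplug the device while forgetting the mapping teardown, which is presumably Daniel's point about more such steps accumulating in the future.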
On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
Reviewed-by: Christian König christian.koenig@amd.com
I think those two patches could already land in amd-staging-drm-next since they are a good idea independent of how else we fix the other issues.
On Mon, Jun 22, 2020 at 3:38 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Use the new TTM interface to invalidate all existing BO CPU mappings from all user processes.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
I think those two patches could already land in amd-staging-drm-next since they are a good idea independent of how else we fix the other issues.
Please make sure they land in drm-misc as well.
Alex
On Mon, Jun 22, 2020 at 03:48:29PM -0400, Alex Deucher wrote:
On Mon, Jun 22, 2020 at 3:38 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Use the new TTM interface to invalidate all existing BO CPU mappings from all user processes.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
I think those two patches could already land in amd-staging-drm-next since they are a good idea independent of how else we fix the other issues.
Please make sure they land in drm-misc as well.
Not sure that's much use, since without any of the fault side changes you just blow up on the first refault. Seems somewhat silly to charge ahead on this with the other bits still very much under discussion.
Plus I suggested a possible bikeshed here :-) -Daniel
Am 23.06.20 um 12:22 schrieb Daniel Vetter:
On Mon, Jun 22, 2020 at 03:48:29PM -0400, Alex Deucher wrote:
On Mon, Jun 22, 2020 at 3:38 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Use the new TTM interface to invalidate all existing BO CPU mappings from all user processes.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
I think those two patches could already land in amd-staging-drm-next since they are a good idea independent of how else we fix the other issues.
Please make sure they land in drm-misc as well.
Not sure that's much use, since without any of the fault side changes you just blow up on the first refault. Seems somewhat silly to charge ahead on this with the other bits still very much under discussion.
Well what I wanted to say is that we don't need to send out those simple patches once more.
Plus I suggested a possible bikeshed here :-)
No bikeshed, but indeed a rather good idea to not make this a TTM function.
Christian.
On 6/23/20 9:16 AM, Christian König wrote:
Am 23.06.20 um 12:22 schrieb Daniel Vetter:
On Mon, Jun 22, 2020 at 03:48:29PM -0400, Alex Deucher wrote:
On Mon, Jun 22, 2020 at 3:38 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
Use the new TTM interface to invalidate all existing BO CPU mappings from all user processes.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
I think those two patches could already land in amd-staging-drm-next since they are a good idea independent of how else we fix the other issues.
Please make sure they land in drm-misc as well.
Not sure that's much use, since without any of the fault side changes you just blow up on the first refault. Seems somewhat silly to charge ahead on this with the other bits still very much under discussion.
Well what I wanted to say is that we don't need to send out those simple patches once more.
Plus I suggested a possible bikeshed here :-)
No bikeshed, but indeed a rather good idea to not make this a TTM function.
Christian.
So I will incorporate the suggested changes to turn the TTM part into a generic DRM helper and will resend both patches as part of v3 (which might take a while now due to a context switch I am doing for another task).
Andrey
entity->rq becomes NULL after the device is unplugged, so just return early in that case.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 8d9c6fe..d252427 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -24,6 +24,7 @@
 #include "amdgpu_job.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
+#include <drm/drm_drv.h>
 
 #define AMDGPU_VM_SDMA_MIN_NUM_DW	256u
 #define AMDGPU_VM_SDMA_MAX_NUM_DW	(16u * 1024u)
@@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	struct drm_sched_entity *entity;
 	struct amdgpu_ring *ring;
 	struct dma_fence *f;
-	int r;
+	int r, idx;
+
+	if (!drm_dev_enter(p->adev->ddev, &idx)) {
+		r = -ENODEV;
+		goto nodev;
+	}
 
 	entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
 	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
@@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	WARN_ON(ib->length_dw > p->num_dw_left);
 	r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
 	if (r)
-		goto error;
+		goto job_fail;
 
 	if (p->unlocked) {
 		struct dma_fence *tmp = dma_fence_get(f);
@@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	if (fence && !p->immediate)
 		swap(*fence, f);
 	dma_fence_put(f);
-	return 0;
 
-error:
-	amdgpu_job_free(p->job);
+	r = 0;
+
+job_fail:
+	drm_dev_exit(idx);
+nodev:
+	if (r)
+		amdgpu_job_free(p->job);
+
 	return r;
 }
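The control flow this patch introduces (bracket the hardware access with an enter/exit guard and funnel every exit path through labels so the pending job is freed exactly once) can be sketched as a self-contained userspace model. Every name below is a hypothetical stand-in for the kernel primitives, not the real amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev { bool unplugged; };
struct model_job { bool freed; };

/* Stand-in for drm_dev_enter(): fails once the device is unplugged.
 * The real call also takes an SRCU read-side lock. */
static bool model_dev_enter(struct model_dev *dev)
{
	return !dev->unplugged;
}

static void model_dev_exit(struct model_dev *dev) { (void)dev; }

static void model_job_free(struct model_job *job) { job->freed = true; }

/* submit_ok lets a caller exercise the job_fail path as well. */
int model_commit(struct model_dev *dev, struct model_job *job, bool submit_ok)
{
	int r;

	if (!model_dev_enter(dev)) {
		r = -19;		/* -ENODEV */
		goto nodev;
	}

	/* pretend amdgpu_job_submit() ran here */
	r = submit_ok ? 0 : -5;		/* -EIO on a failed submit */
	if (r)
		goto job_fail;

	r = 0;

job_fail:
	model_dev_exit(dev);		/* always balance the enter */
nodev:
	if (r)
		model_job_free(job);	/* free the job on any failure */
	return r;
}
```

The key property, mirrored from the patch: the exit call is only reached on paths that entered, and the job is freed on both the -ENODEV and submit-failure paths.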
On Sun, Jun 21, 2020 at 02:03:07AM -0400, Andrey Grodzovsky wrote:
entity->rq becomes NULL after the device is unplugged, so just return early in that case.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
That looks very deep in amdgpu internals ... how do you even get in here after the device is fully unplugged on the sw side?
Is this amdkfd doing something stupid because it is entirely unaware of what amdgpu has done? Something else? It just feels like this is duct-taping over a more fundamental problem; after hotunplug no one should be able to even submit anything new, or do BO moves, or well, anything really. -Daniel
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
entity->rq becomes NULL after the device is unplugged, so just return early in that case.
Mhm, do you have a backtrace for this?
This should only be called by an IOCTL and IOCTLs should already call drm_dev_enter()/exit() on their own...
Christian.
On 6/22/20 3:40 PM, Christian König wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
entity->rq becomes NULL after the device is unplugged, so just return early in that case.
Mhm, do you have a backtrace for this?
This should only be called by an IOCTL and IOCTLs should already call drm_dev_enter()/exit() on their own...
Christian.
See below, it's not during an IOCTL but during the release of all GEM objects when the device file is closed. entity->rq becomes NULL because all the GPU schedulers are marked as not ready during the early PCI remove stage, so the next time an SDMA job tries to pick a scheduler to run, nothing is available and it is set to NULL.
Jun 8 11:14:56 ubuntu-1604-test kernel: [ 44.382648] BUG: kernel NULL pointer dereference, address: 0000000000000038
[ 44.382651] #PF: supervisor read access in kernel mode
[ 44.382652] #PF: error_code(0x0000) - not-present page
[ 44.382653] PGD 0 P4D 0
[ 44.382656] Oops: 0000 [#1] SMP PTI
[ 44.382658] CPU: 6 PID: 2598 Comm: llvmpipe-6 Tainted: G OE 5.6.0-dev+ #51
[ 44.382659] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
[ 44.382700] RIP: 0010:amdgpu_vm_sdma_commit+0x6c/0x270 [amdgpu]
[ 44.382702] Code: 01 00 00 48 89 ee 48 c7 c7 ef d4 85 c0 e8 fc 5f e8 ff 48 8b 75 10 48 c7 c7 fd d4 85 c0 e8 ec 5f e8 ff 48 8b 45 10 41 8b 55 08 <48> 8b 40 38 85 d2 48 8d b8 30 ff ff ff 0f 84 9b 01 00 00 48 8b 80
[ 44.382704] RSP: 0018:ffffa88e40f57950 EFLAGS: 00010282
[ 44.382705] RAX: 0000000000000000 RBX: ffffa88e40f579a8 RCX: 0000000000000001
[ 44.382707] RDX: 0000000000000014 RSI: ffff94d4d62388e0 RDI: ffff94d4dbd98e30
[ 44.382708] RBP: ffff94d4d2ad3288 R08: 0000000000000000 R09: 0000000000000001
[ 44.382709] R10: 000000000000001f R11: 0000000000000000 R12: ffffa88e40f57a48
[ 44.382710] R13: ffff94d4d627a5e8 R14: ffff94d4d424d978 R15: 0000000800100020
[ 44.382712] FS: 00007f30ae694700(0000) GS:ffff94d4dbd80000(0000) knlGS:0000000000000000
[ 44.382713] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 44.382714] CR2: 0000000000000038 CR3: 0000000121810006 CR4: 00000000000606e0
[ 44.382716] Call Trace:
[ 44.382755]  amdgpu_vm_bo_update_mapping.constprop.30+0x16b/0x230 [amdgpu]
[ 44.382795]  amdgpu_vm_clear_freed+0xd7/0x210 [amdgpu]
[ 44.382833]  amdgpu_gem_object_close+0x200/0x2b0 [amdgpu]
[ 44.382856]  ? drm_gem_object_handle_put_unlocked+0x90/0x90 [drm]
[ 44.382864]  ? drm_gem_object_release_handle+0x2c/0x90 [drm]
[ 44.382872]  drm_gem_object_release_handle+0x2c/0x90 [drm]
[ 44.382879]  ? drm_gem_object_handle_put_unlocked+0x90/0x90 [drm]
[ 44.382882]  idr_for_each+0x48/0xd0
[ 44.382885]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
[ 44.382893]  drm_gem_release+0x1c/0x30 [drm]
[ 44.382901]  drm_file_free+0x21d/0x270 [drm]
[ 44.382908]  drm_release+0x67/0xe0 [drm]
[ 44.382912]  __fput+0xc6/0x260
[ 44.382916]  task_work_run+0x79/0xb0
[ 44.382919]  do_exit+0x3d0/0xc40
[ 44.382921]  ? get_signal+0x13d/0xc30
[ 44.382924]  do_group_exit+0x47/0xb0
[ 44.382926]  get_signal+0x18b/0xc30
[ 44.382929]  do_signal+0x36/0x6a0
[ 44.382931]  ? __set_task_comm+0x62/0x120
[ 44.382935]  ? __x64_sys_futex+0x88/0x180
[ 44.382938]  exit_to_usermode_loop+0x6f/0xc0
[ 44.382941]  do_syscall_64+0x149/0x1c0
[ 44.382943]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 44.382944] RIP: 0033:0x7f30f7f35360
[ 44.382947] Code: Bad RIP value.
Andrey
Am 23.06.20 um 07:11 schrieb Andrey Grodzovsky:
On 6/22/20 3:40 PM, Christian König wrote:
Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
entity->rq becomes NULL after the device is unplugged, so just return early in that case.
Mhm, do you have a backtrace for this?
This should only be called by an IOCTL and IOCTLs should already call drm_dev_enter()/exit() on their own...
Christian.
See below, it's not during an IOCTL but during the release of all GEM objects when the device file is closed. entity->rq becomes NULL because all the GPU schedulers are marked as not ready during the early PCI remove stage, so the next time an SDMA job tries to pick a scheduler to run, nothing is available and it is set to NULL.
I see. This should then probably go into amdgpu_gem_object_close() before we reserve the PD.
See, drm_dev_enter()/exit() are kind of a read-side lock, and with this we create a nice lock inversion when we take it in the low-level SDMA VM backend.
Christian.
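Christian's suggestion above (take the enter/exit guard at the top level, before reserving the PD, rather than deep in the SDMA backend) can be sketched as a userspace model. All names below are hypothetical stand-ins for the real amdgpu functions; the "reservation" is modelled as a simple flag instead of a ww_mutex.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev { bool unplugged; };

static bool model_dev_enter(struct model_dev *d) { return !d->unplugged; }
static void model_dev_exit(struct model_dev *d) { (void)d; }

static bool pd_reserved;	/* models holding the PD reservation */

/* Low-level backend: entered only with the guard already held, so it may
 * assume the device is alive and never takes the guard itself (avoiding
 * the inversion against the reservation). */
static int model_vm_commit(int *commits)
{
	assert(pd_reserved);	/* caller must hold the "reservation" */
	(*commits)++;
	return 0;
}

/* High-level entry point, modelling amdgpu_gem_object_close():
 * guard first, then reserve, then commit, then unwind in reverse order. */
int model_gem_object_close(struct model_dev *dev, int *commits)
{
	int r;

	if (!model_dev_enter(dev))
		return -19;	/* -ENODEV: device gone, skip the update */

	pd_reserved = true;	/* "reserve the PD" */
	r = model_vm_commit(commits);
	pd_reserved = false;

	model_dev_exit(dev);
	return r;
}
```

With the guard taken outermost, the lock order guard -> reservation is the same on every path, which is the inversion-free ordering the thread is arguing for.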
No point in trying recovery if the device is gone, it just messes things up.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
+
 static void amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+			 s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
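The timeout-handler change above reduces to a simple guard: bail out before starting any recovery once the device is flagged unplugged. A minimal userspace model of that behavior (names invented, not the real scheduler code):

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev { bool unplugged; };
struct model_sched { bool reset_started; };

/* Models amdgpu_job_timedout(): skip recovery entirely for a gone device,
 * since a reset would only touch hardware that is no longer there. */
void model_job_timedout(struct model_dev *dev, struct model_sched *sched)
{
	if (dev->unplugged)
		return;			/* no point recovering a gone device */

	sched->reset_started = true;	/* stand-in for the GPU reset path */
}
```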
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point in trying recovery if the device is gone, it just messes things up.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
+
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point in trying recovery if the device is gone, it just messes things up.
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates over all the schedulers of an amdgpu device and accesses amdgpu-specific structures; drm/scheduler deals with a single scheduler at most, so this looks to me like the right place for this function.
Andrey
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point to try recovery if device is gone, just messes up things.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates across all the schedulers per amdgpu device and accesses amdgpu specific structures , drm/scheduler deals with single scheduler at most so looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race. I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack. -Daniel
Andrey
+
 static void
 amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+			 s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-- 
2.7.4
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point to try recovery if device is gone, just messes up things.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates across all the schedulers per amdgpu device and accesses amdgpu specific structures , drm/scheduler deals with single scheduler at most so looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr, amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it watertight against races I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Andrey
-Daniel
Andrey
+
 static void
 amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+			 s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-- 
2.7.4
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point to try recovery if device is gone, just messes up things.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates across all the schedulers per amdgpu device and accesses amdgpu specific structures , drm/scheduler deals with single scheduler at most so looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it water proof tight against race i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a cancel_work_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
For race free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work.
2. call cancel_work_sync
Anything else is cargo-culted work cleanup:
- 1. without 2. means if a work got scheduled right before it'll still be a problem.
- 2. without 1. means a schedule_work right after makes you calling cancel_work_sync pointless.
So either both or nothing. -Daniel
Andrey
-Daniel
Andrey
+
 static void
 amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+			 s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-- 
2.7.4
On 11/17/20 2:49 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point to try recovery if device is gone, just messes up things.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates across all the schedulers per amdgpu device and accesses amdgpu specific structures , drm/scheduler deals with single scheduler at most so looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it water proof tight against race i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Will update.
Andrey
For race free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work.
2. call cancel_work_sync
Anything else is cargo-culted work cleanup:
- 1. without 2. means if a work got scheduled right before it'll still be a problem.
- 2. without 1. means a schedule_work right after makes you calling cancel_work_sync pointless.
So either both or nothing. -Daniel
Andrey
-Daniel
Andrey
+
 static void
 amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+			 s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-- 
2.7.4
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 11/17/20 2:49 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote: > No point to try recovery if device is gone, just messes up things. > > Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++ > 2 files changed, 24 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 6932d75..5d6d3d9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, > return ret; > } > +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev) > +{ > + int i; > + > + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > + struct amdgpu_ring *ring = adev->rings[i]; > + > + if (!ring || !ring->sched.thread) > + continue; > + > + cancel_delayed_work_sync(&ring->sched.work_tdr); > + } > +} I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates across all the schedulers per amdgpu device and accesses amdgpu specific structures , drm/scheduler deals with single scheduler at most so looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it water proof tight against race i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback? -Daniel
Will update.
Andrey
For race free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work.
2. call cancel_work_sync
Anything else is cargo-culted work cleanup:
- 1. without 2. means if a work got scheduled right before it'll still be a problem.
- 2. without 1. means a schedule_work right after makes you calling cancel_work_sync pointless.
So either both or nothing. -Daniel
Andrey
-Daniel
Andrey
> + > static void > amdgpu_pci_remove(struct pci_dev *pdev) > { > struct drm_device *dev = pci_get_drvdata(pdev); > + struct amdgpu_device *adev = dev->dev_private; > drm_dev_unplug(dev); > + amdgpu_cancel_all_tdr(adev); > ttm_bo_unmap_virtual_address_space(&adev->mman.bdev); > amdgpu_driver_unload_kms(dev); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > index 4720718..87ff0c0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > @@ -28,6 +28,8 @@ > #include "amdgpu.h" > #include "amdgpu_trace.h" > +#include <drm/drm_drv.h> > + > static void amdgpu_job_timedout(struct drm_sched_job *s_job) > { > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); > @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) > memset(&ti, 0, sizeof(struct amdgpu_task_info)); > + if (drm_dev_is_unplugged(adev->ddev)) { > + DRM_INFO("ring %s timeout, but device unplugged, skipping.\n", > + s_job->sched->name); > + return; > + } > + > if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) { > DRM_ERROR("ring %s timeout, but soft recovered\n", > s_job->sched->name); > -- > 2.7.4 >
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 11/17/20 2:49 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote: > On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote: >> No point to try recovery if device is gone, just messes up things. >> >> Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++ >> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++ >> 2 files changed, 24 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> index 6932d75..5d6d3d9 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, >> return ret; >> } >> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev) >> +{ >> + int i; >> + >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + cancel_delayed_work_sync(&ring->sched.work_tdr); >> + } >> +} > I think this is a function that's supposed to be in drm/scheduler, not > here. Might also just be your cleanup code being ordered wrongly, or your > split in one of the earlier patches not done quite right. > -Daniel This function iterates across all the schedulers per amdgpu device and accesses amdgpu specific structures , drm/scheduler deals with single scheduler at most so looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it water proof tight against race i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback?
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Regards, Christian.
-Daniel
Will update.
Andrey
For race free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work.
2. call cancel_work_sync
Anything else is cargo-culted work cleanup:
- 1. without 2. means if a work got scheduled right before it'll still be a problem.
- 2. without 1. means a schedule_work right after makes you calling cancel_work_sync pointless.
So either both or nothing. -Daniel
Andrey
-Daniel
Andrey
>> + >> static void >> amdgpu_pci_remove(struct pci_dev *pdev) >> { >> struct drm_device *dev = pci_get_drvdata(pdev); >> + struct amdgpu_device *adev = dev->dev_private; >> drm_dev_unplug(dev); >> + amdgpu_cancel_all_tdr(adev); >> ttm_bo_unmap_virtual_address_space(&adev->mman.bdev); >> amdgpu_driver_unload_kms(dev); >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> index 4720718..87ff0c0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> @@ -28,6 +28,8 @@ >> #include "amdgpu.h" >> #include "amdgpu_trace.h" >> +#include <drm/drm_drv.h> >> + >> static void amdgpu_job_timedout(struct drm_sched_job *s_job) >> { >> struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); >> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) >> memset(&ti, 0, sizeof(struct amdgpu_task_info)); >> + if (drm_dev_is_unplugged(adev->ddev)) { >> + DRM_INFO("ring %s timeout, but device unplugged, skipping.\n", >> + s_job->sched->name); >> + return; >> + } >> + >> if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) { >> DRM_ERROR("ring %s timeout, but soft recovered\n", >> s_job->sched->name); >> -- >> 2.7.4 >>
On 2020-11-18 07:01, Christian König wrote:
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 11/17/20 2:49 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote: > On 6/22/20 5:53 AM, Daniel Vetter wrote: >> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote: >>> No point to try recovery if device is gone, just messes up things. >>> >>> Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++ >>> 2 files changed, 24 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> index 6932d75..5d6d3d9 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, >>> return ret; >>> } >>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev) >>> +{ >>> + int i; >>> + >>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> + struct amdgpu_ring *ring = adev->rings[i]; >>> + >>> + if (!ring || !ring->sched.thread) >>> + continue; >>> + >>> + cancel_delayed_work_sync(&ring->sched.work_tdr); >>> + } >>> +} >> I think this is a function that's supposed to be in drm/scheduler, not >> here. Might also just be your cleanup code being ordered wrongly, or your >> split in one of the earlier patches not done quite right. >> -Daniel > This function iterates across all the schedulers per amdgpu device and accesses > amdgpu specific structures , drm/scheduler deals with single scheduler at most > so looks to me like this is the right place for this function I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it water proof tight against race i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback?
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Yep, that's exactly what I said in my email above.
Regards, Luben
Regards, Christian.
-Daniel
Will update.
Andrey
For race free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work.
2. call cancel_work_sync
Anything else is cargo-culted work cleanup:
- 1. without 2. means if a work got scheduled right before it'll still be a problem.
- 2. without 1. means a schedule_work right after makes you calling cancel_work_sync pointless.
So either both or nothing. -Daniel
Andrey
-Daniel
> Andrey > > >>> + >>> static void >>> amdgpu_pci_remove(struct pci_dev *pdev) >>> { >>> struct drm_device *dev = pci_get_drvdata(pdev); >>> + struct amdgpu_device *adev = dev->dev_private; >>> drm_dev_unplug(dev); >>> + amdgpu_cancel_all_tdr(adev); >>> ttm_bo_unmap_virtual_address_space(&adev->mman.bdev); >>> amdgpu_driver_unload_kms(dev); >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >>> index 4720718..87ff0c0 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >>> @@ -28,6 +28,8 @@ >>> #include "amdgpu.h" >>> #include "amdgpu_trace.h" >>> +#include <drm/drm_drv.h> >>> + >>> static void amdgpu_job_timedout(struct drm_sched_job *s_job) >>> { >>> struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); >>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) >>> memset(&ti, 0, sizeof(struct amdgpu_task_info)); >>> + if (drm_dev_is_unplugged(adev->ddev)) { >>> + DRM_INFO("ring %s timeout, but device unplugged, skipping.\n", >>> + s_job->sched->name); >>> + return; >>> + } >>> + >>> if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) { >>> DRM_ERROR("ring %s timeout, but soft recovered\n", >>> s_job->sched->name); >>> -- >>> 2.7.4 >>>
dri-devel mailing list
dri-devel@lists.freedesktop.org
On 11/18/20 7:01 AM, Christian König wrote:
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 11/17/20 2:49 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote: > On 6/22/20 5:53 AM, Daniel Vetter wrote: >> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote: >>> No point to try recovery if device is gone, just messes up things. >>> >>> Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++ >>> 2 files changed, 24 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> index 6932d75..5d6d3d9 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev, >>> return ret; >>> } >>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev) >>> +{ >>> + int i; >>> + >>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> + struct amdgpu_ring *ring = adev->rings[i]; >>> + >>> + if (!ring || !ring->sched.thread) >>> + continue; >>> + >>> + cancel_delayed_work_sync(&ring->sched.work_tdr); >>> + } >>> +} >> I think this is a function that's supposed to be in drm/scheduler, not >> here. Might also just be your cleanup code being ordered wrongly, or your >> split in one of the earlier patches not done quite right. >> -Daniel > This function iterates across all the schedulers per amdgpu device and > accesses > amdgpu specific structures , drm/scheduler deals with single scheduler > at most > so looks to me like this is the right place for this function I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr, amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it watertight against the race I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback?
My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into drm_sched_job_timedout
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
Without adding the dev_enter/exit guarding pair in the recovery handler you end up with a GPU reset starting while the device is already unplugged, which leads to multiple errors and general mess.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Regards, Christian.
Schedulers are stopped from amdgpu_driver_unload_kms, which indeed happens after drm_dev_unplug, so yes, there is still a chance for new work being scheduled and a timeout armed after. But once I fix the code to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't see why that's not a good solution? Any tdr work started after drm_dev_unplug finished will simply abort on entry to drm_sched_job_timedout because drm_dev_enter will return false, and the function will return without rearming the timeout timer and so will have no impact.
The only issue I see here now is a possible use-after-free if some late tdr work tries to execute after the drm device is already gone; for this we should probably add cancel_delayed_work_sync(&sched->work_tdr) to drm_sched_fini after sched->thread is stopped there.
Andrey
-Daniel
Will update.
Andrey
For a race-free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work;
2. call cancel_work_sync.
Anything else is cargo-culted work cleanup:
- without 1. a schedule_work right after makes your cancel_work_sync pointless;
- without 2. a work that got scheduled right before will still be a problem.
So either both or nothing. -Daniel
Andrey
-Daniel
> Andrey
>
>>> +
>>>  static void
>>>  amdgpu_pci_remove(struct pci_dev *pdev)
>>>  {
>>>  	struct drm_device *dev = pci_get_drvdata(pdev);
>>> +	struct amdgpu_device *adev = dev->dev_private;
>>>  	drm_dev_unplug(dev);
>>> +	amdgpu_cancel_all_tdr(adev);
>>>  	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>  	amdgpu_driver_unload_kms(dev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 4720718..87ff0c0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -28,6 +28,8 @@
>>>  #include "amdgpu.h"
>>>  #include "amdgpu_trace.h"
>>> +#include <drm/drm_drv.h>
>>> +
>>>  static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>  {
>>>  	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>  	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>> +	if (drm_dev_is_unplugged(adev->ddev)) {
>>> +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>> +			 s_job->sched->name);
>>> +		return;
>>> +	}
>>> +
>>>  	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>  		DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>  			  s_job->sched->name);
>>> --
>>> 2.7.4
Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
On 11/18/20 7:01 AM, Christian König wrote:
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 11/17/20 2:49 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
[SNIP]
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback?
My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into drm_sched_job_timedout
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
Without adding the dev_enter/exit guarding pair in the recovery handler you end up with a GPU reset starting while the device is already unplugged, which leads to multiple errors and general mess.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Regards, Christian.
Schedulers are stopped from amdgpu_driver_unload_kms, which indeed happens after drm_dev_unplug, so yes, there is still a chance for new work being scheduled and a timeout armed after. But once I fix the code to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't see why that's not a good solution?
Yeah that should work as well, but then you also don't need to cancel the work item from the driver.
Any tdr work started after drm_dev_unplug finished will simply abort on entry to drm_sched_job_timedout because drm_dev_enter will be false and the function will return without rearming the timeout timer and so will have no impact.
The only issue I see here now is a possible use-after-free if some late tdr work tries to execute after the drm device is already gone; for this we should probably add cancel_delayed_work_sync(&sched->work_tdr) to drm_sched_fini after sched->thread is stopped there.
Good point, that is indeed missing as far as I can see.
Christian.
Andrey
On 11/19/20 2:55 AM, Christian König wrote:
Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
On 11/18/20 7:01 AM, Christian König wrote:
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 11/17/20 2:49 PM, Daniel Vetter wrote:
[SNIP]
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback?
My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into drm_sched_job_timedout
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
Without adding the dev_enter/exit guarding pair in the recovery handler you end up with a GPU reset starting while the device is already unplugged, which leads to multiple errors and general mess.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Regards, Christian.
Schedulers are stopped from amdgpu_driver_unload_kms, which indeed happens after drm_dev_unplug, so yes, there is still a chance for new work being scheduled and a timeout armed after. But once I fix the code to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't see why that's not a good solution?
Yeah that should work as well, but then you also don't need to cancel the work item from the driver.
Indeed, as Daniel pointed out, no need, and I dropped it. One correction - I previously said that w/o the dev_enter/exit guarding pair in the scheduler's TO handler you will get a GPU reset starting while the device is already gone - of course this is not fully prevented, as the device can be extracted at any moment just after we have already entered GPU recovery. But it does save us processing a futile GPU recovery, which always starts once you unplug the device if there are active jobs in progress at the moment, and so I think it's still justifiable to keep the dev_enter/exit guarding pair there.
Andrey
Any tdr work started after drm_dev_unplug finished will simply abort on entry to drm_sched_job_timedout because drm_dev_enter will be false and the function will return without rearming the timeout timer and so will have no impact.
The only issue I see here now is a possible use-after-free if some late tdr work tries to execute after the drm device is already gone; for this we should probably add cancel_delayed_work_sync(&sched->work_tdr) to drm_sched_fini after sched->thread is stopped there.
Good point, that is indeed missing as far as I can see.
Christian.
Andrey
On Thu, Nov 19, 2020 at 10:02:28AM -0500, Andrey Grodzovsky wrote:
On 11/19/20 2:55 AM, Christian König wrote:
Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
On 11/18/20 7:01 AM, Christian König wrote:
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
[SNIP]
Agree, synchronize_srcu from drm_dev_unplug should play the role of 'flushing' any earlier (in progress) tdr work which is using drm_dev_enter/exit pair. Any later arising tdr will terminate early when drm_dev_enter returns false.
Nope, anything you put into the work itself cannot close this race. It's the schedule_work that matters here. Or I'm missing something ... I thought that the tdr work you're cancelling here is launched by drm/scheduler code, not by the amd callback?
My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into drm_sched_job_timedout
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
Without adding the dev_enter/exit guarding pair in the recovery handler you end up with a GPU reset starting while the device is already unplugged, which leads to multiple errors and general mess.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Regards, Christian.
Schedulers are stopped from amdgpu_driver_unload_kms, which indeed happens after drm_dev_unplug, so yes, there is still a chance for new work being scheduled and a timeout armed after. But once I fix the code to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't see why that's not a good solution?
Yeah that should work as well, but then you also don't need to cancel the work item from the driver.
Indeed, as Daniel pointed out, no need, and I dropped it. One correction - I previously said that w/o the dev_enter/exit guarding pair in the scheduler's TO handler you will get a GPU reset starting while the device is already gone - of course this is not fully prevented, as the device can be extracted at any moment just after we have already entered GPU recovery. But it does save us processing a futile GPU recovery, which always starts once you unplug the device if there are active jobs in progress at the moment, and so I think it's still justifiable to keep the dev_enter/exit guarding pair there.
Yeah sprinkling drm_dev_enter/exit over the usual suspect code paths like tdr to make the entire unloading much faster makes sense. Waiting for enormous amounts of mmio ops to time out isn't fun. A comment might be good for that though, to explain why we're doing that. -Daniel
Andrey
Any tdr work started after drm_dev_unplug finished will simply abort on entry to drm_sched_job_timedout because drm_dev_enter will be false and the function will return without rearming the timeout timer and so will have no impact.
The only issue I see here now is a possible use-after-free if some late tdr work tries to execute after the drm device is already gone; for this we should probably add cancel_delayed_work_sync(&sched->work_tdr) to drm_sched_fini after sched->thread is stopped there.
Good point, that is indeed missing as far as I can see.
Christian.
Andrey
On 11/19/20 10:29 AM, Daniel Vetter wrote:
On Thu, Nov 19, 2020 at 10:02:28AM -0500, Andrey Grodzovsky wrote:
On 11/19/20 2:55 AM, Christian König wrote:
Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
On 11/18/20 7:01 AM, Christian König wrote:
Am 18.11.20 um 08:39 schrieb Daniel Vetter:
[SNIP]
My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into drm_sched_job_timedout
Yes that is correct. Canceling the work item is not the right approach at all, nor is adding dev_enter/exit pair in the recovery handler.
Without adding the dev_enter/exit guarding pair in the recovery handler you end up with a GPU reset starting while the device is already unplugged, which leads to multiple errors and general mess.
What we need to do here is to stop the scheduler thread and then wait for any timeout handling to have finished.
Otherwise it can schedule a new timeout just after we have canceled this one.
Regards, Christian.
Schedulers are stopped from amdgpu_driver_unload_kms, which indeed happens after drm_dev_unplug, so yes, there is still a chance for new work being scheduled and a timeout armed after. But once I fix the code to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't see why that's not a good solution?
Yeah that should work as well, but then you also don't need to cancel the work item from the driver.
Indeed, as Daniel pointed out, no need, and I dropped it. One correction - I previously said that w/o the dev_enter/exit guarding pair in the scheduler's TO handler you will get a GPU reset starting while the device is already gone - of course this is not fully prevented, as the device can be extracted at any moment just after we have already entered GPU recovery. But it does save us processing a futile GPU recovery, which always starts once you unplug the device if there are active jobs in progress at the moment, and so I think it's still justifiable to keep the dev_enter/exit guarding pair there.
Yeah sprinkling drm_dev_enter/exit over the usual suspect code paths like tdr to make the entire unloading much faster makes sense. Waiting for enormous amounts of mmio ops to time out isn't fun. A comment might be good for that though, to explain why we're doing that. -Daniel
Will do. I also tried to insert drm_dev_enter/exit in all MMIO accessors in amdgpu to try and guard at that level, but didn't get good results for an unclear reason; will probably get to this as follow-up work, again to avoid expanding the scope of the current work too much.
Andrey
Andrey
Any tdr work started after drm_dev_unplug finished will simply abort on entry to drm_sched_job_timedout because drm_dev_enter will be false and the function will return without rearming the timeout timer and so will have no impact.
The only issue I see here now is a possible use-after-free if some late tdr work tries to execute after the drm device is already gone; for this we should probably add cancel_delayed_work_sync(&sched->work_tdr) to drm_sched_fini after sched->thread is stopped there.
Good point, that is indeed missing as far as I can see.
Christian.
Andrey
On 2020-11-17 2:49 p.m., Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
On 11/17/20 1:52 PM, Daniel Vetter wrote:
On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
On 6/22/20 5:53 AM, Daniel Vetter wrote:
On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
No point to try recovery if device is gone, just messes up things.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
I think this is a function that's supposed to be in drm/scheduler, not here. Might also just be your cleanup code being ordered wrongly, or your split in one of the earlier patches not done quite right. -Daniel
This function iterates across all the schedulers per amdgpu device and accesses amdgpu-specific structures; drm/scheduler deals with a single scheduler at most, so it looks to me like this is the right place for this function
I guess we could keep track of all schedulers somewhere in a list in struct drm_device and wrap this up. That was kinda the idea.
Minimally I think a tiny wrapper with docs for the cancel_delayed_work_sync(&sched->work_tdr); which explains what you must observe to make sure there's no race.
Will do
I'm not exactly sure there's no guarantee here we won't get a new tdr work launched right afterwards at least, so this looks a bit like a hack.
Note that for any TDR work happening post amdgpu_cancel_all_tdr, amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will return early. To make it watertight against the race I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
Hm that's confusing. You do a work_cancel_sync, so that at least looks like "tdr work must not run after this point"
If you only rely on drm_dev_enter/exit check with the tdr work, then there's no need to cancel anything.
For a race-free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer call schedule_work;
2. call cancel_work_sync.
Anything else is cargo-culted work cleanup:
- without 1. a schedule_work right after makes your cancel_work_sync pointless;
- without 2. a work that got scheduled right before will still be a problem.
This is sound advice, and I did something similar for SAS over a decade ago, where an expander could be disconnected from the domain while many IOs were in flight through it to end devices.
You need a small DRM function which low-level drivers (such as amdgpu) call in order to tell DRM that this device is not accepting commands any more (it sets a flag) and which starts a thread to clean up commands that are "done" or "incoming". At the same time, the low-level driver returns commands which are pending in the hardware back out to DRM (thus those commands go from "pending" to "done"), and DRM cleans them up. (*)
The point is that you're not bubbling up the error, but directly notifying the highest level of the upper layer to hold off, while you're cleaning up all incoming and pending commands.
Depending on the situation, case 1 above has two sub-cases:
a) the device will not come back: then cancel any new work back out to the application client, or
b) the device may come back again, i.e. it is being reset: then you can queue up work, assuming the device will come back on successfully and you'd be able to send the incoming requests down to it. Or cancel everything and let the application client do the queueing and resubmission, like in a). The latter will not work when this resubmission (and error recovery) is done without the knowledge of the application client, for instance communication or parity errors, protocol retries, etc.
(*) I've some work coming in, in the scheduler, which could make this handling easier, or at least set a mechanism by which this could be made easier.
Regards, Luben
So either both or nothing. -Daniel
Andrey
-Daniel
Andrey
 static void amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+			 s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-- 
2.7.4
On Sun, Jun 21, 2020 at 02:03:00AM -0400, Andrey Grodzovsky wrote:
This iteration is still more of a draft as I am still facing a few unsolved issues, such as a crash in the user client when trying to CPU-map an imported BO if the map happens after the device was removed, and a HW failure to plug back a removed device. Also, since I don't have a real-life setup with an external GPU connected through TB, I am using sysfs to emulate PCI remove, and I expect to encounter more issues once I try this on a real-life case. I am also expecting some help on this from a user who volunteered to test in the related gitlab ticket. So basically this is more of a way to get feedback on whether I am moving in the right direction.
[1] - Discussions during v1 of the patchset: https://lists.freedesktop.org/archives/dri-devel/2020-May/265386.html
[2] - drm/doc: device hot-unplug for userspace: https://www.spinics.net/lists/dri-devel/msg259755.html
[3] - Related gitlab ticket: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
A few high-level comments on the generic parts, I didn't really look at the amdgpu side yet.
Also a nit: Please tell your mailer to break long lines, it looks funny and inconsistent otherwise, at least in some of the mailers I use here :-/ -Daniel
Andrey Grodzovsky (8):
  drm: Add dummy page per device or GEM object
  drm/ttm: Remap all page faults to per process dummy page.
  drm/ttm: Add unampping of the entire device address space
  drm/amdgpu: Split amdgpu_device_fini into early and late
  drm/amdgpu: Refactor sysfs removal
  drm/amdgpu: Unmap entire device address space on device remove.
  drm/amdgpu: Fix sdma code crash post device unplug
  drm/amdgpu: Prevent any job recoveries after device is unplugged.
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 19 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 50 +++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c      | 23 ++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c      | 24 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |  8 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c      | 23 +++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c      |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c  | 21 ++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 +++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++--
 drivers/gpu/drm/drm_file.c                   |  8 ++++
 drivers/gpu/drm/drm_prime.c                  | 10 +++++
 drivers/gpu/drm/ttm/ttm_bo.c                 |  8 +++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c              | 65 ++++++++++++++++++++++----
 include/drm/drm_file.h                       |  2 +
 include/drm/drm_gem.h                        |  2 +
 include/drm/ttm/ttm_bo_driver.h              |  7 +++
 22 files changed, 286 insertions(+), 55 deletions(-)
-- 2.7.4
I am fighting with Thunderbird to limit a line to 80 chars, but nothing helps. Any suggestions please?
Andrey
On 6/22/20 5:46 AM, Daniel Vetter wrote:
Also a nit: Please tell your mailer to break long lines, it looks funny and inconsistent otherwise, at least in some of the mailers I use here :-/ -Daniel
On 2020-06-23 7:14 a.m., Andrey Grodzovsky wrote:
I am fighting with Thunderbird to limit a line to 80 chars, but nothing helps. Any suggestions please?
Maybe try disabling mail.compose.default_to_paragraph, or check other *wrap* settings.
Tried, didn't have any impact
Andrey