Until now, extracting a card, either physically (e.g. an eGPU with a Thunderbolt connection) or by emulation through sysfs (/sys/bus/pci/devices/device_id/remove), would cause random crashes in user apps. The random crashes were mostly due to an app that had mapped a device-backed BO into its address space still trying to access the BO while the backing device was gone. To address this first problem, Christian suggested fixing the handling of mapped memory in the clients when the device goes away by forcibly unmapping all buffers the user processes hold, clearing their respective VMAs mapping the device BOs. Then, when the VMAs try to fill in the page tables again, we check in the fault handler whether the device has been removed and, if so, return an error. This generates a SIGBUS to the application, which can then cleanly terminate. This was indeed done, but it in turn created a problem of kernel OOPSes: while the app was terminating because of the SIGBUS, it would trigger a use-after-free in the driver by accessing device structures that had already been released by the PCI remove sequence. This was handled by introducing a 'flush' sequence during device removal, where we wait for the drm file reference count to drop to 0, meaning all user clients directly using this device have terminated.
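For illustration, the dropped v1 fault-handler check amounted to roughly the following. This is a schematic sketch, not the actual patch - the wrapper name is made up, while drm_dev_enter()/drm_dev_exit() are the real DRM helpers:

/* Sketch of the v1 idea: once the device is gone, fail CPU faults on
 * device BOs so the client gets a SIGBUS and can cleanly terminate.
 */
static vm_fault_t my_bo_vm_fault(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
	struct drm_device *ddev = bo->base.dev;
	vm_fault_t ret;
	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return VM_FAULT_SIGBUS;	/* device removed -> SIGBUS to app */

	ret = ttm_bo_vm_fault(vmf);	/* normal TTM fault path */
	drm_dev_exit(idx);

	return ret;
}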
v2: Based on discussions on the mailing list with Daniel and Pekka [1], and on the document produced by Pekka from those discussions [2], the whole approach of returning SIGBUS and waiting for all user clients holding CPU mappings of device BOs to die was dropped. Instead, as the document suggests, the device structures are kept alive until the last reference to the device is dropped by a user client, and in the meantime all existing and new CPU mappings of the BOs belonging to the device, directly or via dma-buf import, are rerouted to a per-user-process dummy rw page. Also, I skipped the 'Requirements for KMS UAPI' section of [2] since I am trying to get the minimal set of requirements that still gives a useful solution to work, which is the 'Requirements for Render and Cross-Device UAPI' section; my test case is therefore removing a secondary device, which is render-only and is not involved in KMS.
v3: More updates following comments on v2, such as removing the loop that finds the DRM file when rerouting page faults to the dummy page, getting rid of unnecessary sysfs handling refactoring, and moving the prevention of GPU recovery post device unplug from amdgpu to the scheduler layer. On top of that, added unplug support for IOMMU-enabled systems.
v4: Drop the last sysfs hack and use a sysfs default attribute. Guard against write accesses after device removal to avoid modifying released memory. Update dummy-page handling to on-demand allocation and release through the drm managed framework. Add a return value to the scheduler job TO handler (by Luben Tuikov) and use it in amdgpu to prevent GPU recovery post device unplug. Also rebase on top of drm-misc-next instead of amd-staging-drm-next.
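The write-access guard mentioned above follows the usual drm_dev_enter()/drm_dev_exit() SRCU pattern around MMIO access; schematically it looks like this (a sketch of the idea only, the helper name is made up):

/* Sketch: drop register writes once the device is unplugged so we
 * never touch released MMIO mappings.
 */
static void my_wreg_guarded(struct amdgpu_device *adev, u32 reg, u32 v)
{
	int idx;

	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return;	/* device is gone, skip the write */

	writel(v, adev->rmmio + (reg * 4));
	drm_dev_exit(idx);
}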
With these patches I am able to gracefully remove the secondary card using the sysfs remove hook while glxgears is running off of the secondary card (DRI_PRIME=1), without kernel oopses or hangs, and to keep working with the primary card or soft reset the device without hangs or oopses.
TODOs for follow-up work:
- Convert AMDGPU code to use devm (for hw stuff) and drmm (for sw stuff and allocations) (Daniel).
- Support plugging the secondary device back after unplug - currently still experiencing a HW error on plug-back.
- Add support for the 'Requirements for KMS UAPI' section of [2] - unplugging the primary, display-connected card.
[1] - Discussions during v3 of the patchset: https://www.spinics.net/lists/amd-gfx/msg55576.html
[2] - drm/doc: device hot-unplug for userspace: https://www.spinics.net/lists/dri-devel/msg259755.html
[3] - Related gitlab ticket: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
Andrey Grodzovsky (13):
  drm/ttm: Remap all page faults to per process dummy page.
  drm: Unmap the entire device address space on device unplug
  drm/ttm: Expose ttm_tt_unpopulate for driver use
  drm/sched: Cancel and flush all oustatdning jobs before finish.
  drm/amdgpu: Split amdgpu_device_fini into early and late
  drm/amdgpu: Add early fini callback
  drm/amdgpu: Register IOMMU topology notifier per device.
  drm/amdgpu: Fix a bunch of sdma code crash post device unplug
  drm/amdgpu: Remap all page faults to per process dummy page.
  drm/amdgpu: Move some sysfs attrs creation to default_attr
  drm/amdgpu: Guard against write accesses after device removal
  drm/sched: Make timeout timer rearm conditional.
  drm/amdgpu: Prevent any job recoveries after device is unplugged.
Luben Tuikov (1):
  drm/scheduler: Job timeout handler returns status
 drivers/gpu/drm/amd/amdgpu/amdgpu.h               |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c      |  17 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 149 ++++++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c           |  20 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c         |  15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c          |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h          |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c           |   9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c       |  25 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c           |  26 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h           |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c           |  19 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c           |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c        |  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h        |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c           |  53 +++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h           |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c           |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c          |  70 ++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h          |  52 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c           |  21 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c      |  14 +-
 drivers/gpu/drm/amd/amdgpu/cik_ih.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/cz_ih.c                |   2 +-
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c           |   2 +-
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c            |   2 +-
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c            |  16 +--
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c            |   8 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c             |   8 +-
 drivers/gpu/drm/amd/amdgpu/si_ih.c                |   2 +-
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c             |   2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c            |   2 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  12 +-
 drivers/gpu/drm/amd/include/amd_shared.h          |   2 +
 drivers/gpu/drm/drm_drv.c                         |   3 +
 drivers/gpu/drm/etnaviv/etnaviv_sched.c           |  10 +-
 drivers/gpu/drm/lima/lima_sched.c                 |   4 +-
 drivers/gpu/drm/panfrost/panfrost_job.c           |   9 +-
 drivers/gpu/drm/scheduler/sched_main.c            |  18 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c                   |  82 +++++++++++-
 drivers/gpu/drm/ttm/ttm_tt.c                      |   1 +
 drivers/gpu/drm/v3d/v3d_sched.c                   |  32 ++---
 include/drm/gpu_scheduler.h                       |  17 ++-
 include/drm/ttm/ttm_bo_api.h                      |   2 +
 45 files changed, 583 insertions(+), 198 deletions(-)
On device removal reroute all CPU mappings to dummy page.
v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function.
v4: Map the entire BO's VA space to an on-demand allocated dummy page on the first fault for that BO.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 82 ++++++++++++++++++++++++++++++++++++++++-
 include/drm/ttm/ttm_bo_api.h    |  2 +
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6dc96cf..ed89da3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/rbtree.h>
@@ -380,25 +382,103 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);

+static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res)
+{
+	struct page *dummy_page = (struct page *)res;
+
+	__free_page(dummy_page);
+}
+
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct ttm_buffer_object *bo = vma->vm_private_data;
+	struct ttm_bo_device *bdev = bo->bdev;
+	struct drm_device *ddev = bo->base.dev;
+	vm_fault_t ret = VM_FAULT_NOPAGE;
+	unsigned long address = vma->vm_start;
+	unsigned long num_prefault = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+	unsigned long pfn;
+	struct page *page;
+	int i;
+
+	/*
+	 * Wait for buffer data in transit, due to a pipelined
+	 * move.
+	 */
+	ret = ttm_bo_vm_fault_idle(bo, vmf);
+	if (unlikely(ret != 0))
+		return ret;
+
+	/* Allocate new dummy page to map all the VA range in this VMA to it*/
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
+		return VM_FAULT_OOM;
+
+	pfn = page_to_pfn(page);
+
+	/*
+	 * Prefault the entire VMA range right away to avoid further faults
+	 */
+	for (i = 0; i < num_prefault; ++i) {
+
+		if (unlikely(address >= vma->vm_end))
+			break;
+
+		if (vma->vm_flags & VM_MIXEDMAP)
+			ret = vmf_insert_mixed_prot(vma, address,
+						    __pfn_to_pfn_t(pfn, PFN_DEV),
+						    prot);
+		else
+			ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+
+		/* Never error on prefaulted PTEs */
+		if (unlikely((ret & VM_FAULT_ERROR))) {
+			if (i == 0)
+				return VM_FAULT_NOPAGE;
+			else
+				break;
+		}
+
+		address += PAGE_SIZE;
+	}
+
+	/* Set the page to be freed using drmm release action */
+	if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page))
+		return VM_FAULT_OOM;
+
+	return ret;
+}
+EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
+
 vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	pgprot_t prot;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
+	struct drm_device *ddev = bo->base.dev;
 	vm_fault_t ret;
+	int idx;

 	ret = ttm_bo_vm_reserve(bo, vmf);
 	if (ret)
 		return ret;

 	prot = vma->vm_page_prot;
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+	if (drm_dev_enter(ddev, &idx)) {
+		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+		drm_dev_exit(idx);
+	} else {
+		ret = ttm_bo_vm_dummy_page(vmf, prot);
+	}
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
 		return ret;

 	dma_resv_unlock(bo->base.resv);

 	return ret;
+
+	return ret;
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);

diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index e17be32..12fb240 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -643,4 +643,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma);
 int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
 		     void *buf, int len, int write);

+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);
+
 #endif
On Mon, Jan 18, 2021 at 4:02 PM Andrey Grodzovsky andrey.grodzovsky@amd.com wrote:
On device removal reroute all CPU mappings to dummy page.
v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function.
v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 82 ++++++++++++++++++++++++++++++++++++++++-
include/drm/ttm/ttm_bo_api.h | 2 +
2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6dc96cf..ed89da3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
#include <drm/ttm/ttm_bo_driver.h>
#include <drm/ttm/ttm_placement.h>
#include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
#include <linux/mm.h>
#include <linux/pfn_t.h>
#include <linux/rbtree.h>
@@ -380,25 +382,103 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
}
EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
+static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res)
+{
struct page *dummy_page = (struct page *)res;
__free_page(dummy_page);
+}
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
+{
struct vm_area_struct *vma = vmf->vma;
struct ttm_buffer_object *bo = vma->vm_private_data;
struct ttm_bo_device *bdev = bo->bdev;
struct drm_device *ddev = bo->base.dev;
vm_fault_t ret = VM_FAULT_NOPAGE;
unsigned long address = vma->vm_start;
unsigned long num_prefault = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
unsigned long pfn;
struct page *page;
int i;
/*
* Wait for buffer data in transit, due to a pipelined
* move.
*/
ret = ttm_bo_vm_fault_idle(bo, vmf);
if (unlikely(ret != 0))
return ret;
/* Allocate new dummy page to map all the VA range in this VMA to it*/
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page)
return VM_FAULT_OOM;
pfn = page_to_pfn(page);
/*
* Prefault the entire VMA range right away to avoid further faults
*/
for (i = 0; i < num_prefault; ++i) {
if (unlikely(address >= vma->vm_end))
break;
if (vma->vm_flags & VM_MIXEDMAP)
ret = vmf_insert_mixed_prot(vma, address,
__pfn_to_pfn_t(pfn, PFN_DEV),
prot);
else
ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
/* Never error on prefaulted PTEs */
if (unlikely((ret & VM_FAULT_ERROR))) {
if (i == 0)
return VM_FAULT_NOPAGE;
else
break;
}
address += PAGE_SIZE;
}
/* Set the page to be freed using drmm release action */
if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page))
return VM_FAULT_OOM;
return ret;
+}
+EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
pgprot_t prot;
struct ttm_buffer_object *bo = vma->vm_private_data;
struct drm_device *ddev = bo->base.dev;
vm_fault_t ret;
int idx;

ret = ttm_bo_vm_reserve(bo, vmf);
if (ret)
return ret;

prot = vma->vm_page_prot;
ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
if (drm_dev_enter(ddev, &idx)) {
ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
drm_dev_exit(idx);
} else {
ret = ttm_bo_vm_dummy_page(vmf, prot);
}
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;

dma_resv_unlock(bo->base.resv);

return ret;

return ret;
Duplicate return here.
Alex
}
EXPORT_SYMBOL(ttm_bo_vm_fault);

diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index e17be32..12fb240 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -643,4 +643,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma);
int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);
#endif
2.7.4
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
On device removal reroute all CPU mappings to dummy page.
v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function.
v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 82 ++++++++++++++++++++++++++++++++++++++++-
include/drm/ttm/ttm_bo_api.h | 2 +
2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6dc96cf..ed89da3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
#include <drm/ttm/ttm_bo_driver.h>
#include <drm/ttm/ttm_placement.h>
#include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
#include <linux/mm.h>
#include <linux/pfn_t.h>
#include <linux/rbtree.h>
@@ -380,25 +382,103 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
}
EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res)
{
struct page *dummy_page = (struct page *)res;

__free_page(dummy_page);
}

vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
{
struct vm_area_struct *vma = vmf->vma;
struct ttm_buffer_object *bo = vma->vm_private_data;
struct ttm_bo_device *bdev = bo->bdev;
struct drm_device *ddev = bo->base.dev;
vm_fault_t ret = VM_FAULT_NOPAGE;
unsigned long address = vma->vm_start;
unsigned long num_prefault = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
unsigned long pfn;
struct page *page;
int i;

/*
 * Wait for buffer data in transit, due to a pipelined
 * move.
 */
ret = ttm_bo_vm_fault_idle(bo, vmf);
if (unlikely(ret != 0))
return ret;
This is superfluous and probably quite harmful here because we wait for the hardware to do something.
We map a dummy page instead of the real BO content to the whole range anyway, so no need to wait for the real BO content to show up.
/* Allocate new dummy page to map all the VA range in this VMA to it*/
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page)
return VM_FAULT_OOM;

pfn = page_to_pfn(page);

/*
 * Prefault the entire VMA range right away to avoid further faults
 */
for (i = 0; i < num_prefault; ++i) {
Maybe rename the variable to num_pages. I was confused for a moment why we still prefault.
Alternative you can just drop i and do "for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE)".
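Rewritten along those lines, the prefault loop would look roughly like this (a sketch of the suggestion, not tested code):

	unsigned long addr;

	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		if (vma->vm_flags & VM_MIXEDMAP)
			ret = vmf_insert_mixed_prot(vma, addr,
						    __pfn_to_pfn_t(pfn, PFN_DEV),
						    prot);
		else
			ret = vmf_insert_pfn_prot(vma, addr, pfn, prot);

		if (ret & VM_FAULT_ERROR)
			break;
	}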
if (unlikely(address >= vma->vm_end))
break;
if (vma->vm_flags & VM_MIXEDMAP)
ret = vmf_insert_mixed_prot(vma, address,
__pfn_to_pfn_t(pfn, PFN_DEV),
prot);
else
ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
/* Never error on prefaulted PTEs */
if (unlikely((ret & VM_FAULT_ERROR))) {
if (i == 0)
return VM_FAULT_NOPAGE;
else
break;
This should probably be modified to either always return the error or always ignore it.
Apart from that looks good to me.
Christian.
}

address += PAGE_SIZE;
}

/* Set the page to be freed using drmm release action */
if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page))
return VM_FAULT_OOM;

return ret;
}
EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
pgprot_t prot;
struct ttm_buffer_object *bo = vma->vm_private_data;
struct drm_device *ddev = bo->base.dev;
vm_fault_t ret;
int idx;

ret = ttm_bo_vm_reserve(bo, vmf);
if (ret)
return ret;

prot = vma->vm_page_prot;
ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
if (drm_dev_enter(ddev, &idx)) {
ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
drm_dev_exit(idx);
} else {
ret = ttm_bo_vm_dummy_page(vmf, prot);
}
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;

dma_resv_unlock(bo->base.resv);

return ret;

return ret;
}
EXPORT_SYMBOL(ttm_bo_vm_fault);
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index e17be32..12fb240 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -643,4 +643,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma);
int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);

+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);

#endif
On Mon, Jan 18, 2021 at 04:01:10PM -0500, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page.
v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function.
v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/ttm/ttm_bo_vm.c | 82 ++++++++++++++++++++++++++++++++++++++++-
include/drm/ttm/ttm_bo_api.h | 2 +
2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6dc96cf..ed89da3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
#include <drm/ttm/ttm_bo_driver.h>
#include <drm/ttm/ttm_placement.h>
#include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
#include <linux/mm.h>
#include <linux/pfn_t.h>
#include <linux/rbtree.h>
@@ -380,25 +382,103 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
}
EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res)
{
	struct page *dummy_page = (struct page *)res;

	__free_page(dummy_page);
}

vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
{
	struct vm_area_struct *vma = vmf->vma;
	struct ttm_buffer_object *bo = vma->vm_private_data;
	struct ttm_bo_device *bdev = bo->bdev;
	struct drm_device *ddev = bo->base.dev;
	vm_fault_t ret = VM_FAULT_NOPAGE;
	unsigned long address = vma->vm_start;
	unsigned long num_prefault = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
	unsigned long pfn;
	struct page *page;
	int i;

	/*
	 * Wait for buffer data in transit, due to a pipelined
	 * move.
	 */
	ret = ttm_bo_vm_fault_idle(bo, vmf);
	if (unlikely(ret != 0))
		return ret;

	/* Allocate new dummy page to map all the VA range in this VMA to it*/
	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	if (!page)
		return VM_FAULT_OOM;

	pfn = page_to_pfn(page);

	/*
	 * Prefault the entire VMA range right away to avoid further faults
	 */
	for (i = 0; i < num_prefault; ++i) {
		if (unlikely(address >= vma->vm_end))
			break;

		if (vma->vm_flags & VM_MIXEDMAP)
			ret = vmf_insert_mixed_prot(vma, address,
						    __pfn_to_pfn_t(pfn, PFN_DEV),
						    prot);
		else
			ret = vmf_insert_pfn_prot(vma, address, pfn, prot);

		/* Never error on prefaulted PTEs */
		if (unlikely((ret & VM_FAULT_ERROR))) {
			if (i == 0)
				return VM_FAULT_NOPAGE;
			else
				break;
		}

		address += PAGE_SIZE;
	}

	/* Set the page to be freed using drmm release action */
	if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page))
		return VM_FAULT_OOM;

	return ret;
}
EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
I think we can lift this entire thing (once the ttm_bo_vm_fault_idle is gone) to the drm level, since nothing ttm specific in here. Probably stuff it into drm_gem.c (but really it's not even gem specific, it's fully generic "replace this vma with dummy pages pls" function.
Aside from this nit I think the overall approach you have here is starting to look good. Lots of work&polish, but imo we're getting there and can start landing stuff soon. -Daniel
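A generic DRM-level helper along those lines might look roughly like this (a hypothetical sketch: the name, the placement in drm_gem.c and the exact signature are assumptions, not an existing API):

/* Hypothetical "back this whole VMA with one dummy page" helper,
 * lifted out of TTM as suggested above. The caller would own the
 * dummy page and tie its lifetime to the drm_device, e.g. via
 * drmm_add_action_or_reset().
 */
vm_fault_t drm_vma_fault_dummy_page(struct vm_fault *vmf, pgprot_t prot,
				    struct page *dummy_page)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long pfn = page_to_pfn(dummy_page);
	unsigned long addr;
	vm_fault_t ret = VM_FAULT_NOPAGE;

	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		ret = vmf_insert_pfn_prot(vma, addr, pfn, prot);
		if (ret & VM_FAULT_ERROR)
			break;
	}

	return ret;
}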
vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	pgprot_t prot;
	struct ttm_buffer_object *bo = vma->vm_private_data;
	struct drm_device *ddev = bo->base.dev;
	vm_fault_t ret;
	int idx;

	ret = ttm_bo_vm_reserve(bo, vmf);
	if (ret)
		return ret;

	prot = vma->vm_page_prot;
	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
	if (drm_dev_enter(ddev, &idx)) {
		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
		drm_dev_exit(idx);
	} else {
		ret = ttm_bo_vm_dummy_page(vmf, prot);
	}
	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
		return ret;

	dma_resv_unlock(bo->base.resv);

	return ret;

	return ret;
}
EXPORT_SYMBOL(ttm_bo_vm_fault);

diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index e17be32..12fb240 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -643,4 +643,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma);
int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);

+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);

#endif
2.7.4
On 1/19/21 8:56 AM, Daniel Vetter wrote:
On Mon, Jan 18, 2021 at 04:01:10PM -0500, Andrey Grodzovsky wrote:
On device removal reroute all CPU mappings to dummy page.
v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function.
v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
[...]
I think we can lift this entire thing (once the ttm_bo_vm_fault_idle is gone) to the drm level, since nothing ttm specific in here. Probably stuff it into drm_gem.c (but really it's not even gem specific, it's fully generic "replace this vma with dummy pages pls" function.
Once I started on this I noticed that drmm_add_action_or_reset depends on struct drm_device *ddev = bo->base.dev, and bo is the private data we embed at the TTM level when setting up the mapping; this forces moving drmm_add_action_or_reset out of this function to every client who uses it, and then you separate the logic of page allocation from its release. So I suggest we keep it as is.
Andrey
Aside from this nit I think the overall approach you have here is starting to look good. Lots of work&polish, but imo we're getting there and can start landing stuff soon. -Daniel
[...]
-- 2.7.4
Hey Daniel, just a ping.
Andrey
On 1/25/21 10:28 AM, Andrey Grodzovsky wrote:
[...]
On Wed, Jan 27, 2021 at 09:29:41AM -0500, Andrey Grodzovsky wrote:
Hey Daniel, just a ping.
Was on vacation last week.
Andrey
On 1/25/21 10:28 AM, Andrey Grodzovsky wrote:
On 1/19/21 8:56 AM, Daniel Vetter wrote:
On Mon, Jan 18, 2021 at 04:01:10PM -0500, Andrey Grodzovsky wrote:
[...]
I think we can lift this entire thing (once the ttm_bo_vm_fault_idle is gone) to the drm level, since nothing ttm specific in here. Probably stuff it into drm_gem.c (but really it's not even gem specific, it's fully generic "replace this vma with dummy pages pls" function.
Once I started with this I noticed that drmm_add_action_or_reset depends on struct drm_device *ddev = bo->base.dev and bo is the private data we embed at the TTM level when setting up the mapping and so this forces to move drmm_add_action_or_reset out of this function to every client who uses this function, and then you separate the logic of page allocation from it's release. So I suggest we keep it as is.
Uh disappointing. Thing is, ttm essentially means drm devices with gem, except for vmwgfx, which is a drm_device without gem. And I think one of the remaining ttm refactors in this area is to move ttm_device over into drm_device somehow, and then we'd have bo->base.dev always set to something that drmm_add_action_or_reset can use.
I guess hand-rolling for now and jotting this down as a TODO item is fine too, but would be good to get this addressed since that's another reason here to do this. Maybe sync with Christian how to best do this. -Daniel
Andrey
[...]
2.7.4
Invalidate all BOs' CPU mappings once the device is removed.
v3: Move the code from TTM into drm_dev_unplug
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Reviewed-by: Daniel Vetter daniel@ffwll.ch
---
 drivers/gpu/drm/drm_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index d384a5b..20d22e4 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -469,6 +469,9 @@ void drm_dev_unplug(struct drm_device *dev)
 	synchronize_srcu(&drm_unplug_srcu);

 	drm_dev_unregister(dev);
+
+	/* Clear all CPU mappings pointing to this device */
+	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
 }
 EXPORT_SYMBOL(drm_dev_unplug);
This is needed to drop IOMMU-backed pages on device unplug, before the device's IOMMU group is released.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 7f75a13..f9e0b0d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -341,3 +341,4 @@ void ttm_tt_unpopulate(struct ttm_bo_device *bdev,
 	ttm_pool_free(&bdev->pool, ttm);
 	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
 }
+EXPORT_SYMBOL(ttm_tt_unpopulate);
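The intended caller is the driver's unplug path; schematically something like this (a sketch only - the BO list and its locking are hypothetical, the real user is the IOMMU topology notifier patch in this series):

/* Sketch: before the IOMMU group goes away, force-release the
 * DMA-mapped pages of every BO the device still holds.
 */
static void my_unpopulate_all_bos(struct ttm_bo_device *bdev,
				  struct list_head *device_bo_list)
{
	struct ttm_buffer_object *bo;

	/* device_bo_list and its locking are driver-specific */
	list_for_each_entry(bo, device_bo_list, lru) {
		if (bo->ttm)
			ttm_tt_unpopulate(bdev, bo->ttm);
	}
}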
To avoid any possible use after free.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/scheduler/sched_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 997aa15..92637b7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -899,6 +899,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->thread)
 		kthread_stop(sched->thread);

+	/* Confirm no work left behind accessing device structures */
+	cancel_delayed_work_sync(&sched->work_tdr);
+
 	sched->ready = false;
 }
 EXPORT_SYMBOL(drm_sched_fini);
On Mon, Jan 18, 2021 at 4:02 PM Andrey Grodzovsky andrey.grodzovsky@amd.com wrote:
To avoid any possible use after free.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Reviewed-by: Christian König christian.koenig@amd.com
In the subject: oustatdning -> outstanding
Alex
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 997aa15..92637b7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -899,6 +899,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->thread)
 		kthread_stop(sched->thread);
/* Confirm no work left behind accessing device structures */
cancel_delayed_work_sync(&sched->work_tdr);
sched->ready = false;
}
EXPORT_SYMBOL(drm_sched_fini);
--
2.7.4
This is a bug fix and should probably be pushed separately to drm-misc-next.
Christian.
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
To avoid any possible use after free.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Reviewed-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 997aa15..92637b7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -899,6 +899,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->thread)
 		kthread_stop(sched->thread);
	/* Confirm no work left behind accessing device structures */
	cancel_delayed_work_sync(&sched->work_tdr);

	sched->ready = false;
}
EXPORT_SYMBOL(drm_sched_fini);
Added a CC: stable tag and pushed it.
Thanks, Christian.
Am 19.01.21 um 09:42 schrieb Christian König:
This is a bug fix and should probably be pushed separately to drm-misc-next.
Christian.
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
To avoid any possible use after free.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Reviewed-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 997aa15..92637b7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -899,6 +899,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->thread)
 		kthread_stop(sched->thread);
+	/* Confirm no work left behind accessing device structures */
+	cancel_delayed_work_sync(&sched->work_tdr);
+
 	sched->ready = false;
 }
 EXPORT_SYMBOL(drm_sched_fini);
Some of the work in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the work that relates to finalizing and releasing driver data structures can be deferred until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
v4: Change functions prefix early->hw and late->sw
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ++++++++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 26 ++++++++++++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
 drivers/gpu/drm/amd/amdgpu/cik_ih.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/cz_ih.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/si_ih.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c     |  2 +-
 16 files changed, 78 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f77443c..478a7d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1060,7 +1060,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)

 int amdgpu_device_init(struct amdgpu_device *adev, uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_hw(struct amdgpu_device *adev);
+void amdgpu_device_fini_sw(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);

 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1273,6 +1275,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 348ac67..90c8353 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3579,14 +3579,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  * Tear down the driver info (all asics).
  * Called at driver shutdown.
  */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	adev->shutdown = true;

-	kfree(adev->pci_state);
-
 	/* make sure IB test finished before entering exclusive mode
 	 * to avoid preemption on IB test
 	 * */
@@ -3603,11 +3601,24 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		else
 			drm_atomic_helper_shutdown(adev_to_drm(adev));
 	}
-	amdgpu_fence_driver_fini(adev);
+	amdgpu_fence_driver_fini_hw(adev);
+
 	if (adev->pm_sysfs_en)
 		amdgpu_pm_sysfs_fini(adev);
+	if (adev->ucode_sysfs_en)
+		amdgpu_ucode_sysfs_fini(adev);
+	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
+
+	amdgpu_fbdev_fini(adev);
+
+	amdgpu_irq_fini_hw(adev);
+}
+
+void amdgpu_device_fini_sw(struct amdgpu_device *adev)
+{
 	amdgpu_device_ip_fini(adev);
+	amdgpu_fence_driver_fini_sw(adev);
 	release_firmware(adev->firmware.gpu_info_fw);
 	adev->firmware.gpu_info_fw = NULL;
 	adev->accel_working = false;
@@ -3636,14 +3647,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	adev->rmmio = NULL;
 	amdgpu_device_doorbell_fini(adev);

-	if (adev->ucode_sysfs_en)
-		amdgpu_ucode_sysfs_fini(adev);
-
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
 		amdgpu_pmu_fini(adev);
 	if (adev->mman.discovery_bin)
 		amdgpu_discovery_fini(adev);
+
+	kfree(adev->pci_state);
+
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 72efd57..9c0cd00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1238,14 +1238,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);

-#ifdef MODULE
-	if (THIS_MODULE->state != MODULE_STATE_GOING)
-#endif
-		DRM_ERROR("Hotplug removal is not supported\n");
 	drm_dev_unplug(dev);
 	amdgpu_driver_unload_kms(dev);
+
 	pci_disable_device(pdev);
-	pci_set_drvdata(pdev, NULL);
 }

 static void
@@ -1569,6 +1565,7 @@ static const struct drm_driver amdgpu_kms_driver = {
 	.dumb_create = amdgpu_mode_dumb_create,
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
+	.release = &amdgpu_driver_release_kms,

 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d56f402..e19b74c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
  *
  * Tear down the fence driver for all possible rings (all asics).
  */
-void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
+void amdgpu_fence_driver_fini_hw(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 	int r;
@@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
 		if (!ring->no_scheduler)
 			drm_sched_fini(&ring->sched);
 		del_timer_sync(&ring->fence_drv.fallback_timer);
+	}
+}
+
+void amdgpu_fence_driver_fini_sw(struct amdgpu_device *adev)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->fence_drv.initialized)
+			continue;
+
 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
 			dma_fence_put(ring->fence_drv.fences[j]);
 		kfree(ring->fence_drv.fences);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index bea57e8..2f1cfc5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -49,6 +49,7 @@
 #include <drm/drm_irq.h>
 #include <drm/drm_vblank.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "atom.h"
@@ -313,6 +314,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	return 0;
 }

+
+void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
+{
+	if (adev->irq.installed) {
+		drm_irq_uninstall(&adev->ddev);
+		adev->irq.installed = false;
+		if (adev->irq.msi_enabled)
+			pci_free_irq_vectors(adev->pdev);
+
+		if (!amdgpu_device_has_dc_support(adev))
+			flush_work(&adev->hotplug_work);
+	}
+}
+
 /**
  * amdgpu_irq_fini - shut down interrupt handling
  *
@@ -322,19 +337,10 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
  * functionality, shuts down vblank, hotplug and reset interrupt handling,
  * turns off interrupts from all sources (all ASICs).
  */
-void amdgpu_irq_fini(struct amdgpu_device *adev)
+void amdgpu_irq_fini_sw(struct amdgpu_device *adev)
 {
 	unsigned i, j;

-	if (adev->irq.installed) {
-		drm_irq_uninstall(adev_to_drm(adev));
-		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
-			pci_free_irq_vectors(adev->pdev);
-		if (!amdgpu_device_has_dc_support(adev))
-			flush_work(&adev->hotplug_work);
-	}
-
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index ac527e5..392a732 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -104,7 +104,8 @@ void amdgpu_irq_disable_all(struct amdgpu_device *adev);
 irqreturn_t amdgpu_irq_handler(int irq, void *arg);

 int amdgpu_irq_init(struct amdgpu_device *adev);
-void amdgpu_irq_fini(struct amdgpu_device *adev);
+void amdgpu_irq_fini_sw(struct amdgpu_device *adev);
+void amdgpu_irq_fini_hw(struct amdgpu_device *adev);
 int amdgpu_irq_add_id(struct amdgpu_device *adev, unsigned client_id,
 		      unsigned src_id, struct amdgpu_irq_src *source);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b16b327..fee95d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -29,6 +29,7 @@
 #include "amdgpu.h"
 #include <drm/drm_debugfs.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
 #include "atom.h"
@@ -93,7 +94,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	}

 	amdgpu_acpi_fini(adev);
-	amdgpu_device_fini(adev);
+	amdgpu_device_fini_hw(adev);
 }

 void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
@@ -1153,6 +1154,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	pm_runtime_put_autosuspend(dev->dev);
 }

+
+void amdgpu_driver_release_kms(struct drm_device *dev)
+{
+	struct amdgpu_device *adev = drm_to_adev(dev);
+
+	amdgpu_device_fini_sw(adev);
+	pci_set_drvdata(adev->pdev, NULL);
+}
+
 /*
  * VBlank related functions.
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index c136bd4..87eaf13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2142,6 +2142,7 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
 	if (!con)
 		return 0;

+	/* Need disable ras on all IPs here before ip [hw/sw]fini */
 	amdgpu_ras_disable_all_features(adev, 0);
 	amdgpu_ras_recovery_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7112137..accb243 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
 };

 int amdgpu_fence_driver_init(struct amdgpu_device *adev);
-void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
+void amdgpu_fence_driver_fini_hw(struct amdgpu_device *adev);
+void amdgpu_fence_driver_fini_sw(struct amdgpu_device *adev);
 void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);

 int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index d374571..183d44a 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -309,7 +309,7 @@ static int cik_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
 	amdgpu_irq_remove_domain(adev);

diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
index da37f8a..ee824d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
@@ -290,7 +290,7 @@ static int cz_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
 	amdgpu_irq_remove_domain(adev);

diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
index 37d8b6c..b24f6fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
@@ -290,7 +290,7 @@ static int iceland_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
 	amdgpu_irq_remove_domain(adev);

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 7ba229e..c191410 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -716,7 +716,7 @@ static int navi10_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c b/drivers/gpu/drm/amd/amdgpu/si_ih.c
index 51880f6..751307f 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c
@@ -175,7 +175,7 @@ static int si_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);

 	return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
index ce33199..729aaaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
@@ -301,7 +301,7 @@ static int tonga_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
 	amdgpu_irq_remove_domain(adev);

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index e5ae31e..a342406 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -627,7 +627,7 @@ static int vega10_ih_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-	amdgpu_irq_fini(adev);
+	amdgpu_irq_fini_sw(adev);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
 	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
Some of the work in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the work that relates to finalizing and releasing driver data structures can be kept until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
v4: Change functions prefix early->hw and late->sw
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
The fence and irq changes look sane to me, no idea for the rest.
Acked-by: Christian König christian.koenig@amd.com
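For reference, here is a condensed sketch of the two teardown entry points this patch creates, with the function bodies trimmed to the calls under discussion (the full changes are in the diff of the patch above):

static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);

	/* runs at PCI remove time: mark the device unplugged, then hw teardown */
	drm_dev_unplug(dev);
	amdgpu_driver_unload_kms(dev);	/* ends in amdgpu_device_fini_hw() */
	pci_set_drvdata(pdev, NULL);
}

static void amdgpu_driver_release_kms(struct drm_device *dev)
{
	struct amdgpu_device *adev = drm_to_adev(dev);

	/* runs only when the last drm_device reference is dropped */
	amdgpu_device_fini_sw(adev);
	pci_set_drvdata(adev->pdev, NULL);
}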
Use it to call display code that depends on device->drv_data before it is set to NULL on device unplug.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 20 ++++++++++++++++++++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++++++++++--
 drivers/gpu/drm/amd/include/amd_shared.h          |  2 ++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 90c8353..45e23e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2529,6 +2529,24 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
+{
+	int i, r;
+
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].version->funcs->early_fini)
+			continue;
+
+		r = adev->ip_blocks[i].version->funcs->early_fini((void *)adev);
+		if (r) {
+			DRM_DEBUG("early_fini of IP block <%s> failed %d\n",
+				  adev->ip_blocks[i].version->funcs->name, r);
+		}
+	}
+
+	return 0;
+}
+
 /**
  * amdgpu_device_ip_fini - run fini for hardware IPs
  *
@@ -3613,6 +3631,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 	amdgpu_fbdev_fini(adev);
 
 	amdgpu_irq_fini_hw(adev);
+
+	amdgpu_device_ip_fini_early(adev);
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 86c2b2c..9b24f3e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1156,6 +1156,15 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
 	return -EINVAL;
 }
 
+static int amdgpu_dm_early_fini(void *handle)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+	amdgpu_dm_audio_fini(adev);
+
+	return 0;
+}
+
 static void amdgpu_dm_fini(struct amdgpu_device *adev)
 {
 	int i;
@@ -1164,8 +1173,6 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
 		drm_encoder_cleanup(&adev->dm.mst_encoders[i].base);
 	}
 
-	amdgpu_dm_audio_fini(adev);
-
 	amdgpu_dm_destroy_drm_device(&adev->dm);
 
 #ifdef CONFIG_DRM_AMD_DC_HDCP
@@ -2175,6 +2182,7 @@ static const struct amd_ip_funcs amdgpu_dm_funcs = {
 	.late_init = dm_late_init,
 	.sw_init = dm_sw_init,
 	.sw_fini = dm_sw_fini,
+	.early_fini = amdgpu_dm_early_fini,
 	.hw_init = dm_hw_init,
 	.hw_fini = dm_hw_fini,
 	.suspend = dm_suspend,
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index 9676016..63bb846 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -239,6 +239,7 @@ enum amd_dpm_forced_level;
 * @late_init: sets up late driver/hw state (post hw_init) - Optional
 * @sw_init: sets up driver state, does not configure hw
 * @sw_fini: tears down driver state, does not configure hw
+ * @early_fini: tears down stuff before dev detached from driver
 * @hw_init: sets up the hw state
 * @hw_fini: tears down the hw state
 * @late_fini: final cleanup
@@ -267,6 +268,7 @@ struct amd_ip_funcs {
 	int (*late_init)(void *handle);
 	int (*sw_init)(void *handle);
 	int (*sw_fini)(void *handle);
+	int (*early_fini)(void *handle);
 	int (*hw_init)(void *handle);
 	int (*hw_fini)(void *handle);
 	void (*late_fini)(void *handle);
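Combined with the hw/sw split above, the unplug-time ordering with this patch becomes roughly the following, condensed from the diffs into a sketch rather than literal kernel code:

void amdgpu_device_fini_hw(struct amdgpu_device *adev)
{
	/* ... sysfs and fbdev teardown as before ... */
	amdgpu_irq_fini_hw(adev);

	/* walks all IP blocks; for DC this reaches amdgpu_dm_early_fini(),
	 * which runs amdgpu_dm_audio_fini() while drv_data is still valid */
	amdgpu_device_ip_fini_early(adev);
}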
Handle all DMA IOMMU gropup related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  5 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
 6 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 478a7d8..2953420 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -51,6 +51,7 @@
 #include <linux/dma-fence.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
+#include <linux/notifier.h>
 
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
@@ -1041,6 +1042,10 @@ struct amdgpu_device {
 
 	bool in_pci_err_recovery;
 	struct pci_saved_state *pci_state;
+
+	struct notifier_block nb;
+	struct blocking_notifier_head notifier;
+	struct list_head device_bo_list;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 45e23e3..e99f4f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -70,6 +70,8 @@
 #include <drm/task_barrier.h>
 #include <linux/pm_runtime.h>
 
+#include <linux/iommu.h>
+
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 };
 
+static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
+	struct amdgpu_bo *bo = NULL;
+
+	/*
+	 * Following is a set of IOMMU group dependencies taken care of before
+	 * device's IOMMU group is removed
+	 */
+	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+
+		spin_lock(&ttm_bo_glob.lru_lock);
+		list_for_each_entry(bo, &adev->device_bo_list, bo) {
+			if (bo->tbo.ttm)
+				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+		}
+		spin_unlock(&ttm_bo_glob.lru_lock);
+
+		if (adev->irq.ih.use_bus_addr)
+			amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+		if (adev->irq.ih1.use_bus_addr)
+			amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+		if (adev->irq.ih2.use_bus_addr)
+			amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+
+		amdgpu_gart_dummy_page_fini(adev);
+	}
+
+	return NOTIFY_OK;
+}
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 
 	INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
 
+	INIT_LIST_HEAD(&adev->device_bo_list);
+
 	adev->gfx.gfx_off_req_count = 1;
 	adev->pm.ac_power = power_supply_is_system_supplied() > 0;
 
@@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (amdgpu_device_cache_pci_state(adev->pdev))
 		pci_restore_state(pdev);
 
+	BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
+	adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+
+	if (adev->dev->iommu_group) {
+		r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
+		if (r)
+			goto failed;
+	}
+
 	return 0;
 
 failed:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 0db9330..486ad6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
 *
 * Frees the dummy page used by the driver (all asics).
 */
-static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
 {
 	if (!adev->dummy_page_addr)
 		return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index afa2e28..5678d9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
 int amdgpu_gart_init(struct amdgpu_device *adev);
 void amdgpu_gart_fini(struct amdgpu_device *adev);
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
 int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
 		       int pages);
 int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 6cc9919..4a1de69 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
 	}
 	amdgpu_bo_unref(&bo->parent);
 
+	spin_lock(&ttm_bo_glob.lru_lock);
+	list_del(&bo->bo);
+	spin_unlock(&ttm_bo_glob.lru_lock);
+
 	kfree(bo->metadata);
 	kfree(bo);
 }
@@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	if (bp->type == ttm_bo_type_device)
 		bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
 
+	INIT_LIST_HEAD(&bo->bo);
+
+	spin_lock(&ttm_bo_glob.lru_lock);
+	list_add_tail(&bo->bo, &adev->device_bo_list);
+	spin_unlock(&ttm_bo_glob.lru_lock);
+
 	return 0;
 
 fail_unreserve:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 9ac3756..5ae8555 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -110,6 +110,8 @@ struct amdgpu_bo {
 	struct list_head shadow_list;
 
 	struct kgd_mem *kfd_bo;
+
+	struct list_head bo;
 };
 
 static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
On Mon, Jan 18, 2021 at 4:02 PM Andrey Grodzovsky andrey.grodzovsky@amd.com wrote:
Handle all DMA IOMMU gropup related dependencies before the group is removed.

gropup -> group

Alex
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
Handle all DMA IOMMU gropup related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
+static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
+	struct amdgpu_bo *bo = NULL;
+
+	/*
+	 * Following is a set of IOMMU group dependencies taken care of before
+	 * device's IOMMU group is removed
+	 */
+	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+		spin_lock(&ttm_bo_glob.lru_lock);
+		list_for_each_entry(bo, &adev->device_bo_list, bo) {
+			if (bo->tbo.ttm)
+				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+		}
+		spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or even better make sure you can access the device_bo_list without a lock in this moment.
Christian.
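As a rough illustration of the first suggestion, the traversal could take a dedicated sleepable lock instead of ttm's lru_lock. This is only a sketch; the device_bo_list_lock field is hypothetical and not part of the posted patch:

static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
				       unsigned long action, void *data)
{
	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
	struct amdgpu_bo *bo;

	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
		/* a mutex allows ttm_tt_unpopulate() to sleep on IOMMU locks */
		mutex_lock(&adev->device_bo_list_lock);	/* hypothetical field */
		list_for_each_entry(bo, &adev->device_bo_list, bo) {
			if (bo->tbo.ttm)
				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
		}
		mutex_unlock(&adev->device_bo_list_lock);
	}

	return NOTIFY_OK;
}

The list add/del paths in amdgpu_bo_do_create() and amdgpu_bo_destroy() would then have to take the same mutex instead of the lru_lock.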
On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
Handle all DMA IOMMU gropup related dependencies before the group is removed.
+static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
+	struct amdgpu_bo *bo = NULL;
+
+	/*
+	 * Following is a set of IOMMU group dependencies taken care of before
+	 * device's IOMMU group is removed
+	 */
+	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+		spin_lock(&ttm_bo_glob.lru_lock);
+		list_for_each_entry(bo, &adev->device_bo_list, bo) {
+			if (bo->tbo.ttm)
+				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+		}
+		spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or even better make sure you can access the device_bo_list without a lock in this moment.
I'd also be worried about the notifier mutex getting really badly in the way.
Plus I'm worried why we even need this, it sounds a bit like papering over the iommu subsystem. Assuming we clean up all our iommu mappings in our device hotunplug/unload code, why do we still need to have an additional iommu notifier on top, with all kinds of additional headaches? The iommu shouldn't clean up before the devices in its group have cleaned up.
I think we need more info here on what the exact problem is first. -Daniel
On 1/19/21 8:45 AM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
Handle all DMA IOMMU gropup related dependencies before the group is removed.
+	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+		spin_lock(&ttm_bo_glob.lru_lock);
+		list_for_each_entry(bo, &adev->device_bo_list, bo) {
+			if (bo->tbo.ttm)
+				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+		}
+		spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or even better make sure you can access the device_bo_list without a lock in this moment.
I'd also be worried about the notifier mutex getting really badly in the way.
Plus I'm worried why we even need this, it sounds a bit like papering over the iommu subsystem. Assuming we clean up all our iommu mappings in our device hotunplug/unload code, why do we still need to have an additional iommu notifier on top, with all kinds of additional headaches? The iommu shouldn't clean up before the devices in its group have cleaned up.
I think we need more info here on what the exact problem is first. -Daniel
Originally I experienced the crash below on an IOMMU enabled device. It happens after device removal from the PCI topology, during the shutdown of the user client holding the last reference to the drm device file (X in my case). The crash occurs because by the time we get to this point the struct device->iommu_group pointer is already NULL, since the IOMMU group for the device is unset during PCI removal. So this contradicts what you said above, that the iommu shouldn't clean up before the devices in its group have cleaned up. Instead of guessing where the right place is for all the IOMMU related cleanups, it makes sense to get a notification from the IOMMU subsystem in the form of the IOMMU_GROUP_NOTIFY_DEL_DEVICE event and do all the relevant cleanups there.
Andrey
[ 123.810074 < 28.126960>] BUG: kernel NULL pointer dereference, address: 00000000000000c8
[ 123.810080 < 0.000006>] #PF: supervisor read access in kernel mode
[ 123.810082 < 0.000002>] #PF: error_code(0x0000) - not-present page
[ 123.810085 < 0.000003>] PGD 0 P4D 0
[ 123.810089 < 0.000004>] Oops: 0000 [#1] SMP NOPTI
[ 123.810094 < 0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted: G O 5.9.0-rc2-dev+ #59
[ 123.810096 < 0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019
[ 123.810105 < 0.000009>] RIP: 0010:iommu_get_dma_domain+0x10/0x20
[ 123.810108 < 0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3 b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5 <48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48
[ 123.810111 < 0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246
[ 123.810114 < 0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
[ 123.810116 < 0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI: ffff93c259dc60b0
[ 123.810117 < 0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09: 0000000000000000
[ 123.810119 < 0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12: 00000000bf5cb000
[ 123.810121 < 0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15: ffff93c256c05688
[ 123.810124 < 0.000003>] FS: 00007f5e5e8d3700(0000) GS:ffff93c25e940000(0000) knlGS:0000000000000000
[ 123.810126 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 123.810128 < 0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4: 00000000003506e0
[ 123.810130 < 0.000002>] Call Trace:
[ 123.810136 < 0.000006>] __iommu_dma_unmap+0x2e/0x100
[ 123.810141 < 0.000005>] ? kfree+0x389/0x3a0
[ 123.810144 < 0.000003>] iommu_dma_unmap_page+0xe/0x10
[ 123.810149 < 0.000005>] dma_unmap_page_attrs+0x4d/0xf0
[ 123.810159 < 0.000010>] ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm]
[ 123.810165 < 0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm]
[ 123.810252 < 0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu]
[ 123.810258 < 0.000006>] ttm_tt_unpopulate+0x59/0x70 [ttm]
[ 123.810264 < 0.000006>] ttm_tt_destroy+0x6a/0x70 [ttm]
[ 123.810270 < 0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm]
[ 123.810276 < 0.000006>] ttm_bo_put+0x1e7/0x400 [ttm]
[ 123.810358 < 0.000082>] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 123.810440 < 0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu]
[ 123.810459 < 0.000019>] drm_gem_object_free+0x35/0x40 [drm]
[ 123.810476 < 0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm]
[ 123.810494 < 0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm]
[ 123.810511 < 0.000017>] ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm]
[ 123.810516 < 0.000005>] idr_for_each+0x4d/0xd0
[ 123.810534 < 0.000018>] drm_gem_release+0x20/0x30 [drm]
[ 123.810550 < 0.000016>] drm_file_free+0x251/0x2a0 [drm]
[ 123.810567 < 0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm]
[ 123.810583 < 0.000016>] drm_release+0x6a/0xe0 [drm]
[ 123.810588 < 0.000005>] __fput+0xa2/0x250
[ 123.810592 < 0.000004>] ____fput+0xe/0x10
[ 123.810595 < 0.000003>] task_work_run+0x6c/0xa0
[ 123.810600 < 0.000005>] do_exit+0x376/0xb60
[ 123.810604 < 0.000004>] do_group_exit+0x43/0xa0
[ 123.810608 < 0.000004>] get_signal+0x18b/0x8e0
[ 123.810612 < 0.000004>] ? do_futex+0x595/0xc20
[ 123.810617 < 0.000005>] arch_do_signal+0x34/0x880
[ 123.810620 < 0.000003>] ? check_preempt_curr+0x50/0x60
[ 123.810623 < 0.000003>] ? ttwu_do_wakeup+0x1e/0x160
[ 123.810626 < 0.000003>] ? ttwu_do_activate+0x61/0x70
[ 123.810630 < 0.000004>] exit_to_user_mode_prepare+0x124/0x1b0
[ 123.810635 < 0.000005>] syscall_exit_to_user_mode+0x31/0x170
[ 123.810639 < 0.000004>] do_syscall_64+0x43/0x80
On Tue, Jan 19, 2021 at 10:22 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 8:45 AM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
Handle all DMA IOMMU gropup related dependencies before the group is removed.
+	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+		spin_lock(&ttm_bo_glob.lru_lock);
+		list_for_each_entry(bo, &adev->device_bo_list, bo) {
+			if (bo->tbo.ttm)
+				ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+		}
+		spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or even better make sure you can access the device_bo_list without a lock in this moment.
I'd also be worried about the notifier mutex getting really badly in the way.
Plus I'm worried why we even need this, it sounds a bit like papering over the iommu subsystem. Assuming we clean up all our iommu mappings in our device hotunplug/unload code, why do we still need to have an additional iommu notifier on top, with all kinds of additional headaches? The iommu shouldn't clean up before the devices in its group have cleaned up.
I think we need more info here on what the exact problem is first. -Daniel
Originally I experienced the crash below on an IOMMU-enabled device. It happens post device removal from the PCI topology, while shutting down the user client holding the last reference to the drm device file (X in my case). The crash occurs because by the time I get to this point the struct device->iommu_group pointer is already NULL, since the IOMMU group for the device is unset during PCI removal. So this contradicts what you said above, that the iommu shouldn't clean up before the devices in its group have cleaned up. So instead of guessing where the right place is for all the IOMMU-related cleanups, it makes sense to get a notification from the IOMMU subsystem in the form of the IOMMU_GROUP_NOTIFY_DEL_DEVICE event and do all the relevant cleanups there.
Yeah, that goes boom, but you shouldn't need this special iommu cleanup handler. Making sure that all the dma-api mappings are gone needs to be done as part of the device hotunplug; you can't delay that to the last drm_device cleanup.
So most of the patch here, pulling that cleanup out (the notifier should be outright removed from the final release code even), is good; it's just not yet right how you call that new code. Probably these bits (aside from walking all buffers and unpopulating the tt) should be done from the early_free callback you're adding.
Also, what I just realized: for normal unload you need to make sure the hw is actually stopped first, before we unmap buffers. Otherwise driver unload will likely result in wedged hw, probably not what you want for debugging. -Daniel
Andrey
[ 123.810074 < 28.126960>] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 123.810080 < 0.000006>] #PF: supervisor read access in kernel mode [ 123.810082 < 0.000002>] #PF: error_code(0x0000) - not-present page [ 123.810085 < 0.000003>] PGD 0 P4D 0 [ 123.810089 < 0.000004>] Oops: 0000 [#1] SMP NOPTI [ 123.810094 < 0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted: G O 5.9.0-rc2-dev+ #59 [ 123.810096 < 0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019 [ 123.810105 < 0.000009>] RIP: 0010:iommu_get_dma_domain+0x10/0x20 [ 123.810108 < 0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3 b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5 <48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 [ 123.810111 < 0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246 [ 123.810114 < 0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 [ 123.810116 < 0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI: ffff93c259dc60b0 [ 123.810117 < 0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09: 0000000000000000 [ 123.810119 < 0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12: 00000000bf5cb000 [ 123.810121 < 0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15: ffff93c256c05688 [ 123.810124 < 0.000003>] FS: 00007f5e5e8d3700(0000) GS:ffff93c25e940000(0000) knlGS:0000000000000000 [ 123.810126 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.810128 < 0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4: 00000000003506e0 [ 123.810130 < 0.000002>] Call Trace: [ 123.810136 < 0.000006>] __iommu_dma_unmap+0x2e/0x100 [ 123.810141 < 0.000005>] ? kfree+0x389/0x3a0 [ 123.810144 < 0.000003>] iommu_dma_unmap_page+0xe/0x10 [ 123.810149 < 0.000005>] dma_unmap_page_attrs+0x4d/0xf0 [ 123.810159 < 0.000010>] ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm] [ 123.810165 < 0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm] [ 123.810252 < 0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu] [ 123.810258 < 0.000006>] ttm_tt_unpopulate+0x59/0x70 [ttm] [ 123.810264 < 0.000006>] ttm_tt_destroy+0x6a/0x70 [ttm] [ 123.810270 < 0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm] [ 123.810276 < 0.000006>] ttm_bo_put+0x1e7/0x400 [ttm] [ 123.810358 < 0.000082>] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 123.810440 < 0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu] [ 123.810459 < 0.000019>] drm_gem_object_free+0x35/0x40 [drm] [ 123.810476 < 0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm] [ 123.810494 < 0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm] [ 123.810511 < 0.000017>] ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm] [ 123.810516 < 0.000005>] idr_for_each+0x4d/0xd0 [ 123.810534 < 0.000018>] drm_gem_release+0x20/0x30 [drm] [ 123.810550 < 0.000016>] drm_file_free+0x251/0x2a0 [drm] [ 123.810567 < 0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm] [ 123.810583 < 0.000016>] drm_release+0x6a/0xe0 [drm] [ 123.810588 < 0.000005>] __fput+0xa2/0x250 [ 123.810592 < 0.000004>] ____fput+0xe/0x10 [ 123.810595 < 0.000003>] task_work_run+0x6c/0xa0 [ 123.810600 < 0.000005>] do_exit+0x376/0xb60 [ 123.810604 < 0.000004>] do_group_exit+0x43/0xa0 [ 123.810608 < 0.000004>] get_signal+0x18b/0x8e0 [ 123.810612 < 0.000004>] ? do_futex+0x595/0xc20 [ 123.810617 < 0.000005>] arch_do_signal+0x34/0x880 [ 123.810620 < 0.000003>] ? check_preempt_curr+0x50/0x60 [ 123.810623 < 0.000003>] ? ttwu_do_wakeup+0x1e/0x160 [ 123.810626 < 0.000003>] ? 
ttwu_do_activate+0x61/0x70 [ 123.810630 < 0.000004>] exit_to_user_mode_prepare+0x124/0x1b0 [ 123.810635 < 0.000005>] syscall_exit_to_user_mode+0x31/0x170 [ 123.810639 < 0.000004>] do_syscall_64+0x43/0x80
Andrey
Christian.
+ if (adev->irq.ih.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+ if (adev->irq.ih1.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+ if (adev->irq.ih2.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+ amdgpu_gart_dummy_page_fini(adev);
+ }
+ return NOTIFY_OK;
+}
/**
* amdgpu_device_init - initialize the driver
@@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+ INIT_LIST_HEAD(&adev->device_bo_list);
adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0;
@@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev, if (amdgpu_device_cache_pci_state(adev->pdev)) pci_restore_state(pdev);
+ BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
+ adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+ if (adev->dev->iommu_group) {
+ r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
+ if (r)
+ goto failed;
+ }
return 0; failed:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 0db9330..486ad6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
* Frees the dummy page used by the driver (all asics).
*/ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index afa2e28..5678d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6cc9919..4a1de69 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) } amdgpu_bo_unref(&bo->parent);
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_del(&bo->bo);
+ spin_unlock(&ttm_bo_glob.lru_lock);
kfree(bo->metadata); kfree(bo); }
@@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, if (bp->type == ttm_bo_type_device) bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+ INIT_LIST_HEAD(&bo->bo);
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_add_tail(&bo->bo, &adev->device_bo_list);
+ spin_unlock(&ttm_bo_glob.lru_lock);
return 0; fail_unreserve:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 9ac3756..5ae8555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -110,6 +110,8 @@ struct amdgpu_bo { struct list_head shadow_list; struct kgd_mem *kfd_bo;
+ struct list_head bo; }; static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
On 1/19/21 5:01 PM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 10:22 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 8:45 AM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
Handle all DMA IOMMU group-related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++ 6 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 478a7d8..2953420 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -51,6 +51,7 @@ #include <linux/dma-fence.h> #include <linux/pci.h> #include <linux/aer.h> +#include <linux/notifier.h> #include <drm/ttm/ttm_bo_api.h> #include <drm/ttm/ttm_bo_driver.h> @@ -1041,6 +1042,10 @@ struct amdgpu_device { bool in_pci_err_recovery; struct pci_saved_state *pci_state;
+ struct notifier_block nb;
+ struct blocking_notifier_head notifier;
+ struct list_head device_bo_list; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 45e23e3..e99f4f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -70,6 +70,8 @@ #include <drm/task_barrier.h> #include <linux/pm_runtime.h> +#include <linux/iommu.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = { }; +static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
+{
+ struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
+ struct amdgpu_bo *bo = NULL;
+ /*
+ * Following is a set of IOMMU group dependencies taken care of before
+ * device's IOMMU group is removed
+ */
+ if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_for_each_entry(bo, &adev->device_bo_list, bo) {
+ if (bo->tbo.ttm)
+ ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+ }
+ spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or, even better, make sure you can access the device_bo_list without a lock at this point.
I'd also be worried about the notifier mutex getting really badly in the way.
Plus I'm worried why we even need this, it sounds a bit like papering over the iommu subsystem. Assuming we clean up all our iommu mappings in our device hotunplug/unload code, why do we still need to have an additional iommu notifier on top, with all kinds of additional headaches? The iommu shouldn't clean up before the devices in its group have cleaned up.
I think we need more info here on what the exact problem is first. -Daniel
Originally I experienced the crash below on an IOMMU-enabled device. It happens post device removal from the PCI topology, while shutting down the user client holding the last reference to the drm device file (X in my case). The crash occurs because by the time I get to this point the struct device->iommu_group pointer is already NULL, since the IOMMU group for the device is unset during PCI removal. So this contradicts what you said above, that the iommu shouldn't clean up before the devices in its group have cleaned up. So instead of guessing where the right place is for all the IOMMU-related cleanups, it makes sense to get a notification from the IOMMU subsystem in the form of the IOMMU_GROUP_NOTIFY_DEL_DEVICE event and do all the relevant cleanups there.
Yeah, that goes boom, but you shouldn't need this special iommu cleanup handler. Making sure that all the dma-api mappings are gone needs to be done as part of the device hotunplug; you can't delay that to the last drm_device cleanup.
So most of the patch here, pulling that cleanup out (the notifier should be outright removed from the final release code even), is good; it's just not yet right how you call that new code. Probably these bits (aside from walking all buffers and unpopulating the tt) should be done from the early_free callback you're adding.
Also, what I just realized: for normal unload you need to make sure the hw is actually stopped first, before we unmap buffers. Otherwise driver unload will likely result in wedged hw, probably not what you want for debugging. -Daniel
Since device removal from the IOMMU group, and this hook in particular, takes place before the call to amdgpu_pci_remove, it essentially means that for the IOMMU use case the entire amdgpu_device_fini_hw function should be called here to stop the HW, instead of from amdgpu_pci_remove.
Looking at this from another perspective, AFAIK on each new device probe, whether due to a PCI bus rescan or a driver reload, we reset the ASIC before doing any init operations (assuming we successfully gained MMIO access), so maybe your concern is not an issue?
Andrey
Andrey
[ 123.810074 < 28.126960>] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 123.810080 < 0.000006>] #PF: supervisor read access in kernel mode [ 123.810082 < 0.000002>] #PF: error_code(0x0000) - not-present page [ 123.810085 < 0.000003>] PGD 0 P4D 0 [ 123.810089 < 0.000004>] Oops: 0000 [#1] SMP NOPTI [ 123.810094 < 0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted: G O 5.9.0-rc2-dev+ #59 [ 123.810096 < 0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019 [ 123.810105 < 0.000009>] RIP: 0010:iommu_get_dma_domain+0x10/0x20 [ 123.810108 < 0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3 b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5 <48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 [ 123.810111 < 0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246 [ 123.810114 < 0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 [ 123.810116 < 0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI: ffff93c259dc60b0 [ 123.810117 < 0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09: 0000000000000000 [ 123.810119 < 0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12: 00000000bf5cb000 [ 123.810121 < 0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15: ffff93c256c05688 [ 123.810124 < 0.000003>] FS: 00007f5e5e8d3700(0000) GS:ffff93c25e940000(0000) knlGS:0000000000000000 [ 123.810126 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.810128 < 0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4: 00000000003506e0 [ 123.810130 < 0.000002>] Call Trace: [ 123.810136 < 0.000006>] __iommu_dma_unmap+0x2e/0x100 [ 123.810141 < 0.000005>] ? kfree+0x389/0x3a0 [ 123.810144 < 0.000003>] iommu_dma_unmap_page+0xe/0x10 [ 123.810149 < 0.000005>] dma_unmap_page_attrs+0x4d/0xf0 [ 123.810159 < 0.000010>] ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm] [ 123.810165 < 0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm] [ 123.810252 < 0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu] [ 123.810258 < 0.000006>] ttm_tt_unpopulate+0x59/0x70 [ttm] [ 123.810264 < 0.000006>] ttm_tt_destroy+0x6a/0x70 [ttm] [ 123.810270 < 0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm] [ 123.810276 < 0.000006>] ttm_bo_put+0x1e7/0x400 [ttm] [ 123.810358 < 0.000082>] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 123.810440 < 0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu] [ 123.810459 < 0.000019>] drm_gem_object_free+0x35/0x40 [drm] [ 123.810476 < 0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm] [ 123.810494 < 0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm] [ 123.810511 < 0.000017>] ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm] [ 123.810516 < 0.000005>] idr_for_each+0x4d/0xd0 [ 123.810534 < 0.000018>] drm_gem_release+0x20/0x30 [drm] [ 123.810550 < 0.000016>] drm_file_free+0x251/0x2a0 [drm] [ 123.810567 < 0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm] [ 123.810583 < 0.000016>] drm_release+0x6a/0xe0 [drm] [ 123.810588 < 0.000005>] __fput+0xa2/0x250 [ 123.810592 < 0.000004>] ____fput+0xe/0x10 [ 123.810595 < 0.000003>] task_work_run+0x6c/0xa0 [ 123.810600 < 0.000005>] do_exit+0x376/0xb60 [ 123.810604 < 0.000004>] do_group_exit+0x43/0xa0 [ 123.810608 < 0.000004>] get_signal+0x18b/0x8e0 [ 123.810612 < 0.000004>] ? do_futex+0x595/0xc20 [ 123.810617 < 0.000005>] arch_do_signal+0x34/0x880 [ 123.810620 < 0.000003>] ? check_preempt_curr+0x50/0x60 [ 123.810623 < 0.000003>] ? ttwu_do_wakeup+0x1e/0x160 [ 123.810626 < 0.000003>] ? 
ttwu_do_activate+0x61/0x70 [ 123.810630 < 0.000004>] exit_to_user_mode_prepare+0x124/0x1b0 [ 123.810635 < 0.000005>] syscall_exit_to_user_mode+0x31/0x170 [ 123.810639 < 0.000004>] do_syscall_64+0x43/0x80
Andrey
Christian.
+ if (adev->irq.ih.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+ if (adev->irq.ih1.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+ if (adev->irq.ih2.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+ amdgpu_gart_dummy_page_fini(adev);
+ }
+ return NOTIFY_OK;
+}
/**
* amdgpu_device_init - initialize the driver
@@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+ INIT_LIST_HEAD(&adev->device_bo_list);
adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0;
@@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev, if (amdgpu_device_cache_pci_state(adev->pdev)) pci_restore_state(pdev);
+ BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
+ adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+ if (adev->dev->iommu_group) {
+ r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
+ if (r)
+ goto failed;
+ }
return 0; failed:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 0db9330..486ad6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). */ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index afa2e28..5678d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6cc9919..4a1de69 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) } amdgpu_bo_unref(&bo->parent);
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_del(&bo->bo);
+ spin_unlock(&ttm_bo_glob.lru_lock);
kfree(bo->metadata); kfree(bo); }
@@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, if (bp->type == ttm_bo_type_device) bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+ INIT_LIST_HEAD(&bo->bo);
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_add_tail(&bo->bo, &adev->device_bo_list);
+ spin_unlock(&ttm_bo_glob.lru_lock);
return 0; fail_unreserve:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 9ac3756..5ae8555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -110,6 +110,8 @@ struct amdgpu_bo { struct list_head shadow_list; struct kgd_mem *kfd_bo;
+ struct list_head bo; }; static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
On Wed, Jan 20, 2021 at 5:21 AM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 5:01 PM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 10:22 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 8:45 AM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 09:48:03AM +0100, Christian König wrote:
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
Handle all DMA IOMMU group-related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++ 6 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 478a7d8..2953420 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -51,6 +51,7 @@ #include <linux/dma-fence.h> #include <linux/pci.h> #include <linux/aer.h> +#include <linux/notifier.h> #include <drm/ttm/ttm_bo_api.h> #include <drm/ttm/ttm_bo_driver.h> @@ -1041,6 +1042,10 @@ struct amdgpu_device { bool in_pci_err_recovery; struct pci_saved_state *pci_state;
+ struct notifier_block nb;
+ struct blocking_notifier_head notifier;
+ struct list_head device_bo_list; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 45e23e3..e99f4f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -70,6 +70,8 @@ #include <drm/task_barrier.h> #include <linux/pm_runtime.h> +#include <linux/iommu.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = { }; +static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
+{
+ struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
+ struct amdgpu_bo *bo = NULL;
+ /*
+ * Following is a set of IOMMU group dependencies taken care of before
+ * device's IOMMU group is removed
+ */
+ if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_for_each_entry(bo, &adev->device_bo_list, bo) {
+ if (bo->tbo.ttm)
+ ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+ }
+ spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or, even better, make sure you can access the device_bo_list without a lock at this point.
I'd also be worried about the notifier mutex getting really badly in the way.
Plus I'm worried why we even need this, it sounds a bit like papering over the iommu subsystem. Assuming we clean up all our iommu mappings in our device hotunplug/unload code, why do we still need to have an additional iommu notifier on top, with all kinds of additional headaches? The iommu shouldn't clean up before the devices in its group have cleaned up.
I think we need more info here on what the exact problem is first. -Daniel
Originally I experienced the crash below on an IOMMU-enabled device. It happens post device removal from the PCI topology, while shutting down the user client holding the last reference to the drm device file (X in my case). The crash occurs because by the time I get to this point the struct device->iommu_group pointer is already NULL, since the IOMMU group for the device is unset during PCI removal. So this contradicts what you said above, that the iommu shouldn't clean up before the devices in its group have cleaned up. So instead of guessing where the right place is for all the IOMMU-related cleanups, it makes sense to get a notification from the IOMMU subsystem in the form of the IOMMU_GROUP_NOTIFY_DEL_DEVICE event and do all the relevant cleanups there.
Yeah, that goes boom, but you shouldn't need this special iommu cleanup handler. Making sure that all the dma-api mappings are gone needs to be done as part of the device hotunplug; you can't delay that to the last drm_device cleanup.
So most of the patch here, pulling that cleanup out (the notifier should be outright removed from the final release code even), is good; it's just not yet right how you call that new code. Probably these bits (aside from walking all buffers and unpopulating the tt) should be done from the early_free callback you're adding.
Also, what I just realized: for normal unload you need to make sure the hw is actually stopped first, before we unmap buffers. Otherwise driver unload will likely result in wedged hw, probably not what you want for debugging. -Daniel
Since device removal from the IOMMU group, and this hook in particular, takes place before the call to amdgpu_pci_remove, it essentially means that for the IOMMU use case the entire amdgpu_device_fini_hw function should be called here to stop the HW, instead of from amdgpu_pci_remove.
The crash you showed was on final drm_close, which should happen after device removal, so that's clearly buggy. If the iommu subsystem removes stuff before the driver has cleaned up, then I think that's an iommu bug or dma-api bug. Plain use of dma_map/unmap and friends really shouldn't need notifier hacks like you're implementing here. Can you please show me a backtrace where dma_unmap_sg blows up when it's put into the pci_driver remove callback?
Looking at this from another perspective, AFAIK on each new device probe, whether due to a PCI bus rescan or a driver reload, we reset the ASIC before doing any init operations (assuming we successfully gained MMIO access), so maybe your concern is not an issue?
Reset on probe is too late. The problem is that if you just remove the driver, your device is doing dma at that moment, and you kinda have to stop that before you free the mappings/memory. Of course when the device is actually hotunplugged, dma is guaranteed to have stopped already. I'm not sure whether disabling the pci device is enough to make sure no more dma happens; could be that's enough. -Daniel
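To make that ordering concrete, here is a rough sketch of a remove callback that stops the hardware before any DMA mappings are torn down. It reuses names already discussed in this thread (amdgpu_device_fini_hw is the hw-stop function Andrey mentions above; drm_dev_unplug() is the DRM core call that makes later drm_dev_enter() calls fail), but the exact split and ordering shown here is an assumption, not the final code:

static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *ddev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = drm_to_adev(ddev);

	/* Make drm_dev_enter() fail from now on, so no new ioctl or
	 * page fault handler starts touching the hardware. */
	drm_dev_unplug(ddev);

	/* Stop the hardware first: no DMA may be in flight when the
	 * mappings are released below. */
	amdgpu_device_fini_hw(adev);

	/* Only now tear down DMA mappings and IOMMU-backed memory,
	 * while device->iommu_group is still valid. */
}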
Andrey
Andrey
[ 123.810074 < 28.126960>] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 123.810080 < 0.000006>] #PF: supervisor read access in kernel mode [ 123.810082 < 0.000002>] #PF: error_code(0x0000) - not-present page [ 123.810085 < 0.000003>] PGD 0 P4D 0 [ 123.810089 < 0.000004>] Oops: 0000 [#1] SMP NOPTI [ 123.810094 < 0.000005>] CPU: 5 PID: 1418 Comm: Xorg:shlo4 Tainted: G O 5.9.0-rc2-dev+ #59 [ 123.810096 < 0.000002>] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 4406 02/28/2019 [ 123.810105 < 0.000009>] RIP: 0010:iommu_get_dma_domain+0x10/0x20 [ 123.810108 < 0.000003>] Code: b0 48 c7 87 98 00 00 00 00 00 00 00 31 c0 c3 b8 f4 ff ff ff eb a6 0f 1f 40 00 0f 1f 44 00 00 48 8b 87 d0 02 00 00 55 48 89 e5 <48> 8b 80 c8 00 00 00 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 [ 123.810111 < 0.000003>] RSP: 0018:ffffa2e201f7f980 EFLAGS: 00010246 [ 123.810114 < 0.000003>] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 [ 123.810116 < 0.000002>] RDX: 0000000000001000 RSI: 00000000bf5cb000 RDI: ffff93c259dc60b0 [ 123.810117 < 0.000001>] RBP: ffffa2e201f7f980 R08: 0000000000000000 R09: 0000000000000000 [ 123.810119 < 0.000002>] R10: ffffa2e201f7faf0 R11: 0000000000000001 R12: 00000000bf5cb000 [ 123.810121 < 0.000002>] R13: 0000000000001000 R14: ffff93c24cef9c50 R15: ffff93c256c05688 [ 123.810124 < 0.000003>] FS: 00007f5e5e8d3700(0000) GS:ffff93c25e940000(0000) knlGS:0000000000000000 [ 123.810126 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.810128 < 0.000002>] CR2: 00000000000000c8 CR3: 000000027fe0a000 CR4: 00000000003506e0 [ 123.810130 < 0.000002>] Call Trace: [ 123.810136 < 0.000006>] __iommu_dma_unmap+0x2e/0x100 [ 123.810141 < 0.000005>] ? kfree+0x389/0x3a0 [ 123.810144 < 0.000003>] iommu_dma_unmap_page+0xe/0x10 [ 123.810149 < 0.000005>] dma_unmap_page_attrs+0x4d/0xf0 [ 123.810159 < 0.000010>] ? ttm_bo_del_from_lru+0x8e/0xb0 [ttm] [ 123.810165 < 0.000006>] ttm_unmap_and_unpopulate_pages+0x8e/0xc0 [ttm] [ 123.810252 < 0.000087>] amdgpu_ttm_tt_unpopulate+0xaa/0xd0 [amdgpu] [ 123.810258 < 0.000006>] ttm_tt_unpopulate+0x59/0x70 [ttm] [ 123.810264 < 0.000006>] ttm_tt_destroy+0x6a/0x70 [ttm] [ 123.810270 < 0.000006>] ttm_bo_cleanup_memtype_use+0x36/0xa0 [ttm] [ 123.810276 < 0.000006>] ttm_bo_put+0x1e7/0x400 [ttm] [ 123.810358 < 0.000082>] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 123.810440 < 0.000082>] amdgpu_gem_object_free+0x37/0x50 [amdgpu] [ 123.810459 < 0.000019>] drm_gem_object_free+0x35/0x40 [drm] [ 123.810476 < 0.000017>] drm_gem_object_handle_put_unlocked+0x9d/0xd0 [drm] [ 123.810494 < 0.000018>] drm_gem_object_release_handle+0x74/0x90 [drm] [ 123.810511 < 0.000017>] ? drm_gem_object_handle_put_unlocked+0xd0/0xd0 [drm] [ 123.810516 < 0.000005>] idr_for_each+0x4d/0xd0 [ 123.810534 < 0.000018>] drm_gem_release+0x20/0x30 [drm] [ 123.810550 < 0.000016>] drm_file_free+0x251/0x2a0 [drm] [ 123.810567 < 0.000017>] drm_close_helper.isra.14+0x61/0x70 [drm] [ 123.810583 < 0.000016>] drm_release+0x6a/0xe0 [drm] [ 123.810588 < 0.000005>] __fput+0xa2/0x250 [ 123.810592 < 0.000004>] ____fput+0xe/0x10 [ 123.810595 < 0.000003>] task_work_run+0x6c/0xa0 [ 123.810600 < 0.000005>] do_exit+0x376/0xb60 [ 123.810604 < 0.000004>] do_group_exit+0x43/0xa0 [ 123.810608 < 0.000004>] get_signal+0x18b/0x8e0 [ 123.810612 < 0.000004>] ? do_futex+0x595/0xc20 [ 123.810617 < 0.000005>] arch_do_signal+0x34/0x880 [ 123.810620 < 0.000003>] ? check_preempt_curr+0x50/0x60 [ 123.810623 < 0.000003>] ? ttwu_do_wakeup+0x1e/0x160 [ 123.810626 < 0.000003>] ? 
ttwu_do_activate+0x61/0x70 [ 123.810630 < 0.000004>] exit_to_user_mode_prepare+0x124/0x1b0 [ 123.810635 < 0.000005>] syscall_exit_to_user_mode+0x31/0x170 [ 123.810639 < 0.000004>] do_syscall_64+0x43/0x80
Andrey
Christian.
+ if (adev->irq.ih.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+ if (adev->irq.ih1.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+ if (adev->irq.ih2.use_bus_addr)
+ amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+ amdgpu_gart_dummy_page_fini(adev);
+ }
+ return NOTIFY_OK;
+}
/**
* amdgpu_device_init - initialize the driver
@@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+ INIT_LIST_HEAD(&adev->device_bo_list);
adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0;
@@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev, if (amdgpu_device_cache_pci_state(adev->pdev)) pci_restore_state(pdev);
+ BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
+ adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+ if (adev->dev->iommu_group) {
+ r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
+ if (r)
+ goto failed;
+ }
return 0; failed:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 0db9330..486ad6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). */ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index afa2e28..5678d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6cc9919..4a1de69 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) } amdgpu_bo_unref(&bo->parent);
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_del(&bo->bo);
+ spin_unlock(&ttm_bo_glob.lru_lock);
kfree(bo->metadata); kfree(bo); }
@@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, if (bp->type == ttm_bo_type_device) bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+ INIT_LIST_HEAD(&bo->bo);
+ spin_lock(&ttm_bo_glob.lru_lock);
+ list_add_tail(&bo->bo, &adev->device_bo_list);
+ spin_unlock(&ttm_bo_glob.lru_lock);
return 0; fail_unreserve:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 9ac3756..5ae8555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -110,6 +110,8 @@ struct amdgpu_bo { struct list_head shadow_list; struct kgd_mem *kfd_bo;
+ struct list_head bo; }; static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
On 1/19/21 3:48 AM, Christian König wrote:
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
Handle all DMA IOMMU group-related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++ 6 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 478a7d8..2953420 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -51,6 +51,7 @@ #include <linux/dma-fence.h> #include <linux/pci.h> #include <linux/aer.h> +#include <linux/notifier.h> #include <drm/ttm/ttm_bo_api.h> #include <drm/ttm/ttm_bo_driver.h> @@ -1041,6 +1042,10 @@ struct amdgpu_device { bool in_pci_err_recovery; struct pci_saved_state *pci_state;
+ struct notifier_block nb; + struct blocking_notifier_head notifier; + struct list_head device_bo_list; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 45e23e3..e99f4f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -70,6 +70,8 @@ #include <drm/task_barrier.h> #include <linux/pm_runtime.h> +#include <linux/iommu.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); @@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = { }; +static int amdgpu_iommu_group_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb); + struct amdgpu_bo *bo = NULL;
+ /* + * Following is a set of IOMMU group dependencies taken care of before + * device's IOMMU group is removed + */ + if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+ spin_lock(&ttm_bo_glob.lru_lock); + list_for_each_entry(bo, &adev->device_bo_list, bo) { + if (bo->tbo.ttm) + ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm); + } + spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or, even better, make sure you can access the device_bo_list without a lock at this point.
Christian.
I can think of switching to an RCU list? Otherwise, elements are added on BO create and deleted on BO destroy; how can I prevent any of those from happening while in this section, besides a mutex? Make a copy of the list and iterate over that instead?
Andrey
+ if (adev->irq.ih.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih); + if (adev->irq.ih1.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih1); + if (adev->irq.ih2.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+ amdgpu_gart_dummy_page_fini(adev); + }
+ return NOTIFY_OK; +}
/** * amdgpu_device_init - initialize the driver * @@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func); + INIT_LIST_HEAD(&adev->device_bo_list);
adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0; @@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev, if (amdgpu_device_cache_pci_state(adev->pdev)) pci_restore_state(pdev); + BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier); + adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+ if (adev->dev->iommu_group) { + r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb); + if (r) + goto failed; + }
return 0; failed: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 0db9330..486ad6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). */ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index afa2e28..5678d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6cc9919..4a1de69 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) } amdgpu_bo_unref(&bo->parent); + spin_lock(&ttm_bo_glob.lru_lock); + list_del(&bo->bo); + spin_unlock(&ttm_bo_glob.lru_lock);
kfree(bo->metadata); kfree(bo); } @@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, if (bp->type == ttm_bo_type_device) bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; + INIT_LIST_HEAD(&bo->bo);
+ spin_lock(&ttm_bo_glob.lru_lock); + list_add_tail(&bo->bo, &adev->device_bo_list); + spin_unlock(&ttm_bo_glob.lru_lock);
return 0; fail_unreserve: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 9ac3756..5ae8555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -110,6 +110,8 @@ struct amdgpu_bo { struct list_head shadow_list; struct kgd_mem *kfd_bo;
+ struct list_head bo; }; static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
Ping
Andrey
On 1/20/21 12:01 AM, Andrey Grodzovsky wrote:
On 1/19/21 3:48 AM, Christian König wrote:
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
Handle all DMA IOMMU group-related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++ 6 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 478a7d8..2953420 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -51,6 +51,7 @@ #include <linux/dma-fence.h> #include <linux/pci.h> #include <linux/aer.h> +#include <linux/notifier.h> #include <drm/ttm/ttm_bo_api.h> #include <drm/ttm/ttm_bo_driver.h> @@ -1041,6 +1042,10 @@ struct amdgpu_device { bool in_pci_err_recovery; struct pci_saved_state *pci_state;
+ struct notifier_block nb; + struct blocking_notifier_head notifier; + struct list_head device_bo_list; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 45e23e3..e99f4f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -70,6 +70,8 @@ #include <drm/task_barrier.h> #include <linux/pm_runtime.h> +#include <linux/iommu.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); @@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = { }; +static int amdgpu_iommu_group_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb); + struct amdgpu_bo *bo = NULL;
+ /* + * Following is a set of IOMMU group dependencies taken care of before + * device's IOMMU group is removed + */ + if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+ spin_lock(&ttm_bo_glob.lru_lock); + list_for_each_entry(bo, &adev->device_bo_list, bo) { + if (bo->tbo.ttm) + ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm); + } + spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or, even better, make sure you can access the device_bo_list without a lock at this point.
Christian.
I can think of switching to an RCU list? Otherwise, elements are added on BO create and deleted on BO destroy; how can I prevent any of those from happening while in this section, besides a mutex? Make a copy of the list and iterate over that instead?
Andrey
+ if (adev->irq.ih.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih); + if (adev->irq.ih1.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih1); + if (adev->irq.ih2.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+ amdgpu_gart_dummy_page_fini(adev); + }
+ return NOTIFY_OK; +}
/** * amdgpu_device_init - initialize the driver * @@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func); + INIT_LIST_HEAD(&adev->device_bo_list);
adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0; @@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev, if (amdgpu_device_cache_pci_state(adev->pdev)) pci_restore_state(pdev); + BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier); + adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+ if (adev->dev->iommu_group) { + r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb); + if (r) + goto failed; + }
return 0; failed: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 0db9330..486ad6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). */ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index afa2e28..5678d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6cc9919..4a1de69 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) } amdgpu_bo_unref(&bo->parent); + spin_lock(&ttm_bo_glob.lru_lock); + list_del(&bo->bo); + spin_unlock(&ttm_bo_glob.lru_lock);
kfree(bo->metadata); kfree(bo); } @@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, if (bp->type == ttm_bo_type_device) bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; + INIT_LIST_HEAD(&bo->bo);
+ spin_lock(&ttm_bo_glob.lru_lock); + list_add_tail(&bo->bo, &adev->device_bo_list); + spin_unlock(&ttm_bo_glob.lru_lock);
return 0; fail_unreserve: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 9ac3756..5ae8555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -110,6 +110,8 @@ struct amdgpu_bo { struct list_head shadow_list; struct kgd_mem *kfd_bo;
+ struct list_head bo; }; static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
Am 20.01.21 um 20:38 schrieb Andrey Grodzovsky:
Ping
Andrey
On 1/20/21 12:01 AM, Andrey Grodzovsky wrote:
On 1/19/21 3:48 AM, Christian König wrote:
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
Handle all DMA IOMMU group-related dependencies before the group is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++ 6 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 478a7d8..2953420 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -51,6 +51,7 @@ #include <linux/dma-fence.h> #include <linux/pci.h> #include <linux/aer.h> +#include <linux/notifier.h> #include <drm/ttm/ttm_bo_api.h> #include <drm/ttm/ttm_bo_driver.h> @@ -1041,6 +1042,10 @@ struct amdgpu_device { bool in_pci_err_recovery; struct pci_saved_state *pci_state;
+ struct notifier_block nb; + struct blocking_notifier_head notifier; + struct list_head device_bo_list; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 45e23e3..e99f4f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -70,6 +70,8 @@ #include <drm/task_barrier.h> #include <linux/pm_runtime.h> +#include <linux/iommu.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); @@ -3200,6 +3202,39 @@ static const struct attribute *amdgpu_dev_attributes[] = { }; +static int amdgpu_iommu_group_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb); + struct amdgpu_bo *bo = NULL;
+ /* + * Following is a set of IOMMU group dependencies taken care of before + * device's IOMMU group is removed + */ + if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+ spin_lock(&ttm_bo_glob.lru_lock); + list_for_each_entry(bo, &adev->device_bo_list, bo) { + if (bo->tbo.ttm) + ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm); + } + spin_unlock(&ttm_bo_glob.lru_lock);
That approach won't work. ttm_tt_unpopulate() might sleep on an IOMMU lock.
You need to use a mutex here or, even better, make sure you can access the device_bo_list without a lock at this point.
Christian.
I can think of switching to an RCU list? Otherwise, elements are added on BO create and deleted on BO destroy; how can I prevent any of those from happening while in this section, besides a mutex? Make a copy of the list and iterate over that instead?
RCU won't work since the BO is not RCU protected.
What you can try something like this:
spin_lock(&ttm_bo_glob.lru_lock);
while (list_not_empty(&adev->device_bo_list)) {
	bo = list_first_entry(&adev->device_bo_list);
	list_del(bo->...);
	spin_unlock(&ttm_bo_glob.lru_lock);
	ttm_tt_unpopulate(bo);
	spin_lock(&ttm_bo_glob.lru_lock);
}
...
Regards, Christian.
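For reference, a compilable version of that sketch with the kernel spellings filled in: !list_empty() for list_not_empty(), the bo list member this patch adds to struct amdgpu_bo, and the two-argument ttm_tt_unpopulate() the patch already uses. Like the original sketch, it assumes the BO cannot be freed while the lock is dropped (e.g. because a reference is held):

spin_lock(&ttm_bo_glob.lru_lock);
while (!list_empty(&adev->device_bo_list)) {
	bo = list_first_entry(&adev->device_bo_list, struct amdgpu_bo, bo);
	/* Unlink before dropping the lock so the loop always makes progress. */
	list_del_init(&bo->bo);
	spin_unlock(&ttm_bo_glob.lru_lock);

	/* Sleeping is fine here, e.g. on an IOMMU lock. */
	if (bo->tbo.ttm)
		ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);

	spin_lock(&ttm_bo_glob.lru_lock);
}
spin_unlock(&ttm_bo_glob.lru_lock);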
Andrey
+ if (adev->irq.ih.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih); + if (adev->irq.ih1.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih1); + if (adev->irq.ih2.use_bus_addr) + amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+ amdgpu_gart_dummy_page_fini(adev); + }
+ return NOTIFY_OK; +}
/** * amdgpu_device_init - initialize the driver * @@ -3304,6 +3339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func); + INIT_LIST_HEAD(&adev->device_bo_list);
adev->gfx.gfx_off_req_count = 1; adev->pm.ac_power = power_supply_is_system_supplied() > 0; @@ -3575,6 +3612,15 @@ int amdgpu_device_init(struct amdgpu_device *adev, if (amdgpu_device_cache_pci_state(adev->pdev)) pci_restore_state(pdev); + BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier); + adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+ if (adev->dev->iommu_group) { + r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb); + if (r) + goto failed; + }
return 0; failed: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index 0db9330..486ad6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). */ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index afa2e28..5678d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6cc9919..4a1de69 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) } amdgpu_bo_unref(&bo->parent); + spin_lock(&ttm_bo_glob.lru_lock); + list_del(&bo->bo); + spin_unlock(&ttm_bo_glob.lru_lock);
kfree(bo->metadata); kfree(bo); } @@ -613,6 +617,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, if (bp->type == ttm_bo_type_device) bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; + INIT_LIST_HEAD(&bo->bo);
+ spin_lock(&ttm_bo_glob.lru_lock); + list_add_tail(&bo->bo, &adev->device_bo_list); + spin_unlock(&ttm_bo_glob.lru_lock);
return 0; fail_unreserve: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 9ac3756..5ae8555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -110,6 +110,8 @@ struct amdgpu_bo { struct list_head shadow_list; struct kgd_mem *kfd_bo;
+ struct list_head bo; }; static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
We can't allocate and submit IBs post device unplug.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index ad91c0c..5096351 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -31,6 +31,7 @@ #include <linux/dma-buf.h>
#include <drm/amdgpu_drm.h> +#include <drm/drm_drv.h> #include "amdgpu.h" #include "amdgpu_trace.h" #include "amdgpu_amdkfd.h" @@ -1604,7 +1605,10 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, struct amdgpu_vm_update_params params; enum amdgpu_sync_mode sync_mode; uint64_t pfn; - int r; + int r, idx; + + if (!drm_dev_enter(&adev->ddev, &idx)) + return -ENOENT;
memset(&params, 0, sizeof(params)); params.adev = adev; @@ -1647,6 +1651,8 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, if (r) goto error_unlock;
+ + drm_dev_exit(idx); do { uint64_t tmp, num_entries, addr;
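For readers new to the pattern: drm_dev_enter()/drm_dev_exit() bracket an SRCU read-side section that starts failing once the DRM core has been told the device is gone (via drm_dev_unplug()). A minimal usage sketch, keeping the error code as in the patch (Christian questions that choice below):

	int idx;

	if (!drm_dev_enter(&adev->ddev, &idx))
		return -ENOENT;	/* device unplugged, don't touch it */

	/* ... code that allocates and submits IBs / touches the hardware ... */

	drm_dev_exit(idx);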
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
We can't allocate and submit IBs post device unplug.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index ad91c0c..5096351 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -31,6 +31,7 @@ #include <linux/dma-buf.h>
#include <drm/amdgpu_drm.h> +#include <drm/drm_drv.h> #include "amdgpu.h" #include "amdgpu_trace.h" #include "amdgpu_amdkfd.h" @@ -1604,7 +1605,10 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, struct amdgpu_vm_update_params params; enum amdgpu_sync_mode sync_mode; uint64_t pfn;
- int r;
+ int r, idx;
+
+ if (!drm_dev_enter(&adev->ddev, &idx))
+ return -ENOENT;
Why not -ENODEV?
memset(¶ms, 0, sizeof(params)); params.adev = adev; @@ -1647,6 +1651,8 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, if (r) goto error_unlock;
+
+	drm_dev_exit(idx);
That's too early. You probably need to do this much further below, after the commit.
Christian.
do { uint64_t tmp, num_entries, addr;
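A sketch of the reordering Christian asks for above, assuming the shape of amdgpu_vm_bo_update_mapping(): the enter/exit section has to stay open across the whole commit loop, because the loop still writes device-backed page tables. All demo_* names are hypothetical:

#include <drm/drm_drv.h>

static bool demo_more_entries(void)
{
	return false;	/* placeholder loop condition */
}

static int demo_update_mapping(struct drm_device *dev)
{
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return -ENOENT;

	/* prepare update parameters, sync, etc. */

	do {
		/* cut the range and commit the page table updates;
		 * this still touches device memory, so it must run
		 * inside the bracket */
	} while (demo_more_entries());

	drm_dev_exit(idx);	/* only after the commit is done */
	return 0;
}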
On device removal, reroute all CPU mappings to a dummy page, per drm_file instance or imported GEM object.
v4: Update for modified ttm_bo_vm_dummy_page
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 9fd2157..550dc5e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -49,6 +49,7 @@
#include <drm/drm_debugfs.h> #include <drm/amdgpu_drm.h> +#include <drm/drm_drv.h>
#include "amdgpu.h" #include "amdgpu_object.h" @@ -1982,18 +1983,28 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) static vm_fault_t amdgpu_ttm_fault(struct vm_fault *vmf) { struct ttm_buffer_object *bo = vmf->vma->vm_private_data; + struct drm_device *ddev = bo->base.dev; vm_fault_t ret; + int idx;
ret = ttm_bo_vm_reserve(bo, vmf); if (ret) return ret;
- ret = amdgpu_bo_fault_reserve_notify(bo); - if (ret) - goto unlock; + if (drm_dev_enter(ddev, &idx)) { + ret = amdgpu_bo_fault_reserve_notify(bo); + if (ret) { + drm_dev_exit(idx); + goto unlock; + }
- ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot, - TTM_BO_VM_NUM_PREFAULT, 1); + ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot, + TTM_BO_VM_NUM_PREFAULT, 1); + + drm_dev_exit(idx); + } else { + ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot); + } if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) return ret;
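Spelled out as a standalone sketch (the amdgpu-specific fault_reserve_notify step and its error path are elided), the fault handler after this patch has the following shape: while the device is alive, the normal TTM prefaulting path runs inside an enter/exit section; once it is unplugged, every fault is answered from the dummy page, so the process keeps running against harmless memory instead of taking SIGBUS:

#include <linux/dma-resv.h>
#include <drm/drm_drv.h>
#include <drm/ttm/ttm_bo_api.h>

static vm_fault_t demo_fault(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
	struct drm_device *ddev = bo->base.dev;
	vm_fault_t ret;
	int idx;

	ret = ttm_bo_vm_reserve(bo, vmf);
	if (ret)
		return ret;

	if (drm_dev_enter(ddev, &idx)) {
		/* device alive: normal prefaulting path */
		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
					       TTM_BO_VM_NUM_PREFAULT, 1);
		drm_dev_exit(idx);
	} else {
		/* device gone: reroute to the per-file dummy rw page */
		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
	}

	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
		return ret;

	dma_resv_unlock(bo->base.resv);
	return ret;
}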
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
On device removal, reroute all CPU mappings to a dummy page, per drm_file instance or imported GEM object.
v4: Update for modified ttm_bo_vm_dummy_page
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 9fd2157..550dc5e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -49,6 +49,7 @@
#include <drm/drm_debugfs.h> #include <drm/amdgpu_drm.h> +#include <drm/drm_drv.h>
#include "amdgpu.h" #include "amdgpu_object.h" @@ -1982,18 +1983,28 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) static vm_fault_t amdgpu_ttm_fault(struct vm_fault *vmf) { struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
+	struct drm_device *ddev = bo->base.dev;
 	vm_fault_t ret;
+	int idx;
ret = ttm_bo_vm_reserve(bo, vmf); if (ret) return ret;
-	ret = amdgpu_bo_fault_reserve_notify(bo);
-	if (ret)
-		goto unlock;
+	if (drm_dev_enter(ddev, &idx)) {
+		ret = amdgpu_bo_fault_reserve_notify(bo);
+		if (ret) {
+			drm_dev_exit(idx);
+			goto unlock;
+		}

-	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-				       TTM_BO_VM_NUM_PREFAULT, 1);
+		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+					       TTM_BO_VM_NUM_PREFAULT, 1);
+
+		drm_dev_exit(idx);
+	} else {
+		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
+	}
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
 		return ret;
This allows removing the explicit creation and destruction of those attrs, and thereby avoids warnings on device finalizing post physical device extraction.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 17 +++++++++-------- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 25 ++++++++++--------------- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 14 +++++--------- 4 files changed, 37 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c index 86add0f..0346e12 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c @@ -1953,6 +1953,15 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev, static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version, NULL);
+static struct attribute *amdgpu_vbios_version_attrs[] = { + &dev_attr_vbios_version.attr, + NULL +}; + +const struct attribute_group amdgpu_vbios_version_attr_group = { + .attrs = amdgpu_vbios_version_attrs +}; + /** * amdgpu_atombios_fini - free the driver info and callbacks for atombios * @@ -1972,7 +1981,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev) adev->mode_info.atom_context = NULL; kfree(adev->mode_info.atom_card_info); adev->mode_info.atom_card_info = NULL; - device_remove_file(adev->dev, &dev_attr_vbios_version); }
/** @@ -1989,7 +1997,6 @@ int amdgpu_atombios_init(struct amdgpu_device *adev) { struct card_info *atom_card_info = kzalloc(sizeof(struct card_info), GFP_KERNEL); - int ret;
if (!atom_card_info) return -ENOMEM; @@ -2027,12 +2034,6 @@ int amdgpu_atombios_init(struct amdgpu_device *adev) amdgpu_atombios_allocate_fb_scratch(adev); }
- ret = device_create_file(adev->dev, &dev_attr_vbios_version); - if (ret) { - DRM_ERROR("Failed to create device file for VBIOS version\n"); - return ret; - } - return 0; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 9c0cd00..8fddd74 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -1587,6 +1587,18 @@ static struct pci_error_handlers amdgpu_pci_err_handler = { .resume = amdgpu_pci_resume, };
+extern const struct attribute_group amdgpu_vram_mgr_attr_group; +extern const struct attribute_group amdgpu_gtt_mgr_attr_group; +extern const struct attribute_group amdgpu_vbios_version_attr_group; + +static const struct attribute_group *amdgpu_sysfs_groups[] = { + &amdgpu_vram_mgr_attr_group, + &amdgpu_gtt_mgr_attr_group, + &amdgpu_vbios_version_attr_group, + NULL, +}; + + static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler, + .driver.dev_groups = amdgpu_sysfs_groups, };
static int __init amdgpu_init(void) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index 8980329..3b7150e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -77,6 +77,16 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO, static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO, amdgpu_mem_info_gtt_used_show, NULL);
+static struct attribute *amdgpu_gtt_mgr_attributes[] = { + &dev_attr_mem_info_gtt_total.attr, + &dev_attr_mem_info_gtt_used.attr, + NULL +}; + +const struct attribute_group amdgpu_gtt_mgr_attr_group = { + .attrs = amdgpu_gtt_mgr_attributes +}; + static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func; /** * amdgpu_gtt_mgr_init - init GTT manager and DRM MM @@ -91,7 +101,6 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size) struct amdgpu_gtt_mgr *mgr = &adev->mman.gtt_mgr; struct ttm_resource_manager *man = &mgr->manager; uint64_t start, size; - int ret;
man->use_tt = true; man->func = &amdgpu_gtt_mgr_func; @@ -104,17 +113,6 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size) spin_lock_init(&mgr->lock); atomic64_set(&mgr->available, gtt_size >> PAGE_SHIFT);
- ret = device_create_file(adev->dev, &dev_attr_mem_info_gtt_total); - if (ret) { - DRM_ERROR("Failed to create device file mem_info_gtt_total\n"); - return ret; - } - ret = device_create_file(adev->dev, &dev_attr_mem_info_gtt_used); - if (ret) { - DRM_ERROR("Failed to create device file mem_info_gtt_used\n"); - return ret; - } - ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_TT, &mgr->manager); ttm_resource_manager_set_used(man, true); return 0; @@ -144,9 +142,6 @@ void amdgpu_gtt_mgr_fini(struct amdgpu_device *adev) drm_mm_takedown(&mgr->mm); spin_unlock(&mgr->lock);
- device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total); - device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used); - ttm_resource_manager_cleanup(man); ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_TT, NULL); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index d2de2a7..9158d11 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -154,7 +154,7 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO, static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO, amdgpu_mem_info_vram_vendor, NULL);
-static const struct attribute *amdgpu_vram_mgr_attributes[] = { +static struct attribute *amdgpu_vram_mgr_attributes[] = { &dev_attr_mem_info_vram_total.attr, &dev_attr_mem_info_vis_vram_total.attr, &dev_attr_mem_info_vram_used.attr, @@ -163,6 +163,10 @@ static const struct attribute *amdgpu_vram_mgr_attributes[] = { NULL };
+const struct attribute_group amdgpu_vram_mgr_attr_group = { + .attrs = amdgpu_vram_mgr_attributes +}; + static const struct ttm_resource_manager_func amdgpu_vram_mgr_func;
/** @@ -176,7 +180,6 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) { struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr; struct ttm_resource_manager *man = &mgr->manager; - int ret;
ttm_resource_manager_init(man, adev->gmc.real_vram_size >> PAGE_SHIFT);
@@ -187,11 +190,6 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) INIT_LIST_HEAD(&mgr->reservations_pending); INIT_LIST_HEAD(&mgr->reserved_pages);
- /* Add the two VRAM-related sysfs files */ - ret = sysfs_create_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes); - if (ret) - DRM_ERROR("Failed to register sysfs\n"); - ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager); ttm_resource_manager_set_used(man, true); return 0; @@ -229,8 +227,6 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) drm_mm_takedown(&mgr->mm); spin_unlock(&mgr->lock);
- sysfs_remove_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes); - ttm_resource_manager_cleanup(man); ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, NULL); }
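The sysfs pattern behind this cleanup, reduced to a generic sketch with made-up demo_* names: attributes are collected into an attribute_group, and the group array is handed to the driver core, which creates the files when the device is bound and removes them on teardown, so no file can outlive the device:

#include <linux/device.h>
#include <linux/sysfs.h>

static ssize_t demo_info_show(struct device *dev,
			      struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "demo\n");
}
static DEVICE_ATTR_RO(demo_info);

static struct attribute *demo_attrs[] = {
	&dev_attr_demo_info.attr,
	NULL
};

static const struct attribute_group demo_attr_group = {
	.attrs = demo_attrs,
};

static const struct attribute_group *demo_groups[] = {
	&demo_attr_group,
	NULL
};

/* Hooked up through the driver core, e.g. as in the patch:
 * .driver.dev_groups = demo_groups in a struct pci_driver. */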
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler,
+	.driver.dev_groups = amdgpu_sysfs_groups,
Shouldn't this just be: groups = amdgpu_sysfs_groups,
Why go to the "driver root" here?
Other than that tiny thing, looks good to me, nice cleanup!
greg k-h
On 1/19/21 2:34 AM, Greg KH wrote:
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler,
+	.driver.dev_groups = amdgpu_sysfs_groups,
Shouldn't this just be: groups = amdgpu_sysfs_groups,
Why go to the "driver root" here?
Because I still didn't get to your suggestion to propose a patch that adds groups to pci_driver; for now the field is located in the 'base' driver struct.
Andrey
Other than that tiny thing, looks good to me, nice cleanup!
greg k-h
On Tue, Jan 19, 2021 at 11:36:01AM -0500, Andrey Grodzovsky wrote:
On 1/19/21 2:34 AM, Greg KH wrote:
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler,
+	.driver.dev_groups = amdgpu_sysfs_groups,
Shouldn't this just be: groups = amdgpu_sysfs_groups,
Why go to the "driver root" here?
Because I still didn't get to your suggestion to propose a patch that adds groups to pci_driver; for now the field is located in the 'base' driver struct.
You are a pci driver, you should never have to mess with the "base" driver struct. Look at commit 92d50fc1602e ("PCI/IB: add support for pci driver attribute groups") which got merged in 4.14, way back in 2017 :)
driver.pm also looks odd, but I'm just going to ignore that for now...
thanks,
greg k-h
On Tue, Jan 19, 2021 at 1:26 PM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Jan 19, 2021 at 11:36:01AM -0500, Andrey Grodzovsky wrote:
On 1/19/21 2:34 AM, Greg KH wrote:
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler,
+	.driver.dev_groups = amdgpu_sysfs_groups,
Shouldn't this just be: groups = amdgpu_sysfs_groups,
Why go to the "driver root" here?
Because I still didn't get to your suggestion to propose a patch that adds groups to pci_driver; for now the field is located in the 'base' driver struct.
You are a pci driver, you should never have to mess with the "base" driver struct. Look at commit 92d50fc1602e ("PCI/IB: add support for pci driver attribute groups") which got merged in 4.14, way back in 2017 :)
Per the previous discussion of this patch set: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg56019.html
Alex
driver.pm also looks odd, but I'm just going to ignore that for now...
thanks,
greg k-h
On 1/19/21 2:04 PM, Alex Deucher wrote:
On Tue, Jan 19, 2021 at 1:26 PM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Jan 19, 2021 at 11:36:01AM -0500, Andrey Grodzovsky wrote:
On 1/19/21 2:34 AM, Greg KH wrote:
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler,
+	.driver.dev_groups = amdgpu_sysfs_groups,
Shouldn't this just be: groups = amdgpu_sysfs_groups,
Why go to the "driver root" here?
Because I still didn't get to your suggestion to propose a patch that adds groups to pci_driver; for now the field is located in the 'base' driver struct.
You are a pci driver, you should never have to mess with the "base" driver struct. Look at commit 92d50fc1602e ("PCI/IB: add support for pci driver attribute groups") which got merged in 4.14, way back in 2017 :)
Per the previous discussion of this patch set: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg56019.html
Alex
Got it. Next iteration I will include a patch like the above to pci-devel as part of the series and will update this patch accordingly.
Andrey
driver.pm also looks odd, but I'm just going to ignore that for now...
thanks,
greg k-h
On Tue, Jan 19, 2021 at 02:04:48PM -0500, Alex Deucher wrote:
On Tue, Jan 19, 2021 at 1:26 PM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Jan 19, 2021 at 11:36:01AM -0500, Andrey Grodzovsky wrote:
On 1/19/21 2:34 AM, Greg KH wrote:
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
static struct pci_driver amdgpu_kms_pci_driver = { .name = DRIVER_NAME, .id_table = pciidlist, @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = { .shutdown = amdgpu_pci_shutdown, .driver.pm = &amdgpu_pm_ops, .err_handler = &amdgpu_pci_err_handler,
+	.driver.dev_groups = amdgpu_sysfs_groups,
Shouldn't this just be: groups = amdgpu_sysfs_groups,
Why go to the "driver root" here?
Because I still didn't get to your suggestion to propose a patch that adds groups to pci_driver; for now the field is located in the 'base' driver struct.
You are a pci driver, you should never have to mess with the "base" driver struct. Look at commit 92d50fc1602e ("PCI/IB: add support for pci driver attribute groups") which got merged in 4.14, way back in 2017 :)
Per the previous discussion of this patch set: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg56019.html
Hey, at least I'm consistent :)
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
This allows removing the explicit creation and destruction of those attrs, and thereby avoids warnings on device finalizing post physical device extraction.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Acked-by: Christian König christian.koenig@amd.com
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57 ++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 9 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 53 +++++++++++++--------- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 70 ++++++++++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 49 ++------------------- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 16 ++----- drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 8 +--- drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 8 +--- 9 files changed, 184 insertions(+), 89 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e99f4f1..0a9d73c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -72,6 +72,8 @@
#include <linux/iommu.h>
+#include <drm/drm_drv.h> + MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); @@ -404,13 +406,21 @@ uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset) */ void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t value) { + int idx; + if (adev->in_pci_err_recovery) return;
+ + if (!drm_dev_enter(&adev->ddev, &idx)) + return; + if (offset < adev->rmmio_size) writeb(value, adev->rmmio + offset); else BUG(); + + drm_dev_exit(idx); }
/** @@ -427,9 +437,14 @@ void amdgpu_device_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v, uint32_t acc_flags) { + int idx; + if (adev->in_pci_err_recovery) return;
+ if (!drm_dev_enter(&adev->ddev, &idx)) + return; + if ((reg * 4) < adev->rmmio_size) { if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev) && @@ -444,6 +459,8 @@ void amdgpu_device_wreg(struct amdgpu_device *adev, }
trace_amdgpu_device_wreg(adev->pdev->device, reg, v); + + drm_dev_exit(idx); }
/* @@ -454,9 +471,14 @@ void amdgpu_device_wreg(struct amdgpu_device *adev, void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev, uint32_t reg, uint32_t v) { + int idx; + if (adev->in_pci_err_recovery) return;
+ if (!drm_dev_enter(&adev->ddev, &idx)) + return; + if (amdgpu_sriov_fullaccess(adev) && adev->gfx.rlc.funcs && adev->gfx.rlc.funcs->is_rlcg_access_range) { @@ -465,6 +487,8 @@ void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev, } else { writel(v, ((void __iomem *)adev->rmmio) + (reg * 4)); } + + drm_dev_exit(idx); }
/** @@ -499,15 +523,22 @@ u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg) */ void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v) { + int idx; + if (adev->in_pci_err_recovery) return;
+ if (!drm_dev_enter(&adev->ddev, &idx)) + return; + if ((reg * 4) < adev->rio_mem_size) iowrite32(v, adev->rio_mem + (reg * 4)); else { iowrite32((reg * 4), adev->rio_mem + (mmMM_INDEX * 4)); iowrite32(v, adev->rio_mem + (mmMM_DATA * 4)); } + + drm_dev_exit(idx); }
/** @@ -544,14 +575,21 @@ u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index) */ void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v) { + int idx; + if (adev->in_pci_err_recovery) return;
+ if (!drm_dev_enter(&adev->ddev, &idx)) + return; + if (index < adev->doorbell.num_doorbells) { writel(v, adev->doorbell.ptr + index); } else { DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index); } + + drm_dev_exit(idx); }
/** @@ -588,14 +626,21 @@ u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index) */ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v) { + int idx; + if (adev->in_pci_err_recovery) return;
+ if (!drm_dev_enter(&adev->ddev, &idx)) + return; + if (index < adev->doorbell.num_doorbells) { atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v); } else { DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index); } + + drm_dev_exit(idx); }
/** @@ -682,6 +727,10 @@ void amdgpu_device_indirect_wreg(struct amdgpu_device *adev, unsigned long flags; void __iomem *pcie_index_offset; void __iomem *pcie_data_offset; + int idx; + + if (!drm_dev_enter(&adev->ddev, &idx)) + return;
spin_lock_irqsave(&adev->pcie_idx_lock, flags); pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4; @@ -692,6 +741,8 @@ void amdgpu_device_indirect_wreg(struct amdgpu_device *adev, writel(reg_data, pcie_data_offset); readl(pcie_data_offset); spin_unlock_irqrestore(&adev->pcie_idx_lock, flags); + + drm_dev_exit(idx); }
/** @@ -711,6 +762,10 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device *adev, unsigned long flags; void __iomem *pcie_index_offset; void __iomem *pcie_data_offset; + int idx; + + if (!drm_dev_enter(&adev->ddev, &idx)) + return;
spin_lock_irqsave(&adev->pcie_idx_lock, flags); pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4; @@ -727,6 +782,8 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device *adev, writel((u32)(reg_data >> 32), pcie_data_offset); readl(pcie_data_offset); spin_unlock_irqrestore(&adev->pcie_idx_lock, flags); + + drm_dev_exit(idx); }
/** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index fe1a39f..1beb4e6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c @@ -31,6 +31,8 @@ #include "amdgpu_ras.h" #include "amdgpu_xgmi.h"
+#include <drm/drm_drv.h> + /** * amdgpu_gmc_get_pde_for_bo - get the PDE for a BO * @@ -98,6 +100,10 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, void *cpu_pt_addr, { void __iomem *ptr = (void *)cpu_pt_addr; uint64_t value; + int idx; + + if (!drm_dev_enter(&adev->ddev, &idx)) + return 0;
/* * The following is for PTE only. GART does not have PDEs. @@ -105,6 +111,9 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, void *cpu_pt_addr, value = addr & 0x0000FFFFFFFFF000ULL; value |= flags; writeq(value, ptr + (gpu_page_idx * 8)); + + drm_dev_exit(idx); + return 0; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 523d22d..89e2bfe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -37,6 +37,8 @@
#include "amdgpu_ras.h"
+#include <drm/drm_drv.h> + static int psp_sysfs_init(struct amdgpu_device *adev); static void psp_sysfs_fini(struct amdgpu_device *adev);
@@ -248,7 +250,7 @@ psp_cmd_submit_buf(struct psp_context *psp, struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr) { int ret; - int index; + int index, idx; int timeout = 2000; bool ras_intr = false; bool skip_unsupport = false; @@ -256,6 +258,9 @@ psp_cmd_submit_buf(struct psp_context *psp, if (psp->adev->in_pci_err_recovery) return 0;
+ if (!drm_dev_enter(&psp->adev->ddev, &idx)) + return 0; + mutex_lock(&psp->mutex);
memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE); @@ -266,8 +271,7 @@ psp_cmd_submit_buf(struct psp_context *psp, ret = psp_ring_cmd_submit(psp, psp->cmd_buf_mc_addr, fence_mc_addr, index); if (ret) { atomic_dec(&psp->fence_value); - mutex_unlock(&psp->mutex); - return ret; + goto exit; }
amdgpu_asic_invalidate_hdp(psp->adev, NULL); @@ -307,8 +311,8 @@ psp_cmd_submit_buf(struct psp_context *psp, psp->cmd_buf_mem->cmd_id, psp->cmd_buf_mem->resp.status); if (!timeout) { - mutex_unlock(&psp->mutex); - return -EINVAL; + ret = -EINVAL; + goto exit; } }
@@ -316,8 +320,10 @@ psp_cmd_submit_buf(struct psp_context *psp, ucode->tmr_mc_addr_lo = psp->cmd_buf_mem->resp.fw_addr_lo; ucode->tmr_mc_addr_hi = psp->cmd_buf_mem->resp.fw_addr_hi; } - mutex_unlock(&psp->mutex);
+exit: + mutex_unlock(&psp->mutex); + drm_dev_exit(idx); return ret; }
@@ -354,8 +360,7 @@ static int psp_load_toc(struct psp_context *psp, if (!cmd) return -ENOMEM; /* Copy toc to psp firmware private buffer */ - memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->toc_start_addr, psp->toc_bin_size); + psp_copy_fw(psp, psp->toc_start_addr, psp->toc_bin_size);
psp_prep_load_toc_cmd_buf(cmd, psp->fw_pri_mc_addr, psp->toc_bin_size);
@@ -570,8 +575,7 @@ static int psp_asd_load(struct psp_context *psp) if (!cmd) return -ENOMEM;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->asd_start_addr, psp->asd_ucode_size); + psp_copy_fw(psp, psp->asd_start_addr, psp->asd_ucode_size);
psp_prep_asd_load_cmd_buf(cmd, psp->fw_pri_mc_addr, psp->asd_ucode_size); @@ -726,8 +730,7 @@ static int psp_xgmi_load(struct psp_context *psp) if (!cmd) return -ENOMEM;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->ta_xgmi_start_addr, psp->ta_xgmi_ucode_size); + psp_copy_fw(psp, psp->ta_xgmi_start_addr, psp->ta_xgmi_ucode_size);
psp_prep_ta_load_cmd_buf(cmd, psp->fw_pri_mc_addr, @@ -982,8 +985,7 @@ static int psp_ras_load(struct psp_context *psp) if (!cmd) return -ENOMEM;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->ta_ras_start_addr, psp->ta_ras_ucode_size); + psp_copy_fw(psp, psp->ta_ras_start_addr, psp->ta_ras_ucode_size);
psp_prep_ta_load_cmd_buf(cmd, psp->fw_pri_mc_addr, @@ -1219,8 +1221,7 @@ static int psp_hdcp_load(struct psp_context *psp) if (!cmd) return -ENOMEM;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->ta_hdcp_start_addr, + psp_copy_fw(psp, psp->ta_hdcp_start_addr, psp->ta_hdcp_ucode_size);
psp_prep_ta_load_cmd_buf(cmd, @@ -1366,8 +1367,7 @@ static int psp_dtm_load(struct psp_context *psp) if (!cmd) return -ENOMEM;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->ta_dtm_start_addr, psp->ta_dtm_ucode_size); + psp_copy_fw(psp, psp->ta_dtm_start_addr, psp->ta_dtm_ucode_size);
psp_prep_ta_load_cmd_buf(cmd, psp->fw_pri_mc_addr, @@ -1507,8 +1507,7 @@ static int psp_rap_load(struct psp_context *psp) if (!cmd) return -ENOMEM;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - memcpy(psp->fw_pri_buf, psp->ta_rap_start_addr, psp->ta_rap_ucode_size); + psp_copy_fw(psp, psp->ta_rap_start_addr, psp->ta_rap_ucode_size);
psp_prep_ta_load_cmd_buf(cmd, psp->fw_pri_mc_addr, @@ -2778,6 +2777,20 @@ static ssize_t psp_usbc_pd_fw_sysfs_write(struct device *dev, return count; }
+void psp_copy_fw(struct psp_context *psp, uint8_t *start_addr, uint32_t bin_size) +{ + int idx; + + if (!drm_dev_enter(&psp->adev->ddev, &idx)) + return; + + memset(psp->fw_pri_buf, 0, PSP_1_MEG); + memcpy(psp->fw_pri_buf, start_addr, bin_size); + + drm_dev_exit(idx); +} + + static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR, psp_usbc_pd_fw_sysfs_read, psp_usbc_pd_fw_sysfs_write); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h index da250bc..ac69314 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h @@ -400,4 +400,7 @@ int psp_init_ta_microcode(struct psp_context *psp, const char *chip_name); int psp_get_fw_attestation_records_addr(struct psp_context *psp, uint64_t *output_ptr); + +void psp_copy_fw(struct psp_context *psp, uint8_t *start_addr, uint32_t bin_size); + #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 1a612f5..d656494 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -35,6 +35,8 @@ #include "amdgpu.h" #include "atom.h"
+#include <drm/drm_drv.h> + /* * Rings * Most engines on the GPU are fed via ring buffers. Ring @@ -463,3 +465,71 @@ int amdgpu_ring_test_helper(struct amdgpu_ring *ring) ring->sched.ready = !r; return r; } + +void amdgpu_ring_clear_ring(struct amdgpu_ring *ring) +{ + int idx; + int i = 0; + + if (!drm_dev_enter(&ring->adev->ddev, &idx)) + return; + + while (i <= ring->buf_mask) + ring->ring[i++] = ring->funcs->nop; + + drm_dev_exit(idx); + +} + +void amdgpu_ring_write(struct amdgpu_ring *ring, uint32_t v) +{ + int idx; + + if (!drm_dev_enter(&ring->adev->ddev, &idx)) + return; + + if (ring->count_dw <= 0) + DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n"); + ring->ring[ring->wptr++ & ring->buf_mask] = v; + ring->wptr &= ring->ptr_mask; + ring->count_dw--; + + drm_dev_exit(idx); +} + +void amdgpu_ring_write_multiple(struct amdgpu_ring *ring, + void *src, int count_dw) +{ + unsigned occupied, chunk1, chunk2; + void *dst; + int idx; + + if (!drm_dev_enter(&ring->adev->ddev, &idx)) + return; + + if (unlikely(ring->count_dw < count_dw)) + DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n"); + + occupied = ring->wptr & ring->buf_mask; + dst = (void *)&ring->ring[occupied]; + chunk1 = ring->buf_mask + 1 - occupied; + chunk1 = (chunk1 >= count_dw) ? count_dw: chunk1; + chunk2 = count_dw - chunk1; + chunk1 <<= 2; + chunk2 <<= 2; + + if (chunk1) + memcpy(dst, src, chunk1); + + if (chunk2) { + src += chunk1; + dst = (void *)ring->ring; + memcpy(dst, src, chunk2); + } + + ring->wptr += count_dw; + ring->wptr &= ring->ptr_mask; + ring->count_dw -= count_dw; + + drm_dev_exit(idx); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index accb243..f90b81f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -300,53 +300,12 @@ static inline void amdgpu_ring_set_preempt_cond_exec(struct amdgpu_ring *ring, *ring->cond_exe_cpu_addr = cond_exec; }
-static inline void amdgpu_ring_clear_ring(struct amdgpu_ring *ring) -{ - int i = 0; - while (i <= ring->buf_mask) - ring->ring[i++] = ring->funcs->nop; - -} - -static inline void amdgpu_ring_write(struct amdgpu_ring *ring, uint32_t v) -{ - if (ring->count_dw <= 0) - DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n"); - ring->ring[ring->wptr++ & ring->buf_mask] = v; - ring->wptr &= ring->ptr_mask; - ring->count_dw--; -} +void amdgpu_ring_clear_ring(struct amdgpu_ring *ring);
-static inline void amdgpu_ring_write_multiple(struct amdgpu_ring *ring, - void *src, int count_dw) -{ - unsigned occupied, chunk1, chunk2; - void *dst; - - if (unlikely(ring->count_dw < count_dw)) - DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n"); - - occupied = ring->wptr & ring->buf_mask; - dst = (void *)&ring->ring[occupied]; - chunk1 = ring->buf_mask + 1 - occupied; - chunk1 = (chunk1 >= count_dw) ? count_dw: chunk1; - chunk2 = count_dw - chunk1; - chunk1 <<= 2; - chunk2 <<= 2; +void amdgpu_ring_write(struct amdgpu_ring *ring, uint32_t v);
- if (chunk1) - memcpy(dst, src, chunk1); - - if (chunk2) { - src += chunk1; - dst = (void *)ring->ring; - memcpy(dst, src, chunk2); - } - - ring->wptr += count_dw; - ring->wptr &= ring->ptr_mask; - ring->count_dw -= count_dw; -} +void amdgpu_ring_write_multiple(struct amdgpu_ring *ring, + void *src, int count_dw);
int amdgpu_ring_test_helper(struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c index bd4248c..b3ce5be 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c @@ -269,10 +269,8 @@ static int psp_v11_0_bootloader_load_kdb(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy PSP KDB binary to memory */ - memcpy(psp->fw_pri_buf, psp->kdb_start_addr, psp->kdb_bin_size); + psp_copy_fw(psp, psp->kdb_start_addr, psp->kdb_bin_size);
/* Provide the PSP KDB to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -302,10 +300,8 @@ static int psp_v11_0_bootloader_load_spl(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy PSP SPL binary to memory */ - memcpy(psp->fw_pri_buf, psp->spl_start_addr, psp->spl_bin_size); + psp_copy_fw(psp, psp->spl_start_addr, psp->spl_bin_size);
/* Provide the PSP SPL to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -335,10 +331,8 @@ static int psp_v11_0_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy PSP System Driver binary to memory */ - memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size); + psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size);
/* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -371,10 +365,8 @@ static int psp_v11_0_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy Secure OS binary to PSP memory */ - memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size); + psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size);
/* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c index c4828bd..618e5b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c @@ -138,10 +138,8 @@ static int psp_v12_0_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy PSP System Driver binary to memory */ - memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size); + psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size);
/* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -179,10 +177,8 @@ static int psp_v12_0_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy Secure OS binary to PSP memory */ - memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size); + psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size);
/* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c index f2e725f..d0a6cccd 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c @@ -102,10 +102,8 @@ static int psp_v3_1_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy PSP System Driver binary to memory */ - memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size); + psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size);
/* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -143,10 +141,8 @@ static int psp_v3_1_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG); - /* Copy Secure OS binary to PSP memory */ - memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size); + psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size);
/* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
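Every write path converted above follows the same guarded-write shape. As a minimal sketch (demo_wreg() is hypothetical): drm_dev_enter() opens an SRCU read-side section, so after unplug the write is silently dropped instead of landing in an MMIO range that may already have been reassigned; the per-access cost of this is exactly what the review below pushes back on:

#include <drm/drm_drv.h>
#include <linux/io.h>

static void demo_wreg(struct drm_device *dev, void __iomem *rmmio,
		      u32 reg, u32 v)
{
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return;	/* device unplugged: drop the write */

	writel(v, rmmio + (reg * 4));

	drm_dev_exit(idx);
}

The chunked copy in amdgpu_ring_write_multiple() is also worth unpacking: a write of count_dw dwords starting at wptr may wrap past the end of the power-of-two ring, so it is split into chunk1 (up to the wrap point) and chunk2 (the remainder at the start of the buffer); the shifts by 2 in the patch convert dword counts to bytes. A self-contained sketch of just that arithmetic, with hypothetical names:

#include <linux/string.h>
#include <linux/types.h>

static void demo_ring_copy(u32 *ring, u32 buf_mask, u64 *wptr,
			   const void *src, unsigned int count_dw)
{
	unsigned int occupied = *wptr & buf_mask;	/* current slot */
	unsigned int chunk1 = buf_mask + 1 - occupied;	/* dwords until wrap */
	unsigned int chunk2;

	if (chunk1 > count_dw)
		chunk1 = count_dw;
	chunk2 = count_dw - chunk1;

	memcpy(&ring[occupied], src, chunk1 * 4);	/* first run */
	if (chunk2)					/* wrapped remainder */
		memcpy(ring, (const u8 *)src + chunk1 * 4, chunk2 * 4);

	*wptr += count_dw;	/* caller masks with ptr_mask */
}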
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed.
Wow, that adds quite some overhead to every register access. I'm not sure we can do this.
Christian.
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
- /* Copy PSP System Driver binary to memory */
- memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size);
psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size);
/* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
@@ -143,10 +141,8 @@ static int psp_v3_1_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
- /* Copy Secure OS binary to PSP memory */
- memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size);
psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size);
/* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
There is really no other way according to this article https://lwn.net/Articles/767885/
"A perfect solution seems nearly impossible though; we cannot acquire a mutex on the user to prevent them from yanking a device and we cannot check for a presence change after every device access for performance reasons. "
But I assumed srcu_read_lock should be pretty seamless performance-wise, no? The other solution would be, as I suggested, to keep all the device IO ranges reserved and system memory pages unfreed until the device is finalized in the driver, but Daniel said this would upset the PCI layer (the MMIO ranges reservation part).
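For reference, drm_dev_enter()/drm_dev_exit() are built on SRCU. A simplified sketch of the pattern (illustrative, not the verbatim drm_drv.c implementation):

/* Simplified sketch of the SRCU scheme behind drm_dev_enter/exit. */
#include <linux/srcu.h>
#include <drm/drm_device.h>

DEFINE_STATIC_SRCU(drm_unplug_srcu);

/* Read side, taken around every device access: the fast path is a
 * per-cpu counter update, cheap compared to an MMIO transaction. */
bool drm_dev_enter(struct drm_device *dev, int *idx)
{
        *idx = srcu_read_lock(&drm_unplug_srcu);

        if (dev->unplugged) {
                srcu_read_unlock(&drm_unplug_srcu, *idx);
                return false;
        }

        return true;
}

void drm_dev_exit(int idx)
{
        srcu_read_unlock(&drm_unplug_srcu, idx);
}

/* Write side, taken once at unplug: flip the flag, then stall until
 * every reader still inside an enter/exit section has drained. */
void drm_dev_unplug(struct drm_device *dev)
{
        dev->unplugged = true;
        synchronize_srcu(&drm_unplug_srcu);
}

So the per-register cost is only the read side; the expensive synchronize_srcu() runs once, at removal time.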
Andrey
On 1/19/21 3:55 AM, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed.
Wow, that adds quite some overhead to every register access. I'm not sure we can do this.
Christian.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    |  9 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 53 +++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h    |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 70 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   | 49 ++-------------------
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c     | 16 ++-----
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c     |  8 +---
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c      |  8 +---
 9 files changed, 184 insertions(+), 89 deletions(-)
There is also the possibility of taking the drm_dev_enter/exit at a much higher level.
E.g. we should have it anyway on every IOCTL, and what remains are work items, scheduler threads and interrupts.
Christian.
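To illustrate the idea, a hypothetical sketch of taking the guard once per IOCTL in a driver-level wrapper (amdgpu_guarded_ioctl is an invented name, not something in this series; work items, scheduler threads and interrupts would still need their own enter/exit sections):

#include <drm/drm_drv.h>
#include <drm/drm_file.h>
#include <drm/drm_ioctl.h>

/* Take the SRCU read section once per IOCTL instead of around every
 * register access; the low-level rreg/wreg helpers stay unguarded. */
static long amdgpu_guarded_ioctl(struct file *filp, unsigned int cmd,
                                 unsigned long arg)
{
        struct drm_file *file_priv = filp->private_data;
        struct drm_device *dev = file_priv->minor->dev;
        long ret;
        int idx;

        if (!drm_dev_enter(dev, &idx))
                return -ENODEV;

        ret = drm_ioctl(filp, cmd, arg);

        drm_dev_exit(idx);
        return ret;
}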
On 19.01.21 at 16:35, Andrey Grodzovsky wrote:
There is really no other way according to this article https://lwn.net/Articles/767885/
"A perfect solution seems nearly impossible though; we cannot acquire a mutex on the user to prevent them from yanking a device and we cannot check for a presence change after every device access for performance reasons. "
But I assumed srcu_read_lock should be pretty seamless performance-wise, no? The other solution would be, as I suggested, to keep all the device IO ranges reserved and system memory pages unfreed until the device is finalized in the driver, but Daniel said this would upset the PCI layer (the MMIO ranges reservation part).
Andrey
On 1/19/21 3:55 AM, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed.
Wow, that adds quite some overhead to every register access. I'm not sure we can do this.
Christian.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    |  9 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 53 +++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h    |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 70 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   | 49 ++-------------------
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c     | 16 ++-----
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c     |  8 +---
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c      |  8 +---
 9 files changed, 184 insertions(+), 89 deletions(-)
On Tue, Jan 19, 2021 at 4:35 PM Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> wrote:
There is really no other way according to this article https://lwn.net/Articles/767885/
"A perfect solution seems nearly impossible though; we cannot acquire a mutex on the user to prevent them from yanking a device and we cannot check for a presence change after every device access for performance reasons. "
But I assumed srcu_read_lock should be pretty seamless performance-wise, no?
The read side is supposed to be dirt cheap; the write side is where we just stall for all readers to eventually complete on their own. It should definitely be much cheaper than an mmio read, though on the mmio write side it might actually hurt a bit. Otoh I think those don't stall the cpu by default when they're timing out, so maybe if the overhead is too much for those, we could omit them?
Maybe just do a small microbenchmark for these for testing, with a register that doesn't change hw state. So with and without drm_dev_enter/exit, and also one with the hw plugged out so that we have actual timeouts in the transactions. -Daniel
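Something along these lines, perhaps (a hypothetical sketch; the iteration count and the choice of a side-effect-free register are assumptions to be adjusted per ASIC):

/* Rough microbenchmark sketch: average cost of a register write with
 * and without the drm_dev_enter/exit guard. */
#include <linux/ktime.h>
#include <linux/math64.h>

static void amdgpu_bench_wreg(struct amdgpu_device *adev, u32 reg)
{
        const int n = 100000;
        ktime_t start;
        s64 plain_ns, guarded_ns;
        int i, idx;

        start = ktime_get();
        for (i = 0; i < n; i++)
                writel(0, ((void __iomem *)adev->rmmio) + (reg * 4));
        plain_ns = ktime_to_ns(ktime_sub(ktime_get(), start));

        start = ktime_get();
        for (i = 0; i < n; i++) {
                if (drm_dev_enter(&adev->ddev, &idx)) {
                        writel(0, ((void __iomem *)adev->rmmio) + (reg * 4));
                        drm_dev_exit(idx);
                }
        }
        guarded_ns = ktime_to_ns(ktime_sub(ktime_get(), start));

        DRM_INFO("wreg: plain %lld ns/op, guarded %lld ns/op\n",
                 div_s64(plain_ns, n), div_s64(guarded_ns, n));
}

Running the same loop a third time after the device has been yanked would then show the timeout behaviour mentioned above.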
The other solution would be, as I suggested, to keep all the device IO ranges reserved and system memory pages unfreed until the device is finalized in the driver, but Daniel said this would upset the PCI layer (the MMIO ranges reservation part).
Andrey
On 1/19/21 3:55 AM, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed.
Wow, that adds quite some overhead to every register access. I'm not sure we can do this.
Christian.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    |  9 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 53 +++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h    |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 70 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   | 49 ++-------------------
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c     | 16 ++-----
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c     |  8 +---
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c      |  8 +---
 9 files changed, 184 insertions(+), 89 deletions(-)
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP System Driver binary to memory */
- memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size);
- psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size); /* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
@@ -371,10 +365,8 @@ static int psp_v11_0_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy Secure OS binary to PSP memory */
- memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size);
- psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size); /* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c index c4828bd..618e5b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c @@ -138,10 +138,8 @@ static int psp_v12_0_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP System Driver binary to memory */
- memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size);
- psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size); /* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
@@ -179,10 +177,8 @@ static int psp_v12_0_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy Secure OS binary to PSP memory */
- memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size);
- psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size); /* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c index f2e725f..d0a6cccd 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c @@ -102,10 +102,8 @@ static int psp_v3_1_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP System Driver binary to memory */
- memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size);
- psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size); /* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
@@ -143,10 +141,8 @@ static int psp_v3_1_bootloader_load_sos(struct psp_context *psp) if (ret) return ret;
- memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy Secure OS binary to PSP memory */
- memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size);
- psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size); /* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
On 1/19/21 1:05 PM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 4:35 PM Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> wrote:
There is really no other way according to this article: https://lwn.net/...
"A perfect solution seems nearly impossible though; we cannot acquire a mutex on the user to prevent them from yanking a device and we cannot check for a presence change after every device access for performance reasons. "
But I assumed srcu_read_lock should be pretty seamless performance-wise, no?
The read side is supposed to be dirt cheap; the write side is where we just stall for all readers to eventually complete on their own. It should definitely be much cheaper than an mmio read; on the mmio write side it might actually hurt a bit. Otoh I think those don't stall the cpu by default when they're timing out, so maybe if the overhead is too much for those, we could omit them?
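For reference, drm_dev_enter()/drm_dev_exit() boil down to exactly that SRCU pattern. A simplified sketch of the drm_drv.c logic (unregister and refcounting details omitted, so read it as an illustration rather than the verbatim implementation):

#include <linux/srcu.h>
#include <drm/drm_drv.h>

DEFINE_STATIC_SRCU(drm_unplug_srcu);

bool drm_dev_enter(struct drm_device *dev, int *idx)
{
	/* Cheap read side: one srcu_read_lock() per guarded access. */
	*idx = srcu_read_lock(&drm_unplug_srcu);

	if (dev->unplugged) {
		srcu_read_unlock(&drm_unplug_srcu, *idx);
		return false;
	}

	return true;
}

void drm_dev_exit(int idx)
{
	srcu_read_unlock(&drm_unplug_srcu, idx);
}

void drm_dev_unplug(struct drm_device *dev)
{
	/*
	 * Expensive write side, paid once at removal: after the
	 * synchronize, any reader that could still have observed
	 * unplugged == false is guaranteed to have finished.
	 */
	dev->unplugged = true;
	synchronize_srcu(&drm_unplug_srcu);
}

So the hot path only pays the read lock/unlock pair; the stall is concentrated in drm_dev_unplug() at removal time.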
Maybe just do a small microbenchmark for these for testing, with a register that doesn't change hw state. So with and without drm_dev_enter/exit, and also one with the hw plugged out so that we have actual timeouts in the transactions. -Daniel
So, say, writing to some harmless scratch register in a loop many times, for both the plugged and the unplugged case, and measuring the total time delta?
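Something like this rough sketch, maybe (amdgpu_wreg_bench and its scratch_reg parameter are made-up names for illustration, not existing code; any register that is safe to hammer repeatedly would do):

#include <linux/ktime.h>

/*
 * Hypothetical microbenchmark, not part of the patch: time N MMIO writes
 * through the guarded accessor, once with the device present and once
 * after removing it via sysfs, then compare the two deltas.
 */
static void amdgpu_wreg_bench(struct amdgpu_device *adev, u32 scratch_reg)
{
	const int n = 1 << 20;			/* ~1M writes */
	ktime_t start = ktime_get();
	int i;

	for (i = 0; i < n; i++)
		WREG32(scratch_reg, i);		/* ends up in amdgpu_device_wreg() */

	DRM_INFO("%d reg writes took %lld ns\n",
		 n, ktime_to_ns(ktime_sub(ktime_get(), start)));
}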
Andrey
The other solution would be, as I suggested, to keep all the device IO ranges reserved and system memory pages unfreed until the device is finalized in the driver, but Daniel said this would upset the PCI layer (the MMIO ranges reservation part).
Andrey
On 1/19/21 3:55 AM, Christian König wrote:
On 1/18/21 10:01 PM, Andrey Grodzovsky wrote:
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed.
Wow, that adds quite some overhead to every register access. I'm not sure we can do this.
Christian.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
On 1/19/21 7:22 PM, Andrey Grodzovsky wrote:
So, say, writing to some harmless scratch register in a loop many times, for both the plugged and the unplugged case, and measuring the total time delta?
I think we should at least measure the following:
1. Writing X times to a scratch reg without your patch.
2. Writing X times to a scratch reg with your patch.
3. Writing X times to a scratch reg with the hardware physically disconnected.
I suggest repeating that once for Polaris (or older) and once for Vega or Navi.
The SRBM on Polaris is meant to introduce some delay in each access, so it might react differently than the newer hardware does.
Christian.
On 1/19/21 1:59 PM, Christian König wrote:
I think we should at least measure the following: [...]
Will do.
Andrey
- if (unlikely(ring->count_dw < count_dw)) - DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n");
- occupied = ring->wptr & ring->buf_mask; - dst = (void *)&ring->ring[occupied]; - chunk1 = ring->buf_mask + 1 - occupied; - chunk1 = (chunk1 >= count_dw) ? count_dw: chunk1; - chunk2 = count_dw - chunk1; - chunk1 <<= 2; - chunk2 <<= 2; +void amdgpu_ring_write(struct amdgpu_ring *ring, uint32_t v); - if (chunk1) - memcpy(dst, src, chunk1);
- if (chunk2) { - src += chunk1; - dst = (void *)ring->ring; - memcpy(dst, src, chunk2); - }
- ring->wptr += count_dw; - ring->wptr &= ring->ptr_mask; - ring->count_dw -= count_dw; -} +void amdgpu_ring_write_multiple(struct amdgpu_ring *ring, + void *src, int count_dw); int amdgpu_ring_test_helper(struct amdgpu_ring *ring); diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c index bd4248c..b3ce5be 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c @@ -269,10 +269,8 @@ static int psp_v11_0_bootloader_load_kdb(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP KDB binary to memory */ - memcpy(psp->fw_pri_buf, psp->kdb_start_addr, psp->kdb_bin_size); + psp_copy_fw(psp, psp->kdb_start_addr, psp->kdb_bin_size); /* Provide the PSP KDB to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -302,10 +300,8 @@ static int psp_v11_0_bootloader_load_spl(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP SPL binary to memory */ - memcpy(psp->fw_pri_buf, psp->spl_start_addr, psp->spl_bin_size); + psp_copy_fw(psp, psp->spl_start_addr, psp->spl_bin_size); /* Provide the PSP SPL to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -335,10 +331,8 @@ static int psp_v11_0_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP System Driver binary to memory */ - memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size); + psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size); /* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -371,10 +365,8 @@ static int psp_v11_0_bootloader_load_sos(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy Secure OS binary to PSP memory */ - memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size); + psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size); /* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c index c4828bd..618e5b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c @@ -138,10 +138,8 @@ static int psp_v12_0_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP System Driver binary to memory */ - memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size); + psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size); /* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -179,10 +177,8 @@ static int psp_v12_0_bootloader_load_sos(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy Secure OS binary to PSP memory */ - memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size); + psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size); /* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c index f2e725f..d0a6cccd 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c @@ -102,10 +102,8 @@ static int psp_v3_1_bootloader_load_sysdrv(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy PSP System Driver binary to memory */ - memcpy(psp->fw_pri_buf, psp->sys_start_addr, psp->sys_bin_size); + psp_copy_fw(psp, psp->sys_start_addr, psp->sys_bin_size); /* Provide the sys driver to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36, @@ -143,10 +141,8 @@ static int psp_v3_1_bootloader_load_sos(struct psp_context *psp) if (ret) return ret; - memset(psp->fw_pri_buf, 0, PSP_1_MEG);
/* Copy Secure OS binary to PSP memory */ - memcpy(psp->fw_pri_buf, psp->sos_start_addr, psp->sos_bin_size); + psp_copy_fw(psp, psp->sos_start_addr, psp->sos_bin_size); /* Provide the PSP secure OS to bootloader */ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
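For context, the read-side guards in this patch pair with drm_dev_unplug() on the removal path. A minimal sketch of that write side (not part of this patch; the callback shape and teardown details are assumptions, only drm_dev_unplug()/drm_dev_put() are the real DRM core API):

	/* Sketch: drm_dev_unplug() marks the device unplugged and waits
	 * (via SRCU) for all open drm_dev_enter()/drm_dev_exit() sections
	 * to finish; new drm_dev_enter() calls then return false. */
	static void example_pci_remove(struct pci_dev *pdev)
	{
		struct drm_device *ddev = pci_get_drvdata(pdev);

		drm_dev_unplug(ddev);	/* stop new hardware accesses */
		/* ... tear down hw state, free resources ... */
		drm_dev_put(ddev);	/* struct drm_device stays alive
					 * until the last reference drops */
	}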
On 1/19/21 2:16 PM, Andrey Grodzovsky wrote:
On 1/19/21 1:59 PM, Christian König wrote:
On 19.01.21 at 19:22, Andrey Grodzovsky wrote:
On 1/19/21 1:05 PM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 4:35 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
There is really no other way according to this article https://lwn.net/...
"A perfect solution seems nearly impossible though; we cannot acquire a mutex on the user to prevent them from yanking a device and we cannot check for a presence change after every device access for performance reasons. "
But I assumed srcu_read_lock should be pretty seamless performance-wise, no?
The read side is supposed to be dirt cheap; the write side is where we just stall for all readers to eventually complete on their own. It should definitely be much cheaper than an mmio read. On the mmio write side it might actually hurt a bit. Otoh I think those don't stall the cpu by default when they're timing out, so maybe if the overhead is too much for those, we could omit them?
Maybe just do a small microbenchmark for these for testing, with a register that doesn't change hw state. So with and without drm_dev_enter/exit, and also one with the hw plugged out so that we have actual timeouts in the transactions. -Daniel
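For reference, the read-side pattern under discussion looks roughly like this (a minimal sketch; example_wreg() is a made-up name, the field names follow the patch above):

	/* Sketch of the SRCU read-side guard around an MMIO write. */
	static void example_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
	{
		int idx;

		/* Cheap SRCU section; fails once drm_dev_unplug() ran */
		if (!drm_dev_enter(&adev->ddev, &idx))
			return;		/* device gone - drop the access */

		writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));

		drm_dev_exit(idx);
	}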
So, say, writing to some harmless scratch register in a loop many times, both for the plugged and unplugged case, and measuring the total time delta?
I think we should at least measure the following:
- Writing X times to a scratch reg without your patch.
- Writing X times to a scratch reg with your patch.
- Writing X times to a scratch reg with the hardware physically disconnected.
Just realized, I can't test this part since I don't have an eGPU to yank out.
Andrey
I suggest repeating that once for Polaris (or older) and once for Vega or Navi.
The SRBM on Polaris is meant to introduce some delay in each access, so it might react differently than the newer hardware.
Christian.
Will do.
Andrey
The other solution would be, as I suggested, to keep all the device IO ranges reserved and system memory pages unfreed until the device is finalized in the driver, but Daniel said this would upset the PCI layer (the MMIO ranges reservation part).
Andrey
On 1/19/21 3:55 AM, Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
> This should prevent writing to memory or IO ranges possibly
> already allocated for other uses after our device is removed.

Wow, that adds quite some overhead to every register access. I'm not sure we can do this.

Christian.
[SNIP]
On 1/19/21 1:59 PM, Christian König wrote:
[SNIP]
I think we should at least measure the following:
- Writing X times to a scratch reg without your patch.
- Writing X times to a scratch reg with your patch.
- Writing X times to a scratch reg with the hardware physically disconnected.
I suggest repeating that once for Polaris (or older) and once for Vega or Navi.
The SRBM on Polaris is meant to introduce some delay in each access, so it might react differently than the newer hardware.
Christian.
See attached results and the testing code. Ran on Polaris (gfx8) and Vega10 (gfx9).
In summary, over 1 million WREG32 calls in a loop, with and without this patch, you get around 10ms of accumulated overhead (so a 0.00001 millisecond penalty for each WREG32) for using the drm_dev_enter check when writing registers.
P.S. Bullet 3 I cannot test, as I need an eGPU and currently I don't have one.
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 3763921..1650549 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -873,6 +873,11 @@ static int gfx_v8_0_ring_test_ring(struct amdgpu_ring *ring)
 	if (i >= adev->usec_timeout)
 		r = -ETIMEDOUT;
 
+	DRM_ERROR("Before write 1M times to scratch register");
+	for (i = 0; i < 1000000; i++)
+		WREG32(scratch, 0xDEADBEEF);
+	DRM_ERROR("After write 1M times to scratch register");
+
 error_free_scratch:
 	amdgpu_gfx_scratch_free(adev, scratch);
 	return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5f4805e..7ecbfef 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1063,6 +1063,11 @@ static int gfx_v9_0_ring_test_ring(struct amdgpu_ring *ring)
 	if (i >= adev->usec_timeout)
 		r = -ETIMEDOUT;
 
+	DRM_ERROR("Before write 1M times to scratch register");
+	for (i = 0; i < 1000000; i++)
+		WREG32(scratch, 0xDEADBEEF);
+	DRM_ERROR("After write 1M times to scratch register");
+
 error_free_scratch:
 	amdgpu_gfx_scratch_free(adev, scratch);
 	return r;
Andrey
[SNIP]
On 28.01.21 at 18:23, Andrey Grodzovsky wrote:
[SNIP]
See attached results and the testing code. Ran on Polaris (gfx8) and Vega10 (gfx9).

In summary, over 1 million WREG32 calls in a loop, with and without this patch, you get around 10ms of accumulated overhead (so a 0.00001 millisecond penalty for each WREG32) for using the drm_dev_enter check when writing registers.

P.S. Bullet 3 I cannot test, as I need an eGPU and currently I don't have one.
Well, if I'm not completely mistaken, those are 100ms of accumulated overhead, so around 100ns per write. An even bigger problem is that this is a ~67% increase.
I'm not sure how many writes we do during normal operation, but that sounds like a bit much. Ideas?
Christian.
On 1/29/21 10:16 AM, Christian König wrote:
[SNIP]
Well, if I'm not completely mistaken, those are 100ms of accumulated overhead, so around 100ns per write. An even bigger problem is that this is a ~67% increase.
My bad, and 67% from what? How did you calculate it?
I'm not sure how many writes we do during normal operation, but that sounds like a bit much. Ideas?
Well, you suggested to move the drm_dev_enter way up, but as I see it the problem with this is that it increases the chance of a race where the device is extracted after we check for drm_dev_enter (there is also such a chance even when it's placed inside WREG32, but it's lower). Earlier I proposed that instead of scattering all those guards over the code we simply delay the release of system memory pages and the unreserve of MMIO ranges until after the device itself is gone, once the last drm device reference is dropped. But Daniel opposes delaying the MMIO range unreserve to after the PCI remove code because according to him it will upset the PCI subsystem.
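Illustrated as a sketch (hypothetical names, not from the patch), the concern is the window between the check and the access:

	/* The wider the guarded section, the longer the window between
	 * the drm_dev_enter() check and the last access it protects.
	 * drm_dev_enter() only reflects drm_dev_unplug() state, so a
	 * physical yank inside the window is not caught. */
	if (!drm_dev_enter(ddev, &idx))
		return;			/* unplug already processed */

	prepare_lots_of_state();	/* device can be yanked here... */
	WREG32(reg, val);		/* ...and this write hits dead hw */

	drm_dev_exit(idx);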
Andrey
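(To illustrate the placement trade-off - a hedged sketch, loosely modeled on amdgpu's MMIO write helper; the real helper takes extra flags, and the field names here are assumptions:)

/* Fine-grained guard: each register write checks drm_dev_enter() right
 * before touching MMIO. The race window (device extracted between the
 * check and the access) is as small as it can get, but the check is
 * paid on every single write.
 */
void amdgpu_mm_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
{
        int idx;

        if (!drm_dev_enter(adev_to_drm(adev), &idx))
                return; /* device is gone - silently drop the write */

        writel(v, adev->rmmio + (reg << 2));

        drm_dev_exit(idx);
}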
Am 29.01.21 um 18:35 schrieb Andrey Grodzovsky:
[SNIP]
My bad, but 67% relative to what? How did you calculate it?
My bad: (308501-209689)/209689 = 47% increase.
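(For the record, and assuming the attached per-run totals are in microseconds - an assumption, since the attachment itself is not reproduced here: 308501 - 209689 = 98812 us of added time over 10^6 writes, i.e. roughly 99 ns of overhead per write, and 98812 / 209689 is approximately 0.47, hence the corrected ~47%.)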
[SNIP]
Yeah, that's most likely true as well.
Maybe Daniel has another idea when he's back from vacation.
Christian.
Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev on the same topic I added you to.
Andrey
On 1/29/21 2:25 PM, Christian König wrote:
[SNIP]
On Fri, Feb 5, 2021 at 5:22 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev on the same topic I added you to.
Summarizing my take from over there, plus maybe some more clarification. There are two problems:
- You must guarantee that after the ->remove callback of your driver is finished, there's no more mmio or any other hw access. A combination of stopping stuff and drm_dev_enter/exit can help with that. This prevents the use-after-free issue.
- For the actual hotunplug time, i.e. anything that can run while your driver is in use, up to the point where the ->remove callback has finished stopping hw access, you must guarantee that the code doesn't blow up when it gets bogus reads (in the form of 0xff values). drm_dev_enter/exit can't help you with that. Plus you should make sure that we're not spending forever waiting for a big pile of mmio accesses to all time out because you never bail out - some coarse-grained drm_dev_enter/exit might help here.
Plus, finally, the userspace access problem: you must guarantee that after ->remove has finished, none of the uapi or cross-driver access points (driver ioctl, dma-buf, dma-fence, anything else that hangs around) can reach the data structures/memory mappings/whatever that have been released as part of your ->remove callback. drm_dev_enter/exit is again the tool of choice here.
So you have to use drm_dev_enter/exit for some of the problems we face on hotunplug, but it's not the tool that can handle the actual hw hotunplug race conditions for you.
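(As a sketch of the coarse-grained variant - the handler name and its body are hypothetical, for illustration only:)

/* Coarse-grained guard: bracket a whole uapi entry point with
 * drm_dev_enter()/drm_dev_exit(). Once drm_dev_unplug() has been called
 * from ->remove, the whole operation bails out early with -ENODEV
 * instead of timing out on a long series of dead-MMIO accesses inside.
 */
static int example_ioctl(struct drm_device *dev, void *data,
                         struct drm_file *filp)
{
        int idx, ret;

        if (!drm_dev_enter(dev, &idx))
                return -ENODEV; /* device was unplugged */

        ret = do_hw_work(dev, data); /* placeholder for the real work */

        drm_dev_exit(idx);
        return ret;
}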
Unfortunately the hw hotunplug race condition is an utter pain to test, since essentially you need to validate your driver against spurious 0xff reads at any moment. And I don't even have a clever idea to simulate this, e.g. by forcefully replacing the iobar mapping: What we'd need is a mapping that allows reads (so we can fill a page with 0xff and use that everywhere), but instead of rejecting writes, allows them, but drops them (so that the 0xff stays intact). Maybe we could simulate this with some kernel debug tricks (kinda like mmiotrace) with a read-only mapping and dropping every write every time we fault. But ugh ...
Otoh validating an entire driver like amdgpu without such a trick against 0xff reads is practically impossible. So maybe you need to add this as one of the tasks here? -Daniel
On 2/5/21 5:10 PM, Daniel Vetter wrote:
On Fri, Feb 5, 2021 at 5:22 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
[SNIP]
Not sure it's not a dumb idea, but still worth asking - what if I just quietly return early from the .remove callback without doing anything there? The driver will not be aware that the device was removed and will at least try to continue working as usual, including IOCTLs, job scheduling, etc. On the other hand, all MMIO read accesses will start returning ~0. Regarding rejecting writes - I don't see anywhere that we test the result of a write (e.g. amdgpu_mm_wreg8), so it seems they will just seamlessly go through... Or is it that the pci_dev will be freed by the PCI core itself, so I would immediately crash?
Andrey
On Sat, Feb 6, 2021 at 12:09 AM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
[SNIP]
This still requires that you physically unplug the device, so it's not something you can do in CI. Plus it doesn't allow you to easily fake a hotunplug in the middle of something interesting like an atomic modeset commit. If you instead punch out the mmio mapping with some pte trick, you can intercept the faults and count down until you actually switch over to only returning 0xff. This allows you to sweep through entire complex execution flows, so you have a guarantee you've actually caught everything.
If otoh you just hotunplug and don't clean up (or, equivalently, insert a long sleep at the beginning of your ->remove hook), then all you verify is that each operation has a check at its beginning that bails out.
It's better than nothing for prototyping, but I don't think it's useful in a CI setting to assure stuff stays fixed. -Daniel
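(A rough driver-level illustration of the countdown idea - not the pte trick itself; the helper and counter are made-up names for a sketch:)

/* Fault injection for unplug testing: after a configurable number of
 * register reads, start pretending the device fell off the bus by
 * returning all-ones. A test harness can sweep the switch-over point
 * through an entire execution flow, one step further per run.
 */
static atomic_t fake_unplug_countdown = ATOMIC_INIT(INT_MAX);

static u32 fault_inject_rreg(struct amdgpu_device *adev, u32 reg)
{
        /* Once the countdown reaches zero it stays there, and every
         * further read behaves like a read from a removed device.
         */
        if (atomic_dec_if_positive(&fake_unplug_countdown) < 0)
                return ~0;

        return readl(adev->rmmio + (reg << 2));
}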
On 2/5/21 5:10 PM, Daniel Vetter wrote:
[SNIP]
Clarification - as far as I know there are no page fault handlers for kernel mappings. And we are talking about kernel mappings here, right? If there were, I could solve all those issues the same way I do for user mappings: by invalidating all existing mappings in the kernel (both kmaps and ioremaps) and inserting a dummy zero- or ~0-filled page instead. Also, I assume forcefully remapping the IO BAR to a ~0-filled page would involve the ioremap API, and it's not something that I think can be easily done, according to an answer I got on a related topic a few weeks ago: https://www.spinics.net/lists/linux-pci/msg103396.html (that was the only reply I got).
But ugh ...
Otoh validating an entire driver like amdgpu without such a trick against 0xff reads is practically impossible. So maybe you need to add this as one of the tasks here?
Or I could just, for validation purposes, return ~0 from all reg reads in the code and ignore writes if drm_dev_unplugged; this could already easily validate a big portion of the code flow under such a scenario.
Andrey
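(Something like the following is presumably what's meant - a hedged sketch; amdgpu's real read helper has a different signature, and drm_dev_enter() is reused here as the "unplugged" test:)

/* Validation-only behavior: once the device is marked unplugged, every
 * register read returns ~0, so the rest of the driver can be exercised
 * against the values a genuinely absent device would produce.
 */
static u32 validate_rreg(struct amdgpu_device *adev, u32 reg)
{
        u32 val = ~0; /* what a removed PCI device returns on reads */
        int idx;

        if (drm_dev_enter(adev_to_drm(adev), &idx)) {
                val = readl(adev->rmmio + (reg << 2));
                drm_dev_exit(idx);
        }
        return val;
}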
On Sun, Feb 7, 2021 at 10:28 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
[SNIP]
mmiotrace can, but only for debug, and only on x86 platforms:
https://www.kernel.org/doc/html/latest/trace/mmiotrace.html
Should be feasible (but maybe not worth the effort) to extend this to support fake unplug.
[SNIP]
Hm yeah, if you really wrap them all, that should work too. Since io mappings have the __iomem pointer type, as long as amdgpu is sparse-warning free it should be doable to guarantee this. -Daniel
Am 07.02.21 um 22:50 schrieb Daniel Vetter:
[SNIP]
Mhm, interesting idea you guys brought up here.
We don't need a page fault for this to work; all we need to do is insert dummy PTEs into the kernel's page table at the place where the MMIO mapping previously was.
[SNIP]
Problem is that ~0 is not always a valid register value.
You would need to audit every register read to check that it doesn't use the returned value blindly as an index or similar. That is quite a bit of work.
Regards, Christian.
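(The failure mode in miniature - an entirely made-up register and table, just to show the pattern such an audit would have to catch:)

/* Why a blanket ~0 read value is dangerous: code that trusts a read
 * as an index runs far past the end of its table once the device is
 * gone and every read returns 0xffffffff.
 */
static u32 pick_vco_setting(struct amdgpu_device *adev)
{
        static const u32 vco_table[8] = { 0 /* ... divider settings ... */ };
        u32 sel = RREG32(mmEXAMPLE_PLL_STATUS); /* ~0 once unplugged */

        /* Out of bounds on unplug: sel == 0xffffffff. An audit would
         * rewrite this as vco_table[sel & 0x7] or add a bounds check.
         */
        return vco_table[sel];
}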
On Mon, Feb 08, 2021 at 10:37:19AM +0100, Christian König wrote:
[SNIP]
The pte trick by itself isn't enough, because we need to: - drop all writes silently - have all reads return 0xff
ptes can't do that by themselves; at minimum we need write protection, and then to silently proceed on each write fault without restarting the instruction. Better would be to only catch reads, but x86 doesn't do write-only pte permissions afaik.
[SNIP]
Yeah that's the entire crux here :-/ -Daniel
Am 08.02.21 um 10:48 schrieb Daniel Vetter:
[SNIP]
You are not thinking far enough :)
The dummy PTE points to a dummy MMIO page which is just never used.
That has the exact same properties as our removed MMIO space, it just doesn't go bananas when a new device gets MMIO-mapped into that range and our driver still tries to write there.
Regards, Christian.
On Mon, Feb 08, 2021 at 11:03:15AM +0100, Christian König wrote:
[SNIP]
Hm, but where do we get such a "guaranteed never used" mmio page from?
It's a nifty idea indeed otherwise ... -Daniel
Am 08.02.21 um 11:11 schrieb Daniel Vetter:
[SNIP]
Well we have tons of unused IO space on 64bit systems these days.
It doesn't really need to be PCIe address space, does it?
Christian.
On Mon, Feb 8, 2021 at 3:00 PM Christian König christian.koenig@amd.com wrote:
[SNIP]
That sounds very trusting towards modern systems not decoding random ranges. E.g. the pci code stopped extending the host bridge windows on its own, entirely relying on the acpi-provided ranges, to avoid stomping on stuff that's there but not listed anywhere.
I guess if we have a range behind a pci bridge which isn't used by any device but is decoded by the bridge, then that should be safe enough. Maybe we could even have an option in upstream to do that on unplug, if a certain flag is set, or a cmdline option. -Daniel
On 2/8/21 11:23 AM, Daniel Vetter wrote:
[SNIP]
Question - why can't we just set those PTEs to point to system memory (another RO dummy page) filled with 1s?
Andrey
Am 08.02.21 um 23:15 schrieb Andrey Grodzovsky:
On 2/8/21 11:23 AM, Daniel Vetter wrote:
On Mon, Feb 8, 2021 at 3:00 PM Christian König christian.koenig@amd.com wrote:
Am 08.02.21 um 11:11 schrieb Daniel Vetter:
On Mon, Feb 08, 2021 at 11:03:15AM +0100, Christian König wrote:
Am 08.02.21 um 10:48 schrieb Daniel Vetter:
On Mon, Feb 08, 2021 at 10:37:19AM +0100, Christian König wrote: > Am 07.02.21 um 22:50 schrieb Daniel Vetter: >> [SNIP] >>> Clarification - as far as I know there are no page fault >>> handlers for kernel >>> mappings. And we are talking about kernel mappings here, right >>> ? If there were >>> I could solve all those issues the same as I do for user >>> mappings, by >>> invalidating all existing mappings in the kernel (both kmaps >>> and ioreamps)and >>> insert dummy zero or ~0 filled page instead. >>> Also, I assume forcefully remapping the IO BAR to ~0 filled >>> page would involve >>> ioremap API and it's not something that I think can be easily >>> done according to >>> am answer i got to a related topic a few weeks ago >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinic... >>> (that was the only reply >>> i got) >> mmiotrace can, but only for debug, and only on x86 platforms: >> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel... >> >> >> Should be feasible (but maybe not worth the effort) to extend >> this to >> support fake unplug. > Mhm, interesting idea you guys brought up here. > > We don't need a page fault for this to work, all we need to do > is to insert > dummy PTEs into the kernels page table at the place where > previously the > MMIO mapping has been. Simply pte trick isn't enough, because we need:
- drop all writes silently
- all reads return 0xff
ptes can't do that themselves, we minimally need write protection and then silently proceed on each write fault without restarting the instruction. Better would be to only catch reads, but x86 doesn't do write-only pte permissions afaik.
You are not thinking far enough :)
The dummy PTE points to a dummy MMIO page which is just never used.
That has the exact same properties as our removed MMIO space, it just doesn't go bananas when a new device is MMIO mapped into that range and our driver still tries to write there.
Hm, but where do we get such a "guaranteed never used" mmio page from?
Well we have tons of unused IO space on 64bit systems these days.
It doesn't really need to be PCIe address space, does it?
That sounds very trusting to modern systems not decoding random ranges. E.g. the pci code stopped extending the host bridge windows on its own, entirely relying on the acpi provided ranges, to avoid stomping on stuff that's there but not listed anywhere.
I guess if we have a range behind a pci bridge, which isn't used by any device, but decoded by the bridge, then that should be safe enough. Maybe could even have an option in upstream to do that on unplug, if a certain flag is set, or a cmdline option. -Daniel
Question - Why can't we just set those PTEs to point to system memory (another RO dummy page) filled with 1s ?
Then writes are not discarded. E.g. the 1s would change to something else.
Christian.
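(A tiny illustration of Christian's point, as a hypothetical kernel snippet: real dead MMIO keeps returning all 1s forever, while a RAM-backed dummy page remembers whatever was last written:

	u32 *p = (u32 *)get_zeroed_page(GFP_KERNEL);	/* ordinary RAM page, NULL check omitted */

	memset(p, 0xff, PAGE_SIZE);	/* pre-fill the dummy page with 1s */
	p[0] = 0;			/* the write is stored, not discarded... */
	WARN_ON(p[0] != 0xffffffff);	/* ...so this now triggers: the read sees 0 */

So a read-only system-memory page of 1s only works if writes can also be swallowed somehow.)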
Andrey
Christian.
It's a nifty idea indeed otherwise ... -Daniel
Regards, Christian.
[SNIP]
On 2/9/21 2:58 AM, Christian König wrote:
Am 08.02.21 um 23:15 schrieb Andrey Grodzovsky:
On 2/8/21 11:23 AM, Daniel Vetter wrote:
On Mon, Feb 8, 2021 at 3:00 PM Christian König christian.koenig@amd.com wrote:
Am 08.02.21 um 11:11 schrieb Daniel Vetter:
On Mon, Feb 08, 2021 at 11:03:15AM +0100, Christian König wrote:
Am 08.02.21 um 10:48 schrieb Daniel Vetter:
[SNIP]
You are not thinking far enough :)
The dummy PTE points to a dummy MMIO page which is just never used.
That has the exact same properties as our removed MMIO space, it just doesn't go bananas when a new device is MMIO mapped into that range and our driver still tries to write there.
Hm, but where do we get such a "guaranteed never used" mmio page from?
Well we have tons of unused IO space on 64bit systems these days.
It doesn't really need to be PCIe address space, does it?
That sounds very trusting to modern systems not decoding random ranges. E.g. the pci code stopped extending the host bridge windows on its own, entirely relying on the acpi provided ranges, to avoid stomping on stuff that's there but not listed anywhere.
I guess if we have a range behind a pci bridge, which isn't used by any device, but decoded by the bridge, then that should be safe enough. Maybe could even have an option in upstream to do that on unplug, if a certain flag is set, or a cmdline option. -Daniel
Question - Why can't we just set those PTEs to point to system memory (another RO dummy page) filled with 1s ?
Then writes are not discarded. E.g. the 1s would change to something else.
Christian.
I see, but what about marking the mappings as RO and discarding the write-access page faults continuously until the device is finalized? Regarding using an unused range behind the upper bridge as Daniel suggested, I wonder whether this will interfere with the upcoming feature to support BAR movement during hotplug - https://www.spinics.net/lists/linux-pci/msg103195.html ?
Andrey
Andrey
Christian.
It's a nifty idea indeed otherwise ... -Daniel
Regards, Christian.
[SNIP]
Am 09.02.21 um 15:30 schrieb Andrey Grodzovsky:
[SNIP]
Question - Why can't we just set those PTEs to point to system memory (another RO dummy page) filled with 1s ?
Then writes are not discarded. E.g. the 1s would change to something else.
Christian.
I see, but what about marking the mappings as RO and discarding the write-access page faults continuously until the device is finalized? Regarding using an unused range behind the upper bridge as Daniel suggested, I wonder whether this will interfere with the upcoming feature to support BAR movement during hotplug - https://www.spinics.net/lists/linux-pci/msg103195.html ?
In the picture in my head the address will never be used.
But it doesn't even need to be an unused range of the root bridge. That one is usually stuffed full by the BIOS.
For my BAR resize work I looked at quite a bunch of NB documentation. At least for AMD CPUs we should always have an MMIO address which is never ever used. That makes this platform/CPU dependent, but the code is just minimal.
The really really nice thing about this approach is that you could unit test and audit the code for problems even without *real* hotplug hardware. E.g. we can swap the kernel PTEs and see how the whole power and display code reacts to that etc....
Christian.
Andrey
Andrey
Christian.
It's a nifty idea indeed otherwise ... -Daniel
[SNIP]
On 2/9/21 10:40 AM, Christian König wrote:
Am 09.02.21 um 15:30 schrieb Andrey Grodzovsky:
[SNIP]
Question - Why can't we just set those PTEs to point to system memory (another RO dummy page) filled with 1s ?
Then writes are not discarded. E.g. the 1s would change to something else.
Christian.
I see, but what about marking the mappings as RO and discarding the write-access page faults continuously until the device is finalized? Regarding using an unused range behind the upper bridge as Daniel suggested, I wonder whether this will interfere with the upcoming feature to support BAR movement during hotplug - https://www.spinics.net/lists/linux-pci/msg103195.html ?
In the picture in my head the address will never be used.
But it doesn't even need to be an unused range of the root bridge. That one is usually stuffed full by the BIOS.
For my BAR resize work I looked at quite a bunch of NB documentation. At least for AMD CPUs we should always have an MMIO address which is never ever used. That makes this platform/CPU dependent, but the code is just minimal.
The really really nice thing about this approach is that you could unit test and audit the code for problems even without *real* hotplug hardware. E.g. we can swap the kernel PTEs and see how the whole power and display code reacts to that etc....
Christian.
Tried to play with hacking the mmio tracer as Daniel suggested but it just hung the machine so...

Can you tell me how to dynamically obtain this kind of unused MMIO address? Given such an address, where writes are dropped and reads return all 1s, I could then do something like below, if that's what you meant -

for (address = (unsigned long)adev->rmmio;
     address < (unsigned long)adev->rmmio + adev->rmmio_size;
     address += PAGE_SIZE) {
	old_pte = get_locked_pte(&init_mm, address, &ptl);	/* kernel PTE of this page */
	set_pte(old_pte, pfn_pte(fake_mmio_pfn, PAGE_KERNEL_IO));	/* reroute to the dummy pfn */
	pte_unmap_unlock(old_pte, ptl);
}
flush_tlb_kernel_range((unsigned long)adev->rmmio, (unsigned long)adev->rmmio + adev->rmmio_size);

P.S. I hope to obtain a Thunderbolt eGPU adapter soon, so even if this won't work I will still be able to test how the driver handles all 1s.
Andrey
Andrey
Andrey
Christian.
[SNIP]
Ping
Andrey
On 2/10/21 5:01 PM, Andrey Grodzovsky wrote:
[SNIP]
On 2/8/21 4:37 AM, Christian König wrote:
Am 07.02.21 um 22:50 schrieb Daniel Vetter:
[SNIP]
Clarification - as far as I know there are no page fault handlers for kernel mappings. And we are talking about kernel mappings here, right? If there were, I could solve all those issues the same as I do for user mappings, by invalidating all existing mappings in the kernel (both kmaps and ioremaps) and inserting a dummy zero or ~0 filled page instead. Also, I assume forcefully remapping the IO BAR to a ~0 filled page would involve the ioremap API and it's not something that I think can be easily done, according to an answer I got to a related topic a few weeks ago https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinic... (that was the only reply I got)
mmiotrace can, but only for debug, and only on x86 platforms:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel...
Should be feasible (but maybe not worth the effort) to extend this to support fake unplug.
Mhm, interesting idea you guys brought up here.
We don't need a page fault for this to work, all we need to do is to insert dummy PTEs into the kernels page table at the place where previously the MMIO mapping has been.
But that is exactly what Mathew from linux-mm says is not a trivial thing to do, quote:

"ioremap() is done through the vmalloc space. It would, in theory, be possible to reprogram the page tables used for vmalloc to point to your magic page. I don't think we have such a mechanism today, and there are lots of problems with things like TLB flushes. It's probably going to be harder than you think."
If you believe it's actually doable then it would be useful not only for simulating the device-unplugged situation with all MMIOs returning 0xff..., but also for actually handling driver accesses to MMIO after the device is gone, and we could then drop this patch entirely as there would be no need to guard against such accesses post device unplug.
But ugh ...
Otoh validating an entire driver like amdgpu without such a trick against 0xff reads is practically impossible. So maybe you need to add this as one of the tasks here?
Or I could, just for validation purposes, return ~0 from all reg reads in the code and ignore writes if drm_dev_unplugged; this could already easily validate a big portion of the code flow under such a scenario.
Hm yeah if you really wrap them all, that should work too. Since io mappings have __iomem pointer type, as long as amdgpu is sparse warning free, it should be doable to guarantee this.
Problem is that ~0 is not always a valid register value.
You would need to audit every register read that it doesn't use the returned value blindly as index or similar. That is quite a bit of work.
But ~0 is the value that will be returned for every read post device unplug, regardless of whether it's valid or not, and we have to cope with it then, no?
Andrey
Regards, Christian.
-Daniel
Andrey
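(For reference, the validation wrapper Andrey describes could look roughly like this. A sketch only, with made-up foo_* names rather than actual amdgpu code; drm_dev_is_unplugged() is the existing DRM helper:

	#include <drm/drm_drv.h>
	#include <linux/io.h>

	struct foo_device {
		struct drm_device *ddev;
		void __iomem *rmmio;
	};

	static inline u32 foo_rreg32(struct foo_device *fdev, u32 reg)
	{
		if (drm_dev_is_unplugged(fdev->ddev))
			return ~0U;	/* pretend the MMIO is gone: reads give all 1s */
		return readl(fdev->rmmio + reg);
	}

	static inline void foo_wreg32(struct foo_device *fdev, u32 reg, u32 val)
	{
		if (drm_dev_is_unplugged(fdev->ddev))
			return;		/* drop the write silently */
		writel(val, fdev->rmmio + reg);
	}

This exercises the 0xff code paths without needing real hotplug hardware, which is exactly the validation value being discussed.)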
Am 08.02.21 um 23:09 schrieb Andrey Grodzovsky:
On 2/8/21 4:37 AM, Christian König wrote:
Am 07.02.21 um 22:50 schrieb Daniel Vetter:
[SNIP]
But that is exactly what Mathew from linux-mm says is not a trivial thing to do, quote:

"ioremap() is done through the vmalloc space. It would, in theory, be possible to reprogram the page tables used for vmalloc to point to your magic page. I don't think we have such a mechanism today, and there are lots of problems with things like TLB flushes. It's probably going to be harder than you think."
I haven't followed the full discussion, but I don't see much preventing this.
All you need is a new ioremap_dummy() function which takes the old start and length of the mapping.
Still a bit of core and maybe even platform code, but rather useful I think.
Christian.
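(A sketch of what such a helper might look like. ioremap_dummy() does not exist today; this just illustrates the suggested semantics, essentially the PTE-swap loop from elsewhere in the thread wrapped as a function, and it assumes a platform-provided, never-decoded dummy_pfn and an x86-flavoured PAGE_KERNEL_IO protection:

	static void ioremap_dummy(void __iomem *base, size_t size, unsigned long dummy_pfn)
	{
		unsigned long addr = (unsigned long)base, end = addr + size;
		spinlock_t *ptl;
		pte_t *ptep;

		for (; addr < end; addr += PAGE_SIZE) {
			ptep = get_locked_pte(&init_mm, addr, &ptl);
			set_pte(ptep, pfn_pte(dummy_pfn, PAGE_KERNEL_IO));	/* reroute the page */
			pte_unmap_unlock(ptep, ptl);
		}
		flush_tlb_kernel_range((unsigned long)base, end);	/* an imperfect flush is tolerable here */
	}

As noted below, races and imperfect TLB flushing are acceptable for this use case.)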
[SNIP]
On Tue, Feb 09, 2021 at 09:27:03AM +0100, Christian König wrote:
Am 08.02.21 um 23:09 schrieb Andrey Grodzovsky:
On 2/8/21 4:37 AM, Christian König wrote:
Am 07.02.21 um 22:50 schrieb Daniel Vetter:
[SNIP]
I haven't followed the full discussion, but I don't see much preventing this.
All you need is a new ioremap_dummy() function which takes the old start and length of the mapping.
Still a bit of core and maybe even platform code, but rather useful I think.
Yeah we don't care about races, so if the tlb flushing isn't perfect that's fine.
Also if we glue this into the mmiotrace infrastructure, that already has all the fault handling. So on x86 I think we could even make it perfect (but that feels like overkill) and fully atomic. Plus the mmiotrace overhead (even if we don't capture anything) is probably a bit much even for testing in CI or somewhere like that. -Daniel
[SNIP]
From: Luben Tuikov luben.tuikov@amd.com
This patch does not change current behaviour.
The driver's job timeout handler now returns status indicating back to the DRM layer whether the task (job) was successfully aborted or whether more time should be given to the task to complete.
Default behaviour, as of this patch, is preserved, except in the obvious-by-comment case in the Panfrost driver, as documented below.
All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been updated accordingly and return the would've-been default value of DRM_TASK_STATUS_ALIVE to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch.
In the case of the Panfrost driver, its timedout callback correctly first checks if the job had completed in due time and if so, it now returns DRM_TASK_STATUS_COMPLETE to notify the DRM layer that the task can be moved to the done list, to be freed later. In the other two subsequent checks, the value of DRM_TASK_STATUS_ALIVE is returned, as per the default behaviour.
More involved driver solutions can be had in subsequent patches.
v2: Use enum as the status of a driver's job timeout callback method.
v4: (By Andrey Grodzovsky) Replace DRM_TASK_STATUS_COMPLETE with DRM_TASK_STATUS_ENODEV to enable a hint to the scheduler for when NOT to rearm the timeout timer.
Cc: Alexander Deucher Alexander.Deucher@amd.com
Cc: Andrey Grodzovsky Andrey.Grodzovsky@amd.com
Cc: Christian König christian.koenig@amd.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Lucas Stach l.stach@pengutronix.de
Cc: Russell King linux+etnaviv@armlinux.org.uk
Cc: Christian Gmeiner christian.gmeiner@gmail.com
Cc: Qiang Yu yuq825@gmail.com
Cc: Rob Herring robh@kernel.org
Cc: Tomeu Vizoso tomeu.vizoso@collabora.com
Cc: Steven Price steven.price@arm.com
Cc: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Cc: Eric Anholt eric@anholt.net
Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Luben Tuikov luben.tuikov@amd.com
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 ++++--
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++++++++-
 drivers/gpu/drm/lima/lima_sched.c       |  4 +++-
 drivers/gpu/drm/panfrost/panfrost_job.c |  9 ++++++---
 drivers/gpu/drm/scheduler/sched_main.c  |  4 +---
 drivers/gpu/drm/v3d/v3d_sched.c         | 32 +++++++++++++++++---------------
 include/drm/gpu_scheduler.h             | 17 ++++++++++++++---
 7 files changed, 54 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101..a111326 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)
+static enum drm_task_status amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 	    amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-		return;
+		return DRM_TASK_STATUS_ALIVE;
 	}
 
 	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
 		amdgpu_device_gpu_recover(ring->adev, job);
+		return DRM_TASK_STATUS_ALIVE;
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
 			adev->virt.tdr_debug = true;
+		return DRM_TASK_STATUS_ALIVE;
 	}
 }
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index cd46c88..c495169 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -82,7 +82,8 @@ static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
 	return fence;
 }
 
-static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
+static enum drm_task_status etnaviv_sched_timedout_job(struct drm_sched_job
+						       *sched_job)
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
 	struct etnaviv_gpu *gpu = submit->gpu;
@@ -120,9 +121,16 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 
 	drm_sched_resubmit_jobs(&gpu->sched);
 
+	/* Tell the DRM scheduler that this task needs
+	 * more time.
+	 */
+	drm_sched_start(&gpu->sched, true);
+	return DRM_TASK_STATUS_ALIVE;
+
 out_no_timeout:
 	/* restart scheduler after GPU is usable again */
 	drm_sched_start(&gpu->sched, true);
+	return DRM_TASK_STATUS_ALIVE;
 }
 
 static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 63b4c56..66d9236 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -415,7 +415,7 @@ static void lima_sched_build_error_task_list(struct lima_sched_task *task)
 	mutex_unlock(&dev->error_task_list_lock);
 }
 
-static void lima_sched_timedout_job(struct drm_sched_job *job)
+static enum drm_task_status lima_sched_timedout_job(struct drm_sched_job *job)
 {
 	struct lima_sched_pipe *pipe = to_lima_pipe(job->sched);
 	struct lima_sched_task *task = to_lima_task(job);
@@ -449,6 +449,8 @@ static void lima_sched_timedout_job(struct drm_sched_job *job)
 
 	drm_sched_resubmit_jobs(&pipe->base);
 	drm_sched_start(&pipe->base, true);
+
+	return DRM_TASK_STATUS_ALIVE;
 }
 
 static void lima_sched_free_job(struct drm_sched_job *job)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 04e6f6f..10d41ac 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -432,7 +432,8 @@ static void panfrost_scheduler_start(struct panfrost_queue_state *queue)
 	mutex_unlock(&queue->lock);
 }
 
-static void panfrost_job_timedout(struct drm_sched_job *sched_job)
+static enum drm_task_status panfrost_job_timedout(struct drm_sched_job
+						  *sched_job)
 {
 	struct panfrost_job *job = to_panfrost_job(sched_job);
 	struct panfrost_device *pfdev = job->pfdev;
@@ -443,7 +444,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return;
+		return DRM_TASK_STATUS_ALIVE;
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
 		js,
@@ -455,11 +456,13 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
 
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return;
+		return DRM_TASK_STATUS_ALIVE;
 
 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1))
 		schedule_work(&pfdev->reset.work);
+
+	return DRM_TASK_STATUS_ALIVE;
 }
 
 static const struct drm_sched_backend_ops panfrost_sched_ops = {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92637b7..73fccc5 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -527,7 +527,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 EXPORT_SYMBOL(drm_sched_start);
 
 /**
- * drm_sched_resubmit_jobs - helper to relunch job from pending ring list
+ * drm_sched_resubmit_jobs - helper to relaunch jobs from the pending list
  *
  * @sched: scheduler instance
  *
@@ -561,8 +561,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
 		} else {
 			s_job->s_fence->parent = fence;
 		}
-
-	}
 }
 EXPORT_SYMBOL(drm_sched_resubmit_jobs);
 
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 452682e..3740665e 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -259,7 +259,7 @@ v3d_cache_clean_job_run(struct drm_sched_job *sched_job)
 	return NULL;
 }
 
-static void
+static enum drm_task_status
 v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 {
 	enum v3d_queue q;
@@ -285,6 +285,8 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 	}
 
 	mutex_unlock(&v3d->reset_lock);
+
+	return DRM_TASK_STATUS_ALIVE;
 }
 
 /* If the current address or return address have changed, then the GPU
@@ -292,7 +294,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
  * could fail if the GPU got in an infinite loop in the CL, but that
  * is pretty unlikely outside of an i-g-t testcase.
  */
-static void
+static enum drm_task_status
 v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 		    u32 *timedout_ctca, u32 *timedout_ctra)
 {
@@ -304,39 +306,39 @@ v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
 	if (*timedout_ctca != ctca || *timedout_ctra != ctra) {
 		*timedout_ctca = ctca;
 		*timedout_ctra = ctra;
-		return;
+		return DRM_TASK_STATUS_ALIVE;
 	}
 
-	v3d_gpu_reset_for_timeout(v3d, sched_job);
+	return v3d_gpu_reset_for_timeout(v3d, sched_job);
 }
 
-static void
+static enum drm_task_status
 v3d_bin_job_timedout(struct drm_sched_job *sched_job)
 {
 	struct v3d_bin_job *job = to_bin_job(sched_job);
 
-	v3d_cl_job_timedout(sched_job, V3D_BIN,
-			    &job->timedout_ctca, &job->timedout_ctra);
+	return v3d_cl_job_timedout(sched_job, V3D_BIN,
+				   &job->timedout_ctca, &job->timedout_ctra);
 }
 
-static void
+static enum drm_task_status
 v3d_render_job_timedout(struct drm_sched_job *sched_job)
 {
 	struct v3d_render_job *job = to_render_job(sched_job);
 
-	v3d_cl_job_timedout(sched_job, V3D_RENDER,
-			    &job->timedout_ctca, &job->timedout_ctra);
+	return v3d_cl_job_timedout(sched_job, V3D_RENDER,
+				   &job->timedout_ctca, &job->timedout_ctra);
}
 
-static void
+static enum drm_task_status
 v3d_generic_job_timedout(struct drm_sched_job *sched_job)
 {
 	struct v3d_job *job = to_v3d_job(sched_job);
 
-	v3d_gpu_reset_for_timeout(job->v3d, sched_job);
+	return v3d_gpu_reset_for_timeout(job->v3d, sched_job);
 }
 
-static void
+static enum drm_task_status
 v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 {
 	struct v3d_csd_job *job = to_csd_job(sched_job);
@@ -348,10 +350,10 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 	 */
 	if (job->timedout_batches != batches) {
 		job->timedout_batches = batches;
-		return;
+		return DRM_TASK_STATUS_ALIVE;
 	}
 
-	v3d_gpu_reset_for_timeout(v3d, sched_job);
+	return v3d_gpu_reset_for_timeout(v3d, sched_job);
 }
 
 static const struct drm_sched_backend_ops v3d_bin_sched_ops = {
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 975e8a6..3ba36bc 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -206,6 +206,11 @@ static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
 	return s_job && atomic_inc_return(&s_job->karma) > threshold;
 }
 
+enum drm_task_status {
+	DRM_TASK_STATUS_ENODEV,
+	DRM_TASK_STATUS_ALIVE
+};
+
 /**
  * struct drm_sched_backend_ops
  *
@@ -230,10 +235,16 @@ struct drm_sched_backend_ops {
 	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
 
 	/**
-	 * @timedout_job: Called when a job has taken too long to execute,
-	 * to trigger GPU recovery.
+	 * @timedout_job: Called when a job has taken too long to execute,
+	 * to trigger GPU recovery.
+	 *
+	 * Return DRM_TASK_STATUS_ALIVE, if the task (job) is healthy
+	 * and executing in the hardware, i.e. it needs more time.
+	 *
+	 * Return DRM_TASK_STATUS_ENODEV, if the task (job) has
+	 * been aborted.
 	 */
-	void (*timedout_job)(struct drm_sched_job *sched_job);
+	enum drm_task_status (*timedout_job)(struct drm_sched_job *sched_job);
 
 	/**
	 * @free_job: Called once the job's finished fence has been signaled
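(As an illustration of how a driver might eventually use the ENODEV hint, here is a sketch with made-up foo_* names, not part of this patch; drm_dev_is_unplugged() is the existing DRM helper:

	static enum drm_task_status foo_job_timedout(struct drm_sched_job *sched_job)
	{
		struct foo_job *job = to_foo_job(sched_job);	/* hypothetical helper */

		/* Device is gone: abort and hint the scheduler
		 * not to rearm the timeout timer.
		 */
		if (drm_dev_is_unplugged(job->ddev))
			return DRM_TASK_STATUS_ENODEV;

		foo_gpu_reset(job->fdev);			/* hypothetical recovery path */
		return DRM_TASK_STATUS_ALIVE;
	}

This is the unplug use case the v4 note above alludes to.)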
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
From: Luben Tuikov luben.tuikov@amd.com
This patch does not change current behaviour.
The driver's job timeout handler now returns status indicating back to the DRM layer whether the task (job) was successfully aborted or whether more time should be given to the task to complete.
Default behaviour, as of this patch, is preserved, except in the obvious-by-comment case in the Panfrost driver, as documented below.
All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been updated accordingly and return the would've-been default value of DRM_TASK_STATUS_ALIVE to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch.
In the case of the Panfrost driver, its timedout callback correctly first checks if the job had completed in due time and if so, it now returns DRM_TASK_STATUS_COMPLETE to notify the DRM layer that the task can be moved to the done list, to be freed later. In the other two subsequent checks, the value of DRM_TASK_STATUS_ALIVE is returned, as per the default behaviour.
More involved driver solutions can be had in subsequent patches.
v2: Use enum as the status of a driver's job timeout callback method.
v4: (By Andrey Grodzovsky) Replace DRM_TASK_STATUS_COMPLETE with DRM_TASK_STATUS_ENODEV to enable a hint to the scheduler for when NOT to rearm the timeout timer.
As Lukas pointed out returning the job (or task) status doesn't make much sense.
What we return here is the status of the scheduler.
I would either rename the enum or completely drop it and return a negative error status.
Apart from that looks fine to me, Christian.
[SNIP]
On 2021-01-19 2:53 a.m., Christian König wrote:
Am 18.01.21 um 22:01 schrieb Andrey Grodzovsky:
[SNIP]
As Lukas pointed out returning the job (or task) status doesn't make much sense.
What we return here is the status of the scheduler.
I would either rename the enum or completely drop it and return a negative error status.
Yes, that could be had.
Although, dropping the enum and returning [-1, 0] might make the meaning of the return status vague. Using an enum with an appropriate name makes the intention clear to the next programmer.
Now, Andrey did rename one of the enumerated values to DRM_TASK_STATUS_ENODEV, perhaps the same but with:
enum drm_sched_status {
	DRM_SCHED_STAT_NONE,	/* Reserve 0 */
	DRM_SCHED_STAT_NOMINAL,
	DRM_SCHED_STAT_ENODEV,
};
and also renaming the enum to the above would be acceptable?
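(With that rename, the callback in drm_sched_backend_ops would then presumably read, simply applying the proposed names to the signature introduced by this patch:

	enum drm_sched_status (*timedout_job)(struct drm_sched_job *sched_job);
)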
Regards, Luben
Apart from that looks fine to me, Christian.
[SNIP]
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 975e8a6..3ba36bc 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -206,6 +206,11 @@ static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, return s_job && atomic_inc_return(&s_job->karma) > threshold; }
+enum drm_task_status {
- DRM_TASK_STATUS_ENODEV,
- DRM_TASK_STATUS_ALIVE
+};
- /**
- struct drm_sched_backend_ops
@@ -230,10 +235,16 @@ struct drm_sched_backend_ops { struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
/**
* @timedout_job: Called when a job has taken too long to execute,
* to trigger GPU recovery.
* @timedout_job: Called when a job has taken too long to execute,
* to trigger GPU recovery.
*
* Return DRM_TASK_STATUS_ALIVE, if the task (job) is healthy
* and executing in the hardware, i.e. it needs more time.
*
* Return DRM_TASK_STATUS_ENODEV, if the task (job) has
*/* been aborted.
- void (*timedout_job)(struct drm_sched_job *sched_job);
enum drm_task_status (*timedout_job)(struct drm_sched_job *sched_job);
/** * @free_job: Called once the job's finished fence has been signaled
On 19.01.21 at 18:47, Luben Tuikov wrote:
On 2021-01-19 2:53 a.m., Christian König wrote:
On 18.01.21 at 22:01, Andrey Grodzovsky wrote:
From: Luben Tuikov <luben.tuikov@amd.com>
This patch does not change current behaviour.
The driver's job timeout handler now returns a status indicating to the DRM layer whether the task (job) was successfully aborted or whether more time should be given to the task to complete.
Default behaviour as of this patch, is preserved, except in obvious-by-comment case in the Panfrost driver, as documented below.
All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been updated accordingly and now return the would-have-been default value of DRM_TASK_STATUS_ALIVE to restart the task's timeout timer; this is the old behaviour, and it is preserved by this patch.
In the case of the Panfrost driver, its timedout callback correctly first checks if the job had completed in due time and, if so, now returns DRM_TASK_STATUS_COMPLETE to notify the DRM layer that the task can be moved to the done list, to be freed later. In the other two subsequent checks, the value of DRM_TASK_STATUS_ALIVE is returned, as per the default behaviour.
More involved driver solutions can be had in subsequent patches.
v2: Use an enum as the status of a driver's job timeout callback method.
v4: (By Andrey Grodzovsky) Replace DRM_TASK_STATUS_COMPLETE with DRM_TASK_STATUS_ENODEV to enable a hint to the scheduler for when NOT to rearm the timeout timer.
As Lukas pointed out, returning the job (or task) status doesn't make much sense.
What we return here is the status of the scheduler.
I would either rename the enum or completely drop it and return a negative error status.
Yes, that could be had.
Although, dropping the enum and returning [-1, 0] might make the meaning of the return status vague. Using an enum with an appropriate name makes the intention clear to the next programmer.
Completely agree, but -ENODEV and 0 could work.
On the other hand using DRM_SCHED_* is perfectly fine with me as well.
Christian.
Now, Andrey did rename one of the enumerated values to DRM_TASK_STATUS_ENODEV, perhaps the same but with:
enum drm_sched_status {
	DRM_SCHED_STAT_NONE,	/* Reserve 0 */
	DRM_SCHED_STAT_NOMINAL,
	DRM_SCHED_STAT_ENODEV,
};
and also renaming the enum to the above would be acceptable?
Regards, Luben
Apart from that looks fine to me, Christian.
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Eric Anholt <eric@anholt.net>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 ++++--
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++++++++-
 drivers/gpu/drm/lima/lima_sched.c       |  4 +++-
 drivers/gpu/drm/panfrost/panfrost_job.c |  9 ++++++---
 drivers/gpu/drm/scheduler/sched_main.c  |  4 +---
 drivers/gpu/drm/v3d/v3d_sched.c         | 32 +++++++++++++++++---------------
 include/drm/gpu_scheduler.h             | 17 ++++++++++++++---
 7 files changed, 54 insertions(+), 28 deletions(-)
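To make the new contract concrete, here is a minimal sketch of a driver-side handler under the reworked callback. The "foo" driver, foo_device_present() and foo_reset() are hypothetical stand-ins and not part of this series; only the enum values and the callback signature come from the patch above:

/* Hypothetical driver: report ENODEV when the device is gone so the
 * scheduler does not rearm the timeout timer; otherwise recover and
 * ask for the timer to be rearmed.
 */
static enum drm_task_status foo_job_timedout(struct drm_sched_job *sched_job)
{
	struct foo_job *job = to_foo_job(sched_job);

	if (!foo_device_present(job->dev))
		return DRM_TASK_STATUS_ENODEV;	/* job is effectively aborted */

	foo_reset(job->dev);			/* driver-specific recovery */
	return DRM_TASK_STATUS_ALIVE;		/* rearm the timeout timer */
}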
We don't want to rearm the timer if the driver hook reports that the device is gone.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 73fccc5..9552334 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	enum drm_task_status status = DRM_TASK_STATUS_ALIVE;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
@@ -331,7 +332,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
-		job->sched->ops->timedout_job(job);
+		status = job->sched->ops->timedout_job(job);
 
 		/*
 		 * Guilty job did complete and hence needs to be manually removed
@@ -345,9 +346,11 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		spin_unlock(&sched->job_list_lock);
 	}
 
-	spin_lock(&sched->job_list_lock);
-	drm_sched_start_timeout(sched);
-	spin_unlock(&sched->job_list_lock);
+	if (status != DRM_TASK_STATUS_ENODEV) {
+		spin_lock(&sched->job_list_lock);
+		drm_sched_start_timeout(sched);
+		spin_unlock(&sched->job_list_lock);
+	}
 }
 
 /**
Return DRM_TASK_STATUS_ENODEV back to the scheduler when the device is not present so the timeout timer will not be rearmed.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index a111326..e4aa5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -25,6 +25,8 @@
 #include <linux/wait.h>
 #include <linux/sched.h>
 
+#include <drm/drm_drv.h>
+
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
@@ -34,6 +36,15 @@ static enum drm_task_status amdgpu_job_timedout(struct drm_sched_job *s_job)
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
 	struct amdgpu_task_info ti;
 	struct amdgpu_device *adev = ring->adev;
+	int idx;
+
+	if (!drm_dev_enter(&adev->ddev, &idx)) {
+		DRM_INFO("%s - device unplugged skipping recovery on scheduler:%s",
+			 __func__, s_job->sched->name);
+
+		/* Effectively the job is aborted as the device is gone */
+		return DRM_TASK_STATUS_ENODEV;
+	}
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
@@ -41,7 +52,7 @@ static enum drm_task_status amdgpu_job_timedout(struct drm_sched_job *s_job)
 	    amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-		return DRM_TASK_STATUS_ALIVE;
+		goto exit;
 	}
 
 	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
@@ -53,13 +64,15 @@ static enum drm_task_status amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
 		amdgpu_device_gpu_recover(ring->adev, job);
-		return DRM_TASK_STATUS_ALIVE;
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
 			adev->virt.tdr_debug = true;
-		return DRM_TASK_STATUS_ALIVE;
 	}
+
+exit:
+	drm_dev_exit(idx);
+	return DRM_TASK_STATUS_ALIVE;
 }
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
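As an aside, the drm_dev_enter()/drm_dev_exit() guard added above is a general pattern for any hardware-touching path after unplug, not something amdgpu-specific. A minimal sketch, assuming a hypothetical foo_do_hw_work() helper (only drm_dev_enter() and drm_dev_exit() are real DRM core API):

int foo_do_hw_work(struct drm_device *ddev)
{
	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return -ENODEV;	/* device was unplugged, skip all HW access */

	/* ... safe to touch registers / hardware state here ... */

	drm_dev_exit(idx);
	return 0;
}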
On Mon, Jan 18, 2021 at 04:01:09PM -0500, Andrey Grodzovsky wrote:
[snip]
[1] - Discussions during v3 of the patchset: https://www.spinics.net/lists/amd-gfx/msg55576.html
[2] - drm/doc: device hot-unplug for userspace: https://www.spinics.net/lists/dri-devel/msg259755.html
[3] - Related gitlab ticket: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
BTW, have you tried this out with some of the igts we have? core_hotunplug is the one I'm thinking of. Might be worth extending this for amdgpu specific stuff (like running some batches on it while hotunplugging).
Since there are so many corner cases we need to test here (shared dma-buf, shared dma_fence), I think it would make sense to have a shared testcase across drivers. The only specific thing would be some hooks to keep the GPU busy in some fashion while we yank the driver. But just to get it started you can throw in entirely amdgpu specific subtests and just share some of the test code. -Daniel
Andrey Grodzovsky (13):
  drm/ttm: Remap all page faults to per process dummy page.
  drm: Unamp the entire device address space on device unplug
  drm/ttm: Expose ttm_tt_unpopulate for driver use
  drm/sched: Cancel and flush all oustatdning jobs before finish.
  drm/amdgpu: Split amdgpu_device_fini into early and late
  drm/amdgpu: Add early fini callback
  drm/amdgpu: Register IOMMU topology notifier per device.
  drm/amdgpu: Fix a bunch of sdma code crash post device unplug
  drm/amdgpu: Remap all page faults to per process dummy page.
  dmr/amdgpu: Move some sysfs attrs creation to default_attr
  drm/amdgpu: Guard against write accesses after device removal
  drm/sched: Make timeout timer rearm conditional.
  drm/amdgpu: Prevent any job recoveries after device is unplugged.

Luben Tuikov (1):
  drm/scheduler: Job timeout handler returns status

 drivers/gpu/drm/amd/amdgpu/amdgpu.h               |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c      |  17 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 149 ++++++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c           |  20 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c         |  15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c          |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h          |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c           |   9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c       |  25 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c           |  26 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h           |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c           |  19 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c           |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c        |  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h        |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c           |  53 +++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h           |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c           |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c          |  70 ++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h          |  52 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c           |  21 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c      |  14 +-
 drivers/gpu/drm/amd/amdgpu/cik_ih.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/cz_ih.c                |   2 +-
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c           |   2 +-
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c            |   2 +-
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c            |  16 +--
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c            |   8 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c             |   8 +-
 drivers/gpu/drm/amd/amdgpu/si_ih.c                |   2 +-
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c             |   2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c            |   2 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  12 +-
 drivers/gpu/drm/amd/include/amd_shared.h          |   2 +
 drivers/gpu/drm/drm_drv.c                         |   3 +
 drivers/gpu/drm/etnaviv/etnaviv_sched.c           |  10 +-
 drivers/gpu/drm/lima/lima_sched.c                 |   4 +-
 drivers/gpu/drm/panfrost/panfrost_job.c           |   9 +-
 drivers/gpu/drm/scheduler/sched_main.c            |  18 ++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c                   |  82 +++++++++++-
 drivers/gpu/drm/ttm/ttm_tt.c                      |   1 +
 drivers/gpu/drm/v3d/v3d_sched.c                   |  32 ++---
 include/drm/gpu_scheduler.h                       |  17 ++-
 include/drm/ttm/ttm_bo_api.h                      |   2 +
 45 files changed, 583 insertions(+), 198 deletions(-)
-- 2.7.4
On 1/19/21 9:16 AM, Daniel Vetter wrote:
On Mon, Jan 18, 2021 at 04:01:09PM -0500, Andrey Grodzovsky wrote:
[snip]
BTW, have you tried this out with some of the igts we have? core_hotunplug is the one I'm thinking of. Might be worth extending this for amdgpu specific stuff (like running some batches on it while hotunplugging).
No, I mostly tested by just running glxgears, which already covers the exported/imported dma-buf case, plus a few manually hacked tests in the libdrm amdgpu test suite.
Since there are so many corner cases we need to test here (shared dma-buf, shared dma_fence), I think it would make sense to have a shared testcase across drivers.
I'm not too familiar with IGT; is there an easy way to set up shared dma-buf and fence use cases there, or do you mean I need to add them now?
The only specific thing would be some hooks to keep the GPU busy in some fashion while we yank the driver.
Do you mean like starting X and some active rendering on top (like glxgears) automatically from within IGT?
But just to get it started you can throw in entirely amdgpu specific subtests and just share some of the test code. -Daniel
In general, I wasn't aware of this test suite, and it looks like it does what I test, among other stuff. I will definitely try to run with it, although the rescan part will not work, as plugging the device back is on my TODO list and not part of the scope for this patchset, so I will probably comment the re-scan section out while testing.
Andrey
On Tue, Jan 19, 2021 at 6:31 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 9:16 AM, Daniel Vetter wrote:
On Mon, Jan 18, 2021 at 04:01:09PM -0500, Andrey Grodzovsky wrote:
[snip]
BTW, have you tried this out with some of the igts we have? core_hotunplug is the one I'm thinking of. Might be worth extending this for amdgpu specific stuff (like running some batches on it while hotunplugging).
No, I mostly tested by just running glxgears, which already covers the exported/imported dma-buf case, plus a few manually hacked tests in the libdrm amdgpu test suite.
Since there are so many corner cases we need to test here (shared dma-buf, shared dma_fence), I think it would make sense to have a shared testcase across drivers.
I'm not too familiar with IGT; is there an easy way to set up shared dma-buf and fence use cases there, or do you mean I need to add them now?
We do have test infrastructure for all of that, but the hotunplug test doesn't have that yet, I think.
The only specific thing would be some hooks to keep the GPU busy in some fashion while we yank the driver.
Do you mean like starting X and some active rendering on top (like glxgears) automatically from within IGT?
Nope, igt is meant to be bare-metal testing, so you don't have to drag the entire winsys around (which, in a Wayland world, is not really good for driver testing anyway, since everything is different). We use this for our pre-merge CI for drm/i915.
But just to get it started you can throw in entirely amdgpu specific subtests and just share some of the test code. -Daniel
In general, I wasn't aware of this test suite, and it looks like it does what I test, among other stuff. I will definitely try to run with it, although the rescan part will not work, as plugging the device back is on my TODO list and not part of the scope for this patchset, so I will probably comment the re-scan section out while testing.
amd gem has been using libdrm-amd thus far IIRC, but for things like this I think it'd be worth at least considering switching. The display team has already started to use some of the tests and contribute stuff (I think the VRR testcase is from amd). -Daniel
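For reference, the bare-metal core of such a hotunplug subtest could be as small as the sketch below; the render node path and PCI address are placeholders, and the job-submission hook is the driver-specific part mentioned above:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	/* open the render node of the device under test */
	int fd = open("/dev/dri/renderD129", O_RDWR);
	if (fd < 0)
		return 1;

	/* driver-specific hook: submit batches on fd to keep the GPU busy */

	/* yank the device via sysfs while the fd is still open */
	int sysfs = open("/sys/bus/pci/devices/0000:05:00.0/remove", O_WRONLY);
	if (sysfs < 0 || write(sysfs, "1", 1) != 1)
		return 1;
	close(sysfs);

	/* ioctls on fd should now fail gracefully instead of oopsing */
	close(fd);
	return 0;
}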
On 1/19/21 1:08 PM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 6:31 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 9:16 AM, Daniel Vetter wrote:
On Mon, Jan 18, 2021 at 04:01:09PM -0500, Andrey Grodzovsky wrote:
[snip]
BTW, have you tried this out with some of the igts we have? core_hotunplug is the one I'm thinking of. Might be worth extending this for amdgpu specific stuff (like running some batches on it while hotunplugging).
No, I mostly tested by just running glxgears, which already covers the exported/imported dma-buf case, plus a few manually hacked tests in the libdrm amdgpu test suite.
Since there are so many corner cases we need to test here (shared dma-buf, shared dma_fence), I think it would make sense to have a shared testcase across drivers.
I'm not too familiar with IGT; is there an easy way to set up shared dma-buf and fence use cases there, or do you mean I need to add them now?
We do have test infrastructure for all of that, but the hotunplug test doesn't have that yet, I think.
The only specific thing would be some hooks to keep the GPU busy in some fashion while we yank the driver.
Do you mean like starting X and some active rendering on top (like glxgears) automatically from within IGT?
Nope, igt is meant to be bare-metal testing, so you don't have to drag the entire winsys around (which, in a Wayland world, is not really good for driver testing anyway, since everything is different). We use this for our pre-merge CI for drm/i915.
So I keep it busy via X/glxgears, which is a manual operation. What you suggest, then, is some client within IGT which opens the device and starts submitting jobs (much like what the libdrm amdgpu tests already do)? And this part is the amdgpu specific code I just need to port from libdrm to here?
Andrey
On Tue, Jan 19, 2021 at 01:18:15PM -0500, Andrey Grodzovsky wrote:
On 1/19/21 1:08 PM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 6:31 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/19/21 9:16 AM, Daniel Vetter wrote:
On Mon, Jan 18, 2021 at 04:01:09PM -0500, Andrey Grodzovsky wrote:
[snip]
btw have you tried this out with some of the igts we have? core_hotunplug is the one I'm thinking of. Might be worth extending it for amdgpu-specific stuff (like running some batches on it while hotunplugging).
No, I mostly tested by just running glxgears, which already covers the exported/imported dma-buf case, plus a few manually hacked tests in the libdrm amdgpu test suite
Since there are so many corner cases we need to test here (shared dma-buf, shared dma_fence) I think it would make sense to have a shared testcase across drivers.
I'm not too familiar with IGT; is there an easy way to set up shared dma-buf and fence use cases there, or do you mean I need to add them now?
We do have test infrastructure for all of that, but the hotunplug test doesn't have that yet I think.
Only specific thing would be some hooks to keep the gpu busy in some fashion while we yank the driver.
Do you mean something like starting X and some active rendering on top (like glxgears) automatically from within IGT?
Nope, igt is meant for bare metal testing, so you don't have to drag the entire winsys around (which in a wayland world is not really good for driver testing anyway, since everything is different). We use this for our pre-merge CI for drm/i915.
So far I keep it busy via X/glxgears, which is a manual operation. What you suggest, then, is some client within IGT which opens the device and starts submitting jobs (much like what the libdrm amdgpu tests already do)? And this is the amdgpu-specific code I just need to port from libdrm to here?
Yup. For i915 tests we already have an entire library for small workloads, including some that just spin forever (useful for reset testing and could also come in handy for unload testing). -Daniel
Andrey
But just to get it started you can throw in entirely amdgpu-specific subtests and just share some of the test code. -Daniel
In general, I wasn't aware of this test suite, and it looks like it does what I test, among other things. I will definitely try to run with it, although the rescan part will not work, since plugging the device back is on my TODO list and not part of the scope for this patchset, so I will probably comment the re-scan section out while testing.
amd gem has been using libdrm-amd thus far iirc, but for things like this I think it'd be worth at least considering switching. The display team has already started to use some of the tests and contribute stuff (I think the VRR testcase is from amd). -Daniel
Andrey
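For reference, the "yank the driver" step discussed above boils down to a sysfs write. Below is a minimal illustrative sketch, not part of the patchset or of IGT, showing the two variants: removing the PCI device entirely versus unbinding only the driver. The BDF 0000:03:00.0 is a placeholder assumption for the card under test.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a short string to a sysfs node; returns 0 on success. */
static int sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) != (ssize_t)strlen(val)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Emulate hot-unplug: echo 1 > /sys/bus/pci/devices/<bdf>/remove */
static int pci_remove(const char *bdf)
{
	char path[128];

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/remove", bdf);
	return sysfs_write(path, "1");
}

/* Softer variant: unbind only the amdgpu driver, leave the PCI device. */
static int driver_unbind(const char *bdf)
{
	return sysfs_write("/sys/bus/pci/drivers/amdgpu/unbind", bdf);
}

int main(void)
{
	const char *bdf = "0000:03:00.0";	/* placeholder BDF */

	if (pci_remove(bdf))
		perror("remove");
	(void)driver_unbind;	/* alternative exercised by the unbind tests */
	return 0;
}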
On 1/20/21 4:05 AM, Daniel Vetter wrote:
Does that mean I would have to drag in the entire infrastructure code from libdrm amdgpu that allows command submission through our IOCTLs?
Andrey
On Wed, Jan 20, 2021 at 3:20 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
No, it's perfectly fine to use libdrm in igt tests, we do that too. I just mean we have some additional helpers to submit specific workloads for intel gpu, like rendercpy to move data with the 3d engine (just using the copy engines isn't always good enough for testing), or the special hanging batchbuffers we use for reset testing, or in general for having precise control over race conditions and things like that.
One thing that was somewhat annoying for i915 but shouldn't be a problem for amdgpu is that igt builds on non-intel platforms too. So we have stub functions for libdrm-intel, since libdrm-intel doesn't build on arm. Shouldn't be a problem for you. -Daniel
On 1/20/21 10:59 AM, Daniel Vetter wrote:
Tested with the igt hot-unplug test. Passed unbind_rebind, unplug-rescan, hot-unbind-rebind and hotunplug-rescan when disabling the rescan part, as I don't support plug-back for now. Also added command submission for amdgpu. Attached a draft of submitting a workload while unbinding the driver or simulating detach. Caught 2 issues with unplug if command submission is in flight during unplug (an unsignaled fence causing a hang in amdgpu_cs_sync, and hitting a BUG_ON in gfx_v9_0_ring_emit_patch_cond_exec, which is expected I guess). I guess glxgears submits commands at a much slower rate, so this was missed. Is that what you meant for this test?
Andrey
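Not the draft attached above, but a rough standalone sketch of the same shape: one thread hammering amdgpu ioctls through libdrm_amdgpu while the main thread unbinds the driver underneath it. Real IB submission is elided for brevity; context create/free churn is enough to race against the removal. The render node path, BDF and timings are assumptions. Build with something like: gcc race.c -lpthread $(pkg-config --cflags --libs libdrm_amdgpu)

#include <amdgpu.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static atomic_bool stop;

/*
 * Hammer the context create/destroy ioctls; after the unbind these
 * must fail cleanly (e.g. -ENODEV) rather than oops or hang.
 */
static void *busy_worker(void *arg)
{
	amdgpu_device_handle dev = arg;
	amdgpu_context_handle ctx;

	while (!atomic_load(&stop))
		if (!amdgpu_cs_ctx_create(dev, &ctx))
			amdgpu_cs_ctx_free(ctx);
	return NULL;
}

int main(void)
{
	uint32_t major, minor;
	amdgpu_device_handle dev;
	pthread_t thread;
	/* placeholder: render node of the secondary card */
	int fd = open("/dev/dri/renderD129", O_RDWR);

	if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
		return 1;

	pthread_create(&thread, NULL, busy_worker, dev);
	usleep(100 * 1000);	/* let some submissions go in flight */

	/* placeholder BDF; same effect as the sysfs helpers sketched earlier */
	if (system("echo 0000:03:00.0 > /sys/bus/pci/drivers/amdgpu/unbind"))
		fprintf(stderr, "unbind failed\n");

	sleep(1);		/* keep racing against the dying device */
	atomic_store(&stop, true);
	pthread_join(thread, NULL);

	amdgpu_device_deinitialize(dev);
	close(fd);
	return 0;
}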
On Mon, Feb 8, 2021 at 6:59 AM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
Yup. Would be good if you can submit this one for inclusion. -Daniel
On 2/8/21 2:27 AM, Daniel Vetter wrote:
On Mon, Feb 8, 2021 at 6:59 AM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/20/21 10:59 AM, Daniel Vetter wrote:
On Wed, Jan 20, 2021 at 3:20 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
On 1/20/21 4:05 AM, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 01:18:15PM -0500, Andrey Grodzovsky wrote:
On 1/19/21 1:08 PM, Daniel Vetter wrote: > On Tue, Jan 19, 2021 at 6:31 PM Andrey Grodzovsky > Andrey.Grodzovsky@amd.com wrote: >> On 1/19/21 9:16 AM, Daniel Vetter wrote: >>> On Mon, Jan 18, 2021 at 04:01:09PM -0500, Andrey Grodzovsky wrote: >>>> Until now extracting a card either by physical extraction (e.g. eGPU with >>>> thunderbolt connection or by emulation through syfs -> /sys/bus/pci/devices/device_id/remove) >>>> would cause random crashes in user apps. The random crashes in apps were >>>> mostly due to the app having mapped a device backed BO into its address >>>> space was still trying to access the BO while the backing device was gone. >>>> To answer this first problem Christian suggested to fix the handling of mapped >>>> memory in the clients when the device goes away by forcibly unmap all buffers the >>>> user processes has by clearing their respective VMAs mapping the device BOs. >>>> Then when the VMAs try to fill in the page tables again we check in the fault >>>> handlerif the device is removed and if so, return an error. This will generate a >>>> SIGBUS to the application which can then cleanly terminate.This indeed was done >>>> but this in turn created a problem of kernel OOPs were the OOPSes were due to the >>>> fact that while the app was terminating because of the SIGBUSit would trigger use >>>> after free in the driver by calling to accesses device structures that were already >>>> released from the pci remove sequence.This was handled by introducing a 'flush' >>>> sequence during device removal were we wait for drm file reference to drop to 0 >>>> meaning all user clients directly using this device terminated. >>>> >>>> v2: >>>> Based on discussions in the mailing list with Daniel and Pekka [1] and based on the document >>>> produced by Pekka from those discussions [2] the whole approach with returning SIGBUS and >>>> waiting for all user clients having CPU mapping of device BOs to die was dropped. >>>> Instead as per the document suggestion the device structures are kept alive until >>>> the last reference to the device is dropped by user client and in the meanwhile all existing and new CPU mappings of the BOs >>>> belonging to the device directly or by dma-buf import are rerouted to per user >>>> process dummy rw page.Also, I skipped the 'Requirements for KMS UAPI' section of [2] >>>> since i am trying to get the minimal set of requirements that still give useful solution >>>> to work and this is the'Requirements for Render and Cross-Device UAPI' section and so my >>>> test case is removing a secondary device, which is render only and is not involved >>>> in KMS. >>>> >>>> v3: >>>> More updates following comments from v2 such as removing loop to find DRM file when rerouting >>>> page faults to dummy page,getting rid of unnecessary sysfs handling refactoring and moving >>>> prevention of GPU recovery post device unplug from amdgpu to scheduler layer. >>>> On top of that added unplug support for the IOMMU enabled system. >>>> >>>> v4: >>>> Drop last sysfs hack and use sysfs default attribute. >>>> Guard against write accesses after device removal to avoid modifying released memory. >>>> Update dummy pages handling to on demand allocation and release through drm managed framework. 
>>>> Add return value to scheduler job TO handler (by Luben Tuikov) and use this in amdgpu for prevention >>>> of GPU recovery post device unplug >>>> Also rebase on top of drm-misc-mext instead of amd-staging-drm-next >>>> >>>> With these patches I am able to gracefully remove the secondary card using sysfs remove hook while glxgears >>>> is running off of secondary card (DRI_PRIME=1) without kernel oopses or hangs and keep working >>>> with the primary card or soft reset the device without hangs or oopses >>>> >>>> TODOs for followup work: >>>> Convert AMDGPU code to use devm (for hw stuff) and drmm (for sw stuff and allocations) (Daniel) >>>> Support plugging the secondary device back after unplug - currently still experiencing HW error on plugging back. >>>> Add support for 'Requirements for KMS UAPI' section of [2] - unplugging primary, display connected card. >>>> >>>> [1] - Discussions during v3 of the patchset https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinic... >>>> [2] - drm/doc: device hot-unplug for userspace https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinic... >>>> [3] - Related gitlab ticket https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fre... >>> btw have you tried this out with some of the igts we have? core_hotunplug >>> is the one I'm thinking of. Might be worth to extend this for amdgpu >>> specific stuff (like run some batches on it while hotunplugging). >> No, I mostly used just running glxgears while testing which covers already >> exported/imported dma-buf case and a few manually hacked tests in libdrm amdgpu >> test suite >> >> >>> Since there's so many corner cases we need to test here (shared dma-buf, >>> shared dma_fence) I think it would make sense to have a shared testcase >>> across drivers. >> Not familiar with IGT too much, is there an easy way to setup shared dma bufs >> and fences >> use cases there or you mean I need to add them now ? > We do have test infrastructure for all of that, but the hotunplug test > doesn't have that yet I think. > >>> Only specific thing would be some hooks to keep the gpu >>> busy in some fashion while we yank the driver. >> Do you mean like staring X and some active rendering on top (like glxgears) >> automatically from within IGT ? > Nope, igt is meant to be bare metal testing so you don't have to drag > the entire winsys around (which in a wayland world, is not really good > for driver testing anyway, since everything is different). We use this > for our pre-merge ci for drm/i915. So i keep it busy by X/glxgers which is manual operation. What you suggest then is some client within IGT which opens the device and starts submitting jobs (which is much like what libdrm amdgpu tests already do) ? And this part is the amdgou specific code I just need to port from libdrm to here ?
Yup. For i915 tests we have an entire library already for small workloads, including some that just spin forever (useful for reset testing and could also come in handy for unload testing). -Daniel
Does it mean I would have to drag in the entire infrastructure code from within the libdrm amdgpu code that allows for command submissions through our IOCTLs?
No, it's perfectly fine to use libdrm in igt tests, we do that too. I just mean we have some additional helpers to submit specific workloads for intel gpus, like rendercopy to move data with the 3d engine (just using copy engines only isn't good enough sometimes for testing), or the special hanging batchbuffers we use for reset testing, or in general for having precise control over race conditions and things like that.
One thing that was somewhat annoying for i915 but shouldn't be a problem for amdgpu is that igt builds on arm. So we have stub functions for libdrm-intel, since libdrm-intel doesn't build on arm. Shouldn't be a problem for you. -Daniel
Tested with the igt hot-unplug test. Passed unbind_rebind, unplug-rescan, hot-unbind-rebind and hotunplug-rescan if disabling the rescan part, as I don't support plug-back for now. Also added command submission for amdgpu. Attached a draft of submitting a workload while unbinding the driver or simulating detach. Caught 2 issues with unplug if command submission is in flight during unplug - an unsignaled fence causing a hang in amdgpu_cs_sync, and hitting a BUG_ON in gfx_v9_0_ring_emit_patch_cond_exec (which is expected I guess). I guess glxgears submits commands at a much slower rate, so this was missed. Is that what you meant for this test?
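[For reference, a bare-bones sketch of the test shape described above - one thread keeps the device busy while the main thread simulates extraction through sysfs. This is not the actual IGT subtest: the render node path, the PCI address and submit_one_cs() are placeholders for illustration, and error handling is omitted.]

#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

void submit_one_cs(int fd); /* hypothetical: one amdgpu CS ioctl round trip */

static atomic_bool stop;

static void *submit_loop(void *arg)
{
	int fd = *(int *)arg;

	while (!atomic_load(&stop))
		submit_one_cs(fd); /* expected to start failing after unplug */
	return NULL;
}

int main(void)
{
	int fd = open("/dev/dri/renderD129", O_RDWR); /* assumed secondary card */
	pthread_t thread;
	FILE *f;

	pthread_create(&thread, NULL, submit_loop, &fd);
	usleep(100 * 1000); /* let some submissions get in flight */

	/* Simulate physical extraction, same as the manual sysfs hook. */
	f = fopen("/sys/bus/pci/devices/0000:03:00.0/remove", "w"); /* assumed BDF */
	fputs("1", f);
	fclose(f);

	atomic_store(&stop, true);
	pthread_join(thread, NULL);
	close(fd); /* drops the last user reference to the removed device */
	return 0;
}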
Yup. Would be good if you can submit this one for inclusion. -Daniel
Will do, together with the exported dma-buf test once I write it.
P.S. How am I supposed to do the exported fence test? Exporting a fence from device A, importing it into device B, unplugging device A, then signaling the fence from device B - this is supposed to call a fence cb which was registered by the exporter, which by now is dead, and hence will cause a 'use after free'?
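[To make the concern concrete, a rough kernel-side illustration - not actual amdgpu code; struct my_dev and its wait queue are invented. The fence holds the callback, so anyone still holding a fence reference can signal it and run exporter code long after the exporter's device is gone.]

#include <linux/dma-fence.h>
#include <linux/kernel.h>
#include <linux/wait.h>

struct my_dev {
	wait_queue_head_t fence_wq; /* hypothetical per-device state */
};

struct my_cb_ctx {
	struct dma_fence_cb cb;
	struct my_dev *dev;
};

static void my_fence_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct my_cb_ctx *ctx = container_of(cb, struct my_cb_ctx, cb);

	/*
	 * If ctx->dev was freed during hot-unplug, this dereference is the
	 * use-after-free: the importer can signal the fence at any time and
	 * this callback then runs against the exporter's freed state.
	 */
	wake_up_all(&ctx->dev->fence_wq);
}

/* Registered earlier with: dma_fence_add_callback(fence, &ctx->cb, my_fence_cb); */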
Andrey
>>> But just to get it started
>>> you can throw in entirely amdgpu specific subtests and just share some of
>>> the test code.
>>> -Daniel
>> In general, I wasn't aware of this test suite, and it looks like it does what
>> I test, among other stuff.
>> I will definitely try to run with it, although the rescan part will not work,
>> as plugging the device back is in my TODO list and not part of the scope for
>> this patchset, so I will probably comment the re-scan section out while testing.
> amd gem has been using libdrm-amd thus far iirc, but for things like
> this I think it'd be worth to at least consider switching. Display
> team has already started to use some of the tests and contribute stuff
> (I think the VRR testcase is from amd).
> -Daniel
>
>> Andrey
>>
>>>> Andrey Grodzovsky (13):
>>>>   drm/ttm: Remap all page faults to per process dummy page.
>>>>   drm: Unamp the entire device address space on device unplug
>>>>   drm/ttm: Expose ttm_tt_unpopulate for driver use
>>>>   drm/sched: Cancel and flush all oustatdning jobs before finish.
>>>>   drm/amdgpu: Split amdgpu_device_fini into early and late
>>>>   drm/amdgpu: Add early fini callback
>>>>   drm/amdgpu: Register IOMMU topology notifier per device.
>>>>   drm/amdgpu: Fix a bunch of sdma code crash post device unplug
>>>>   drm/amdgpu: Remap all page faults to per process dummy page.
>>>>   dmr/amdgpu: Move some sysfs attrs creation to default_attr
>>>>   drm/amdgpu: Guard against write accesses after device removal
>>>>   drm/sched: Make timeout timer rearm conditional.
>>>>   drm/amdgpu: Prevent any job recoveries after device is unplugged.
>>>>
>>>> Luben Tuikov (1):
>>>>   drm/scheduler: Job timeout handler returns status
>>>>
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 +-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 17 +--
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 149 ++++++++++++++++++++--
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 20 ++-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 15 ++-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 +
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 9 ++
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 25 ++--
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 26 ++--
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 3 +-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 19 ++-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 ++
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 53 +++++---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 3 +
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 70 ++++++++++
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 52 +-------
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 ++-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 +-
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 14 +-
>>>>  drivers/gpu/drm/amd/amdgpu/cik_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/cz_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 16 +--
>>>>  drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 8 +-
>>>>  drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 8 +-
>>>>  drivers/gpu/drm/amd/amdgpu/si_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
>>>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +-
>>>>  drivers/gpu/drm/amd/include/amd_shared.h | 2 +
>>>>  drivers/gpu/drm/drm_drv.c | 3 +
>>>>  drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +-
>>>>  drivers/gpu/drm/lima/lima_sched.c | 4 +-
>>>>  drivers/gpu/drm/panfrost/panfrost_job.c | 9 +-
>>>>  drivers/gpu/drm/scheduler/sched_main.c | 18 ++-
>>>>  drivers/gpu/drm/ttm/ttm_bo_vm.c | 82 +++++++++++-
>>>>  drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>  drivers/gpu/drm/v3d/v3d_sched.c | 32 ++---
>>>>  include/drm/gpu_scheduler.h | 17 ++-
>>>>  include/drm/ttm/ttm_bo_api.h | 2 +
>>>>  45 files changed, 583 insertions(+), 198 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
On Mon, Feb 08, 2021 at 11:01:14PM -0500, Andrey Grodzovsky wrote:
P.S. How am I supposed to do the exported fence test? Exporting a fence from device A, importing it into device B, unplugging device A, then signaling the fence from device B - this is supposed to call a fence cb which was registered by the exporter, which by now is dead, and hence will cause a 'use after free'?
Yeah in the end we'd need 2 hw devices for testing full fence functionality. A useful intermediate step would be to just export the fence (either as sync_file, which I think amdgpu doesn't support because no android egl support in mesa, or drm_syncobj, which you can do as a standalone fd too iirc), and then just use the fence a bit from userspace (like wait on it or get its status) after the device is unplugged.
I think this should cover most of the cross-driver issues that fences bring in, and the other problems we can worry about once we spot them. -Daniel
On 2/9/21 4:50 AM, Daniel Vetter wrote:
Yeah in the end we'd need 2 hw devices for testing full fence functionality. A useful intermediate step would be to just export the fence (either as sync_file, which I think amdgpu doesn't support because no android egl support in mesa, or drm_syncobj, which you can do as a standalone fd too iirc), and then just use the fence a bit from userspace (like wait on it or get its status) after the device is unplugged.
I think this should cover most of the cross-driver issues that fences bring in, and the other problems we can worry about once we spot them. -Daniel
OK, will write up all the tests and submit a merge request for all of them together to the IGT gitlab.
Andrey
Looked a bit into it. I want to export a sync_object to an FD and import from that FD, such that I will wait on the imported sync object handle from one thread while signaling the exported sync object handle from another (post device unplug)?

My problem is how to create a sync object with a non-signaled 'fake' fence - I only see an API that creates it with an already signaled fence (or none): https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_syncobj.c...

P.S. I expect the kernel to crash, since unlike with dma_bufs we don't hold a drm device reference here on export.
Andrey
On Thu, Feb 18, 2021 at 9:03 PM Andrey Grodzovsky Andrey.Grodzovsky@amd.com wrote:
P.S. I expect the kernel to crash, since unlike with dma_bufs we don't hold a drm device reference here on export.
Well maybe there's no crash. I think if you go through all the dma_fences that you have and force-complete them, then external callers won't go into the driver anymore. There are still pointers potentially pointing at your device struct and all that, but it should work. Still needs some audit ofc.
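[A sketch of that force-complete idea, with the bookkeeping invented - my_ring and its pending list are hypothetical stand-ins for however the driver tracks its unsignaled hardware fences:]

#include <linux/dma-fence.h>
#include <linux/errno.h>
#include <linux/list.h>

struct my_fence_node {            /* hypothetical tracking node */
	struct list_head entry;
	struct dma_fence *fence;
};

struct my_ring {                  /* hypothetical ring state */
	struct list_head pending; /* unsignaled fences, in submit order */
};

/* Called from the unplug path so no external waiter re-enters the driver. */
static void force_complete_pending_fences(struct my_ring *ring)
{
	struct my_fence_node *node;

	list_for_each_entry(node, &ring->pending, entry) {
		dma_fence_set_error(node->fence, -ENODEV); /* flag device loss */
		dma_fence_signal(node->fence);             /* release waiters */
	}
}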
Wrt how you get such a free-standing fence, that's amdgpu specific. Roughly
- submit cs
- get the fence for that (either sync_file, but I don't think amdgpu supports that, or maybe through drm_syncobj)
- hotunplug
- wait on that fence somehow (drm_syncobj has direct uapi for this, same for sync_file I think)
Cheers, Daniel
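[A minimal userspace sketch of those steps, using libdrm's syncobj wrappers. submit_cs_into_syncobj() is a hypothetical stand-in for the amdgpu-specific part that attaches a CS out-fence to the syncobj, the device path is an assumption, and error handling is omitted.]

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>

void submit_cs_into_syncobj(int fd, uint32_t syncobj); /* hypothetical */

int main(void)
{
	int fd = open("/dev/dri/renderD129", O_RDWR); /* assumed render node */
	uint32_t handle;
	int sync_fd;

	drmSyncobjCreate(fd, 0, &handle);           /* created without a fence */
	submit_cs_into_syncobj(fd, handle);         /* CS out-fence -> syncobj */
	drmSyncobjHandleToFD(fd, handle, &sync_fd); /* free-standing fd reference */

	/* ... hot-unplug the device here, e.g. via the sysfs remove hook ... */

	/* Polling or waiting must not oops even though the device is gone. */
	drmSyncobjWait(fd, &handle, 1, 0 /* poll */, 0, NULL);

	close(sync_fd);
	close(fd);
	return 0;
}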
On 2021-02-19 5:24 a.m., Daniel Vetter wrote:
Well maybe there's no crash. I think if you go through all the dma_fences that you have and force-complete them, then external callers won't go into the driver anymore. There are still pointers potentially pointing at your device struct and all that, but it should work. Still needs some audit ofc.
Indeed it worked fine; I did it with 2 devices. Since the syncobj is refcounted, even after I destroyed the original syncobj and unplugged the device, the exported syncobj and the fence inside didn't go anywhere.
See my 3 tests in my branch on Gitlab at https://gitlab.freedesktop.org/agrodzov/igt-gpu-tools/-/commits/master and let me know if I should go ahead and do a merge request (and into which target project/branch?), or whether you have more comments.
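[The two-device flow described above, sketched with libdrm calls - device paths assumed, attach_cs_fence() a hypothetical CS helper:]

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>

void attach_cs_fence(int fd, uint32_t syncobj); /* hypothetical CS helper */

int main(void)
{
	int fd_a = open("/dev/dri/renderD129", O_RDWR); /* card to unplug */
	int fd_b = open("/dev/dri/renderD128", O_RDWR); /* surviving card */
	uint32_t handle_a, handle_b;
	int sync_fd;

	drmSyncobjCreate(fd_a, 0, &handle_a);
	attach_cs_fence(fd_a, handle_a);                /* CS out-fence -> syncobj */
	drmSyncobjHandleToFD(fd_a, handle_a, &sync_fd);
	drmSyncobjFDToHandle(fd_b, sync_fd, &handle_b); /* import on device B */

	drmSyncobjDestroy(fd_a, handle_a); /* drop the original handle */
	/* ... unplug card A here through sysfs ... */

	/* The import holds a reference: syncobj and fence stay alive. */
	drmSyncobjWait(fd_b, &handle_b, 1, 0 /* poll */, 0, NULL);

	close(sync_fd);
	close(fd_b);
	return 0;
}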
Andrey
On Wed, Feb 24, 2021 at 11:30:50AM -0500, Andrey Grodzovsky wrote:
See my 3 tests in my branch on Gitlab at https://gitlab.freedesktop.org/agrodzov/igt-gpu-tools/-/commits/master and let me know if I should go ahead and do a merge request (and into which target project/branch?), or whether you have more comments.
igt still works with patch submission (to the mailing list). -Daniel
On 2021-02-25 5:25 a.m., Daniel Vetter wrote:
igt still works with patch submission (to the mailing list). -Daniel
I see. I need to divert to other work for a while; I will get to it once I am back to device unplug.
Andrey