This patchset is based on earlier work by Boris[1] that allowed having an ordered workqueue at the driver level that is used by the different schedulers to queue their timeout work. On top of that I also serialized any GPU reset we trigger from within amdgpu code through the same ordered wq, which somewhat simplifies our GPU reset code since we no longer need to protect against concurrency between multiple GPU reset triggers such as TDR on one hand and the sysfs trigger or RAS trigger on the other hand.
As advised by Christian and Daniel I defined a reset_domain struct such that all the entities that go through reset together will be serialized one against another.
TDRs triggered by multiple entities within the same domain for the same reason will not all run, as the first such reset will cancel all the pending ones. This is relevant only to TDR timers and not to triggered resets coming from RAS or sysfs; those will still happen after the in-flight resets finish.
[1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-...
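To make the serialization mechanism concrete, here is a minimal sketch of the idea (hypothetical names, not the exact code in the patches below): every reset request, whatever its origin, is funneled through a single ordered workqueue, so at most one recovery runs at a time and each caller simply waits for its own request to complete.

#include <linux/workqueue.h>

/* One ordered wq per reset domain; its work items run strictly one at a time.
 * Created once at init time with:
 *	reset_wq = alloc_ordered_workqueue("reset-domain", 0);
 */
static struct workqueue_struct *reset_wq;

struct reset_work {
	struct work_struct base;
	int ret;
};

static void do_reset(struct work_struct *work)
{
	struct reset_work *r = container_of(work, struct reset_work, base);

	r->ret = 0;	/* the actual GPU recovery sequence would run here */
}

/* TDR, sysfs and RAS triggers would all go through something like this. */
static int trigger_reset(void)
{
	struct reset_work w;

	INIT_WORK(&w.base, do_reset);
	if (!queue_work(reset_wq, &w.base))
		return -EAGAIN;
	flush_work(&w.base);	/* wait for our own request to finish */

	return w.ret;
}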
P.S. This goes through drm-misc-next and not amd-staging-drm-next as Boris' work hasn't landed there yet.
Andrey Grodzovsky (6): drm/amdgpu: Init GPU reset single threaded wq drm/amdgpu: Move scheduler init to after XGMI is ready drm/amdgpu: Fix crash on modprobe drm/amdgpu: Serialize non TDR gpu recovery with TDRs drm/amdgpu: Drop hive->in_reset drm/amdgpu: Drop concurrent GPU reset protection for device
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++---------- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 36 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 3 +- 7 files changed, 132 insertions(+), 136 deletions(-)
Do it for both the single device and the XGMI hive case.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 9 +++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 2 ++ 4 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 9f017663ac50..b5ff76aae7e0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -812,6 +812,11 @@ struct amd_powerplay {
#define AMDGPU_RESET_MAGIC_NUM 64 #define AMDGPU_MAX_DF_PERFMONS 4 + +struct amdgpu_reset_domain { + struct workqueue_struct *wq; +}; + struct amdgpu_device { struct device *dev; struct pci_dev *pdev; @@ -1096,6 +1101,8 @@ struct amdgpu_device {
struct amdgpu_reset_control *reset_cntl; uint32_t ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE]; + + struct amdgpu_reset_domain reset_domain; };
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5625f7736e37..5f13195d23d1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2391,9 +2391,27 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) if (r) goto init_failed;
- if (adev->gmc.xgmi.num_physical_nodes > 1) + if (adev->gmc.xgmi.num_physical_nodes > 1) { + struct amdgpu_hive_info *hive; + amdgpu_xgmi_add_device(adev);
+ hive = amdgpu_get_xgmi_hive(adev); + if (!hive || !hive->reset_domain.wq) { + DRM_ERROR("Failed to obtain reset domain info for XGMI hive:%llx", hive->hive_id); + r = -EINVAL; + goto init_failed; + } + + adev->reset_domain.wq = hive->reset_domain.wq; + } else { + adev->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-dev", 0); + if (!adev->reset_domain.wq) { + r = -ENOMEM; + goto init_failed; + } + } + /* Don't init kfd if whole hive need to be reset during init */ if (!adev->gmc.xgmi.pending_reset) amdgpu_amdkfd_device_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 0fad2bf854ae..8b116f398101 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -391,6 +391,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) goto pro_end; }
+ hive->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-hive", 0); + if (!hive->reset_domain.wq) { + dev_err(adev->dev, "XGMI: failed allocating wq for reset domain!\n"); + kfree(hive); + hive = NULL; + goto pro_end; + } + hive->hive_id = adev->gmc.xgmi.hive_id; INIT_LIST_HEAD(&hive->device_list); INIT_LIST_HEAD(&hive->node); @@ -400,6 +408,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) task_barrier_init(&hive->tb); hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN; hive->hi_req_gpu = NULL; + /* * hive pstate on boot is high in vega20 so we have to go to low * pstate on after boot. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index d2189bf7d428..6121aaa292cb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -42,6 +42,8 @@ struct amdgpu_hive_info { AMDGPU_XGMI_PSTATE_MAX_VEGA20, AMDGPU_XGMI_PSTATE_UNKNOWN } pstate; + + struct amdgpu_reset_domain reset_domain; };
struct amdgpu_pcs_ras_field {
Before we initialize the schedulers we must know which reset domain we are in - for a single device there is a single domain per device and so a single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 34 ++-------------- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 + 3 files changed, 51 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5f13195d23d1..b595e6d699b5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev) return r; }
+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) +{ + long timeout; + int r, i; + + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = adev->rings[i]; + + /* No need to setup the GPU scheduler for rings that don't need it */ + if (!ring || ring->no_scheduler) + continue; + + switch (ring->funcs->type) { + case AMDGPU_RING_TYPE_GFX: + timeout = adev->gfx_timeout; + break; + case AMDGPU_RING_TYPE_COMPUTE: + timeout = adev->compute_timeout; + break; + case AMDGPU_RING_TYPE_SDMA: + timeout = adev->sdma_timeout; + break; + default: + timeout = adev->video_timeout; + break; + } + + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, + ring->num_hw_submission, amdgpu_job_hang_limit, + timeout, adev->reset_domain.wq, ring->sched_score, ring->name); + if (r) { + DRM_ERROR("Failed to create scheduler on ring %s.\n", + ring->name); + return r; + } + } + + return 0; +} + + /** * amdgpu_device_ip_init - run init for hardware IPs * @@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) } }
+ r = amdgpu_device_init_schedulers(adev); + if (r) + goto init_failed; + /* Don't init kfd if whole hive need to be reset during init */ if (!adev->gmc.xgmi.pending_reset) amdgpu_amdkfd_device_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 3b7e86ea7167..5527c68c51de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, atomic_t *sched_score) { struct amdgpu_device *adev = ring->adev; - long timeout; - int r;
if (!adev) return -EINVAL; @@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, spin_lock_init(&ring->fence_drv.lock); ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *), GFP_KERNEL); - if (!ring->fence_drv.fences) - return -ENOMEM;
- /* No need to setup the GPU scheduler for rings that don't need it */ - if (ring->no_scheduler) - return 0; + ring->num_hw_submission = num_hw_submission; + ring->sched_score = sched_score;
- switch (ring->funcs->type) { - case AMDGPU_RING_TYPE_GFX: - timeout = adev->gfx_timeout; - break; - case AMDGPU_RING_TYPE_COMPUTE: - timeout = adev->compute_timeout; - break; - case AMDGPU_RING_TYPE_SDMA: - timeout = adev->sdma_timeout; - break; - default: - timeout = adev->video_timeout; - break; - } - - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, - num_hw_submission, amdgpu_job_hang_limit, - timeout, NULL, sched_score, ring->name); - if (r) { - DRM_ERROR("Failed to create scheduler on ring %s.\n", - ring->name); - return r; - } + if (!ring->fence_drv.fences) + return -ENOMEM;
return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 4d380e79752c..a4b8279e3011 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -253,6 +253,8 @@ struct amdgpu_ring { bool has_compute_vm_bug; bool no_scheduler; int hw_prio; + unsigned num_hw_submission; + atomic_t *sched_score; };
#define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Before we initialize the schedulers we must know which reset domain we are in - for a single device there is a single domain per device and so a single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 34 ++-------------- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 + 3 files changed, 51 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5f13195d23d1..b595e6d699b5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev) return r; }
+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) +{
+	long timeout;
+	int r, i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		/* No need to setup the GPU scheduler for rings that don't need it */
+		if (!ring || ring->no_scheduler)
+			continue;
+
+		switch (ring->funcs->type) {
+		case AMDGPU_RING_TYPE_GFX:
+			timeout = adev->gfx_timeout;
+			break;
+		case AMDGPU_RING_TYPE_COMPUTE:
+			timeout = adev->compute_timeout;
+			break;
+		case AMDGPU_RING_TYPE_SDMA:
+			timeout = adev->sdma_timeout;
+			break;
+		default:
+			timeout = adev->video_timeout;
+			break;
+		}
+
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+				   ring->num_hw_submission, amdgpu_job_hang_limit,
+				   timeout, adev->reset_domain.wq, ring->sched_score, ring->name);
+		if (r) {
+			DRM_ERROR("Failed to create scheduler on ring %s.\n",
+				  ring->name);
+			return r;
+		}
Maybe better put that into amdgpu_ring.c. But not really a hard requirement, more a gut feeling.
+	}
+
+	return 0;
+}
+
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
  *
@@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 	}
 
+	r = amdgpu_device_init_schedulers(adev);
+	if (r)
+		goto init_failed;
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 3b7e86ea7167..5527c68c51de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, atomic_t *sched_score) { struct amdgpu_device *adev = ring->adev;
-	long timeout;
-	int r;
if (!adev) return -EINVAL;
@@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, spin_lock_init(&ring->fence_drv.lock); ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *), GFP_KERNEL);
-	if (!ring->fence_drv.fences)
-		return -ENOMEM;
-
-	/* No need to setup the GPU scheduler for rings that don't need it */
-	if (ring->no_scheduler)
-		return 0;
+	ring->num_hw_submission = num_hw_submission;
+	ring->sched_score = sched_score;
Probably better to set that in the caller and drop the parameters from the amdgpu_fence_driver_init_ring() function completely.
Christian.
-	switch (ring->funcs->type) {
-	case AMDGPU_RING_TYPE_GFX:
-		timeout = adev->gfx_timeout;
-		break;
-	case AMDGPU_RING_TYPE_COMPUTE:
-		timeout = adev->compute_timeout;
-		break;
-	case AMDGPU_RING_TYPE_SDMA:
-		timeout = adev->sdma_timeout;
-		break;
-	default:
-		timeout = adev->video_timeout;
-		break;
-	}
-
-	r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
-			   num_hw_submission, amdgpu_job_hang_limit,
-			   timeout, NULL, sched_score, ring->name);
-	if (r) {
-		DRM_ERROR("Failed to create scheduler on ring %s.\n",
-			  ring->name);
-		return r;
-	}
+	if (!ring->fence_drv.fences)
+		return -ENOMEM;
return 0; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 4d380e79752c..a4b8279e3011 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -253,6 +253,8 @@ struct amdgpu_ring { bool has_compute_vm_bug; bool no_scheduler; int hw_prio;
+	unsigned		num_hw_submission;
+	atomic_t		*sched_score;
 };
#define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
On 2021-12-20 2:16 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Before we initialize the schedulers we must know which reset domain we are in - for a single device there is a single domain per device and so a single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 34 ++-------------- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 + 3 files changed, 51 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5f13195d23d1..b595e6d699b5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev) return r; } +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) +{ + long timeout; + int r, i;
+ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = adev->rings[i];
+ /* No need to setup the GPU scheduler for rings that don't need it */ + if (!ring || ring->no_scheduler) + continue;
+ switch (ring->funcs->type) { + case AMDGPU_RING_TYPE_GFX: + timeout = adev->gfx_timeout; + break; + case AMDGPU_RING_TYPE_COMPUTE: + timeout = adev->compute_timeout; + break; + case AMDGPU_RING_TYPE_SDMA: + timeout = adev->sdma_timeout; + break; + default: + timeout = adev->video_timeout; + break; + }
+ r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, + ring->num_hw_submission, amdgpu_job_hang_limit, + timeout, adev->reset_domain.wq, ring->sched_score, ring->name); + if (r) { + DRM_ERROR("Failed to create scheduler on ring %s.\n", + ring->name); + return r; + }
Maybe better put that into amdgpu_ring.c. But not really a hard requirement, more a gut feeling.
+ }
+ return 0; +}
/** * amdgpu_device_ip_init - run init for hardware IPs * @@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) } } + r = amdgpu_device_init_schedulers(adev); + if (r) + goto init_failed;
/* Don't init kfd if whole hive need to be reset during init */ if (!adev->gmc.xgmi.pending_reset) amdgpu_amdkfd_device_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 3b7e86ea7167..5527c68c51de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, atomic_t *sched_score) { struct amdgpu_device *adev = ring->adev; - long timeout; - int r; if (!adev) return -EINVAL; @@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, spin_lock_init(&ring->fence_drv.lock); ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *), GFP_KERNEL); - if (!ring->fence_drv.fences) - return -ENOMEM; - /* No need to setup the GPU scheduler for rings that don't need it */ - if (ring->no_scheduler) - return 0; + ring->num_hw_submission = num_hw_submission; + ring->sched_score = sched_score;
Probably better to set that in the caller and drop the parameters from the amdgpu_fence_driver_init_ring() function completely.
Christian.
I noticed that at least num_hw_submission is validated within the function, so I'm not sure we should then discard the parameters.
Andrey
- switch (ring->funcs->type) { - case AMDGPU_RING_TYPE_GFX: - timeout = adev->gfx_timeout; - break; - case AMDGPU_RING_TYPE_COMPUTE: - timeout = adev->compute_timeout; - break; - case AMDGPU_RING_TYPE_SDMA: - timeout = adev->sdma_timeout; - break; - default: - timeout = adev->video_timeout; - break; - }
- r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, - num_hw_submission, amdgpu_job_hang_limit, - timeout, NULL, sched_score, ring->name); - if (r) { - DRM_ERROR("Failed to create scheduler on ring %s.\n", - ring->name); - return r; - } + if (!ring->fence_drv.fences) + return -ENOMEM; return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 4d380e79752c..a4b8279e3011 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -253,6 +253,8 @@ struct amdgpu_ring { bool has_compute_vm_bug; bool no_scheduler; int hw_prio; + unsigned num_hw_submission; + atomic_t *sched_score; }; #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
On 20.12.21 at 22:51, Andrey Grodzovsky wrote:
On 2021-12-20 2:16 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Before we initialize the schedulers we must know which reset domain we are in - for a single device there is a single domain per device and so a single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 34 ++-------------- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 + 3 files changed, 51 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5f13195d23d1..b595e6d699b5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev) return r; } +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) +{ + long timeout; + int r, i;
+ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = adev->rings[i];
+ /* No need to setup the GPU scheduler for rings that don't need it */ + if (!ring || ring->no_scheduler) + continue;
+ switch (ring->funcs->type) { + case AMDGPU_RING_TYPE_GFX: + timeout = adev->gfx_timeout; + break; + case AMDGPU_RING_TYPE_COMPUTE: + timeout = adev->compute_timeout; + break; + case AMDGPU_RING_TYPE_SDMA: + timeout = adev->sdma_timeout; + break; + default: + timeout = adev->video_timeout; + break; + }
+ r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, + ring->num_hw_submission, amdgpu_job_hang_limit, + timeout, adev->reset_domain.wq, ring->sched_score, ring->name); + if (r) { + DRM_ERROR("Failed to create scheduler on ring %s.\n", + ring->name); + return r; + }
Maybe better put that into amdgpu_ring.c. But not really a hard requirement, more a gut feeling.
+ }
+ return 0; +}
/** * amdgpu_device_ip_init - run init for hardware IPs * @@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) } } + r = amdgpu_device_init_schedulers(adev); + if (r) + goto init_failed;
/* Don't init kfd if whole hive need to be reset during init */ if (!adev->gmc.xgmi.pending_reset) amdgpu_amdkfd_device_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 3b7e86ea7167..5527c68c51de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, atomic_t *sched_score) { struct amdgpu_device *adev = ring->adev; - long timeout; - int r; if (!adev) return -EINVAL; @@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, spin_lock_init(&ring->fence_drv.lock); ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *), GFP_KERNEL); - if (!ring->fence_drv.fences) - return -ENOMEM; - /* No need to setup the GPU scheduler for rings that don't need it */ - if (ring->no_scheduler) - return 0; + ring->num_hw_submission = num_hw_submission; + ring->sched_score = sched_score;
Probably better to set that in the caller and drop the parameters from the amdgpu_fence_driver_init_ring() function completely.
Christian.
I noticed that at least num_hw_submission is validated within the function, so I'm not sure we should then discard the parameters.
Good point. It also doesn't make sense to move this check up because the power of two requirement comes from the fences, doesn't it?
Ok in this case just keep it like it is.
Christian.
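(As an illustrative aside on the power-of-two point above: the fence driver keeps num_hw_submission * 2 fence slots and indexes them by masking the sequence number, which only wraps correctly for power-of-two sizes. The sketch below is not the exact kernel code, just the idea, with names modeled on amdgpu_fence.c.)

#include <linux/bug.h>
#include <linux/log2.h>
#include <linux/types.h>

/*
 * Which of the num_hw_submission * 2 fence slots does sequence number 'seq'
 * map to?  The masking only wraps correctly when the slot count is a power
 * of two, which is why amdgpu_fence_driver_init_ring() keeps rejecting non
 * power-of-two values regardless of where the scheduler itself is set up.
 */
static unsigned int fence_slot_example(unsigned int num_hw_submission, u64 seq)
{
	unsigned int num_fences_mask = num_hw_submission * 2 - 1;

	WARN_ON(!is_power_of_2(num_hw_submission));

	return seq & num_fences_mask;
}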
Andrey
- switch (ring->funcs->type) { - case AMDGPU_RING_TYPE_GFX: - timeout = adev->gfx_timeout; - break; - case AMDGPU_RING_TYPE_COMPUTE: - timeout = adev->compute_timeout; - break; - case AMDGPU_RING_TYPE_SDMA: - timeout = adev->sdma_timeout; - break; - default: - timeout = adev->video_timeout; - break; - }
- r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, - num_hw_submission, amdgpu_job_hang_limit, - timeout, NULL, sched_score, ring->name); - if (r) { - DRM_ERROR("Failed to create scheduler on ring %s.\n", - ring->name); - return r; - } + if (!ring->fence_drv.fences) + return -ENOMEM; return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 4d380e79752c..a4b8279e3011 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -253,6 +253,8 @@ struct amdgpu_ring { bool has_compute_vm_bug; bool no_scheduler; int hw_prio; + unsigned num_hw_submission; + atomic_t *sched_score; }; #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
Restrict job resubmission to the suspend case only, since the schedulers are not initialised yet on probe.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue;
- if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) { drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); }
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Restrict job resubmission to the suspend case only, since the schedulers are not initialised yet on probe.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue;
if (!ring->no_scheduler) {
if (adev->in_suspend && !ring->no_scheduler) {
Uff, why is that suddenly necessary? Because of the changed order?
Christian.
drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); }
On 2021-12-20 2:17 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Restrict job resubmission to the suspend case only, since the schedulers are not initialised yet on probe.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) {
Uff, why is that suddenly necessary? Because of the changed order?
Christian.
Yes.
Andrey
drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); }
On 20.12.21 at 20:22, Andrey Grodzovsky wrote:
On 2021-12-20 2:17 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Restrict job resubmission to the suspend case only, since the schedulers are not initialised yet on probe.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) {
Uff, why is that suddenly necessary? Because of the changed order?
Christian.
Yes.
Mhm, that's quite bad design then.
How about we keep the order as is and allow specifying the reset work queue with drm_sched_start() ?
Christian.
Andrey
drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); }
On 2021-12-21 2:02 a.m., Christian König wrote:
On 20.12.21 at 20:22, Andrey Grodzovsky wrote:
On 2021-12-20 2:17 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Restrict job resubmission to the suspend case only, since the schedulers are not initialised yet on probe.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) {
Uff, why is that suddenly necessary? Because of the changed order?
Christian.
Yes.
Mhm, that's quite bad design then.
If you look at the original patch for this, https://www.spinics.net/lists/amd-gfx/msg67560.html, you will see that restarting the scheduler here is only relevant for the suspend/resume case because there was a race to fix. There is no point in this code on driver init because nothing has been submitted to the scheduler yet, so it seems OK to me to add a condition so that this code only runs in the in_suspend case.
How about we keep the order as is and allow specifying the reset work queue with drm_sched_start() ?
As I mentioned above, the fact that we even have drm_sched_start there is just part of a solution to resolve a race during suspend/resume. It is not for device initialization and indeed, other client drivers of the GPU scheduler never call drm_sched_start on device init. We must guarantee that the reset work queue is already initialized before any job submission to the scheduler, and because of that IMHO the right place for this is drm_sched_init.
Andrey
Christian.
Andrey
drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); }
On 21.12.21 at 17:03, Andrey Grodzovsky wrote:
On 2021-12-21 2:02 a.m., Christian König wrote:
On 20.12.21 at 20:22, Andrey Grodzovsky wrote:
On 2021-12-20 2:17 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Restrict job resubmission to the suspend case only, since the schedulers are not initialised yet on probe.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) {
Uff, why is that suddenly necessary? Because of the changed order?
Christian.
Yes.
Mhm, that's quite bad design then.
If you look at the original patch for this, https://www.spinics.net/lists/amd-gfx/msg67560.html, you will see that restarting the scheduler here is only relevant for the suspend/resume case because there was a race to fix. There is no point in this code on driver init because nothing has been submitted to the scheduler yet, so it seems OK to me to add a condition so that this code only runs in the in_suspend case.
Yeah, but having extra logic like this means that we have some design issue in the IP block handling.
We need to clean that and some other odd approaches up at some point, but probably not now.
Christian.
How about we keep the order as is and allow specifying the reset work queue with drm_sched_start() ?
As I mentioned above, the fact that we even have drm_sched_start there is just part of a solution to resolve a race during suspend/resume. It is not for device initialization and indeed, other client drivers of the GPU scheduler never call drm_sched_start on device init. We must guarantee that the reset work queue is already initialized before any job submission to the scheduler, and because of that IMHO the right place for this is drm_sched_init.
Andrey
Christian.
Andrey
drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); }
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For the others, just use a wrapper that queues the work and waits on it to finish.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, + struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b595e6d699b5..55cd67b9ede2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; }
+struct recover_work_struct { + struct work_struct base; + struct amdgpu_device *adev; + struct amdgpu_job *job; + int ret; +}; + +static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) +{ + struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base); + + recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); +} +/* + * Serialize gpu recover into reset domain single threaded wq + */ +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, + struct amdgpu_job *job) +{ + struct recover_work_struct work = {.adev = adev, .job = job}; + + INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work); + + if (!queue_work(adev->reset_domain.wq, &work.base)) + return -EAGAIN; + + flush_work(&work.base); + + return work.ret; +} + /** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid);
if (amdgpu_device_should_recover_gpu(ring->adev)) { - amdgpu_device_gpu_recover(ring->adev, job); + amdgpu_device_gpu_recover_imp(ring->adev, job); } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev))
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For the others, just use a wrapper that queues the work and waits on it to finish.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+ struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b595e6d699b5..55cd67b9ede2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
 * Returns 0 for success or an error on failure.
 */
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; }
+struct recover_work_struct {
Please add an amdgpu_ prefix to the name.
+	struct work_struct base;
+	struct amdgpu_device *adev;
+	struct amdgpu_job *job;
+	int ret;
+};
+
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
+{
+	struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
+
+	recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
+}
+/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+			      struct amdgpu_job *job)
+{
+	struct recover_work_struct work = {.adev = adev, .job = job};
+
+	INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+
+	if (!queue_work(adev->reset_domain.wq, &work.base))
+		return -EAGAIN;
+
+	flush_work(&work.base);
+
+	return work.ret;
+}
Maybe that should be part of the scheduler code? Not really sure, just an idea.
Apart from that looks good to me, Christian.
+
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid);
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
+		amdgpu_device_gpu_recover_imp(ring->adev, job);
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
On 2021-12-20 2:20 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For the others, just use a wrapper that queues the work and waits on it to finish.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, + struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b595e6d699b5..55cd67b9ede2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; } +struct recover_work_struct {
Please add an amdgpu_ prefix to the name.
+ struct work_struct base; + struct amdgpu_device *adev; + struct amdgpu_job *job; + int ret; +};
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) +{ + struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
+ recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); +} +/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev, + struct amdgpu_job *job) +{ + struct recover_work_struct work = {.adev = adev, .job = job};
+ INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+ if (!queue_work(adev->reset_domain.wq, &work.base)) + return -EAGAIN;
+ flush_work(&work.base);
+ return work.ret; +}
Maybe that should be part of the scheduler code? Not really sure, just an idea.
It seems to me that since the reset domain is almost always above single-scheduler granularity, it wouldn't fit very well there.
Andrey
Apart from that looks good to me, Christian.
/** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - amdgpu_device_gpu_recover(ring->adev, job); + amdgpu_device_gpu_recover_imp(ring->adev, job); } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev))
On 20.12.21 at 23:17, Andrey Grodzovsky wrote:
On 2021-12-20 2:20 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For the others, just use a wrapper that queues the work and waits on it to finish.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, + struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b595e6d699b5..55cd67b9ede2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; } +struct recover_work_struct {
Please add an amdgpu_ prefix to the name.
+ struct work_struct base; + struct amdgpu_device *adev; + struct amdgpu_job *job; + int ret; +};
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) +{ + struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
+ recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); +} +/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev, + struct amdgpu_job *job) +{ + struct recover_work_struct work = {.adev = adev, .job = job};
+ INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+ if (!queue_work(adev->reset_domain.wq, &work.base)) + return -EAGAIN;
+ flush_work(&work.base);
+ return work.ret; +}
Maybe that should be part of the scheduler code? Not really sure, just an idea.
It seems to me that since the reset domain is almost always above single-scheduler granularity, it wouldn't fit very well there.
Yeah, but what if we introduce drm_sched_recover_queue and drm_sched_recover_work objects?
It's probably ok to go forward with that for now, but this handling makes quite some sense to have independent of which driver is using it. So as soon as we see a second similar implementation we should move it into common code.
Regards, Christian.
Andrey
Apart from that looks good to me, Christian.
/** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - amdgpu_device_gpu_recover(ring->adev, job); + amdgpu_device_gpu_recover_imp(ring->adev, job); } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev))
On 2021-12-21 2:59 a.m., Christian König wrote:
On 20.12.21 at 23:17, Andrey Grodzovsky wrote:
On 2021-12-20 2:20 a.m., Christian König wrote:
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For the others, just use a wrapper that queues the work and waits on it to finish.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, + struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b595e6d699b5..55cd67b9ede2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; } +struct recover_work_struct {
Please add an amdgpu_ prefix to the name.
+ struct work_struct base; + struct amdgpu_device *adev; + struct amdgpu_job *job; + int ret; +};
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) +{ + struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
+ recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); +} +/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev, + struct amdgpu_job *job) +{ + struct recover_work_struct work = {.adev = adev, .job = job};
+ INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+ if (!queue_work(adev->reset_domain.wq, &work.base)) + return -EAGAIN;
+ flush_work(&work.base);
+ return work.ret; +}
Maybe that should be part of the scheduler code? Not really sure, just an idea.
It seems to me that since the reset domain is almost always above single-scheduler granularity, it wouldn't fit very well there.
Yeah, but what if we introduce drm_sched_recover_queue and drm_sched_recover_work objects?
It's probably ok to go forward with that for now, but this handling makes quite some sense to have independent of which driver is using it. So as soon as we see a second similar implementation we should move it into common code.
Regards, Christian.
Agreed, the only point I would stress is that we need an entity separate from drm_gpu_scheduler, something like drm_gpu_reset_domain, which should point to, or be pointed to by, the set of schedulers that should go through reset together.
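A rough sketch of what such a shared object might look like (purely illustrative, hypothetical names - nothing like this exists in drm today):

#include <linux/kref.h>
#include <linux/workqueue.h>

/*
 * Hypothetical: one reset domain object shared by every scheduler that must
 * be reset together (one per device, or one per XGMI hive).
 */
struct drm_gpu_reset_domain {
	struct kref refcount;			/* shared by several schedulers */
	struct workqueue_struct *wq;		/* ordered wq serializing recovery */
};

/* Each engine/scheduler wrapper would then just point at its domain: */
struct example_engine {
	/* struct drm_gpu_scheduler sched; */
	struct drm_gpu_reset_domain *reset_domain;
};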
Andrey
Andrey
Apart from that looks good to me, Christian.
/** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - amdgpu_device_gpu_recover(ring->adev, job); + amdgpu_device_gpu_recover_imp(ring->adev, job); } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev))
Since we serialize all resets, there is no need to protect against concurrent resets.
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 - 3 files changed, 1 insertion(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 55cd67b9ede2..d2701e4d0622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, dev_info(adev->dev, "GPU %s begin!\n", need_emergency_restart ? "jobs stop":"reset");
- /* - * Here we trylock to avoid chain of resets executing from - * either trigger by jobs on different adevs in XGMI hive or jobs on - * different schedulers for same device while this TO handler is running. - * We always reset all schedulers for device and all devices for XGMI - * hive so that should take care of them too. - */ hive = amdgpu_get_xgmi_hive(adev); - if (hive) { - if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) { - DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", - job ? job->base.id : -1, hive->hive_id); - amdgpu_put_xgmi_hive(hive); - if (job && job->vm) - drm_sched_increase_karma(&job->base); - return 0; - } + if (hive) mutex_lock(&hive->hive_lock); - }
reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; @@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
skip_recovery: if (hive) { - atomic_set(&hive->in_reset, 0); mutex_unlock(&hive->hive_lock); amdgpu_put_xgmi_hive(hive); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 8b116f398101..0d54bef5c494 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) INIT_LIST_HEAD(&hive->device_list); INIT_LIST_HEAD(&hive->node); mutex_init(&hive->hive_lock); - atomic_set(&hive->in_reset, 0); atomic_set(&hive->number_devices, 0); task_barrier_init(&hive->tb); hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6121aaa292cb..2f2ce53645a5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -33,7 +33,6 @@ struct amdgpu_hive_info { struct list_head node; atomic_t number_devices; struct mutex hive_lock; - atomic_t in_reset; int hi_req_count; struct amdgpu_device *hi_req_gpu; struct task_barrier tb;
Since all GPU resets are now serialized there is no need for this.
This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout'
Signed-off-by: Andrey Grodzovsky andrey.grodzovsky@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++-------------------- 1 file changed, 7 insertions(+), 82 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d2701e4d0622..311e0b9e1e4f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4763,11 +4763,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, return r; }
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, +static void amdgpu_device_lock_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive) { - if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0) - return false; + atomic_set(&adev->in_gpu_reset, 1);
if (hive) { down_write_nest_lock(&adev->reset_sem, &hive->hive_lock); @@ -4786,8 +4785,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, adev->mp1_state = PP_MP1_STATE_NONE; break; } - - return true; }
static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) @@ -4798,46 +4795,6 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) up_write(&adev->reset_sem); }
-/* - * to lockup a list of amdgpu devices in a hive safely, if not a hive - * with multiple nodes, it will be similar as amdgpu_device_lock_adev. - * - * unlock won't require roll back. - */ -static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive) -{ - struct amdgpu_device *tmp_adev = NULL; - - if (adev->gmc.xgmi.num_physical_nodes > 1) { - if (!hive) { - dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes"); - return -ENODEV; - } - list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) { - if (!amdgpu_device_lock_adev(tmp_adev, hive)) - goto roll_back; - } - } else if (!amdgpu_device_lock_adev(adev, hive)) - return -EAGAIN; - - return 0; -roll_back: - if (!list_is_first(&tmp_adev->gmc.xgmi.head, &hive->device_list)) { - /* - * if the lockup iteration break in the middle of a hive, - * it may means there may has a race issue, - * or a hive device locked up independently. - * we may be in trouble and may not, so will try to roll back - * the lock and give out a warnning. - */ - dev_warn(tmp_adev->dev, "Hive lock iteration broke in the middle. Rolling back to unlock"); - list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list, gmc.xgmi.head) { - amdgpu_device_unlock_adev(tmp_adev); - } - } - return -EAGAIN; -} - static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev) { struct pci_dev *p = NULL; @@ -5023,22 +4980,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, reset_context.hive = hive; clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
- /* - * lock the device before we try to operate the linked list - * if didn't get the device lock, don't touch the linked list since - * others may iterating it. - */ - r = amdgpu_device_lock_hive_adev(adev, hive); - if (r) { - dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress", - job ? job->base.id : -1); - - /* even we skipped this reset, still need to set the job to guilty */ - if (job && job->vm) - drm_sched_increase_karma(&job->base); - goto skip_recovery; - } - /* * Build list of devices to reset. * In case we are in XGMI hive mode, resort the device list @@ -5058,6 +4999,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
/* block all schedulers and reset given job's ring */ list_for_each_entry(tmp_adev, device_list_handle, reset_list) { + + amdgpu_device_lock_adev(tmp_adev, hive); + /* * Try to put the audio codec into suspend state * before gpu reset started. @@ -5208,13 +5152,12 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, amdgpu_device_unlock_adev(tmp_adev); }
-skip_recovery: if (hive) { mutex_unlock(&hive->hive_lock); amdgpu_put_xgmi_hive(hive); }
- if (r && r != -EAGAIN) + if (r) dev_info(adev->dev, "GPU reset end with ret = %d\n", r); return r; } @@ -5437,20 +5380,6 @@ int amdgpu_device_baco_exit(struct drm_device *dev) return 0; }
-static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev) -{ - int i; - - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { - struct amdgpu_ring *ring = adev->rings[i]; - - if (!ring || !ring->sched.thread) - continue; - - cancel_delayed_work_sync(&ring->sched.work_tdr); - } -} - /** * amdgpu_pci_error_detected - Called when a PCI error is detected. * @pdev: PCI device struct @@ -5481,14 +5410,10 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta /* Fatal error, prepare for slot reset */ case pci_channel_io_frozen: /* - * Cancel and wait for all TDRs in progress if failing to - * set adev->in_gpu_reset in amdgpu_device_lock_adev - * * Locking adev->reset_sem will prevent any external access * to GPU during PCI error recovery */ - while (!amdgpu_device_lock_adev(adev, NULL)) - amdgpu_cancel_all_tdr(adev); + amdgpu_device_lock_adev(adev, NULL);
/* * Block any work scheduling as we do for regular GPU reset
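Summarizing the hunks above, the resulting flow is roughly the following (a sketch only; unchanged context lines are paraphrased rather than quoted from the tree):

	static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
					    struct amdgpu_hive_info *hive)
	{
		/* unconditional now - the ordered reset wq already serializes resets */
		atomic_set(&adev->in_gpu_reset, 1);

		if (hive)
			down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
		/* ... else a plain down_write(&adev->reset_sem); mp1_state setup as before ... */
	}

	/* amdgpu_device_gpu_recover_imp(): the try-lock/bail-out path is gone,
	 * every device is simply locked as part of the per-device reset loop */
	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
		amdgpu_device_lock_adev(tmp_adev, hive);
		/* suspend audio, block scheduling, perform the reset, ... */
	}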
On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
Patches #1, #5 and #6 are Reviewed-by: Christian König christian.koenig@amd.com
Some minor comments on the rest, but in general absolutely looks like the way we want to go.
Regards, Christian.
On Mon, Dec 20, 2021 at 08:25:05AM +0100, Christian König wrote:
Patches #1, #5 and #6 are Reviewed-by: Christian König christian.koenig@amd.com
Some minor comments on the rest, but in general absolutely looks like the way we want to go.
I only scrolled through quickly, but yeah I'm concurring. -Daniel
[AMD Official Use Only]
Hi Andrey,

I actually have some concerns about this change:

1. On an SRIOV configuration the reset notification comes from the host, and the driver already triggers a work queue to handle the reset (see xgpu_*_mailbox_flr_work). Is it a good idea to trigger another work queue from inside that work queue? Can we just use the new one you added?

2. For KFD, ROCm uses user queues for submission, so it won't go through the drm scheduler and hence has no job timeout. Can we handle that with your new change?

3. For an XGMI hive there is only a hive-wide reset for all devices on a bare-metal config, but for SRIOV the VF will support VF FLR, which means the host might only need to reset a specific device instead of triggering a whole hive reset. So we might still need a reset_domain for an individual device within the hive on SRIOV configurations.
Anyway, I think this change needs to be verified on an SRIOV XGMI configuration with some ROCm app running.
Regards Shaoyun.liu
On 2021-12-20 12:06 p.m., Liu, Shaoyun wrote:
Hi Andrey, I actually have some concerns about this change:

1. On an SRIOV configuration the reset notification comes from the host, and the driver already triggers a work queue to handle the reset (see xgpu_*_mailbox_flr_work). Is it a good idea to trigger another work queue from inside that work queue? Can we just use the new one you added?
Shouldn't be a problem, I will change it. In fact it's a great idea, because then it looks like we can totally drop 'adev->in_gpu_reset', since we no longer need to guard against concurrent resets.
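For illustration only - this is a sketch of the idea, not code posted in this series; flr_work is the existing amdgpu_virt work item and reset_domain.wq is the workqueue added in patch #1:

	/* today the host FLR notification is handled roughly like this: */
	schedule_work(&adev->virt.flr_work);

	/* queuing it on the ordered reset-domain workqueue instead would
	 * serialize host-triggered resets with TDR, sysfs and RAS resets: */
	queue_work(adev->reset_domain.wq, &adev->virt.flr_work);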
2. For KFD, ROCm uses user queues for submission, so it won't go through the drm scheduler and hence has no job timeout. Can we handle that with your new change?
I think that's not a problem - a lot of places use direct submission and not the scheduler; where they need to synchronize against concurrent GPU resets they take adev->reset_sem. Nothing changes in this sense.
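As an aside, the pattern those direct-submission paths rely on looks roughly like this (illustrative only, nothing new in this series):

	/* paths touching the hardware outside the scheduler take the reset
	 * semaphore for read; the reset path holds it for write */
	down_read(&adev->reset_sem);
	/* ... direct submission / register access that must not race a reset ... */
	up_read(&adev->reset_sem);

	/* some paths use down_read_trylock() instead and simply bail out
	 * if a reset is in flight */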
3. For an XGMI hive there is only a hive-wide reset for all devices on a bare-metal config, but for SRIOV the VF will support VF FLR, which means the host might only need to reset a specific device instead of triggering a whole hive reset. So we might still need a reset_domain for an individual device within the hive on SRIOV configurations.
This is something for the future, right? I don't see it in the code. In that case we will have to account for it as part of a generic design for this kind of single-device reset within an XGMI hive. It should require only a minor addition to the current design: creating two parallel reset domains, one for the hive and one per device.
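Roughly what such an extension could look like (the field names below are made up for illustration and are not part of this series):

	struct amdgpu_device {
		/* ... */
		struct amdgpu_reset_domain reset_domain;	/* device-local resets, e.g. SRIOV VF FLR */
		struct amdgpu_reset_domain *hive_reset_domain;	/* shared by all devices in the XGMI hive */
	};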
Anyway, I think this change needs to be verified on an SRIOV XGMI configuration with some ROCm app running.
I do have an XGMI setup where I still test XGMI resets, and it has the ROCm stack installed. Can you please log in there and tell me if I have what's needed to do the tests you advise? I am not very familiar with the ROCm tools, as I usually test using libdrm. (Ping me on Teams for the device IP and user name.)
Andrey