This patch set fixes SDMA after module reload.
Monk Liu (4): drm/amdgpu: clear RB at ring init drm/amdgpu: clear SA bo when created drm/amdgpu: init more register for sdma drm/amdgpu: modify sdma start sequence
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 1 + drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 11 +++++++++-- drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 10 ++++++++-- drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 17 +++++++++++++---- 5 files changed, 34 insertions(+), 8 deletions(-)
From: Monk Liu Monk.Liu@amd.com
This help fix reloading driver hang issue of SDMA ring.
Signed-off-by: Monk Liu Monk.Liu@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 3b02272..a4b3f44 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -310,6 +310,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring, } r = amdgpu_bo_kmap(ring->ring_obj, (void **)&ring->ring); + + memset((void *)ring->ring, 0, ring->ring_size); + amdgpu_bo_unreserve(ring->ring_obj); if (r) { dev_err(adev->dev, "(%d) ring map failed\n", r);
On 02.06.2016 07:27, Alex Deucher wrote:
From: Monk Liu Monk.Liu@amd.com
This help fix reloading driver hang issue of SDMA ring.
Signed-off-by: Monk Liu Monk.Liu@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 3b02272..a4b3f44 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -310,6 +310,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring, } r = amdgpu_bo_kmap(ring->ring_obj, (void **)&ring->ring);
memset((void *)ring->ring, 0, ring->ring_size);
- amdgpu_bo_unreserve(ring->ring_obj); if (r) { dev_err(adev->dev, "(%d) ring map failed\n", r);
We should only call memset if amdgpu_bo_kmap succeeded. Same issue in patch 2.
There's something else about these two patches I'm a bit worried about:
The GPU should only read data from ring buffers and IBs that we previously explicitly wrote there. I'm afraid these patches might just paper over bugs elsewhere, which might still bite us under different circumstances.
We should only call memset if amdgpu_bo_kmap succeeded. Same issue in patch 2.
[ml] agreed
There's something else about these two patches I'm a bit worried about:
The GPU should only read data from ring buffers and IBs that we previously explicitly wrote there. I'm afraid these patches might just paper over bugs elsewhere, which might still bite us under different circumstances.
[ml] so do you mean we should use NOP to fulfill the ring buffer instead of 0 ?
-----Original Message----- From: Michel Dänzer [mailto:michel@daenzer.net] Sent: Tuesday, June 07, 2016 3:34 PM To: Alex Deucher alexdeucher@gmail.com Cc: dri-devel@lists.freedesktop.org; Liu, Monk Monk.Liu@amd.com Subject: Re: [PATCH 1/4] drm/amdgpu: clear RB at ring init
On 02.06.2016 07:27, Alex Deucher wrote:
From: Monk Liu Monk.Liu@amd.com
This help fix reloading driver hang issue of SDMA ring.
Signed-off-by: Monk Liu Monk.Liu@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 3b02272..a4b3f44 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -310,6 +310,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring, } r = amdgpu_bo_kmap(ring->ring_obj, (void **)&ring->ring);
memset((void *)ring->ring, 0, ring->ring_size);
- amdgpu_bo_unreserve(ring->ring_obj); if (r) { dev_err(adev->dev, "(%d) ring map failed\n", r);
We should only call memset if amdgpu_bo_kmap succeeded. Same issue in patch 2.
There's something else about these two patches I'm a bit worried about:
The GPU should only read data from ring buffers and IBs that we previously explicitly wrote there. I'm afraid these patches might just paper over bugs elsewhere, which might still bite us under different circumstances.
On 07.06.2016 20:38, Liu, Monk wrote:
There's something else about these two patches I'm a bit worried about:
The GPU should only read data from ring buffers and IBs that we previously explicitly wrote there. I'm afraid these patches might just paper over bugs elsewhere, which might still bite us under different circumstances.
so do you mean we should use NOP to fulfill the ring buffer instead of 0 ?
I mean we should find out why the GPU seems to read something else than the packets we write to the ring buffer / IBs.
From: Monk Liu Monk.Liu@amd.com
This help fix reloading driver hang issue of SDMA ring
Signed-off-by: Monk Liu Monk.Liu@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c index 8bf84ef..48618ee 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c @@ -115,6 +115,7 @@ int amdgpu_sa_bo_manager_start(struct amdgpu_device *adev, return r; } r = amdgpu_bo_kmap(sa_manager->bo, &sa_manager->cpu_ptr); + memset(sa_manager->cpu_ptr, 0, sa_manager->size); amdgpu_bo_unreserve(sa_manager->bo); return r; }
From: Monk Liu Monk.Liu@amd.com
This help fix reloading driver hang issue of SDMA ring
Signed-off-by: Monk Liu Monk.Liu@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 2 ++ drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 2 ++ drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 ++ 3 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c index 518dca4..76f73ab 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c @@ -419,6 +419,8 @@ static int cik_sdma_gfx_resume(struct amdgpu_device *adev) /* Initialize the ring buffer's read and write pointers */ WREG32(mmSDMA0_GFX_RB_RPTR + sdma_offsets[i], 0); WREG32(mmSDMA0_GFX_RB_WPTR + sdma_offsets[i], 0); + WREG32(mmSDMA0_GFX_IB_RPTR + sdma_offsets[i], 0); + WREG32(mmSDMA0_GFX_IB_OFFSET + sdma_offsets[i], 0);
/* set the wb address whether it's enabled or not */ WREG32(mmSDMA0_GFX_RB_RPTR_ADDR_HI + sdma_offsets[i], diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c index f4c3130..e11a374 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c @@ -461,6 +461,8 @@ static int sdma_v2_4_gfx_resume(struct amdgpu_device *adev) /* Initialize the ring buffer's read and write pointers */ WREG32(mmSDMA0_GFX_RB_RPTR + sdma_offsets[i], 0); WREG32(mmSDMA0_GFX_RB_WPTR + sdma_offsets[i], 0); + WREG32(mmSDMA0_GFX_IB_RPTR + sdma_offsets[i], 0); + WREG32(mmSDMA0_GFX_IB_OFFSET + sdma_offsets[i], 0);
/* set the wb address whether it's enabled or not */ WREG32(mmSDMA0_GFX_RB_RPTR_ADDR_HI + sdma_offsets[i], diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c index 31d99b00..585d8fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c @@ -672,6 +672,8 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev) /* Initialize the ring buffer's read and write pointers */ WREG32(mmSDMA0_GFX_RB_RPTR + sdma_offsets[i], 0); WREG32(mmSDMA0_GFX_RB_WPTR + sdma_offsets[i], 0); + WREG32(mmSDMA0_GFX_IB_RPTR + sdma_offsets[i], 0); + WREG32(mmSDMA0_GFX_IB_OFFSET + sdma_offsets[i], 0);
/* set the wb address whether it's enabled or not */ WREG32(mmSDMA0_GFX_RB_RPTR_ADDR_HI + sdma_offsets[i],
From: Monk Liu Monk.Liu@amd.com
should fist halt engine, and then doing the register programing, and later unhalt engine, and finally run ring_test.
this help fix reloading driver hang issue of SDMA ring
original sequence is wrong for it programing engine after unhalt, which will lead to fault behavior when doing driver reloading after unloaded.
Signed-off-by: Monk Liu Monk.Liu@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 9 +++++++-- drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 8 ++++++-- drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 15 +++++++++++---- 3 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c index 76f73ab..0079916 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c @@ -448,7 +448,12 @@ static int cik_sdma_gfx_resume(struct amdgpu_device *adev) WREG32(mmSDMA0_GFX_IB_CNTL + sdma_offsets[i], ib_cntl);
ring->ready = true; + } + + cik_sdma_enable(adev, true);
+ for (i = 0; i < adev->sdma.num_instances; i++) { + ring = &adev->sdma.instance[i].ring; r = amdgpu_ring_test_ring(ring); if (r) { ring->ready = false; @@ -531,8 +536,8 @@ static int cik_sdma_start(struct amdgpu_device *adev) if (r) return r;
- /* unhalt the MEs */ - cik_sdma_enable(adev, true); + /* halt the engine before programing */ + cik_sdma_enable(adev, false);
/* start the gfx rings and rlc compute queues */ r = cik_sdma_gfx_resume(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c index e11a374..f6014b0 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c @@ -491,7 +491,11 @@ static int sdma_v2_4_gfx_resume(struct amdgpu_device *adev) WREG32(mmSDMA0_GFX_IB_CNTL + sdma_offsets[i], ib_cntl);
ring->ready = true; + }
+ sdma_v2_4_enable(adev, true); + for (i = 0; i < adev->sdma.num_instances; i++) { + ring = &adev->sdma.instance[i].ring; r = amdgpu_ring_test_ring(ring); if (r) { ring->ready = false; @@ -582,8 +586,8 @@ static int sdma_v2_4_start(struct amdgpu_device *adev) return -EINVAL; }
- /* unhalt the MEs */ - sdma_v2_4_enable(adev, true); + /* halt the engine before programing */ + sdma_v2_4_enable(adev, false);
/* start the gfx rings and rlc compute queues */ r = sdma_v2_4_gfx_resume(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c index 585d8fe..33605d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c @@ -713,7 +713,15 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev) WREG32(mmSDMA0_GFX_IB_CNTL + sdma_offsets[i], ib_cntl);
ring->ready = true; + }
+ /* unhalt the MEs */ + sdma_v3_0_enable(adev, true); + /* enable sdma ring preemption */ + sdma_v3_0_ctx_switch_enable(adev, true); + + for (i = 0; i < adev->sdma.num_instances; i++) { + ring = &adev->sdma.instance[i].ring; r = amdgpu_ring_test_ring(ring); if (r) { ring->ready = false; @@ -806,10 +814,9 @@ static int sdma_v3_0_start(struct amdgpu_device *adev) } }
- /* unhalt the MEs */ - sdma_v3_0_enable(adev, true); - /* enable sdma ring preemption */ - sdma_v3_0_ctx_switch_enable(adev, true); + /* disble sdma engine before programing it */ + sdma_v3_0_ctx_switch_enable(adev, false); + sdma_v3_0_enable(adev, false);
/* start the gfx rings and rlc compute queues */ r = sdma_v3_0_gfx_resume(adev);
dri-devel@lists.freedesktop.org