Folks,
in the discussion about preempt count consistency across kernel configurations:
https://lore.kernel.org/r/20200914204209.256266093@linutronix.de/
it was concluded that the usage of in_interrupt() and related context checks should be removed from non-core code.
In the long run, usage of 'preemptible, in_*irq etc.' should be banned from driver code completely.
This series addresses parts of the amdgpu driver. There are still call sites left in the amdgpu driver.
v1…v2:
- Limit to amdgpu only
- Use "bool" instead of "bool == true"
Sebastian
The usage of in_interrupt() in gmc_v*_process_interrupt() is intended to use a different code path if invoked from the interrupt handler vs invoked from the workqueue.
The usage of in_interrupt() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be separated or the context be conveyed in an argument passed by the caller, which usually knows the context.
gmc_v*_process_interrupt() is invoked via the ->process() callback from amdgpu_ih_process(), which in turn is invoked either from amdgpu_irq_handler() (the interrupt handler) or from amdgpu_irq_handle_*(), which runs from a workqueue.
amdgpu_irq::ih is always processed from the interrupt handler; the other three struct amdgpu_ih_ring members are processed from a workqueue.
Replace the in_interrupt() check with a comparison against adev->irq.ih. A similar check is already done to check if the ih pointer is from ih_soft.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 3b7c6c31fce1f..7b6791d699e27 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -113,7 +113,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
 		/* Delegate it to a different ring if the hardware hasn't
 		 * already done it.
 		 */
-		if (in_interrupt()) {
+		if (entry->ih == &adev->irq.ih) {
 			amdgpu_irq_delegate(adev, entry, 8);
 			return 1;
 		}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index aedef9017c4c2..266296be7302d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -486,7 +486,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
 		/* Delegate it to a different ring if the hardware hasn't
 		 * already done it.
 		 */
-		if (in_interrupt()) {
+		if (entry->ih == &adev->irq.ih) {
 			amdgpu_irq_delegate(adev, entry, 8);
 			return 1;
 		}
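The idea behind the gmc_v*_process_interrupt() change can be sketched in user space. This is only an illustration under stated assumptions: every mock_ type and function below is hypothetical and merely mirrors the shape of adev->irq and entry->ih from the patch, not the real amdgpu definitions. It shows that the calling context can be derived from which ring the entry was read from, without looking at the preempt counter.

```c
#include <stdbool.h>

/* Simplified stand-ins for the driver structures (not the real definitions). */
struct mock_ih_ring {
	int unused;
};

struct mock_irq {
	struct mock_ih_ring ih;      /* drained from the hard interrupt handler */
	struct mock_ih_ring ih1;     /* drained from a workqueue */
	struct mock_ih_ring ih2;     /* drained from a workqueue */
	struct mock_ih_ring ih_soft; /* drained from a workqueue */
};

struct mock_device {
	struct mock_irq irq;
};

struct mock_iv_entry {
	const struct mock_ih_ring *ih; /* ring this entry was read from */
};

/*
 * True if the entry came from the ring that only the interrupt handler
 * drains. The context is derived from data the caller already has,
 * not from a context check such as in_interrupt().
 */
static bool mock_entry_from_irq_ring(const struct mock_device *dev,
				     const struct mock_iv_entry *entry)
{
	return entry->ih == &dev->irq.ih;
}
```

An entry whose ih pointer is &dev.irq.ih would be delegated, while entries read from ih1, ih2 or ih_soft would be handled in place.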
gfx_v9_0_get_gpu_clock_counter() acquires a mutex_t lock and is the only caller of gfx_v9_0_kiq_read_clock(). If it is safe to acquire a mutex_t then gfx_v9_0_get_gpu_clock_counter() is always invoked from preemptible context.
Remove in_interrupt() because it is superfluous: it will always return false.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index ca7e7264926e6..72c319b860a33 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4105,7 +4105,7 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)
 	 *
 	 * also don't wait anymore for IRQ context
 	 * */
-	if (r < 1 && (amdgpu_in_reset(adev) || in_interrupt()))
+	if (r < 1 && (amdgpu_in_reset(adev)))
 		goto failed_kiq_read;
 
 	might_sleep();
gfx_v8_0_parse_sq_irq() is using in_task() to distinguish if it is invoked from a workqueue worker or directly from the interrupt handler.
The usage of in_interrupt() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be separated or the context be conveyed in an argument passed by the caller, which usually knows the context.
gfx_v8_0_parse_sq_irq() is invoked directly either from a worker or from the interrupt service routine. The worker is only bypassed if the worker is already busy.
Add an argument `from_wq' to gfx_v8_0_parse_sq_irq() which is true if invoked from the worker.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 37639214cbbbd..f346afce82ea0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6719,7 +6719,8 @@ static int gfx_v8_0_cp_ecc_error_irq(struct amdgpu_device *adev,
 	return 0;
 }
 
-static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data)
+static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data,
+				  bool from_wq)
 {
 	u32 enc, se_id, sh_id, cu_id;
 	char type[20];
@@ -6757,7 +6758,7 @@ static void gfx_v8_0_parse_sq_irq(struct amdgpu_device *adev, unsigned ih_data)
 		 * or from BH in which case we can access SQ_EDC_INFO
 		 * instance
 		 */
-		if (in_task()) {
+		if (from_wq) {
 			mutex_lock(&adev->grbm_idx_mutex);
 			gfx_v8_0_select_se_sh(adev, se_id, sh_id, cu_id);
 
@@ -6795,7 +6796,7 @@ static void gfx_v8_0_sq_irq_work_func(struct work_struct *work)
 	struct amdgpu_device *adev = container_of(work, struct amdgpu_device, gfx.sq_work.work);
 	struct sq_work *sq_work = container_of(work, struct sq_work, work);
 
-	gfx_v8_0_parse_sq_irq(adev, sq_work->ih_data);
+	gfx_v8_0_parse_sq_irq(adev, sq_work->ih_data, true);
 }
 
 static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
@@ -6810,7 +6811,7 @@ static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
 	 * just print whatever info is possible directly from the ISR.
 	 */
 	if (work_pending(&adev->gfx.sq_work.work)) {
-		gfx_v8_0_parse_sq_irq(adev, ih_data);
+		gfx_v8_0_parse_sq_irq(adev, ih_data, false);
 	} else {
 		adev->gfx.sq_work.ih_data = ih_data;
 		schedule_work(&adev->gfx.sq_work.work);
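The pattern used for gfx_v8_0_parse_sq_irq(), where the caller conveys its context in an argument, can be sketched in user space as follows. This is a hedged illustration only: the mock_ names and the two counters are hypothetical, and the direct call in the handler merely stands in for schedule_work(), which in the kernel would run the worker later in process context.

```c
#include <stdbool.h>

/* Counters so the two paths can be observed without real hardware. */
static int mock_detailed_reports; /* path that may sleep (takes a mutex in the driver) */
static int mock_basic_reports;    /* path that must not sleep */

/* The caller states its context; the function does not probe for it. */
static void mock_parse_event(unsigned int data, bool from_wq)
{
	(void)data;
	if (from_wq)
		mock_detailed_reports++;
	else
		mock_basic_reports++;
}

/* Worker function: always runs in process context, so it passes true. */
static void mock_work_func(unsigned int data)
{
	mock_parse_event(data, true);
}

/* Interrupt handler: parses directly only if the worker is still busy. */
static void mock_irq_handler(unsigned int data, bool work_pending)
{
	if (work_pending)
		mock_parse_event(data, false);
	else
		mock_work_func(data); /* stands in for schedule_work() */
}
```

Each call site knows statically which context it runs in, so the flag is a constant at both callers and no runtime context check is needed.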
Reviewed-by: Christian König <christian.koenig@amd.com> for the series.
On 2021-02-09 13:50:31 [+0100], Christian König wrote:
> Reviewed-by: Christian König <christian.koenig@amd.com> for the series.
Thank you. Any chance you could give me a hand with the remaining three users within the amdgpu driver? I don't know if the in_interrupt() check can be limited to certain callers. What I noticed while tracing v5.10 is this:
| Xorg-2257    [007] d... 57261.620043: amdgpu_device_wreg: 0x699f, 0x00001bcf, 0x00000100
|  => trace_event_raw_event_amdgpu_device_wreg
|  => amdgpu_device_wreg.part.0
|  => dce110_arm_vert_intr
|  => dce110_vblank_set
|  => dm_enable_vblank
|  => drm_vblank_enable
|  => drm_vblank_get
|  => drm_wait_vblank_ioctl
|  => drm_ioctl_kernel
|  => drm_ioctl
|  => amdgpu_drm_ioctl
|  => __x64_sys_ioctl
|  => do_syscall_64
|  => entry_SYSCALL_64_after_hwframe
I think that amdgpu_device_wreg() -> amdgpu_kiq_wreg() could be invoked. It doesn't here because amdgpu_sriov_runtime() is false. The trace says `d' which means interrupts are disabled but in_interrupt() will return false in this case (no IRQ/softirq).
Sebastian
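Why in_interrupt() returns false while the trace shows `d' can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: the bit widths are modeled on include/linux/preempt.h and may differ between kernel versions, and all mock_ / MOCK_ names are hypothetical. The point is that in_interrupt() only inspects the hardirq, softirq and NMI fields of the preempt counter, whereas disabling interrupts is a separate piece of CPU state.

```c
#include <stdbool.h>

/* Field widths are an assumption modeled on include/linux/preempt.h. */
#define MOCK_PREEMPT_BITS 8
#define MOCK_SOFTIRQ_BITS 8
#define MOCK_HARDIRQ_BITS 4
#define MOCK_NMI_BITS     4

#define MOCK_PREEMPT_SHIFT 0
#define MOCK_SOFTIRQ_SHIFT (MOCK_PREEMPT_SHIFT + MOCK_PREEMPT_BITS)
#define MOCK_HARDIRQ_SHIFT (MOCK_SOFTIRQ_SHIFT + MOCK_SOFTIRQ_BITS)
#define MOCK_NMI_SHIFT     (MOCK_HARDIRQ_SHIFT + MOCK_HARDIRQ_BITS)

#define MOCK_MASK(bits, shift) (((1u << (bits)) - 1u) << (shift))
#define MOCK_SOFTIRQ_MASK MOCK_MASK(MOCK_SOFTIRQ_BITS, MOCK_SOFTIRQ_SHIFT)
#define MOCK_HARDIRQ_MASK MOCK_MASK(MOCK_HARDIRQ_BITS, MOCK_HARDIRQ_SHIFT)
#define MOCK_NMI_MASK     MOCK_MASK(MOCK_NMI_BITS, MOCK_NMI_SHIFT)

struct mock_cpu {
	unsigned int preempt_count; /* per-task counter with packed fields */
	bool irqs_off;              /* CPU interrupt flag, reported as `d' */
};

/* Models in_interrupt(): only the counter fields are inspected. */
static bool mock_in_interrupt(const struct mock_cpu *cpu)
{
	return (cpu->preempt_count &
		(MOCK_HARDIRQ_MASK | MOCK_SOFTIRQ_MASK | MOCK_NMI_MASK)) != 0;
}

/* Models local_irq_disable(): the counter is not touched. */
static void mock_local_irq_disable(struct mock_cpu *cpu)
{
	cpu->irqs_off = true;
}

/* Models entering a hard interrupt: the hardirq field is incremented. */
static void mock_irq_enter(struct mock_cpu *cpu)
{
	cpu->preempt_count += 1u << MOCK_HARDIRQ_SHIFT;
	cpu->irqs_off = true;
}

/* Models leaving a hard interrupt: the hardirq field is decremented. */
static void mock_irq_exit(struct mock_cpu *cpu)
{
	cpu->preempt_count -= 1u << MOCK_HARDIRQ_SHIFT;
}
```

In this model, a task that only disables interrupts has irqs_off set while mock_in_interrupt() stays false, which matches the situation described for the ioctl path above.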
Hi Sebastian,
to be honest I have been thinking about that for quite some time now and I don't think that this is possible without a severe rewrite of the driver.
The problem is simply that we have a lot of functions which deal with hardware handling independent of the context. But how registers are accessed needs to be different depending on whether you are in the interrupt handler or not.
You would need to push the information about whether we are coming in from the interrupt handler through more than 10 function calls.
I don't think that this is feasible or good design.
Regards, Christian.
On 2021-02-09 18:43:54 [+0100], Christian König wrote:
> Hi Sebastian,
Hi Christian,
> to be honest I have been thinking about that for quite some time now and I don't think that this is possible without a severe rewrite of the driver.
> The problem is simply that we have a lot of functions which deal with hardware handling independent of the context. But how registers are accessed needs to be different depending on whether you are in the interrupt handler or not.
> You would need to push the information about whether we are coming in from the interrupt handler through more than 10 function calls.
> I don't think that this is feasible or good design.
Yeah, that is what I saw and didn't even try.
Is the possible backtrace (at the bottom of this email) a correct assumption?
Another quick question: You acked my three-patch series. I don't see it in the next tree as of today. Is there anything for me to do?
> Regards, Christian.
> On 09.02.21 at 17:53, Sebastian Andrzej Siewior wrote:
> > On 2021-02-09 13:50:31 [+0100], Christian König wrote:
> > > Reviewed-by: Christian König <christian.koenig@amd.com> for the series.
> > Thank you. Any chance you could give me a hand with the remaining three users within the amdgpu driver? I don't know if the in_interrupt() check can be limited to certain callers. What I noticed while tracing v5.10 is this:
> > | Xorg-2257    [007] d... 57261.620043: amdgpu_device_wreg: 0x699f, 0x00001bcf, 0x00000100
> > |  => trace_event_raw_event_amdgpu_device_wreg
> > |  => amdgpu_device_wreg.part.0
> > |  => dce110_arm_vert_intr
> > |  => dce110_vblank_set
> > |  => dm_enable_vblank
> > |  => drm_vblank_enable
> > |  => drm_vblank_get
> > |  => drm_wait_vblank_ioctl
> > |  => drm_ioctl_kernel
> > |  => drm_ioctl
> > |  => amdgpu_drm_ioctl
> > |  => __x64_sys_ioctl
> > |  => do_syscall_64
> > |  => entry_SYSCALL_64_after_hwframe
> > I think that amdgpu_device_wreg() -> amdgpu_kiq_wreg() could be invoked. It doesn't here because amdgpu_sriov_runtime() is false. The trace says `d' which means interrupts are disabled but in_interrupt() will return false in this case (no IRQ/softirq).
> > Sebastian
Sebastian
Hi Sebastian,
On 10.03.21 at 18:47, Sebastian Andrzej Siewior wrote:
> On 2021-02-09 18:43:54 [+0100], Christian König wrote:
> > to be honest I have been thinking about that for quite some time now and I don't think that this is possible without a severe rewrite of the driver.
> > The problem is simply that we have a lot of functions which deal with hardware handling independent of the context. But how registers are accessed needs to be different depending on whether you are in the interrupt handler or not.
> > You would need to push the information about whether we are coming in from the interrupt handler through more than 10 function calls.
> > I don't think that this is feasible or good design.
> Yeah, that is what I saw and didn't even try.
I also have no idea where to start.
> Is the possible backtrace (at the bottom of this email) a correct assumption?
It's one of many, yes. But the really complicated ones are in the CS UAPI and interrupt handling.
> Another quick question: You acked my three-patch series. I don't see it in the next tree as of today. Is there anything for me to do?
Alex usually picks them up into amd-staging-drm-next which is then merged into drm-next.
Regards, Christian.
Applied. Thanks!
Alex
amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
dri-devel@lists.freedesktop.org