On 2022-03-17 12:04, Christian König wrote:
On 17.03.22 at 16:10, Rob Clark wrote:
[SNIP] userspace frozen != kthread frozen .. that is what this patch is trying to address, so we aren't racing between shutting down the hw and the scheduler shoveling more jobs at us.
Well, exactly that's the problem. The scheduler is supposed to keep shoveling more jobs at us until it is empty.
Thinking more about it, we will then keep some dma_fence instances unsignaled, and that is an extremely bad idea since it can lead to deadlocks during suspend.
So this patch here is an absolutely clear NAK from my side. If amdgpu is doing something similar, that is a severe bug and needs to be addressed somehow.
From looking at the latest amd-staging-drm-next, we only use kthread_park directly in the debugfs IB hooks. For S3 suspend (amdgpu_pmops_suspend) we only flush all the HW fences (amdgpu_fence_wait_empty), so we don't freeze the scheduler thread and don't flush scheduler entities.
Andrey
Regards, Christian.
BR, -R
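
The concern raised in this thread, that stopping the scheduler before its queue is empty leaves fences unsignaled, can be illustrated with a toy user-space model. This is only a sketch under loud assumptions: `Fence`, `run_scheduler`, `park_after` and `unsignaled` are hypothetical names invented for illustration, and nothing here corresponds to the real drm scheduler, dma_fence or kthread APIs.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fence:
    """Hypothetical stand-in for a fence: it is either signaled or not."""
    signaled: bool = False


def run_scheduler(num_jobs: int, park_after: Optional[int] = None) -> List[Fence]:
    """Queue num_jobs toy jobs and let a toy scheduler run them.

    park_after=None: keep running jobs until the queue is empty.
    park_after=k:    stop after k jobs (as if the thread were parked),
                     leaving the remaining jobs queued.
    """
    fences = [Fence() for _ in range(num_jobs)]
    queue = deque(fences)
    done = 0
    while queue and (park_after is None or done < park_after):
        fence = queue.popleft()  # "run" the next job
        fence.signaled = True    # signal its fence on completion
        done += 1
    return fences


def unsignaled(fences: List[Fence]) -> int:
    """Count fences that a waiter would still be blocked on."""
    return sum(not f.signaled for f in fences)


print(unsignaled(run_scheduler(5)))                # queue drained: prints 0
print(unsignaled(run_scheduler(5, park_after=2)))  # stopped early: prints 3
```

In this model, any waiter on one of the three leftover fences would never make progress once the scheduler has stopped, which is the deadlock-during-suspend risk described above.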