On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
On 2018-06-22 11:24 AM, Michal Hocko wrote:
On Fri 22-06-18 17:13:02, Christian König wrote:
Hi Michal,
[Adding Felix as well]
Well, first of all you have a misconception about why at least the AMD graphics driver needs to be able to sleep in an MMU notifier: We need to sleep because we need to wait for hardware operations to finish and *NOT* because we need to wait for locks.
I'm not sure if your flag now means that you generally can't sleep in MMU notifiers anymore, but if that's the case, at least AMD hardware will break badly. In our case the approach of waiting a short time for the process to be reaped and then selecting another victim actually sounds like the right thing to do.
Well, I do not need to make the notifier code non-blocking all the time. All I need is to ensure that it won't sleep if the flag says so, and that it returns -EAGAIN instead.
So here is what I do for amdgpu:
In the case of KFD we also need to take the DQM lock:
amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr -> kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
So we'd need to pass the blockable parameter all the way through that call chain.
Thanks, I had missed that part. So I guess I will start with something similar to intel-gfx and back off when the current range needs some treatment. So, something like this on top. Does it look correct?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..e2d422b3eb0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
 		struct amdgpu_mn_node *node;
 		struct amdgpu_bo *bo;
 
+		if (!blockable) {
+			amdgpu_mn_read_unlock();
+			return -EAGAIN;
+		}
+
 		node = container_of(it, struct amdgpu_mn_node, it);
 		it = interval_tree_iter_next(it, start, end);