On Mon 25-06-18 10:01:03, Michal Hocko wrote:
On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
On 2018-06-22 11:24 AM, Michal Hocko wrote:
On Fri 22-06-18 17:13:02, Christian König wrote:
Hi Michal,
[Adding Felix as well]
Well first of all you have a misconception why at least the AMD graphics driver need to be able to sleep in an MMU notifier: We need to sleep because we need to wait for hardware operations to finish and *NOT* because we need to wait for locks.
I'm not sure if your flag now means that you generally can't sleep in MMU notifiers any more, but if that's the case at least AMD hardware will break badly. In our case the approach of waiting for a short time for the process to be reaped and then select another victim actually sounds like the right thing to do.
Well, I do not need to make the notifier code non blocking all the time. All I need is to ensure that it won't sleep if the flag says so and return -EAGAIN instead.
So here is what I do for amdgpu:
In the case of KFD we also need to take the DQM lock:
amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr -> kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
So we'd need to pass the blockable parameter all the way through that call chain.
Thanks, I have missed that part. So I guess I will start with something similar to intel-gfx and back off when the current range needs some treatment. So this on top. Does it look correct?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index d138a526feff..e2d422b3eb0b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn, struct amdgpu_mn_node *node; struct amdgpu_bo *bo;
if (!blockable) {
amdgpu_mn_read_unlock();
return -EAGAIN;
}
- node = container_of(it, struct amdgpu_mn_node, it); it = interval_tree_iter_next(it, start, end);
Ble, just noticed that half of the change didn't get to git index... This is what I have commit c4701b36ac2802b903db3d05cf77c030fccce3a8 Author: Michal Hocko mhocko@suse.com Date: Mon Jun 25 15:24:03 2018 +0200
fold me
- amd gpu notifiers can sleep deeper in the callchain (evict_process_queues_cpsch on a lock and amdgpu_mn_invalidate_node on unbound timeout) make sure we bail out when we have an intersecting range for starter
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index d138a526feff..3399a4a927fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -225,6 +225,11 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn, while (it) { struct amdgpu_mn_node *node;
+ if (!blockable) { + amdgpu_mn_read_unlock(rmn); + return -EAGAIN; + } + node = container_of(it, struct amdgpu_mn_node, it); it = interval_tree_iter_next(it, start, end);
@@ -266,6 +271,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn, struct amdgpu_mn_node *node; struct amdgpu_bo *bo;
+ if (!blockable) { + amdgpu_mn_read_unlock(rmn); + return -EAGAIN; + } + node = container_of(it, struct amdgpu_mn_node, it); it = interval_tree_iter_next(it, start, end);