On Mon, May 04, 2020 at 06:30:00PM -0700, John Hubbard wrote:
On 2020-05-01 11:20, Jason Gunthorpe wrote:
From: Jason Gunthorpe jgg@mellanox.com
Presumably the intent here was that hmm_range_fault() could put the data into some HW specific format and thus avoid some work. However, nothing actually does that, and it isn't clear how anything actually could do that as hmm_range_fault() provides CPU addresses which must be DMA mapped.
Perhaps there is some special HW that does not need DMA mapping, but we don't have any examples of this, and the theoretical performance win of avoiding an extra scan over the pfns array doesn't seem worth the complexity. Plus pfns needs to be scanned anyhow to sort out any DEVICE_PRIVATE pages.
This version replaces the uint64_t with an usigned long containing a pfn and fixed flags. On input flags is filled with the HMM_PFN_REQ_* values, on successful output it is filled with HMM_PFN_* values, describing the state of the pages.
Just some minor stuff below. I wasn't able to spot any errors in the code, though, so these are just documentation nits.
...
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index 9924f2caa0184c..c9f2329113a47f 100644 +++ b/Documentation/vm/hmm.rst @@ -185,9 +185,6 @@ The usage pattern is:: range.start = ...; range.end = ...; range.pfns = ...;
That should be:
range.hmm_pfns = ...;
Yep
range.flags = ...;
range.values = ...;
range.pfn_shift = ...; if (!mmget_not_zero(interval_sub->notifier.mm)) return -EFAULT;
@@ -229,15 +226,10 @@ The hmm_range struct has 2 fields, default_flags and pfn_flags_mask, that specif fault or snapshot policy for the whole range instead of having to set them for each entry in the pfns array. -For instance, if the device flags for range.flags are:: +For instance if the device driver wants pages for a range with at least read +permission, it sets::
- range.flags[HMM_PFN_VALID] = (1 << 63);
- range.flags[HMM_PFN_WRITE] = (1 << 62);
-and the device driver wants pages for a range with at least read permission, -it sets::
- range->default_flags = (1 << 63);
- range->default_flags = HMM_PFN_REQ_FAULT; range->pfn_flags_mask = 0; and calls hmm_range_fault() as described above. This will fill fault all pages
@@ -246,18 +238,18 @@ in the range with at least read permission. Now let's say the driver wants to do the same except for one page in the range for which it wants to have write permission. Now driver set::
- range->default_flags = (1 << 63);
- range->pfn_flags_mask = (1 << 62);
- range->pfns[index_of_write] = (1 << 62);
- range->default_flags = HMM_PFN_REQ_FAULT;
- range->pfn_flags_mask = HMM_PFN_REQ_WRITE;
- range->pfns[index_of_write] = HMM_PFN_REQ_WRITE;
All these choices for _WRITE behavior make it slightly confusing. I mean, it's better than it was, but there are default flags, a mask, and an index as well, and it looks like maybe we have a little more power and flexibility than desirable? Nouveau for example is now just setting the mask only:
The example is showing how to fault all pages but request write for only certain pages, ie it shows how to use default_flags and pfn_flags together in probably the only way that could make any sense
@@ -542,12 +564,15 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm, return -EBUSY; range.notifier_seq = mmu_interval_read_begin(range.notifier);
range.default_flags = 0;
down_read(&mm->mmap_sem); ret = hmm_range_fault(&range); up_read(&mm->mmap_sem); if (ret) {range.pfn_flags_mask = -1UL;
/*
* FIXME: the input PFN_REQ flags are destroyed on
* -EBUSY, we need to regenerate them, also for the
* other continue below
*/
How serious is this FIXME? It seems like we could get stuck in a loop here, if we're not issuing a new REQ, right?
Serious enough someone should fix it and not copy it into other drivers..
if (ret == -EBUSY) continue; return ret;
@@ -562,7 +587,7 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm, break; }
- nouveau_dmem_convert_pfn(drm, &range);
- nouveau_hmm_convert_pfn(drm, &range, ioctl_addr); svmm->vmm->vmm.object.client->super = true; ret = nvif_object_ioctl(&svmm->vmm->vmm.object, data, size, NULL);
@@ -589,6 +614,7 @@ nouveau_svm_fault(struct nvif_notify *notify) } i; u64 phys[16]; } args;
- unsigned long hmm_pfns[ARRAY_SIZE(args.phys)];
Is there a risk of blowing up the stack here?
16*8 is pretty small, but the call stack is very long sadly, since Ralph succeed it seems OK
*/ -enum hmm_pfn_flag_e {
- HMM_PFN_VALID = 0,
- HMM_PFN_WRITE,
- HMM_PFN_FLAG_MAX
+enum hmm_pfn_flags {
Let's add:
/* Output flags: */
- HMM_PFN_VALID = 1UL << (BITS_PER_LONG - 1),
- HMM_PFN_WRITE = 1UL << (BITS_PER_LONG - 2),
- HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3),
/* Input flags: */
Ok
Thanks, Jason