Test-with: 20210715100932.2605648-1-matthew.auld@intel.com
Chris Wilson (1): drm/i915/userptr: Probe existence of backing struct pages upon creation
Matthew Auld (3): drm/i915/uapi: reject caching ioctls for discrete drm/i915/uapi: convert drm_i915_gem_userptr to kernel doc drm/i915/uapi: reject set_domain for discrete
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 9 ++ drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 +++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 + include/uapi/drm/i915_drm.h | 106 +++++++++++++++++++- 4 files changed, 156 insertions(+), 2 deletions(-)
It's a noop on DG1, and in the future when need to support other devices which let us control the coherency, then it should be an immutable creation time property for the BO. This will likely be controlled through a new gem_create_ext extension.
v2: add some kernel doc for the discrete changes, and document the implicit rules
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com Reviewed-by: Kenneth Graunke kenneth@whitecape.org --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 6 +++++ include/uapi/drm/i915_drm.h | 29 ++++++++++++++++++++++ 2 files changed, 35 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 7d1400b13429..43004bef55cb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -268,6 +268,9 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data, struct drm_i915_gem_object *obj; int err = 0;
+ if (IS_DGFX(to_i915(dev))) + return -ENODEV; + rcu_read_lock(); obj = i915_gem_object_lookup_rcu(file, args->handle); if (!obj) { @@ -303,6 +306,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data, enum i915_cache_level level; int ret = 0;
+ if (IS_DGFX(i915)) + return -ENODEV; + switch (args->caching) { case I915_CACHING_NONE: level = I915_CACHE_NONE; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e54f9efaead0..868c2ee7be60 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1395,6 +1395,35 @@ struct drm_i915_gem_busy { * ppGTT support, or if the object is used for scanout). Note that this might * require unbinding the object from the GTT first, if its current caching value * doesn't match. + * + * Note that this all changes on discrete platforms, starting from DG1, the + * set/get caching is no longer supported, and is now rejected. Instead the CPU + * caching attributes(WB vs WC) will become an immutable creation time property + * for the object, along with the GTT caching level. For now we don't expose any + * new uAPI for this, instead on DG1 this is all implicit, although this largely + * shouldn't matter since DG1 is coherent by default(without any way of + * controlling it). + * + * Implicit caching rules, starting from DG1: + * + * - If any of the object placements (see &drm_i915_gem_create_ext_memory_regions) + * contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and + * mapped as write-combined only. + * + * - Everything else is always allocated and mapped as write-back, with the + * guarantee that everything is also coherent with the GPU. + * + * Note that this is likely to change in the future again, where we might need + * more flexibility on future devices, so making this all explicit as part of a + * new &drm_i915_gem_create_ext extension is probable. + * + * Side note: Part of the reason for this is that changing the at-allocation-time CPU + * caching attributes for the pages might be required(and is expensive) if we + * need to then CPU map the pages later with different caching attributes. This + * inconsistent caching behaviour, while supported on x86, is not universally + * supported on other architectures. So for simplicity we opt for setting + * everything at creation time, whilst also making it immutable, on discrete + * platforms. */ struct drm_i915_gem_caching { /**
Add the missing kernel-doc.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com --- include/uapi/drm/i915_drm.h | 40 ++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 868c2ee7be60..e20eeeca7a1c 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2141,14 +2141,52 @@ struct drm_i915_reset_stats { __u32 pad; };
+/** + * struct drm_i915_gem_userptr - Create GEM object from user allocated memory. + * + * Userptr objects have several restrictions on what ioctls can be used with the + * object handle. + */ struct drm_i915_gem_userptr { + /** + * @user_ptr: The pointer to the allocated memory. + * + * Needs to be aligned to PAGE_SIZE. + */ __u64 user_ptr; + + /** + * @user_size: + * + * The size in bytes for the allocated memory. This will also become the + * object size. + * + * Needs to be aligned to PAGE_SIZE, and should be at least PAGE_SIZE, + * or larger. + */ __u64 user_size; + + /** + * @flags: + * + * Supported flags: + * + * I915_USERPTR_READ_ONLY: + * + * Mark the object as readonly, this also means GPU access can only be + * readonly. This is only supported on HW which supports readonly access + * through the GTT. If the HW can't support readonly access, an error is + * returned. + * + * I915_USERPTR_UNSYNCHRONIZED: + * + * NOT USED. Setting this flag will result in an error. + */ __u32 flags; #define I915_USERPTR_READ_ONLY 0x1 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** - * Returned handle for the object. + * @handle: Returned handle for the object. * * Object handles are nonzero. */
Op 15-07-2021 om 12:15 schreef Matthew Auld:
Add the missing kernel-doc.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
include/uapi/drm/i915_drm.h | 40 ++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 868c2ee7be60..e20eeeca7a1c 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2141,14 +2141,52 @@ struct drm_i915_reset_stats { __u32 pad; };
+/**
- struct drm_i915_gem_userptr - Create GEM object from user allocated memory.
- Userptr objects have several restrictions on what ioctls can be used with the
- object handle.
- */
struct drm_i915_gem_userptr {
- /**
* @user_ptr: The pointer to the allocated memory.
*
* Needs to be aligned to PAGE_SIZE.
__u64 user_ptr;*/
- /**
* @user_size:
*
* The size in bytes for the allocated memory. This will also become the
* object size.
*
* Needs to be aligned to PAGE_SIZE, and should be at least PAGE_SIZE,
* or larger.
__u64 user_size;*/
- /**
* @flags:
*
* Supported flags:
*
* I915_USERPTR_READ_ONLY:
*
* Mark the object as readonly, this also means GPU access can only be
* readonly. This is only supported on HW which supports readonly access
* through the GTT. If the HW can't support readonly access, an error is
* returned.
*
* I915_USERPTR_UNSYNCHRONIZED:
*
* NOT USED. Setting this flag will result in an error.
__u32 flags;*/
#define I915_USERPTR_READ_ONLY 0x1 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /**
* Returned handle for the object.
* @handle: Returned handle for the object.
*/
- Object handles are nonzero.
Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Can you review
https://patchwork.freedesktop.org/patch/444147/?series=92359&rev=2
?
I noticed some parts of i915_drm.h missing.
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2: - add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason). - use mmap_read_{lock, unlock}. - add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{ + const unsigned long end = addr + len; + struct vm_area_struct *vma; + int ret = -EFAULT; + + mmap_read_lock(mm); + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { + if (vma->vm_start > addr) + break; + + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) + break; + + if (vma->vm_end >= end) { + ret = 0; + break; + } + + addr = vma->vm_end; + } + mmap_read_unlock(mm); + + return ret; +} + /* * Creates a new mm object that wraps some normal memory from the process * context - user memory. @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY | - I915_USERPTR_UNSYNCHRONIZED)) + I915_USERPTR_UNSYNCHRONIZED | + I915_USERPTR_PROBE)) return -EINVAL;
if (i915_gem_object_size_2big(args->user_size)) @@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
+ if (args->flags & I915_USERPTR_PROBE) { + /* + * Check that the range pointed to represents real struct + * pages and not iomappings (at this moment in time!) + */ + ret = probe_range(current->mm, args->user_ptr, args->user_size); + if (ret) + return ret; + } + #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break; + case I915_PARAM_HAS_USERPTR_PROBE: + value = true; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56 + /* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam { @@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. * + * I915_USERPTR_PROBE: + * + * Probe the provided @user_ptr range and validate that the @user_ptr is + * indeed pointing to normal memory and that the range is also valid. + * For example if some garbage address is given to the kernel, then this + * should complain. + * + * Returns -EFAULT if the probe failed. + * + * Note that this doesn't populate the backing pages. + * + * The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE + * returns a non-zero value. + * * I915_USERPTR_UNSYNCHRONIZED: * * NOT USED. Setting this flag will result in an error. */ __u32 flags; #define I915_USERPTR_READ_ONLY 0x1 +#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
1)
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
2)
I am also not a fan of getparam for individual ioctl flags since I don't think it scales nicely. How about add a param which returns all supported flags like I915_PARAM_USERPTR_SUPPORTED_FLAGS?
Downside is it only works for 32-bit flag fields with getparam. Or it could be a query to solve that as well.
Regards,
Tvrtko
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
- const unsigned long end = addr + len;
- struct vm_area_struct *vma;
- int ret = -EFAULT;
- mmap_read_lock(mm);
- for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
- }
- mmap_read_unlock(mm);
- return ret;
+}
- /*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE))
return -EINVAL;
if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
- if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
- }
- #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
- case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam {
@@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
*
*/ __u32 flags; #define I915_USERPTR_READ_ONLY 0x1
- I915_USERPTR_UNSYNCHRONIZED:
- NOT USED. Setting this flag will result in an error.
+#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
On Thu, Jul 15, 2021 at 11:33:10AM +0100, Tvrtko Ursulin wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
Populate is exactly as racy as probe. We don't support pinned userptr anymore.
I am also not a fan of getparam for individual ioctl flags since I don't think it scales nicely. How about add a param which returns all supported flags like I915_PARAM_USERPTR_SUPPORTED_FLAGS?
Downside is it only works for 32-bit flag fields with getparam. Or it could be a query to solve that as well.
I expect the actual userspace code will simply try with the probe flag first, and then without + set_domain. So strictly speaking we might not even have a need for the getparam.
Otoh, let's not overthink/engineer this, whatever works for userspace is ok imo. A new query that lists all flags is the kind of fake-generic stuff that will like mis-estimate where the future actually lands, e.g. how do you encode if we add extensions to userptr to this? Which is something we absolutely can, by extending the struct at the end, which doesn't even need a new flag.
Let's solve the immediate problem at hand, and not try to guess what more problems we might have in the future. -Daniel
Regards,
Tvrtko
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { #endif +static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
- const unsigned long end = addr + len;
- struct vm_area_struct *vma;
- int ret = -EFAULT;
- mmap_read_lock(mm);
- for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
- }
- mmap_read_unlock(mm);
- return ret;
+}
- /*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, } if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
return -EINVAL; if (i915_gem_object_size_2big(args->user_size))I915_USERPTR_PROBE))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
- if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
- }
- #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
- case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55 +/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
- /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam {
@@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
*
*/ __u32 flags; #define I915_USERPTR_READ_ONLY 0x1
- I915_USERPTR_UNSYNCHRONIZED:
- NOT USED. Setting this flag will result in an error.
+#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
On 15/07/2021 12:07, Daniel Vetter wrote:
On Thu, Jul 15, 2021 at 11:33:10AM +0100, Tvrtko Ursulin wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
Populate is exactly as racy as probe. We don't support pinned userptr anymore.
Yes, wrote so myself - "..again with no guarantees they are still there at use time..".
Perhaps I don't understand what problem is probe supposed to solve. It doesn't deal 1:1 with set_domain removal since that one actually did get_pages so that would be populate. But fact remains regardless that if userspace is given a pointer it doesn't trust, _and_ wants the check it for this reason or that, then probe solves nothing. Unless there is actually at minimum some protocol to reply to whoever sent the pointer like "not that pointer please".
I am also not a fan of getparam for individual ioctl flags since I don't think it scales nicely. How about add a param which returns all supported flags like I915_PARAM_USERPTR_SUPPORTED_FLAGS?
Downside is it only works for 32-bit flag fields with getparam. Or it could be a query to solve that as well.
I expect the actual userspace code will simply try with the probe flag first, and then without + set_domain. So strictly speaking we might not even have a need for the getparam.
Otoh, let's not overthink/engineer this, whatever works for userspace is ok imo. A new query that lists all flags is the kind of fake-generic stuff that will like mis-estimate where the future actually lands, e.g. how do you encode if we add extensions to userptr to this? Which is something we absolutely can, by extending the struct at the end, which doesn't even need a new flag.
Let's solve the immediate problem at hand, and not try to guess what more problems we might have in the future.
Yeah I am guessing if anyone is using some of the other userptr flags they don't have a getparam for that either. At least that would be consolidated with I915_PARAM_USERPTR_SUPPORTED_FLAGS. There is no requirement that getparam needs to keep supporting future extensions to the struct itself, equally as I915_PARAM_HAS_USERPTR_PROBE is directly tied to the same flags field, only a subset of it.
Regards,
Tvrtko
-Daniel
Regards,
Tvrtko
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { #endif +static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
- const unsigned long end = addr + len;
- struct vm_area_struct *vma;
- int ret = -EFAULT;
- mmap_read_lock(mm);
- for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
- }
- mmap_read_unlock(mm);
- return ret;
+}
- /*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, } if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
if (i915_gem_object_size_2big(args->user_size))I915_USERPTR_PROBE)) return -EINVAL;
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
- if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
- }
- #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
- case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55 +/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
- /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam {
@@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
*
*/ __u32 flags; #define I915_USERPTR_READ_ONLY 0x1
- I915_USERPTR_UNSYNCHRONIZED:
- NOT USED. Setting this flag will result in an error.
+#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
On Thursday, July 15, 2021 4:27:44 AM PDT Tvrtko Ursulin wrote:
On 15/07/2021 12:07, Daniel Vetter wrote:
On Thu, Jul 15, 2021 at 11:33:10AM +0100, Tvrtko Ursulin wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
Populate is exactly as racy as probe. We don't support pinned userptr anymore.
Yes, wrote so myself - "..again with no guarantees they are still there at use time..".
Perhaps I don't understand what problem is probe supposed to solve. It doesn't deal 1:1 with set_domain removal since that one actually did get_pages so that would be populate. But fact remains regardless that if userspace is given a pointer it doesn't trust, _and_ wants the check it for this reason or that, then probe solves nothing. Unless there is actually at minimum some protocol to reply to whoever sent the pointer like "not that pointer please".
That's exactly the point. GL_AMD_pinned_memory requires us the OpenGL implementation to return an error for "not that pointer, please", at the time when said pointer is supplied - not at first use.
Sure, there can be reasons why it might seem fine up front, and not work later. But an early check of "just no, you're doing it totally wrong" at the right moment can be helpful for application developers. While it shouldn't really happen, if it ever did, it would be a lot more obvious to debug than "much later on, when something randomly flushed the GPU commands we were building, something went wrong, and we don't know why."
--Ken
On Thu, Jul 15, 2021 at 1:21 PM Kenneth Graunke kenneth@whitecape.org wrote:
On Thursday, July 15, 2021 4:27:44 AM PDT Tvrtko Ursulin wrote:
On 15/07/2021 12:07, Daniel Vetter wrote:
On Thu, Jul 15, 2021 at 11:33:10AM +0100, Tvrtko Ursulin wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
Populate is exactly as racy as probe. We don't support pinned userptr anymore.
Yes, wrote so myself - "..again with no guarantees they are still there at use time..".
Perhaps I don't understand what problem is probe supposed to solve. It doesn't deal 1:1 with set_domain removal since that one actually did get_pages so that would be populate. But fact remains regardless that if userspace is given a pointer it doesn't trust, _and_ wants the check it for this reason or that, then probe solves nothing. Unless there is actually at minimum some protocol to reply to whoever sent the pointer like "not that pointer please".
That's exactly the point. GL_AMD_pinned_memory requires us the OpenGL implementation to return an error for "not that pointer, please", at the time when said pointer is supplied - not at first use.
Sure, there can be reasons why it might seem fine up front, and not work later. But an early check of "just no, you're doing it totally wrong" at the right moment can be helpful for application developers. While it shouldn't really happen, if it ever did, it would be a lot more obvious to debug than "much later on, when something randomly flushed the GPU commands we were building, something went wrong, and we don't know why."
And before someone chimes in saying that Vulkan doesn't need this because we can just VK_ERROR_DEVICE_LOST later, that's not true. We'd like to be able to return VK_ERROR_INVALID_EXTERNAL_HANDLE in the easily checkable cases like if they try to pass in a mmapped file.
--Jason
On 15/07/2021 19:21, Kenneth Graunke wrote:
On Thursday, July 15, 2021 4:27:44 AM PDT Tvrtko Ursulin wrote:
On 15/07/2021 12:07, Daniel Vetter wrote:
On Thu, Jul 15, 2021 at 11:33:10AM +0100, Tvrtko Ursulin wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
Populate is exactly as racy as probe. We don't support pinned userptr anymore.
Yes, wrote so myself - "..again with no guarantees they are still there at use time..".
Perhaps I don't understand what problem is probe supposed to solve. It doesn't deal 1:1 with set_domain removal since that one actually did get_pages so that would be populate. But fact remains regardless that if userspace is given a pointer it doesn't trust, _and_ wants the check it for this reason or that, then probe solves nothing. Unless there is actually at minimum some protocol to reply to whoever sent the pointer like "not that pointer please".
That's exactly the point. GL_AMD_pinned_memory requires us the OpenGL implementation to return an error for "not that pointer, please", at the time when said pointer is supplied - not at first use.
Sure, there can be reasons why it might seem fine up front, and not work later. But an early check of "just no, you're doing it totally wrong" at the right moment can be helpful for application developers. While it shouldn't really happen, if it ever did, it would be a lot more obvious to debug than "much later on, when something randomly flushed the GPU commands we were building, something went wrong, and we don't know why."
Ack, thanks for the clarification. For this limited use case I agree probe works.
Regards,
Tvrtko
P.S. I made a mistake (?) of looking at the GL_AMD_pinned_memory spec:
"""
3) Can the application free the memory?
RESOLVED: YES. However, the underlying buffer object should have been deleted from the OpenGL to prevent any access by the GPU, or the result is undefined, including program or even system termination. """
Scary stuff that spec of userspace level API would allow such kernel bugs!
On Thu, Jul 15, 2021 at 8:21 PM Kenneth Graunke kenneth@whitecape.org wrote:
On Thursday, July 15, 2021 4:27:44 AM PDT Tvrtko Ursulin wrote:
On 15/07/2021 12:07, Daniel Vetter wrote:
On Thu, Jul 15, 2021 at 11:33:10AM +0100, Tvrtko Ursulin wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
Populate is exactly as racy as probe. We don't support pinned userptr anymore.
Yes, wrote so myself - "..again with no guarantees they are still there at use time..".
Perhaps I don't understand what problem is probe supposed to solve. It doesn't deal 1:1 with set_domain removal since that one actually did get_pages so that would be populate. But fact remains regardless that if userspace is given a pointer it doesn't trust, _and_ wants the check it for this reason or that, then probe solves nothing. Unless there is actually at minimum some protocol to reply to whoever sent the pointer like "not that pointer please".
That's exactly the point. GL_AMD_pinned_memory requires us the OpenGL implementation to return an error for "not that pointer, please", at the time when said pointer is supplied - not at first use.
Sure, there can be reasons why it might seem fine up front, and not work later. But an early check of "just no, you're doing it totally wrong" at the right moment can be helpful for application developers. While it shouldn't really happen, if it ever did, it would be a lot more obvious to debug than "much later on, when something randomly flushed the GPU commands we were building, something went wrong, and we don't know why."
Also, that extension doesn't make guarantees about importing nasty userspace memory where some non-trusted entity could e.g. truncate() the file and render all pages invalid, even if you're holding a reference to it still.
It's purely to check "hey I got this random pointer here from somewhere, I trust it, can you use it assuming I will not change anything with hit?". Which is exactly what probe solves, and pulling in the pages is kinda overkill. -Daniel
On Thu, 15 Jul 2021 at 11:33, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
I am also not a fan of getparam for individual ioctl flags since I don't think it scales nicely. How about add a param which returns all supported flags like I915_PARAM_USERPTR_SUPPORTED_FLAGS?
Downside is it only works for 32-bit flag fields with getparam. Or it could be a query to solve that as well.
I guess. You don't think it's a little iffy though, since there were other flags which were added before this? So effectively userspace queries SUPPORTED_FLAGS and might get -EINVAL on older kernels, even though the flag is supported on that kernel(like READONLY)?
Maybe a versioning scheme instead? I915_PARAM_USERPTR_VERSION? Seems quite common for other params.
Regards,
Tvrtko
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
const unsigned long end = addr + len;
struct vm_area_struct *vma;
int ret = -EFAULT;
mmap_read_lock(mm);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
}
mmap_read_unlock(mm);
return ret;
+}
- /*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE)) return -EINVAL; if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
}
- #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam {
@@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
#define I915_USERPTR_READ_ONLY 0x1* * I915_USERPTR_UNSYNCHRONIZED: * * NOT USED. Setting this flag will result in an error. */ __u32 flags;
+#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 15/07/2021 12:09, Matthew Auld wrote:
On Thu, 15 Jul 2021 at 11:33, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
I think probing is too weak to be offered as part of the uapi. What probes successfully at create time might not be there anymore at usage time. So if the pointer is not trusted at one point, why should it be at a later stage?
Only thing which works for me is populate (so get_pages) at create time. But again with no guarantees they are still there at use time clearly documented.
I am also not a fan of getparam for individual ioctl flags since I don't think it scales nicely. How about add a param which returns all supported flags like I915_PARAM_USERPTR_SUPPORTED_FLAGS?
Downside is it only works for 32-bit flag fields with getparam. Or it could be a query to solve that as well.
I guess. You don't think it's a little iffy though, since there were other flags which were added before this? So effectively userspace queries SUPPORTED_FLAGS and might get -EINVAL on older kernels, even though the flag is supported on that kernel(like READONLY)?
For me it is probably passable since unsupported getparam fundamentally doesn't imply unsupported features predating that getparam. Same as for example unsupported engine query does not mean there are no engines. But anyway, seems question will be resolved by Daniel so don't pay attention to me.
Regards,
Tvrtko
Maybe a versioning scheme instead? I915_PARAM_USERPTR_VERSION? Seems quite common for other params.
Regards,
Tvrtko
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
const unsigned long end = addr + len;
struct vm_area_struct *vma;
int ret = -EFAULT;
mmap_read_lock(mm);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
}
mmap_read_unlock(mm);
return ret;
+}
- /*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE)) return -EINVAL; if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
}
- #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam {
@@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
#define I915_USERPTR_READ_ONLY 0x1* * I915_USERPTR_UNSYNCHRONIZED: * * NOT USED. Setting this flag will result in an error. */ __u32 flags;
+#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 15/07/2021 11:15, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily
PROBE
check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
- const unsigned long end = addr + len;
- struct vm_area_struct *vma;
- int ret = -EFAULT;
- mmap_read_lock(mm);
- for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
- }
- mmap_read_unlock(mm);
Logic here looks good to me.
- return ret;
+}
- /*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE))
return -EINVAL;
if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
- if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
- }
- #ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL)
diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
- case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam {
@@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
I'd also add a note on how validation at create time may not mean object will still be valid at use time.
With that added, and ignoring the question of whether to have setparam or not:
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
*
*/ __u32 flags; #define I915_USERPTR_READ_ONLY 0x1
- I915_USERPTR_UNSYNCHRONIZED:
- NOT USED. Setting this flag will result in an error.
+#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
Thanks for this! Series is:
Acked-by: Kenneth Graunke kenneth@whitecape.org
https://gitlab.freedesktop.org/kwg/mesa/-/commits/iris-userptr-probe is an untested branch that uses the new probe API in Mesa.
On Thursday, July 15, 2021 3:15:35 AM PDT Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
- const unsigned long end = addr + len;
- struct vm_area_struct *vma;
- int ret = -EFAULT;
- mmap_read_lock(mm);
- for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
- }
- mmap_read_unlock(mm);
- return ret;
+}
/*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE))
return -EINVAL;
if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
- if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
- }
#ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
- case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam { @@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
*
*/ __u32 flags;
- I915_USERPTR_UNSYNCHRONIZED:
- NOT USED. Setting this flag will result in an error.
#define I915_USERPTR_READ_ONLY 0x1 +#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
Thanks! Series is:
Acked-by: Kenneth Graunke kenneth@whitecape.org
https://gitlab.freedesktop.org/kwg/mesa/-/commits/iris-userptr-probe is an untested Mesa branch that makes use of the new probe uAPI.
On Thursday, July 15, 2021 3:15:35 AM PDT Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
- const unsigned long end = addr + len;
- struct vm_area_struct *vma;
- int ret = -EFAULT;
- mmap_read_lock(mm);
- for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
- }
- mmap_read_unlock(mm);
- return ret;
+}
/*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE))
return -EINVAL;
if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
- if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
- }
#ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
- case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;break;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam { @@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
*
*/ __u32 flags;
- I915_USERPTR_UNSYNCHRONIZED:
- NOT USED. Setting this flag will result in an error.
#define I915_USERPTR_READ_ONLY 0x1 +#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object.
On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld matthew.auld@intel.com wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
const unsigned long end = addr + len;
struct vm_area_struct *vma;
int ret = -EFAULT;
mmap_read_lock(mm);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
Why isn't this > end? Are we somehow guaranteed that one vma covers the entire range?
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
}
mmap_read_unlock(mm);
return ret;
+}
/*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE)) return -EINVAL; if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
}
#ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam { @@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
* * I915_USERPTR_UNSYNCHRONIZED: * * NOT USED. Setting this flag will result in an error. */ __u32 flags;
#define I915_USERPTR_READ_ONLY 0x1 +#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object. -- 2.26.3
On Wed, 21 Jul 2021 at 21:28, Jason Ekstrand jason@jlekstrand.net wrote:
On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld matthew.auld@intel.com wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
const unsigned long end = addr + len;
struct vm_area_struct *vma;
int ret = -EFAULT;
mmap_read_lock(mm);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
Why isn't this > end? Are we somehow guaranteed that one vma covers the entire range?
AFAIK we are just making sure we don't have a hole(note that we also update addr below), for example the user might have done a partial munmap. There could be multiple vma's if the kernel was unable to merge them. If we reach the vm_end >= end, then we know we have a "valid" range.
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
}
mmap_read_unlock(mm);
return ret;
+}
/*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE)) return -EINVAL; if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
}
#ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam { @@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
* * I915_USERPTR_UNSYNCHRONIZED: * * NOT USED. Setting this flag will result in an error. */ __u32 flags;
#define I915_USERPTR_READ_ONLY 0x1 +#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object. -- 2.26.3
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Thu, Jul 22, 2021 at 3:44 AM Matthew Auld matthew.william.auld@gmail.com wrote:
On Wed, 21 Jul 2021 at 21:28, Jason Ekstrand jason@jlekstrand.net wrote:
On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld matthew.auld@intel.com wrote:
From: Chris Wilson chris@chris-wilson.co.uk
Jason Ekstrand requested a more efficient method than userptr+set-domain to determine if the userptr object was backed by a complete set of pages upon creation. To be more efficient than simply populating the userptr using get_user_pages() (as done by the call to set-domain or execbuf), we can walk the tree of vm_area_struct and check for gaps or vma not backed by struct page (VM_PFNMAP). The question is how to handle VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete are going to drop support for set_domain(), so offering a way to probe the pages, without having to resort to dummy batches has been requested.
v2:
- add new query param for the PROPBE flag, so userspace can easily check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
Testcase: igt/gem_userptr_blits/probe Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 ++++++++++++++++++++- drivers/gpu/drm/i915/i915_getparam.c | 3 ++ include/uapi/drm/i915_drm.h | 18 ++++++++++ 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 56edfeff8c02..fd6880328596 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
+static int +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) +{
const unsigned long end = addr + len;
struct vm_area_struct *vma;
int ret = -EFAULT;
mmap_read_lock(mm);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
if (vma->vm_start > addr)
Why isn't this > end? Are we somehow guaranteed that one vma covers the entire range?
AFAIK we are just making sure we don't have a hole(note that we also update addr below), for example the user might have done a partial munmap. There could be multiple vma's if the kernel was unable to merge them. If we reach the vm_end >= end, then we know we have a "valid" range.
Ok. That wasn't obvious to me but I see the addr update now. Makes sense. Might be worth a one-line comment for the next guy. Either way,
Reviewed-by: Jason Ekstrand jason@jlekstrand.net
Thanks for wiring this up!
--Jason
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
}
mmap_read_unlock(mm);
return ret;
+}
/*
- Creates a new mm object that wraps some normal memory from the process
- context - user memory.
@@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, }
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE)) return -EINVAL; if (i915_gem_object_size_2big(args->user_size))
@@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
}
#ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 24e18219eb50..d6d2e1a10d14 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break;
case I915_PARAM_HAS_USERPTR_PROBE:
value = true;
break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e20eeeca7a1c..2e4112bf4d38 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
+/* Query if the kernel supports the I915_USERPTR_PROBE flag. */ +#define I915_PARAM_HAS_USERPTR_PROBE 56
/* Must be kept compact -- no holes and well documented */
typedef struct drm_i915_getparam { @@ -2178,12 +2181,27 @@ struct drm_i915_gem_userptr { * through the GTT. If the HW can't support readonly access, an error is * returned. *
* I915_USERPTR_PROBE:
*
* Probe the provided @user_ptr range and validate that the @user_ptr is
* indeed pointing to normal memory and that the range is also valid.
* For example if some garbage address is given to the kernel, then this
* should complain.
*
* Returns -EFAULT if the probe failed.
*
* Note that this doesn't populate the backing pages.
*
* The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
* returns a non-zero value.
* * I915_USERPTR_UNSYNCHRONIZED: * * NOT USED. Setting this flag will result in an error. */ __u32 flags;
#define I915_USERPTR_READ_ONLY 0x1 +#define I915_USERPTR_PROBE 0x2 #define I915_USERPTR_UNSYNCHRONIZED 0x80000000 /** * @handle: Returned handle for the object. -- 2.26.3
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
The CPU domain should be static for discrete, and on DG1 we don't need any flushing since everything is already coherent, so really all this does is an object wait, for which we have an ioctl. Longer term the desired caching should be an immutable creation time property for the BO, which can be set with something like gem_create_ext.
One other user is iris + userptr, which uses the set_domain to probe all the pages to check if the GUP succeeds, however we now have a PROBE flag for this purpose.
v2: add some more kernel doc, also add the implicit rules with caching
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ include/uapi/drm/i915_drm.h | 19 +++++++++++++++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 43004bef55cb..b684a62bf3b0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, u32 write_domain = args->write_domain; int err;
+ if (IS_DGFX(to_i915(dev))) + return -ENODEV; + /* Only handle setting domains to types used by the CPU. */ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 2e4112bf4d38..04ce310e7ee6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset { * - I915_GEM_DOMAIN_GTT: Mappable aperture domain * * All other domains are rejected. + * + * Note that for discrete, starting from DG1, this is no longer supported, and + * is instead rejected. On such platforms the CPU domain is effectively static, + * where we also only support a single &drm_i915_gem_mmap_offset cache mode, + * which can't be set explicitly and instead depends on the object placements, + * as per the below. + * + * Implicit caching rules, starting from DG1: + * + * - If any of the object placements (see &drm_i915_gem_create_ext_memory_regions) + * contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and + * mapped as write-combined only. + * + * - Everything else is always allocated and mapped as write-back, with the + * guarantee that everything is also coherent with the GPU. + * + * Note that this is likely to change in the future again, where we might need + * more flexibility on future devices, so making this all explicit as part of a + * new &drm_i915_gem_create_ext extension is probable. */ struct drm_i915_gem_set_domain { /** @handle: Handle for the object. */
On 15/07/2021 11:15, Matthew Auld wrote:
The CPU domain should be static for discrete, and on DG1 we don't need any flushing since everything is already coherent, so really all this does is an object wait, for which we have an ioctl. Longer term the desired caching should be an immutable creation time property for the BO, which can be set with something like gem_create_ext.
One other user is iris + userptr, which uses the set_domain to probe all the pages to check if the GUP succeeds, however we now have a PROBE flag for this purpose.
v2: add some more kernel doc, also add the implicit rules with caching
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ include/uapi/drm/i915_drm.h | 19 +++++++++++++++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 43004bef55cb..b684a62bf3b0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, u32 write_domain = args->write_domain; int err;
- if (IS_DGFX(to_i915(dev)))
return -ENODEV;
- /* Only handle setting domains to types used by the CPU. */ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 2e4112bf4d38..04ce310e7ee6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset {
- I915_GEM_DOMAIN_GTT: Mappable aperture domain
- All other domains are rejected.
- Note that for discrete, starting from DG1, this is no longer supported, and
- is instead rejected. On such platforms the CPU domain is effectively static,
- where we also only support a single &drm_i915_gem_mmap_offset cache mode,
- which can't be set explicitly and instead depends on the object placements,
- as per the below.
- Implicit caching rules, starting from DG1:
- If any of the object placements (see &drm_i915_gem_create_ext_memory_regions)
contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and
mapped as write-combined only.
A note about write-combine buffer? I guess saying it is userspace responsibility to do it and how.
- Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Haven't been following this so just a question on this one - it is not considered interesting to offer non-coherent modes, or even write combine, with system memory buffers, for a specific reason?
Regards,
Tvrtko
- Note that this is likely to change in the future again, where we might need
- more flexibility on future devices, so making this all explicit as part of a
*/ struct drm_i915_gem_set_domain { /** @handle: Handle for the object. */
- new &drm_i915_gem_create_ext extension is probable.
On Fri, Jul 16, 2021 at 9:52 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
The CPU domain should be static for discrete, and on DG1 we don't need any flushing since everything is already coherent, so really all this does is an object wait, for which we have an ioctl. Longer term the desired caching should be an immutable creation time property for the BO, which can be set with something like gem_create_ext.
One other user is iris + userptr, which uses the set_domain to probe all the pages to check if the GUP succeeds, however we now have a PROBE flag for this purpose.
v2: add some more kernel doc, also add the implicit rules with caching
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ include/uapi/drm/i915_drm.h | 19 +++++++++++++++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 43004bef55cb..b684a62bf3b0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, u32 write_domain = args->write_domain; int err;
if (IS_DGFX(to_i915(dev)))
return -ENODEV;
/* Only handle setting domains to types used by the CPU. */ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 2e4112bf4d38..04ce310e7ee6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset {
- I915_GEM_DOMAIN_GTT: Mappable aperture domain
- All other domains are rejected.
- Note that for discrete, starting from DG1, this is no longer supported, and
- is instead rejected. On such platforms the CPU domain is effectively static,
- where we also only support a single &drm_i915_gem_mmap_offset cache mode,
- which can't be set explicitly and instead depends on the object placements,
- as per the below.
- Implicit caching rules, starting from DG1:
- If any of the object placements (see &drm_i915_gem_create_ext_memory_regions)
contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and
mapped as write-combined only.
Is this accurate? I thought they got WB when living in SMEM and WC when on the device. But, since both are coherent, it's safe to lie to userspace and say it's all WC. Is that correct or am I missing something?
A note about write-combine buffer? I guess saying it is userspace responsibility to do it and how.
What exactly are you thinking is userspace's responsibility?
- Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Haven't been following this so just a question on this one - it is not considered interesting to offer non-coherent modes, or even write combine, with system memory buffers, for a specific reason?
We only care about non-coherent modes on integrated little-core. There, we share memory between CPU and GPU but snooping from the GPU is optional. Depending on access patterns, we might want WB with GPU snooping or we might want WC. I don't think we care about WC for SMEM allocations on discrete. For that matter, I'm not sure you can actually shut snooping off when going across a "real" PCIe bus. At least not with DG1.
--Jason
Regards,
Tvrtko
- Note that this is likely to change in the future again, where we might need
- more flexibility on future devices, so making this all explicit as part of a
*/ struct drm_i915_gem_set_domain { /** @handle: Handle for the object. */
- new &drm_i915_gem_create_ext extension is probable.
On 16/07/2021 16:23, Jason Ekstrand wrote:
On Fri, Jul 16, 2021 at 9:52 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
The CPU domain should be static for discrete, and on DG1 we don't need any flushing since everything is already coherent, so really all this does is an object wait, for which we have an ioctl. Longer term the desired caching should be an immutable creation time property for the BO, which can be set with something like gem_create_ext.
One other user is iris + userptr, which uses the set_domain to probe all the pages to check if the GUP succeeds, however we now have a PROBE flag for this purpose.
v2: add some more kernel doc, also add the implicit rules with caching
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ include/uapi/drm/i915_drm.h | 19 +++++++++++++++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 43004bef55cb..b684a62bf3b0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, u32 write_domain = args->write_domain; int err;
if (IS_DGFX(to_i915(dev)))
return -ENODEV;
/* Only handle setting domains to types used by the CPU. */ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 2e4112bf4d38..04ce310e7ee6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset { * - I915_GEM_DOMAIN_GTT: Mappable aperture domain * * All other domains are rejected.
- Note that for discrete, starting from DG1, this is no longer supported, and
- is instead rejected. On such platforms the CPU domain is effectively static,
- where we also only support a single &drm_i915_gem_mmap_offset cache mode,
- which can't be set explicitly and instead depends on the object placements,
- as per the below.
- Implicit caching rules, starting from DG1:
- If any of the object placements (see &drm_i915_gem_create_ext_memory_regions)
contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and
mapped as write-combined only.
Is this accurate? I thought they got WB when living in SMEM and WC when on the device. But, since both are coherent, it's safe to lie to userspace and say it's all WC. Is that correct or am I missing something?
A note about write-combine buffer? I guess saying it is userspace responsibility to do it and how.
What exactly are you thinking is userspace's responsibility?
Flushing of the write combine buffer.
- Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Haven't been following this so just a question on this one - it is not considered interesting to offer non-coherent modes, or even write combine, with system memory buffers, for a specific reason?
We only care about non-coherent modes on integrated little-core. There, we share memory between CPU and GPU but snooping from the GPU is optional. Depending on access patterns, we might want WB with GPU snooping or we might want WC. I don't think we care about WC for SMEM allocations on discrete. For that matter, I'm not sure you can actually shut snooping off when going across a "real" PCIe bus. At least not with DG1.
But writes to system memory buffers aren't going over the PCIe bus?!
Anyways, I am not claiming it is an interesting use case, just wondering about the reasoning for making the modes fixed.
Regards,
Tvrtko
--Jason
Regards,
Tvrtko
- Note that this is likely to change in the future again, where we might need
- more flexibility on future devices, so making this all explicit as part of a
struct drm_i915_gem_set_domain { /** @handle: Handle for the object. */
- new &drm_i915_gem_create_ext extension is probable. */
On Fri, 16 Jul 2021 at 16:23, Jason Ekstrand jason@jlekstrand.net wrote:
On Fri, Jul 16, 2021 at 9:52 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
The CPU domain should be static for discrete, and on DG1 we don't need any flushing since everything is already coherent, so really all this does is an object wait, for which we have an ioctl. Longer term the desired caching should be an immutable creation time property for the BO, which can be set with something like gem_create_ext.
One other user is iris + userptr, which uses the set_domain to probe all the pages to check if the GUP succeeds, however we now have a PROBE flag for this purpose.
v2: add some more kernel doc, also add the implicit rules with caching
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ include/uapi/drm/i915_drm.h | 19 +++++++++++++++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 43004bef55cb..b684a62bf3b0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, u32 write_domain = args->write_domain; int err;
if (IS_DGFX(to_i915(dev)))
return -ENODEV;
/* Only handle setting domains to types used by the CPU. */ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 2e4112bf4d38..04ce310e7ee6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset {
- I915_GEM_DOMAIN_GTT: Mappable aperture domain
- All other domains are rejected.
- Note that for discrete, starting from DG1, this is no longer supported, and
- is instead rejected. On such platforms the CPU domain is effectively static,
- where we also only support a single &drm_i915_gem_mmap_offset cache mode,
- which can't be set explicitly and instead depends on the object placements,
- as per the below.
- Implicit caching rules, starting from DG1:
- If any of the object placements (see &drm_i915_gem_create_ext_memory_regions)
contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and
mapped as write-combined only.
Is this accurate? I thought they got WB when living in SMEM and WC when on the device. But, since both are coherent, it's safe to lie to userspace and say it's all WC. Is that correct or am I missing something?
Yes, it's accurate, it will be allocated and mapped as WC. I think we can just make select_tt_caching always return cached if we want, and it looks like ttm seems to be fine with having different caching values for the tt vs io resource. Daniel, should we adjust this?
A note about write-combine buffer? I guess saying it is userspace responsibility to do it and how.
What exactly are you thinking is userspace's responsibility?
- Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Haven't been following this so just a question on this one - it is not considered interesting to offer non-coherent modes, or even write combine, with system memory buffers, for a specific reason?
We only care about non-coherent modes on integrated little-core. There, we share memory between CPU and GPU but snooping from the GPU is optional. Depending on access patterns, we might want WB with GPU snooping or we might want WC. I don't think we care about WC for SMEM allocations on discrete. For that matter, I'm not sure you can actually shut snooping off when going across a "real" PCIe bus. At least not with DG1.
--Jason
Regards,
Tvrtko
- Note that this is likely to change in the future again, where we might need
- more flexibility on future devices, so making this all explicit as part of a
*/ struct drm_i915_gem_set_domain { /** @handle: Handle for the object. */
- new &drm_i915_gem_create_ext extension is probable.
On Mon, Jul 19, 2021 at 4:10 AM Matthew Auld matthew.william.auld@gmail.com wrote:
On Fri, 16 Jul 2021 at 16:23, Jason Ekstrand jason@jlekstrand.net wrote:
On Fri, Jul 16, 2021 at 9:52 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 15/07/2021 11:15, Matthew Auld wrote:
The CPU domain should be static for discrete, and on DG1 we don't need any flushing since everything is already coherent, so really all this does is an object wait, for which we have an ioctl. Longer term the desired caching should be an immutable creation time property for the BO, which can be set with something like gem_create_ext.
One other user is iris + userptr, which uses the set_domain to probe all the pages to check if the GUP succeeds, however we now have a PROBE flag for this purpose.
v2: add some more kernel doc, also add the implicit rules with caching
Suggested-by: Daniel Vetter daniel@ffwll.ch Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Jason Ekstrand jason@jlekstrand.net Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ include/uapi/drm/i915_drm.h | 19 +++++++++++++++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 43004bef55cb..b684a62bf3b0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, u32 write_domain = args->write_domain; int err;
if (IS_DGFX(to_i915(dev)))
return -ENODEV;
/* Only handle setting domains to types used by the CPU. */ if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 2e4112bf4d38..04ce310e7ee6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset {
- I915_GEM_DOMAIN_GTT: Mappable aperture domain
- All other domains are rejected.
- Note that for discrete, starting from DG1, this is no longer supported, and
- is instead rejected. On such platforms the CPU domain is effectively static,
- where we also only support a single &drm_i915_gem_mmap_offset cache mode,
- which can't be set explicitly and instead depends on the object placements,
- as per the below.
- Implicit caching rules, starting from DG1:
- If any of the object placements (see &drm_i915_gem_create_ext_memory_regions)
contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and
mapped as write-combined only.
Is this accurate? I thought they got WB when living in SMEM and WC when on the device. But, since both are coherent, it's safe to lie to userspace and say it's all WC. Is that correct or am I missing something?
Yes, it's accurate, it will be allocated and mapped as WC. I think we can just make select_tt_caching always return cached if we want, and it looks like ttm seems to be fine with having different caching values for the tt vs io resource. Daniel, should we adjust this?
Mildly related, we had an issue some time back with i915+amdgpu where we were choosing different caching settings for SMEM shared BOs and the fallout was that we had all sorts of caching trouble when running an integrated+discrete setup with them. I don't remember how all that shook out but we should think about it here. Why is this important? Because our mmap caching settings are going to be related to our snooping settings for GPU access across the PCIe bar to SMEM. If we're WC then when can avoid snooping but if we're WB then we need snooping enabled. WC+snooping might work but I'm not sure off-hand.
--Jason
A note about write-combine buffer? I guess saying it is userspace responsibility to do it and how.
What exactly are you thinking is userspace's responsibility?
- Everything else is always allocated and mapped as write-back, with the
guarantee that everything is also coherent with the GPU.
Haven't been following this so just a question on this one - it is not considered interesting to offer non-coherent modes, or even write combine, with system memory buffers, for a specific reason?
We only care about non-coherent modes on integrated little-core. There, we share memory between CPU and GPU but snooping from the GPU is optional. Depending on access patterns, we might want WB with GPU snooping or we might want WC. I don't think we care about WC for SMEM allocations on discrete. For that matter, I'm not sure you can actually shut snooping off when going across a "real" PCIe bus. At least not with DG1.
--Jason
Regards,
Tvrtko
- Note that this is likely to change in the future again, where we might need
- more flexibility on future devices, so making this all explicit as part of a
*/ struct drm_i915_gem_set_domain { /** @handle: Handle for the object. */
- new &drm_i915_gem_create_ext extension is probable.
dri-devel@lists.freedesktop.org