Test-with: 20220525183702.490989-1-matthew.auld@intel.com
Add an entry for the new uapi needed for small BAR on DG2+.
v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. v4: - Various improvements all over. (Tvrtko)
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com Cc: mesa-dev@lists.freedesktop.org Acked-by: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Acked-by: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- Documentation/gpu/rfc/i915_small_bar.h | 189 +++++++++++++++++++++++ Documentation/gpu/rfc/i915_small_bar.rst | 47 ++++++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 240 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst
diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index 000000000000..752bb2ceb399 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,189 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** + * @probed_size: Memory probed by the driver (-1 = unknown) + * + * Note that it should not be possible to ever encounter a zero value + * here, also note that no current region type will ever return -1 here. + * Although for future region types, this might be a possibility. The + * same applies to the other size fields. + */ + __u64 probed_size; + + /** + * @unallocated_size: Estimate of memory remaining (-1 = unknown) + * + * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. + * Without this (or if this is an older kernel) the value here will + * always equal the @probed_size. Note this is only currently tracked + * for I915_MEMORY_CLASS_DEVICE regions (for other types the value here + * will always equal the @probed_size). + */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder (if there is any) will not be CPU + * accessible. + * + * On systems without small BAR, the @probed_size will + * always equal the @probed_cpu_visible_size, since all + * of it will be CPU accessible. + * + * Note this is only tracked for + * I915_MEMORY_CLASS_DEVICE regions (for other types the + * value here will always equal the @probed_size). + * + * Note that if the value returned here is zero, then + * this must be an old kernel which lacks the relevant + * small-bar uAPI support (including + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on + * such systems we should never actually end up with a + * small BAR configuration, assuming we are able to load + * the kernel module. Hence it should be safe to treat + * this the same as when @probed_cpu_visible_size == + * @probed_size. + */ + __u64 probed_cpu_visible_size; + + /** + * @unallocated_cpu_visible_size: Estimate of CPU + * visible memory remaining (-1 = unknown). + * + * Note this is only tracked for + * I915_MEMORY_CLASS_DEVICE regions (for other types the + * value here will always equal the + * @probed_cpu_visible_size). + * + * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable + * accounting. Without this the value here will always + * equal the @probed_cpu_visible_size. Note this is only + * currently tracked for I915_MEMORY_CLASS_DEVICE + * regions (for other types the value here will also + * always equal the @probed_cpu_visible_size). + * + * If this is an older kernel the value here will be + * zero, see also @probed_cpu_visible_size. + */ + __u64 unallocated_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we have might have further minimum + * page-size restrictions (larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. The kernel will + * always select the largest minimum page-size for the set of possible + * placements as the value to use when rounding up the @size. + */ + __u64 size; + + /** + * @handle: Returned handle for the object. + * + * Object handles are nonzero. + */ + __u32 handle; + + /** + * @flags: Optional flags. + * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only + * strictly required on configurations where some subset of the device + * memory is directly visible/mappable through the CPU (which we also + * call small BAR), like on some DG2+ systems. Note that this is quite + * undesirable, but due to various factors like the client CPU, BIOS etc + * it's something we can expect to see in the wild. See + * &__drm_i915_memory_region_info.probed_cpu_visible_size for how to + * determine if this system applies. + * + * Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to + * ensure the kernel can always spill the allocation to system memory, + * if the object can't be allocated in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Also note that since the kernel only supports flat-CCS on objects + * that can *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore + * don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with + * flat-CCS. + * + * Without this hint, the kernel will assume that non-mappable + * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the + * kernel can still migrate the object to the mappable part, as a last + * resort, if userspace ever CPU faults this object, but this might be + * expensive, and so ideally should be avoided. + * + * On older kernels which lack the relevant small-bar uAPI support (see + * also &__drm_i915_memory_region_info.probed_cpu_visible_size), + * usage of the flag will result in an error, but it should NEVER be + * possible to end up with a small BAR configuration, assuming we can + * also successfully load the i915 kernel module. In such cases the + * entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as + * such there are zero restrictions on where the object can be placed. + */ +#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0) + __u32 flags; + + /** + * @extensions: The chain of extensions to apply to this object. + * + * This will be useful in the future when we need to support several + * different extensions, and we need to apply more than one when + * creating the object. See struct i915_user_extension. + * + * If we don't supply any extensions then we get the same old gem_create + * behaviour. + * + * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see + * struct drm_i915_gem_create_ext_memory_regions. + * + * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see + * struct drm_i915_gem_create_ext_protected_content. + */ +#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0 +#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1 + __u64 extensions; +}; diff --git a/Documentation/gpu/rfc/i915_small_bar.rst b/Documentation/gpu/rfc/i915_small_bar.rst new file mode 100644 index 000000000000..a322481cea8b --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.rst @@ -0,0 +1,47 @@ +========================== +I915 Small BAR RFC Section +========================== +Starting from DG2 we will have resizable BAR support for device local-memory(i.e +I915_MEMORY_CLASS_DEVICE), but in some cases the final BAR size might still be +smaller than the total probed_size. In such cases, only some subset of +I915_MEMORY_CLASS_DEVICE will be CPU accessible(for example the first 256M), +while the remainder is only accessible via the GPU. + +I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag +---------------------------------------------- +New gem_create_ext flag to tell the kernel that a BO will require CPU access. +This becomes important when placing an object in I915_MEMORY_CLASS_DEVICE, where +underneath the device has a small BAR, meaning only some portion of it is CPU +accessible. Without this flag the kernel will assume that CPU access is not +required, and prioritize using the non-CPU visible portion of +I915_MEMORY_CLASS_DEVICE. + +.. kernel-doc:: Documentation/gpu/rfc/i915_small_bar.h + :functions: __drm_i915_gem_create_ext + +probed_cpu_visible_size attribute +--------------------------------- +New struct__drm_i915_memory_region attribute which returns the total size of the +CPU accessible portion, for the particular region. This should only be +applicable for I915_MEMORY_CLASS_DEVICE. We also report the +unallocated_cpu_visible_size, alongside the unallocated_size. + +Vulkan will need this as part of creating a separate VkMemoryHeap with the +VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set, to represent the CPU visible portion, +where the total size of the heap needs to be known. It also wants to be able to +give a rough estimate of how memory can potentially be allocated. + +.. kernel-doc:: Documentation/gpu/rfc/i915_small_bar.h + :functions: __drm_i915_memory_region_info + +Error Capture restrictions +-------------------------- +With error capture we have two new restrictions: + + 1) Error capture is best effort on small BAR systems; if the pages are not + CPU accessible, at the time of capture, then the kernel is free to skip + trying to capture them. + + 2) On discrete we now reject error capture on recoverable contexts. In the + future the kernel may want to blit during error capture, when for example + something is not currently CPU accessible. diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 91e93a705230..5a3bd3924ba6 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -23,3 +23,7 @@ host such documentation: .. toctree::
i915_scheduler.rst + +.. toctree:: + + i915_small_bar.rst
On 5/25/22 20:43, Matthew Auld wrote:
Change this to all upcoming hardware so that we are more likely to be able to allocate memory outside of a fence signalling critical section?
/Thomas
Userspace wants to know the size of CPU visible portion of device local-memory, and on small BAR devices the probed_size is no longer enough. In Vulkan, for example, it would like to know the size in bytes for CPU visible VkMemoryHeap. We already track the io_size for each region, so it's just case of plumbing that through to the region query.
Testcase: igt@i915_query@query-regions-sanity-check Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/i915_query.c | 6 +++ include/uapi/drm/i915_drm.h | 74 +++++++++++++++++-------------- 2 files changed, 47 insertions(+), 33 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 7584cec53d5d..9aa0b28aa6ee 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -496,6 +496,12 @@ static int query_memregion_info(struct drm_i915_private *i915, info.region.memory_class = mr->type; info.region.memory_instance = mr->instance; info.probed_size = mr->total; + + if (mr->type == INTEL_MEMORY_LOCAL) + info.probed_cpu_visible_size = mr->io_size; + else + info.probed_cpu_visible_size = mr->total; + info.unallocated_size = mr->avail;
if (__copy_to_user(info_ptr, &info, sizeof(info))) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index de49b68b4fc8..9df419a45244 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3207,36 +3207,6 @@ struct drm_i915_gem_memory_class_instance { * struct drm_i915_memory_region_info - Describes one region as known to the * driver. * - * Note that we reserve some stuff here for potential future work. As an example - * we might want expose the capabilities for a given region, which could include - * things like if the region is CPU mappable/accessible, what are the supported - * mapping types etc. - * - * Note that to extend struct drm_i915_memory_region_info and struct - * drm_i915_query_memory_regions in the future the plan is to do the following: - * - * .. code-block:: C - * - * struct drm_i915_memory_region_info { - * struct drm_i915_gem_memory_class_instance region; - * union { - * __u32 rsvd0; - * __u32 new_thing1; - * }; - * ... - * union { - * __u64 rsvd1[8]; - * struct { - * __u64 new_thing2; - * __u64 new_thing3; - * ... - * }; - * }; - * }; - * - * With this things should remain source compatible between versions for - * userspace, even as we add new fields. - * * Note this is using both struct drm_i915_query_item and struct drm_i915_query. * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS * at &drm_i915_query_item.query_id. @@ -3248,14 +3218,52 @@ struct drm_i915_memory_region_info { /** @rsvd0: MBZ */ __u32 rsvd0;
- /** @probed_size: Memory probed by the driver (-1 = unknown) */ + /** + * @probed_size: Memory probed by the driver (-1 = unknown) + * + * Note that it should not be possible to ever encounter a zero value + * here, also note that no current region type will ever return -1 here. + * Although for future region types, this might be a possibility. The + * same applies to the other size fields. + */ __u64 probed_size;
/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ __u64 unallocated_size;
- /** @rsvd1: MBZ */ - __u64 rsvd1[8]; + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder (if there is any) will not be CPU + * accessible. + * + * On systems without small BAR, the @probed_size will + * always equal the @probed_cpu_visible_size, since all + * of it will be CPU accessible. + * + * Note this is only tracked for + * I915_MEMORY_CLASS_DEVICE regions (for other types the + * value here will always equal the @probed_size). + * + * Note that if the value returned here is zero, then + * this must be an old kernel which lacks the relevant + * small-bar uAPI support (including + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on + * such systems we should never actually end up with a + * small BAR configuration, assuming we are able to load + * the kernel module. Hence it should be safe to treat + * this the same as when @probed_cpu_visible_size == + * @probed_size. + */ + __u64 probed_cpu_visible_size; + }; + }; };
/**
Vulkan would like to have a rough measure of how much device memory can in theory be allocated. Also add unallocated_cpu_visible_size to track the visible portion, in case the device is using small BAR.
Testcase: igt@i915_query@query-regions-unallocated Testcase: igt@i915_query@query-regions-sanity-check Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/i915_query.c | 10 +++++- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 20 ++++++++++++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.h | 3 ++ drivers/gpu/drm/i915/intel_memory_region.c | 14 +++++++++ drivers/gpu/drm/i915/intel_memory_region.h | 3 ++ include/uapi/drm/i915_drm.h | 31 ++++++++++++++++++- 6 files changed, 79 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 9aa0b28aa6ee..e095c55f4d4b 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -502,7 +502,15 @@ static int query_memregion_info(struct drm_i915_private *i915, else info.probed_cpu_visible_size = mr->total;
- info.unallocated_size = mr->avail; + if (perfmon_capable()) { + intel_memory_region_avail(mr, + &info.unallocated_size, + &info.unallocated_cpu_visible_size); + } else { + info.unallocated_size = info.probed_size; + info.unallocated_cpu_visible_size = + info.probed_cpu_visible_size; + }
if (__copy_to_user(info_ptr, &info, sizeof(info))) return -EFAULT; diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c index a5109548abc0..aa5c91e44438 100644 --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c @@ -365,6 +365,26 @@ u64 i915_ttm_buddy_man_visible_size(struct ttm_resource_manager *man) return bman->visible_size; }
+/** + * i915_ttm_buddy_man_visible_size - Query the avail tracking for the manager. + * + * @man: The buddy allocator ttm manager + * @avail: The total available memory in pages for the entire manager. + * @visible_avail: The total available memory in pages for the CPU visible + * portion. Note that this will always give the same value as @avail on + * configurations that don't have a small BAR. + */ +void i915_ttm_buddy_man_avail(struct ttm_resource_manager *man, + u64 *avail, u64 *visible_avail) +{ + struct i915_ttm_buddy_manager *bman = to_buddy_manager(man); + + mutex_lock(&bman->lock); + *avail = bman->mm.avail >> PAGE_SHIFT; + *visible_avail = bman->visible_avail; + mutex_unlock(&bman->lock); +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) void i915_ttm_buddy_man_force_visible_size(struct ttm_resource_manager *man, u64 size) diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h index 52d9586d242c..d64620712830 100644 --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h @@ -61,6 +61,9 @@ int i915_ttm_buddy_man_reserve(struct ttm_resource_manager *man,
u64 i915_ttm_buddy_man_visible_size(struct ttm_resource_manager *man);
+void i915_ttm_buddy_man_avail(struct ttm_resource_manager *man, + u64 *avail, u64 *avail_visible); + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) void i915_ttm_buddy_man_force_visible_size(struct ttm_resource_manager *man, u64 size); diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index e38d2db1c3e3..94ee26e99549 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -279,6 +279,20 @@ void intel_memory_region_set_name(struct intel_memory_region *mem, va_end(ap); }
+void intel_memory_region_avail(struct intel_memory_region *mr, + u64 *avail, u64 *visible_avail) +{ + if (mr->type == INTEL_MEMORY_LOCAL) { + i915_ttm_buddy_man_avail(mr->region_private, + avail, visible_avail); + *avail <<= PAGE_SHIFT; + *visible_avail <<= PAGE_SHIFT; + } else { + *avail = mr->total; + *visible_avail = mr->total; + } +} + void intel_memory_region_destroy(struct intel_memory_region *mem) { int ret = 0; diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 3d8378c1b447..2214f251bec3 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -127,6 +127,9 @@ int intel_memory_region_reserve(struct intel_memory_region *mem, void intel_memory_region_debug(struct intel_memory_region *mr, struct drm_printer *printer);
+void intel_memory_region_avail(struct intel_memory_region *mr, + u64 *avail, u64 *visible_avail); + struct intel_memory_region * i915_gem_ttm_system_setup(struct drm_i915_private *i915, u16 type, u16 instance); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 9df419a45244..e30f31a440b3 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3228,7 +3228,15 @@ struct drm_i915_memory_region_info { */ __u64 probed_size;
- /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + /** + * @unallocated_size: Estimate of memory remaining (-1 = unknown) + * + * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. + * Without this (or if this is an older kernel) the value here will + * always equal the @probed_size. Note this is only currently tracked + * for I915_MEMORY_CLASS_DEVICE regions (for other types the value here + * will always equal the @probed_size). + */ __u64 unallocated_size;
union { @@ -3262,6 +3270,27 @@ struct drm_i915_memory_region_info { * @probed_size. */ __u64 probed_cpu_visible_size; + + /** + * @unallocated_cpu_visible_size: Estimate of CPU + * visible memory remaining (-1 = unknown). + * + * Note this is only tracked for + * I915_MEMORY_CLASS_DEVICE regions (for other types the + * value here will always equal the + * @probed_cpu_visible_size). + * + * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable + * accounting. Without this the value here will always + * equal the @probed_cpu_visible_size. Note this is only + * currently tracked for I915_MEMORY_CLASS_DEVICE + * regions (for other types the value here will also + * always equal the @probed_cpu_visible_size). + * + * If this is an older kernel the value here will be + * zero, see also @probed_cpu_visible_size. + */ + __u64 unallocated_cpu_visible_size; }; }; };
Hi Matthew,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on drm-tip/drm-tip] [also build test WARNING on v5.18 next-20220525] [cannot apply to drm-intel/for-linux-next drm/drm-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Auld/small-BAR-uapi-b... base: git://anongit.freedesktop.org/drm/drm-tip drm-tip config: x86_64-rhel-8.3-kunit (https://download.01.org/0day-ci/archive/20220526/202205261034.CoXEwzSb-lkp@i...) compiler: gcc-11 (Debian 11.3.0-1) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/614521eb68cc1e72a489c1c7968273... git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Matthew-Auld/small-BAR-uapi-bits/20220526-024641 git checkout 614521eb68cc1e72a489c1c796827329c98bf031 # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/i915/
If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot lkp@intel.com
All warnings (new ones prefixed by >>):
drivers/gpu/drm/i915/i915_ttm_buddy_manager.c:379: warning: expecting prototype for i915_ttm_buddy_man_visible_size(). Prototype was for i915_ttm_buddy_man_avail() instead
vim +379 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
367 368 /** 369 * i915_ttm_buddy_man_visible_size - Query the avail tracking for the manager. 370 * 371 * @man: The buddy allocator ttm manager 372 * @avail: The total available memory in pages for the entire manager. 373 * @visible_avail: The total available memory in pages for the CPU visible 374 * portion. Note that this will always give the same value as @avail on 375 * configurations that don't have a small BAR. 376 */ 377 void i915_ttm_buddy_man_avail(struct ttm_resource_manager *man, 378 u64 *avail, u64 *visible_avail)
379 {
380 struct i915_ttm_buddy_manager *bman = to_buddy_manager(man); 381 382 mutex_lock(&bman->lock); 383 *avail = bman->mm.avail >> PAGE_SHIFT; 384 *visible_avail = bman->visible_avail; 385 mutex_unlock(&bman->lock); 386 } 387
On 25/05/2022 19:43, Matthew Auld wrote:
I have concerns here that it isn't useful and could even be counter-productive. If we give unprivileged access it may be considered a side channel, but if we "lie" (report total region size) to unprivileged clients (like in this patch), then they don't co-operate well and end trashing.
Is Vulkan really sure it wants this and why?
Regards,
Tvrtko
On 26/05/2022 08:58, Tvrtko Ursulin wrote:
Lionel pointed out: https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/VK_EXT...
Also note that the existing behaviour was to lie. I'm not sure what's the best option here.
On 26/05/2022 09:10, Matthew Auld wrote:
So this query would provide VkPhysicalDeviceMemoryBudgetPropertiesEXT::heapBudget. Apart that it wouldn't since we thought to lie. And apart that it's text says:
""" ...how much total memory from each heap the current process can use at any given time, before allocations may start failing or causing performance degradation. The values may change based on other activity in the system that is outside the scope and control of the Vulkan implementation. """
It acknowledges itself in the second sentence that the first sentence is questionable.
And VkPhysicalDeviceMemoryBudgetPropertiesEXT::heapUsage would be still missing and would maybe come via fdinfo? Or you plan to add it to this same query later?
I like to source knowledge of best practices from the long established world of CPU scheduling and process memory management. Is anyone aware of this kind of techniques there - applications actively looking at free memory data from /proc/meminfo and dynamically adjusting their runtime behaviour based on it? And that they are not single application on a dedicated system type of things.
Or perhaps this Vk extension is envisaged for exactly those kind of scenarios? However if so then userspace can know all this data from probed size and the data set it created.
Also note that the existing behaviour was to lie. I'm not sure what's the best option here.
Uapi reserved -1 for unknown so we could do that?
Regards,
Tvrtko
On 26/05/2022 09:33, Tvrtko Ursulin wrote:
IIUC the heapUsage is something like per app usage, which already looks to be tracked in anv, although I don't think it knows if stuff is actually resident or not. The heapBudget looks to then be roughly the heapUsage + info.unallocated.
AFAICT we can do that for the info.unallocated_cpu_visible, but not for the existing info.unallocated without maybe breaking something?
No longer used.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/intel_memory_region.c | 4 +--- drivers/gpu/drm/i915/intel_memory_region.h | 1 - 2 files changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 94ee26e99549..9a4a7fb55582 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -198,8 +198,7 @@ void intel_memory_region_debug(struct intel_memory_region *mr, if (mr->region_private) ttm_resource_manager_debug(mr->region_private, printer); else - drm_printf(printer, "total:%pa, available:%pa bytes\n", - &mr->total, &mr->avail); + drm_printf(printer, "total:%pa bytes\n", &mr->total); }
static int intel_memory_region_memtest(struct intel_memory_region *mem, @@ -242,7 +241,6 @@ intel_memory_region_create(struct drm_i915_private *i915, mem->min_page_size = min_page_size; mem->ops = ops; mem->total = size; - mem->avail = mem->total; mem->type = type; mem->instance = instance;
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 2214f251bec3..2953ed5c3248 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -75,7 +75,6 @@ struct intel_memory_region { resource_size_t io_size; resource_size_t min_page_size; resource_size_t total; - resource_size_t avail;
u16 type; u16 instance;
On 5/25/22 20:43, Matthew Auld wrote:
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
On small BAR configurations, when dealing with I915_MEMORY_CLASS_DEVICE allocations, we assume that by default, all userspace allocations should be placed in the non-CPU visible portion. Note that dumb buffers are not included here, since these are not "GPU accelerated" and likely need CPU access. We choose to just always set GPU_ONLY, and let the backend figure out if that should be ignored or not, for example on full BAR systems.
In a later patch userspace will be able to provide a hint if CPU access to the buffer is needed.
v2(Thomas) - Apply GPU_ONLY on all discrete devices, but only if the BO can be placed in LMEM. Down in the depths this should be turned into a noop, where required, and as an annotation it still make some sense. If we apply it regardless of the placements then we end up needing to check the placements during exec capture. Also it's slightly inconsistent since the NEEDS_CPU_ACCESS can only be applied on objects that can be placed in LMEM. The other annoyance would be gem_create_ext vs plain gem_create, if we were to always apply GPU_ONLY.
Testcase: igt@gem-create@create-ext-cpu-access-sanity-check Testcase: igt@gem-create@create-ext-cpu-access-big Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 5802692ea604..d094cae0ddf1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -427,6 +427,14 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; }
+ /* + * TODO: add a userspace hint to force CPU_ACCESS for the object, which + * can override this. + */ + if (ext_data.n_placements > 1 || + ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) + ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, ext_data.n_placements,
If set, force the allocation to be placed in the mappable portion of I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory (i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the object, that way we can always spill the object into system memory if we can't make space.
Testcase: igt@gem-create@create-ext-cpu-access-sanity-check Testcase: igt@gem-create@create-ext-cpu-access-big Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++++++--- include/uapi/drm/i915_drm.h | 61 +++++++++++++++++++--- 2 files changed, 71 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index d094cae0ddf1..33673fe7ee0a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -241,6 +241,7 @@ struct create_ext { struct drm_i915_private *i915; struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; unsigned int n_placements; + unsigned int placement_mask; unsigned long flags; };
@@ -337,6 +338,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, for (i = 0; i < args->num_regions; i++) ext_data->placements[i] = placements[i];
+ ext_data->placement_mask = mask; return 0;
out_dump: @@ -411,7 +413,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, struct drm_i915_gem_object *obj; int ret;
- if (args->flags) + if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) return -EINVAL;
ret = i915_user_extensions(u64_to_user_ptr(args->extensions), @@ -427,13 +429,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; }
- /* - * TODO: add a userspace hint to force CPU_ACCESS for the object, which - * can override this. - */ - if (ext_data.n_placements > 1 || - ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) - ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) { + if (ext_data.n_placements == 1) + return -EINVAL; + + /* + * We always need to be able to spill to system memory, if we + * can't place in the mappable part of LMEM. + */ + if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM))) + return -EINVAL; + } else { + if (ext_data.n_placements > 1 || + ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) + ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + }
obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e30f31a440b3..5b0a10e6a1b8 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3366,11 +3366,11 @@ struct drm_i915_query_memory_regions { * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added * extension support using struct i915_user_extension. * - * Note that in the future we want to have our buffer flags here, at least for - * the stuff that is immutable. Previously we would have two ioctls, one to - * create the object with gem_create, and another to apply various parameters, - * however this creates some ambiguity for the params which are considered - * immutable. Also in general we're phasing out the various SET/GET ioctls. + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. */ struct drm_i915_gem_create_ext { /** @@ -3378,7 +3378,6 @@ struct drm_i915_gem_create_ext { * * The (page-aligned) allocated size for the object will be returned. * - * * DG2 64K min page size implications: * * On discrete platforms, starting from DG2, we have to contend with GTT @@ -3390,7 +3389,9 @@ struct drm_i915_gem_create_ext { * * Note that the returned size here will always reflect any required * rounding up done by the kernel, i.e 4K will now become 64K on devices - * such as DG2. + * such as DG2. The kernel will always select the largest minimum + * page-size for the set of possible placements as the value to use when + * rounding up the @size. * * Special DG2 GTT address alignment requirement: * @@ -3414,14 +3415,58 @@ struct drm_i915_gem_create_ext { * is deemed to be a good compromise. */ __u64 size; + /** * @handle: Returned handle for the object. * * Object handles are nonzero. */ __u32 handle; - /** @flags: MBZ */ + + /** + * @flags: Optional flags. + * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only + * strictly required on configurations where some subset of the device + * memory is directly visible/mappable through the CPU (which we also + * call small BAR), like on some DG2+ systems. Note that this is quite + * undesirable, but due to various factors like the client CPU, BIOS etc + * it's something we can expect to see in the wild. See + * &drm_i915_memory_region_info.probed_cpu_visible_size for how to + * determine if this system applies. + * + * Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to + * ensure the kernel can always spill the allocation to system memory, + * if the object can't be allocated in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Also note that since the kernel only supports flat-CCS on objects + * that can *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore + * don't support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with + * flat-CCS. + * + * Without this hint, the kernel will assume that non-mappable + * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the + * kernel can still migrate the object to the mappable part, as a last + * resort, if userspace ever CPU faults this object, but this might be + * expensive, and so ideally should be avoided. + * + * On older kernels which lack the relevant small-bar uAPI support (see + * also &drm_i915_memory_region_info.probed_cpu_visible_size), + * usage of the flag will result in an error, but it should NEVER be + * possible to end up with a small BAR configuration, assuming we can + * also successfully load the i915 kernel module. In such cases the + * entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as + * such there are zero restrictions on where the object can be placed. + */ +#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0) __u32 flags; + /** * @extensions: The chain of extensions to apply to this object. *
On 5/25/22 20:43, Matthew Auld wrote:
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Skip capturing any lmem pages that can't be copied using the CPU. This in now only best effort on platforms that have small BAR.
Testcase: igt@gem-exec-capture@capture-invisible Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0512c66fa4f3..77df899123c2 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1116,11 +1116,15 @@ i915_vma_coredump_create(const struct intel_gt *gt, dma_addr_t dma;
for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { + dma_addr_t offset = dma - mem->region.start; void __iomem *s;
- s = io_mapping_map_wc(&mem->iomap, - dma - mem->region.start, - PAGE_SIZE); + if (offset + PAGE_SIZE > mem->io_size) { + ret = -EINVAL; + break; + } + + s = io_mapping_map_wc(&mem->iomap, offset, PAGE_SIZE); ret = compress_page(compress, (void __force *)s, dst, true);
On 5/25/22 20:43, Matthew Auld wrote:
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
A non-recoverable context must be used if the user wants proper error capture on discrete platforms. In the future the kernel may want to blit the contents of some objects when later doing the capture stage.
Testcase: igt@gem_exec_capture@capture-recoverable-discrete Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b279588c0672..e27ccfa50dc3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1961,7 +1961,7 @@ eb_find_first_request_added(struct i915_execbuffer *eb) #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
/* Stage with GFP_KERNEL allocations before we enter the signaling critical path */ -static void eb_capture_stage(struct i915_execbuffer *eb) +static int eb_capture_stage(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; unsigned int i = count, j; @@ -1974,6 +1974,10 @@ static void eb_capture_stage(struct i915_execbuffer *eb) if (!(flags & EXEC_OBJECT_CAPTURE)) continue;
+ if (i915_gem_context_is_recoverable(eb->gem_context) && + IS_DGFX(eb->i915)) + return -EINVAL; + for_each_batch_create_order(eb, j) { struct i915_capture_list *capture;
@@ -1986,6 +1990,8 @@ static void eb_capture_stage(struct i915_execbuffer *eb) eb->capture_lists[j] = capture; } } + + return 0; }
/* Commit once we're in the critical path */ @@ -3420,7 +3426,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, }
ww_acquire_done(&eb.ww.ctx); - eb_capture_stage(&eb); + err = eb_capture_stage(&eb); + if (err) + goto err_vma;
out_fence = eb_requests_create(&eb, in_fence, out_fence_fd); if (IS_ERR(out_fence)) {
Hi Matthew,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on drm-tip/drm-tip] [also build test ERROR on v5.18 next-20220525] [cannot apply to drm-intel/for-linux-next drm/drm-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Auld/small-BAR-uapi-b... base: git://anongit.freedesktop.org/drm/drm-tip drm-tip config: i386-randconfig-a004 (https://download.01.org/0day-ci/archive/20220526/202205260728.itOPg4qx-lkp@i...) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d52a6e75b0c402c7f3b42a2b1b2873f151220947) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/fdc3574e30bb0fdfdc9569fa42d369... git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Matthew-Auld/small-BAR-uapi-bits/20220526-024641 git checkout fdc3574e30bb0fdfdc9569fa42d369b1fae41e9e # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot lkp@intel.com
All errors (new ones prefixed by >>):
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3429:6: error: assigning to 'int' from incompatible type 'void'
err = eb_capture_stage(&eb); ^ ~~~~~~~~~~~~~~~~~~~~~ 1 error generated.
vim +3429 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
3384 3385 if (args->flags & I915_EXEC_FENCE_OUT) { 3386 out_fence_fd = get_unused_fd_flags(O_CLOEXEC); 3387 if (out_fence_fd < 0) { 3388 err = out_fence_fd; 3389 goto err_in_fence; 3390 } 3391 } 3392 3393 err = eb_create(&eb); 3394 if (err) 3395 goto err_out_fence; 3396 3397 GEM_BUG_ON(!eb.lut_size); 3398 3399 err = eb_select_context(&eb); 3400 if (unlikely(err)) 3401 goto err_destroy; 3402 3403 err = eb_select_engine(&eb); 3404 if (unlikely(err)) 3405 goto err_context; 3406 3407 err = eb_lookup_vmas(&eb); 3408 if (err) { 3409 eb_release_vmas(&eb, true); 3410 goto err_engine; 3411 } 3412 3413 i915_gem_ww_ctx_init(&eb.ww, true); 3414 3415 err = eb_relocate_parse(&eb); 3416 if (err) { 3417 /* 3418 * If the user expects the execobject.offset and 3419 * reloc.presumed_offset to be an exact match, 3420 * as for using NO_RELOC, then we cannot update 3421 * the execobject.offset until we have completed 3422 * relocation. 3423 */ 3424 args->flags &= ~__EXEC_HAS_RELOC; 3425 goto err_vma; 3426 } 3427 3428 ww_acquire_done(&eb.ww.ctx);
3429 err = eb_capture_stage(&eb);
3430 if (err) 3431 goto err_vma; 3432 3433 out_fence = eb_requests_create(&eb, in_fence, out_fence_fd); 3434 if (IS_ERR(out_fence)) { 3435 err = PTR_ERR(out_fence); 3436 out_fence = NULL; 3437 if (eb.requests[0]) 3438 goto err_request; 3439 else 3440 goto err_vma; 3441 } 3442 3443 err = eb_submit(&eb); 3444 3445 err_request: 3446 eb_requests_get(&eb); 3447 err = eb_requests_add(&eb, err); 3448 3449 if (eb.fences) 3450 signal_fence_array(&eb, eb.composite_fence ? 3451 eb.composite_fence : 3452 &eb.requests[0]->fence); 3453 3454 if (out_fence) { 3455 if (err == 0) { 3456 fd_install(out_fence_fd, out_fence->file); 3457 args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */ 3458 args->rsvd2 |= (u64)out_fence_fd << 32; 3459 out_fence_fd = -1; 3460 } else { 3461 fput(out_fence->file); 3462 } 3463 } 3464 3465 if (unlikely(eb.gem_context->syncobj)) { 3466 drm_syncobj_replace_fence(eb.gem_context->syncobj, 3467 eb.composite_fence ? 3468 eb.composite_fence : 3469 &eb.requests[0]->fence); 3470 } 3471 3472 if (!out_fence && eb.composite_fence) 3473 dma_fence_put(eb.composite_fence); 3474 3475 eb_requests_put(&eb); 3476 3477 err_vma: 3478 eb_release_vmas(&eb, true); 3479 WARN_ON(err == -EDEADLK); 3480 i915_gem_ww_ctx_fini(&eb.ww); 3481 3482 if (eb.batch_pool) 3483 intel_gt_buffer_pool_put(eb.batch_pool); 3484 err_engine: 3485 eb_put_engine(&eb); 3486 err_context: 3487 i915_gem_context_put(eb.gem_context); 3488 err_destroy: 3489 eb_destroy(&eb); 3490 err_out_fence: 3491 if (out_fence_fd != -1) 3492 put_unused_fd(out_fence_fd); 3493 err_in_fence: 3494 dma_fence_put(in_fence); 3495 err_ext: 3496 put_fence_array(eb.fences, eb.num_fences); 3497 return err; 3498 } 3499
On 5/25/22 20:43, Matthew Auld wrote:
We should also require this for future integrated, for capture buffer memory allocation purposes.
Otherwise Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
With the uAPI in place we should now have enough in place to ensure a working system on small BAR configurations.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Lionel Landwerlin lionel.g.landwerlin@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com --- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index e9c12e0d6f59..6c6f8cbd7321 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -111,12 +111,6 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) flat_ccs_base = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR); flat_ccs_base = (flat_ccs_base >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
- /* FIXME: Remove this when we have small-bar enabled */ - if (pci_resource_len(pdev, 2) < lmem_size) { - drm_err(&i915->drm, "System requires small-BAR support, which is currently unsupported on this kernel\n"); - return ERR_PTR(-EINVAL); - } - if (GEM_WARN_ON(lmem_size < flat_ccs_base)) return ERR_PTR(-EIO);
@@ -169,6 +163,10 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) drm_info(&i915->drm, "Local memory available: %pa\n", &lmem_size);
+ if (io_size < lmem_size) + drm_info(&i915->drm, "Using a reduced BAR size of %lluMiB. Consider enabling the full BAR size if available in the BIOS.\n", + (u64)io_size >> 20); + return mem;
err_region_put:
On 5/25/22 20:43, Matthew Auld wrote:
Hmm. I wonder what BIOS uis typically call the mappable portion of VRAM. I'll se if I can check that on my DG1 system. Might be that an average user misinterprets "full BAR".
/Thomas
return mem;
err_region_put:
On Tue, 2022-06-21 at 11:05 +0200, Das, Nirmoy wrote:
A quick googling turns up "Resizable BAR". My Asus Bios on the DG1 machine says "ReSize BAR (Resizable BAR support to harness full GPU memory)".
So "Resizable BAR" should hopefully be understood by most people. Not sure though if this is the same as "Above 4G memory", although the latter must be a prerequisite I assume.
/Thomas
Just for CI.
Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 6c6f8cbd7321..119e53f5d9b1 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -137,6 +137,11 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (!io_size) return ERR_PTR(-EIO);
+ if (io_size == lmem_size) { + drm_info(&i915->drm, "NOTE!! Forcing small BAR for testing\n"); + io_size = SZ_256M; + } + min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K : I915_GTT_PAGE_SIZE_4K; mem = intel_memory_region_create(i915,
dri-devel@lists.freedesktop.org