This series includes a version of Maarten's series[1], which converts more of the driver locking over to dma-resv. On top of this we now implement things like LMEM eviction, which has a dependency on this new locking design.
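For anyone unfamiliar with the new locking design, the converted paths all follow the same dma-resv ww transaction pattern. Roughly, it looks like the sketch below (a minimal illustration based on the i915_gem_ww_ctx helpers carried in this series; error handling trimmed):

  struct i915_gem_ww_ctx ww;
  int err;

  i915_gem_ww_ctx_init(&ww, true);
retry:
  err = i915_gem_object_lock(obj, &ww);
  if (!err) {
          /* obj's dma-resv lock is held until the transaction ends */
          err = i915_gem_object_pin_pages(obj);
  }
  if (err == -EDEADLK) {
          /* contention: drop every lock taken in this context and retry */
          err = i915_gem_ww_ctx_backoff(&ww);
          if (!err)
                  goto retry;
  }
  i915_gem_ww_ctx_fini(&ww);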
In terms of new uAPI we have gem_create_ext, which offers extension support for gem_create. For now the only extension we add is giving userspace the ability to optionally provide a priority list of potential placements for the object. The other bit of new uAPI is the query interface for memory regions, which describes the supported memory regions for the device. What this reports can then be fed into gem_create_ext to specify where an object might reside, like device local memory. Note that the class/instance complexity in the uAPI is not very relevant for DG1, but is in preparation for the Xe HP multi-tile architecture with multiple memory regions.
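For illustration, creating an object backed by device local memory would look roughly like the following from the userspace side (a hypothetical sketch: the struct and ioctl names here follow the uAPI as it eventually landed upstream and may not match this revision of the series exactly):

  struct drm_i915_gem_memory_class_instance region = {
          .memory_class = I915_MEMORY_CLASS_DEVICE, /* from the regions query */
          .memory_instance = 0,
  };
  struct drm_i915_gem_create_ext_memory_regions regions = {
          .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
          .num_regions = 1,
          .regions = (uintptr_t)&region, /* priority-ordered placement list */
  };
  struct drm_i915_gem_create_ext create = {
          .size = 2 * 1024 * 1024,
          .extensions = (uintptr_t)&regions,
  };

  /* on success the new handle is returned in create.handle */
  if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
          err(1, "gem_create_ext"); /* no requested placement could be honoured */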
The series still includes relocation support, but that's purely for CI until we have completed all the IGT rework[2], and so it will not be merged. Likewise for pread/pwrite, which will also be dropped from DG1+.
[1] https://patchwork.freedesktop.org/series/82337/
[2] https://patchwork.freedesktop.org/series/82954/
Abdiel Janulgue (3):
  drm/i915/query: Expose memory regions through the query uAPI
  drm/i915: Provide a way to disable PCIe relaxed write ordering
  drm/i915: Reintroduce mem->reserved

Animesh Manna (2):
  drm/i915/lmem: reset the lmem buffer created by fbdev
  drm/i915/dsb: Enable lmem for dsb

Anshuman Gupta (1):
  drm/i915/oprom: Basic sanitization

Anusha Srivatsa (1):
  drm/i915/lmem: Bypass aperture when lmem is available

Bommu Krishnaiah (1):
  drm/i915/gem: Update shmem available memory

CQ Tang (13):
  drm/i915/dg1: Fix occasional migration error
  drm/i915: i915 returns -EBUSY on thread contention
  drm/i915: setup GPU device lmem region
  drm/i915: Fix object page offset within a region
  drm/i915: add i915_gem_object_is_devmem() function
  drm/i915: finish memory region support for stolen objects.
  drm/i915: Create stolen memory region from local memory
  drm/i915/dg1: intel_memory_region_evict() changes for eviction
  drm/i915/dg1: i915_gem_object_memcpy(..) infrastructure
  drm/i915/dg1: Eviction logic
  drm/i915/dg1: Add enable_eviction modparam
  drm/i915/dg1: Add lmem_size modparam
  drm/i915: need consider system BO snoop for dgfx

Chris Wilson (2):
  drm/i915/gt: Move move context layout registers and offsets to lrc_reg.h
  drm/i915/gt: Rename lrc.c to execlists_submission.c

Clint Taylor (3):
  drm/i915/dg1: Read OPROM via SPI controller
  drm/i915/dg1: Compute MEM Bandwidth using MCHBAR
  drm/i915/dg1: Double memory bandwidth available

Daniele Ceraolo Spurio (5):
  drm/i915: split gen8+ flush and bb_start emission functions to their own file
  drm/i915: split wa_bb code to its own file
  drm/i915: Make intel_init_workaround_bb more compatible with ww locking.
  drm/i915/guc: put all guc objects in lmem when available
  drm/i915: WA for zero memory channel

Imre Deak (1):
  drm/i915/dg1: Reserve first 1MB of local memory

Kui Wen (1):
  drm/i915/dg1: Do not check r->sgt.pfn for NULL

Lucas De Marchi (2):
  drm/i915: move eviction to prepare hook
  drm/i915/dg1: allow pci to auto probe

Maarten Lankhorst (60):
  drm/i915: Pin timeline map after first timeline pin, v5.
  drm/i915: Move cmd parser pinning to execbuffer
  drm/i915: Add missing -EDEADLK handling to execbuf pinning, v2.
  drm/i915: Ensure we hold the object mutex in pin correctly v2
  drm/i915: Add gem object locking to madvise.
  drm/i915: Move HAS_STRUCT_PAGE to obj->flags
  drm/i915: Rework struct phys attachment handling
  drm/i915: Convert i915_gem_object_attach_phys() to ww locking, v2.
  drm/i915: make lockdep slightly happier about execbuf.
  drm/i915: Disable userptr pread/pwrite support.
  drm/i915: No longer allow exporting userptr through dma-buf
  drm/i915: Reject more ioctls for userptr
  drm/i915: Reject UNSYNCHRONIZED for userptr, v2.
  drm/i915: Make compilation of userptr code depend on MMU_NOTIFIER.
  drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v5.
  drm/i915: Flatten obj->mm.lock
  drm/i915: Populate logical context during first pin.
  drm/i915: Make ring submission compatible with obj->mm.lock removal, v2.
  drm/i915: Handle ww locking in init_status_page
  drm/i915: Rework clflush to work correctly without obj->mm.lock.
  drm/i915: Pass ww ctx to intel_pin_to_display_plane
  drm/i915: Add object locking to vm_fault_cpu
  drm/i915: Move pinning to inside engine_wa_list_verify()
  drm/i915: Take reservation lock around i915_vma_pin.
  drm/i915: Make __engine_unpark() compatible with ww locking v2
  drm/i915: Take obj lock around set_domain ioctl
  drm/i915: Defer pin calls in buffer pool until first use by caller.
  drm/i915: Fix pread/pwrite to work with new locking rules.
  drm/i915: Fix workarounds selftest, part 1
  drm/i915: Add igt_spinner_pin() to allow for ww locking around spinner.
  drm/i915: Add ww locking around vm_access()
  drm/i915: Increase ww locking for perf.
  drm/i915: Lock ww in ucode objects correctly
  drm/i915: Add ww locking to dma-buf ops.
  drm/i915: Add missing ww lock in intel_dsb_prepare.
  drm/i915: Fix ww locking in shmem_create_from_object
  drm/i915: Use a single page table lock for each gtt.
  drm/i915/selftests: Prepare huge_pages testcases for obj->mm.lock removal.
  drm/i915/selftests: Prepare client blit for obj->mm.lock removal.
  drm/i915/selftests: Prepare coherency tests for obj->mm.lock removal.
  drm/i915/selftests: Prepare context tests for obj->mm.lock removal.
  drm/i915/selftests: Prepare dma-buf tests for obj->mm.lock removal.
  drm/i915/selftests: Prepare execbuf tests for obj->mm.lock removal.
  drm/i915/selftests: Prepare mman testcases for obj->mm.lock removal.
  drm/i915/selftests: Prepare object tests for obj->mm.lock removal.
  drm/i915/selftests: Prepare object blit tests for obj->mm.lock removal.
  drm/i915/selftests: Prepare igt_gem_utils for obj->mm.lock removal
  drm/i915/selftests: Prepare context selftest for obj->mm.lock removal
  drm/i915/selftests: Prepare hangcheck for obj->mm.lock removal
  drm/i915/selftests: Prepare execlists for obj->mm.lock removal
  drm/i915/selftests: Prepare mocs tests for obj->mm.lock removal
  drm/i915/selftests: Prepare ring submission for obj->mm.lock removal
  drm/i915/selftests: Prepare timeline tests for obj->mm.lock removal
  drm/i915/selftests: Prepare i915_request tests for obj->mm.lock removal
  drm/i915/selftests: Prepare memory region tests for obj->mm.lock removal
  drm/i915/selftests: Prepare cs engine tests for obj->mm.lock removal
  drm/i915/selftests: Prepare gtt tests for obj->mm.lock removal
  drm/i915: Finally remove obj->mm.lock.
  drm/i915: Keep userpointer bindings if seqcount is unchanged, v2.
  drm/i915: Implement eviction locking v2

Matt Roper (1):
  drm/i915/lmem: Fail driver init if LMEM training failed

Matthew Auld (19):
  drm/i915/selftest: also consider non-contiguous objects
  drm/i915/selftest: assert we get 2M GTT pages
  drm/i915/selftest: handle local-memory in perf_memcpy
  HAX drm/i915/lmem: support CPU relocations
  HAX drm/i915/lmem: support pread and pwrite
  drm/i915: introduce kernel blitter_context
  drm/i915/region: support basic eviction
  drm/i915: support basic object migration
  drm/i915/uapi: introduce drm_i915_gem_create_ext
  drm/i915: setup the LMEM region
  drm/i915/gtt: map the PD up front
  drm/i915/gtt/dgfx: place the PD in LMEM
  drm/i915/gtt: make flushing conditional
  drm/i915/gtt/dg1: add PTE_LM plumbing for PPGTT
  drm/i915/gtt/dg1: add PTE_LM plumbing for GGTT
  drm/i915: allocate context from LMEM
  drm/i915: move engine scratch to LMEM
  drm/i915/lmem: support optional CPU clearing for special internal use
  drm/i915: drop fake lmem

Michael J. Ruhl (2):
  drm/i915/dmabuf: Disallow LMEM objects from dma-buf
  drm/i915/dg1: Introduce dmabuf mmap to LMEM

Michel Thierry (2):
  drm/i915/lmem: allocate cmd ring in lmem
  drm/i915/lmem: allocate HWSP in lmem

Mohammed Khajapasha (2):
  drm/i915/fbdev: Use lmem physical addresses for fb_mmap() on discrete
  drm/i915: Return error value when bo not in LMEM for discrete

Prathap Kumar Valsan (2):
  drm/i915: Store gt in memory region
  drm/i915/pm: suspend and restore ppgtt mapping

Ramalingam C (6):
  drm/i915: define intel_partial_pages_for_sg_table
  drm/i915: create and destroy dummy vma
  drm/i915: blt copy between objs using pre-created vma windows
  drm/i915: window_blt_copy is used for swapin and swapout
  drm/i915: Lmem eviction statistics by category
  drm/i915/gem/selftest: test and measure window based blt cpy

Stuart Summers (1):
  drm/i915: Allow non-uniform subslices in gen12+

Sudeep Dutt (2):
  drm/i915/dg1: Track swap in/out stats via debugfs
  drm/i915/dg1: Measure swap in/out timing stats

Thomas Hellström (15):
  HAX drm/i915: Work around the selftest timeline lock splat workaround
  drm/i915: Introduce drm_i915_lock_isolated
  drm/i915: Lock hwsp objects isolated for pinning at create time
  drm/i915: Prepare for obj->mm.lock removal
  drm/i915: Avoid some false positives in assert_object_held()
  drm/i915: Reference contending lock objects
  drm/i915: Break out dma_resv ww locking utilities to separate files
  drm/i915: Introduce a for_i915_gem_ww(){}
  drm/i915: Untangle the vma pages_mutex
  drm/i915: Add blit functions that can be called from within a WW transaction
  drm/i915: Delay publishing objects on the eviction lists
  drm/i915: Perform execbuffer object locking as a separate step
  drm/i915: Support ww eviction
  drm/i915: Use a ww transaction in the fault handler
  drm/i915: Use a ww transaction in i915_gem_object_pin_map_unlocked()

Tvrtko Ursulin (4):
  drm/i915/dg1: Eliminate eviction mutex
  drm/i915/dg1: Keep engine awake across whole blit
  drm/i915/dg1: Add dedicated context for blitter eviction
  drm/i915: Improve accuracy of eviction stats

Venkata Ramana Nayana (8):
  drm/i915: suspend/resume eviction
  drm/i915: Reset blitter context when unpark engine
  drm/i915/gt: Allocate default ctx objects in SMEM
  drm/i915: suspend/resume enable blitter eviction
  drm/i915: suspend/resume handling of perma-pinned objects
  drm/i915: Support ww locks in suspend/resume
  drm/i915/dg1: Fix mapping type for default state object
  drm/i915/dg1: Fix GPU hang due to shmemfs page drop

Venkata Sandeep Dhanalakota (2):
  drm/i915: Update the helper to set correct mapping
  drm/i915/lmem: Limit block size to 4G

Zbigniew Kempczyński (1):
  drm/i915: Distinction of memory regions
 drivers/gpu/drm/i915/Kconfig.debug            |  11 +
 drivers/gpu/drm/i915/Makefile                 |   7 +-
 drivers/gpu/drm/i915/display/intel_bios.c     |  75 +-
 drivers/gpu/drm/i915/display/intel_bw.c       |  64 +-
 drivers/gpu/drm/i915/display/intel_display.c  |  80 +-
 drivers/gpu/drm/i915/display/intel_display.h  |   2 +-
 drivers/gpu/drm/i915/display/intel_dsb.c      |   9 +-
 drivers/gpu/drm/i915/display/intel_fbc.c      |  20 +-
 drivers/gpu/drm/i915/display/intel_fbdev.c    |  54 +-
 drivers/gpu/drm/i915/display/intel_opregion.c | 169 ++++
 drivers/gpu/drm/i915/display/intel_opregion.h |  31 +-
 drivers/gpu/drm/i915/display/intel_overlay.c  |  34 +-
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |  15 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_create.c    | 398 ++++++++
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    | 123 ++-
 drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  52 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 302 +++++-
 drivers/gpu/drm/i915/gem/i915_gem_fence.c     |  95 --
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c      | 254 ++++-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.h      |  22 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      | 187 ++--
 drivers/gpu/drm/i915/gem/i915_gem_mman.h      |  11 +
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 711 +++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    | 206 +++-
 .../gpu/drm/i915/gem/i915_gem_object_blt.c    | 101 +-
 .../gpu/drm/i915/gem/i915_gem_object_blt.h    |  10 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  46 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 140 ++-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c      | 110 +--
 drivers/gpu/drm/i915/gem/i915_gem_region.c    | 234 ++++-
 drivers/gpu/drm/i915/gem/i915_gem_region.h    |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  46 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  52 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.h  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    | 247 +++--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c    |   2 -
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 870 ++++++-----------
 .../drm/i915/gem/selftests/huge_gem_object.c  |   4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  |   8 +-
 .../i915/gem/selftests/i915_gem_coherency.c   |  18 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |  21 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |   2 +-
 .../i915/gem/selftests/i915_gem_execbuffer.c  |   2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  33 +-
 .../drm/i915/gem/selftests/i915_gem_object.c  |   2 +-
 .../i915/gem/selftests/i915_gem_object_blt.c  | 172 +++-
 .../drm/i915/gem/selftests/i915_gem_phys.c    |  10 +-
 .../drm/i915/gem/selftests/igt_gem_utils.c    |   2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  11 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      | 393 ++++++++
 drivers/gpu/drm/i915/gt/gen8_engine_cs.h      |  26 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  91 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |   2 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   2 +
 drivers/gpu/drm/i915/gt/intel_context_sseu.c  |   2 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  13 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   4 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 125 ++-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  14 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   2 +
 .../drm/i915/gt/intel_engine_workaround_bb.c  | 364 +++++++
 .../drm/i915/gt/intel_engine_workaround_bb.h  |  14 +
 ...tel_lrc.c => intel_execlists_submission.c} | 899 +++---------------
 .../drm/i915/gt/intel_execlists_submission.h  |  66 ++
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  93 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |  14 +-
 .../gpu/drm/i915/gt/intel_gt_buffer_pool.c    |  47 +-
 .../gpu/drm/i915/gt/intel_gt_buffer_pool.h    |   5 +
 .../drm/i915/gt/intel_gt_buffer_pool_types.h  |   1 +
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        |   1 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 105 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  23 +-
 drivers/gpu/drm/i915/gt/intel_lrc.h           | 128 ---
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |  39 +
 drivers/gpu/drm/i915/gt/intel_mocs.c          |   2 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  20 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_renderstate.h   |   1 +
 drivers/gpu/drm/i915/gt/intel_ring.c          |  24 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 184 ++--
 drivers/gpu/drm/i915/gt/intel_sseu.c          |   6 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      | 121 ++-
 drivers/gpu/drm/i915/gt/intel_timeline.h      |   1 +
 .../gpu/drm/i915/gt/intel_timeline_types.h    |   1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  24 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  24 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   5 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   4 +-
 .../{selftest_lrc.c => selftest_execlists.c}  |  37 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   8 +-
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   2 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |   5 +-
 .../drm/i915/gt/selftest_ring_submission.c    |   4 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   | 100 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 101 +-
 drivers/gpu/drm/i915/gt/shmem_utils.c         |  11 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c     |  11 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    |   4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_huc.c        |  18 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  37 +-
 drivers/gpu/drm/i915/gvt/dmabuf.c             |   2 +-
 drivers/gpu/drm/i915/gvt/mmio_context.h       |   2 +
 drivers/gpu/drm/i915/gvt/scheduler.c          |   1 +
 drivers/gpu/drm/i915/i915_active.c            |  20 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c        | 104 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  42 +-
 drivers/gpu/drm/i915/i915_drv.c               | 277 +++++-
 drivers/gpu/drm/i915/i915_drv.h               |  57 +-
 drivers/gpu/drm/i915/i915_gem.c               | 418 +++----
 drivers/gpu/drm/i915/i915_gem.h               |  12 -
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   2 +-
 drivers/gpu/drm/i915/i915_gem_ww.c            |  93 ++
 drivers/gpu/drm/i915/i915_gem_ww.h            |  53 ++
 drivers/gpu/drm/i915/i915_gpu_error.c         |   4 +-
 drivers/gpu/drm/i915/i915_memcpy.c            |   2 +-
 drivers/gpu/drm/i915/i915_memcpy.h            |   2 +-
 drivers/gpu/drm/i915/i915_mm.c                |   2 +-
 drivers/gpu/drm/i915/i915_params.c            |  11 +-
 drivers/gpu/drm/i915/i915_params.h            |   3 +-
 drivers/gpu/drm/i915/i915_pci.c               |   5 +-
 drivers/gpu/drm/i915/i915_perf.c              |  57 +-
 drivers/gpu/drm/i915/i915_query.c             |  62 ++
 drivers/gpu/drm/i915/i915_reg.h               |  17 +
 drivers/gpu/drm/i915/i915_selftest.h          |   2 +
 drivers/gpu/drm/i915/i915_vma.c               | 154 ++-
 drivers/gpu/drm/i915/i915_vma.h               |  28 +-
 drivers/gpu/drm/i915/intel_memory_region.c    | 229 ++++-
 drivers/gpu/drm/i915/intel_memory_region.h    |  53 +-
 drivers/gpu/drm/i915/intel_region_lmem.c      | 168 ++--
 drivers/gpu/drm/i915/intel_region_lmem.h      |   3 +-
 drivers/gpu/drm/i915/intel_uncore.c           |  12 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 100 +-
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 drivers/gpu/drm/i915/selftests/i915_perf.c    |   3 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  10 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  | 136 ++-
 drivers/gpu/drm/i915/selftests/igt_spinner.h  |   5 +
 .../drm/i915/selftests/intel_memory_region.c  | 442 ++++++++-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   4 +-
 include/uapi/drm/i915_drm.h                   | 118 +++
 148 files changed, 7813 insertions(+), 3312 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_create.c
 delete mode 100644 drivers/gpu/drm/i915/gem/i915_gem_fence.c
 create mode 100644 drivers/gpu/drm/i915/gt/gen8_engine_cs.c
 create mode 100644 drivers/gpu/drm/i915/gt/gen8_engine_cs.h
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.h
 rename drivers/gpu/drm/i915/gt/{intel_lrc.c => intel_execlists_submission.c} (87%)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_execlists_submission.h
 delete mode 100644 drivers/gpu/drm/i915/gt/intel_lrc.h
 rename drivers/gpu/drm/i915/gt/{selftest_lrc.c => selftest_execlists.c} (99%)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_ww.c
 create mode 100644 drivers/gpu/drm/i915/i915_gem_ww.h
In igt_ppgtt_sanity_check we should also exercise the non-contiguous option for LMEM, since this will give us slightly different sg layouts and alignment.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 1f35e71429b4..0bf93947d89d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1333,6 +1333,7 @@ static int igt_ppgtt_sanity_check(void *arg)
                unsigned int flags;
        } backends[] = {
                { igt_create_system, 0, },
+               { igt_create_local, 0, },
                { igt_create_local, I915_BO_ALLOC_CONTIGUOUS, },
        };
        struct {
Quoting Matthew Auld (2020-11-27 12:04:37)
> In igt_ppgtt_sanity_check we should also exercise the non-contiguous
> option for LMEM, since this will give us slightly different sg layouts
> and alignment.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
For the LMEM case, if we have suitable alignment and 2M physical pages, we should always get 2M GTT pages within the constraints of the hugepages selftest. If we don't, then something might be wrong in our construction of the backing pages.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 0bf93947d89d..77a13527a7e6 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -368,6 +368,27 @@ static int igt_check_page_sizes(struct i915_vma *vma)
                err = -EINVAL;
        }
 
+       /*
+        * The dma-api is like a box of chocolates when it comes to the
+        * alignment of dma addresses, however for LMEM we have total control
+        * and so can guarantee alignment, likewise when we allocate our blocks
+        * they should appear in descending order, and if we know that we align
+        * to the largest page size for the GTT address, we should be able to
+        * assert that if we see 2M physical pages then we should also get 2M
+        * GTT pages. If we don't then something might be wrong in our
+        * construction of the backing pages.
+        */
+       if (i915_gem_object_is_lmem(obj) &&
+           IS_ALIGNED(vma->node.start, SZ_2M) &&
+           vma->page_sizes.sg & SZ_2M &&
+           vma->page_sizes.gtt < SZ_2M) {
+               pr_err("gtt pages mismatch for LMEM, expected 2M GTT pages, sg(%u), gtt(%u)\n",
+                      vma->page_sizes.sg, vma->page_sizes.gtt);
+               err = -EINVAL;
+       }
+
        if (obj->mm.page_sizes.gtt) {
                pr_err("obj->page_sizes.gtt(%u) should never be set\n",
                       obj->mm.page_sizes.gtt);
We currently only support WC when mapping device local-memory; attempting to map the object with an unsupported type is reported as a generic -ENOMEM. Try to handle that case also, although it's starting to get pretty ugly in there.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/selftests/intel_memory_region.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 0aeba8e3af28..27389fb19951 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -681,6 +681,8 @@ create_region_for_mapping(struct intel_memory_region *mr, u64 size, u32 type,
                i915_gem_object_put(obj);
                if (PTR_ERR(addr) == -ENXIO)
                        return ERR_PTR(-ENODEV);
+               if (PTR_ERR(addr) == -ENOMEM) /* WB local-memory */
+                       return ERR_PTR(-ENODEV);
                return addr;
        }
From: Chris Wilson <chris@chris-wilson.co.uk>
Cleanup intel_lrc.h by moving some of the residual common register definitions into intel_lrc_reg.h, prior to rebranding and splitting off the submission backends.
v2: keep the SCHEDULE enum in the old file, since it is specific to the gvt usage of the execlists submission backend (John)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_irq.c    |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.h       | 39 -----------------------
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h   | 39 +++++++++++++++++++++++
 drivers/gpu/drm/i915/gvt/mmio_context.h   |  2 ++
 5 files changed, 43 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d4e988b2816a..02ea16b29c9f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -36,7 +36,7 @@
 #include "intel_gt.h"
 #include "intel_gt_requests.h"
 #include "intel_gt_pm.h"
-#include "intel_lrc.h"
+#include "intel_lrc_reg.h"
 #include "intel_reset.h"
 #include "intel_ring.h"
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index 257063a57101..9830342aa6f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -11,6 +11,7 @@
 #include "intel_breadcrumbs.h"
 #include "intel_gt.h"
 #include "intel_gt_irq.h"
+#include "intel_lrc_reg.h"
 #include "intel_uncore.h"
 #include "intel_rps.h"
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 802585a308e9..9116b46844a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -34,45 +34,6 @@ struct i915_request;
 struct intel_context;
 struct intel_engine_cs;
 
-/* Execlists regs */
-#define RING_ELSP(base)                         _MMIO((base) + 0x230)
-#define RING_EXECLIST_STATUS_LO(base)           _MMIO((base) + 0x234)
-#define RING_EXECLIST_STATUS_HI(base)           _MMIO((base) + 0x234 + 4)
-#define RING_CONTEXT_CONTROL(base)              _MMIO((base) + 0x244)
-#define   CTX_CTRL_INHIBIT_SYN_CTX_SWITCH       (1 << 3)
-#define   CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT   (1 << 0)
-#define   CTX_CTRL_RS_CTX_ENABLE                (1 << 1)
-#define   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT      (1 << 2)
-#define   GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE     (1 << 8)
-#define RING_CONTEXT_STATUS_PTR(base)           _MMIO((base) + 0x3a0)
-#define RING_EXECLIST_SQ_CONTENTS(base)         _MMIO((base) + 0x510)
-#define RING_EXECLIST_CONTROL(base)             _MMIO((base) + 0x550)
-
-#define   EL_CTRL_LOAD                          (1 << 0)
-
-/* The docs specify that the write pointer wraps around after 5h, "After status
- * is written out to the last available status QW at offset 5h, this pointer
- * wraps to 0."
- *
- * Therefore, one must infer than even though there are 3 bits available, 6 and
- * 7 appear to be * reserved.
- */
-#define GEN8_CSB_ENTRIES 6
-#define GEN8_CSB_PTR_MASK 0x7
-#define GEN8_CSB_READ_PTR_MASK (GEN8_CSB_PTR_MASK << 8)
-#define GEN8_CSB_WRITE_PTR_MASK (GEN8_CSB_PTR_MASK << 0)
-
-#define GEN11_CSB_ENTRIES 12
-#define GEN11_CSB_PTR_MASK 0xf
-#define GEN11_CSB_READ_PTR_MASK (GEN11_CSB_PTR_MASK << 8)
-#define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
-
-#define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
-#define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
-/* in Gen12 ID 0x7FF is reserved to indicate idle */
-#define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1)
-
 enum {
        INTEL_CONTEXT_SCHEDULE_IN = 0,
        INTEL_CONTEXT_SCHEDULE_OUT,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 1b51f7b9a5c3..b2e03ce35599 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -52,4 +52,43 @@
 #define GEN8_EXECLISTS_STATUS_BUF 0x370
 #define GEN11_EXECLISTS_STATUS_BUF2 0x3c0
 
+/* Execlists regs */
+#define RING_ELSP(base)                         _MMIO((base) + 0x230)
+#define RING_EXECLIST_STATUS_LO(base)           _MMIO((base) + 0x234)
+#define RING_EXECLIST_STATUS_HI(base)           _MMIO((base) + 0x234 + 4)
+#define RING_CONTEXT_CONTROL(base)              _MMIO((base) + 0x244)
+#define   CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT   REG_BIT(0)
+#define   CTX_CTRL_RS_CTX_ENABLE                REG_BIT(1)
+#define   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT      REG_BIT(2)
+#define   CTX_CTRL_INHIBIT_SYN_CTX_SWITCH       REG_BIT(3)
+#define   GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE     REG_BIT(8)
+#define RING_CONTEXT_STATUS_PTR(base)           _MMIO((base) + 0x3a0)
+#define RING_EXECLIST_SQ_CONTENTS(base)         _MMIO((base) + 0x510)
+#define RING_EXECLIST_CONTROL(base)             _MMIO((base) + 0x550)
+#define   EL_CTRL_LOAD                          REG_BIT(0)
+
+/*
+ * The docs specify that the write pointer wraps around after 5h, "After status
+ * is written out to the last available status QW at offset 5h, this pointer
+ * wraps to 0."
+ *
+ * Therefore, one must infer than even though there are 3 bits available, 6 and
+ * 7 appear to be * reserved.
+ */
+#define GEN8_CSB_ENTRIES 6
+#define GEN8_CSB_PTR_MASK 0x7
+#define GEN8_CSB_READ_PTR_MASK (GEN8_CSB_PTR_MASK << 8)
+#define GEN8_CSB_WRITE_PTR_MASK (GEN8_CSB_PTR_MASK << 0)
+
+#define GEN11_CSB_ENTRIES 12
+#define GEN11_CSB_PTR_MASK 0xf
+#define GEN11_CSB_READ_PTR_MASK (GEN11_CSB_PTR_MASK << 8)
+#define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
+
+#define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */
+#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
+#define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */
+/* in Gen12 ID 0x7FF is reserved to indicate idle */
+#define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1)
+
 #endif /* _INTEL_LRC_REG_H_ */
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.h b/drivers/gpu/drm/i915/gvt/mmio_context.h
index 3b25e7fe32f6..412b96ee6883 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.h
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.h
@@ -36,6 +36,8 @@
 #ifndef __GVT_RENDER_H__
 #define __GVT_RENDER_H__
 
+#include "gt/intel_lrc_reg.h"
+
 struct engine_mmio {
        enum intel_engine_id id;
        i915_reg_t reg;
Quoting Matthew Auld (2020-11-27 12:04:40)
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> Cleanup intel_lrc.h by moving some of the residual common register
> definitions into intel_lrc_reg.h, prior to rebranding and splitting off
> the submission backends.
>
> v2: keep the SCHEDULE enum in the old file, since it is specific to the
> gvt usage of the execlists submission backend (John)
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
>  drivers/gpu/drm/i915/gt/intel_gt_irq.c    |  1 +
>  drivers/gpu/drm/i915/gt/intel_lrc.h       | 39 -----------------------
>  drivers/gpu/drm/i915/gt/intel_lrc_reg.h   | 39 +++++++++++++++++++++++
>  drivers/gpu/drm/i915/gvt/mmio_context.h   |  2 ++
>  5 files changed, 43 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index d4e988b2816a..02ea16b29c9f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -36,7 +36,7 @@
>  #include "intel_gt.h"
>  #include "intel_gt_requests.h"
>  #include "intel_gt_pm.h"
> -#include "intel_lrc.h"
> +#include "intel_lrc_reg.h"
>  #include "intel_reset.h"
>  #include "intel_ring.h"
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index 257063a57101..9830342aa6f4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -11,6 +11,7 @@
>  #include "intel_breadcrumbs.h"
>  #include "intel_gt.h"
>  #include "intel_gt_irq.h"
> +#include "intel_lrc_reg.h"
>  #include "intel_uncore.h"
>  #include "intel_rps.h"
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
> index 802585a308e9..9116b46844a2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
> @@ -34,45 +34,6 @@ struct i915_request;
>  struct intel_context;
>  struct intel_engine_cs;
>
> -/* Execlists regs */
> -#define RING_ELSP(base)                         _MMIO((base) + 0x230)
> -#define RING_EXECLIST_STATUS_LO(base)           _MMIO((base) + 0x234)
> -#define RING_EXECLIST_STATUS_HI(base)           _MMIO((base) + 0x234 + 4)
> -#define RING_CONTEXT_CONTROL(base)              _MMIO((base) + 0x244)
> -#define   CTX_CTRL_INHIBIT_SYN_CTX_SWITCH       (1 << 3)
> -#define   CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT   (1 << 0)
> -#define   CTX_CTRL_RS_CTX_ENABLE                (1 << 1)
> -#define   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT      (1 << 2)
> -#define   GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE     (1 << 8)
> -#define RING_CONTEXT_STATUS_PTR(base)           _MMIO((base) + 0x3a0)
> -#define RING_EXECLIST_SQ_CONTENTS(base)         _MMIO((base) + 0x510)
> -#define RING_EXECLIST_CONTROL(base)             _MMIO((base) + 0x550)
> -
> -#define   EL_CTRL_LOAD                          (1 << 0)
> -
> -/* The docs specify that the write pointer wraps around after 5h, "After status
> - * is written out to the last available status QW at offset 5h, this pointer
> - * wraps to 0."
> - *
> - * Therefore, one must infer than even though there are 3 bits available, 6 and
> - * 7 appear to be * reserved.
> - */
> -#define GEN8_CSB_ENTRIES 6
> -#define GEN8_CSB_PTR_MASK 0x7
> -#define GEN8_CSB_READ_PTR_MASK (GEN8_CSB_PTR_MASK << 8)
> -#define GEN8_CSB_WRITE_PTR_MASK (GEN8_CSB_PTR_MASK << 0)
> -
> -#define GEN11_CSB_ENTRIES 12
> -#define GEN11_CSB_PTR_MASK 0xf
> -#define GEN11_CSB_READ_PTR_MASK (GEN11_CSB_PTR_MASK << 8)
> -#define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
> -
> -#define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
> -#define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
> -/* in Gen12 ID 0x7FF is reserved to indicate idle */
> -#define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1)
> -
>  enum {
>         INTEL_CONTEXT_SCHEDULE_IN = 0,
>         INTEL_CONTEXT_SCHEDULE_OUT,
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 1b51f7b9a5c3..b2e03ce35599 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -52,4 +52,43 @@
>  #define GEN8_EXECLISTS_STATUS_BUF 0x370
>  #define GEN11_EXECLISTS_STATUS_BUF2 0x3c0
>
> +/* Execlists regs */
> +#define RING_ELSP(base)                         _MMIO((base) + 0x230)
> +#define RING_EXECLIST_STATUS_LO(base)           _MMIO((base) + 0x234)
> +#define RING_EXECLIST_STATUS_HI(base)           _MMIO((base) + 0x234 + 4)
> +#define RING_CONTEXT_CONTROL(base)              _MMIO((base) + 0x244)
> +#define   CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT   REG_BIT(0)
> +#define   CTX_CTRL_RS_CTX_ENABLE                REG_BIT(1)
> +#define   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT      REG_BIT(2)
> +#define   CTX_CTRL_INHIBIT_SYN_CTX_SWITCH       REG_BIT(3)
> +#define   GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE     REG_BIT(8)
> +#define RING_CONTEXT_STATUS_PTR(base)           _MMIO((base) + 0x3a0)
> +#define RING_EXECLIST_SQ_CONTENTS(base)         _MMIO((base) + 0x510)
> +#define RING_EXECLIST_CONTROL(base)             _MMIO((base) + 0x550)
> +#define   EL_CTRL_LOAD                          REG_BIT(0)
>
> +/*
> + * The docs specify that the write pointer wraps around after 5h, "After status
> + * is written out to the last available status QW at offset 5h, this pointer
> + * wraps to 0."
> + *
> + * Therefore, one must infer than even though there are 3 bits available, 6 and
> + * 7 appear to be * reserved.
Stray '*'
That's a very weird statement. 6/7 simply do not exist, since the ringbuffer doesn't have that many elements.
-Chris
From: Chris Wilson <chris@chris-wilson.co.uk>
We want to separate the utility functions for controlling the logical ring context from the execlists submission mechanism (which is an overgrown scheduler).
This is similar to Daniele's work to split up the files, but being selfish I wanted to base it after my own changes to intel_lrc.c petered out.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Makefile                  |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c    |  1 +
 drivers/gpu/drm/i915/gt/intel_context_sseu.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c      |  1 +
 ...tel_lrc.c => intel_execlists_submission.c}  | 30 ++----------------
 ...tel_lrc.h => intel_execlists_submission.h}  | 31 +++----------------
 drivers/gpu/drm/i915/gt/intel_mocs.c           |  2 +-
 .../{selftest_lrc.c => selftest_execlists.c}   |  0
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c     |  1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c  |  1 +
 drivers/gpu/drm/i915/gvt/scheduler.c           |  1 +
 drivers/gpu/drm/i915/i915_drv.h                |  1 -
 drivers/gpu/drm/i915/i915_perf.c               |  1 +
 13 files changed, 16 insertions(+), 58 deletions(-)
 rename drivers/gpu/drm/i915/gt/{intel_lrc.c => intel_execlists_submission.c} (99%)
 rename drivers/gpu/drm/i915/gt/{intel_lrc.h => intel_execlists_submission.h} (57%)
 rename drivers/gpu/drm/i915/gt/{selftest_lrc.c => selftest_execlists.c} (100%)
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e5574e506a5c..aedbd8f52be8 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -91,6 +91,7 @@ gt-y += \
        gt/intel_engine_heartbeat.o \
        gt/intel_engine_pm.o \
        gt/intel_engine_user.o \
+       gt/intel_execlists_submission.o \
        gt/intel_ggtt.o \
        gt/intel_ggtt_fencing.o \
        gt/intel_gt.o \
@@ -102,7 +103,6 @@ gt-y += \
        gt/intel_gt_requests.o \
        gt/intel_gtt.o \
        gt/intel_llc.o \
-       gt/intel_lrc.o \
        gt/intel_mocs.o \
        gt/intel_ppgtt.o \
        gt/intel_rc6.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index a6299da64de4..ad136d009d9b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -72,6 +72,7 @@
 #include "gt/intel_context_param.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_engine_user.h"
+#include "gt/intel_execlists_submission.h" /* virtual_engine */
 #include "gt/intel_ring.h"
 
 #include "i915_gem_context.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_context_sseu.c b/drivers/gpu/drm/i915/gt/intel_context_sseu.c
index b9c8163978a3..5f94b44022dc 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_sseu.c
+++ b/drivers/gpu/drm/i915/gt/intel_context_sseu.c
@@ -8,7 +8,7 @@
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_gpu_commands.h"
-#include "intel_lrc.h"
+#include "intel_execlists_submission.h"
 #include "intel_lrc_reg.h"
 #include "intel_ring.h"
 #include "intel_sseu.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 02ea16b29c9f..97ceaf7116e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -33,6 +33,7 @@
 #include "intel_engine.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_user.h"
+#include "intel_execlists_submission.h"
 #include "intel_gt.h"
 #include "intel_gt_requests.h"
 #include "intel_gt_pm.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
similarity index 99%
rename from drivers/gpu/drm/i915/gt/intel_lrc.c
rename to drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 43703efb36d1..fc330233ea20 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1,31 +1,6 @@
+// SPDX-License-Identifier: MIT
 /*
  * Copyright © 2014 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- *
- * Authors:
- *    Ben Widawsky <ben@bwidawsk.net>
- *    Michel Thierry <michel.thierry@intel.com>
- *    Thomas Daniel <thomas.daniel@intel.com>
- *    Oscar Mateo <oscar.mateo@intel.com>
- *
  */
 
 /**
@@ -140,6 +115,7 @@
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
+#include "intel_execlists_submission.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -6127,5 +6103,5 @@ intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-#include "selftest_lrc.c"
+#include "selftest_execlists.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
similarity index 57%
rename from drivers/gpu/drm/i915/gt/intel_lrc.h
rename to drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index 9116b46844a2..2c9d7354b42f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -1,35 +1,15 @@
+/* SPDX-License-Identifier: MIT */
 /*
  * Copyright © 2014 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
 */
 
-#ifndef _INTEL_LRC_H_
-#define _INTEL_LRC_H_
+#ifndef __INTEL_EXECLISTS_SUBMISSION_H__
+#define __INTEL_EXECLISTS_SUBMISSION_H__
 
 #include <linux/types.h>
 
 struct drm_printer;
 
-struct drm_i915_private;
-struct i915_gem_context;
 struct i915_request;
 struct intel_context;
 struct intel_engine_cs;
@@ -40,9 +20,6 @@ enum {
        INTEL_CONTEXT_SCHEDULE_PREEMPTED,
 };
 
-/* Logical Rings */
-void intel_logical_ring_cleanup(struct intel_engine_cs *engine);
-
 int intel_execlists_submission_setup(struct intel_engine_cs *engine);
 
 /* Logical Ring Contexts */
@@ -86,4 +63,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
 
 bool intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
 
-#endif /* _INTEL_LRC_H_ */
+#endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index b8d0c32ae9dd..516206007398 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -24,8 +24,8 @@
 
 #include "intel_engine.h"
 #include "intel_gt.h"
+#include "intel_lrc_reg.h"
 #include "intel_mocs.h"
-#include "intel_lrc.h"
 #include "intel_ring.h"
 
 /* structures required */
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
similarity index 100%
rename from drivers/gpu/drm/i915/gt/selftest_lrc.c
rename to drivers/gpu/drm/i915/gt/selftest_execlists.c
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 5212ff844292..1a2e4f631763 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -3,6 +3,7 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include "gt/intel_execlists_submission.h" /* lrc layout */
 #include "gt/intel_gt.h"
 #include "intel_guc_ads.h"
 #include "intel_uc.h"
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fdfeb4b9b0f5..8528ab574dbe 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -8,6 +8,7 @@
 #include "gem/i915_gem_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
+#include "gt/intel_execlists_submission.h" /* XXX */
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_lrc_reg.h"
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index aed2ef6466a2..ed30fdde4114 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -37,6 +37,7 @@
 
 #include "gem/i915_gem_pm.h"
 #include "gt/intel_context.h"
+#include "gt/intel_execlists_submission.h"
 #include "gt/intel_ring.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 15be8debae54..0f7bf6831633 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -79,7 +79,6 @@
 #include "gem/i915_gem_shrinker.h"
 #include "gem/i915_gem_stolen.h"
 
-#include "gt/intel_lrc.h"
 #include "gt/intel_engine.h"
 #include "gt/intel_gt_types.h"
 #include "gt/intel_workarounds.h"
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3b12c8ff7182..0b300e0d9561 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -198,6 +198,7 @@
 #include "gem/i915_gem_context.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_user.h"
+#include "gt/intel_execlists_submission.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc_reg.h"
 #include "gt/intel_ring.h"
Quoting Matthew Auld (2020-11-27 12:04:41)
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> We want to separate the utility functions for controlling the logical
> ring context from the execlists submission mechanism (which is an
> overgrown scheduler).
>
> This is similar to Daniele's work to split up the files, but being
> selfish I wanted to base it after my own changes to intel_lrc.c petered
> out.
Note, in accordance with the recent intel_ring_submission.c vs intel_ring_scheduler.c naming, this would be intel_execlists_scheduler.c
-Chris
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
These functions are independent from the backend used and can therefore be split out of the execlists submission file, so they can be re-used by the upcoming GuC submission backend.
Based on a patch by Chris Wilson.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris P Wilson <chris.p.wilson@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/Makefile                  |   1 +
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c       | 393 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/gen8_engine_cs.h       |  26 ++
 .../drm/i915/gt/intel_execlists_submission.c   | 385 +----------------
 4 files changed, 421 insertions(+), 384 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/gen8_engine_cs.c
 create mode 100644 drivers/gpu/drm/i915/gt/gen8_engine_cs.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index aedbd8f52be8..f9ef5199b124 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -82,6 +82,7 @@ gt-y += \
        gt/gen6_engine_cs.o \
        gt/gen6_ppgtt.o \
        gt/gen7_renderclear.o \
+       gt/gen8_engine_cs.o \
        gt/gen8_ppgtt.o \
        gt/intel_breadcrumbs.o \
        gt/intel_context.o \
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
new file mode 100644
index 000000000000..a96fe108685e
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -0,0 +1,393 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2014 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_execlists_submission.h" /* XXX */
+#include "intel_gpu_commands.h"
+#include "intel_ring.h"
+
+int gen8_emit_flush_render(struct i915_request *request, u32 mode)
+{
+       bool vf_flush_wa = false, dc_flush_wa = false;
+       u32 *cs, flags = 0;
+       int len;
+
+       flags |= PIPE_CONTROL_CS_STALL;
+
+       if (mode & EMIT_FLUSH) {
+               flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
+               flags |= PIPE_CONTROL_FLUSH_ENABLE;
+       }
+
+       if (mode & EMIT_INVALIDATE) {
+               flags |= PIPE_CONTROL_TLB_INVALIDATE;
+               flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_QW_WRITE;
+               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+
+               /*
+                * On GEN9: before VF_CACHE_INVALIDATE we need to emit a NULL
+                * pipe control.
+                */
+               if (IS_GEN(request->engine->i915, 9))
+                       vf_flush_wa = true;
+
+               /* WaForGAMHang:kbl */
+               if (IS_KBL_GT_REVID(request->engine->i915, 0, KBL_REVID_B0))
+                       dc_flush_wa = true;
+       }
+
+       len = 6;
+
+       if (vf_flush_wa)
+               len += 6;
+
+       if (dc_flush_wa)
+               len += 12;
+
+       cs = intel_ring_begin(request, len);
+       if (IS_ERR(cs))
+               return PTR_ERR(cs);
+
+       if (vf_flush_wa)
+               cs = gen8_emit_pipe_control(cs, 0, 0);
+
+       if (dc_flush_wa)
+               cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_DC_FLUSH_ENABLE,
+                                           0);
+
+       cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
+
+       if (dc_flush_wa)
+               cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_CS_STALL, 0);
+
+       intel_ring_advance(request, cs);
+
+       return 0;
+}
+
+int gen8_emit_flush(struct i915_request *request, u32 mode)
+{
+       u32 cmd, *cs;
+
+       cs = intel_ring_begin(request, 4);
+       if (IS_ERR(cs))
+               return PTR_ERR(cs);
+
+       cmd = MI_FLUSH_DW + 1;
+
+       /* We always require a command barrier so that subsequent
+        * commands, such as breadcrumb interrupts, are strictly ordered
+        * wrt the contents of the write cache being flushed to memory
+        * (and thus being coherent from the CPU).
+        */
+       cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
+       if (mode & EMIT_INVALIDATE) {
+               cmd |= MI_INVALIDATE_TLB;
+               if (request->engine->class == VIDEO_DECODE_CLASS)
+                       cmd |= MI_INVALIDATE_BSD;
+       }
+
+       *cs++ = cmd;
+       *cs++ = LRC_PPHWSP_SCRATCH_ADDR;
+       *cs++ = 0; /* upper addr */
+       *cs++ = 0; /* value */
+       intel_ring_advance(request, cs);
+
+       return 0;
+}
+
+int gen11_emit_flush_render(struct i915_request *request, u32 mode)
+{
+       if (mode & EMIT_FLUSH) {
+               u32 *cs;
+               u32 flags = 0;
+
+               flags |= PIPE_CONTROL_CS_STALL;
+
+               flags |= PIPE_CONTROL_TILE_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
+               flags |= PIPE_CONTROL_FLUSH_ENABLE;
+               flags |= PIPE_CONTROL_QW_WRITE;
+               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+
+               cs = intel_ring_begin(request, 6);
+               if (IS_ERR(cs))
+                       return PTR_ERR(cs);
+
+               cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
+               intel_ring_advance(request, cs);
+       }
+
+       if (mode & EMIT_INVALIDATE) {
+               u32 *cs;
+               u32 flags = 0;
+
+               flags |= PIPE_CONTROL_CS_STALL;
+
+               flags |= PIPE_CONTROL_COMMAND_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_TLB_INVALIDATE;
+               flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_QW_WRITE;
+               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+
+               cs = intel_ring_begin(request, 6);
+               if (IS_ERR(cs))
+                       return PTR_ERR(cs);
+
+               cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
+               intel_ring_advance(request, cs);
+       }
+
+       return 0;
+}
+
+static u32 preparser_disable(bool state)
+{
+       return MI_ARB_CHECK | 1 << 8 | state;
+}
+
+static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
+{
+       static const i915_reg_t vd[] = {
+               GEN12_VD0_AUX_NV,
+               GEN12_VD1_AUX_NV,
+               GEN12_VD2_AUX_NV,
+               GEN12_VD3_AUX_NV,
+       };
+
+       static const i915_reg_t ve[] = {
+               GEN12_VE0_AUX_NV,
+               GEN12_VE1_AUX_NV,
+       };
+
+       if (engine->class == VIDEO_DECODE_CLASS)
+               return vd[engine->instance];
+
+       if (engine->class == VIDEO_ENHANCEMENT_CLASS)
+               return ve[engine->instance];
+
+       GEM_BUG_ON("unknown aux_inv_reg\n");
+
+       return INVALID_MMIO_REG;
+}
+
+static u32 *
+gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
+{
+       *cs++ = MI_LOAD_REGISTER_IMM(1);
+       *cs++ = i915_mmio_reg_offset(inv_reg);
+       *cs++ = AUX_INV;
+       *cs++ = MI_NOOP;
+
+       return cs;
+}
+
+int gen12_emit_flush_render(struct i915_request *request, u32 mode)
+{
+       if (mode & EMIT_FLUSH) {
+               u32 flags = 0;
+               u32 *cs;
+
+               flags |= PIPE_CONTROL_TILE_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_FLUSH_L3;
+               flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+               flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+               /* Wa_1409600907:tgl */
+               flags |= PIPE_CONTROL_DEPTH_STALL;
+               flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
+               flags |= PIPE_CONTROL_FLUSH_ENABLE;
+
+               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+               flags |= PIPE_CONTROL_QW_WRITE;
+
+               flags |= PIPE_CONTROL_CS_STALL;
+
+               cs = intel_ring_begin(request, 6);
+               if (IS_ERR(cs))
+                       return PTR_ERR(cs);
+
+               cs = gen12_emit_pipe_control(cs,
+                                            PIPE_CONTROL0_HDC_PIPELINE_FLUSH,
+                                            flags, LRC_PPHWSP_SCRATCH_ADDR);
+               intel_ring_advance(request, cs);
+       }
+
+       if (mode & EMIT_INVALIDATE) {
+               u32 flags = 0;
+               u32 *cs;
+
+               flags |= PIPE_CONTROL_COMMAND_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_TLB_INVALIDATE;
+               flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+               flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+
+               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+               flags |= PIPE_CONTROL_QW_WRITE;
+
+               flags |= PIPE_CONTROL_CS_STALL;
+
+               cs = intel_ring_begin(request, 8 + 4);
+               if (IS_ERR(cs))
+                       return PTR_ERR(cs);
+
+               /*
+                * Prevent the pre-parser from skipping past the TLB
+                * invalidate and loading a stale page for the batch
+                * buffer / request payload.
+                */
+               *cs++ = preparser_disable(true);
+
+               cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
+
+               /* hsdes: 1809175790 */
+               cs = gen12_emit_aux_table_inv(GEN12_GFX_CCS_AUX_NV, cs);
+
+               *cs++ = preparser_disable(false);
+               intel_ring_advance(request, cs);
+       }
+
+       return 0;
+}
+
+int gen12_emit_flush(struct i915_request *request, u32 mode)
+{
+       intel_engine_mask_t aux_inv = 0;
+       u32 cmd, *cs;
+
+       cmd = 4;
+       if (mode & EMIT_INVALIDATE)
+               cmd += 2;
+       if (mode & EMIT_INVALIDATE)
+               aux_inv = request->engine->mask & ~BIT(BCS0);
+       if (aux_inv)
+               cmd += 2 * hweight8(aux_inv) + 2;
+
+       cs = intel_ring_begin(request, cmd);
+       if (IS_ERR(cs))
+               return PTR_ERR(cs);
+
+       if (mode & EMIT_INVALIDATE)
+               *cs++ = preparser_disable(true);
+
+       cmd = MI_FLUSH_DW + 1;
+
+       /* We always require a command barrier so that subsequent
+        * commands, such as breadcrumb interrupts, are strictly ordered
+        * wrt the contents of the write cache being flushed to memory
+        * (and thus being coherent from the CPU).
+        */
+       cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
+       if (mode & EMIT_INVALIDATE) {
+               cmd |= MI_INVALIDATE_TLB;
+               if (request->engine->class == VIDEO_DECODE_CLASS)
+                       cmd |= MI_INVALIDATE_BSD;
+       }
+
+       *cs++ = cmd;
+       *cs++ = LRC_PPHWSP_SCRATCH_ADDR;
+       *cs++ = 0; /* upper addr */
+       *cs++ = 0; /* value */
+
+       if (aux_inv) { /* hsdes: 1809175790 */
+               struct intel_engine_cs *engine;
+               unsigned int tmp;
+
+               *cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv));
+               for_each_engine_masked(engine, request->engine->gt,
+                                      aux_inv, tmp) {
+                       *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
+                       *cs++ = AUX_INV;
+               }
+               *cs++ = MI_NOOP;
+       }
+
+       if (mode & EMIT_INVALIDATE)
+               *cs++ = preparser_disable(false);
+
+       intel_ring_advance(request, cs);
+
+       return 0;
+}
+
+int gen8_emit_bb_start_noarb(struct i915_request *rq,
+                            u64 offset, u32 len,
+                            const unsigned int flags)
+{
+       u32 *cs;
+
+       cs = intel_ring_begin(rq, 4);
+       if (IS_ERR(cs))
+               return PTR_ERR(cs);
+
+       /*
+        * WaDisableCtxRestoreArbitration:bdw,chv
+        *
+        * We don't need to perform MI_ARB_ENABLE as often as we do (in
+        * particular all the gen that do not need the w/a at all!), if we
+        * took care to make sure that on every switch into this context
+        * (both ordinary and for preemption) that arbitrartion was enabled
+        * we would be fine. However, for gen8 there is another w/a that
+        * requires us to not preempt inside GPGPU execution, so we keep
+        * arbitration disabled for gen8 batches. Arbitration will be
+        * re-enabled before we close the request
+        * (engine->emit_fini_breadcrumb).
+        */
+       *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
+       /* FIXME(BDW+): Address space and security selectors. */
+       *cs++ = MI_BATCH_BUFFER_START_GEN8 |
+               (flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
+       *cs++ = lower_32_bits(offset);
+       *cs++ = upper_32_bits(offset);
+
+       intel_ring_advance(rq, cs);
+
+       return 0;
+}
+
+int gen8_emit_bb_start(struct i915_request *rq,
+                      u64 offset, u32 len,
+                      const unsigned int flags)
+{
+       u32 *cs;
+
+       cs = intel_ring_begin(rq, 6);
+       if (IS_ERR(cs))
+               return PTR_ERR(cs);
+
+       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
+       *cs++ = MI_BATCH_BUFFER_START_GEN8 |
+               (flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
+       *cs++ = lower_32_bits(offset);
+       *cs++ = upper_32_bits(offset);
+
+       *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+       *cs++ = MI_NOOP;
+
+       intel_ring_advance(rq, cs);
+
+       return 0;
+}
+
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.h b/drivers/gpu/drm/i915/gt/gen8_engine_cs.h
new file mode 100644
index 000000000000..c0c62284b650
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014 Intel Corporation
+ */
+
+#ifndef __GEN8_ENGINE_CS_H__
+#define __GEN8_ENGINE_CS_H__
+
+#include <linux/types.h>
+
+struct i915_request;
+
+int gen8_emit_flush_render(struct i915_request *request, u32 mode);
+int gen8_emit_flush(struct i915_request *request, u32 mode);
+int gen11_emit_flush_render(struct i915_request *request, u32 mode);
+int gen12_emit_flush_render(struct i915_request *request, u32 mode);
+int gen12_emit_flush(struct i915_request *request, u32 mode);
+
+int gen8_emit_bb_start_noarb(struct i915_request *rq,
+                            u64 offset, u32 len,
+                            const unsigned int flags);
+int gen8_emit_bb_start(struct i915_request *rq,
+                      u64 offset, u32 len,
+                      const unsigned int flags);
+
+#endif /* __GEN8_ENGINE_CS_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index fc330233ea20..9069a456d2f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -112,6 +112,7 @@
 #include "i915_perf.h"
 #include "i915_trace.h"
 #include "i915_vgpu.h"
+#include "gen8_engine_cs.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
@@ -4465,67 +4466,6 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
                   atomic_read(&execlists->tasklet.count));
 }
 
-static int gen8_emit_bb_start_noarb(struct i915_request *rq,
-                                   u64 offset, u32 len,
-                                   const unsigned int flags)
-{
-       u32 *cs;
-
-       cs = intel_ring_begin(rq, 4);
-       if (IS_ERR(cs))
-               return PTR_ERR(cs);
-
-       /*
-        * WaDisableCtxRestoreArbitration:bdw,chv
-        *
-        * We don't need to perform MI_ARB_ENABLE as often as we do (in
-        * particular all the gen that do not need the w/a at all!), if we
-        * took care to make sure that on every switch into this context
-        * (both ordinary and for preemption) that arbitrartion was enabled
-        * we would be fine. However, for gen8 there is another w/a that
-        * requires us to not preempt inside GPGPU execution, so we keep
-        * arbitration disabled for gen8 batches. Arbitration will be
-        * re-enabled before we close the request
-        * (engine->emit_fini_breadcrumb).
-        */
-       *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
-
-       /* FIXME(BDW+): Address space and security selectors. */
-       *cs++ = MI_BATCH_BUFFER_START_GEN8 |
-               (flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
-       *cs++ = lower_32_bits(offset);
-       *cs++ = upper_32_bits(offset);
-
-       intel_ring_advance(rq, cs);
-
-       return 0;
-}
-
-static int gen8_emit_bb_start(struct i915_request *rq,
-                             u64 offset, u32 len,
-                             const unsigned int flags)
-{
-       u32 *cs;
-
-       cs = intel_ring_begin(rq, 6);
-       if (IS_ERR(cs))
-               return PTR_ERR(cs);
-
-       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-
-       *cs++ = MI_BATCH_BUFFER_START_GEN8 |
-               (flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
-       *cs++ = lower_32_bits(offset);
-       *cs++ = upper_32_bits(offset);
-
-       *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
-       *cs++ = MI_NOOP;
-
-       intel_ring_advance(rq, cs);
-
-       return 0;
-}
-
 static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
 {
        ENGINE_WRITE(engine, RING_IMR,
@@ -4538,329 +4478,6 @@ static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine)
        ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
 }
 
-static int gen8_emit_flush(struct i915_request *request, u32 mode)
-{
-       u32 cmd, *cs;
-
-       cs = intel_ring_begin(request, 4);
-       if (IS_ERR(cs))
-               return PTR_ERR(cs);
-
-       cmd = MI_FLUSH_DW + 1;
-
-       /* We always require a command barrier so that subsequent
-        * commands, such as breadcrumb interrupts, are strictly ordered
-        * wrt the contents of the write cache being flushed to memory
-        * (and thus being coherent from the CPU).
-        */
-       cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
-
-       if (mode & EMIT_INVALIDATE) {
-               cmd |= MI_INVALIDATE_TLB;
-               if (request->engine->class == VIDEO_DECODE_CLASS)
-                       cmd |= MI_INVALIDATE_BSD;
-       }
-
-       *cs++ = cmd;
-       *cs++ = LRC_PPHWSP_SCRATCH_ADDR;
-       *cs++ = 0; /* upper addr */
-       *cs++ = 0; /* value */
-       intel_ring_advance(request, cs);
-
-       return 0;
-}
-
-static int gen8_emit_flush_render(struct i915_request *request,
-                                 u32 mode)
-{
-       bool vf_flush_wa = false, dc_flush_wa = false;
-       u32 *cs, flags = 0;
-       int len;
-
-       flags |= PIPE_CONTROL_CS_STALL;
-
-       if (mode & EMIT_FLUSH) {
-               flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
-               flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
-               flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
-               flags |= PIPE_CONTROL_FLUSH_ENABLE;
-       }
-
-       if (mode & EMIT_INVALIDATE) {
-               flags |= PIPE_CONTROL_TLB_INVALIDATE;
-               flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_QW_WRITE;
-               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
-
-               /*
-                * On GEN9: before VF_CACHE_INVALIDATE we need to emit a NULL
-                * pipe control.
-                */
-               if (IS_GEN(request->engine->i915, 9))
-                       vf_flush_wa = true;
-
-               /* WaForGAMHang:kbl */
-               if (IS_KBL_GT_REVID(request->engine->i915, 0, KBL_REVID_B0))
-                       dc_flush_wa = true;
-       }
-
-       len = 6;
-
-       if (vf_flush_wa)
-               len += 6;
-
-       if (dc_flush_wa)
-               len += 12;
-
-       cs = intel_ring_begin(request, len);
-       if (IS_ERR(cs))
-               return PTR_ERR(cs);
-
-       if (vf_flush_wa)
-               cs = gen8_emit_pipe_control(cs, 0, 0);
-
-       if (dc_flush_wa)
-               cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_DC_FLUSH_ENABLE,
-                                           0);
-
-       cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
-
-       if (dc_flush_wa)
-               cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_CS_STALL, 0);
-
-       intel_ring_advance(request, cs);
-
-       return 0;
-}
-
-static int gen11_emit_flush_render(struct i915_request *request,
-                                  u32 mode)
-{
-       if (mode & EMIT_FLUSH) {
-               u32 *cs;
-               u32 flags = 0;
-
-               flags |= PIPE_CONTROL_CS_STALL;
-
-               flags |= PIPE_CONTROL_TILE_CACHE_FLUSH;
-               flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
-               flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
-               flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
-               flags |= PIPE_CONTROL_FLUSH_ENABLE;
-               flags |= PIPE_CONTROL_QW_WRITE;
-               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
-
-               cs = intel_ring_begin(request, 6);
-               if (IS_ERR(cs))
-                       return PTR_ERR(cs);
-
-               cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
-               intel_ring_advance(request, cs);
-       }
-
-       if (mode & EMIT_INVALIDATE) {
-               u32 *cs;
-               u32 flags = 0;
-
-               flags |= PIPE_CONTROL_CS_STALL;
-
-               flags |= PIPE_CONTROL_COMMAND_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_TLB_INVALIDATE;
-               flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
-               flags |= PIPE_CONTROL_QW_WRITE;
-               flags |= PIPE_CONTROL_STORE_DATA_INDEX;
- - cs = intel_ring_begin(request, 6); - if (IS_ERR(cs)) - return PTR_ERR(cs); - - cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); - intel_ring_advance(request, cs); - } - - return 0; -} - -static u32 preparser_disable(bool state) -{ - return MI_ARB_CHECK | 1 << 8 | state; -} - -static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) -{ - static const i915_reg_t vd[] = { - GEN12_VD0_AUX_NV, - GEN12_VD1_AUX_NV, - GEN12_VD2_AUX_NV, - GEN12_VD3_AUX_NV, - }; - - static const i915_reg_t ve[] = { - GEN12_VE0_AUX_NV, - GEN12_VE1_AUX_NV, - }; - - if (engine->class == VIDEO_DECODE_CLASS) - return vd[engine->instance]; - - if (engine->class == VIDEO_ENHANCEMENT_CLASS) - return ve[engine->instance]; - - GEM_BUG_ON("unknown aux_inv_reg\n"); - - return INVALID_MMIO_REG; -} - -static u32 * -gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs) -{ - *cs++ = MI_LOAD_REGISTER_IMM(1); - *cs++ = i915_mmio_reg_offset(inv_reg); - *cs++ = AUX_INV; - *cs++ = MI_NOOP; - - return cs; -} - -static int gen12_emit_flush_render(struct i915_request *request, - u32 mode) -{ - if (mode & EMIT_FLUSH) { - u32 flags = 0; - u32 *cs; - - flags |= PIPE_CONTROL_TILE_CACHE_FLUSH; - flags |= PIPE_CONTROL_FLUSH_L3; - flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; - flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; - /* Wa_1409600907:tgl */ - flags |= PIPE_CONTROL_DEPTH_STALL; - flags |= PIPE_CONTROL_DC_FLUSH_ENABLE; - flags |= PIPE_CONTROL_FLUSH_ENABLE; - - flags |= PIPE_CONTROL_STORE_DATA_INDEX; - flags |= PIPE_CONTROL_QW_WRITE; - - flags |= PIPE_CONTROL_CS_STALL; - - cs = intel_ring_begin(request, 6); - if (IS_ERR(cs)) - return PTR_ERR(cs); - - cs = gen12_emit_pipe_control(cs, - PIPE_CONTROL0_HDC_PIPELINE_FLUSH, - flags, LRC_PPHWSP_SCRATCH_ADDR); - intel_ring_advance(request, cs); - } - - if (mode & EMIT_INVALIDATE) { - u32 flags = 0; - u32 *cs; - - flags |= PIPE_CONTROL_COMMAND_CACHE_INVALIDATE; - flags |= PIPE_CONTROL_TLB_INVALIDATE; - flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; - flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; - flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE; - flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE; - flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE; - - flags |= PIPE_CONTROL_STORE_DATA_INDEX; - flags |= PIPE_CONTROL_QW_WRITE; - - flags |= PIPE_CONTROL_CS_STALL; - - cs = intel_ring_begin(request, 8 + 4); - if (IS_ERR(cs)) - return PTR_ERR(cs); - - /* - * Prevent the pre-parser from skipping past the TLB - * invalidate and loading a stale page for the batch - * buffer / request payload. - */ - *cs++ = preparser_disable(true); - - cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); - - /* hsdes: 1809175790 */ - cs = gen12_emit_aux_table_inv(GEN12_GFX_CCS_AUX_NV, cs); - - *cs++ = preparser_disable(false); - intel_ring_advance(request, cs); - } - - return 0; -} - -static int gen12_emit_flush(struct i915_request *request, u32 mode) -{ - intel_engine_mask_t aux_inv = 0; - u32 cmd, *cs; - - cmd = 4; - if (mode & EMIT_INVALIDATE) - cmd += 2; - if (mode & EMIT_INVALIDATE) - aux_inv = request->engine->mask & ~BIT(BCS0); - if (aux_inv) - cmd += 2 * hweight8(aux_inv) + 2; - - cs = intel_ring_begin(request, cmd); - if (IS_ERR(cs)) - return PTR_ERR(cs); - - if (mode & EMIT_INVALIDATE) - *cs++ = preparser_disable(true); - - cmd = MI_FLUSH_DW + 1; - - /* We always require a command barrier so that subsequent - * commands, such as breadcrumb interrupts, are strictly ordered - * wrt the contents of the write cache being flushed to memory - * (and thus being coherent from the CPU). 
- */ - cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; - - if (mode & EMIT_INVALIDATE) { - cmd |= MI_INVALIDATE_TLB; - if (request->engine->class == VIDEO_DECODE_CLASS) - cmd |= MI_INVALIDATE_BSD; - } - - *cs++ = cmd; - *cs++ = LRC_PPHWSP_SCRATCH_ADDR; - *cs++ = 0; /* upper addr */ - *cs++ = 0; /* value */ - - if (aux_inv) { /* hsdes: 1809175790 */ - struct intel_engine_cs *engine; - unsigned int tmp; - - *cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv)); - for_each_engine_masked(engine, request->engine->gt, - aux_inv, tmp) { - *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine)); - *cs++ = AUX_INV; - } - *cs++ = MI_NOOP; - } - - if (mode & EMIT_INVALIDATE) - *cs++ = preparser_disable(false); - - intel_ring_advance(request, cs); - - return 0; -}
static void assert_request_valid(struct i915_request *rq) {
Quoting Matthew Auld (2020-11-27 12:04:42)
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
These functions are independent of the submission backend used and can therefore be split out of the execlists submission file, so they can be re-used by the upcoming GuC submission backend.
Based on a patch by Chris Wilson.
Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 393 ++++++++++++++++++ drivers/gpu/drm/i915/gt/gen8_engine_cs.h | 26 ++ .../drm/i915/gt/intel_execlists_submission.c | 385 +---------------- 4 files changed, 421 insertions(+), 384 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/gen8_engine_cs.c create mode 100644 drivers/gpu/drm/i915/gt/gen8_engine_cs.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index aedbd8f52be8..f9ef5199b124 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -82,6 +82,7 @@ gt-y += \ gt/gen6_engine_cs.o \ gt/gen6_ppgtt.o \ gt/gen7_renderclear.o \
+ gt/gen8_engine_cs.o \ gt/gen8_ppgtt.o \ gt/intel_breadcrumbs.o \ gt/intel_context.o \
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c new file mode 100644 index 000000000000..a96fe108685e --- /dev/null +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -0,0 +1,393 @@ +// SPDX-License-Identifier: MIT +/*
+ * Copyright © 2014 Intel Corporation
+ */
+#include "i915_drv.h" +#include "intel_execlists_submission.h" /* XXX */ +#include "intel_gpu_commands.h" +#include "intel_ring.h"
+int gen8_emit_flush_render(struct i915_request *request, u32 mode)
Refresh the names to match the recent naming scheme (rcs when specific, xcs when not). -Chris
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Continuing the split of backend-independent code out of the execlists submission-specific file.
Based on a patch by Chris Wilson.
Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com --- drivers/gpu/drm/i915/Makefile | 1 + .../drm/i915/gt/intel_engine_workaround_bb.c | 335 ++++++++++++++++++ .../drm/i915/gt/intel_engine_workaround_bb.h | 14 + .../drm/i915/gt/intel_execlists_submission.c | 327 +---------------- 4 files changed, 352 insertions(+), 325 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index f9ef5199b124..2445cc990e15 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -92,6 +92,7 @@ gt-y += \ gt/intel_engine_heartbeat.o \ gt/intel_engine_pm.o \ gt/intel_engine_user.o \ + gt/intel_engine_workaround_bb.o \ gt/intel_execlists_submission.o \ gt/intel_ggtt.o \ gt/intel_ggtt_fencing.o \ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c b/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c new file mode 100644 index 000000000000..b03bdfc92bb2 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c @@ -0,0 +1,335 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2014 Intel Corporation + */ + +#include "i915_drv.h" +#include "intel_engine_types.h" +#include "intel_engine_workaround_bb.h" +#include "intel_execlists_submission.h" /* XXX */ +#include "intel_gpu_commands.h" +#include "intel_gt.h" + +/* + * In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after + * PIPE_CONTROL instruction. This is required for the flush to happen correctly + * but there is a slight complication as this is applied in WA batch where the + * values are only initialized once so we cannot take register value at the + * beginning and reuse it further; hence we save its value to memory, upload a + * constant value with bit21 set and then we restore it back with the saved value. + * To simplify the WA, a constant value is formed by using the default value + * of this register. This shouldn't be a problem because we are only modifying + * it for a short period and this batch in non-premptible. We can ofcourse + * use additional instructions that read the actual value of the register + * at that time and set our bit of interest but it makes the WA complicated. + * + * This WA is also required for Gen9 so extracting as a function avoids + * code duplication. + */ +static u32 * +gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine, u32 *batch) +{ + /* NB no one else is allowed to scribble over scratch + 256! */ + *batch++ = MI_STORE_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT; + *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); + *batch++ = intel_gt_scratch_offset(engine->gt, + INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA); + *batch++ = 0; + + *batch++ = MI_LOAD_REGISTER_IMM(1); + *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); + *batch++ = 0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES; + + batch = gen8_emit_pipe_control(batch, + PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_DC_FLUSH_ENABLE, + 0); + + *batch++ = MI_LOAD_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT; + *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); + *batch++ = intel_gt_scratch_offset(engine->gt, + INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA); + *batch++ = 0; + + return batch; +} + +/* + * Typically we only have one indirect_ctx and per_ctx batch buffer which are + * initialized at the beginning and shared across all contexts but this field + * helps us to have multiple batches at different offsets and select them based + * on a criteria. At the moment this batch always start at the beginning of the page + * and at this point we don't have multiple wa_ctx batch buffers. + * + * The number of WA applied are not known at the beginning; we use this field + * to return the no of DWORDS written. + * + * It is to be noted that this batch does not contain MI_BATCH_BUFFER_END + * so it adds NOOPs as padding to make it cacheline aligned. 
+ * MI_BATCH_BUFFER_END will be added to perctx batch and both of them together + * makes a complete batch buffer. + */ +static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) +{ + /* WaDisableCtxRestoreArbitration:bdw,chv */ + *batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; + + /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */ + if (IS_BROADWELL(engine->i915)) + batch = gen8_emit_flush_coherentl3_wa(engine, batch); + + /* WaClearSlmSpaceAtContextSwitch:bdw,chv */ + /* Actual scratch location is at 128 bytes offset */ + batch = gen8_emit_pipe_control(batch, + PIPE_CONTROL_FLUSH_L3 | + PIPE_CONTROL_STORE_DATA_INDEX | + PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_QW_WRITE, + LRC_PPHWSP_SCRATCH_ADDR); + + *batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; + + /* Pad to end of cacheline */ + while ((unsigned long)batch % CACHELINE_BYTES) + *batch++ = MI_NOOP; + + /* + * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because + * execution depends on the length specified in terms of cache lines + * in the register CTX_RCS_INDIRECT_CTX + */ + + return batch; +} + +struct lri { + i915_reg_t reg; + u32 value; +}; + +static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count) +{ + GEM_BUG_ON(!count || count > 63); + + *batch++ = MI_LOAD_REGISTER_IMM(count); + do { + *batch++ = i915_mmio_reg_offset(lri->reg); + *batch++ = lri->value; + } while (lri++, --count); + *batch++ = MI_NOOP; + + return batch; +} + +static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) +{ + static const struct lri lri[] = { + /* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */ + { + COMMON_SLICE_CHICKEN2, + __MASKED_FIELD(GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE, + 0), + }, + + /* BSpec: 11391 */ + { + FF_SLICE_CHICKEN, + __MASKED_FIELD(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX, + FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX), + }, + + /* BSpec: 11299 */ + { + _3D_CHICKEN3, + __MASKED_FIELD(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX, + _3D_CHICKEN_SF_PROVOKING_VERTEX_FIX), + } + }; + + *batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; + + /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */ + batch = gen8_emit_flush_coherentl3_wa(engine, batch); + + /* WaClearSlmSpaceAtContextSwitch:skl,bxt,kbl,glk,cfl */ + batch = gen8_emit_pipe_control(batch, + PIPE_CONTROL_FLUSH_L3 | + PIPE_CONTROL_STORE_DATA_INDEX | + PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_QW_WRITE, + LRC_PPHWSP_SCRATCH_ADDR); + + batch = emit_lri(batch, lri, ARRAY_SIZE(lri)); + + /* WaMediaPoolStateCmdInWABB:bxt,glk */ + if (HAS_POOLED_EU(engine->i915)) { + /* + * EU pool configuration is setup along with golden context + * during context initialization. This value depends on + * device type (2x6 or 3x6) and needs to be updated based + * on which subslice is disabled especially for 2x6 + * devices, however it is safe to load default + * configuration of 3x6 device instead of masking off + * corresponding bits because HW ignores bits of a disabled + * subslice and drops down to appropriate config. Please + * see render_state_setup() in i915_gem_render_state.c for + * possible configurations, to avoid duplication they are + * not shown here again. 
+ */ + *batch++ = GEN9_MEDIA_POOL_STATE; + *batch++ = GEN9_MEDIA_POOL_ENABLE; + *batch++ = 0x00777000; + *batch++ = 0; + *batch++ = 0; + *batch++ = 0; + } + + *batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; + + /* Pad to end of cacheline */ + while ((unsigned long)batch % CACHELINE_BYTES) + *batch++ = MI_NOOP; + + return batch; +} + +static u32 * +gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) +{ + int i; + + /* + * WaPipeControlBefore3DStateSamplePattern: cnl + * + * Ensure the engine is idle prior to programming a + * 3DSTATE_SAMPLE_PATTERN during a context restore. + */ + batch = gen8_emit_pipe_control(batch, + PIPE_CONTROL_CS_STALL, + 0); + /* + * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for + * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in + * total. However, a PIPE_CONTROL is 6 dwords long, not 4, which is + * confusing. Since gen8_emit_pipe_control() already advances the + * batch by 6 dwords, we advance the other 10 here, completing a + * cacheline. It's not clear if the workaround requires this padding + * before other commands, or if it's just the regular padding we would + * already have for the workaround bb, so leave it here for now. + */ + for (i = 0; i < 10; i++) + *batch++ = MI_NOOP; + + /* Pad to end of cacheline */ + while ((unsigned long)batch % CACHELINE_BYTES) + *batch++ = MI_NOOP; + + return batch; +} + +#define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE) + +static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err; + + obj = i915_gem_object_create_shmem(engine->i915, CTX_WA_BB_OBJ_SIZE); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err; + } + + err = i915_ggtt_pin(vma, NULL, 0, PIN_HIGH); + if (err) + goto err; + + engine->wa_ctx.vma = vma; + return 0; + +err: + i915_gem_object_put(obj); + return err; +} + +typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch); + +int intel_init_workaround_bb(struct intel_engine_cs *engine) +{ + struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx; + struct i915_wa_ctx_bb *wa_bb[2] = { &wa_ctx->indirect_ctx, + &wa_ctx->per_ctx }; + wa_bb_func_t wa_bb_fn[2]; + void *batch, *batch_ptr; + unsigned int i; + int ret; + + if (engine->class != RENDER_CLASS) + return 0; + + switch (INTEL_GEN(engine->i915)) { + case 12: + case 11: + return 0; + case 10: + wa_bb_fn[0] = gen10_init_indirectctx_bb; + wa_bb_fn[1] = NULL; + break; + case 9: + wa_bb_fn[0] = gen9_init_indirectctx_bb; + wa_bb_fn[1] = NULL; + break; + case 8: + wa_bb_fn[0] = gen8_init_indirectctx_bb; + wa_bb_fn[1] = NULL; + break; + default: + MISSING_CASE(INTEL_GEN(engine->i915)); + return 0; + } + + ret = lrc_setup_wa_ctx(engine); + if (ret) { + drm_dbg(&engine->i915->drm, + "Failed to setup context WA page: %d\n", ret); + return ret; + } + + batch = i915_gem_object_pin_map(wa_ctx->vma->obj, I915_MAP_WB); + + /* + * Emit the two workaround batch buffers, recording the offset from the + * start of the workaround batch buffer object for each and their + * respective sizes. 
+ */ + batch_ptr = batch; + for (i = 0; i < ARRAY_SIZE(wa_bb_fn); i++) { + wa_bb[i]->offset = batch_ptr - batch; + if (GEM_DEBUG_WARN_ON(!IS_ALIGNED(wa_bb[i]->offset, + CACHELINE_BYTES))) { + ret = -EINVAL; + break; + } + if (wa_bb_fn[i]) + batch_ptr = wa_bb_fn[i](engine, batch_ptr); + wa_bb[i]->size = batch_ptr - (batch + wa_bb[i]->offset); + } + GEM_BUG_ON(batch_ptr - batch > CTX_WA_BB_OBJ_SIZE); + + __i915_gem_object_flush_map(wa_ctx->vma->obj, 0, batch_ptr - batch); + __i915_gem_object_release_map(wa_ctx->vma->obj); + if (ret) + intel_fini_workaround_bb(engine); + + return ret; +} + +void intel_fini_workaround_bb(struct intel_engine_cs *engine) +{ + i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0); +} diff --git a/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.h b/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.h new file mode 100644 index 000000000000..88771d77fd42 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2014 Intel Corporation + */ + +#ifndef __INTEL_ENGINE_WORKAROUND_BB_H__ +#define __INTEL_ENGINE_WORKAROUND_BB_H__ + +struct intel_engine_cs; + +int intel_init_workaround_bb(struct intel_engine_cs *engine); +void intel_fini_workaround_bb(struct intel_engine_cs *engine); + +#endif /* __INTEL_ENGINE_WORKAROUND_BB_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 9069a456d2f7..1cc93ea6b7f0 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -116,6 +116,7 @@ #include "intel_breadcrumbs.h" #include "intel_context.h" #include "intel_engine_pm.h" +#include "intel_engine_workaround_bb.h" #include "intel_execlists_submission.h" #include "intel_gt.h" #include "intel_gt_pm.h" @@ -3695,330 +3696,6 @@ static int execlists_request_alloc(struct i915_request *request) return 0; }
-/* - * In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after - * PIPE_CONTROL instruction. This is required for the flush to happen correctly - * but there is a slight complication as this is applied in WA batch where the - * values are only initialized once so we cannot take register value at the - * beginning and reuse it further; hence we save its value to memory, upload a - * constant value with bit21 set and then we restore it back with the saved value. - * To simplify the WA, a constant value is formed by using the default value - * of this register. This shouldn't be a problem because we are only modifying - * it for a short period and this batch in non-premptible. We can ofcourse - * use additional instructions that read the actual value of the register - * at that time and set our bit of interest but it makes the WA complicated. - * - * This WA is also required for Gen9 so extracting as a function avoids - * code duplication. - */ -static u32 * -gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine, u32 *batch) -{ - /* NB no one else is allowed to scribble over scratch + 256! */ - *batch++ = MI_STORE_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT; - *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); - *batch++ = intel_gt_scratch_offset(engine->gt, - INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA); - *batch++ = 0; - - *batch++ = MI_LOAD_REGISTER_IMM(1); - *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); - *batch++ = 0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES; - - batch = gen8_emit_pipe_control(batch, - PIPE_CONTROL_CS_STALL | - PIPE_CONTROL_DC_FLUSH_ENABLE, - 0); - - *batch++ = MI_LOAD_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT; - *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); - *batch++ = intel_gt_scratch_offset(engine->gt, - INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA); - *batch++ = 0; - - return batch; -} - -/* - * Typically we only have one indirect_ctx and per_ctx batch buffer which are - * initialized at the beginning and shared across all contexts but this field - * helps us to have multiple batches at different offsets and select them based - * on a criteria. At the moment this batch always start at the beginning of the page - * and at this point we don't have multiple wa_ctx batch buffers. - * - * The number of WA applied are not known at the beginning; we use this field - * to return the no of DWORDS written. - * - * It is to be noted that this batch does not contain MI_BATCH_BUFFER_END - * so it adds NOOPs as padding to make it cacheline aligned. - * MI_BATCH_BUFFER_END will be added to perctx batch and both of them together - * makes a complete batch buffer. 
- */ -static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) -{ - /* WaDisableCtxRestoreArbitration:bdw,chv */ - *batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; - - /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */ - if (IS_BROADWELL(engine->i915)) - batch = gen8_emit_flush_coherentl3_wa(engine, batch); - - /* WaClearSlmSpaceAtContextSwitch:bdw,chv */ - /* Actual scratch location is at 128 bytes offset */ - batch = gen8_emit_pipe_control(batch, - PIPE_CONTROL_FLUSH_L3 | - PIPE_CONTROL_STORE_DATA_INDEX | - PIPE_CONTROL_CS_STALL | - PIPE_CONTROL_QW_WRITE, - LRC_PPHWSP_SCRATCH_ADDR); - - *batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - - /* Pad to end of cacheline */ - while ((unsigned long)batch % CACHELINE_BYTES) - *batch++ = MI_NOOP; - - /* - * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because - * execution depends on the length specified in terms of cache lines - * in the register CTX_RCS_INDIRECT_CTX - */ - - return batch; -} - -struct lri { - i915_reg_t reg; - u32 value; -}; - -static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count) -{ - GEM_BUG_ON(!count || count > 63); - - *batch++ = MI_LOAD_REGISTER_IMM(count); - do { - *batch++ = i915_mmio_reg_offset(lri->reg); - *batch++ = lri->value; - } while (lri++, --count); - *batch++ = MI_NOOP; - - return batch; -} - -static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) -{ - static const struct lri lri[] = { - /* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */ - { - COMMON_SLICE_CHICKEN2, - __MASKED_FIELD(GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE, - 0), - }, - - /* BSpec: 11391 */ - { - FF_SLICE_CHICKEN, - __MASKED_FIELD(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX, - FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX), - }, - - /* BSpec: 11299 */ - { - _3D_CHICKEN3, - __MASKED_FIELD(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX, - _3D_CHICKEN_SF_PROVOKING_VERTEX_FIX), - } - }; - - *batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; - - /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */ - batch = gen8_emit_flush_coherentl3_wa(engine, batch); - - /* WaClearSlmSpaceAtContextSwitch:skl,bxt,kbl,glk,cfl */ - batch = gen8_emit_pipe_control(batch, - PIPE_CONTROL_FLUSH_L3 | - PIPE_CONTROL_STORE_DATA_INDEX | - PIPE_CONTROL_CS_STALL | - PIPE_CONTROL_QW_WRITE, - LRC_PPHWSP_SCRATCH_ADDR); - - batch = emit_lri(batch, lri, ARRAY_SIZE(lri)); - - /* WaMediaPoolStateCmdInWABB:bxt,glk */ - if (HAS_POOLED_EU(engine->i915)) { - /* - * EU pool configuration is setup along with golden context - * during context initialization. This value depends on - * device type (2x6 or 3x6) and needs to be updated based - * on which subslice is disabled especially for 2x6 - * devices, however it is safe to load default - * configuration of 3x6 device instead of masking off - * corresponding bits because HW ignores bits of a disabled - * subslice and drops down to appropriate config. Please - * see render_state_setup() in i915_gem_render_state.c for - * possible configurations, to avoid duplication they are - * not shown here again. 
- */ - *batch++ = GEN9_MEDIA_POOL_STATE; - *batch++ = GEN9_MEDIA_POOL_ENABLE; - *batch++ = 0x00777000; - *batch++ = 0; - *batch++ = 0; - *batch++ = 0; - } - - *batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - - /* Pad to end of cacheline */ - while ((unsigned long)batch % CACHELINE_BYTES) - *batch++ = MI_NOOP; - - return batch; -} - -static u32 * -gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) -{ - int i; - - /* - * WaPipeControlBefore3DStateSamplePattern: cnl - * - * Ensure the engine is idle prior to programming a - * 3DSTATE_SAMPLE_PATTERN during a context restore. - */ - batch = gen8_emit_pipe_control(batch, - PIPE_CONTROL_CS_STALL, - 0); - /* - * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for - * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in - * total. However, a PIPE_CONTROL is 6 dwords long, not 4, which is - * confusing. Since gen8_emit_pipe_control() already advances the - * batch by 6 dwords, we advance the other 10 here, completing a - * cacheline. It's not clear if the workaround requires this padding - * before other commands, or if it's just the regular padding we would - * already have for the workaround bb, so leave it here for now. - */ - for (i = 0; i < 10; i++) - *batch++ = MI_NOOP; - - /* Pad to end of cacheline */ - while ((unsigned long)batch % CACHELINE_BYTES) - *batch++ = MI_NOOP; - - return batch; -} - -#define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE) - -static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) -{ - struct drm_i915_gem_object *obj; - struct i915_vma *vma; - int err; - - obj = i915_gem_object_create_shmem(engine->i915, CTX_WA_BB_OBJ_SIZE); - if (IS_ERR(obj)) - return PTR_ERR(obj); - - vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL); - if (IS_ERR(vma)) { - err = PTR_ERR(vma); - goto err; - } - - err = i915_ggtt_pin(vma, NULL, 0, PIN_HIGH); - if (err) - goto err; - - engine->wa_ctx.vma = vma; - return 0; - -err: - i915_gem_object_put(obj); - return err; -} - -static void lrc_destroy_wa_ctx(struct intel_engine_cs *engine) -{ - i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0); -} - -typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch); - -static int intel_init_workaround_bb(struct intel_engine_cs *engine) -{ - struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx; - struct i915_wa_ctx_bb *wa_bb[2] = { &wa_ctx->indirect_ctx, - &wa_ctx->per_ctx }; - wa_bb_func_t wa_bb_fn[2]; - void *batch, *batch_ptr; - unsigned int i; - int ret; - - if (engine->class != RENDER_CLASS) - return 0; - - switch (INTEL_GEN(engine->i915)) { - case 12: - case 11: - return 0; - case 10: - wa_bb_fn[0] = gen10_init_indirectctx_bb; - wa_bb_fn[1] = NULL; - break; - case 9: - wa_bb_fn[0] = gen9_init_indirectctx_bb; - wa_bb_fn[1] = NULL; - break; - case 8: - wa_bb_fn[0] = gen8_init_indirectctx_bb; - wa_bb_fn[1] = NULL; - break; - default: - MISSING_CASE(INTEL_GEN(engine->i915)); - return 0; - } - - ret = lrc_setup_wa_ctx(engine); - if (ret) { - drm_dbg(&engine->i915->drm, - "Failed to setup context WA page: %d\n", ret); - return ret; - } - - batch = i915_gem_object_pin_map(wa_ctx->vma->obj, I915_MAP_WB); - - /* - * Emit the two workaround batch buffers, recording the offset from the - * start of the workaround batch buffer object for each and their - * respective sizes. 
- */ - batch_ptr = batch; - for (i = 0; i < ARRAY_SIZE(wa_bb_fn); i++) { - wa_bb[i]->offset = batch_ptr - batch; - if (GEM_DEBUG_WARN_ON(!IS_ALIGNED(wa_bb[i]->offset, - CACHELINE_BYTES))) { - ret = -EINVAL; - break; - } - if (wa_bb_fn[i]) - batch_ptr = wa_bb_fn[i](engine, batch_ptr); - wa_bb[i]->size = batch_ptr - (batch + wa_bb[i]->offset); - } - GEM_BUG_ON(batch_ptr - batch > CTX_WA_BB_OBJ_SIZE); - - __i915_gem_object_flush_map(wa_ctx->vma->obj, 0, batch_ptr - batch); - __i915_gem_object_release_map(wa_ctx->vma->obj); - if (ret) - lrc_destroy_wa_ctx(engine); - - return ret; -} - static void reset_csb_pointers(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -4707,7 +4384,7 @@ static void execlists_release(struct intel_engine_cs *engine) execlists_shutdown(engine);
intel_engine_cleanup_common(engine); - lrc_destroy_wa_ctx(engine); + intel_fini_workaround_bb(engine); }
static void
From: Thomas Hellström thomas.hellstrom@intel.com
There is a dirty hack to work around a lockdep splat caused by incorrect ordering of the selftest timeline lock against other locks. However, some selftests recently started to use the same nesting level as the workaround and thus introduced more splats. Add a workaround to the workaround, making those selftests aware of the existing one.
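For readers unfamiliar with lockdep nesting annotations, a minimal sketch (not part of the patch; tl_a/tl_b are hypothetical timelines) of why distinct nesting levels matter:

	/*
	 * Sketch only: lockdep treats each subclass as its own lock
	 * class, so holding two mutexes of the same class is only
	 * splat-free when the inner acquisition declares a strictly
	 * deeper subclass.
	 */
	mutex_lock(&tl_a->mutex);				/* subclass 0 */
	mutex_lock_nested(&tl_b->mutex, SINGLE_DEPTH_NESTING);	/* subclass 1 */
	/* ... both timelines held ... */
	mutex_unlock(&tl_b->mutex);
	mutex_unlock(&tl_a->mutex);

Hence the selftests below move to SELFTEST_WA_NESTING + 1, staying one level deeper than the workaround in intel_context_create_request().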
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/intel_context.c | 3 ++- drivers/gpu/drm/i915/gt/intel_context.h | 2 ++ drivers/gpu/drm/i915/gt/selftest_timeline.c | 10 ++++++---- 3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 349e7fa1488d..b63a8eb6c1a9 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -495,7 +495,8 @@ struct i915_request *intel_context_create_request(struct intel_context *ce) */ lockdep_unpin_lock(&ce->timeline->mutex, rq->cookie); mutex_release(&ce->timeline->mutex.dep_map, _RET_IP_); - mutex_acquire(&ce->timeline->mutex.dep_map, SINGLE_DEPTH_NESTING, 0, _RET_IP_); + mutex_acquire(&ce->timeline->mutex.dep_map, SELFTEST_WA_NESTING, 0, + _RET_IP_); rq->cookie = lockdep_pin_lock(&ce->timeline->mutex);
return rq; diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index fda2eba81e22..175d505951c7 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -25,6 +25,8 @@ ##__VA_ARGS__); \ } while (0)
+#define SELFTEST_WA_NESTING SINGLE_DEPTH_NESTING + struct i915_gem_ww_ctx;
void intel_context_init(struct intel_context *ce, diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index e4285d5a0360..fa3fec049542 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -688,7 +688,7 @@ static int live_hwsp_wrap(void *arg)
tl->seqno = -4u;
- mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING); + mutex_lock_nested(&tl->mutex, SELFTEST_WA_NESTING + 1); err = intel_timeline_get_seqno(tl, rq, &seqno[0]); mutex_unlock(&tl->mutex); if (err) { @@ -705,7 +705,7 @@ static int live_hwsp_wrap(void *arg) } hwsp_seqno[0] = tl->hwsp_seqno;
- mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING); + mutex_lock_nested(&tl->mutex, SELFTEST_WA_NESTING + 1); err = intel_timeline_get_seqno(tl, rq, &seqno[1]); mutex_unlock(&tl->mutex); if (err) { @@ -1037,7 +1037,8 @@ static int live_hwsp_read(void *arg) goto out; }
- mutex_lock(&watcher[0].rq->context->timeline->mutex); + mutex_lock_nested(&watcher[0].rq->context->timeline->mutex, + SELFTEST_WA_NESTING + 1); err = intel_timeline_read_hwsp(rq, watcher[0].rq, &hwsp); if (err == 0) err = emit_read_hwsp(watcher[0].rq, /* before */ @@ -1050,7 +1051,8 @@ static int live_hwsp_read(void *arg) goto out; }
- mutex_lock(&watcher[1].rq->context->timeline->mutex); + mutex_lock_nested(&watcher[1].rq->context->timeline->mutex, + SELFTEST_WA_NESTING + 1); err = intel_timeline_read_hwsp(rq, watcher[1].rq, &hwsp); if (err == 0) err = emit_read_hwsp(watcher[1].rq, /* after */
From: Thomas Hellström thomas.hellstrom@intel.com
When an object has just been created and is not yet put on any lists, there's a single owner and thus trylock will always succeed. Introduce i915_gem_object_lock_isolated() to annotate trylock in this situation. This is similar to TTM's create_locked() functionality.
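A rough usage sketch (the surrounding setup is hypothetical, but the helpers are the ones added or asserted by this patch):

	struct drm_i915_gem_object *obj;

	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* Freshly created: single owner, on no lists, so trylock cannot fail. */
	i915_gem_object_lock_isolated(obj);
	/* ... one-time setup, e.g. cache coherency or pin_map ... */
	i915_gem_object_unlock(obj);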
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index be14486f63a7..d61194ef484e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -107,6 +107,13 @@ i915_gem_object_put(struct drm_i915_gem_object *obj)
#define assert_object_held(obj) dma_resv_assert_held((obj)->base.resv)
+#define object_is_isolated(obj) \ + (!IS_ENABLED(CONFIG_LOCKDEP) || \ + ((kref_read(&obj->base.refcount) == 0) || \ + ((kref_read(&obj->base.refcount) == 1) && \ + list_empty_careful(&obj->mm.link) && \ + list_empty_careful(&obj->vma.list)))) + static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, struct i915_gem_ww_ctx *ww, bool intr) @@ -147,6 +154,15 @@ static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj) return dma_resv_trylock(obj->base.resv); }
+static inline void i915_gem_object_lock_isolated(struct drm_i915_gem_object *obj) +{ + int ret; + + WARN_ON(!object_is_isolated(obj)); + ret = dma_resv_trylock(obj->base.resv); + GEM_WARN_ON(!ret); +} + static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj) { dma_resv_unlock(obj->base.resv);
From: Thomas Hellström thomas.hellstrom@intel.com
We may need to create hwsp objects at request creation time in the middle of a ww transaction. Since we typically don't have easy access to the ww_acquire_context there, take the isolated object lock on hwsp objects for pinning/mapping at create time only. For later binding to the ggtt, make sure lockdep allows binding of already-pinned pages to the ggtt without the underlying object lock held.
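Condensed from the __hwsp_alloc() change below (error handling elided), the intended two-phase flow is roughly:

	/* Create time: the object is isolated, so the trylock is safe. */
	i915_gem_object_lock_isolated(obj);
	hwsp->vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
	i915_gem_object_unlock(obj);

	/*
	 * Later, without the object lock: the pages are already pinned,
	 * so binding into the ggtt needs no allocation under the
	 * dma-resv lock.
	 */
	err = i915_ggtt_pin(hwsp->vma, NULL, 0, PIN_HIGH);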
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/intel_timeline.c | 58 ++++++++++++++---------- drivers/gpu/drm/i915/i915_vma.c | 13 ++++-- 2 files changed, 44 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 512afacd2bdc..a58228d1cd3b 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -24,25 +24,43 @@ struct intel_timeline_hwsp { struct list_head free_link; struct i915_vma *vma; u64 free_bitmap; + void *vaddr; };
-static struct i915_vma *__hwsp_alloc(struct intel_gt *gt) +static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp) { struct drm_i915_private *i915 = gt->i915; struct drm_i915_gem_object *obj; - struct i915_vma *vma; + int ret;
obj = i915_gem_object_create_internal(i915, PAGE_SIZE); if (IS_ERR(obj)) - return ERR_CAST(obj); + return PTR_ERR(obj);
+ i915_gem_object_lock_isolated(obj); i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
- vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL); - if (IS_ERR(vma)) - i915_gem_object_put(obj); + hwsp->vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL); + if (IS_ERR(hwsp->vma)) { + ret = PTR_ERR(hwsp->vma); + goto out_unlock; + } + + /* Pin early so we can call i915_ggtt_pin unlocked. */ + hwsp->vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + if (IS_ERR(hwsp->vaddr)) { + ret = PTR_ERR(hwsp->vaddr); + goto out_unlock; + } + + i915_gem_object_unlock(obj); + return 0; + +out_unlock: + i915_gem_object_unlock(obj); + i915_gem_object_put(obj);
- return vma; + return ret; }
static struct i915_vma * @@ -59,7 +77,7 @@ hwsp_alloc(struct intel_timeline *timeline, unsigned int *cacheline) hwsp = list_first_entry_or_null(&gt->hwsp_free_list, typeof(*hwsp), free_link); if (!hwsp) { - struct i915_vma *vma; + int ret;
spin_unlock_irq(&gt->hwsp_lock);
@@ -67,17 +85,16 @@ hwsp_alloc(struct intel_timeline *timeline, unsigned int *cacheline) if (!hwsp) return ERR_PTR(-ENOMEM);
- vma = __hwsp_alloc(timeline->gt); - if (IS_ERR(vma)) { + ret = __hwsp_alloc(timeline->gt, hwsp); + if (ret) { kfree(hwsp); - return vma; + return ERR_PTR(ret); }
GT_TRACE(timeline->gt, "new HWSP allocated\n");
- vma->private = hwsp; + hwsp->vma->private = hwsp; hwsp->gt = timeline->gt; - hwsp->vma = vma; hwsp->free_bitmap = ~0ull; hwsp->gt_timelines = gt;
@@ -113,9 +130,12 @@ static void __idle_hwsp_free(struct intel_timeline_hwsp *hwsp, int cacheline)
/* And if no one is left using it, give the page back to the system */ if (hwsp->free_bitmap == ~0ull) { - i915_vma_put(hwsp->vma); list_del(&hwsp->free_link); + spin_unlock_irqrestore(&gt->hwsp_lock, flags); + i915_gem_object_unpin_map(hwsp->vma->obj); + i915_vma_put(hwsp->vma); kfree(hwsp); + return; }
spin_unlock_irqrestore(&gt->hwsp_lock, flags); @@ -134,7 +154,6 @@ static void __idle_cacheline_free(struct intel_timeline_cacheline *cl) { GEM_BUG_ON(!i915_active_is_idle(&cl->active));
- i915_gem_object_unpin_map(cl->hwsp->vma->obj); i915_vma_put(cl->hwsp->vma); __idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
@@ -165,7 +184,6 @@ static struct intel_timeline_cacheline * cacheline_alloc(struct intel_timeline_hwsp *hwsp, unsigned int cacheline) { struct intel_timeline_cacheline *cl; - void *vaddr;
GEM_BUG_ON(cacheline >= BIT(CACHELINE_BITS));
@@ -173,15 +191,9 @@ cacheline_alloc(struct intel_timeline_hwsp *hwsp, unsigned int cacheline) if (!cl) return ERR_PTR(-ENOMEM);
- vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB); - if (IS_ERR(vaddr)) { - kfree(cl); - return ERR_CAST(vaddr); - } - i915_vma_get(hwsp->vma); cl->hwsp = hwsp; - cl->vaddr = page_pack_bits(vaddr, cacheline); + cl->vaddr = page_pack_bits(hwsp->vaddr, cacheline);
i915_active_init(&cl->active, __cacheline_active, __cacheline_retire);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index caa9b041616b..8e8c80ccbe32 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -862,10 +862,15 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, unsigned int bound; int err;
-#ifdef CONFIG_PROVE_LOCKING - if (debug_locks && lockdep_is_held(&vma->vm->i915->drm.struct_mutex)) - WARN_ON(!ww); -#endif + if (IS_ENABLED(CONFIG_PROVE_LOCKING) && debug_locks) { + bool pinned_bind_wo_alloc = + vma->obj && i915_gem_object_has_pinned_pages(vma->obj) && + !vma->vm->allocate_va_range; + + if (lockdep_is_held(&vma->vm->i915->drm.struct_mutex) && + !pinned_bind_wo_alloc) + WARN_ON(!ww); + }
BUILD_BUG_ON(PIN_GLOBAL != I915_VMA_GLOBAL_BIND); BUILD_BUG_ON(PIN_USER != I915_VMA_LOCAL_BIND);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We're starting to require the reservation lock for pinning, so defer the pinning until the lock is held.
Update the selftests to handle this correctly, and ensure pin is called in live_hwsp_rollover_user() and mock_hwsp_freelist().
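For example, the mock engine now has to take the object lock before pinning the map; simplified from the mock_timeline_pin() change in this patch:

	if (WARN_ON(!i915_gem_object_trylock(tl->hwsp_ggtt->obj)))
		return -EBUSY;

	err = intel_timeline_pin_map(tl);
	i915_gem_object_unlock(tl->hwsp_ggtt->obj);
	if (err)
		return err;

	atomic_inc(&tl->pin_count);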
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Reported-by: kernel test robot lkp@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_timeline.c | 49 ++++++++++---- drivers/gpu/drm/i915/gt/intel_timeline.h | 1 + .../gpu/drm/i915/gt/intel_timeline_types.h | 1 + drivers/gpu/drm/i915/gt/mock_engine.c | 24 ++++++- drivers/gpu/drm/i915/gt/selftest_timeline.c | 64 ++++++++++--------- drivers/gpu/drm/i915/i915_selftest.h | 2 + 6 files changed, 96 insertions(+), 45 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index a58228d1cd3b..479eb5440bc6 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -229,13 +229,30 @@ static void cacheline_free(struct intel_timeline_cacheline *cl) i915_active_release(&cl->active); }
+I915_SELFTEST_EXPORT int +intel_timeline_pin_map(struct intel_timeline *timeline) +{ + if (!timeline->hwsp_cacheline) { + struct drm_i915_gem_object *obj = timeline->hwsp_ggtt->obj; + u32 ofs = offset_in_page(timeline->hwsp_offset); + void *vaddr; + + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr); + + timeline->hwsp_map = vaddr; + timeline->hwsp_seqno = memset(vaddr + ofs, 0, CACHELINE_BYTES); + } + + return 0; +} + static int intel_timeline_init(struct intel_timeline *timeline, struct intel_gt *gt, struct i915_vma *hwsp, unsigned int offset) { - void *vaddr; - kref_init(&timeline->kref); atomic_set(&timeline->pin_count, 0);
@@ -260,18 +277,15 @@ static int intel_timeline_init(struct intel_timeline *timeline,
timeline->hwsp_cacheline = cl; timeline->hwsp_offset = cacheline * CACHELINE_BYTES; - - vaddr = page_mask_bits(cl->vaddr); + timeline->hwsp_map = page_mask_bits(cl->vaddr); + timeline->hwsp_seqno = + memset(timeline->hwsp_map + timeline->hwsp_offset, 0, + CACHELINE_BYTES); } else { timeline->hwsp_offset = offset; - vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB); - if (IS_ERR(vaddr)) - return PTR_ERR(vaddr); + timeline->hwsp_map = NULL; }
- timeline->hwsp_seqno = - memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES); - timeline->hwsp_ggtt = i915_vma_get(hwsp); GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
@@ -306,7 +320,7 @@ static void intel_timeline_fini(struct intel_timeline *timeline)
if (timeline->hwsp_cacheline) cacheline_free(timeline->hwsp_cacheline); - else + else if (timeline->hwsp_map) i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
i915_vma_put(timeline->hwsp_ggtt); @@ -346,9 +360,18 @@ int intel_timeline_pin(struct intel_timeline *tl, struct i915_gem_ww_ctx *ww) if (atomic_add_unless(&tl->pin_count, 1, 0)) return 0;
+ if (!tl->hwsp_cacheline) { + err = intel_timeline_pin_map(tl); + if (err) + return err; + } + err = i915_ggtt_pin(tl->hwsp_ggtt, ww, 0, PIN_HIGH); - if (err) + if (err) { + if (!tl->hwsp_cacheline) + i915_gem_object_unpin_map(tl->hwsp_ggtt->obj); return err; + }
tl->hwsp_offset = i915_ggtt_offset(tl->hwsp_ggtt) + @@ -360,6 +383,8 @@ int intel_timeline_pin(struct intel_timeline *tl, struct i915_gem_ww_ctx *ww) if (atomic_fetch_inc(&tl->pin_count)) { cacheline_release(tl->hwsp_cacheline); __i915_vma_unpin(tl->hwsp_ggtt); + if (!tl->hwsp_cacheline) + i915_gem_object_unpin_map(tl->hwsp_ggtt->obj); }
return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h index 634acebd0c4b..725bae16237c 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline.h @@ -114,5 +114,6 @@ void intel_gt_show_timelines(struct intel_gt *gt, const struct i915_request *rq, const char *prefix, int indent)); +I915_SELFTEST_DECLARE(int intel_timeline_pin_map(struct intel_timeline *tl));
#endif diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h index 4474f487f589..cac7fa3dfd43 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h @@ -45,6 +45,7 @@ struct intel_timeline { atomic_t pin_count; atomic_t active_count;
+ void *hwsp_map; const u32 *hwsp_seqno; struct i915_vma *hwsp_ggtt; u32 hwsp_offset; diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 2f830017c51d..016f4f345706 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -32,9 +32,22 @@ #include "mock_engine.h" #include "selftests/mock_request.h"
-static void mock_timeline_pin(struct intel_timeline *tl) +static int mock_timeline_pin(struct intel_timeline *tl) { + int err; + + if (!tl->hwsp_cacheline) { + if (WARN_ON(!i915_gem_object_trylock(tl->hwsp_ggtt->obj))) + return -EBUSY; + + err = intel_timeline_pin_map(tl); + i915_gem_object_unlock(tl->hwsp_ggtt->obj); + if (err) + return err; + } + atomic_inc(&tl->pin_count); + return 0; }
static void mock_timeline_unpin(struct intel_timeline *tl) @@ -152,6 +165,8 @@ static void mock_context_destroy(struct kref *ref)
static int mock_context_alloc(struct intel_context *ce) { + int err; + ce->ring = mock_ring(ce->engine); if (!ce->ring) return -ENOMEM; @@ -162,7 +177,12 @@ static int mock_context_alloc(struct intel_context *ce) return PTR_ERR(ce->timeline); }
- mock_timeline_pin(ce->timeline); + err = mock_timeline_pin(ce->timeline); + if (err) { + intel_timeline_put(ce->timeline); + ce->timeline = NULL; + return err; + }
return 0; } diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index fa3fec049542..7435abf5a703 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -34,7 +34,7 @@ static unsigned long hwsp_cacheline(struct intel_timeline *tl) { unsigned long address = (unsigned long)page_address(hwsp_page(tl));
- return (address + tl->hwsp_offset) / CACHELINE_BYTES; + return (address + offset_in_page(tl->hwsp_offset)) / CACHELINE_BYTES; }
#define CACHELINES_PER_PAGE (PAGE_SIZE / CACHELINE_BYTES) @@ -58,6 +58,7 @@ static void __mock_hwsp_record(struct mock_hwsp_freelist *state, tl = xchg(&state->history[idx], tl); if (tl) { radix_tree_delete(&state->cachelines, hwsp_cacheline(tl)); + intel_timeline_unpin(tl); intel_timeline_put(tl); } } @@ -77,6 +78,12 @@ static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state, if (IS_ERR(tl)) return PTR_ERR(tl);
+ err = intel_timeline_pin(tl, NULL); + if (err) { + intel_timeline_put(tl); + return err; + } + cacheline = hwsp_cacheline(tl); err = radix_tree_insert(&state->cachelines, cacheline, tl); if (err) { @@ -84,6 +91,7 @@ static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state, pr_err("HWSP cacheline %lu already used; duplicate allocation!\n", cacheline); } + intel_timeline_unpin(tl); intel_timeline_put(tl); return err; } @@ -451,7 +459,7 @@ static int emit_ggtt_store_dw(struct i915_request *rq, u32 addr, u32 value) }
static struct i915_request * -tl_write(struct intel_timeline *tl, struct intel_engine_cs *engine, u32 value) +checked_tl_write(struct intel_timeline *tl, struct intel_engine_cs *engine, u32 value) { struct i915_request *rq; int err; @@ -462,6 +470,13 @@ tl_write(struct intel_timeline *tl, struct intel_engine_cs *engine, u32 value) goto out; }
+ if (READ_ONCE(*tl->hwsp_seqno) != tl->seqno) { + pr_err("Timeline created with incorrect breadcrumb, found %x, expected %x\n", + *tl->hwsp_seqno, tl->seqno); + intel_timeline_unpin(tl); + return ERR_PTR(-EINVAL); + } + rq = intel_engine_create_kernel_request(engine); if (IS_ERR(rq)) goto out_unpin; @@ -483,25 +498,6 @@ tl_write(struct intel_timeline *tl, struct intel_engine_cs *engine, u32 value) return rq; }
-static struct intel_timeline * -checked_intel_timeline_create(struct intel_gt *gt) -{ - struct intel_timeline *tl; - - tl = intel_timeline_create(gt); - if (IS_ERR(tl)) - return tl; - - if (READ_ONCE(*tl->hwsp_seqno) != tl->seqno) { - pr_err("Timeline created with incorrect breadcrumb, found %x, expected %x\n", - *tl->hwsp_seqno, tl->seqno); - intel_timeline_put(tl); - return ERR_PTR(-EINVAL); - } - - return tl; -} - static int live_hwsp_engine(void *arg) { #define NUM_TIMELINES 4096 @@ -534,13 +530,13 @@ static int live_hwsp_engine(void *arg) struct intel_timeline *tl; struct i915_request *rq;
- tl = checked_intel_timeline_create(gt); + tl = intel_timeline_create(gt); if (IS_ERR(tl)) { err = PTR_ERR(tl); break; }
- rq = tl_write(tl, engine, count); + rq = checked_tl_write(tl, engine, count); if (IS_ERR(rq)) { intel_timeline_put(tl); err = PTR_ERR(rq); @@ -607,14 +603,14 @@ static int live_hwsp_alternate(void *arg) if (!intel_engine_can_store_dword(engine)) continue;
- tl = checked_intel_timeline_create(gt); + tl = intel_timeline_create(gt); if (IS_ERR(tl)) { err = PTR_ERR(tl); goto out; }
intel_engine_pm_get(engine); - rq = tl_write(tl, engine, count); + rq = checked_tl_write(tl, engine, count); intel_engine_pm_put(engine); if (IS_ERR(rq)) { intel_timeline_put(tl); @@ -1239,8 +1235,13 @@ static int live_hwsp_rollover_user(void *arg) if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline) goto out;
+ err = intel_context_pin(ce); + if (err) + goto out; + timeline_rollback(tl); timeline_rollback(tl); + WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno);
for (i = 0; i < ARRAY_SIZE(rq); i++) { @@ -1249,7 +1250,7 @@ static int live_hwsp_rollover_user(void *arg) this = intel_context_create_request(ce); if (IS_ERR(this)) { err = PTR_ERR(this); - goto out; + goto out_unpin; }
pr_debug("%s: create fence.seqnp:%d\n", @@ -1268,17 +1269,18 @@ static int live_hwsp_rollover_user(void *arg) if (i915_request_wait(rq[2], 0, HZ / 5) < 0) { pr_err("Wait for timeline wrap timed out!\n"); err = -EIO; - goto out; + goto out_unpin; }
for (i = 0; i < ARRAY_SIZE(rq); i++) { if (!i915_request_completed(rq[i])) { pr_err("Pre-wrap request not completed!\n"); err = -EINVAL; - goto out; + goto out_unpin; } } - +out_unpin: + intel_context_unpin(ce); out: for (i = 0; i < ARRAY_SIZE(rq); i++) i915_request_put(rq[i]); @@ -1320,13 +1322,13 @@ static int live_hwsp_recycle(void *arg) struct intel_timeline *tl; struct i915_request *rq;
- tl = checked_intel_timeline_create(gt); + tl = intel_timeline_create(gt); if (IS_ERR(tl)) { err = PTR_ERR(tl); break; }
- rq = tl_write(tl, engine, count); + rq = checked_tl_write(tl, engine, count); if (IS_ERR(rq)) { intel_timeline_put(tl); err = PTR_ERR(rq); diff --git a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h index d53d207ab6eb..f54de0499be7 100644 --- a/drivers/gpu/drm/i915/i915_selftest.h +++ b/drivers/gpu/drm/i915/i915_selftest.h @@ -107,6 +107,7 @@ int __i915_subtests(const char *caller,
#define I915_SELFTEST_DECLARE(x) x #define I915_SELFTEST_ONLY(x) unlikely(x) +#define I915_SELFTEST_EXPORT
#else /* !IS_ENABLED(CONFIG_DRM_I915_SELFTEST) */
@@ -116,6 +117,7 @@ static inline int i915_perf_selftests(struct pci_dev *pdev) { return 0; }
#define I915_SELFTEST_DECLARE(x) #define I915_SELFTEST_ONLY(x) 0 +#define I915_SELFTEST_EXPORT static
#endif
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We need to get rid of allocations in the cmd parser because it needs to be callable from a signaling context. As a first step, move all pinning to execbuf, where we already hold all the locks.
Allocate jump_whitelist in the execbuffer path, and add annotations around intel_engine_cmd_parser() to ensure the command parser neither allocates memory nor takes any locks it isn't supposed to.
Because i915_gem_object_get_page() may also allocate memory, add a path to i915_gem_object_get_sg() that prevents memory allocations, and walk the sg list manually. It should be similarly fast.
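As a sketch of the idea (a hypothetical helper, not the exact code added here, and assuming page-aligned sg segments), a manual walk needs nothing beyond the scatterlist itself:

	/* Hypothetical sketch: find the page backing byte offset 'n'. */
	static struct page *manual_sg_page(struct sg_table *sgt, unsigned long n)
	{
		struct scatterlist *sg;

		for (sg = sgt->sgl; sg; sg = sg_next(sg)) {
			if (n < sg->length)
				return nth_page(sg_page(sg), n >> PAGE_SHIFT);
			n -= sg->length;
		}

		return NULL; /* offset beyond the object */
	}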
This has the added benefit of being able to catch all memory allocation errors before the point of no return, and return -ENOMEM safely to the execbuf submitter.
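The enforcement mechanism is the dma-fence signalling annotation, as used by __eb_parse() in the diff below; roughly:

	bool cookie;
	int ret;

	cookie = dma_fence_begin_signalling();
	/*
	 * Fence-signalling critical section: lockdep now flags
	 * allocations that can recurse into reclaim, and any lock
	 * also held while publishing a fence.
	 */
	ret = intel_engine_cmd_parser(pw->engine, pw->batch,
				      pw->batch_offset, pw->batch_length,
				      pw->shadow, pw->jump_whitelist,
				      pw->shadow_map, pw->batch_map);
	dma_fence_end_signalling(cookie);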
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c |  74 ++++++++++++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c  |  21 +++-
 drivers/gpu/drm/i915/gt/intel_ggtt.c       |   2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 104 ++++++++----------
 drivers/gpu/drm/i915/i915_drv.h            |   7 +-
 drivers/gpu/drm/i915/i915_memcpy.c         |   2 +-
 drivers/gpu/drm/i915/i915_memcpy.h         |   2 +-
 8 files changed, 142 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 1904e6e5ea64..60afa6f826d6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -24,6 +24,7 @@ #include "i915_gem_clflush.h" #include "i915_gem_context.h" #include "i915_gem_ioctls.h" +#include "i915_memcpy.h" #include "i915_sw_fence_work.h" #include "i915_trace.h" #include "i915_user_extensions.h" @@ -2273,24 +2274,45 @@ struct eb_parse_work { struct i915_vma *trampoline; unsigned long batch_offset; unsigned long batch_length; + unsigned long *jump_whitelist; + const void *batch_map; + void *shadow_map; };
static int __eb_parse(struct dma_fence_work *work) { struct eb_parse_work *pw = container_of(work, typeof(*pw), base); + int ret; + bool cookie;
- return intel_engine_cmd_parser(pw->engine, - pw->batch, - pw->batch_offset, - pw->batch_length, - pw->shadow, - pw->trampoline); + cookie = dma_fence_begin_signalling(); + ret = intel_engine_cmd_parser(pw->engine, + pw->batch, + pw->batch_offset, + pw->batch_length, + pw->shadow, + pw->jump_whitelist, + pw->shadow_map, + pw->batch_map); + dma_fence_end_signalling(cookie); + + return ret; }
static void __eb_parse_release(struct dma_fence_work *work) { struct eb_parse_work *pw = container_of(work, typeof(*pw), base);
+ if (!IS_ERR_OR_NULL(pw->jump_whitelist)) + kfree(pw->jump_whitelist); + + if (pw->batch_map) + i915_gem_object_unpin_map(pw->batch->obj); + else + i915_gem_object_unpin_pages(pw->batch->obj); + + i915_gem_object_unpin_map(pw->shadow->obj); + if (pw->trampoline) i915_active_release(&pw->trampoline->active); i915_active_release(&pw->shadow->active); @@ -2340,6 +2362,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, struct i915_vma *trampoline) { struct eb_parse_work *pw; + struct drm_i915_gem_object *batch = eb->batch->vma->obj; + bool needs_clflush; int err;
GEM_BUG_ON(overflows_type(eb->batch_start_offset, pw->batch_offset)); @@ -2363,6 +2387,34 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, goto err_shadow; }
+ pw->shadow_map = i915_gem_object_pin_map(shadow->obj, I915_MAP_FORCE_WB); + if (IS_ERR(pw->shadow_map)) { + err = PTR_ERR(pw->shadow_map); + goto err_trampoline; + } + + needs_clflush = + !(batch->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ); + + pw->batch_map = ERR_PTR(-ENODEV); + if (needs_clflush && i915_has_memcpy_from_wc()) + pw->batch_map = i915_gem_object_pin_map(batch, I915_MAP_WC); + + if (IS_ERR(pw->batch_map)) { + err = i915_gem_object_pin_pages(batch); + if (err) + goto err_unmap_shadow; + pw->batch_map = NULL; + } + + pw->jump_whitelist = + intel_engine_cmd_parser_alloc_jump_whitelist(eb->batch_len, + trampoline); + if (IS_ERR(pw->jump_whitelist)) { + err = PTR_ERR(pw->jump_whitelist); + goto err_unmap_batch; + } + dma_fence_work_init(&pw->base, &eb_parse_ops);
pw->engine = eb->engine; @@ -2402,6 +2454,16 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, dma_fence_work_commit_imm(&pw->base); return err;
+err_unmap_batch: + if (pw->batch_map) + i915_gem_object_unpin_map(batch); + else + i915_gem_object_unpin_pages(batch); +err_unmap_shadow: + i915_gem_object_unpin_map(shadow->obj); +err_trampoline: + if (trampoline) + i915_active_release(&trampoline->active); err_shadow: i915_active_release(&shadow->active); err_batch: diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index d61194ef484e..80c5b2b326f5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -291,22 +291,22 @@ struct scatterlist * __i915_gem_object_get_sg(struct drm_i915_gem_object *obj, struct i915_gem_object_page_iter *iter, unsigned int n, - unsigned int *offset); + unsigned int *offset, bool allow_alloc);
static inline struct scatterlist * i915_gem_object_get_sg(struct drm_i915_gem_object *obj, unsigned int n, - unsigned int *offset) + unsigned int *offset, bool allow_alloc) { - return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset); + return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, allow_alloc); }
static inline struct scatterlist * i915_gem_object_get_sg_dma(struct drm_i915_gem_object *obj, unsigned int n, - unsigned int *offset) + unsigned int *offset, bool allow_alloc) { - return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset); + return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, allow_alloc); }
struct page * diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index e2c7b2a7895f..ca076203f5e9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -445,7 +445,8 @@ struct scatterlist * __i915_gem_object_get_sg(struct drm_i915_gem_object *obj, struct i915_gem_object_page_iter *iter, unsigned int n, - unsigned int *offset) + unsigned int *offset, + bool allow_alloc) { const bool dma = iter == &obj->mm.get_dma_page; struct scatterlist *sg; @@ -467,6 +468,9 @@ __i915_gem_object_get_sg(struct drm_i915_gem_object *obj, if (n < READ_ONCE(iter->sg_idx)) goto lookup;
+ if (!allow_alloc) + goto manual_lookup; + mutex_lock(&iter->lock);
/* We prefer to reuse the last sg so that repeated lookup of this @@ -516,7 +520,16 @@ __i915_gem_object_get_sg(struct drm_i915_gem_object *obj, if (unlikely(n < idx)) /* insertion completed by another thread */ goto lookup;
- /* In case we failed to insert the entry into the radixtree, we need + goto manual_walk; + +manual_lookup: + idx = 0; + sg = obj->mm.pages->sgl; + count = __sg_page_count(sg); + +manual_walk: + /* + * In case we failed to insert the entry into the radixtree, we need * to look beyond the current sg. */ while (idx + count <= n) { @@ -563,7 +576,7 @@ i915_gem_object_get_page(struct drm_i915_gem_object *obj, unsigned int n)
GEM_BUG_ON(!i915_gem_object_has_struct_page(obj));
- sg = i915_gem_object_get_sg(obj, n, &offset); + sg = i915_gem_object_get_sg(obj, n, &offset, true); return nth_page(sg_page(sg), offset); }
@@ -589,7 +602,7 @@ i915_gem_object_get_dma_address_len(struct drm_i915_gem_object *obj, struct scatterlist *sg; unsigned int offset;
- sg = i915_gem_object_get_sg_dma(obj, n, &offset); + sg = i915_gem_object_get_sg_dma(obj, n, &offset, true);
if (len) *len = sg_dma_len(sg) - (offset << PAGE_SHIFT); diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index cf94525be2c1..60bd2c8ed8b0 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -1383,7 +1383,7 @@ intel_partial_pages(const struct i915_ggtt_view *view, if (ret) goto err_sg_alloc;
- iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset); + iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset, true); GEM_BUG_ON(!iter);
sg = st->sgl; diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 93265951fdbb..8883a7d4964f 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -1136,38 +1136,19 @@ find_reg(const struct intel_engine_cs *engine, u32 addr) /* Returns a vmap'd pointer to dst_obj, which the caller must unmap */ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj, struct drm_i915_gem_object *src_obj, - unsigned long offset, unsigned long length) + unsigned long offset, unsigned long length, + void *dst, const void *src) { - bool needs_clflush; - void *dst, *src; - int ret; - - dst = i915_gem_object_pin_map(dst_obj, I915_MAP_FORCE_WB); - if (IS_ERR(dst)) - return dst; - - ret = i915_gem_object_pin_pages(src_obj); - if (ret) { - i915_gem_object_unpin_map(dst_obj); - return ERR_PTR(ret); - } - - needs_clflush = + bool needs_clflush = !(src_obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ);
- src = ERR_PTR(-ENODEV); - if (needs_clflush && i915_has_memcpy_from_wc()) { - src = i915_gem_object_pin_map(src_obj, I915_MAP_WC); - if (!IS_ERR(src)) { - i915_unaligned_memcpy_from_wc(dst, - src + offset, - length); - i915_gem_object_unpin_map(src_obj); - } - } - if (IS_ERR(src)) { - unsigned long x, n; + if (src) { + GEM_BUG_ON(!needs_clflush); + i915_unaligned_memcpy_from_wc(dst, src + offset, length); + } else { + struct scatterlist *sg; void *ptr; + unsigned int x, sg_ofs;
/* * We can avoid clflushing partial cachelines before the write @@ -1183,23 +1164,32 @@ static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
ptr = dst; x = offset_in_page(offset); - for (n = offset >> PAGE_SHIFT; length; n++) { - int len = min(length, PAGE_SIZE - x); - - src = kmap_atomic(i915_gem_object_get_page(src_obj, n)); - if (needs_clflush) - drm_clflush_virt_range(src + x, len); - memcpy(ptr, src + x, len); - kunmap_atomic(src); - - ptr += len; - length -= len; - x = 0; + + sg = i915_gem_object_get_sg(src_obj, offset >> PAGE_SHIFT, &sg_ofs, false); + + while (length) { + unsigned long sg_max = sg->length >> PAGE_SHIFT; + + for (; length && sg_ofs < sg_max; sg_ofs++) { + unsigned long len = min(length, PAGE_SIZE - x); + void *map; + + map = kmap_atomic(nth_page(sg_page(sg), sg_ofs)); + if (needs_clflush) + drm_clflush_virt_range(map + x, len); + memcpy(ptr, map + x, len); + kunmap_atomic(map); + + ptr += len; + length -= len; + x = 0; + } + + sg_ofs = 0; + sg = sg_next(sg); } }
- i915_gem_object_unpin_pages(src_obj); - /* dst_obj is returned with vmap pinned */ return dst; } @@ -1359,9 +1349,6 @@ static int check_bbstart(u32 *cmd, u32 offset, u32 length, if (target_cmd_index == offset) return 0;
- if (IS_ERR(jump_whitelist)) - return PTR_ERR(jump_whitelist); - if (!test_bit(target_cmd_index, jump_whitelist)) { DRM_DEBUG("CMD: BB_START to 0x%llx not a previously executed cmd\n", jump_target); @@ -1371,10 +1358,14 @@ static int check_bbstart(u32 *cmd, u32 offset, u32 length, return 0; }
-static unsigned long *alloc_whitelist(u32 batch_length) +unsigned long *intel_engine_cmd_parser_alloc_jump_whitelist(u32 batch_length, + bool trampoline) { unsigned long *jmp;
+ if (trampoline) + return NULL; + /* * We expect batch_length to be less than 256KiB for known users, * i.e. we need at most an 8KiB bitmap allocation which should be @@ -1417,14 +1408,16 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, unsigned long batch_offset, unsigned long batch_length, struct i915_vma *shadow, - bool trampoline) + unsigned long *jump_whitelist, + void *shadow_map, + const void *batch_map) { u32 *cmd, *batch_end, offset = 0; struct drm_i915_cmd_descriptor default_desc = noop_desc; const struct drm_i915_cmd_descriptor *desc = &default_desc; - unsigned long *jump_whitelist; u64 batch_addr, shadow_addr; int ret = 0; + bool trampoline = !jump_whitelist;
GEM_BUG_ON(!IS_ALIGNED(batch_offset, sizeof(*cmd))); GEM_BUG_ON(!IS_ALIGNED(batch_length, sizeof(*cmd))); @@ -1432,16 +1425,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, batch->size)); GEM_BUG_ON(!batch_length);
- cmd = copy_batch(shadow->obj, batch->obj, batch_offset, batch_length); - if (IS_ERR(cmd)) { - DRM_DEBUG("CMD: Failed to copy batch\n"); - return PTR_ERR(cmd); - } - - jump_whitelist = NULL; - if (!trampoline) - /* Defer failure until attempted use */ - jump_whitelist = alloc_whitelist(batch_length); + cmd = copy_batch(shadow->obj, batch->obj, batch_offset, batch_length, + shadow_map, batch_map);
shadow_addr = gen8_canonical_addr(shadow->node.start); batch_addr = gen8_canonical_addr(batch->node.start + batch_offset); @@ -1549,9 +1534,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine, drm_clflush_virt_range(ptr, (void *)(cmd + 1) - ptr); }
- if (!IS_ERR_OR_NULL(jump_whitelist)) - kfree(jump_whitelist); - i915_gem_object_unpin_map(shadow->obj); return ret; }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0f7bf6831633..84182a40e777 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1951,12 +1951,17 @@ const char *i915_cache_level_str(struct drm_i915_private *i915, int type); int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv); void intel_engine_init_cmd_parser(struct intel_engine_cs *engine); void intel_engine_cleanup_cmd_parser(struct intel_engine_cs *engine); +unsigned long *intel_engine_cmd_parser_alloc_jump_whitelist(u32 batch_length, + bool trampoline); + int intel_engine_cmd_parser(struct intel_engine_cs *engine, struct i915_vma *batch, unsigned long batch_offset, unsigned long batch_length, struct i915_vma *shadow, - bool trampoline); + unsigned long *jump_whitelist, + void *shadow_map, + const void *batch_map); #define I915_CMD_PARSER_TRAMPOLINE_SIZE 8
/* intel_device_info.c */ diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c index 7b3b83bd5ab8..1b021a4902de 100644 --- a/drivers/gpu/drm/i915/i915_memcpy.c +++ b/drivers/gpu/drm/i915/i915_memcpy.c @@ -135,7 +135,7 @@ bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len) * accepts that its arguments may not be aligned, but are valid for the * potential 16-byte read past the end. */ -void i915_unaligned_memcpy_from_wc(void *dst, void *src, unsigned long len) +void i915_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len) { unsigned long addr;
diff --git a/drivers/gpu/drm/i915/i915_memcpy.h b/drivers/gpu/drm/i915/i915_memcpy.h index e36d30edd987..3df063a3293b 100644 --- a/drivers/gpu/drm/i915/i915_memcpy.h +++ b/drivers/gpu/drm/i915/i915_memcpy.h @@ -13,7 +13,7 @@ struct drm_i915_private; void i915_memcpy_init_early(struct drm_i915_private *i915);
bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len); -void i915_unaligned_memcpy_from_wc(void *dst, void *src, unsigned long len); +void i915_unaligned_memcpy_from_wc(void *dst, const void *src, unsigned long len);
/* The movntdqa instructions used for memcpy-from-wc require 16-byte alignment, * as well as SSE4.1 support. i915_memcpy_from_wc() will report if it cannot
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
i915_vma_pin may fail with -EDEADLK when we start locking page tables, so ensure we handle this correctly.
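A minimal sketch of the caller pattern this enables, using the driver's ww helpers from earlier in this series (obj and vma are placeholders):

static int pin_with_backoff(struct drm_i915_gem_object *obj,
			    struct i915_vma *vma)
{
	struct i915_gem_ww_ctx ww;
	int err;

	i915_gem_ww_ctx_init(&ww, true); /* interruptible waits */
retry:
	err = i915_gem_object_lock(obj, &ww);
	if (!err)
		err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER);
	if (err == -EDEADLK) {
		/* Drop every held object lock, wait on the contended one. */
		err = i915_gem_ww_ctx_backoff(&ww);
		if (!err)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);
	return err;
}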
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 35 +++++++++++++------
 1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 60afa6f826d6..568c8321dc3d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -419,13 +419,14 @@ static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry, return pin_flags; }
-static inline bool +static inline int eb_pin_vma(struct i915_execbuffer *eb, const struct drm_i915_gem_exec_object2 *entry, struct eb_vma *ev) { struct i915_vma *vma = ev->vma; u64 pin_flags; + int err;
if (vma->node.size) pin_flags = vma->node.start; @@ -437,24 +438,29 @@ eb_pin_vma(struct i915_execbuffer *eb, pin_flags |= PIN_GLOBAL;
/* Attempt to reuse the current location if available */ - /* TODO: Add -EDEADLK handling here */ - if (unlikely(i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags))) { + err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags); + if (err == -EDEADLK) + return err; + + if (unlikely(err)) { if (entry->flags & EXEC_OBJECT_PINNED) - return false; + return err;
/* Failing that pick any _free_ space if suitable */ - if (unlikely(i915_vma_pin_ww(vma, &eb->ww, + err = i915_vma_pin_ww(vma, &eb->ww, entry->pad_to_size, entry->alignment, eb_pin_flags(entry, ev->flags) | - PIN_USER | PIN_NOEVICT))) - return false; + PIN_USER | PIN_NOEVICT); + if (unlikely(err)) + return err; }
if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) { - if (unlikely(i915_vma_pin_fence(vma))) { + err = i915_vma_pin_fence(vma); + if (unlikely(err)) { i915_vma_unpin(vma); - return false; + return err; }
if (vma->fence) @@ -462,7 +468,10 @@ eb_pin_vma(struct i915_execbuffer *eb, }
ev->flags |= __EXEC_OBJECT_HAS_PIN; - return !eb_vma_misplaced(entry, vma, ev->flags); + if (eb_vma_misplaced(entry, vma, ev->flags)) + return -EBADSLT; + + return 0; }
static inline void @@ -900,7 +909,11 @@ static int eb_validate_vmas(struct i915_execbuffer *eb) if (err) return err;
- if (eb_pin_vma(eb, entry, ev)) { + err = eb_pin_vma(eb, entry, ev); + if (err == -EDEADLK) + return err; + + if (!err) { if (entry->offset != vma->node.start) { entry->offset = vma->node.start | UPDATE; eb->args->flags |= __EXEC_HAS_RELOC;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Currently we have a lot of places that hold the gem object lock but haven't yet been converted to the ww dance. Complain loudly about those places.
i915_vma_pin shouldn't be called with the obj lock already held, so that we can still do a ww dance, while i915_vma_pin_ww should be.
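Sketched contract, assuming the helper names from the diff below; pin_contract_examples() is purely illustrative:

static int pin_contract_examples(struct i915_vma *vma,
				 struct i915_gem_ww_ctx *ww)
{
	int err;

	/* i915_vma_pin(): the object's dma-resv must NOT be held. */
	err = i915_vma_pin(vma, 0, 0, PIN_USER);
	if (err)
		return err;
	i915_vma_unpin(vma);

	/* i915_vma_pin_ww(): the lock MUST be held via the ww context. */
	err = i915_gem_object_lock(vma->obj, ww);
	if (!err)
		err = i915_vma_pin_ww(vma, ww, 0, 0, PIN_USER);
	return err;
}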
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_renderstate.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c    |  4 +-
 drivers/gpu/drm/i915/i915_vma.c             | 46 +++++++++++++++++++--
 drivers/gpu/drm/i915/i915_vma.h             |  5 +++
 4 files changed, 50 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.c b/drivers/gpu/drm/i915/gt/intel_renderstate.c index ea2a77c7b469..a68e5c23a67c 100644 --- a/drivers/gpu/drm/i915/gt/intel_renderstate.c +++ b/drivers/gpu/drm/i915/gt/intel_renderstate.c @@ -196,7 +196,7 @@ int intel_renderstate_init(struct intel_renderstate *so, if (err) goto err_context;
- err = i915_vma_pin(so->vma, 0, 0, PIN_GLOBAL | PIN_HIGH); + err = i915_vma_pin_ww(so->vma, &so->ww, 0, 0, PIN_GLOBAL | PIN_HIGH); if (err) goto err_context;
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 479eb5440bc6..b2d04717db20 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -46,7 +46,7 @@ static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp) goto out_unlock; }
- /* Pin early so we can call i915_ggtt_pin unlocked. */ + /* Pin early so we can call i915_ggtt_pin_unlocked(). */ hwsp->vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); if (IS_ERR(hwsp->vaddr)) { ret = PTR_ERR(hwsp->vaddr); @@ -514,7 +514,7 @@ __intel_timeline_get_seqno(struct intel_timeline *tl, goto err_rollback; }
- err = i915_ggtt_pin(vma, NULL, 0, PIN_HIGH); + err = i915_ggtt_pin_unlocked(vma, 0, PIN_HIGH); if (err) { __idle_hwsp_free(vma->private, cacheline); goto err_rollback; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 8e8c80ccbe32..e07621825da9 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -862,7 +862,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, unsigned int bound; int err;
- if (IS_ENABLED(CONFIG_PROVE_LOCKING) && debug_locks) { +#ifdef CONFIG_PROVE_LOCKING + if (debug_locks) { bool pinned_bind_wo_alloc = vma->obj && i915_gem_object_has_pinned_pages(vma->obj) && !vma->vm->allocate_va_range; @@ -870,7 +871,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, if (lockdep_is_held(&vma->vm->i915->drm.struct_mutex) && !pinned_bind_wo_alloc) WARN_ON(!ww); + if (ww && vma->resv) + assert_vma_held(vma); } +#endif
BUILD_BUG_ON(PIN_GLOBAL != I915_VMA_GLOBAL_BIND); BUILD_BUG_ON(PIN_USER != I915_VMA_LOCAL_BIND); @@ -1017,8 +1021,8 @@ static void flush_idle_contexts(struct intel_gt *gt) intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT); }
-int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, - u32 align, unsigned int flags) +static int __i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, + u32 align, unsigned int flags, bool unlocked) { struct i915_address_space *vm = vma->vm; int err; @@ -1026,7 +1030,10 @@ int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, GEM_BUG_ON(!i915_vma_is_ggtt(vma));
do { - err = i915_vma_pin_ww(vma, ww, 0, align, flags | PIN_GLOBAL); + if (ww || unlocked) + err = i915_vma_pin_ww(vma, ww, 0, align, flags | PIN_GLOBAL); + else + err = i915_vma_pin(vma, 0, align, flags | PIN_GLOBAL); if (err != -ENOSPC) { if (!err) { err = i915_vma_wait_for_bind(vma); @@ -1045,6 +1052,37 @@ int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, } while (1); }
+int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, + u32 align, unsigned int flags) +{ +#ifdef CONFIG_LOCKDEP + WARN_ON(!ww && vma->resv && dma_resv_held(vma->resv)); +#endif + + return __i915_ggtt_pin(vma, ww, align, flags, false); +} + +/** + * i915_ggtt_pin_unlocked - Pin a vma to ggtt without the underlying + * object's dma-resv held, but with object pages pinned. + * + * @vma: The vma to pin. + * @align: ggtt alignment. + * @flags: Pinning flags + * + * RETURN: Zero on success, negative error code on error. + * + * This function relies on the fact that object pages are already pinned, + * and that ggtt pinning doesn't require any page table page allocations + * to pin a vma without dma_resv lock and ww acquire context. + */ +int i915_ggtt_pin_unlocked(struct i915_vma *vma, u32 align, unsigned int flags) +{ + if (IS_ENABLED(CONFIG_LOCKDEP)) + WARN_ON(vma->obj && !i915_gem_object_has_pinned_pages(vma->obj)); + return __i915_ggtt_pin(vma, NULL, align, flags, true); +} + static void __vma_close(struct i915_vma *vma, struct intel_gt *gt) { /* diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 5b3a3c653454..22387a361999 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -243,12 +243,17 @@ i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, static inline int __must_check i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { +#ifdef CONFIG_LOCKDEP + WARN_ON_ONCE(vma->resv && dma_resv_held(vma->resv)); +#endif return i915_vma_pin_ww(vma, NULL, size, alignment, flags); }
int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, u32 align, unsigned int flags);
+int i915_ggtt_pin_unlocked(struct i915_vma *vma, u32 align, unsigned int flags); + static inline int i915_vma_pin_count(const struct i915_vma *vma) { return atomic_read(&vma->flags) & I915_VMA_PIN_MASK;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
This doesn't need the full ww lock, as we are only checking whether pages are bound.
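A small sketch of the distinction, assuming the locking helpers used in the diff below; passing a NULL ww context takes a standalone lock with none of the backoff/retry machinery:

static int lock_without_ww(struct drm_i915_gem_object *obj)
{
	int err;

	/* NULL ww context: a one-off lock, no full ww transaction. */
	err = i915_gem_object_lock_interruptible(obj, NULL);
	if (err)
		return err;

	/* ... inspect obj->mm state under the lock ... */

	i915_gem_object_unlock(obj);
	return 0;
}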
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 58276694c848..b03e245640c0 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1051,10 +1051,14 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, if (!obj) return -ENOENT;
- err = mutex_lock_interruptible(&obj->mm.lock); + err = i915_gem_object_lock_interruptible(obj, NULL); if (err) goto out;
+ err = mutex_lock_interruptible(&obj->mm.lock); + if (err) + goto out_ww; + if (i915_gem_object_has_pages(obj) && i915_gem_object_is_tiled(obj) && i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) { @@ -1099,6 +1103,8 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, args->retained = obj->mm.madv != __I915_MADV_PURGED; mutex_unlock(&obj->mm.lock);
+out_ww: + i915_gem_object_unlock(obj); out: i915_gem_object_put(obj); return err;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
We want to stop changing the ops structure when attaching phys pages, so we need to kill off HAS_STRUCT_PAGE in ops->flags and move it into the bo.
This removes a potential race where we dereference the wrong obj->ops without the ww mutex held.
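Roughly, the check changes shape as sketched below; the _old/_new helper names are hypothetical, the bodies mirror the diff:

static inline bool has_struct_page_old(const struct drm_i915_gem_object *obj)
{
	/* Racy: reads obj->ops, which attach_phys may be swapping. */
	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_STRUCT_PAGE);
}

static inline bool has_struct_page_new(const struct drm_i915_gem_object *obj)
{
	/* Safe: a per-object flag set once at init time. */
	return obj->flags & I915_BO_ALLOC_STRUCT_PAGE;
}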
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c           |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_internal.c         |  6 +++---
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c             |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_mman.c             |  7 +++----
 drivers/gpu/drm/i915/gem/i915_gem_object.c           |  4 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h           |  5 +++--
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h     |  8 +++++---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c            |  5 ++---
 drivers/gpu/drm/i915/gem/i915_gem_phys.c             |  2 ++
 drivers/gpu/drm/i915/gem/i915_gem_region.c           |  4 +---
 drivers/gpu/drm/i915/gem/i915_gem_region.h           |  3 +--
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c            |  8 ++++----
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c           |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c          |  6 +++---
 drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c |  4 ++--
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c      | 10 +++++-----
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c   | 11 ++++-------
 drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c   | 12 ++++++++++++
 drivers/gpu/drm/i915/gvt/dmabuf.c                    |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c        |  2 +-
 drivers/gpu/drm/i915/selftests/mock_region.c         |  4 ++--
 21 files changed, 62 insertions(+), 51 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 04e9c04545ad..36e3c2765f4c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -258,7 +258,7 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev, }
drm_gem_private_object_init(dev, &obj->base, dma_buf->size); - i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops, &lock_class); + i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops, &lock_class, 0); obj->base.import_attach = attach; obj->base.resv = dma_buf->resv;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index ad22f42541bd..21cc40897ca8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -138,8 +138,7 @@ static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj,
static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = { .name = "i915_gem_object_internal", - .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE | - I915_GEM_OBJECT_IS_SHRINKABLE, + .flags = I915_GEM_OBJECT_IS_SHRINKABLE, .get_pages = i915_gem_object_get_pages_internal, .put_pages = i915_gem_object_put_pages_internal, }; @@ -178,7 +177,8 @@ i915_gem_object_create_internal(struct drm_i915_private *i915, return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, size); - i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class); + i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class, + I915_BO_ALLOC_STRUCT_PAGE);
/* * Mark the object as volatile, such that the pages are marked as diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 932ee21e6609..e953965f8263 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -45,13 +45,13 @@ __i915_gem_lmem_object_create(struct intel_memory_region *mem, return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, size); - i915_gem_object_init(obj, &i915_gem_lmem_obj_ops, &lock_class); + i915_gem_object_init(obj, &i915_gem_lmem_obj_ops, &lock_class, flags);
obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
- i915_gem_object_init_memory_region(obj, mem, flags); + i915_gem_object_init_memory_region(obj, mem);
return obj; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index ec28a6cde49b..c0034d811e50 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -251,7 +251,7 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf) goto out;
iomap = -1; - if (!i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_STRUCT_PAGE)) { + if (!i915_gem_object_has_struct_page(obj)) { iomap = obj->mm.region->iomap.base; iomap -= obj->mm.region->region.start; } @@ -653,9 +653,8 @@ __assign_mmap_offset(struct drm_file *file, }
if (mmap_type != I915_MMAP_TYPE_GTT && - !i915_gem_object_type_has(obj, - I915_GEM_OBJECT_HAS_STRUCT_PAGE | - I915_GEM_OBJECT_HAS_IOMEM)) { + !i915_gem_object_has_struct_page(obj) && + !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM)) { err = -ENODEV; goto out; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 00d24000b5e8..1393988bd5af 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -60,7 +60,7 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj)
void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops, - struct lock_class_key *key) + struct lock_class_key *key, unsigned flags) { __mutex_init(&obj->mm.lock, ops->name ?: "obj->mm.lock", key);
@@ -78,6 +78,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, init_rcu_head(&obj->rcu);
obj->ops = ops; + GEM_BUG_ON(flags & ~I915_BO_ALLOC_FLAGS); + obj->flags = flags;
obj->mm.madv = I915_MADV_WILLNEED; INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 80c5b2b326f5..16608bf7a4e9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -23,7 +23,8 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj);
void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops, - struct lock_class_key *key); + struct lock_class_key *key, + unsigned alloc_flags); struct drm_i915_gem_object * i915_gem_object_create_shmem(struct drm_i915_private *i915, resource_size_t size); @@ -213,7 +214,7 @@ i915_gem_object_type_has(const struct drm_i915_gem_object *obj, static inline bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj) { - return i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_STRUCT_PAGE); + return obj->flags & I915_BO_ALLOC_STRUCT_PAGE; }
static inline bool diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index e2d9b7e1e152..b53e44b06b09 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -30,7 +30,6 @@ struct i915_lut_handle {
struct drm_i915_gem_object_ops { unsigned int flags; -#define I915_GEM_OBJECT_HAS_STRUCT_PAGE BIT(0) #define I915_GEM_OBJECT_HAS_IOMEM BIT(1) #define I915_GEM_OBJECT_IS_SHRINKABLE BIT(2) #define I915_GEM_OBJECT_IS_PROXY BIT(3) @@ -165,8 +164,11 @@ struct drm_i915_gem_object { unsigned long flags; #define I915_BO_ALLOC_CONTIGUOUS BIT(0) #define I915_BO_ALLOC_VOLATILE BIT(1) -#define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | I915_BO_ALLOC_VOLATILE) -#define I915_BO_READONLY BIT(2) +#define I915_BO_ALLOC_STRUCT_PAGE BIT(2) +#define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \ + I915_BO_ALLOC_VOLATILE | \ + I915_BO_ALLOC_STRUCT_PAGE) +#define I915_BO_READONLY BIT(3)
/* * Is the object to be mapped as read-only to the GPU diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index ca076203f5e9..7983423237e3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -328,13 +328,12 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, enum i915_map_type type) { enum i915_map_type has_type; - unsigned int flags; bool pinned; void *ptr; int err;
- flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE | I915_GEM_OBJECT_HAS_IOMEM; - if (!i915_gem_object_type_has(obj, flags)) + if (!i915_gem_object_has_struct_page(obj) && + !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM)) return ERR_PTR(-ENXIO);
err = mutex_lock_interruptible_nested(&obj->mm.lock, I915_MM_GET_PAGES); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 3a4dfe2ef1da..965590d3a570 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -240,6 +240,7 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) pages = __i915_gem_object_unset_pages(obj);
obj->ops = &i915_gem_phys_ops; + obj->flags &= ~I915_BO_ALLOC_STRUCT_PAGE;
err = ____i915_gem_object_get_pages(obj); if (err) @@ -258,6 +259,7 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align)
err_xfer: obj->ops = &i915_gem_shmem_ops; + obj->flags |= I915_BO_ALLOC_STRUCT_PAGE; if (!IS_ERR_OR_NULL(pages)) { unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 1515384d7e0e..6a96741253b3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -102,13 +102,11 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) }
void i915_gem_object_init_memory_region(struct drm_i915_gem_object *obj, - struct intel_memory_region *mem, - unsigned long flags) + struct intel_memory_region *mem) { INIT_LIST_HEAD(&obj->mm.blocks); obj->mm.region = intel_memory_region_get(mem);
- obj->flags |= flags; if (obj->base.size <= mem->min_page_size) obj->flags |= I915_BO_ALLOC_CONTIGUOUS;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.h b/drivers/gpu/drm/i915/gem/i915_gem_region.h index f2ff6f8bff74..ebddc86d78f7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.h @@ -17,8 +17,7 @@ void i915_gem_object_put_pages_buddy(struct drm_i915_gem_object *obj, struct sg_table *pages);
void i915_gem_object_init_memory_region(struct drm_i915_gem_object *obj, - struct intel_memory_region *mem, - unsigned long flags); + struct intel_memory_region *mem); void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 75e8b71c18b9..31c617a1115f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -430,8 +430,7 @@ static void shmem_release(struct drm_i915_gem_object *obj)
const struct drm_i915_gem_object_ops i915_gem_shmem_ops = { .name = "i915_gem_object_shmem", - .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE | - I915_GEM_OBJECT_IS_SHRINKABLE, + .flags = I915_GEM_OBJECT_IS_SHRINKABLE,
.get_pages = shmem_get_pages, .put_pages = shmem_put_pages, @@ -496,7 +495,8 @@ create_shmem(struct intel_memory_region *mem, mapping_set_gfp_mask(mapping, mask); GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
- i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class); + i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class, + I915_BO_ALLOC_STRUCT_PAGE);
obj->write_domain = I915_GEM_DOMAIN_CPU; obj->read_domains = I915_GEM_DOMAIN_CPU; @@ -520,7 +520,7 @@ create_shmem(struct intel_memory_region *mem,
i915_gem_object_set_cache_coherency(obj, cache_level);
- i915_gem_object_init_memory_region(obj, mem, 0); + i915_gem_object_init_memory_region(obj, mem);
return obj;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 29bffc6afcc1..5372b888ba01 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -636,7 +636,7 @@ __i915_gem_object_create_stolen(struct intel_memory_region *mem, goto err;
drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size); - i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class); + i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0);
obj->stolen = stolen; obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT; @@ -647,7 +647,7 @@ __i915_gem_object_create_stolen(struct intel_memory_region *mem, if (err) goto cleanup;
- i915_gem_object_init_memory_region(obj, mem, 0); + i915_gem_object_init_memory_region(obj, mem);
return obj;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index f2eaed6aca3d..30edc5a0a54e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -702,8 +702,7 @@ i915_gem_userptr_dmabuf_export(struct drm_i915_gem_object *obj)
static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { .name = "i915_gem_object_userptr", - .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE | - I915_GEM_OBJECT_IS_SHRINKABLE | + .flags = I915_GEM_OBJECT_IS_SHRINKABLE | I915_GEM_OBJECT_NO_MMAP | I915_GEM_OBJECT_ASYNC_CANCEL, .get_pages = i915_gem_userptr_get_pages, @@ -810,7 +809,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENOMEM;
drm_gem_private_object_init(dev, &obj->base, args->user_size); - i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class); + i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class, + I915_BO_ALLOC_STRUCT_PAGE); obj->read_domains = I915_GEM_DOMAIN_CPU; obj->write_domain = I915_GEM_DOMAIN_CPU; i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC); diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c index a768ec61e966..dfad86d74dd0 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c @@ -89,7 +89,6 @@ static void huge_put_pages(struct drm_i915_gem_object *obj,
static const struct drm_i915_gem_object_ops huge_ops = { .name = "huge-gem", - .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE, .get_pages = huge_get_pages, .put_pages = huge_put_pages, }; @@ -115,7 +114,8 @@ huge_gem_object(struct drm_i915_private *i915, return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, dma_size); - i915_gem_object_init(obj, &huge_ops, &lock_class); + i915_gem_object_init(obj, &huge_ops, &lock_class, + I915_BO_ALLOC_STRUCT_PAGE);
obj->read_domains = I915_GEM_DOMAIN_CPU; obj->write_domain = I915_GEM_DOMAIN_CPU; diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 77a13527a7e6..709c63b9cfc4 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -140,8 +140,7 @@ static void put_huge_pages(struct drm_i915_gem_object *obj,
static const struct drm_i915_gem_object_ops huge_page_ops = { .name = "huge-gem", - .flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE | - I915_GEM_OBJECT_IS_SHRINKABLE, + .flags = I915_GEM_OBJECT_IS_SHRINKABLE, .get_pages = get_huge_pages, .put_pages = put_huge_pages, }; @@ -168,7 +167,8 @@ huge_pages_object(struct drm_i915_private *i915, return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, size); - i915_gem_object_init(obj, &huge_page_ops, &lock_class); + i915_gem_object_init(obj, &huge_page_ops, &lock_class, + I915_BO_ALLOC_STRUCT_PAGE);
i915_gem_object_set_volatile(obj);
@@ -319,9 +319,9 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single) drm_gem_private_object_init(&i915->drm, &obj->base, size);
if (single) - i915_gem_object_init(obj, &fake_ops_single, &lock_class); + i915_gem_object_init(obj, &fake_ops_single, &lock_class, 0); else - i915_gem_object_init(obj, &fake_ops, &lock_class); + i915_gem_object_init(obj, &fake_ops, &lock_class, 0);
i915_gem_object_set_volatile(obj);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index d27d87a678c8..3ac7628f3bc4 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -834,9 +834,8 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type) return false;
if (type != I915_MMAP_TYPE_GTT && - !i915_gem_object_type_has(obj, - I915_GEM_OBJECT_HAS_STRUCT_PAGE | - I915_GEM_OBJECT_HAS_IOMEM)) + !i915_gem_object_has_struct_page(obj) && + !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM)) return false;
return true; @@ -976,10 +975,8 @@ static const char *repr_mmap_type(enum i915_mmap_type type)
static bool can_access(const struct drm_i915_gem_object *obj) { - unsigned int flags = - I915_GEM_OBJECT_HAS_STRUCT_PAGE | I915_GEM_OBJECT_HAS_IOMEM; - - return i915_gem_object_type_has(obj, flags); + return i915_gem_object_has_struct_page(obj) || + i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM); }
static int __igt_mmap_access(struct drm_i915_private *i915, diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c index 8cee68c6a6dc..fb6a17701310 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c @@ -25,12 +25,24 @@ static int mock_phys_object(void *arg) goto out; }
+ if (!i915_gem_object_has_struct_page(obj)) { + err = -EINVAL; + pr_err("shmem has no struct page\n"); + goto out_obj; + } + err = i915_gem_object_attach_phys(obj, PAGE_SIZE); if (err) { pr_err("i915_gem_object_attach_phys failed, err=%d\n", err); goto out_obj; }
+ if (i915_gem_object_has_struct_page(obj)) { + err = -EINVAL; + pr_err("shmem has a struct page\n"); + goto out_obj; + } + if (obj->ops != &i915_gem_phys_ops) { pr_err("i915_gem_object_attach_phys did not create a phys object\n"); err = -EINVAL; diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c index c3eb3838fe88..d4f883f35b95 100644 --- a/drivers/gpu/drm/i915/gvt/dmabuf.c +++ b/drivers/gpu/drm/i915/gvt/dmabuf.c @@ -218,7 +218,7 @@ static struct drm_i915_gem_object *vgpu_create_gem(struct drm_device *dev,
drm_gem_private_object_init(dev, &obj->base, roundup(info->size, PAGE_SIZE)); - i915_gem_object_init(obj, &intel_vgpu_gem_ops, &lock_class); + i915_gem_object_init(obj, &intel_vgpu_gem_ops, &lock_class, 0); i915_gem_object_set_readonly(obj);
obj->read_domains = I915_GEM_DOMAIN_GTT; diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index c53a222e3dec..2cfe99c79034 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -120,7 +120,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size) goto err;
drm_gem_private_object_init(&i915->drm, &obj->base, size); - i915_gem_object_init(obj, &fake_ops, &lock_class); + i915_gem_object_init(obj, &fake_ops, &lock_class, 0);
i915_gem_object_set_volatile(obj);
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c index 979d96f27c43..b046bd1a9ad3 100644 --- a/drivers/gpu/drm/i915/selftests/mock_region.c +++ b/drivers/gpu/drm/i915/selftests/mock_region.c @@ -32,13 +32,13 @@ mock_object_create(struct intel_memory_region *mem, return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, size); - i915_gem_object_init(obj, &mock_region_obj_ops, &lock_class); + i915_gem_object_init(obj, &mock_region_obj_ops, &lock_class, flags);
obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
- i915_gem_object_init_memory_region(obj, mem, flags); + i915_gem_object_init_memory_region(obj, mem);
return obj; }
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Instead of creating a separate object type, we modify the shmem type to clear the struct page backing. This allows us to ensure we never run into a race when exchanging obj->ops for other function pointers.
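Condensed sketch of the resulting dispatch, using names from the diffs in this and the following patch; the shmem vfuncs stay installed and branch on the backing type:

static void shmem_put_pages_sketch(struct drm_i915_gem_object *obj,
				   struct sg_table *pages)
{
	if (i915_gem_object_has_struct_page(obj))
		i915_gem_object_put_pages_shmem(obj, pages); /* page backed */
	else
		i915_gem_object_put_pages_phys(obj, pages);  /* phys coherent */
}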
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |   8 ++
 drivers/gpu/drm/i915/gem/i915_gem_phys.c   | 102 +++++++++---------
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c  |  22 +++-
 .../drm/i915/gem/selftests/i915_gem_phys.c |   6 --
 4 files changed, 78 insertions(+), 60 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 16608bf7a4e9..e549b88693a2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -37,7 +37,15 @@ void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages, bool needs_clflush);
+int i915_gem_object_pwrite_phys(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *args); +int i915_gem_object_pread_phys(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *args); + int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align); +void i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj, + struct sg_table *pages); +
void i915_gem_flush_free_objects(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 965590d3a570..4bdd0429c08b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -76,6 +76,8 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
intel_gt_chipset_flush(&to_i915(obj->base.dev)->gt);
+ /* We're no longer struct page backed */ + obj->flags &= ~I915_BO_ALLOC_STRUCT_PAGE; __i915_gem_object_set_pages(obj, st, sg->length);
return 0; @@ -89,7 +91,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj) return -ENOMEM; }
-static void +void i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj, struct sg_table *pages) { @@ -134,9 +136,8 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj, vaddr, dma); }
-static int -phys_pwrite(struct drm_i915_gem_object *obj, - const struct drm_i915_gem_pwrite *args) +int i915_gem_object_pwrite_phys(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *args) { void *vaddr = sg_page(obj->mm.pages->sgl) + args->offset; char __user *user_data = u64_to_user_ptr(args->data_ptr); @@ -165,9 +166,8 @@ phys_pwrite(struct drm_i915_gem_object *obj, return 0; }
-static int -phys_pread(struct drm_i915_gem_object *obj, - const struct drm_i915_gem_pread *args) +int i915_gem_object_pread_phys(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *args) { void *vaddr = sg_page(obj->mm.pages->sgl) + args->offset; char __user *user_data = u64_to_user_ptr(args->data_ptr); @@ -186,86 +186,82 @@ phys_pread(struct drm_i915_gem_object *obj, return 0; }
-static void phys_release(struct drm_i915_gem_object *obj) +static int i915_gem_object_shmem_to_phys(struct drm_i915_gem_object *obj) { - fput(obj->base.filp); -} + struct sg_table *pages; + int err;
-static const struct drm_i915_gem_object_ops i915_gem_phys_ops = { - .name = "i915_gem_object_phys", - .get_pages = i915_gem_object_get_pages_phys, - .put_pages = i915_gem_object_put_pages_phys, + pages = __i915_gem_object_unset_pages(obj); + + err = i915_gem_object_get_pages_phys(obj); + if (err) + goto err_xfer;
- .pread = phys_pread, - .pwrite = phys_pwrite, + /* Perma-pin (until release) the physical set of pages */ + __i915_gem_object_pin_pages(obj);
- .release = phys_release, -}; + if (!IS_ERR_OR_NULL(pages)) + i915_gem_shmem_ops.put_pages(obj, pages); + + i915_gem_object_release_memory_region(obj); + return 0; + +err_xfer: + if (!IS_ERR_OR_NULL(pages)) { + unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl); + + __i915_gem_object_set_pages(obj, pages, sg_page_sizes); + } + return err; +}
int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) { - struct sg_table *pages; int err;
if (align > obj->base.size) return -EINVAL;
- if (obj->ops == &i915_gem_phys_ops) - return 0; - if (obj->ops != &i915_gem_shmem_ops) return -EINVAL;
+ if (!i915_gem_object_has_struct_page(obj)) + return 0; + err = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); if (err) return err;
mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES);
+ if (unlikely(!i915_gem_object_has_struct_page(obj))) + goto out; + if (obj->mm.madv != I915_MADV_WILLNEED) { err = -EFAULT; - goto err_unlock; + goto out; }
if (obj->mm.quirked) { err = -EFAULT; - goto err_unlock; + goto out; }
- if (obj->mm.mapping) { + if (obj->mm.mapping || i915_gem_object_has_pinned_pages(obj)) { err = -EBUSY; - goto err_unlock; + goto out; }
- pages = __i915_gem_object_unset_pages(obj); - - obj->ops = &i915_gem_phys_ops; - obj->flags &= ~I915_BO_ALLOC_STRUCT_PAGE; - - err = ____i915_gem_object_get_pages(obj); - if (err) - goto err_xfer; - - /* Perma-pin (until release) the physical set of pages */ - __i915_gem_object_pin_pages(obj); - - if (!IS_ERR_OR_NULL(pages)) - i915_gem_shmem_ops.put_pages(obj, pages); - - i915_gem_object_release_memory_region(obj); - - mutex_unlock(&obj->mm.lock); - return 0; + if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) { + drm_dbg(obj->base.dev, + "Attempting to obtain a purgeable object\n"); + err = -EFAULT; + goto out; + }
-err_xfer: - obj->ops = &i915_gem_shmem_ops; - obj->flags |= I915_BO_ALLOC_STRUCT_PAGE; - if (!IS_ERR_OR_NULL(pages)) { - unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl); + err = i915_gem_object_shmem_to_phys(obj);
- __i915_gem_object_set_pages(obj, pages, sg_page_sizes); - } -err_unlock: +out: mutex_unlock(&obj->mm.lock); return err; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 31c617a1115f..d590e0c3bd00 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -303,6 +303,11 @@ shmem_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages) struct pagevec pvec; struct page *page;
+ if (unlikely(!i915_gem_object_has_struct_page(obj))) { + i915_gem_object_put_pages_phys(obj, pages); + return; + } + __i915_gem_object_release_shmem(obj, pages, true);
i915_gem_gtt_finish_pages(obj, pages); @@ -343,6 +348,9 @@ shmem_pwrite(struct drm_i915_gem_object *obj, /* Caller already validated user args */ GEM_BUG_ON(!access_ok(user_data, arg->size));
+ if (!i915_gem_object_has_struct_page(obj)) + return i915_gem_object_pwrite_phys(obj, arg); + /* * Before we instantiate/pin the backing store for our use, we * can prepopulate the shmemfs filp efficiently using a write into @@ -421,9 +429,20 @@ shmem_pwrite(struct drm_i915_gem_object *obj, return 0; }
+static int +shmem_pread(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *arg) +{ + if (!i915_gem_object_has_struct_page(obj)) + return i915_gem_object_pread_phys(obj, arg); + + return -ENODEV; +} + static void shmem_release(struct drm_i915_gem_object *obj) { - i915_gem_object_release_memory_region(obj); + if (obj->flags & I915_BO_ALLOC_STRUCT_PAGE) + i915_gem_object_release_memory_region(obj);
fput(obj->base.filp); } @@ -438,6 +457,7 @@ const struct drm_i915_gem_object_ops i915_gem_shmem_ops = { .writeback = shmem_writeback,
.pwrite = shmem_pwrite, + .pread = shmem_pread,
.release = shmem_release, }; diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c index fb6a17701310..0cfa082047fe 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c @@ -38,12 +38,6 @@ static int mock_phys_object(void *arg) }
if (i915_gem_object_has_struct_page(obj)) { - err = -EINVAL; - pr_err("shmem has a struct page\n"); - goto out_obj; - } - - if (obj->ops != &i915_gem_phys_ops) { pr_err("i915_gem_object_attach_phys did not create a phys object\n"); err = -EINVAL; goto out_obj;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Simply add i915_gem_object_lock. We may start passing ww to get_pages() in the future, but that won't be the case here: we override shmem's get_pages() handling by calling i915_gem_object_get_pages_phys(), so no ww context is needed.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
 drivers/gpu/drm/i915/gem/i915_gem_phys.c   | 12 ++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c  | 17 ++++++++++-------
 3 files changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index e549b88693a2..47da3aff2a79 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -43,6 +43,8 @@ int i915_gem_object_pread_phys(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pread *args);
int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align); +void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, + struct sg_table *pages); void i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj, struct sg_table *pages);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 4bdd0429c08b..144e4940eede 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -201,7 +201,7 @@ static int i915_gem_object_shmem_to_phys(struct drm_i915_gem_object *obj) __i915_gem_object_pin_pages(obj);
if (!IS_ERR_OR_NULL(pages)) - i915_gem_shmem_ops.put_pages(obj, pages); + i915_gem_object_put_pages_shmem(obj, pages);
i915_gem_object_release_memory_region(obj); return 0; @@ -232,7 +232,13 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) if (err) return err;
- mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES); + err = i915_gem_object_lock_interruptible(obj, NULL); + if (err) + return err; + + err = mutex_lock_interruptible_nested(&obj->mm.lock, I915_MM_GET_PAGES); + if (err) + goto err_unlock;
if (unlikely(!i915_gem_object_has_struct_page(obj))) goto out; @@ -263,6 +269,8 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align)
out: mutex_unlock(&obj->mm.lock); +err_unlock: + i915_gem_object_unlock(obj); return err; }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index d590e0c3bd00..7a59fd1ea4e5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -296,18 +296,12 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, __start_cpu_write(obj); }
-static void -shmem_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages) +void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages) { struct sgt_iter sgt_iter; struct pagevec pvec; struct page *page;
- if (unlikely(!i915_gem_object_has_struct_page(obj))) { - i915_gem_object_put_pages_phys(obj, pages); - return; - } - __i915_gem_object_release_shmem(obj, pages, true);
i915_gem_gtt_finish_pages(obj, pages); @@ -336,6 +330,15 @@ shmem_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages) kfree(pages); }
+static void +shmem_put_pages(struct drm_i915_gem_object *obj, struct sg_table *pages) +{ + if (likely(i915_gem_object_has_struct_page(obj))) + i915_gem_object_put_pages_shmem(obj, pages); + else + i915_gem_object_put_pages_phys(obj, pages); +} + static int shmem_pwrite(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pwrite *arg)
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
As soon as we install fences, we should stop allocating memory in order to prevent any potential deadlocks.
This is required later on, when we start adding support for dma-fence annotations.
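A condensed sketch of the resulting rule, with names taken from the diff below: reserve dma-resv slots while allocating is still legal, then tell move_to_active not to reserve again from the signalling path:

static int move_to_active_no_alloc(struct i915_vma *vma,
				   struct i915_request *rq,
				   unsigned int flags)
{
	int err;

	/* Reserve the shared-fence slot while we may still allocate. */
	err = dma_resv_reserve_shared(vma->resv, 1);
	if (err)
		return err;

	/* The later fence install can no longer fail or allocate. */
	return i915_vma_move_to_active(vma, rq,
				       flags | __EXEC_OBJECT_NO_RESERVE);
}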
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 24 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_active.c         | 20 ++++++++--------
 drivers/gpu/drm/i915/i915_vma.c            |  8 ++++---
 drivers/gpu/drm/i915/i915_vma.h            |  3 +++
 4 files changed, 36 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 568c8321dc3d..31e412e5c68a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -49,11 +49,12 @@ enum { #define DBG_FORCE_RELOC 0 /* choose one of the above! */ };
-#define __EXEC_OBJECT_HAS_PIN BIT(31) -#define __EXEC_OBJECT_HAS_FENCE BIT(30) -#define __EXEC_OBJECT_NEEDS_MAP BIT(29) -#define __EXEC_OBJECT_NEEDS_BIAS BIT(28) -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */ +/* __EXEC_OBJECT_NO_RESERVE is BIT(31), defined in i915_vma.h */ +#define __EXEC_OBJECT_HAS_PIN BIT(30) +#define __EXEC_OBJECT_HAS_FENCE BIT(29) +#define __EXEC_OBJECT_NEEDS_MAP BIT(28) +#define __EXEC_OBJECT_NEEDS_BIAS BIT(27) +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above + */ #define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
#define __EXEC_HAS_RELOC BIT(31) @@ -929,6 +930,12 @@ static int eb_validate_vmas(struct i915_execbuffer *eb) } }
+ if (!(ev->flags & EXEC_OBJECT_WRITE)) { + err = dma_resv_reserve_shared(vma->resv, 1); + if (err) + return err; + } + GEM_BUG_ON(drm_mm_node_allocated(&vma->node) && eb_vma_misplaced(&eb->exec[i], vma, ev->flags)); } @@ -2194,7 +2201,8 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb) }
if (err == 0) - err = i915_vma_move_to_active(vma, eb->request, flags); + err = i915_vma_move_to_active(vma, eb->request, + flags | __EXEC_OBJECT_NO_RESERVE); }
if (unlikely(err)) @@ -2446,6 +2454,10 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, if (err) goto err_commit;
+ err = dma_resv_reserve_shared(shadow->resv, 1); + if (err) + goto err_commit; + /* Wait for all writes (and relocs) into the batch to complete */ err = i915_sw_fence_await_reservation(&pw->base.chain, pw->batch->resv, NULL, false, diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 10a865f3dc09..6ba4f878ab0e 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -296,18 +296,13 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx) static struct i915_active_fence * active_instance(struct i915_active *ref, u64 idx) { - struct active_node *node, *prealloc; + struct active_node *node; struct rb_node **p, *parent;
node = __active_lookup(ref, idx); if (likely(node)) return &node->base;
- /* Preallocate a replacement, just in case */ - prealloc = kmem_cache_alloc(global.slab_cache, GFP_KERNEL); - if (!prealloc) - return NULL; - spin_lock_irq(&ref->tree_lock); GEM_BUG_ON(i915_active_is_idle(ref));
@@ -317,10 +312,8 @@ active_instance(struct i915_active *ref, u64 idx) parent = *p;
node = rb_entry(parent, struct active_node, node); - if (node->timeline == idx) { - kmem_cache_free(global.slab_cache, prealloc); + if (node->timeline == idx) goto out; - }
if (node->timeline < idx) p = &parent->rb_right; @@ -328,7 +321,14 @@ active_instance(struct i915_active *ref, u64 idx) p = &parent->rb_left; }
- node = prealloc; + /* + * XXX: We should preallocate this before i915_active_ref() is ever + * called, but we cannot call into fs_reclaim() anyway, so use GFP_ATOMIC. + */ + node = kmem_cache_alloc(global.slab_cache, GFP_ATOMIC); + if (!node) + goto out; + __i915_active_fence_init(&node->base, NULL, node_retire); node->ref = ref; node->timeline = idx; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index e07621825da9..5b1d78fa748e 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1281,9 +1281,11 @@ int i915_vma_move_to_active(struct i915_vma *vma, obj->write_domain = I915_GEM_DOMAIN_RENDER; obj->read_domains = 0; } else { - err = dma_resv_reserve_shared(vma->resv, 1); - if (unlikely(err)) - return err; + if (!(flags & __EXEC_OBJECT_NO_RESERVE)) { + err = dma_resv_reserve_shared(vma->resv, 1); + if (unlikely(err)) + return err; + }
dma_resv_add_shared_fence(vma->resv, &rq->fence); obj->write_domain = 0; diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 22387a361999..a2e7b58b70ca 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -52,6 +52,9 @@ static inline bool i915_vma_is_active(const struct i915_vma *vma) return !i915_active_is_idle(&vma->active); }
+/* do not reserve memory to prevent deadlocks */ +#define __EXEC_OBJECT_NO_RESERVE BIT(31) + int __must_check __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq); int __must_check i915_vma_move_to_active(struct i915_vma *vma,
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Userptr should not need the kernel for a userspace memcpy; userspace should call memcpy directly instead.
Specifically, disable i915_gem_pwrite_ioctl() and i915_gem_pread_ioctl() for userptr objects.
Still needs an ack from relevant userspace that it won't break, but should be good.
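As a hedged userspace sketch of the replacement (the upload() helper and its arguments are illustrative; the key point is that a userptr object wraps the application's own allocation, so the bytes are reachable with a plain memcpy):

#include <string.h>

/*
 * Illustrative only: where userspace previously issued
 * DRM_IOCTL_I915_GEM_PWRITE on a userptr handle, it can now write the
 * backing memory it supplied to the kernel directly.
 */
static void upload(void *userptr_base, size_t offset,
		   const void *data, size_t len)
{
	memcpy((char *)userptr_base + offset, data, len);
}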
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 30edc5a0a54e..8c3d1eb2f96a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -700,6 +700,24 @@ i915_gem_userptr_dmabuf_export(struct drm_i915_gem_object *obj) return i915_gem_userptr_init__mmu_notifier(obj, 0); }
+static int +i915_gem_userptr_pwrite(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *args) +{ + drm_dbg(obj->base.dev, "pwrite to userptr no longer allowed\n"); + + return -EINVAL; +} + +static int +i915_gem_userptr_pread(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *args) +{ + drm_dbg(obj->base.dev, "pread from userptr no longer allowed\n"); + + return -EINVAL; +} + static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { .name = "i915_gem_object_userptr", .flags = I915_GEM_OBJECT_IS_SHRINKABLE | @@ -708,6 +726,8 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { .get_pages = i915_gem_userptr_get_pages, .put_pages = i915_gem_userptr_put_pages, .dmabuf_export = i915_gem_userptr_dmabuf_export, + .pwrite = i915_gem_userptr_pwrite, + .pread = i915_gem_userptr_pread, .release = i915_gem_userptr_release, };
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
It doesn't make sense to export a memory address; we will prevent access to other address spaces this way when we rework userptr handling, so it's best to explicitly disable it now.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 8c3d1eb2f96a..44af6265948d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -694,10 +694,9 @@ i915_gem_userptr_release(struct drm_i915_gem_object *obj) static int i915_gem_userptr_dmabuf_export(struct drm_i915_gem_object *obj) { - if (obj->userptr.mmu_object) - return 0; + drm_dbg(obj->base.dev, "Exporting userptr no longer allowed\n");
- return i915_gem_userptr_init__mmu_notifier(obj, 0); + return -EINVAL; }
static int
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
There are a couple of ioctls related to tiling and cache placement that make no sense for userptr; reject them:
- i915_gem_set_tiling_ioctl(): Tiling should always be linear for userptr. Changing placement will fail with -ENXIO.
- i915_gem_set_caching_ioctl(): Userptr memory should always be cached. Changing will fail with -ENXIO.
- i915_gem_set_domain_ioctl(): Changed to be equivalent to gem_wait, which is correct for the cached linear userptr pointers (see the sketch below). This is required because we cannot grab a reference to the pages in the rework, but waiting for idle will do the same.
Still needs an ack from relevant userspace that it won't break, but should be good.
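A hedged userspace sketch of what the set_domain change means in practice (wait_userptr_idle() is a hypothetical wrapper; the ioctl and struct are existing i915 uAPI):

#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/*
 * Hypothetical wrapper: after this patch, SET_DOMAIN on a userptr
 * handle no longer changes any cache domain; it simply completes once
 * outstanding GPU work on the object has finished, like gem_wait.
 */
static int wait_userptr_idle(int drm_fd, __u32 handle)
{
	struct drm_i915_gem_set_domain arg = {
		.handle = handle,
		.read_domains = I915_GEM_DOMAIN_CPU,
		.write_domain = I915_GEM_DOMAIN_CPU,
	};

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &arg);
}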
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 4 +++- drivers/gpu/drm/i915/gem/i915_gem_object.h | 6 ++++++ drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 ++- 4 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index ba26545392bc..f36921a3c4bc 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -17854,7 +17854,7 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb, struct drm_i915_gem_object *obj = intel_fb_obj(fb); struct drm_i915_private *i915 = to_i915(obj->base.dev);
- if (obj->userptr.mm) { + if (i915_gem_object_is_userptr(obj)) { drm_dbg(&i915->drm, "attempting to use a userptr for a framebuffer, denied\n"); return -EINVAL; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index fcce6909f201..c1d4bf62b3ea 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -528,7 +528,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, * considered to be outside of any cache domain. */ if (i915_gem_object_is_proxy(obj)) { - err = -ENXIO; + /* silently allow userptr to complete */ + if (!i915_gem_object_is_userptr(obj)) + err = -ENXIO; goto out; }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 47da3aff2a79..95907b8eb4c4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -551,6 +551,12 @@ void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj, enum fb_op_origin origin);
+static inline bool +i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) +{ + return obj->userptr.mm; +} + static inline void i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, enum fb_op_origin origin) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 44af6265948d..64a946d5f753 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -721,7 +721,8 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { .name = "i915_gem_object_userptr", .flags = I915_GEM_OBJECT_IS_SHRINKABLE | I915_GEM_OBJECT_NO_MMAP | - I915_GEM_OBJECT_ASYNC_CANCEL, + I915_GEM_OBJECT_ASYNC_CANCEL | + I915_GEM_OBJECT_IS_PROXY, .get_pages = i915_gem_userptr_get_pages, .put_pages = i915_gem_userptr_put_pages, .dmabuf_export = i915_gem_userptr_dmabuf_export,
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We should not allow this any more, as it will break with the new userptr implementation. It could still be made to work, but there's no point in doing so.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 64a946d5f753..241f865077b9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -224,7 +224,7 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj, struct i915_mmu_object *mo;
if (flags & I915_USERPTR_UNSYNCHRONIZED) - return capable(CAP_SYS_ADMIN) ? 0 : -EPERM; + return -ENODEV;
if (GEM_WARN_ON(!obj->userptr.mm)) return -EINVAL; @@ -274,13 +274,7 @@ static int i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj, unsigned flags) { - if ((flags & I915_USERPTR_UNSYNCHRONIZED) == 0) - return -ENODEV; - - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - - return 0; + return -ENODEV; }
static void
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Now that unsynchronized mappings are removed, the only time userptr works is when the MMU notifier is enabled. Put all of the userptr code behind an MMU notifier ifdef.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 + drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 58 +++++++------------ drivers/gpu/drm/i915/i915_drv.h | 2 + 5 files changed, 31 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 31e412e5c68a..064285a5009b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1970,8 +1970,10 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb, err = 0; }
+#ifdef CONFIG_MMU_NOTIFIER if (!err) flush_workqueue(eb->i915->mm.userptr_wq); +#endif
err_relock: i915_gem_ww_ctx_init(&eb->ww, true); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 95907b8eb4c4..7b3a84f98b42 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -554,7 +554,11 @@ void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj, static inline bool i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) { +#ifdef CONFIG_MMU_NOTIFIER return obj->userptr.mm; +#else + return false; +#endif }
static inline void diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index b53e44b06b09..6d3f451c15c6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -289,6 +289,7 @@ struct drm_i915_gem_object { unsigned long *bit_17;
union { +#ifdef CONFIG_MMU_NOTIFIER struct i915_gem_userptr { uintptr_t ptr;
@@ -296,6 +297,7 @@ struct drm_i915_gem_object { struct i915_mmu_object *mmu_object; struct work_struct *work; } userptr; +#endif
unsigned long scratch; u64 encode; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 241f865077b9..1183b28c084b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -15,6 +15,8 @@ #include "i915_gem_object.h" #include "i915_scatterlist.h"
+#if defined(CONFIG_MMU_NOTIFIER) + struct i915_mm_struct { struct mm_struct *mm; struct drm_i915_private *i915; @@ -24,7 +26,6 @@ struct i915_mm_struct { struct rcu_work work; };
-#if defined(CONFIG_MMU_NOTIFIER) #include <linux/interval_tree.h>
struct i915_mmu_notifier { @@ -217,15 +218,11 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm) }
static int -i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj, - unsigned flags) +i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj) { struct i915_mmu_notifier *mn; struct i915_mmu_object *mo;
- if (flags & I915_USERPTR_UNSYNCHRONIZED) - return -ENODEV; - if (GEM_WARN_ON(!obj->userptr.mm)) return -EINVAL;
@@ -258,32 +255,6 @@ i915_mmu_notifier_free(struct i915_mmu_notifier *mn, kfree(mn); }
-#else - -static void -__i915_gem_userptr_set_active(struct drm_i915_gem_object *obj, bool value) -{ -} - -static void -i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj) -{ -} - -static int -i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj, - unsigned flags) -{ - return -ENODEV; -} - -static void -i915_mmu_notifier_free(struct i915_mmu_notifier *mn, - struct mm_struct *mm) -{ -} - -#endif
static struct i915_mm_struct * __i915_mm_struct_find(struct drm_i915_private *i915, struct mm_struct *real) @@ -725,6 +696,8 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { .release = i915_gem_userptr_release, };
+#endif + /* * Creates a new mm object that wraps some normal memory from the process * context - user memory. @@ -765,12 +738,12 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { - static struct lock_class_key lock_class; + static struct lock_class_key __maybe_unused lock_class; struct drm_i915_private *dev_priv = to_i915(dev); struct drm_i915_gem_userptr *args = data; - struct drm_i915_gem_object *obj; - int ret; - u32 handle; + struct drm_i915_gem_object __maybe_unused *obj; + int __maybe_unused ret; + u32 __maybe_unused handle;
if (!HAS_LLC(dev_priv) && !HAS_SNOOP(dev_priv)) { /* We cannot support coherent userptr objects on hw without @@ -809,6 +782,9 @@ i915_gem_userptr_ioctl(struct drm_device *dev, if (!access_ok((char __user *)(unsigned long)args->user_ptr, args->user_size)) return -EFAULT;
+ if (args->flags & I915_USERPTR_UNSYNCHRONIZED) + return -ENODEV; + if (args->flags & I915_USERPTR_READ_ONLY) { /* * On almost all of the older hw, we cannot tell the GPU that @@ -818,6 +794,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -ENODEV; }
+#ifdef CONFIG_MMU_NOTIFIER obj = i915_gem_object_alloc(); if (obj == NULL) return -ENOMEM; @@ -839,7 +816,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev, */ ret = i915_gem_userptr_init__mm_struct(obj); if (ret == 0) - ret = i915_gem_userptr_init__mmu_notifier(obj, args->flags); + ret = i915_gem_userptr_init__mmu_notifier(obj); if (ret == 0) ret = drm_gem_handle_create(file, &obj->base, &handle);
@@ -850,10 +827,14 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
args->handle = handle; return 0; +#else + return -ENODEV; +#endif }
int i915_gem_init_userptr(struct drm_i915_private *dev_priv) { +#ifdef CONFIG_MMU_NOTIFIER spin_lock_init(&dev_priv->mm_lock); hash_init(dev_priv->mm_structs);
@@ -863,11 +844,14 @@ int i915_gem_init_userptr(struct drm_i915_private *dev_priv) 0); if (!dev_priv->mm.userptr_wq) return -ENOMEM; +#endif
return 0; }
void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv) { +#ifdef CONFIG_MMU_NOTIFIER destroy_workqueue(dev_priv->mm.userptr_wq); +#endif } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 84182a40e777..d3c67e17cd02 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -589,12 +589,14 @@ struct i915_gem_mm { struct notifier_block vmap_notifier; struct shrinker shrinker;
+#ifdef CONFIG_MMU_NOTIFIER /** * Workqueue to fault in userptr pages, flushed by the execbuf * when required but otherwise left to userspace to try again * on EAGAIN. */ struct workqueue_struct *userptr_wq; +#endif
/* shrinker accounting, also useful for userland debugging */ u64 shrink_memory;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Instead of doing what we do currently, which will never work with PROVE_LOCKING, do the same as AMD does, and something similar to the relocation slowpath. When all locks are dropped, we acquire the pages for pinning. When the locks are taken, we transfer those pages to the bo in .get_pages(). As a final check before installing the fences, we ensure that the mmu notifier was not called; if it was, we return -EAGAIN to userspace to signal it has to start over.
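An outline of the resulting flow, as a sketch (submit_with_userptr() is hypothetical and flattens logic that really spans eb_lookup_vmas(), eb_relocate_parse_slow() and eb_move_to_gpu(); the i915_gem_object_userptr_* functions are the ones added by this patch):

/* Illustrative outline only; error paths and locking are simplified. */
static int submit_with_userptr(struct drm_i915_gem_object *obj)
{
	int err;

	/*
	 * Outside all ww locks: unbind stale pages, pin the current
	 * user pages and record the mmu_interval notifier sequence.
	 */
	err = i915_gem_object_userptr_submit_init(obj);
	if (err)
		return err;

	/*
	 * ... take ww locks; .get_pages() adopts the pinned pages;
	 * build and emit the request ...
	 */

	/*
	 * Final check before installing fences (in the real code this
	 * runs under mm.notifier_lock): if the notifier fired in the
	 * meantime, the pinned pages are stale.
	 */
	err = i915_gem_object_userptr_submit_done(obj);

	/* Drop the submit-time page reference in all cases. */
	i915_gem_object_userptr_submit_fini(obj);

	/* -EAGAIN tells userspace to restart the submission. */
	return err;
}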
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 101 ++- drivers/gpu/drm/i915/gem/i915_gem_object.h | 35 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 10 +- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 764 ++++++------------ drivers/gpu/drm/i915/i915_drv.h | 9 +- drivers/gpu/drm/i915/i915_gem.c | 5 +- 7 files changed, 344 insertions(+), 582 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 064285a5009b..f5ea49e244ca 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -52,14 +52,16 @@ enum { /* __EXEC_OBJECT_NO_RESERVE is BIT(31), defined in i915_vma.h */ #define __EXEC_OBJECT_HAS_PIN BIT(30) #define __EXEC_OBJECT_HAS_FENCE BIT(29) -#define __EXEC_OBJECT_NEEDS_MAP BIT(28) -#define __EXEC_OBJECT_NEEDS_BIAS BIT(27) -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above + */ +#define __EXEC_OBJECT_USERPTR_INIT BIT(28) +#define __EXEC_OBJECT_NEEDS_MAP BIT(27) +#define __EXEC_OBJECT_NEEDS_BIAS BIT(26) +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 26) /* all of the above + */ #define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
#define __EXEC_HAS_RELOC BIT(31) #define __EXEC_ENGINE_PINNED BIT(30) -#define __EXEC_INTERNAL_FLAGS (~0u << 30) +#define __EXEC_USERPTR_USED BIT(29) +#define __EXEC_INTERNAL_FLAGS (~0u << 29) #define UPDATE PIN_OFFSET_FIXED
#define BATCH_OFFSET_BIAS (256*1024) @@ -865,6 +867,26 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) }
eb_add_vma(eb, i, batch, vma); + + if (i915_gem_object_is_userptr(vma->obj)) { + err = i915_gem_object_userptr_submit_init(vma->obj); + if (err) { + if (i + 1 < eb->buffer_count) { + /* + * Execbuffer code expects last vma entry to be NULL, + * since we already initialized this entry, + * set the next value to NULL or we mess up + * cleanup handling. + */ + eb->vma[i + 1].vma = NULL; + } + + return err; + } + + eb->vma[i].flags |= __EXEC_OBJECT_USERPTR_INIT; + eb->args->flags |= __EXEC_USERPTR_USED; + } }
if (unlikely(eb->batch->flags & EXEC_OBJECT_WRITE)) { @@ -966,7 +988,7 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle) } }
-static void eb_release_vmas(struct i915_execbuffer *eb, bool final) +static void eb_release_vmas(struct i915_execbuffer *eb, bool final, bool release_userptr) { const unsigned int count = eb->buffer_count; unsigned int i; @@ -980,6 +1002,11 @@ static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
eb_unreserve_vma(ev);
+ if (release_userptr && ev->flags & __EXEC_OBJECT_USERPTR_INIT) { + ev->flags &= ~__EXEC_OBJECT_USERPTR_INIT; + i915_gem_object_userptr_submit_fini(vma->obj); + } + if (final) i915_vma_put(vma); } @@ -1915,6 +1942,31 @@ static int eb_prefault_relocations(const struct i915_execbuffer *eb) return 0; }
+static int eb_reinit_userptr(struct i915_execbuffer *eb) +{ + const unsigned int count = eb->buffer_count; + unsigned int i; + int ret; + + if (likely(!(eb->args->flags & __EXEC_USERPTR_USED))) + return 0; + + for (i = 0; i < count; i++) { + struct eb_vma *ev = &eb->vma[i]; + + if (!i915_gem_object_is_userptr(ev->vma->obj)) + continue; + + ret = i915_gem_object_userptr_submit_init(ev->vma->obj); + if (ret) + return ret; + + ev->flags |= __EXEC_OBJECT_USERPTR_INIT; + } + + return 0; +} + static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb, struct i915_request *rq) { @@ -1929,7 +1981,7 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb, }
/* We may process another execbuffer during the unlock... */ - eb_release_vmas(eb, false); + eb_release_vmas(eb, false, true); i915_gem_ww_ctx_fini(&eb->ww);
if (rq) { @@ -1970,10 +2022,8 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb, err = 0; }
-#ifdef CONFIG_MMU_NOTIFIER if (!err) - flush_workqueue(eb->i915->mm.userptr_wq); -#endif + err = eb_reinit_userptr(eb);
err_relock: i915_gem_ww_ctx_init(&eb->ww, true); @@ -2035,7 +2085,7 @@ static noinline int eb_relocate_parse_slow(struct i915_execbuffer *eb,
err: if (err == -EDEADLK) { - eb_release_vmas(eb, false); + eb_release_vmas(eb, false, false); err = i915_gem_ww_ctx_backoff(&eb->ww); if (!err) goto repeat_validate; @@ -2132,7 +2182,7 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
err: if (err == -EDEADLK) { - eb_release_vmas(eb, false); + eb_release_vmas(eb, false, false); err = i915_gem_ww_ctx_backoff(&eb->ww); if (!err) goto retry; @@ -2207,6 +2257,30 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb) flags | __EXEC_OBJECT_NO_RESERVE); }
+#ifdef CONFIG_MMU_NOTIFIER + if (!err && (eb->args->flags & __EXEC_USERPTR_USED)) { + spin_lock(&eb->i915->mm.notifier_lock); + + /* + * count is always at least 1, otherwise __EXEC_USERPTR_USED + * could not have been set + */ + for (i = 0; i < count; i++) { + struct eb_vma *ev = &eb->vma[i]; + struct drm_i915_gem_object *obj = ev->vma->obj; + + if (!i915_gem_object_is_userptr(obj)) + continue; + + err = i915_gem_object_userptr_submit_done(obj); + if (err) + break; + } + + spin_unlock(&eb->i915->mm.notifier_lock); + } +#endif + if (unlikely(err)) goto err_skip;
@@ -3347,7 +3421,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
err = eb_lookup_vmas(&eb); if (err) { - eb_release_vmas(&eb, true); + eb_release_vmas(&eb, true, true); goto err_engine; }
@@ -3419,6 +3493,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
trace_i915_request_queue(eb.request, eb.batch_flags); err = eb_submit(&eb, batch); + err_request: i915_request_get(eb.request); eb_request_add(&eb); @@ -3439,7 +3514,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, i915_request_put(eb.request);
err_vma: - eb_release_vmas(&eb, true); + eb_release_vmas(&eb, true, true); if (eb.trampoline) i915_vma_unpin(eb.trampoline); WARN_ON(err == -EDEADLK); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 7b3a84f98b42..33412248f6df 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -33,6 +33,7 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, const void *data, resource_size_t size);
extern const struct drm_i915_gem_object_ops i915_gem_shmem_ops; + void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages, bool needs_clflush); @@ -245,12 +246,6 @@ i915_gem_object_never_mmap(const struct drm_i915_gem_object *obj) return i915_gem_object_type_has(obj, I915_GEM_OBJECT_NO_MMAP); }
-static inline bool -i915_gem_object_needs_async_cancel(const struct drm_i915_gem_object *obj) -{ - return i915_gem_object_type_has(obj, I915_GEM_OBJECT_ASYNC_CANCEL); -} - static inline bool i915_gem_object_is_framebuffer(const struct drm_i915_gem_object *obj) { @@ -551,16 +546,6 @@ void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj, enum fb_op_origin origin);
-static inline bool -i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) -{ -#ifdef CONFIG_MMU_NOTIFIER - return obj->userptr.mm; -#else - return false; -#endif -} - static inline void i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, enum fb_op_origin origin) @@ -577,4 +562,22 @@ i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj, __i915_gem_object_invalidate_frontbuffer(obj, origin); }
+#ifdef CONFIG_MMU_NOTIFIER +static inline bool +i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) +{ + return obj->userptr.notifier.mm; +} + +int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj); +int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object *obj); +void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj); +#else +static inline bool i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) { return false; } + +static inline int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); return -ENODEV; } +static inline int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); return -ENODEV; } +static inline void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); } +#endif + #endif diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 6d3f451c15c6..5234c1ed62d4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -7,6 +7,8 @@ #ifndef __I915_GEM_OBJECT_TYPES_H__ #define __I915_GEM_OBJECT_TYPES_H__
+#include <linux/mmu_notifier.h> + #include <drm/drm_gem.h> #include <uapi/drm/i915_drm.h>
@@ -34,7 +36,6 @@ struct drm_i915_gem_object_ops { #define I915_GEM_OBJECT_IS_SHRINKABLE BIT(2) #define I915_GEM_OBJECT_IS_PROXY BIT(3) #define I915_GEM_OBJECT_NO_MMAP BIT(4) -#define I915_GEM_OBJECT_ASYNC_CANCEL BIT(5)
/* Interface between the GEM object and its backing storage. * get_pages() is called once prior to the use of the associated set @@ -292,10 +293,11 @@ struct drm_i915_gem_object { #ifdef CONFIG_MMU_NOTIFIER struct i915_gem_userptr { uintptr_t ptr; + unsigned long notifier_seq;
- struct i915_mm_struct *mm; - struct i915_mmu_object *mmu_object; - struct work_struct *work; + struct mmu_interval_notifier notifier; + struct page **pvec; + int page_ref; } userptr; #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 7983423237e3..60149cad6080 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -223,7 +223,7 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) * get_pages backends we should be better able to handle the * cancellation of the async task in a more uniform manner. */ - if (!pages && !i915_gem_object_needs_async_cancel(obj)) + if (!pages) pages = ERR_PTR(-EINVAL);
if (!IS_ERR(pages)) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 1183b28c084b..9ea9aa65ade1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -2,10 +2,39 @@ * SPDX-License-Identifier: MIT * * Copyright © 2012-2014 Intel Corporation + * + * Based on amdgpu_mn, which bears the following notice: + * + * Copyright 2014 Advanced Micro Devices, Inc. + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + */ +/* + * Authors: + * Christian König christian.koenig@amd.com */
#include <linux/mmu_context.h> -#include <linux/mmu_notifier.h> #include <linux/mempolicy.h> #include <linux/swap.h> #include <linux/sched/mm.h> @@ -15,365 +44,106 @@ #include "i915_gem_object.h" #include "i915_scatterlist.h"
-#if defined(CONFIG_MMU_NOTIFIER) - -struct i915_mm_struct { - struct mm_struct *mm; - struct drm_i915_private *i915; - struct i915_mmu_notifier *mn; - struct hlist_node node; - struct kref kref; - struct rcu_work work; -}; - -#include <linux/interval_tree.h> - -struct i915_mmu_notifier { - spinlock_t lock; - struct hlist_node node; - struct mmu_notifier mn; - struct rb_root_cached objects; - struct i915_mm_struct *mm; -}; - -struct i915_mmu_object { - struct i915_mmu_notifier *mn; - struct drm_i915_gem_object *obj; - struct interval_tree_node it; -}; - -static void add_object(struct i915_mmu_object *mo) -{ - GEM_BUG_ON(!RB_EMPTY_NODE(&mo->it.rb)); - interval_tree_insert(&mo->it, &mo->mn->objects); -} - -static void del_object(struct i915_mmu_object *mo) -{ - if (RB_EMPTY_NODE(&mo->it.rb)) - return; - - interval_tree_remove(&mo->it, &mo->mn->objects); - RB_CLEAR_NODE(&mo->it.rb); -} +#ifdef CONFIG_MMU_NOTIFIER
-static void -__i915_gem_userptr_set_active(struct drm_i915_gem_object *obj, bool value) +/** + * i915_gem_userptr_invalidate - callback to notify about mm change + * + * @mni: the range (mm) is about to update + * @range: details on the invalidation + * @cur_seq: Value to pass to mmu_interval_set_seq() + * + * Block for operations on BOs to finish and mark pages as accessed and + * potentially dirty. + */ +static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni, + const struct mmu_notifier_range *range, + unsigned long cur_seq) { - struct i915_mmu_object *mo = obj->userptr.mmu_object; - - /* - * During mm_invalidate_range we need to cancel any userptr that - * overlaps the range being invalidated. Doing so requires the - * struct_mutex, and that risks recursion. In order to cause - * recursion, the user must alias the userptr address space with - * a GTT mmapping (possible with a MAP_FIXED) - then when we have - * to invalidate that mmaping, mm_invalidate_range is called with - * the userptr address *and* the struct_mutex held. To prevent that - * we set a flag under the i915_mmu_notifier spinlock to indicate - * whether this object is valid. - */ - if (!mo) - return; + struct drm_i915_gem_object *obj = container_of(mni, struct drm_i915_gem_object, userptr.notifier); + struct drm_i915_private *i915 = to_i915(obj->base.dev); + long r;
- spin_lock(&mo->mn->lock); - if (value) - add_object(mo); - else - del_object(mo); - spin_unlock(&mo->mn->lock); -} + if (!mmu_notifier_range_blockable(range)) + return false;
-static int -userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, - const struct mmu_notifier_range *range) -{ - struct i915_mmu_notifier *mn = - container_of(_mn, struct i915_mmu_notifier, mn); - struct interval_tree_node *it; - unsigned long end; - int ret = 0; - - if (RB_EMPTY_ROOT(&mn->objects.rb_root)) - return 0; - - /* interval ranges are inclusive, but invalidate range is exclusive */ - end = range->end - 1; - - spin_lock(&mn->lock); - it = interval_tree_iter_first(&mn->objects, range->start, end); - while (it) { - struct drm_i915_gem_object *obj; - - if (!mmu_notifier_range_blockable(range)) { - ret = -EAGAIN; - break; - } + spin_lock(&i915->mm.notifier_lock);
- /* - * The mmu_object is released late when destroying the - * GEM object so it is entirely possible to gain a - * reference on an object in the process of being freed - * since our serialisation is via the spinlock and not - * the struct_mutex - and consequently use it after it - * is freed and then double free it. To prevent that - * use-after-free we only acquire a reference on the - * object if it is not in the process of being destroyed. - */ - obj = container_of(it, struct i915_mmu_object, it)->obj; - if (!kref_get_unless_zero(&obj->base.refcount)) { - it = interval_tree_iter_next(it, range->start, end); - continue; - } - spin_unlock(&mn->lock); + mmu_interval_set_seq(mni, cur_seq);
- ret = i915_gem_object_unbind(obj, - I915_GEM_OBJECT_UNBIND_ACTIVE | - I915_GEM_OBJECT_UNBIND_BARRIER); - if (ret == 0) - ret = __i915_gem_object_put_pages(obj); - i915_gem_object_put(obj); - if (ret) - return ret; + spin_unlock(&i915->mm.notifier_lock);
- spin_lock(&mn->lock); + /* During exit there's no need to wait */ + if (current->flags & PF_EXITING) + return true;
- /* - * As we do not (yet) protect the mmu from concurrent insertion - * over this range, there is no guarantee that this search will - * terminate given a pathologic workload. - */ - it = interval_tree_iter_first(&mn->objects, range->start, end); - } - spin_unlock(&mn->lock); - - return ret; + /* we will unbind on next submission, still have userptr pins */ + r = dma_resv_wait_timeout_rcu(obj->base.resv, true, false, + MAX_SCHEDULE_TIMEOUT); + if (r <= 0) + drm_err(&i915->drm, "(%ld) failed to wait for idle\n", r);
+ return true; }
-static const struct mmu_notifier_ops i915_gem_userptr_notifier = { - .invalidate_range_start = userptr_mn_invalidate_range_start, +static const struct mmu_interval_notifier_ops i915_gem_userptr_notifier_ops = { + .invalidate = i915_gem_userptr_invalidate, };
-static struct i915_mmu_notifier * -i915_mmu_notifier_create(struct i915_mm_struct *mm) -{ - struct i915_mmu_notifier *mn; - - mn = kmalloc(sizeof(*mn), GFP_KERNEL); - if (mn == NULL) - return ERR_PTR(-ENOMEM); - - spin_lock_init(&mn->lock); - mn->mn.ops = &i915_gem_userptr_notifier; - mn->objects = RB_ROOT_CACHED; - mn->mm = mm; - - return mn; -} - -static void -i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj) -{ - struct i915_mmu_object *mo; - - mo = fetch_and_zero(&obj->userptr.mmu_object); - if (!mo) - return; - - spin_lock(&mo->mn->lock); - del_object(mo); - spin_unlock(&mo->mn->lock); - kfree(mo); -} - -static struct i915_mmu_notifier * -i915_mmu_notifier_find(struct i915_mm_struct *mm) -{ - struct i915_mmu_notifier *mn, *old; - int err; - - mn = READ_ONCE(mm->mn); - if (likely(mn)) - return mn; - - mn = i915_mmu_notifier_create(mm); - if (IS_ERR(mn)) - return mn; - - err = mmu_notifier_register(&mn->mn, mm->mm); - if (err) { - kfree(mn); - return ERR_PTR(err); - } - - old = cmpxchg(&mm->mn, NULL, mn); - if (old) { - mmu_notifier_unregister(&mn->mn, mm->mm); - kfree(mn); - mn = old; - } - - return mn; -} - static int i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj) { - struct i915_mmu_notifier *mn; - struct i915_mmu_object *mo; - - if (GEM_WARN_ON(!obj->userptr.mm)) - return -EINVAL; - - mn = i915_mmu_notifier_find(obj->userptr.mm); - if (IS_ERR(mn)) - return PTR_ERR(mn); - - mo = kzalloc(sizeof(*mo), GFP_KERNEL); - if (!mo) - return -ENOMEM; - - mo->mn = mn; - mo->obj = obj; - mo->it.start = obj->userptr.ptr; - mo->it.last = obj->userptr.ptr + obj->base.size - 1; - RB_CLEAR_NODE(&mo->it.rb); - - obj->userptr.mmu_object = mo; - return 0; + return mmu_interval_notifier_insert(&obj->userptr.notifier, current->mm, + obj->userptr.ptr, obj->base.size, + &i915_gem_userptr_notifier_ops); }
-static void -i915_mmu_notifier_free(struct i915_mmu_notifier *mn, - struct mm_struct *mm) -{ - if (mn == NULL) - return; - - mmu_notifier_unregister(&mn->mn, mm); - kfree(mn); -} - - -static struct i915_mm_struct * -__i915_mm_struct_find(struct drm_i915_private *i915, struct mm_struct *real) -{ - struct i915_mm_struct *it, *mm = NULL; - - rcu_read_lock(); - hash_for_each_possible_rcu(i915->mm_structs, - it, node, - (unsigned long)real) - if (it->mm == real && kref_get_unless_zero(&it->kref)) { - mm = it; - break; - } - rcu_read_unlock(); - - return mm; -} - -static int -i915_gem_userptr_init__mm_struct(struct drm_i915_gem_object *obj) +static void i915_gem_object_userptr_drop_ref(struct drm_i915_gem_object *obj) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_mm_struct *mm, *new; - int ret = 0; - - /* During release of the GEM object we hold the struct_mutex. This - * precludes us from calling mmput() at that time as that may be - * the last reference and so call exit_mmap(). exit_mmap() will - * attempt to reap the vma, and if we were holding a GTT mmap - * would then call drm_gem_vm_close() and attempt to reacquire - * the struct mutex. So in order to avoid that recursion, we have - * to defer releasing the mm reference until after we drop the - * struct_mutex, i.e. we need to schedule a worker to do the clean - * up. - */ - mm = __i915_mm_struct_find(i915, current->mm); - if (mm) - goto out; + struct page **pvec = NULL;
- new = kmalloc(sizeof(*mm), GFP_KERNEL); - if (!new) - return -ENOMEM; - - kref_init(&new->kref); - new->i915 = to_i915(obj->base.dev); - new->mm = current->mm; - new->mn = NULL; - - spin_lock(&i915->mm_lock); - mm = __i915_mm_struct_find(i915, current->mm); - if (!mm) { - hash_add_rcu(i915->mm_structs, - &new->node, - (unsigned long)new->mm); - mmgrab(current->mm); - mm = new; + spin_lock(&i915->mm.notifier_lock); + if (!--obj->userptr.page_ref) { + pvec = obj->userptr.pvec; + obj->userptr.pvec = NULL; } - spin_unlock(&i915->mm_lock); - if (mm != new) - kfree(new); - -out: - obj->userptr.mm = mm; - return ret; -} - -static void -__i915_mm_struct_free__worker(struct work_struct *work) -{ - struct i915_mm_struct *mm = container_of(work, typeof(*mm), work.work); - - i915_mmu_notifier_free(mm->mn, mm->mm); - mmdrop(mm->mm); - kfree(mm); -} - -static void -__i915_mm_struct_free(struct kref *kref) -{ - struct i915_mm_struct *mm = container_of(kref, typeof(*mm), kref); - - spin_lock(&mm->i915->mm_lock); - hash_del_rcu(&mm->node); - spin_unlock(&mm->i915->mm_lock); - - INIT_RCU_WORK(&mm->work, __i915_mm_struct_free__worker); - queue_rcu_work(system_wq, &mm->work); -} + GEM_BUG_ON(obj->userptr.page_ref < 0); + spin_unlock(&i915->mm.notifier_lock);
-static void -i915_gem_userptr_release__mm_struct(struct drm_i915_gem_object *obj) -{ - if (obj->userptr.mm == NULL) - return; + if (pvec) { + const unsigned long num_pages = obj->base.size >> PAGE_SHIFT;
- kref_put(&obj->userptr.mm->kref, __i915_mm_struct_free); - obj->userptr.mm = NULL; + unpin_user_pages(pvec, num_pages); + kfree(pvec); + } }
-struct get_pages_work { - struct work_struct work; - struct drm_i915_gem_object *obj; - struct task_struct *task; -}; - -static struct sg_table * -__i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj, - struct page **pvec, unsigned long num_pages) +static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) { + struct drm_i915_private *i915 = to_i915(obj->base.dev); + const unsigned long num_pages = obj->base.size >> PAGE_SHIFT; unsigned int max_segment = i915_sg_segment_size(); struct sg_table *st; unsigned int sg_page_sizes; struct scatterlist *sg; + struct page **pvec; int ret;
st = kmalloc(sizeof(*st), GFP_KERNEL); if (!st) - return ERR_PTR(-ENOMEM); + return -ENOMEM; + + spin_lock(&i915->mm.notifier_lock); + if (GEM_WARN_ON(!obj->userptr.page_ref)) { + spin_unlock(&i915->mm.notifier_lock); + ret = -EFAULT; + goto err_free; + } + + obj->userptr.page_ref++; + pvec = obj->userptr.pvec; + spin_unlock(&i915->mm.notifier_lock);
alloc_table: sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0, @@ -381,7 +151,8 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj, NULL, 0, GFP_KERNEL); if (IS_ERR(sg)) { kfree(st); - return ERR_CAST(sg); + ret = PTR_ERR(sg); + goto err; }
ret = i915_gem_gtt_prepare_pages(obj, st); @@ -393,203 +164,20 @@ __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj, goto alloc_table; }
- kfree(st); - return ERR_PTR(ret); + goto err; }
sg_page_sizes = i915_sg_page_sizes(st->sgl);
__i915_gem_object_set_pages(obj, st, sg_page_sizes);
- return st; -} - -static void -__i915_gem_userptr_get_pages_worker(struct work_struct *_work) -{ - struct get_pages_work *work = container_of(_work, typeof(*work), work); - struct drm_i915_gem_object *obj = work->obj; - const unsigned long npages = obj->base.size >> PAGE_SHIFT; - unsigned long pinned; - struct page **pvec; - int ret; - - ret = -ENOMEM; - pinned = 0; - - pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); - if (pvec != NULL) { - struct mm_struct *mm = obj->userptr.mm->mm; - unsigned int flags = 0; - int locked = 0; - - if (!i915_gem_object_is_readonly(obj)) - flags |= FOLL_WRITE; - - ret = -EFAULT; - if (mmget_not_zero(mm)) { - while (pinned < npages) { - if (!locked) { - mmap_read_lock(mm); - locked = 1; - } - ret = pin_user_pages_remote - (mm, - obj->userptr.ptr + pinned * PAGE_SIZE, - npages - pinned, - flags, - pvec + pinned, NULL, &locked); - if (ret < 0) - break; - - pinned += ret; - } - if (locked) - mmap_read_unlock(mm); - mmput(mm); - } - } - - mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES); - if (obj->userptr.work == &work->work) { - struct sg_table *pages = ERR_PTR(ret); - - if (pinned == npages) { - pages = __i915_gem_userptr_alloc_pages(obj, pvec, - npages); - if (!IS_ERR(pages)) { - pinned = 0; - pages = NULL; - } - } - - obj->userptr.work = ERR_CAST(pages); - if (IS_ERR(pages)) - __i915_gem_userptr_set_active(obj, false); - } - mutex_unlock(&obj->mm.lock); - - unpin_user_pages(pvec, pinned); - kvfree(pvec); - - i915_gem_object_put(obj); - put_task_struct(work->task); - kfree(work); -} - -static struct sg_table * -__i915_gem_userptr_get_pages_schedule(struct drm_i915_gem_object *obj) -{ - struct get_pages_work *work; - - /* Spawn a worker so that we can acquire the - * user pages without holding our mutex. Access - * to the user pages requires mmap_lock, and we have - * a strict lock ordering of mmap_lock, struct_mutex - - * we already hold struct_mutex here and so cannot - * call gup without encountering a lock inversion. - * - * Userspace will keep on repeating the operation - * (thanks to EAGAIN) until either we hit the fast - * path or the worker completes. If the worker is - * cancelled or superseded, the task is still run - * but the results ignored. (This leads to - * complications that we may have a stray object - * refcount that we need to be wary of when - * checking for existing objects during creation.) - * If the worker encounters an error, it reports - * that error back to this function through - * obj->userptr.work = ERR_PTR. - */ - work = kmalloc(sizeof(*work), GFP_KERNEL); - if (work == NULL) - return ERR_PTR(-ENOMEM); - - obj->userptr.work = &work->work; - - work->obj = i915_gem_object_get(obj); - - work->task = current; - get_task_struct(work->task); - - INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker); - queue_work(to_i915(obj->base.dev)->mm.userptr_wq, &work->work); - - return ERR_PTR(-EAGAIN); -} - -static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) -{ - const unsigned long num_pages = obj->base.size >> PAGE_SHIFT; - struct mm_struct *mm = obj->userptr.mm->mm; - struct page **pvec; - struct sg_table *pages; - bool active; - int pinned; - unsigned int gup_flags = 0; - - /* If userspace should engineer that these pages are replaced in - * the vma between us binding this page into the GTT and completion - * of rendering... Their loss. If they change the mapping of their - * pages they need to create a new bo to point to the new vma. 
- * - * However, that still leaves open the possibility of the vma - * being copied upon fork. Which falls under the same userspace - * synchronisation issue as a regular bo, except that this time - * the process may not be expecting that a particular piece of - * memory is tied to the GPU. - * - * Fortunately, we can hook into the mmu_notifier in order to - * discard the page references prior to anything nasty happening - * to the vma (discard or cloning) which should prevent the more - * egregious cases from causing harm. - */ - - if (obj->userptr.work) { - /* active flag should still be held for the pending work */ - if (IS_ERR(obj->userptr.work)) - return PTR_ERR(obj->userptr.work); - else - return -EAGAIN; - } - - pvec = NULL; - pinned = 0; - - if (mm == current->mm) { - pvec = kvmalloc_array(num_pages, sizeof(struct page *), - GFP_KERNEL | - __GFP_NORETRY | - __GFP_NOWARN); - if (pvec) { - /* defer to worker if malloc fails */ - if (!i915_gem_object_is_readonly(obj)) - gup_flags |= FOLL_WRITE; - pinned = pin_user_pages_fast_only(obj->userptr.ptr, - num_pages, gup_flags, - pvec); - } - } - - active = false; - if (pinned < 0) { - pages = ERR_PTR(pinned); - pinned = 0; - } else if (pinned < num_pages) { - pages = __i915_gem_userptr_get_pages_schedule(obj); - active = pages == ERR_PTR(-EAGAIN); - } else { - pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages); - active = !IS_ERR(pages); - } - if (active) - __i915_gem_userptr_set_active(obj, true); - - if (IS_ERR(pages)) - unpin_user_pages(pvec, pinned); - kvfree(pvec); + return 0;
- return PTR_ERR_OR_ZERO(pages); +err: + i915_gem_object_userptr_drop_ref(obj); +err_free: + kfree(st); + return ret; }
static void @@ -599,9 +187,6 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj, struct sgt_iter sgt_iter; struct page *page;
- /* Cancel any inflight work and force them to restart their gup */ - obj->userptr.work = NULL; - __i915_gem_userptr_set_active(obj, false); if (!pages) return;
@@ -641,19 +226,135 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj, }
mark_page_accessed(page); - unpin_user_page(page); } obj->mm.dirty = false;
sg_free_table(pages); kfree(pages); + + i915_gem_object_userptr_drop_ref(obj); +} + +static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj, bool get_pages) +{ + struct sg_table *pages; + int err; + + err = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); + if (err) + return err; + + if (GEM_WARN_ON(i915_gem_object_has_pinned_pages(obj))) + return -EBUSY; + + mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES); + + pages = __i915_gem_object_unset_pages(obj); + if (!IS_ERR_OR_NULL(pages)) + i915_gem_userptr_put_pages(obj, pages); + + if (get_pages) + err = ____i915_gem_object_get_pages(obj); + mutex_unlock(&obj->mm.lock); + + return err; +} + +int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + const unsigned long num_pages = obj->base.size >> PAGE_SHIFT; + struct page **pvec; + unsigned int gup_flags = 0; + unsigned long notifier_seq; + int pinned, ret; + + if (obj->userptr.notifier.mm != current->mm) + return -EFAULT; + + ret = i915_gem_object_lock_interruptible(obj, NULL); + if (ret) + return ret; + + /* Make sure userptr is unbound for next attempt, so we don't use stale pages. */ + ret = i915_gem_object_userptr_unbind(obj, false); + i915_gem_object_unlock(obj); + if (ret) + return ret; + + notifier_seq = mmu_interval_read_begin(&obj->userptr.notifier); + + pvec = kvmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL); + if (!pvec) + return -ENOMEM; + + if (!i915_gem_object_is_readonly(obj)) + gup_flags |= FOLL_WRITE; + + pinned = ret = 0; + while (pinned < num_pages) { + ret = pin_user_pages_fast(obj->userptr.ptr + pinned * PAGE_SIZE, + num_pages - pinned, gup_flags, + &pvec[pinned]); + if (ret < 0) + goto out; + + pinned += ret; + } + ret = 0; + + spin_lock(&i915->mm.notifier_lock); + + if (mmu_interval_read_retry(&obj->userptr.notifier, + !obj->userptr.page_ref ? notifier_seq : + obj->userptr.notifier_seq)) { + ret = -EAGAIN; + goto out_unlock; + } + + if (!obj->userptr.page_ref++) { + obj->userptr.pvec = pvec; + obj->userptr.notifier_seq = notifier_seq; + + pvec = NULL; + } + +out_unlock: + spin_unlock(&i915->mm.notifier_lock); + +out: + if (pvec) { + unpin_user_pages(pvec, pinned); + kvfree(pvec); + } + + return ret; +} + +int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object *obj) +{ + if (mmu_interval_read_retry(&obj->userptr.notifier, + obj->userptr.notifier_seq)) { + /* We collided with the mmu notifier, need to retry */ + + return -EAGAIN; + } + + return 0; +} + +void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj) +{ + i915_gem_object_userptr_drop_ref(obj); }
static void i915_gem_userptr_release(struct drm_i915_gem_object *obj) { - i915_gem_userptr_release__mmu_notifier(obj); - i915_gem_userptr_release__mm_struct(obj); + GEM_WARN_ON(obj->userptr.page_ref); + + mmu_interval_notifier_remove(&obj->userptr.notifier); + obj->userptr.notifier.mm = NULL; }
static int @@ -686,7 +387,6 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { .name = "i915_gem_object_userptr", .flags = I915_GEM_OBJECT_IS_SHRINKABLE | I915_GEM_OBJECT_NO_MMAP | - I915_GEM_OBJECT_ASYNC_CANCEL | I915_GEM_OBJECT_IS_PROXY, .get_pages = i915_gem_userptr_get_pages, .put_pages = i915_gem_userptr_put_pages, @@ -807,6 +507,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev, i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
obj->userptr.ptr = args->user_ptr; + obj->userptr.notifier_seq = ULONG_MAX; if (args->flags & I915_USERPTR_READ_ONLY) i915_gem_object_set_readonly(obj);
@@ -814,9 +515,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev, * at binding. This means that we need to hook into the mmu_notifier * in order to detect if the mmu is destroyed. */ - ret = i915_gem_userptr_init__mm_struct(obj); - if (ret == 0) - ret = i915_gem_userptr_init__mmu_notifier(obj); + ret = i915_gem_userptr_init__mmu_notifier(obj); if (ret == 0) ret = drm_gem_handle_create(file, &obj->base, &handle);
@@ -835,15 +534,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev, int i915_gem_init_userptr(struct drm_i915_private *dev_priv) { #ifdef CONFIG_MMU_NOTIFIER - spin_lock_init(&dev_priv->mm_lock); - hash_init(dev_priv->mm_structs); - - dev_priv->mm.userptr_wq = - alloc_workqueue("i915-userptr-acquire", - WQ_HIGHPRI | WQ_UNBOUND, - 0); - if (!dev_priv->mm.userptr_wq) - return -ENOMEM; + spin_lock_init(&dev_priv->mm.notifier_lock); #endif
return 0; @@ -851,7 +542,4 @@ int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv) { -#ifdef CONFIG_MMU_NOTIFIER - destroy_workqueue(dev_priv->mm.userptr_wq); -#endif } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d3c67e17cd02..ce8d5ff8b9f4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -591,11 +591,10 @@ struct i915_gem_mm {
#ifdef CONFIG_MMU_NOTIFIER /** - * Workqueue to fault in userptr pages, flushed by the execbuf - * when required but otherwise left to userspace to try again - * on EAGAIN. + * notifier_lock for mmu notifiers, memory may not be allocated + * while holding this lock. */ - struct workqueue_struct *userptr_wq; + spinlock_t notifier_lock; #endif
/* shrinker accounting, also useful for userland debugging */ @@ -978,8 +977,6 @@ struct drm_i915_private { struct i915_ggtt ggtt; /* VM representing the global address space */
struct i915_gem_mm mm; - DECLARE_HASHTABLE(mm_structs, 7); - spinlock_t mm_lock;
/* Kernel Modesetting */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b03e245640c0..0b9eab66511c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1158,10 +1158,8 @@ int i915_gem_init(struct drm_i915_private *dev_priv) err_unlock: i915_gem_drain_workqueue(dev_priv);
- if (ret != -EIO) { + if (ret != -EIO) intel_uc_cleanup_firmwares(&dev_priv->gt.uc); - i915_gem_cleanup_userptr(dev_priv); - }
if (ret == -EIO) { /* @@ -1220,7 +1218,6 @@ void i915_gem_driver_release(struct drm_i915_private *dev_priv) intel_wa_list_free(&dev_priv->gt_wa_list);
intel_uc_cleanup_firmwares(&dev_priv->gt.uc); - i915_gem_cleanup_userptr(dev_priv);
i915_gem_drain_freed_objects(dev_priv);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
With userptr fixed, there is no need for all the separate lockdep classes now, and we can remove the lockdep tricks that were used. A trylock in the shrinker is all we need now to flatten the locking hierarchy.
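The shrinker side then reduces to a plain trylock; a minimal sketch (shrink_object() is hypothetical, __i915_gem_object_put_pages_locked() is introduced by this patch):

/*
 * Sketch of the flattened hierarchy: the shrinker never blocks on
 * obj->mm.lock, so no lockdep subclasses or taint tricks are needed.
 */
static unsigned long shrink_object(struct drm_i915_gem_object *obj)
{
	unsigned long freed = 0;

	if (!mutex_trylock(&obj->mm.lock))
		return 0; /* contended: skip, no inversion possible */

	if (!__i915_gem_object_put_pages_locked(obj))
		freed = obj->base.size >> PAGE_SHIFT;

	mutex_unlock(&obj->mm.lock);
	return freed;
}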
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 +--- drivers/gpu/drm/i915/gem/i915_gem_object.h | 20 ++---------- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 34 ++++++++++---------- drivers/gpu/drm/i915/gem/i915_gem_phys.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 10 +++--- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +- 6 files changed, 27 insertions(+), 47 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 1393988bd5af..028a556ab1a5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -62,7 +62,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops, struct lock_class_key *key, unsigned flags) { - __mutex_init(&obj->mm.lock, ops->name ?: "obj->mm.lock", key); + mutex_init(&obj->mm.lock);
spin_lock_init(&obj->vma.lock); INIT_LIST_HEAD(&obj->vma.list); @@ -86,10 +86,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, mutex_init(&obj->mm.get_page.lock); INIT_RADIX_TREE(&obj->mm.get_dma_page.radix, GFP_KERNEL | __GFP_NOWARN); mutex_init(&obj->mm.get_dma_page.lock); - - if (IS_ENABLED(CONFIG_LOCKDEP) && i915_gem_object_is_shrinkable(obj)) - i915_gem_shrinker_taints_mutex(to_i915(obj->base.dev), - &obj->mm.lock); }
/** diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 33412248f6df..1b85f51c6ddd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -339,27 +339,10 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj); int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
-enum i915_mm_subclass { /* lockdep subclass for obj->mm.lock/struct_mutex */ - I915_MM_NORMAL = 0, - /* - * Only used by struct_mutex, when called "recursively" from - * direct-reclaim-esque. Safe because there is only every one - * struct_mutex in the entire system. - */ - I915_MM_SHRINKER = 1, - /* - * Used for obj->mm.lock when allocating pages. Safe because the object - * isn't yet on any LRU, and therefore the shrinker can't deadlock on - * it. As soon as the object has pages, obj->mm.lock nests within - * fs_reclaim. - */ - I915_MM_GET_PAGES = 1, -}; - static inline int __must_check i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) { - might_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES); + might_lock(&obj->mm.lock);
if (atomic_inc_not_zero(&obj->mm.pages_pin_count)) return 0; @@ -403,6 +386,7 @@ i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) }
int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj); +int __i915_gem_object_put_pages_locked(struct drm_i915_gem_object *obj); void i915_gem_object_truncate(struct drm_i915_gem_object *obj); void i915_gem_object_writeback(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 60149cad6080..5bcd21a8fc4e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -111,7 +111,7 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj) { int err;
- err = mutex_lock_interruptible_nested(&obj->mm.lock, I915_MM_GET_PAGES); + err = mutex_lock_interruptible(&obj->mm.lock); if (err) return err;
@@ -193,21 +193,13 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) return pages; }
-int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) +int __i915_gem_object_put_pages_locked(struct drm_i915_gem_object *obj) { struct sg_table *pages; - int err;
if (i915_gem_object_has_pinned_pages(obj)) return -EBUSY;
- /* May be called by shrinker from within get_pages() (on another bo) */ - mutex_lock(&obj->mm.lock); - if (unlikely(atomic_read(&obj->mm.pages_pin_count))) { - err = -EBUSY; - goto unlock; - } - i915_gem_object_release_mmap_offset(obj);
/* @@ -223,14 +215,22 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) * get_pages backends we should be better able to handle the * cancellation of the async task in a more uniform manner. */ - if (!pages) - pages = ERR_PTR(-EINVAL); - - if (!IS_ERR(pages)) + if (!IS_ERR_OR_NULL(pages)) obj->ops->put_pages(obj, pages);
- err = 0; -unlock: + return 0; +} + +int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) +{ + int err; + + if (i915_gem_object_has_pinned_pages(obj)) + return -EBUSY; + + /* May be called by shrinker from within get_pages() (on another bo) */ + mutex_lock(&obj->mm.lock); + err = __i915_gem_object_put_pages_locked(obj); mutex_unlock(&obj->mm.lock);
return err; @@ -336,7 +336,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM)) return ERR_PTR(-ENXIO);
- err = mutex_lock_interruptible_nested(&obj->mm.lock, I915_MM_GET_PAGES); + err = mutex_lock_interruptible(&obj->mm.lock); if (err) return ERR_PTR(err);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 144e4940eede..0d176bf06405 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -236,7 +236,7 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) if (err) return err;
- err = mutex_lock_interruptible_nested(&obj->mm.lock, I915_MM_GET_PAGES); + err = mutex_lock_interruptible(&obj->mm.lock); if (err) goto err_unlock;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index dc8f052a0ffe..afc6e5b4dcf1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -48,9 +48,9 @@ static bool unsafe_drop_pages(struct drm_i915_gem_object *obj, flags = I915_GEM_OBJECT_UNBIND_TEST;
if (i915_gem_object_unbind(obj, flags) == 0) - __i915_gem_object_put_pages(obj); + return true;
- return !i915_gem_object_has_pages(obj); + return false; }
static void try_to_writeback(struct drm_i915_gem_object *obj, @@ -199,10 +199,10 @@ i915_gem_shrink(struct drm_i915_private *i915,
spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
- if (unsafe_drop_pages(obj, shrink)) { + if (unsafe_drop_pages(obj, shrink) && + mutex_trylock(&obj->mm.lock)) { /* May arrive from get_pages on another bo */ - mutex_lock(&obj->mm.lock); - if (!i915_gem_object_has_pages(obj)) { + if (!__i915_gem_object_put_pages_locked(obj)) { try_to_writeback(obj, shrink); count += obj->base.size >> PAGE_SHIFT; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 9ea9aa65ade1..0cab9da6669e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -247,7 +247,7 @@ static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj, bool if (GEM_WARN_ON(i915_gem_object_has_pinned_pages(obj))) return -EBUSY;
- mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES); + mutex_lock(&obj->mm.lock);
pages = __i915_gem_object_unset_pages(obj); if (!IS_ERR_OR_NULL(pages))
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
This allows us to remove pin_map from state allocation, which saves us a few retry loops. We won't need this until first pin, anyway.
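Concretely, the one-time setup is gated behind an atomic test-and-set of the new CONTEXT_INIT_BIT flag in pre-pin, so only the first pin pays for populating the image; schematically (taken from the pre-pin hunk below):

        *vaddr = i915_gem_object_pin_map(ce->state->obj,
                                         i915_coherent_map_type(ce->engine->i915) |
                                         I915_MAP_OVERRIDE);
        if (IS_ERR(*vaddr))
                return PTR_ERR(*vaddr);

        /* Populate the context image exactly once, on first pin */
        if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags))
                populate_lr_context(ce, engine, *vaddr);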
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_context_types.h | 13 ++- .../drm/i915/gt/intel_execlists_submission.c | 107 +++++++++--------- 2 files changed, 62 insertions(+), 58 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 52fa9c132746..a593c98398a7 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -81,12 +81,13 @@ struct intel_context { unsigned long flags; #define CONTEXT_BARRIER_BIT 0 #define CONTEXT_ALLOC_BIT 1 -#define CONTEXT_VALID_BIT 2 -#define CONTEXT_CLOSED_BIT 3 -#define CONTEXT_USE_SEMAPHORES 4 -#define CONTEXT_BANNED 5 -#define CONTEXT_FORCE_SINGLE_SUBMISSION 6 -#define CONTEXT_NOPREEMPT 7 +#define CONTEXT_INIT_BIT 2 +#define CONTEXT_VALID_BIT 3 +#define CONTEXT_CLOSED_BIT 4 +#define CONTEXT_USE_SEMAPHORES 5 +#define CONTEXT_BANNED 6 +#define CONTEXT_FORCE_SINGLE_SUBMISSION 7 +#define CONTEXT_NOPREEMPT 8
u32 *lrc_reg_state; union { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 1cc93ea6b7f0..7eec42b27bc1 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3497,9 +3497,39 @@ __execlists_update_reg_state(const struct intel_context *ce, } }
+static void populate_lr_context(struct intel_context *ce, + struct intel_engine_cs *engine, + void *vaddr) +{ + bool inhibit = true; + struct drm_i915_gem_object *ctx_obj = ce->state->obj; + + set_redzone(vaddr, engine); + + if (engine->default_state) { + shmem_read(engine->default_state, 0, + vaddr, engine->context_size); + __set_bit(CONTEXT_VALID_BIT, &ce->flags); + inhibit = false; + } + + /* Clear the ppHWSP (inc. per-context counters) */ + memset(vaddr, 0, PAGE_SIZE); + + /* + * The second page of the context object contains some registers which + * must be set up prior to the first execution. + */ + execlists_init_reg_state(vaddr + LRC_STATE_OFFSET, + ce, engine, ce->ring, inhibit); + + __i915_gem_object_flush_map(ctx_obj, 0, engine->context_size); +} + static int -execlists_context_pre_pin(struct intel_context *ce, - struct i915_gem_ww_ctx *ww, void **vaddr) +__execlists_context_pre_pin(struct intel_context *ce, + struct intel_engine_cs *engine, + struct i915_gem_ww_ctx *ww, void **vaddr) { GEM_BUG_ON(!ce->state); GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); @@ -3507,8 +3537,20 @@ execlists_context_pre_pin(struct intel_context *ce, *vaddr = i915_gem_object_pin_map(ce->state->obj, i915_coherent_map_type(ce->engine->i915) | I915_MAP_OVERRIDE); + if (IS_ERR(*vaddr)) + return PTR_ERR(*vaddr); + + if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) + populate_lr_context(ce, engine, *vaddr); + + return 0; +}
- return PTR_ERR_OR_ZERO(*vaddr); +static int +execlists_context_pre_pin(struct intel_context *ce, + struct i915_gem_ww_ctx *ww, void **vaddr) +{ + return __execlists_context_pre_pin(ce, ce->engine, ww, vaddr); }
static int @@ -4610,45 +4652,6 @@ static void execlists_init_reg_state(u32 *regs, __reset_stop_ring(regs, engine); }
-static int -populate_lr_context(struct intel_context *ce, - struct drm_i915_gem_object *ctx_obj, - struct intel_engine_cs *engine, - struct intel_ring *ring) -{ - bool inhibit = true; - void *vaddr; - - vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB); - if (IS_ERR(vaddr)) { - drm_dbg(&engine->i915->drm, "Could not map object pages!\n"); - return PTR_ERR(vaddr); - } - - set_redzone(vaddr, engine); - - if (engine->default_state) { - shmem_read(engine->default_state, 0, - vaddr, engine->context_size); - __set_bit(CONTEXT_VALID_BIT, &ce->flags); - inhibit = false; - } - - /* Clear the ppHWSP (inc. per-context counters) */ - memset(vaddr, 0, PAGE_SIZE); - - /* - * The second page of the context object contains some registers which - * must be set up prior to the first execution. - */ - execlists_init_reg_state(vaddr + LRC_STATE_OFFSET, - ce, engine, ring, inhibit); - - __i915_gem_object_flush_map(ctx_obj, 0, engine->context_size); - i915_gem_object_unpin_map(ctx_obj); - return 0; -} - static struct intel_timeline *pinned_timeline(struct intel_context *ce) { struct intel_timeline *tl = fetch_and_zero(&ce->timeline); @@ -4712,20 +4715,11 @@ static int __execlists_context_alloc(struct intel_context *ce, goto error_deref_obj; }
- ret = populate_lr_context(ce, ctx_obj, engine, ring); - if (ret) { - drm_dbg(&engine->i915->drm, - "Failed to populate LRC: %d\n", ret); - goto error_ring_free; - } - ce->ring = ring; ce->state = vma;
return 0;
-error_ring_free: - intel_ring_put(ring); error_deref_obj: i915_gem_object_put(ctx_obj); return ret; @@ -4849,6 +4843,15 @@ static int virtual_context_alloc(struct intel_context *ce) return __execlists_context_alloc(ce, ve->siblings[0]); }
+static int +virtual_context_pre_pin(struct intel_context *ce, + struct i915_gem_ww_ctx *ww, void **vaddr) +{ + struct virtual_engine *ve = container_of(ce, typeof(*ve), context); + + return __execlists_context_pre_pin(ce, ve->siblings[0], ww, vaddr); +} + static int virtual_context_pin(struct intel_context *ce, void *vaddr) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); @@ -4882,7 +4885,7 @@ static void virtual_context_exit(struct intel_context *ce) static const struct intel_context_ops virtual_context_ops = { .alloc = virtual_context_alloc,
- .pre_pin = execlists_context_pre_pin, + .pre_pin = virtual_context_pre_pin, .pin = virtual_context_pin, .unpin = execlists_context_unpin, .post_unpin = execlists_context_post_unpin,
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We map the initial context during first pin.
This allows us to remove pin_map from state allocation, which saves us a few retry loops. We won't need this until first pin, anyway.
intel_ring_submission_setup() is also reworked slightly to do all pinning in a single ww loop.
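The loop follows the ww pattern used throughout this series: take every object lock up front, pin under those locks, and back off and retry on -EDEADLK. A minimal sketch of the shape, where obj_a, obj_b and pin_all() are placeholders rather than real symbols:

        struct i915_gem_ww_ctx ww;
        int err;

        i915_gem_ww_ctx_init(&ww, false);
retry:
        err = i915_gem_object_lock(obj_a, &ww);
        if (!err)
                err = i915_gem_object_lock(obj_b, &ww);
        if (!err)
                err = pin_all(); /* e.g. intel_timeline_pin(timeline, &ww) */
        if (err == -EDEADLK) {
                err = i915_gem_ww_ctx_backoff(&ww);
                if (!err)
                        goto retry;
        }
        i915_gem_ww_ctx_fini(&ww);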
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@oracle.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../gpu/drm/i915/gt/intel_ring_submission.c | 184 +++++++++++------- 1 file changed, 118 insertions(+), 66 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index a41b43f445b8..6b280904db43 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -478,6 +478,26 @@ static void ring_context_destroy(struct kref *ref) intel_context_free(ce); }
+static int ring_context_init_default_state(struct intel_context *ce, + struct i915_gem_ww_ctx *ww) +{ + struct drm_i915_gem_object *obj = ce->state->obj; + void *vaddr; + + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr); + + shmem_read(ce->engine->default_state, 0, + vaddr, ce->engine->context_size); + + i915_gem_object_flush_map(obj); + __i915_gem_object_release_map(obj); + + __set_bit(CONTEXT_VALID_BIT, &ce->flags); + return 0; +} + static int ring_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **unused) @@ -485,6 +505,13 @@ static int ring_context_pre_pin(struct intel_context *ce, struct i915_address_space *vm; int err = 0;
+ if (ce->engine->default_state && + !test_bit(CONTEXT_VALID_BIT, &ce->flags)) { + err = ring_context_init_default_state(ce, ww); + if (err) + return err; + } + vm = vm_alias(ce->vm); if (vm) err = gen6_ppgtt_pin(i915_vm_to_ppgtt((vm)), ww); @@ -540,22 +567,6 @@ alloc_context_vma(struct intel_engine_cs *engine) if (IS_IVYBRIDGE(i915)) i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
- if (engine->default_state) { - void *vaddr; - - vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); - if (IS_ERR(vaddr)) { - err = PTR_ERR(vaddr); - goto err_obj; - } - - shmem_read(engine->default_state, 0, - vaddr, engine->context_size); - - i915_gem_object_flush_map(obj); - __i915_gem_object_release_map(obj); - } - vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); @@ -587,8 +598,6 @@ static int ring_context_alloc(struct intel_context *ce) return PTR_ERR(vma);
ce->state = vma; - if (engine->default_state) - __set_bit(CONTEXT_VALID_BIT, &ce->flags); }
return 0; @@ -1184,37 +1193,15 @@ static int gen7_ctx_switch_bb_setup(struct intel_engine_cs * const engine, return gen7_setup_clear_gpr_bb(engine, vma); }
-static int gen7_ctx_switch_bb_init(struct intel_engine_cs *engine) +static int gen7_ctx_switch_bb_init(struct intel_engine_cs *engine, + struct i915_gem_ww_ctx *ww, + struct i915_vma *vma) { - struct drm_i915_gem_object *obj; - struct i915_vma *vma; - int size; int err;
- size = gen7_ctx_switch_bb_setup(engine, NULL /* probe size */); - if (size <= 0) - return size; - - size = ALIGN(size, PAGE_SIZE); - obj = i915_gem_object_create_internal(engine->i915, size); - if (IS_ERR(obj)) - return PTR_ERR(obj); - - vma = i915_vma_instance(obj, engine->gt->vm, NULL); - if (IS_ERR(vma)) { - err = PTR_ERR(vma); - goto err_obj; - } - - vma->private = intel_context_create(engine); /* dummy residuals */ - if (IS_ERR(vma->private)) { - err = PTR_ERR(vma->private); - goto err_obj; - } - - err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH); + err = i915_vma_pin_ww(vma, ww, 0, 0, PIN_USER | PIN_HIGH); if (err) - goto err_private; + return err;
err = i915_vma_sync(vma); if (err) @@ -1229,17 +1216,53 @@ static int gen7_ctx_switch_bb_init(struct intel_engine_cs *engine)
err_unpin: i915_vma_unpin(vma); -err_private: - intel_context_put(vma->private); -err_obj: - i915_gem_object_put(obj); return err; }
+static struct i915_vma *gen7_ctx_vma(struct intel_engine_cs *engine) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int size, err; + + if (!IS_HASWELL(engine->i915) || engine->class != RENDER_CLASS) + return 0; + + err = gen7_ctx_switch_bb_setup(engine, NULL /* probe size */); + if (err < 0) + return ERR_PTR(err); + if (!err) + return NULL; + + size = ALIGN(err, PAGE_SIZE); + + obj = i915_gem_object_create_internal(engine->i915, size); + if (IS_ERR(obj)) + return ERR_CAST(obj); + + vma = i915_vma_instance(obj, engine->gt->vm, NULL); + if (IS_ERR(vma)) { + i915_gem_object_put(obj); + return ERR_CAST(vma); + } + + vma->private = intel_context_create(engine); /* dummy residuals */ + if (IS_ERR(vma->private)) { + err = PTR_ERR(vma->private); + vma->private = NULL; + i915_gem_object_put(obj); + return ERR_PTR(err); + } + + return vma; +} + int intel_ring_submission_setup(struct intel_engine_cs *engine) { + struct i915_gem_ww_ctx ww; struct intel_timeline *timeline; struct intel_ring *ring; + struct i915_vma *gen7_wa_vma; int err;
setup_common(engine); @@ -1270,43 +1293,72 @@ int intel_ring_submission_setup(struct intel_engine_cs *engine) } GEM_BUG_ON(timeline->has_initial_breadcrumb);
- err = intel_timeline_pin(timeline, NULL); - if (err) - goto err_timeline; - ring = intel_engine_create_ring(engine, SZ_16K); if (IS_ERR(ring)) { err = PTR_ERR(ring); - goto err_timeline_unpin; + goto err_timeline; }
- err = intel_ring_pin(ring, NULL); - if (err) - goto err_ring; - GEM_BUG_ON(engine->legacy.ring); engine->legacy.ring = ring; engine->legacy.timeline = timeline;
- GEM_BUG_ON(timeline->hwsp_ggtt != engine->status_page.vma); + gen7_wa_vma = gen7_ctx_vma(engine); + if (IS_ERR(gen7_wa_vma)) { + err = PTR_ERR(gen7_wa_vma); + goto err_ring; + }
- if (IS_HASWELL(engine->i915) && engine->class == RENDER_CLASS) { - err = gen7_ctx_switch_bb_init(engine); + i915_gem_ww_ctx_init(&ww, false); + +retry: + err = i915_gem_object_lock(timeline->hwsp_ggtt->obj, &ww); + if (!err && gen7_wa_vma) + err = i915_gem_object_lock(gen7_wa_vma->obj, &ww); + if (!err && engine->legacy.ring->vma->obj) + err = i915_gem_object_lock(engine->legacy.ring->vma->obj, &ww); + if (!err) + err = intel_timeline_pin(timeline, &ww); + if (!err) { + err = intel_ring_pin(ring, &ww); if (err) - goto err_ring_unpin; + intel_timeline_unpin(timeline); } + if (err) + goto out; + + GEM_BUG_ON(timeline->hwsp_ggtt != engine->status_page.vma); + + if (gen7_wa_vma) { + err = gen7_ctx_switch_bb_init(engine, &ww, gen7_wa_vma); + if (err) { + intel_ring_unpin(ring); + intel_timeline_unpin(timeline); + } + } + +out: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + if (err) + goto err_gen7_put;
/* Finally, take ownership and responsibility for cleanup! */ engine->release = ring_release;
return 0;
-err_ring_unpin: - intel_ring_unpin(ring); +err_gen7_put: + if (gen7_wa_vma) { + intel_context_put(gen7_wa_vma->private); + i915_gem_object_put(gen7_wa_vma->obj); + } err_ring: intel_ring_put(ring); -err_timeline_unpin: - intel_timeline_unpin(timeline); err_timeline: intel_timeline_put(timeline); err:
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Try to pin to ggtt first, and use a full ww loop to handle eviction correctly.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 37 +++++++++++++++-------- 1 file changed, 24 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 97ceaf7116e8..420c6a35f3ed 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -618,6 +618,7 @@ static void cleanup_status_page(struct intel_engine_cs *engine) }
static int pin_ggtt_status_page(struct intel_engine_cs *engine, + struct i915_gem_ww_ctx *ww, struct i915_vma *vma) { unsigned int flags; @@ -638,12 +639,13 @@ static int pin_ggtt_status_page(struct intel_engine_cs *engine, else flags = PIN_HIGH;
- return i915_ggtt_pin(vma, NULL, 0, flags); + return i915_ggtt_pin(vma, ww, 0, flags); }
static int init_status_page(struct intel_engine_cs *engine) { struct drm_i915_gem_object *obj; + struct i915_gem_ww_ctx ww; struct i915_vma *vma; void *vaddr; int ret; @@ -667,30 +669,39 @@ static int init_status_page(struct intel_engine_cs *engine) vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma); - goto err; + goto err_put; }
+ i915_gem_ww_ctx_init(&ww, true); +retry: + ret = i915_gem_object_lock(obj, &ww); + if (!ret && !HWS_NEEDS_PHYSICAL(engine->i915)) + ret = pin_ggtt_status_page(engine, &ww, vma); + if (ret) + goto err; + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); - goto err; + goto err_unpin; }
engine->status_page.addr = memset(vaddr, 0, PAGE_SIZE); engine->status_page.vma = vma;
- if (!HWS_NEEDS_PHYSICAL(engine->i915)) { - ret = pin_ggtt_status_page(engine, vma); - if (ret) - goto err_unpin; - } - - return 0; - err_unpin: - i915_gem_object_unpin_map(obj); + if (ret) + i915_vma_unpin(vma); err: - i915_gem_object_put(obj); + if (ret == -EDEADLK) { + ret = i915_gem_ww_ctx_backoff(&ww); + if (!ret) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); +err_put: + if (ret) + i915_gem_object_put(obj); return ret; }
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Pin in the caller, not in the work itself. This should also work better for dma-fence annotations.
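The pin now spans the lifetime of the fence work: taken once before the work is armed, dropped only when the fence is released. In outline (function names from the hunks below):

        /*
         * clflush_work_create(): __i915_gem_object_get_pages() pins before
         *                        the dma_fence_work is armed;
         * clflush_work():        __do_clflush() runs with pages held;
         * clflush_release():     i915_gem_object_unpin_pages() when the
         *                        fence is freed, alongside the final put.
         */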
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index bc0223716906..daf9284ef1f5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -27,15 +27,8 @@ static void __do_clflush(struct drm_i915_gem_object *obj) static int clflush_work(struct dma_fence_work *base) { struct clflush *clflush = container_of(base, typeof(*clflush), base); - struct drm_i915_gem_object *obj = clflush->obj; - int err;
- err = i915_gem_object_pin_pages(obj); - if (err) - return err; - - __do_clflush(obj); - i915_gem_object_unpin_pages(obj); + __do_clflush(clflush->obj);
return 0; } @@ -44,6 +37,7 @@ static void clflush_release(struct dma_fence_work *base) { struct clflush *clflush = container_of(base, typeof(*clflush), base);
+ i915_gem_object_unpin_pages(clflush->obj); i915_gem_object_put(clflush->obj); }
@@ -63,6 +57,11 @@ static struct clflush *clflush_work_create(struct drm_i915_gem_object *obj) if (!clflush) return NULL;
+ if (__i915_gem_object_get_pages(obj) < 0) { + kfree(clflush); + return NULL; + } + dma_fence_work_init(&clflush->base, &clflush_ops); clflush->obj = i915_gem_object_get(obj); /* obj <-> clflush cycle */
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Instead of taking the object lock multiple times, lock it once and perform the ww dance around attach_phys and pin_pages.
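With this, i915_gem_object_attach_phys() asserts that the caller holds the object lock, so the minimal call sequence becomes what the selftest hunk at the end of this patch shows:

        i915_gem_object_lock(obj, NULL);
        err = i915_gem_object_attach_phys(obj, PAGE_SIZE);
        i915_gem_object_unlock(obj);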
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 69 ++++++++++++------- drivers/gpu/drm/i915/display/intel_display.h | 2 +- drivers/gpu/drm/i915/display/intel_fbdev.c | 2 +- drivers/gpu/drm/i915/display/intel_overlay.c | 34 +++++++-- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 30 ++------ drivers/gpu/drm/i915/gem/i915_gem_object.h | 1 + drivers/gpu/drm/i915/gem/i915_gem_phys.c | 10 +-- .../drm/i915/gem/selftests/i915_gem_phys.c | 2 + 8 files changed, 86 insertions(+), 64 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index f36921a3c4bc..8a7945f55278 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -2232,6 +2232,7 @@ static bool intel_plane_uses_fence(const struct intel_plane_state *plane_state)
struct i915_vma * intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, + bool phys_cursor, const struct i915_ggtt_view *view, bool uses_fence, unsigned long *out_flags) @@ -2240,14 +2241,19 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, struct drm_i915_private *dev_priv = to_i915(dev); struct drm_i915_gem_object *obj = intel_fb_obj(fb); intel_wakeref_t wakeref; + struct i915_gem_ww_ctx ww; struct i915_vma *vma; unsigned int pinctl; u32 alignment; + int ret;
if (drm_WARN_ON(dev, !i915_gem_object_is_framebuffer(obj))) return ERR_PTR(-EINVAL);
- alignment = intel_surf_alignment(fb, 0); + if (phys_cursor) + alignment = intel_cursor_alignment(dev_priv); + else + alignment = intel_surf_alignment(fb, 0); if (drm_WARN_ON(dev, alignment && !is_power_of_2(alignment))) return ERR_PTR(-EINVAL);
@@ -2282,14 +2288,26 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, if (HAS_GMCH(dev_priv)) pinctl |= PIN_MAPPABLE;
- vma = i915_gem_object_pin_to_display_plane(obj, - alignment, view, pinctl); - if (IS_ERR(vma)) + i915_gem_ww_ctx_init(&ww, true); +retry: + ret = i915_gem_object_lock(obj, &ww); + if (!ret && phys_cursor) + ret = i915_gem_object_attach_phys(obj, alignment); + if (!ret) + ret = i915_gem_object_pin_pages(obj); + if (ret) goto err;
- if (uses_fence && i915_vma_is_map_and_fenceable(vma)) { - int ret; + if (!ret) { + vma = i915_gem_object_pin_to_display_plane(obj, &ww, alignment, + view, pinctl); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); + goto err_unpin; + } + }
+ if (uses_fence && i915_vma_is_map_and_fenceable(vma)) { /* * Install a fence for tiled scan-out. Pre-i965 always needs a * fence, whereas 965+ only requires a fence if using @@ -2310,16 +2328,28 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, ret = i915_vma_pin_fence(vma); if (ret != 0 && INTEL_GEN(dev_priv) < 4) { i915_gem_object_unpin_from_display_plane(vma); - vma = ERR_PTR(ret); - goto err; + goto err_unpin; } + ret = 0;
- if (ret == 0 && vma->fence) + if (vma->fence) *out_flags |= PLANE_HAS_FENCE; }
i915_vma_get(vma); + +err_unpin: + i915_gem_object_unpin_pages(obj); err: + if (ret == -EDEADLK) { + ret = i915_gem_ww_ctx_backoff(&ww); + if (!ret) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + if (ret) + vma = ERR_PTR(ret); + atomic_dec(&dev_priv->gpu_error.pending_fb_pin); intel_runtime_pm_put(&dev_priv->runtime_pm, wakeref); return vma; @@ -16626,19 +16656,11 @@ static int intel_plane_pin_fb(struct intel_plane_state *plane_state) struct drm_i915_private *dev_priv = to_i915(plane->base.dev); struct drm_framebuffer *fb = plane_state->hw.fb; struct i915_vma *vma; + bool phys_cursor = + plane->id == PLANE_CURSOR && + INTEL_INFO(dev_priv)->display.cursor_needs_physical;
- if (plane->id == PLANE_CURSOR && - INTEL_INFO(dev_priv)->display.cursor_needs_physical) { - struct drm_i915_gem_object *obj = intel_fb_obj(fb); - const int align = intel_cursor_alignment(dev_priv); - int err; - - err = i915_gem_object_attach_phys(obj, align); - if (err) - return err; - } - - vma = intel_pin_and_fence_fb_obj(fb, + vma = intel_pin_and_fence_fb_obj(fb, phys_cursor, &plane_state->view, intel_plane_uses_fence(plane_state), &plane_state->flags); @@ -16734,13 +16756,8 @@ intel_prepare_plane_fb(struct drm_plane *_plane, if (!obj) return 0;
- ret = i915_gem_object_pin_pages(obj); - if (ret) - return ret;
ret = intel_plane_pin_fb(new_plane_state); - - i915_gem_object_unpin_pages(obj); if (ret) return ret;
diff --git a/drivers/gpu/drm/i915/display/intel_display.h b/drivers/gpu/drm/i915/display/intel_display.h index 5e0d42d82c11..5f5e632e216b 100644 --- a/drivers/gpu/drm/i915/display/intel_display.h +++ b/drivers/gpu/drm/i915/display/intel_display.h @@ -569,7 +569,7 @@ void intel_release_load_detect_pipe(struct drm_connector *connector, struct intel_load_detect_pipe *old, struct drm_modeset_acquire_ctx *ctx); struct i915_vma * -intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, +intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, bool phys_cursor, const struct i915_ggtt_view *view, bool uses_fence, unsigned long *out_flags); diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 842c04e63214..bdf44e923cc0 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -211,7 +211,7 @@ static int intelfb_create(struct drm_fb_helper *helper, * This also validates that any existing fb inherited from the * BIOS is suitable for own access. */ - vma = intel_pin_and_fence_fb_obj(&ifbdev->fb->base, + vma = intel_pin_and_fence_fb_obj(&ifbdev->fb->base, false, &view, false, &flags); if (IS_ERR(vma)) { ret = PTR_ERR(vma); diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c b/drivers/gpu/drm/i915/display/intel_overlay.c index 52b4f6193b4c..9cf634cc7084 100644 --- a/drivers/gpu/drm/i915/display/intel_overlay.c +++ b/drivers/gpu/drm/i915/display/intel_overlay.c @@ -755,6 +755,32 @@ static u32 overlay_cmd_reg(struct drm_intel_overlay_put_image *params) return cmd; }
+static struct i915_vma *intel_overlay_pin_fb(struct drm_i915_gem_object *new_bo) +{ + struct i915_gem_ww_ctx ww; + struct i915_vma *vma; + int ret; + + i915_gem_ww_ctx_init(&ww, true); +retry: + ret = i915_gem_object_lock(new_bo, &ww); + if (!ret) { + vma = i915_gem_object_pin_to_display_plane(new_bo, &ww, 0, + NULL, PIN_MAPPABLE); + ret = PTR_ERR_OR_ZERO(vma); + } + if (ret == -EDEADLK) { + ret = i915_gem_ww_ctx_backoff(&ww); + if (!ret) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + if (ret) + return ERR_PTR(ret); + + return vma; +} + static int intel_overlay_do_put_image(struct intel_overlay *overlay, struct drm_i915_gem_object *new_bo, struct drm_intel_overlay_put_image *params) @@ -776,12 +802,10 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
- vma = i915_gem_object_pin_to_display_plane(new_bo, - 0, NULL, PIN_MAPPABLE); - if (IS_ERR(vma)) { - ret = PTR_ERR(vma); + vma = intel_overlay_pin_fb(new_bo); + if (IS_ERR(vma)) goto out_pin_section; - } + i915_gem_object_flush_frontbuffer(new_bo, ORIGIN_DIRTYFB);
if (!overlay->active) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index c1d4bf62b3ea..51a33c4f61d0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -313,12 +313,12 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data, */ struct i915_vma * i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww, u32 alignment, const struct i915_ggtt_view *view, unsigned int flags) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_gem_ww_ctx ww; struct i915_vma *vma; int ret;
@@ -326,11 +326,6 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) return ERR_PTR(-EINVAL);
- i915_gem_ww_ctx_init(&ww, true); -retry: - ret = i915_gem_object_lock(obj, &ww); - if (ret) - goto err; /* * The display engine is not coherent with the LLC cache on gen6. As * a result, we make sure that the pinning that is about to occur is @@ -345,7 +340,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE); if (ret) - goto err; + return ERR_PTR(ret);
/* * As the user may map the buffer once pinned in the display plane @@ -358,32 +353,19 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, vma = ERR_PTR(-ENOSPC); if ((flags & PIN_MAPPABLE) == 0 && (!view || view->type == I915_GGTT_VIEW_NORMAL)) - vma = i915_gem_object_ggtt_pin_ww(obj, &ww, view, 0, alignment, + vma = i915_gem_object_ggtt_pin_ww(obj, ww, view, 0, alignment, flags | PIN_MAPPABLE | PIN_NONBLOCK); if (IS_ERR(vma) && vma != ERR_PTR(-EDEADLK)) - vma = i915_gem_object_ggtt_pin_ww(obj, &ww, view, 0, + vma = i915_gem_object_ggtt_pin_ww(obj, ww, view, 0, alignment, flags); - if (IS_ERR(vma)) { - ret = PTR_ERR(vma); - goto err; - } + if (IS_ERR(vma)) + return vma;
vma->display_alignment = max_t(u64, vma->display_alignment, alignment);
i915_gem_object_flush_if_display_locked(obj);
-err: - if (ret == -EDEADLK) { - ret = i915_gem_ww_ctx_backoff(&ww); - if (!ret) - goto retry; - } - i915_gem_ww_ctx_fini(&ww); - - if (ret) - return ERR_PTR(ret); - return vma; }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 1b85f51c6ddd..0fec91ad6f62 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -489,6 +489,7 @@ int __must_check i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write); struct i915_vma * __must_check i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww, u32 alignment, const struct i915_ggtt_view *view, unsigned int flags); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 0d176bf06405..f317be5f5e34 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -219,6 +219,8 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) { int err;
+ assert_object_held(obj); + if (align > obj->base.size) return -EINVAL;
@@ -232,13 +234,9 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) if (err) return err;
- err = i915_gem_object_lock_interruptible(obj, NULL); - if (err) - return err; - err = mutex_lock_interruptible(&obj->mm.lock); if (err) - goto err_unlock; + return err;
if (unlikely(!i915_gem_object_has_struct_page(obj))) goto out; @@ -269,8 +267,6 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align)
out: mutex_unlock(&obj->mm.lock); -err_unlock: - i915_gem_object_unlock(obj); return err; }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c index 0cfa082047fe..3a6ce87f8b52 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c @@ -31,7 +31,9 @@ static int mock_phys_object(void *arg) goto out_obj; }
+ i915_gem_object_lock(obj, NULL); err = i915_gem_object_attach_phys(obj, PAGE_SIZE); + i915_gem_object_unlock(obj); if (err) { pr_err("i915_gem_object_attach_phys failed, err=%d\n", err); goto out_obj;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Take a simple object lock so that the lock is held around (un)pin_pages, as needed.
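Note that a failed lock maps to VM_FAULT_NOPAGE, so a contended or signal-interrupted fault is simply retried rather than failed:

        if (i915_gem_object_lock_interruptible(obj, NULL))
                return VM_FAULT_NOPAGE; /* drop out and retry the fault */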
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index c0034d811e50..163208a6260d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -246,6 +246,9 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf) area->vm_flags & VM_WRITE)) return VM_FAULT_SIGBUS;
+ if (i915_gem_object_lock_interruptible(obj, NULL)) + return VM_FAULT_NOPAGE; + err = i915_gem_object_pin_pages(obj); if (err) goto out; @@ -269,6 +272,7 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf) i915_gem_object_unpin_pages(obj);
out: + i915_gem_object_unlock(obj); return i915_error_to_vmf_fault(err); }
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
This should be done as part of the ww loop, in order to remove an i915_vma_pin that needs the ww lock held.
Now only the i915_ggtt_pin() callers remain.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 24 ++++++++---------- .../gpu/drm/i915/gt/selftest_workarounds.c | 25 ++++++++++++++++--- 2 files changed, 32 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index a82554baa6ac..de50b7c47ea3 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2073,7 +2073,6 @@ create_scratch(struct i915_address_space *vm, int count) struct drm_i915_gem_object *obj; struct i915_vma *vma; unsigned int size; - int err;
size = round_up(count * sizeof(u32), PAGE_SIZE); obj = i915_gem_object_create_internal(vm->i915, size); @@ -2084,20 +2083,11 @@ create_scratch(struct i915_address_space *vm, int count)
vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) { - err = PTR_ERR(vma); - goto err_obj; + i915_gem_object_put(obj); + return vma; }
- err = i915_vma_pin(vma, 0, 0, - i915_vma_is_ggtt(vma) ? PIN_GLOBAL : PIN_USER); - if (err) - goto err_obj; - return vma; - -err_obj: - i915_gem_object_put(obj); - return ERR_PTR(err); }
struct mcr_range { @@ -2215,10 +2205,15 @@ static int engine_wa_list_verify(struct intel_context *ce, if (err) goto err_pm;
+ err = i915_vma_pin_ww(vma, &ww, 0, 0, + i915_vma_is_ggtt(vma) ? PIN_GLOBAL : PIN_USER); + if (err) + goto err_unpin; + rq = i915_request_create(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto err_unpin; + goto err_vma; }
err = i915_request_await_object(rq, vma->obj, true); @@ -2259,6 +2254,8 @@ static int engine_wa_list_verify(struct intel_context *ce,
err_rq: i915_request_put(rq); +err_vma: + i915_vma_unpin(vma); err_unpin: intel_context_unpin(ce); err_pm: @@ -2269,7 +2266,6 @@ static int engine_wa_list_verify(struct intel_context *ce, } i915_gem_ww_ctx_fini(&ww); intel_engine_pm_put(ce->engine); - i915_vma_unpin(vma); i915_vma_put(vma); return err; } diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c index 61a0532d0f3d..810ab026a55e 100644 --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c @@ -386,6 +386,25 @@ static struct i915_vma *create_batch(struct i915_address_space *vm) return ERR_PTR(err); }
+static struct i915_vma * +create_scratch_pinned(struct i915_address_space *vm, int count) +{ + struct i915_vma *vma = create_scratch(vm, count); + int err; + + if (IS_ERR(vma)) + return vma; + + err = i915_vma_pin(vma, 0, 0, + i915_vma_is_ggtt(vma) ? PIN_GLOBAL : PIN_USER); + if (err) { + i915_vma_put(vma); + return ERR_PTR(err); + } + + return vma; +} + static u32 reg_write(u32 old, u32 new, u32 rsvd) { if (rsvd == 0x0000ffff) { @@ -489,7 +508,7 @@ static int check_dirty_whitelist(struct intel_context *ce) int err = 0, i, v; u32 *cs, *results;
- scratch = create_scratch(ce->vm, 2 * ARRAY_SIZE(values) + 1); + scratch = create_scratch_pinned(ce->vm, 2 * ARRAY_SIZE(values) + 1); if (IS_ERR(scratch)) return PTR_ERR(scratch);
@@ -1043,7 +1062,7 @@ static int live_isolated_whitelist(void *arg)
vm = i915_gem_context_get_vm_rcu(c);
- client[i].scratch[0] = create_scratch(vm, 1024); + client[i].scratch[0] = create_scratch_pinned(vm, 1024); if (IS_ERR(client[i].scratch[0])) { err = PTR_ERR(client[i].scratch[0]); i915_vm_put(vm); @@ -1051,7 +1070,7 @@ static int live_isolated_whitelist(void *arg) goto err; }
- client[i].scratch[1] = create_scratch(vm, 1024); + client[i].scratch[1] = create_scratch_pinned(vm, 1024); if (IS_ERR(client[i].scratch[1])) { err = PTR_ERR(client[i].scratch[1]); i915_vma_unpin_and_release(&client[i].scratch[0], 0);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We previously complained when ww == NULL.
This function is now only used in selftests to pin an object, and ww locking is now fixed.
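Both call forms stay legal: callers inside a ww transaction keep using i915_vma_pin_ww(), while plain i915_vma_pin() now opens (and backs off) its own transaction internally, as the i915_vma.h hunk below shows:

        /* inside an existing ww loop */
        err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER);

        /* standalone, e.g. selftests: now locks vma->obj internally */
        err = i915_vma_pin(vma, 0, 0, PIN_USER);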
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../i915/gem/selftests/i915_gem_coherency.c | 14 +++++-------- drivers/gpu/drm/i915/i915_gem.c | 6 +++++- drivers/gpu/drm/i915/i915_vma.c | 3 +-- drivers/gpu/drm/i915/i915_vma.h | 20 +++++++++++++++---- 4 files changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c index 7049a6bbc03d..2e439bb269d6 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c @@ -199,16 +199,14 @@ static int gpu_set(struct context *ctx, unsigned long offset, u32 v) u32 *cs; int err;
+ vma = i915_gem_object_ggtt_pin(ctx->obj, NULL, 0, 0, 0); + if (IS_ERR(vma)) + return PTR_ERR(vma); + i915_gem_object_lock(ctx->obj, NULL); err = i915_gem_object_set_to_gtt_domain(ctx->obj, true); if (err) - goto out_unlock; - - vma = i915_gem_object_ggtt_pin(ctx->obj, NULL, 0, 0, 0); - if (IS_ERR(vma)) { - err = PTR_ERR(vma); - goto out_unlock; - } + goto out_unpin;
rq = intel_engine_create_kernel_request(ctx->engine); if (IS_ERR(rq)) { @@ -248,9 +246,7 @@ static int gpu_set(struct context *ctx, unsigned long offset, u32 v) i915_request_add(rq); out_unpin: i915_vma_unpin(vma); -out_unlock: i915_gem_object_unlock(ctx->obj); - return err; }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0b9eab66511c..b5311f7ad870 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1011,7 +1011,11 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj, return ERR_PTR(ret); }
- ret = i915_vma_pin_ww(vma, ww, size, alignment, flags | PIN_GLOBAL); + if (ww) + ret = i915_vma_pin_ww(vma, ww, size, alignment, flags | PIN_GLOBAL); + else + ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL); + if (ret) return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 5b1d78fa748e..63bdb0cc981e 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -868,8 +868,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, vma->obj && i915_gem_object_has_pinned_pages(vma->obj) && !vma->vm->allocate_va_range;
- if (lockdep_is_held(&vma->vm->i915->drm.struct_mutex) && - !pinned_bind_wo_alloc) + if (!pinned_bind_wo_alloc) WARN_ON(!ww); if (ww && vma->resv) assert_vma_held(vma); diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index a2e7b58b70ca..2db4f25b8d5f 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -246,10 +246,22 @@ i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, static inline int __must_check i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { -#ifdef CONFIG_LOCKDEP - WARN_ON_ONCE(vma->resv && dma_resv_held(vma->resv)); -#endif - return i915_vma_pin_ww(vma, NULL, size, alignment, flags); + struct i915_gem_ww_ctx ww; + int err; + + i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(vma->obj, &ww); + if (!err) + err = i915_vma_pin_ww(vma, &ww, size, alignment, flags); + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + + return err; }
int i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Make creation separate from pinning, in order to take the lock only once, and pin the mapping with the lock held.
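In outline, setup now splits into creation followed by a locked pin, with one teardown helper covering both failure modes (a sketch of the flow in the hunks below):

        ret = lrc_init_wa_ctx(engine); /* create the vma, unpinned */
        /* ww loop: i915_gem_object_lock(), then i915_ggtt_pin(vma, &ww, 0, PIN_HIGH) */
        /* error before setup completes: lrc_destroy_wa_ctx(engine, false), never pinned */
        /* intel_fini_workaround_bb():   lrc_destroy_wa_ctx(engine, true), unpin + put */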
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com --- .../drm/i915/gt/intel_engine_workaround_bb.c | 45 +++++++++++++++---- 1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c b/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c index b03bdfc92bb2..f3636b73cc10 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_workaround_bb.c @@ -229,7 +229,7 @@ gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
#define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE)
-static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) +static int lrc_init_wa_ctx(struct intel_engine_cs *engine) { struct drm_i915_gem_object *obj; struct i915_vma *vma; @@ -245,10 +245,6 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) goto err; }
- err = i915_ggtt_pin(vma, NULL, 0, PIN_HIGH); - if (err) - goto err; - engine->wa_ctx.vma = vma; return 0;
@@ -257,6 +253,18 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) return err; }
+static void lrc_destroy_wa_ctx(struct intel_engine_cs *engine, bool unpin) +{ + if (!engine->wa_ctx.vma) + return; + + if (unpin) + i915_vma_unpin(engine->wa_ctx.vma); + + i915_vma_put(engine->wa_ctx.vma); + engine->wa_ctx.vma = NULL; +} + typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch);
int intel_init_workaround_bb(struct intel_engine_cs *engine) @@ -266,6 +274,7 @@ int intel_init_workaround_bb(struct intel_engine_cs *engine) &wa_ctx->per_ctx }; wa_bb_func_t wa_bb_fn[2]; void *batch, *batch_ptr; + struct i915_gem_ww_ctx ww; unsigned int i; int ret;
@@ -293,13 +302,21 @@ int intel_init_workaround_bb(struct intel_engine_cs *engine) return 0; }
- ret = lrc_setup_wa_ctx(engine); + ret = lrc_init_wa_ctx(engine); if (ret) { drm_dbg(&engine->i915->drm, "Failed to setup context WA page: %d\n", ret); return ret; }
+ i915_gem_ww_ctx_init(&ww, true); +retry: + ret = i915_gem_object_lock(wa_ctx->vma->obj, &ww); + if (!ret) + ret = i915_ggtt_pin(wa_ctx->vma, &ww, 0, PIN_HIGH); + if (ret) + goto err; + batch = i915_gem_object_pin_map(wa_ctx->vma->obj, I915_MAP_WB);
/* @@ -323,13 +340,25 @@ int intel_init_workaround_bb(struct intel_engine_cs *engine)
__i915_gem_object_flush_map(wa_ctx->vma->obj, 0, batch_ptr - batch); __i915_gem_object_release_map(wa_ctx->vma->obj); + + if (ret) + i915_vma_unpin(wa_ctx->vma); + +err: + if (ret == -EDEADLK) { + ret = i915_gem_ww_ctx_backoff(&ww); + if (!ret) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); if (ret) - intel_fini_workaround_bb(engine); + lrc_destroy_wa_ctx(engine, false);
return ret; }
+ void intel_fini_workaround_bb(struct intel_engine_cs *engine) { - i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0); + lrc_destroy_wa_ctx(engine, true); }
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Take the ww lock around engine_unpark. Because of the many places where rpm is used, I chose the safest option and used a trylock to opportunistically take this lock for __engine_unpark.
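For the debug-only context poisoning, that means skipping the work entirely when the lock is contended:

        if (!i915_gem_object_trylock(ce->state->obj))
                return; /* contended: skip the opportunistic poison */
        /* ... pin_map(), memset(CONTEXT_REDZONE), flush, unpin ... */
        i915_gem_object_unlock(ce->state->obj);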
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 499b09cb4acf..5d51144ef074 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -27,12 +27,16 @@ static void dbg_poison_ce(struct intel_context *ce) int type = i915_coherent_map_type(ce->engine->i915); void *map;
+ if (!i915_gem_object_trylock(ce->state->obj)) + return; + map = i915_gem_object_pin_map(obj, type); if (!IS_ERR(map)) { memset(map, CONTEXT_REDZONE, obj->base.size); i915_gem_object_flush_map(obj); i915_gem_object_unpin_map(obj); } + i915_gem_object_unlock(obj); } }
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We need to lock the object to move it to the correct domain; add the missing lock.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 51a33c4f61d0..e62f9e8dd339 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -516,6 +516,10 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, goto out; }
+ err = i915_gem_object_lock_interruptible(obj, NULL); + if (err) + goto out; + /* * Flush and acquire obj->pages so that we are coherent through * direct access in memory with previous cached writes through @@ -527,7 +531,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, */ err = i915_gem_object_pin_pages(obj); if (err) - goto out; + goto out_unlock;
/* * Already in the desired write domain? Nothing for us to do! @@ -542,10 +546,6 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, if (READ_ONCE(obj->write_domain) == read_domains) goto out_unpin;
- err = i915_gem_object_lock_interruptible(obj, NULL); - if (err) - goto out_unpin; - if (read_domains & I915_GEM_DOMAIN_WC) err = i915_gem_object_set_to_wc_domain(obj, write_domain); else if (read_domains & I915_GEM_DOMAIN_GTT) @@ -556,13 +556,15 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, /* And bump the LRU for this access */ i915_gem_object_bump_inactive_ggtt(obj);
+out_unpin: + i915_gem_object_unpin_pages(obj); + +out_unlock: i915_gem_object_unlock(obj);
- if (write_domain) + if (!err && write_domain) i915_gem_object_invalidate_frontbuffer(obj, ORIGIN_CPU);
-out_unpin: - i915_gem_object_unpin_pages(obj); out: i915_gem_object_put(obj); return err;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We need to take the obj lock to pin pages, so wait until the callers have done so before making the object unshrinkable.
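Pinning therefore moves out of the i915_active callback into an explicit intel_gt_buffer_pool_mark_used(), which the caller invokes while it already holds the object lock via its ww context; the blt hunks below reduce to:

        /* object lock already held by the caller's ww transaction */
        intel_gt_buffer_pool_mark_used(pool); /* pin pages, hide from shrinker */
        cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_WC);

pool_retire() then drops the pin once the node idles.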
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 + .../gpu/drm/i915/gem/i915_gem_object_blt.c | 6 +++ .../gpu/drm/i915/gt/intel_gt_buffer_pool.c | 47 +++++++++---------- .../gpu/drm/i915/gt/intel_gt_buffer_pool.h | 5 ++ .../drm/i915/gt/intel_gt_buffer_pool_types.h | 1 + 5 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index f5ea49e244ca..91f0c3fd9a4b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1343,6 +1343,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, err = PTR_ERR(cmd); goto err_pool; } + intel_gt_buffer_pool_mark_used(pool);
batch = i915_vma_instance(pool->obj, vma->vm, NULL); if (IS_ERR(batch)) { @@ -2635,6 +2636,7 @@ static int eb_parse(struct i915_execbuffer *eb) err = PTR_ERR(shadow); goto err; } + intel_gt_buffer_pool_mark_used(pool); i915_gem_object_set_readonly(shadow->obj); shadow->private = pool;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c index aee7ad3cc3c6..e0b873c3f46a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c @@ -54,6 +54,9 @@ struct i915_vma *intel_emit_vma_fill_blt(struct intel_context *ce, if (unlikely(err)) goto out_put;
+ /* we pinned the pool, mark it as such */ + intel_gt_buffer_pool_mark_used(pool); + cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_WC); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); @@ -276,6 +279,9 @@ struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce, if (unlikely(err)) goto out_put;
+ /* we pinned the pool, mark it as such */ + intel_gt_buffer_pool_mark_used(pool); + cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_WC); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c index 104cb30e8c13..030759305196 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c @@ -98,28 +98,6 @@ static void pool_free_work(struct work_struct *wrk) round_jiffies_up_relative(HZ)); }
-static int pool_active(struct i915_active *ref) -{ - struct intel_gt_buffer_pool_node *node = - container_of(ref, typeof(*node), active); - struct dma_resv *resv = node->obj->base.resv; - int err; - - if (dma_resv_trylock(resv)) { - dma_resv_add_excl_fence(resv, NULL); - dma_resv_unlock(resv); - } - - err = i915_gem_object_pin_pages(node->obj); - if (err) - return err; - - /* Hide this pinned object from the shrinker until retired */ - i915_gem_object_make_unshrinkable(node->obj); - - return 0; -} - __i915_active_call static void pool_retire(struct i915_active *ref) { @@ -129,10 +107,13 @@ static void pool_retire(struct i915_active *ref) struct list_head *list = bucket_for_size(pool, node->obj->base.size); unsigned long flags;
- i915_gem_object_unpin_pages(node->obj); + if (node->pinned) { + i915_gem_object_unpin_pages(node->obj);
- /* Return this object to the shrinker pool */ - i915_gem_object_make_purgeable(node->obj); + /* Return this object to the shrinker pool */ + i915_gem_object_make_purgeable(node->obj); + node->pinned = false; + }
GEM_BUG_ON(node->age); spin_lock_irqsave(&pool->lock, flags); @@ -144,6 +125,19 @@ static void pool_retire(struct i915_active *ref) round_jiffies_up_relative(HZ)); }
+void intel_gt_buffer_pool_mark_used(struct intel_gt_buffer_pool_node *node) +{ + assert_object_held(node->obj); + + if (node->pinned) + return; + + __i915_gem_object_pin_pages(node->obj); + /* Hide this pinned object from the shrinker until retired */ + i915_gem_object_make_unshrinkable(node->obj); + node->pinned = true; +} + static struct intel_gt_buffer_pool_node * node_create(struct intel_gt_buffer_pool *pool, size_t sz) { @@ -158,7 +152,8 @@ node_create(struct intel_gt_buffer_pool *pool, size_t sz)
node->age = 0; node->pool = pool; - i915_active_init(&node->active, pool_active, pool_retire); + node->pinned = false; + i915_active_init(&node->active, NULL, pool_retire);
obj = i915_gem_object_create_internal(gt->i915, sz); if (IS_ERR(obj)) { diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h index 42cbac003e8a..9878ce9a07ab 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h @@ -17,10 +17,15 @@ struct i915_request; struct intel_gt_buffer_pool_node * intel_gt_get_buffer_pool(struct intel_gt *gt, size_t size);
+void intel_gt_buffer_pool_mark_used(struct intel_gt_buffer_pool_node *node); + static inline int intel_gt_buffer_pool_mark_active(struct intel_gt_buffer_pool_node *node, struct i915_request *rq) { + /* did we call mark_used? */ + GEM_WARN_ON(!node->pinned); + return i915_active_add_request(&node->active, rq); }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool_types.h b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool_types.h index bcf1658c9633..0401825e829d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool_types.h @@ -31,6 +31,7 @@ struct intel_gt_buffer_pool_node { struct rcu_head rcu; }; unsigned long age; + bool pinned; };
#endif /* INTEL_GT_BUFFER_POOL_TYPES_H */
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We are removing obj->mm.lock, and need to take the reservation lock before we can pin pages. Move the page pinning into the helper, and merge the gtt pwrite/pread preparation and cleanup paths.
The fence lock is also removed; it will conflict with fence annotations, because of memory allocations done when pagefaulting inside copy_*_user.
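Both GTT paths now go through an i915_gem_gtt_prepare() helper that runs the ww loop, flushes to the GTT domain, pins the pages and returns either a mappable vma or NULL with a fallback drm_mm node; the matching cleanup (truncated in this excerpt) undoes those steps. A sketch of the intended usage:

        vma = i915_gem_gtt_prepare(obj, &node, false); /* lock + domain + pin */
        if (IS_ERR(vma))
                return PTR_ERR(vma);
        /* vma == NULL means the PAGE_SIZE fallback node is in use */
        /* ... copy through the GGTT window ... */
        /* cleanup: unpin pages, then unpin vma or clear + remove the node */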
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/Makefile | 1 - drivers/gpu/drm/i915/gem/i915_gem_fence.c | 95 --------- drivers/gpu/drm/i915/gem/i915_gem_object.h | 5 - drivers/gpu/drm/i915/i915_gem.c | 224 +++++++++++---------- 4 files changed, 114 insertions(+), 211 deletions(-) delete mode 100644 drivers/gpu/drm/i915/gem/i915_gem_fence.c
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 2445cc990e15..5112e5d79316 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -136,7 +136,6 @@ gem-y += \ gem/i915_gem_dmabuf.o \ gem/i915_gem_domain.o \ gem/i915_gem_execbuffer.o \ - gem/i915_gem_fence.o \ gem/i915_gem_internal.o \ gem/i915_gem_object.o \ gem/i915_gem_object_blt.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_fence.c b/drivers/gpu/drm/i915/gem/i915_gem_fence.c deleted file mode 100644 index 8ab842c80f99..000000000000 --- a/drivers/gpu/drm/i915/gem/i915_gem_fence.c +++ /dev/null @@ -1,95 +0,0 @@ -/* - * SPDX-License-Identifier: MIT - * - * Copyright © 2019 Intel Corporation - */ - -#include "i915_drv.h" -#include "i915_gem_object.h" - -struct stub_fence { - struct dma_fence dma; - struct i915_sw_fence chain; -}; - -static int __i915_sw_fence_call -stub_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) -{ - struct stub_fence *stub = container_of(fence, typeof(*stub), chain); - - switch (state) { - case FENCE_COMPLETE: - dma_fence_signal(&stub->dma); - break; - - case FENCE_FREE: - dma_fence_put(&stub->dma); - break; - } - - return NOTIFY_DONE; -} - -static const char *stub_driver_name(struct dma_fence *fence) -{ - return DRIVER_NAME; -} - -static const char *stub_timeline_name(struct dma_fence *fence) -{ - return "object"; -} - -static void stub_release(struct dma_fence *fence) -{ - struct stub_fence *stub = container_of(fence, typeof(*stub), dma); - - i915_sw_fence_fini(&stub->chain); - - BUILD_BUG_ON(offsetof(typeof(*stub), dma)); - dma_fence_free(&stub->dma); -} - -static const struct dma_fence_ops stub_fence_ops = { - .get_driver_name = stub_driver_name, - .get_timeline_name = stub_timeline_name, - .release = stub_release, -}; - -struct dma_fence * -i915_gem_object_lock_fence(struct drm_i915_gem_object *obj) -{ - struct stub_fence *stub; - - assert_object_held(obj); - - stub = kmalloc(sizeof(*stub), GFP_KERNEL); - if (!stub) - return NULL; - - i915_sw_fence_init(&stub->chain, stub_notify); - dma_fence_init(&stub->dma, &stub_fence_ops, &stub->chain.wait.lock, - 0, 0); - - if (i915_sw_fence_await_reservation(&stub->chain, - obj->base.resv, NULL, true, - i915_fence_timeout(to_i915(obj->base.dev)), - I915_FENCE_GFP) < 0) - goto err; - - dma_resv_add_excl_fence(obj->base.resv, &stub->dma); - - return &stub->dma; - -err: - stub_release(&stub->dma); - return NULL; -} - -void i915_gem_object_unlock_fence(struct drm_i915_gem_object *obj, - struct dma_fence *fence) -{ - struct stub_fence *stub = container_of(fence, typeof(*stub), dma); - - i915_sw_fence_commit(&stub->chain); -} diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 0fec91ad6f62..9a81a80ca849 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -180,11 +180,6 @@ static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj) dma_resv_unlock(obj->base.resv); }
-struct dma_fence * -i915_gem_object_lock_fence(struct drm_i915_gem_object *obj); -void i915_gem_object_unlock_fence(struct drm_i915_gem_object *obj, - struct dma_fence *fence); - static inline void i915_gem_object_set_readonly(struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b5311f7ad870..b81fbd907775 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -306,7 +306,6 @@ i915_gem_shmem_pread(struct drm_i915_gem_object *obj, { unsigned int needs_clflush; unsigned int idx, offset; - struct dma_fence *fence; char __user *user_data; u64 remain; int ret; @@ -315,19 +314,17 @@ i915_gem_shmem_pread(struct drm_i915_gem_object *obj, if (ret) return ret;
+ ret = i915_gem_object_pin_pages(obj); + if (ret) + goto err_unlock; + ret = i915_gem_object_prepare_read(obj, &needs_clflush); - if (ret) { - i915_gem_object_unlock(obj); - return ret; - } + if (ret) + goto err_unpin;
- fence = i915_gem_object_lock_fence(obj); i915_gem_object_finish_access(obj); i915_gem_object_unlock(obj);
- if (!fence) - return -ENOMEM; - remain = args->size; user_data = u64_to_user_ptr(args->data_ptr); offset = offset_in_page(args->offset); @@ -345,7 +342,13 @@ i915_gem_shmem_pread(struct drm_i915_gem_object *obj, offset = 0; }
- i915_gem_object_unlock_fence(obj, fence); + i915_gem_object_unpin_pages(obj); + return ret; + +err_unpin: + i915_gem_object_unpin_pages(obj); +err_unlock: + i915_gem_object_unlock(obj); return ret; }
@@ -373,52 +376,102 @@ gtt_user_read(struct io_mapping *mapping, return unwritten; }
-static int -i915_gem_gtt_pread(struct drm_i915_gem_object *obj, - const struct drm_i915_gem_pread *args) +static struct i915_vma *i915_gem_gtt_prepare(struct drm_i915_gem_object *obj, + struct drm_mm_node *node, + bool write) { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct i915_ggtt *ggtt = &i915->ggtt; - intel_wakeref_t wakeref; - struct drm_mm_node node; - struct dma_fence *fence; - void __user *user_data; struct i915_vma *vma; - u64 remain, offset; + struct i915_gem_ww_ctx ww; int ret;
- wakeref = intel_runtime_pm_get(&i915->runtime_pm); + i915_gem_ww_ctx_init(&ww, true); +retry: vma = ERR_PTR(-ENODEV); + ret = i915_gem_object_lock(obj, &ww); + if (ret) + goto err_ww; + + ret = i915_gem_object_set_to_gtt_domain(obj, write); + if (ret) + goto err_ww; + if (!i915_gem_object_is_tiled(obj)) - vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, - PIN_MAPPABLE | - PIN_NONBLOCK /* NOWARN */ | - PIN_NOEVICT); - if (!IS_ERR(vma)) { - node.start = i915_ggtt_offset(vma); - node.flags = 0; + vma = i915_gem_object_ggtt_pin_ww(obj, &ww, NULL, 0, 0, + PIN_MAPPABLE | + PIN_NONBLOCK /* NOWARN */ | + PIN_NOEVICT); + if (vma == ERR_PTR(-EDEADLK)) { + ret = -EDEADLK; + goto err_ww; + } else if (!IS_ERR(vma)) { + node->start = i915_ggtt_offset(vma); + node->flags = 0; } else { - ret = insert_mappable_node(ggtt, &node, PAGE_SIZE); + ret = insert_mappable_node(ggtt, node, PAGE_SIZE); if (ret) - goto out_rpm; - GEM_BUG_ON(!drm_mm_node_allocated(&node)); + goto err_ww; + GEM_BUG_ON(!drm_mm_node_allocated(node)); + vma = NULL; }
- ret = i915_gem_object_lock_interruptible(obj, NULL); - if (ret) - goto out_unpin; - - ret = i915_gem_object_set_to_gtt_domain(obj, false); + ret = i915_gem_object_pin_pages(obj); if (ret) { - i915_gem_object_unlock(obj); - goto out_unpin; + if (drm_mm_node_allocated(node)) { + ggtt->vm.clear_range(&ggtt->vm, node->start, node->size); + remove_mappable_node(ggtt, node); + } else { + i915_vma_unpin(vma); + } }
- fence = i915_gem_object_lock_fence(obj); - i915_gem_object_unlock(obj); - if (!fence) { - ret = -ENOMEM; - goto out_unpin; +err_ww: + if (ret == -EDEADLK) { + ret = i915_gem_ww_ctx_backoff(&ww); + if (!ret) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + + return ret ? ERR_PTR(ret) : vma; +} + +static void i915_gem_gtt_cleanup(struct drm_i915_gem_object *obj, + struct drm_mm_node *node, + struct i915_vma *vma) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct i915_ggtt *ggtt = &i915->ggtt; + + i915_gem_object_unpin_pages(obj); + if (drm_mm_node_allocated(node)) { + ggtt->vm.clear_range(&ggtt->vm, node->start, node->size); + remove_mappable_node(ggtt, node); + } else { + i915_vma_unpin(vma); + } +} + +static int +i915_gem_gtt_pread(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *args) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct i915_ggtt *ggtt = &i915->ggtt; + intel_wakeref_t wakeref; + struct drm_mm_node node; + void __user *user_data; + struct i915_vma *vma; + u64 remain, offset; + int ret = 0; + + wakeref = intel_runtime_pm_get(&i915->runtime_pm); + + vma = i915_gem_gtt_prepare(obj, &node, false); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); + goto out_rpm; }
user_data = u64_to_user_ptr(args->data_ptr); @@ -455,14 +508,7 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj, offset += page_length; }
- i915_gem_object_unlock_fence(obj, fence); -out_unpin: - if (drm_mm_node_allocated(&node)) { - ggtt->vm.clear_range(&ggtt->vm, node.start, node.size); - remove_mappable_node(ggtt, &node); - } else { - i915_vma_unpin(vma); - } + i915_gem_gtt_cleanup(obj, &node, vma); out_rpm: intel_runtime_pm_put(&i915->runtime_pm, wakeref); return ret; @@ -515,15 +561,10 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data, if (ret) goto out;
- ret = i915_gem_object_pin_pages(obj); - if (ret) - goto out; - ret = i915_gem_shmem_pread(obj, args); if (ret == -EFAULT || ret == -ENODEV) ret = i915_gem_gtt_pread(obj, args);
- i915_gem_object_unpin_pages(obj); out: i915_gem_object_put(obj); return ret; @@ -571,11 +612,10 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, struct intel_runtime_pm *rpm = &i915->runtime_pm; intel_wakeref_t wakeref; struct drm_mm_node node; - struct dma_fence *fence; struct i915_vma *vma; u64 remain, offset; void __user *user_data; - int ret; + int ret = 0;
if (i915_gem_object_has_struct_page(obj)) { /* @@ -593,37 +633,10 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, wakeref = intel_runtime_pm_get(rpm); }
- vma = ERR_PTR(-ENODEV); - if (!i915_gem_object_is_tiled(obj)) - vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, - PIN_MAPPABLE | - PIN_NONBLOCK /* NOWARN */ | - PIN_NOEVICT); - if (!IS_ERR(vma)) { - node.start = i915_ggtt_offset(vma); - node.flags = 0; - } else { - ret = insert_mappable_node(ggtt, &node, PAGE_SIZE); - if (ret) - goto out_rpm; - GEM_BUG_ON(!drm_mm_node_allocated(&node)); - } - - ret = i915_gem_object_lock_interruptible(obj, NULL); - if (ret) - goto out_unpin; - - ret = i915_gem_object_set_to_gtt_domain(obj, true); - if (ret) { - i915_gem_object_unlock(obj); - goto out_unpin; - } - - fence = i915_gem_object_lock_fence(obj); - i915_gem_object_unlock(obj); - if (!fence) { - ret = -ENOMEM; - goto out_unpin; + vma = i915_gem_gtt_prepare(obj, &node, true); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); + goto out_rpm; }
i915_gem_object_invalidate_frontbuffer(obj, ORIGIN_CPU); @@ -672,14 +685,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(ggtt->vm.gt); i915_gem_object_flush_frontbuffer(obj, ORIGIN_CPU);
- i915_gem_object_unlock_fence(obj, fence); -out_unpin: - if (drm_mm_node_allocated(&node)) { - ggtt->vm.clear_range(&ggtt->vm, node.start, node.size); - remove_mappable_node(ggtt, &node); - } else { - i915_vma_unpin(vma); - } + i915_gem_gtt_cleanup(obj, &node, vma); out_rpm: intel_runtime_pm_put(rpm, wakeref); return ret; @@ -719,7 +725,6 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj, unsigned int partial_cacheline_write; unsigned int needs_clflush; unsigned int offset, idx; - struct dma_fence *fence; void __user *user_data; u64 remain; int ret; @@ -728,19 +733,17 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj, if (ret) return ret;
+ ret = i915_gem_object_pin_pages(obj); + if (ret) + goto err_unlock; + ret = i915_gem_object_prepare_write(obj, &needs_clflush); - if (ret) { - i915_gem_object_unlock(obj); - return ret; - } + if (ret) + goto err_unpin;
- fence = i915_gem_object_lock_fence(obj); i915_gem_object_finish_access(obj); i915_gem_object_unlock(obj);
- if (!fence) - return -ENOMEM; - /* If we don't overwrite a cacheline completely we need to be * careful to have up-to-date data by first clflushing. Don't * overcomplicate things and flush the entire patch. @@ -768,8 +771,14 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj, }
i915_gem_object_flush_frontbuffer(obj, ORIGIN_CPU); - i915_gem_object_unlock_fence(obj, fence);
+ i915_gem_object_unpin_pages(obj); + return ret; + +err_unpin: + i915_gem_object_unpin_pages(obj); +err_unlock: + i915_gem_object_unlock(obj); return ret; }
@@ -826,10 +835,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, if (ret) goto err;
- ret = i915_gem_object_pin_pages(obj); - if (ret) - goto err; - ret = -EFAULT; /* We can only do the GTT pwrite on untiled buffers, as otherwise * it would end up going through the fenced access, and we'll get @@ -850,7 +855,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, ret = i915_gem_shmem_pwrite(obj, args); }
- i915_gem_object_unpin_pages(obj); err: i915_gem_object_put(obj); return ret;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
pin_map needs the ww lock, so ensure we pin both the batch and the scratch objects before submission.
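Condensed, the new acquire phase in check_dirty_whitelist() looks like this (a sketch of the diff below, error paths abbreviated):

	i915_gem_ww_ctx_init(&ww, false);
retry:
	err = i915_gem_object_lock(scratch->obj, &ww);
	if (!err)
		err = i915_gem_object_lock(batch->obj, &ww);
	if (!err)
		err = intel_context_pin_ww(ce, &ww);
	if (!err) {
		/* ... pin_map both objects, build and submit the request ... */
	}
	if (err == -EDEADLK) {
		err = i915_gem_ww_ctx_backoff(&ww);	/* drop all locks, then retry */
		if (!err)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);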
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  3 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c    | 12 +++
 .../gpu/drm/i915/gt/selftest_workarounds.c   | 76 ++++++++++++-------
 3 files changed, 64 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 9a81a80ca849..da7fd301fc8d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -412,6 +412,9 @@ enum i915_map_type { void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj, enum i915_map_type type);
+void *__must_check i915_gem_object_pin_map_unlocked(struct drm_i915_gem_object *obj, + enum i915_map_type type); + void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj, unsigned long offset, unsigned long size); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 5bcd21a8fc4e..b03e58106516 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -397,6 +397,18 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, goto out_unlock; }
+void *i915_gem_object_pin_map_unlocked(struct drm_i915_gem_object *obj, + enum i915_map_type type) +{ + void *ret; + + i915_gem_object_lock(obj, NULL); + ret = i915_gem_object_pin_map(obj, type); + i915_gem_object_unlock(obj); + + return ret; +} + void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj, unsigned long offset, unsigned long size) diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c index 810ab026a55e..69da2147ed3b 100644 --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c @@ -111,7 +111,7 @@ read_nonprivs(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
i915_gem_object_set_cache_coherency(result, I915_CACHE_LLC);
- cs = i915_gem_object_pin_map(result, I915_MAP_WB); + cs = i915_gem_object_pin_map_unlocked(result, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto err_obj; @@ -217,7 +217,7 @@ static int check_whitelist(struct i915_gem_context *ctx, i915_gem_object_lock(results, NULL); intel_wedge_on_timeout(&wedge, engine->gt, HZ / 5) /* safety net! */ err = i915_gem_object_set_to_cpu_domain(results, false); - i915_gem_object_unlock(results); + if (intel_gt_is_wedged(engine->gt)) err = -EIO; if (err) @@ -245,6 +245,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
i915_gem_object_unpin_map(results); out_put: + i915_gem_object_unlock(results); i915_gem_object_put(results); return err; } @@ -520,6 +521,7 @@ static int check_dirty_whitelist(struct intel_context *ce)
for (i = 0; i < engine->whitelist.count; i++) { u32 reg = i915_mmio_reg_offset(engine->whitelist.list[i].reg); + struct i915_gem_ww_ctx ww; u64 addr = scratch->node.start; struct i915_request *rq; u32 srm, lrm, rsvd; @@ -535,6 +537,29 @@ static int check_dirty_whitelist(struct intel_context *ce)
ro_reg = ro_register(reg);
+ i915_gem_ww_ctx_init(&ww, false); +retry: + cs = NULL; + err = i915_gem_object_lock(scratch->obj, &ww); + if (!err) + err = i915_gem_object_lock(batch->obj, &ww); + if (!err) + err = intel_context_pin_ww(ce, &ww); + if (err) + goto out; + + cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC); + if (IS_ERR(cs)) { + err = PTR_ERR(cs); + goto out_ctx; + } + + results = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB); + if (IS_ERR(results)) { + err = PTR_ERR(results); + goto out_unmap_batch; + } + /* Clear non priv flags */ reg &= RING_FORCE_TO_NONPRIV_ADDRESS_MASK;
@@ -546,12 +571,6 @@ static int check_dirty_whitelist(struct intel_context *ce) pr_debug("%s: Writing garbage to %x\n", engine->name, reg);
- cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC); - if (IS_ERR(cs)) { - err = PTR_ERR(cs); - goto out_batch; - } - /* SRM original */ *cs++ = srm; *cs++ = reg; @@ -598,11 +617,12 @@ static int check_dirty_whitelist(struct intel_context *ce) i915_gem_object_flush_map(batch->obj); i915_gem_object_unpin_map(batch->obj); intel_gt_chipset_flush(engine->gt); + cs = NULL;
- rq = intel_context_create_request(ce); + rq = i915_request_create(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto out_batch; + goto out_unmap_scratch; }
if (engine->emit_init_breadcrumb) { /* Be nice if we hang */ @@ -611,20 +631,16 @@ static int check_dirty_whitelist(struct intel_context *ce) goto err_request; }
- i915_vma_lock(batch); err = i915_request_await_object(rq, batch->obj, false); if (err == 0) err = i915_vma_move_to_active(batch, rq, 0); - i915_vma_unlock(batch); if (err) goto err_request;
- i915_vma_lock(scratch); err = i915_request_await_object(rq, scratch->obj, true); if (err == 0) err = i915_vma_move_to_active(scratch, rq, EXEC_OBJECT_WRITE); - i915_vma_unlock(scratch); if (err) goto err_request;
@@ -640,13 +656,7 @@ static int check_dirty_whitelist(struct intel_context *ce) pr_err("%s: Futzing %x timedout; cancelling test\n", engine->name, reg); intel_gt_set_wedged(engine->gt); - goto out_batch; - } - - results = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB); - if (IS_ERR(results)) { - err = PTR_ERR(results); - goto out_batch; + goto out_unmap_scratch; }
GEM_BUG_ON(values[ARRAY_SIZE(values) - 1] != 0xffffffff); @@ -657,7 +667,7 @@ static int check_dirty_whitelist(struct intel_context *ce) pr_err("%s: Unable to write to whitelisted register %x\n", engine->name, reg); err = -EINVAL; - goto out_unpin; + goto out_unmap_scratch; } } else { rsvd = 0; @@ -723,15 +733,27 @@ static int check_dirty_whitelist(struct intel_context *ce)
err = -EINVAL; } -out_unpin: +out_unmap_scratch: i915_gem_object_unpin_map(scratch->obj); +out_unmap_batch: + if (cs) + i915_gem_object_unpin_map(batch->obj); +out_ctx: + intel_context_unpin(ce); +out: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); if (err) break; }
if (igt_flush_test(engine->i915)) err = -EIO; -out_batch: + i915_vma_unpin_and_release(&batch, 0); out_scratch: i915_vma_unpin_and_release(&scratch, 0); @@ -868,7 +890,7 @@ static int scrub_whitelisted_registers(struct i915_gem_context *ctx, if (IS_ERR(batch)) return PTR_ERR(batch);
- cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(batch->obj, I915_MAP_WC); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto err_batch; @@ -1003,11 +1025,11 @@ check_whitelisted_registers(struct intel_engine_cs *engine, u32 *a, *b; int i, err;
- a = i915_gem_object_pin_map(A->obj, I915_MAP_WB); + a = i915_gem_object_pin_map_unlocked(A->obj, I915_MAP_WB); if (IS_ERR(a)) return PTR_ERR(a);
- b = i915_gem_object_pin_map(B->obj, I915_MAP_WB); + b = i915_gem_object_pin_map_unlocked(B->obj, I915_MAP_WB); if (IS_ERR(b)) { err = PTR_ERR(b); goto err_a;
From: Thomas Hellström <thomas.hellstrom@intel.com>
Stolen objects need to take the object lock, and we may call put_pages when the refcount drops to 0; ensure all calls are handled correctly.
Idea-from: Thomas Hellström <thomas.hellstrom@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 13 +++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_pages.c  | 14 ++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 10 +++++++++-
 3 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index da7fd301fc8d..26ef37532f81 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -125,6 +125,19 @@ i915_gem_object_put(struct drm_i915_gem_object *obj) ((kref_read(&obj->base.refcount) == 1) && \ list_empty_careful(&obj->mm.link) && \ list_empty_careful(&obj->vma.list)))) +/* + * If more than one potential simultaneous locker, assert held. + */ +static inline void assert_object_held_shared(struct drm_i915_gem_object *obj) +{ + /* + * Note mm list lookup is protected by + * kref_get_unless_zero(). + */ + if (IS_ENABLED(CONFIG_LOCKDEP) && + kref_read(&obj->base.refcount) > 0) + lockdep_assert_held(&obj->mm.lock); +}
static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, struct i915_gem_ww_ctx *ww, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index b03e58106516..183aae046b68 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -18,7 +18,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, unsigned long supported = INTEL_INFO(i915)->page_sizes; int i;
- lockdep_assert_held(&obj->mm.lock); + assert_object_held_shared(obj);
if (i915_gem_object_is_volatile(obj)) obj->mm.madv = I915_MADV_DONTNEED; @@ -67,6 +67,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, struct list_head *list; unsigned long flags;
+ lockdep_assert_held(&obj->mm.lock); spin_lock_irqsave(&i915->mm.obj_lock, flags);
i915->mm.shrink_count++; @@ -88,6 +89,8 @@ int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj) struct drm_i915_private *i915 = to_i915(obj->base.dev); int err;
+ assert_object_held_shared(obj); + if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) { drm_dbg(&i915->drm, "Attempting to obtain a purgeable object\n"); @@ -115,6 +118,8 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj) if (err) return err;
+ assert_object_held_shared(obj); + if (unlikely(!i915_gem_object_has_pages(obj))) { GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj));
@@ -142,7 +147,7 @@ void i915_gem_object_truncate(struct drm_i915_gem_object *obj) /* Try to discard unwanted pages */ void i915_gem_object_writeback(struct drm_i915_gem_object *obj) { - lockdep_assert_held(&obj->mm.lock); + assert_object_held_shared(obj); GEM_BUG_ON(i915_gem_object_has_pages(obj));
if (obj->ops->writeback) @@ -173,6 +178,8 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) { struct sg_table *pages;
+ assert_object_held_shared(obj); + pages = fetch_and_zero(&obj->mm.pages); if (IS_ERR_OR_NULL(pages)) return pages; @@ -200,6 +207,9 @@ int __i915_gem_object_put_pages_locked(struct drm_i915_gem_object *obj) if (i915_gem_object_has_pinned_pages(obj)) return -EBUSY;
+ /* May be called by shrinker from within get_pages() (on another bo) */ + assert_object_held_shared(obj); + i915_gem_object_release_mmap_offset(obj);
/* diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 5372b888ba01..ce9086d3a647 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -643,11 +643,19 @@ __i915_gem_object_create_stolen(struct intel_memory_region *mem, cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE; i915_gem_object_set_cache_coherency(obj, cache_level);
+ if (WARN_ON(!i915_gem_object_trylock(obj))) { + err = -EBUSY; + goto cleanup; + } + err = i915_gem_object_pin_pages(obj); - if (err) + if (err) { + i915_gem_object_unlock(obj); goto cleanup; + }
i915_gem_object_init_memory_region(obj, mem); + i915_gem_object_unlock(obj);
return obj;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
By default, we assume that igt_spinner_pin() is called from inside igt_spinner_create_request() to keep existing selftests working, but we allow for manual pinning when a ww context is passed.
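For callers that want to manage the locking themselves, usage would look roughly like this (hypothetical caller; the ww init/backoff boilerplate and error handling are omitted, and MI_ARB_CHECK is just an example arbitration command):

	/* inside an existing ww transaction, before building the request */
	err = igt_spinner_pin(&spin, ce, &ww);
	if (err)
		goto out_ww;

	/* the spinner objects are now pinned; create_request skips its internal pin */
	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);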
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/selftests/igt_spinner.c | 136 ++++++++++++-------
 drivers/gpu/drm/i915/selftests/igt_spinner.h |   5 +
 2 files changed, 95 insertions(+), 46 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index ec0ecb4e4ca6..9c461edb0b73 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -11,8 +11,6 @@
int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt) { - unsigned int mode; - void *vaddr; int err;
memset(spin, 0, sizeof(*spin)); @@ -23,6 +21,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt) err = PTR_ERR(spin->hws); goto err; } + i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
spin->obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE); if (IS_ERR(spin->obj)) { @@ -30,34 +29,83 @@ int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt) goto err_hws; }
- i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC); - vaddr = i915_gem_object_pin_map(spin->hws, I915_MAP_WB); - if (IS_ERR(vaddr)) { - err = PTR_ERR(vaddr); - goto err_obj; - } - spin->seqno = memset(vaddr, 0xff, PAGE_SIZE); - - mode = i915_coherent_map_type(gt->i915); - vaddr = i915_gem_object_pin_map(spin->obj, mode); - if (IS_ERR(vaddr)) { - err = PTR_ERR(vaddr); - goto err_unpin_hws; - } - spin->batch = vaddr; - return 0;
-err_unpin_hws: - i915_gem_object_unpin_map(spin->hws); -err_obj: - i915_gem_object_put(spin->obj); err_hws: i915_gem_object_put(spin->hws); err: return err; }
+static void *igt_spinner_pin_obj(struct intel_context *ce, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *obj, + unsigned int mode, struct i915_vma **vma) +{ + void *vaddr; + int ret; + + *vma = i915_vma_instance(obj, ce->vm, NULL); + if (IS_ERR(*vma)) + return ERR_CAST(*vma); + + ret = i915_gem_object_lock(obj, ww); + if (ret) + return ERR_PTR(ret); + + vaddr = i915_gem_object_pin_map(obj, mode); + + if (!ww) + i915_gem_object_unlock(obj); + + if (IS_ERR(vaddr)) + return vaddr; + + if (ww) + ret = i915_vma_pin_ww(*vma, ww, 0, 0, PIN_USER); + else + ret = i915_vma_pin(*vma, 0, 0, PIN_USER); + + if (ret) { + i915_gem_object_unpin_map(obj); + return ERR_PTR(ret); + } + + return vaddr; +} + +int igt_spinner_pin(struct igt_spinner *spin, + struct intel_context *ce, + struct i915_gem_ww_ctx *ww) +{ + void *vaddr; + + if (spin->ce && WARN_ON(spin->ce != ce)) + return -ENODEV; + spin->ce = ce; + + if (!spin->seqno) { + vaddr = igt_spinner_pin_obj(ce, ww, spin->hws, I915_MAP_WB, &spin->hws_vma); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr); + + spin->seqno = memset(vaddr, 0xff, PAGE_SIZE); + } + + if (!spin->batch) { + unsigned int mode = + i915_coherent_map_type(spin->gt->i915); + + vaddr = igt_spinner_pin_obj(ce, ww, spin->obj, mode, &spin->batch_vma); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr); + + spin->batch = vaddr; + } + + return 0; +} + static unsigned int seqno_offset(u64 fence) { return offset_in_page(sizeof(u32) * fence); @@ -102,27 +150,18 @@ igt_spinner_create_request(struct igt_spinner *spin, if (!intel_engine_can_store_dword(ce->engine)) return ERR_PTR(-ENODEV);
- vma = i915_vma_instance(spin->obj, ce->vm, NULL); - if (IS_ERR(vma)) - return ERR_CAST(vma); - - hws = i915_vma_instance(spin->hws, ce->vm, NULL); - if (IS_ERR(hws)) - return ERR_CAST(hws); + if (!spin->batch) { + err = igt_spinner_pin(spin, ce, NULL); + if (err) + return ERR_PTR(err); + }
- err = i915_vma_pin(vma, 0, 0, PIN_USER); - if (err) - return ERR_PTR(err); - - err = i915_vma_pin(hws, 0, 0, PIN_USER); - if (err) - goto unpin_vma; + hws = spin->hws_vma; + vma = spin->batch_vma;
rq = intel_context_create_request(ce); - if (IS_ERR(rq)) { - err = PTR_ERR(rq); - goto unpin_hws; - } + if (IS_ERR(rq)) + return ERR_CAST(rq);
err = move_to_active(vma, rq, 0); if (err) @@ -185,10 +224,6 @@ igt_spinner_create_request(struct igt_spinner *spin, i915_request_set_error_once(rq, err); i915_request_add(rq); } -unpin_hws: - i915_vma_unpin(hws); -unpin_vma: - i915_vma_unpin(vma); return err ? ERR_PTR(err) : rq; }
@@ -202,6 +237,9 @@ hws_seqno(const struct igt_spinner *spin, const struct i915_request *rq)
void igt_spinner_end(struct igt_spinner *spin) { + if (!spin->batch) + return; + *spin->batch = MI_BATCH_BUFFER_END; intel_gt_chipset_flush(spin->gt); } @@ -210,10 +248,16 @@ void igt_spinner_fini(struct igt_spinner *spin) { igt_spinner_end(spin);
- i915_gem_object_unpin_map(spin->obj); + if (spin->batch) { + i915_vma_unpin(spin->batch_vma); + i915_gem_object_unpin_map(spin->obj); + } i915_gem_object_put(spin->obj);
- i915_gem_object_unpin_map(spin->hws); + if (spin->seqno) { + i915_vma_unpin(spin->hws_vma); + i915_gem_object_unpin_map(spin->hws); + } i915_gem_object_put(spin->hws); }
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.h b/drivers/gpu/drm/i915/selftests/igt_spinner.h index ec62c9ef320b..fbe5b1625b05 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.h +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.h @@ -20,11 +20,16 @@ struct igt_spinner { struct intel_gt *gt; struct drm_i915_gem_object *hws; struct drm_i915_gem_object *obj; + struct intel_context *ce; + struct i915_vma *hws_vma, *batch_vma; u32 *batch; void *seqno; };
int igt_spinner_init(struct igt_spinner *spin, struct intel_gt *gt); +int igt_spinner_pin(struct igt_spinner *spin, + struct intel_context *ce, + struct i915_gem_ww_ctx *ww); void igt_spinner_fini(struct igt_spinner *spin);
struct i915_request *
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
i915_gem_object_pin_map potentially needs a ww context, so ensure we have one that we can back off and retry with.
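The conversion takes the usual ww-transaction shape (condensed from the diff below):

	i915_gem_ww_ctx_init(&ww, true);
retry:
	err = i915_gem_object_lock(obj, &ww);
	if (!err) {
		vaddr = i915_gem_object_pin_map(obj, I915_MAP_FORCE_WC);
		if (IS_ERR(vaddr))
			err = PTR_ERR(vaddr);
		/* else: access the mapping, then i915_gem_object_unpin_map(obj) */
	}
	if (err == -EDEADLK) {
		err = i915_gem_ww_ctx_backoff(&ww);	/* drop all locks and retry */
		if (!err)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);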
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 163208a6260d..2561a2f1e54f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -421,7 +421,9 @@ vm_access(struct vm_area_struct *area, unsigned long addr, { struct i915_mmap_offset *mmo = area->vm_private_data; struct drm_i915_gem_object *obj = mmo->obj; + struct i915_gem_ww_ctx ww; void *vaddr; + int err = 0;
if (i915_gem_object_is_readonly(obj) && write) return -EACCES; @@ -430,10 +432,18 @@ vm_access(struct vm_area_struct *area, unsigned long addr, if (addr >= obj->base.size) return -EINVAL;
+ i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (err) + goto out; + /* As this is primarily for debugging, let's focus on simplicity */ vaddr = i915_gem_object_pin_map(obj, I915_MAP_FORCE_WC); - if (IS_ERR(vaddr)) - return PTR_ERR(vaddr); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + goto out; + }
if (write) { memcpy(vaddr + addr, buf, len); @@ -443,6 +453,16 @@ vm_access(struct vm_area_struct *area, unsigned long addr, }
i915_gem_object_unpin_map(obj); +out: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + + if (err) + return err;
return len; }
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
We need to lock a few more objects, some of them only temporarily; add ww locking where needed.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 56 ++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 0b300e0d9561..1f574d29ece5 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1587,7 +1587,7 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream) stream->oa_buffer.vma = vma;
stream->oa_buffer.vaddr = - i915_gem_object_pin_map(bo, I915_MAP_WB); + i915_gem_object_pin_map_unlocked(bo, I915_MAP_WB); if (IS_ERR(stream->oa_buffer.vaddr)) { ret = PTR_ERR(stream->oa_buffer.vaddr); goto err_unpin; @@ -1640,6 +1640,7 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) const u32 base = stream->engine->mmio_base; #define CS_GPR(x) GEN8_RING_CS_GPR(base, x) u32 *batch, *ts0, *cs, *jump; + struct i915_gem_ww_ctx ww; int ret, i; enum { START_TS, @@ -1657,15 +1658,21 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) return PTR_ERR(bo); }
+ i915_gem_ww_ctx_init(&ww, true); +retry: + ret = i915_gem_object_lock(bo, &ww); + if (ret) + goto out_ww; + /* * We pin in GGTT because we jump into this buffer now because * multiple OA config BOs will have a jump to this address and it * needs to be fixed during the lifetime of the i915/perf stream. */ - vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 0, PIN_HIGH); + vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH); if (IS_ERR(vma)) { ret = PTR_ERR(vma); - goto err_unref; + goto out_ww; }
batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); @@ -1799,12 +1806,19 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) __i915_gem_object_release_map(bo);
stream->noa_wait = vma; - return 0; + goto out_ww;
err_unpin: i915_vma_unpin_and_release(&vma, 0); -err_unref: - i915_gem_object_put(bo); +out_ww: + if (ret == -EDEADLK) { + ret = i915_gem_ww_ctx_backoff(&ww); + if (!ret) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + if (ret) + i915_gem_object_put(bo); return ret; }
@@ -1847,6 +1861,7 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, { struct drm_i915_gem_object *obj; struct i915_oa_config_bo *oa_bo; + struct i915_gem_ww_ctx ww; size_t config_length = 0; u32 *cs; int err; @@ -1867,10 +1882,16 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, goto err_free; }
+ i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (err) + goto out_ww; + cs = i915_gem_object_pin_map(obj, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); - goto err_oa_bo; + goto out_ww; }
cs = write_cs_mi_lri(cs, @@ -1898,19 +1919,28 @@ alloc_oa_config_buffer(struct i915_perf_stream *stream, NULL); if (IS_ERR(oa_bo->vma)) { err = PTR_ERR(oa_bo->vma); - goto err_oa_bo; + goto out_ww; }
oa_bo->oa_config = i915_oa_config_get(oa_config); llist_add(&oa_bo->node, &stream->oa_config_bos);
- return oa_bo; +out_ww: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww);
-err_oa_bo: - i915_gem_object_put(obj); + if (err) + i915_gem_object_put(obj); err_free: - kfree(oa_bo); - return ERR_PTR(err); + if (err) { + kfree(oa_bo); + return ERR_PTR(err); + } + return oa_bo; }
static struct i915_vma *
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
In the ucode functions, the calls are done before userspace runs, when debugging via debugfs, or when creating semi-permanent mappings; we can safely use the unlocked versions that do the ww dance for us.
Because there is no pin_pages_unlocked yet, add it as a convenience function.
This removes possible lockdep splats about a missing resv lock for the ucode objects.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 ++
 drivers/gpu/drm/i915/gem/i915_gem_pages.c  | 20 ++++++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c     |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c |  4 ++--
 drivers/gpu/drm/i915/gt/uc/intel_huc.c     |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c   |  2 +-
 6 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 26ef37532f81..1d4b44151e0c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -358,6 +358,8 @@ i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) return __i915_gem_object_get_pages(obj); }
+int i915_gem_object_pin_pages_unlocked(struct drm_i915_gem_object *obj); + static inline bool i915_gem_object_has_pages(struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 183aae046b68..79336735a6e4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -136,6 +136,26 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj) return err; }
+int i915_gem_object_pin_pages_unlocked(struct drm_i915_gem_object *obj) +{ + struct i915_gem_ww_ctx ww; + int err; + + i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (!err) + err = i915_gem_object_pin_pages(obj); + + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + return err; +} + /* Immediately discard the backing storage */ void i915_gem_object_truncate(struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 2a343a977987..a65661eb5d5d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -694,7 +694,7 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c index 9bbe8a795cb8..8dc8678e7ab0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c @@ -335,7 +335,7 @@ static int guc_log_map(struct intel_guc_log *log) * buffer pages, so that we can directly get the data * (up-to-date) from memory. */ - vaddr = i915_gem_object_pin_map(log->vma->obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(log->vma->obj, I915_MAP_WC); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -744,7 +744,7 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p, if (!obj) return 0;
- map = i915_gem_object_pin_map(obj, I915_MAP_WC); + map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(map)) { DRM_DEBUG("Failed to pin object\n"); drm_puts(p, "(log data unaccessible)\n"); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 65eeb44b397d..2126dd81ac38 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,7 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index 180c23e2e25e..b05076d190cc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -541,7 +541,7 @@ int intel_uc_fw_init(struct intel_uc_fw *uc_fw) if (!intel_uc_fw_is_available(uc_fw)) return -ENOEXEC;
- err = i915_gem_object_pin_pages(uc_fw->obj); + err = i915_gem_object_pin_pages_unlocked(uc_fw->obj); if (err) { DRM_DEBUG_DRIVER("%s fw pin-pages err=%d\n", intel_uc_fw_type_repr(uc_fw->type), err);
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
vmap uses pin_pages, but needs to use ww locking; switch to pin_map_unlocked to correctly lock the mapping.
Also add ww locking to begin/end cpu access.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 60 ++++++++++++----------
 1 file changed, 33 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 36e3c2765f4c..c4b01e819786 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -82,7 +82,7 @@ static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); void *vaddr;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -123,42 +123,48 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire { struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE); + struct i915_gem_ww_ctx ww; int err;
- err = i915_gem_object_pin_pages(obj); - if (err) - return err; - - err = i915_gem_object_lock_interruptible(obj, NULL); - if (err) - goto out; - - err = i915_gem_object_set_to_cpu_domain(obj, write); - i915_gem_object_unlock(obj); - -out: - i915_gem_object_unpin_pages(obj); + i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (!err) + err = i915_gem_object_pin_pages(obj); + if (!err) { + err = i915_gem_object_set_to_cpu_domain(obj, write); + i915_gem_object_unpin_pages(obj); + } + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); return err; }
static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + struct i915_gem_ww_ctx ww; int err;
- err = i915_gem_object_pin_pages(obj); - if (err) - return err; - - err = i915_gem_object_lock_interruptible(obj, NULL); - if (err) - goto out; - - err = i915_gem_object_set_to_gtt_domain(obj, false); - i915_gem_object_unlock(obj); - -out: - i915_gem_object_unpin_pages(obj); + i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (!err) + err = i915_gem_object_pin_pages(obj); + if (!err) { + err = i915_gem_object_set_to_gtt_domain(obj, false); + i915_gem_object_unpin_pages(obj); + } + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); return err; }
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Because of the long lifetime of the mapping, we cannot wrap this in a simple limited ww lock. Just use the unlocked version of pin_map, because we'll likely release the mapping a lot later, in a different thread.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/display/intel_dsb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dsb.c b/drivers/gpu/drm/i915/display/intel_dsb.c index 566fa72427b3..857126822a88 100644 --- a/drivers/gpu/drm/i915/display/intel_dsb.c +++ b/drivers/gpu/drm/i915/display/intel_dsb.c @@ -293,7 +293,7 @@ void intel_dsb_prepare(struct intel_crtc_state *crtc_state) goto out; }
- buf = i915_gem_object_pin_map(vma->obj, I915_MAP_WC); + buf = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WC); if (IS_ERR(buf)) { drm_err(&i915->drm, "Command buffer creation failed\n"); i915_vma_unpin_and_release(&vma, I915_VMA_RELEASE_MAP);
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Quick fix: just use the unlocked version.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/shmem_utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index f011ea42487e..041e2a50160d 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -39,7 +39,7 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj) return file; }
- ptr = i915_gem_object_pin_map(obj, I915_MAP_WB); + ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(ptr)) return ERR_CAST(ptr);
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
We may create page table objects on the fly, but we may need to wait with the ww lock held. Instead of waiting on a freed obj lock, ensure we have the same lock for each object to keep -EDEADLK working. This ensures that i915_vma_pin_ww can lock the page tables when required.
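Two pieces make this work (condensed from the diff below). First, every page-table object allocated for a vm shares the vm's reservation object:

	obj = i915_gem_object_create_internal(vm->i915, sz);
	/* ensure all dma objects have the same reservation class */
	if (!IS_ERR(obj))
		obj->base.resv = &vm->resv;

Second, i915_vma_pin_ww() locks that shared resv through the ww context before pinning the page-table stash, so a contended page table reports -EDEADLK instead of deadlocking:

	err = i915_vm_lock_objects(vma->vm, ww);	/* one lock covers all page tables */
	if (err)
		goto err_fence;

	err = i915_vm_pin_pt_stash(vma->vm, &work->stash);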
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  8 +++++-
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 38 ++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  5 ++++
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  3 ++-
 drivers/gpu/drm/i915/i915_vma.c       |  4 +++
 5 files changed, 55 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 60bd2c8ed8b0..17ecaef1834d 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -615,7 +615,9 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt) if (err) goto err_ppgtt;
+ i915_gem_object_lock(ppgtt->vm.scratch[0], NULL); err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash); + i915_gem_object_unlock(ppgtt->vm.scratch[0]); if (err) goto err_stash;
@@ -702,6 +704,7 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
mutex_unlock(&ggtt->vm.mutex); i915_address_space_fini(&ggtt->vm); + dma_resv_fini(&ggtt->vm.resv);
arch_phys_wc_del(ggtt->mtrr);
@@ -1078,6 +1081,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt) ggtt->vm.gt = gt; ggtt->vm.i915 = i915; ggtt->vm.dma = &i915->drm.pdev->dev; + dma_resv_init(&ggtt->vm.resv);
if (INTEL_GEN(i915) <= 5) ret = i915_gmch_probe(ggtt); @@ -1085,8 +1089,10 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt) ret = gen6_gmch_probe(ggtt); else ret = gen8_gmch_probe(ggtt); - if (ret) + if (ret) { + dma_resv_fini(&ggtt->vm.resv); return ret; + }
if ((ggtt->vm.total - 1) >> 32) { drm_err(&i915->drm, diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 7bfe9072be9a..070d538cdc56 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -13,16 +13,36 @@
struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz) { + struct drm_i915_gem_object *obj; + if (I915_SELFTEST_ONLY(should_fail(&vm->fault_attr, 1))) i915_gem_shrink_all(vm->i915);
- return i915_gem_object_create_internal(vm->i915, sz); + obj = i915_gem_object_create_internal(vm->i915, sz); + /* ensure all dma objects have the same reservation class */ + if (!IS_ERR(obj)) + obj->base.resv = &vm->resv; + return obj; }
int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) { int err;
+ i915_gem_object_lock(obj, NULL); + err = i915_gem_object_pin_pages(obj); + i915_gem_object_unlock(obj); + if (err) + return err; + + i915_gem_object_make_unshrinkable(obj); + return 0; +} + +int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) +{ + int err; + err = i915_gem_object_pin_pages(obj); if (err) return err; @@ -56,6 +76,20 @@ void __i915_vm_close(struct i915_address_space *vm) mutex_unlock(&vm->mutex); }
+/* lock the vm into the current ww, if we lock one, we lock all */ +int i915_vm_lock_objects(struct i915_address_space *vm, + struct i915_gem_ww_ctx *ww) +{ + if (vm->scratch[0]->base.resv == &vm->resv) { + return i915_gem_object_lock(vm->scratch[0], ww); + } else { + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + + /* We borrowed the scratch page from ggtt, take the top level object */ + return i915_gem_object_lock(ppgtt->pd->pt.base, ww); + } +} + void i915_address_space_fini(struct i915_address_space *vm) { drm_mm_takedown(&vm->mm); @@ -69,6 +103,7 @@ static void __i915_vm_release(struct work_struct *work)
vm->cleanup(vm); i915_address_space_fini(vm); + dma_resv_fini(&vm->resv);
kfree(vm); } @@ -98,6 +133,7 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) mutex_init(&vm->mutex); lockdep_set_subclass(&vm->mutex, subclass); i915_gem_shrinker_taints_mutex(vm->i915, &vm->mutex); + dma_resv_init(&vm->resv);
GEM_BUG_ON(!vm->total); drm_mm_init(&vm->mm, 0, vm->total); diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 8a33940a71f3..16063b2f0119 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -238,6 +238,7 @@ struct i915_address_space { atomic_t open;
struct mutex mutex; /* protects vma and our lists */ + struct dma_resv resv; /* reservation lock for all pd objects, and buffer pool */ #define VM_CLASS_GGTT 0 #define VM_CLASS_PPGTT 1
@@ -346,6 +347,9 @@ struct i915_ppgtt {
#define i915_is_ggtt(vm) ((vm)->is_ggtt)
+int __must_check +i915_vm_lock_objects(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww); + static inline bool i915_vm_is_4lvl(const struct i915_address_space *vm) { @@ -522,6 +526,7 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde);
int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj); +int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
void free_px(struct i915_address_space *vm, struct i915_page_table *pt, int lvl); diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 46d9aceda64c..f3ac47702aee 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -262,7 +262,7 @@ int i915_vm_pin_pt_stash(struct i915_address_space *vm,
for (n = 0; n < ARRAY_SIZE(stash->pt); n++) { for (pt = stash->pt[n]; pt; pt = pt->stash) { - err = pin_pt_dma(vm, pt->base); + err = pin_pt_dma_locked(vm, pt->base); if (err) return err; } @@ -304,6 +304,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt) ppgtt->vm.dma = &i915->drm.pdev->dev; ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size);
+ dma_resv_init(&ppgtt->vm.resv); i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
ppgtt->vm.vma_ops.bind_vma = ppgtt_bind_vma; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 63bdb0cc981e..0c7e4191811a 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -908,6 +908,10 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, if (err) goto err_fence;
+ err = i915_vm_lock_objects(vma->vm, ww); + if (err) + goto err_fence; + err = i915_vm_pin_pt_stash(vma->vm, &work->stash); if (err)
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Straightforward conversion: just convert a bunch of calls to the unlocked versions.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c | 28 ++++++++++++++-----
 1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 709c63b9cfc4..586d8bafd7de 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -589,7 +589,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) goto out_put; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto out_put;
@@ -653,15 +653,19 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) break; }
+ i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); i915_gem_object_put(obj); }
return 0;
out_unpin: + i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); + i915_gem_object_unlock(obj); out_put: i915_gem_object_put(obj);
@@ -675,8 +679,10 @@ static void close_object_list(struct list_head *objects,
list_for_each_entry_safe(obj, on, objects, st_link) { list_del(&obj->st_link); + i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); i915_gem_object_put(obj); } } @@ -713,7 +719,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg) break; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); break; @@ -889,7 +895,7 @@ static int igt_mock_ppgtt_64K(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto out_object_put;
@@ -943,8 +949,10 @@ static int igt_mock_ppgtt_64K(void *arg) }
i915_vma_unpin(vma); + i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); i915_gem_object_put(obj); } } @@ -954,7 +962,9 @@ static int igt_mock_ppgtt_64K(void *arg) out_vma_unpin: i915_vma_unpin(vma); out_object_unpin: + i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); + i915_gem_object_unlock(obj); out_object_put: i915_gem_object_put(obj);
@@ -1024,7 +1034,7 @@ static int __cpu_check_vmap(struct drm_i915_gem_object *obj, u32 dword, u32 val) if (err) return err;
- ptr = i915_gem_object_pin_map(obj, I915_MAP_WC); + ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(ptr)) return PTR_ERR(ptr);
@@ -1304,7 +1314,7 @@ static int igt_ppgtt_smoke_huge(void *arg) return err; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { if (err == -ENXIO || err == -E2BIG) { i915_gem_object_put(obj); @@ -1327,8 +1337,10 @@ static int igt_ppgtt_smoke_huge(void *arg) __func__, size, i); } out_unpin: + i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); out_put: i915_gem_object_put(obj);
@@ -1402,7 +1414,7 @@ static int igt_ppgtt_sanity_check(void *arg) return err; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); goto out; @@ -1416,8 +1428,10 @@ static int igt_ppgtt_sanity_check(void *arg)
err = igt_write_huge(ctx, obj);
+ i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); i915_gem_object_put(obj);
if (err) { @@ -1462,7 +1476,7 @@ static int igt_tmpfs_fallback(void *arg) goto out_restore; }
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto out_put;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Straightforward conversion: just convert a bunch of calls to the unlocked versions.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index 4e36d4897ea6..cc782569765f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -47,7 +47,7 @@ static int __igt_client_fill(struct intel_engine_cs *engine) goto err_flush; }
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_put; @@ -159,7 +159,7 @@ static int prepare_blit(const struct tiled_blits *t, u32 src_pitch, dst_pitch; u32 cmd, *cs;
- cs = i915_gem_object_pin_map(batch, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(batch, I915_MAP_WC); if (IS_ERR(cs)) return PTR_ERR(cs);
@@ -379,7 +379,7 @@ static int verify_buffer(const struct tiled_blits *t, y = i915_prandom_u32_max_state(t->height, prng); p = y * t->width + x;
- vaddr = i915_gem_object_pin_map(buf->vma->obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(buf->vma->obj, I915_MAP_WC); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -566,7 +566,7 @@ static int tiled_blits_prepare(struct tiled_blits *t, int err; int i;
- map = i915_gem_object_pin_map(t->scratch.vma->obj, I915_MAP_WC); + map = i915_gem_object_pin_map_unlocked(t->scratch.vma->obj, I915_MAP_WC); if (IS_ERR(map)) return PTR_ERR(map);
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Straightforward conversion: just convert a bunch of calls to the unlocked versions.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c index 2e439bb269d6..42aa3c5e0621 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c @@ -159,7 +159,7 @@ static int wc_set(struct context *ctx, unsigned long offset, u32 v) if (err) return err;
- map = i915_gem_object_pin_map(ctx->obj, I915_MAP_WC); + map = i915_gem_object_pin_map_unlocked(ctx->obj, I915_MAP_WC); if (IS_ERR(map)) return PTR_ERR(map);
@@ -182,7 +182,7 @@ static int wc_get(struct context *ctx, unsigned long offset, u32 *v) if (err) return err;
- map = i915_gem_object_pin_map(ctx->obj, I915_MAP_WC); + map = i915_gem_object_pin_map_unlocked(ctx->obj, I915_MAP_WC); if (IS_ERR(map)) return PTR_ERR(map);
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Straightforward conversion: just convert a bunch of calls to the unlocked versions.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index d3f87dc4eda3..5fef592390cb 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1094,7 +1094,7 @@ __read_slice_count(struct intel_context *ce, if (ret < 0) return ret;
- buf = i915_gem_object_pin_map(obj, I915_MAP_WB); + buf = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(buf)) { ret = PTR_ERR(buf); return ret; @@ -1511,7 +1511,7 @@ static int write_to_scratch(struct i915_gem_context *ctx, if (IS_ERR(obj)) return PTR_ERR(obj);
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WB); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto out; @@ -1622,7 +1622,7 @@ static int read_from_scratch(struct i915_gem_context *ctx, if (err) goto out_vm;
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WB); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto out; @@ -1658,7 +1658,7 @@ static int read_from_scratch(struct i915_gem_context *ctx, if (err) goto out_vm;
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WB); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto out; @@ -1715,7 +1715,7 @@ static int read_from_scratch(struct i915_gem_context *ctx, if (err) goto out_vm;
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WB); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto out_vm;
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Use pin_pages_unlocked() where we don't have a lock.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c index b6d43880b0c1..dd74bc09ec88 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c @@ -194,7 +194,7 @@ static int igt_dmabuf_import_ownership(void *arg)
dma_buf_put(dmabuf);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { pr_err("i915_gem_object_pin_pages failed with err=%d\n", err); goto out_obj;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Also quite simple: a single call needs to use the unlocked version.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c index e1d50a5a1477..4df505e4c53a 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c @@ -116,7 +116,7 @@ static int igt_gpu_reloc(void *arg) if (IS_ERR(scratch)) return PTR_ERR(scratch);
- map = i915_gem_object_pin_map(scratch, I915_MAP_WC); + map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC); if (IS_ERR(map)) { err = PTR_ERR(map); goto err_scratch;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Ensure we hold the lock around put_pages, and use the unlocked wrappers for pinning pages and mappings.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 3ac7628f3bc4..85fff8bed08c 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -321,7 +321,7 @@ static int igt_partial_tiling(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { pr_err("Failed to allocate %u pages (%lu total), err=%d\n", nreal, obj->base.size / PAGE_SIZE, err); @@ -458,7 +458,7 @@ static int igt_smoke_tiling(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { pr_err("Failed to allocate %u pages (%lu total), err=%d\n", nreal, obj->base.size / PAGE_SIZE, err); @@ -797,7 +797,7 @@ static int wc_set(struct drm_i915_gem_object *obj) { void *vaddr;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -813,7 +813,7 @@ static int wc_check(struct drm_i915_gem_object *obj) void *vaddr; int err = 0;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -1315,7 +1315,9 @@ static int __igt_mmap_revoke(struct drm_i915_private *i915, }
if (type != I915_MMAP_TYPE_GTT) { + i915_gem_object_lock(obj, NULL); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); if (i915_gem_object_has_pages(obj)) { pr_err("Failed to put-pages object!\n"); err = -EINVAL;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Convert a single pin_pages call to use the unlocked version.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c index bf853c40ec65..740ee8086a27 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c @@ -47,7 +47,7 @@ static int igt_gem_huge(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { pr_err("Failed to allocate %u pages (%lu total), err=%d\n", nreal, obj->base.size / PAGE_SIZE, err);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Use some unlocked versions where we're not holding the ww lock.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c index 23b6e11bbc3e..ee9496f3d11d 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c @@ -262,7 +262,7 @@ static int igt_fill_blt_thread(void *arg) goto err_flush; }
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_put; @@ -380,7 +380,7 @@ static int igt_copy_blt_thread(void *arg) goto err_flush; }
- vaddr = i915_gem_object_pin_map(src, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(src, I915_MAP_WB); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_put_src; @@ -400,7 +400,7 @@ static int igt_copy_blt_thread(void *arg) goto err_put_src; }
- vaddr = i915_gem_object_pin_map(dst, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(dst, I915_MAP_WB); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_put_dst;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
igt_emit_store_dw needs to use the unlocked version, as it's not holding a lock. This fixes igt_gpu_fill_dw() which is used by some other selftests.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c index e21b5023ca7d..f4e85b4a347d 100644 --- a/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c +++ b/drivers/gpu/drm/i915/gem/selftests/igt_gem_utils.c @@ -54,7 +54,7 @@ igt_emit_store_dw(struct i915_vma *vma, if (IS_ERR(obj)) return ERR_CAST(obj);
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WC); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto err;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
This one only needs a single call converted to the unlocked version.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_context.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 1f4020e906a8..d9b0ebc938f1 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,8 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) if (err) goto err;
- vaddr = i915_gem_object_pin_map(ce->state->obj, - i915_coherent_map_type(engine->i915)); + vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, + i915_coherent_map_type(engine->i915)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Convert a few calls to use the unlocked versions.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index fb5ebf930ab2..e3027cebab5b 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -80,15 +80,15 @@ static int hang_init(struct hang *h, struct intel_gt *gt) }
i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC); - vaddr = i915_gem_object_pin_map(h->hws, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(h->hws, I915_MAP_WB); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_obj; } h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
- vaddr = i915_gem_object_pin_map(h->obj, - i915_coherent_map_type(gt->i915)); + vaddr = i915_gem_object_pin_map_unlocked(h->obj, + i915_coherent_map_type(gt->i915)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws; @@ -149,7 +149,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
- vaddr = i915_gem_object_pin_map(obj, i915_coherent_map_type(gt->i915)); + vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Convert normal functions to unlocked versions where needed.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_execlists.c | 34 ++++++++++---------- 1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 95d41c01d0e0..124011f6fb51 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -1007,7 +1007,7 @@ static int live_timeslice_preempt(void *arg) goto err_obj; }
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_obj; @@ -1315,7 +1315,7 @@ static int live_timeslice_queue(void *arg) goto err_obj; }
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_obj; @@ -1562,7 +1562,7 @@ static int live_busywait_preempt(void *arg) goto err_ctx_lo; }
- map = i915_gem_object_pin_map(obj, I915_MAP_WC); + map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(map)) { err = PTR_ERR(map); goto err_obj; @@ -2678,7 +2678,7 @@ static int create_gang(struct intel_engine_cs *engine, if (err) goto err_obj;
- cs = i915_gem_object_pin_map(obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(cs)) goto err_obj;
@@ -2960,7 +2960,7 @@ static int live_preempt_gang(void *arg) * it will terminate the next lowest spinner until there * are no more spinners and the gang is complete. */ - cs = i915_gem_object_pin_map(rq->batch->obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(rq->batch->obj, I915_MAP_WC); if (!IS_ERR(cs)) { *cs = 0; i915_gem_object_unpin_map(rq->batch->obj); @@ -3025,7 +3025,7 @@ create_gpr_user(struct intel_engine_cs *engine, return ERR_PTR(err); }
- cs = i915_gem_object_pin_map(obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(cs)) { i915_vma_put(vma); return ERR_CAST(cs); @@ -3235,7 +3235,7 @@ static int live_preempt_user(void *arg) if (IS_ERR(global)) return PTR_ERR(global);
- result = i915_gem_object_pin_map(global->obj, I915_MAP_WC); + result = i915_gem_object_pin_map_unlocked(global->obj, I915_MAP_WC); if (IS_ERR(result)) { i915_vma_unpin_and_release(&global, 0); return PTR_ERR(result); @@ -3628,7 +3628,7 @@ static int live_preempt_smoke(void *arg) goto err_free; }
- cs = i915_gem_object_pin_map(smoke.batch, I915_MAP_WB); + cs = i915_gem_object_pin_map_unlocked(smoke.batch, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto err_batch; @@ -4231,7 +4231,7 @@ static int preserved_virtual_engine(struct intel_gt *gt, goto out_end; }
- cs = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB); + cs = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto out_end; @@ -5259,7 +5259,7 @@ static int __live_lrc_gpr(struct intel_engine_cs *engine, goto err_rq; }
- cs = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB); + cs = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto err_rq; @@ -5553,7 +5553,7 @@ store_context(struct intel_context *ce, struct i915_vma *scratch) if (IS_ERR(batch)) return batch;
- cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(batch->obj, I915_MAP_WC); if (IS_ERR(cs)) { i915_vma_put(batch); return ERR_CAST(cs); @@ -5717,7 +5717,7 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison) if (IS_ERR(batch)) return batch;
- cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(batch->obj, I915_MAP_WC); if (IS_ERR(cs)) { i915_vma_put(batch); return ERR_CAST(cs); @@ -5831,29 +5831,29 @@ static int compare_isolation(struct intel_engine_cs *engine, u32 *defaults; int err = 0;
- A[0] = i915_gem_object_pin_map(ref[0]->obj, I915_MAP_WC); + A[0] = i915_gem_object_pin_map_unlocked(ref[0]->obj, I915_MAP_WC); if (IS_ERR(A[0])) return PTR_ERR(A[0]);
- A[1] = i915_gem_object_pin_map(ref[1]->obj, I915_MAP_WC); + A[1] = i915_gem_object_pin_map_unlocked(ref[1]->obj, I915_MAP_WC); if (IS_ERR(A[1])) { err = PTR_ERR(A[1]); goto err_A0; }
- B[0] = i915_gem_object_pin_map(result[0]->obj, I915_MAP_WC); + B[0] = i915_gem_object_pin_map_unlocked(result[0]->obj, I915_MAP_WC); if (IS_ERR(B[0])) { err = PTR_ERR(B[0]); goto err_A1; }
- B[1] = i915_gem_object_pin_map(result[1]->obj, I915_MAP_WC); + B[1] = i915_gem_object_pin_map_unlocked(result[1]->obj, I915_MAP_WC); if (IS_ERR(B[1])) { err = PTR_ERR(B[1]); goto err_B0; }
- lrc = i915_gem_object_pin_map(ce->state->obj, + lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, i915_coherent_map_type(engine->i915)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Use pin_map_unlocked when we're not holding locks.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_mocs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c index 21dcd91cbd62..eadb41b76d33 100644 --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c @@ -105,7 +105,7 @@ static int live_mocs_init(struct live_mocs *arg, struct intel_gt *gt) if (IS_ERR(arg->scratch)) return PTR_ERR(arg->scratch);
- arg->vaddr = i915_gem_object_pin_map(arg->scratch->obj, I915_MAP_WB); + arg->vaddr = i915_gem_object_pin_map_unlocked(arg->scratch->obj, I915_MAP_WB); if (IS_ERR(arg->vaddr)) { err = PTR_ERR(arg->vaddr); goto err_scratch;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Use unlocked versions when the ww lock is not held.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_ring_submission.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c index 3350e7c995bc..99609271c3a7 100644 --- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c @@ -35,7 +35,7 @@ static struct i915_vma *create_wally(struct intel_engine_cs *engine) return ERR_PTR(err); }
- cs = i915_gem_object_pin_map(obj, I915_MAP_WC); + cs = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(cs)) { i915_gem_object_put(obj); return ERR_CAST(cs); @@ -212,7 +212,7 @@ static int __live_ctx_switch_wa(struct intel_engine_cs *engine) if (IS_ERR(bb)) return PTR_ERR(bb);
- result = i915_gem_object_pin_map(bb->obj, I915_MAP_WC); + result = i915_gem_object_pin_map_unlocked(bb->obj, I915_MAP_WC); if (IS_ERR(result)) { intel_context_put(bb->private); i915_vma_unpin_and_release(&bb, 0);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We can no longer call intel_timeline_pin with a null argument, so add a ww loop that locks the backing object.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_timeline.c | 28 ++++++++++++++++++--- 1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index 7435abf5a703..d468147a03de 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -37,6 +37,26 @@ static unsigned long hwsp_cacheline(struct intel_timeline *tl) return (address + offset_in_page(tl->hwsp_offset)) / CACHELINE_BYTES; }
+static int selftest_tl_pin(struct intel_timeline *tl) +{ + struct i915_gem_ww_ctx ww; + int err; + + i915_gem_ww_ctx_init(&ww, false); +retry: + err = i915_gem_object_lock(tl->hwsp_ggtt->obj, &ww); + if (!err) + err = intel_timeline_pin(tl, &ww); + + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + return err; +} + #define CACHELINES_PER_PAGE (PAGE_SIZE / CACHELINE_BYTES)
struct mock_hwsp_freelist { @@ -78,7 +98,7 @@ static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state, if (IS_ERR(tl)) return PTR_ERR(tl);
- err = intel_timeline_pin(tl, NULL); + err = selftest_tl_pin(tl); if (err) { intel_timeline_put(tl); return err; @@ -464,7 +484,7 @@ checked_tl_write(struct intel_timeline *tl, struct intel_engine_cs *engine, u32 struct i915_request *rq; int err;
- err = intel_timeline_pin(tl, NULL); + err = selftest_tl_pin(tl); if (err) { rq = ERR_PTR(err); goto out; @@ -664,7 +684,7 @@ static int live_hwsp_wrap(void *arg) if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline) goto out_free;
- err = intel_timeline_pin(tl, NULL); + err = selftest_tl_pin(tl); if (err) goto out_free;
@@ -811,7 +831,7 @@ static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt) if (IS_ERR(obj)) return PTR_ERR(obj);
- w->map = i915_gem_object_pin_map(obj, I915_MAP_WB); + w->map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(w->map)) { i915_gem_object_put(obj); return PTR_ERR(w->map);
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Straightforward conversion by using unlocked versions.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/selftests/i915_request.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index e424a6d1a68c..514fa109e40f 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -619,7 +619,7 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915) if (IS_ERR(obj)) return ERR_CAST(obj);
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WB); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto err; @@ -781,7 +781,7 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915) if (err) goto err;
- cmd = i915_gem_object_pin_map(obj, I915_MAP_WC); + cmd = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto err; @@ -816,7 +816,7 @@ static int recursive_batch_resolve(struct i915_vma *batch) { u32 *cmd;
- cmd = i915_gem_object_pin_map(batch->obj, I915_MAP_WC); + cmd = i915_gem_object_pin_map_unlocked(batch->obj, I915_MAP_WC); if (IS_ERR(cmd)) return PTR_ERR(cmd);
@@ -1069,8 +1069,8 @@ static int live_sequential_engines(void *arg) if (!request[idx]) break;
- cmd = i915_gem_object_pin_map(request[idx]->batch->obj, - I915_MAP_WC); + cmd = i915_gem_object_pin_map_unlocked(request[idx]->batch->obj, + I915_MAP_WC); if (!IS_ERR(cmd)) { *cmd = MI_BATCH_BUFFER_END;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Use the unlocked variants for pin_map and pin_pages, and take the object lock around unpinning/putting pages.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- .../drm/i915/selftests/intel_memory_region.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 27389fb19951..9c20b7065fc5 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -31,10 +31,12 @@ static void close_objects(struct intel_memory_region *mem, struct drm_i915_gem_object *obj, *on;
list_for_each_entry_safe(obj, on, objects, st_link) { + i915_gem_object_lock(obj, NULL); if (i915_gem_object_has_pinned_pages(obj)) i915_gem_object_unpin_pages(obj); /* No polluting the memory region between tests */ __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); list_del(&obj->st_link); i915_gem_object_put(obj); } @@ -69,7 +71,7 @@ static int igt_mock_fill(void *arg) break; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); break; @@ -109,7 +111,7 @@ igt_object_create(struct intel_memory_region *mem, if (IS_ERR(obj)) return obj;
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto put;
@@ -123,8 +125,10 @@ igt_object_create(struct intel_memory_region *mem,
static void igt_object_release(struct drm_i915_gem_object *obj) { + i915_gem_object_lock(obj, NULL); i915_gem_object_unpin_pages(obj); __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); list_del(&obj->st_link); i915_gem_object_put(obj); } @@ -356,7 +360,7 @@ static int igt_cpu_check(struct drm_i915_gem_object *obj, u32 dword, u32 val) if (err) return err;
- ptr = i915_gem_object_pin_map(obj, I915_MAP_WC); + ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(ptr)) return PTR_ERR(ptr);
@@ -461,7 +465,7 @@ static int igt_lmem_create(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto out_put;
@@ -500,7 +504,7 @@ static int igt_lmem_write_gpu(void *arg) goto out_file; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto out_put;
@@ -572,7 +576,7 @@ static int igt_lmem_write_cpu(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto out_put; @@ -676,7 +680,7 @@ create_region_for_mapping(struct intel_memory_region *mr, u64 size, u32 type, return obj; }
- addr = i915_gem_object_pin_map(obj, type); + addr = i915_gem_object_pin_map_unlocked(obj, type); if (IS_ERR(addr)) { i915_gem_object_put(obj); if (PTR_ERR(addr) == -ENXIO)
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Same as the other tests: use pin_map_unlocked.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c index 729c3c7b11e2..853d1f02131a 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c @@ -72,7 +72,7 @@ static struct i915_vma *create_empty_batch(struct intel_context *ce) if (IS_ERR(obj)) return ERR_CAST(obj);
- cs = i915_gem_object_pin_map(obj, I915_MAP_WB); + cs = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto err_put; @@ -208,7 +208,7 @@ static struct i915_vma *create_nop_batch(struct intel_context *ce) if (IS_ERR(obj)) return ERR_CAST(obj);
- cs = i915_gem_object_pin_map(obj, I915_MAP_WB); + cs = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); if (IS_ERR(cs)) { err = PTR_ERR(cs); goto err_put;
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
We need to lock the global gtt dma_resv; use i915_vm_lock_objects to handle this correctly. Add ww handling for this where required.
Add the object lock around unpin/put pages, and use the unlocked versions of pin_pages and pin_map where required.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 92 ++++++++++++++----- 1 file changed, 67 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 2cfe99c79034..d07dd6780005 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -129,7 +129,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size) obj->cache_level = I915_CACHE_NONE;
/* Preallocate the "backing storage" */ - if (i915_gem_object_pin_pages(obj)) + if (i915_gem_object_pin_pages_unlocked(obj)) goto err_obj;
i915_gem_object_unpin_pages(obj); @@ -145,6 +145,7 @@ static int igt_ppgtt_alloc(void *arg) { struct drm_i915_private *dev_priv = arg; struct i915_ppgtt *ppgtt; + struct i915_gem_ww_ctx ww; u64 size, last, limit; int err = 0;
@@ -170,6 +171,12 @@ static int igt_ppgtt_alloc(void *arg) limit = totalram_pages() << PAGE_SHIFT; limit = min(ppgtt->vm.total, limit);
+ i915_gem_ww_ctx_init(&ww, false); +retry: + err = i915_vm_lock_objects(&ppgtt->vm, &ww); + if (err) + goto err_ppgtt_cleanup; + /* Check we can allocate the entire range */ for (size = 4096; size <= limit; size <<= 2) { struct i915_vm_pt_stash stash = {}; @@ -214,6 +221,13 @@ static int igt_ppgtt_alloc(void *arg) }
err_ppgtt_cleanup: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + i915_vm_put(&ppgtt->vm); return err; } @@ -275,7 +289,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
GEM_BUG_ON(obj->base.size != BIT_ULL(size));
- if (i915_gem_object_pin_pages(obj)) { + if (i915_gem_object_pin_pages_unlocked(obj)) { i915_gem_object_put(obj); kfree(order); break; @@ -296,20 +310,36 @@ static int lowlevel_hole(struct i915_address_space *vm,
if (vm->allocate_va_range) { struct i915_vm_pt_stash stash = {}; + struct i915_gem_ww_ctx ww; + int err; + + i915_gem_ww_ctx_init(&ww, false); +retry: + err = i915_vm_lock_objects(vm, &ww); + if (err) + goto alloc_vm_end;
+ err = -ENOMEM; if (i915_vm_alloc_pt_stash(vm, &stash, BIT_ULL(size))) - break; - - if (i915_vm_pin_pt_stash(vm, &stash)) { - i915_vm_free_pt_stash(vm, &stash); - break; - } + goto alloc_vm_end;
- vm->allocate_va_range(vm, &stash, - addr, BIT_ULL(size)); + err = i915_vm_pin_pt_stash(vm, &stash); + if (!err) + vm->allocate_va_range(vm, &stash, + addr, BIT_ULL(size));
i915_vm_free_pt_stash(vm, &stash); +alloc_vm_end: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + + if (err) + break; }
mock_vma->pages = obj->mm.pages; @@ -1165,7 +1195,7 @@ static int igt_ggtt_page(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto out_free;
@@ -1332,7 +1362,7 @@ static int igt_gtt_reserve(void *arg) goto out; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); goto out; @@ -1384,7 +1414,7 @@ static int igt_gtt_reserve(void *arg) goto out; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); goto out; @@ -1548,7 +1578,7 @@ static int igt_gtt_insert(void *arg) goto out; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); goto out; @@ -1657,7 +1687,7 @@ static int igt_gtt_insert(void *arg) goto out; }
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { i915_gem_object_put(obj); goto out; @@ -1828,7 +1858,7 @@ static int igt_cs_tlb(void *arg) goto out_vm; }
- batch = i915_gem_object_pin_map(bbe, I915_MAP_WC); + batch = i915_gem_object_pin_map_unlocked(bbe, I915_MAP_WC); if (IS_ERR(batch)) { err = PTR_ERR(batch); goto out_put_bbe; @@ -1844,7 +1874,7 @@ static int igt_cs_tlb(void *arg) }
/* Track the execution of each request by writing into different slot */ - batch = i915_gem_object_pin_map(act, I915_MAP_WC); + batch = i915_gem_object_pin_map_unlocked(act, I915_MAP_WC); if (IS_ERR(batch)) { err = PTR_ERR(batch); goto out_put_act; @@ -1891,7 +1921,7 @@ static int igt_cs_tlb(void *arg) goto out_put_out; GEM_BUG_ON(vma->node.start != vm->total - PAGE_SIZE);
- result = i915_gem_object_pin_map(out, I915_MAP_WB); + result = i915_gem_object_pin_map_unlocked(out, I915_MAP_WB); if (IS_ERR(result)) { err = PTR_ERR(result); goto out_put_out; @@ -1907,6 +1937,7 @@ static int igt_cs_tlb(void *arg) while (!__igt_timeout(end_time, NULL)) { struct i915_vm_pt_stash stash = {}; struct i915_request *rq; + struct i915_gem_ww_ctx ww; u64 offset;
offset = igt_random_offset(&prng, @@ -1925,19 +1956,30 @@ static int igt_cs_tlb(void *arg) if (err) goto end;
+ i915_gem_ww_ctx_init(&ww, false); +retry: + err = i915_vm_lock_objects(vm, &ww); + if (err) + goto end_ww; + err = i915_vm_alloc_pt_stash(vm, &stash, chunk_size); if (err) - goto end; + goto end_ww;
err = i915_vm_pin_pt_stash(vm, &stash); - if (err) { - i915_vm_free_pt_stash(vm, &stash); - goto end; - } - - vm->allocate_va_range(vm, &stash, offset, chunk_size); + if (!err) + vm->allocate_va_range(vm, &stash, offset, chunk_size);
i915_vm_free_pt_stash(vm, &stash); +end_ww: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + if (err) + goto end;
/* Prime the TLB with the dummy pages */ for (i = 0; i < count; i++) {
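The open-coded ww pattern added in the hunks above recurs throughout the series; distilled, under the names used in the diffs, it amounts to:

	struct i915_gem_ww_ctx ww;
	int err;

	i915_gem_ww_ctx_init(&ww, false);
retry:
	err = i915_vm_lock_objects(vm, &ww);
	if (!err) {
		/* ... allocate/pin the pt stash and populate the range ... */
	}
	if (err == -EDEADLK) {
		/* Drop all held locks, sleep on the contended one, retry. */
		err = i915_gem_ww_ctx_backoff(&ww);
		if (!err)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);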
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
With all callers and selftests fixed to use ww locking, we can now finally remove this lock.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_object.h | 7 ++-- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_pages.c | 38 ++++--------------- drivers/gpu/drm/i915/gem/i915_gem_phys.c | 34 ++++------------- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 37 +++++++++++++----- drivers/gpu/drm/i915/gem/i915_gem_shrinker.h | 4 +- drivers/gpu/drm/i915/gem/i915_gem_tiling.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +- drivers/gpu/drm/i915/i915_debugfs.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +--- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +- 13 files changed, 54 insertions(+), 90 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 028a556ab1a5..08d806bbf48e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -62,8 +62,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops, struct lock_class_key *key, unsigned flags) { - mutex_init(&obj->mm.lock); - spin_lock_init(&obj->vma.lock); INIT_LIST_HEAD(&obj->vma.list);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 1d4b44151e0c..d0cc62d1c65e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -136,7 +136,7 @@ static inline void assert_object_held_shared(struct drm_i915_gem_object *obj) */ if (IS_ENABLED(CONFIG_LOCKDEP) && kref_read(&obj->base.refcount) > 0) - lockdep_assert_held(&obj->mm.lock); + assert_object_held(obj); }
static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, @@ -350,11 +350,11 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj); static inline int __must_check i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) { - might_lock(&obj->mm.lock); - if (atomic_inc_not_zero(&obj->mm.pages_pin_count)) return 0;
+ assert_object_held(obj); + return __i915_gem_object_get_pages(obj); }
@@ -396,7 +396,6 @@ i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) }
int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj); -int __i915_gem_object_put_pages_locked(struct drm_i915_gem_object *obj); void i915_gem_object_truncate(struct drm_i915_gem_object *obj); void i915_gem_object_writeback(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 5234c1ed62d4..b172e8cc53ab 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -209,7 +209,6 @@ struct drm_i915_gem_object { * Protects the pages and their use. Do not use directly, but * instead go through the pin/unpin interfaces. */ - struct mutex lock; atomic_t pages_pin_count; atomic_t shrink_pin;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 79336735a6e4..4a8be759832b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -67,7 +67,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, struct list_head *list; unsigned long flags;
- lockdep_assert_held(&obj->mm.lock); + assert_object_held(obj); spin_lock_irqsave(&i915->mm.obj_lock, flags);
i915->mm.shrink_count++; @@ -114,9 +114,7 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj) { int err;
- err = mutex_lock_interruptible(&obj->mm.lock); - if (err) - return err; + assert_object_held(obj);
assert_object_held_shared(obj);
@@ -125,15 +123,13 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
err = ____i915_gem_object_get_pages(obj); if (err) - goto unlock; + return err;
smp_mb__before_atomic(); } atomic_inc(&obj->mm.pages_pin_count);
-unlock: - mutex_unlock(&obj->mm.lock); - return err; + return 0; }
int i915_gem_object_pin_pages_unlocked(struct drm_i915_gem_object *obj) @@ -220,7 +216,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) return pages; }
-int __i915_gem_object_put_pages_locked(struct drm_i915_gem_object *obj) +int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) { struct sg_table *pages;
@@ -251,21 +247,6 @@ int __i915_gem_object_put_pages_locked(struct drm_i915_gem_object *obj) return 0; }
-int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) -{ - int err; - - if (i915_gem_object_has_pinned_pages(obj)) - return -EBUSY; - - /* May be called by shrinker from within get_pages() (on another bo) */ - mutex_lock(&obj->mm.lock); - err = __i915_gem_object_put_pages_locked(obj); - mutex_unlock(&obj->mm.lock); - - return err; -} - /* The 'mapping' part of i915_gem_object_pin_map() below */ static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj, enum i915_map_type type) @@ -366,9 +347,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, !i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM)) return ERR_PTR(-ENXIO);
- err = mutex_lock_interruptible(&obj->mm.lock); - if (err) - return ERR_PTR(err); + assert_object_held(obj);
pinned = !(type & I915_MAP_OVERRIDE); type &= ~I915_MAP_OVERRIDE; @@ -416,15 +395,12 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, obj->mm.mapping = page_pack_bits(ptr, type); }
-out_unlock: - mutex_unlock(&obj->mm.lock); return ptr;
err_unpin: atomic_dec(&obj->mm.pages_pin_count); err_unlock: - ptr = ERR_PTR(err); - goto out_unlock; + return ERR_PTR(err); }
void *i915_gem_object_pin_map_unlocked(struct drm_i915_gem_object *obj, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index f317be5f5e34..435c3b54cf14 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -234,40 +234,22 @@ int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align) if (err) return err;
- err = mutex_lock_interruptible(&obj->mm.lock); - if (err) - return err; - - if (unlikely(!i915_gem_object_has_struct_page(obj))) - goto out; - - if (obj->mm.madv != I915_MADV_WILLNEED) { - err = -EFAULT; - goto out; - } + if (obj->mm.madv != I915_MADV_WILLNEED) + return -EFAULT;
- if (obj->mm.quirked) { - err = -EFAULT; - goto out; - } + if (obj->mm.quirked) + return -EFAULT;
- if (obj->mm.mapping || i915_gem_object_has_pinned_pages(obj)) { - err = -EBUSY; - goto out; - } + if (obj->mm.mapping || i915_gem_object_has_pinned_pages(obj)) + return -EBUSY;
if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) { drm_dbg(obj->base.dev, "Attempting to obtain a purgeable object\n"); - err = -EFAULT; - goto out; + return -EFAULT; }
- err = i915_gem_object_shmem_to_phys(obj); - -out: - mutex_unlock(&obj->mm.lock); - return err; + return i915_gem_object_shmem_to_phys(obj); }
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 7a59fd1ea4e5..b4dd7a709800 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -99,7 +99,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) goto err_sg; }
- i915_gem_shrink(i915, 2 * page_count, NULL, *s++); + i915_gem_shrink(NULL, i915, 2 * page_count, NULL, *s++);
/* * We've tried hard to allocate the memory by reaping diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index afc6e5b4dcf1..e42192834c88 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -93,7 +93,8 @@ static void try_to_writeback(struct drm_i915_gem_object *obj, * The number of pages of backing storage actually released. */ unsigned long -i915_gem_shrink(struct drm_i915_private *i915, +i915_gem_shrink(struct i915_gem_ww_ctx *ww, + struct drm_i915_private *i915, unsigned long target, unsigned long *nr_scanned, unsigned int shrink) @@ -112,6 +113,7 @@ i915_gem_shrink(struct drm_i915_private *i915, intel_wakeref_t wakeref = 0; unsigned long count = 0; unsigned long scanned = 0; + int err;
trace_i915_gem_shrink(i915, target, shrink);
@@ -199,23 +201,38 @@ i915_gem_shrink(struct drm_i915_private *i915,
spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
- if (unsafe_drop_pages(obj, shrink) && - mutex_trylock(&obj->mm.lock)) { + err = 0; + if (unsafe_drop_pages(obj, shrink)) { /* May arrive from get_pages on another bo */ - if (!__i915_gem_object_put_pages_locked(obj)) { + if (!ww) { + if (!i915_gem_object_trylock(obj)) + goto skip; + } else { + err = i915_gem_object_lock(obj, ww); + if (err) + goto skip; + } + + if (!__i915_gem_object_put_pages(obj)) { try_to_writeback(obj, shrink); count += obj->base.size >> PAGE_SHIFT; } - mutex_unlock(&obj->mm.lock); + if (!ww) + i915_gem_object_unlock(obj); }
scanned += obj->base.size >> PAGE_SHIFT; +skip: i915_gem_object_put(obj);
spin_lock_irqsave(&i915->mm.obj_lock, flags); + if (err) + break; } list_splice_tail(&still_in_list, phase->list); spin_unlock_irqrestore(&i915->mm.obj_lock, flags); + if (err) + return err; }
if (shrink & I915_SHRINK_BOUND) @@ -246,7 +263,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915) unsigned long freed = 0;
with_intel_runtime_pm(&i915->runtime_pm, wakeref) { - freed = i915_gem_shrink(i915, -1UL, NULL, + freed = i915_gem_shrink(NULL, i915, -1UL, NULL, I915_SHRINK_BOUND | I915_SHRINK_UNBOUND); } @@ -292,7 +309,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
sc->nr_scanned = 0;
- freed = i915_gem_shrink(i915, + freed = i915_gem_shrink(NULL, i915, sc->nr_to_scan, &sc->nr_scanned, I915_SHRINK_BOUND | @@ -301,7 +318,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc) intel_wakeref_t wakeref;
with_intel_runtime_pm(&i915->runtime_pm, wakeref) { - freed += i915_gem_shrink(i915, + freed += i915_gem_shrink(NULL, i915, sc->nr_to_scan - sc->nr_scanned, &sc->nr_scanned, I915_SHRINK_ACTIVE | @@ -326,7 +343,7 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr)
freed_pages = 0; with_intel_runtime_pm(&i915->runtime_pm, wakeref) - freed_pages += i915_gem_shrink(i915, -1UL, NULL, + freed_pages += i915_gem_shrink(NULL, i915, -1UL, NULL, I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | I915_SHRINK_WRITEBACK); @@ -364,7 +381,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr intel_wakeref_t wakeref;
with_intel_runtime_pm(&i915->runtime_pm, wakeref) - freed_pages += i915_gem_shrink(i915, -1UL, NULL, + freed_pages += i915_gem_shrink(NULL, i915, -1UL, NULL, I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | I915_SHRINK_VMAPS); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h index b397d7785789..8512470f6fd6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h @@ -9,10 +9,12 @@ #include <linux/bits.h>
struct drm_i915_private; +struct i915_gem_ww_ctx; struct mutex;
/* i915_gem_shrinker.c */ -unsigned long i915_gem_shrink(struct drm_i915_private *i915, +unsigned long i915_gem_shrink(struct i915_gem_ww_ctx *ww, + struct drm_i915_private *i915, unsigned long target, unsigned long *nr_scanned, unsigned flags); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c index ffcaee74a249..4523a14db86e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c @@ -265,7 +265,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj, * pages to prevent them being swapped out and causing corruption * due to the change in swizzling. */ - mutex_lock(&obj->mm.lock); if (i915_gem_object_has_pages(obj) && obj->mm.madv == I915_MADV_WILLNEED && i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) { @@ -280,7 +279,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj, obj->mm.quirked = true; } } - mutex_unlock(&obj->mm.lock);
spin_lock(&obj->vma.lock); for_each_ggtt_vma(vma, obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 0cab9da6669e..fb4bc30fbd9a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -247,7 +247,7 @@ static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj, bool if (GEM_WARN_ON(i915_gem_object_has_pinned_pages(obj))) return -EBUSY;
- mutex_lock(&obj->mm.lock); + assert_object_held(obj);
pages = __i915_gem_object_unset_pages(obj); if (!IS_ERR_OR_NULL(pages)) @@ -255,7 +255,6 @@ static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj, bool
if (get_pages) err = ____i915_gem_object_get_pages(obj); - mutex_unlock(&obj->mm.lock);
return err; } diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 263074c2c097..6d1482c82694 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -1510,10 +1510,10 @@ i915_drop_caches_set(void *data, u64 val)
fs_reclaim_acquire(GFP_KERNEL); if (val & DROP_BOUND) - i915_gem_shrink(i915, LONG_MAX, NULL, I915_SHRINK_BOUND); + i915_gem_shrink(NULL, i915, LONG_MAX, NULL, I915_SHRINK_BOUND);
if (val & DROP_UNBOUND) - i915_gem_shrink(i915, LONG_MAX, NULL, I915_SHRINK_UNBOUND); + i915_gem_shrink(NULL, i915, LONG_MAX, NULL, I915_SHRINK_UNBOUND);
if (val & DROP_SHRINK_ALL) i915_gem_shrink_all(i915); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b81fbd907775..ef66c0926af6 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1063,10 +1063,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, if (err) goto out;
- err = mutex_lock_interruptible(&obj->mm.lock); - if (err) - goto out_ww; - if (i915_gem_object_has_pages(obj) && i915_gem_object_is_tiled(obj) && i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) { @@ -1109,9 +1105,7 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, i915_gem_object_truncate(obj);
args->retained = obj->mm.madv != __I915_MADV_PURGED; - mutex_unlock(&obj->mm.lock);
-out_ww: i915_gem_object_unlock(obj); out: i915_gem_object_put(obj); @@ -1292,7 +1286,7 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
- i915_gem_shrink(i915, -1UL, NULL, ~0); + i915_gem_shrink(NULL, i915, -1UL, NULL, ~0); i915_gem_drain_freed_objects(i915);
list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) { diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index c5ee1567f3d1..729074ee33d4 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -44,7 +44,7 @@ int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj, * the DMA remapper, i915_gem_shrink will return 0. */ GEM_BUG_ON(obj->mm.pages == pages); - } while (i915_gem_shrink(to_i915(obj->base.dev), + } while (i915_gem_shrink(NULL, to_i915(obj->base.dev), obj->base.size >> PAGE_SHIFT, NULL, I915_SHRINK_BOUND | I915_SHRINK_UNBOUND));
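Note the two calling contexts for the reworked shrinker: with a ww context it locks objects through that context and propagates errors, while without one (direct reclaim) it falls back to a trylock and skips contended objects. A sketch of what such a trylock helper amounts to, assuming the object lock is the dma_resv lock (hypothetical helper, not part of this diff):

static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj)
{
	/* Non-blocking; the caller must be prepared to skip the object. */
	return dma_resv_trylock(obj->base.resv);
}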
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Instead of force-unbinding and rebinding every time, we check whether our notifier seqcount is still current when pages are bound. This way we only rebind the userptr when we need to, and prevent stalls.
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 27 ++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index fb4bc30fbd9a..d1ecc31b5e90 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -275,12 +275,33 @@ int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj) if (ret) return ret;
- /* Make sure userptr is unbound for next attempt, so we don't use stale pages. */ - ret = i915_gem_object_userptr_unbind(obj, false); + /* optimistically try to preserve current pages while unlocked */ + if (i915_gem_object_has_pages(obj) && + !mmu_interval_check_retry(&obj->userptr.notifier, + obj->userptr.notifier_seq)) { + spin_lock(&i915->mm.notifier_lock); + if (obj->userptr.pvec && + !mmu_interval_read_retry(&obj->userptr.notifier, + obj->userptr.notifier_seq)) { + obj->userptr.page_ref++; + + /* We can keep using the current binding, this is the fastpath */ + ret = 1; + } + spin_unlock(&i915->mm.notifier_lock); + } + + if (!ret) { + /* Make sure userptr is unbound for next attempt, so we don't use stale pages. */ + ret = i915_gem_object_userptr_unbind(obj, false); + } i915_gem_object_unlock(obj); - if (ret) + if (ret < 0) return ret;
+ if (ret > 0) + return 0; + notifier_seq = mmu_interval_read_begin(&obj->userptr.notifier);
pvec = kvmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
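The fastpath above relies on the core mmu_interval_notifier seqcount protocol; roughly, and simplified (the real code must also hold the driver's notifier lock across the final check, as the diff does with mm.notifier_lock):

	unsigned long seq;

	seq = mmu_interval_read_begin(&obj->userptr.notifier);
	/* ... pin the user pages for the range ... */
	if (mmu_interval_read_retry(&obj->userptr.notifier, seq)) {
		/* An invalidation ran concurrently: the pages are stale,
		 * so unpin them and start over.
		 */
	}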
From: Thomas Hellström thomas.hellstrom@intel.com
In a ww transaction where we've already locked a reservation object, assert_object_held() might not throw a splat even if the object is unlocked. Improve on that situation by asserting that the reservation object's ww mutex is indeed locked.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index d0cc62d1c65e..d56643b3b518 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -117,7 +117,14 @@ i915_gem_object_put(struct drm_i915_gem_object *obj) __drm_gem_object_put(&obj->base); }
-#define assert_object_held(obj) dma_resv_assert_held((obj)->base.resv) +#ifdef CONFIG_LOCKDEP +#define assert_object_held(obj) do { \ + dma_resv_assert_held((obj)->base.resv); \ + WARN_ON(!ww_mutex_is_locked(&(obj)->base.resv->lock)); \ + } while (0) +#else +#define assert_object_held(obj) do { } while (0) +#endif
#define object_is_isolated(obj) \ (!IS_ENABLED(CONFIG_LOCKDEP) || \
From: Thomas Hellström thomas.hellstrom@intel.com
When we lock objects in leaf functions, for example during eviction, they may disappear as soon as we unreference them, and the locking context's contended pointer then points to a freed object. Fix this by taking a reference on that object, and also unlock the contending object as soon as we've done the ww transaction relaxation: the restarted transaction may not even need the contending object, and keeping the lock is not needed to prevent starvation. Keeping that lock would unnecessarily require us to reference count all locks on the list and would also create locking confusion around -EALREADY.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++- 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index d56643b3b518..60e27738c39d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -163,7 +163,7 @@ static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, ret = 0;
if (ret == -EDEADLK) - ww->contended = obj; + ww->contended = i915_gem_object_get(obj);
return ret; } diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef66c0926af6..2248e65cf5f9 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1370,9 +1370,16 @@ int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ww) else dma_resv_lock_slow(ww->contended->base.resv, &ww->ctx);
+ /* + * Unlocking the contended lock again, as might not need it in + * the retried transaction. This does not increase starvation, + * but it's opening up for a wakeup flood if there are many + * transactions relaxing on this object. + */ if (!ret) - list_add_tail(&ww->contended->obj_link, &ww->obj_list); + dma_resv_unlock(ww->contended->base.resv);
+ i915_gem_object_put(ww->contended); ww->contended = NULL;
return ret;
From: Thomas Hellström thomas.hellstrom@intel.com
As we're about to add more ww-related functionality, break out the dma_resv ww locking utilities to their own files.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_object.h | 1 + drivers/gpu/drm/i915/gt/intel_renderstate.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 59 ------------------ drivers/gpu/drm/i915/i915_gem.h | 12 ---- drivers/gpu/drm/i915/i915_gem_ww.c | 66 +++++++++++++++++++++ drivers/gpu/drm/i915/i915_gem_ww.h | 21 +++++++ 7 files changed, 90 insertions(+), 71 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_gem_ww.c create mode 100644 drivers/gpu/drm/i915/i915_gem_ww.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 5112e5d79316..ec361d61230b 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -45,6 +45,7 @@ i915-y += i915_drv.o \ i915_switcheroo.o \ i915_sysfs.o \ i915_utils.o \ + i915_gem_ww.o \ intel_device_info.o \ intel_dram.o \ intel_memory_region.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 60e27738c39d..c6c7ab181a65 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -15,6 +15,7 @@ #include "i915_gem_object_types.h" #include "i915_gem_gtt.h" #include "i915_vma_types.h" +#include "i915_gem_ww.h"
void i915_gem_init__objects(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.h b/drivers/gpu/drm/i915/gt/intel_renderstate.h index 713aa1e86c80..d9db833b873b 100644 --- a/drivers/gpu/drm/i915/gt/intel_renderstate.h +++ b/drivers/gpu/drm/i915/gt/intel_renderstate.h @@ -26,6 +26,7 @@
#include <linux/types.h> #include "i915_gem.h" +#include "i915_gem_ww.h"
struct i915_request; struct intel_context; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2248e65cf5f9..2662d679db6e 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1326,65 +1326,6 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file) return ret; }
-void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ww, bool intr) -{ - ww_acquire_init(&ww->ctx, &reservation_ww_class); - INIT_LIST_HEAD(&ww->obj_list); - ww->intr = intr; - ww->contended = NULL; -} - -static void i915_gem_ww_ctx_unlock_all(struct i915_gem_ww_ctx *ww) -{ - struct drm_i915_gem_object *obj; - - while ((obj = list_first_entry_or_null(&ww->obj_list, struct drm_i915_gem_object, obj_link))) { - list_del(&obj->obj_link); - i915_gem_object_unlock(obj); - } -} - -void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj) -{ - list_del(&obj->obj_link); - i915_gem_object_unlock(obj); -} - -void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ww) -{ - i915_gem_ww_ctx_unlock_all(ww); - WARN_ON(ww->contended); - ww_acquire_fini(&ww->ctx); -} - -int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ww) -{ - int ret = 0; - - if (WARN_ON(!ww->contended)) - return -EINVAL; - - i915_gem_ww_ctx_unlock_all(ww); - if (ww->intr) - ret = dma_resv_lock_slow_interruptible(ww->contended->base.resv, &ww->ctx); - else - dma_resv_lock_slow(ww->contended->base.resv, &ww->ctx); - - /* - * Unlocking the contended lock again, as might not need it in - * the retried transaction. This does not increase starvation, - * but it's opening up for a wakeup flood if there are many - * transactions relaxing on this object. - */ - if (!ret) - dma_resv_unlock(ww->contended->base.resv); - - i915_gem_object_put(ww->contended); - ww->contended = NULL; - - return ret; -} - #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/mock_gem_device.c" #include "selftests/i915_gem.c" diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h index a4cad3f154ca..f333e88a2b6e 100644 --- a/drivers/gpu/drm/i915/i915_gem.h +++ b/drivers/gpu/drm/i915/i915_gem.h @@ -116,16 +116,4 @@ static inline bool __tasklet_is_scheduled(struct tasklet_struct *t) return test_bit(TASKLET_STATE_SCHED, &t->state); }
-struct i915_gem_ww_ctx { - struct ww_acquire_ctx ctx; - struct list_head obj_list; - bool intr; - struct drm_i915_gem_object *contended; -}; - -void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ctx, bool intr); -void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ctx); -int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ctx); -void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj); - #endif /* __I915_GEM_H__ */ diff --git a/drivers/gpu/drm/i915/i915_gem_ww.c b/drivers/gpu/drm/i915/i915_gem_ww.c new file mode 100644 index 000000000000..43960d8595eb --- /dev/null +++ b/drivers/gpu/drm/i915/i915_gem_ww.c @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ +#include <linux/dma-resv.h> +#include "i915_gem_ww.h" +#include "gem/i915_gem_object.h" + +void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ww, bool intr) +{ + ww_acquire_init(&ww->ctx, &reservation_ww_class); + INIT_LIST_HEAD(&ww->obj_list); + ww->intr = intr; + ww->contended = NULL; +} + +static void i915_gem_ww_ctx_unlock_all(struct i915_gem_ww_ctx *ww) +{ + struct drm_i915_gem_object *obj; + + while ((obj = list_first_entry_or_null(&ww->obj_list, struct drm_i915_gem_object, obj_link))) { + list_del(&obj->obj_link); + i915_gem_object_unlock(obj); + } +} + +void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj) +{ + list_del(&obj->obj_link); + i915_gem_object_unlock(obj); +} + +void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ww) +{ + i915_gem_ww_ctx_unlock_all(ww); + WARN_ON(ww->contended); + ww_acquire_fini(&ww->ctx); +} + +int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ww) +{ + int ret = 0; + + if (WARN_ON(!ww->contended)) + return -EINVAL; + + i915_gem_ww_ctx_unlock_all(ww); + if (ww->intr) + ret = dma_resv_lock_slow_interruptible(ww->contended->base.resv, &ww->ctx); + else + dma_resv_lock_slow(ww->contended->base.resv, &ww->ctx); + + /* + * Unlocking the contended lock again, as might not need it in + * the retried transaction. This does not increase starvation, + * but it's opening up for a wakeup flood if there are many + * transactions relaxing on this object. + */ + if (!ret) + dma_resv_unlock(ww->contended->base.resv); + + i915_gem_object_put(ww->contended); + ww->contended = NULL; + + return ret; +} diff --git a/drivers/gpu/drm/i915/i915_gem_ww.h b/drivers/gpu/drm/i915/i915_gem_ww.h new file mode 100644 index 000000000000..f2d8769e4118 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_gem_ww.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2020 Intel Corporation + */ +#ifndef __I915_GEM_WW_H__ +#define __I915_GEM_WW_H__ + +#include <drm/drm_drv.h> + +struct i915_gem_ww_ctx { + struct ww_acquire_ctx ctx; + struct list_head obj_list; + struct drm_i915_gem_object *contended; + bool intr; +}; + +void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ctx, bool intr); +void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ctx); +int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ctx); +void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj); +#endif
From: Thomas Hellström thomas.hellstrom@intel.com
Introduce a for_i915_gem_ww(){} utility to help make the code around a ww transaction more readable.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/i915_gem_ww.h | 31 +++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_ww.h b/drivers/gpu/drm/i915/i915_gem_ww.h index f2d8769e4118..f6b1a796667b 100644 --- a/drivers/gpu/drm/i915/i915_gem_ww.h +++ b/drivers/gpu/drm/i915/i915_gem_ww.h @@ -11,11 +11,40 @@ struct i915_gem_ww_ctx { struct ww_acquire_ctx ctx; struct list_head obj_list; struct drm_i915_gem_object *contended; - bool intr; + unsigned short intr; + unsigned short loop; };
void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ctx, bool intr); void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ctx); int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ctx); void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj); + +/* Internal functions used by the inlines! Don't use. */ +static inline int __i915_gem_ww_fini(struct i915_gem_ww_ctx *ww, int err) +{ + ww->loop = 0; + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(ww); + if (!err) + ww->loop = 1; + } + + if (!ww->loop) + i915_gem_ww_ctx_fini(ww); + + return err; +} + +static inline void +__i915_gem_ww_init(struct i915_gem_ww_ctx *ww, bool intr) +{ + i915_gem_ww_ctx_init(ww, intr); + ww->loop = 1; +} + +#define for_i915_gem_ww(_ww, _err, _intr) \ + for (__i915_gem_ww_init(_ww, _intr); (_ww)->loop; \ + _err = __i915_gem_ww_fini(_ww, _err)) + #endif
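With the macro, the open-coded backoff loop above collapses into a single construct; a sketch (do_locked_work() is again a hypothetical stand-in):

    struct i915_gem_ww_ctx ww;
    int err;

    for_i915_gem_ww(&ww, err, true) {
        err = i915_gem_object_lock(obj, &ww);
        if (err)
            continue; /* -EDEADLK backs off and retries; other errors exit */

        err = do_locked_work(obj);
    }
    /* err holds the final status; the ww context is already finished */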
From: Thomas Hellström thomas.hellstrom@intel.com
Move the vma pages_mutex out of the way of the object ww locks.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/i915_vma.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 0c7e4191811a..7243ab593aec 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -792,28 +792,30 @@ static int vma_get_pages(struct i915_vma *vma) if (atomic_add_unless(&vma->pages_count, 1, 0)) return 0;
+ if (vma->obj) { + err = i915_gem_object_pin_pages(vma->obj); + if (err) + return err; + } + /* Allocations ahoy! */ - if (mutex_lock_interruptible(&vma->pages_mutex)) - return -EINTR; + if (mutex_lock_interruptible(&vma->pages_mutex)) { + err = -EINTR; + goto unpin; + }
if (!atomic_read(&vma->pages_count)) { - if (vma->obj) { - err = i915_gem_object_pin_pages(vma->obj); - if (err) - goto unlock; - } - err = vma->ops->set_pages(vma); - if (err) { - if (vma->obj) - i915_gem_object_unpin_pages(vma->obj); + if (err) goto unlock; - } } atomic_inc(&vma->pages_count);
unlock: mutex_unlock(&vma->pages_mutex); +unpin: + if (err && vma->obj) + __i915_gem_object_unpin_pages(vma->obj);
return err; } @@ -826,10 +828,10 @@ static void __vma_put_pages(struct i915_vma *vma, unsigned int count) if (atomic_sub_return(count, &vma->pages_count) == 0) { vma->ops->clear_pages(vma); GEM_BUG_ON(vma->pages); - if (vma->obj) - i915_gem_object_unpin_pages(vma->obj); } mutex_unlock(&vma->pages_mutex); + if (vma->obj) + i915_gem_object_unpin_pages(vma->obj); }
static void vma_put_pages(struct i915_vma *vma)
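The net effect on lock ordering, as a sketch of vma_get_pages() after this patch (object page pinning, which may require the object ww lock, now happens outside pages_mutex, leaving pages_mutex as a leaf lock):

    /* new ordering in vma_get_pages(), per the diff above: */
    i915_gem_object_pin_pages(vma->obj);   /* may take the obj ww lock */
    mutex_lock(&vma->pages_mutex);         /* leaf lock: no ww locks inside */
    vma->ops->set_pages(vma);
    mutex_unlock(&vma->pages_mutex);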
From: Mohammed Khajapasha mohammed.khajapasha@intel.com
Use the local memory IO BAR address for fbdev's fb_mmap() operation on discrete; fbdev uses the physical address of our framebuffer for its fb_mmap() function.
Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com Cc: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/display/intel_fbdev.c | 27 +++++++++++++++++----- 1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index bdf44e923cc0..831e99e0785c 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -178,6 +178,7 @@ static int intelfb_create(struct drm_fb_helper *helper, unsigned long flags = 0; bool prealloc = false; void __iomem *vaddr; + struct drm_i915_gem_object *obj; int ret;
if (intel_fb && @@ -232,13 +233,27 @@ static int intelfb_create(struct drm_fb_helper *helper, info->fbops = &intelfb_ops;
/* setup aperture base/size for vesafb takeover */ - info->apertures->ranges[0].base = ggtt->gmadr.start; - info->apertures->ranges[0].size = ggtt->mappable_end; + obj = intel_fb_obj(&intel_fb->base); + if (HAS_LMEM(dev_priv) && i915_gem_object_is_lmem(obj)) { + struct intel_memory_region *mem = obj->mm.region; + + info->apertures->ranges[0].base = mem->io_start; + info->apertures->ranges[0].size = mem->total; + + /* Use fbdev's framebuffer from lmem for discrete */ + info->fix.smem_start = + (unsigned long)(mem->io_start + + i915_gem_object_get_dma_address(obj, 0)); + info->fix.smem_len = obj->base.size; + } else { + info->apertures->ranges[0].base = ggtt->gmadr.start; + info->apertures->ranges[0].size = ggtt->mappable_end;
- /* Our framebuffer is the entirety of fbdev's system memory */ - info->fix.smem_start = - (unsigned long)(ggtt->gmadr.start + vma->node.start); - info->fix.smem_len = vma->node.size; + /* Our framebuffer is the entirety of fbdev's system memory */ + info->fix.smem_start = + (unsigned long)(ggtt->gmadr.start + vma->node.start); + info->fix.smem_len = vma->node.size; + }
vaddr = i915_vma_pin_iomap(vma); if (IS_ERR(vaddr)) {
From: Mohammed Khajapasha mohammed.khajapasha@intel.com
Return -EREMOTE when the framebuffer object is not backed by LMEM on discrete. If local memory is supported by the hardware, the framebuffer's backing GEM objects should come from local memory.
Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com Cc: Michael J. Ruhl michael.j.ruhl@intel.com Cc: Animesh Manna animesh.manna@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 8a7945f55278..95ed1e06ea55 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -18054,11 +18054,20 @@ intel_user_framebuffer_create(struct drm_device *dev, struct drm_framebuffer *fb; struct drm_i915_gem_object *obj; struct drm_mode_fb_cmd2 mode_cmd = *user_mode_cmd; + struct drm_i915_private *i915;
obj = i915_gem_object_lookup(filp, mode_cmd.handles[0]); if (!obj) return ERR_PTR(-ENOENT);
+ /* object is backed with LMEM for discrete */ + i915 = to_i915(obj->base.dev); + if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) { + /* object is "remote", not in local memory */ + i915_gem_object_put(obj); + return ERR_PTR(-EREMOTE); + } + fb = intel_framebuffer_create(obj, &mode_cmd); i915_gem_object_put(obj);
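At the uAPI level this surfaces as an ADDFB2 failure; a hedged userspace sketch (standard libdrm call; bo_handle is assumed to name a system-memory object on a discrete part):

    struct drm_mode_fb_cmd2 cmd = {
        .width = 1920, .height = 1080,
        .pixel_format = DRM_FORMAT_XRGB8888,
        .handles = { bo_handle },
        .pitches = { 1920 * 4 },
    };

    if (drmIoctl(fd, DRM_IOCTL_MODE_ADDFB2, &cmd) && errno == EREMOTE) {
        /* the BO is not in local memory: reallocate it in LMEM and retry */
    }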
From: "Michael J. Ruhl" michael.j.ruhl@intel.com
The dma-buf interface for i915 does not currently support LMEM backed objects.
Check imported objects to see if they are from i915 and if they are LMEM. If they are, reject the import.
This check is needed in two places: once on import, and then a recheck in the mapping path on the off chance that an object was migrated to LMEM after import.
Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index c4b01e819786..018d02cc4af5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -9,6 +9,7 @@ #include <linux/dma-resv.h>
#include "i915_drv.h" +#include "i915_gem_lmem.h" #include "i915_gem_object.h" #include "i915_scatterlist.h"
@@ -25,6 +26,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme struct scatterlist *src, *dst; int ret, i;
+ if (i915_gem_object_is_lmem(obj)) { + ret = -ENOTSUPP; + goto err; + } + ret = i915_gem_object_pin_pages(obj); if (ret) goto err; @@ -248,6 +254,10 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev, */ return &i915_gem_object_get(obj)->base; } + + /* not our device, but still an i915 object? */ + if (i915_gem_object_is_lmem(obj)) + return ERR_PTR(-ENOTSUPP); }
/* need to attach */
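From userspace the rejection shows up at import time; a sketch (libdrm helper; the dma-buf fd is assumed to come from an LMEM-backed i915 object):

    uint32_t handle;

    if (drmPrimeFDToHandle(fd, dmabuf_fd, &handle)) {
        /* import of an LMEM object is refused (-ENOTSUPP per the patch);
         * an object migrated to LMEM after import instead fails at map time */
    }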
From: Matt Roper matthew.d.roper@intel.com
Boot firmware performs memory training and health assessment during startup. If the memory training fails, the firmware will consider the GPU unusable and will instruct the punit to keep the GT powered down. If this happens, our driver will be unable to communicate with the GT (all GT registers will read back as 0, forcewake requests will time out, etc.), so we should abort driver initialization. We can confirm that LMEM was initialized successfully via the sgunit register GU_CNTL.
Bspec: 53111 Signed-off-by: Matt Roper matthew.d.roper@intel.com Cc: Caz Yokoyama Caz.Yokoyama@intel.com --- drivers/gpu/drm/i915/i915_reg.h | 3 +++ drivers/gpu/drm/i915/intel_uncore.c | 12 ++++++++++++ 2 files changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 5375b219cc3b..bf9ba1e361bb 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -487,6 +487,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GAB_CTL _MMIO(0x24000) #define GAB_CTL_CONT_AFTER_PAGEFAULT (1 << 8)
+#define GU_CNTL _MMIO(0x101010) +#define LMEM_INIT REG_BIT(7) + #define GEN6_STOLEN_RESERVED _MMIO(0x1082C0) #define GEN6_STOLEN_RESERVED_ADDR_MASK (0xFFF << 20) #define GEN7_STOLEN_RESERVED_ADDR_MASK (0x3FFF << 18) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 1c14a07eba7d..1630452e82b8 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1901,6 +1901,18 @@ int intel_uncore_init_mmio(struct intel_uncore *uncore) if (ret) return ret;
+ /* + * The boot firmware initializes local memory and assesses its health. + * If memory training fails, the punit will have been instructed to + * keep the GT powered down; we won't be able to communicate with it + * and we should not continue with driver initialization. + */ + if (IS_DGFX(i915) && + !(__raw_uncore_read32(uncore, GU_CNTL) & LMEM_INIT)) { + drm_err(&i915->drm, "LMEM not initialized by firmware\n"); + return -ENODEV; + } + if (INTEL_GEN(i915) > 5 && !intel_vgpu_active(i915)) uncore->flags |= UNCORE_HAS_FORCEWAKE;
** DO NOT MERGE. RELOCATION SUPPORT WILL BE DROPPED FROM DG1+ **
Add LMEM support for the CPU reloc path. When doing relocations we have both a GPU and a CPU reloc path, as well as some debugging options to force a particular path. The GPU reloc path is preferred when the object is not currently idle; otherwise we use the CPU reloc path. Since we can't kmap the object, and the mappable aperture might not be available, add support for mapping it through LMEMBAR.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 53 +++++++++++++++++-- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 12 +++++ drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 4 ++ 3 files changed, 65 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 91f0c3fd9a4b..e73a761a7d1f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -14,6 +14,7 @@ #include "display/intel_frontbuffer.h"
#include "gem/i915_gem_ioctls.h" +#include "gem/i915_gem_lmem.h" #include "gt/intel_context.h" #include "gt/intel_gt.h" #include "gt/intel_gt_buffer_pool.h" @@ -278,6 +279,7 @@ struct i915_execbuffer { bool has_llc : 1; bool has_fence : 1; bool needs_unfenced : 1; + bool is_lmem : 1;
struct i915_request *rq; u32 *rq_cmd; @@ -1049,6 +1051,7 @@ static void reloc_cache_init(struct reloc_cache *cache, cache->has_fence = cache->gen < 4; cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; cache->node.flags = 0; + cache->is_lmem = false; reloc_cache_clear(cache); }
@@ -1128,10 +1131,14 @@ static void reloc_cache_reset(struct reloc_cache *cache, struct i915_execbuffer } else { struct i915_ggtt *ggtt = cache_to_ggtt(cache);
- intel_gt_flush_ggtt_writes(ggtt->vm.gt); + if (!cache->is_lmem) + intel_gt_flush_ggtt_writes(ggtt->vm.gt); io_mapping_unmap_atomic((void __iomem *)vaddr);
- if (drm_mm_node_allocated(&cache->node)) { + if (cache->is_lmem) { + i915_gem_object_unpin_pages((struct drm_i915_gem_object *)cache->node.mm); + cache->is_lmem = false; + } else if (drm_mm_node_allocated(&cache->node)) { ggtt->vm.clear_range(&ggtt->vm, cache->node.start, cache->node.size); @@ -1184,6 +1191,40 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj, return vaddr; }
+static void *reloc_lmem(struct drm_i915_gem_object *obj, + struct reloc_cache *cache, + unsigned long page) +{ + void *vaddr; + int err; + + GEM_BUG_ON(use_cpu_reloc(cache, obj)); + + if (cache->vaddr) { + io_mapping_unmap_atomic((void __force __iomem *) unmask_page(cache->vaddr)); + } else { + err = i915_gem_object_pin_pages(obj); + if (err) + return ERR_PTR(err); + + err = i915_gem_object_set_to_wc_domain(obj, true); + if (err) { + i915_gem_object_unpin_pages(obj); + return ERR_PTR(err); + } + + cache->node.mm = (void *)obj; + cache->is_lmem = true; + } + + vaddr = i915_gem_object_lmem_io_map_page_atomic(obj, page); + + cache->vaddr = (unsigned long)vaddr; + cache->page = page; + + return vaddr; +} + static void *reloc_iomap(struct drm_i915_gem_object *obj, struct i915_execbuffer *eb, unsigned long page) @@ -1262,8 +1303,12 @@ static void *reloc_vaddr(struct drm_i915_gem_object *obj, vaddr = unmask_page(cache->vaddr); } else { vaddr = NULL; - if ((cache->vaddr & KMAP) == 0) - vaddr = reloc_iomap(obj, eb, page); + if ((cache->vaddr & KMAP) == 0) { + if (i915_gem_object_is_lmem(obj)) + vaddr = reloc_lmem(obj, cache, page); + else + vaddr = reloc_iomap(obj, eb, page); + } if (!vaddr) vaddr = reloc_kmap(obj, cache, page); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index e953965f8263..f6c4d5998ff9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -17,6 +17,18 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = { .release = i915_gem_object_release_memory_region, };
+void __iomem * +i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, + unsigned long n) +{ + resource_size_t offset; + + offset = i915_gem_object_get_dma_address(obj, n); + offset -= obj->mm.region->region.start; + + return io_mapping_map_atomic_wc(&obj->mm.region->iomap, offset); +} + bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { return obj->ops == &i915_gem_lmem_obj_ops; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index fc3f15580fe3..bf7e11fad17b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -14,6 +14,10 @@ struct intel_memory_region;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+void __iomem * +i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, + unsigned long n); + bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object *
** DO NOT MERGE. PREAD/WRITE SUPPORT WILL BE DROPPED FROM DG1+ **
Add support for pread and pwrite of LMEM objects.
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Steve Hampson steven.t.hampson@intel.com Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 186 +++++++++++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 2 + 2 files changed, 188 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index f6c4d5998ff9..840b68eb10d3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -8,6 +8,177 @@ #include "gem/i915_gem_lmem.h" #include "i915_drv.h"
+static int +i915_ww_pin_lock_interruptible(struct drm_i915_gem_object *obj) +{ + struct i915_gem_ww_ctx ww; + int ret; + + for_i915_gem_ww(&ww, ret, true) { + ret = i915_gem_object_lock(obj, &ww); + if (ret) + continue; + + ret = i915_gem_object_pin_pages(obj); + if (ret) + continue; + + ret = i915_gem_object_set_to_wc_domain(obj, false); + if (ret) + goto out_unpin; + + ret = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + if (!ret) + continue; + +out_unpin: + i915_gem_object_unpin_pages(obj); + + /* Unlocking is done implicitly */ + } + + return ret; +} + +int i915_gem_object_lmem_pread(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *arg) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct intel_runtime_pm *rpm = &i915->runtime_pm; + intel_wakeref_t wakeref; + char __user *user_data; + unsigned int offset; + unsigned long idx; + u64 remain; + int ret; + + ret = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + if (ret) + return ret; + + ret = i915_ww_pin_lock_interruptible(obj); + if (ret) + return ret; + + wakeref = intel_runtime_pm_get(rpm); + + remain = arg->size; + user_data = u64_to_user_ptr(arg->data_ptr); + offset = offset_in_page(arg->offset); + for (idx = arg->offset >> PAGE_SHIFT; remain; idx++) { + unsigned long unwritten; + void __iomem *vaddr; + int length; + + length = remain; + if (offset + length > PAGE_SIZE) + length = PAGE_SIZE - offset; + + vaddr = i915_gem_object_lmem_io_map_page_atomic(obj, idx); + if (!vaddr) { + ret = -ENOMEM; + goto out_put; + } + unwritten = __copy_to_user_inatomic(user_data, + (void __force *)vaddr + offset, + length); + io_mapping_unmap_atomic(vaddr); + if (unwritten) { + vaddr = i915_gem_object_lmem_io_map_page(obj, idx); + unwritten = copy_to_user(user_data, + (void __force *)vaddr + offset, + length); + io_mapping_unmap(vaddr); + } + if (unwritten) { + ret = -EFAULT; + goto out_put; + } + + remain -= length; + user_data += length; + offset = 0; + } + +out_put: + intel_runtime_pm_put(rpm, wakeref); + i915_gem_object_unpin_pages(obj); + + return ret; +} + +static int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *arg) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct intel_runtime_pm *rpm = &i915->runtime_pm; + intel_wakeref_t wakeref; + char __user *user_data; + unsigned int offset; + unsigned long idx; + u64 remain; + int ret; + + ret = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + if (ret) + return ret; + + ret = i915_ww_pin_lock_interruptible(obj); + if (ret) + return ret; + + wakeref = intel_runtime_pm_get(rpm); + + remain = arg->size; + user_data = u64_to_user_ptr(arg->data_ptr); + offset = offset_in_page(arg->offset); + for (idx = arg->offset >> PAGE_SHIFT; remain; idx++) { + unsigned long unwritten; + void __iomem *vaddr; + int length; + + length = remain; + if (offset + length > PAGE_SIZE) + length = PAGE_SIZE - offset; + + vaddr = i915_gem_object_lmem_io_map_page_atomic(obj, idx); + if (!vaddr) { + ret = -ENOMEM; + goto out_put; + } + + unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset, + user_data, length); + io_mapping_unmap_atomic(vaddr); + if (unwritten) { + vaddr = i915_gem_object_lmem_io_map_page(obj, idx); + unwritten = copy_from_user((void __force *)vaddr + offset, + user_data, length); + io_mapping_unmap(vaddr); + } + if (unwritten) { + ret = -EFAULT; + goto out_put; + } + + remain -= length; + 
user_data += length; + offset = 0; + } + +out_put: + intel_runtime_pm_put(rpm, wakeref); + i915_gem_object_unpin_pages(obj); + + return ret; +} + const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = { .name = "i915_gem_object_lmem", .flags = I915_GEM_OBJECT_HAS_IOMEM, @@ -15,8 +186,23 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = { .get_pages = i915_gem_object_get_pages_buddy, .put_pages = i915_gem_object_put_pages_buddy, .release = i915_gem_object_release_memory_region, + + .pread = i915_gem_object_lmem_pread, + .pwrite = i915_gem_object_lmem_pwrite, };
+void __iomem * +i915_gem_object_lmem_io_map_page(struct drm_i915_gem_object *obj, + unsigned long n) +{ + resource_size_t offset; + + offset = i915_gem_object_get_dma_address(obj, n); + offset -= obj->mm.region->region.start; + + return io_mapping_map_wc(&obj->mm.region->iomap, offset, PAGE_SIZE); +} + void __iomem * i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, unsigned long n) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index bf7e11fad17b..a24d94bc380f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -14,6 +14,8 @@ struct intel_memory_region;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+void __iomem *i915_gem_object_lmem_io_map_page(struct drm_i915_gem_object *obj, + unsigned long n); void __iomem * i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, unsigned long n);
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on the object's location, on whether the target has LLC, and on whether the user requires an always-coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++-- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/intel_timeline.c | 8 ++++++-- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_execlists.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 12 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 420c6a35f3ed..677c97ded81d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -680,7 +680,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map(obj, + i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 5d51144ef074..1b2009b4dcb7 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -24,7 +24,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj; - int type = i915_coherent_map_type(ce->engine->i915); + int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map;
if (!i915_gem_object_trylock(ce->state->obj)) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 7eec42b27bc1..582a9044727e 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3535,8 +3535,8 @@ __execlists_context_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj, - i915_coherent_map_type(ce->engine->i915) | - I915_MAP_OVERRIDE); + i915_coherent_map_type(ce->engine->i915, ce->state->obj, false) | + I915_MAP_OVERRIDE); if (IS_ERR(*vaddr)) return PTR_ERR(*vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index 4034a4bac7f0..d636c6ed88b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -51,9 +51,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma); - else - addr = i915_gem_object_pin_map(vma->obj, - i915_coherent_map_type(vma->vm->i915)); + else { + int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false); + + addr = i915_gem_object_pin_map(vma->obj, type); + } + if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring; diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index b2d04717db20..065943781586 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -31,6 +31,7 @@ static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp) { struct drm_i915_private *i915 = gt->i915; struct drm_i915_gem_object *obj; + int type; int ret;
obj = i915_gem_object_create_internal(i915, PAGE_SIZE); @@ -47,7 +48,8 @@ static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp) }
/* Pin early so we can call i915_ggtt_pin_unlocked(). */ - hwsp->vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + type = i915_coherent_map_type(i915, obj, true); + hwsp->vaddr = i915_gem_object_pin_map(obj, type); if (IS_ERR(hwsp->vaddr)) { ret = PTR_ERR(hwsp->vaddr); goto out_unlock; @@ -235,9 +237,11 @@ intel_timeline_pin_map(struct intel_timeline *timeline) if (!timeline->hwsp_cacheline) { struct drm_i915_gem_object *obj = timeline->hwsp_ggtt->obj; u32 ofs = offset_in_page(timeline->hwsp_offset); + int type; void *vaddr;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + type = i915_coherent_map_type(timeline->gt->i915, obj, true); + vaddr = i915_gem_object_pin_map(obj, type); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index d9b0ebc938f1..86b6795dc4f3 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -89,7 +89,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, - i915_coherent_map_type(engine->i915)); + i915_coherent_map_type(engine->i915, + ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce); diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 124011f6fb51..cb17da6a616f 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -5854,7 +5854,8 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, - i915_coherent_map_type(engine->i915)); + i915_coherent_map_type(engine->i915, + ce->state->obj, true)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1; diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index e3027cebab5b..bc93dba3c8df 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -88,7 +88,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj, - i915_coherent_map_type(gt->i915)); + i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws; @@ -149,7 +149,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
- vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915)); + vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index a65661eb5d5d..b54b9de31c3e 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -694,7 +694,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, + i915_coherent_map_type(guc_to_gt(guc)->i915, + vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, + i915_coherent_map_type(gt->i915, + vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index ce8d5ff8b9f4..13cb4936f15c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -2027,9 +2028,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915, + struct drm_i915_gem_object *obj, bool always_coherent) { - return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; + if (i915_gem_object_is_lmem(obj)) + return I915_MAP_WC; + if (HAS_LLC(i915) || always_coherent) + return I915_MAP_WB; + else + return I915_MAP_WC; }
static inline u64 i915_cs_timestamp_ns_to_ticks(struct drm_i915_private *i915, u64 val) diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index 9c461edb0b73..b2a1f98c97f5 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -93,9 +93,9 @@ int igt_spinner_pin(struct igt_spinner *spin, }
if (!spin->batch) { - unsigned int mode = - i915_coherent_map_type(spin->gt->i915); + unsigned int mode;
+ mode = i915_coherent_map_type(spin->gt->i915, spin->obj, false); vaddr = igt_spinner_pin_obj(ce, ww, spin->obj, mode, &spin->batch_vma); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
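The resulting selection policy, shown as call sites (a sketch; lmem_obj and smem_obj are hypothetical objects in local and system memory):

    /* lmem objects can only be mapped WC, whatever the other inputs */
    type = i915_coherent_map_type(i915, lmem_obj, true);  /* I915_MAP_WC */

    /* smem on a non-LLC platform: WB only if coherency is required */
    type = i915_coherent_map_type(i915, smem_obj, true);  /* I915_MAP_WB */
    type = i915_coherent_map_type(i915, smem_obj, false); /* I915_MAP_WC */

    /* smem on an LLC platform is always WB */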
We may be without a context to perform various internal blitter operations, for example when performing object migration. Piggybacking off the kernel_context is probably a bad idea, since it has other uses.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_engine.h | 2 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 40 +++++++++++++++++++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + 3 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 760fefdfe392..188c5ff6dc64 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -186,6 +186,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value) #define I915_GEM_HWS_PREEMPT_ADDR (I915_GEM_HWS_PREEMPT * sizeof(u32)) #define I915_GEM_HWS_SEQNO 0x40 #define I915_GEM_HWS_SEQNO_ADDR (I915_GEM_HWS_SEQNO * sizeof(u32)) +#define I915_GEM_HWS_BLITTER 0x42 +#define I915_GEM_HWS_BLITTER_ADDR (I915_GEM_HWS_BLITTER * sizeof(u32)) #define I915_GEM_HWS_SCRATCH 0x80
#define I915_HWS_CSB_BUF0_INDEX 0x10 diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 677c97ded81d..0ba020346566 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -819,6 +819,7 @@ create_pinned_context(struct intel_engine_cs *engine, int err;
ce = intel_context_create(engine); + if (IS_ERR(ce)) return ce;
@@ -851,6 +852,20 @@ create_kernel_context(struct intel_engine_cs *engine) &kernel, "kernel_context"); }
+static struct intel_context * +create_blitter_context(struct intel_engine_cs *engine) +{ + static struct lock_class_key blitter; + struct intel_context *ce; + + ce = create_pinned_context(engine, I915_GEM_HWS_BLITTER_ADDR, &blitter, + "blitter_context"); + if (IS_ERR(ce)) + return ce; + + return ce; +} + /** * intel_engines_init_common - initialize engine state which might require hw access * @engine: Engine to initialize. @@ -881,17 +896,33 @@ static int engine_init_common(struct intel_engine_cs *engine) if (IS_ERR(ce)) return PTR_ERR(ce);
+ engine->kernel_context = ce; ret = measure_breadcrumb_dw(ce); if (ret < 0) goto err_context;
engine->emit_fini_breadcrumb_dw = ret; - engine->kernel_context = ce; + + /* + * The blitter context is used to quickly memset or migrate objects + * in local memory, so it has to always be available. + */ + if (engine->class == COPY_ENGINE_CLASS) { + ce = create_blitter_context(engine); + if (IS_ERR(ce)) { + ret = PTR_ERR(ce); + goto err_unpin; + } + + engine->blitter_context = ce; + }
return 0;
+err_unpin: + intel_context_unpin(engine->kernel_context); err_context: - intel_context_put(ce); + intel_context_put(engine->kernel_context); return ret; }
@@ -947,6 +978,11 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) if (engine->default_state) fput(engine->default_state);
+ if (engine->blitter_context) { + intel_context_unpin(engine->blitter_context); + intel_context_put(engine->blitter_context); + } + if (engine->kernel_context) { intel_context_unpin(engine->kernel_context); intel_context_put(engine->kernel_context); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index ee6312601c56..cb2de4bf86ba 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -347,6 +347,7 @@ struct intel_engine_cs { struct llist_head barrier_tasks;
struct intel_context *kernel_context; /* pinned */ + struct intel_context *blitter_context; /* pinned; exists for BCS only */
intel_engine_mask_t saturated; /* submitting semaphores too late? */
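A sketch of how a later caller might reach for this context (the engine lookup is illustrative; i915_gem_object_fill_blt is extended to a ww form further below in the series):

    struct intel_engine_cs *engine = i915->gt.engine[BCS0];
    int err;

    if (engine && engine->blitter_context)
        err = i915_gem_object_fill_blt(obj, engine->blitter_context, 0);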
Support basic eviction for regions.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com --- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 + drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 59 ++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_shrinker.h | 4 + drivers/gpu/drm/i915/i915_gem.c | 17 +++++ drivers/gpu/drm/i915/intel_memory_region.c | 24 +++++- .../drm/i915/selftests/intel_memory_region.c | 76 +++++++++++++++++++ 6 files changed, 178 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index b172e8cc53ab..6d101275bc9d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -226,6 +226,7 @@ struct drm_i915_gem_object { * region->obj_lock. */ struct list_head region_link; + struct list_head tmp_link;
struct sg_table *pages; void *mapping; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index e42192834c88..4d346df8fd5b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -16,6 +16,7 @@ #include "gt/intel_gt_requests.h"
#include "i915_trace.h" +#include "gt/intel_gt_requests.h"
static bool swap_available(void) { @@ -271,6 +272,64 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915) return freed; }
+int i915_gem_shrink_memory_region(struct intel_memory_region *mem, + resource_size_t target) +{ + struct drm_i915_private *i915 = mem->i915; + struct drm_i915_gem_object *obj; + resource_size_t purged; + LIST_HEAD(purgeable); + int err = -ENOSPC; + + intel_gt_retire_requests(&i915->gt); + + purged = 0; + + mutex_lock(&mem->objects.lock); + + while ((obj = list_first_entry_or_null(&mem->objects.purgeable, + typeof(*obj), + mm.region_link))) { + list_move_tail(&obj->mm.region_link, &purgeable); + + if (!i915_gem_object_has_pages(obj)) + continue; + + if (i915_gem_object_is_framebuffer(obj)) + continue; + + if (!kref_get_unless_zero(&obj->base.refcount)) + continue; + + mutex_unlock(&mem->objects.lock); + + if (!i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE)) { + if (i915_gem_object_trylock(obj)) { + __i915_gem_object_put_pages(obj); + if (!i915_gem_object_has_pages(obj)) { + purged += obj->base.size; + if (!i915_gem_object_is_volatile(obj)) + obj->mm.madv = __I915_MADV_PURGED; + } + i915_gem_object_unlock(obj); + } + } + + i915_gem_object_put(obj); + + mutex_lock(&mem->objects.lock); + + if (purged >= target) { + err = 0; + break; + } + } + + list_splice_tail(&purgeable, &mem->objects.purgeable); + mutex_unlock(&mem->objects.lock); + return err; +} + static unsigned long i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h index 8512470f6fd6..c945f3b587d6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h @@ -7,10 +7,12 @@ #define __I915_GEM_SHRINKER_H__
#include <linux/bits.h> +#include <linux/types.h>
struct drm_i915_private; struct i915_gem_ww_ctx; struct mutex; +struct intel_memory_region;
/* i915_gem_shrinker.c */ unsigned long i915_gem_shrink(struct i915_gem_ww_ctx *ww, @@ -29,5 +31,7 @@ void i915_gem_driver_register__shrinker(struct drm_i915_private *i915); void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915); void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915, struct mutex *mutex); +int i915_gem_shrink_memory_region(struct intel_memory_region *mem, + resource_size_t target);
#endif /* __I915_GEM_SHRINKER_H__ */ diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2662d679db6e..ef2124c17a7f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1104,6 +1104,23 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, !i915_gem_object_has_pages(obj)) i915_gem_object_truncate(obj);
+ if (obj->mm.region && i915_gem_object_has_pages(obj)) { + mutex_lock(&obj->mm.region->objects.lock); + + switch (obj->mm.madv) { + case I915_MADV_WILLNEED: + list_move(&obj->mm.region_link, + &obj->mm.region->objects.list); + break; + default: + list_move(&obj->mm.region_link, + &obj->mm.region->objects.purgeable); + break; + } + + mutex_unlock(&obj->mm.region->objects.lock); + } + args->retained = obj->mm.madv != __I915_MADV_PURGED;
i915_gem_object_unlock(obj); diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index b326993a1026..308f89b87834 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -97,7 +97,8 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, do { struct i915_buddy_block *block; unsigned int order; - + bool retry = true; +retry: order = fls(n_pages) - 1; GEM_BUG_ON(order > mem->mm.max_order); GEM_BUG_ON(order < min_order); @@ -107,8 +108,25 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, if (!IS_ERR(block)) break;
- if (order-- == min_order) - goto err_free_blocks; + if (order-- == min_order) { + resource_size_t target; + int err; + + if (!retry) + goto err_free_blocks; + + target = n_pages * mem->mm.chunk_size; + + mutex_unlock(&mem->mm_lock); + err = i915_gem_shrink_memory_region(mem, + target); + mutex_lock(&mem->mm_lock); + if (err) + goto err_free_blocks; + + retry = false; + goto retry; + } } while (1);
n_pages -= BIT(order); diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 9c20b7065fc5..84525ddba321 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -848,12 +848,88 @@ static int perf_memcpy(void *arg) return 0; }
+static void igt_mark_evictable(struct drm_i915_gem_object *obj) +{ + i915_gem_object_unpin_pages(obj); + obj->mm.madv = I915_MADV_DONTNEED; + list_move(&obj->mm.region_link, &obj->mm.region->objects.purgeable); +} + +static int igt_mock_shrink(void *arg) +{ + struct intel_memory_region *mem = arg; + struct drm_i915_gem_object *obj; + unsigned long n_objects; + LIST_HEAD(objects); + resource_size_t target; + resource_size_t total; + int err = 0; + + target = mem->mm.chunk_size; + total = resource_size(&mem->region); + n_objects = total / target; + + while (n_objects--) { + obj = i915_gem_object_create_region(mem, + target, + 0); + if (IS_ERR(obj)) { + err = PTR_ERR(obj); + goto err_close_objects; + } + + list_add(&obj->st_link, &objects); + + err = i915_gem_object_pin_pages(obj); + if (err) + goto err_close_objects; + + /* + * Make half of the region evictable, though do so in a + * horribly fragmented fashion. + */ + if (n_objects % 2) + igt_mark_evictable(obj); + } + + while (target <= total / 2) { + obj = i915_gem_object_create_region(mem, target, 0); + if (IS_ERR(obj)) { + err = PTR_ERR(obj); + goto err_close_objects; + } + + list_add(&obj->st_link, &objects); + + /* Provoke the shrinker to start violently swinging its axe! */ + err = i915_gem_object_pin_pages(obj); + if (err) { + pr_err("failed to shrink for target=%pa", &target); + goto err_close_objects; + } + + /* Again, half of the region should remain evictable */ + igt_mark_evictable(obj); + + target <<= 1; + } + +err_close_objects: + close_objects(mem, &objects); + + if (err == -ENOMEM) + err = 0; + + return err; +} + int intel_memory_region_mock_selftests(void) { static const struct i915_subtest tests[] = { SUBTEST(igt_mock_fill), SUBTEST(igt_mock_contiguous), SUBTEST(igt_mock_splintered_region), + SUBTEST(igt_mock_shrink), }; struct intel_memory_region *mem; struct drm_i915_private *i915;
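How an object becomes evictable from userspace, as a sketch (standard i915 madvise ioctl; bo_handle is assumed to name an LMEM buffer):

    struct drm_i915_gem_madvise madv = {
        .handle = bo_handle,
        .madv = I915_MADV_DONTNEED,
    };

    drmIoctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
    /*
     * The madvise ioctl above moves the object onto
     * mem->objects.purgeable, where i915_gem_shrink_memory_region()
     * may reap it when a later LMEM allocation would otherwise fail.
     */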
From: Thomas Hellström thomas.hellstrom@intel.com
We want to be able to blit from within a ww transaction, so add blit functions that are able to do that. Also take care to unlock the blit batch-buffer after use so it isn't recycled locked.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- .../gpu/drm/i915/gem/i915_gem_object_blt.c | 91 +++++++++++++------ .../gpu/drm/i915/gem/i915_gem_object_blt.h | 10 ++ 2 files changed, 72 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c index e0b873c3f46a..b41b076f6864 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c @@ -145,11 +145,11 @@ move_obj_to_gpu(struct drm_i915_gem_object *obj, return i915_request_await_object(rq, obj, write); }
-int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, - struct intel_context *ce, - u32 value) +int i915_gem_object_ww_fill_blt(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww, + struct intel_context *ce, + u32 value) { - struct i915_gem_ww_ctx ww; struct i915_request *rq; struct i915_vma *batch; struct i915_vma *vma; @@ -159,22 +159,16 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, if (IS_ERR(vma)) return PTR_ERR(vma);
- i915_gem_ww_ctx_init(&ww, true); intel_engine_pm_get(ce->engine); -retry: - err = i915_gem_object_lock(obj, &ww); + err = intel_context_pin_ww(ce, ww); if (err) goto out;
- err = intel_context_pin_ww(ce, &ww); - if (err) - goto out; - - err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER); + err = i915_vma_pin_ww(vma, ww, 0, 0, PIN_USER); if (err) goto out_ctx;
- batch = intel_emit_vma_fill_blt(ce, vma, &ww, value); + batch = intel_emit_vma_fill_blt(ce, vma, ww, value); if (IS_ERR(batch)) { err = PTR_ERR(batch); goto out_vma; @@ -210,22 +204,43 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj,
i915_request_add(rq); out_batch: + i915_gem_ww_unlock_single(batch->obj); intel_emit_vma_release(ce, batch); out_vma: i915_vma_unpin(vma); out_ctx: intel_context_unpin(ce); out: + intel_engine_pm_put(ce->engine); + return err; +} + +int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, + struct intel_context *ce, + u32 value) +{ + struct i915_gem_ww_ctx ww; + int err; + + i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (err) + goto out_err; + + err = i915_gem_object_ww_fill_blt(obj, &ww, ce, value); +out_err: if (err == -EDEADLK) { err = i915_gem_ww_ctx_backoff(&ww); if (!err) goto retry; } i915_gem_ww_ctx_fini(&ww); - intel_engine_pm_put(ce->engine); + return err; }
+ /* Wa_1209644611:icl,ehl */ static bool wa_1209644611_applies(struct drm_i915_private *i915, u32 size) { @@ -354,13 +369,13 @@ struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce, return ERR_PTR(err); }
-int i915_gem_object_copy_blt(struct drm_i915_gem_object *src, - struct drm_i915_gem_object *dst, - struct intel_context *ce) +int i915_gem_object_ww_copy_blt(struct drm_i915_gem_object *src, + struct drm_i915_gem_object *dst, + struct i915_gem_ww_ctx *ww, + struct intel_context *ce) { struct i915_address_space *vm = ce->vm; struct i915_vma *vma[2], *batch; - struct i915_gem_ww_ctx ww; struct i915_request *rq; int err, i;
@@ -372,26 +387,20 @@ int i915_gem_object_copy_blt(struct drm_i915_gem_object *src, if (IS_ERR(vma[1])) return PTR_ERR(vma[1]);
- i915_gem_ww_ctx_init(&ww, true); intel_engine_pm_get(ce->engine); -retry: - err = i915_gem_object_lock(src, &ww); - if (!err) - err = i915_gem_object_lock(dst, &ww); - if (!err) - err = intel_context_pin_ww(ce, &ww); + err = intel_context_pin_ww(ce, ww); if (err) goto out;
- err = i915_vma_pin_ww(vma[0], &ww, 0, 0, PIN_USER); + err = i915_vma_pin_ww(vma[0], ww, 0, 0, PIN_USER); if (err) goto out_ctx;
- err = i915_vma_pin_ww(vma[1], &ww, 0, 0, PIN_USER); + err = i915_vma_pin_ww(vma[1], ww, 0, 0, PIN_USER); if (unlikely(err)) goto out_unpin_src;
- batch = intel_emit_vma_copy_blt(ce, &ww, vma[0], vma[1]); + batch = intel_emit_vma_copy_blt(ce, ww, vma[0], vma[1]); if (IS_ERR(batch)) { err = PTR_ERR(batch); goto out_unpin_dst; @@ -437,6 +446,7 @@ int i915_gem_object_copy_blt(struct drm_i915_gem_object *src,
i915_request_add(rq); out_batch: + i915_gem_ww_unlock_single(batch->obj); intel_emit_vma_release(ce, batch); out_unpin_dst: i915_vma_unpin(vma[1]); @@ -445,13 +455,36 @@ int i915_gem_object_copy_blt(struct drm_i915_gem_object *src, out_ctx: intel_context_unpin(ce); out: + intel_engine_pm_put(ce->engine); + return err; +} + +int i915_gem_object_copy_blt(struct drm_i915_gem_object *src, + struct drm_i915_gem_object *dst, + struct intel_context *ce) +{ + struct i915_gem_ww_ctx ww; + int err; + + i915_gem_ww_ctx_init(&ww, true); +retry: + err = i915_gem_object_lock(src, &ww); + if (err) + goto out_err; + + err = i915_gem_object_lock(dst, &ww); + if (err) + goto out_err; + + err = i915_gem_object_ww_copy_blt(src, dst, &ww, ce); +out_err: if (err == -EDEADLK) { err = i915_gem_ww_ctx_backoff(&ww); if (!err) goto retry; } i915_gem_ww_ctx_fini(&ww); - intel_engine_pm_put(ce->engine); + return err; }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h index 2409fdcccf0e..da3d66abde64 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.h @@ -36,4 +36,14 @@ int i915_gem_object_copy_blt(struct drm_i915_gem_object *src, struct drm_i915_gem_object *dst, struct intel_context *ce);
+int i915_gem_object_ww_fill_blt(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww, + struct intel_context *ce, + u32 value); + +int i915_gem_object_ww_copy_blt(struct drm_i915_gem_object *src, + struct drm_i915_gem_object *dst, + struct i915_gem_ww_ctx *ww, + struct intel_context *ce); + #endif
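The _ww variants assume the caller already holds the object lock inside a transaction; a sketch combining them with for_i915_gem_ww() from earlier in the series (ce is assumed to be a suitable pinned context, e.g. the blitter context):

    struct i915_gem_ww_ctx ww;
    int err;

    for_i915_gem_ww(&ww, err, true) {
        err = i915_gem_object_lock(obj, &ww);
        if (err)
            continue;

        err = i915_gem_object_ww_fill_blt(obj, &ww, ce, 0);
    }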
From: Thomas Hellström thomas.hellstrom@intel.com
When an object is published on an eviction list, it's considered for eviction and can be locked by other threads. This is strictly not necessary until the object has pages. To limit eviction lookups that need to discard the object and facilitate a longer period during which we can lock the object isolated (trylock or ww lock without chance of deadlock or interruption), delay eviction list publishing until pages are set. Also take the object off the eviction lists when pages are unset. Finally make sure that an object is either locked or isolated when eviction list manipulation happens.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 2 ++ drivers/gpu/drm/i915/gem/i915_gem_pages.c | 22 +++++++++++++++++++++- drivers/gpu/drm/i915/gem/i915_gem_region.c | 18 ++---------------- 3 files changed, 25 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 08d806bbf48e..5326b4b5a9f7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -66,6 +66,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, INIT_LIST_HEAD(&obj->vma.list);
INIT_LIST_HEAD(&obj->mm.link); + INIT_LIST_HEAD(&obj->mm.region_link);
INIT_LIST_HEAD(&obj->lut_list); spin_lock_init(&obj->lut_lock); @@ -79,6 +80,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, GEM_BUG_ON(flags & ~I915_BO_ALLOC_FLAGS); obj->flags = flags;
+ obj->mm.region = NULL; obj->mm.madv = I915_MADV_WILLNEED; INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN); mutex_init(&obj->mm.get_page.lock); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 4a8be759832b..eacad971b955 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -16,6 +16,8 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, { struct drm_i915_private *i915 = to_i915(obj->base.dev); unsigned long supported = INTEL_INFO(i915)->page_sizes; + struct intel_memory_region *mem; + struct list_head *list; int i;
assert_object_held_shared(obj); @@ -64,7 +66,6 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!HAS_PAGE_SIZES(i915, obj->mm.page_sizes.sg));
if (i915_gem_object_is_shrinkable(obj)) { - struct list_head *list; unsigned long flags;
assert_object_held(obj); @@ -82,6 +83,18 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, atomic_set(&obj->mm.shrink_pin, 0); spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } + + mem = obj->mm.region; + if (mem) { + mutex_lock(&mem->objects.lock); + GEM_WARN_ON(!list_empty(&obj->mm.region_link)); + if (obj->mm.madv != I915_MADV_WILLNEED) + list = &mem->objects.purgeable; + else + list = &mem->objects.list; + list_move_tail(&obj->mm.region_link, list); + mutex_unlock(&mem->objects.lock); + } }
int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj) @@ -192,6 +205,7 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr) struct sg_table * __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) { + struct intel_memory_region *mem = obj->mm.region; struct sg_table *pages;
assert_object_held_shared(obj); @@ -205,6 +219,12 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
i915_gem_object_make_unshrinkable(obj);
+ if (mem) { + mutex_lock(&mem->objects.lock); + list_del_init(&obj->mm.region_link); + mutex_unlock(&mem->objects.lock); + } + if (obj->mm.mapping) { unmap_object(obj, page_mask_bits(obj->mm.mapping)); obj->mm.mapping = NULL; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 6a96741253b3..58bf5f9e3199 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -105,30 +105,16 @@ void i915_gem_object_init_memory_region(struct drm_i915_gem_object *obj, struct intel_memory_region *mem) { INIT_LIST_HEAD(&obj->mm.blocks); + WARN_ON(i915_gem_object_has_pages(obj)); obj->mm.region = intel_memory_region_get(mem);
if (obj->base.size <= mem->min_page_size) obj->flags |= I915_BO_ALLOC_CONTIGUOUS; - - mutex_lock(&mem->objects.lock); - - if (obj->flags & I915_BO_ALLOC_VOLATILE) - list_add(&obj->mm.region_link, &mem->objects.purgeable); - else - list_add(&obj->mm.region_link, &mem->objects.list); - - mutex_unlock(&mem->objects.lock); }
void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj) { - struct intel_memory_region *mem = obj->mm.region; - - mutex_lock(&mem->objects.lock); - list_del(&obj->mm.region_link); - mutex_unlock(&mem->objects.lock); - - intel_memory_region_put(mem); + intel_memory_region_put(obj->mm.region); }
struct drm_i915_gem_object *
We are going to want to be able to move objects between different regions, like system memory and local memory. In the future everything should just be another region.
Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Cc: CQ Tang cq.tang@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 13 ++ drivers/gpu/drm/i915/gem/i915_gem_mman.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_object.c | 125 +++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_object.h | 9 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 2 +- .../drm/i915/selftests/intel_memory_region.c | 174 +++++++++++++++++- 6 files changed, 322 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 2561a2f1e54f..4e8a05c35252 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -546,6 +546,19 @@ void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj) spin_unlock(&obj->mmo.lock); }
+/** + * i915_gem_object_release_mmap - remove physical page mappings + * @obj: obj in question + * + * Preserve the reservation of the mmapping with the DRM core code, but + * relinquish ownership of the pages back to the system. + */ +void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj) +{ + i915_gem_object_release_mmap_gtt(obj); + i915_gem_object_release_mmap_offset(obj); +} + static struct i915_mmap_offset * lookup_mmo(struct drm_i915_gem_object *obj, enum i915_mmap_type mmap_type) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h index efee9e0d2508..7c5ccdf59359 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h @@ -24,6 +24,8 @@ int i915_gem_dumb_mmap_offset(struct drm_file *file_priv, struct drm_device *dev, u32 handle, u64 *offset);
+void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj); + void __i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj); void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 5326b4b5a9f7..7ff430503497 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -26,11 +26,14 @@
#include "display/intel_frontbuffer.h" #include "gt/intel_gt.h" +#include "gt/intel_gt_requests.h" #include "i915_drv.h" #include "i915_gem_clflush.h" #include "i915_gem_context.h" #include "i915_gem_mman.h" #include "i915_gem_object.h" +#include "i915_gem_object_blt.h" +#include "i915_gem_region.h" #include "i915_globals.h" #include "i915_trace.h"
@@ -311,6 +314,128 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj) queue_work(i915->wq, &i915->mm.free_work); }
+int i915_gem_object_prepare_move(struct drm_i915_gem_object *obj) +{ + int err; + + assert_object_held(obj); + + if (obj->mm.madv != I915_MADV_WILLNEED) + return -EINVAL; + + if (i915_gem_object_needs_bit17_swizzle(obj)) + return -EINVAL; + + if (i915_gem_object_is_framebuffer(obj)) + return -EBUSY; + + i915_gem_object_release_mmap(obj); + + GEM_BUG_ON(obj->mm.mapping); + GEM_BUG_ON(obj->base.filp && mapping_mapped(obj->base.filp->f_mapping)); + + err = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE | + I915_WAIT_ALL, + MAX_SCHEDULE_TIMEOUT); + if (err) + return err; + + return i915_gem_object_unbind(obj, + I915_GEM_OBJECT_UNBIND_ACTIVE); +} + +int i915_gem_object_migrate(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww, + struct intel_context *ce, + enum intel_region_id id) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct drm_i915_gem_object *donor; + struct intel_memory_region *mem; + struct sg_table *pages = NULL; + unsigned int page_sizes; + int err = 0; + + assert_object_held(obj); + GEM_BUG_ON(id >= INTEL_REGION_UNKNOWN); + GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); + if (obj->mm.region->id == id) + return 0; + + mem = i915->mm.regions[id]; + + donor = i915_gem_object_create_region(mem, obj->base.size, 0); + if (IS_ERR(donor)) { + err = PTR_ERR(donor); + return err; + } + + err = i915_gem_object_lock(donor, ww); + if (err) + goto err_put_donor; + + /* Copy backing-pages if we have to */ + if (i915_gem_object_has_pages(obj) || + obj->base.filp) { + err = i915_gem_object_ww_copy_blt(obj, donor, ww, ce); + if (err) + goto unlock_donor; + } + + err = i915_gem_object_set_to_cpu_domain(donor, false); + if (err) + goto unlock_donor; + + intel_gt_retire_requests(&i915->gt); + + i915_gem_object_unbind(donor, 0); + err = i915_gem_object_unbind(obj, 0); + if (err) + goto unlock_donor; + + pages = __i915_gem_object_unset_pages(obj); + if (pages) + obj->ops->put_pages(obj, pages); + + page_sizes = donor->mm.page_sizes.phys; + pages = __i915_gem_object_unset_pages(donor); + + if (obj->ops->release) + obj->ops->release(obj); + + /* We still need a little special casing for shmem */ + if (obj->base.filp) + fput(fetch_and_zero(&obj->base.filp)); + else if (donor->base.filp) { + atomic_long_inc(&donor->base.filp->f_count); + obj->base.filp = donor->base.filp; + } + + obj->base.size = donor->base.size; + obj->mm.region = intel_memory_region_get(mem); + obj->flags = donor->flags; + obj->ops = donor->ops; + obj->cache_level = donor->cache_level; + obj->cache_coherent = donor->cache_coherent; + obj->cache_dirty = donor->cache_dirty; + + list_replace_init(&donor->mm.blocks, &obj->mm.blocks); + + /* set pages after migration */ + if (pages) + __i915_gem_object_set_pages(obj, pages, page_sizes); + + GEM_BUG_ON(i915_gem_object_has_pages(donor)); + GEM_BUG_ON(i915_gem_object_has_pinned_pages(donor)); +unlock_donor: + i915_gem_ww_unlock_single(donor); +err_put_donor: + i915_gem_object_put(donor); + + return err; +} + static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) { return !(obj->cache_level == I915_CACHE_NONE || diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index c6c7ab181a65..1a1aa71a4494 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -51,8 +51,17 @@ void i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj, struct sg_table *pages);
+enum intel_region_id; +int i915_gem_object_prepare_move(struct drm_i915_gem_object *obj); +int i915_gem_object_migrate(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww, + struct intel_context *ce, + enum intel_region_id id); + void i915_gem_flush_free_objects(struct drm_i915_private *i915);
+void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj); + struct sg_table * __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj); void i915_gem_object_truncate(struct drm_i915_gem_object *obj); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index eacad971b955..2cdb7cf63383 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -183,7 +183,7 @@ void i915_gem_object_writeback(struct drm_i915_gem_object *obj) obj->ops->writeback(obj); }
-static void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj) +void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj) { struct radix_tree_iter iter; void __rcu **slot; diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 84525ddba321..7acb94e0e5fe 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -14,6 +14,7 @@
#include "gem/i915_gem_context.h" #include "gem/i915_gem_lmem.h" +#include "gem/i915_gem_object_blt.h" #include "gem/i915_gem_region.h" #include "gem/i915_gem_object_blt.h" #include "gem/selftests/igt_gem_utils.h" @@ -476,6 +477,71 @@ static int igt_lmem_create(void *arg) return err; }
+static int igt_smem_create_migrate(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct intel_context *ce = i915->gt.engine[BCS0]->kernel_context; + struct drm_i915_gem_object *obj; + struct i915_gem_ww_ctx ww; + int err = 0; + + /* Switch object backing-store on create */ + obj = i915_gem_object_create_lmem(i915, PAGE_SIZE, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + err = i915_gem_object_migrate(obj, &ww, ce, INTEL_REGION_SMEM); + if (err) + continue; + + err = i915_gem_object_pin_pages(obj); + if (err) + continue; + + i915_gem_object_unpin_pages(obj); + } + i915_gem_object_put(obj); + + return err; +} + +static int igt_lmem_create_migrate(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct intel_context *ce = i915->gt.engine[BCS0]->kernel_context; + struct drm_i915_gem_object *obj; + struct i915_gem_ww_ctx ww; + int err = 0; + + /* Switch object backing-store on create */ + obj = i915_gem_object_create_shmem(i915, PAGE_SIZE); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + err = i915_gem_object_migrate(obj, &ww, ce, INTEL_REGION_LMEM); + if (err) + continue; + + err = i915_gem_object_pin_pages(obj); + if (err) + continue; + + i915_gem_object_unpin_pages(obj); + } + i915_gem_object_put(obj); + + return err; +} static int igt_lmem_write_gpu(void *arg) { struct drm_i915_private *i915 = arg; @@ -880,7 +946,7 @@ static int igt_mock_shrink(void *arg)
list_add(&obj->st_link, &objects);
- err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) goto err_close_objects;
@@ -902,7 +968,7 @@ static int igt_mock_shrink(void *arg) list_add(&obj->st_link, &objects);
/* Provoke the shrinker to start violently swinging its axe! */ - err = i915_gem_object_pin_pages(obj); + err = i915_gem_object_pin_pages_unlocked(obj); if (err) { pr_err("failed to shrink for target=%pa", &target); goto err_close_objects; @@ -923,6 +989,107 @@ static int igt_mock_shrink(void *arg) return err; }
+static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww, + struct intel_context *ce, + struct drm_i915_gem_object *obj) +{ + int err; + + err = i915_gem_object_lock(obj, ww); + if (err) + return err; + + err = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE | + I915_WAIT_PRIORITY | + I915_WAIT_ALL, + MAX_SCHEDULE_TIMEOUT); + if (err) + return err; + + err = i915_gem_object_prepare_move(obj); + if (err) + return err; + + if (i915_gem_object_is_lmem(obj)) { + err = i915_gem_object_migrate(obj, ww, ce, INTEL_REGION_SMEM); + if (err) + return err; + + if (i915_gem_object_is_lmem(obj)) { + pr_err("object still backed by lmem\n"); + err = -EINVAL; + } + + if (!list_empty(&obj->mm.blocks)) { + pr_err("object leaking memory region\n"); + err = -EINVAL; + } + + if (!i915_gem_object_has_struct_page(obj)) { + pr_err("object not backed by struct page\n"); + err = -EINVAL; + } + + } else { + err = i915_gem_object_migrate(obj, ww, ce, INTEL_REGION_LMEM); + if (err) + return err; + + if (i915_gem_object_has_struct_page(obj)) { + pr_err("object still backed by struct page\n"); + err = -EINVAL; + } + + if (!i915_gem_object_is_lmem(obj)) { + pr_err("object not backed by lmem\n"); + err = -EINVAL; + } + } + + return err; +} + +static int igt_lmem_pages_migrate(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + struct intel_context *ce; + struct i915_gem_ww_ctx ww; + int err; + int i; + + if (!HAS_ENGINE(&i915->gt, BCS0)) + return 0; + + ce = i915->gt.engine[BCS0]->kernel_context; + + /* From LMEM to shmem and back again */ + + obj = i915_gem_object_create_lmem(i915, SZ_2M, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + err = i915_gem_object_fill_blt(obj, ce, 0); + if (err) + goto out_put; + + for (i = 1; i <= 4; ++i) { + for_i915_gem_ww(&ww, err, true) + err = lmem_pages_migrate_one(&ww, ce, obj); + if (err) + break; + + err = i915_gem_object_fill_blt(obj, ce, 0xdeadbeaf); + if (err) + break; + } +out_put: + i915_gem_object_put(obj); + + return err; +} + int intel_memory_region_mock_selftests(void) { static const struct i915_subtest tests[] = { @@ -960,6 +1127,9 @@ int intel_memory_region_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_lmem_create), SUBTEST(igt_lmem_write_cpu), SUBTEST(igt_lmem_write_gpu), + SUBTEST(igt_smem_create_migrate), + SUBTEST(igt_lmem_create_migrate), + SUBTEST(igt_lmem_pages_migrate), };
if (!HAS_LMEM(i915)) {
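[Editor's note: for reference, a minimal sketch — not part of the patch; it simply mirrors lmem_pages_migrate_one() above — of the flow a kernel-internal caller is expected to follow with the new migration API, assuming ce is a blitter-capable context:

	struct i915_gem_ww_ctx ww;
	int err;

	for_i915_gem_ww(&ww, err, true) {
		err = i915_gem_object_lock(obj, &ww);
		if (err)
			continue;

		/* Waits for outstanding activity and revokes CPU mmaps */
		err = i915_gem_object_prepare_move(obj);
		if (err)
			continue;

		/* Blits the contents across and swaps the backing store */
		err = i915_gem_object_migrate(obj, &ww, ce, INTEL_REGION_SMEM);
	}
]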
From: CQ Tang cq.tang@intel.com
We submit the blitter copy operation and then call i915_gem_object_set_to_cpu_domain(), which internally calls i915_gem_object_wait() with the interruptible flag set. Sometimes this wait gets interrupted (returning -ERESTARTSYS) before the blitter copy has completed, which makes the migration operation fail. So before calling i915_gem_object_set_to_cpu_domain(), call i915_gem_object_wait() with the non-interruptible flag to wait for the blitter operation to finish.
Signed-off-by: CQ Tang cq.tang@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Cc: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 7ff430503497..49935245a4a8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -381,6 +381,17 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj, err = i915_gem_object_ww_copy_blt(obj, donor, ww, ce); if (err) goto unlock_donor; + + /* + * Occasionally the i915_gem_object_wait() called inside + * i915_gem_object_set_to_cpu_domain() gets interrupted + * and returns -ERESTARTSYS, which makes the migration + * operation fail. So add a non-interruptible wait + * before changing the object domain. + */ + err = i915_gem_object_wait(donor, 0, MAX_SCHEDULE_TIMEOUT); + if (err) + goto unlock_donor; }
err = i915_gem_object_set_to_cpu_domain(donor, false);
From: Abdiel Janulgue abdiel.janulgue@linux.intel.com
Returns the available memory regions supported by the HW.
Signed-off-by: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 12 ++++- drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 3 ++ drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/gpu/drm/i915/i915_pci.c | 2 +- drivers/gpu/drm/i915/i915_query.c | 62 ++++++++++++++++++++++ drivers/gpu/drm/i915/intel_memory_region.c | 32 ++++++----- drivers/gpu/drm/i915/intel_memory_region.h | 38 +++++++------ include/uapi/drm/i915_drm.h | 58 ++++++++++++++++++++ 8 files changed, 172 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index ce9086d3a647..25e3cc53316e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -704,11 +704,19 @@ _i915_gem_object_create_stolen(struct intel_memory_region *mem, return obj; }
+struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915) +{ + if (HAS_LMEM(i915)) + return i915->mm.regions[INTEL_REGION_STOLEN_LMEM]; + + return i915->mm.regions[INTEL_REGION_STOLEN_SMEM]; +} + struct drm_i915_gem_object * i915_gem_object_create_stolen(struct drm_i915_private *i915, resource_size_t size) { - return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN], + return i915_gem_object_create_region(i915_stolen_region(i915), size, I915_BO_ALLOC_CONTIGUOUS); }
@@ -748,7 +756,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, resource_size_t stolen_offset, resource_size_t size) { - struct intel_memory_region *mem = i915->mm.regions[INTEL_REGION_STOLEN]; + struct intel_memory_region *mem = i915_stolen_region(i915); struct drm_i915_gem_object *obj; struct drm_mm_node *stolen; int ret; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h index 61e028063f9f..67f6264f3ff9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h @@ -22,6 +22,9 @@ int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv, void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv, struct drm_mm_node *node); struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915); + +struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915); + struct drm_i915_gem_object * i915_gem_object_create_stolen(struct drm_i915_private *dev_priv, resource_size_t size); diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 320856b665a1..07b3a89ec09e 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -843,7 +843,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (INTEL_GEN(i915) >= 9 && i915_selftest.live < 0 && i915->params.fake_lmem_start) { mkwrite_device_info(i915)->memory_regions = - REGION_SMEM | REGION_LMEM | REGION_STOLEN; + REGION_SMEM | REGION_LMEM | REGION_STOLEN_SMEM; GEM_BUG_ON(!HAS_LMEM(i915)); } } diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 11fe790b1969..8243178a56f9 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -154,7 +154,7 @@ .page_sizes = I915_GTT_PAGE_SIZE_4K
#define GEN_DEFAULT_REGIONS \ - .memory_regions = REGION_SMEM | REGION_STOLEN + .memory_regions = REGION_SMEM | REGION_STOLEN_SMEM
#define I830_FEATURES \ GEN(2), \ diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..d4ca040c528b 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -419,11 +419,73 @@ static int query_perf_config(struct drm_i915_private *i915, } }
+static int query_memregion_info(struct drm_i915_private *dev_priv, + struct drm_i915_query_item *query_item) +{ + struct drm_i915_query_memory_regions __user *query_ptr = + u64_to_user_ptr(query_item->data_ptr); + struct drm_i915_memory_region_info __user *info_ptr = + &query_ptr->regions[0]; + struct drm_i915_memory_region_info info = { }; + struct drm_i915_query_memory_regions query; + u32 total_length; + int ret, i; + + if (query_item->flags != 0) + return -EINVAL; + + total_length = sizeof(query); + for (i = 0; i < ARRAY_SIZE(dev_priv->mm.regions); ++i) { + struct intel_memory_region *region = dev_priv->mm.regions[i]; + + if (!region) + continue; + + total_length += sizeof(info); + } + + ret = copy_query_item(&query, sizeof(query), total_length, query_item); + if (ret != 0) + return ret; + + if (query.num_regions) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(query.rsvd); ++i) { + if (query.rsvd[i]) + return -EINVAL; + } + + for (i = 0; i < ARRAY_SIZE(dev_priv->mm.regions); ++i) { + struct intel_memory_region *region = dev_priv->mm.regions[i]; + + if (!region) + continue; + + info.region.memory_class = region->type; + info.region.memory_instance = region->instance; + info.probed_size = region->total; + info.unallocated_size = region->avail; + + if (__copy_to_user(info_ptr, &info, sizeof(info))) + return -EFAULT; + + query.num_regions++; + info_ptr++; + } + + if (__copy_to_user(query_ptr, &query, sizeof(query))) + return -EFAULT; + + return total_length; +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { query_topology_info, query_engine_info, query_perf_config, + query_memregion_info, };
int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 308f89b87834..dca1e367ab98 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -6,14 +6,19 @@ #include "intel_memory_region.h" #include "i915_drv.h"
-/* XXX: Hysterical raisins. BIT(inst) needs to just be (inst) at some point. */ -#define REGION_MAP(type, inst) \ - BIT((type) + INTEL_MEMORY_TYPE_SHIFT) | BIT(inst) - -const u32 intel_region_map[] = { - [INTEL_REGION_SMEM] = REGION_MAP(INTEL_MEMORY_SYSTEM, 0), - [INTEL_REGION_LMEM] = REGION_MAP(INTEL_MEMORY_LOCAL, 0), - [INTEL_REGION_STOLEN] = REGION_MAP(INTEL_MEMORY_STOLEN, 0), +const struct intel_memory_region_info intel_region_map[] = { + [INTEL_REGION_SMEM] = { + .class = INTEL_MEMORY_SYSTEM, + .instance = 0, + }, + [INTEL_REGION_LMEM] = { + .class = INTEL_MEMORY_LOCAL, + .instance = 0, + }, + [INTEL_REGION_STOLEN_SMEM] = { + .class = INTEL_MEMORY_STOLEN_SYSTEM, + .instance = 0, + }, };
struct intel_memory_region * @@ -263,17 +268,18 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
for (i = 0; i < ARRAY_SIZE(i915->mm.regions); i++) { struct intel_memory_region *mem = ERR_PTR(-ENODEV); - u32 type; + u16 type, instance;
if (!HAS_REGION(i915, BIT(i))) continue;
- type = MEMORY_TYPE_FROM_REGION(intel_region_map[i]); + type = intel_region_map[i].class; + instance = intel_region_map[i].instance; switch (type) { case INTEL_MEMORY_SYSTEM: mem = i915_gem_shmem_setup(i915); break; - case INTEL_MEMORY_STOLEN: + case INTEL_MEMORY_STOLEN_SYSTEM: mem = i915_gem_stolen_setup(i915); break; case INTEL_MEMORY_LOCAL: @@ -289,9 +295,9 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) goto out_cleanup; }
- mem->id = intel_region_map[i]; + mem->id = i; mem->type = type; - mem->instance = MEMORY_INSTANCE_FROM_REGION(intel_region_map[i]); + mem->instance = instance;
i915->mm.regions[i] = mem; } diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 232490d89a83..c047cf7c5e7c 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -11,6 +11,7 @@ #include <linux/mutex.h> #include <linux/io-mapping.h> #include <drm/drm_mm.h> +#include <drm/i915_drm.h>
#include "i915_buddy.h"
@@ -19,30 +20,25 @@ struct drm_i915_gem_object; struct intel_memory_region; struct sg_table;
-/** - * Base memory type - */ enum intel_memory_type { - INTEL_MEMORY_SYSTEM = 0, - INTEL_MEMORY_LOCAL, - INTEL_MEMORY_STOLEN, + INTEL_MEMORY_SYSTEM = I915_MEMORY_CLASS_SYSTEM, + INTEL_MEMORY_LOCAL = I915_MEMORY_CLASS_DEVICE, + INTEL_MEMORY_STOLEN_SYSTEM = I915_MEMORY_CLASS_STOLEN_SYSTEM, + INTEL_MEMORY_STOLEN_LOCAL = I915_MEMORY_CLASS_STOLEN_DEVICE, };
enum intel_region_id { INTEL_REGION_SMEM = 0, INTEL_REGION_LMEM, - INTEL_REGION_STOLEN, + INTEL_REGION_STOLEN_SMEM, + INTEL_REGION_STOLEN_LMEM, INTEL_REGION_UNKNOWN, /* Should be last */ };
#define REGION_SMEM BIT(INTEL_REGION_SMEM) #define REGION_LMEM BIT(INTEL_REGION_LMEM) -#define REGION_STOLEN BIT(INTEL_REGION_STOLEN) - -#define INTEL_MEMORY_TYPE_SHIFT 16 - -#define MEMORY_TYPE_FROM_REGION(r) (ilog2((r) >> INTEL_MEMORY_TYPE_SHIFT)) -#define MEMORY_INSTANCE_FROM_REGION(r) (ilog2((r) & 0xffff)) +#define REGION_STOLEN_SMEM BIT(INTEL_REGION_STOLEN_SMEM) +#define REGION_STOLEN_LMEM BIT(INTEL_REGION_STOLEN_LMEM)
#define I915_ALLOC_MIN_PAGE_SIZE BIT(0) #define I915_ALLOC_CONTIGUOUS BIT(1) @@ -51,10 +47,12 @@ enum intel_region_id { for (id = 0; id < ARRAY_SIZE((i915)->mm.regions); id++) \ for_each_if((mr) = (i915)->mm.regions[id])
-/** - * Memory regions encoded as type | instance - */ -extern const u32 intel_region_map[]; +struct intel_memory_region_info { + u16 class; + u16 instance; +}; + +extern const struct intel_memory_region_info intel_region_map[];
struct intel_memory_region_ops { unsigned int flags; @@ -89,9 +87,9 @@ struct intel_memory_region { resource_size_t total; resource_size_t avail;
- unsigned int type; - unsigned int instance; - unsigned int id; + u16 type; + u16 instance; + enum intel_region_id id; char name[8];
dma_addr_t remap_addr; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index fa1f3d62f9a6..41845203250d 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2175,6 +2175,7 @@ struct drm_i915_query_item { #define DRM_I915_QUERY_TOPOLOGY_INFO 1 #define DRM_I915_QUERY_ENGINE_INFO 2 #define DRM_I915_QUERY_PERF_CONFIG 3 +#define DRM_I915_QUERY_MEMORY_REGIONS 4 /* Must be kept compact -- no holes and well documented */
/* @@ -2375,6 +2376,63 @@ struct drm_i915_query_perf_config { __u8 data[]; };
+enum drm_i915_gem_memory_class { + I915_MEMORY_CLASS_SYSTEM = 0, + I915_MEMORY_CLASS_DEVICE, + I915_MEMORY_CLASS_STOLEN_SYSTEM, + I915_MEMORY_CLASS_STOLEN_DEVICE, +}; + +struct drm_i915_gem_memory_class_instance { + __u16 memory_class; /* see enum drm_i915_gem_memory_class */ + __u16 memory_instance; +}; + +/** + * struct drm_i915_memory_region_info + * + * Describes one region as known to the driver. + */ +struct drm_i915_memory_region_info { + /** class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** MBZ */ + __u32 rsvd0; + + /** MBZ */ + __u64 caps; + + /** MBZ */ + __u64 flags; + + /** Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + /** MBZ */ + __u64 rsvd1[8]; +}; + +/** + * struct drm_i915_query_memory_regions + * + * Region info query enumerates all regions known to the driver by filling in + * an array of struct drm_i915_memory_region_info structures. + */ +struct drm_i915_query_memory_regions { + /** Number of supported regions */ + __u32 num_regions; + + /** MBZ */ + __u32 rsvd[3]; + + /* Info about each supported region */ + struct drm_i915_memory_region_info regions[]; +}; + #if defined(__cplusplus) } #endif
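[Editor's note: from userspace this follows the usual two-call pattern for DRM_IOCTL_I915_QUERY — probe the required buffer size with length == 0, then repeat with a buffer. A rough sketch, not from the series, with error handling trimmed:

	struct drm_i915_query_item item = {
		.query_id = DRM_I915_QUERY_MEMORY_REGIONS,
	};
	struct drm_i915_query q = {
		.num_items = 1,
		.items_ptr = (uintptr_t)&item,
	};
	struct drm_i915_query_memory_regions *info;
	int i;

	ioctl(fd, DRM_IOCTL_I915_QUERY, &q); /* writes the needed size back into item.length */

	info = calloc(1, item.length); /* zeroed: num_regions and rsvd[] must be 0 on input */
	item.data_ptr = (uintptr_t)info;
	ioctl(fd, DRM_IOCTL_I915_QUERY, &q); /* fills in the region info */

	for (i = 0; i < info->num_regions; i++)
		printf("class %u, instance %u: probed %llu bytes\n",
		       info->regions[i].region.memory_class,
		       info->regions[i].region.memory_instance,
		       (unsigned long long)info->regions[i].probed_size);
]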
From: Prathap Kumar Valsan prathap.kumar.valsan@intel.com
Store a pointer to the gt closest to each memory region, so that we can access the engines corresponding to that gt via the memory region.
Signed-off-by: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com --- drivers/gpu/drm/i915/intel_memory_region.c | 1 + drivers/gpu/drm/i915/intel_memory_region.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index dca1e367ab98..6f40748901da 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -298,6 +298,7 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) mem->id = i; mem->type = type; mem->instance = instance; + mem->gt = &i915->gt;
i915->mm.regions[i] = mem; } diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index c047cf7c5e7c..15dcb57b4b5a 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -91,6 +91,7 @@ struct intel_memory_region { u16 instance; enum intel_region_id id; char name[8]; + struct intel_gt *gt; /* GT closest to this region. */
dma_addr_t remap_addr;
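[Editor's note: the intended consumer looks like the following, which is how the gem_create patch later in this series picks a blitter context for clearing lmem objects:

	struct intel_gt *gt = obj->mm.region->gt;
	struct intel_context *ce = gt->engine[BCS0]->blitter_context;
]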
Same old gem_create but now with extensions support. This is needed to support various upcoming use cases. For now we use the extensions mechanism to support setting an immutable priority list of potential placements, at creation time.
If we wish to set the placements/regions we can simply do:
struct drm_i915_gem_object_param region_param = { … }; /* Unchanged */ struct drm_i915_gem_create_ext_setparam setparam_region = { .base = { .name = I915_GEM_CREATE_EXT_SETPARAM }, .param = region_param, }
struct drm_i915_gem_create_ext create_ext = { .size = 16 * PAGE_SIZE, .extensions = (uintptr_t)&setparam_region, }; int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext); if (err) ...
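Where region_param (elided above) would be filled in along these lines — a hypothetical sketch, assuming a device that exposes both device-local and system memory:

struct drm_i915_gem_memory_class_instance regions[] = {
	{ I915_MEMORY_CLASS_DEVICE, 0 }, /* preferred placement */
	{ I915_MEMORY_CLASS_SYSTEM, 0 }, /* fallback */
};

struct drm_i915_gem_object_param region_param = {
	/* .handle stays 0 for CREATE_EXT_SETPARAM */
	.size = sizeof(regions) / sizeof(regions[0]), /* number of entries */
	.param = I915_OBJECT_PARAM | I915_PARAM_MEMORY_REGIONS,
	.data = (uintptr_t)regions,
};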
If we use the normal gem_create or gem_create_ext without the extensions/placements then we still get the old behaviour with only placing the object in system memory.
One important change here is that the returned size will now be rounded up to the correct size, depending on the list of placements, where we might have minimum page-size restrictions on some platforms when dealing with device local-memory.
Also, we still keep around the i915_gem_object_setparam ioctl, although it is now restricted by the placement list (i.e. we are not allowed to add new placements), and longer term it will be going away with respect to setting placements, since it was deemed that the kernel doesn't need to support a dynamic list of placements, which is now solidified by this uapi change.
Testcase: igt/gem_create/create-ext-placement-sanity-check Testcase: igt/gem_create/create-ext-placement-each Testcase: igt/gem_create/create-ext-placement-all Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: CQ Tang cq.tang@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_create.c | 398 ++++++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_object.c | 2 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 9 + drivers/gpu/drm/i915/gem/i915_gem_region.c | 4 + drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/gpu/drm/i915/i915_gem.c | 103 +---- drivers/gpu/drm/i915/intel_memory_region.c | 20 + drivers/gpu/drm/i915/intel_memory_region.h | 4 + include/uapi/drm/i915_drm.h | 60 +++ 10 files changed, 500 insertions(+), 103 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_create.c
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index ec361d61230b..3955134feca7 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -134,6 +134,7 @@ gem-y += \ gem/i915_gem_clflush.o \ gem/i915_gem_client_blt.o \ gem/i915_gem_context.o \ + gem/i915_gem_create.o \ gem/i915_gem_dmabuf.o \ gem/i915_gem_domain.o \ gem/i915_gem_execbuffer.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c new file mode 100644 index 000000000000..6f6dd4f1ce7e --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -0,0 +1,398 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +#include "gem/i915_gem_ioctls.h" +#include "gem/i915_gem_lmem.h" +#include "gem/i915_gem_object_blt.h" +#include "gem/i915_gem_region.h" + +#include "i915_drv.h" +#include "i915_user_extensions.h" + +static u32 max_page_size(struct intel_memory_region **placements, + int n_placements) +{ + u32 max_page_size = 0; + int i; + + for (i = 0; i < n_placements; ++i) { + max_page_size = max_t(u32, max_page_size, + placements[i]->min_page_size); + } + + GEM_BUG_ON(!max_page_size); + return max_page_size; +} + +static int +i915_gem_create(struct drm_file *file, + struct intel_memory_region **placements, + int n_placements, + u64 *size_p, + u32 *handle_p) +{ + struct drm_i915_gem_object *obj; + u32 handle; + u64 size; + int ret; + + size = round_up(*size_p, max_page_size(placements, n_placements)); + if (size == 0) + return -EINVAL; + + /* For most of the ABI (e.g. mmap) we think in system pages */ + GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE)); + + /* Allocate the new object */ + obj = i915_gem_object_create_region(placements[0], size, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + if (i915_gem_object_is_lmem(obj)) { + struct intel_gt *gt = obj->mm.region->gt; + struct intel_context *ce = gt->engine[BCS0]->blitter_context; + + /* + * XXX: We really want to move this to get_pages(), but we + * require grabbing the BKL for the blitting operation which is + * annoying. In the pipeline is support for async get_pages() + * which should fit nicely for this. Also note that the actual + * clear should be done async (we currently do an object_wait + * which is pure garbage), we just need to take care if + * userspace opts out of implicit sync for the execbuf, to avoid + * any potential info leak. + */ + +retry: + ret = i915_gem_object_fill_blt(obj, ce, 0); + if (ret == -EINTR) + goto retry; + if (ret) { + /* + * XXX: Post the error to where we would normally gather + * and clear the pages. This better reflects the final + * uapi behaviour, once we are at the point where we can + * move the clear worker to get_pages(). + */ + i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); + i915_gem_object_lock(obj, NULL); + __i915_gem_object_put_pages(obj); + i915_gem_object_unlock(obj); + obj->mm.gem_create_posted_err = ret; + goto handle_create; + } + + /* + * XXX: Occasionally the i915_gem_object_wait() called inside + * i915_gem_object_set_to_cpu_domain() gets interrupted + * and returns -ERESTARTSYS; this falls through to the + * clearing code below and also sets gem_create_posted_err. + * Moreover, the clearing sometimes fails because the + * object is still pinned by the blitter clearing code. + * This leaves us with an object with or without lmem + * pages, and with gem_create_posted_err = -ERESTARTSYS. + * Under lmem pressure, if the object has pages, we might + * swap out this object to smem. Next, when user space + * code uses this object in a gem_execbuf() call, the + * get_pages() operation will return -ERESTARTSYS, which + * causes user space code to fail. + * + * To avoid this problem, we add a non-interruptible + * wait before setting the object to the cpu domain. + */ + i915_gem_object_lock(obj, NULL); + ret = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT); + if (!ret) + ret = i915_gem_object_set_to_cpu_domain(obj, false); + if (ret) { + i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); + __i915_gem_object_put_pages(obj); + obj->mm.gem_create_posted_err = ret; + i915_gem_object_unlock(obj); + goto handle_create; + } + i915_gem_object_unlock(obj); + } + +handle_create: + ret = drm_gem_handle_create(file, &obj->base, &handle); + /* drop reference from allocate - handle holds it now */ + i915_gem_object_put(obj); + if (ret) + return ret; + + obj->mm.placements = placements; + obj->mm.n_placements = n_placements; + + *handle_p = handle; + *size_p = size; + return 0; +} + +int +i915_gem_dumb_create(struct drm_file *file, + struct drm_device *dev, + struct drm_mode_create_dumb *args) +{ + struct intel_memory_region **placements; + enum intel_memory_type mem_type; + int cpp = DIV_ROUND_UP(args->bpp, 8); + u32 format; + int ret; + + switch (cpp) { + case 1: + format = DRM_FORMAT_C8; + break; + case 2: + format = DRM_FORMAT_RGB565; + break; + case 4: + format = DRM_FORMAT_XRGB8888; + break; + default: + return -EINVAL; + } + + /* have to work out size/pitch and return them */ + args->pitch = ALIGN(args->width * cpp, 64); + + /* align stride to page size so that we can remap */ + if (args->pitch > intel_plane_fb_max_stride(to_i915(dev), format, + DRM_FORMAT_MOD_LINEAR)) + args->pitch = ALIGN(args->pitch, 4096); + + if (args->pitch < args->width) + return -EINVAL; + + args->size = mul_u32_u32(args->pitch, args->height); + + mem_type = INTEL_MEMORY_SYSTEM; + if (HAS_LMEM(to_i915(dev))) + mem_type = INTEL_MEMORY_LOCAL; + + placements = kmalloc(sizeof(struct intel_memory_region *), GFP_KERNEL); + if (!placements) + return -ENOMEM; + + placements[0] = intel_memory_region_by_type(to_i915(dev), mem_type); + + ret = i915_gem_create(file, + placements, 1, + &args->size, &args->handle); + if (ret) + kfree(placements); + + return ret; +} + +struct create_ext { + struct drm_i915_private *i915; + struct intel_memory_region **placements; + int n_placements; +}; + +static void repr_placements(char *buf, size_t size, + struct intel_memory_region **placements, + int n_placements) +{ + int i; + + buf[0] = '\0'; + + for (i = 0; i < n_placements; i++) { + struct intel_memory_region *mr = placements[i]; + int r; + + r = snprintf(buf, size, "\n %s -> { class: %d, inst: %d }", + mr->name, mr->type, mr->instance); + if (r >= size) + return; + + buf += r; + size -= r; + } +} + +static int set_placements(struct drm_i915_gem_object_param *args, + struct create_ext *ext_data) +{ + struct drm_i915_private *i915 = ext_data->i915; + struct drm_i915_gem_memory_class_instance __user *uregions = + u64_to_user_ptr(args->data); + struct intel_memory_region **placements; + u32 mask; + int i, ret = 0; + + if (args->handle) { + DRM_DEBUG("Handle should be zero\n"); + ret = -EINVAL; + } + + if (!args->size) { + DRM_DEBUG("Size is zero\n"); + ret = -EINVAL; + } + + if (args->size > ARRAY_SIZE(i915->mm.regions)) { + DRM_DEBUG("Too many placements\n"); + ret = -EINVAL; + } + + if (ret) + return ret; + + placements = kmalloc_array(args->size, + sizeof(struct intel_memory_region *), + GFP_KERNEL); + if (!placements) + return -ENOMEM; + + mask = 0; + for (i = 0; i < args->size; i++) { + struct drm_i915_gem_memory_class_instance region; + struct intel_memory_region *mr; + + if (copy_from_user(&region, uregions, sizeof(region))) { + ret = -EFAULT; + goto out_free; + } + + mr = intel_memory_region_lookup(i915, + region.memory_class, + region.memory_instance); + if (!mr) { + DRM_DEBUG("Device is missing region { class: %d, inst: %d } at index = %d\n", + region.memory_class, region.memory_instance, i); + ret = -EINVAL; + goto out_dump; + } + + if (mask & BIT(mr->id)) { + DRM_DEBUG("Found duplicate placement %s -> { class: %d, inst: %d } at index = %d\n", + mr->name, region.memory_class, + region.memory_instance, i); + ret = -EINVAL; + goto out_dump; + } + + placements[i] = mr; + mask |= BIT(mr->id); + + ++uregions; + } + + if (ext_data->placements) { + ret = -EINVAL; + goto out_dump; + } + + ext_data->placements = placements; + ext_data->n_placements = args->size; + + return 0; + +out_dump: + if (1) { + char buf[256]; + + if (ext_data->placements) { + repr_placements(buf, + sizeof(buf), + ext_data->placements, + ext_data->n_placements); + DRM_DEBUG("Placements were already set in previous SETPARAM. Existing placements: %s\n", + buf); + } + + repr_placements(buf, sizeof(buf), placements, i); + DRM_DEBUG("New placements(so far validated): %s\n", buf); + } + +out_free: + kfree(placements); + return ret; +} + +static int __create_setparam(struct drm_i915_gem_object_param *args, + struct create_ext *ext_data) +{ + if (!(args->param & I915_OBJECT_PARAM)) { + DRM_DEBUG("Missing I915_OBJECT_PARAM namespace\n"); + return -EINVAL; + } + + switch (lower_32_bits(args->param)) { + case I915_PARAM_MEMORY_REGIONS: + return set_placements(args, ext_data); + } + + return -EINVAL; +} + +static int create_setparam(struct i915_user_extension __user *base, void *data) +{ + struct drm_i915_gem_create_ext_setparam ext; + + if (copy_from_user(&ext, base, sizeof(ext))) + return -EFAULT; + + return __create_setparam(&ext.param, data); +} + +static const i915_user_extension_fn create_extensions[] = { + [I915_GEM_CREATE_EXT_SETPARAM] = create_setparam, +}; + +/** + * Creates a new mm object and returns a handle to it. + * @dev: drm device pointer + * @data: ioctl data blob + * @file: drm file pointer + */ +int +i915_gem_create_ioctl(struct drm_device *dev, void *data, + struct drm_file *file) +{ + struct drm_i915_private *i915 = to_i915(dev); + struct create_ext ext_data = { .i915 = i915 }; + struct drm_i915_gem_create_ext *args = data; + int ret; + + i915_gem_flush_free_objects(i915); + + ret = i915_user_extensions(u64_to_user_ptr(args->extensions), + create_extensions, + ARRAY_SIZE(create_extensions), + &ext_data); + if (ret) + goto err_free; + + if (!ext_data.placements) { + struct intel_memory_region **placements; + enum intel_memory_type mem_type = INTEL_MEMORY_SYSTEM; + + placements = kmalloc(sizeof(struct intel_memory_region *), + GFP_KERNEL); + if (!placements) + return -ENOMEM; + + placements[0] = intel_memory_region_by_type(i915, mem_type); + + ext_data.placements = placements; + ext_data.n_placements = 1; + } + + ret = i915_gem_create(file, + ext_data.placements, + ext_data.n_placements, + &args->size, &args->handle); + if (!ret) + return 0; + +err_free: + kfree(ext_data.placements); + return ret; +} diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 49935245a4a8..89b530841126 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -254,6 +254,8 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, if (obj->ops->release) obj->ops->release(obj);
+ kfree(obj->mm.placements); + /* But keep the pointer alive for RCU-protected lookups */ call_rcu(&obj->rcu, __i915_gem_free_object_rcu); cond_resched(); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 6d101275bc9d..115ad32c303f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -212,6 +212,15 @@ struct drm_i915_gem_object { atomic_t pages_pin_count; atomic_t shrink_pin;
+ /** + * Priority list of potential placements for this object. + */ + struct intel_memory_region **placements; + int n_placements; + + /* XXX: Nasty hack, see gem_create */ + int gem_create_posted_err; + /** * Memory region for this object. */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 58bf5f9e3199..8f352ba6202d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -33,6 +33,10 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) unsigned int sg_page_sizes; int ret;
+ /* XXX: Check if we have any post. This is nasty hack, see gem_create */ + if (obj->mm.gem_create_posted_err) + return obj->mm.gem_create_posted_err; + st = kmalloc(sizeof(*st), GFP_KERNEL); if (!st) return -ENOMEM; diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 07b3a89ec09e..f4540c048cd9 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1729,7 +1729,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_GEM_THROTTLE, i915_gem_throttle_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF_DRV(I915_GEM_CREATE, i915_gem_create_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_GEM_CREATE_EXT, i915_gem_create_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_PREAD, i915_gem_pread_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_PWRITE, i915_gem_pwrite_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_GEM_MMAP, i915_gem_mmap_ioctl, DRM_RENDER_ALLOW), diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ef2124c17a7f..bf67f323a1ae 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -43,6 +43,7 @@ #include "gem/i915_gem_clflush.h" #include "gem/i915_gem_context.h" #include "gem/i915_gem_ioctls.h" +#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_mman.h" #include "gem/i915_gem_region.h" #include "gt/intel_engine_user.h" @@ -179,108 +180,6 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, return ret; }
-static int -i915_gem_create(struct drm_file *file, - struct intel_memory_region *mr, - u64 *size_p, - u32 *handle_p) -{ - struct drm_i915_gem_object *obj; - u32 handle; - u64 size; - int ret; - - GEM_BUG_ON(!is_power_of_2(mr->min_page_size)); - size = round_up(*size_p, mr->min_page_size); - if (size == 0) - return -EINVAL; - - /* For most of the ABI (e.g. mmap) we think in system pages */ - GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE)); - - /* Allocate the new object */ - obj = i915_gem_object_create_region(mr, size, 0); - if (IS_ERR(obj)) - return PTR_ERR(obj); - - ret = drm_gem_handle_create(file, &obj->base, &handle); - /* drop reference from allocate - handle holds it now */ - i915_gem_object_put(obj); - if (ret) - return ret; - - *handle_p = handle; - *size_p = size; - return 0; -} - -int -i915_gem_dumb_create(struct drm_file *file, - struct drm_device *dev, - struct drm_mode_create_dumb *args) -{ - enum intel_memory_type mem_type; - int cpp = DIV_ROUND_UP(args->bpp, 8); - u32 format; - - switch (cpp) { - case 1: - format = DRM_FORMAT_C8; - break; - case 2: - format = DRM_FORMAT_RGB565; - break; - case 4: - format = DRM_FORMAT_XRGB8888; - break; - default: - return -EINVAL; - } - - /* have to work out size/pitch and return them */ - args->pitch = ALIGN(args->width * cpp, 64); - - /* align stride to page size so that we can remap */ - if (args->pitch > intel_plane_fb_max_stride(to_i915(dev), format, - DRM_FORMAT_MOD_LINEAR)) - args->pitch = ALIGN(args->pitch, 4096); - - if (args->pitch < args->width) - return -EINVAL; - - args->size = mul_u32_u32(args->pitch, args->height); - - mem_type = INTEL_MEMORY_SYSTEM; - if (HAS_LMEM(to_i915(dev))) - mem_type = INTEL_MEMORY_LOCAL; - - return i915_gem_create(file, - intel_memory_region_by_type(to_i915(dev), - mem_type), - &args->size, &args->handle); -} - -/** - * Creates a new mm object and returns a handle to it. - * @dev: drm device pointer - * @data: ioctl data blob - * @file: drm file pointer - */ -int -i915_gem_create_ioctl(struct drm_device *dev, void *data, - struct drm_file *file) -{ - struct drm_i915_private *i915 = to_i915(dev); - struct drm_i915_gem_create *args = data; - - i915_gem_flush_free_objects(i915); - - return i915_gem_create(file, - intel_memory_region_by_type(i915, - INTEL_MEMORY_SYSTEM), - &args->size, &args->handle); -} - static int shmem_pread(struct page *page, int offset, int len, char __user *user_data, bool needs_clflush) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 6f40748901da..67240bddf2ca 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -21,6 +21,26 @@ const struct intel_memory_region_info intel_region_map[] = { }, };
+struct intel_memory_region * +intel_memory_region_lookup(struct drm_i915_private *i915, + u16 class, u16 instance) +{ + int i; + + /* XXX: consider maybe converting to an rb tree at some point */ + for (i = 0; i < ARRAY_SIZE(i915->mm.regions); ++i) { + struct intel_memory_region *region = i915->mm.regions[i]; + + if (!region) + continue; + + if (region->type == class && region->instance == instance) + return region; + } + + return NULL; +} + struct intel_memory_region * intel_memory_region_by_type(struct drm_i915_private *i915, enum intel_memory_type mem_type) diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 15dcb57b4b5a..20431d3ce490 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -102,6 +102,10 @@ struct intel_memory_region { } objects; };
+struct intel_memory_region * +intel_memory_region_lookup(struct drm_i915_private *i915, + u16 class, u16 instance); + int intel_memory_region_init_buddy(struct intel_memory_region *mem); void intel_memory_region_release_buddy(struct intel_memory_region *mem);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 41845203250d..f6e3a0462414 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -391,6 +391,7 @@ typedef struct _drm_i915_sarea { #define DRM_IOCTL_I915_GEM_ENTERVT DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_ENTERVT) #define DRM_IOCTL_I915_GEM_LEAVEVT DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_LEAVEVT) #define DRM_IOCTL_I915_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE, struct drm_i915_gem_create) +#define DRM_IOCTL_I915_GEM_CREATE_EXT DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE, struct drm_i915_gem_create_ext) #define DRM_IOCTL_I915_GEM_PREAD DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_PREAD, struct drm_i915_gem_pread) #define DRM_IOCTL_I915_GEM_PWRITE DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_PWRITE, struct drm_i915_gem_pwrite) #define DRM_IOCTL_I915_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MMAP, struct drm_i915_gem_mmap) @@ -728,6 +729,27 @@ struct drm_i915_gem_create { __u32 pad; };
+struct drm_i915_gem_create_ext { + + /** + * Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + */ + __u64 size; + /** + * Returned handle for the object. + * + * Object handles are nonzero. + */ + __u32 handle; + __u32 pad; +#define I915_GEM_CREATE_EXT_SETPARAM (1u << 0) +#define I915_GEM_CREATE_EXT_FLAGS_UNKNOWN \ + (-(I915_GEM_CREATE_EXT_SETPARAM << 1)) + __u64 extensions; +}; + struct drm_i915_gem_pread { /** Handle for the object being read. */ __u32 handle; @@ -1698,6 +1720,44 @@ struct drm_i915_gem_context_param { __u64 value; };
+struct drm_i915_gem_object_param { + /* Object handle (0 for I915_GEM_CREATE_EXT_SETPARAM) */ + __u32 handle; + + /* Data pointer size */ + __u32 size; + +/* + * I915_OBJECT_PARAM: + * + * Select object namespace for the param. + */ +#define I915_OBJECT_PARAM (1ull<<32) + +/* + * I915_PARAM_MEMORY_REGIONS: + * + * Set the data pointer with the desired set of placements in priority + * order(each entry must be unique and supported by the device), as an array of + * drm_i915_gem_memory_class_instance, or an equivalent layout of class:instance + * pair encodings. See DRM_I915_QUERY_MEMORY_REGIONS for how to query the + * supported regions. + * + * Note that this requires the I915_OBJECT_PARAM namespace: + * .param = I915_OBJECT_PARAM | I915_PARAM_MEMORY_REGIONS + */ +#define I915_PARAM_MEMORY_REGIONS 0x1 + __u64 param; + + /* Data value or pointer */ + __u64 data; +}; + +struct drm_i915_gem_create_ext_setparam { + struct i915_user_extension base; + struct drm_i915_gem_object_param param; +}; + /** * Context SSEU programming *
Quoting Matthew Auld (2020-11-27 12:06:08)
[snip]
if (i915_gem_object_is_lmem(obj)) {
struct intel_gt *gt = obj->mm.region->gt;
struct intel_context *ce = gt->engine[BCS0]->blitter_context;
/*
* XXX: We really want to move this to get_pages(), but we
* require grabbing the BKL for the blitting operation which is
* annoying. In the pipeline is support for async get_pages()
* which should fit nicely for this. Also note that the actual
* clear should be done async (we currently do an object_wait
* which is pure garbage), we just need to take care if
* userspace opts out of implicit sync for the execbuf, to avoid any
* potential info leak.
*/
Not just XXX, but the design should be completed first. -Chris
On 11/27/20 2:25 PM, Chris Wilson wrote:
Quoting Matthew Auld (2020-11-27 12:06:08)
[snip]
if (i915_gem_object_is_lmem(obj)) {
struct intel_gt *gt = obj->mm.region->gt;
struct intel_context *ce = gt->engine[BCS0]->blitter_context;
/*
* XXX: We really want to move this to get_pages(), but we
* require grabbing the BKL for the blitting operation which is
* annoying. In the pipeline is support for async get_pages()
* which should fit nicely for this. Also note that the actual
* clear should be done async (we currently do an object_wait
* which is pure garbage), we just need to take care if
* userspace opts out of implicit sync for the execbuf, to avoid any
* potential info leak.
*/
Not just XXX, but the design should be completed first.
Matthew, I have a patch series in the makings that moves this blit to get_pages().
/Thomas
Quoting Matthew Auld (2020-11-27 12:06:08)
+int
+i915_gem_create_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct create_ext ext_data = { .i915 = i915 };
+	struct drm_i915_gem_create_ext *args = data;
+	int ret;
+
+	i915_gem_flush_free_objects(i915);
+
+	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
+				   create_extensions,
+				   ARRAY_SIZE(create_extensions),
+				   &ext_data);
+	if (ret)
+		goto err_free;
+
+	if (!ext_data.placements) {
+		struct intel_memory_region **placements;
+		enum intel_memory_type mem_type = INTEL_MEMORY_SYSTEM;
+
+		placements = kmalloc(sizeof(struct intel_memory_region *),
+				     GFP_KERNEL);
+		if (!placements)
+			return -ENOMEM;
+
+		placements[0] = intel_memory_region_by_type(i915, mem_type);
+		ext_data.placements = placements;
+		ext_data.n_placements = 1;
+	}
+
+	ret = i915_gem_create(file,
+			      ext_data.placements,
+			      ext_data.n_placements,
+			      &args->size, &args->handle);
+	if (!ret)
+		return 0;
Applying the extensions has to happen after creating the vanilla object.
It literally is the equivalent of applying the setparam ioctl to a fresh object.
Look at the PXP series for how badly wrong this goes if you try it this way around. -Chris
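For concreteness, the ordering Chris describes might look something like the below. This is a sketch only; the ext_data.obj field and the shmem default are assumptions for illustration, not code from this series:

	/* Hypothetical: create the vanilla object first... */
	obj = i915_gem_object_create_shmem(i915, args->size);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/*
	 * ...then apply the extensions against it, exactly as if
	 * userspace had issued a setparam ioctl on a fresh handle.
	 */
	ext_data.obj = obj; /* hypothetical field */
	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
				   create_extensions,
				   ARRAY_SIZE(create_extensions),
				   &ext_data);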
Quoting Matthew Auld (2020-11-27 12:06:08)
Same old gem_create, but now with extensions support. This is needed to support various upcoming use cases. For now we use the extensions mechanism to support setting an immutable priority list of potential placements at creation time.
If we wish to set the placements/regions we can simply do:
	struct drm_i915_gem_object_param region_param = { … }; /* Unchanged */

	struct drm_i915_gem_create_ext_setparam setparam_region = {
		.base = { .name = I915_GEM_CREATE_EXT_SETPARAM },
		.param = region_param,
	};

	struct drm_i915_gem_create_ext create_ext = {
		.size = 16 * PAGE_SIZE,
		.extensions = (uintptr_t)&setparam_region,
	};

	int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
	if (err)
		...
Looking at the existing gem_create, there is no detection of an unsupported extension. That is, there is no rejection of new userspace asking for placement on an old kernel. (As erroneous as that would be for many other reasons.)
Unless I've missed something, we need a new ioctl number for CREATEv2. -Chris
On 01/12/2020 12:55, Chris Wilson wrote:
Quoting Matthew Auld (2020-11-27 12:06:08)
Same old gem_create, but now with extensions support. This is needed to support various upcoming use cases. For now we use the extensions mechanism to support setting an immutable priority list of potential placements at creation time.
If we wish to set the placements/regions we can simply do:
	struct drm_i915_gem_object_param region_param = { … }; /* Unchanged */

	struct drm_i915_gem_create_ext_setparam setparam_region = {
		.base = { .name = I915_GEM_CREATE_EXT_SETPARAM },
		.param = region_param,
	};

	struct drm_i915_gem_create_ext create_ext = {
		.size = 16 * PAGE_SIZE,
		.extensions = (uintptr_t)&setparam_region,
	};

	int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
	if (err)
		...
Looking at the existing gem_create, there is no detection of an unsupported extension. That is, there is no rejection of new userspace asking for placement on an old kernel. (As erroneous as that would be for many other reasons.)
Unless I've missed something, we need a new ioctl number for CREATEv2.
+Joonas
Right, and I guess it's not a good idea for userspace to implement something like has_gem_create_ext()?
-Chris
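An illustrative userspace probe, if we did go the has_gem_create_ext() route (an IGT-style sketch; DRM_IOCTL_I915_GEM_CREATE_EXT is the number proposed by this series and gem_close() is the usual IGT helper):

	static bool has_gem_create_ext(int fd)
	{
		struct drm_i915_gem_create_ext arg = { .size = 4096 };

		/*
		 * No extensions attached: behaves like plain gem_create,
		 * so an old kernel simply fails the unknown ioctl.
		 */
		if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &arg))
			return false;

		gem_close(fd, arg.handle);
		return true;
	}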
From: Michel Thierry michel.thierry@intel.com
Signed-off-by: Michel Thierry michel.thierry@intel.com
Signed-off-by: Matthew Auld matthew.auld@intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com
---
 drivers/gpu/drm/i915/gt/intel_ring.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index d636c6ed88b7..aa75e644f3f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -4,6 +4,7 @@
  * Copyright © 2019 Intel Corporation
  */

+#include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_object.h"
 #include "i915_drv.h"
 #include "i915_vma.h"
@@ -111,10 +112,16 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	struct i915_vma *vma;

 	obj = ERR_PTR(-ENODEV);
-	if (i915_ggtt_has_aperture(ggtt))
-		obj = i915_gem_object_create_stolen(i915, size);
-	if (IS_ERR(obj))
-		obj = i915_gem_object_create_internal(i915, size);
+	if (HAS_LMEM(i915)) {
+		obj = i915_gem_object_create_lmem(i915, size,
+						  I915_BO_ALLOC_CONTIGUOUS |
+						  I915_BO_ALLOC_VOLATILE);
+	} else {
+		if (i915_ggtt_has_aperture(ggtt))
+			obj = i915_gem_object_create_stolen(i915, size);
+		if (IS_ERR(obj))
+			obj = i915_gem_object_create_internal(i915, size);
+	}
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
Quoting Matthew Auld (2020-11-27 12:06:09)
From: Michel Thierry michel.thierry@intel.com
Signed-off-by: Michel Thierry michel.thierry@intel.com
Signed-off-by: Matthew Auld matthew.auld@intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com

 drivers/gpu/drm/i915/gt/intel_ring.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index d636c6ed88b7..aa75e644f3f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -4,6 +4,7 @@
  * Copyright © 2019 Intel Corporation
  */

+#include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_object.h"
 #include "i915_drv.h"
 #include "i915_vma.h"
@@ -111,10 +112,16 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	struct i915_vma *vma;

 	obj = ERR_PTR(-ENODEV);
-	if (i915_ggtt_has_aperture(ggtt))
-		obj = i915_gem_object_create_stolen(i915, size);
-	if (IS_ERR(obj))
-		obj = i915_gem_object_create_internal(i915, size);
+	if (HAS_LMEM(i915)) {
+		obj = i915_gem_object_create_lmem(i915, size,
+						  I915_BO_ALLOC_CONTIGUOUS |
+						  I915_BO_ALLOC_VOLATILE);
Just create, and keep trying when !lmem returns an error.
Why contiguous, it's vmapped anyway?
+	} else {
+		if (i915_ggtt_has_aperture(ggtt))
+			obj = i915_gem_object_create_stolen(i915, size);
+		if (IS_ERR(obj))
+			obj = i915_gem_object_create_internal(i915, size);
+	}
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
-- 2.26.2
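For reference, the fallback chain Chris suggests above might read as follows, with the CONTIGUOUS flag dropped per his second comment (a sketch, assuming create_lmem simply returns an error on !LMEM platforms):

	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE);
	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
		obj = i915_gem_object_create_stolen(i915, size);
	if (IS_ERR(obj))
		obj = i915_gem_object_create_internal(i915, size);
	if (IS_ERR(obj))
		return ERR_CAST(obj);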
From: Kui Wen kui.wen@intel.com
When user space does mmap, the kernel maps the physical pages of local memory into the virtual address space. r->sgt.pfn is a page address allocated from local memory, and the local memory region spans from 0 to the LMEM size, so r->sgt.pfn can legitimately be 0; this is a normal case.
Signed-off-by: Kui Wen kui.wen@intel.com
---
 drivers/gpu/drm/i915/i915_mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_mm.c b/drivers/gpu/drm/i915/i915_mm.c
index 43039dc8c607..dcf6b3e5bfdf 100644
--- a/drivers/gpu/drm/i915/i915_mm.c
+++ b/drivers/gpu/drm/i915/i915_mm.c
@@ -62,7 +62,7 @@ static int remap_sg(pte_t *pte, unsigned long addr, void *data)
 {
 	struct remap_pfn *r = data;

-	if (GEM_WARN_ON(!r->sgt.pfn))
+	if (GEM_WARN_ON(!use_dma(r->iobase) && !r->sgt.pfn))
 		return -EINVAL;

 	/* Special PTE are not associated with any struct page */
From: "Michael J. Ruhl" michael.j.ruhl@intel.com
The i915 GEM dmabuf mmap interface assumes all BOs are SHMEM. When the BO is backed by LMEM, this assumption doesn't work so well.
Introduce the dmabuf mmap interface to LMEM by adding the appropriate VMA faulting mechanism and update dmabuf to allow for LMEM backed BOs by leveraging the gem_mman path.
Cc: Brian Welty brian.welty@intel.com
Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  59 +++++++---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      | 102 ++++++++++--------
 drivers/gpu/drm/i915/gem/i915_gem_mman.h      |   9 ++
 .../drm/i915/gem/selftests/i915_gem_mman.c    |  12 +--
 4 files changed, 118 insertions(+), 64 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 018d02cc4af5..85528eeaacbc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -10,6 +10,7 @@

 #include "i915_drv.h"
 #include "i915_gem_lmem.h"
+#include "i915_gem_mman.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"

@@ -105,7 +106,41 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
 	i915_gem_object_unpin_map(obj);
 }

-static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
+/**
+ * i915_gem_dmabuf_update_vma - Setup VMA information for exported LMEM
+ * objects
+ * @obj: valid LMEM object
+ * @vma: valid vma
+ *
+ * NOTE: on success, the final _object_put() will be done by the VMA
+ * vm_close() callback.
+ */
+static int i915_gem_dmabuf_update_vma(struct drm_i915_gem_object *obj,
+				      struct vm_area_struct *vma)
+{
+	struct i915_mmap_offset *mmo;
+	int err;
+
+	i915_gem_object_get(obj);
+	mmo = i915_gem_mmap_offset_attach(obj, I915_MMAP_TYPE_WC, NULL);
+	if (IS_ERR(mmo)) {
+		err = PTR_ERR(mmo);
+		goto out;
+	}
+
+	err = i915_gem_update_vma_info(obj, mmo, vma);
+	if (err)
+		goto out;
+
+	return 0;
+
+out:
+	i915_gem_object_put(obj);
+	return err;
+}
+
+static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf,
+				struct vm_area_struct *vma)
 {
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
 	int ret;
@@ -113,16 +148,20 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 	if (obj->base.size < vma->vm_end - vma->vm_start)
 		return -EINVAL;

-	if (!obj->base.filp)
-		return -ENODEV;
+	/* shmem */
+	if (obj->base.filp) {
+		ret = call_mmap(obj->base.filp, vma);
+		if (ret)
+			return ret;

-	ret = call_mmap(obj->base.filp, vma);
-	if (ret)
-		return ret;
+		vma_set_file(vma, obj->base.filp);
+		return 0;
+	}

-	vma_set_file(vma, obj->base.filp);
+	if (i915_gem_object_is_lmem(obj))
+		return i915_gem_dmabuf_update_vma(obj, vma);

-	return 0;
+	return -ENODEV;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf,
				     enum dma_data_direction direction)
@@ -254,10 +293,6 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 		 */
 		return &i915_gem_object_get(obj)->base;
 	}
-
-	/* not our device, but still a i915 object? */
-	if (i915_gem_object_is_lmem(obj))
-		return ERR_PTR(-ENOTSUPP);
 }
 /* need to attach */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 4e8a05c35252..33ccd4d665d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -620,10 +620,10 @@ insert_mmo(struct drm_i915_gem_object *obj, struct i915_mmap_offset *mmo)
 	return mmo;
 }

-static struct i915_mmap_offset *
-mmap_offset_attach(struct drm_i915_gem_object *obj,
-		   enum i915_mmap_type mmap_type,
-		   struct drm_file *file)
+struct i915_mmap_offset *
+i915_gem_mmap_offset_attach(struct drm_i915_gem_object *obj,
+			    enum i915_mmap_type mmap_type,
+			    struct drm_file *file)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	struct i915_mmap_offset *mmo;
@@ -696,7 +696,7 @@ __assign_mmap_offset(struct drm_file *file,
 		goto out;
 	}

-	mmo = mmap_offset_attach(obj, mmap_type, file);
+	mmo = i915_gem_mmap_offset_attach(obj, mmap_type, file);
 	if (IS_ERR(mmo)) {
 		err = PTR_ERR(mmo);
 		goto out;
@@ -867,56 +867,22 @@ static struct file *mmap_singleton(struct drm_i915_private *i915)
 	return file;
 }

-/*
- * This overcomes the limitation in drm_gem_mmap's assignment of a
- * drm_gem_object as the vma->vm_private_data. Since we need to
- * be able to resolve multiple mmap offsets which could be tied
- * to a single gem object.
- */
-int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+int i915_gem_update_vma_info(struct drm_i915_gem_object *obj,
+			     struct i915_mmap_offset *mmo,
+			     struct vm_area_struct *vma)
 {
-	struct drm_vma_offset_node *node;
-	struct drm_file *priv = filp->private_data;
-	struct drm_device *dev = priv->minor->dev;
-	struct drm_i915_gem_object *obj = NULL;
-	struct i915_mmap_offset *mmo = NULL;
 	struct file *anon;

-	if (drm_dev_is_unplugged(dev))
-		return -ENODEV;
-
-	rcu_read_lock();
-	drm_vma_offset_lock_lookup(dev->vma_offset_manager);
-	node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager,
-						  vma->vm_pgoff,
-						  vma_pages(vma));
-	if (node && drm_vma_node_is_allowed(node, priv)) {
-		/*
-		 * Skip 0-refcnted objects as it is in the process of being
-		 * destroyed and will be invalid when the vma manager lock
-		 * is released.
-		 */
-		mmo = container_of(node, struct i915_mmap_offset, vma_node);
-		obj = i915_gem_object_get_rcu(mmo->obj);
-	}
-	drm_vma_offset_unlock_lookup(dev->vma_offset_manager);
-	rcu_read_unlock();
-	if (!obj)
-		return node ? -EACCES : -EINVAL;
-
 	if (i915_gem_object_is_readonly(obj)) {
-		if (vma->vm_flags & VM_WRITE) {
-			i915_gem_object_put(obj);
+		if (vma->vm_flags & VM_WRITE)
 			return -EINVAL;
-		}
+
 		vma->vm_flags &= ~VM_MAYWRITE;
 	}

-	anon = mmap_singleton(to_i915(dev));
-	if (IS_ERR(anon)) {
-		i915_gem_object_put(obj);
+	anon = mmap_singleton(to_i915(obj->base.dev));
+	if (IS_ERR(anon))
 		return PTR_ERR(anon);
-	}

 	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
 	vma->vm_private_data = mmo;
@@ -962,6 +928,50 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
 	return 0;
 }

+/*
+ * This overcomes the limitation in drm_gem_mmap's assignment of a
+ * drm_gem_object as the vma->vm_private_data. Since we need to
+ * be able to resolve multiple mmap offsets which could be tied
+ * to a single gem object.
+ */
+int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct drm_vma_offset_node *node;
+	struct drm_file *priv = filp->private_data;
+	struct drm_device *dev = priv->minor->dev;
+	struct drm_i915_gem_object *obj = NULL;
+	struct i915_mmap_offset *mmo = NULL;
+	int err;
+
+	if (drm_dev_is_unplugged(dev))
+		return -ENODEV;
+
+	rcu_read_lock();
+	drm_vma_offset_lock_lookup(dev->vma_offset_manager);
+	node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager,
+						  vma->vm_pgoff,
+						  vma_pages(vma));
+	if (node && drm_vma_node_is_allowed(node, priv)) {
+		/*
+		 * Skip 0-refcnted objects as it is in the process of being
+		 * destroyed and will be invalid when the vma manager lock
+		 * is released.
+		 */
+		mmo = container_of(node, struct i915_mmap_offset, vma_node);
+		obj = i915_gem_object_get_rcu(mmo->obj);
+	}
+	drm_vma_offset_unlock_lookup(dev->vma_offset_manager);
+	rcu_read_unlock();
+	if (!obj)
+		return node ? -EACCES : -EINVAL;
+
+	err = i915_gem_update_vma_info(obj, mmo, vma);
+	if (err)
+		i915_gem_object_put(obj);
+
+	return err;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_gem_mman.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index 7c5ccdf59359..dfd19da0b3e7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -10,6 +10,8 @@
 #include <linux/mm_types.h>
 #include <linux/types.h>

+#include "gem/i915_gem_object_types.h"
+
 struct drm_device;
 struct drm_file;
 struct drm_i915_gem_object;
@@ -31,4 +33,11 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);

 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);

+struct i915_mmap_offset *
+i915_gem_mmap_offset_attach(struct drm_i915_gem_object *obj,
+			    enum i915_mmap_type mmap_type,
+			    struct drm_file *file);
+int i915_gem_update_vma_info(struct drm_i915_gem_object *obj,
+			     struct i915_mmap_offset *mmo,
+			     struct vm_area_struct *vma);
 #endif
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 85fff8bed08c..5701549b5d13 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -583,7 +583,7 @@ static bool assert_mmap_offset(struct drm_i915_private *i915,
 	if (IS_ERR(obj))
 		return false;
-	mmo = mmap_offset_attach(obj, I915_MMAP_OFFSET_GTT, NULL);
+	mmo = i915_gem_mmap_offset_attach(obj, I915_MMAP_OFFSET_GTT, NULL);
 	i915_gem_object_put(obj);

 	return PTR_ERR_OR_ZERO(mmo) == expected;
@@ -686,7 +686,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 		goto out;
 	}

-	mmo = mmap_offset_attach(obj, I915_MMAP_OFFSET_GTT, NULL);
+	mmo = i915_gem_mmap_offset_attach(obj, I915_MMAP_OFFSET_GTT, NULL);
 	if (IS_ERR(mmo)) {
 		pr_err("Unable to insert object into reclaimed hole\n");
 		err = PTR_ERR(mmo);
@@ -860,7 +860,7 @@ static int __igt_mmap(struct drm_i915_private *i915,
 	if (err)
 		return err;

-	mmo = mmap_offset_attach(obj, type, NULL);
+	mmo = i915_gem_mmap_offset_attach(obj, type, NULL);
 	if (IS_ERR(mmo))
 		return PTR_ERR(mmo);

@@ -996,7 +996,7 @@ static int __igt_mmap_access(struct drm_i915_private *i915,
 	if (!can_mmap(obj, type) || !can_access(obj))
 		return 0;

-	mmo = mmap_offset_attach(obj, type, NULL);
+	mmo = i915_gem_mmap_offset_attach(obj, type, NULL);
 	if (IS_ERR(mmo))
 		return PTR_ERR(mmo);

@@ -1109,7 +1109,7 @@ static int __igt_mmap_gpu(struct drm_i915_private *i915,
 	if (err)
 		return err;

-	mmo = mmap_offset_attach(obj, type, NULL);
+	mmo = i915_gem_mmap_offset_attach(obj, type, NULL);
 	if (IS_ERR(mmo))
 		return PTR_ERR(mmo);

@@ -1285,7 +1285,7 @@ static int __igt_mmap_revoke(struct drm_i915_private *i915,
 	if (!can_mmap(obj, type))
 		return 0;

-	mmo = mmap_offset_attach(obj, type, NULL);
+	mmo = i915_gem_mmap_offset_attach(obj, type, NULL);
 	if (IS_ERR(mmo))
 		return PTR_ERR(mmo);
Hook up the LMEM region. Addresses will start from zero, and for CPU access we get LMEM_BAR which is just a 1:1 mapping of said region.
Based on a patch from Michel Thierry.
Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com
Cc: Lucas De Marchi lucas.demarchi@intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Signed-off-by: Matthew Auld matthew.auld@intel.com
Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
---
 drivers/gpu/drm/i915/i915_reg.h            |  3 ++
 drivers/gpu/drm/i915/intel_memory_region.c | 11 ++++++-
 drivers/gpu/drm/i915/intel_region_lmem.c   | 38 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_region_lmem.h   |  2 ++
 4 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index bf9ba1e361bb..1af1966ac461 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12063,6 +12063,9 @@ enum skl_power_gate {

 #define GEN12_GLOBAL_MOCS(i)	_MMIO(0x4000 + (i) * 4) /* Global MOCS regs */

+#define GEN12_LMEM_CFG_ADDR	_MMIO(0xcf58)
+#define   LMEM_ENABLE		(1 << 31)
+
 /* gamt regs */
 #define GEN8_L3_LRA_1_GPGPU	_MMIO(0x4dd4)
 #define GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW	0x67F1427F /* max/min for LRA1/2 */
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 67240bddf2ca..1f26bc06ec20 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -303,7 +303,16 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
 			mem = i915_gem_stolen_setup(i915);
 			break;
 		case INTEL_MEMORY_LOCAL:
-			mem = intel_setup_fake_lmem(i915);
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+			if (IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM)) {
+				if (INTEL_GEN(i915) >= 9 && i915_selftest.live < 0 &&
+				    i915->params.fake_lmem_start)
+					mem = intel_setup_fake_lmem(i915);
+			}
+#endif
+
+			if (IS_ERR(mem))
+				mem = i915_gem_setup_lmem(i915);
 			break;
 		}

diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c
index 40d8f1a95df6..e98582c76de1 100644
--- a/drivers/gpu/drm/i915/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/intel_region_lmem.c
@@ -136,3 +136,41 @@ intel_setup_fake_lmem(struct drm_i915_private *i915)

 	return mem;
 }
+
+static struct intel_memory_region *
+setup_lmem(struct drm_i915_private *dev_priv)
+{
+	struct pci_dev *pdev = dev_priv->drm.pdev;
+	struct intel_memory_region *mem;
+	resource_size_t io_start;
+	resource_size_t size;
+
+	/* Enables Local Memory functionality in GAM */
+	I915_WRITE(GEN12_LMEM_CFG_ADDR, I915_READ(GEN12_LMEM_CFG_ADDR) | LMEM_ENABLE);
+
+	io_start = pci_resource_start(pdev, 2);
+	size = pci_resource_len(pdev, 2);
+
+	mem = intel_memory_region_create(dev_priv,
+					 0,
+					 size,
+					 I915_GTT_PAGE_SIZE_4K,
+					 io_start,
+					 &intel_region_lmem_ops);
+	if (!IS_ERR(mem)) {
+		DRM_INFO("Intel graphics LMEM: %pR\n", &mem->region);
+		DRM_INFO("Intel graphics LMEM IO start: %llx\n",
+			 (u64)mem->io_start);
+		DRM_INFO("Intel graphics LMEM size: %llx\n",
+			 (u64)size);
+	}
+
+	return mem;
+}
+
+struct intel_memory_region *
+i915_gem_setup_lmem(struct drm_i915_private *i915)
+{
+	return setup_lmem(i915);
+}
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.h b/drivers/gpu/drm/i915/intel_region_lmem.h
index 213def7c7b8a..054e729035c1 100644
--- a/drivers/gpu/drm/i915/intel_region_lmem.h
+++ b/drivers/gpu/drm/i915/intel_region_lmem.h
@@ -10,6 +10,8 @@ struct drm_i915_private;

 extern const struct intel_memory_region_ops intel_region_lmem_ops;

+struct intel_memory_region *i915_gem_setup_lmem(struct drm_i915_private *i915);
+
 struct intel_memory_region *
 intel_setup_fake_lmem(struct drm_i915_private *i915);
On Fri, 27 Nov 2020, Matthew Auld matthew.auld@intel.com wrote:
+	/* Enables Local Memory functionality in GAM */
+	I915_WRITE(GEN12_LMEM_CFG_ADDR, I915_READ(GEN12_LMEM_CFG_ADDR) | LMEM_ENABLE);
Please use intel_uncore_read/write and intel_de_read/write throughout the series. We don't want any new users of I915_READ/I915_WRITE in the driver.
BR, Jani.
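For reference, the intel_uncore flavour Jani is asking for would look roughly like this (a sketch; whether the write belongs at this layer at all is a separate question):

	struct intel_uncore *uncore = &dev_priv->uncore;

	/* Enables Local Memory functionality in GAM */
	intel_uncore_write(uncore, GEN12_LMEM_CFG_ADDR,
			   intel_uncore_read(uncore, GEN12_LMEM_CFG_ADDR) |
			   LMEM_ENABLE);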
From: Zbigniew Kempczyński zbigniew.kempczynski@intel.com
IGTs should be able to choose a testing strategy depending on the memory regions and their sizes. Add the region instance number to make this easier and more descriptive.
Cc: Matthew Auld matthew.auld@intel.com
Cc: Ramalingam C ramalingam.c@intel.com
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: Adam Miszczak adam.miszczak@intel.com
Signed-off-by: Zbigniew Kempczyński zbigniew.kempczynski@intel.com
---
 drivers/gpu/drm/i915/intel_memory_region.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 1f26bc06ec20..cea44ddebe46 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -329,6 +329,10 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
 		mem->instance = instance;
 		mem->gt = &i915->gt;

+		if (HAS_LMEM(mem->i915) && type != INTEL_MEMORY_SYSTEM)
+			intel_memory_region_set_name(mem, "%s%u",
+						     mem->name, mem->instance);
+
 		i915->mm.regions[i] = mem;
 	}
Quoting Matthew Auld (2020-11-27 12:06:13)
From: Zbigniew Kempczyński zbigniew.kempczynski@intel.com
IGTs should be able to choose a testing strategy depending on the memory regions and their sizes. Add the region instance number to make this easier and more descriptive.
Cc: Matthew Auld matthew.auld@intel.com
Cc: Ramalingam C ramalingam.c@intel.com
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: Adam Miszczak adam.miszczak@intel.com
Signed-off-by: Zbigniew Kempczyński zbigniew.kempczynski@intel.com

 drivers/gpu/drm/i915/intel_memory_region.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 1f26bc06ec20..cea44ddebe46 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -329,6 +329,10 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
 		mem->instance = instance;
 		mem->gt = &i915->gt;

+		if (HAS_LMEM(mem->i915) && type != INTEL_MEMORY_SYSTEM)
+			intel_memory_region_set_name(mem, "%s%u",
+						     mem->name, mem->instance);
sprintf(mem->name, "%s", mem->name)
is that even defined behaviour? -Chris
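(For the record, the usual way around that aliasing is a bounce buffer, illustrative only:

	char tmp[sizeof(mem->name)];

	memcpy(tmp, mem->name, sizeof(tmp));
	intel_memory_region_set_name(mem, "%s%u", tmp, mem->instance);

so the vsnprintf underneath never reads the buffer it is writing into.)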
We need to generalize our accessors for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code, which for a simple single-page shmemfs object will return a plain kmap that is then kept for the lifetime of the page directory.
Signed-off-by: Matthew Auld matthew.auld@intel.com
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
---
 .../drm/i915/gem/selftests/i915_gem_context.c | 11 +----
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 11 ++---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 26 ++++------
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 48 +++++++++----------
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 11 +++--
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  7 ++-
 drivers/gpu/drm/i915/i915_vma.c               |  3 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 10 ++--
 drivers/gpu/drm/i915/selftests/i915_perf.c    |  3 +-
 10 files changed, 54 insertions(+), 78 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 5fef592390cb..ce70d0a3afb2 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1740,7 +1740,6 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
 {
 	struct i915_address_space *vm;
-	struct page *page;
 	u32 *vaddr;
 	int err = 0;

@@ -1748,24 +1747,18 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
 	if (!vm)
 		return -ENODEV;

-	page = __px_page(vm->scratch[0]);
-	if (!page) {
+	if (!vm->scratch[0]) {
 		pr_err("No scratch page!\n");
 		return -EINVAL;
 	}

-	vaddr = kmap(page);
-	if (!vaddr) {
-		pr_err("No (mappable) scratch page!\n");
-		return -EINVAL;
-	}
+	vaddr = __px_vaddr(vm->scratch[0]);

 	memcpy(out, vaddr, sizeof(*out));
 	if (memchr_inv(vaddr, *out, PAGE_SIZE)) {
 		pr_err("Inconsistent initial state of scratch page!\n");
 		err = -EINVAL;
 	}
-	kunmap(page);

 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 680bd9442eb0..78ad7d8a8bcc 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -105,9 +105,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		 * entries back to scratch.
 		 */

-		vaddr = kmap_atomic_px(pt);
+		vaddr = px_vaddr(pt);
 		memset32(vaddr + pte, scratch_pte, count);
-		kunmap_atomic(vaddr);

 		pte = 0;
 	}
@@ -129,7 +128,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,

 	GEM_BUG_ON(!pd->entry[act_pt]);

-	vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt));
+	vaddr = px_vaddr(i915_pt_entry(pd, act_pt));
 	do {
 		GEM_BUG_ON(sg_dma_len(iter.sg) < I915_GTT_PAGE_SIZE);
 		vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
@@ -145,12 +144,10 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	}

 	if (++act_pte == GEN6_PTES) {
-		kunmap_atomic(vaddr);
-		vaddr = kmap_atomic_px(i915_pt_entry(pd, ++act_pt));
+		vaddr = px_vaddr(i915_pt_entry(pd, ++act_pt));
 		act_pte = 0;
 	}
 } while (1);
-	kunmap_atomic(vaddr);

 	vma->page_sizes.gtt = I915_GTT_PAGE_SIZE;
 }
@@ -244,7 +241,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
 		goto err_scratch0;
 	}

-	ret = pin_pt_dma(vm, vm->scratch[1]);
+	ret = map_pt_dma(vm, vm->scratch[1]);
 	if (ret)
 		goto err_scratch1;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index a37c968ef8f7..a3093dd4b86d 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -237,11 +237,10 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			   atomic_read(&pt->used));
 		GEM_BUG_ON(!count || count >= atomic_read(&pt->used));

-		vaddr = kmap_atomic_px(pt);
+		vaddr = px_vaddr(pt);
 		memset64(vaddr + gen8_pd_index(start, 0),
 			 vm->scratch[0]->encode,
 			 count);
-		kunmap_atomic(vaddr);

 		atomic_sub(count, &pt->used);
 		start += count;
@@ -370,7 +369,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	gen8_pte_t *vaddr;

 	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
-	vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
 	do {
 		GEM_BUG_ON(sg_dma_len(iter->sg) < I915_GTT_PAGE_SIZE);
 		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
@@ -397,12 +396,10 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		}

 		clflush_cache_range(vaddr, PAGE_SIZE);
-		kunmap_atomic(vaddr);
-		vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+		vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
 	}
 } while (1);
 clflush_cache_range(vaddr, PAGE_SIZE);
-	kunmap_atomic(vaddr);

 	return idx;
 }
@@ -437,7 +434,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 			encode |= GEN8_PDE_PS_2M;
 			page_size = I915_GTT_PAGE_SIZE_2M;

-			vaddr = kmap_atomic_px(pd);
+			vaddr = px_vaddr(pd);
 		} else {
 			struct i915_page_table *pt =
 				i915_pt_entry(pd, __gen8_pte_index(start, 1));
@@ -452,7 +449,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 			    rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE))
 				maybe_64K = __gen8_pte_index(start, 1);

-			vaddr = kmap_atomic_px(pt);
+			vaddr = px_vaddr(pt);
 		}

 		do {
@@ -486,7 +483,6 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 		} while (rem >= page_size && index < I915_PDES);

 		clflush_cache_range(vaddr, PAGE_SIZE);
-		kunmap_atomic(vaddr);

 		/*
 		 * Is it safe to mark the 2M block as 64K? -- Either we have
@@ -500,9 +496,8 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 		     !iter->sg && IS_ALIGNED(vma->node.start +
					     vma->node.size,
					     I915_GTT_PAGE_SIZE_2M)))) {
-			vaddr = kmap_atomic_px(pd);
+			vaddr = px_vaddr(pd);
 			vaddr[maybe_64K] |= GEN8_PDE_IPS_64K;
-			kunmap_atomic(vaddr);
 			page_size = I915_GTT_PAGE_SIZE_64K;

 			/*
@@ -518,12 +513,11 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 				u16 i;

 				encode = vma->vm->scratch[0]->encode;
-				vaddr = kmap_atomic_px(i915_pt_entry(pd, maybe_64K));
+				vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K));

 				for (i = 1; i < index; i += 16)
 					memset64(vaddr + i, encode, 15);

-				kunmap_atomic(vaddr);
 			}
 		}

@@ -592,7 +586,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		if (IS_ERR(obj))
 			goto free_scratch;

-		ret = pin_pt_dma(vm, obj);
+		ret = map_pt_dma(vm, obj);
 		if (ret) {
 			i915_gem_object_put(obj);
 			goto free_scratch;
@@ -629,7 +623,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt)
 	if (IS_ERR(pde))
 		return PTR_ERR(pde);

-	err = pin_pt_dma(vm, pde->pt.base);
+	err = map_pt_dma(vm, pde->pt.base);
 	if (err) {
 		i915_gem_object_put(pde->pt.base);
 		free_pd(vm, pde);
@@ -665,7 +659,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
 		goto err_pd;
 	}

-	err = pin_pt_dma(vm, pd->pt.base);
+	err = map_pt_dma(vm, pd->pt.base);
 	if (err)
 		goto err_pd;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 17ecaef1834d..4560e03067a7 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -616,7 +616,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
 		goto err_ppgtt;

 	i915_gem_object_lock(ppgtt->vm.scratch[0], NULL);
-	err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash);
+	err = i915_vm_map_pt_stash(&ppgtt->vm, &stash);
 	i915_gem_object_unlock(ppgtt->vm.scratch[0]);
 	if (err)
 		goto err_stash;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 070d538cdc56..f3a263f09368 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -25,27 +25,25 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
 	return obj;
 }

-int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
+int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
 {
-	int err;
+	void *vaddr;

-	i915_gem_object_lock(obj, NULL);
-	err = i915_gem_object_pin_pages(obj);
-	i915_gem_object_unlock(obj);
-	if (err)
-		return err;
+	vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
+	if (IS_ERR(vaddr))
+		return PTR_ERR(vaddr);

 	i915_gem_object_make_unshrinkable(obj);
 	return 0;
 }

-int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
+int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
 {
-	int err;
+	void *vaddr;

-	err = i915_gem_object_pin_pages(obj);
-	if (err)
-		return err;
+	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(vaddr))
+		return PTR_ERR(vaddr);

 	i915_gem_object_make_unshrinkable(obj);
 	return 0;
@@ -155,6 +153,14 @@ void clear_pages(struct i915_vma *vma)
 	memset(&vma->page_sizes, 0, sizeof(vma->page_sizes));
 }

+void *__px_vaddr(struct drm_i915_gem_object *p)
+{
+	enum i915_map_type type;
+
+	GEM_BUG_ON(!i915_gem_object_has_pages(p));
+	return page_unpack_bits(p->mm.mapping, &type);
+}
+
 dma_addr_t __px_dma(struct drm_i915_gem_object *p)
 {
 	GEM_BUG_ON(!i915_gem_object_has_pages(p));
@@ -170,32 +176,22 @@ struct page *__px_page(struct drm_i915_gem_object *p)
 void fill_page_dma(struct drm_i915_gem_object *p, const u64 val,
 		   unsigned int count)
 {
-	struct page *page = __px_page(p);
-	void *vaddr;
+	void *vaddr = __px_vaddr(p);

-	vaddr = kmap(page);
 	memset64(vaddr, val, count);
 	clflush_cache_range(vaddr, PAGE_SIZE);
-	kunmap(page);
 }

 static void poison_scratch_page(struct drm_i915_gem_object *scratch)
 {
-	struct sgt_iter sgt;
-	struct page *page;
+	void *vaddr = __px_vaddr(scratch);
 	u8 val;

 	val = 0;
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		val = POISON_FREE;

-	for_each_sgt_page(page, sgt, scratch->mm.pages) {
-		void *vaddr;
-
-		vaddr = kmap(page);
-		memset(vaddr, val, PAGE_SIZE);
-		kunmap(page);
-	}
+	memset(vaddr, val, scratch->base.size);
 }

 int setup_scratch_page(struct i915_address_space *vm)
@@ -225,7 +221,7 @@ int setup_scratch_page(struct i915_address_space *vm)
 	if (IS_ERR(obj))
 		goto skip;

-	if (pin_pt_dma(vm, obj))
+	if (map_pt_dma(vm, obj))
 		goto skip_obj;
 	/* We need a single contiguous page for our scratch */
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 16063b2f0119..5b8ea9c8c654 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -176,6 +176,9 @@ struct page *__px_page(struct drm_i915_gem_object *p);
 dma_addr_t __px_dma(struct drm_i915_gem_object *p);
 #define px_dma(px) (__px_dma(px_base(px)))

+void *__px_vaddr(struct drm_i915_gem_object *p);
+#define px_vaddr(px) (__px_vaddr(px_base(px)))
+
 #define px_pt(px) \
 	__px_choose_expr(px, struct i915_page_table *, __x, \
 	__px_choose_expr(px, struct i915_page_directory *, &__x->pt, \
@@ -506,8 +509,6 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt);
 void i915_ggtt_suspend(struct i915_ggtt *gtt);
 void i915_ggtt_resume(struct i915_ggtt *ggtt);

-#define kmap_atomic_px(px) kmap_atomic(__px_page(px_base(px)))
-
 void fill_page_dma(struct drm_i915_gem_object *p, const u64 val,
 		   unsigned int count);

@@ -525,8 +526,8 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm);
 struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(int npde);

-int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
-int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
+int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
+int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj);

 void free_px(struct i915_address_space *vm,
 	     struct i915_page_table *pt, int lvl);
@@ -573,7 +574,7 @@ void setup_private_pat(struct intel_uncore *uncore);
 int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 			   struct i915_vm_pt_stash *stash,
 			   u64 size);
-int i915_vm_pin_pt_stash(struct i915_address_space *vm,
+int i915_vm_map_pt_stash(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash);
 void i915_vm_free_pt_stash(struct i915_address_space *vm,
 			   struct i915_vm_pt_stash *stash);
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index f3ac47702aee..8e7b77cc4594 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -85,11 +85,10 @@ write_dma_entry(struct drm_i915_gem_object * const pdma,
 	       const unsigned short idx,
 	       const u64 encoded_entry)
 {
-	u64 * const vaddr = kmap_atomic(__px_page(pdma));
+	u64 * const vaddr = __px_vaddr(pdma);

 	vaddr[idx] = encoded_entry;
 	clflush_cache_range(&vaddr[idx], sizeof(u64));
-	kunmap_atomic(vaddr);
 }

 void
@@ -254,7 +253,7 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 	return 0;
 }

-int i915_vm_pin_pt_stash(struct i915_address_space *vm,
+int i915_vm_map_pt_stash(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash)
 {
 	struct i915_page_table *pt;
@@ -262,7 +261,7 @@ int i915_vm_pin_pt_stash(struct i915_address_space *vm,

 	for (n = 0; n < ARRAY_SIZE(stash->pt); n++) {
 		for (pt = stash->pt[n]; pt; pt = pt->stash) {
-			err = pin_pt_dma_locked(vm, pt->base);
+			err = map_pt_dma_locked(vm, pt->base);
 			if (err)
 				return err;
 		}
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 7243ab593aec..82f60cc43a90 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -914,8 +914,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		if (err)
 			goto err_fence;
-		err = i915_vm_pin_pt_stash(vma->vm,
-					   &work->stash);
+		err = i915_vm_map_pt_stash(vma->vm, &work->stash);
 		if (err)
 			goto err_fence;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index d07dd6780005..9653d7c259a5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -185,7 +185,7 @@ static int igt_ppgtt_alloc(void *arg)
 	if (err)
 		goto err_ppgtt_cleanup;

-	err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash);
+	err = i915_vm_map_pt_stash(&ppgtt->vm, &stash);
 	if (err) {
 		i915_vm_free_pt_stash(&ppgtt->vm, &stash);
 		goto err_ppgtt_cleanup;
@@ -207,7 +207,7 @@ static int igt_ppgtt_alloc(void *arg)
 	if (err)
 		goto err_ppgtt_cleanup;

-	err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash);
+	err = i915_vm_map_pt_stash(&ppgtt->vm, &stash);
 	if (err) {
 		i915_vm_free_pt_stash(&ppgtt->vm, &stash);
 		goto err_ppgtt_cleanup;
@@ -324,11 +324,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
					   BIT_ULL(size)))
 				goto alloc_vm_end;

-			err = i915_vm_pin_pt_stash(vm, &stash);
+			err = i915_vm_map_pt_stash(vm, &stash);
 			if (!err)
 				vm->allocate_va_range(vm, &stash,
						      addr, BIT_ULL(size));
-
 			i915_vm_free_pt_stash(vm, &stash);
 alloc_vm_end:
 			if (err == -EDEADLK) {
@@ -1966,10 +1965,9 @@ static int igt_cs_tlb(void *arg)
 		if (err)
 			goto end_ww;

-		err = i915_vm_pin_pt_stash(vm, &stash);
+		err = i915_vm_map_pt_stash(vm, &stash);
 		if (!err)
 			vm->allocate_va_range(vm, &stash, offset, chunk_size);
-
 		i915_vm_free_pt_stash(vm, &stash);
 end_ww:
 		if (err == -EDEADLK) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index debbac660519..6a7abb3e2bb5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg)
 	}

 	/* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */
-	scratch = kmap(__px_page(ce->vm->scratch[0]));
+	scratch = __px_vaddr(ce->vm->scratch[0]);
 	memset(scratch, POISON_FREE, PAGE_SIZE);

 	rq = intel_context_create_request(ce);
@@ -405,7 +405,6 @@ static int live_noa_gpr(void *arg)
 out_rq:
 	i915_request_put(rq);
 out_ce:
-	kunmap(__px_page(ce->vm->scratch[0]));
 	intel_context_put(ce);
 out:
 	stream_destroy(stream);
Quoting Matthew Auld (2020-11-27 12:06:14)
We need to generalize our accessors for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code, which for a simple single-page shmemfs object will return a plain kmap that is then kept for the lifetime of the page directory.

Signed-off-by: Matthew Auld matthew.auld@intel.com
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
We are going to really struggle with this on 32b :( -Chris
On Fri, 27 Nov 2020 at 13:32, Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Matthew Auld (2020-11-27 12:06:14)
We need to generalize our accessors for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code, which for a simple single-page shmemfs object will return a plain kmap that is then kept for the lifetime of the page directory.

Signed-off-by: Matthew Auld matthew.auld@intel.com
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
We are going to really struggle with this on 32b :(
Just go back to mapping everything on demand like we did previously, and unmap as soon as we are done with the current directory across alloc/insert/clear?
-Chris
On Tue, Jan 12, 2021 at 10:47:57AM +0000, Matthew Auld wrote:
On Fri, 27 Nov 2020 at 13:32, Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Matthew Auld (2020-11-27 12:06:14)
We need to generalize our accessors for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code, which for a simple single-page shmemfs object will return a plain kmap that is then kept for the lifetime of the page directory.

Signed-off-by: Matthew Auld matthew.auld@intel.com
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
We are going to really struggle with this on 32b :(
Just go back to mapping everything on demand like we did previously, and unmap as soon as we are done with the current directory across alloc/insert/clear?
tbh if you run i915.ko on 32b kernels, on a modern platform, you deserve all the pain you get. There's quite a bit of work going on to essentially make kmap functions worse on 32b (we're not yet at the stage where people propose to nuke them, but getting there slowly), so designing code today with them in mind as primary justification is backwards.
What we can't do is keep kmap around forever; it'd need to be something like vmap, with a long-term mapping intention behind it. And at that point it's probably an equal amount of work to just go back to ad-hoc kmap. Also, the rules have changed somewhat with kmap_local anyway; a kmap is a lot less painful in the code than it was with kmap_atomic. -Daniel
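For the curious, the kmap_local variant Daniel refers to keeps the mapping short-lived without the preemption/pagefault side effects of kmap_atomic. A generic illustration, not code from this series:

	static void fill_page(struct page *page, u64 val, unsigned int count)
	{
		u64 *vaddr;

		/* thread-local, nestable; no preemption/pagefault disabling */
		vaddr = kmap_local_page(page);
		memset64(vaddr, val, count);
		kunmap_local(vaddr);
	}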
It's a requirement that for dgfx we place all the paging structures in device local memory.
Signed-off-by: Matthew Auld matthew.auld@intel.com
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c |  5 ++++-
 drivers/gpu/drm/i915/gt/intel_gtt.c  | 27 +++++++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
 3 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index a3093dd4b86d..f67e0332ccbc 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -702,7 +702,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
 	 */
 	ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);

-	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
+	if (IS_DGFX(gt->i915))
+		ppgtt->vm.alloc_pt_dma = alloc_pt_lmem;
+	else
+		ppgtt->vm.alloc_pt_dma = alloc_pt_dma;

 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index f3a263f09368..2605bfd39a15 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -7,10 +7,23 @@

 #include <linux/fault-inject.h>

+#include "gem/i915_gem_lmem.h"
 #include "i915_trace.h"
 #include "intel_gt.h"
 #include "intel_gtt.h"

+struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
+{
+	struct drm_i915_gem_object *obj;
+
+	obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
+
+	/* ensure all dma objects have the same reservation class */
+	if (!IS_ERR(obj))
+		obj->base.resv = &vm->resv;
+	return obj;
+}
+
 struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
 {
 	struct drm_i915_gem_object *obj;
@@ -27,9 +40,14 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)

 int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
 {
+	enum i915_map_type type;
 	void *vaddr;

-	vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
+	type = I915_MAP_WB;
+	if (i915_gem_object_is_lmem(obj))
+		type = I915_MAP_WC;
+
+	vaddr = i915_gem_object_pin_map_unlocked(obj, type);
 	if (IS_ERR(vaddr))
 		return PTR_ERR(vaddr);

@@ -39,9 +57,14 @@ int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)

 int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
 {
+	enum i915_map_type type;
 	void *vaddr;

-	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	type = I915_MAP_WB;
+	if (i915_gem_object_is_lmem(obj))
+		type = I915_MAP_WC;
+
+	vaddr = i915_gem_object_pin_map(obj, type);
 	if (IS_ERR(vaddr))
 		return PTR_ERR(vaddr);

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 5b8ea9c8c654..bdbdfded60cc 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -522,6 +522,7 @@ int setup_scratch_page(struct i915_address_space *vm);
 void free_scratch(struct i915_address_space *vm);

 struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
+struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz);
 struct i915_page_table *alloc_pt(struct i915_address_space *vm);
 struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(int npde);
Now that PDs can also be mapped as WC, we can forgo all the flushing for such mappings.
Signed-off-by: Matthew Auld matthew.auld@intel.com
---
 .../drm/i915/gem/selftests/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |  6 ++---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 26 ++++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 20 ++++++++++----
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  4 +--
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +++--
 drivers/gpu/drm/i915/selftests/i915_perf.c    |  2 +-
 7 files changed, 42 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index ce70d0a3afb2..e52cc74db2b1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1752,7 +1752,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
 		return -EINVAL;
 	}

-	vaddr = __px_vaddr(vm->scratch[0]);
+	vaddr = __px_vaddr(vm->scratch[0], NULL);

 	memcpy(out, vaddr, sizeof(*out));
 	if (memchr_inv(vaddr, *out, PAGE_SIZE)) {
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 78ad7d8a8bcc..8d12e9334861 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -105,7 +105,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		 * entries back to scratch.
 		 */

-		vaddr = px_vaddr(pt);
+		vaddr = px_vaddr(pt, NULL);
 		memset32(vaddr + pte, scratch_pte, count);

 		pte = 0;
@@ -128,7 +128,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,

 	GEM_BUG_ON(!pd->entry[act_pt]);

-	vaddr = px_vaddr(i915_pt_entry(pd, act_pt));
+	vaddr = px_vaddr(i915_pt_entry(pd, act_pt), NULL);
 	do {
 		GEM_BUG_ON(sg_dma_len(iter.sg) < I915_GTT_PAGE_SIZE);
 		vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
@@ -144,7 +144,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	}

 	if (++act_pte == GEN6_PTES) {
-		vaddr = px_vaddr(i915_pt_entry(pd, ++act_pt));
+		vaddr = px_vaddr(i915_pt_entry(pd, ++act_pt), NULL);
 		act_pte = 0;
 	}
 } while (1);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index f67e0332ccbc..e2f1dfc48d43 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -237,7 +237,7 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			   atomic_read(&pt->used));
 		GEM_BUG_ON(!count || count >= atomic_read(&pt->used));

-		vaddr = px_vaddr(pt);
+		vaddr = px_vaddr(pt, NULL);
 		memset64(vaddr + gen8_pd_index(start, 0),
 			 vm->scratch[0]->encode,
 			 count);
@@ -367,9 +367,10 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	struct i915_page_directory *pd;
 	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
 	gen8_pte_t *vaddr;
+	bool needs_flush;

 	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
-	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)), &needs_flush);
 	do {
 		GEM_BUG_ON(sg_dma_len(iter->sg) < I915_GTT_PAGE_SIZE);
 		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
@@ -395,11 +396,14 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 			pd = pdp->entry[gen8_pd_index(idx, 2)];
 		}

-		clflush_cache_range(vaddr, PAGE_SIZE);
-		vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+		if (needs_flush)
+			clflush_cache_range(vaddr, PAGE_SIZE);
+		vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)),
+				 &needs_flush);
 	}
 } while (1);
-	clflush_cache_range(vaddr, PAGE_SIZE);
+	if (needs_flush)
+		clflush_cache_range(vaddr, PAGE_SIZE);

 	return idx;
 }
@@ -412,6 +416,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma->node.start;
+	bool needs_flush;

 	GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm));

@@ -434,7 +439,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 			encode |= GEN8_PDE_PS_2M;
 			page_size = I915_GTT_PAGE_SIZE_2M;

-			vaddr = px_vaddr(pd);
+			vaddr = px_vaddr(pd, &needs_flush);
 		} else {
 			struct i915_page_table *pt =
 				i915_pt_entry(pd, __gen8_pte_index(start, 1));
@@ -449,7 +454,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 			    rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE))
 				maybe_64K = __gen8_pte_index(start, 1);

-			vaddr = px_vaddr(pt);
+			vaddr = px_vaddr(pt, &needs_flush);
 		}

 		do {
@@ -482,7 +487,8 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 		} while (rem >= page_size && index < I915_PDES);

-		clflush_cache_range(vaddr, PAGE_SIZE);
+		if (needs_flush)
+			clflush_cache_range(vaddr, PAGE_SIZE);

 		/*
 		 * Is it safe to mark the 2M block as 64K? -- Either we have
@@ -496,7 +502,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 		     !iter->sg && IS_ALIGNED(vma->node.start +
					     vma->node.size,
					     I915_GTT_PAGE_SIZE_2M)))) {
-			vaddr = px_vaddr(pd);
+			vaddr = px_vaddr(pd, NULL);
 			vaddr[maybe_64K] |= GEN8_PDE_IPS_64K;
 			page_size = I915_GTT_PAGE_SIZE_64K;

@@ -513,7 +519,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 				u16 i;

 				encode = vma->vm->scratch[0]->encode;
-				vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K));
+				vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K), NULL);
 				for (i = 1; i < index; i += 16)
 					memset64(vaddr + i, encode, 15);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 2605bfd39a15..eee8338e330b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -176,12 +176,19 @@ void clear_pages(struct i915_vma *vma)
 	memset(&vma->page_sizes, 0, sizeof(vma->page_sizes));
 }

-void *__px_vaddr(struct drm_i915_gem_object *p)
+void *__px_vaddr(struct drm_i915_gem_object *p, bool *needs_flush)
 {
 	enum i915_map_type type;
+	void *vaddr;

 	GEM_BUG_ON(!i915_gem_object_has_pages(p));
-	return page_unpack_bits(p->mm.mapping, &type);
+
+	vaddr = page_unpack_bits(p->mm.mapping, &type);
+
+	if (needs_flush)
+		*needs_flush = type != I915_MAP_WC;
+
+	return vaddr;
 }

 dma_addr_t __px_dma(struct drm_i915_gem_object *p)
@@ -199,15 +206,18 @@ struct page *__px_page(struct drm_i915_gem_object *p)
 void fill_page_dma(struct drm_i915_gem_object *p, const u64 val,
 		   unsigned int count)
 {
-	void *vaddr = __px_vaddr(p);
+	bool needs_flush;
+	void *vaddr;

+	vaddr = __px_vaddr(p, &needs_flush);
 	memset64(vaddr, val, count);
-	clflush_cache_range(vaddr, PAGE_SIZE);
+	if (needs_flush)
+		clflush_cache_range(vaddr, PAGE_SIZE);
 }

 static void poison_scratch_page(struct drm_i915_gem_object *scratch)
 {
-	void *vaddr = __px_vaddr(scratch);
+	void *vaddr = __px_vaddr(scratch, NULL);
 	u8 val;

 	val = 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index bdbdfded60cc..d96bd19d1b47 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -176,8 +176,8 @@ struct page *__px_page(struct drm_i915_gem_object *p);
 dma_addr_t __px_dma(struct drm_i915_gem_object *p);
 #define px_dma(px) (__px_dma(px_base(px)))

-void *__px_vaddr(struct drm_i915_gem_object *p);
-#define px_vaddr(px) (__px_vaddr(px_base(px)))
+void *__px_vaddr(struct drm_i915_gem_object *p, bool *needs_flush);
+#define px_vaddr(px, needs_flush) (__px_vaddr(px_base(px), needs_flush))

 #define px_pt(px) \
 	__px_choose_expr(px, struct i915_page_table *, __x, \
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 8e7b77cc4594..2d74ae950e4b 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -85,10 +85,12 @@ write_dma_entry(struct drm_i915_gem_object * const pdma,
 	       const unsigned short idx,
 	       const u64 encoded_entry)
 {
-	u64 * const vaddr = __px_vaddr(pdma);
+	bool needs_flush;
+	u64 * const vaddr = __px_vaddr(pdma, &needs_flush);

 	vaddr[idx] = encoded_entry;
-	clflush_cache_range(&vaddr[idx], sizeof(u64));
+	if (needs_flush)
+		clflush_cache_range(&vaddr[idx], sizeof(u64));
 }

 void
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index 6a7abb3e2bb5..6698750ffe8d 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg)
 	}

 	/* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */
-	scratch = __px_vaddr(ce->vm->scratch[0]);
+	scratch = __px_vaddr(ce->vm->scratch[0], NULL);
 	memset(scratch, POISON_FREE, PAGE_SIZE);

 	rq = intel_context_create_request(ce);
For the PTEs we get an LM bit, to signal whether the page resides in SMEM or LMEM.
Signed-off-by: Matthew Auld matthew.auld@intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com
Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Signed-off-by: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com
Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 35 ++++++++++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  3 +++
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  4 +++
 3 files changed, 36 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index e2f1dfc48d43..b6fcebeef02a 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -5,6 +5,7 @@

 #include <linux/log2.h>

+#include "gem/i915_gem_lmem.h"
 #include "gen8_ppgtt.h"
 #include "i915_scatterlist.h"
 #include "i915_trace.h"
@@ -50,6 +51,21 @@ static u64 gen8_pte_encode(dma_addr_t addr,
 	return pte;
 }

+static u64 gen12_pte_encode(dma_addr_t addr,
+			    enum i915_cache_level level,
+			    u32 flags)
+{
+	gen8_pte_t pte = addr | _PAGE_PRESENT | _PAGE_RW;
+
+	if (unlikely(flags & PTE_READ_ONLY))
+		pte &= ~_PAGE_RW;
+
+	if (flags & PTE_LM)
+		pte |= GEN12_PPGTT_PTE_LM;
+
+	return pte;
+}
+
 static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
 {
 	struct drm_i915_private *i915 = ppgtt->vm.i915;
@@ -365,7 +381,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	u32 flags)
 {
 	struct i915_page_directory *pd;
-	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
 	gen8_pte_t *vaddr;
 	bool needs_flush;

@@ -413,7 +429,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 	enum i915_cache_level cache_level,
 	u32 flags)
 {
-	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = vma->vm->pte_encode(0, cache_level, flags);
 	unsigned int rem = sg_dma_len(iter->sg);
 	u64 start = vma->node.start;
 	bool needs_flush;
@@ -558,6 +574,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,

 static int gen8_init_scratch(struct i915_address_space *vm)
 {
+	u32 pte_flags = vm->has_read_only;
 	int ret;
 	int i;

@@ -581,9 +598,12 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 	if (ret)
 		return ret;

+	if (i915_gem_object_is_lmem(vm->scratch[0]))
+		pte_flags |= PTE_LM;
+
 	vm->scratch[0]->encode =
-		gen8_pte_encode(px_dma(vm->scratch[0]),
-				I915_CACHE_LLC, vm->has_read_only);
+		vm->pte_encode(px_dma(vm->scratch[0]),
+			       I915_CACHE_LLC, pte_flags);

 	for (i = 1; i <= vm->top; i++) {
 		struct drm_i915_gem_object *obj;
@@ -713,6 +733,11 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
 	else
 		ppgtt->vm.alloc_pt_dma = alloc_pt_dma;

+	if (INTEL_GEN(gt->i915) >= 12)
+		ppgtt->vm.pte_encode = gen12_pte_encode;
+	else
+		ppgtt->vm.pte_encode = gen8_pte_encode;
+
 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
 		goto err_free;
@@ -734,8 +759,6 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
 	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
 	ppgtt->vm.clear_range = gen8_ppgtt_clear;

-	ppgtt->vm.pte_encode = gen8_pte_encode;
-
 	if (intel_vgpu_active(gt->i915))
 		gen8_ppgtt_notify_vgt(ppgtt, true);

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index d96bd19d1b47..f47899ef36f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -85,6 +85,8 @@ typedef u64 gen8_pte_t;
 #define BYT_PTE_SNOOPED_BY_CPU_CACHES	REG_BIT(2)
 #define BYT_PTE_WRITEABLE		REG_BIT(1)

+#define GEN12_PPGTT_PTE_LM	(1 << 11)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -268,6 +270,7 @@ struct i915_address_space {
 			   enum i915_cache_level level,
 			   u32 flags); /* Create a valid PTE */
 #define PTE_READ_ONLY	BIT(0)
+#define PTE_LM		BIT(1)

 	void (*allocate_va_range)(struct i915_address_space *vm,
 				  struct i915_vm_pt_stash *stash,
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 2d74ae950e4b..731d8730fa5f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -7,6 +7,8 @@

 #include "i915_trace.h"
 #include "intel_gtt.h"
+#include "gem/i915_gem_lmem.h"
+#include "gem/i915_gem_region.h"
 #include "gen6_ppgtt.h"
 #include "gen8_ppgtt.h"

@@ -193,6 +195,8 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 	pte_flags = 0;
 	if (i915_gem_object_is_readonly(vma->obj))
 		pte_flags |= PTE_READ_ONLY;
+	if (i915_gem_object_is_lmem(vma->obj))
+		pte_flags |= PTE_LM;

 	vm->insert_entries(vm, vma, cache_level, pte_flags);
 	wmb();
Quoting Matthew Auld (2020-11-27 12:06:17)
For the PTEs we get an LM bit, to signal whether the page resides in SMEM or LMEM.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 35 ++++++++++++++++++++++----- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 +++ drivers/gpu/drm/i915/gt/intel_ppgtt.c | 4 +++ 3 files changed, 36 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index e2f1dfc48d43..b6fcebeef02a 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -5,6 +5,7 @@
#include <linux/log2.h>
+#include "gem/i915_gem_lmem.h" #include "gen8_ppgtt.h" #include "i915_scatterlist.h" #include "i915_trace.h" @@ -50,6 +51,21 @@ static u64 gen8_pte_encode(dma_addr_t addr, return pte; }
+static u64 gen12_pte_encode(dma_addr_t addr,
+			    enum i915_cache_level level,
+			    u32 flags)
+{
+	gen8_pte_t pte = addr | _PAGE_PRESENT | _PAGE_RW;
+
+	if (unlikely(flags & PTE_READ_ONLY))
+		pte &= ~_PAGE_RW;
+
+	if (flags & PTE_LM)
+		pte |= GEN12_PPGTT_PTE_LM;
+
+	return pte;
+}
static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) { struct drm_i915_private *i915 = ppgtt->vm.i915; @@ -365,7 +381,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, u32 flags) { struct i915_page_directory *pd;
-	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags);
We don't need the vfunc, since that flag will not be sent for gen8.
That bit test will be cheaper than the retpoline. -Chris
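To make that concrete, a minimal sketch (untested) of folding the LM test into the existing encoder instead of adding the vfunc; pre-gen12 callers never pass PTE_LM, so the extra branch is dead there:

static u64 gen8_pte_encode(dma_addr_t addr,
			   enum i915_cache_level level,
			   u32 flags)
{
	gen8_pte_t pte = addr | _PAGE_PRESENT | _PAGE_RW;

	if (unlikely(flags & PTE_READ_ONLY))
		pte &= ~_PAGE_RW;

	/* Only ever requested on gen12+, a cheap no-op everywhere else */
	if (flags & PTE_LM)
		pte |= GEN12_PPGTT_PTE_LM;

	switch (level) {
	case I915_CACHE_NONE:
		pte |= PPAT_UNCACHED;
		break;
	case I915_CACHE_WT:
		pte |= PPAT_DISPLAY_ELLC;
		break;
	default:
		pte |= PPAT_CACHED;
		break;
	}

	return pte;
}

Whether gen12 can keep sharing the PPAT handling is a separate question; if it cannot, the vfunc may be warranted regardless.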
Based on a patch from Michel Thierry.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 24 ++++++++++++++++++------ drivers/gpu/drm/i915/gt/intel_gtt.h | 3 ++- 2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 4560e03067a7..26aa5debd7e9 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -10,6 +10,7 @@
#include <drm/i915_drm.h>
+#include "gem/i915_gem_lmem.h" #include "intel_gt.h" #include "i915_drv.h" #include "i915_scatterlist.h" @@ -180,7 +181,12 @@ static u64 gen8_ggtt_pte_encode(dma_addr_t addr, enum i915_cache_level level, u32 flags) { - return addr | _PAGE_PRESENT; + gen8_pte_t pte = addr | _PAGE_PRESENT; + + if (flags & PTE_LM) + pte |= GEN12_GGTT_PTE_LM; + + return pte; }
static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte) @@ -192,13 +198,13 @@ static void gen8_ggtt_insert_page(struct i915_address_space *vm, dma_addr_t addr, u64 offset, enum i915_cache_level level, - u32 unused) + u32 flags) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); gen8_pte_t __iomem *pte = (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
- gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, 0)); + gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
ggtt->invalidate(ggtt); } @@ -208,7 +214,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, enum i915_cache_level level, u32 flags) { - const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, 0); + const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags); struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); gen8_pte_t __iomem *gte; gen8_pte_t __iomem *end; @@ -448,8 +454,10 @@ static void ggtt_bind_vma(struct i915_address_space *vm,
/* Applicable to VLV (gen8+ do not support RO in the GGTT) */ pte_flags = 0; - if (i915_gem_object_is_readonly(obj)) + if (vma->vm->has_read_only && i915_gem_object_is_readonly(obj)) pte_flags |= PTE_READ_ONLY; + if (i915_gem_object_is_lmem(obj)) + pte_flags |= PTE_LM;
vm->insert_entries(vm, vma, cache_level, pte_flags); vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; @@ -765,6 +773,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) struct drm_i915_private *i915 = ggtt->vm.i915; struct pci_dev *pdev = i915->drm.pdev; phys_addr_t phys_addr; + u32 pte_flags = 0; int ret;
/* For Modern GENs the PTEs and register space are split in the BAR */ @@ -794,9 +803,12 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) return ret; }
+ if (i915_gem_object_is_lmem(ggtt->vm.scratch[0])) + pte_flags |= PTE_LM; + ggtt->vm.scratch[0]->encode = ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]), - I915_CACHE_NONE, 0); + I915_CACHE_NONE, pte_flags);
return 0; } diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index f47899ef36f4..db3626c0ee20 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -85,7 +85,8 @@ typedef u64 gen8_pte_t; #define BYT_PTE_SNOOPED_BY_CPU_CACHES REG_BIT(2) #define BYT_PTE_WRITEABLE REG_BIT(1)
-#define GEN12_PPGTT_PTE_LM (1 << 11) +#define GEN12_GGTT_PTE_LM (1 << 1) +#define GEN12_PPGTT_PTE_LM (1 << 11)
/* * Cacheability Control is a 4-bit value. The low three bits are stored in bits
Based on a patch from Michel Thierry.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com --- .../drm/i915/gt/intel_execlists_submission.c | 31 ++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 582a9044727e..c640b90711fd 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -108,6 +108,8 @@ */ #include <linux/interrupt.h>
+#include "gem/i915_gem_lmem.h" + #include "i915_drv.h" #include "i915_perf.h" #include "i915_trace.h" @@ -4660,6 +4662,21 @@ static struct intel_timeline *pinned_timeline(struct intel_context *ce) page_unmask_bits(tl)); }
+static int context_clear_lmem(struct drm_i915_gem_object *ctx_obj) +{ + void *vaddr; + + vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WC); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr); + + memset64(vaddr, 0, ctx_obj->base.size / sizeof(u64)); + + i915_gem_object_unpin_map(ctx_obj); + + return 0; +} + static int __execlists_context_alloc(struct intel_context *ce, struct intel_engine_cs *engine) { @@ -4680,10 +4697,22 @@ static int __execlists_context_alloc(struct intel_context *ce, context_size += PAGE_SIZE; }
- ctx_obj = i915_gem_object_create_shmem(engine->i915, context_size); + if (HAS_LMEM(engine->i915)) { + ctx_obj = i915_gem_object_create_lmem(engine->i915, + context_size, + I915_BO_ALLOC_CONTIGUOUS); + } else { + ctx_obj = i915_gem_object_create_shmem(engine->i915, context_size); + } if (IS_ERR(ctx_obj)) return PTR_ERR(ctx_obj);
+ if (HAS_LMEM(engine->i915)) { + ret = context_clear_lmem(ctx_obj); + if (ret) + goto error_deref_obj; + } + vma = i915_vma_instance(ctx_obj, &engine->gt->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma);
Quoting Matthew Auld (2020-11-27 12:06:19)
Based on a patch from Michel Thierry.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com
.../drm/i915/gt/intel_execlists_submission.c | 31 ++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 582a9044727e..c640b90711fd 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -108,6 +108,8 @@ */ #include <linux/interrupt.h>
+#include "gem/i915_gem_lmem.h"
#include "i915_drv.h" #include "i915_perf.h" #include "i915_trace.h" @@ -4660,6 +4662,21 @@ static struct intel_timeline *pinned_timeline(struct intel_context *ce) page_unmask_bits(tl)); }
+static int context_clear_lmem(struct drm_i915_gem_object *ctx_obj)
+{
+	void *vaddr;
+
+	vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WC);
+	if (IS_ERR(vaddr))
+		return PTR_ERR(vaddr);
+
+	memset64(vaddr, 0, ctx_obj->base.size / sizeof(u64));
+
+	i915_gem_object_unpin_map(ctx_obj);
What? We copy over the entire object with the default state. -Chris
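For reference, the copy in question: when the engine has a default state, the context image is rewritten wholesale from the template during first pin. Roughly (abridged from the populate_lr_context() path of the time, not verbatim):

	/* Abridged: the default state overwrites the object anyway */
	if (engine->default_state) {
		shmem_read(engine->default_state, 0,
			   vaddr, engine->context_size);
		__set_bit(CONTEXT_VALID_BIT, &ce->flags);
		inhibit = false;
	}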
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com --- drivers/gpu/drm/i915/gt/intel_gt.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 44f1d51e5ae5..caf2e72de1a6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -4,6 +4,8 @@ */
#include "debugfs_gt.h" + +#include "gem/i915_gem_lmem.h" #include "i915_drv.h" #include "intel_context.h" #include "intel_gt.h" @@ -342,9 +344,15 @@ static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size) struct i915_vma *vma; int ret;
- obj = i915_gem_object_create_stolen(i915, size); - if (IS_ERR(obj)) - obj = i915_gem_object_create_internal(i915, size); + if (HAS_LMEM(i915)) { + obj = i915_gem_object_create_lmem(i915, size, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_VOLATILE); + } else { + obj = i915_gem_object_create_stolen(i915, size); + if (IS_ERR(obj)) + obj = i915_gem_object_create_internal(i915, size); + } if (IS_ERR(obj)) { DRM_ERROR("Failed to allocate scratch page\n"); return PTR_ERR(obj);
From: Abdiel Janulgue abdiel.janulgue@linux.intel.com
For performance, writes over PCIe may not be strictly ordered by default. This adds a kernel configuration option to disable relaxed ordering and turn on strict ordering instead, for debug purposes.
Signed-off-by: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Stuart Summers stuart.summers@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/Kconfig.debug | 11 +++++++++++ drivers/gpu/drm/i915/intel_memory_region.c | 12 ++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug index 0fb7fd0ef717..65533cbbcb82 100644 --- a/drivers/gpu/drm/i915/Kconfig.debug +++ b/drivers/gpu/drm/i915/Kconfig.debug @@ -222,3 +222,14 @@ config DRM_I915_DEBUG_RUNTIME_PM driver loading, suspend and resume operations.
If in doubt, say "N" + +config DRM_I915_PCIE_STRICT_WRITE_ORDERING + bool "Enable PCIe strict ordering " + depends on DRM_I915 + default n + help + Relaxed ordering in writes is enabled by default to improve system + performance. Strict ordering can be selected instead to assist in + debugging. + + If in doubt, say "N". diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index cea44ddebe46..043541d409bd 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -286,6 +286,18 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) { int err, i;
+ /* All platforms currently have system memory */ + GEM_BUG_ON(!HAS_REGION(i915, REGION_SMEM)); + + if (IS_DGFX(i915)) { + if (IS_ENABLED(CONFIG_DRM_I915_PCIE_STRICT_WRITE_ORDERING)) + pcie_capability_clear_word(i915->drm.pdev, PCI_EXP_DEVCTL, + PCI_EXP_DEVCTL_RELAX_EN); + else + pcie_capability_set_word(i915->drm.pdev, PCI_EXP_DEVCTL, + PCI_EXP_DEVCTL_RELAX_EN); + } + for (i = 0; i < ARRAY_SIZE(i915->mm.regions); i++) { struct intel_memory_region *mem = ERR_PTR(-ENODEV); u16 type, instance;
From: CQ Tang cq.tang@intel.com
Under heavy thread contention, the same object may already be pinned with a different mapping type. A new pin attempt will then return -EBUSY unless the FORCE flag is specified.
This error was observed on DG1 silicon during power-on (PO).
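For context, the contention comes from the mapping-type cache in i915_gem_object_pin_map(). Abridged (not verbatim) — I915_MAP_FORCE_WC is simply I915_MAP_WC with the OVERRIDE bit set:

	/* Abridged sketch of the existing check in i915_gem_object_pin_map() */
	pinned = !(type & I915_MAP_OVERRIDE);
	type &= ~I915_MAP_OVERRIDE;

	ptr = page_unpack_bits(obj->mm.mapping, &has_type);
	if (ptr && has_type != type) {
		if (pinned)
			return ERR_PTR(-EBUSY); /* mapped with a different type */

		unmap_object(obj, ptr);	/* OVERRIDE: drop and remap instead */
		ptr = obj->mm.mapping = NULL;
	}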
Cc: Matthew Auld matthew.auld@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Balestrieri, Francesco francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object_blt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c index b41b076f6864..1096f27627d4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c @@ -57,7 +57,7 @@ struct i915_vma *intel_emit_vma_fill_blt(struct intel_context *ce, /* we pinned the pool, mark it as such */ intel_gt_buffer_pool_mark_used(pool);
- cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_WC); + cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_FORCE_WC); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto out_unpin; @@ -297,7 +297,7 @@ struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce, /* we pinned the pool, mark it as such */ intel_gt_buffer_pool_mark_used(pool);
- cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_WC); + cmd = i915_gem_object_pin_map(pool->obj, I915_MAP_FORCE_WC); if (IS_ERR(cmd)) { err = PTR_ERR(cmd); goto out_unpin;
From: CQ Tang cq.tang@intel.com
The lmem region needs to exclude the stolen part.
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Balestrieri, Francesco francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/i915_reg.h | 2 ++ drivers/gpu/drm/i915/intel_region_lmem.c | 11 +++++++---- 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 1af1966ac461..0e01ea0cb0a4 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12066,6 +12066,8 @@ enum skl_power_gate { #define GEN12_LMEM_CFG_ADDR _MMIO(0xcf58) #define LMEM_ENABLE (1 << 31)
+#define GEN12_GSMBASE _MMIO(0x108100) + /* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) #define GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW 0x67F1427F /* max/min for LRA1/2 */ diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index e98582c76de1..7f2b31d469b0 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -140,20 +140,23 @@ intel_setup_fake_lmem(struct drm_i915_private *i915) static struct intel_memory_region * setup_lmem(struct drm_i915_private *dev_priv) { + struct intel_uncore *uncore = &dev_priv->uncore; struct pci_dev *pdev = dev_priv->drm.pdev; struct intel_memory_region *mem; resource_size_t io_start; - resource_size_t size; + resource_size_t lmem_size;
/* Enables Local Memory functionality in GAM */ I915_WRITE(GEN12_LMEM_CFG_ADDR, I915_READ(GEN12_LMEM_CFG_ADDR) | LMEM_ENABLE);
+ /* Stolen starts from GSMBASE on DG1 */ + lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE); + io_start = pci_resource_start(pdev, 2); - size = pci_resource_len(pdev, 2);
mem = intel_memory_region_create(dev_priv, 0, - size, + lmem_size, I915_GTT_PAGE_SIZE_4K, io_start, &intel_region_lmem_ops); @@ -162,7 +165,7 @@ setup_lmem(struct drm_i915_private *dev_priv) DRM_INFO("Intel graphics LMEM IO start: %llx\n", (u64)mem->io_start); DRM_INFO("Intel graphics LMEM size: %llx\n", - (u64)size); + (u64)lmem_size); }
return mem;
Quoting Matthew Auld (2020-11-27 12:06:23)
From: CQ Tang cq.tang@intel.com
The lmem region needs to exclude the stolen part.
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Balestrieri, Francesco francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com
drivers/gpu/drm/i915/i915_reg.h | 2 ++ drivers/gpu/drm/i915/intel_region_lmem.c | 11 +++++++---- 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 1af1966ac461..0e01ea0cb0a4 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12066,6 +12066,8 @@ enum skl_power_gate { #define GEN12_LMEM_CFG_ADDR _MMIO(0xcf58) #define LMEM_ENABLE (1 << 31)
+#define GEN12_GSMBASE _MMIO(0x108100)
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) #define GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW 0x67F1427F /* max/min for LRA1/2 */ diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index e98582c76de1..7f2b31d469b0 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -140,20 +140,23 @@ intel_setup_fake_lmem(struct drm_i915_private *i915) static struct intel_memory_region * setup_lmem(struct drm_i915_private *dev_priv)
Am I wrong in thinking lmem should be under gt?
 {
+	struct intel_uncore *uncore = &dev_priv->uncore;
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	struct intel_memory_region *mem;
 	resource_size_t io_start;
-	resource_size_t size;
+	resource_size_t lmem_size;

 	/* Enables Local Memory functionality in GAM */
 	I915_WRITE(GEN12_LMEM_CFG_ADDR, I915_READ(GEN12_LMEM_CFG_ADDR) | LMEM_ENABLE);

+	/* Stolen starts from GSMBASE on DG1 */
+	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
+
 	io_start = pci_resource_start(pdev, 2);
-	size = pci_resource_len(pdev, 2);
Sanitycheck the two.
size = min(size, lmem_size);
 	mem = intel_memory_region_create(dev_priv, 0,
-					 size,
+					 lmem_size,
Ok, stolen is at tail not start.
I915_GTT_PAGE_SIZE_4K, io_start, &intel_region_lmem_ops);
@@ -162,7 +165,7 @@ setup_lmem(struct drm_i915_private *dev_priv)
 	DRM_INFO("Intel graphics LMEM IO start: %llx\n",
 		 (u64)mem->io_start);
 	DRM_INFO("Intel graphics LMEM size: %llx\n",
-		 (u64)size);
+		 (u64)lmem_size);
Use the correct printf-formats, %pa.
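i.e. pass the resource_size_t by reference and let %pa pick the right width:

	DRM_INFO("Intel graphics LMEM IO start: %pa\n", &mem->io_start);
	DRM_INFO("Intel graphics LMEM size: %pa\n", &lmem_size);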
} return mem;
-- 2.26.2
From: CQ Tang cq.tang@intel.com
Adjust the page offset by the region's start DMA address.
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index d8cac4c5881f..16424755e89c 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1051,7 +1051,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, for_each_sgt_daddr(dma, iter, vma->pages) { void __iomem *s;
- s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE); + s = io_mapping_map_wc(&mem->iomap, + dma - mem->region.start, + PAGE_SIZE); ret = compress_page(compress, (void __force *)s, dst, true);
From: CQ Tang cq.tang@intel.com
We have three memory region types: INTEL_SMEM, INTEL_LMEM, and INTEL_STOLEN. We also have two types of memory: system memory and device memory (also called local memory).
A memory region of type INTEL_SMEM only ever contains system memory; the other two region types can be backed by either system memory or device memory.
This function distinguishes real device local memory from system memory (including fake local memory and BIOS-stolen system memory) for the INTEL_LMEM and INTEL_STOLEN region types.
The PPGTT code will program the PTE_LM bit based on this value.
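Summarising the intended mapping (illustrative comment, using the commit's naming):

/*
 * region type    backing memory               is_devmem   PTE_LM
 * INTEL_SMEM     system                       false       unset
 * INTEL_LMEM     real device or fake local    real only   set iff is_devmem
 * INTEL_STOLEN   device or stolen system      real only   set iff is_devmem
 */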
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Francesco Balestrieri francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 11 ++++++++++- drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 1 + drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 2 +- drivers/gpu/drm/i915/intel_memory_region.h | 1 + drivers/gpu/drm/i915/intel_region_lmem.c | 3 +++ 6 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 840b68eb10d3..e56874e54fde 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -217,7 +217,16 @@ i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj,
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { - return obj->ops == &i915_gem_lmem_obj_ops; + struct intel_memory_region *region = obj->mm.region; + + return region && (region->is_devmem || region->type == INTEL_MEMORY_LOCAL); +} + +bool i915_gem_object_is_devmem(struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *region = obj->mm.region; + + return region && region->is_devmem; }
struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index a24d94bc380f..a1b6a10050bf 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -21,6 +21,7 @@ i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, unsigned long n);
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); +bool i915_gem_object_is_devmem(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object * i915_gem_object_create_lmem(struct drm_i915_private *i915, diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 26aa5debd7e9..eed5b640e493 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -456,7 +456,7 @@ static void ggtt_bind_vma(struct i915_address_space *vm, pte_flags = 0; if (vma->vm->has_read_only && i915_gem_object_is_readonly(obj)) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(obj)) + if (i915_gem_object_is_devmem(obj)) pte_flags |= PTE_LM;
vm->insert_entries(vm, vma, cache_level, pte_flags); diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 731d8730fa5f..34a02643bb75 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -195,7 +195,7 @@ void ppgtt_bind_vma(struct i915_address_space *vm, pte_flags = 0; if (i915_gem_object_is_readonly(vma->obj)) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(vma->obj)) + if (i915_gem_object_is_devmem(vma->obj)) pte_flags |= PTE_LM;
vm->insert_entries(vm, vma, cache_level, pte_flags); diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 20431d3ce490..ed827c770d47 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -92,6 +92,7 @@ struct intel_memory_region { enum intel_region_id id; char name[8]; struct intel_gt *gt; /* GT closest to this region. */ + bool is_devmem; /* true for device memory */
dma_addr_t remap_addr;
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index 7f2b31d469b0..939cf0d195a5 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -166,6 +166,9 @@ setup_lmem(struct drm_i915_private *dev_priv) (u64)mem->io_start); DRM_INFO("Intel graphics LMEM size: %llx\n", (u64)lmem_size); + + /* this is real device memory */ + mem->is_devmem = true; }
return mem;
From: CQ Tang cq.tang@intel.com
The current stolen code has partial memory region support. This patch finishes the rest of the code, so that objects are allocated from the stolen memory region.
However, three "global" variables are still kept for the display code to access: "i915->dsm", "i915->dsm_reserved", and "i915->stolen_usable_size".
This is to reduce the amount of code change. Also, there is only one display per device, while there could be multiple stolen memory regions.
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Balestrieri, Francesco francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/display/intel_fbc.c | 20 ++- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 185 ++++++++++----------- drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 7 +- drivers/gpu/drm/i915/gt/selftest_reset.c | 5 +- drivers/gpu/drm/i915/i915_drv.h | 6 - drivers/gpu/drm/i915/intel_memory_region.h | 3 + 6 files changed, 112 insertions(+), 114 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c index a5b072816a7b..2ad8ddc7e266 100644 --- a/drivers/gpu/drm/i915/display/intel_fbc.c +++ b/drivers/gpu/drm/i915/display/intel_fbc.c @@ -437,6 +437,7 @@ static int find_compression_threshold(struct drm_i915_private *dev_priv, unsigned int size, unsigned int fb_cpp) { + struct intel_memory_region *mem = i915_stolen_region(dev_priv); int compression_threshold = 1; int ret; u64 end; @@ -460,7 +461,7 @@ static int find_compression_threshold(struct drm_i915_private *dev_priv, */
/* Try to over-allocate to reduce reallocations and fragmentation. */ - ret = i915_gem_stolen_insert_node_in_range(dev_priv, node, size <<= 1, + ret = i915_gem_stolen_insert_node_in_range(mem, node, size <<= 1, 4096, 0, end); if (ret == 0) return compression_threshold; @@ -471,7 +472,7 @@ static int find_compression_threshold(struct drm_i915_private *dev_priv, (fb_cpp == 2 && compression_threshold == 2)) return 0;
- ret = i915_gem_stolen_insert_node_in_range(dev_priv, node, size >>= 1, + ret = i915_gem_stolen_insert_node_in_range(mem, node, size >>= 1, 4096, 0, end); if (ret && INTEL_GEN(dev_priv) <= 4) { return 0; @@ -486,6 +487,7 @@ static int find_compression_threshold(struct drm_i915_private *dev_priv, static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv, unsigned int size, unsigned int fb_cpp) { + struct intel_memory_region *mem = i915_stolen_region(dev_priv); struct intel_fbc *fbc = &dev_priv->fbc; struct drm_mm_node *compressed_llb; int ret; @@ -515,7 +517,7 @@ static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv, if (!compressed_llb) goto err_fb;
- ret = i915_gem_stolen_insert_node(dev_priv, compressed_llb, + ret = i915_gem_stolen_insert_node(mem, compressed_llb, 4096, 4096); if (ret) goto err_fb; @@ -542,15 +544,16 @@ static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv,
err_fb: kfree(compressed_llb); - i915_gem_stolen_remove_node(dev_priv, &fbc->compressed_fb); + i915_gem_stolen_remove_node(mem, &fbc->compressed_fb); err_llb: - if (drm_mm_initialized(&dev_priv->mm.stolen)) + if (drm_mm_initialized(&mem->stolen)) drm_info_once(&dev_priv->drm, "not enough stolen space for compressed buffer (need %d more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.\n", size); return -ENOSPC; }
static void __intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) { + struct intel_memory_region *mem = i915_stolen_region(dev_priv); struct intel_fbc *fbc = &dev_priv->fbc;
if (WARN_ON(intel_fbc_hw_is_active(dev_priv))) @@ -560,11 +563,11 @@ static void __intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) return;
if (fbc->compressed_llb) { - i915_gem_stolen_remove_node(dev_priv, fbc->compressed_llb); + i915_gem_stolen_remove_node(mem, fbc->compressed_llb); kfree(fbc->compressed_llb); }
- i915_gem_stolen_remove_node(dev_priv, &fbc->compressed_fb); + i915_gem_stolen_remove_node(mem, &fbc->compressed_fb); }
void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) @@ -1468,12 +1471,13 @@ static bool need_fbc_vtd_wa(struct drm_i915_private *dev_priv) void intel_fbc_init(struct drm_i915_private *dev_priv) { struct intel_fbc *fbc = &dev_priv->fbc; + struct intel_memory_region *mem = i915_stolen_region(dev_priv);
INIT_WORK(&fbc->underrun_work, intel_fbc_underrun_work_fn); mutex_init(&fbc->lock); fbc->active = false;
- if (!drm_mm_initialized(&dev_priv->mm.stolen)) + if (!mem || !drm_mm_initialized(&mem->stolen)) mkwrite_device_info(dev_priv)->display.has_fbc = false;
if (need_fbc_vtd_wa(dev_priv)) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 25e3cc53316e..0ddf48e472a0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -27,44 +27,44 @@ * for is a boon. */
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *i915, +int i915_gem_stolen_insert_node_in_range(struct intel_memory_region *mem, struct drm_mm_node *node, u64 size, unsigned alignment, u64 start, u64 end) { int ret;
- if (!drm_mm_initialized(&i915->mm.stolen)) + if (!drm_mm_initialized(&mem->stolen)) return -ENODEV;
/* WaSkipStolenMemoryFirstPage:bdw+ */ - if (INTEL_GEN(i915) >= 8 && start < 4096) + if (INTEL_GEN(mem->i915) >= 8 && start < 4096) start = 4096;
- mutex_lock(&i915->mm.stolen_lock); - ret = drm_mm_insert_node_in_range(&i915->mm.stolen, node, + mutex_lock(&mem->mm_lock); + ret = drm_mm_insert_node_in_range(&mem->stolen, node, size, alignment, 0, start, end, DRM_MM_INSERT_BEST); - mutex_unlock(&i915->mm.stolen_lock); + mutex_unlock(&mem->mm_lock);
return ret; }
-int i915_gem_stolen_insert_node(struct drm_i915_private *i915, +int i915_gem_stolen_insert_node(struct intel_memory_region *mem, struct drm_mm_node *node, u64 size, unsigned alignment) { - return i915_gem_stolen_insert_node_in_range(i915, node, + return i915_gem_stolen_insert_node_in_range(mem, node, size, alignment, I915_GEM_STOLEN_BIAS, U64_MAX); }
-void i915_gem_stolen_remove_node(struct drm_i915_private *i915, +void i915_gem_stolen_remove_node(struct intel_memory_region *mem, struct drm_mm_node *node) { - mutex_lock(&i915->mm.stolen_lock); + mutex_lock(&mem->mm_lock); drm_mm_remove_node(node); - mutex_unlock(&i915->mm.stolen_lock); + mutex_unlock(&mem->mm_lock); }
static int i915_adjust_stolen(struct drm_i915_private *i915, @@ -159,12 +159,12 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, return 0; }
-static void i915_gem_cleanup_stolen(struct drm_i915_private *i915) +static void i915_gem_cleanup_stolen(struct intel_memory_region *mem) { - if (!drm_mm_initialized(&i915->mm.stolen)) + if (!drm_mm_initialized(&mem->stolen)) return;
- drm_mm_takedown(&i915->mm.stolen); + drm_mm_takedown(&mem->stolen); }
static void g4x_get_stolen_reserved(struct drm_i915_private *i915, @@ -374,14 +374,13 @@ static void icl_get_stolen_reserved(struct drm_i915_private *i915, } }
-static int i915_gem_init_stolen(struct drm_i915_private *i915) +static int i915_gem_init_stolen(struct intel_memory_region *mem) { + struct drm_i915_private *i915 = mem->i915; struct intel_uncore *uncore = &i915->uncore; resource_size_t reserved_base, stolen_top; resource_size_t reserved_total, reserved_size;
- mutex_init(&i915->mm.stolen_lock); - if (intel_vgpu_active(i915)) { drm_notice(&i915->drm, "%s, disabling use of stolen memory\n", @@ -396,10 +395,10 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915) return 0; }
- if (resource_size(&intel_graphics_stolen_res) == 0) + if (resource_size(&mem->region) == 0) return 0;
- i915->dsm = intel_graphics_stolen_res; + i915->dsm = mem->region;
if (i915_adjust_stolen(i915, &i915->dsm)) return 0; @@ -492,7 +491,7 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915) resource_size(&i915->dsm) - reserved_total;
/* Basic memrange allocator for stolen space. */ - drm_mm_init(&i915->mm.stolen, 0, i915->stolen_usable_size); + drm_mm_init(&mem->stolen, 0, i915->stolen_usable_size);
return 0; } @@ -535,14 +534,14 @@ static void dbg_poison(struct i915_ggtt *ggtt, }
static struct sg_table * -i915_pages_create_for_stolen(struct drm_device *dev, +i915_pages_create_for_stolen(struct drm_i915_gem_object *obj, resource_size_t offset, resource_size_t size) { - struct drm_i915_private *i915 = to_i915(dev); + struct intel_memory_region *mem = obj->mm.region; struct sg_table *st; struct scatterlist *sg;
- GEM_BUG_ON(range_overflows(offset, size, resource_size(&i915->dsm))); + GEM_BUG_ON(range_overflows(offset, size, resource_size(&mem->region)));
/* We hide that we have no struct page backing our stolen object * by wrapping the contiguous physical allocation with a fake @@ -562,7 +561,7 @@ i915_pages_create_for_stolen(struct drm_device *dev, sg->offset = 0; sg->length = size;
- sg_dma_address(sg) = (dma_addr_t)i915->dsm.start + offset; + sg_dma_address(sg) = (dma_addr_t)mem->region.start + offset; sg_dma_len(sg) = size;
return st; @@ -571,7 +570,7 @@ i915_pages_create_for_stolen(struct drm_device *dev, static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj) { struct sg_table *pages = - i915_pages_create_for_stolen(obj->base.dev, + i915_pages_create_for_stolen(obj, obj->stolen->start, obj->stolen->size); if (IS_ERR(pages)) @@ -590,118 +589,113 @@ static int i915_gem_object_get_pages_stolen(struct drm_i915_gem_object *obj) static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj, struct sg_table *pages) { - /* Should only be called from i915_gem_object_release_stolen() */ + struct intel_memory_region *mem = obj->mm.region; + struct drm_mm_node *stolen = fetch_and_zero(&obj->stolen); + + GEM_BUG_ON(!mem); + GEM_BUG_ON(!stolen);
dbg_poison(&to_i915(obj->base.dev)->ggtt, sg_dma_address(pages->sgl), sg_dma_len(pages->sgl), POISON_FREE);
+ i915_gem_stolen_remove_node(mem, stolen); + kfree(stolen); + sg_free_table(pages); kfree(pages); }
-static void -i915_gem_object_release_stolen(struct drm_i915_gem_object *obj) -{ - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct drm_mm_node *stolen = fetch_and_zero(&obj->stolen); - - GEM_BUG_ON(!stolen); - - i915_gem_object_release_memory_region(obj); - - i915_gem_stolen_remove_node(i915, stolen); - kfree(stolen); -} - static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = { .name = "i915_gem_object_stolen", .get_pages = i915_gem_object_get_pages_stolen, .put_pages = i915_gem_object_put_pages_stolen, - .release = i915_gem_object_release_stolen, + .release = i915_gem_object_release_memory_region, };
static struct drm_i915_gem_object * __i915_gem_object_create_stolen(struct intel_memory_region *mem, - struct drm_mm_node *stolen) + resource_size_t size, + unsigned int flags) { static struct lock_class_key lock_class; + struct drm_i915_private *i915 = mem->i915; struct drm_i915_gem_object *obj; - unsigned int cache_level; - int err = -ENOMEM; + + if (!drm_mm_initialized(&mem->stolen)) + return ERR_PTR(-ENODEV); + + if (size == 0) + return ERR_PTR(-EINVAL);
obj = i915_gem_object_alloc(); if (!obj) - goto err; + return ERR_PTR(-ENOMEM);
- drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size); - i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0); + drm_gem_private_object_init(&i915->drm, &obj->base, size); + i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, + flags);
- obj->stolen = stolen; obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT; - cache_level = HAS_LLC(mem->i915) ? I915_CACHE_LLC : I915_CACHE_NONE; - i915_gem_object_set_cache_coherency(obj, cache_level); - - if (WARN_ON(!i915_gem_object_trylock(obj))) { - err = -EBUSY; - goto cleanup; - } - - err = i915_gem_object_pin_pages(obj); - if (err) { - i915_gem_object_unlock(obj); - goto cleanup; - } + obj->cache_level = HAS_LLC(i915) ? I915_CACHE_LLC : I915_CACHE_NONE;
i915_gem_object_init_memory_region(obj, mem); - i915_gem_object_unlock(obj);
return obj; - -cleanup: - i915_gem_object_free(obj); -err: - return ERR_PTR(err); }
static struct drm_i915_gem_object * -_i915_gem_object_create_stolen(struct intel_memory_region *mem, - resource_size_t size, - unsigned int flags) +i915_gem_object_create_stolen_region(struct intel_memory_region *mem, + resource_size_t size, + unsigned int flags) { - struct drm_i915_private *i915 = mem->i915; - struct drm_i915_gem_object *obj; + struct drm_i915_gem_object *obj, *err; struct drm_mm_node *stolen; int ret;
- if (!drm_mm_initialized(&i915->mm.stolen)) - return ERR_PTR(-ENODEV); - - if (size == 0) - return ERR_PTR(-EINVAL); - stolen = kzalloc(sizeof(*stolen), GFP_KERNEL); if (!stolen) return ERR_PTR(-ENOMEM);
- ret = i915_gem_stolen_insert_node(i915, stolen, size, 4096); + ret = i915_gem_stolen_insert_node(mem, stolen, size, + mem->min_page_size); if (ret) { - obj = ERR_PTR(ret); + err = ERR_PTR(ret); goto err_free; }
- obj = __i915_gem_object_create_stolen(mem, stolen); - if (IS_ERR(obj)) + obj = __i915_gem_object_create_stolen(mem, size, + I915_BO_ALLOC_CONTIGUOUS); + if (IS_ERR(obj)) { + err = obj; goto err_remove; + } + + /* must set before pin pages */ + obj->stolen = stolen; + + /* if pinning fails, caller needs to free stolen */ + if (drm_WARN_ON(obj->base.dev, !i915_gem_object_trylock(obj))) { + ret = -EBUSY; + goto free_obj; + } + ret = i915_gem_object_pin_pages(obj); + i915_gem_object_unlock(obj); + if (ret) { + err = ERR_PTR(ret); + goto free_obj; + }
return obj;
+free_obj: + i915_gem_object_put(obj); err_remove: - i915_gem_stolen_remove_node(i915, stolen); + i915_gem_stolen_remove_node(mem, stolen); err_free: kfree(stolen); - return obj; + return err; }
struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915) @@ -728,18 +722,18 @@ static int init_stolen(struct intel_memory_region *mem) * Initialise stolen early so that we may reserve preallocated * objects for the BIOS to KMS transition. */ - return i915_gem_init_stolen(mem->i915); + return i915_gem_init_stolen(mem); }
static void release_stolen(struct intel_memory_region *mem) { - i915_gem_cleanup_stolen(mem->i915); + i915_gem_cleanup_stolen(mem); }
static const struct intel_memory_region_ops i915_region_stolen_ops = { .init = init_stolen, .release = release_stolen, - .create_object = _i915_gem_object_create_stolen, + .create_object = i915_gem_object_create_stolen_region, };
struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915) @@ -761,9 +755,6 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, struct drm_mm_node *stolen; int ret;
- if (!drm_mm_initialized(&i915->mm.stolen)) - return ERR_PTR(-ENODEV); - drm_dbg(&i915->drm, "creating preallocated stolen object: stolen_offset=%pa, size=%pa\n", &stolen_offset, &size); @@ -780,23 +771,27 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915,
stolen->start = stolen_offset; stolen->size = size; - mutex_lock(&i915->mm.stolen_lock); - ret = drm_mm_reserve_node(&i915->mm.stolen, stolen); - mutex_unlock(&i915->mm.stolen_lock); + mutex_lock(&mem->mm_lock); + ret = drm_mm_reserve_node(&mem->stolen, stolen); + mutex_unlock(&mem->mm_lock); if (ret) { obj = ERR_PTR(ret); goto err_free; }
- obj = __i915_gem_object_create_stolen(mem, stolen); + obj = __i915_gem_object_create_stolen(mem, size, + I915_BO_ALLOC_CONTIGUOUS); if (IS_ERR(obj)) goto err_stolen;
+ /* must set before pin pages */ + obj->stolen = stolen; + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); return obj;
err_stolen: - i915_gem_stolen_remove_node(i915, stolen); + i915_gem_stolen_remove_node(mem, stolen); err_free: kfree(stolen); return obj; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h index 67f6264f3ff9..f64a5552e56b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h @@ -11,15 +11,16 @@ struct drm_i915_private; struct drm_mm_node; struct drm_i915_gem_object; +struct intel_memory_region;
-int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv, +int i915_gem_stolen_insert_node(struct intel_memory_region *mem, struct drm_mm_node *node, u64 size, unsigned alignment); -int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv, +int i915_gem_stolen_insert_node_in_range(struct intel_memory_region *mem, struct drm_mm_node *node, u64 size, unsigned alignment, u64 start, u64 end); -void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv, +void i915_gem_stolen_remove_node(struct intel_memory_region *mem, struct drm_mm_node *node); struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index ef5aeebbeeb0..7f4fd49bdd73 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -20,6 +20,7 @@ __igt_reset_stolen(struct intel_gt *gt, { struct i915_ggtt *ggtt = >->i915->ggtt; const struct resource *dsm = >->i915->dsm; + struct intel_memory_region *mem = i915_stolen_region(gt->i915); resource_size_t num_pages, page; struct intel_engine_cs *engine; intel_wakeref_t wakeref; @@ -92,7 +93,7 @@ __igt_reset_stolen(struct intel_gt *gt, ggtt->error_capture.start, PAGE_SIZE);
- if (!__drm_mm_interval_first(>->i915->mm.stolen, + if (!__drm_mm_interval_first(&mem->stolen, page << PAGE_SHIFT, ((page + 1) << PAGE_SHIFT) - 1)) memset32(s, STACK_MAGIC, PAGE_SIZE / sizeof(u32)); @@ -139,7 +140,7 @@ __igt_reset_stolen(struct intel_gt *gt, x = crc32_le(0, in, PAGE_SIZE);
if (x != crc[page] && - !__drm_mm_interval_first(>->i915->mm.stolen, + !__drm_mm_interval_first(&mem->stolen, page << PAGE_SHIFT, ((page + 1) << PAGE_SHIFT) - 1)) { pr_debug("unused stolen page %pa modified by GPU reset\n", diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 13cb4936f15c..1366b53ac8c9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -549,12 +549,6 @@ struct intel_l3_parity { };
struct i915_gem_mm { - /** Memory allocator for GTT stolen memory */ - struct drm_mm stolen; - /** Protects the usage of the GTT stolen memory allocator. This is - * always the inner lock when overlapping with struct_mutex. */ - struct mutex stolen_lock; - /* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */ spinlock_t obj_lock;
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index ed827c770d47..b7a9e34faaf1 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -6,6 +6,7 @@ #ifndef __INTEL_MEMORY_REGION_H__ #define __INTEL_MEMORY_REGION_H__
+#include <drm/drm_mm.h> #include <linux/kref.h> #include <linux/ioport.h> #include <linux/mutex.h> @@ -77,6 +78,8 @@ struct intel_memory_region { /* For fake LMEM */ struct drm_mm_node fake_mappable;
+ struct drm_mm stolen; + struct i915_buddy_mm mm; struct mutex mm_lock;
For some internal device local-memory objects it would be useful to have an option to CPU clear the pages upon gathering the backing store. Note that this might be before the blitter is usable, which is the case for some internal GuC objects.
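A hypothetical caller, just to illustrate the flag (the GuC patch later in the series uses it in the same way):

	/* Illustrative only: pages are zeroed by the CPU at get_pages time */
	obj = i915_gem_object_create_lmem(i915, size,
					  I915_BO_ALLOC_CONTIGUOUS |
					  I915_BO_ALLOC_CPU_CLEAR);
	if (IS_ERR(obj))
		return PTR_ERR(obj);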
Signed-off-by: Matthew Auld matthew.auld@intel.com --- .../gpu/drm/i915/gem/i915_gem_object_types.h | 6 +- drivers/gpu/drm/i915/gem/i915_gem_region.c | 20 +++++ .../drm/i915/selftests/intel_memory_region.c | 90 ++++++++++++++++++- 3 files changed, 113 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 115ad32c303f..8d639509b78b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -166,10 +166,12 @@ struct drm_i915_gem_object { #define I915_BO_ALLOC_CONTIGUOUS BIT(0) #define I915_BO_ALLOC_VOLATILE BIT(1) #define I915_BO_ALLOC_STRUCT_PAGE BIT(2) +#define I915_BO_ALLOC_CPU_CLEAR BIT(3) #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \ I915_BO_ALLOC_VOLATILE | \ - I915_BO_ALLOC_STRUCT_PAGE) -#define I915_BO_READONLY BIT(3) + I915_BO_ALLOC_STRUCT_PAGE | \ + I915_BO_ALLOC_CPU_CLEAR) +#define I915_BO_READONLY BIT(4)
/* * Is the object to be mapped as read-only to the GPU diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 8f352ba6202d..e497ff374b13 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -95,6 +95,26 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) sg_mark_end(sg); i915_sg_trim(st);
+ /* Intended for kernel internal use only */ + if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) { + struct scatterlist *sg; + unsigned long i; + + for_each_sg(st->sgl, sg, st->nents, i) { + unsigned int length; + void __iomem *vaddr; + dma_addr_t daddr; + + daddr = sg_dma_address(sg); + daddr -= mem->region.start; + length = sg_dma_len(sg); + + vaddr = io_mapping_map_wc(&mem->iomap, daddr, length); + memset64(vaddr, 0, length / sizeof(u64)); + io_mapping_unmap(vaddr); + } + } + __i915_gem_object_set_pages(obj, st, sg_page_sizes);
return 0; diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 7acb94e0e5fe..93e067951e0f 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -361,7 +361,7 @@ static int igt_cpu_check(struct drm_i915_gem_object *obj, u32 dword, u32 val) if (err) return err;
- ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); + ptr = i915_gem_object_pin_map(obj, I915_MAP_WC); if (IS_ERR(ptr)) return PTR_ERR(ptr);
@@ -441,7 +441,9 @@ static int igt_gpu_write(struct i915_gem_context *ctx, if (err) break;
+ i915_gem_object_lock(obj, NULL); err = igt_cpu_check(obj, dword, rng); + i915_gem_object_unlock(obj); if (err) break; } while (!__igt_timeout(end_time, NULL)); @@ -542,6 +544,91 @@ static int igt_lmem_create_migrate(void *arg)
return err; } + +static int igt_lmem_create_cleared_cpu(void *arg) +{ + struct drm_i915_private *i915 = arg; + I915_RND_STATE(prng); + IGT_TIMEOUT(end_time); + u32 size, val, i; + int err; + + i915_gem_drain_freed_objects(i915); + + size = max_t(u32, PAGE_SIZE, i915_prandom_u32_max_state(SZ_32M, &prng)); + size = round_up(size, PAGE_SIZE); + i = 0; + + do { + struct drm_i915_gem_object *obj; + void __iomem *vaddr; + unsigned int flags; + unsigned long n; + u32 dword; + + /* + * Alternate between cleared and uncleared allocations, while + * also dirtying the pages each time to check that they either + * remain dirty or are indeed cleared. Allocations should be + * deterministic. + */ + + flags = I915_BO_ALLOC_CPU_CLEAR; + if (i & 1) + flags = 0; + else + val = 0; + + obj = i915_gem_object_create_lmem(i915, size, flags); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + i915_gem_object_lock(obj, NULL); + err = i915_gem_object_pin_pages(obj); + if (err) + goto out_put; + + dword = i915_prandom_u32_max_state(PAGE_SIZE / sizeof(u32), + &prng); + + err = igt_cpu_check(obj, dword, val); + if (err) { + pr_err("%s failed with size=%u, flags=%u\n", + __func__, size, flags); + goto out_unpin; + } + + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + goto out_unpin; + } + + val = prandom_u32_state(&prng); + + for (n = 0; n < obj->base.size >> PAGE_SHIFT; ++n) { + memset32(vaddr + n * PAGE_SIZE, val, + PAGE_SIZE / sizeof(u32)); + } + + i915_gem_object_unpin_map(obj); +out_unpin: + i915_gem_object_unpin_pages(obj); + __i915_gem_object_put_pages(obj); +out_put: + i915_gem_object_unlock(obj); + i915_gem_object_put(obj); + + if (err) + break; + ++i; + } while (!__igt_timeout(end_time, NULL)); + + pr_info("%s completed (%u) iterations\n", __func__, i); + + return err; +} + static int igt_lmem_write_gpu(void *arg) { struct drm_i915_private *i915 = arg; @@ -1125,6 +1212,7 @@ int intel_memory_region_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { SUBTEST(igt_lmem_create), + SUBTEST(igt_lmem_create_cleared_cpu), SUBTEST(igt_lmem_write_cpu), SUBTEST(igt_lmem_write_gpu), SUBTEST(igt_smem_create_migrate),
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
The firmware binary has to be loaded from lmem, and the recommendation is to put all other objects in there as well. Note that we don't fall back to system memory if the allocation in lmem fails: all objects are allocated during driver load, and if we have issues with lmem at that point something is seriously wrong with the system, so there is no point in trying to handle it.
Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Michal Wajdeczko michal.wajdeczko@intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Cc: Radoslaw Szwichtenberg radoslaw.szwichtenberg@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Michal Wajdeczko michal.wajdeczko@intel.com #v1 Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 41 +++++++++++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 8 +++++ drivers/gpu/drm/i915/gt/uc/intel_guc.c | 9 ++++- drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 11 ++++-- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 14 ++++++-- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 35 ++++++++++++++++--- 6 files changed, 107 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index e56874e54fde..71c07e1f6f26 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -215,6 +215,21 @@ i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, return io_mapping_map_atomic_wc(&obj->mm.region->iomap, offset); }
+void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, + unsigned long n, + unsigned long size) +{ + resource_size_t offset; + + GEM_BUG_ON(!i915_gem_object_is_contiguous(obj)); + + offset = i915_gem_object_get_dma_address(obj, n); + offset -= obj->mm.region->region.start; + + return io_mapping_map_wc(&obj->mm.region->iomap, offset, size); +} + bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { struct intel_memory_region *region = obj->mm.region; @@ -229,6 +244,32 @@ bool i915_gem_object_is_devmem(struct drm_i915_gem_object *obj) return region && region->is_devmem; }
+struct drm_i915_gem_object * +i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915, + const void *data, size_t size) +{ + struct drm_i915_gem_object *obj; + void *map; + + obj = i915_gem_object_create_lmem(i915, + round_up(size, PAGE_SIZE), + I915_BO_ALLOC_CONTIGUOUS); + if (IS_ERR(obj)) + return obj; + + map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); + if (IS_ERR(map)) { + i915_gem_object_put(obj); + return map; + } + + memcpy(map, data, size); + + i915_gem_object_unpin_map(obj); + + return obj; +} + struct drm_i915_gem_object * i915_gem_object_create_lmem(struct drm_i915_private *i915, resource_size_t size, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index a1b6a10050bf..e11e0545e39c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -14,6 +14,10 @@ struct intel_memory_region;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, + unsigned long n, + unsigned long size); void __iomem *i915_gem_object_lmem_io_map_page(struct drm_i915_gem_object *obj, unsigned long n); void __iomem * @@ -23,6 +27,10 @@ i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj, bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); bool i915_gem_object_is_devmem(struct drm_i915_gem_object *obj);
+struct drm_i915_gem_object * +i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915, + const void *data, size_t size); + struct drm_i915_gem_object * i915_gem_object_create_lmem(struct drm_i915_private *i915, resource_size_t size, diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index b54b9de31c3e..703726825c50 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -3,6 +3,7 @@ * Copyright © 2014-2019 Intel Corporation */
+#include "gem/i915_gem_lmem.h" #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm_irq.h" @@ -650,7 +651,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) u64 flags; int ret;
- obj = i915_gem_object_create_shmem(gt->i915, size); + if (HAS_LMEM(gt->i915)) + obj = i915_gem_object_create_lmem(gt->i915, size, + I915_BO_ALLOC_CPU_CLEAR | + I915_BO_ALLOC_CONTIGUOUS); + else + obj = i915_gem_object_create_shmem(gt->i915, size); + if (IS_ERR(obj)) return ERR_CAST(obj);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c index f9d0907ea1a5..8790052f1562 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c @@ -41,7 +41,7 @@ static void guc_prepare_xfer(struct intel_uncore *uncore) }
/* Copy RSA signature from the fw image to HW for verification */ -static void guc_xfer_rsa(struct intel_uc_fw *guc_fw, +static int guc_xfer_rsa(struct intel_uc_fw *guc_fw, struct intel_uncore *uncore) { u32 rsa[UOS_RSA_SCRATCH_COUNT]; @@ -49,10 +49,13 @@ static void guc_xfer_rsa(struct intel_uc_fw *guc_fw, int i;
copied = intel_uc_fw_copy_rsa(guc_fw, rsa, sizeof(rsa)); - GEM_BUG_ON(copied < sizeof(rsa)); + if (copied < sizeof(rsa)) + return -ENOMEM;
for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++) intel_uncore_write(uncore, UOS_RSA_SCRATCH(i), rsa[i]); + + return 0; }
/* @@ -142,7 +145,9 @@ int intel_guc_fw_upload(struct intel_guc *guc) * by the DMA engine in one operation, whereas the RSA signature is * loaded via MMIO. */ - guc_xfer_rsa(&guc->fw, uncore); + ret = guc_xfer_rsa(&guc->fw, uncore); + if (ret) + goto out;
/* * Current uCode expects the code to be loaded at 8k; locations below diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 56d2144dc6a0..c70bd024f1e1 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -87,17 +87,25 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); - return PTR_ERR(vaddr); + err = PTR_ERR(vaddr); + goto unpin_out; }
copied = intel_uc_fw_copy_rsa(&huc->fw, vaddr, vma->size); - GEM_BUG_ON(copied < huc->fw.rsa_size); - i915_gem_object_unpin_map(vma->obj);
+ if (copied < huc->fw.rsa_size) { + err = -ENOMEM; + goto unpin_out; + } + huc->rsa_data = vma;
return 0; + +unpin_out: + i915_vma_unpin_and_release(&vma, 0); + return err; }
static void intel_huc_rsa_data_destroy(struct intel_huc *huc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index b05076d190cc..795eca2bd5b4 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -7,6 +7,7 @@ #include <linux/firmware.h> #include <drm/drm_print.h>
+#include "gem/i915_gem_lmem.h" #include "intel_uc_fw.h" #include "intel_uc_fw_abi.h" #include "i915_drv.h" @@ -371,7 +372,11 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw) if (uc_fw->type == INTEL_UC_FW_TYPE_GUC) uc_fw->private_data_size = css->private_data_size;
- obj = i915_gem_object_create_shmem_from_data(i915, fw->data, fw->size); + if (HAS_LMEM(i915)) + obj = i915_gem_object_create_lmem_from_data(i915, fw->data, fw->size); + else + obj = i915_gem_object_create_shmem_from_data(i915, fw->data, fw->size); + if (IS_ERR(obj)) { err = PTR_ERR(obj); goto fail; @@ -420,14 +425,19 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw) .pages = obj->mm.pages, .vm = &ggtt->vm, }; + u32 pte_flags = 0;
GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); GEM_BUG_ON(dummy.node.size > ggtt->uc_fw.size);
/* uc_fw->obj cache domains were not controlled across suspend */ - drm_clflush_sg(dummy.pages); + if (i915_gem_object_has_struct_page(obj)) + drm_clflush_sg(dummy.pages); + + if (i915_gem_object_is_lmem(obj)) + pte_flags |= PTE_LM;
- ggtt->vm.insert_entries(&ggtt->vm, &dummy, I915_CACHE_NONE, 0); + ggtt->vm.insert_entries(&ggtt->vm, &dummy, I915_CACHE_NONE, pte_flags); }
static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw) @@ -592,7 +602,24 @@ size_t intel_uc_fw_copy_rsa(struct intel_uc_fw *uc_fw, void *dst, u32 max_len)
GEM_BUG_ON(!intel_uc_fw_is_available(uc_fw));
- return sg_pcopy_to_buffer(pages->sgl, pages->nents, dst, size, offset); + if (i915_gem_object_is_lmem(uc_fw->obj)) { + unsigned long page_idx = offset >> PAGE_SHIFT; + unsigned int page_off = offset_in_page(offset); + void __iomem *vaddr; + + vaddr = i915_gem_object_lmem_io_map(uc_fw->obj, + page_idx, + page_off + size); + if (!vaddr) + return 0; + + memcpy(dst, vaddr + page_off, size); + io_mapping_unmap(vaddr); + return size; + } else { + return sg_pcopy_to_buffer(pages->sgl, pages->nents, + dst, size, offset); + } }
/**
From: CQ Tang cq.tang@intel.com
Add "REGION_STOLEN" device info to dg1, create stolen memory region from upper portion of local device memory, starting from DSMBASE.
The memory region is marked with "is_devmem=true".
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Balestrieri, Francesco francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 7 +++ drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 56 +++++++++++++++++++++- drivers/gpu/drm/i915/i915_pci.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_memory_region.c | 5 ++ drivers/gpu/drm/i915/intel_memory_region.h | 2 +- 7 files changed, 71 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 71c07e1f6f26..b2fd2bc862c0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -111,8 +111,8 @@ int i915_gem_object_lmem_pread(struct drm_i915_gem_object *obj, return ret; }
-static int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj, - const struct drm_i915_gem_pwrite *arg) +int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *arg) { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct intel_runtime_pm *rpm = &i915->runtime_pm; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index e11e0545e39c..c59aa6c014c7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -11,9 +11,16 @@ struct drm_i915_private; struct drm_i915_gem_object; struct intel_memory_region; +struct drm_i915_gem_pread; +struct drm_i915_gem_pwrite;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+int i915_gem_object_lmem_pread(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pread *args); +int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *args); + void __iomem * i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, unsigned long n, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 0ddf48e472a0..633745336f40 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -10,6 +10,7 @@ #include <drm/drm_mm.h> #include <drm/i915_drm.h>
+#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" #include "i915_drv.h" #include "i915_gem_stolen.h" @@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, } }
+ /* + * With device local memory, we don't need to check the address range, + * this is device memory physical address, could overlap with system + * memory. + */ + if (HAS_LMEM(i915)) + return 0; + /* * Verify that nothing else uses this physical address. Stolen * memory should be reserved by the BIOS and hidden from the @@ -607,7 +616,7 @@ static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj, kfree(pages); }
-static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = { +static struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = { .name = "i915_gem_object_stolen", .get_pages = i915_gem_object_get_pages_stolen, .put_pages = i915_gem_object_put_pages_stolen, @@ -716,7 +725,19 @@ i915_gem_object_create_stolen(struct drm_i915_private *i915,
static int init_stolen(struct intel_memory_region *mem) { - intel_memory_region_set_name(mem, "stolen"); + if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM) + intel_memory_region_set_name(mem, "stolen-system"); + else + intel_memory_region_set_name(mem, "stolen-local"); + + if (HAS_LMEM(mem->i915)) { + i915_gem_object_stolen_ops.pread = i915_gem_object_lmem_pread; + i915_gem_object_stolen_ops.pwrite = i915_gem_object_lmem_pwrite; + if (!io_mapping_init_wc(&mem->iomap, + mem->io_start, + resource_size(&mem->region))) + return -EIO; + }
/* * Initialise stolen early so that we may reserve preallocated @@ -736,8 +757,39 @@ static const struct intel_memory_region_ops i915_region_stolen_ops = { .create_object = i915_gem_object_create_stolen_region, };
+static +struct intel_memory_region *setup_lmem_stolen(struct drm_i915_private *i915) +{ + struct intel_uncore *uncore = &i915->uncore; + struct pci_dev *pdev = i915->drm.pdev; + struct intel_memory_region *mem; + resource_size_t io_start; + resource_size_t lmem_size; + u64 lmem_base; + + lmem_base = intel_uncore_read64(uncore, GEN12_DSMBASE); + lmem_size = pci_resource_len(pdev, 2) - lmem_base; + io_start = pci_resource_start(pdev, 2) + lmem_base; + + mem = intel_memory_region_create(i915, lmem_base, lmem_size, + I915_GTT_PAGE_SIZE_4K, io_start, + &i915_region_stolen_ops); + if (!IS_ERR(mem)) { + DRM_INFO("Intel graphics stolen LMEM: %pR\n", &mem->region); + DRM_INFO("Intel graphics stolen LMEM IO start: %llx\n", + (u64)mem->io_start); + /* this is real device memory */ + mem->is_devmem = true; + } + + return mem; +} + struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915) { + if (HAS_LMEM(i915)) + return setup_lmem_stolen(i915); + return intel_memory_region_create(i915, intel_graphics_stolen_res.start, resource_size(&intel_graphics_stolen_res), diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 8243178a56f9..c3d9b36ef651 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -907,7 +907,7 @@ static const struct intel_device_info rkl_info = {
#define GEN12_DGFX_FEATURES \ GEN12_FEATURES, \ - .memory_regions = REGION_SMEM | REGION_LMEM, \ + .memory_regions = REGION_SMEM | REGION_LMEM | REGION_STOLEN_LMEM, \ .has_master_unit_irq = 1, \ .has_llc = 0, \ .has_snoop = 1, \ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0e01ea0cb0a4..3c8350f108e4 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12067,6 +12067,7 @@ enum skl_power_gate { #define LMEM_ENABLE (1 << 31)
#define GEN12_GSMBASE _MMIO(0x108100) +#define GEN12_DSMBASE _MMIO(0x1080C0)
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 043541d409bd..c7a1d84e7ee8 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -19,6 +19,10 @@ const struct intel_memory_region_info intel_region_map[] = { .class = INTEL_MEMORY_STOLEN_SYSTEM, .instance = 0, }, + [INTEL_REGION_STOLEN_LMEM] = { + .class = INTEL_MEMORY_STOLEN_LOCAL, + .instance = 0, + }, };
struct intel_memory_region * @@ -311,6 +315,7 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) case INTEL_MEMORY_SYSTEM: mem = i915_gem_shmem_setup(i915); break; + case INTEL_MEMORY_STOLEN_LOCAL: /* fallthrough */ case INTEL_MEMORY_STOLEN_SYSTEM: mem = i915_gem_stolen_setup(i915); break; diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index b7a9e34faaf1..8da82cb2afe3 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -93,7 +93,7 @@ struct intel_memory_region { u16 type; u16 instance; enum intel_region_id id; - char name[8]; + char name[16]; struct intel_gt *gt; /* GT closest to this region. */ bool is_devmem; /* true for device memory */
On Fri, 27 Nov 2020, Matthew Auld matthew.auld@intel.com wrote:
From: CQ Tang cq.tang@intel.com
Add "REGION_STOLEN" device info to dg1, create stolen memory region from upper portion of local device memory, starting from DSMBASE.
The memory region is marked with "is_devmem=true".
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Balestrieri, Francesco francesco.balestrieri@intel.com Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Venkata S Dhanalakota venkata.s.dhanalakota@intel.com Cc: Neel Desai neel.desai@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com
drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 7 +++ drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 56 +++++++++++++++++++++- drivers/gpu/drm/i915/i915_pci.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_memory_region.c | 5 ++ drivers/gpu/drm/i915/intel_memory_region.h | 2 +- 7 files changed, 71 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 71c07e1f6f26..b2fd2bc862c0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -111,8 +111,8 @@ int i915_gem_object_lmem_pread(struct drm_i915_gem_object *obj, return ret; }
-static int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj,
-                                       const struct drm_i915_gem_pwrite *arg)
+int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj,
+                                const struct drm_i915_gem_pwrite *arg)
{ struct drm_i915_private *i915 = to_i915(obj->base.dev); struct intel_runtime_pm *rpm = &i915->runtime_pm; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index e11e0545e39c..c59aa6c014c7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -11,9 +11,16 @@ struct drm_i915_private; struct drm_i915_gem_object; struct intel_memory_region; +struct drm_i915_gem_pread; +struct drm_i915_gem_pwrite;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+int i915_gem_object_lmem_pread(struct drm_i915_gem_object *obj,
+                               const struct drm_i915_gem_pread *args);
+int i915_gem_object_lmem_pwrite(struct drm_i915_gem_object *obj,
+                                const struct drm_i915_gem_pwrite *args);
void __iomem * i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, unsigned long n, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 0ddf48e472a0..633745336f40 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -10,6 +10,7 @@ #include <drm/drm_mm.h> #include <drm/i915_drm.h>
+#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" #include "i915_drv.h" #include "i915_gem_stolen.h" @@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, } }
+       /*
+        * With device local memory, we don't need to check the address range,
+        * this is device memory physical address, could overlap with system
+        * memory.
+        */
+       if (HAS_LMEM(i915))
+               return 0;
+
        /*
         * Verify that nothing else uses this physical address. Stolen
         * memory should be reserved by the BIOS and hidden from the
@@ -607,7 +616,7 @@ static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj, kfree(pages); }
-static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = { +static struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
Making driver specific ops non-const seems suspicious...
.name = "i915_gem_object_stolen", .get_pages = i915_gem_object_get_pages_stolen, .put_pages = i915_gem_object_put_pages_stolen, @@ -716,7 +725,19 @@ i915_gem_object_create_stolen(struct drm_i915_private *i915,
static int init_stolen(struct intel_memory_region *mem) {
-       intel_memory_region_set_name(mem, "stolen");
+       if (mem->type == INTEL_MEMORY_STOLEN_SYSTEM)
+               intel_memory_region_set_name(mem, "stolen-system");
+       else
+               intel_memory_region_set_name(mem, "stolen-local");
+
+       if (HAS_LMEM(mem->i915)) {
+               i915_gem_object_stolen_ops.pread = i915_gem_object_lmem_pread;
+               i915_gem_object_stolen_ops.pwrite = i915_gem_object_lmem_pwrite;
...and AFAICT this modifies the ops for all devices, including the integrated GPU, if any of the devices HAS_LMEM().
BR, Jani.
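One const-preserving alternative, sketched purely for illustration (the second ops table and the point where it gets selected are hypothetical; the pread/pwrite hooks are the ones this patch adds):

static const struct drm_i915_gem_object_ops i915_gem_object_stolen_lmem_ops = {
        .name = "i915_gem_object_stolen_lmem",
        .get_pages = i915_gem_object_get_pages_stolen,
        .put_pages = i915_gem_object_put_pages_stolen,
        /* lmem-capable pread/pwrite from this patch */
        .pread = i915_gem_object_lmem_pread,
        .pwrite = i915_gem_object_lmem_pwrite,
};

and have the stolen object-creation path pick between the two tables based on the region type, instead of rewriting a shared table at init time.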
+               if (!io_mapping_init_wc(&mem->iomap,
+                                       mem->io_start,
+                                       resource_size(&mem->region)))
+                       return -EIO;
+       }
        /*
         * Initialise stolen early so that we may reserve preallocated
@@ -736,8 +757,39 @@ static const struct intel_memory_region_ops i915_region_stolen_ops = { .create_object = i915_gem_object_create_stolen_region, };
+static
+struct intel_memory_region *setup_lmem_stolen(struct drm_i915_private *i915)
+{
+       struct intel_uncore *uncore = &i915->uncore;
+       struct pci_dev *pdev = i915->drm.pdev;
+       struct intel_memory_region *mem;
+       resource_size_t io_start;
+       resource_size_t lmem_size;
+       u64 lmem_base;
+
+       lmem_base = intel_uncore_read64(uncore, GEN12_DSMBASE);
+       lmem_size = pci_resource_len(pdev, 2) - lmem_base;
+       io_start = pci_resource_start(pdev, 2) + lmem_base;
+
+       mem = intel_memory_region_create(i915, lmem_base, lmem_size,
+                                        I915_GTT_PAGE_SIZE_4K, io_start,
+                                        &i915_region_stolen_ops);
+       if (!IS_ERR(mem)) {
+               DRM_INFO("Intel graphics stolen LMEM: %pR\n", &mem->region);
+               DRM_INFO("Intel graphics stolen LMEM IO start: %llx\n",
+                        (u64)mem->io_start);
+               /* this is real device memory */
+               mem->is_devmem = true;
+       }
+
+       return mem;
+}
struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915) {
+       if (HAS_LMEM(i915))
+               return setup_lmem_stolen(i915);
+
        return intel_memory_region_create(i915,
                                          intel_graphics_stolen_res.start,
                                          resource_size(&intel_graphics_stolen_res),
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 8243178a56f9..c3d9b36ef651 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -907,7 +907,7 @@ static const struct intel_device_info rkl_info = {
#define GEN12_DGFX_FEATURES \ GEN12_FEATURES, \
- .memory_regions = REGION_SMEM | REGION_LMEM, \
+       .memory_regions = REGION_SMEM | REGION_LMEM | REGION_STOLEN_LMEM, \
        .has_master_unit_irq = 1, \
        .has_llc = 0, \
        .has_snoop = 1, \
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0e01ea0cb0a4..3c8350f108e4 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12067,6 +12067,7 @@ enum skl_power_gate { #define LMEM_ENABLE (1 << 31)
#define GEN12_GSMBASE _MMIO(0x108100) +#define GEN12_DSMBASE _MMIO(0x1080C0)
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 043541d409bd..c7a1d84e7ee8 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -19,6 +19,10 @@ const struct intel_memory_region_info intel_region_map[] = { .class = INTEL_MEMORY_STOLEN_SYSTEM, .instance = 0, },
+       [INTEL_REGION_STOLEN_LMEM] = {
+               .class = INTEL_MEMORY_STOLEN_LOCAL,
+               .instance = 0,
+       },
};
struct intel_memory_region * @@ -311,6 +315,7 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) case INTEL_MEMORY_SYSTEM: mem = i915_gem_shmem_setup(i915); break;
+       case INTEL_MEMORY_STOLEN_LOCAL: /* fallthrough */
        case INTEL_MEMORY_STOLEN_SYSTEM:
                mem = i915_gem_stolen_setup(i915);
                break;
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index b7a9e34faaf1..8da82cb2afe3 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -93,7 +93,7 @@ struct intel_memory_region { u16 type; u16 instance; enum intel_region_id id;
- char name[8];
+       char name[16];
        struct intel_gt *gt; /* GT closest to this region. */
        bool is_devmem; /* true for device memory */
From: Anusha Srivatsa anusha.srivatsa@intel.com
In the scenario where local memory is available, we rely on direct CPU access via lmem instead of the aperture.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Dhinakaran Pandiyan dhinakaran.pandiyan@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Anusha Srivatsa anusha.srivatsa@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/display/intel_fbdev.c | 23 +++++++++++++++------- drivers/gpu/drm/i915/i915_vma.c | 19 ++++++++++++------ 2 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 831e99e0785c..65539fab6269 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -41,6 +41,7 @@ #include <drm/drm_fb_helper.h> #include <drm/drm_fourcc.h>
+#include "gem/i915_gem_lmem.h" #include "i915_drv.h" #include "intel_display_types.h" #include "intel_fbdev.h" @@ -137,14 +138,22 @@ static int intelfb_alloc(struct drm_fb_helper *helper, size = mode_cmd.pitches[0] * mode_cmd.height; size = PAGE_ALIGN(size);
- /* If the FB is too big, just don't use it since fbdev is not very - * important and we should probably use that space with FBC or other - * features. */ obj = ERR_PTR(-ENODEV); - if (size * 2 < dev_priv->stolen_usable_size) - obj = i915_gem_object_create_stolen(dev_priv, size); - if (IS_ERR(obj)) - obj = i915_gem_object_create_shmem(dev_priv, size); + if (HAS_LMEM(dev_priv)) { + obj = i915_gem_object_create_lmem(dev_priv, size, + I915_BO_ALLOC_CONTIGUOUS); + } else { + /* + * If the FB is too big, just don't use it since fbdev is not very + * important and we should probably use that space with FBC or other + * features. + */ + if (size * 2 < dev_priv->stolen_usable_size) + obj = i915_gem_object_create_stolen(dev_priv, size); + if (IS_ERR(obj)) + obj = i915_gem_object_create_shmem(dev_priv, size); + } + if (IS_ERR(obj)) { drm_err(&dev_priv->drm, "failed to allocate framebuffer\n"); return PTR_ERR(obj); diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 82f60cc43a90..59fe82af48b2 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -27,6 +27,7 @@
#include "display/intel_frontbuffer.h"
+#include "gem/i915_gem_lmem.h" #include "gt/intel_engine.h" #include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" @@ -448,9 +449,11 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) void __iomem *ptr; int err;
- if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { - err = -ENODEV; - goto err; + if (!i915_gem_object_is_devmem(vma->obj)) { + if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { + err = -ENODEV; + goto err; + } }
GEM_BUG_ON(!i915_vma_is_ggtt(vma)); @@ -458,9 +461,13 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
ptr = READ_ONCE(vma->iomap); if (ptr == NULL) { - ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap, - vma->node.start, - vma->node.size); + if (i915_gem_object_is_devmem(vma->obj)) + ptr = i915_gem_object_lmem_io_map(vma->obj, 0, + vma->obj->base.size); + else + ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap, + vma->node.start, + vma->node.size); if (ptr == NULL) { err = -ENOMEM; goto err;
From: Animesh Manna animesh.manna@intel.com
A newly created lmem buffer for fbdev needs to be reset, otherwise it contains old garbage data. The same logic was already present for stolen memory; extend it to lmem.
Cc: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Animesh Manna animesh.manna@intel.com --- drivers/gpu/drm/i915/display/intel_fbdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 65539fab6269..6bd3bbe42bf0 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -280,7 +280,7 @@ static int intelfb_create(struct drm_fb_helper *helper, * If the object is stolen however, it will be full of whatever * garbage was left in there. */ - if (vma->obj->stolen && !prealloc) + if ((vma->obj->stolen || HAS_LMEM(dev_priv)) && !prealloc) memset_io(info->screen_base, 0, info->screen_size);
/* Use default scratch pixmap (info->pixmap.flags = FB_PIXMAP_SYSTEM) */
From: Animesh Manna animesh.manna@intel.com
For dgfx, the DSB should use local memory instead of system memory. Local memory is closer to the GPU, so using it brings a performance improvement, and it also avoids multiple GPUs contending for system memory.
Use LMEM API to create gem object needed for DSB command buffer.
Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com Signed-off-by: Animesh Manna animesh.manna@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/display/intel_dsb.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dsb.c b/drivers/gpu/drm/i915/display/intel_dsb.c index 857126822a88..73795e415ad5 100644 --- a/drivers/gpu/drm/i915/display/intel_dsb.c +++ b/drivers/gpu/drm/i915/display/intel_dsb.c @@ -6,6 +6,7 @@
#include "i915_drv.h" #include "intel_display_types.h" +#include "gem/i915_gem_lmem.h"
#define DSB_BUF_SIZE (2 * PAGE_SIZE)
@@ -278,7 +279,11 @@ void intel_dsb_prepare(struct intel_crtc_state *crtc_state)
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
- obj = i915_gem_object_create_internal(i915, DSB_BUF_SIZE); + if (HAS_LMEM(i915)) + obj = i915_gem_object_create_lmem(i915, DSB_BUF_SIZE, 0); + else + obj = i915_gem_object_create_internal(i915, DSB_BUF_SIZE); + if (IS_ERR(obj)) { drm_err(&i915->drm, "Gem object creation failed\n"); kfree(dsb);
From: Abdiel Janulgue abdiel.janulgue@linux.intel.com
In the following patch we need to reserve regions inaccessible to the driver during initialization, so add back mem->reserved for collecting such regions.
Cc: Imre Deak imre.deak@intel.com Signed-off-by: Abdiel Janulgue abdiel.janulgue@linux.intel.com --- drivers/gpu/drm/i915/intel_memory_region.c | 2 + drivers/gpu/drm/i915/intel_memory_region.h | 2 + .../drm/i915/selftests/intel_memory_region.c | 89 +++++++++++++++++++ 3 files changed, 93 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index c7a1d84e7ee8..554fdd7735a8 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -203,6 +203,7 @@ int intel_memory_region_init_buddy(struct intel_memory_region *mem)
void intel_memory_region_release_buddy(struct intel_memory_region *mem) { + i915_buddy_free_list(&mem->mm, &mem->reserved); i915_buddy_fini(&mem->mm); }
@@ -232,6 +233,7 @@ intel_memory_region_create(struct drm_i915_private *i915, mutex_init(&mem->objects.lock); INIT_LIST_HEAD(&mem->objects.list); INIT_LIST_HEAD(&mem->objects.purgeable); + INIT_LIST_HEAD(&mem->reserved);
mutex_init(&mem->mm_lock);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 8da82cb2afe3..0bfc1fa36f74 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -97,6 +97,8 @@ struct intel_memory_region { struct intel_gt *gt; /* GT closest to this region. */ bool is_devmem; /* true for device memory */
+ struct list_head reserved; + dma_addr_t remap_addr;
struct { diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 93e067951e0f..9df0a4f657c1 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -134,6 +134,94 @@ static void igt_object_release(struct drm_i915_gem_object *obj) i915_gem_object_put(obj); }
+static int igt_reserve_range(struct intel_memory_region *mem, + struct list_head *reserved, + u64 offset, + u64 size) +{ + int ret; + LIST_HEAD(blocks); + + ret = i915_buddy_alloc_range(&mem->mm, &blocks, offset, size); + if (!ret) + list_splice_tail(&blocks, reserved); + + return ret; +} + +static int igt_mock_reserve(void *arg) +{ + struct drm_i915_gem_object *obj; + struct intel_memory_region *mem = arg; + resource_size_t avail = resource_size(&mem->region); + I915_RND_STATE(prng); + LIST_HEAD(objects); + LIST_HEAD(reserved); + u32 i, offset, count, *order; + u64 allocated, cur_avail; + const u32 chunk_size = SZ_32M; + int err = 0; + + count = avail / chunk_size; + order = i915_random_order(count, &prng); + if (!order) + return 0; + + /* Reserve a bunch of ranges within the region */ + for (i = 0; i < count; ++i) { + u64 start = order[i] * chunk_size; + u64 size = i915_prandom_u32_max_state(chunk_size, &prng); + + /* Allow for some really big holes */ + if (!size) + continue; + + size = round_up(size, PAGE_SIZE); + offset = igt_random_offset(&prng, 0, chunk_size, size, + PAGE_SIZE); + + err = igt_reserve_range(mem, &reserved, start + offset, size); + if (err) { + pr_err("%s failed to reserve range", __func__); + goto out_close; + } + + /* XXX: maybe sanity check the block range here? */ + avail -= size; + } + + /* Try to see if we can allocate from the remaining space */ + allocated = 0; + cur_avail = avail; + do { + u64 size = i915_prandom_u32_max_state(cur_avail, &prng); + + size = max_t(u64, round_up(size, PAGE_SIZE), (u64)PAGE_SIZE); + obj = igt_object_create(mem, &objects, size, 0); + + if (IS_ERR(obj)) { + if (PTR_ERR(obj) == -ENXIO) + break; + + err = PTR_ERR(obj); + goto out_close; + } + cur_avail -= size; + allocated += size; + } while (1); + + if (allocated != avail) { + pr_err("%s mismatch between allocation and free space", __func__); + err = -EINVAL; + } + +out_close: + kfree(order); + close_objects(mem, &objects); + i915_buddy_free_list(&mem->mm, &reserved); + return err; +} + static int igt_mock_contiguous(void *arg) { struct intel_memory_region *mem = arg; @@ -1180,6 +1268,7 @@ static int igt_lmem_pages_migrate(void *arg) int intel_memory_region_mock_selftests(void) { static const struct i915_subtest tests[] = { + SUBTEST(igt_mock_reserve), SUBTEST(igt_mock_fill), SUBTEST(igt_mock_contiguous), SUBTEST(igt_mock_splintered_region),
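The usage pattern the new selftest exercises, in miniature (a sketch with error handling elided; both helpers are the ones used in the diff above):

LIST_HEAD(reserved);

/* carve a range out of the region's buddy allocator ... */
err = i915_buddy_alloc_range(&mem->mm, &reserved, offset, size);

/* ... and hand the blocks back when the region goes away,
 * as intel_memory_region_release_buddy() now does for mem->reserved */
if (!err)
        i915_buddy_free_list(&mem->mm, &reserved);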
From: Imre Deak imre.deak@intel.com
On DG1 A0/B0 steppings the first 1MB of local memory must be reserved. One reason for this is that the 0xA0000-0xB0000 range is not accessible by the display, probably since this region is redirected to another memory location for legacy VGA compatibility.
BSpec: 50586 Testcase: igt/kms_big_fb/linear-64bpp-rotate-0 Signed-off-by: Imre Deak imre.deak@intel.com --- drivers/gpu/drm/i915/intel_region_lmem.c | 52 ++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index 939cf0d195a5..eafef7034680 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -137,6 +137,48 @@ intel_setup_fake_lmem(struct drm_i915_private *i915) return mem; }
+static void get_legacy_lowmem_region(struct intel_uncore *uncore, + u64 *start, u32 *size) +{ + *start = 0; + *size = 0; + + if (!IS_DG1_REVID(uncore->i915, DG1_REVID_A0, DG1_REVID_B0)) + return; + + *size = SZ_1M; + + DRM_DEBUG_DRIVER("LMEM: reserved legacy low-memory [0x%llx-0x%llx]\n", + *start, *start + *size); +} + +static int reserve_lowmem_region(struct intel_uncore *uncore, + struct intel_memory_region *mem) +{ + u64 reserve_start; + u64 reserve_end; + u64 region_start; + u32 region_size; + int ret; + + get_legacy_lowmem_region(uncore, &region_start, &region_size); + reserve_start = region_start; + reserve_end = region_start + region_size; + + if (!reserve_end) + return 0; + + DRM_INFO("LMEM: reserving low-memory region [0x%llx-0x%llx]\n", + reserve_start, reserve_end); + ret = i915_buddy_alloc_range(&mem->mm, &mem->reserved, + reserve_start, + reserve_end - reserve_start); + if (ret) + DRM_ERROR("LMEM: reserving low memory region failed\n"); + + return ret; +} + static struct intel_memory_region * setup_lmem(struct drm_i915_private *dev_priv) { @@ -160,6 +202,16 @@ setup_lmem(struct drm_i915_private *dev_priv) I915_GTT_PAGE_SIZE_4K, io_start, &intel_region_lmem_ops); + if (!IS_ERR(mem)) { + int err; + + err = reserve_lowmem_region(uncore, mem); + if (err) { + intel_memory_region_put(mem); + return ERR_PTR(err); + } + } + if (!IS_ERR(mem)) { DRM_INFO("Intel graphics LMEM: %pR\n", &mem->region); DRM_INFO("Intel graphics LMEM IO start: %llx\n",
Quoting Matthew Auld (2020-11-27 12:06:34)
From: Imre Deak imre.deak@intel.com
On DG1 A0/B0 steppings the first 1MB of local memory must be reserved. One reason for this is that the 0xA0000-0xB0000 range is not accessible by the display, probably since this region is redirected to another memory location for legacy VGA compatibility.
BSpec: 50586 Testcase: igt/kms_big_fb/linear-64bpp-rotate-0 Signed-off-by: Imre Deak imre.deak@intel.com
drivers/gpu/drm/i915/intel_region_lmem.c | 52 ++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index 939cf0d195a5..eafef7034680 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -137,6 +137,48 @@ intel_setup_fake_lmem(struct drm_i915_private *i915) return mem; }
+static void get_legacy_lowmem_region(struct intel_uncore *uncore,
+                                    u64 *start, u32 *size)
+{
+       *start = 0;
+       *size = 0;
+
+       if (!IS_DG1_REVID(uncore->i915, DG1_REVID_A0, DG1_REVID_B0))
+               return;
+
+       *size = SZ_1M;
+
+       DRM_DEBUG_DRIVER("LMEM: reserved legacy low-memory [0x%llx-0x%llx]\n",
+                        *start, *start + *size);
+}
+
+static int reserve_lowmem_region(struct intel_uncore *uncore,
+                                struct intel_memory_region *mem)
+{
+       u64 reserve_start;
+       u64 reserve_end;
+       u64 region_start;
+       u32 region_size;
+       int ret;
+
+       get_legacy_lowmem_region(uncore, &region_start, &region_size);
+       reserve_start = region_start;
+       reserve_end = region_start + region_size;
+
+       if (!reserve_end)
+               return 0;
+
+       DRM_INFO("LMEM: reserving low-memory region [0x%llx-0x%llx]\n",
+                reserve_start, reserve_end);
+       ret = i915_buddy_alloc_range(&mem->mm, &mem->reserved,
+                                    reserve_start,
+                                    reserve_end - reserve_start);
Isn't this now relative to the stolen offset? Should this be reserved, or excluded like stolen? -Chris
On 27/11/2020 13:52, Chris Wilson wrote:
Quoting Matthew Auld (2020-11-27 12:06:34)
From: Imre Deak imre.deak@intel.com
On DG1 A0/B0 steppings the first 1MB of local memory must be reserved. One reason for this is that the 0xA0000-0xB0000 range is not accessible by the display, probably since this region is redirected to another memory location for legacy VGA compatibility.
BSpec: 50586 Testcase: igt/kms_big_fb/linear-64bpp-rotate-0 Signed-off-by: Imre Deak imre.deak@intel.com
drivers/gpu/drm/i915/intel_region_lmem.c | 52 ++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index 939cf0d195a5..eafef7034680 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -137,6 +137,48 @@ intel_setup_fake_lmem(struct drm_i915_private *i915) return mem; }
+static void get_legacy_lowmem_region(struct intel_uncore *uncore,
+                                    u64 *start, u32 *size)
+{
+       *start = 0;
+       *size = 0;
+
+       if (!IS_DG1_REVID(uncore->i915, DG1_REVID_A0, DG1_REVID_B0))
+               return;
+
+       *size = SZ_1M;
+
+       DRM_DEBUG_DRIVER("LMEM: reserved legacy low-memory [0x%llx-0x%llx]\n",
+                        *start, *start + *size);
+}
+
+static int reserve_lowmem_region(struct intel_uncore *uncore,
+                                struct intel_memory_region *mem)
+{
+       u64 reserve_start;
+       u64 reserve_end;
+       u64 region_start;
+       u32 region_size;
+       int ret;
+
+       get_legacy_lowmem_region(uncore, &region_start, &region_size);
+       reserve_start = region_start;
+       reserve_end = region_start + region_size;
+
+       if (!reserve_end)
+               return 0;
+
+       DRM_INFO("LMEM: reserving low-memory region [0x%llx-0x%llx]\n",
+                reserve_start, reserve_end);
+       ret = i915_buddy_alloc_range(&mem->mm, &mem->reserved,
+                                    reserve_start,
+                                    reserve_end - reserve_start);
Isn't this now relative to the stolen offset? Should this be reserved, or excluded like stolen?
AFAIK stolen is just snipped off at the end of lmem, so I don't think it really matters if we exclude or reserve. But for this if we exclude then the region.start might have "strange" alignment, which is annoying since alloc(some_power_of_two) might not give us the expected alignment, whereas if we reserve then the allocator is aware, and so we should get the proper alignment. Maybe you have better ideas with how to handle this, but I think keeping the alignment property is nice.
-Chris
Quoting Matthew Auld (2020-11-30 11:09:57)
On 27/11/2020 13:52, Chris Wilson wrote:
Quoting Matthew Auld (2020-11-27 12:06:34)
From: Imre Deak imre.deak@intel.com
On DG1 A0/B0 steppings the first 1MB of local memory must be reserved. One reason for this is that the 0xA0000-0xB0000 range is not accessible by the display, probably since this region is redirected to another memory location for legacy VGA compatibility.
BSpec: 50586 Testcase: igt/kms_big_fb/linear-64bpp-rotate-0 Signed-off-by: Imre Deak imre.deak@intel.com
drivers/gpu/drm/i915/intel_region_lmem.c | 52 ++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index 939cf0d195a5..eafef7034680 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -137,6 +137,48 @@ intel_setup_fake_lmem(struct drm_i915_private *i915) return mem; }
+static void get_legacy_lowmem_region(struct intel_uncore *uncore,
+                                    u64 *start, u32 *size)
+{
+       *start = 0;
+       *size = 0;
+
+       if (!IS_DG1_REVID(uncore->i915, DG1_REVID_A0, DG1_REVID_B0))
+               return;
+
+       *size = SZ_1M;
+
+       DRM_DEBUG_DRIVER("LMEM: reserved legacy low-memory [0x%llx-0x%llx]\n",
+                        *start, *start + *size);
+}
+
+static int reserve_lowmem_region(struct intel_uncore *uncore,
+                                struct intel_memory_region *mem)
+{
+       u64 reserve_start;
+       u64 reserve_end;
+       u64 region_start;
+       u32 region_size;
+       int ret;
+
+       get_legacy_lowmem_region(uncore, &region_start, &region_size);
+       reserve_start = region_start;
+       reserve_end = region_start + region_size;
+
+       if (!reserve_end)
+               return 0;
+
+       DRM_INFO("LMEM: reserving low-memory region [0x%llx-0x%llx]\n",
+                reserve_start, reserve_end);
+       ret = i915_buddy_alloc_range(&mem->mm, &mem->reserved,
+                                    reserve_start,
+                                    reserve_end - reserve_start);
Isn't this now relative to the stolen offset? Should this be reserved, or excluded like stolen?
AFAIK stolen is just snipped off at the end of lmem, so I don't think it really matters if we exclude or reserve.
Right, misread, thought it was moving the start point.
But for this if we exclude then the region.start might have "strange" alignment, which is annoying since alloc(some_power_of_two) might not give us the expected alignment, whereas if we reserve then the allocator is aware, and so we should get the proper alignment. Maybe you have better ideas with how to handle this, but I think keeping the alignment property is nice.
The only tweak I would look at is making this reservation be the property of the VGA decode. But if this promises not to live into production, kiss. -Chris
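To make the alignment point in this thread concrete, a sketch relying only on the buddy property that a block of size 2^n is 2^n-aligned relative to region.start (the addresses are made up):

/*
 * reserve:  region.start = 0, carve out [0, 1M) via i915_buddy_alloc_range();
 *           a later 2M allocation can only land on a 2M boundary,
 *           e.g. at 2M -> 2M-aligned in the device address space.
 *
 * exclude:  region.start = 1M; the allocator's offset 0 now sits at
 *           absolute 1M, so a "2M-aligned" block at offset 0 ends up
 *           only 1M-aligned in the device address space.
 */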
From: Clint Taylor clinton.a.taylor@intel.com
Read the OPROM over the SPI controller through MMIO and find the VBT entry there, since we can't use the OpRegion, and PCI mapping may not work on some systems because the BIOS does not leave the Option ROM mapped.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Tomas Winkler tomas.winkler@intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/display/intel_bios.c | 80 +++++++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 8 +++ 2 files changed, 82 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index 4cc949b228f2..91044fc52acb 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2086,6 +2086,66 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size) return vbt; }
+static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *dev_priv) +{ + u32 count, data, found, store = 0; + u32 static_region, oprom_offset; + u32 oprom_size = 0x200000; + u16 vbt_size; + u32 *vbt; + + static_region = I915_READ(SPI_STATIC_REGIONS); + static_region &= OPTIONROM_SPI_REGIONID_MASK; + I915_WRITE(PRIMARY_SPI_REGIONID, static_region); + + oprom_offset = I915_READ(OROM_OFFSET); + oprom_offset &= OROM_OFFSET_MASK; + + for (count = 0; count < oprom_size; count += 4) { + I915_WRITE(PRIMARY_SPI_ADDRESS, oprom_offset + count); + data = I915_READ(PRIMARY_SPI_TRIGGER); + + if (data == *((const u32 *)"$VBT")) { + found = oprom_offset + count; + break; + } + } + + if (count >= oprom_size) + goto err_not_found; + + /* Get VBT size and allocate space for the VBT */ + I915_WRITE(PRIMARY_SPI_ADDRESS, found + + offsetof(struct vbt_header, vbt_size)); + vbt_size = I915_READ(PRIMARY_SPI_TRIGGER); + vbt_size &= 0xffff; + + vbt = kzalloc(vbt_size, GFP_KERNEL); + if (!vbt) { + DRM_ERROR("Unable to allocate %u bytes for VBT storage\n", + vbt_size); + goto err_not_found; + } + + for (count = 0; count < vbt_size; count += 4) { + I915_WRITE(PRIMARY_SPI_ADDRESS, found + count); + data = I915_READ(PRIMARY_SPI_TRIGGER); + *(vbt + store++) = data; + } + + if (!intel_bios_is_valid_vbt(vbt, vbt_size)) + goto err_free_vbt; + + DRM_DEBUG_KMS("Found valid VBT in SPI flash\n"); + + return (struct vbt_header *)vbt; + +err_free_vbt: + kfree(vbt); +err_not_found: + return NULL; +} + static struct vbt_header *oprom_get_vbt(struct drm_i915_private *dev_priv) { struct pci_dev *pdev = dev_priv->drm.pdev; @@ -2135,6 +2195,8 @@ static struct vbt_header *oprom_get_vbt(struct drm_i915_private *dev_priv)
pci_unmap_rom(pdev, oprom);
+ DRM_DEBUG_KMS("Found valid VBT in PCI ROM\n"); + return vbt;
err_free_vbt: @@ -2169,17 +2231,23 @@ void intel_bios_init(struct drm_i915_private *dev_priv)
init_vbt_defaults(dev_priv);
- /* If the OpRegion does not have VBT, look in PCI ROM. */ + /* + * If the OpRegion does not have VBT, look in SPI flash through MMIO or + * PCI mapping + */ + if (!vbt && IS_DGFX(dev_priv)) { + oprom_vbt = spi_oprom_get_vbt(dev_priv); + vbt = oprom_vbt; + } + if (!vbt) { oprom_vbt = oprom_get_vbt(dev_priv); - if (!oprom_vbt) - goto out; - vbt = oprom_vbt; - - drm_dbg_kms(&dev_priv->drm, "Found valid VBT in PCI ROM\n"); }
+ if (!vbt) + goto out; + bdb = get_bdb_header(vbt);
drm_dbg_kms(&dev_priv->drm, diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 3c8350f108e4..f00289574ac8 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12413,6 +12413,14 @@ enum skl_power_gate { #define DP_PIN_ASSIGNMENT_MASK(idx) (0xf << ((idx) * 4)) #define DP_PIN_ASSIGNMENT(idx, x) ((x) << ((idx) * 4))
+#define PRIMARY_SPI_TRIGGER _MMIO(0x102040) +#define PRIMARY_SPI_ADDRESS _MMIO(0x102080) +#define PRIMARY_SPI_REGIONID _MMIO(0x102084) +#define SPI_STATIC_REGIONS _MMIO(0x102090) +#define OPTIONROM_SPI_REGIONID_MASK REG_GENMASK(7, 0) +#define OROM_OFFSET _MMIO(0x1020c0) +#define OROM_OFFSET_MASK REG_GENMASK(20, 16) + /* This register controls the Display State Buffer (DSB) engines. */ #define _DSBSL_INSTANCE_BASE 0x70B00 #define DSBSL_INSTANCE(pipe, id) (_DSBSL_INSTANCE_BASE + \
On Fri, 27 Nov 2020, Matthew Auld matthew.auld@intel.com wrote:
- DRM_DEBUG_KMS("Found valid VBT in SPI flash\n");
Please use drm_dbg_kms() and friends throughout the series. We don't want new users of DRM_DEBUG* in the driver.
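For instance, the message this patch adds would become (same string, just the device-aware variant already used elsewhere in intel_bios.c):

-	DRM_DEBUG_KMS("Found valid VBT in SPI flash\n");
+	drm_dbg_kms(&dev_priv->drm, "Found valid VBT in SPI flash\n");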
BR, Jani.
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize the OPROM header, CPD signature and OPROM PCI version. The OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB structures and PCI struct offsets are provided by the GSC counterparts; these are yet to be documented in BSpec. After successful sanitization, extract the VBT from the opregion image.
Cc: Jani Nikula jani.nikula@intel.com Cc: Uma Shankar uma.shankar@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com --- drivers/gpu/drm/i915/display/intel_bios.c | 49 +++-- drivers/gpu/drm/i915/display/intel_opregion.c | 169 ++++++++++++++++++ drivers/gpu/drm/i915/display/intel_opregion.h | 31 +++- 3 files changed, 221 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index 91044fc52acb..358576bc0be2 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2088,37 +2088,36 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *dev_priv) { - u32 count, data, found, store = 0; - u32 static_region, oprom_offset; - u32 oprom_size = 0x200000; - u16 vbt_size; - u32 *vbt; - - static_region = I915_READ(SPI_STATIC_REGIONS); - static_region &= OPTIONROM_SPI_REGIONID_MASK; - I915_WRITE(PRIMARY_SPI_REGIONID, static_region); + u32 count, found; + u32 *vbt, *oprom_opreg = NULL; + u16 vbt_size, opreg_size; + u8 *parse_ptr;
- oprom_offset = I915_READ(OROM_OFFSET); - oprom_offset &= OROM_OFFSET_MASK; + if (intel_oprom_verify_signature(&oprom_opreg, &opreg_size, dev_priv)) { + drm_err(&dev_priv->drm, "oprom signature verification failed\n"); + goto err_not_found; + }
- for (count = 0; count < oprom_size; count += 4) { - I915_WRITE(PRIMARY_SPI_ADDRESS, oprom_offset + count); - data = I915_READ(PRIMARY_SPI_TRIGGER); + if (!oprom_opreg) { + drm_err(&dev_priv->drm, "opregion not found\n"); + goto err_not_found; + }
- if (data == *((const u32 *)"$VBT")) { - found = oprom_offset + count; + for (count = 0; count < opreg_size; count += 4) { + if (oprom_opreg[count / 4] == *((const u32 *)"$VBT")) { + found = count; break; } }
- if (count >= oprom_size) + if (count >= opreg_size) { + drm_err(&dev_priv->drm, "VBT not found in opregion\n"); goto err_not_found; + }
/* Get VBT size and allocate space for the VBT */ - I915_WRITE(PRIMARY_SPI_ADDRESS, found + - offsetof(struct vbt_header, vbt_size)); - vbt_size = I915_READ(PRIMARY_SPI_TRIGGER); - vbt_size &= 0xffff; + parse_ptr = (u8 *)oprom_opreg + found; + vbt_size = ((struct vbt_header *)parse_ptr)->vbt_size;
vbt = kzalloc(vbt_size, GFP_KERNEL); if (!vbt) { @@ -2127,16 +2126,12 @@ static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *dev_priv) goto err_not_found; }
- for (count = 0; count < vbt_size; count += 4) { - I915_WRITE(PRIMARY_SPI_ADDRESS, found + count); - data = I915_READ(PRIMARY_SPI_TRIGGER); - *(vbt + store++) = data; - } - + memcpy(vbt, parse_ptr, vbt_size); if (!intel_bios_is_valid_vbt(vbt, vbt_size)) goto err_free_vbt;
DRM_DEBUG_KMS("Found valid VBT in SPI flash\n"); + kfree(oprom_opreg);
return (struct vbt_header *)vbt;
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c index 4f77cf849171..81e5946393dd 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.c +++ b/drivers/gpu/drm/i915/display/intel_opregion.c @@ -983,6 +983,175 @@ int intel_opregion_setup(struct drm_i915_private *dev_priv) return err; }
+static int oprom_image_parse_helper(u8 *parse_ptr, u8 *last_img, u8 *code_type, + struct drm_i915_private *i915) +{ + u8 size_512_bytes; + + if (((union oprom_header *)parse_ptr)->signature != OPROM_IMAGE_MAGIC) { + drm_err(&i915->drm, "Wrong OPROM header signature.\n"); + return -EINVAL; + } + + size_512_bytes = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_IMAGE_LENGTH_OFFSET]; + *code_type = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_CODE_TYPE_OFFSET]; + *last_img = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_LAST_IMAGE_INDICATOR_OFFSET]; + + return size_512_bytes; +} + +static void spi_read_oprom_helper(size_t len, u32 offset, u32 *buf, + struct drm_i915_private *dev_priv) +{ + u32 count, data; + + for (count = 0; count < len; count += 4) { + I915_WRITE(PRIMARY_SPI_ADDRESS, offset + count); + data = I915_READ(PRIMARY_SPI_TRIGGER); + buf[count / 4] = data; + } +} + +/** + * + DASH+G OPROM IMAGE LAYOUT + + * +--------+-------+---------------------------+ + * | Offset | Value | ROM Header Fields +-----> Image 1 (CSS) + * +--------------------------------------------+ + * | 0h | 55h | ROM Signature Byte1 | + * | 1h | AAh | ROM Signature Byte2 | + * | 2h | xx | Reserved | + * | 18+19h| xx | Ptr to PCI DataStructure | + * +----------------+---------------------------+ + * | PCI Data Structure | + * +--------------------------------------------+ + * | . . . | + * | . . . | + * | 10 + xx + Image Length | + * | 14 + xx + Code Type | + * | 15 + xx + Last Image Indicator | + * | . . . | + * +--------------------------------------------+ + * | MEU BLOB | + * +--------------------------------------------+ + * | CPD Header | + * | CPD Entry | + * | Reserved | + * | SignedDataPart1 | + * | PublicKey | + * | RSA Signature | + * | SignedDataPart2 | + * | IFWI Metadata | + * +--------+-------+---------------------------+ + * | . | . | . | + * | . | . | . | + * +--------------------------------------------+ + * | Offset | Value | ROM Header Fields +-----> Image 2 (Config Data) (Offset: 0x800) + * +--------------------------------------------+ + * | 0h | 55h | ROM Signature Byte1 | + * | 1h | AAh | ROM Signature Byte2 | + * | 2h | xx | Reserved | + * | 18+19h| xx | Ptr to PCI DataStructure | + * +----------------+---------------------------+ + * | PCI Data Structure | + * +--------------------------------------------+ + * | . . . | + * | . . . | + * | 10 + xx + Image Length | + * | 14 + xx + Code Type | + * | 15 + xx + Last Image Indicator | + * | . . . | + * | 1A + 3C + Ptr to Opregion Signature | + * | . . . | + * | . . . | + * | 83Ch + IntelGraphicsMem | <---+ Opregion Signature + * +--------+-----------------------------------+ + * + * intel_oprom_verify_signature() - verifies the OPROM signature. + * @opreg: pointer to opregion buffer output. + * @opreg_size: pointer to opregion size output. + * @dev_priv: i915 device.
+ */ +int +intel_oprom_verify_signature(u32 **opreg, u16 *opreg_size, + struct drm_i915_private *dev_priv) +{ + u8 img_sig[sizeof(OPREGION_SIGNATURE)]; + u8 code_type, last_img; + u32 static_region, offset; + u32 *oprom_img, *oprom_img_hdr; + u16 opreg_base, img_len; + u8 *parse_ptr; + int img_size; + int ret = -EINVAL; + + /* initialize SPI to read the OPROM */ + static_region = I915_READ(SPI_STATIC_REGIONS); + static_region &= OPTIONROM_SPI_REGIONID_MASK; + I915_WRITE(PRIMARY_SPI_REGIONID, static_region); + /* read OPROM offset in SPI flash */ + offset = I915_READ(OROM_OFFSET); + offset &= OROM_OFFSET_MASK; + + oprom_img_hdr = kzalloc(OPROM_INITIAL_READ_SIZE, GFP_KERNEL); + if (!oprom_img_hdr) + return -ENOMEM; + + do { + spi_read_oprom_helper(OPROM_INITIAL_READ_SIZE, offset, + oprom_img_hdr, dev_priv); + img_size = oprom_image_parse_helper((u8 *)oprom_img_hdr, &last_img, + &code_type, dev_priv); + if (img_size <= 0) { + ret = -EINVAL; + goto err_free_hdr; + } + + img_len = img_size * OPROM_BYTE_BOUNDARY; + oprom_img = kzalloc(img_len, GFP_KERNEL); + if (!oprom_img) { + ret = -ENOMEM; + goto err_free_hdr; + } + + spi_read_oprom_helper(img_len, offset, oprom_img, dev_priv); + parse_ptr = (u8 *)oprom_img; + offset = offset + img_len; + + /* opregion base offset */ + opreg_base = ((struct expansion_rom_header *)parse_ptr)->opregion_base; + /* CPD or opreg signature is present at opregion_base offset */ + memcpy(img_sig, parse_ptr + opreg_base, sizeof(OPREGION_SIGNATURE)); + + if (!memcmp(img_sig, OPREGION_SIGNATURE, sizeof(OPREGION_SIGNATURE) - 1)) { + *opreg = oprom_img; + *opreg_size = img_len; + drm_dbg_kms(&dev_priv->drm, "Found opregion image\n"); + ret = 0; + break; + } else if (!memcmp(img_sig, CPD_SIGNATURE, NUM_CPD_BYTES)) { + if (code_type != OPROM_CSS_CODE_TYPE) { + drm_err(&dev_priv->drm, "Invalid OPROM\n"); + ret = -EINVAL; + goto err_free_img; + } + drm_dbg_kms(&dev_priv->drm, "Found CSS image\n"); + /* proceed here onwards for signature authentication */ + kfree(oprom_img); + continue; + } + + } while (last_img != LAST_IMG_INDICATOR); + + return ret; + +err_free_img: + kfree(oprom_img); +err_free_hdr: + kfree(oprom_img_hdr); + + return ret; +} + static int intel_use_opregion_panel_type_callback(const struct dmi_system_id *id) { DRM_INFO("Using panel type from OpRegion on %s\n", id->ident); diff --git a/drivers/gpu/drm/i915/display/intel_opregion.h b/drivers/gpu/drm/i915/display/intel_opregion.h index 4aa68ffbd30e..4e2eeadf101e 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.h +++ b/drivers/gpu/drm/i915/display/intel_opregion.h @@ -54,6 +54,34 @@ struct intel_opregion {
#define OPREGION_SIZE (8 * 1024)
+#define CPD_SIGNATURE "$CPD" /* CPD Signature */ +#define NUM_CPD_BYTES 4 +#define PCI_IMAGE_LENGTH_OFFSET 0x10 +#define PCI_CODE_TYPE_OFFSET 0x14 +#define PCI_LAST_IMAGE_INDICATOR_OFFSET 0x15 +#define LAST_IMG_INDICATOR 0x80 +#define OPROM_IMAGE_MAGIC 0xAA55 /* Little Endian */ +#define OPROM_CSS_CODE_TYPE 0xF0 +#define OPROM_BYTE_BOUNDARY 512 /* OPROM image sizes are indicated in 512 byte boundaries */ +#define OPROM_INITIAL_READ_SIZE 60 /* Read 60 bytes to compute the Img Len from PCI structure */ + +union oprom_header { + u32 data; + struct { + u16 signature; /* Offset[0x0]: Header 0x55 0xAA */ + u8 sizein512bytes; + u8 reserved; + }; +}; + +struct expansion_rom_header { + union oprom_header header; /* Offset[0x0]: Oprom Header */ + u16 vbiospostoffset; /* Offset[0x4]: pointer to VBIOS entry point */ + u8 resvd[0x12]; + u16 pcistructoffset; /* Offset[0x18]: Contains pointer PCI Data Structure */ + u16 opregion_base; /* Offset[0x1A]: Offset to Opregion Base start */ +}; + #ifdef CONFIG_ACPI
int intel_opregion_setup(struct drm_i915_private *dev_priv); @@ -118,5 +146,6 @@ static inline int intel_opregion_get_panel_type(struct drm_i915_private *dev) }
#endif /* CONFIG_ACPI */ - +int intel_oprom_verify_signature(u32 **opreg, u16 *opreg_size, + struct drm_i915_private *i915); #endif
On Fri, 27 Nov 2020, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize the OPROM header, CPD signature and OPROM PCI version. The OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB structures and PCI struct offsets are provided by the GSC counterparts; these are yet to be documented in BSpec. After successful sanitization, extract the VBT from the opregion image.
Comments inline.
BR, Jani.
Cc: Jani Nikula jani.nikula@intel.com Cc: Uma Shankar uma.shankar@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com
drivers/gpu/drm/i915/display/intel_bios.c | 49 +++-- drivers/gpu/drm/i915/display/intel_opregion.c | 169 ++++++++++++++++++ drivers/gpu/drm/i915/display/intel_opregion.h | 31 +++- 3 files changed, 221 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index 91044fc52acb..358576bc0be2 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2088,37 +2088,36 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *dev_priv) {
-       u32 count, data, found, store = 0;
-       u32 static_region, oprom_offset;
-       u32 oprom_size = 0x200000;
-       u16 vbt_size;
-       u32 *vbt;
-
-       static_region = I915_READ(SPI_STATIC_REGIONS);
-       static_region &= OPTIONROM_SPI_REGIONID_MASK;
-       I915_WRITE(PRIMARY_SPI_REGIONID, static_region);
+       u32 count, found;
+       u32 *vbt, *oprom_opreg = NULL;
+       u16 vbt_size, opreg_size;
+       u8 *parse_ptr;
 
-       oprom_offset = I915_READ(OROM_OFFSET);
-       oprom_offset &= OROM_OFFSET_MASK;
+       if (intel_oprom_verify_signature(&oprom_opreg, &opreg_size, dev_priv)) {
+               drm_err(&dev_priv->drm, "oprom signature verification failed\n");
+               goto err_not_found;
+       }
Kind of silly that the previous patch adds all the reading here, and then it gets moved into a function called "verify signature". Which looks like it verifies the signature, but it actually reads the SPI. Very confusing.
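A sketch of prototypes that would match the names to what the code does (the split and both names are suggestions only, not part of this series):

/* read the OPROM image out of SPI flash via MMIO... */
int intel_spi_read_oprom(struct drm_i915_private *i915,
                         u32 **image, u16 *image_size);

/* ...then sanitize/verify the image as a separate step */
int intel_oprom_verify_signature(const u32 *image, u16 image_size,
                                 struct drm_i915_private *i915);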
-       for (count = 0; count < oprom_size; count += 4) {
-               I915_WRITE(PRIMARY_SPI_ADDRESS, oprom_offset + count);
-               data = I915_READ(PRIMARY_SPI_TRIGGER);
+       if (!oprom_opreg) {
+               drm_err(&dev_priv->drm, "opregion not found\n");
+               goto err_not_found;
+       }
 
-               if (data == *((const u32 *)"$VBT")) {
-                       found = oprom_offset + count;
+       for (count = 0; count < opreg_size; count += 4) {
+               if (oprom_opreg[count / 4] == *((const u32 *)"$VBT")) {
+                       found = count;
                        break;
                }
        }
-       if (count >= oprom_size)
+       if (count >= opreg_size) {
+               drm_err(&dev_priv->drm, "VBT not found in opregion\n");
                goto err_not_found;
+       }
 
        /* Get VBT size and allocate space for the VBT */
-       I915_WRITE(PRIMARY_SPI_ADDRESS, found +
-                  offsetof(struct vbt_header, vbt_size));
-       vbt_size = I915_READ(PRIMARY_SPI_TRIGGER);
-       vbt_size &= 0xffff;
+       parse_ptr = (u8 *)oprom_opreg + found;
+       vbt_size = ((struct vbt_header *)parse_ptr)->vbt_size;
vbt = kzalloc(vbt_size, GFP_KERNEL); if (!vbt) {
@@ -2127,16 +2126,12 @@ static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *dev_priv) goto err_not_found; }
-	for (count = 0; count < vbt_size; count += 4) {
-		I915_WRITE(PRIMARY_SPI_ADDRESS, found + count);
-		data = I915_READ(PRIMARY_SPI_TRIGGER);
-		*(vbt + store++) = data;
-	}
+	memcpy(vbt, parse_ptr, vbt_size);
 
 	if (!intel_bios_is_valid_vbt(vbt, vbt_size))
 		goto err_free_vbt;
DRM_DEBUG_KMS("Found valid VBT in SPI flash\n");
kfree(oprom_opreg);
return (struct vbt_header *)vbt;
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c index 4f77cf849171..81e5946393dd 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.c +++ b/drivers/gpu/drm/i915/display/intel_opregion.c @@ -983,6 +983,175 @@ int intel_opregion_setup(struct drm_i915_private *dev_priv) return err; }
+static int oprom_image_parse_helper(u8 *parse_ptr, u8 *last_img, u8 *code_type,
+				    struct drm_i915_private *i915)
+{
+	u8 size_512_bytes;
+
+	if (((union oprom_header *)parse_ptr)->signature != OPROM_IMAGE_MAGIC) {
+		drm_err(&i915->drm, "Wrong OPROM header signature.\n");
+		return -EINVAL;
+	}
+
+	size_512_bytes = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_IMAGE_LENGTH_OFFSET];
+	*code_type = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_CODE_TYPE_OFFSET];
+	*last_img = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_LAST_IMAGE_INDICATOR_OFFSET];
+
+	return size_512_bytes;
+}
+
+static void spi_read_oprom_helper(size_t len, u32 offset, u32 *buf,
+				  struct drm_i915_private *dev_priv)
+{
+	u32 count, data;
+
+	for (count = 0; count < len; count += 4) {
+		I915_WRITE(PRIMARY_SPI_ADDRESS, offset + count);
+		data = I915_READ(PRIMARY_SPI_TRIGGER);
+		buf[count / 4] = data;
+	}
+}
+/**
+ * DASH+G OPROM IMAGE LAYOUT
+ *
+ * +--------+-------+---------------------------+
+ * | Offset | Value | ROM Header Fields         +-----> Image 1 (CSS)
+ * +--------------------------------------------+
+ * | 0h     | 55h   | ROM Signature Byte1       |
+ * | 1h     | AAh   | ROM Signature Byte2       |
+ * | 2h     | xx    | Reserved                  |
+ * | 18+19h | xx    | Ptr to PCI DataStructure  |
+ * +----------------+---------------------------+
+ * |         PCI Data Structure                 |
+ * +--------------------------------------------+
+ * | . . .                                      |
+ * | . . .                                      |
+ * | 10  +  xx  +  Image Length                 |
+ * | 14  +  xx  +  Code Type                    |
+ * | 15  +  xx  +  Last Image Indicator         |
+ * | . . .                                      |
+ * +--------------------------------------------+
+ * |              MEU BLOB                      |
+ * +--------------------------------------------+
+ * |         CPD Header                         |
+ * |         CPD Entry                          |
+ * |         Reserved                           |
+ * |         SignedDataPart1                    |
+ * |         PublicKey                          |
+ * |         RSA Signature                      |
+ * |         SignedDataPart2                    |
+ * |         IFWI Metadata                      |
+ * +--------+-------+---------------------------+
+ * | .      | .     | .                         |
+ * | .      | .     | .                         |
+ * +--------------------------------------------+
+ * | Offset | Value | ROM Header Fields         +-----> Image 2 (Config Data) (Offset: 0x800)
+ * +--------------------------------------------+
+ * | 0h     | 55h   | ROM Signature Byte1       |
+ * | 1h     | AAh   | ROM Signature Byte2       |
+ * | 2h     | xx    | Reserved                  |
+ * | 18+19h | xx    | Ptr to PCI DataStructure  |
+ * +----------------+---------------------------+
+ * |         PCI Data Structure                 |
+ * +--------------------------------------------+
+ * | . . .                                      |
+ * | . . .                                      |
+ * | 10  +  xx  +  Image Length                 |
+ * | 14  +  xx  +  Code Type                    |
+ * | 15  +  xx  +  Last Image Indicator         |
+ * | . . .                                      |
+ * | 1A  +  3C  +  Ptr to Opregion Signature    |
+ * | . . .                                      |
+ * | . . .                                      |
+ * | 83Ch  +  IntelGraphicsMem                  | <---+ Opregion Signature
+ * +--------+-----------------------------------+
+ *
+ * intel_oprom_verify_signature() - verifies the OPROM signature.
+ * @opreg: pointer to opregion buffer output.
+ * @opreg_size: pointer to opregion size output.
+ * @dev_priv: i915 device.
+ */
+int
+intel_oprom_verify_signature(u32 **opreg, u16 *opreg_size,
+			     struct drm_i915_private *dev_priv)
+{
+	u8 img_sig[sizeof(OPREGION_SIGNATURE)];
+	u8 code_type, last_img;
+	u32 static_region, offset;
+	u32 *oprom_img, *oprom_img_hdr;
+	u16 opreg_base, img_len;
+	u8 *parse_ptr;
+	int img_size;
+	int ret = -EINVAL;
+
+	/* initialize SPI to read the OPROM */
+	static_region = I915_READ(SPI_STATIC_REGIONS);
+	static_region &= OPTIONROM_SPI_REGIONID_MASK;
+	I915_WRITE(PRIMARY_SPI_REGIONID, static_region);
+
+	/* read OPROM offset in SPI flash */
+	offset = I915_READ(OROM_OFFSET);
+	offset &= OROM_OFFSET_MASK;
+
+	oprom_img_hdr = kzalloc(OPROM_INITIAL_READ_SIZE, GFP_KERNEL);
+	if (!oprom_img_hdr)
+		return -ENOMEM;
+
+	do {
+		spi_read_oprom_helper(OPROM_INITIAL_READ_SIZE, offset,
+				      oprom_img_hdr, dev_priv);
+		img_size = oprom_image_parse_helper((u8 *)oprom_img_hdr, &last_img,
+						    &code_type, dev_priv);
+		if (img_size <= 0) {
+			ret = -EINVAL;
+			goto err_free_hdr;
+		}
+
+		img_len = img_size * OPROM_BYTE_BOUNDARY;
+		oprom_img = kzalloc(img_len, GFP_KERNEL);
+		if (!oprom_img) {
+			ret = -ENOMEM;
+			goto err_free_hdr;
+		}
+
+		spi_read_oprom_helper(img_len, offset, oprom_img, dev_priv);
+		parse_ptr = (u8 *)oprom_img;
+		offset = offset + img_len;
+
+		/* opregion base offset */
+		opreg_base = ((struct expansion_rom_header *)parse_ptr)->opregion_base;
+		/* CPD or opreg signature is present at opregion_base offset */
+		memcpy(img_sig, parse_ptr + opreg_base, sizeof(OPREGION_SIGNATURE));
+
+		if (!memcmp(img_sig, OPREGION_SIGNATURE, sizeof(OPREGION_SIGNATURE) - 1)) {
+			*opreg = oprom_img;
+			*opreg_size = img_len;
+			drm_dbg_kms(&dev_priv->drm, "Found opregion image\n");
+			ret = 0;
+			break;
+		} else if (!memcmp(img_sig, CPD_SIGNATURE, NUM_CPD_BYTES)) {
+			if (code_type != OPROM_CSS_CODE_TYPE) {
+				drm_err(&dev_priv->drm, "Invalid OPROM\n");
+				ret = -EINVAL;
+				goto err_free_img;
+			}
+			drm_dbg_kms(&dev_priv->drm, "Found CSS image\n");
+			/* proceed here onwards for signature authentication */
+			kfree(oprom_img);
+			continue;
+		}
+	} while (last_img != LAST_IMG_INDICATOR);
+
+	return ret;
+
+err_free_img:
+	kfree(oprom_img);
+err_free_hdr:
+	kfree(oprom_img_hdr);
+
+	return ret;
+}
static int intel_use_opregion_panel_type_callback(const struct dmi_system_id *id) { DRM_INFO("Using panel type from OpRegion on %s\n", id->ident); diff --git a/drivers/gpu/drm/i915/display/intel_opregion.h b/drivers/gpu/drm/i915/display/intel_opregion.h index 4aa68ffbd30e..4e2eeadf101e 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.h +++ b/drivers/gpu/drm/i915/display/intel_opregion.h @@ -54,6 +54,34 @@ struct intel_opregion {
#define OPREGION_SIZE (8 * 1024)
+#define CPD_SIGNATURE "$CPD"	/* CPD Signature */
+#define NUM_CPD_BYTES 4
+#define PCI_IMAGE_LENGTH_OFFSET 0x10
+#define PCI_CODE_TYPE_OFFSET 0x14
+#define PCI_LAST_IMAGE_INDICATOR_OFFSET 0x15
+#define LAST_IMG_INDICATOR 0x80
+#define OPROM_IMAGE_MAGIC 0xAA55	/* Little Endian */
+#define OPROM_CSS_CODE_TYPE 0xF0
+#define OPROM_BYTE_BOUNDARY 512	/* OPROM image sizes are indicated in 512 byte boundaries */
+#define OPROM_INITIAL_READ_SIZE 60	/* Read 60 bytes to compute the Img Len from PCI structure */
+union oprom_header {
+	u32 data;
+	struct {
+		u16 signature;	/* Offset[0x0]: Header 0x55 0xAA */
+		u8 sizein512bytes;
+		u8 reserved;
+	};
+};
What's the point of the union?
+struct expansion_rom_header {
+	union oprom_header header;	/* Offset[0x0]: Oprom Header */
+	u16 vbiospostoffset;	/* Offset[0x4]: pointer to VBIOS entry point */
+	u8 resvd[0x12];
+	u16 pcistructoffset;	/* Offset[0x18]: Contains pointer PCI Data Structure */
+	u16 opregion_base;	/* Offset[0x1A]: Offset to Opregion Base start */
+};
AFAICT both of these should be hidden in the .c file instead of exposed to the rest of the driver, and they should be __packed as they're used for serialisation.
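For illustration, a minimal sketch of that suggestion; not the final form, just the same two definitions moved into intel_opregion.c and annotated __packed so the compiler cannot pad the on-flash layout:

/* Sketch: private to intel_opregion.c, __packed to match the ROM bytes. */
union oprom_header {
	u32 data;
	struct {
		u16 signature;		/* 0x55 0xAA */
		u8 sizein512bytes;
		u8 reserved;
	} __packed;
} __packed;

struct expansion_rom_header {
	union oprom_header header;	/* Offset[0x00] */
	u16 vbiospostoffset;		/* Offset[0x04] */
	u8 resvd[0x12];			/* pads up to 0x18 */
	u16 pcistructoffset;		/* Offset[0x18] */
	u16 opregion_base;		/* Offset[0x1A] */
} __packed;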
#ifdef CONFIG_ACPI
int intel_opregion_setup(struct drm_i915_private *dev_priv); @@ -118,5 +146,6 @@ static inline int intel_opregion_get_panel_type(struct drm_i915_private *dev) }
#endif /* CONFIG_ACPI */
+int intel_oprom_verify_signature(u32 **opreg, u16 *opreg_size,
struct drm_i915_private *i915);
This breaks the build for CONFIG_ACPI=n.
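One possible fix, sketched under the assumption that intel_opregion.c (where the function is defined) is only built for CONFIG_ACPI=y: keep the prototype inside the #ifdef and add an inline stub to the #else branch:

#ifdef CONFIG_ACPI
int intel_oprom_verify_signature(u32 **opreg, u16 *opreg_size,
				 struct drm_i915_private *i915);
#else
static inline int intel_oprom_verify_signature(u32 **opreg, u16 *opreg_size,
					       struct drm_i915_private *i915)
{
	return -ENODEV;	/* no opregion/OPROM support without ACPI */
}
#endif /* CONFIG_ACPI */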
#endif
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Commit c457d9cf256e ("drm/i915: Make sure we have enough memory bandwidth on ICL") assumes that we always have a non-zero dram_info->channels and uses it as a divisor. We need the number of memory channels to be at least 1 for sane bandwidth-limit checking, even when PCode returns 0, so let's force it to 1 in this case.
Cc: Stanislav Lisovskiy stanislav.lisovskiy@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/display/intel_bw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index bd060404d249..9e7971ce24b3 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -222,7 +222,7 @@ static int icl_get_bw_info(struct drm_i915_private *dev_priv, const struct intel "Failed to get memory subsystem information, ignoring bandwidth limits"); return ret; } - num_channels = qi.num_channels; + num_channels = max_t(u8, 1, qi.num_channels);
deinterleave = DIV_ROUND_UP(num_channels, is_y_tile ? 4 : 2); dclk_max = icl_sagv_max_dclk(&qi);
From: Clint Taylor clinton.a.taylor@intel.com
The PUNIT FW currently returns 0 for all memory bandwidth parameters. Read the values directly from MCHBAR offsets 0x5918 and 0x4000/0x4004 instead. This is a temporary WA until the PUNIT FW returns valid values.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: Jani Saarinen jani.saarinen@intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com --- drivers/gpu/drm/i915/display/intel_bw.c | 54 ++++++++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index 9e7971ce24b3..5244ae77226d 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -90,6 +90,53 @@ static int icl_pcode_read_mem_global_info(struct drm_i915_private *dev_priv, return 0; }
+#define SA_PERF_STATUS_0_0_0_MCHBAR_PC _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5918) +#define DG1_QCLK_RATIO_MASK (0xFF << 2) +#define DG1_QCLK_RATIO_SHIFT 2 +#define DG1_QCLK_REFERENCE (1 << 10) + +#define MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4000) +#define MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR_HIGH _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4004) +#define MCHBAR_CH1_CR_TC_PRE_0_0_0_MCHBAR _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4400) +#define MCHBAR_CH1_CR_TC_PRE_0_0_0_MCHBAR_HIGH _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4404) +#define DG1_DRAM_T_RCD_MASK (0x7F << 9) +#define DG1_DRAM_T_RCD_SHIFT 9 +#define DG1_DRAM_T_RDPRE_MASK (0x3F << 11) +#define DG1_DRAM_T_RDPRE_SHIFT 11 +#define DG1_DRAM_T_RAS_MASK (0xFF << 1) +#define DG1_DRAM_T_RAS_SHIFT 1 +#define DG1_DRAM_T_RP_MASK (0x7F << 0) +#define DG1_DRAM_T_RP_SHIFT 0 + +static int dg1_mchbar_read_qgv_point_info(struct drm_i915_private *dev_priv, + struct intel_qgv_point *sp, + int point) +{ + u32 val = 0; + u32 dclk_ratio = 0, dclk_reference = 0; + + val = I915_READ(SA_PERF_STATUS_0_0_0_MCHBAR_PC); + dclk_ratio = (val & DG1_QCLK_RATIO_MASK) >> DG1_QCLK_RATIO_SHIFT; + if (val & DG1_QCLK_REFERENCE) + dclk_reference = 6; /* 6 * 16.666 MHz = 100 MHz */ + else + dclk_reference = 8; /* 8 * 16.666 MHz = 133 MHz */ + sp->dclk = dclk_ratio * dclk_reference; + if (sp->dclk == 0) + return -EINVAL; + + val = I915_READ(MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR); + sp->t_rp = (val & DG1_DRAM_T_RP_MASK) >> DG1_DRAM_T_RP_SHIFT; + sp->t_rdpre = (val & DG1_DRAM_T_RDPRE_MASK) >> DG1_DRAM_T_RDPRE_SHIFT; + + val = I915_READ(MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR_HIGH); + sp->t_rcd = (val & DG1_DRAM_T_RCD_MASK) >> DG1_DRAM_T_RCD_SHIFT; + sp->t_ras = (val & DG1_DRAM_T_RAS_MASK) >> DG1_DRAM_T_RAS_SHIFT; + + sp->t_rc = sp->t_rp + sp->t_ras; + return 0; +} + static int icl_pcode_read_qgv_point_info(struct drm_i915_private *dev_priv, struct intel_qgv_point *sp, int point) @@ -153,7 +200,12 @@ static int icl_get_qgv_points(struct drm_i915_private *dev_priv, struct intel_qgv_point *sp = &qi->points[i];
ret = icl_pcode_read_qgv_point_info(dev_priv, sp, i); - if (ret) + if (IS_DG1(dev_priv) && (ret || sp->dclk == 0)) { + drm_dbg_kms(&dev_priv->drm, "Failed to get memory subsystem information via pcode. IFWI needs update. Trying with MCHBAR\n"); + ret = dg1_mchbar_read_qgv_point_info(dev_priv, sp, i); + if (ret) + return ret; + } else if (ret) return ret;
drm_dbg_kms(&dev_priv->drm,
From: Clint Taylor clinton.a.taylor@intel.com
Use the MCHBAR gear_type information when computing the available memory bandwidth in the MCHBAR fallback path: in gear 2 the effective dclk, and hence the available bandwidth, doubles.
Cc: Swati Sharma swati2.sharma@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com --- drivers/gpu/drm/i915/display/intel_bw.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index 5244ae77226d..37fef3b5cb58 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -108,6 +108,9 @@ static int icl_pcode_read_mem_global_info(struct drm_i915_private *dev_priv, #define DG1_DRAM_T_RP_MASK (0x7F << 0) #define DG1_DRAM_T_RP_SHIFT 0
+#define ICL_GEAR_TYPE_MASK (0x01 << 16) +#define ICL_GEAR_TYPE_SHIFT 16 + static int dg1_mchbar_read_qgv_point_info(struct drm_i915_private *dev_priv, struct intel_qgv_point *sp, int point) @@ -122,6 +125,11 @@ static int dg1_mchbar_read_qgv_point_info(struct drm_i915_private *dev_priv, else dclk_reference = 8; /* 8 * 16.666 MHz = 133 MHz */ sp->dclk = dclk_ratio * dclk_reference; + + val = I915_READ(SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU); + if ((val & ICL_GEAR_TYPE_MASK) >> ICL_GEAR_TYPE_SHIFT) + sp->dclk *= 2; + if (sp->dclk == 0) return -EINVAL;
From: Michel Thierry michel.thierry@intel.com
Signed-off-by: Michel Thierry michel.thierry@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 10 +++++++++- drivers/gpu/drm/i915/gt/intel_timeline.c | 8 +++++++- 2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 0ba020346566..9e0394b06f38 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -25,6 +25,7 @@ #include <drm/drm_print.h>
#include "gem/i915_gem_context.h" +#include "gem/i915_gem_lmem.h"
#include "i915_drv.h"
@@ -657,7 +658,14 @@ static int init_status_page(struct intel_engine_cs *engine) * in GFP_DMA32 for i965, and no earlier physical address users had * access to more than 4G. */ - obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE); + if (HAS_LMEM(engine->i915)) { + obj = i915_gem_object_create_lmem(engine->i915, + PAGE_SIZE, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_VOLATILE); + } else { + obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE); + } if (IS_ERR(obj)) { drm_err(&engine->i915->drm, "Failed to allocate status page\n"); diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 065943781586..589559b526eb 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -6,6 +6,7 @@
#include "i915_drv.h"
+#include "gem/i915_gem_lmem.h" #include "i915_active.h" #include "i915_syncmap.h" #include "intel_gt.h" @@ -34,7 +35,12 @@ static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp) int type; int ret;
- obj = i915_gem_object_create_internal(i915, PAGE_SIZE); + if (HAS_LMEM(i915)) + obj = i915_gem_object_create_lmem(i915, PAGE_SIZE, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_VOLATILE); + else + obj = i915_gem_object_create_internal(i915, PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj);
Quoting Matthew Auld (2020-11-27 12:06:40)
From: Michel Thierry michel.thierry@intel.com
Rationale goes here.
Is this wise? HWSP is very frequently read by the CPU, and expected to be cached on the CPU.
What do the performance profiles indicate? -Chris
On 27/11/2020 13:55, Chris Wilson wrote:
Quoting Matthew Auld (2020-11-27 12:06:40)
From: Michel Thierry michel.thierry@intel.com
Rationale goes here.
Is this wise? HWSP is very frequently read by the CPU, and expected to be cached on the CPU.
What do the performance profiles indicate?
Do you have a recommendation for an existing selftest or IGT to help measure this?
Also are you suggesting moving this to system memory, or just using a different mapping type, if it's placed in local memory? Or maybe try both? Although I'm pretty sceptical about !wc for local memory.
-Chris
Quoting Matthew Auld (2020-11-30 17:17:16)
On 27/11/2020 13:55, Chris Wilson wrote:
Quoting Matthew Auld (2020-11-27 12:06:40)
From: Michel Thierry michel.thierry@intel.com
Rationale goes here.
Is this wise? HWSP is very frequently read by the CPU, and expected to be cached on the CPU.
What do the performance profiles indicate?
Do you have a recommendation for an existing selftest or IGT to help measure this?
Also are you suggesting moving this to system memory, or just using a different mapping type, if it's placed in local memory? Or maybe try both? Although I'm pretty sceptical about !wc for local memory.
A lot of worries go out of the window if this can be in system memory and snooped.
For measuring, I suspect there is a lot of chaff that needs to be removed before individual microbenchmarks like perf/request discern any difference; although that would be a starting point. We do a lot of completion checking during execlists interrupt processing, and there we (cpu profiles at least) are sensitive to uncached reads.
We can trivially construct a benchmark that only shows the impact of the WC reads; but the point where I think we would first notice from userspace is client wakeup latency scaling: benchmarks/gem_latency, which was once a point of major concern. Nowadays, we can couple that with a second concern about inducing system latency from interrupt processing time. -Chris
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
When allocating pages for an lmem object of size 4G or greater, we allocate memory blocks from the buddy system. In this scenario the buddy system can allocate blocks of size >= 4G, and such blocks need more than 32 bits to represent their size; with these blocks we run into an issue during sg list construction, because the sg->length field is only 32 bits wide.
Hence limit the max allowed block size to less than 4G.
Cc: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com --- drivers/gpu/drm/i915/intel_memory_region.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 554fdd7735a8..371cd88ff6d8 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -101,6 +101,7 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, struct list_head *blocks) { unsigned int min_order = 0; + unsigned int max_order; unsigned long n_pages;
GEM_BUG_ON(!IS_ALIGNED(size, mem->mm.chunk_size)); @@ -121,6 +122,16 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem,
n_pages = size >> ilog2(mem->mm.chunk_size);
+ /* + * When allocating pages for an lmem object of size > 4G + * the memory blocks allocated from buddy system could be + * from sizes greater than 4G requiring > 32b to represent + * block size. But those blocks cannot be used in sg list + * construction(in caller) as sg->length is only 32b wide. + * Hence limiting the block size to 4G. + */ + max_order = (ilog2(SZ_4G) - 1) - ilog2(mem->mm.chunk_size); + mutex_lock(&mem->mm_lock);
do { @@ -128,7 +139,7 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, unsigned int order; bool retry = true; retry: - order = fls(n_pages) - 1; + order = min_t(u32, (fls(n_pages) - 1), max_order); GEM_BUG_ON(order > mem->mm.max_order); GEM_BUG_ON(order < min_order);
Quoting Matthew Auld (2020-11-27 12:06:41)
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
When allocating pages for an lmem object of size 4G or greater, we allocate memory blocks from the buddy system.
Any lmem object is from the buddy system.
In this scenario the buddy system can allocate blocks of size >= 4G, and such blocks need more than 32 bits to represent their size; with these blocks we run into an issue during sg list construction, because the sg->length field is only 32 bits wide.
Just say that when using a scatterlist, the maximum segment size is 4G. In fact, we can ask sg what the backend maximum is, and use that as our max order.
The only question is whether this merits a flag, or we just assume that the buddy allocator is only used for objects and so always presented via sg? -Chris
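A minimal sketch of what asking for the backend maximum could look like; mem_max_order() is a made-up helper and UINT_MAX stands in for the queried limit, since sg->length is an unsigned int:

/* Sketch: derive max_order from the largest segment an sg entry can hold. */
static unsigned int mem_max_order(struct intel_memory_region *mem)
{
	u64 max_segment = rounddown_pow_of_two(UINT_MAX);	/* 2G */

	return ilog2(max_segment) - ilog2(mem->mm.chunk_size);
}

With a 4K chunk_size this gives order 19, i.e. 2G blocks, matching the (ilog2(SZ_4G) - 1) arithmetic in the patch.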
From: Bommu Krishnaiah krishnaiah.bommu@intel.com
Update shmem available memory in “intel_memory_region”
Signed-off-by: Bommu Krishnaiah krishnaiah.bommu@intel.com Cc: Zbigniew Kempczyński zbigniew.kempczynski@intel.com Cc: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index b4dd7a709800..f4bac72b3ccd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -30,6 +30,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) struct drm_i915_private *i915 = to_i915(obj->base.dev); struct intel_memory_region *mem = obj->mm.region; const unsigned long page_count = obj->base.size / PAGE_SIZE; + resource_size_t size = obj->base.size; unsigned long i; struct address_space *mapping; struct sg_table *st; @@ -184,6 +185,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
__i915_gem_object_set_pages(obj, st, sg_page_sizes);
+ mem->avail -= size; + return 0;
err_sg: @@ -298,6 +301,8 @@ __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages) { + struct intel_memory_region *mem = obj->mm.region; + resource_size_t size = obj->base.size; struct sgt_iter sgt_iter; struct pagevec pvec; struct page *page; @@ -326,6 +331,8 @@ void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_ check_release_pagevec(&pvec); obj->mm.dirty = false;
+ mem->avail += size; + sg_free_table(pages); kfree(pages); }
Quoting Matthew Auld (2020-11-27 12:06:42)
From: Bommu Krishnaiah krishnaiah.bommu@intel.com
Update shmem available memory in “intel_memory_region”
Was avail ever set? -Chris
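For the accounting to balance, avail has to be initialised when the region is created; a hypothetical sketch of that step (field names as used by the debugfs patch later in the series):

static void intel_memory_region_init_avail(struct intel_memory_region *mem,
					   resource_size_t size)
{
	mem->total = size;	/* region capacity */
	mem->avail = size;	/* everything is free at creation */
}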
From: Stuart Summers stuart.summers@intel.com
The current implementation of intel_set_subslices only takes the number of bits per subslice stride and copies those in based on the slice given. For all known use cases, this works fine. But in the event of some faulty hardware or other future use case, do a straight memcpy of these subslice bits into the internal mask to ensure all subslices are correctly calculated.
Cc: Harish Chegondi harish.chegondi@intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Suggested-by: Harish Chegondi harish.chegondi@intel.com Signed-off-by: Stuart Summers stuart.summers@intel.com --- drivers/gpu/drm/i915/gt/intel_sseu.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index 8a72e0fe34ca..b8a945166d32 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -104,6 +104,7 @@ static u16 compute_eu_total(const struct sseu_dev_info *sseu) static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u8 s_en, u32 ss_en, u16 eu_en) { + u32 ss_mask; int s, ss;
/* ss_en represents entire subslice mask across all slices */ @@ -116,7 +117,10 @@ static void gen11_compute_sseu_info(struct sseu_dev_info *sseu,
sseu->slice_mask |= BIT(s);
- intel_sseu_set_subslices(sseu, s, ss_en); + ss_mask = ss_en >> (s * sseu->max_subslices); + ss_mask &= GENMASK(sseu->max_subslices - 1, 0); + + intel_sseu_set_subslices(sseu, s, ss_mask);
for (ss = 0; ss < sseu->max_subslices; ss++) if (intel_sseu_has_subslice(sseu, s, ss))
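A worked example of the new masking, assuming max_subslices = 8 and a hypothetical fuse value:

	u32 ss_en = 0xff0f;	/* subslice bits for all slices */
	u32 s = 1;
	u32 ss_mask = (ss_en >> (s * 8)) & GENMASK(7, 0);	/* == 0xff */

so only slice 1's eight bits reach intel_sseu_set_subslices(), instead of the whole 32-bit ss_en.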
From: CQ Tang cq.tang@intel.com
Function i915_gem_shrink_memory_region() is renamed to intel_memory_region_evict() and moved from i915_gem_shrinker.c to intel_memory_region.c. This function now handles local memory swapping in addition to evicting purgeable objects.
When an object is selected from the list, i915_gem_object_unbind() might fail if the object's vma is pinned; in that case -EBUSY is returned from this function.
The new code uses logic similar to i915_gem_shrink().
Signed-off-by: CQ Tang cq.tang@intel.com --- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 58 ----------- drivers/gpu/drm/i915/gem/i915_gem_shrinker.h | 2 - drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/intel_memory_region.c | 95 +++++++++++++++++-- .../drm/i915/selftests/intel_memory_region.c | 3 +- 6 files changed, 94 insertions(+), 73 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 8d639509b78b..517a606ade8d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -237,7 +237,6 @@ struct drm_i915_gem_object { * region->obj_lock. */ struct list_head region_link; - struct list_head tmp_link;
struct sg_table *pages; void *mapping; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 4d346df8fd5b..27674048f17d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -272,64 +272,6 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915) return freed; }
-int i915_gem_shrink_memory_region(struct intel_memory_region *mem, - resource_size_t target) -{ - struct drm_i915_private *i915 = mem->i915; - struct drm_i915_gem_object *obj; - resource_size_t purged; - LIST_HEAD(purgeable); - int err = -ENOSPC; - - intel_gt_retire_requests(&i915->gt); - - purged = 0; - - mutex_lock(&mem->objects.lock); - - while ((obj = list_first_entry_or_null(&mem->objects.purgeable, - typeof(*obj), - mm.region_link))) { - list_move_tail(&obj->mm.region_link, &purgeable); - - if (!i915_gem_object_has_pages(obj)) - continue; - - if (i915_gem_object_is_framebuffer(obj)) - continue; - - if (!kref_get_unless_zero(&obj->base.refcount)) - continue; - - mutex_unlock(&mem->objects.lock); - - if (!i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE)) { - if (i915_gem_object_trylock(obj)) { - __i915_gem_object_put_pages(obj); - if (!i915_gem_object_has_pages(obj)) { - purged += obj->base.size; - if (!i915_gem_object_is_volatile(obj)) - obj->mm.madv = __I915_MADV_PURGED; - } - i915_gem_object_unlock(obj); - } - } - - i915_gem_object_put(obj); - - mutex_lock(&mem->objects.lock); - - if (purged >= target) { - err = 0; - break; - } - } - - list_splice_tail(&purgeable, &mem->objects.purgeable); - mutex_unlock(&mem->objects.lock); - return err; -} - static unsigned long i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h index c945f3b587d6..7c1e648a8b44 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h @@ -31,7 +31,5 @@ void i915_gem_driver_register__shrinker(struct drm_i915_private *i915); void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915); void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915, struct mutex *mutex); -int i915_gem_shrink_memory_region(struct intel_memory_region *mem, - resource_size_t target);
#endif /* __I915_GEM_SHRINKER_H__ */ diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index bf67f323a1ae..85cbdb8e2bb8 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1008,12 +1008,12 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
switch (obj->mm.madv) { case I915_MADV_WILLNEED: - list_move(&obj->mm.region_link, - &obj->mm.region->objects.list); + list_move_tail(&obj->mm.region_link, + &obj->mm.region->objects.list); break; default: - list_move(&obj->mm.region_link, - &obj->mm.region->objects.purgeable); + list_move_tail(&obj->mm.region_link, + &obj->mm.region->objects.purgeable); break; }
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 371cd88ff6d8..185eab497803 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -3,6 +3,7 @@ * Copyright © 2019 Intel Corporation */
+#include "gt/intel_gt_requests.h" #include "intel_memory_region.h" #include "i915_drv.h"
@@ -94,6 +95,90 @@ __intel_memory_region_put_block_buddy(struct i915_buddy_block *block) __intel_memory_region_put_pages_buddy(block->private, &blocks); }
+static int intel_memory_region_evict(struct intel_memory_region *mem, + resource_size_t target) +{ + struct drm_i915_private *i915 = mem->i915; + struct list_head still_in_list; + struct drm_i915_gem_object *obj; + struct list_head *phases[] = { + &mem->objects.purgeable, + &mem->objects.list, + NULL, + }; + struct list_head **phase; + resource_size_t found; + int pass; + + intel_gt_retire_requests(&i915->gt); + + found = 0; + pass = 0; + phase = phases; + +next: + INIT_LIST_HEAD(&still_in_list); + mutex_lock(&mem->objects.lock); + + while (found < target && + (obj = list_first_entry_or_null(*phase, + typeof(*obj), + mm.region_link))) { + list_move_tail(&obj->mm.region_link, &still_in_list); + + if (!i915_gem_object_has_pages(obj)) + continue; + + if (i915_gem_object_is_framebuffer(obj)) + continue; + + /* + * For IOMEM region, only swap user space objects. + * kernel objects are bound and causes a lot of unbind + * warning message in driver. + * FIXME: swap kernel object as well. + */ + if (i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM) + && !obj->base.handle_count) + continue; + + if (!kref_get_unless_zero(&obj->base.refcount)) + continue; + + mutex_unlock(&mem->objects.lock); + + if (!i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE)) { + if (i915_gem_object_trylock(obj)) { + __i915_gem_object_put_pages(obj); + /* May arrive from get_pages on another bo */ + if (!i915_gem_object_has_pages(obj)) { + found += obj->base.size; + if (obj->mm.madv == I915_MADV_DONTNEED) + obj->mm.madv = __I915_MADV_PURGED; + } + i915_gem_object_unlock(obj); + } + } + + i915_gem_object_put(obj); + mutex_lock(&mem->objects.lock); + + if (found >= target) + break; + } + list_splice_tail(&still_in_list, *phase); + mutex_unlock(&mem->objects.lock); + + if (found < target) { + pass++; + phase++; + if (*phase) + goto next; + } + + return (found < target) ? -ENOSPC : 0; +} + int __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, resource_size_t size, @@ -137,7 +222,7 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, do { struct i915_buddy_block *block; unsigned int order; - bool retry = true; + retry: order = min_t(u32, (fls(n_pages) - 1), max_order); GEM_BUG_ON(order > mem->mm.max_order); @@ -152,19 +237,15 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, resource_size_t target; int err;
- if (!retry) - goto err_free_blocks; - target = n_pages * mem->mm.chunk_size;
mutex_unlock(&mem->mm_lock); - err = i915_gem_shrink_memory_region(mem, - target); + err = intel_memory_region_evict(mem, + target); mutex_lock(&mem->mm_lock); if (err) goto err_free_blocks;
- retry = false; goto retry; } } while (1); diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 9df0a4f657c1..4b007ed48d2f 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -1093,7 +1093,8 @@ static void igt_mark_evictable(struct drm_i915_gem_object *obj) { i915_gem_object_unpin_pages(obj); obj->mm.madv = I915_MADV_DONTNEED; - list_move(&obj->mm.region_link, &obj->mm.region->objects.purgeable); + list_move_tail(&obj->mm.region_link, + &obj->mm.region->objects.purgeable); }
static int igt_mock_shrink(void *arg)
Quoting Matthew Auld (2020-11-27 12:06:44)
From: CQ Tang cq.tang@intel.com
Function i915_gem_shrink_memory_region() is renamed to intel_memory_region_evict() and moved from i915_gem_shrinker.c to intel_memory_region.c. This function now handles local memory swapping in addition to evicting purgeable objects.
We really do not want to conflate the system shrinker with eviction. Reservation based eviction looks nothing like the shrinker. -Chris
From: CQ Tang cq.tang@intel.com
i915_gem_object_memcpy() copies the pages from the source object to the destination object using memcpy(). If source and destination are not the same size, only the smaller of the two sizes is copied.
The pread/pwrite page-preparation mechanism is reused to map each page for reading/writing.
Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 151 +++++++++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 + 2 files changed, 153 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 89b530841126..65690e3bf648 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -30,11 +30,13 @@ #include "i915_drv.h" #include "i915_gem_clflush.h" #include "i915_gem_context.h" +#include "i915_gem_lmem.h" #include "i915_gem_mman.h" #include "i915_gem_object.h" #include "i915_gem_object_blt.h" #include "i915_gem_region.h" #include "i915_globals.h" +#include "i915_memcpy.h" #include "i915_trace.h"
static struct i915_global_object { @@ -449,6 +451,155 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj, return err; }
+struct object_memcpy_info { + struct drm_i915_gem_object *obj; + intel_wakeref_t wakeref; + bool write; + int clflush; + struct page *page; + void *vaddr; + void *(*get_vaddr)(struct object_memcpy_info *info, + unsigned long idx); + void (*put_vaddr)(struct object_memcpy_info *info); +}; + +static +void *lmem_get_vaddr(struct object_memcpy_info *info, unsigned long idx) +{ + info->vaddr = i915_gem_object_lmem_io_map_page(info->obj, idx); + return info->vaddr; +} + +static +void lmem_put_vaddr(struct object_memcpy_info *info) +{ + io_mapping_unmap(info->vaddr); +} + +static +void *smem_get_vaddr(struct object_memcpy_info *info, unsigned long idx) +{ + info->page = i915_gem_object_get_page(info->obj, (unsigned int)idx); + info->vaddr = kmap(info->page); + if (info->clflush & CLFLUSH_BEFORE) + drm_clflush_virt_range(info->vaddr, PAGE_SIZE); + return info->vaddr; +} + +static +void smem_put_vaddr(struct object_memcpy_info *info) +{ + if (info->clflush & CLFLUSH_AFTER) + drm_clflush_virt_range(info->vaddr, PAGE_SIZE); + kunmap(info->page); +} + +static int +i915_gem_object_prepare_memcpy(struct drm_i915_gem_object *obj, + struct object_memcpy_info *info, + bool write) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + int ret; + + assert_object_held(obj); + ret = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + if (ret) + return ret; + + ret = i915_gem_object_pin_pages(obj); + if (ret) + return ret; + + if (i915_gem_object_is_lmem(obj)) { + ret = i915_gem_object_set_to_wc_domain(obj, write); + if (!ret) { + info->wakeref = + intel_runtime_pm_get(&i915->runtime_pm); + info->get_vaddr = lmem_get_vaddr; + info->put_vaddr = lmem_put_vaddr; + } + } else { + if (write) + ret = i915_gem_object_prepare_write(obj, + &info->clflush); + else + ret = i915_gem_object_prepare_read(obj, + &info->clflush); + + if (!ret) { + i915_gem_object_finish_access(obj); + info->get_vaddr = smem_get_vaddr; + info->put_vaddr = smem_put_vaddr; + } + } + + if (!ret) { + info->obj = obj; + info->write = write; + } else { + i915_gem_object_unpin_pages(obj); + } + + return ret; +} + +static void +i915_gem_object_finish_memcpy(struct object_memcpy_info *info) +{ + struct drm_i915_private *i915 = to_i915(info->obj->base.dev); + + if (i915_gem_object_is_lmem(info->obj)) { + intel_runtime_pm_put(&i915->runtime_pm, info->wakeref); + } else { + if (info->write) { + i915_gem_object_flush_frontbuffer(info->obj, + ORIGIN_CPU); + info->obj->mm.dirty = true; + } + } + i915_gem_object_unpin_pages(info->obj); +} + +int i915_gem_object_memcpy(struct drm_i915_gem_object *dst, + struct drm_i915_gem_object *src) +{ + struct object_memcpy_info sinfo, dinfo; + void *svaddr, *dvaddr; + unsigned long npages; + int i, ret; + + ret = i915_gem_object_prepare_memcpy(src, &sinfo, false); + if (ret) + return ret; + + ret = i915_gem_object_prepare_memcpy(dst, &dinfo, true); + if (ret) + goto finish_src; + + npages = src->base.size / PAGE_SIZE; + for (i = 0; i < npages; i++) { + svaddr = sinfo.get_vaddr(&sinfo, i); + dvaddr = dinfo.get_vaddr(&dinfo, i); + + /* a performance optimization */ + if (!i915_gem_object_is_lmem(src) || + !i915_memcpy_from_wc(dvaddr, svaddr, PAGE_SIZE)) + memcpy(dvaddr, svaddr, PAGE_SIZE); + + dinfo.put_vaddr(&dinfo); + sinfo.put_vaddr(&sinfo); + } + + i915_gem_object_finish_memcpy(&dinfo); +finish_src: + i915_gem_object_finish_memcpy(&sinfo); + + return ret; +} + static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) { return !(obj->cache_level == I915_CACHE_NONE || diff 
--git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 1a1aa71a4494..175258106642 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -57,6 +57,8 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj, struct i915_gem_ww_ctx *ww, struct intel_context *ce, enum intel_region_id id); +int i915_gem_object_memcpy(struct drm_i915_gem_object *dst, + struct drm_i915_gem_object *src);
void i915_gem_flush_free_objects(struct drm_i915_private *i915);
From: CQ Tang cq.tang@intel.com
When an object is pinned, get_pages() is called to allocate memory in a region. If memory pages are not available, region eviction is triggered to find other objects in the same region that can be evicted. The selected object is passed to the put_pages() call to free its memory pages; before the pages are freed, whether to first swap them out to system memory depends on whether the object is marked WILLNEED.
After being swapped out, the object is treated as if it does not have any pages allocated for it.
Similarly, when an object is pinned and memory pages have been allocated from a region, the object is checked for having been swapped out earlier; if so, the saved contents are swapped back into the newly allocated memory pages.
For this initial swapping code, i915_gem_object_memcpy() is used to copy pages.
Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 12 +- drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 6 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 1 - drivers/gpu/drm/i915/gem/i915_gem_region.c | 139 +++++++++++++++++- drivers/gpu/drm/i915/intel_memory_region.c | 6 + 6 files changed, 162 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 65690e3bf648..7cb5f137522f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -178,6 +178,8 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head) container_of(head, typeof(*obj), rcu); struct drm_i915_private *i915 = to_i915(obj->base.dev);
+ /* Reset shared reservation object */ + obj->base.resv = &obj->base._resv; dma_resv_fini(&obj->base._resv); i915_gem_object_free(obj);
@@ -185,7 +187,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head) atomic_dec(&i915->mm.free_count); }
-static void __i915_gem_object_free_mmaps(struct drm_i915_gem_object *obj) +void __i915_gem_object_free_mmaps(struct drm_i915_gem_object *obj) { /* Skip serialisation and waking the device if known to be not used. */
@@ -287,6 +289,14 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj)
GEM_BUG_ON(i915_gem_object_is_framebuffer(obj));
+ /* + * If object had been swapped out, free the hidden object. + */ + if (obj->swapto) { + i915_gem_object_put(obj->swapto); + obj->swapto = NULL; + } + /* * Before we free the object, make sure any pure RCU-only * read-side critical sections are complete, e.g. diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 175258106642..ee1914ed2070 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -366,6 +366,8 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj); int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
+void __i915_gem_object_free_mmaps(struct drm_i915_gem_object *obj); + static inline int __must_check i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 517a606ade8d..e9f42d3137b3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -316,6 +316,12 @@ struct drm_i915_gem_object {
void *gvt_info; }; + + /** + * object to swap-to if non-null. + */ + bool do_swapping; + struct drm_i915_gem_object *swapto; };
static inline struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 2cdb7cf63383..d0f3da0925f5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -231,7 +231,6 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) }
__i915_gem_object_reset_page_iter(obj); - obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
return pages; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index e497ff374b13..a437538cd872 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -7,11 +7,135 @@ #include "i915_gem_region.h" #include "i915_drv.h" #include "i915_trace.h" +#include "i915_gem_mman.h" + +static int +i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, + struct sg_table *pages, unsigned int sizes) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct drm_i915_gem_object *dst, *src; + int err; + + GEM_BUG_ON(obj->swapto); + GEM_BUG_ON(i915_gem_object_has_pages(obj)); + GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); + GEM_BUG_ON(obj->mm.region->type != INTEL_MEMORY_LOCAL); + + assert_object_held(obj); + + /* create a shadow object on smem region */ + dst = i915_gem_object_create_shmem(i915, obj->base.size); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + /* Share the dma-resv between the shadow- and the parent object */ + dst->base.resv = obj->base.resv; + assert_object_held(dst); + + /* + * create working object on the same region as 'obj', + * if 'obj' is used directly, it is set pages and is pinned + * again, other thread may wrongly use 'obj' pages. + */ + src = i915_gem_object_create_region(obj->mm.region, + obj->base.size, 0); + if (IS_ERR(src)) { + i915_gem_object_put(dst); + return PTR_ERR(src); + } + + /* set and pin working object pages */ + i915_gem_object_lock_isolated(src); + __i915_gem_object_set_pages(src, pages, sizes); + __i915_gem_object_pin_pages(src); + + /* copying the pages */ + err = i915_gem_object_memcpy(dst, src); + + __i915_gem_object_unpin_pages(src); + __i915_gem_object_unset_pages(src); + i915_gem_object_unlock(src); + i915_gem_object_put(src); + + if (!err) + obj->swapto = dst; + else + i915_gem_object_put(dst); + + return err; +} + +static int +i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, + struct sg_table *pages, unsigned int sizes) +{ + struct drm_i915_gem_object *dst, *src; + int err; + + GEM_BUG_ON(!obj->swapto); + GEM_BUG_ON(i915_gem_object_has_pages(obj)); + GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); + GEM_BUG_ON(obj->mm.region->type != INTEL_MEMORY_LOCAL); + + assert_object_held(obj); + + src = obj->swapto; + + /* + * create working object on the same region as 'obj', + * if 'obj' is used directly, it is set pages and is pinned + * again, other thread may wrongly use 'obj' pages. + */ + dst = i915_gem_object_create_region(obj->mm.region, + obj->base.size, 0); + if (IS_ERR(dst)) { + err = PTR_ERR(dst); + return err; + } + + /* @scr is sharing @obj's reservation object */ + assert_object_held(src); + + /* set and pin working object pages */ + i915_gem_object_lock_isolated(dst); + __i915_gem_object_set_pages(dst, pages, sizes); + __i915_gem_object_pin_pages(dst); + + /* copying the pages */ + err = i915_gem_object_memcpy(dst, src); + + __i915_gem_object_unpin_pages(dst); + __i915_gem_object_unset_pages(dst); + i915_gem_object_unlock(dst); + i915_gem_object_put(dst); + + if (!err) { + obj->swapto = NULL; + i915_gem_object_put(src); + } + + return err; +}
void i915_gem_object_put_pages_buddy(struct drm_i915_gem_object *obj, struct sg_table *pages) { + /* if need to save the page contents, swap them out */ + if (obj->do_swapping) { + unsigned int sizes = obj->mm.page_sizes.phys; + + GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); + GEM_BUG_ON(i915_gem_object_is_volatile(obj)); + + if (i915_gem_object_swapout_pages(obj, pages, sizes)) { + /* swapout failed, keep the pages */ + __i915_gem_object_set_pages(obj, pages, sizes); + return; + } + } + __intel_memory_region_put_pages_buddy(obj->mm.region, &obj->mm.blocks);
obj->mm.dirty = false; @@ -95,8 +219,19 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) sg_mark_end(sg); i915_sg_trim(st);
- /* Intended for kernel internal use only */ - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) { + /* if we saved the page contents, swap them in */ + if (obj->swapto) { + GEM_BUG_ON(i915_gem_object_is_volatile(obj)); + + ret = i915_gem_object_swapin_pages(obj, st, + sg_page_sizes); + if (ret) { + /* swapin failed, free the pages */ + __intel_memory_region_put_pages_buddy(mem, blocks); + ret = -ENXIO; + goto err_free_sg; + } + } else if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) { struct scatterlist *sg; unsigned long i;
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 185eab497803..afcd6fe6eaff 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -147,6 +147,11 @@ static int intel_memory_region_evict(struct intel_memory_region *mem,
mutex_unlock(&mem->objects.lock);
+ /* tell callee to do swapping */ + if (i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM) + && pass == 1) + obj->do_swapping = true; + if (!i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE)) { if (i915_gem_object_trylock(obj)) { __i915_gem_object_put_pages(obj); @@ -160,6 +165,7 @@ static int intel_memory_region_evict(struct intel_memory_region *mem, } }
+ obj->do_swapping = false; i915_gem_object_put(obj); mutex_lock(&mem->objects.lock);
From: CQ Tang cq.tang@intel.com
The enable_eviction modparam controls whether eviction is enabled (default) or not.
Signed-off-by: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_region.c | 5 +++++ drivers/gpu/drm/i915/i915_params.c | 3 +++ drivers/gpu/drm/i915/i915_params.h | 1 + drivers/gpu/drm/i915/intel_memory_region.c | 2 +- 5 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 7cb5f137522f..46d0f8731db0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -293,6 +293,7 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj) * If object had been swapped out, free the hidden object. */ if (obj->swapto) { + GEM_BUG_ON(!i915->params.enable_eviction); i915_gem_object_put(obj->swapto); obj->swapto = NULL; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index a437538cd872..e1793c5f8d8c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -21,6 +21,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(i915_gem_object_has_pages(obj)); GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); GEM_BUG_ON(obj->mm.region->type != INTEL_MEMORY_LOCAL); + GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj);
@@ -70,6 +71,7 @@ static int i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, unsigned int sizes) { + struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; int err;
@@ -77,6 +79,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(i915_gem_object_has_pages(obj)); GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); GEM_BUG_ON(obj->mm.region->type != INTEL_MEMORY_LOCAL); + GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj);
@@ -146,6 +149,7 @@ i915_gem_object_put_pages_buddy(struct drm_i915_gem_object *obj, int i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) { + struct drm_i915_private *i915 = to_i915(obj->base.dev); struct intel_memory_region *mem = obj->mm.region; struct list_head *blocks = &obj->mm.blocks; resource_size_t size = obj->base.size; @@ -222,6 +226,7 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) /* if we saved the page contents, swap them in */ if (obj->swapto) { GEM_BUG_ON(i915_gem_object_is_volatile(obj)); + GEM_BUG_ON(!i915->params.enable_eviction);
ret = i915_gem_object_swapin_pages(obj, st, sg_page_sizes); diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index 7f139ea4a90b..bb1ebb6ece95 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -197,6 +197,9 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400, "Fake LMEM start offset (default: 0)"); #endif
+i915_param_named_unsafe(enable_eviction, bool, 0600, + "Enable memcpy based eviction which does not rely on DMA resv refactoring)"); + static __always_inline void _print_param(struct drm_printer *p, const char *name, const char *type, diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h index 330c03e2b4f7..87df407d9afb 100644 --- a/drivers/gpu/drm/i915/i915_params.h +++ b/drivers/gpu/drm/i915/i915_params.h @@ -72,6 +72,7 @@ struct drm_printer; param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \ param(unsigned long, fake_lmem_start, 0, 0400) \ /* leave bools at the end to not create holes */ \ + param(bool, enable_eviction, true, 0600) \ param(bool, enable_hangcheck, true, 0600) \ param(bool, load_detect_test, false, 0600) \ param(bool, force_reset_modeset_test, false, 0600) \ diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index afcd6fe6eaff..57f01ef16628 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -175,7 +175,7 @@ static int intel_memory_region_evict(struct intel_memory_region *mem, list_splice_tail(&still_in_list, *phase); mutex_unlock(&mem->objects.lock);
- if (found < target) { + if (found < target && i915->params.enable_eviction) { pass++; phase++; if (*phase)
On Fri, 27 Nov 2020, Matthew Auld matthew.auld@intel.com wrote:
From: CQ Tang cq.tang@intel.com
The enable_eviction modparam controls whether eviction is enabled (default) or not.
Signed-off-by: Sudeep Dutt sudeep.dutt@intel.com Signed-off-by: CQ Tang cq.tang@intel.com
drivers/gpu/drm/i915/gem/i915_gem_object.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_region.c | 5 +++++ drivers/gpu/drm/i915/i915_params.c | 3 +++ drivers/gpu/drm/i915/i915_params.h | 1 + drivers/gpu/drm/i915/intel_memory_region.c | 2 +- 5 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 7cb5f137522f..46d0f8731db0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -293,6 +293,7 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj) * If object had been swapped out, free the hidden object. */ if (obj->swapto) {
+		GEM_BUG_ON(!i915->params.enable_eviction);
 		i915_gem_object_put(obj->swapto);
 		obj->swapto = NULL;
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index a437538cd872..e1793c5f8d8c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -21,6 +21,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(i915_gem_object_has_pages(obj)); GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); GEM_BUG_ON(obj->mm.region->type != INTEL_MEMORY_LOCAL);
GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj);
@@ -70,6 +71,7 @@ static int i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, unsigned int sizes) {
- struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; int err;
@@ -77,6 +79,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(i915_gem_object_has_pages(obj)); GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED); GEM_BUG_ON(obj->mm.region->type != INTEL_MEMORY_LOCAL);
GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj);
@@ -146,6 +149,7 @@ i915_gem_object_put_pages_buddy(struct drm_i915_gem_object *obj, int i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) {
- struct drm_i915_private *i915 = to_i915(obj->base.dev); struct intel_memory_region *mem = obj->mm.region; struct list_head *blocks = &obj->mm.blocks; resource_size_t size = obj->base.size;
@@ -222,6 +226,7 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) /* if we saved the page contents, swap them in */ if (obj->swapto) { GEM_BUG_ON(i915_gem_object_is_volatile(obj));
GEM_BUG_ON(!i915->params.enable_eviction);
ret = i915_gem_object_swapin_pages(obj, st, sg_page_sizes);
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index 7f139ea4a90b..bb1ebb6ece95 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -197,6 +197,9 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400, "Fake LMEM start offset (default: 0)"); #endif
+i915_param_named_unsafe(enable_eviction, bool, 0600,
- "Enable memcpy based eviction which does not rely on DMA resv refactoring)");
Does the module parameter actually need to be writable? Should it instead be modified via debugfs as a device-specific parameter?
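Something like the following, as a rough sketch of the debugfs alternative; the attribute name is illustrative:

static void i915_eviction_debugfs_register(struct drm_i915_private *i915)
{
	struct drm_minor *minor = i915->drm.primary;

	/* per-device knob instead of a global, writable modparam */
	debugfs_create_bool("i915_enable_eviction", 0600,
			    minor->debugfs_root,
			    &i915->params.enable_eviction);
}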
BR, Jani.
static __always_inline void _print_param(struct drm_printer *p, const char *name, const char *type, diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h index 330c03e2b4f7..87df407d9afb 100644 --- a/drivers/gpu/drm/i915/i915_params.h +++ b/drivers/gpu/drm/i915/i915_params.h @@ -72,6 +72,7 @@ struct drm_printer; param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \ param(unsigned long, fake_lmem_start, 0, 0400) \ /* leave bools at the end to not create holes */ \
- param(bool, enable_eviction, true, 0600) \ param(bool, enable_hangcheck, true, 0600) \ param(bool, load_detect_test, false, 0600) \ param(bool, force_reset_modeset_test, false, 0600) \
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index afcd6fe6eaff..57f01ef16628 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -175,7 +175,7 @@ static int intel_memory_region_evict(struct intel_memory_region *mem, list_splice_tail(&still_in_list, *phase); mutex_unlock(&mem->objects.lock);
- if (found < target) {
+ if (found < target && i915->params.enable_eviction) { pass++; phase++; if (*phase)
From: CQ Tang cq.tang@intel.com
lmem_size is used to limit the amount of lmem available to each region. The default is to use the full lmem size the hardware reports; when set, this modparam is interpreted in MiB units.
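For example, capping each region at 4 GiB could look like this (usage sketch; the value is interpreted in MiB):

    modprobe i915 lmem_size=4096

or, on the kernel command line:

    i915.lmem_size=4096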
Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/i915_params.c | 3 +++ drivers/gpu/drm/i915/i915_params.h | 1 + drivers/gpu/drm/i915/intel_region_lmem.c | 4 ++++ 3 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index bb1ebb6ece95..264de32f3d6a 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -200,6 +200,9 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400, i915_param_named_unsafe(enable_eviction, bool, 0600, "Enable memcpy based eviction which does not rely on DMA resv refactoring)");
+i915_param_named_unsafe(lmem_size, uint, 0400, + "Change lmem size for each region. (default: 0, all memory)"); + static __always_inline void _print_param(struct drm_printer *p, const char *name, const char *type, diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h index 87df407d9afb..be6979e7feda 100644 --- a/drivers/gpu/drm/i915/i915_params.h +++ b/drivers/gpu/drm/i915/i915_params.h @@ -71,6 +71,7 @@ struct drm_printer; param(int, enable_dpcd_backlight, -1, 0600) \ param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \ param(unsigned long, fake_lmem_start, 0, 0400) \ + param(unsigned int, lmem_size, 0, 0400) \ /* leave bools at the end to not create holes */ \ param(bool, enable_eviction, true, 0600) \ param(bool, enable_hangcheck, true, 0600) \ diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index eafef7034680..1cdb6354b968 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -196,6 +196,10 @@ setup_lmem(struct drm_i915_private *dev_priv)
io_start = pci_resource_start(pdev, 2);
+ if (dev_priv->params.lmem_size > 0) + lmem_size = min_t(resource_size_t, lmem_size, + mul_u32_u32(dev_priv->params.lmem_size, SZ_1M)); + mem = intel_memory_region_create(dev_priv, 0, lmem_size,
From: Sudeep Dutt sudeep.dutt@intel.com
Report the number of bytes swapped in and out through debugfs:

    cat /sys/kernel/debug/dri/0/i915_gem_objects
    num_bytes_swapped_out 94170000 num_bytes_swapped_in 56120000
Signed-off-by: Sudeep Dutt sudeep.dutt@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_region.c | 6 ++++++ drivers/gpu/drm/i915/i915_debugfs.c | 3 +++ drivers/gpu/drm/i915/i915_drv.h | 3 +++ 3 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index e1793c5f8d8c..ed108dbcb34e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -64,6 +64,9 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, else i915_gem_object_put(dst);
+ if (!err) + atomic_long_add(sizes, &i915->num_bytes_swapped_out); + return err; }
@@ -118,6 +121,9 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, i915_gem_object_put(src); }
+ if (!err) + atomic_long_add(sizes, &i915->num_bytes_swapped_in); + return err; }
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 6d1482c82694..1b7e9b6ab660 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -372,6 +372,9 @@ static int i915_gem_object_info(struct seq_file *m, void *data) for_each_memory_region(mr, i915, id) seq_printf(m, "%s: total:%pa, available:%pa bytes\n", mr->name, &mr->total, &mr->avail); + seq_printf(m, "num_bytes_swapped_out %ld num_bytes_swapped_in %ld\n", + atomic_long_read(&i915->num_bytes_swapped_out), + atomic_long_read(&i915->num_bytes_swapped_in)); seq_putc(m, '\n');
print_context_stats(m, i915); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1366b53ac8c9..7b1e95d494e6 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1214,6 +1214,9 @@ struct drm_i915_private { * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch * will be rejected. Instead look for a better place. */ + + atomic_long_t num_bytes_swapped_out; + atomic_long_t num_bytes_swapped_in; };
static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
Quoting Matthew Auld (2020-11-27 12:06:49)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1366b53ac8c9..7b1e95d494e6 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1214,6 +1214,9 @@ struct drm_i915_private { * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch * will be rejected. Instead look for a better place. */
atomic_long_t num_bytes_swapped_out;
atomic_long_t num_bytes_swapped_in;
Enough said. Don't mindlessly add fields. -Chris
From: Sudeep Dutt sudeep.dutt@intel.com
Signed-off-by: Sudeep Dutt sudeep.dutt@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_region.c | 16 ++++++++++++++-- drivers/gpu/drm/i915/i915_debugfs.c | 3 +++ drivers/gpu/drm/i915/i915_drv.h | 2 ++ 3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index ed108dbcb34e..4fab9f6b4bee 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -15,6 +15,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; + unsigned long start, diff, msec; int err;
GEM_BUG_ON(obj->swapto); @@ -24,6 +25,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj); + start = jiffies;
/* create a shadow object on smem region */ dst = i915_gem_object_create_shmem(i915, obj->base.size); @@ -64,8 +66,12 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, else i915_gem_object_put(dst);
- if (!err) + if (!err) { + diff = jiffies - start; + msec = diff * 1000 / HZ; + atomic_long_add(msec, &i915->time_swap_out_ms); atomic_long_add(sizes, &i915->num_bytes_swapped_out); + }
return err; } @@ -76,6 +82,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; + unsigned long start, diff, msec; int err;
GEM_BUG_ON(!obj->swapto); @@ -85,6 +92,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj); + start = jiffies;
src = obj->swapto;
@@ -121,8 +129,12 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, i915_gem_object_put(src); }
- if (!err) + if (!err) { + diff = jiffies - start; + msec = diff * 1000 / HZ; + atomic_long_add(msec, &i915->time_swap_in_ms); atomic_long_add(sizes, &i915->num_bytes_swapped_in); + }
return err; } diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 1b7e9b6ab660..2bf51dd9de7c 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -375,6 +375,9 @@ static int i915_gem_object_info(struct seq_file *m, void *data) seq_printf(m, "num_bytes_swapped_out %ld num_bytes_swapped_in %ld\n", atomic_long_read(&i915->num_bytes_swapped_out), atomic_long_read(&i915->num_bytes_swapped_in)); + seq_printf(m, "time_swap_out_msec %ld time_swap_in_msec %ld\n", + atomic_long_read(&i915->time_swap_out_ms), + atomic_long_read(&i915->time_swap_in_ms)); seq_putc(m, '\n');
print_context_stats(m, i915); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7b1e95d494e6..10823abab224 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,8 @@ struct drm_i915_private {
atomic_long_t num_bytes_swapped_out; atomic_long_t num_bytes_swapped_in; + atomic_long_t time_swap_out_ms; + atomic_long_t time_swap_in_ms; };
static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
Quoting Matthew Auld (2020-11-27 12:06:50)
From: Sudeep Dutt sudeep.dutt@intel.com
Signed-off-by: Sudeep Dutt sudeep.dutt@intel.com
drivers/gpu/drm/i915/gem/i915_gem_region.c | 16 ++++++++++++++-- drivers/gpu/drm/i915/i915_debugfs.c | 3 +++ drivers/gpu/drm/i915/i915_drv.h | 2 ++ 3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index ed108dbcb34e..4fab9f6b4bee 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -15,6 +15,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src;
unsigned long start, diff, msec; int err; GEM_BUG_ON(obj->swapto);
@@ -24,6 +25,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj);
start = jiffies; /* create a shadow object on smem region */ dst = i915_gem_object_create_shmem(i915, obj->base.size);
@@ -64,8 +66,12 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, else i915_gem_object_put(dst);
if (!err)
if (!err) {
diff = jiffies - start;
msec = diff * 1000 / HZ;
atomic_long_add(msec, &i915->time_swap_out_ms); atomic_long_add(sizes, &i915->num_bytes_swapped_out);
}
This can be done using a kprobe, and with prettier statistics as builtin functionality. -Chris
From: Ramalingam C ramalingam.c@intel.com
Add a function to retrieve a partial set of pages from an object, starting at the given page offset. It is factored out of intel_partial_pages() so that the window blt copy feature, introduced in forthcoming patches, can reuse it.
The function takes the sg_table to be filled in with pages and also passes back a pointer to the last scatterlist used. Trimming the sg_table is left to the caller.
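A minimal caller sketch, following the same pattern the reworked intel_partial_pages() below uses (error handling elided):

    struct scatterlist *last;
    int ret;

    ret = sg_alloc_table(st, page_count, GFP_KERNEL);
    if (ret)
            return ret;

    /* Fill st with the pages backing [obj_offset, obj_offset + page_count). */
    intel_partial_pages_for_sg_table(obj, st, obj_offset, page_count, &last);

    /* Drop any unused tail entries. */
    i915_sg_trim(st);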
Signed-off-by: Ramalingam C ramalingam.c@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 59 +++++++++++++++++----------- drivers/gpu/drm/i915/gt/intel_gtt.h | 4 ++ 2 files changed, 40 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index eed5b640e493..21804c4cef9c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -1383,25 +1383,17 @@ intel_remap_pages(struct intel_remapped_info *rem_info, return ERR_PTR(ret); }
-static noinline struct sg_table * -intel_partial_pages(const struct i915_ggtt_view *view, - struct drm_i915_gem_object *obj) +void intel_partial_pages_for_sg_table(struct drm_i915_gem_object *obj, + struct sg_table *st, + u32 obj_offset, u32 page_count, + struct scatterlist **sgl) { - struct sg_table *st; struct scatterlist *sg, *iter; - unsigned int count = view->partial.size; unsigned int offset; - int ret = -ENOMEM;
- st = kmalloc(sizeof(*st), GFP_KERNEL); - if (!st) - goto err_st_alloc; + GEM_BUG_ON(!st);
- ret = sg_alloc_table(st, count, GFP_KERNEL); - if (ret) - goto err_sg_alloc; - - iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset, true); + iter = i915_gem_object_get_sg_dma(obj, obj_offset, &offset, true); GEM_BUG_ON(!iter);
sg = st->sgl; @@ -1410,30 +1402,51 @@ intel_partial_pages(const struct i915_ggtt_view *view, unsigned int len;
len = min(sg_dma_len(iter) - (offset << PAGE_SHIFT), - count << PAGE_SHIFT); + page_count << PAGE_SHIFT); + sg_set_page(sg, NULL, len, 0); sg_dma_address(sg) = sg_dma_address(iter) + (offset << PAGE_SHIFT); sg_dma_len(sg) = len;
st->nents++; - count -= len >> PAGE_SHIFT; - if (count == 0) { + page_count -= len >> PAGE_SHIFT; + if (page_count == 0) { sg_mark_end(sg); - i915_sg_trim(st); /* Drop any unused tail entries. */ + if (sgl) + *sgl = sg;
- return st; + return; }
sg = __sg_next(sg); iter = __sg_next(iter); offset = 0; } while (1); +}
-err_sg_alloc: - kfree(st); -err_st_alloc: - return ERR_PTR(ret); +static noinline struct sg_table * +intel_partial_pages(const struct i915_ggtt_view *view, + struct drm_i915_gem_object *obj) +{ + struct sg_table *st; + int ret; + + st = kmalloc(sizeof(*st), GFP_KERNEL); + if (!st) + return ERR_PTR(-ENOMEM); + + ret = sg_alloc_table(st, view->partial.size, GFP_KERNEL); + if (ret) { + kfree(st); + return ERR_PTR(ret); + } + + intel_partial_pages_for_sg_table(obj, st, view->partial.offset, + view->partial.size, NULL); + i915_sg_trim(st); + + return st; }
static int diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index db3626c0ee20..37d2c692c0af 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -506,6 +506,10 @@ static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt) return ggtt->mappable_end > 0; }
+void intel_partial_pages_for_sg_table(struct drm_i915_gem_object *obj, + struct sg_table *st, + u32 obj_offset, u32 page_count, + struct scatterlist **sgl); int i915_ppgtt_init_hw(struct intel_gt *gt);
struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt);
From: Ramalingam C ramalingam.c@intel.com
Define functions for window_blt_copy to create and destroy dummy vmas: virtual memory mappings which do not have any associated objects.
The window_blt_copy feature uses these dummy vmas to attach a set of pages, create PTEs for them at runtime, and submit the result for blt copy.
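A minimal usage sketch (BLT_WINDOW_SZ and the memory region come from the window blt copy patch that follows):

    struct i915_vma *vma;

    vma = i915_alloc_window_vma(i915, vm, BLT_WINDOW_SZ, mem->min_page_size);
    if (IS_ERR(vma))
            return PTR_ERR(vma);

    /* ... attach pages, write PTEs, submit blt copies ... */

    i915_destroy_window_vma(vma);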
Signed-off-by: Ramalingam C ramalingam.c@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/i915_vma.c | 38 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_vma.h | 6 ++++++ 2 files changed, 44 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 59fe82af48b2..5537950e310f 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -100,6 +100,44 @@ static void __i915_vma_retire(struct i915_active *ref) i915_vma_put(active_to_vma(ref)); }
+struct i915_vma * +i915_alloc_window_vma(struct drm_i915_private *i915, + struct i915_address_space *vm, u64 size, + u64 min_page_size) +{ + struct i915_vma *vma; + + vma = i915_vma_alloc(); + if (!vma) + return ERR_PTR(-ENOMEM); + + kref_init(&vma->ref); + mutex_init(&vma->pages_mutex); + vma->vm = i915_vm_get(vm); + vma->ops = &vm->vma_ops; + vma->obj = NULL; + vma->resv = NULL; + vma->size = size; + vma->display_alignment = I915_GTT_MIN_ALIGNMENT; + vma->page_sizes.sg = min_page_size; + + i915_active_init(&vma->active, __i915_vma_active, __i915_vma_retire); + INIT_LIST_HEAD(&vma->closed_link); + + GEM_BUG_ON(!IS_ALIGNED(vma->size, I915_GTT_PAGE_SIZE)); + GEM_BUG_ON(i915_is_ggtt(vm)); + + return vma; +} + +void i915_destroy_window_vma(struct i915_vma *vma) +{ + i915_active_fini(&vma->active); + i915_vm_put(vma->vm); + mutex_destroy(&vma->pages_mutex); + i915_vma_free(vma); +} + static struct i915_vma * vma_create(struct drm_i915_gem_object *obj, struct i915_address_space *vm, diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 2db4f25b8d5f..f595fe706010 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -44,6 +44,12 @@ i915_vma_instance(struct drm_i915_gem_object *obj, struct i915_address_space *vm, const struct i915_ggtt_view *view);
+struct i915_vma * +i915_alloc_window_vma(struct drm_i915_private *i915, + struct i915_address_space *vm, u64 size, + u64 min_page_size); +void i915_destroy_window_vma(struct i915_vma *vma); + void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags); #define I915_VMA_RELEASE_MAP BIT(0)
From: Ramalingam C ramalingam.c@intel.com
To avoid the locking issues in vma handling during the blt copy of objects, dummy vmas with sg_tables of size BLT_WINDOW_SZ (i.e. BLT_WINDOW_SZ / PAGE_SIZE scatterlist entries) are created at driver load for the source and destination objects.
Two sets of these vmas are created, one for lmem and another for smem, so that object contents can be blt copied whether the objects belong to the same mem region or to different ones.
When a blitter copy is required between objects, up to BLT_WINDOW_SZ worth of pages from both the source and destination objects are assigned to the dummy vma->pages of the corresponding windows.
PTEs are then created at runtime for the attached pages, batch commands covering the pages in the window are emitted into the BCS ring, and the request is submitted to BCS0.
The above process runs in a loop until all pages of the source object have been copied into the destination object, as sketched below.
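Condensed, the copy loop in the patch has this shape (a sketch; window locking, sg bookkeeping and error handling elided):

    do {
            cur_win_sz = min_t(u64, BLT_WINDOW_SZ, src->base.size - blt_copied);

            /* Point both window vmas at the next chunk and map it. */
            intel_partial_pages_for_sg_table(src, src_vma->pages,
                                             blt_copied >> PAGE_SHIFT,
                                             cur_win_sz >> PAGE_SHIFT, &last_sgl);
            i915_insert_vma_pages(src_vma, src_is_lmem);
            intel_partial_pages_for_sg_table(dst, dst_vma->pages,
                                             blt_copied >> PAGE_SHIFT,
                                             cur_win_sz >> PAGE_SHIFT, &last_sgl);
            i915_insert_vma_pages(dst_vma, dst_is_lmem);

            /* Emit the copy for this chunk and wait for it to complete. */
            rq = i915_request_create(ce);
            i915_window_blt_copy_batch_prepare(rq, src_vma, dst_vma, cur_win_sz);
            i915_request_add(rq);
            i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);

            blt_copied += cur_win_sz;
    } while (blt_copied != src->base.size);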
Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com Suggested-by: Daniel Vetter daniel.vetter@ffwll.ch Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 355 +++++++++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_object.h | 5 + drivers/gpu/drm/i915/i915_drv.c | 11 + drivers/gpu/drm/i915/i915_drv.h | 6 + 4 files changed, 377 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 46d0f8731db0..3943a184fbe3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -22,11 +22,13 @@ * */
+#include <drm/drm_print.h> #include <linux/sched/mm.h>
#include "display/intel_frontbuffer.h" #include "gt/intel_gt.h" #include "gt/intel_gt_requests.h" +#include "gt/intel_ring.h" #include "i915_drv.h" #include "i915_gem_clflush.h" #include "i915_gem_context.h" @@ -718,6 +720,359 @@ static const struct drm_gem_object_funcs i915_gem_object_funcs = { .export = i915_gem_prime_export, };
+#define BLT_WINDOW_SZ SZ_4M +static int i915_alloc_vm_range(struct i915_vma *vma) +{ + struct i915_vm_pt_stash stash = {}; + int err; + struct i915_gem_ww_ctx ww; + + err = i915_vm_alloc_pt_stash(vma->vm, &stash, vma->size); + if (err) + return err; + + for_i915_gem_ww(&ww, err, false) { + err = i915_vm_lock_objects(vma->vm, &ww); + if (err) + continue; + + dma_resv_assert_held(&vma->vm->resv); + + err = i915_vm_map_pt_stash(vma->vm, &stash); + if (err) + continue; + + vma->vm->allocate_va_range(vma->vm, &stash, + vma->node.start, vma->size); + + set_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma)); + /* Implicit unlock */ + } + + i915_vm_free_pt_stash(vma->vm, &stash); + + return err; +} + +static inline void i915_insert_vma_pages(struct i915_vma *vma, bool is_lmem) +{ + enum i915_cache_level cache_level = I915_CACHE_NONE; + + vma->vm->insert_entries(vma->vm, vma, cache_level, + is_lmem ? PTE_LM : 0); + wmb(); +} + +static struct i915_vma * +i915_window_vma_init(struct drm_i915_private *i915, + struct intel_memory_region *mem) +{ + struct intel_context *ce = i915->gt.engine[BCS0]->blitter_context; + struct i915_address_space *vm = ce->vm; + struct i915_vma *vma; + int ret; + + vma = i915_alloc_window_vma(i915, vm, BLT_WINDOW_SZ, + mem->min_page_size); + if (IS_ERR(vma)) { + DRM_ERROR("window vma alloc failed(%ld)\n", PTR_ERR(vma)); + return vma; + } + + vma->pages = kmalloc(sizeof(*vma->pages), GFP_KERNEL); + if (!vma->pages) { + ret = -ENOMEM; + DRM_ERROR("page alloc failed. %d", ret); + goto err_page; + } + + ret = sg_alloc_table(vma->pages, BLT_WINDOW_SZ / PAGE_SIZE, + GFP_KERNEL); + if (ret) { + DRM_ERROR("sg alloc table failed(%d)", ret); + goto err_sg_table; + } + + mutex_lock(&vm->mutex); + ret = drm_mm_insert_node_in_range(&vm->mm, &vma->node, + BLT_WINDOW_SZ, BLT_WINDOW_SZ, + I915_COLOR_UNEVICTABLE, + 0, vm->total, + DRM_MM_INSERT_LOW); + mutex_unlock(&vm->mutex); + if (ret) { + DRM_ERROR("drm_mm_insert_node_in_range failed. %d\n", ret); + goto err_mm_node; + } + + ret = i915_alloc_vm_range(vma); + if (ret) { + DRM_ERROR("src: Page table alloc failed(%d)\n", ret); + goto err_alloc; + } + + return vma; + +err_alloc: + mutex_lock(&vm->mutex); + drm_mm_remove_node(&vma->node); + mutex_unlock(&vm->mutex); +err_mm_node: + sg_free_table(vma->pages); +err_sg_table: + kfree(vma->pages); +err_page: + i915_destroy_window_vma(vma); + + return ERR_PTR(ret); +} + +static void i915_window_vma_teardown(struct i915_vma *vma) +{ + vma->vm->clear_range(vma->vm, vma->node.start, vma->size); + drm_mm_remove_node(&vma->node); + sg_free_table(vma->pages); + kfree(vma->pages); + i915_destroy_window_vma(vma); +} + +int i915_setup_blt_windows(struct drm_i915_private *i915) +{ + struct intel_memory_region *lmem_region = + intel_memory_region_by_type(i915, INTEL_MEMORY_LOCAL); + struct intel_memory_region *smem_region = + intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM); + struct i915_vma *lmem[2]; + struct i915_vma *smem[2]; + int ret, i; + + if (intel_gt_is_wedged(&i915->gt)) { + drm_dbg(&i915->drm, "GT0 is wedged; BCS0 not available\n"); + return -EIO; + } + + if (!i915->gt.engine[BCS0]) { + DRM_DEBUG("No BCS0 engine, hence blt evict is not setup\n"); + return 0; + } + + mutex_init(&i915->mm.window_mutex); + for (i = 0; i < ARRAY_SIZE(lmem); i++) { + lmem[i] = i915_window_vma_init(i915, lmem_region); + if (IS_ERR_OR_NULL(lmem[i])) { + ret = PTR_ERR(lmem[i]); + DRM_ERROR("Err for lmem[%d]. 
%d\n", i, ret); + if (i--) + for (; i >= 0; i--) + i915_window_vma_teardown(lmem[i]); + return ret; + } + i915->mm.lmem_window[i] = lmem[i]; + GEM_BUG_ON(!i915->mm.lmem_window[i]); + } + + for (i = 0; i < ARRAY_SIZE(smem); i++) { + smem[i] = i915_window_vma_init(i915, smem_region); + if (IS_ERR_OR_NULL(smem[i])) { + ret = PTR_ERR(smem[i]); + DRM_ERROR("Err for smem[%d]. %d\n", i, ret); + if (i--) + for (; i >= 0; i--) + i915_window_vma_teardown(smem[i]); + for (i = 0; i < ARRAY_SIZE(lmem); i++) + i915_window_vma_teardown(lmem[i]); + return ret; + } + i915->mm.smem_window[i] = smem[i]; + GEM_BUG_ON(!i915->mm.smem_window[i]); + } + + return 0; +} + +void i915_teardown_blt_windows(struct drm_i915_private *i915) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(i915->mm.lmem_window); i++) { + if (!i915->mm.lmem_window[i]) + continue; + i915_window_vma_teardown(i915->mm.lmem_window[i]); + } + for (i = 0; i < ARRAY_SIZE(i915->mm.smem_window); i++) { + if (!i915->mm.smem_window[i]) + continue; + i915_window_vma_teardown(i915->mm.smem_window[i]); + } + mutex_destroy(&i915->mm.window_mutex); +} + +static int i915_window_blt_copy_prepare_obj(struct drm_i915_gem_object *obj) +{ + int ret; + + ret = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + if (ret) + return ret; + + return i915_gem_object_pin_pages(obj); +} + +static int +i915_window_blt_copy_batch_prepare(struct i915_request *rq, + struct i915_vma *src, + struct i915_vma *dst, size_t size) +{ + u32 *cmd; + + GEM_BUG_ON(size > BLT_WINDOW_SZ); + cmd = intel_ring_begin(rq, 10); + if (IS_ERR(cmd)) + return PTR_ERR(cmd); + + GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX); + GEM_BUG_ON(INTEL_GEN(rq->engine->i915) < 9); + + *cmd++ = GEN9_XY_FAST_COPY_BLT_CMD | (10 - 2); + *cmd++ = BLT_DEPTH_32 | PAGE_SIZE; + *cmd++ = 0; + *cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; + *cmd++ = lower_32_bits(dst->node.start); + *cmd++ = upper_32_bits(dst->node.start); + *cmd++ = 0; + *cmd++ = PAGE_SIZE; + *cmd++ = lower_32_bits(src->node.start); + *cmd++ = upper_32_bits(src->node.start); + intel_ring_advance(rq, cmd); + + return 0; +} + +int i915_window_blt_copy(struct drm_i915_gem_object *dst, + struct drm_i915_gem_object *src) +{ + struct drm_i915_private *i915 = to_i915(src->base.dev); + struct intel_context *ce = i915->gt.engine[BCS0]->blitter_context; + bool src_is_lmem = i915_gem_object_is_lmem(src); + bool dst_is_lmem = i915_gem_object_is_lmem(dst); + struct scatterlist *last_sgl; + struct i915_vma *src_vma, *dst_vma; + struct i915_request *rq; + u64 cur_win_sz, blt_copied, offset; + long timeout; + u32 size; + int err; + + src_vma = src_is_lmem ? i915->mm.lmem_window[0] : + i915->mm.smem_window[0]; + dst_vma = dst_is_lmem ? i915->mm.lmem_window[1] : + i915->mm.smem_window[1]; + + if (!src_vma || !dst_vma) + return -ENODEV; + + blt_copied = 0; + + err = i915_window_blt_copy_prepare_obj(src); + if (err) + return err; + + err = i915_window_blt_copy_prepare_obj(dst); + if (err) { + i915_gem_object_unpin_pages(src); + return err; + } + + mutex_lock(&i915->mm.window_mutex); + src_vma->obj = src; + dst_vma->obj = dst; + do { + cur_win_sz = min_t(u64, BLT_WINDOW_SZ, + (src->base.size - blt_copied)); + offset = blt_copied >> PAGE_SHIFT; + size = ALIGN(cur_win_sz, src->mm.region->min_page_size) >> + PAGE_SHIFT; + intel_partial_pages_for_sg_table(src, src_vma->pages, offset, + size, &last_sgl); + + /* + * Insert pages into vm, expects the pages to the full + * length of VMA. But we may have the pages of <= vma_size. 
+ * Hence altering the vma size to match the total size of + * the pages attached. + */ + src_vma->size = size << PAGE_SHIFT; + i915_insert_vma_pages(src_vma, src_is_lmem); + sg_unmark_end(last_sgl); + + /* + * Source obj size could be smaller than the dst obj size, + * due to the varying min_page_size of the mem regions the + * obj belongs to. But when we insert the pages into vm, + * the total size of the pages supposed to be multiples of + * the min page size of that mem region. + */ + size = ALIGN(cur_win_sz, dst->mm.region->min_page_size) >> + PAGE_SHIFT; + intel_partial_pages_for_sg_table(dst, dst_vma->pages, offset, + size, &last_sgl); + + dst_vma->size = size << PAGE_SHIFT; + i915_insert_vma_pages(dst_vma, dst_is_lmem); + sg_unmark_end(last_sgl); + + rq = i915_request_create(ce); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + break; + } + if (rq->engine->emit_init_breadcrumb) { + err = rq->engine->emit_init_breadcrumb(rq); + if (unlikely(err)) { + DRM_ERROR("init_breadcrumb failed. %d\n", err); + break; + } + } + err = i915_window_blt_copy_batch_prepare(rq, src_vma, dst_vma, + cur_win_sz); + if (err) { + DRM_ERROR("Batch preparation failed. %d\n", err); + i915_request_set_error_once(rq, -EIO); + } + + i915_request_get(rq); + i915_request_add(rq); + + timeout = i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT); + if (timeout < 0) { + DRM_ERROR("BLT Request is not completed. %ld\n", + timeout); + err = timeout; + i915_request_put(rq); + break; + } + + blt_copied += cur_win_sz; + err = 0; + i915_request_put(rq); + flush_work(&i915->gt.engine[BCS0]->retire_work); + } while (src->base.size != blt_copied); + + src_vma->size = BLT_WINDOW_SZ; + dst_vma->size = BLT_WINDOW_SZ; + src_vma->obj = NULL; + dst_vma->obj = NULL; + mutex_unlock(&i915->mm.window_mutex); + + dst->mm.dirty = true; + i915_gem_object_unpin_pages(src); + i915_gem_object_unpin_pages(dst); + + return err; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/huge_gem_object.c" #include "selftests/huge_pages.c" diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index ee1914ed2070..52a36b4052f0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -598,4 +598,9 @@ static inline int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object static inline void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); } #endif
+int i915_window_blt_copy(struct drm_i915_gem_object *dst, + struct drm_i915_gem_object *src); +int i915_setup_blt_windows(struct drm_i915_private *i915); +void i915_teardown_blt_windows(struct drm_i915_private *i915); + #endif diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index f4540c048cd9..683643b211fa 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -891,6 +891,12 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
i915_driver_register(i915);
+ if (HAS_LMEM(i915)) { + ret = i915_setup_blt_windows(i915); + if (ret) + goto out_cleanup_drv_register; + } + enable_rpm_wakeref_asserts(&i915->runtime_pm);
i915_welcome_messages(i915); @@ -899,6 +905,8 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
+out_cleanup_drv_register: + i915_driver_unregister(i915); out_cleanup_gem: i915_gem_suspend(i915); i915_gem_driver_remove(i915); @@ -931,6 +939,9 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
void i915_driver_remove(struct drm_i915_private *i915) { + if (HAS_LMEM(i915)) + i915_teardown_blt_windows(i915); + disable_rpm_wakeref_asserts(&i915->runtime_pm);
i915_driver_unregister(i915); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 10823abab224..07da059640a1 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -595,6 +595,12 @@ struct i915_gem_mm { /* shrinker accounting, also useful for userland debugging */ u64 shrink_memory; u32 shrink_count; + + struct i915_vma *lmem_window[2]; + struct i915_vma *smem_window[2]; + + /* To protect above two set of vmas */ + struct mutex window_mutex; };
#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
Quoting Matthew Auld (2020-11-27 12:06:53)
+int i915_window_blt_copy(struct drm_i915_gem_object *dst,
struct drm_i915_gem_object *src)
+{
struct drm_i915_private *i915 = to_i915(src->base.dev);
struct intel_context *ce = i915->gt.engine[BCS0]->blitter_context;
bool src_is_lmem = i915_gem_object_is_lmem(src);
bool dst_is_lmem = i915_gem_object_is_lmem(dst);
struct scatterlist *last_sgl;
struct i915_vma *src_vma, *dst_vma;
struct i915_request *rq;
u64 cur_win_sz, blt_copied, offset;
long timeout;
u32 size;
int err;
src_vma = src_is_lmem ? i915->mm.lmem_window[0] :
i915->mm.smem_window[0];
dst_vma = dst_is_lmem ? i915->mm.lmem_window[1] :
i915->mm.smem_window[1];
if (!src_vma || !dst_vma)
return -ENODEV;
blt_copied = 0;
err = i915_window_blt_copy_prepare_obj(src);
if (err)
return err;
err = i915_window_blt_copy_prepare_obj(dst);
if (err) {
i915_gem_object_unpin_pages(src);
return err;
}
mutex_lock(&i915->mm.window_mutex);
src_vma->obj = src;
dst_vma->obj = dst;
do {
cur_win_sz = min_t(u64, BLT_WINDOW_SZ,
(src->base.size - blt_copied));
offset = blt_copied >> PAGE_SHIFT;
size = ALIGN(cur_win_sz, src->mm.region->min_page_size) >>
PAGE_SHIFT;
intel_partial_pages_for_sg_table(src, src_vma->pages, offset,
size, &last_sgl);
/*
* Insert pages into vm, expects the pages to the full
* length of VMA. But we may have the pages of <= vma_size.
* Hence altering the vma size to match the total size of
* the pages attached.
*/
src_vma->size = size << PAGE_SHIFT;
i915_insert_vma_pages(src_vma, src_is_lmem);
sg_unmark_end(last_sgl);
/*
* Source obj size could be smaller than the dst obj size,
* due to the varying min_page_size of the mem regions the
* obj belongs to. But when we insert the pages into vm,
* the total size of the pages supposed to be multiples of
* the min page size of that mem region.
*/
size = ALIGN(cur_win_sz, dst->mm.region->min_page_size) >>
PAGE_SHIFT;
intel_partial_pages_for_sg_table(dst, dst_vma->pages, offset,
size, &last_sgl);
dst_vma->size = size << PAGE_SHIFT;
i915_insert_vma_pages(dst_vma, dst_is_lmem);
sg_unmark_end(last_sgl);
rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
break;
}
if (rq->engine->emit_init_breadcrumb) {
err = rq->engine->emit_init_breadcrumb(rq);
if (unlikely(err)) {
DRM_ERROR("init_breadcrumb failed. %d\n", err);
break;
}
}
err = i915_window_blt_copy_batch_prepare(rq, src_vma, dst_vma,
cur_win_sz);
if (err) {
DRM_ERROR("Batch preparation failed. %d\n", err);
i915_request_set_error_once(rq, -EIO);
}
i915_request_get(rq);
i915_request_add(rq);
timeout = i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
Locked waits.
if (timeout < 0) {
DRM_ERROR("BLT Request is not completed. %ld\n",
timeout);
err = timeout;
i915_request_put(rq);
break;
}
blt_copied += cur_win_sz;
err = 0;
i915_request_put(rq);
flush_work(&i915->gt.engine[BCS0]->retire_work);
Papering (doubtful the paper is successful) over bugs by introducing a whole load more.
This fails the basic premise that eviction must be pipelined. The PTEs are transient and can be written prior to the copy and kept within the non-preemptible window of the blt, thus allowing many evictions to be scheduled in parallel (by either allocating separate contexts, or more preferably picking a user context). -Chris
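A sketch of the pipelined shape being asked for (entirely hypothetical: emit_window_ptes() and emit_window_copy() are stand-in names for emitting the transient PTE writes and the copy from within the same request):

    rq = i915_request_create(ce);
    /* Write the window PTEs from the ring, ahead of the copy, so the
     * mapping stays valid for the non-preemptible span of the blt. */
    err = emit_window_ptes(rq, src_vma, dst_vma);            /* hypothetical */
    if (!err)
            err = emit_window_copy(rq, src_vma, dst_vma, chunk); /* hypothetical */
    i915_request_add(rq);   /* no synchronous wait; fences chain evictions */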
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
We can eliminate the current evict window mutex, held over the whole eviction process, and replace it with a wait queue which takes over the role of co-ordinating access to pre-configured window copy vmas.
Apart from the global lock no longer being held over the whole of the copy, an additional benefit is that, since we have two pairs of copy windows, two evict operations can now progress independently (one swap-in plus one swap-out).
Also consolidate some of the eviction code into helper functions for readability and fix cleanup if emit_init_breadcrumb fails.
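Condensed, the claim/release pattern looks like this (a sketch; *ps and *pd are the per-direction window slots from the patch below):

    spin_lock(&i915->mm.window_queue.lock);
    err = wait_event_interruptible_locked(i915->mm.window_queue,
                                          *ps && *pd);
    if (!err) {
            src_vma = *ps;
            dst_vma = *pd;
            *ps = NULL;     /* claim both windows */
            *pd = NULL;
    }
    spin_unlock(&i915->mm.window_queue.lock);

    /* ... copy loop runs without any global lock held ... */

    spin_lock(&i915->mm.window_queue.lock);
    *ps = src_vma;          /* release the windows and wake the next waiter */
    *pd = dst_vma;
    wake_up_locked(&i915->mm.window_queue);
    spin_unlock(&i915->mm.window_queue.lock);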
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 144 ++++++++++++--------- drivers/gpu/drm/i915/i915_drv.h | 2 +- 2 files changed, 85 insertions(+), 61 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 3943a184fbe3..34bbefa6d67f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -856,7 +856,8 @@ int i915_setup_blt_windows(struct drm_i915_private *i915) return 0; }
- mutex_init(&i915->mm.window_mutex); + init_waitqueue_head(&i915->mm.window_queue); + for (i = 0; i < ARRAY_SIZE(lmem); i++) { lmem[i] = i915_window_vma_init(i915, lmem_region); if (IS_ERR_OR_NULL(lmem[i])) { @@ -904,7 +905,6 @@ void i915_teardown_blt_windows(struct drm_i915_private *i915) continue; i915_window_vma_teardown(i915->mm.smem_window[i]); } - mutex_destroy(&i915->mm.window_mutex); }
static int i915_window_blt_copy_prepare_obj(struct drm_i915_gem_object *obj) @@ -950,6 +950,36 @@ i915_window_blt_copy_batch_prepare(struct i915_request *rq, return 0; }
+static void prepare_vma(struct i915_vma *vma, + struct drm_i915_gem_object *obj, + u32 offset, + u32 chunk, + bool is_lmem) +{ + struct scatterlist *sgl; + u32 size; + + /* + * Source obj size could be smaller than the dst obj size, + * due to the varying min_page_size of the mem regions the + * obj belongs to. But when we insert the pages into vm, + * the total size of the pages supposed to be multiples of + * the min page size of that mem region. + */ + size = ALIGN(chunk, obj->mm.region->min_page_size) >> PAGE_SHIFT; + intel_partial_pages_for_sg_table(obj, vma->pages, offset, size, &sgl); + + /* + * Insert pages into vm, expects the pages to the full + * length of VMA. But we may have the pages of <= vma_size. + * Hence altering the vma size to match the total size of + * the pages attached. + */ + vma->size = size << PAGE_SHIFT; + i915_insert_vma_pages(vma, is_lmem); + sg_unmark_end(sgl); +} + int i915_window_blt_copy(struct drm_i915_gem_object *dst, struct drm_i915_gem_object *src) { @@ -957,24 +987,10 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst, struct intel_context *ce = i915->gt.engine[BCS0]->blitter_context; bool src_is_lmem = i915_gem_object_is_lmem(src); bool dst_is_lmem = i915_gem_object_is_lmem(dst); - struct scatterlist *last_sgl; - struct i915_vma *src_vma, *dst_vma; - struct i915_request *rq; - u64 cur_win_sz, blt_copied, offset; - long timeout; - u32 size; + u64 remain = src->base.size, offset = 0; + struct i915_vma *src_vma, *dst_vma, **ps, **pd; int err;
- src_vma = src_is_lmem ? i915->mm.lmem_window[0] : - i915->mm.smem_window[0]; - dst_vma = dst_is_lmem ? i915->mm.lmem_window[1] : - i915->mm.smem_window[1]; - - if (!src_vma || !dst_vma) - return -ENODEV; - - blt_copied = 0; - err = i915_window_blt_copy_prepare_obj(src); if (err) return err; @@ -985,43 +1001,42 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst, return err; }
- mutex_lock(&i915->mm.window_mutex); + ps = src_is_lmem ? &i915->mm.lmem_window[0] : + &i915->mm.smem_window[0]; + pd = dst_is_lmem ? &i915->mm.lmem_window[1] : + &i915->mm.smem_window[1]; + + spin_lock(&i915->mm.window_queue.lock); + + err = wait_event_interruptible_locked(i915->mm.window_queue, + *ps && *pd); + if (err) { + spin_unlock(&i915->mm.window_queue.lock); + i915_gem_object_unpin_pages(src); + i915_gem_object_unpin_pages(dst); + return err; + } + + src_vma = *ps; + dst_vma = *pd; + src_vma->obj = src; dst_vma->obj = dst; - do { - cur_win_sz = min_t(u64, BLT_WINDOW_SZ, - (src->base.size - blt_copied)); - offset = blt_copied >> PAGE_SHIFT; - size = ALIGN(cur_win_sz, src->mm.region->min_page_size) >> - PAGE_SHIFT; - intel_partial_pages_for_sg_table(src, src_vma->pages, offset, - size, &last_sgl);
- /* - * Insert pages into vm, expects the pages to the full - * length of VMA. But we may have the pages of <= vma_size. - * Hence altering the vma size to match the total size of - * the pages attached. - */ - src_vma->size = size << PAGE_SHIFT; - i915_insert_vma_pages(src_vma, src_is_lmem); - sg_unmark_end(last_sgl); + *ps = NULL; + *pd = NULL;
- /* - * Source obj size could be smaller than the dst obj size, - * due to the varying min_page_size of the mem regions the - * obj belongs to. But when we insert the pages into vm, - * the total size of the pages supposed to be multiples of - * the min page size of that mem region. - */ - size = ALIGN(cur_win_sz, dst->mm.region->min_page_size) >> - PAGE_SHIFT; - intel_partial_pages_for_sg_table(dst, dst_vma->pages, offset, - size, &last_sgl); + spin_unlock(&i915->mm.window_queue.lock); + + do { + struct i915_request *rq; + long timeout; + u32 chunk;
- dst_vma->size = size << PAGE_SHIFT; - i915_insert_vma_pages(dst_vma, dst_is_lmem); - sg_unmark_end(last_sgl); + chunk = min_t(u64, BLT_WINDOW_SZ, remain); + + prepare_vma(src_vma, src, offset, chunk, src_is_lmem); + prepare_vma(dst_vma, dst, offset, chunk, dst_is_lmem);
rq = i915_request_create(ce); if (IS_ERR(rq)) { @@ -1032,11 +1047,14 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst, err = rq->engine->emit_init_breadcrumb(rq); if (unlikely(err)) { DRM_ERROR("init_breadcrumb failed. %d\n", err); + i915_request_set_error_once(rq, err); + __i915_request_skip(rq); + i915_request_add(rq); break; } } err = i915_window_blt_copy_batch_prepare(rq, src_vma, dst_vma, - cur_win_sz); + chunk); if (err) { DRM_ERROR("Batch preparation failed. %d\n", err); i915_request_set_error_once(rq, -EIO); @@ -1045,26 +1063,32 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst, i915_request_get(rq); i915_request_add(rq);
- timeout = i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT); - if (timeout < 0) { + if (!err) + timeout = i915_request_wait(rq, 0, + MAX_SCHEDULE_TIMEOUT); + i915_request_put(rq); + if (!err && timeout < 0) { DRM_ERROR("BLT Request is not completed. %ld\n", timeout); err = timeout; - i915_request_put(rq); break; }
- blt_copied += cur_win_sz; - err = 0; - i915_request_put(rq); - flush_work(&i915->gt.engine[BCS0]->retire_work); - } while (src->base.size != blt_copied); + remain -= chunk; + offset += chunk >> PAGE_SHIFT; + + flush_work(&ce->engine->retire_work); + } while (remain);
+ spin_lock(&i915->mm.window_queue.lock); src_vma->size = BLT_WINDOW_SZ; dst_vma->size = BLT_WINDOW_SZ; src_vma->obj = NULL; dst_vma->obj = NULL; - mutex_unlock(&i915->mm.window_mutex); + *ps = src_vma; + *pd = dst_vma; + wake_up_locked(&i915->mm.window_queue); + spin_unlock(&i915->mm.window_queue.lock);
dst->mm.dirty = true; i915_gem_object_unpin_pages(src); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 07da059640a1..82f431cc38cd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -600,7 +600,7 @@ struct i915_gem_mm { struct i915_vma *smem_window[2];
/* To protect above two set of vmas */ - struct mutex window_mutex; + wait_queue_head_t window_queue; };
#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Hold blitter engine power reference across the whole copy operation for efficiency.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 34bbefa6d67f..c84443e01ef1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -1028,6 +1028,8 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst,
spin_unlock(&i915->mm.window_queue.lock);
+ intel_engine_pm_get(ce->engine); + do { struct i915_request *rq; long timeout; @@ -1080,6 +1082,8 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst, flush_work(&ce->engine->retire_work); } while (remain);
+ intel_engine_pm_put(ce->engine); + spin_lock(&i915->mm.window_queue.lock); src_vma->size = BLT_WINDOW_SZ; dst_vma->size = BLT_WINDOW_SZ;
From: Ramalingam C ramalingam.c@intel.com
The window_blt_copy feature is used for swap-in and swap-out, selected via the i915 module parameter enable_eviction.
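The parameter now selects between the copy methods as follows (from the patch below):

    0 - eviction disabled
    1 - memcpy based eviction only
    2 - blt based eviction only
    3 - blt based eviction, falling back to memcpy (default)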
Signed-off-by: Ramalingam C ramalingam.c@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_region.c | 14 ++++++++++---- drivers/gpu/drm/i915/i915_drv.c | 4 ++-- drivers/gpu/drm/i915/i915_params.c | 6 ++++-- drivers/gpu/drm/i915/i915_params.h | 2 +- 4 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 4fab9f6b4bee..f9ff0aa31752 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -16,7 +16,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; unsigned long start, diff, msec; - int err; + int err = -EINVAL;
GEM_BUG_ON(obj->swapto); GEM_BUG_ON(i915_gem_object_has_pages(obj)); @@ -54,7 +54,10 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, __i915_gem_object_pin_pages(src);
/* copying the pages */ - err = i915_gem_object_memcpy(dst, src); + if (i915->params.enable_eviction >= 2) + err = i915_window_blt_copy(dst, src); + if (err && i915->params.enable_eviction != 2) + err = i915_gem_object_memcpy(dst, src);
__i915_gem_object_unpin_pages(src); __i915_gem_object_unset_pages(src); @@ -83,7 +86,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; unsigned long start, diff, msec; - int err; + int err = -EINVAL;
GEM_BUG_ON(!obj->swapto); GEM_BUG_ON(i915_gem_object_has_pages(obj)); @@ -117,7 +120,10 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, __i915_gem_object_pin_pages(dst);
/* copying the pages */ - err = i915_gem_object_memcpy(dst, src); + if (i915->params.enable_eviction >= 2) + err = i915_window_blt_copy(dst, src); + if (err && i915->params.enable_eviction != 2) + err = i915_gem_object_memcpy(dst, src);
__i915_gem_object_unpin_pages(dst); __i915_gem_object_unset_pages(dst); diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 683643b211fa..78b528e89486 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -891,7 +891,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
i915_driver_register(i915);
- if (HAS_LMEM(i915)) { + if (HAS_LMEM(i915) && i915->params.enable_eviction >= 2) { ret = i915_setup_blt_windows(i915); if (ret) goto out_cleanup_drv_register; @@ -939,7 +939,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
void i915_driver_remove(struct drm_i915_private *i915) { - if (HAS_LMEM(i915)) + if (HAS_LMEM(i915) && i915->params.enable_eviction >= 2) i915_teardown_blt_windows(i915);
disable_rpm_wakeref_asserts(&i915->runtime_pm); diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index 264de32f3d6a..9fa58ed76614 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -197,8 +197,10 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400, "Fake LMEM start offset (default: 0)"); #endif
-i915_param_named_unsafe(enable_eviction, bool, 0600, - "Enable memcpy based eviction which does not rely on DMA resv refactoring)"); +i915_param_named_unsafe(enable_eviction, uint, 0600, + "Enable eviction which does not rely on DMA resv refactoring " + "0=disabled, 1=memcpy based only, 2=blt based only, " + "3=blt based but fallsback to memcpy based [default])");
i915_param_named_unsafe(lmem_size, uint, 0400, "Change lmem size for each region. (default: 0, all memory)"); diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h index be6979e7feda..c835e592ee5f 100644 --- a/drivers/gpu/drm/i915/i915_params.h +++ b/drivers/gpu/drm/i915/i915_params.h @@ -72,8 +72,8 @@ struct drm_printer; param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \ param(unsigned long, fake_lmem_start, 0, 0400) \ param(unsigned int, lmem_size, 0, 0400) \ + param(unsigned int, enable_eviction, 3, 0600) \ /* leave bools at the end to not create holes */ \ - param(bool, enable_eviction, true, 0600) \ param(bool, enable_hangcheck, true, 0600) \ param(bool, load_detect_test, false, 0600) \ param(bool, force_reset_modeset_test, false, 0600) \
Quoting Matthew Auld (2020-11-27 12:06:56)
From: Ramalingam C ramalingam.c@intel.com
window_blt_copy feature is used for swapin and swapout based on the i915 module parameter called enable_eviction.
A module parameter? -Chris
From: Ramalingam C ramalingam.c@intel.com
The number of bytes swapped in and out is captured for both blitter and memcpy based evictions, together with the time taken.
Debugfs is extended to report the eviction statistics for both methods, including the rate of transfer.
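For example, the reported rate is computed as bytes * 1000 / (msec * 1024 * 1024) MB/s, so the earlier sample of 94170000 bytes swapped out would, assuming an illustrative 2000 msec, be reported as roughly 44 MB/s.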
Signed-off-by: Ramalingam C ramalingam.c@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_region.c | 32 +++++++++++++--- drivers/gpu/drm/i915/i915_debugfs.c | 43 +++++++++++++++++++--- drivers/gpu/drm/i915/i915_drv.h | 5 +++ 3 files changed, 68 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index f9ff0aa31752..1ec6528498c8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -16,6 +16,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; unsigned long start, diff, msec; + bool blt_completed = false; int err = -EINVAL;
GEM_BUG_ON(obj->swapto); @@ -54,8 +55,11 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, __i915_gem_object_pin_pages(src);
/* copying the pages */ - if (i915->params.enable_eviction >= 2) + if (i915->params.enable_eviction >= 2) { err = i915_window_blt_copy(dst, src); + if (!err) + blt_completed = true; + } if (err && i915->params.enable_eviction != 2) err = i915_gem_object_memcpy(dst, src);
@@ -72,8 +76,14 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, if (!err) { diff = jiffies - start; msec = diff * 1000 / HZ; - atomic_long_add(msec, &i915->time_swap_out_ms); - atomic_long_add(sizes, &i915->num_bytes_swapped_out); + if (blt_completed) { + atomic_long_add(sizes, &i915->num_bytes_swapped_out); + atomic_long_add(msec, &i915->time_swap_out_ms); + } else { + atomic_long_add(sizes, + &i915->num_bytes_swapped_out_memcpy); + atomic_long_add(msec, &i915->time_swap_out_ms_memcpy); + } }
return err; @@ -86,6 +96,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, struct drm_i915_private *i915 = to_i915(obj->base.dev); struct drm_i915_gem_object *dst, *src; unsigned long start, diff, msec; + bool blt_completed = false; int err = -EINVAL;
GEM_BUG_ON(!obj->swapto); @@ -120,8 +131,11 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, __i915_gem_object_pin_pages(dst);
/* copying the pages */ - if (i915->params.enable_eviction >= 2) + if (i915->params.enable_eviction >= 2) { err = i915_window_blt_copy(dst, src); + if (!err) + blt_completed = true; + } if (err && i915->params.enable_eviction != 2) err = i915_gem_object_memcpy(dst, src);
@@ -138,8 +152,14 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, if (!err) { diff = jiffies - start; msec = diff * 1000 / HZ; - atomic_long_add(msec, &i915->time_swap_in_ms); - atomic_long_add(sizes, &i915->num_bytes_swapped_in); + if (blt_completed) { + atomic_long_add(sizes, &i915->num_bytes_swapped_in); + atomic_long_add(msec, &i915->time_swap_in_ms); + } else { + atomic_long_add(sizes, + &i915->num_bytes_swapped_in_memcpy); + atomic_long_add(msec, &i915->time_swap_in_ms_memcpy); + } }
return err; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 2bf51dd9de7c..983030ac39e1 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -364,6 +364,7 @@ static int i915_gem_object_info(struct seq_file *m, void *data) struct drm_i915_private *i915 = node_to_i915(m->private); struct intel_memory_region *mr; enum intel_region_id id; + u64 time, bytes, rate;
seq_printf(m, "%u shrinkable [%u free] objects, %llu bytes\n", i915->mm.shrink_count, @@ -372,12 +373,42 @@ static int i915_gem_object_info(struct seq_file *m, void *data) for_each_memory_region(mr, i915, id) seq_printf(m, "%s: total:%pa, available:%pa bytes\n", mr->name, &mr->total, &mr->avail); - seq_printf(m, "num_bytes_swapped_out %ld num_bytes_swapped_in %ld\n", - atomic_long_read(&i915->num_bytes_swapped_out), - atomic_long_read(&i915->num_bytes_swapped_in)); - seq_printf(m, "time_swap_out_msec %ld time_swap_in_msec %ld\n", - atomic_long_read(&i915->time_swap_out_ms), - atomic_long_read(&i915->time_swap_in_ms)); + + time = atomic_long_read(&i915->time_swap_out_ms); + bytes = atomic_long_read(&i915->num_bytes_swapped_out); + if (time) + rate = div64_u64(bytes * 1000, time * 1024 * 1024); + else + rate = 0; + seq_printf(m, "BLT: swapout %llu Bytes in %llu mSec(%llu MB/Sec)\n", + bytes, time, rate); + + time = atomic_long_read(&i915->time_swap_in_ms); + bytes = atomic_long_read(&i915->num_bytes_swapped_in); + if (time) + rate = div64_u64(bytes * 1000, time * 1024 * 1024); + else + rate = 0; + seq_printf(m, "BLT: swapin %llu Bytes in %llu mSec(%llu MB/Sec)\n", + bytes, time, rate); + + time = atomic_long_read(&i915->time_swap_out_ms_memcpy); + bytes = atomic_long_read(&i915->num_bytes_swapped_out_memcpy); + if (time) + rate = div64_u64(bytes * 1000, time * 1024 * 1024); + else + rate = 0; + seq_printf(m, "Memcpy: swapout %llu Bytes in %llu mSec(%llu MB/Sec)\n", + bytes, time, rate); + + time = atomic_long_read(&i915->time_swap_in_ms_memcpy); + bytes = atomic_long_read(&i915->num_bytes_swapped_in_memcpy); + if (time) + rate = div64_u64(bytes * 1000, time * 1024 * 1024); + else + rate = 0; + seq_printf(m, "Memcpy: swapin %llu Bytes in %llu mSec(%llu MB/Sec)\n", + bytes, time, rate); seq_putc(m, '\n');
print_context_stats(m, i915); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 82f431cc38cd..6f0ab363bdee 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1225,6 +1225,11 @@ struct drm_i915_private { atomic_long_t num_bytes_swapped_in; atomic_long_t time_swap_out_ms; atomic_long_t time_swap_in_ms; + + atomic_long_t num_bytes_swapped_out_memcpy; + atomic_long_t num_bytes_swapped_in_memcpy; + atomic_long_t time_swap_out_ms_memcpy; + atomic_long_t time_swap_in_ms_memcpy; };
static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
Quoting Matthew Auld (2020-11-27 12:06:57)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 82f431cc38cd..6f0ab363bdee 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1225,6 +1225,11 @@ struct drm_i915_private { atomic_long_t num_bytes_swapped_in; atomic_long_t time_swap_out_ms; atomic_long_t time_swap_in_ms;
atomic_long_t num_bytes_swapped_out_memcpy;
atomic_long_t num_bytes_swapped_in_memcpy;
atomic_long_t time_swap_out_ms_memcpy;
atomic_long_t time_swap_in_ms_memcpy;
See earlier comments about why this will be rejected. -Chris
From: Ramalingam C ramalingam.c@intel.com
Selftest live_blt_evict is added to create lmem and smem objects and copy the lmem object into the smem object using the window based blt copy that lmem eviction uses.
Object sizes ranging from 4K to 64M are tested, covering different use-case scenarios w.r.t. the window size.
Signed-off-by: Ramalingam C ramalingam.c@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com --- .../i915/gem/selftests/i915_gem_object_blt.c | 166 ++++++++++++++++++ .../drm/i915/selftests/i915_live_selftests.h | 1 + 2 files changed, 167 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c index ee9496f3d11d..4f7941dea291 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c @@ -16,6 +16,7 @@ #include "selftests/mock_drm.h" #include "huge_gem_object.h" #include "mock_context.h" +#include "gem/i915_gem_region.h"
static int wrap_ktime_compare(const void *A, const void *B) { @@ -568,6 +569,171 @@ static int igt_copy_blt_ctx0(void *arg) return test_copy_engines(arg, igt_copy_blt_thread, SINGLE_CTX); }
+static int __igt_obj_window_blt_copy(struct drm_i915_private *i915, + struct intel_memory_region *src_mem, + struct intel_memory_region *dst_mem, + u64 size) +{ + struct drm_i915_gem_object *src, *dst; + ktime_t t0, t1; + u32 *vaddr, i; + int err; + + src = i915_gem_object_create_region(src_mem, size, 0); + if (IS_ERR(src)) { + err = PTR_ERR(src); + goto err; + } + size = max_t(u64, size, src->base.size); + i915_gem_object_lock_isolated(src); + + dst = i915_gem_object_create_region(dst_mem, size, 0); + if (IS_ERR(dst)) { + err = PTR_ERR(dst); + goto err_put_src; + } + + i915_gem_object_lock_isolated(dst); + + vaddr = i915_gem_object_pin_map(src, + i915_coherent_map_type(i915, src, true)); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + pr_err("Failed at pin map of src. %d\n", err); + goto err_put_dst; + } + + for (i = 0; i < size / sizeof(u32); i++) + vaddr[i] = i; + + if (!(src->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ)) + src->cache_dirty = true; + + vaddr = i915_gem_object_pin_map(dst, + i915_coherent_map_type(i915, dst, true)); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + pr_err("Failed at pin map of dst. %d\n", err); + goto err_unpin_src; + } + memset32(vaddr, 0xdeadbeaf, size / sizeof(u32)); + + if (!(dst->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE)) + dst->cache_dirty = true; + + /* + * FIXME: Blitter based eviction is failing occasionally due to the + * trylock approach. To avoid the selftest failure due to trylocks, + * we are adding retries with a delay in between. + * The retry count and delay were chosen on a trial-and-error basis. + * As soon as trylocks are removed from blt eviction, we should + * remove these retry attempts. + */ +#define WINDOW_BLT_COPY_RETRY 3 + for (i = 0; i <= WINDOW_BLT_COPY_RETRY; i++) { + t0 = ktime_get(); + err = i915_window_blt_copy(dst, src); + if (err == -EBUSY) + msleep(1); + else + break; + } + + if (err) + goto err_unpin_dst; + + t1 = ktime_sub(ktime_get(), t0); + pr_info("blt of %zd KiB at %lld MiB/s\n", src->base.size >> 10, + div64_u64(mul_u32_u32(src->base.size, 1000 * 1000 * 1000), + t1) >> 20); + + for (i = 0; i < size / sizeof(u32); i += 17) { + if (!(dst->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ)) + drm_clflush_virt_range(&vaddr[i], sizeof(vaddr[i])); + + if (vaddr[i] != i) { + pr_err("vaddr[%u]=%x, expected=%x\n", i, + vaddr[i], i); + err = -EINVAL; + goto err_unpin_dst; + } + } + +err_unpin_dst: + i915_gem_object_unpin_map(dst); +err_unpin_src: + i915_gem_object_unpin_map(src); +err_put_dst: + i915_gem_object_unlock(dst); + i915_gem_object_put(dst); +err_put_src: + i915_gem_object_unlock(src); + i915_gem_object_put(src); +err: + if (err == -ENODEV) + err = 0; + return err; +} + +static int igt_obj_window_blt_copy(void *data) +{ + struct drm_i915_private *i915 = data; + u64 size[] = {SZ_2K, SZ_4K, SZ_64K, SZ_4M, SZ_8M + SZ_2K, SZ_64M}; + struct intel_memory_region *lmem = + intel_memory_region_by_type(i915, INTEL_MEMORY_LOCAL); + struct intel_memory_region *smem = + intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM); + int i, ret; + + for (i = 0; i < ARRAY_SIZE(size); i++) { + ret = __igt_obj_window_blt_copy(i915, lmem, lmem, size[i]); + if (ret < 0) { + pr_err("%s: Failed at lmem->lmem size: %llu, err: %d\n", + __func__, size[i], ret); + break; + } + ret = __igt_obj_window_blt_copy(i915, smem, smem, size[i]); + if (ret < 0) { + pr_err("%s: Failed at smem->smem size: %llu, err: %d\n", + __func__, size[i], ret); + break; + } + ret = __igt_obj_window_blt_copy(i915, lmem, smem, size[i]); + if (ret < 0) { + pr_err("%s: Failed at lmem->smem size: %llu, err: %d\n", + __func__, size[i], ret); + break; + }
+ + ret = __igt_obj_window_blt_copy(i915, smem, lmem, size[i]); + if (ret < 0) { + pr_err("%s: Failed at smem->lmem size: %llu, err: %d\n", + __func__, size[i], ret); + break; + } + } + + return ret; +} + +int i915_obj_window_blt_copy_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(igt_obj_window_blt_copy), + }; + + if (intel_gt_is_wedged(&i915->gt)) + return 0; + + if (!HAS_ENGINE(&i915->gt, BCS0)) + return 0; + + if (!HAS_LMEM(i915)) + return 0; + + return i915_live_subtests(tests, i915); +} + int i915_gem_object_blt_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h index a92c0e9b7e6b..2bf900f5d8b0 100644 --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h @@ -39,6 +39,7 @@ selftest(hugepages, i915_gem_huge_page_live_selftests) selftest(gem_contexts, i915_gem_context_live_selftests) selftest(gem_execbuf, i915_gem_execbuffer_live_selftests) selftest(blt, i915_gem_object_blt_live_selftests) +selftest(win_blt_copy, i915_obj_window_blt_copy_live_selftests) selftest(client, i915_gem_client_blt_live_selftests) selftest(reset, intel_reset_live_selftests) selftest(memory_region, intel_memory_region_live_selftests)
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
As the initial phase of the implementation, while the system is idle, we copy the user objects from LMEM to SMEM during suspend and restore them on resume. The present implementation uses memcpy-based eviction for the swapout/swapin of objects. To test the functionality, suspend is initiated from an IGT application.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Cc: CQ Tang cq.tang@intel.com --- .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 + drivers/gpu/drm/i915/i915_drv.c | 83 +++++++++++++++++++ 2 files changed, 86 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index e9f42d3137b3..331d113f7d5b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -322,6 +322,9 @@ struct drm_i915_gem_object { */ bool do_swapping; struct drm_i915_gem_object *swapto; + + /** mark evicted object during suspend */ + bool evicted; };
static inline struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 78b528e89486..e8c4931fc818 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1102,11 +1102,86 @@ static int i915_drm_prepare(struct drm_device *dev) return 0; }
+static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) +{ + struct drm_i915_private *i915 = to_i915(dev); + struct drm_i915_gem_object *obj; + struct intel_memory_region *mem; + int id, ret = 0; + + /* + * FIXME: Presently using memcpy, + * will replace with blitter once + * fix the issues. + */ + i915->params.enable_eviction = 1; + + for_each_memory_region(mem, i915, id) { + struct list_head still_in_list; + INIT_LIST_HEAD(&still_in_list); + if (mem->type == INTEL_MEMORY_LOCAL && mem->total) { + mutex_lock(&mem->objects.lock); + while ((obj = list_first_entry_or_null(&mem->objects.list, + typeof(*obj), + mm.region_link))) { + + list_move_tail(&obj->mm.region_link, &still_in_list); + + if (!i915_gem_object_has_pages(obj) && in_suspend) + continue; + + /* Ignore previously evicted objects */ + if (obj->swapto && in_suspend) + continue; + + mutex_unlock(&mem->objects.lock); + + if (in_suspend) + i915_gem_object_unbind(obj, 0); + + if (in_suspend) { + obj->swapto = NULL; + obj->evicted = false; + obj->do_swapping = true; + ret = __i915_gem_object_put_pages(obj); + obj->do_swapping = false; + if (ret) { + /* + * FIXME: internal ctx objects still pinned + * returning as BUSY. Presently just evicting + * the user objects, will fix it later + */ + obj->evicted = false; + ret = 0; + } else + obj->evicted = true; + } else { + if (obj->swapto && obj->evicted) { + ret = i915_gem_object_pin_pages(obj); + if (ret) { + i915_gem_object_put(obj); + } else { + i915_gem_object_unpin_pages(obj); + obj->evicted = false; + } + } + } + mutex_lock(&mem->objects.lock); + } + list_splice_tail(&still_in_list, &mem->objects.list); + mutex_unlock(&mem->objects.lock); + } + } + i915->params.enable_eviction = 3; + return ret; +} + static int i915_drm_suspend(struct drm_device *dev) { struct drm_i915_private *dev_priv = to_i915(dev); struct pci_dev *pdev = dev_priv->drm.pdev; pci_power_t opregion_target_state; + int ret = 0;
disable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
@@ -1138,6 +1213,10 @@ static int i915_drm_suspend(struct drm_device *dev)
intel_fbdev_set_suspend(dev, FBINFO_STATE_SUSPENDED, true);
+ ret = intel_dmem_evict_buffers(dev, true); + if (ret) + return ret; + dev_priv->suspend_count++;
intel_csr_ucode_suspend(dev_priv); @@ -1263,6 +1342,10 @@ static int i915_drm_resume(struct drm_device *dev)
drm_mode_config_reset(dev);
+ ret = intel_dmem_evict_buffers(dev, false); + if (ret) + DRM_ERROR("i915_resume:i915_gem_object_pin_pages failed with err=%d\n", ret); + i915_gem_resume(dev_priv);
intel_modeset_init_hw(dev_priv);
Quoting Matthew Auld (2020-11-27 12:06:59)
+static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) +{
struct drm_i915_private *i915 = to_i915(dev);
struct drm_i915_gem_object *obj;
struct intel_memory_region *mem;
int id, ret = 0;
/*
* FIXME: Presently using memcpy,
* will replace with blitter once
* fix the issues.
*/
Why hasn't it been fixed then? -Chris
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
We are only doing this now for the kernel_context. We also need to do it for the copy engine's blitter context.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 1b2009b4dcb7..69c8ea70d1e8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -66,6 +66,11 @@ static int __engine_unpark(struct intel_wakeref *wf) ce->ops->reset(ce); }
+ if (engine->class == COPY_ENGINE_CLASS) { + ce = engine->blitter_context; + ce->ops->reset(ce); + } + if (engine->unpark) engine->unpark(engine);
Quoting Matthew Auld (2020-11-27 12:07:00)
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
We are only doing this now for the kernel_context. We also need to do it for the copy engine's blitter context.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
drivers/gpu/drm/i915/gt/intel_engine_pm.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 1b2009b4dcb7..69c8ea70d1e8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -66,6 +66,11 @@ static int __engine_unpark(struct intel_wakeref *wf) ce->ops->reset(ce); }
Add a list of pinned volatile contexts to the engine that must be restored across resume. -Chris
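A minimal sketch of that suggestion, assuming hypothetical engine fields (a pinned_contexts list on the engine and a pinned_contexts_link hook in struct intel_context) that this series does not define: each pinned context registers itself when created, and __engine_unpark() walks the list instead of special-casing the kernel, blitter and evict contexts.

	struct intel_context *ce;

	/* Re-emit the register state of every pinned, kernel-owned
	 * context after the engine may have been powered down. */
	list_for_each_entry(ce, &engine->pinned_contexts,
			    pinned_contexts_link)
		ce->ops->reset(ce);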
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Without a dedicated context there can be a "deadlock" due to an inversion between object clearing and eviction on the shared blitter context timeline.
Clearing of a newly allocated object emits its request, but to execute that request, something may need to be evicted in order to make space for the new VMA. When the eviction code emits its copy request, it ends up after the buffer-clear request in the ringbuffer, and so neither can complete.
If we add a dedicated context for eviction then we can decouple the two and break the "deadlock".
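To make the inversion concrete, a rough sketch of the cycle (illustrative only, not part of the patch):

	/*
	 * Shared blitter timeline:
	 *   request A: clear(new_obj)        emitted first
	 *   bind(new_obj) needs space   ->   evict(victim)
	 *   request B: copy-out(victim)      emitted after A, same ring
	 *
	 * A cannot run until new_obj is bound, the bind waits on B, and
	 * B sits behind A in the ringbuffer: a cycle. With a dedicated
	 * eviction context, B is emitted on its own timeline, completes,
	 * and A can proceed.
	 */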
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Cc: CQ Tang cq.tang@intel.com Cc: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 4 +- drivers/gpu/drm/i915/gt/intel_engine.h | 2 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 40 ++++++++++++++++++-- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 9 +++-- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + 5 files changed, 47 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index c84443e01ef1..ddb448f275eb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -767,7 +767,7 @@ static struct i915_vma * i915_window_vma_init(struct drm_i915_private *i915, struct intel_memory_region *mem) { - struct intel_context *ce = i915->gt.engine[BCS0]->blitter_context; + struct intel_context *ce = i915->gt.engine[BCS0]->evict_context; struct i915_address_space *vm = ce->vm; struct i915_vma *vma; int ret; @@ -984,7 +984,7 @@ int i915_window_blt_copy(struct drm_i915_gem_object *dst, struct drm_i915_gem_object *src) { struct drm_i915_private *i915 = to_i915(src->base.dev); - struct intel_context *ce = i915->gt.engine[BCS0]->blitter_context; + struct intel_context *ce = i915->gt.engine[BCS0]->evict_context; bool src_is_lmem = i915_gem_object_is_lmem(src); bool dst_is_lmem = i915_gem_object_is_lmem(dst); u64 remain = src->base.size, offset = 0; diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 188c5ff6dc64..623a6876dca5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -188,6 +188,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value) #define I915_GEM_HWS_SEQNO_ADDR (I915_GEM_HWS_SEQNO * sizeof(u32)) #define I915_GEM_HWS_BLITTER 0x42 #define I915_GEM_HWS_BLITTER_ADDR (I915_GEM_HWS_BLITTER * sizeof(u32)) +#define I915_GEM_HWS_EVICT 0x44 +#define I915_GEM_HWS_EVICT_ADDR (I915_GEM_HWS_EVICT * sizeof(u32)) #define I915_GEM_HWS_SCRATCH 0x80
#define I915_HWS_CSB_BUF0_INDEX 0x10 diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 9e0394b06f38..a83af8775a64 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -874,6 +874,20 @@ create_blitter_context(struct intel_engine_cs *engine) return ce; }
+static struct intel_context * +create_evict_context(struct intel_engine_cs *engine) +{ + static struct lock_class_key evict; + struct intel_context *ce; + + ce = create_pinned_context(engine, I915_GEM_HWS_EVICT_ADDR, &evict, + "evict_context"); + if (IS_ERR(ce)) + return ce; + + return ce; +} + /** * intel_engines_init_common - initialize engine state which might require hw access * @engine: Engine to initialize. @@ -912,22 +926,35 @@ static int engine_init_common(struct intel_engine_cs *engine) engine->emit_fini_breadcrumb_dw = ret;
/* - * The blitter context is used to quickly memset or migrate objects - * in local memory, so it has to always be available. + * The blitter and evict contexts are used to clear and migrate objects + * in local memory so they have to always be available. */ if (engine->class == COPY_ENGINE_CLASS) { ce = create_blitter_context(engine); if (IS_ERR(ce)) { ret = PTR_ERR(ce); - goto err_unpin; + goto err_blitter; }
engine->blitter_context = ce; + + if (HAS_LMEM(engine->i915)) { + ce = create_evict_context(engine); + if (IS_ERR(ce)) { + ret = PTR_ERR(ce); + goto err_evict; + } + + engine->evict_context = ce; + } }
return 0;
-err_unpin: +err_evict: + intel_context_unpin(engine->blitter_context); + intel_context_put(engine->blitter_context); +err_blitter: intel_context_unpin(engine->kernel_context); err_context: intel_context_put(engine->kernel_context); @@ -986,6 +1013,11 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) if (engine->default_state) fput(engine->default_state);
+ if (engine->evict_context) { + intel_context_unpin(engine->evict_context); + intel_context_put(engine->evict_context); + } + if (engine->blitter_context) { intel_context_unpin(engine->blitter_context); intel_context_put(engine->blitter_context); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 69c8ea70d1e8..a5ca95270e92 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -66,10 +66,13 @@ static int __engine_unpark(struct intel_wakeref *wf) ce->ops->reset(ce); }
- if (engine->class == COPY_ENGINE_CLASS) { - ce = engine->blitter_context; + ce = engine->blitter_context; + if (ce) + ce->ops->reset(ce); + + ce = engine->evict_context; + if (ce) ce->ops->reset(ce); - }
if (engine->unpark) engine->unpark(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index cb2de4bf86ba..14e92423661b 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -348,6 +348,7 @@ struct intel_engine_cs {
struct intel_context *kernel_context; /* pinned */ struct intel_context *blitter_context; /* pinned; exists for BCS only */ + struct intel_context *evict_context; /* pinned; exists for BCS only */
intel_engine_mask_t saturated; /* submitting semaphores too late? */
From: Prathap Kumar Valsan prathap.kumar.valsan@intel.com
During suspend we will lose all page tables, as they are allocated in LMEM. In order to make sure that the contexts do not access corrupted page tables after we restore, we evict all VMAs that are bound to VMs. This includes the kernel VM.
During resume, we restore the page tables back to the scratch page.
Signed-off-by: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Cc: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 13 ++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 2 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 4 + drivers/gpu/drm/i915/i915_drv.c | 102 +++++++++++++++++++++++--- 4 files changed, 112 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index b6fcebeef02a..704cab807e0b 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -775,3 +775,16 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt) kfree(ppgtt); return ERR_PTR(err); } + +void gen8_restore_ppgtt_mappings(struct i915_address_space *vm) +{ + const unsigned int count = gen8_pd_top_count(vm); + int i; + + for (i = 1; i <= vm->top; i++) + fill_px(vm->scratch[i], vm->scratch[i - 1]->encode); + + fill_page_dma(px_base(i915_vm_to_ppgtt(vm)->pd), + vm->scratch[vm->top]->encode, count); +} + diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h index 76a08b9c1f5c..3fa4b95aaabd 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h @@ -6,8 +6,10 @@ #ifndef __GEN8_PPGTT_H__ #define __GEN8_PPGTT_H__
+struct i915_address_space; struct intel_gt;
+void gen8_restore_ppgtt_mappings(struct i915_address_space *vm); struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt);
#endif diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 34a02643bb75..9b3eacd12a7e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -9,6 +9,8 @@ #include "intel_gtt.h" #include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" +#include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gen6_ppgtt.h" #include "gen8_ppgtt.h"
@@ -317,3 +319,5 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt) ppgtt->vm.vma_ops.set_pages = ppgtt_set_pages; ppgtt->vm.vma_ops.clear_pages = clear_pages; } + + diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index e8c4931fc818..7115f4db5043 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -64,6 +64,7 @@ #include "gem/i915_gem_context.h" #include "gem/i915_gem_ioctls.h" #include "gem/i915_gem_mman.h" +#include "gt/gen8_ppgtt.h" #include "gt/intel_gt.h" #include "gt/intel_gt_pm.h" #include "gt/intel_rc6.h" @@ -1136,13 +1137,13 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend)
mutex_unlock(&mem->objects.lock);
- if (in_suspend) - i915_gem_object_unbind(obj, 0); - if (in_suspend) { obj->swapto = NULL; obj->evicted = false; obj->do_swapping = true; + + i915_gem_object_unbind(obj, 0); + ret = __i915_gem_object_put_pages(obj); obj->do_swapping = false; if (ret) { @@ -1176,6 +1177,43 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) return ret; }
+static int i915_gem_suspend_ppgtt_mappings(struct drm_i915_private *i915) +{ + struct i915_gem_context *ctx, *cn; + int ret; + + spin_lock(&i915->gem.contexts.lock); + list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) { + struct i915_address_space *vm; + + if (!kref_get_unless_zero(&ctx->ref)) + continue; + spin_unlock(&i915->gem.contexts.lock); + + vm = i915_gem_context_get_vm_rcu(ctx); + mutex_lock(&vm->mutex); + ret = i915_gem_evict_vm(vm); + mutex_unlock(&vm->mutex); + if (ret) { + GEM_WARN_ON(ret); + i915_vm_put(vm); + i915_gem_context_put(ctx); + return ret; + } + i915_vm_put(vm); + spin_lock(&i915->gem.contexts.lock); + list_safe_reset_next(ctx, cn, link); + i915_gem_context_put(ctx); + } + spin_unlock(&i915->gem.contexts.lock); + + mutex_lock(&i915->gt.vm->mutex); + ret = i915_gem_evict_vm(i915->gt.vm); + mutex_unlock(&i915->gt.vm->mutex); + + return ret; +} + static int i915_drm_suspend(struct drm_device *dev) { struct drm_i915_private *dev_priv = to_i915(dev); @@ -1213,9 +1251,17 @@ static int i915_drm_suspend(struct drm_device *dev)
intel_fbdev_set_suspend(dev, FBINFO_STATE_SUSPENDED, true);
- ret = intel_dmem_evict_buffers(dev, true); - if (ret) - return ret; + if (HAS_LMEM(dev_priv)) { + ret = intel_dmem_evict_buffers(dev, true); + if (ret) + return ret; + + i915_teardown_blt_windows(dev_priv); + + ret = i915_gem_suspend_ppgtt_mappings(dev_priv); + if (ret) + return ret; + }
dev_priv->suspend_count++;
@@ -1306,6 +1352,36 @@ int i915_suspend_switcheroo(struct drm_i915_private *i915, pm_message_t state) return i915_drm_suspend_late(&i915->drm, false); }
+static void i915_gem_restore_ppgtt_mappings(struct drm_i915_private *i915) +{ + struct i915_gem_context *ctx, *cn; + + spin_lock(&i915->gem.contexts.lock); + + list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) { + struct i915_address_space *vm; + + if (!kref_get_unless_zero(&ctx->ref)) + continue; + + spin_unlock(&i915->gem.contexts.lock); + + vm = i915_gem_context_get_vm_rcu(ctx); + mutex_lock(&vm->mutex); + gen8_restore_ppgtt_mappings(vm); + mutex_unlock(&vm->mutex); + i915_vm_put(vm); + spin_lock(&i915->gem.contexts.lock); + list_safe_reset_next(ctx, cn, link); + i915_gem_context_put(ctx); + } + spin_unlock(&i915->gem.contexts.lock); + + mutex_lock(&i915->gt.vm->mutex); + gen8_restore_ppgtt_mappings(i915->gt.vm); + mutex_unlock(&i915->gt.vm->mutex); +} + static int i915_drm_resume(struct drm_device *dev) { struct drm_i915_private *dev_priv = to_i915(dev); @@ -1342,9 +1418,17 @@ static int i915_drm_resume(struct drm_device *dev)
drm_mode_config_reset(dev);
- ret = intel_dmem_evict_buffers(dev, false); - if (ret) - DRM_ERROR("i915_resume:i915_gem_object_pin_pages failed with err=%d\n", ret); + if (HAS_LMEM(dev_priv)) { + i915_gem_restore_ppgtt_mappings(dev_priv); + + ret = i915_setup_blt_windows(dev_priv); + if (ret) + GEM_BUG_ON(ret); + + ret = intel_dmem_evict_buffers(dev, false); + if (ret) + DRM_ERROR("i915_resume:i915_gem_object_pin_pages failed with err=%d\n", ret); + }
i915_gem_resume(dev_priv);
Quoting Matthew Auld (2020-11-27 12:07:02)
From: Prathap Kumar Valsan prathap.kumar.valsan@intel.com
During suspend we will lose all page tables, as they are allocated in LMEM. In order to make sure that the contexts do not access corrupted page tables after we restore, we evict all VMAs that are bound to VMs. This includes the kernel VM.
During resume, we restore the page tables back to the scratch page.
Signed-off-by: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Cc: CQ Tang cq.tang@intel.com
drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 13 ++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 2 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 4 + drivers/gpu/drm/i915/i915_drv.c | 102 +++++++++++++++++++++++--- 4 files changed, 112 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index b6fcebeef02a..704cab807e0b 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -775,3 +775,16 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt) kfree(ppgtt); return ERR_PTR(err); }
+void gen8_restore_ppgtt_mappings(struct i915_address_space *vm) +{
const unsigned int count = gen8_pd_top_count(vm);
int i;
for (i = 1; i <= vm->top; i++)
fill_px(vm->scratch[i], vm->scratch[i - 1]->encode);
fill_page_dma(px_base(i915_vm_to_ppgtt(vm)->pd),
vm->scratch[vm->top]->encode, count);
+}
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h index 76a08b9c1f5c..3fa4b95aaabd 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h @@ -6,8 +6,10 @@ #ifndef __GEN8_PPGTT_H__ #define __GEN8_PPGTT_H__
+struct i915_address_space; struct intel_gt;
+void gen8_restore_ppgtt_mappings(struct i915_address_space *vm); struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt);
#endif diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 34a02643bb75..9b3eacd12a7e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -9,6 +9,8 @@ #include "intel_gtt.h" #include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" +#include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gen6_ppgtt.h" #include "gen8_ppgtt.h"
@@ -317,3 +319,5 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt) ppgtt->vm.vma_ops.set_pages = ppgtt_set_pages; ppgtt->vm.vma_ops.clear_pages = clear_pages; }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index e8c4931fc818..7115f4db5043 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -64,6 +64,7 @@ #include "gem/i915_gem_context.h" #include "gem/i915_gem_ioctls.h" #include "gem/i915_gem_mman.h" +#include "gt/gen8_ppgtt.h" #include "gt/intel_gt.h" #include "gt/intel_gt_pm.h" #include "gt/intel_rc6.h" @@ -1136,13 +1137,13 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend)
mutex_unlock(&mem->objects.lock);
if (in_suspend)
i915_gem_object_unbind(obj, 0);
if (in_suspend) { obj->swapto = NULL; obj->evicted = false; obj->do_swapping = true;
i915_gem_object_unbind(obj, 0);
ret = __i915_gem_object_put_pages(obj); obj->do_swapping = false; if (ret) {
@@ -1176,6 +1177,43 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) return ret; }
+static int i915_gem_suspend_ppgtt_mappings(struct drm_i915_private *i915) +{
struct i915_gem_context *ctx, *cn;
int ret;
spin_lock(&i915->gem.contexts.lock);
list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
Wrong list. Bad starting point from GEM. -Chris
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
If the record-default context objects are created in LMEM, then during suspend we pin the pages of the (source) object and use the blitter for eviction. But request creation on the blitter context then tries to pin the same default object, in order to restore the context with the default HW values, which leads to a deadlock. To avoid this, it is safe to keep these objects in SMEM.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com --- .../drm/i915/gt/intel_execlists_submission.c | 25 +++++++++++++------ 1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index c640b90711fd..ee5732b436e3 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -4697,7 +4697,13 @@ static int __execlists_context_alloc(struct intel_context *ce, context_size += PAGE_SIZE; }
- if (HAS_LMEM(engine->i915)) { + /* FIXME: temporary fix for allocating default ctx objects + * in SMEM, to resolve suspend/resume issues while using + * blitter based eviction. Will remove this once the upstream + * changes are merged, where the default objects are stored in a + * shmemfs file. + */ + if (HAS_LMEM(engine->i915) && + (!IS_DG1(engine->i915) || engine->default_state)) { ctx_obj = i915_gem_object_create_lmem(engine->i915, context_size, I915_BO_ALLOC_CONTIGUOUS); @@ -4707,16 +4713,18 @@ static int __execlists_context_alloc(struct intel_context *ce, if (IS_ERR(ctx_obj)) return PTR_ERR(ctx_obj);
- if (HAS_LMEM(engine->i915)) { + i915_gem_object_lock_isolated(ctx_obj); + if (HAS_LMEM(engine->i915) && + (!IS_DG1(engine->i915) || engine->default_state)) { ret = context_clear_lmem(ctx_obj); if (ret) - goto error_deref_obj; + goto error_unlock; }
vma = i915_vma_instance(ctx_obj, &engine->gt->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma); - goto error_deref_obj; + goto error_unlock; }
if (!page_mask_bits(ce->timeline)) { @@ -4732,7 +4740,7 @@ static int __execlists_context_alloc(struct intel_context *ce, tl = intel_timeline_create(engine->gt); if (IS_ERR(tl)) { ret = PTR_ERR(tl); - goto error_deref_obj; + goto error_unlock; }
ce->timeline = tl; @@ -4741,15 +4749,18 @@ static int __execlists_context_alloc(struct intel_context *ce, ring = intel_engine_create_ring(engine, (unsigned long)ce->ring); if (IS_ERR(ring)) { ret = PTR_ERR(ring); - goto error_deref_obj; + goto error_unlock; }
ce->ring = ring; ce->state = vma;
+ i915_gem_object_unlock(ctx_obj); + return 0;
-error_deref_obj: +error_unlock: + i915_gem_object_unlock(ctx_obj); i915_gem_object_put(ctx_obj); return ret; }
Quoting Matthew Auld (2020-11-27 12:07:03)
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
If the record-default context objects are created in LMEM, then during suspend we pin the pages of the (source) object and use the blitter for eviction. But request creation on the blitter context then tries to pin the same default object, in order to restore the context with the default HW values, which leads to a deadlock. To avoid this, it is safe to keep these objects in SMEM.
Dead patch. Default object state should be recorded as shmemfs. -Chris
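For reference, a sketch of what recording through shmemfs might look like, using the shmem_create_from_object() helper this series already touches (the surrounding request/engine variables are illustrative):

	struct file *state;

	/* Snapshot the initialised context image into a shmemfs file so
	 * the default state no longer lives in suspend-volatile LMEM. */
	state = shmem_create_from_object(rq->context->state->obj);
	if (IS_ERR(state))
		return PTR_ERR(state);

	rq->engine->default_state = state;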
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
During suspend, use blitter eviction before disabling the runtime interrupts; during resume, use the blitter after GEM resume has happened.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 36 +++++++++++++-------------------- 1 file changed, 14 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 7115f4db5043..eb5383e4a30b 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1110,13 +1110,6 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) struct intel_memory_region *mem; int id, ret = 0;
- /* - * FIXME: Presently using memcpy, - * will replace with blitter once - * fix the issues. - */ - i915->params.enable_eviction = 1; - for_each_memory_region(mem, i915, id) { struct list_head still_in_list; INIT_LIST_HEAD(&still_in_list); @@ -1173,7 +1166,6 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) mutex_unlock(&mem->objects.lock); } } - i915->params.enable_eviction = 3; return ret; }
@@ -1235,6 +1227,18 @@ static int i915_drm_suspend(struct drm_device *dev)
intel_dp_mst_suspend(dev_priv);
+ if (HAS_LMEM(dev_priv)) { + ret = intel_dmem_evict_buffers(dev, true); + if (ret) + return ret; + + i915_teardown_blt_windows(dev_priv); + + ret = i915_gem_suspend_ppgtt_mappings(dev_priv); + if (ret) + return ret; + } + intel_runtime_pm_disable_interrupts(dev_priv); intel_hpd_cancel_work(dev_priv);
@@ -1251,18 +1255,6 @@ static int i915_drm_suspend(struct drm_device *dev)
intel_fbdev_set_suspend(dev, FBINFO_STATE_SUSPENDED, true);
- if (HAS_LMEM(dev_priv)) { - ret = intel_dmem_evict_buffers(dev, true); - if (ret) - return ret; - - i915_teardown_blt_windows(dev_priv); - - ret = i915_gem_suspend_ppgtt_mappings(dev_priv); - if (ret) - return ret; - } - dev_priv->suspend_count++;
intel_csr_ucode_suspend(dev_priv); @@ -1418,6 +1410,8 @@ static int i915_drm_resume(struct drm_device *dev)
drm_mode_config_reset(dev);
+ i915_gem_resume(dev_priv); + if (HAS_LMEM(dev_priv)) { i915_gem_restore_ppgtt_mappings(dev_priv);
@@ -1430,8 +1424,6 @@ static int i915_drm_resume(struct drm_device *dev) DRM_ERROR("i915_resume:i915_gem_object_pin_pages failed with err=%d\n", ret); }
- i915_gem_resume(dev_priv); - intel_modeset_init_hw(dev_priv); intel_init_clock_gating(dev_priv); intel_hpd_init(dev_priv);
Quoting Matthew Auld (2020-11-27 12:07:04)
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
During suspend, use blitter eviction before disabling the runtime interrupts; during resume, use the blitter after GEM resume has happened.
Consider add it to the suspend prepare function. -Chris
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
The perma-pinned objects (like the GuC objects) are evicted using memcpy. Since these objects always have pinned pages, the existing swapout/swapin functions can't be used.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 105 +++++++++++++++++++++++++++----- 1 file changed, 89 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index eb5383e4a30b..c8af68227020 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1103,7 +1103,54 @@ static int i915_drm_prepare(struct drm_device *dev) return 0; }
-static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) +static int i915_gem_perma_pinned_object_swapout(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct drm_i915_gem_object *dst; + int err = -EINVAL; + + assert_object_held(obj); + dst = i915_gem_object_create_shmem(i915, obj->base.size); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + i915_gem_object_lock_isolated(dst); + err = i915_gem_object_memcpy(dst, obj); + i915_gem_object_unlock(dst); + + if (!err) { + obj->swapto = dst; + obj->evicted = true; + } else + i915_gem_object_put(dst); + + return err; +} + +static int i915_gem_perma_pinned_object_swapin(struct drm_i915_gem_object *obj) +{ + struct drm_i915_gem_object *src; + int err = -EINVAL; + + assert_object_held(obj); + src = obj->swapto; + + if (WARN_ON(!i915_gem_object_trylock(src))) + return -EBUSY; + + err = i915_gem_object_memcpy(obj, src); + i915_gem_object_unlock(src); + + if (!err) { + obj->swapto = NULL; + obj->evicted = false; + i915_gem_object_put(src); + } + return err; +} + +static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend, + bool perma_pin) { struct drm_i915_private *i915 = to_i915(dev); struct drm_i915_gem_object *obj; @@ -1133,24 +1180,37 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) if (in_suspend) { obj->swapto = NULL; obj->evicted = false; - obj->do_swapping = true;
- i915_gem_object_unbind(obj, 0); + ret = i915_gem_object_unbind(obj, 0); + if (ret || i915_gem_object_has_pinned_pages(obj)) { + if (!i915_gem_object_trylock(obj)) { + ret = -EBUSY; + goto next; + } + ret = i915_gem_perma_pinned_object_swapout(obj); + i915_gem_object_unlock(obj); + goto next; + }
+ obj->do_swapping = true; ret = __i915_gem_object_put_pages(obj); obj->do_swapping = false; - if (ret) { - /* - * FIXME: internal ctx objects still pinned - * returning as BUSY. Presently just evicting - * the user objects, will fix it later - */ + if (ret) obj->evicted = false; - ret = 0; - } else + else obj->evicted = true; } else { - if (obj->swapto && obj->evicted) { + if (i915_gem_object_has_pinned_pages(obj) && perma_pin) { + if (!i915_gem_object_trylock(obj)) { + ret = -EBUSY; + goto next; + } + ret = i915_gem_perma_pinned_object_swapin(obj); + /* FIXME: Where is this error message taken care of? */ + i915_gem_object_unlock(obj); + } + + if (obj->swapto && obj->evicted && !perma_pin) { ret = i915_gem_object_pin_pages(obj); if (ret) { i915_gem_object_put(obj); @@ -1160,7 +1220,10 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend) } } } +next: mutex_lock(&mem->objects.lock); + if (ret) + break; } list_splice_tail(&still_in_list, &mem->objects.list); mutex_unlock(&mem->objects.lock); @@ -1228,7 +1291,7 @@ static int i915_drm_suspend(struct drm_device *dev) intel_dp_mst_suspend(dev_priv);
if (HAS_LMEM(dev_priv)) { - ret = intel_dmem_evict_buffers(dev, true); + ret = intel_dmem_evict_buffers(dev, true, false); if (ret) return ret;
@@ -1410,6 +1473,14 @@ static int i915_drm_resume(struct drm_device *dev)
drm_mode_config_reset(dev);
+ if (HAS_LMEM(dev_priv)) { + ret = intel_dmem_evict_buffers(dev, false, true); + if (ret) { + DRM_ERROR("perma pinned obj's failed with err=%d\n", ret); + return ret; + } + } + i915_gem_resume(dev_priv);
if (HAS_LMEM(dev_priv)) { @@ -1419,9 +1490,11 @@ static int i915_drm_resume(struct drm_device *dev) if (ret) GEM_BUG_ON(ret);
- ret = intel_dmem_evict_buffers(dev, false); - if (ret) - DRM_ERROR("i915_resume:i915_gem_object_pin_pages failed with err=%d\n", ret); + ret = intel_dmem_evict_buffers(dev, false, false); + if (ret) { + DRM_ERROR("gem_object_pin_pages failed with err=%d\n", ret); + return ret; + } }
intel_modeset_init_hw(dev_priv);
From: CQ Tang cq.tang@intel.com
When cache_level is NONE, we check HAS_LLC(i915). But additionally for DGFX, we also need to check HAS_SNOOP(i915) on system memory objects to use I915_BO_CACHE_COHERENT_FOR_READ. On DG1, has_llc=0 and has_snoop=1. Otherwise we set obj->cache_coherent=0 and take a performance hit.
Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Ramalingam C ramalingam.c@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Cc: Matthew Auld matthew.auld@intel.com Signed-off-by: CQ Tang cq.tang@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index ddb448f275eb..be603171c444 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -95,6 +95,20 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, mutex_init(&obj->mm.get_dma_page.lock); }
+static bool i915_gem_object_use_llc(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + + if (HAS_LLC(i915)) + return true; + + if (IS_DGFX(i915) && HAS_SNOOP(i915) && + !i915_gem_object_is_lmem(obj)) + return true; + + return false; +} + /** * Mark up the object's coherency levels for a given cache_level * @obj: #drm_i915_gem_object @@ -108,7 +122,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj, if (cache_level != I915_CACHE_NONE) obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE); - else if (HAS_LLC(to_i915(obj->base.dev))) + else if (i915_gem_object_use_llc(obj)) obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ; else obj->cache_coherent = 0;
Quoting Matthew Auld (2020-11-27 12:07:06)
From: CQ Tang cq.tang@intel.com
When cache_level is NONE, we check HAS_LLC(i915). But additionally for DGFX, we also need to check HAS_SNOOP(i915) on system memory objects to use I915_BO_CACHE_COHERENT_FOR_READ. On DG1, has_llc=0 and has_snoop=1. Otherwise we set obj->cache_coherent=0 and take a performance hit.
Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Ramalingam C ramalingam.c@intel.com Cc: Sudeep Dutt sudeep.dutt@intel.com Cc: Matthew Auld matthew.auld@intel.com Signed-off-by: CQ Tang cq.tang@intel.com
drivers/gpu/drm/i915/gem/i915_gem_object.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index ddb448f275eb..be603171c444 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -95,6 +95,20 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, mutex_init(&obj->mm.get_dma_page.lock); }
+static bool i915_gem_object_use_llc(struct drm_i915_gem_object *obj) +{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
if (HAS_LLC(i915))
return true;
if (IS_DGFX(i915) && HAS_SNOOP(i915) &&
!i915_gem_object_is_lmem(obj))
return true;
return false;
+}
/**
* Mark up the object's coherency levels for a given cache_level
* @obj: #drm_i915_gem_object
@@ -108,7 +122,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj, if (cache_level != I915_CACHE_NONE) obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE);
else if (HAS_LLC(to_i915(obj->base.dev)))
else if (i915_gem_object_use_llc(obj)) obj->cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ; else obj->cache_coherent = 0;
You must also define obj->cache_level correctly. You can not just assume the object will be snooped. -Chris
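One possible direction, sketched under the assumption (not something this patch does) that the cache level is chosen up front for dgfx system memory instead of special-casing NONE in set_cache_coherency():

	/* Hypothetical helper: pick a snooped cache level for system
	 * memory objects on dgfx rather than defaulting to NONE. */
	static enum i915_cache_level
	object_default_cache_level(struct drm_i915_gem_object *obj)
	{
		struct drm_i915_private *i915 = to_i915(obj->base.dev);

		if (HAS_LLC(i915))
			return I915_CACHE_LLC;

		if (IS_DGFX(i915) && HAS_SNOOP(i915) &&
		    !i915_gem_object_is_lmem(obj))
			return I915_CACHE_LLC; /* snooped system memory */

		return I915_CACHE_NONE;
	}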
From: Lucas De Marchi lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 40 ++++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index c8af68227020..b7d40a9c00bf 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -68,6 +68,7 @@ #include "gt/intel_gt.h" #include "gt/intel_gt_pm.h" #include "gt/intel_rc6.h" +#include "gt/intel_gt_requests.h"
#include "i915_debugfs.h" #include "i915_drv.h" @@ -1088,10 +1089,36 @@ static bool suspend_to_idle(struct drm_i915_private *dev_priv) return false; }
+static int i915_gem_suspend_ppgtt_mappings(struct drm_i915_private *i915); + +static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend, + bool perma_pin); + static int i915_drm_prepare(struct drm_device *dev) { struct drm_i915_private *i915 = to_i915(dev);
+ if (HAS_LMEM(i915)) { + struct intel_gt *gt= &i915->gt; + long timeout = I915_GEM_IDLE_TIMEOUT; + int ret; + + if (intel_gt_wait_for_idle(gt, timeout) == -ETIME) { + intel_gt_set_wedged(gt); + intel_gt_retire_requests(gt); + } + + ret = intel_dmem_evict_buffers(dev, true, false); + if (ret) + return ret; + + i915_teardown_blt_windows(i915); + + ret = i915_gem_suspend_ppgtt_mappings(i915); + if (ret) + return ret; + } + /* * NB intel_display_suspend() may issue new requests after we've * ostensibly marked the GPU as ready-to-sleep here. We need to @@ -1274,7 +1301,6 @@ static int i915_drm_suspend(struct drm_device *dev) struct drm_i915_private *dev_priv = to_i915(dev); struct pci_dev *pdev = dev_priv->drm.pdev; pci_power_t opregion_target_state; - int ret = 0;
disable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
@@ -1290,18 +1316,6 @@ static int i915_drm_suspend(struct drm_device *dev)
intel_dp_mst_suspend(dev_priv);
- if (HAS_LMEM(dev_priv)) { - ret = intel_dmem_evict_buffers(dev, true, false); - if (ret) - return ret; - - i915_teardown_blt_windows(dev_priv); - - ret = i915_gem_suspend_ppgtt_mappings(dev_priv); - if (ret) - return ret; - } - intel_runtime_pm_disable_interrupts(dev_priv); intel_hpd_cancel_work(dev_priv);
From: Thomas Hellström thomas.hellstrom@intel.com
This is important to help avoid evicting already resident buffers from the batch we're processing.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 25 ++++++++++++++++--- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e73a761a7d1f..c988f8ffd39f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -918,21 +918,38 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) return err; }
-static int eb_validate_vmas(struct i915_execbuffer *eb) +static int eb_lock_vmas(struct i915_execbuffer *eb) { unsigned int i; int err;
- INIT_LIST_HEAD(&eb->unbound); - for (i = 0; i < eb->buffer_count; i++) { - struct drm_i915_gem_exec_object2 *entry = &eb->exec[i]; struct eb_vma *ev = &eb->vma[i]; struct i915_vma *vma = ev->vma;
err = i915_gem_object_lock(vma->obj, &eb->ww); if (err) return err; + } + + return 0; +} + +static int eb_validate_vmas(struct i915_execbuffer *eb) +{ + unsigned int i; + int err; + + INIT_LIST_HEAD(&eb->unbound); + + err = eb_lock_vmas(eb); + if (err) + return err; + + for (i = 0; i < eb->buffer_count; i++) { + struct drm_i915_gem_exec_object2 *entry = &eb->exec[i]; + struct eb_vma *ev = &eb->vma[i]; + struct i915_vma *vma = ev->vma;
err = eb_pin_vma(eb, entry, ev); if (err == -EDEADLK)
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Use a separate acquire context list and a separate locking function for objects that are locked for eviction. These objects are then properly referenced while on the list and can be unlocked early in the ww transaction.
Co-developed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 67 +++++++++++++++++-- .../gpu/drm/i915/gem/i915_gem_object_types.h | 5 ++ drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 14 +++- drivers/gpu/drm/i915/i915_gem_ww.c | 51 ++++++++++---- drivers/gpu/drm/i915/i915_gem_ww.h | 3 + 5 files changed, 122 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 52a36b4052f0..e237b0fb0e79 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -158,6 +158,32 @@ static inline void assert_object_held_shared(struct drm_i915_gem_object *obj) assert_object_held(obj); }
+static inline int +i915_gem_object_lock_to_evict(struct drm_i915_gem_object *obj, + struct i915_gem_ww_ctx *ww) +{ + int ret; + + if (ww->intr) + ret = dma_resv_lock_interruptible(obj->base.resv, &ww->ctx); + else + ret = dma_resv_lock(obj->base.resv, &ww->ctx); + + if (!ret) { + list_add_tail(&obj->obj_link, &ww->eviction_list); + i915_gem_object_get(obj); + obj->evict_locked = true; + } + + GEM_WARN_ON(ret == -EALREADY); + if (ret == -EDEADLK) { + ww->contended_evict = true; + ww->contended = i915_gem_object_get(obj); + } + + return ret; +} + static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, struct i915_gem_ww_ctx *ww, bool intr) @@ -169,13 +195,25 @@ static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj, else ret = dma_resv_lock(obj->base.resv, ww ? &ww->ctx : NULL);
- if (!ret && ww) + if (!ret && ww) { list_add_tail(&obj->obj_link, &ww->obj_list); - if (ret == -EALREADY) - ret = 0; + obj->evict_locked = false; + }
- if (ret == -EDEADLK) + if (ret == -EALREADY) { + ret = 0; + /* We've already evicted an object needed for this batch. */ + if (obj->evict_locked) { + list_move_tail(&obj->obj_link, &ww->obj_list); + i915_gem_object_put(obj); + obj->evict_locked = false; + } + } + + if (ret == -EDEADLK) { + ww->contended_evict = false; ww->contended = i915_gem_object_get(obj); + }
return ret; } @@ -580,6 +618,27 @@ i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj, __i915_gem_object_invalidate_frontbuffer(obj, origin); }
+/** + * i915_gem_get_locking_ctx - Get the locking context of a locked object + * if any. + * + * @obj: The object to get the locking ctx from + * + * RETURN: The locking context if the object was locked using a context. + * NULL otherwise. + */ +static inline struct i915_gem_ww_ctx * +i915_gem_get_locking_ctx(const struct drm_i915_gem_object *obj) +{ + struct ww_acquire_ctx *ctx; + + ctx = obj->base.resv->lock.ctx; + if (!ctx) + return NULL; + + return container_of(ctx, struct i915_gem_ww_ctx, ctx); +} + #ifdef CONFIG_MMU_NOTIFIER static inline bool i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 331d113f7d5b..c42c0d3d5d67 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -142,6 +142,11 @@ struct drm_i915_gem_object { */ struct list_head obj_link;
+ /** + * @evict_locked: Whether @obj_link sits on the eviction_list + */ + bool evict_locked; + /** Stolen memory for this object, instead of being backed by shmem. */ struct drm_mm_node *stolen; union { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 27674048f17d..59d0f14b90ea 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -100,6 +100,7 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, unsigned long *nr_scanned, unsigned int shrink) { + struct drm_i915_gem_object *obj; const struct { struct list_head *list; unsigned int bit; @@ -164,7 +165,6 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, */ for (phase = phases; phase->list; phase++) { struct list_head still_in_list; - struct drm_i915_gem_object *obj; unsigned long flags;
if ((shrink & phase->bit) == 0) @@ -197,6 +197,10 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, if (!can_release_pages(obj)) continue;
+ /* Already locked this object? */ + if (ww && ww == i915_gem_get_locking_ctx(obj)) + continue; + if (!kref_get_unless_zero(&obj->base.refcount)) continue;
@@ -209,7 +213,11 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, if (!i915_gem_object_trylock(obj)) goto skip; } else { - err = i915_gem_object_lock(obj, ww); + err = i915_gem_object_lock_to_evict(obj, ww); + if (err == -EALREADY) { + err = 0; + goto skip; + } if (err) goto skip; } @@ -235,6 +243,8 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, if (err) return err; } + if (ww) + i915_gem_ww_ctx_unlock_evictions(ww);
if (shrink & I915_SHRINK_BOUND) intel_runtime_pm_put(&i915->runtime_pm, wakeref); diff --git a/drivers/gpu/drm/i915/i915_gem_ww.c b/drivers/gpu/drm/i915/i915_gem_ww.c index 43960d8595eb..811bf7677d78 100644 --- a/drivers/gpu/drm/i915/i915_gem_ww.c +++ b/drivers/gpu/drm/i915/i915_gem_ww.c @@ -10,24 +10,45 @@ void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ww, bool intr) { ww_acquire_init(&ww->ctx, &reservation_ww_class); INIT_LIST_HEAD(&ww->obj_list); + INIT_LIST_HEAD(&ww->eviction_list); ww->intr = intr; ww->contended = NULL; + ww->contended_evict = false; +} + +void i915_gem_ww_ctx_unlock_evictions(struct i915_gem_ww_ctx *ww) +{ + struct drm_i915_gem_object *obj, *next; + + list_for_each_entry_safe(obj, next, &ww->eviction_list, obj_link) { + list_del(&obj->obj_link); + GEM_WARN_ON(!obj->evict_locked); + i915_gem_object_unlock(obj); + i915_gem_object_put(obj); + } }
static void i915_gem_ww_ctx_unlock_all(struct i915_gem_ww_ctx *ww) { - struct drm_i915_gem_object *obj; + struct drm_i915_gem_object *obj, *next;
- while ((obj = list_first_entry_or_null(&ww->obj_list, struct drm_i915_gem_object, obj_link))) { + list_for_each_entry_safe(obj, next, &ww->obj_list, obj_link) { list_del(&obj->obj_link); + GEM_WARN_ON(obj->evict_locked); i915_gem_object_unlock(obj); } + + i915_gem_ww_ctx_unlock_evictions(ww); }
void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj) { + bool evict_locked = obj->evict_locked; + list_del(&obj->obj_link); i915_gem_object_unlock(obj); + if (evict_locked) + i915_gem_object_put(obj); }
void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ww) @@ -39,27 +60,33 @@ void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ww)
int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ww) { + struct drm_i915_gem_object *obj = ww->contended; int ret = 0;
- if (WARN_ON(!ww->contended)) + if (WARN_ON(!obj)) return -EINVAL;
i915_gem_ww_ctx_unlock_all(ww); if (ww->intr) - ret = dma_resv_lock_slow_interruptible(ww->contended->base.resv, &ww->ctx); + ret = dma_resv_lock_slow_interruptible(obj->base.resv, &ww->ctx); else - dma_resv_lock_slow(ww->contended->base.resv, &ww->ctx); + dma_resv_lock_slow(obj->base.resv, &ww->ctx); + if (ret) + goto out;
/* - * Unlocking the contended lock again, as might not need it in - * the retried transaction. This does not increase starvation, - * but it's opening up for a wakeup flood if there are many - * transactions relaxing on this object. + * Unlocking the contended lock again, if it was locked for eviction. + * We will most likely not need it in the retried transaction. */ - if (!ret) - dma_resv_unlock(ww->contended->base.resv); + if (ww->contended_evict) { + dma_resv_unlock(obj->base.resv); + } else { + obj->evict_locked = false; + list_add_tail(&obj->obj_link, &ww->obj_list); + }
- i915_gem_object_put(ww->contended); +out: + i915_gem_object_put(obj); ww->contended = NULL;
return ret; diff --git a/drivers/gpu/drm/i915/i915_gem_ww.h b/drivers/gpu/drm/i915/i915_gem_ww.h index f6b1a796667b..11793b170cc2 100644 --- a/drivers/gpu/drm/i915/i915_gem_ww.h +++ b/drivers/gpu/drm/i915/i915_gem_ww.h @@ -10,15 +10,18 @@ struct i915_gem_ww_ctx { struct ww_acquire_ctx ctx; struct list_head obj_list; + struct list_head eviction_list; struct drm_i915_gem_object *contended; unsigned short intr; unsigned short loop; + unsigned short contended_evict; };
void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ctx, bool intr); void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ctx); int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ctx); void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj); +void i915_gem_ww_ctx_unlock_evictions(struct i915_gem_ww_ctx *ww);
/* Internal functions used by the inlines! Don't use. */ static inline int __i915_gem_ww_fini(struct i915_gem_ww_ctx *ww, int err)
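A hypothetical caller of the new helpers (batch_obj and victim_obj are illustrative names), showing how eviction locks ride along in the same ww transaction and can be released early:

	struct i915_gem_ww_ctx ww;
	int err;

	for_i915_gem_ww(&ww, err, true) {
		err = i915_gem_object_lock(batch_obj, &ww);
		if (err)
			continue;	/* -EDEADLK backoff handled by the loop */

		err = i915_gem_object_lock_to_evict(victim_obj, &ww);
		if (err)
			continue;

		/* ... evict victim_obj, then allocate for batch_obj ... */

		/* Victims are referenced on ww.eviction_list; drop them as
		 * soon as the freed space has been claimed. */
		i915_gem_ww_ctx_unlock_evictions(&ww);
	}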
From: Thomas Hellström thomas.hellstrom@intel.com
Use sleeping ww locks if we're in a ww transaction, and trylock otherwise. We unlock the evicted objects either when eviction fails or when we've reached the target. The ww ticket locks will then ensure that we eventually succeed in reaching the target, provided there is evictable space available. However, another process may still steal the evicted memory before we have a chance to allocate it. To guarantee eventual success we would need to move the evict unlock until after get pages succeeds; that's considered a TODO for now.
Signed-off-by: Thomas Hellström thomas.hellstrom@intel.com Cc: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_region.c | 7 ++- drivers/gpu/drm/i915/intel_memory_region.c | 57 ++++++++++++++++------ drivers/gpu/drm/i915/intel_memory_region.h | 2 + 3 files changed, 49 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 1ec6528498c8..8ec59fbaa3e6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -204,6 +204,7 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) struct scatterlist *sg; unsigned int sg_page_sizes; int ret; + struct i915_gem_ww_ctx *ww = i915_gem_get_locking_ctx(obj);
/* XXX: Check if we have any post. This is nasty hack, see gem_create */ if (obj->mm.gem_create_posted_err) @@ -222,7 +223,8 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) if (obj->flags & I915_BO_ALLOC_CONTIGUOUS) flags |= I915_ALLOC_CONTIGUOUS;
- ret = __intel_memory_region_get_pages_buddy(mem, size, flags, blocks); + ret = __intel_memory_region_get_pages_buddy(mem, ww, size, flags, + blocks); if (ret) goto err_free_sg;
@@ -277,7 +279,8 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj) if (ret) { /* swapin failed, free the pages */ __intel_memory_region_put_pages_buddy(mem, blocks); - ret = -ENXIO; + if (ret != -EDEADLK && ret != -EINTR) + ret = -ENXIO; goto err_free_sg; } } else if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) { diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 57f01ef16628..6b26b6cd5958 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -96,6 +96,7 @@ __intel_memory_region_put_block_buddy(struct i915_buddy_block *block) }
static int intel_memory_region_evict(struct intel_memory_region *mem, + struct i915_gem_ww_ctx *ww, resource_size_t target) { struct drm_i915_private *i915 = mem->i915; @@ -109,6 +110,7 @@ static int intel_memory_region_evict(struct intel_memory_region *mem, struct list_head **phase; resource_size_t found; int pass; + int err = 0;
intel_gt_retire_requests(&i915->gt);
@@ -126,10 +128,11 @@ static int intel_memory_region_evict(struct intel_memory_region *mem, mm.region_link))) { list_move_tail(&obj->mm.region_link, &still_in_list);
- if (!i915_gem_object_has_pages(obj)) + if (i915_gem_object_is_framebuffer(obj)) continue;
- if (i915_gem_object_is_framebuffer(obj)) + /* Already locked this object? */ + if (ww && ww == i915_gem_get_locking_ctx(obj)) continue;
/* @@ -147,34 +150,51 @@ static int intel_memory_region_evict(struct intel_memory_region *mem,
mutex_unlock(&mem->objects.lock);
+ if (ww) { + err = i915_gem_object_lock_to_evict(obj, ww); + if (err) + goto put; + } else { + if (!i915_gem_object_trylock(obj)) + goto put; + } + + if (!i915_gem_object_has_pages(obj)) + goto unlock; + /* tell callee to do swapping */ if (i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM) && pass == 1) obj->do_swapping = true;
if (!i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE)) { - if (i915_gem_object_trylock(obj)) { - __i915_gem_object_put_pages(obj); - /* May arrive from get_pages on another bo */ - if (!i915_gem_object_has_pages(obj)) { - found += obj->base.size; - if (obj->mm.madv == I915_MADV_DONTNEED) - obj->mm.madv = __I915_MADV_PURGED; - } - i915_gem_object_unlock(obj); + __i915_gem_object_put_pages(obj); + /* May arrive from get_pages on another bo */ + + if (!i915_gem_object_has_pages(obj)) { + found += obj->base.size; + if (obj->mm.madv == I915_MADV_DONTNEED) + obj->mm.madv = __I915_MADV_PURGED; } }
obj->do_swapping = false; +unlock: + if (!ww) + i915_gem_object_unlock(obj); +put: i915_gem_object_put(obj); mutex_lock(&mem->objects.lock);
- if (found >= target) + if (err == -EDEADLK || err == -EINTR || found >= target) break; } list_splice_tail(&still_in_list, *phase); mutex_unlock(&mem->objects.lock);
+ if (err == -EDEADLK || err == -EINTR) + return err; + if (found < target && i915->params.enable_eviction) { pass++; phase++; @@ -182,11 +202,15 @@ static int intel_memory_region_evict(struct intel_memory_region *mem, goto next; }
+ if (ww) + i915_gem_ww_ctx_unlock_evictions(ww); + return (found < target) ? -ENOSPC : 0; }
int __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, + struct i915_gem_ww_ctx *ww, resource_size_t size, unsigned int flags, struct list_head *blocks) @@ -194,6 +218,7 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, unsigned int min_order = 0; unsigned int max_order; unsigned long n_pages; + int err;
GEM_BUG_ON(!IS_ALIGNED(size, mem->mm.chunk_size)); GEM_BUG_ON(!list_empty(blocks)); @@ -241,12 +266,11 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem,
if (order-- == min_order) { resource_size_t target; - int err;
target = n_pages * mem->mm.chunk_size;
mutex_unlock(&mem->mm_lock); - err = intel_memory_region_evict(mem, + err = intel_memory_region_evict(mem, ww, target); mutex_lock(&mem->mm_lock); if (err) @@ -272,6 +296,9 @@ __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, err_free_blocks: intel_memory_region_free_pages(mem, blocks); mutex_unlock(&mem->mm_lock); + if (err == -EDEADLK || err == -EINTR) + return err; + return -ENXIO; }
@@ -284,7 +311,7 @@ __intel_memory_region_get_block_buddy(struct intel_memory_region *mem, LIST_HEAD(blocks); int ret;
- ret = __intel_memory_region_get_pages_buddy(mem, size, flags, &blocks); + ret = __intel_memory_region_get_pages_buddy(mem, NULL, size, flags, &blocks); if (ret) return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index 0bfc1fa36f74..ff1d97667618 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -16,6 +16,7 @@
#include "i915_buddy.h"
+struct i915_gem_ww_ctx; struct drm_i915_private; struct drm_i915_gem_object; struct intel_memory_region; @@ -116,6 +117,7 @@ int intel_memory_region_init_buddy(struct intel_memory_region *mem); void intel_memory_region_release_buddy(struct intel_memory_region *mem);
int __intel_memory_region_get_pages_buddy(struct intel_memory_region *mem, + struct i915_gem_ww_ctx *ww, resource_size_t size, unsigned int flags, struct list_head *blocks);
From: Thomas Hellström <thomas.hellstrom@intel.com>
Prefer a ww transaction to a single object lock, so that eviction can take sleeping object locks when we end up here from the fault handler.
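For context, the for_i915_gem_ww() loop in the diff below is roughly equivalent to the open-coded backoff pattern used elsewhere in this series (the suspend/resume patch further down open-codes it); a minimal sketch, where do_fault_body() is a hypothetical stand-in for the transaction body:

	struct i915_gem_ww_ctx ww;
	int err;

	i915_gem_ww_ctx_init(&ww, true);	/* true = interruptible */
retry:
	err = do_fault_body(&ww);	/* locks objects via i915_gem_object_lock(obj, &ww) */
	if (err == -EDEADLK) {
		/* Drop all held locks, sleep on the contended one, then retry. */
		err = i915_gem_ww_ctx_backoff(&ww);
		if (!err)
			goto retry;
	}
	i915_gem_ww_ctx_fini(&ww);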
Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 45 +++++++++++++-----------
 1 file changed, 24 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 33ccd4d665d4..a9526cc309d3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -238,6 +238,7 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf) struct vm_area_struct *area = vmf->vma; struct i915_mmap_offset *mmo = area->vm_private_data; struct drm_i915_gem_object *obj = mmo->obj; + struct i915_gem_ww_ctx ww; resource_size_t iomap; int err;
@@ -246,33 +247,35 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf) area->vm_flags & VM_WRITE)) return VM_FAULT_SIGBUS;
- if (i915_gem_object_lock_interruptible(obj, NULL)) - return VM_FAULT_NOPAGE; + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue;
- err = i915_gem_object_pin_pages(obj); - if (err) - goto out; + err = i915_gem_object_pin_pages(obj); + if (err) + continue;
- iomap = -1; - if (!i915_gem_object_has_struct_page(obj)) { - iomap = obj->mm.region->iomap.base; - iomap -= obj->mm.region->region.start; - } + iomap = -1; + if (!i915_gem_object_has_struct_page(obj)) { + iomap = obj->mm.region->iomap.base; + iomap -= obj->mm.region->region.start; + }
- /* PTEs are revoked in obj->ops->put_pages() */ - err = remap_io_sg(area, - area->vm_start, area->vm_end - area->vm_start, - obj->mm.pages->sgl, iomap); + /* PTEs are revoked in obj->ops->put_pages() */ + err = remap_io_sg(area, + area->vm_start, area->vm_end - area->vm_start, + obj->mm.pages->sgl, iomap);
- if (area->vm_flags & VM_WRITE) { - GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); - obj->mm.dirty = true; - } + if (area->vm_flags & VM_WRITE) { + GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); + obj->mm.dirty = true; + }
- i915_gem_object_unpin_pages(obj); + i915_gem_object_unpin_pages(obj); + /* Implicit unlock */ + }
-out: - i915_gem_object_unlock(obj); return i915_error_to_vmf_fault(err); }
From: Thomas Hellström <thomas.hellstrom@intel.com>
By using a ww transaction, any caller of this function that ends up evicting objects can use sleeping waits when locking the objects to be evicted.
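From a caller's point of view nothing changes except that eviction may now sleep; a hypothetical usage sketch (obj, data and size are illustrative):

	/* Map without holding the object lock; the ww dance happens inside. */
	void *vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

	memcpy(vaddr, data, size);
	i915_gem_object_unpin_map(obj);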
Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index d0f3da0925f5..0c20f9b18956 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -425,11 +425,22 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, void *i915_gem_object_pin_map_unlocked(struct drm_i915_gem_object *obj, enum i915_map_type type) { + struct i915_gem_ww_ctx ww; void *ret; + int err;
- i915_gem_object_lock(obj, NULL); - ret = i915_gem_object_pin_map(obj, type); - i915_gem_object_unlock(obj); + for_i915_gem_ww(&ww, err, false) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + ret = i915_gem_object_pin_map(obj, type); + if (IS_ERR(ret)) + err = PTR_ERR(ret); + /* Implicit unlock */ + } + if (err) + return ERR_PTR(err);
return ret; }
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The current code uses jiffies to do the accounting and then does:
  diff = jiffies - start;
  msec = diff * 1000 / HZ;
  ...
  atomic_long_add(msec, &i915->time_swap_out_ms);
Given that a jiffy can be as coarse as 10ms (HZ=100), and that the current accounting records any eviction completing within a single jiffy as taking zero time (that is, at infinite speed), we can end up over-estimating the reported eviction throughput. For example, with HZ=100 a copy that finishes in 5ms adds its bytes to the totals but adds zero elapsed time.

Fix this by accumulating ktime_t and only converting to a more user-friendly granularity at presentation time (debugfs read).

At the same time, consolidate the code a bit and convert from multiple atomics to a single seqlock per stat.
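Each stat is a (pages, time) pair that readers must observe consistently. The seqlock keeps readers lock-free while serializing writers; a minimal sketch of the pairing the patch uses (stat_add()/stat_read() are illustrative names, the struct fields match the diff below):

	static void stat_add(struct i915_mm_swap_stat *stat,
			     unsigned long pages, ktime_t elapsed)
	{
		write_seqlock(&stat->lock);	/* writers exclude each other */
		stat->time = ktime_add(stat->time, elapsed);
		stat->pages += pages;
		write_sequnlock(&stat->lock);
	}

	static void stat_read(struct i915_mm_swap_stat *stat,
			      unsigned long *pages, ktime_t *time)
	{
		unsigned int seq;

		do {				/* restart if a writer raced us */
			seq = read_seqbegin(&stat->lock);
			*pages = stat->pages;
			*time = stat->time;
		} while (read_seqretry(&stat->lock, seq));
	}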
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: CQ Tang <cq.tang@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_region.c | 67 ++++++++++----------
 drivers/gpu/drm/i915/i915_debugfs.c        | 73 +++++++++++-----------
 drivers/gpu/drm/i915/i915_drv.h            | 25 +++++---
 drivers/gpu/drm/i915/i915_gem.c            |  5 ++
 4 files changed, 90 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 8ec59fbaa3e6..1a390e502d5a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -9,14 +9,29 @@ #include "i915_trace.h" #include "i915_gem_mman.h"
+static void +__update_stat(struct i915_mm_swap_stat *stat, + unsigned long pages, + ktime_t start) +{ + if (stat) { + start = ktime_get() - start; + + write_seqlock(&stat->lock); + stat->time = ktime_add(stat->time, start); + stat->pages += pages; + write_sequnlock(&stat->lock); + } +} + static int i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, unsigned int sizes) { struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct i915_mm_swap_stat *stat = NULL; struct drm_i915_gem_object *dst, *src; - unsigned long start, diff, msec; - bool blt_completed = false; + ktime_t start = ktime_get(); int err = -EINVAL;
GEM_BUG_ON(obj->swapto); @@ -26,7 +41,6 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj); - start = jiffies;
/* create a shadow object on smem region */ dst = i915_gem_object_create_shmem(i915, obj->base.size); @@ -58,10 +72,14 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, if (i915->params.enable_eviction >= 2) { err = i915_window_blt_copy(dst, src); if (!err) - blt_completed = true; + stat = &i915->mm.blt_swap_stats.out; } - if (err && i915->params.enable_eviction != 2) + + if (err && i915->params.enable_eviction != 2) { err = i915_gem_object_memcpy(dst, src); + if (!err) + stat = &i915->mm.memcpy_swap_stats.out; + }
__i915_gem_object_unpin_pages(src); __i915_gem_object_unset_pages(src); @@ -73,18 +91,7 @@ i915_gem_object_swapout_pages(struct drm_i915_gem_object *obj, else i915_gem_object_put(dst);
- if (!err) { - diff = jiffies - start; - msec = diff * 1000 / HZ; - if (blt_completed) { - atomic_long_add(sizes, &i915->num_bytes_swapped_out); - atomic_long_add(msec, &i915->time_swap_out_ms); - } else { - atomic_long_add(sizes, - &i915->num_bytes_swapped_out_memcpy); - atomic_long_add(msec, &i915->time_swap_out_ms_memcpy); - } - } + __update_stat(stat, sizes >> PAGE_SHIFT, start);
return err; } @@ -94,9 +101,9 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, unsigned int sizes) { struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct i915_mm_swap_stat *stat = NULL; struct drm_i915_gem_object *dst, *src; - unsigned long start, diff, msec; - bool blt_completed = false; + ktime_t start = ktime_get(); int err = -EINVAL;
GEM_BUG_ON(!obj->swapto); @@ -106,7 +113,6 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!i915->params.enable_eviction);
assert_object_held(obj); - start = jiffies;
src = obj->swapto;
@@ -134,10 +140,14 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, if (i915->params.enable_eviction >= 2) { err = i915_window_blt_copy(dst, src); if (!err) - blt_completed = true; + stat = &i915->mm.blt_swap_stats.in; } - if (err && i915->params.enable_eviction != 2) + + if (err && i915->params.enable_eviction != 2) { err = i915_gem_object_memcpy(dst, src); + if (!err) + stat = &i915->mm.memcpy_swap_stats.in; + }
__i915_gem_object_unpin_pages(dst); __i915_gem_object_unset_pages(dst); @@ -149,18 +159,7 @@ i915_gem_object_swapin_pages(struct drm_i915_gem_object *obj, i915_gem_object_put(src); }
- if (!err) { - diff = jiffies - start; - msec = diff * 1000 / HZ; - if (blt_completed) { - atomic_long_add(sizes, &i915->num_bytes_swapped_in); - atomic_long_add(msec, &i915->time_swap_in_ms); - } else { - atomic_long_add(sizes, - &i915->num_bytes_swapped_in_memcpy); - atomic_long_add(msec, &i915->time_swap_in_ms_memcpy); - } - } + __update_stat(stat, sizes >> PAGE_SHIFT, start);
return err; } diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 983030ac39e1..f06f900b598e 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -359,12 +359,46 @@ static void print_context_stats(struct seq_file *m, print_file_stats(m, "[k]contexts", kstats); }
+static void +evict_stat(struct seq_file *m, + const char *name, + const char *direction, + struct i915_mm_swap_stat *stat) +{ + unsigned long pages; + unsigned int seq; + u64 time, rate; + ktime_t ktime; + + do { + seq = read_seqbegin(&stat->lock); + pages = stat->pages; + ktime = stat->time; + } while (read_seqretry(&stat->lock, seq)); + + time = ktime_to_us(ktime); + rate = time ? div64_u64((u64)pages * PAGE_SIZE, time) : 0; + rate = div64_ul(rate * USEC_PER_SEC, 1024 * 1024); + + seq_printf(m, "%s swap %s %lu MiB in %llums, %llu MiB/s.\n", + name, direction, pages * PAGE_SIZE, ktime_to_ms(ktime), + rate); +} + +static void +evict_stats(struct seq_file *m, + const char *name, + struct i915_mm_swap_stats *stats) +{ + evict_stat(m, name, "in", &stats->in); + evict_stat(m, name, "out", &stats->out); +} + static int i915_gem_object_info(struct seq_file *m, void *data) { struct drm_i915_private *i915 = node_to_i915(m->private); struct intel_memory_region *mr; enum intel_region_id id; - u64 time, bytes, rate;
seq_printf(m, "%u shrinkable [%u free] objects, %llu bytes\n", i915->mm.shrink_count, @@ -374,41 +408,8 @@ static int i915_gem_object_info(struct seq_file *m, void *data) seq_printf(m, "%s: total:%pa, available:%pa bytes\n", mr->name, &mr->total, &mr->avail);
- time = atomic_long_read(&i915->time_swap_out_ms); - bytes = atomic_long_read(&i915->num_bytes_swapped_out); - if (time) - rate = div64_u64(bytes * 1000, time * 1024 * 1024); - else - rate = 0; - seq_printf(m, "BLT: swapout %llu Bytes in %llu mSec(%llu MB/Sec)\n", - bytes, time, rate); - - time = atomic_long_read(&i915->time_swap_in_ms); - bytes = atomic_long_read(&i915->num_bytes_swapped_in); - if (time) - rate = div64_u64(bytes * 1000, time * 1024 * 1024); - else - rate = 0; - seq_printf(m, "BLT: swapin %llu Bytes in %llu mSec(%llu MB/Sec)\n", - bytes, time, rate); - - time = atomic_long_read(&i915->time_swap_out_ms_memcpy); - bytes = atomic_long_read(&i915->num_bytes_swapped_out_memcpy); - if (time) - rate = div64_u64(bytes * 1000, time * 1024 * 1024); - else - rate = 0; - seq_printf(m, "Memcpy: swapout %llu Bytes in %llu mSec(%llu MB/Sec)\n", - bytes, time, rate); - - time = atomic_long_read(&i915->time_swap_in_ms_memcpy); - bytes = atomic_long_read(&i915->num_bytes_swapped_in_memcpy); - if (time) - rate = div64_u64(bytes * 1000, time * 1024 * 1024); - else - rate = 0; - seq_printf(m, "Memcpy: swapin %llu Bytes in %llu mSec(%llu MB/Sec)\n", - bytes, time, rate); + evict_stats(m, "Blitter", &i915->mm.blt_swap_stats); + evict_stats(m, "Memcpy", &i915->mm.memcpy_swap_stats); seq_putc(m, '\n');
print_context_stats(m, i915); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 6f0ab363bdee..45511f2d8da0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -49,6 +49,7 @@ #include <linux/shmem_fs.h> #include <linux/stackdepot.h> #include <linux/xarray.h> +#include <linux/seqlock.h>
#include <drm/intel-gtt.h> #include <drm/drm_legacy.h> /* for struct drm_dma_handle */ @@ -548,6 +549,17 @@ struct intel_l3_parity { int which_slice; };
+struct i915_mm_swap_stat { + seqlock_t lock; + unsigned long pages; + ktime_t time; +}; + +struct i915_mm_swap_stats { + struct i915_mm_swap_stat in; + struct i915_mm_swap_stat out; +}; + struct i915_gem_mm { /* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */ spinlock_t obj_lock; @@ -601,6 +613,9 @@ struct i915_gem_mm {
/* To protect above two set of vmas */ wait_queue_head_t window_queue; + + struct i915_mm_swap_stats blt_swap_stats; + struct i915_mm_swap_stats memcpy_swap_stats; };
#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */ @@ -1220,16 +1235,6 @@ struct drm_i915_private { * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch * will be rejected. Instead look for a better place. */ - - atomic_long_t num_bytes_swapped_out; - atomic_long_t num_bytes_swapped_in; - atomic_long_t time_swap_out_ms; - atomic_long_t time_swap_in_ms; - - atomic_long_t num_bytes_swapped_out_memcpy; - atomic_long_t num_bytes_swapped_in_memcpy; - atomic_long_t time_swap_out_ms_memcpy; - atomic_long_t time_swap_in_ms_memcpy; };
static inline struct drm_i915_private *to_i915(const struct drm_device *dev) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 85cbdb8e2bb8..e94f3f689b30 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1151,6 +1151,11 @@ static void i915_gem_init__mm(struct drm_i915_private *i915) INIT_LIST_HEAD(&i915->mm.purge_list); INIT_LIST_HEAD(&i915->mm.shrink_list);
+ seqlock_init(&i915->mm.blt_swap_stats.in.lock); + seqlock_init(&i915->mm.blt_swap_stats.out.lock); + seqlock_init(&i915->mm.memcpy_swap_stats.in.lock); + seqlock_init(&i915->mm.memcpy_swap_stats.out.lock); + i915_gem_init__objects(i915); }
Quoting Matthew Auld (2020-11-27 12:07:13)
A lot of effort to fix up patches after the fact, might as well make it a real PMU interface. -Chris
On 27/11/2020 14:40, Chris Wilson wrote:
A lot of effort to fix up patches after the fact, might as well make it a real PMU interface.
It did cross my mind and should be easy to add on top if deemed useful or interesting.
More importantly, it is okay with me to incorporate this patch into the earlier one(s) which first added statistics.
Regards,
Tvrtko
From: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Add ww locks during suspend/resume.
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index b7d40a9c00bf..c41865d5bf1e 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1099,7 +1099,7 @@ static int i915_drm_prepare(struct drm_device *dev) struct drm_i915_private *i915 = to_i915(dev);
if (HAS_LMEM(i915)) { - struct intel_gt *gt= &i915->gt; + struct intel_gt *gt = &i915->gt; long timeout = I915_GEM_IDLE_TIMEOUT; int ret;
@@ -1182,7 +1182,8 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend, struct drm_i915_private *i915 = to_i915(dev); struct drm_i915_gem_object *obj; struct intel_memory_region *mem; - int id, ret = 0; + struct i915_gem_ww_ctx ww; + int id, ret = 0, err = 0;
for_each_memory_region(mem, i915, id) { struct list_head still_in_list; @@ -1204,19 +1205,20 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend,
mutex_unlock(&mem->objects.lock);
+ i915_gem_ww_ctx_init (&ww, true); +retry: + err = i915_gem_object_lock(obj, &ww); + if (err) + goto out_err; + if (in_suspend) { obj->swapto = NULL; obj->evicted = false;
ret = i915_gem_object_unbind(obj, 0); if (ret || i915_gem_object_has_pinned_pages(obj)) { - if (!i915_gem_object_trylock(obj)) { - ret = -EBUSY; - goto next; - } ret = i915_gem_perma_pinned_object_swapout(obj); - i915_gem_object_unlock(obj); - goto next; + goto out_err; }
obj->do_swapping = true; @@ -1228,13 +1230,7 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend, obj->evicted = true; } else { if (i915_gem_object_has_pinned_pages(obj) && perma_pin) { - if (!i915_gem_object_trylock(obj)) { - ret = -EBUSY; - goto next; - } ret = i915_gem_perma_pinned_object_swapin(obj); - /* FIXME: Where is this error message taken care of? */ - i915_gem_object_unlock(obj); }
if (obj->swapto && obj->evicted && !perma_pin) { @@ -1247,7 +1243,14 @@ static int intel_dmem_evict_buffers(struct drm_device *dev, bool in_suspend, } } } -next: +out_err: + if (err == -EDEADLK) { + err = i915_gem_ww_ctx_backoff(&ww); + if (!err) + goto retry; + } + i915_gem_ww_ctx_fini(&ww); + mutex_lock(&mem->objects.lock); if (ret) break;
From: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Use I915_MAP_WC when the default state object is allocated in LMEM, since local memory is exposed through a write-combined io mapping (io_mapping_init_wc()) rather than cached system memory.
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
---
 drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 041e2a50160d..1fbc070a4651 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -8,6 +8,7 @@ #include <linux/shmem_fs.h>
#include "gem/i915_gem_object.h" +#include "gem/i915_gem_lmem.h" #include "shmem_utils.h"
struct file *shmem_create_from_data(const char *name, void *data, size_t len) @@ -39,7 +40,8 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj) return file; }
- ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); + ptr = i915_gem_object_pin_map_unlocked(obj, i915_gem_object_is_lmem(obj) ? + I915_MAP_WC : I915_MAP_WB); if (IS_ERR(ptr)) return ERR_CAST(ptr);
From: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
This fixes a bug introduced by upstream commit a6326a4f8ffb ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")

We allocate the context state object ce->state from lmem, so in __engines_record_defaults() we call shmem_create_from_object(). Because this is an lmem object, the call creates a new shmemfs file, copies the contents into it, and returns the file pointer, which is assigned to engine->default_state. The ce->state lmem object itself is freed at the end of __engines_record_defaults().

Because a new shmemfs file is created for engine->default_state and, more importantly, we DON'T mark its pages dirty after writing into them, page cache reclaim is free to drop those pages.

As the test moves forward, it creates a new request/context and copies the saved engine->default_state into ce->state. If the default_state pages were dropped by page cache reclaim, the copy reads freshly allocated pages and fills ce->state with garbage, so ce->state ends up with bogus instructions and the GPU hangs.

The fix is simple: mark the shmemfs pages dirty when writing into them, and mark them accessed on both read and write.
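Condensed, the fixed write path looks like this (a sketch of the pattern; the actual change to __shmem_rw() is in the diff below, src and len are illustrative):

	void *vaddr = kmap(page);

	memcpy(vaddr + offset_in_page(off), src, len);
	set_page_dirty(page);		/* contents must survive reclaim/swap */
	mark_page_accessed(page);	/* keep the page warm on the LRU */
	kunmap(page);
	put_page(page);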
Fixes: a6326a4f8ffb ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Chris Wilson <chris@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
---
 drivers/gpu/drm/i915/gt/shmem_utils.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 1fbc070a4651..e24c2c2342bb 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -105,10 +105,13 @@ static int __shmem_rw(struct file *file, loff_t off, return PTR_ERR(page);
vaddr = kmap(page); - if (write) + if (write) { memcpy(vaddr + offset_in_page(off), ptr, this); - else + set_page_dirty(page); + } else { memcpy(ptr, vaddr + offset_in_page(off), this); + } + mark_page_accessed(page); kunmap(page); put_page(page);
Quoting Matthew Auld (2020-11-27 12:07:16)
A bug fix, send it. But please write a concise changelog first.
I missed setting the dirty bit, and so the contents were not being saved on swap out as expected. Impact is severe; any context created after resume may be gibberish. -Chris
From: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pci.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index c3d9b36ef651..603976b9a973 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1001,6 +1001,7 @@ static const struct pci_device_id pciidlist[] = { INTEL_JSL_IDS(&jsl_info), INTEL_TGL_12_IDS(&tgl_info), INTEL_RKL_IDS(&rkl_info), + INTEL_DG1_IDS(&dg1_info), {0, 0, 0} }; MODULE_DEVICE_TABLE(pci, pciidlist);
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            | 15 ----
 drivers/gpu/drm/i915/i915_params.c         |  5 --
 drivers/gpu/drm/i915/i915_params.h         |  1 -
 drivers/gpu/drm/i915/intel_memory_region.c | 11 +--
 drivers/gpu/drm/i915/intel_region_lmem.c   | 96 ----------------------
 drivers/gpu/drm/i915/intel_region_lmem.h   |  3 -
 6 files changed, 1 insertion(+), 130 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index c41865d5bf1e..ee7272abc2b4 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -836,21 +836,6 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (!i915->params.nuclear_pageflip && match_info->gen < 5) i915->drm.driver_features &= ~DRIVER_ATOMIC;
- /* - * Check if we support fake LMEM -- for now we only unleash this for - * the live selftests(test-and-exit). - */ -#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) - if (IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM)) { - if (INTEL_GEN(i915) >= 9 && i915_selftest.live < 0 && - i915->params.fake_lmem_start) { - mkwrite_device_info(i915)->memory_regions = - REGION_SMEM | REGION_LMEM | REGION_STOLEN_SMEM; - GEM_BUG_ON(!HAS_LMEM(i915)); - } - } -#endif - ret = pci_enable_device(pdev); if (ret) goto out_fini; diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index 9fa58ed76614..819341f77488 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -192,11 +192,6 @@ i915_param_named(enable_gvt, bool, 0400, "Enable support for Intel GVT-g graphics virtualization host support(default:false)"); #endif
-#if IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM) -i915_param_named_unsafe(fake_lmem_start, ulong, 0400, - "Fake LMEM start offset (default: 0)"); -#endif - i915_param_named_unsafe(enable_eviction, uint, 0600, "Enable eviction which does not rely on DMA resv refactoring " "0=disabled, 1=memcpy based only, 2=blt based only, " diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h index c835e592ee5f..ea6e99735ff2 100644 --- a/drivers/gpu/drm/i915/i915_params.h +++ b/drivers/gpu/drm/i915/i915_params.h @@ -70,7 +70,6 @@ struct drm_printer; param(int, fastboot, -1, 0600) \ param(int, enable_dpcd_backlight, -1, 0600) \ param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \ - param(unsigned long, fake_lmem_start, 0, 0400) \ param(unsigned int, lmem_size, 0, 0400) \ param(unsigned int, enable_eviction, 3, 0600) \ /* leave bools at the end to not create holes */ \ diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index 6b26b6cd5958..045efb9b01d9 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -447,16 +447,7 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) mem = i915_gem_stolen_setup(i915); break; case INTEL_MEMORY_LOCAL: -#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) - if (IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM)) { - if (INTEL_GEN(i915) >= 9 && i915_selftest.live < 0 && - i915->params.fake_lmem_start) - mem = intel_setup_fake_lmem(i915); - } -#endif - - if (IS_ERR(mem)) - mem = i915_gem_setup_lmem(i915); + mem = i915_gem_setup_lmem(i915); break; }
diff --git a/drivers/gpu/drm/i915/intel_region_lmem.c b/drivers/gpu/drm/i915/intel_region_lmem.c index 1cdb6354b968..e3f5ca619318 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/intel_region_lmem.c @@ -9,64 +9,9 @@ #include "gem/i915_gem_region.h" #include "intel_region_lmem.h"
-static int init_fake_lmem_bar(struct intel_memory_region *mem) -{ - struct drm_i915_private *i915 = mem->i915; - struct i915_ggtt *ggtt = &i915->ggtt; - unsigned long n; - int ret; - - /* We want to 1:1 map the mappable aperture to our reserved region */ - - mem->fake_mappable.start = 0; - mem->fake_mappable.size = resource_size(&mem->region); - mem->fake_mappable.color = I915_COLOR_UNEVICTABLE; - - ret = drm_mm_reserve_node(&ggtt->vm.mm, &mem->fake_mappable); - if (ret) - return ret; - - mem->remap_addr = dma_map_resource(&i915->drm.pdev->dev, - mem->region.start, - mem->fake_mappable.size, - PCI_DMA_BIDIRECTIONAL, - DMA_ATTR_FORCE_CONTIGUOUS); - if (dma_mapping_error(&i915->drm.pdev->dev, mem->remap_addr)) { - drm_mm_remove_node(&mem->fake_mappable); - return -EINVAL; - } - - for (n = 0; n < mem->fake_mappable.size >> PAGE_SHIFT; ++n) { - ggtt->vm.insert_page(&ggtt->vm, - mem->remap_addr + (n << PAGE_SHIFT), - n << PAGE_SHIFT, - I915_CACHE_NONE, 0); - } - - mem->region = (struct resource)DEFINE_RES_MEM(mem->remap_addr, - mem->fake_mappable.size); - - return 0; -} - -static void release_fake_lmem_bar(struct intel_memory_region *mem) -{ - if (!drm_mm_node_allocated(&mem->fake_mappable)) - return; - - drm_mm_remove_node(&mem->fake_mappable); - - dma_unmap_resource(&mem->i915->drm.pdev->dev, - mem->remap_addr, - mem->fake_mappable.size, - PCI_DMA_BIDIRECTIONAL, - DMA_ATTR_FORCE_CONTIGUOUS); -} - static void region_lmem_release(struct intel_memory_region *mem) { - release_fake_lmem_bar(mem); io_mapping_fini(&mem->iomap); intel_memory_region_release_buddy(mem); } @@ -76,11 +21,6 @@ region_lmem_init(struct intel_memory_region *mem) { int ret;
- if (mem->i915->params.fake_lmem_start) { - ret = init_fake_lmem_bar(mem); - GEM_BUG_ON(ret); - } - if (!io_mapping_init_wc(&mem->iomap, mem->io_start, resource_size(&mem->region))) @@ -101,42 +41,6 @@ const struct intel_memory_region_ops intel_region_lmem_ops = { .create_object = __i915_gem_lmem_object_create, };
-struct intel_memory_region * -intel_setup_fake_lmem(struct drm_i915_private *i915) -{ - struct pci_dev *pdev = i915->drm.pdev; - struct intel_memory_region *mem; - resource_size_t mappable_end; - resource_size_t io_start; - resource_size_t start; - - GEM_BUG_ON(i915_ggtt_has_aperture(&i915->ggtt)); - GEM_BUG_ON(!i915->params.fake_lmem_start); - - /* Your mappable aperture belongs to me now! */ - mappable_end = pci_resource_len(pdev, 2); - io_start = pci_resource_start(pdev, 2), - start = i915->params.fake_lmem_start; - - mem = intel_memory_region_create(i915, - start, - mappable_end, - PAGE_SIZE, - io_start, - &intel_region_lmem_ops); - if (!IS_ERR(mem)) { - drm_info(&i915->drm, "Intel graphics fake LMEM: %pR\n", - &mem->region); - drm_info(&i915->drm, - "Intel graphics fake LMEM IO start: %llx\n", - (u64)mem->io_start); - drm_info(&i915->drm, "Intel graphics fake LMEM size: %llx\n", - (u64)resource_size(&mem->region)); - } - - return mem; -} - static void get_legacy_lowmem_region(struct intel_uncore *uncore, u64 *start, u32 *size) { diff --git a/drivers/gpu/drm/i915/intel_region_lmem.h b/drivers/gpu/drm/i915/intel_region_lmem.h index 054e729035c1..6dbed8de3ce3 100644 --- a/drivers/gpu/drm/i915/intel_region_lmem.h +++ b/drivers/gpu/drm/i915/intel_region_lmem.h @@ -12,7 +12,4 @@ extern const struct intel_memory_region_ops intel_region_lmem_ops;
struct intel_memory_region *i915_gem_setup_lmem(struct drm_i915_private *i915);
-struct intel_memory_region * -intel_setup_fake_lmem(struct drm_i915_private *i915); - #endif /* !__INTEL_REGION_LMEM_H */