On Thu, Aug 26, 2021 at 03:59:30PM +0200, Thomas Hellström wrote:
On Thu, 2021-08-26 at 14:44 +0200, Daniel Vetter wrote:
On Thu, Aug 26, 2021 at 12:45:14PM +0200, Thomas Hellström wrote:
Pinned contexts, like the migrate contexts, need to be reset after resume since their context images may have been lost. Also the GuC needs to register pinned contexts.
Add a list to struct intel_engine_cs where we add all pinned contexts on creation, and traverse that list at resume time to reset the pinned contexts.
This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now, but proper LMEM backup / restore is needed for full suspend functionality. However, note that even with full LMEM backup / restore it may be desirable to keep the reset, since backing up the migrate context images must happen using memcpy() after the migrate context has become inactive, and for performance and other reasons we want to avoid memcpy() from LMEM.
Also traverse the list in guc_init_lrc_mapping(), calling guc_kernel_context_pin() for the pinned contexts, as is already done for the kernel context.
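Roughly, the shape this takes is a minimal sketch like the one below; the names used here (pinned_contexts_list, pinned_contexts_link, the ce->ops->reset() hook and the helper functions) are illustrative assumptions rather than the final patch:

/*
 * Sketch only -- field, link and helper names are assumptions, not
 * necessarily what the patch ends up using.
 */

/* In struct intel_engine_cs: */
	struct list_head pinned_contexts_list;

/* In struct intel_context: */
	struct list_head pinned_contexts_link;

/* When a pinned context (kernel context, migrate context, ...) is created: */
static void engine_add_pinned_context(struct intel_context *ce)
{
	list_add_tail(&ce->pinned_contexts_link,
		      &ce->engine->pinned_contexts_list);
}

/* At resume, the context images may have been lost, so redo them all: */
static void engine_reset_pinned_contexts(struct intel_engine_cs *engine)
{
	struct intel_context *ce;

	list_for_each_entry(ce, &engine->pinned_contexts_list,
			    pinned_contexts_link)
		ce->ops->reset(ce); /* re-initialise the context image */
}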
v2:
- Don't reset the contexts on each __engine_unpark() but rather at
resume time (Chris Wilson).
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Brost Matthew <matthew.brost@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
I guess it got lost, but a few weeks ago I stumbled over this and wondered why we're even setting up a separate context, or at least why a separate vm compared to the gt->vm we already have?
Even on chips with bazillions of copy engines, the plan is that we only reserve a single one for kernel migrations, so there's not really a need for quite this much generality, I think. Maybe check with Jon Bloomfield on this.
Are you referring to the generality of the migration code itself or to the generality of using a list in this patch to register multiple pinned contexts to an engine?
For the migration code itself, I figured reserving one copy engine for migration was strictly needed for recoverable page-faults? In the current version we're not doing that, but just tying a pinned migration context to the first available copy engine on the gt, to be used when we don't have a ww context available to pin a separate context on a random copy engine. Note also the ring size of the migration contexts; since we're populating the page-tables for each blit, it's not hard to fill the ring. In the end, I guess multiple contexts boils down to avoiding priority inversion on migration, including not blocking high-priority kernel context tasks.
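For illustration, a minimal sketch of what "tying a pinned migration context to the first available copy engine on the gt" could look like; for_each_engine() and COPY_ENGINE_CLASS are existing i915 helpers, while create_pinned_migrate_context() and the ring size are made up for the example:

/*
 * Sketch only: pick the first copy engine on the gt and pin a migration
 * context on it. create_pinned_migrate_context() is a hypothetical helper
 * and the ring size is illustrative.
 */
static struct intel_context *pin_first_copy_engine_context(struct intel_gt *gt)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;

	for_each_engine(engine, gt, id) {
		if (engine->class != COPY_ENGINE_CLASS)
			continue;

		/*
		 * Large ring: page-table updates are emitted for every
		 * blit, so a small ring fills up quickly.
		 */
		return create_pinned_migrate_context(engine, SZ_256K);
	}

	return NULL; /* no copy engine on this gt */
}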
As for not using the gt->vm, I'm not completely sure whether we can do our special page-table setup on that; I've got to defer that question to Chris. But once Ram's work on supporting 64K LMEM PTEs has landed, I guess we could easily reuse the gt->vm if possible and suitable.
Just on why we have gt->vm and then also the migration vm. The old mail I typed up on this:
https://lore.kernel.org/dri-devel/CAKMK7uG6g+DQQEcjqeA6=Z2ENHogaMuvKERDgKm5j...
-Daniel