Hi all,
After many bits have been spilled on dri-devel discussing this, I think we're converging on a consensus understanding of where we are, and it's time to resubmit patches.
This is essentially v2 of
https://lore.kernel.org/dri-devel/20210521090959.1663703-7-daniel.vetter@ffw...
but a lot has changed:
- Christian fixed up amdgpu with a much more competent patch.
- I used the entire audit I did for that patch to improve the documentation instead. That's the first three patches.
- panfrost patches fixed (hopefully, testing would be appreciated)
- drm/tiny patch fixed
- I've also thrown an RFC on top at the end for what I think amdgpu should be doing. Probably really, really buggy, so beware :-)
Review on the entire pile except the very last RFC very much appreciated.
Note that this does not, by far, fix all the various issues in handling dma_buf.resv fences. This is just the part I had mostly ready already, and which didn't take long to refresh and rebase. The other part is checking whether drivers do anything funny that breaks the cross-driver contract in how they handle the dependencies they get from the dma_buf.resv. I know they do, but the full audit is not yet done.
Cheers, Daniel
Daniel Vetter (15):
  dma-resv: Fix kerneldoc
  dma-buf: Switch to inline kerneldoc
  dma-buf: Document dma-buf implicit fencing/resv fencing rules
  drm/panfrost: Shrink sched_lock
  drm/panfrost: Use xarray and helpers for dependency tracking
  drm/panfrost: Fix implicit sync
  drm/atomic-helper: make drm_gem_plane_helper_prepare_fb the default
  drm/<driver>: drm_gem_plane_helper_prepare_fb is now the default
  drm/armada: Remove prepare/cleanup_fb hooks
  drm/vram-helpers: Create DRM_GEM_VRAM_PLANE_HELPER_FUNCS
  drm/omap: Follow implicit fencing in prepare_fb
  drm/simple-helper: drm_gem_simple_display_pipe_prepare_fb as default
  drm/tiny: drm_gem_simple_display_pipe_prepare_fb is the default
  drm/gem: Tiny kernel clarification for drm_gem_fence_array_add
  RFC: drm/amdgpu: Implement a proper implicit fencing uapi
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 +
drivers/gpu/drm/armada/armada_overlay.c | 2 -
drivers/gpu/drm/armada/armada_plane.c | 29 ----
drivers/gpu/drm/armada/armada_plane.h | 2 -
drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c | 1 -
drivers/gpu/drm/ast/ast_mode.c | 3 +-
drivers/gpu/drm/drm_atomic_helper.c | 10 ++
drivers/gpu/drm/drm_gem.c | 3 +
drivers/gpu/drm/drm_gem_atomic_helper.c | 3 +
drivers/gpu/drm/drm_simple_kms_helper.c | 12 +-
drivers/gpu/drm/gud/gud_drv.c | 1 -
.../gpu/drm/hisilicon/hibmc/hibmc_drm_de.c | 3 +-
drivers/gpu/drm/imx/dcss/dcss-plane.c | 1 -
drivers/gpu/drm/imx/ipuv3-plane.c | 1 -
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 1 -
drivers/gpu/drm/ingenic/ingenic-ipu.c | 1 -
drivers/gpu/drm/mcde/mcde_display.c | 1 -
drivers/gpu/drm/mediatek/mtk_drm_plane.c | 1 -
drivers/gpu/drm/meson/meson_overlay.c | 1 -
drivers/gpu/drm/meson/meson_plane.c | 1 -
drivers/gpu/drm/mxsfb/mxsfb_kms.c | 2 -
drivers/gpu/drm/omapdrm/omap_plane.c | 3 +
drivers/gpu/drm/panfrost/panfrost_drv.c | 41 +++--
drivers/gpu/drm/panfrost/panfrost_job.c | 71 ++++-----
drivers/gpu/drm/panfrost/panfrost_job.h | 8 +-
drivers/gpu/drm/pl111/pl111_display.c | 1 -
drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 1 -
drivers/gpu/drm/stm/ltdc.c | 1 -
drivers/gpu/drm/sun4i/sun4i_layer.c | 1 -
drivers/gpu/drm/sun4i/sun8i_ui_layer.c | 1 -
drivers/gpu/drm/sun4i/sun8i_vi_layer.c | 1 -
drivers/gpu/drm/tidss/tidss_plane.c | 1 -
drivers/gpu/drm/tiny/hx8357d.c | 1 -
drivers/gpu/drm/tiny/ili9225.c | 1 -
drivers/gpu/drm/tiny/ili9341.c | 1 -
drivers/gpu/drm/tiny/ili9486.c | 1 -
drivers/gpu/drm/tiny/mi0283qt.c | 1 -
drivers/gpu/drm/tiny/repaper.c | 1 -
drivers/gpu/drm/tiny/st7586.c | 1 -
drivers/gpu/drm/tiny/st7735r.c | 1 -
drivers/gpu/drm/tve200/tve200_display.c | 1 -
drivers/gpu/drm/vboxvideo/vbox_mode.c | 3 +-
drivers/gpu/drm/xen/xen_drm_front_kms.c | 1 -
include/drm/drm_gem_vram_helper.h | 12 ++
include/drm/drm_modeset_helper_vtables.h | 7 +-
include/drm/drm_simple_kms_helper.h | 7 +-
include/linux/dma-buf.h | 146 +++++++++++++++---
include/linux/dma-resv.h | 2 +-
include/uapi/drm/amdgpu_drm.h | 10 ++
51 files changed, 270 insertions(+), 170 deletions(-)
Oversight from
commit 6edbd6abb783d54f6ac4c3ed5cd9e50cff6c15e9
Author: Christian König christian.koenig@amd.com
Date:   Mon May 10 16:14:09 2021 +0200

    dma-buf: rename and cleanup dma_resv_get_excl v3
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
include/linux/dma-resv.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 562b885cf9c3..e1ca2080a1ff 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -212,7 +212,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj)
 }

 /**
- * dma_resv_exclusive - return the object's exclusive fence
+ * dma_resv_excl_fence - return the object's exclusive fence
  * @obj: the reservation object
  *
  * Returns the exclusive fence (if any). Caller must either hold the objects
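For context, a minimal sketch of how a caller uses this helper (illustrative only, not part of the patch; error handling trimmed):

	struct dma_fence *fence;

	dma_resv_lock(resv, NULL);
	/* dma_fence_get() is NULL-safe, so no fence simply stays NULL. */
	fence = dma_fence_get(dma_resv_excl_fence(resv));
	dma_resv_unlock(resv);

	if (fence) {
		/* Wait for the last writer before touching the buffer. */
		dma_fence_wait(fence, true /* interruptible */);
		dma_fence_put(fence);
	}

Here resv is a struct dma_resv *; per the kerneldoc above, dma_resv_excl_fence() must be called with the reservation lock held (or under RCU protection), hence the lock/unlock pair.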
On Tue, Jun 22, 2021 at 12:55 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
Oversight from
commit 6edbd6abb783d54f6ac4c3ed5cd9e50cff6c15e9 Author: Christian König christian.koenig@amd.com Date: Mon May 10 16:14:09 2021 +0200
dma-buf: rename and cleanup dma_resv_get_excl v3
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Alex Deucher alexander.deucher@amd.com
include/linux/dma-resv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 562b885cf9c3..e1ca2080a1ff 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -212,7 +212,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj) }
/**
- dma_resv_exclusive - return the object's exclusive fence
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
-- 2.32.0.rc2
Hi Daniel,
On Tue, Jun 22, 2021 at 06:54:57PM +0200, Daniel Vetter wrote:
Oversight from
commit 6edbd6abb783d54f6ac4c3ed5cd9e50cff6c15e9 Author: Christian König christian.koenig@amd.com Date: Mon May 10 16:14:09 2021 +0200
this is what we use Fixes: ... for.
It looks wrong to hide it in the description.
Sam
dma-buf: rename and cleanup dma_resv_get_excl v3
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
include/linux/dma-resv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 562b885cf9c3..e1ca2080a1ff 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -212,7 +212,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj) }
/**
- dma_resv_exclusive - return the object's exclusive fence
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
-- 2.32.0.rc2
On Tue, Jun 22, 2021 at 8:50 PM Sam Ravnborg sam@ravnborg.org wrote:
Hi Daniel,
On Tue, Jun 22, 2021 at 06:54:57PM +0200, Daniel Vetter wrote:
Oversight from
commit 6edbd6abb783d54f6ac4c3ed5cd9e50cff6c15e9 Author: Christian König christian.koenig@amd.com Date: Mon May 10 16:14:09 2021 +0200
this is what we use Fixes: ... for.
It looks wrong to hide it in the description.
I've honestly become a bit wary of using Fixes: for docs/comments, because the stable autoselect bots are _really_ keen on picking up anything with a Fixes: line in it. And that feels a bit like nonsense. -Daniel
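For reference, the tag under discussion would look like this, in the standard Fixes: format filled in from the commit cited above:

Fixes: 6edbd6abb783 ("dma-buf: rename and cleanup dma_resv_get_excl v3")

Whether a kerneldoc-only oversight warrants one is exactly the stable-autoselection trade-off described here.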
Sam
dma-buf: rename and cleanup dma_resv_get_excl v3
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
include/linux/dma-resv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 562b885cf9c3..e1ca2080a1ff 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -212,7 +212,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj) }
/**
- dma_resv_exclusive - return the object's exclusive fence
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
-- 2.32.0.rc2
Am 22.06.21 um 18:54 schrieb Daniel Vetter:
Oversight from
commit 6edbd6abb783d54f6ac4c3ed5cd9e50cff6c15e9 Author: Christian König christian.koenig@amd.com Date: Mon May 10 16:14:09 2021 +0200
dma-buf: rename and cleanup dma_resv_get_excl v3
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König christian.koenig@amd.com
include/linux/dma-resv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 562b885cf9c3..e1ca2080a1ff 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -212,7 +212,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj) }
/**
- dma_resv_exclusive - return the object's exclusive fence
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
On Wed, Jun 23, 2021 at 10:31:18AM +0200, Christian König wrote:
Am 22.06.21 um 18:54 schrieb Daniel Vetter:
Oversight from
commit 6edbd6abb783d54f6ac4c3ed5cd9e50cff6c15e9 Author: Christian König christian.koenig@amd.com Date: Mon May 10 16:14:09 2021 +0200
dma-buf: rename and cleanup dma_resv_get_excl v3
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König christian.koenig@amd.com
Pushed to drm-misc-next. -Daniel
include/linux/dma-resv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 562b885cf9c3..e1ca2080a1ff 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -212,7 +212,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj) } /**
- dma_resv_exclusive - return the object's exclusive fence
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Dave Airlie airlied@redhat.com
Cc: Nirmoy Das nirmoy.das@amd.com
Cc: Deepak R Varma mh12gx2825@gmail.com
Cc: Chen Li chenli@uniontech.com
Cc: Kevin Wang kevin1.wang@amd.com
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
include/linux/dma-buf.h | 107 +++++++++++++++++++++++++++++++---------
1 file changed, 83 insertions(+), 24 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 92eec38a03aa..6d18b9e448b9 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -289,28 +289,6 @@ struct dma_buf_ops {
/** * struct dma_buf - shared buffer object - * @size: size of the buffer; invariant over the lifetime of the buffer. - * @file: file pointer used for sharing buffers across, and for refcounting. - * @attachments: list of dma_buf_attachment that denotes all devices attached, - * protected by dma_resv lock. - * @ops: dma_buf_ops associated with this buffer object. - * @lock: used internally to serialize list manipulation, attach/detach and - * vmap/unmap - * @vmapping_counter: used internally to refcnt the vmaps - * @vmap_ptr: the current vmap ptr if vmapping_counter > 0 - * @exp_name: name of the exporter; useful for debugging. - * @name: userspace-provided name; useful for accounting and debugging, - * protected by @resv. - * @name_lock: spinlock to protect name access - * @owner: pointer to exporter module; used for refcounting when exporter is a - * kernel module. - * @list_node: node for dma_buf accounting and debugging. - * @priv: exporter specific private data for this buffer object. - * @resv: reservation object linked to this dma-buf - * @poll: for userspace poll support - * @cb_excl: for userspace poll support - * @cb_shared: for userspace poll support - * @sysfs_entry: for exposing information about this buffer in sysfs. * The attachment_uid member of @sysfs_entry is protected by dma_resv lock * and is incremented on each attach. * @@ -324,24 +302,100 @@ struct dma_buf_ops { * Device DMA access is handled by the separate &struct dma_buf_attachment. */ struct dma_buf { + /** + * @size: + * + * Size of the buffer; invariant over the lifetime of the buffer. + */ size_t size; + + /** + * @file: + * + * File pointer used for sharing buffers across, and for refcounting. + * See dma_buf_get() and dma_buf_put(). + */ struct file *file; + + /** + * @attachments: + * + * List of dma_buf_attachment that denotes all devices attached, + * protected by &dma_resv lock @resv. + */ struct list_head attachments; + + /** @ops: dma_buf_ops associated with this buffer object. */ const struct dma_buf_ops *ops; + + /** + * @lock: + * + * Used internally to serialize list manipulation, attach/detach and + * vmap/unmap. Note that in many cases this is superseeded by + * dma_resv_lock() on @resv. + */ struct mutex lock; + + /** + * @vmapping_counter: + * + * Used internally to refcnt the vmaps returned by dma_buf_vmap(). + * Protected by @lock. + */ unsigned vmapping_counter; + + /** + * @vmap_ptr: + * The current vmap ptr if @vmapping_counter > 0. Protected by @lock. + */ struct dma_buf_map vmap_ptr; + + /** + * @exp_name: + * + * Name of the exporter; useful for debugging. See the + * DMA_BUF_SET_NAME IOCTL. + */ const char *exp_name; + + /** + * @name: + * + * Userspace-provided name; useful for accounting and debugging, + * protected by dma_resv_lock() on @resv and @name_lock for read access. + */ const char *name; + + /** @name_lock: Spinlock to protect name acces for read access. */ spinlock_t name_lock; + + /** + * @owner: + * + * Pointer to exporter module; used for refcounting when exporter is a + * kernel module. + */ struct module *owner; + + /** @list_node: node for dma_buf accounting and debugging. */ struct list_head list_node; + + /** @priv: exporter specific private data for this buffer object. */ void *priv; + + /** + * @resv: + * + * Reservation object linked to this dma-buf. + */ struct dma_resv *resv;
- /* poll support */ + /** @poll: for userspace poll support */ wait_queue_head_t poll;
+ /** @cb_excl: for userspace poll support */ + /** @cb_shared: for userspace poll support */ struct dma_buf_poll_cb_t { struct dma_fence_cb cb; wait_queue_head_t *poll; @@ -349,7 +403,12 @@ struct dma_buf { __poll_t active; } cb_excl, cb_shared; #ifdef CONFIG_DMABUF_SYSFS_STATS - /* for sysfs stats */ + /** + * @sysfs_entry: + * + * For exposing information about this buffer in sysfs. See also + * `DMA-BUF statistics`_ for the uapi this enables. + */ struct dma_buf_sysfs_entry { struct kobject kobj; struct dma_buf *dmabuf;
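To illustrate the inline kerneldoc style this patch switches to, here is a toy example (made up for illustration, not taken from the patch): instead of one big @member list at the top of the struct, each member carries its own comment block.

	/**
	 * struct example_buf - toy object documented in the inline style
	 */
	struct example_buf {
		/**
		 * @len:
		 *
		 * Length of the payload in bytes. Longer explanations and
		 * cross-references fit naturally in a per-member block.
		 */
		size_t len;

		/** @priv: short single-line comments also work inline. */
		void *priv;
	};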
On Tue, Jun 22, 2021 at 12:55 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Nirmoy Das nirmoy.das@amd.com Cc: Deepak R Varma mh12gx2825@gmail.com Cc: Chen Li chenli@uniontech.com Cc: Kevin Wang kevin1.wang@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
include/linux/dma-buf.h | 107 +++++++++++++++++++++++++++++++--------- 1 file changed, 83 insertions(+), 24 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 92eec38a03aa..6d18b9e448b9 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -289,28 +289,6 @@ struct dma_buf_ops {
/**
- struct dma_buf - shared buffer object
- @size: size of the buffer; invariant over the lifetime of the buffer.
- @file: file pointer used for sharing buffers across, and for refcounting.
- @attachments: list of dma_buf_attachment that denotes all devices attached,
protected by dma_resv lock.
- @ops: dma_buf_ops associated with this buffer object.
- @lock: used internally to serialize list manipulation, attach/detach and
vmap/unmap
- @vmapping_counter: used internally to refcnt the vmaps
- @vmap_ptr: the current vmap ptr if vmapping_counter > 0
- @exp_name: name of the exporter; useful for debugging.
- @name: userspace-provided name; useful for accounting and debugging,
protected by @resv.
- @name_lock: spinlock to protect name access
- @owner: pointer to exporter module; used for refcounting when exporter is a
kernel module.
- @list_node: node for dma_buf accounting and debugging.
- @priv: exporter specific private data for this buffer object.
- @resv: reservation object linked to this dma-buf
- @poll: for userspace poll support
- @cb_excl: for userspace poll support
- @cb_shared: for userspace poll support
- @sysfs_entry: for exposing information about this buffer in sysfs.
- The attachment_uid member of @sysfs_entry is protected by dma_resv lock
- and is incremented on each attach.
@@ -324,24 +302,100 @@ struct dma_buf_ops {
- Device DMA access is handled by the separate &struct dma_buf_attachment.
*/ struct dma_buf {
/**
* @size:
*
* Size of the buffer; invariant over the lifetime of the buffer.
*/ size_t size;
/**
* @file:
*
* File pointer used for sharing buffers across, and for refcounting.
* See dma_buf_get() and dma_buf_put().
*/ struct file *file;
/**
* @attachments:
*
* List of dma_buf_attachment that denotes all devices attached,
* protected by &dma_resv lock @resv.
*/ struct list_head attachments;
/** @ops: dma_buf_ops associated with this buffer object. */
For consistency you may want to format this like:

/**
 * @ops:
 *
 * dma_buf_ops associated with this buffer object.
 */
const struct dma_buf_ops *ops;
/**
* @lock:
*
* Used internally to serialize list manipulation, attach/detach and
* vmap/unmap. Note that in many cases this is superseded by
* dma_resv_lock() on @resv.
*/ struct mutex lock;
/**
* @vmapping_counter:
*
* Used internally to refcnt the vmaps returned by dma_buf_vmap().
* Protected by @lock.
*/ unsigned vmapping_counter;
/**
* @vmap_ptr:
* The current vmap ptr if @vmapping_counter > 0. Protected by @lock.
*/
Same comment as above.
struct dma_buf_map vmap_ptr;
/**
* @exp_name:
*
* Name of the exporter; useful for debugging. See the
* DMA_BUF_SET_NAME IOCTL.
*/ const char *exp_name;
/**
* @name:
*
* Userspace-provided name; useful for accounting and debugging,
* protected by dma_resv_lock() on @resv and @name_lock for read access.
*/ const char *name;
/** @name_lock: Spinlock to protect name access for read access. */ spinlock_t name_lock;
/**
* @owner:
*
* Pointer to exporter module; used for refcounting when exporter is a
* kernel module.
*/ struct module *owner;
/** @list_node: node for dma_buf accounting and debugging. */
and here.
struct list_head list_node;
/** @priv: exporter specific private data for this buffer object. */
and here.
void *priv;
/**
* @resv:
*
* Reservation object linked to this dma-buf.
*/ struct dma_resv *resv;
/* poll support */
/** @poll: for userspace poll support */
here.
wait_queue_head_t poll;
/** @cb_excl: for userspace poll support */
/** @cb_shared: for userspace poll support */
Here.
Either way, Reviewed-by: Alex Deucher alexander.deucher@amd.com
struct dma_buf_poll_cb_t { struct dma_fence_cb cb; wait_queue_head_t *poll;
@@ -349,7 +403,12 @@ struct dma_buf { __poll_t active; } cb_excl, cb_shared; #ifdef CONFIG_DMABUF_SYSFS_STATS
/* for sysfs stats */
/**
* @sysfs_entry:
*
* For exposing information about this buffer in sysfs. See also
* `DMA-BUF statistics`_ for the uapi this enables.
*/ struct dma_buf_sysfs_entry { struct kobject kobj; struct dma_buf *dmabuf;
-- 2.32.0.rc2
Hi Daniel.
On Tue, Jun 22, 2021 at 06:54:58PM +0200, Daniel Vetter wrote:
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Nirmoy Das nirmoy.das@amd.com Cc: Deepak R Varma mh12gx2825@gmail.com Cc: Chen Li chenli@uniontech.com Cc: Kevin Wang kevin1.wang@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
include/linux/dma-buf.h | 107 +++++++++++++++++++++++++++++++--------- 1 file changed, 83 insertions(+), 24 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 92eec38a03aa..6d18b9e448b9 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -289,28 +289,6 @@ struct dma_buf_ops {
/**
- struct dma_buf - shared buffer object
- @size: size of the buffer; invariant over the lifetime of the buffer.
- @file: file pointer used for sharing buffers across, and for refcounting.
- @attachments: list of dma_buf_attachment that denotes all devices attached,
protected by dma_resv lock.
- @ops: dma_buf_ops associated with this buffer object.
- @lock: used internally to serialize list manipulation, attach/detach and
vmap/unmap
- @vmapping_counter: used internally to refcnt the vmaps
- @vmap_ptr: the current vmap ptr if vmapping_counter > 0
- @exp_name: name of the exporter; useful for debugging.
- @name: userspace-provided name; useful for accounting and debugging,
protected by @resv.
- @name_lock: spinlock to protect name access
- @owner: pointer to exporter module; used for refcounting when exporter is a
kernel module.
- @list_node: node for dma_buf accounting and debugging.
- @priv: exporter specific private data for this buffer object.
- @resv: reservation object linked to this dma-buf
- @poll: for userspace poll support
- @cb_excl: for userspace poll support
- @cb_shared: for userspace poll support
- @sysfs_entry: for exposing information about this buffer in sysfs.
This sentence
- The attachment_uid member of @sysfs_entry is protected by dma_resv lock
- and is incremented on each attach.
belongs to the paragraph describing sysfs_entry and should be moved too. Or maybe reworded and then document all fields in dma_buf_sysfs_entry?
With this fixed: Acked-by: Sam Ravnborg sam@ravnborg.org
On Tue, Jun 22, 2021 at 9:01 PM Sam Ravnborg sam@ravnborg.org wrote:
Hi Daniel.
On Tue, Jun 22, 2021 at 06:54:58PM +0200, Daniel Vetter wrote:
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Nirmoy Das nirmoy.das@amd.com Cc: Deepak R Varma mh12gx2825@gmail.com Cc: Chen Li chenli@uniontech.com Cc: Kevin Wang kevin1.wang@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
include/linux/dma-buf.h | 107 +++++++++++++++++++++++++++++++--------- 1 file changed, 83 insertions(+), 24 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 92eec38a03aa..6d18b9e448b9 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -289,28 +289,6 @@ struct dma_buf_ops {
/**
- struct dma_buf - shared buffer object
- @size: size of the buffer; invariant over the lifetime of the buffer.
- @file: file pointer used for sharing buffers across, and for refcounting.
- @attachments: list of dma_buf_attachment that denotes all devices attached,
protected by dma_resv lock.
- @ops: dma_buf_ops associated with this buffer object.
- @lock: used internally to serialize list manipulation, attach/detach and
vmap/unmap
- @vmapping_counter: used internally to refcnt the vmaps
- @vmap_ptr: the current vmap ptr if vmapping_counter > 0
- @exp_name: name of the exporter; useful for debugging.
- @name: userspace-provided name; useful for accounting and debugging,
protected by @resv.
- @name_lock: spinlock to protect name access
- @owner: pointer to exporter module; used for refcounting when exporter is a
kernel module.
- @list_node: node for dma_buf accounting and debugging.
- @priv: exporter specific private data for this buffer object.
- @resv: reservation object linked to this dma-buf
- @poll: for userspace poll support
- @cb_excl: for userspace poll support
- @cb_shared: for userspace poll support
- @sysfs_entry: for exposing information about this buffer in sysfs.
This sentence
- The attachment_uid member of @sysfs_entry is protected by dma_resv lock
- and is incremented on each attach.
belongs to the paragraph describing sysfs_entry and should be moved too. Or maybe reworded and then document all fields in dma_buf_sysfs_entry?
Unfortunately kerneldoc lost the ability to document embedded structs/unions, at least last time I checked; it's a bit of a bikeshed. So I'd need to pull the entire struct out. I'll just move it, since it's indeed misplaced.
With this fixed: Acked-by: Sam Ravnborg sam@ravnborg.org
Thanks for taking a look. -Daniel
Am 22.06.21 um 18:54 schrieb Daniel Vetter:
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Nirmoy Das nirmoy.das@amd.com Cc: Deepak R Varma mh12gx2825@gmail.com Cc: Chen Li chenli@uniontech.com Cc: Kevin Wang kevin1.wang@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
Acked-by: Christian König christian.koenig@amd.com
include/linux/dma-buf.h | 107 +++++++++++++++++++++++++++++++--------- 1 file changed, 83 insertions(+), 24 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 92eec38a03aa..6d18b9e448b9 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -289,28 +289,6 @@ struct dma_buf_ops {
/**
- struct dma_buf - shared buffer object
- @size: size of the buffer; invariant over the lifetime of the buffer.
- @file: file pointer used for sharing buffers across, and for refcounting.
- @attachments: list of dma_buf_attachment that denotes all devices attached,
protected by dma_resv lock.
- @ops: dma_buf_ops associated with this buffer object.
- @lock: used internally to serialize list manipulation, attach/detach and
vmap/unmap
- @vmapping_counter: used internally to refcnt the vmaps
- @vmap_ptr: the current vmap ptr if vmapping_counter > 0
- @exp_name: name of the exporter; useful for debugging.
- @name: userspace-provided name; useful for accounting and debugging,
protected by @resv.
- @name_lock: spinlock to protect name access
- @owner: pointer to exporter module; used for refcounting when exporter is a
kernel module.
- @list_node: node for dma_buf accounting and debugging.
- @priv: exporter specific private data for this buffer object.
- @resv: reservation object linked to this dma-buf
- @poll: for userspace poll support
- @cb_excl: for userspace poll support
- @cb_shared: for userspace poll support
- @sysfs_entry: for exposing information about this buffer in sysfs.
- The attachment_uid member of @sysfs_entry is protected by dma_resv lock
- and is incremented on each attach.
@@ -324,24 +302,100 @@ struct dma_buf_ops {
- Device DMA access is handled by the separate &struct dma_buf_attachment.
*/ struct dma_buf {
- /**
* @size:
*
* Size of the buffer; invariant over the lifetime of the buffer.
*/ size_t size;
- /**
* @file:
*
* File pointer used for sharing buffers across, and for refcounting.
* See dma_buf_get() and dma_buf_put().
*/ struct file *file;
- /**
* @attachments:
*
* List of dma_buf_attachment that denotes all devices attached,
* protected by &dma_resv lock @resv.
*/ struct list_head attachments;
- /** @ops: dma_buf_ops associated with this buffer object. */ const struct dma_buf_ops *ops;
- /**
* @lock:
*
* Used internally to serialize list manipulation, attach/detach and
* vmap/unmap. Note that in many cases this is superseded by
* dma_resv_lock() on @resv.
*/ struct mutex lock;
- /**
* @vmapping_counter:
*
* Used internally to refcnt the vmaps returned by dma_buf_vmap().
* Protected by @lock.
*/ unsigned vmapping_counter;
- /**
* @vmap_ptr:
* The current vmap ptr if @vmapping_counter > 0. Protected by @lock.
*/ struct dma_buf_map vmap_ptr;
- /**
* @exp_name:
*
* Name of the exporter; useful for debugging. See the
* DMA_BUF_SET_NAME IOCTL.
*/ const char *exp_name;
- /**
* @name:
*
* Userspace-provided name; useful for accounting and debugging,
* protected by dma_resv_lock() on @resv and @name_lock for read access.
*/ const char *name;
- /** @name_lock: Spinlock to protect name access for read access. */ spinlock_t name_lock;
- /**
* @owner:
*
* Pointer to exporter module; used for refcounting when exporter is a
* kernel module.
*/ struct module *owner;
- /** @list_node: node for dma_buf accounting and debugging. */ struct list_head list_node;
- /** @priv: exporter specific private data for this buffer object. */ void *priv;
- /**
* @resv:
*
* Reservation object linked to this dma-buf.
*/ struct dma_resv *resv;
- /* poll support */
/** @poll: for userspace poll support */ wait_queue_head_t poll;
/** @cb_excl: for userspace poll support */
/** @cb_shared: for userspace poll support */ struct dma_buf_poll_cb_t { struct dma_fence_cb cb; wait_queue_head_t *poll;
@@ -349,7 +403,12 @@ struct dma_buf { __poll_t active; } cb_excl, cb_shared; #ifdef CONFIG_DMABUF_SYSFS_STATS
- /* for sysfs stats */
- /**
* @sysfs_entry:
*
* For exposing information about this buffer in sysfs. See also
* `DMA-BUF statistics`_ for the uapi this enables.
*/ struct dma_buf_sysfs_entry { struct kobject kobj; struct dma_buf *dmabuf;
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
v2: Move the doc for sysfs_entry.attachment_uid to the right place too (Sam)
Acked-by: Christian König christian.koenig@amd.com
Cc: Sam Ravnborg sam@ravnborg.org
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Dave Airlie airlied@redhat.com
Cc: Nirmoy Das nirmoy.das@amd.com
Cc: Deepak R Varma mh12gx2825@gmail.com
Cc: Chen Li chenli@uniontech.com
Cc: Kevin Wang kevin1.wang@amd.com
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
include/linux/dma-buf.h | 116 +++++++++++++++++++++++++++++++---------
1 file changed, 90 insertions(+), 26 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 92eec38a03aa..81cebf414505 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -289,30 +289,6 @@ struct dma_buf_ops {
/** * struct dma_buf - shared buffer object - * @size: size of the buffer; invariant over the lifetime of the buffer. - * @file: file pointer used for sharing buffers across, and for refcounting. - * @attachments: list of dma_buf_attachment that denotes all devices attached, - * protected by dma_resv lock. - * @ops: dma_buf_ops associated with this buffer object. - * @lock: used internally to serialize list manipulation, attach/detach and - * vmap/unmap - * @vmapping_counter: used internally to refcnt the vmaps - * @vmap_ptr: the current vmap ptr if vmapping_counter > 0 - * @exp_name: name of the exporter; useful for debugging. - * @name: userspace-provided name; useful for accounting and debugging, - * protected by @resv. - * @name_lock: spinlock to protect name access - * @owner: pointer to exporter module; used for refcounting when exporter is a - * kernel module. - * @list_node: node for dma_buf accounting and debugging. - * @priv: exporter specific private data for this buffer object. - * @resv: reservation object linked to this dma-buf - * @poll: for userspace poll support - * @cb_excl: for userspace poll support - * @cb_shared: for userspace poll support - * @sysfs_entry: for exposing information about this buffer in sysfs. - * The attachment_uid member of @sysfs_entry is protected by dma_resv lock - * and is incremented on each attach. * * This represents a shared buffer, created by calling dma_buf_export(). The * userspace representation is a normal file descriptor, which can be created by @@ -324,24 +300,100 @@ struct dma_buf_ops { * Device DMA access is handled by the separate &struct dma_buf_attachment. */ struct dma_buf { + /** + * @size: + * + * Size of the buffer; invariant over the lifetime of the buffer. + */ size_t size; + + /** + * @file: + * + * File pointer used for sharing buffers across, and for refcounting. + * See dma_buf_get() and dma_buf_put(). + */ struct file *file; + + /** + * @attachments: + * + * List of dma_buf_attachment that denotes all devices attached, + * protected by &dma_resv lock @resv. + */ struct list_head attachments; + + /** @ops: dma_buf_ops associated with this buffer object. */ const struct dma_buf_ops *ops; + + /** + * @lock: + * + * Used internally to serialize list manipulation, attach/detach and + * vmap/unmap. Note that in many cases this is superseeded by + * dma_resv_lock() on @resv. + */ struct mutex lock; + + /** + * @vmapping_counter: + * + * Used internally to refcnt the vmaps returned by dma_buf_vmap(). + * Protected by @lock. + */ unsigned vmapping_counter; + + /** + * @vmap_ptr: + * The current vmap ptr if @vmapping_counter > 0. Protected by @lock. + */ struct dma_buf_map vmap_ptr; + + /** + * @exp_name: + * + * Name of the exporter; useful for debugging. See the + * DMA_BUF_SET_NAME IOCTL. + */ const char *exp_name; + + /** + * @name: + * + * Userspace-provided name; useful for accounting and debugging, + * protected by dma_resv_lock() on @resv and @name_lock for read access. + */ const char *name; + + /** @name_lock: Spinlock to protect name acces for read access. */ spinlock_t name_lock; + + /** + * @owner: + * + * Pointer to exporter module; used for refcounting when exporter is a + * kernel module. + */ struct module *owner; + + /** @list_node: node for dma_buf accounting and debugging. */ struct list_head list_node; + + /** @priv: exporter specific private data for this buffer object. */ void *priv; + + /** + * @resv: + * + * Reservation object linked to this dma-buf. + */ struct dma_resv *resv;
- /* poll support */ + /** @poll: for userspace poll support */ wait_queue_head_t poll;
+ /** @cb_excl: for userspace poll support */ + /** @cb_shared: for userspace poll support */ struct dma_buf_poll_cb_t { struct dma_fence_cb cb; wait_queue_head_t *poll; @@ -349,10 +401,22 @@ struct dma_buf { __poll_t active; } cb_excl, cb_shared; #ifdef CONFIG_DMABUF_SYSFS_STATS - /* for sysfs stats */ + /** + * @sysfs_entry: + * + * For exposing information about this buffer in sysfs. See also + * `DMA-BUF statistics`_ for the uapi this enables. + */ struct dma_buf_sysfs_entry { struct kobject kobj; struct dma_buf *dmabuf; + + /** + * @sysfs_entry.attachment_uid: + * + * This is protected by the dma_resv_lock() on @resv and is + * incremented on each attach. + */ unsigned int attachment_uid; struct kset *attach_stats_kset; } *sysfs_entry;
Hi Daniel, looks good.
On Wed, Jun 23, 2021 at 06:17:12PM +0200, Daniel Vetter wrote:
Also review & update everything while we're at it.
This is prep work to smash a ton of stuff into the kerneldoc for @resv.
v2: Move the doc for sysfs_entry.attachment_uid to the right place too (Sam)
Acked-by: Christian König christian.koenig@amd.com Cc: Sam Ravnborg sam@ravnborg.org Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Nirmoy Das nirmoy.das@amd.com Cc: Deepak R Varma mh12gx2825@gmail.com Cc: Chen Li chenli@uniontech.com Cc: Kevin Wang kevin1.wang@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Sam Ravnborg sam@ravnborg.org
Docs for struct dma_resv are fairly clear:
"A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)."
https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-obj...
Furthermore, a review across all of upstream bears this out.
First, the render drivers and how they set implicit fences (a generic sketch of this pattern follows right after the list):
- nouveau follows this contract, see in validate_fini_no_ticket()
nouveau_bo_fence(nvbo, fence, !!b->write_domains);
and that last boolean controls whether the exclusive or shared fence slot is used.
- radeon follows this contract by setting
p->relocs[i].tv.num_shared = !r->write_domain;
in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing.
- vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot.
- etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences()
- i915 is a bit of a convoluted maze with multiple paths leading to i915_vma_move_to_active(), which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract.
- lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo
- msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer
- panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess
- v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved
- vc4 uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos()
- vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu
- virtio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence()
This covers the setting of the exclusive fences when writing.
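The pattern this list checks for boils down to the following (a hedged sketch; the dma_resv calls are the real API, the wrapper function is made up):

	static void example_attach_fence(struct dma_resv *resv,
					 struct dma_fence *fence,
					 bool write)
	{
		dma_resv_assert_held(resv);

		if (write) {
			/* Writes go into the single exclusive slot. */
			dma_resv_add_excl_fence(resv, fence);
		} else {
			/*
			 * Reads go into the shared slots; a slot must have
			 * been reserved earlier with dma_resv_reserve_shared().
			 */
			dma_resv_add_shared_fence(resv, fence);
		}
	}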
Synchronizing against the exclusive fence is a lot trickier, and I only spot-checked a few (a simplified sketch of the dependency side follows this list):
- i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan)
- etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback.
- vc4 seems to not do much here; maybe it gets away with it by not having a scheduler and only a single engine. Since all newer Broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small.
- v3d does this using the drm_gem_fence_array_add_implicit() helper, which its drm_sched_backend_ops->dependency callback v3d_job_dependency() then picks up.
- panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper.
- lima is nice and easy: it uses drm_gem_fence_array_add_implicit() and the same schema as v3d.
- msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing in a scheduler might be a good idea.
- all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not following the contract would confuse ttm quite thoroughly and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain.
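The dependency side reduces to the mirror-image rule, shown here as a hedged synchronous sketch (real drivers hand the fences to drm/scheduler instead of blocking; the helper was called dma_resv_wait_timeout_rcu() before the _rcu suffixes were dropped):

	static long example_sync_implicit(struct dma_resv *resv, bool write)
	{
		/*
		 * Readers only need to wait for the exclusive fence, i.e.
		 * the last writer; writers must wait for all fences, shared
		 * and exclusive. Hence wait_all = write.
		 */
		return dma_resv_wait_timeout_rcu(resv, write /* wait_all */,
						 true /* interruptible */,
						 MAX_SCHEDULE_TIMEOUT);
	}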
Onwards to the display side.
- Any driver using the drm_gem_plane_helper_prepare_fb() helper will handle this correctly (a simplified sketch of that helper follows after this list). Overwhelmingly most drivers get this right, except a few that totally don't. I'll follow up with a patch to make this the default and avoid a bunch of bugs.
- I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right.
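For reference, the core of drm_gem_plane_helper_prepare_fb() is roughly the following (a simplified sketch, not a verbatim copy): it fishes the exclusive fence out of the framebuffer's GEM object so the atomic commit machinery waits for implicit sync before display.

	int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane,
					    struct drm_plane_state *state)
	{
		struct drm_gem_object *obj;
		struct dma_fence *fence;

		if (!state->fb)
			return 0;

		obj = drm_gem_fb_get_obj(state->fb, 0);
		/* Grab the current exclusive fence, i.e. the last writer. */
		fence = dma_resv_get_excl_unlocked(obj->resv);
		drm_atomic_set_fence_for_plane(state, fence);

		return 0;
	}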
In conclusion this IS the contract, both as documented and overwhelmingly implemented, specifically as implemented by all render drivers except amdgpu.
Amdgpu tried to fix this already in
commit 049aca4363d8af87cab8d53de5401602db3b9999
Author: Christian König christian.koenig@amd.com
Date:   Wed Sep 19 16:54:35 2018 +0200

    drm/amdgpu: fix using shared fence for exported BOs v2
but this fix falls short on a number of areas:
- It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable.
- It's breaking uapi: dma-buf fds support poll() and differentiate between read and write access, which was introduced in
commit 9b495a5887994a6d74d5c261d012083a92b94738
Author: Maarten Lankhorst maarten.lankhorst@canonical.com
Date:   Tue Jul 1 12:57:43 2014 +0200

    dma-buf: add poll support, v3
- Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting
"Yeah, and that is exactly the reason why I will NAK this uAPI change.
"This doesn't works for amdgpu at all for the reasons outlined above."
https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail...
Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works.
Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit of a plan:
- amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practice is a per-bo flag to indicate writes, and a per-bo flag to skip implicit fencing in the CS ioctl as a new chunk (a hypothetical sketch of such flags follows after this list).
- Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model.
- for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same.
- since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag.
- New userspace needs to ship in enough desktop distros so that users won't notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot.
- Then when this is all in place we can merge this patch here.
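To make the first bullet concrete, such a uapi could look roughly like this (all names invented for illustration; amdgpu's existing AMDGPU_GEM_CREATE_EXPLICIT_SYNC flag and the chunk added in the RFC patch below are the real starting points):

	/* hypothetical per-BO flags for a new CS ioctl chunk */
	#define EXAMPLE_BO_WRITE	(1 << 0)	/* set the exclusive fence */
	#define EXAMPLE_BO_EXPLICIT	(1 << 1)	/* opt out of implicit sync */

	struct example_cs_bo {
		__u32 handle;	/* GEM handle of the buffer */
		__u32 flags;	/* EXAMPLE_BO_* */
	};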
What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted.
Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers.
v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed.
This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and has for a while. The only things now missing are:
- fishing the implicit fences out of a shared object at the right time
- setting the exclusive implicit fence slot at the right time
Jason has a patch series to fill that gap with a bunch of generic ioctls on the dma-buf fd:
https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.n...
v3: Since Christian has fixed amdgpu now in
commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next)
Author: Christian König christian.koenig@amd.com
Date:   Wed Jun 9 13:51:36 2021 +0200

    drm/amdgpu: rework dma_resv handling v3
Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers.
Since dynamic importers have different rules also hammer these in again while we're at it.
Cc: mesa-dev@lists.freedesktop.org
Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
Cc: Dave Airlie airlied@gmail.com
Cc: Rob Clark robdclark@chromium.org
Cc: Kristian H. Kristensen hoegsberg@google.com
Cc: Michel Dänzer michel@daenzer.net
Cc: Daniel Stone daniels@collabora.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Deepak R Varma mh12gx2825@gmail.com
Cc: Chen Li chenli@uniontech.com
Cc: Kevin Wang kevin1.wang@amd.com
Cc: Dennis Li Dennis.Li@amd.com
Cc: Luben Tuikov luben.tuikov@amd.com
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
---
include/linux/dma-buf.h | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6d18b9e448b9..4807cefe81f5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -388,6 +388,45 @@ struct dma_buf {
	 * @resv:
	 *
	 * Reservation object linked to this dma-buf.
+	 *
+	 * IMPLICIT SYNCHRONIZATION RULES:
+	 *
+	 * Drivers which support implicit synchronization of buffer access as
+	 * e.g. exposed in `Implicit Fence Poll Support`_ should follow the
+	 * below rules.
+	 *
+	 * - Drivers should add a shared fence through
+	 *   dma_resv_add_shared_fence() for anything the userspace API
+	 *   considers a read access. This highly depends upon the API and
+	 *   window system: E.g. OpenGL is generally implicitly synchronized on
+	 *   Linux, but explicitly synchronized on Android. Whereas Vulkan is
+	 *   generally explicitly synchronized for everything, and window system
+	 *   buffers have explicit API calls (which then need to make sure the
+	 *   implicit fences stored here in @resv are updated correctly).
+	 *
+	 * - Similarly drivers should set the exclusive fence through
+	 *   dma_resv_add_excl_fence() for anything the userspace API considers
+	 *   write access.
+	 *
+	 * - Drivers may just always set the exclusive fence, since that only
+	 *   causes unnecessary synchronization, but no correctness issues.
+	 *
+	 * - Some drivers only expose a synchronous userspace API with no
+	 *   pipelining across drivers. These do not set any fences for their
+	 *   access. An example here is v4l.
+	 *
+	 * DYNAMIC IMPORTER RULES:
+	 *
+	 * Dynamic importers, see dma_buf_attachment_is_dynamic(), have
+	 * additional constraints on how they set up fences:
+	 *
+	 * - Dynamic importers must obey the exclusive fence and wait for it to
+	 *   signal before allowing access to the buffer's underlying storage
+	 *   through the device.
+	 *
+	 * - Dynamic importers should set fences for any access that they can't
+	 *   disable immediately from their @dma_buf_attach_ops.move_notify
+	 *   callback.
	 */
	struct dma_resv *resv;
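To make the dynamic importer rules concrete, a hedged sketch of an importer honouring the exclusive fence before touching the underlying storage (the function and its name are invented for illustration):

	static int example_importer_begin_access(struct dma_buf_attachment *attach)
	{
		long ret;

		/*
		 * First rule above: obey the exclusive fence. wait_all =
		 * false waits only for the exclusive fence; the _rcu
		 * variant needs no locks held.
		 */
		ret = dma_resv_wait_timeout_rcu(attach->dmabuf->resv,
						false /* wait_all */,
						true /* interruptible */,
						MAX_SCHEDULE_TIMEOUT);
		return ret < 0 ? ret : 0;
	}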
Am 22.06.21 um 18:54 schrieb Daniel Vetter:
Docs for struct dma_resv are fairly clear:
"A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)."
https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-obj...
Furthermore, a review across all of upstream bears this out.
First, the render drivers and how they set implicit fences:
nouveau follows this contract, see in validate_fini_no_ticket()
nouveau_bo_fence(nvbo, fence, !!b->write_domains);
and that last boolean controls whether the exclusive or shared fence slot is used.
radeon follows this contract by setting
p->relocs[i].tv.num_shared = !r->write_domain;
in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing.
vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot.
etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences()
i915 is a bit of a convoluted maze with multiple paths leading to i915_vma_move_to_active(), which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract.
lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo
msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer
panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess
v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved
vc4 uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos()
vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu
virtio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence()
This covers the setting of the exclusive fences when writing.
Synchronizing against the exclusive fence is a lot trickier, and I only spot-checked a few:
i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan)
etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback.
vc4 seems to not do much here; maybe it gets away with it by not having a scheduler and only a single engine. Since all newer Broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small.
v3d does this using the drm_gem_fence_array_add_implicit() helper, which its drm_sched_backend_ops->dependency callback v3d_job_dependency() then picks up.
panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper.
lima is nice and easy: it uses drm_gem_fence_array_add_implicit() and the same schema as v3d.
msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing in a scheduler might be a good idea.
all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not following the contract would confuse ttm quite thoroughly and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain.
Onwards to the display side.
Any driver using the drm_gem_plane_helper_prepare_fb() helper will handle this correctly. Overwhelmingly most drivers get this right, except a few that totally don't. I'll follow up with a patch to make this the default and avoid a bunch of bugs.
I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right.
In conclusion this IS the contract, both as documented and overwhelmingly implemented, specifically as implemented by all render drivers except amdgpu.
Amdgpu tried to fix this already in
commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König christian.koenig@amd.com Date: Wed Sep 19 16:54:35 2018 +0200
drm/amdgpu: fix using shared fence for exported BOs v2
but this fix falls short on a number of areas:
It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable.
It's breaking uapi: dma-buf fds support poll() and differentiate between read and write access, which was introduced in
commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst maarten.lankhorst@canonical.com Date: Tue Jul 1 12:57:43 2014 +0200
dma-buf: add poll support, v3
Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting
"Yeah, and that is exactly the reason why I will NAK this uAPI change.
"This doesn't works for amdgpu at all for the reasons outlined above."
https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail...
Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works.
Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit of a plan:
amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practice is a per-bo flag to indicate writes, and a per-bo flag to skip implicit fencing in the CS ioctl as a new chunk.
Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model.
for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same.
since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag.
New userspace needs to ship in enough desktop distros so that users wont notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot.
Then when this is all in place we can merge this patch here.
What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted.
Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first; once that's achieved we can tackle the remaining issues in a consistent way across drivers.
v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed.
This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and has for a while. The only thing that's now missing is
- fishing the implicit fences out of a shared object at the right time
- setting the exclusive implicit fence slot at the right time.
Jason has a patch series to fill that gap with a bunch of generic ioctls on the dma-buf fd:
https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.n...
v3: Since Christian has fixed amdgpu now in
commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next)
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Jun 9 13:51:36 2021 +0200
drm/amdgpu: rework dma_resv handling v3
Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers.
Since dynamic importers have different rules also hammer these in again while we're at it.
Cc: mesa-dev@lists.freedesktop.org
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Kristian H. Kristensen <hoegsberg@google.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
 include/linux/dma-buf.h | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6d18b9e448b9..4807cefe81f5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -388,6 +388,45 @@ struct dma_buf {
	 * @resv:
	 *
	 * Reservation object linked to this dma-buf.
+	 *
+	 * IMPLICIT SYNCHRONIZATION RULES:
+	 *
+	 * Drivers which support implicit synchronization of buffer access as
+	 * e.g. exposed in `Implicit Fence Poll Support`_ should follow the
+	 * below rules.
+	 *
+	 * - Drivers should add a shared fence through
+	 *   dma_resv_add_shared_fence() for anything the userspace API
+	 *   considers a read access. This highly depends upon the API and
+	 *   window system: E.g. OpenGL is generally implicitly synchronized on
+	 *   Linux, but explicitly synchronized on Android. Whereas Vulkan is
+	 *   generally explicitly synchronized for everything, and window system
+	 *   buffers have explicit API calls (which then need to make sure the
+	 *   implicit fences stored here in @resv are updated correctly).
+	 *
+	 * - Similarly drivers should set the exclusive fence through
+	 *   dma_resv_add_excl_fence() for anything the userspace API considers
+	 *   write access.
+	 *
+	 * - Drivers may just always set the exclusive fence, since that only
+	 *   causes unnecessary synchronization, but no correctness issues.
+	 *
+	 * - Some drivers only expose a synchronous userspace API with no
+	 *   pipelining across drivers. These do not set any fences for their
+	 *   access. An example here is v4l.
+	 *
+	 * DYNAMIC IMPORTER RULES:
+	 *
+	 * Dynamic importers, see dma_buf_attachment_is_dynamic(), have
+	 * additional constraints on how they set up fences:
+	 *
+	 * - Dynamic importers must obey the exclusive fence and wait for it to
+	 *   signal before allowing access to the buffer's underlying storage
+	 *   through.
+	 *
+	 * - Dynamic importers should set fences for any access that they can't
+	 *   disable immediately from their @dma_buf_attach_ops.move_notify
+	 *   callback.
	 */
	struct dma_resv *resv;
Docs for struct dma_resv are fairly clear:
"A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)."
https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-obj...
Furthermore, here's a review across all of upstream.
First off, render drivers and how they set implicit fences:
- nouveau follows this contract, see in validate_fini_no_ticket()
nouveau_bo_fence(nvbo, fence, !!b->write_domains);
and that last boolean controls whether the exclusive or shared fence slot is used.
- radeon follows this contract by setting
p->relocs[i].tv.num_shared = !r->write_domain;
in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing.
- vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot.
- etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences()
- i915 is a bit of a convoluted maze with multiple paths leading to i915_vma_move_to_active(), which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract.
- lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo
- msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE flag is set for that buffer
- panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess
- v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved
- vc4 uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos()
- vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE flag. This is used in some igts to validate prime sharing with i915.ko without the need for a 2nd gpu
- virtio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence()
This covers the setting of the exclusive fences when writing.
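For the record, a minimal sketch of this publishing side, assuming a GEM driver whose CS ioctl carries a per-bo write flag; the caller must hold the bo's dma_resv lock, and sketch_attach_fence() is a made-up name for this sketch, not an existing helper:

#include <linux/dma-resv.h>
#include <drm/drm_gem.h>

static int sketch_attach_fence(struct drm_gem_object *obj,
			       struct dma_fence *fence, bool write)
{
	int ret;

	if (write) {
		/* Writes replace the exclusive slot. */
		dma_resv_add_excl_fence(obj->resv, fence);
		return 0;
	}

	/* Reads append to the shared array, which needs a reserved slot. */
	ret = dma_resv_reserve_shared(obj->resv, 1);
	if (ret)
		return ret;
	dma_resv_add_shared_fence(obj->resv, fence);

	return 0;
}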
Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few:
- i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan)
- etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency(), which is the drm_sched_backend_ops->dependency callback.
- vc4 seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small.
- v3d does this using the drm_gem_fence_array_add_implicit() helper, which its drm_sched_backend_ops->dependency callback v3d_job_dependency() then picks up; a short sketch of this pattern follows this list.
- panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper.
- lima is nice and easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d.
- msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing in a scheduler might be a good idea.
- all the remaining drivers are ttm based, where I hope they already obey implicit fences appropriately. I didn't do the full audit there because a) not following the contract would confuse ttm quite thoroughly and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain.
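For illustration, a minimal sketch of that helper pattern, assuming a drm/scheduler based driver; the job struct and function names here are invented for the sketch, not taken from any driver:

#include <linux/xarray.h>
#include <drm/drm_gem.h>

struct sketch_job {
	struct xarray deps;	/* xa_init()'d at job creation */
};

/*
 * Collect implicit dependencies for one job. For a read access the
 * helper grabs only the exclusive fence; for a write access it also
 * grabs all the shared fences, matching the contract above. The
 * drm_sched_backend_ops->dependency callback then drains job->deps
 * one fence at a time, v3d/lima-style.
 */
static int sketch_collect_implicit_deps(struct sketch_job *job,
					struct drm_gem_object **bos,
					unsigned int bo_count, bool write)
{
	unsigned int i;
	int ret;

	for (i = 0; i < bo_count; i++) {
		ret = drm_gem_fence_array_add_implicit(&job->deps,
						       bos[i], write);
		if (ret)
			return ret;
	}

	return 0;
}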
Onwards to the display side.
- Any driver using the drm_gem_plane_helper_prepare_fb() helper will do this correctly (see the sketch after this list). Overwhelmingly most drivers get this right, except a few totally don't. I'll follow up with a patch to make this the default and avoid a bunch of bugs.
- I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right.
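As a sketch of the display side: wiring the helper into a driver's plane helper funcs is all it takes, since the helper fishes the exclusive fence out of the fb's GEM BO so the atomic commit machinery waits for rendering to complete. The struct name below is a placeholder:

#include <drm/drm_gem_atomic_helper.h>
#include <drm/drm_modeset_helper_vtables.h>

static const struct drm_plane_helper_funcs sketch_plane_helper_funcs = {
	/* Pulls the exclusive fence from the fb's BO into plane_state->fence: */
	.prepare_fb = drm_gem_plane_helper_prepare_fb,
	/* .atomic_check and .atomic_update hooks as usual for the driver */
};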
In conclusion this IS the contract, both as documented and overwhelmingly implemented, specifically as implemented by all render drivers except amdgpu.
Amdgpu tried to fix this already in
commit 049aca4363d8af87cab8d53de5401602db3b9999
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Sep 19 16:54:35 2018 +0200
drm/amdgpu: fix using shared fence for exported BOs v2
but this fix falls short in a number of areas:
- It's racy: by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable.
- It's breaking uapi: dma-buf fds support poll(), which differentiates between read and write access (see the poll() sketch after this list); this was introduced in
commit 9b495a5887994a6d74d5c261d012083a92b94738
Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date:   Tue Jul 1 12:57:43 2014 +0200
dma-buf: add poll support, v3
- Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting
"Yeah, and that is exactly the reason why I will NAK this uAPI change.
"This doesn't works for amdgpu at all for the reasons outlined above."
https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail...
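For reference, a userspace sketch of that poll() uapi: POLLIN blocks until the exclusive (write) fence has signalled, i.e. the buffer is safe to read, while POLLOUT waits for all attached fences, i.e. it is safe to write. dmabuf_fd is assumed to be a dma-buf file descriptor:

#include <poll.h>

static int wait_until_readable(int dmabuf_fd)
{
	struct pollfd pfd = {
		.fd = dmabuf_fd,
		.events = POLLIN,	/* waits on the exclusive fence only */
	};

	/* Blocks until the most recent write has completed. */
	return poll(&pfd, 1, -1);
}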
Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works.
Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit of a plan:
- amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practice is a per-bo flag to indicate writes, and a per-bo flag to skip implicit fencing in the CS ioctl as a new chunk; a purely illustrative sketch follows this list.
- Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit-fencing model.
- for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same.
- since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag.
- New userspace needs to ship in enough desktop distros so that users won't notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot.
- Then when this is all in place we can merge this patch here.
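Purely for illustration, one possible shape for such a per-bo flags chunk; every name below is invented for this sketch and is NOT actual or proposed amdgpu uapi:

#include <linux/types.h>

#define SKETCH_BO_FLAG_WRITE		(1 << 0)	/* set the exclusive fence */
#define SKETCH_BO_FLAG_NO_IMPLICIT	(1 << 1)	/* skip implicit sync for this bo */

struct sketch_cs_bo_flags {
	__u32 handle;	/* GEM handle referenced by this CS */
	__u32 flags;	/* SKETCH_BO_FLAG_* */
};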
What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted.
Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first; once that's achieved we can tackle the remaining issues in a consistent way across drivers.
v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed.
This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and has for a while. The only thing that's now missing is
- fishing the implicit fences out of a shared object at the right time
- setting the exclusive implicit fence slot at the right time.
Jason has a patch series to fill that gap with a bunch of generic ioctls on the dma-buf fd:
https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.n...
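As a reminder of the piece that does exist, a userspace sketch of opting a BO out of amdgpu's implicit sync at allocation time, assuming libdrm's drmIoctl() wrapper and an already-open render node fd:

#include <amdgpu_drm.h>
#include <xf86drm.h>

static int create_explicit_sync_bo(int fd, __u64 size, __u32 *handle)
{
	union drm_amdgpu_gem_create args = {
		.in = {
			.bo_size = size,
			.alignment = 4096,
			.domains = AMDGPU_GEM_DOMAIN_VRAM,
			/* Opt this bo out of implicit synchronization: */
			.domain_flags = AMDGPU_GEM_CREATE_EXPLICIT_SYNC,
		},
	};
	int ret = drmIoctl(fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &args);

	if (ret)
		return ret;

	*handle = args.out.handle;
	return 0;
}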
v3: Since Christian has fixed amdgpu now in
commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next)
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Jun 9 13:51:36 2021 +0200
drm/amdgpu: rework dma_resv handling v3
Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers.
Since dynamic importers have different rules also hammer these in again while we're at it.
v4:
- Add the missing "through the device" in the dynamic section that I overlooked.
- Fix a kerneldoc markup mistake, the link didn't connect
Reviewed-by: Christian König <christian.koenig@amd.com> (v3)
Cc: mesa-dev@lists.freedesktop.org
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Kristian H. Kristensen <hoegsberg@google.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/dma-buf.h | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 81cebf414505..494f639ee486 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -386,6 +386,45 @@ struct dma_buf {
	 * @resv:
	 *
	 * Reservation object linked to this dma-buf.
+	 *
+	 * IMPLICIT SYNCHRONIZATION RULES:
+	 *
+	 * Drivers which support implicit synchronization of buffer access as
+	 * e.g. exposed in `Implicit Fence Poll Support`_ should follow the
+	 * below rules.
+	 *
+	 * - Drivers should add a shared fence through
+	 *   dma_resv_add_shared_fence() for anything the userspace API
+	 *   considers a read access. This highly depends upon the API and
+	 *   window system: E.g. OpenGL is generally implicitly synchronized on
+	 *   Linux, but explicitly synchronized on Android. Whereas Vulkan is
+	 *   generally explicitly synchronized for everything, and window system
+	 *   buffers have explicit API calls (which then need to make sure the
+	 *   implicit fences stored here in @resv are updated correctly).
+	 *
+	 * - Similarly drivers should set the exclusive fence through
+	 *   dma_resv_add_excl_fence() for anything the userspace API considers
+	 *   write access.
+	 *
+	 * - Drivers may just always set the exclusive fence, since that only
+	 *   causes unnecessary synchronization, but no correctness issues.
+	 *
+	 * - Some drivers only expose a synchronous userspace API with no
+	 *   pipelining across drivers. These do not set any fences for their
+	 *   access. An example here is v4l.
+	 *
+	 * DYNAMIC IMPORTER RULES:
+	 *
+	 * Dynamic importers, see dma_buf_attachment_is_dynamic(), have
+	 * additional constraints on how they set up fences:
+	 *
+	 * - Dynamic importers must obey the exclusive fence and wait for it to
+	 *   signal before allowing access to the buffer's underlying storage
+	 *   through the device.
+	 *
+	 * - Dynamic importers should set fences for any access that they can't
+	 *   disable immediately from their &dma_buf_attach_ops.move_notify
+	 *   callback.
	 */
	struct dma_resv *resv;
On Thu, 24 Jun 2021 at 02:20, Daniel Vetter daniel.vetter@ffwll.ch wrote:
[snip: quoted v4 commit message]
This is a pretty epic commit msg; thanks for the investment. The commit msg should be required reading.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Dave.
Hi,
On Wed, 23 Jun 2021 at 17:20, Daniel Vetter daniel.vetter@ffwll.ch wrote:
*
* IMPLICIT SYNCHRONIZATION RULES:
*
* Drivers which support implicit synchronization of buffer access as
* e.g. exposed in `Implicit Fence Poll Support`_ should follow the
* below rules.
'Should' ... ? Must.
* - Drivers should add a shared fence through
* dma_resv_add_shared_fence() for anything the userspace API
* considers a read access. This highly depends upon the API and
* window system: E.g. OpenGL is generally implicitly synchronized on
* Linux, but explicitly synchronized on Android. Whereas Vulkan is
* generally explicitly synchronized for everything, and window system
* buffers have explicit API calls (which then need to make sure the
* implicit fences stored here in @resv are updated correctly).
*
* - [...]
Mmm, I think this is all right, but it could be worded much more clearly. Right now it's a bunch of points all smashed into one, and there's a lot of room for misinterpretation.
Here's a strawman, starting with most basic and restrictive, working through to when you're allowed to wriggle your way out:
Rule 1: Drivers must add a shared fence through dma_resv_add_shared_fence() for any read accesses against that buffer. This appends a fence to the shared array, ensuring that any future non-read access will be synchronised against this operation to only begin after it has completed.
Rule 2: Drivers must add an exclusive fence through dma_resv_add_excl_fence() for any write accesses against that buffer. This replaces the exclusive fence with the new operation, ensuring that all future access will be synchronised against this operation to only begin after it has completed.
Rule 3: Drivers must synchronise all accesses to buffers against existing implicit fences. Read accesses must synchronise against the exclusive fence (read-after-write), and write accesses must synchronise against both the exclusive (write-after-write) and shared (write-after-read) fences. (A minimal code sketch of this rule follows the exemptions below.)
Note 1: Users like OpenGL and window systems on non-Android userspace are generally implicitly synchronised. An implicitly-synchronised userspace is unaware of fences from prior operations, so the kernel mediates scheduling to create the illusion that GPU work is FIFO. For example, an application will flush and schedule GPU write work to render its image, then immediately tell the window system to display that image; the window system may immediately flush and schedule GPU read work to display that image, with neither waiting for the write to have completed. The kernel provides coherence by synchronising the read access against the write fence in the exclusive slot, so that the image displayed is correct.
Note 2: Users like Vulkan and the Android window system are generally explicitly synchronised. An explicitly-synchronised userspace is responsible for tracking its own read and write access and providing the kernel with synchronisation barriers. For instance, a Vulkan application rendering to a buffer and subsequently using it as a read texture must annotate the read operation with a read-after-write synchronisation barrier.
Note 3: Implicit and explicit userspace can coexist. For instance, an explicitly-synchronised Vulkan application may be running as a client of an implicitly-synchronised window system which uses OpenGL for composition; an implicitly-synchronised OpenGL application may be running as a client of a window system which uses Vulkan for composition.
Note 4: Some subsystems, for example V4L2, do not pipeline operations, and instead only return to userspace when the scheduled work against a buffer has fully retired.
Exemption 1: Fully self-coherent userspace may skip implicit synchronisation barriers. For instance, accesses between two Vulkan-internal buffers allocated by a single application do not need to synchronise against each other's implicit fences, as the client is responsible for explicitly providing barriers for access. A self-contained OpenGL userspace also has no need to implicitly synchronise its access if the driver instead tracks all access and inserts the appropriate synchronisation barriers.
Exemption 2: When implicit and explicit userspace coexist, the explicit side may skip intermediate synchronisation, and only place synchronisation barriers at transition points. For example, a Vulkan compositor displaying a buffer from an OpenGL application would need to synchronise its first access against the fence placed in the exclusive implicit-synchronisation slot. Once this read has fully retired, the compositor has no need to participate in implicit synchronisation until it is ready to return the buffer to the application, at which point it must insert all its non-retired accesses into the shared slot, which the application will then synchronise future write accesses against.
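To make Rule 3 concrete, a minimal blocking sketch, assuming a driver without a scheduler and the dma_resv_wait_timeout_rcu() helper as of this writing; a scheduler-based driver would instead collect these fences as job dependencies:

#include <linux/dma-resv.h>
#include <linux/sched.h>

static long sketch_sync_before_access(struct dma_resv *resv, bool write)
{
	/*
	 * Reads need only the exclusive fence (read-after-write);
	 * writes must wait for all fences (write-after-write plus
	 * write-after-read).
	 */
	return dma_resv_wait_timeout_rcu(resv,
					 write,	/* wait_all */
					 true,	/* interruptible */
					 MAX_SCHEDULE_TIMEOUT);
}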
Cheers, Daniel
On Thu, Jun 24, 2021 at 1:08 PM Daniel Stone daniel@fooishbar.org wrote:
Hi,
On Wed, 23 Jun 2021 at 17:20, Daniel Vetter daniel.vetter@ffwll.ch wrote:
*
* IMPLICIT SYNCHRONIZATION RULES:
*
* Drivers which support implicit synchronization of buffer access as
* e.g. exposed in `Implicit Fence Poll Support`_ should follow the
* below rules.
'Should' ... ? Must.
Yeah I guess I can upgrade a bunch of them.
[snip: quoted doc comment and strawman]
So I think this is excellent, but maybe better suited in the uapi section as a separate chapter? Essentially keep your rules in the driver-internal docs, but move the notes/exemptions into the uapi section under an "Implicit Sync Mode of Operation" or whatever heading?
The other thing to keep in mind is that this is very much incomplete: I'm silent on what drivers should do exactly with these fences. That's largely because I haven't fully completed that audit, and there's a pile of bugs there still. -Daniel
Docs for struct dma_resv are fairly clear:
"A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)."
https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-obj...
Furthermore, here's a review across all of upstream.
First off, render drivers and how they set implicit fences:
- nouveau follows this contract, see in validate_fini_no_ticket()
nouveau_bo_fence(nvbo, fence, !!b->write_domains);
and that last boolean controls whether the exclusive or shared fence slot is used.
- radeon follows this contract by setting
p->relocs[i].tv.num_shared = !r->write_domain;
in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing.
- vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot.
- etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences()
- i915 is a bit of a convoluted maze with multiple paths leading to i915_vma_move_to_active(), which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract.
- lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo
- msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE flag is set for that buffer
- panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess
- v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved
- vc4 uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos()
- vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE flag. This is used in some igts to validate prime sharing with i915.ko without the need for a 2nd gpu
- virtio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence()
This covers the setting of the exclusive fences when writing.
Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few:
- i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan)
- etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency(), which is the drm_sched_backend_ops->dependency callback.
- vc4 seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small.
- v3d does this using the drm_gem_fence_array_add_implicit() helper, which its drm_sched_backend_ops->dependency callback v3d_job_dependency() then picks up.
- panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper.
- lima is nice and easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d.
- msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing in a scheduler might be a good idea.
- all the remaining drivers are ttm based, where I hope they already obey implicit fences appropriately. I didn't do the full audit there because a) not following the contract would confuse ttm quite thoroughly and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain.
Onwards to the display side.
- Any driver using the drm_gem_plane_helper_prepare_fb() helper will do this correctly. Overwhelmingly most drivers get this right, except a few totally don't. I'll follow up with a patch to make this the default and avoid a bunch of bugs.
- I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right.
In conclusion this IS the contract, both as documented and overwhelmingly implemented, specifically as implemented by all render drivers except amdgpu.
Amdgpu tried to fix this already in
commit 049aca4363d8af87cab8d53de5401602db3b9999
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Sep 19 16:54:35 2018 +0200
drm/amdgpu: fix using shared fence for exported BOs v2
but this fix falls short in a number of areas:
- It's racy: by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable.
- It's breaking uapi: dma-buf fds support poll(), which differentiates between read and write access; this was introduced in
commit 9b495a5887994a6d74d5c261d012083a92b94738
Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date:   Tue Jul 1 12:57:43 2014 +0200
dma-buf: add poll support, v3
- Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting
"Yeah, and that is exactly the reason why I will NAK this uAPI change.
"This doesn't works for amdgpu at all for the reasons outlined above."
https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail...
Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works.
Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit of a plan:
- amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and dependency intercepts must be handled explicitly. Current best practice is a per-bo flag to indicate writes, and a per-bo flag to skip implicit fencing in the CS ioctl as a new chunk.
- Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit-fencing model.
- for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same.
- since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag.
- New userspace needs to ship in enough desktop distros so that users won't notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot.
- Then when this is all in place we can merge this patch here.
What is not a solution to this problem here is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted.
Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first; once that's achieved we can tackle the remaining issues in a consistent way across drivers.
v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed.
This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and has for a while. The only thing that's now missing is
- fishing the implicit fences out of a shared object at the right time
- setting the exclusive implicit fence slot at the right time.
Jason has a patch series to fill that gap with a bunch of generic ioctls on the dma-buf fd:
https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.n...
v3: Since Christian has fixed amdgpu now in
commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next)
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Jun 9 13:51:36 2021 +0200
drm/amdgpu: rework dma_resv handling v3
Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers.
Since dynamic importers have different rules also hammer these in again while we're at it.
v4:
- Add the missing "through the device" in the dynamic section that I overlooked.
- Fix a kerneldoc markup mistake, the link didn't connect
v5:
- A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's more a maybe (Daniel Stone)
- drop all the example api discussion, that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone)
Cc: Daniel Stone <daniel@fooishbar.org>
Acked-by: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Dave Airlie <airlied@redhat.com> (v4)
Reviewed-by: Christian König <christian.koenig@amd.com> (v3)
Cc: mesa-dev@lists.freedesktop.org
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Kristian H. Kristensen <hoegsberg@google.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/dma-buf.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 81cebf414505..2b814fde0d11 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -386,6 +386,40 @@ struct dma_buf {
	 * @resv:
	 *
	 * Reservation object linked to this dma-buf.
+	 *
+	 * IMPLICIT SYNCHRONIZATION RULES:
+	 *
+	 * Drivers which support implicit synchronization of buffer access as
+	 * e.g. exposed in `Implicit Fence Poll Support`_ must follow the
+	 * below rules.
+	 *
+	 * - Drivers must add a shared fence through dma_resv_add_shared_fence()
+	 *   for anything the userspace API considers a read access. This highly
+	 *   depends upon the API and window system.
+	 *
+	 * - Similarly drivers must set the exclusive fence through
+	 *   dma_resv_add_excl_fence() for anything the userspace API considers
+	 *   write access.
+	 *
+	 * - Drivers may just always set the exclusive fence, since that only
+	 *   causes unnecessary synchronization, but no correctness issues.
+	 *
+	 * - Some drivers only expose a synchronous userspace API with no
+	 *   pipelining across drivers. These do not set any fences for their
+	 *   access. An example here is v4l.
+	 *
+	 * DYNAMIC IMPORTER RULES:
+	 *
+	 * Dynamic importers, see dma_buf_attachment_is_dynamic(), have
+	 * additional constraints on how they set up fences:
+	 *
+	 * - Dynamic importers must obey the exclusive fence and wait for it to
+	 *   signal before allowing access to the buffer's underlying storage
+	 *   through the device.
+	 *
+	 * - Dynamic importers should set fences for any access that they can't
+	 *   disable immediately from their &dma_buf_attach_ops.move_notify
+	 *   callback.
	 */
	struct dma_resv *resv;
Docs for struct dma_resv are fairly clear:
"A reservation object can have attached one exclusive fence (normally associated with write operations) or N shared fences (read operations)."
https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-obj...
Furthermore a review across all of upstream.
First of render drivers and how they set implicit fences:
- nouveau follows this contract, see in validate_fini_no_ticket()
nouveau_bo_fence(nvbo, fence, !!b->write_domains);
and that last boolean controls whether the exclusive or shared fence slot is used.
- radeon follows this contract by setting
p->relocs[i].tv.num_shared = !r->write_domain;
in radeon_cs_parser_relocs(), which ensures that the call to ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the right thing.
- vmwgfx seems to follow this contract with the shotgun approach of always setting ttm_val_buf->num_shared = 0, which means ttm_eu_fence_buffer_objects() will only use the exclusive slot.
- etnaviv follows this contract, as can be trivially seen by looking at submit_attach_object_fences()
- i915 is a bit a convoluted maze with multiple paths leading to i915_vma_move_to_active(). Which sets the exclusive flag if EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for softpin mode, or through the write_domain when using relocations. It follows this contract.
- lima follows this contract, see lima_gem_submit() which sets the exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that bo
- msm follows this contract, see msm_gpu_submit() which sets the exclusive flag when the MSM_SUBMIT_BO_WRITE is set for that buffer
- panfrost follows this contract with the shotgun approach of just always setting the exclusive fence, see panfrost_attach_object_fences(). Benefits of a single engine I guess
- v3d follows this contract with the same shotgun approach in v3d_attach_fences_and_unlock_reservation(), but it has at least an XXX comment that maybe this should be improved
- v4c uses the same shotgun approach of always setting an exclusive fence, see vc4_update_bo_seqnos()
- vgem also follows this contract, see vgem_fence_attach_ioctl() and the VGEM_FENCE_WRITE. This is used in some igts to validate prime sharing with i915.ko without the need of a 2nd gpu
- vritio follows this contract again with the shotgun approach of always setting an exclusive fence, see virtio_gpu_array_add_fence()
This covers the setting of the exclusive fences when writing.
Synchronizing against the exclusive fence is a lot more tricky, and I only spot checked a few:
- i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all implicit dependencies (which is used by vulkan)
- etnaviv does this. Implicit dependencies are collected in submit_fence_sync(), again with an opt-out flag ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in etnaviv_sched_dependency which is the drm_sched_backend_ops->dependency callback.
- v4c seems to not do much here, maybe gets away with it by not having a scheduler and only a single engine. Since all newer broadcom chips than the OG vc4 use v3d for rendering, which follows this contract, the impact of this issue is fairly small.
- v3d does this using the drm_gem_fence_array_add_implicit() helper, which then it's drm_sched_backend_ops->dependency callback v3d_job_dependency() picks up.
- panfrost is nice here and tracks the implicit fences in panfrost_job->implicit_fences, which again the drm_sched_backend_ops->dependency callback panfrost_job_dependency() picks up. It is mildly questionable though since it only picks up exclusive fences in panfrost_acquire_object_fences(), but not buggy in practice because it also always sets the exclusive fence. It should pick up both sets of fences, just in case there's ever going to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a pcie port and a real gpu, which might actually happen eventually. A bug, but easy to fix. Should probably use the drm_gem_fence_array_add_implicit() helper.
- lima is nice an easy, uses drm_gem_fence_array_add_implicit() and the same schema as v3d.
- msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT, but because it doesn't use the drm/scheduler it handles fences from the wrong context with a synchronous dma_fence_wait. See submit_fence_sync() leading to msm_gem_sync_object(). Investing into a scheduler might be a good idea.
- all the remaining drivers are ttm based, where I hope they do appropriately obey implicit fences already. I didn't do the full audit there because a) not follow the contract would confuse ttm quite well and b) reading non-standard scheduler and submit code which isn't based on drm/scheduler is a pain.
Onwards to the display side.
- Any driver using the drm_gem_plane_helper_prepare_fb() helper will correctly. Overwhelmingly most drivers get this right, except a few totally dont. I'll follow up with a patch to make this the default and avoid a bunch of bugs.
- I didn't audit the ttm drivers, but given that dma_resv started there I hope they get this right.
In conclusion this IS the contract, both as documented and overwhelmingly implemented, specically as implemented by all render drivers except amdgpu.
Amdgpu tried to fix this already in
commit 049aca4363d8af87cab8d53de5401602db3b9999 Author: Christian König christian.koenig@amd.com Date: Wed Sep 19 16:54:35 2018 +0200
drm/amdgpu: fix using shared fence for exported BOs v2
but this fix falls short on a number of areas:
- It's racy, by the time the buffer is shared it might be too late. To make sure there's definitely never a problem we need to set the fences correctly for any buffer that's potentially exportable.
- It's breaking uapi, dma-buf fds support poll() and differentitiate between, which was introduced in
commit 9b495a5887994a6d74d5c261d012083a92b94738 Author: Maarten Lankhorst maarten.lankhorst@canonical.com Date: Tue Jul 1 12:57:43 2014 +0200
dma-buf: add poll support, v3
- Christian König wants to nack new uapi building further on this dma_resv contract because it breaks amdgpu, quoting
"Yeah, and that is exactly the reason why I will NAK this uAPI change.
"This doesn't works for amdgpu at all for the reasons outlined above."
https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail...
Rejecting new development because your own driver is broken and violates established cross driver contracts and uapi is really not how upstream works.
Now this patch will have a severe performance impact on anything that runs on multiple engines. So we can't just merge it outright, but need a bit of a plan:
- amdgpu needs a proper uapi for handling implicit fencing. The funny thing is that to do it correctly, implicit fencing must be treated as a very strange IPC mechanism for transporting fences, where both setting the fence and the dependency intercepts must be handled explicitly. Current best practice is a per-bo flag to indicate writes, and a per-bo flag to skip implicit fencing in the CS ioctl as a new chunk (see the strawman sketch right after this list).
- Since amdgpu has been shipping with broken behaviour we need an opt-out flag from the butchered implicit fencing model to enable the proper explicit implicit fencing model.
- for kernel memory fences due to bo moves at least the i915 idea is to use ttm_bo->moving. amdgpu probably needs the same.
- since the current p2p dma-buf interface assumes the kernel memory fence is in the exclusive dma_resv fence slot we need to add a new fence slot for kernel fences, which must never be ignored. Since currently only amdgpu supports this there's no real problem here yet, until amdgpu gains a NO_IMPLICIT CS flag.
- New userspace needs to ship in enough desktop distros so that users won't notice the perf impact. I think we can ignore LTS distros who upgrade their kernels but not their mesa3d snapshot.
- Then when this is all in place we can merge this patch here.
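As a strawman, the per-bo flags could look roughly like this (all names are hypothetical, this is not a concrete uapi proposal):

#include <linux/types.h>

/* hypothetical per-BO flags in a new CS ioctl chunk, not actual amdgpu uapi */
#define HYPO_BO_FLAG_WRITE		(1 << 0)	/* job writes this BO */
#define HYPO_BO_FLAG_NO_IMPLICIT	(1 << 1)	/* skip implicit sync */

struct hypo_cs_bo_flags {
	__u32 handle;	/* GEM handle of the BO */
	__u32 flags;
};

/*
 * Kernel side, per BO at the end of CS submission (sketch):
 *
 *	if (!(flags & HYPO_BO_FLAG_NO_IMPLICIT)) {
 *		if (flags & HYPO_BO_FLAG_WRITE)
 *			dma_resv_add_excl_fence(obj->resv, fence);
 *		else
 *			dma_resv_add_shared_fence(obj->resv, fence);
 *	}
 */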
What is not a solution to this problem is trying to make the dma_resv rules in the kernel more clever. The fundamental issue here is that the amdgpu CS uapi is the least expressive one across all drivers (only equalled by panfrost, which has an actual excuse) by not allowing any userspace control over how implicit sync is conducted.
Until this is fixed it's completely pointless to make the kernel more clever to improve amdgpu, because all we're doing is papering over this uapi design issue. amdgpu needs to attain the status quo established by other drivers first, once that's achieved we can tackle the remaining issues in a consistent way across drivers.
v2: Bas pointed me at AMDGPU_GEM_CREATE_EXPLICIT_SYNC, which I entirely missed.
This is great because it means the amdgpu specific piece for proper implicit fence handling exists already, and has for a while. The only things now missing are:
- fishing the implicit fences out of a shared object at the right time
- setting the exclusive implicit fence slot at the right time
Jason has a patch series to fill that gap with a bunch of generic ioctl on the dma-buf fd:
https://lore.kernel.org/dri-devel/20210520190007.534046-1-jason@jlekstrand.n...
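The rough shape of that interface (struct layout and names here are purely illustrative, the real uapi is whatever lands from Jason's series):

#include <linux/types.h>

/* hypothetical sync-file export ioctl argument on the dma-buf fd */
struct hypo_dma_buf_export_sync_file {
	__u32 flags;	/* read and/or write access to sync against */
	__s32 fd;	/* out: sync_file fd carrying the fences */
};

/*
 * Userspace flow:
 * 1. EXPORT_SYNC_FILE fishes the implicit fences out of dma_buf.resv as
 *    a sync_file, so they can be waited on or passed along explicitly.
 * 2. IMPORT_SYNC_FILE stuffs an explicit fence back into dma_buf.resv,
 *    so implicit-sync consumers (e.g. a compositor) wait correctly.
 */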
v3: Since Christian has fixed amdgpu now in
commit 8c505bdc9c8b955223b054e34a0be9c3d841cd20 (drm-misc/drm-misc-next) Author: Christian König christian.koenig@amd.com Date: Wed Jun 9 13:51:36 2021 +0200
drm/amdgpu: rework dma_resv handling v3
Use the audit covered in this commit message as the excuse to update the dma-buf docs around dma_buf.resv usage across drivers.
Since dynamic importers have different rules also hammer these in again while we're at it.
v4:
- Add the missing "through the device" in the dynamic section that I overlooked.
- Fix a kerneldoc markup mistake, the link didn't connect.
v5:
- A few s/should/must/ to make clear what must be done (if the driver does implicit sync) and what's merely a maybe (Daniel Stone)
- Drop all the example api discussion; that needs to be expanded, clarified and put into a new chapter in drm-uapi.rst (Daniel Stone)
Cc: Daniel Stone daniel@fooishbar.org
Acked-by: Daniel Stone daniel@fooishbar.org
Reviewed-by: Dave Airlie airlied@redhat.com (v4)
Reviewed-by: Christian König christian.koenig@amd.com (v3)
Cc: mesa-dev@lists.freedesktop.org
Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
Cc: Dave Airlie airlied@gmail.com
Cc: Rob Clark robdclark@chromium.org
Cc: Kristian H. Kristensen hoegsberg@google.com
Cc: Michel Dänzer michel@daenzer.net
Cc: Daniel Stone daniels@collabora.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Deepak R Varma mh12gx2825@gmail.com
Cc: Chen Li chenli@uniontech.com
Cc: Kevin Wang kevin1.wang@amd.com
Cc: Dennis Li Dennis.Li@amd.com
Cc: Luben Tuikov luben.tuikov@amd.com
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
---
include/linux/dma-buf.h | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 81cebf414505..2b814fde0d11 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -386,6 +386,40 @@ struct dma_buf { * @resv: * * Reservation object linked to this dma-buf. + * + * IMPLICIT SYNCHRONIZATION RULES: + * + * Drivers which support implicit synchronization of buffer access as + * e.g. exposed in `Implicit Fence Poll Support`_ must follow the + * below rules. + * + * - Drivers must add a shared fence through dma_resv_add_shared_fence() + * for anything the userspace API considers a read access. This highly + * depends upon the API and window system. + * + * - Similarly drivers must set the exclusive fence through + * dma_resv_add_excl_fence() for anything the userspace API considers + * write access. + * + * - Drivers may just always set the exclusive fence, since that only + * causes unecessarily synchronization, but no correctness issues. + * + * - Some drivers only expose a synchronous userspace API with no + * pipelining across drivers. These do not set any fences for their + * access. An example here is v4l. + * + * DYNAMIC IMPORTER RULES: + * + * Dynamic importers, see dma_buf_attachment_is_dynamic(), have + * additional constraints on how they set up fences: + * + * - Dynamic importers must obey the exclusive fence and wait for it to + * signal before allowing access to the buffer's underlying storage + * through the device. + * + * - Dynamic importers should set fences for any access that they can't + * disable immediately from their &dma_buf_attach_ops.move_notify + * callback. */ struct dma_resv *resv;
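Translated into code, the exporter-side rules boil down to roughly this at the end of a driver's job submission (a minimal sketch against the dma_resv API of this series' vintage; dma_resv locking and most error handling elided):

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/* sketch: attach a job's fence to a BO per the documented rules */
static int attach_implicit_fence(struct dma_resv *resv,
				 struct dma_fence *fence, bool write)
{
	int ret;

	if (write) {
		/* writes go into the exclusive slot */
		dma_resv_add_excl_fence(resv, fence);
		return 0;
	}

	/* reads go into the shared slots, which must be reserved first */
	ret = dma_resv_reserve_shared(resv, 1);
	if (ret)
		return ret;

	dma_resv_add_shared_fence(resv, fence);
	return 0;
}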
drm/scheduler requires a lock between _init and _push_job, but the reservation lock dance doesn't. So shrink the critical section a notch.
v2: Lucas pointed out how this should really work, I got it all wrong in v1.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Lucas Stach l.stach@pengutronix.de
Cc: Rob Herring robh@kernel.org
Cc: Tomeu Vizoso tomeu.vizoso@collabora.com
Cc: Steven Price steven.price@arm.com
Cc: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
---
drivers/gpu/drm/panfrost/panfrost_job.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 2df3e999a38d..38f8580c19f1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -224,14 +224,13 @@ int panfrost_job_push(struct panfrost_job *job)
 	struct ww_acquire_ctx acquire_ctx;
 	int ret = 0;

-	mutex_lock(&pfdev->sched_lock);
 	ret = drm_gem_lock_reservations(job->bos, job->bo_count,
 					    &acquire_ctx);
-	if (ret) {
-		mutex_unlock(&pfdev->sched_lock);
+	if (ret)
 		return ret;
-	}
+
+	mutex_lock(&pfdev->sched_lock);

 	ret = drm_sched_job_init(&job->base, entity, NULL);
 	if (ret) {
On Tue, 22 Jun 2021 18:55:00 +0200 Daniel Vetter daniel.vetter@ffwll.ch wrote:
drm/scheduler requires a lock between _init and _push_job, but the reservation lock dance doesn't. So shrink the critical section a notch.
v2: Lucas pointed out how this should really work, I got it all wrong in v1.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Lucas Stach l.stach@pengutronix.de
Cc: Rob Herring robh@kernel.org
Cc: Tomeu Vizoso tomeu.vizoso@collabora.com
Cc: Steven Price steven.price@arm.com
Cc: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reviewed-by: Boris Brezillon boris.brezillon@collabora.com
More consistency and prep work for the next patch.
Aside: I wonder whether we shouldn't just move this entire xarray business into the scheduler so that not everyone has to reinvent the same wheels. Cc'ing some scheduler people for this too.
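A rough sketch of what such a shared helper could look like, essentially lifting the dedup logic from drm_gem_fence_array_add() into the scheduler (hypothetical, nothing like this exists in drm/scheduler yet):

#include <linux/xarray.h>
#include <linux/dma-fence.h>

/* hypothetical: dependency storage owned by the scheduler job itself */
static int sched_job_add_dependency(struct xarray *deps,
				    struct dma_fence *fence)
{
	struct dma_fence *entry;
	unsigned long index;
	u32 id = 0;
	int ret;

	if (!fence)
		return 0;

	/* Deduplicate: only keep the latest fence per fence context. */
	xa_for_each(deps, index, entry) {
		if (entry->context != fence->context)
			continue;

		if (dma_fence_is_later(fence, entry)) {
			dma_fence_put(entry);
			xa_store(deps, index, fence, GFP_KERNEL);
		} else {
			dma_fence_put(fence);
		}
		return 0;
	}

	ret = xa_alloc(deps, &id, fence, xa_limit_32b, GFP_KERNEL);
	if (ret != 0)
		dma_fence_put(fence);

	return ret;
}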
v2: Correctly handle sched_lock since Lucas pointed out it's needed.
v3: Rebase, dma_resv_get_excl_unlocked got renamed
v4: Don't leak job references on failure (Steven).
Cc: Lucas Stach l.stach@pengutronix.de
Cc: "Christian König" christian.koenig@amd.com
Cc: Luben Tuikov luben.tuikov@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Lee Jones lee.jones@linaro.org
Cc: Steven Price steven.price@arm.com
Cc: Rob Herring robh@kernel.org
Cc: Tomeu Vizoso tomeu.vizoso@collabora.com
Cc: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 41 +++++++---------
drivers/gpu/drm/panfrost/panfrost_job.c | 65 +++++++++++--------------
drivers/gpu/drm/panfrost/panfrost_job.h | 8 ++-
3 files changed, 49 insertions(+), 65 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 075ec0ef746c..3ee828f1e7a5 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -138,12 +138,6 @@ panfrost_lookup_bos(struct drm_device *dev, if (!job->bo_count) return 0;
- job->implicit_fences = kvmalloc_array(job->bo_count, - sizeof(struct dma_fence *), - GFP_KERNEL | __GFP_ZERO); - if (!job->implicit_fences) - return -ENOMEM; - ret = drm_gem_objects_lookup(file_priv, (void __user *)(uintptr_t)args->bo_handles, job->bo_count, &job->bos); @@ -174,7 +168,7 @@ panfrost_lookup_bos(struct drm_device *dev, }
/** - * panfrost_copy_in_sync() - Sets up job->in_fences[] with the sync objects + * panfrost_copy_in_sync() - Sets up job->deps with the sync objects * referenced by the job. * @dev: DRM device * @file_priv: DRM file for this fd @@ -194,22 +188,14 @@ panfrost_copy_in_sync(struct drm_device *dev, { u32 *handles; int ret = 0; - int i; + int i, in_fence_count;
- job->in_fence_count = args->in_sync_count; + in_fence_count = args->in_sync_count;
- if (!job->in_fence_count) + if (!in_fence_count) return 0;
- job->in_fences = kvmalloc_array(job->in_fence_count, - sizeof(struct dma_fence *), - GFP_KERNEL | __GFP_ZERO); - if (!job->in_fences) { - DRM_DEBUG("Failed to allocate job in fences\n"); - return -ENOMEM; - } - - handles = kvmalloc_array(job->in_fence_count, sizeof(u32), GFP_KERNEL); + handles = kvmalloc_array(in_fence_count, sizeof(u32), GFP_KERNEL); if (!handles) { ret = -ENOMEM; DRM_DEBUG("Failed to allocate incoming syncobj handles\n"); @@ -218,16 +204,23 @@ panfrost_copy_in_sync(struct drm_device *dev,
if (copy_from_user(handles, (void __user *)(uintptr_t)args->in_syncs, - job->in_fence_count * sizeof(u32))) { + in_fence_count * sizeof(u32))) { ret = -EFAULT; DRM_DEBUG("Failed to copy in syncobj handles\n"); goto fail; }
- for (i = 0; i < job->in_fence_count; i++) { + for (i = 0; i < in_fence_count; i++) { + struct dma_fence *fence; + ret = drm_syncobj_find_fence(file_priv, handles[i], 0, 0, - &job->in_fences[i]); - if (ret == -EINVAL) + &fence); + if (ret) + goto fail; + + ret = drm_gem_fence_array_add(&job->deps, fence); + + if (ret) goto fail; }
@@ -265,6 +258,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
kref_init(&job->refcount);
+ xa_init_flags(&job->deps, XA_FLAGS_ALLOC); + job->pfdev = pfdev; job->jc = args->jc; job->requirements = args->requirements; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 38f8580c19f1..71cd43fa1b36 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -196,14 +196,21 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); }
-static void panfrost_acquire_object_fences(struct drm_gem_object **bos, - int bo_count, - struct dma_fence **implicit_fences) +static int panfrost_acquire_object_fences(struct drm_gem_object **bos, + int bo_count, + struct xarray *deps) { - int i; + int i, ret;
- for (i = 0; i < bo_count; i++) - implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv); + for (i = 0; i < bo_count; i++) { + struct dma_fence *fence = dma_resv_get_excl_unlocked(bos[i]->resv); + + ret = drm_gem_fence_array_add(deps, fence); + if (ret) + return ret; + } + + return 0; }
static void panfrost_attach_object_fences(struct drm_gem_object **bos, @@ -240,10 +247,14 @@ int panfrost_job_push(struct panfrost_job *job)
job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
- kref_get(&job->refcount); /* put by scheduler job completion */ + ret = panfrost_acquire_object_fences(job->bos, job->bo_count, + &job->deps); + if (ret) { + mutex_unlock(&pfdev->sched_lock); + goto unlock; + }
- panfrost_acquire_object_fences(job->bos, job->bo_count, - job->implicit_fences); + kref_get(&job->refcount); /* put by scheduler job completion */
drm_sched_entity_push_job(&job->base, entity);
@@ -262,18 +273,15 @@ static void panfrost_job_cleanup(struct kref *ref) { struct panfrost_job *job = container_of(ref, struct panfrost_job, refcount); + struct dma_fence *fence; + unsigned long index; unsigned int i;
- if (job->in_fences) { - for (i = 0; i < job->in_fence_count; i++) - dma_fence_put(job->in_fences[i]); - kvfree(job->in_fences); - } - if (job->implicit_fences) { - for (i = 0; i < job->bo_count; i++) - dma_fence_put(job->implicit_fences[i]); - kvfree(job->implicit_fences); + xa_for_each(&job->deps, index, fence) { + dma_fence_put(fence); } + xa_destroy(&job->deps); + dma_fence_put(job->done_fence); dma_fence_put(job->render_done_fence);
@@ -316,26 +324,9 @@ static struct dma_fence *panfrost_job_dependency(struct drm_sched_job *sched_job struct drm_sched_entity *s_entity) { struct panfrost_job *job = to_panfrost_job(sched_job); - struct dma_fence *fence; - unsigned int i; - - /* Explicit fences */ - for (i = 0; i < job->in_fence_count; i++) { - if (job->in_fences[i]) { - fence = job->in_fences[i]; - job->in_fences[i] = NULL; - return fence; - } - }
- /* Implicit fences, max. one per BO */ - for (i = 0; i < job->bo_count; i++) { - if (job->implicit_fences[i]) { - fence = job->implicit_fences[i]; - job->implicit_fences[i] = NULL; - return fence; - } - } + if (!xa_empty(&job->deps)) + return xa_erase(&job->deps, job->last_dep++);
return NULL; } diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h index bbd3ba97ff67..82306a03b57e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.h +++ b/drivers/gpu/drm/panfrost/panfrost_job.h @@ -19,9 +19,9 @@ struct panfrost_job { struct panfrost_device *pfdev; struct panfrost_file_priv *file_priv;
- /* Optional fences userspace can pass in for the job to depend on. */ - struct dma_fence **in_fences; - u32 in_fence_count; + /* Contains both explicit and implicit fences */ + struct xarray deps; + unsigned long last_dep;
/* Fence to be signaled by IRQ handler when the job is complete. */ struct dma_fence *done_fence; @@ -30,8 +30,6 @@ struct panfrost_job { __u32 requirements; __u32 flush_id;
- /* Exclusive fences we have taken from the BOs to wait for */ - struct dma_fence **implicit_fences; struct panfrost_gem_mapping **mappings; struct drm_gem_object **bos; u32 bo_count;
On Tue, 22 Jun 2021 18:55:01 +0200 Daniel Vetter daniel.vetter@ffwll.ch wrote:
More consistency and prep work for the next patch.
Aside: I wonder whether we shouldn't just move this entire xarray business into the scheduler so that not everyone has to reinvent the same wheels. Cc'ing some scheduler people for this too.
v2: Correctly handle sched_lock since Lucas pointed out it's needed.
v3: Rebase, dma_resv_get_excl_unlocked got renamed
v4: Don't leak job references on failure (Steven).
Hehe, I had pretty much the same patch here [1].
Reviewed-by: Boris Brezillon boris.brezillon@collabora.com
[1] https://patchwork.kernel.org/project/dri-devel/patch/20210311092539.2405596-...
Currently this has no practical relevance I think, because there aren't many who can pull off a setup with panfrost and another gpu in the same system. But the rules are that if you're setting an exclusive fence, indicating a gpu write access in the implicit fencing system, then you need to wait for all fences, not just the previous exclusive fence.
panfrost against itself has no problem, because it always sets the exclusive fence (but that's probably something that will need to be fixed for vulkan and/or multi-engine gpus, or you'll suffer badly). Also no problem with that against display.
With the prep work done to switch over to the dependency helpers this is now a oneliner.
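For reference, roughly what drm_gem_fence_array_add_implicit() does (paraphrased and slightly simplified from drm_gem.c of this era): read access only depends on the exclusive fence, write access depends on all fences attached to the BO.

#include <linux/dma-resv.h>
#include <drm/drm_gem.h>

static int fence_array_add_implicit(struct xarray *fence_array,
				    struct drm_gem_object *obj, bool write)
{
	struct dma_fence **fences;
	unsigned int i, fence_count;
	int ret;

	if (!write) {
		/* read access: only the exclusive (write) fence matters */
		struct dma_fence *fence =
			dma_resv_get_excl_unlocked(obj->resv);

		return drm_gem_fence_array_add(fence_array, fence);
	}

	/* write access: depend on every fence attached to the BO */
	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
	if (ret || !fence_count)
		return ret;

	for (i = 0; i < fence_count; i++) {
		ret = drm_gem_fence_array_add(fence_array, fences[i]);
		if (ret)
			break;
	}

	for (; i < fence_count; i++)
		dma_fence_put(fences[i]);
	kfree(fences);

	return ret;
}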
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Rob Herring robh@kernel.org
Cc: Tomeu Vizoso tomeu.vizoso@collabora.com
Cc: Steven Price steven.price@arm.com
Cc: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/gpu/drm/panfrost/panfrost_job.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71cd43fa1b36..ef004d587dc4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -203,9 +203,8 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
 	int i, ret;

 	for (i = 0; i < bo_count; i++) {
-		struct dma_fence *fence = dma_resv_get_excl_unlocked(bos[i]->resv);
-
-		ret = drm_gem_fence_array_add(deps, fence);
+		/* panfrost always uses write mode in its current uapi */
+		ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
 		if (ret)
 			return ret;
 	}
On Tue, 22 Jun 2021 18:55:02 +0200 Daniel Vetter daniel.vetter@ffwll.ch wrote:
Currently this has no practial relevance I think because there's not many who can pull off a setup with panfrost and another gpu in the same system. But the rules are that if you're setting an exclusive fence, indicating a gpu write access in the implicit fencing system, then you need to wait for all fences, not just the previous exclusive fence.
panfrost against itself has no problem, because it always sets the exclusive fence (but that's probably something that will need to be fixed for vulkan and/or multi-engine gpus, or you'll suffer badly). Also no problem with that against display.
With the prep work done to switch over to the dependency helpers this is now a oneliner.
Reviewed-by: Boris Brezillon boris.brezillon@collabora.com
On Wed, Jun 23, 2021 at 06:47:37PM +0200, Boris Brezillon wrote:
On Tue, 22 Jun 2021 18:55:02 +0200 Daniel Vetter daniel.vetter@ffwll.ch wrote:
Currently this has no practial relevance I think because there's not many who can pull off a setup with panfrost and another gpu in the same system. But the rules are that if you're setting an exclusive fence, indicating a gpu write access in the implicit fencing system, then you need to wait for all fences, not just the previous exclusive fence.
panfrost against itself has no problem, because it always sets the exclusive fence (but that's probably something that will need to be fixed for vulkan and/or multi-engine gpus, or you'll suffer badly). Also no problem with that against display.
With the prep work done to switch over to the dependency helpers this is now a oneliner.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Rob Herring robh@kernel.org Cc: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: Steven Price steven.price@arm.com Cc: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Cc: Sumit Semwal sumit.semwal@linaro.org
Reviewed-by: Boris Brezillon boris.brezillon@collabora.com
Pushed the 3 panfrost patches to drm-misc-next, thanks for reviewing them. -Daniel
Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/gpu/drm/panfrost/panfrost_job.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 71cd43fa1b36..ef004d587dc4 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -203,9 +203,8 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos, int i, ret;
for (i = 0; i < bo_count; i++) {
struct dma_fence *fence = dma_resv_get_excl_unlocked(bos[i]->resv);
ret = drm_gem_fence_array_add(deps, fence);
/* panfrost always uses write mode in its current uapi */
if (ret) return ret; }ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
There's a bunch of atomic drivers that don't do this (set up implicit fencing in their prepare_fb hook) quite correctly; luckily most of them aren't in wide use, or people would have noticed the tearing.
By making this the default we avoid the constant audit pain and can additionally remove a ton of lines from vfuncs for a bit more clarity in smaller drivers.
While at it complain if there's a cleanup_fb hook but no prepare_fb hook, because that makes no sense. I haven't found any driver which violates this, but better safe than sorry.
Subsequent patches will reap the benefits.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Cc: Maxime Ripard mripard@kernel.org
Cc: Thomas Zimmermann tzimmermann@suse.de
Cc: David Airlie airlied@linux.ie
Cc: Daniel Vetter daniel@ffwll.ch
---
drivers/gpu/drm/drm_atomic_helper.c | 10 ++++++++++
drivers/gpu/drm/drm_gem_atomic_helper.c | 3 +++
include/drm/drm_modeset_helper_vtables.h | 7 +++++--
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 531f2374b072..9f6c5f21c4d6 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -35,6 +35,7 @@ #include <drm/drm_damage_helper.h> #include <drm/drm_device.h> #include <drm/drm_drv.h> +#include <drm/drm_gem_atomic_helper.h> #include <drm/drm_plane_helper.h> #include <drm/drm_print.h> #include <drm/drm_self_refresh_helper.h> @@ -2408,6 +2409,15 @@ int drm_atomic_helper_prepare_planes(struct drm_device *dev, ret = funcs->prepare_fb(plane, new_plane_state); if (ret) goto fail; + } else { + WARN_ON_ONCE(funcs->cleanup_fb); + + if (!drm_core_check_feature(dev, DRIVER_GEM)) + continue; + + ret = drm_gem_plane_helper_prepare_fb(plane, new_plane_state); + if (ret) + goto fail; } }
diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c index a27135084ae5..bc9396f2a0ed 100644 --- a/drivers/gpu/drm/drm_gem_atomic_helper.c +++ b/drivers/gpu/drm/drm_gem_atomic_helper.c @@ -135,6 +135,9 @@ * GEM based framebuffer drivers which have their buffers always pinned in * memory. * + * This function is the default implementation for GEM drivers of + * &drm_plane_helper_funcs.prepare_fb if no callback is provided. + * * See drm_atomic_set_fence_for_plane() for a discussion of implicit and * explicit fencing in atomic modeset updates. */ diff --git a/include/drm/drm_modeset_helper_vtables.h b/include/drm/drm_modeset_helper_vtables.h index f3a4b47b3986..4e727261dca5 100644 --- a/include/drm/drm_modeset_helper_vtables.h +++ b/include/drm/drm_modeset_helper_vtables.h @@ -1178,8 +1178,11 @@ struct drm_plane_helper_funcs { * equivalent functionality should be implemented through private * members in the plane structure. * - * Drivers which always have their buffers pinned should use - * drm_gem_plane_helper_prepare_fb() for this hook. + * For GEM drivers who neither have a @prepare_fb not @cleanup_fb hook + * set drm_gem_plane_helper_prepare_fb() is called automatically to + * implement this. Other drivers which need additional plane processing + * can call drm_gem_plane_helper_prepare_fb() from their @prepare_fb + * hook. * * The helpers will call @cleanup_fb with matching arguments for every * successful call to this hook.
Hi Daniel,
On Tue, Jun 22, 2021 at 06:55:03PM +0200, Daniel Vetter wrote:
There's a bunch of atomic drivers who don't do this quite correctly, luckily most of them aren't in wide use or people would have noticed the tearing.
By making this the default we avoid the constant audit pain and can additionally remove a ton of lines from vfuncs for a bit more clarity in smaller drivers.
While at it complain if there's a cleanup_fb hook but no prepare_fb hook, because that makes no sense. I haven't found any driver which violates this, but better safe than sorry.
Subsequent patches will reap the benefits.
diff --git a/include/drm/drm_modeset_helper_vtables.h b/include/drm/drm_modeset_helper_vtables.h
index f3a4b47b3986..4e727261dca5 100644
--- a/include/drm/drm_modeset_helper_vtables.h
+++ b/include/drm/drm_modeset_helper_vtables.h
@@ -1178,8 +1178,11 @@ struct drm_plane_helper_funcs {
 	 * equivalent functionality should be implemented through private
 	 * members in the plane structure.
 	 *
-	 * Drivers which always have their buffers pinned should use
-	 * drm_gem_plane_helper_prepare_fb() for this hook.
+	 * For GEM drivers who neither have a @prepare_fb not @cleanup_fb hook

s/not/nor/ ??

+	 * set drm_gem_plane_helper_prepare_fb() is called automatically to

^add comma?

+	 * implement this.

Leave cleanup_fb out of the description to make it more readable. In the description of cleanup_fb you can document that it is wrong to have it without a matching prepare_fb if you feel for it.

Sam
On Tue, Jun 22, 2021 at 9:10 PM Sam Ravnborg sam@ravnborg.org wrote:
Hi Daniel,
On Tue, Jun 22, 2021 at 06:55:03PM +0200, Daniel Vetter wrote:
There's a bunch of atomic drivers who don't do this quite correctly, luckily most of them aren't in wide use or people would have noticed the tearing.
By making this the default we avoid the constant audit pain and can additionally remove a ton of lines from vfuncs for a bit more clarity in smaller drivers.
While at it complain if there's a cleanup_fb hook but no prepare_fb hook, because that makes no sense. I haven't found any driver which violates this, but better safe than sorry.
Subsequent patches will reap the benefits.
diff --git a/include/drm/drm_modeset_helper_vtables.h b/include/drm/drm_modeset_helper_vtables.h
@@ -1178,8 +1178,11 @@ struct drm_plane_helper_funcs {
-	 * Drivers which always have their buffers pinned should use
-	 * drm_gem_plane_helper_prepare_fb() for this hook.
+	 * For GEM drivers who neither have a @prepare_fb not @cleanup_fb hook

s/not/nor/ ??

Yup.

+	 * set drm_gem_plane_helper_prepare_fb() is called automatically to

^add comma?

+	 * implement this.

Leave cleanup_fb out of the description to make it more readable.

With the not->nor typo fixed, why does this make it more readable? Afaiui neither ... nor ... is fairly standard English, and I really want to make this the default only if you specify absolutely no plane fb handling of your own.

In the description of cleanup_fb you can document that it is wrong to have it without a matching prepare_fb if you feel for it.

So the reason I didn't document things that way is that imo the "cleanup_fb but not prepare_fb" case is just nonsense. But I also didn't want to accidentally paper over bugs where people set only cleanup_fb and forget to hook up the other one, hence the warning. But if you think we should explain that in docs, I guess I can shuffle it around. Just feel like specifying everything in the comments doesn't help the readability of the docs.
-Daniel
Hi Daniel,
With the not->nor typo fixed, why does this make it more readable? Afaiui neither ... nor ... is fairly standard English, and I really want to make this the default only if you specify absolutely no plane fb handling of your own.
What I tried to suggest was like this:
" Drivers which always have their buffers pinned should use drm_gem_plane_helper_prepare_fb() for this hook. For GEM drivers who do not have a @prepare_fb hook set, drm_gem_plane_helper_prepare_fb() is called automatically to implement this. "
But anyway this is fine, and with the typo fixed:
Acked-by: Sam Ravnborg sam@ravnborg.org
Sam
There's a bunch of atomic drivers who don't do this quite correctly, luckily most of them aren't in wide use or people would have noticed the tearing.
By making this the default we avoid the constant audit pain and can additionally remove a ton of lines from vfuncs for a bit more clarity in smaller drivers.
While at it complain if there's a cleanup_fb hook but no prepare_fb hook, because that makes no sense. I haven't found any driver which violates this, but better safe than sorry.
Subsequent patches will reap the benefits.
v2: It's neither ... nor, not not (Sam)
Acked-by: Sam Ravnborg sam@ravnborg.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Cc: Maxime Ripard mripard@kernel.org
Cc: Thomas Zimmermann tzimmermann@suse.de
Cc: David Airlie airlied@linux.ie
Cc: Daniel Vetter daniel@ffwll.ch
---
drivers/gpu/drm/drm_atomic_helper.c | 10 ++++++++++
drivers/gpu/drm/drm_gem_atomic_helper.c | 3 +++
include/drm/drm_modeset_helper_vtables.h | 7 +++++--
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 531f2374b072..9f6c5f21c4d6 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -35,6 +35,7 @@ #include <drm/drm_damage_helper.h> #include <drm/drm_device.h> #include <drm/drm_drv.h> +#include <drm/drm_gem_atomic_helper.h> #include <drm/drm_plane_helper.h> #include <drm/drm_print.h> #include <drm/drm_self_refresh_helper.h> @@ -2408,6 +2409,15 @@ int drm_atomic_helper_prepare_planes(struct drm_device *dev, ret = funcs->prepare_fb(plane, new_plane_state); if (ret) goto fail; + } else { + WARN_ON_ONCE(funcs->cleanup_fb); + + if (!drm_core_check_feature(dev, DRIVER_GEM)) + continue; + + ret = drm_gem_plane_helper_prepare_fb(plane, new_plane_state); + if (ret) + goto fail; } }
diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c index a27135084ae5..bc9396f2a0ed 100644 --- a/drivers/gpu/drm/drm_gem_atomic_helper.c +++ b/drivers/gpu/drm/drm_gem_atomic_helper.c @@ -135,6 +135,9 @@ * GEM based framebuffer drivers which have their buffers always pinned in * memory. * + * This function is the default implementation for GEM drivers of + * &drm_plane_helper_funcs.prepare_fb if no callback is provided. + * * See drm_atomic_set_fence_for_plane() for a discussion of implicit and * explicit fencing in atomic modeset updates. */ diff --git a/include/drm/drm_modeset_helper_vtables.h b/include/drm/drm_modeset_helper_vtables.h index f3a4b47b3986..fdfa9f37ce05 100644 --- a/include/drm/drm_modeset_helper_vtables.h +++ b/include/drm/drm_modeset_helper_vtables.h @@ -1178,8 +1178,11 @@ struct drm_plane_helper_funcs { * equivalent functionality should be implemented through private * members in the plane structure. * - * Drivers which always have their buffers pinned should use - * drm_gem_plane_helper_prepare_fb() for this hook. + * For GEM drivers who neither have a @prepare_fb nor @cleanup_fb hook + * set drm_gem_plane_helper_prepare_fb() is called automatically to + * implement this. Other drivers which need additional plane processing + * can call drm_gem_plane_helper_prepare_fb() from their @prepare_fb + * hook. * * The helpers will call @cleanup_fb with matching arguments for every * successful call to this hook.
No need to set it explicitly.
Acked-by: Heiko Stuebner heiko@sntech.de
Acked-by: Paul Cercueil paul@crapouillou.net
Acked-by: Jernej Skrabec jernej.skrabec@gmail.com
Acked-by: Chun-Kuang Hu chunkuang.hu@kernel.org
Acked-by: Martin Blumenstingl martin.blumenstingl@googlemail.com
Acked-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
Acked-by: Philippe Cornu philippe.cornu@foss.st.com
Acked-by: Lucas Stach l.stach@pengutronix.de
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Cc: Laurentiu Palcu laurentiu.palcu@oss.nxp.com
Cc: Lucas Stach l.stach@pengutronix.de
Cc: Shawn Guo shawnguo@kernel.org
Cc: Sascha Hauer s.hauer@pengutronix.de
Cc: Pengutronix Kernel Team kernel@pengutronix.de
Cc: Fabio Estevam festevam@gmail.com
Cc: NXP Linux Team linux-imx@nxp.com
Cc: Philipp Zabel p.zabel@pengutronix.de
Cc: Paul Cercueil paul@crapouillou.net
Cc: Chun-Kuang Hu chunkuang.hu@kernel.org
Cc: Matthias Brugger matthias.bgg@gmail.com
Cc: Neil Armstrong narmstrong@baylibre.com
Cc: Kevin Hilman khilman@baylibre.com
Cc: Jerome Brunet jbrunet@baylibre.com
Cc: Martin Blumenstingl martin.blumenstingl@googlemail.com
Cc: Marek Vasut marex@denx.de
Cc: Stefan Agner stefan@agner.ch
Cc: Sandy Huang hjc@rock-chips.com
Cc: "Heiko Stübner" heiko@sntech.de
Cc: Yannick Fertre yannick.fertre@foss.st.com
Cc: Philippe Cornu philippe.cornu@foss.st.com
Cc: Benjamin Gaignard benjamin.gaignard@linaro.org
Cc: Maxime Coquelin mcoquelin.stm32@gmail.com
Cc: Alexandre Torgue alexandre.torgue@foss.st.com
Cc: Maxime Ripard mripard@kernel.org
Cc: Chen-Yu Tsai wens@csie.org
Cc: Jernej Skrabec jernej.skrabec@gmail.com
Cc: Jyri Sarha jyri.sarha@iki.fi
Cc: Tomi Valkeinen tomba@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-amlogic@lists.infradead.org
Cc: linux-rockchip@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-sunxi@lists.linux.dev
---
drivers/gpu/drm/imx/dcss/dcss-plane.c | 1 -
drivers/gpu/drm/imx/ipuv3-plane.c | 1 -
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 1 -
drivers/gpu/drm/ingenic/ingenic-ipu.c | 1 -
drivers/gpu/drm/mediatek/mtk_drm_plane.c | 1 -
drivers/gpu/drm/meson/meson_overlay.c | 1 -
drivers/gpu/drm/meson/meson_plane.c | 1 -
drivers/gpu/drm/mxsfb/mxsfb_kms.c | 2 --
drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 1 -
drivers/gpu/drm/stm/ltdc.c | 1 -
drivers/gpu/drm/sun4i/sun4i_layer.c | 1 -
drivers/gpu/drm/sun4i/sun8i_ui_layer.c | 1 -
drivers/gpu/drm/sun4i/sun8i_vi_layer.c | 1 -
drivers/gpu/drm/tidss/tidss_plane.c | 1 -
14 files changed, 15 deletions(-)
diff --git a/drivers/gpu/drm/imx/dcss/dcss-plane.c b/drivers/gpu/drm/imx/dcss/dcss-plane.c index 044d3bdf313c..ac45d54acd4e 100644 --- a/drivers/gpu/drm/imx/dcss/dcss-plane.c +++ b/drivers/gpu/drm/imx/dcss/dcss-plane.c @@ -361,7 +361,6 @@ static void dcss_plane_atomic_disable(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs dcss_plane_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = dcss_plane_atomic_check, .atomic_update = dcss_plane_atomic_update, .atomic_disable = dcss_plane_atomic_disable, diff --git a/drivers/gpu/drm/imx/ipuv3-plane.c b/drivers/gpu/drm/imx/ipuv3-plane.c index 8710f55d2579..ef114b6aa691 100644 --- a/drivers/gpu/drm/imx/ipuv3-plane.c +++ b/drivers/gpu/drm/imx/ipuv3-plane.c @@ -772,7 +772,6 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs ipu_plane_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = ipu_plane_atomic_check, .atomic_disable = ipu_plane_atomic_disable, .atomic_update = ipu_plane_atomic_update, diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 5244f4763477..c296472164d9 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -830,7 +830,6 @@ static const struct drm_plane_helper_funcs ingenic_drm_plane_helper_funcs = { .atomic_update = ingenic_drm_plane_atomic_update, .atomic_check = ingenic_drm_plane_atomic_check, .atomic_disable = ingenic_drm_plane_atomic_disable, - .prepare_fb = drm_gem_plane_helper_prepare_fb, };
static const struct drm_crtc_helper_funcs ingenic_drm_crtc_helper_funcs = { diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 61b6d9fdbba1..aeb8a757d213 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -625,7 +625,6 @@ static const struct drm_plane_helper_funcs ingenic_ipu_plane_helper_funcs = { .atomic_update = ingenic_ipu_plane_atomic_update, .atomic_check = ingenic_ipu_plane_atomic_check, .atomic_disable = ingenic_ipu_plane_atomic_disable, - .prepare_fb = drm_gem_plane_helper_prepare_fb, };
static int diff --git a/drivers/gpu/drm/mediatek/mtk_drm_plane.c b/drivers/gpu/drm/mediatek/mtk_drm_plane.c index b5582dcf564c..1667a7e7de38 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_plane.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_plane.c @@ -227,7 +227,6 @@ static void mtk_plane_atomic_update(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs mtk_plane_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = mtk_plane_atomic_check, .atomic_update = mtk_plane_atomic_update, .atomic_disable = mtk_plane_atomic_disable, diff --git a/drivers/gpu/drm/meson/meson_overlay.c b/drivers/gpu/drm/meson/meson_overlay.c index ed063152aecd..dfef8afcc245 100644 --- a/drivers/gpu/drm/meson/meson_overlay.c +++ b/drivers/gpu/drm/meson/meson_overlay.c @@ -747,7 +747,6 @@ static const struct drm_plane_helper_funcs meson_overlay_helper_funcs = { .atomic_check = meson_overlay_atomic_check, .atomic_disable = meson_overlay_atomic_disable, .atomic_update = meson_overlay_atomic_update, - .prepare_fb = drm_gem_plane_helper_prepare_fb, };
static bool meson_overlay_format_mod_supported(struct drm_plane *plane, diff --git a/drivers/gpu/drm/meson/meson_plane.c b/drivers/gpu/drm/meson/meson_plane.c index a18510dae4c8..8640a8a8a469 100644 --- a/drivers/gpu/drm/meson/meson_plane.c +++ b/drivers/gpu/drm/meson/meson_plane.c @@ -422,7 +422,6 @@ static const struct drm_plane_helper_funcs meson_plane_helper_funcs = { .atomic_check = meson_plane_atomic_check, .atomic_disable = meson_plane_atomic_disable, .atomic_update = meson_plane_atomic_update, - .prepare_fb = drm_gem_plane_helper_prepare_fb, };
static bool meson_plane_format_mod_supported(struct drm_plane *plane, diff --git a/drivers/gpu/drm/mxsfb/mxsfb_kms.c b/drivers/gpu/drm/mxsfb/mxsfb_kms.c index 300e7bab0f43..8797c671d0d5 100644 --- a/drivers/gpu/drm/mxsfb/mxsfb_kms.c +++ b/drivers/gpu/drm/mxsfb/mxsfb_kms.c @@ -500,13 +500,11 @@ static bool mxsfb_format_mod_supported(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs mxsfb_plane_primary_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = mxsfb_plane_atomic_check, .atomic_update = mxsfb_plane_primary_atomic_update, };
static const struct drm_plane_helper_funcs mxsfb_plane_overlay_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = mxsfb_plane_atomic_check, .atomic_update = mxsfb_plane_overlay_atomic_update, }; diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c index f5b9028a16a3..ba9e14da41b4 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c @@ -1110,7 +1110,6 @@ static const struct drm_plane_helper_funcs plane_helper_funcs = { .atomic_disable = vop_plane_atomic_disable, .atomic_async_check = vop_plane_atomic_async_check, .atomic_async_update = vop_plane_atomic_async_update, - .prepare_fb = drm_gem_plane_helper_prepare_fb, };
static const struct drm_plane_funcs vop_plane_funcs = { diff --git a/drivers/gpu/drm/stm/ltdc.c b/drivers/gpu/drm/stm/ltdc.c index 08b71248044d..0a6f0239a9f8 100644 --- a/drivers/gpu/drm/stm/ltdc.c +++ b/drivers/gpu/drm/stm/ltdc.c @@ -947,7 +947,6 @@ static const struct drm_plane_funcs ltdc_plane_funcs = { };
static const struct drm_plane_helper_funcs ltdc_plane_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = ltdc_plane_atomic_check, .atomic_update = ltdc_plane_atomic_update, .atomic_disable = ltdc_plane_atomic_disable, diff --git a/drivers/gpu/drm/sun4i/sun4i_layer.c b/drivers/gpu/drm/sun4i/sun4i_layer.c index 11771bdd6e7c..929e95f86b5b 100644 --- a/drivers/gpu/drm/sun4i/sun4i_layer.c +++ b/drivers/gpu/drm/sun4i/sun4i_layer.c @@ -127,7 +127,6 @@ static bool sun4i_layer_format_mod_supported(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs sun4i_backend_layer_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_disable = sun4i_backend_layer_atomic_disable, .atomic_update = sun4i_backend_layer_atomic_update, }; diff --git a/drivers/gpu/drm/sun4i/sun8i_ui_layer.c b/drivers/gpu/drm/sun4i/sun8i_ui_layer.c index e779855bcd6e..7845c2a53a7f 100644 --- a/drivers/gpu/drm/sun4i/sun8i_ui_layer.c +++ b/drivers/gpu/drm/sun4i/sun8i_ui_layer.c @@ -332,7 +332,6 @@ static void sun8i_ui_layer_atomic_update(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs sun8i_ui_layer_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = sun8i_ui_layer_atomic_check, .atomic_disable = sun8i_ui_layer_atomic_disable, .atomic_update = sun8i_ui_layer_atomic_update, diff --git a/drivers/gpu/drm/sun4i/sun8i_vi_layer.c b/drivers/gpu/drm/sun4i/sun8i_vi_layer.c index 1c86c2dd0bbf..bb7c43036dfa 100644 --- a/drivers/gpu/drm/sun4i/sun8i_vi_layer.c +++ b/drivers/gpu/drm/sun4i/sun8i_vi_layer.c @@ -436,7 +436,6 @@ static void sun8i_vi_layer_atomic_update(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs sun8i_vi_layer_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = sun8i_vi_layer_atomic_check, .atomic_disable = sun8i_vi_layer_atomic_disable, .atomic_update = sun8i_vi_layer_atomic_update, diff --git a/drivers/gpu/drm/tidss/tidss_plane.c b/drivers/gpu/drm/tidss/tidss_plane.c index 1acd15aa4193..217415ec8eea 100644 --- a/drivers/gpu/drm/tidss/tidss_plane.c +++ b/drivers/gpu/drm/tidss/tidss_plane.c @@ -158,7 +158,6 @@ static void drm_plane_destroy(struct drm_plane *plane) }
static const struct drm_plane_helper_funcs tidss_plane_helper_funcs = { - .prepare_fb = drm_gem_plane_helper_prepare_fb, .atomic_check = tidss_plane_atomic_check, .atomic_update = tidss_plane_atomic_update, .atomic_disable = tidss_plane_atomic_disable,
On Tue, 2021-06-22 at 18:55 +0200, Daniel Vetter wrote:
No need to set it explicitly.
[...]
drivers/gpu/drm/imx/ipuv3-plane.c | 1 -
14 files changed, 15 deletions(-)
[...]
diff --git a/drivers/gpu/drm/imx/ipuv3-plane.c b/drivers/gpu/drm/imx/ipuv3-plane.c index 8710f55d2579..ef114b6aa691 100644 --- a/drivers/gpu/drm/imx/ipuv3-plane.c +++ b/drivers/gpu/drm/imx/ipuv3-plane.c @@ -772,7 +772,6 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs ipu_plane_helper_funcs = {
-	.prepare_fb = drm_gem_plane_helper_prepare_fb,
 	.atomic_check = ipu_plane_atomic_check,
 	.atomic_disable = ipu_plane_atomic_disable,
 	.atomic_update = ipu_plane_atomic_update,
Acked-by: Philipp Zabel p.zabel@pengutronix.de
regards Philipp
All they do is refcount the fb, which the atomic helpers already do.
This was necessary with the legacy helpers and I guess it just carried over in the conversion. drm_plane_state always holds a full reference on its ->fb pointer during its entire lifetime, see __drm_atomic_helper_plane_destroy_state().
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Russell King linux@armlinux.org.uk --- drivers/gpu/drm/armada/armada_overlay.c | 2 -- drivers/gpu/drm/armada/armada_plane.c | 29 ------------------------- drivers/gpu/drm/armada/armada_plane.h | 2 -- 3 files changed, 33 deletions(-)
diff --git a/drivers/gpu/drm/armada/armada_overlay.c b/drivers/gpu/drm/armada/armada_overlay.c index d3e3e5fdc390..424250535fed 100644 --- a/drivers/gpu/drm/armada/armada_overlay.c +++ b/drivers/gpu/drm/armada/armada_overlay.c @@ -247,8 +247,6 @@ static void armada_drm_overlay_plane_atomic_disable(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs armada_overlay_plane_helper_funcs = { - .prepare_fb = armada_drm_plane_prepare_fb, - .cleanup_fb = armada_drm_plane_cleanup_fb, .atomic_check = armada_drm_plane_atomic_check, .atomic_update = armada_drm_overlay_plane_atomic_update, .atomic_disable = armada_drm_overlay_plane_atomic_disable, diff --git a/drivers/gpu/drm/armada/armada_plane.c b/drivers/gpu/drm/armada/armada_plane.c index 40f5c34fb4d8..1c56a2883b91 100644 --- a/drivers/gpu/drm/armada/armada_plane.c +++ b/drivers/gpu/drm/armada/armada_plane.c @@ -78,33 +78,6 @@ void armada_drm_plane_calc(struct drm_plane_state *state, u32 addrs[2][3], } }
-int armada_drm_plane_prepare_fb(struct drm_plane *plane, - struct drm_plane_state *state) -{ - DRM_DEBUG_KMS("[PLANE:%d:%s] [FB:%d]\n", - plane->base.id, plane->name, - state->fb ? state->fb->base.id : 0); - - /* - * Take a reference on the new framebuffer - we want to - * hold on to it while the hardware is displaying it. - */ - if (state->fb) - drm_framebuffer_get(state->fb); - return 0; -} - -void armada_drm_plane_cleanup_fb(struct drm_plane *plane, - struct drm_plane_state *old_state) -{ - DRM_DEBUG_KMS("[PLANE:%d:%s] [FB:%d]\n", - plane->base.id, plane->name, - old_state->fb ? old_state->fb->base.id : 0); - - if (old_state->fb) - drm_framebuffer_put(old_state->fb); -} - int armada_drm_plane_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state) { @@ -282,8 +255,6 @@ static void armada_drm_primary_plane_atomic_disable(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs armada_primary_plane_helper_funcs = { - .prepare_fb = armada_drm_plane_prepare_fb, - .cleanup_fb = armada_drm_plane_cleanup_fb, .atomic_check = armada_drm_plane_atomic_check, .atomic_update = armada_drm_primary_plane_atomic_update, .atomic_disable = armada_drm_primary_plane_atomic_disable, diff --git a/drivers/gpu/drm/armada/armada_plane.h b/drivers/gpu/drm/armada/armada_plane.h index 51dab8d8da22..368415c609a6 100644 --- a/drivers/gpu/drm/armada/armada_plane.h +++ b/drivers/gpu/drm/armada/armada_plane.h @@ -21,8 +21,6 @@ struct armada_plane_state {
void armada_drm_plane_calc(struct drm_plane_state *state, u32 addrs[2][3], u16 pitches[3], bool interlaced); -int armada_drm_plane_prepare_fb(struct drm_plane *plane, - struct drm_plane_state *state); void armada_drm_plane_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *old_state); int armada_drm_plane_atomic_check(struct drm_plane *plane,
On Tue, Jun 22, 2021 at 06:55:05PM +0200, Daniel Vetter wrote:
All they do is refcount the fb, which the atomic helpers already do.
This was necessary with the legacy helpers and I guess just carried over in the conversion. drm_plane_state always holds a full reference for its ->fb pointer during its entire lifetime, see __drm_atomic_helper_plane_destroy_state()
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Russell King linux@armlinux.org.uk
Acked-by: Maxime Ripard maxime@cerno.tech
Maxime
Like we have for the shadow helpers too, and roll it out to drivers.
Acked-by: Tian Tao tiantao6@hisilicon.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Dave Airlie airlied@redhat.com Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Hans de Goede hdegoede@redhat.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Tian Tao tiantao6@hisilicon.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com --- drivers/gpu/drm/ast/ast_mode.c | 3 +-- drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_de.c | 3 +-- drivers/gpu/drm/vboxvideo/vbox_mode.c | 3 +-- include/drm/drm_gem_vram_helper.h | 12 ++++++++++++ 4 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/ast/ast_mode.c b/drivers/gpu/drm/ast/ast_mode.c index e5996ae03c49..f5d58c3088fe 100644 --- a/drivers/gpu/drm/ast/ast_mode.c +++ b/drivers/gpu/drm/ast/ast_mode.c @@ -612,8 +612,7 @@ ast_primary_plane_helper_atomic_disable(struct drm_plane *plane, }
static const struct drm_plane_helper_funcs ast_primary_plane_helper_funcs = { - .prepare_fb = drm_gem_vram_plane_helper_prepare_fb, - .cleanup_fb = drm_gem_vram_plane_helper_cleanup_fb, + DRM_GEM_VRAM_PLANE_HELPER_FUNCS, .atomic_check = ast_primary_plane_helper_atomic_check, .atomic_update = ast_primary_plane_helper_atomic_update, .atomic_disable = ast_primary_plane_helper_atomic_disable, diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_de.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_de.c index 29b8332b2bca..ccf80e369b4b 100644 --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_de.c +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_de.c @@ -158,8 +158,7 @@ static const struct drm_plane_funcs hibmc_plane_funcs = { };
static const struct drm_plane_helper_funcs hibmc_plane_helper_funcs = { - .prepare_fb = drm_gem_vram_plane_helper_prepare_fb, - .cleanup_fb = drm_gem_vram_plane_helper_cleanup_fb, + DRM_GEM_VRAM_PLANE_HELPER_FUNCS, .atomic_check = hibmc_plane_atomic_check, .atomic_update = hibmc_plane_atomic_update, }; diff --git a/drivers/gpu/drm/vboxvideo/vbox_mode.c b/drivers/gpu/drm/vboxvideo/vbox_mode.c index 964381d55fc1..972c83b720aa 100644 --- a/drivers/gpu/drm/vboxvideo/vbox_mode.c +++ b/drivers/gpu/drm/vboxvideo/vbox_mode.c @@ -488,8 +488,7 @@ static const struct drm_plane_helper_funcs vbox_primary_helper_funcs = { .atomic_check = vbox_primary_atomic_check, .atomic_update = vbox_primary_atomic_update, .atomic_disable = vbox_primary_atomic_disable, - .prepare_fb = drm_gem_vram_plane_helper_prepare_fb, - .cleanup_fb = drm_gem_vram_plane_helper_cleanup_fb, + DRM_GEM_VRAM_PLANE_HELPER_FUNCS, };
static const struct drm_plane_funcs vbox_primary_plane_funcs = { diff --git a/include/drm/drm_gem_vram_helper.h b/include/drm/drm_gem_vram_helper.h index 27ed7e9243b9..f48d181c824b 100644 --- a/include/drm/drm_gem_vram_helper.h +++ b/include/drm/drm_gem_vram_helper.h @@ -124,6 +124,18 @@ void drm_gem_vram_plane_helper_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *old_state);
+/** + * DRM_GEM_VRAM_PLANE_HELPER_FUNCS - + * Initializes struct drm_plane_helper_funcs for VRAM handling + * + * Drivers may use GEM BOs as VRAM helpers for the framebuffer memory. This + * macro initializes struct drm_plane_helper_funcs to use the respective helper + * functions. + */ +#define DRM_GEM_VRAM_PLANE_HELPER_FUNCS \ + .prepare_fb = drm_gem_vram_plane_helper_prepare_fb, \ + .cleanup_fb = drm_gem_vram_plane_helper_cleanup_fb + /* * Helpers for struct drm_simple_display_pipe_funcs */
On 22.06.21 at 18:55, Daniel Vetter wrote:
Like we have for the shadow helpers too, and roll it out to drivers.
[...]
Acked-by: Thomas Zimmermann tzimmermann@suse.de
Hi
On 22.06.21 at 18:55, Daniel Vetter wrote:
Like we have for the shadow helpers too, and roll it out to drivers.
In addition to the plane-helper macro, you may also want to add DRM_GEM_VRAM_SIMPLE_DISPLAY_PIPE_FUNCS and use it in bochs.
Best regards Thomas
On Thu, Jun 24, 2021 at 09:46:20AM +0200, Thomas Zimmermann wrote:
Hi
On 22.06.21 at 18:55, Daniel Vetter wrote:
Like we have for the shadow helpers too, and roll it out to drivers.
In addition to the plane-helper macro, you may also want to add DRM_GEM_VRAM_SIMPLE_DISPLAY_PIPE_FUNCS and use it in bochs.
Hm I guess we can do that when we have a 2nd such case. I was more aiming to make it as frictionless as possible for drivers to end up with a prepare_fb implementation which fishes out the implicit fences as needed in this series.
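Something like this, I'd guess (completely untested sketch, just mirroring the plane-helper macro with the existing simple-pipe vram helpers):

    #define DRM_GEM_VRAM_SIMPLE_DISPLAY_PIPE_FUNCS \
            .prepare_fb = drm_gem_vram_simple_display_pipe_prepare_fb, \
            .cleanup_fb = drm_gem_vram_simple_display_pipe_cleanup_fb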
Thanks for looking at this patch, I'm merging them all to drm-misc-next now. -Daniel
I guess no one ever tried running omap together with lima or panfrost, not even sure that's possible. Anyway for consistency, fix this.
Reviewed-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Tomi Valkeinen tomba@kernel.org --- drivers/gpu/drm/omapdrm/omap_plane.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/omapdrm/omap_plane.c b/drivers/gpu/drm/omapdrm/omap_plane.c index 801da917507d..512af976b7e9 100644 --- a/drivers/gpu/drm/omapdrm/omap_plane.c +++ b/drivers/gpu/drm/omapdrm/omap_plane.c @@ -6,6 +6,7 @@
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> +#include <drm/drm_gem_atomic_helper.h> #include <drm/drm_plane_helper.h>
#include "omap_dmm_tiler.h" @@ -29,6 +30,8 @@ static int omap_plane_prepare_fb(struct drm_plane *plane, if (!new_state->fb) return 0;
+ drm_gem_plane_helper_prepare_fb(plane, new_state); + return omap_framebuffer_pin(new_state->fb); }
It's tedious to review this all the time, and my audit showed that arcpgu actually forgot to set this.
Make this the default and stop worrying.
Again I sprinkled WARN_ON_ONCE on top to make sure we don't have strange combinations of hooks: cleanup_fb without prepare_fb doesn't make sense, and since simpler drivers are all new they better be GEM based drivers.
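For drivers which do need their own hook the intended pattern is to chain up to the helper, roughly like this (sketch only, foo_pin() standing in for whatever driver-specific work is needed):

    static int foo_pipe_prepare_fb(struct drm_simple_display_pipe *pipe,
                                   struct drm_plane_state *plane_state)
    {
            int ret;

            /* driver-specific prep work first, foo_pin() is made up */
            ret = foo_pin(plane_state->fb);
            if (ret)
                    return ret;

            /* still pick up the implicit fences from the GEM helper */
            return drm_gem_simple_display_pipe_prepare_fb(pipe, plane_state);
    }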
v2: Warn and bail when it's _not_ a GEM driver (Noralf)
Cc: Noralf Trønnes noralf@tronnes.org Acked-by: Noralf Trønnes noralf@tronnes.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/drm_simple_kms_helper.c | 12 ++++++++++-- include/drm/drm_simple_kms_helper.h | 7 +++++-- 2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/drm_simple_kms_helper.c b/drivers/gpu/drm/drm_simple_kms_helper.c index 0b095a313c44..735f4f34bcc4 100644 --- a/drivers/gpu/drm/drm_simple_kms_helper.c +++ b/drivers/gpu/drm/drm_simple_kms_helper.c @@ -9,6 +9,8 @@ #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> +#include <drm/drm_drv.h> +#include <drm/drm_gem_atomic_helper.h> #include <drm/drm_managed.h> #include <drm/drm_plane_helper.h> #include <drm/drm_probe_helper.h> @@ -225,8 +227,14 @@ static int drm_simple_kms_plane_prepare_fb(struct drm_plane *plane, struct drm_simple_display_pipe *pipe;
pipe = container_of(plane, struct drm_simple_display_pipe, plane); - if (!pipe->funcs || !pipe->funcs->prepare_fb) - return 0; + if (!pipe->funcs || !pipe->funcs->prepare_fb) { + if (WARN_ON_ONCE(!drm_core_check_feature(plane->dev, DRIVER_GEM))) + return 0; + + WARN_ON_ONCE(pipe->funcs && pipe->funcs->cleanup_fb); + + return drm_gem_simple_display_pipe_prepare_fb(pipe, state); + }
return pipe->funcs->prepare_fb(pipe, state); } diff --git a/include/drm/drm_simple_kms_helper.h b/include/drm/drm_simple_kms_helper.h index ef9944e9c5fc..363a9a8c3587 100644 --- a/include/drm/drm_simple_kms_helper.h +++ b/include/drm/drm_simple_kms_helper.h @@ -116,8 +116,11 @@ struct drm_simple_display_pipe_funcs { * the documentation for the &drm_plane_helper_funcs.prepare_fb hook for * more details. * - * Drivers which always have their buffers pinned should use - * drm_gem_simple_display_pipe_prepare_fb() for this hook. + * For GEM drivers who neither have a @prepare_fb not @cleanup_fb hook + * set drm_gem_simple_display_pipe_prepare_fb() is called automatically + * to implement this. Other drivers which need additional plane + * processing can call drm_gem_simple_display_pipe_prepare_fb() from + * their @prepare_fb hook. */ int (*prepare_fb)(struct drm_simple_display_pipe *pipe, struct drm_plane_state *plane_state);
Hi Daniel,
On Tue, Jun 22, 2021 at 06:55:08PM +0200, Daniel Vetter wrote:
It's tedious to review this all the time, and my audit showed that arcpgu actually forgot to set this.
Make this the default and stop worrying.
Again I sprinkled WARN_ON_ONCE on top to make sure we don't have strange combinations of hooks: cleanup_fb without prepare_fb doesn't make sense, and since simpler drivers are all new they better be GEM based drivers.
v2: Warn and bail when it's _not_ a GEM driver (Noralf)
Cc: Noralf Trønnes noralf@tronnes.org Acked-by: Noralf Trønnes noralf@tronnes.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch
[...]
- * Drivers which always have their buffers pinned should use
- * drm_gem_simple_display_pipe_prepare_fb() for this hook.
+ * For GEM drivers who neither have a @prepare_fb not @cleanup_fb hook
+ * set drm_gem_simple_display_pipe_prepare_fb() is called automatically
+ * to implement this.
Same comments as before.
Sam
It's tedious to review this all the time, and my audit showed that arcpgu actually forgot to set this.
Make this the default and stop worrying.
Again I sprinkled WARN_ON_ONCE on top to make sure we don't have strange combinations of hooks: cleanup_fb without prepare_fb doesn't make sense, and since simpler drivers are all new they better be GEM based drivers.
v2: Warn and bail when it's _not_ a GEM driver (Noralf)
v3: It's neither ... nor, not not (Sam)
Cc: Sam Ravnborg sam@ravnborg.org Cc: Noralf Trønnes noralf@tronnes.org Acked-by: Noralf Trønnes noralf@tronnes.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/drm_simple_kms_helper.c | 12 ++++++++++-- include/drm/drm_simple_kms_helper.h | 7 +++++-- 2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/drm_simple_kms_helper.c b/drivers/gpu/drm/drm_simple_kms_helper.c index 0b095a313c44..735f4f34bcc4 100644 --- a/drivers/gpu/drm/drm_simple_kms_helper.c +++ b/drivers/gpu/drm/drm_simple_kms_helper.c @@ -9,6 +9,8 @@ #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> +#include <drm/drm_drv.h> +#include <drm/drm_gem_atomic_helper.h> #include <drm/drm_managed.h> #include <drm/drm_plane_helper.h> #include <drm/drm_probe_helper.h> @@ -225,8 +227,14 @@ static int drm_simple_kms_plane_prepare_fb(struct drm_plane *plane, struct drm_simple_display_pipe *pipe;
pipe = container_of(plane, struct drm_simple_display_pipe, plane); - if (!pipe->funcs || !pipe->funcs->prepare_fb) - return 0; + if (!pipe->funcs || !pipe->funcs->prepare_fb) { + if (WARN_ON_ONCE(!drm_core_check_feature(plane->dev, DRIVER_GEM))) + return 0; + + WARN_ON_ONCE(pipe->funcs && pipe->funcs->cleanup_fb); + + return drm_gem_simple_display_pipe_prepare_fb(pipe, state); + }
return pipe->funcs->prepare_fb(pipe, state); } diff --git a/include/drm/drm_simple_kms_helper.h b/include/drm/drm_simple_kms_helper.h index ef9944e9c5fc..cf07132d4ee8 100644 --- a/include/drm/drm_simple_kms_helper.h +++ b/include/drm/drm_simple_kms_helper.h @@ -116,8 +116,11 @@ struct drm_simple_display_pipe_funcs { * the documentation for the &drm_plane_helper_funcs.prepare_fb hook for * more details. * - * Drivers which always have their buffers pinned should use - * drm_gem_simple_display_pipe_prepare_fb() for this hook. + * For GEM drivers who neither have a @prepare_fb nor @cleanup_fb hook + * set drm_gem_simple_display_pipe_prepare_fb() is called automatically + * to implement this. Other drivers which need additional plane + * processing can call drm_gem_simple_display_pipe_prepare_fb() from + * their @prepare_fb hook. */ int (*prepare_fb)(struct drm_simple_display_pipe *pipe, struct drm_plane_state *plane_state);
Hi Daniel, looks good.
On Wed, Jun 23, 2021 at 06:24:56PM +0200, Daniel Vetter wrote:
It's tedious to review this all the time, and my audit showed that arcpgu actually forgot to set this.
Make this the default and stop worrying.
Again I sprinkled WARN_ON_ONCE on top to make sure we don't have strange combinations of hooks: cleanup_fb without prepare_fb doesn't make sense, and since simpler drivers are all new they better be GEM based drivers.
v2: Warn and bail when it's _not_ a GEM driver (Noralf)
v3: It's neither ... nor, not not (Sam)
Cc: Sam Ravnborg sam@ravnborg.org Cc: Noralf Trønnes noralf@tronnes.org Acked-by: Noralf Trønnes noralf@tronnes.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch
Acked-by: Sam Ravnborg sam@ravnborg.org
Goes through all the drivers and deletes the default hook since it's the default now.
Acked-by: David Lechner david@lechnology.com Acked-by: Noralf Trønnes noralf@tronnes.org Acked-by: Oleksandr Andrushchenko oleksandr_andrushchenko@epam.com Acked-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Joel Stanley joel@jms.id.au Cc: Andrew Jeffery andrew@aj.id.au Cc: "Noralf Trønnes" noralf@tronnes.org Cc: Linus Walleij linus.walleij@linaro.org Cc: Emma Anholt emma@anholt.net Cc: David Lechner david@lechnology.com Cc: Kamlesh Gurudasani kamlesh.gurudasani@gmail.com Cc: Oleksandr Andrushchenko oleksandr_andrushchenko@epam.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Sam Ravnborg sam@ravnborg.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: linux-aspeed@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: xen-devel@lists.xenproject.org --- drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c | 1 - drivers/gpu/drm/gud/gud_drv.c | 1 - drivers/gpu/drm/mcde/mcde_display.c | 1 - drivers/gpu/drm/pl111/pl111_display.c | 1 - drivers/gpu/drm/tiny/hx8357d.c | 1 - drivers/gpu/drm/tiny/ili9225.c | 1 - drivers/gpu/drm/tiny/ili9341.c | 1 - drivers/gpu/drm/tiny/ili9486.c | 1 - drivers/gpu/drm/tiny/mi0283qt.c | 1 - drivers/gpu/drm/tiny/repaper.c | 1 - drivers/gpu/drm/tiny/st7586.c | 1 - drivers/gpu/drm/tiny/st7735r.c | 1 - drivers/gpu/drm/tve200/tve200_display.c | 1 - drivers/gpu/drm/xen/xen_drm_front_kms.c | 1 - 14 files changed, 14 deletions(-)
diff --git a/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c b/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c index 098f96d4d50d..827e62c1daba 100644 --- a/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c +++ b/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c @@ -220,7 +220,6 @@ static const struct drm_simple_display_pipe_funcs aspeed_gfx_funcs = { .enable = aspeed_gfx_pipe_enable, .disable = aspeed_gfx_pipe_disable, .update = aspeed_gfx_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, .enable_vblank = aspeed_gfx_enable_vblank, .disable_vblank = aspeed_gfx_disable_vblank, }; diff --git a/drivers/gpu/drm/gud/gud_drv.c b/drivers/gpu/drm/gud/gud_drv.c index e8b672dc9832..1925df9c0fb7 100644 --- a/drivers/gpu/drm/gud/gud_drv.c +++ b/drivers/gpu/drm/gud/gud_drv.c @@ -364,7 +364,6 @@ static void gud_debugfs_init(struct drm_minor *minor) static const struct drm_simple_display_pipe_funcs gud_pipe_funcs = { .check = gud_pipe_check, .update = gud_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_mode_config_funcs gud_mode_config_funcs = { diff --git a/drivers/gpu/drm/mcde/mcde_display.c b/drivers/gpu/drm/mcde/mcde_display.c index 4ddc55d58f38..ce12a36e2db4 100644 --- a/drivers/gpu/drm/mcde/mcde_display.c +++ b/drivers/gpu/drm/mcde/mcde_display.c @@ -1479,7 +1479,6 @@ static struct drm_simple_display_pipe_funcs mcde_display_funcs = { .update = mcde_display_update, .enable_vblank = mcde_display_enable_vblank, .disable_vblank = mcde_display_disable_vblank, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
int mcde_display_init(struct drm_device *drm) diff --git a/drivers/gpu/drm/pl111/pl111_display.c b/drivers/gpu/drm/pl111/pl111_display.c index 6fd7f13f1aca..b5a8859739a2 100644 --- a/drivers/gpu/drm/pl111/pl111_display.c +++ b/drivers/gpu/drm/pl111/pl111_display.c @@ -440,7 +440,6 @@ static struct drm_simple_display_pipe_funcs pl111_display_funcs = { .enable = pl111_display_enable, .disable = pl111_display_disable, .update = pl111_display_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static int pl111_clk_div_choose_div(struct clk_hw *hw, unsigned long rate, diff --git a/drivers/gpu/drm/tiny/hx8357d.c b/drivers/gpu/drm/tiny/hx8357d.c index da5df93450de..9b33c05732aa 100644 --- a/drivers/gpu/drm/tiny/hx8357d.c +++ b/drivers/gpu/drm/tiny/hx8357d.c @@ -184,7 +184,6 @@ static const struct drm_simple_display_pipe_funcs hx8357d_pipe_funcs = { .enable = yx240qv29_enable, .disable = mipi_dbi_pipe_disable, .update = mipi_dbi_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_display_mode yx350hv15_mode = { diff --git a/drivers/gpu/drm/tiny/ili9225.c b/drivers/gpu/drm/tiny/ili9225.c index 69265d8a3beb..976d3209f164 100644 --- a/drivers/gpu/drm/tiny/ili9225.c +++ b/drivers/gpu/drm/tiny/ili9225.c @@ -328,7 +328,6 @@ static const struct drm_simple_display_pipe_funcs ili9225_pipe_funcs = { .enable = ili9225_pipe_enable, .disable = ili9225_pipe_disable, .update = ili9225_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_display_mode ili9225_mode = { diff --git a/drivers/gpu/drm/tiny/ili9341.c b/drivers/gpu/drm/tiny/ili9341.c index ad9ce7b4f76f..37e0c33399c8 100644 --- a/drivers/gpu/drm/tiny/ili9341.c +++ b/drivers/gpu/drm/tiny/ili9341.c @@ -140,7 +140,6 @@ static const struct drm_simple_display_pipe_funcs ili9341_pipe_funcs = { .enable = yx240qv29_enable, .disable = mipi_dbi_pipe_disable, .update = mipi_dbi_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_display_mode yx240qv29_mode = { diff --git a/drivers/gpu/drm/tiny/ili9486.c b/drivers/gpu/drm/tiny/ili9486.c index 75aa1476c66c..e9a63f4b2993 100644 --- a/drivers/gpu/drm/tiny/ili9486.c +++ b/drivers/gpu/drm/tiny/ili9486.c @@ -153,7 +153,6 @@ static const struct drm_simple_display_pipe_funcs waveshare_pipe_funcs = { .enable = waveshare_enable, .disable = mipi_dbi_pipe_disable, .update = mipi_dbi_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_display_mode waveshare_mode = { diff --git a/drivers/gpu/drm/tiny/mi0283qt.c b/drivers/gpu/drm/tiny/mi0283qt.c index 82fd1ad3413f..023de49e7a8e 100644 --- a/drivers/gpu/drm/tiny/mi0283qt.c +++ b/drivers/gpu/drm/tiny/mi0283qt.c @@ -144,7 +144,6 @@ static const struct drm_simple_display_pipe_funcs mi0283qt_pipe_funcs = { .enable = mi0283qt_enable, .disable = mipi_dbi_pipe_disable, .update = mipi_dbi_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_display_mode mi0283qt_mode = { diff --git a/drivers/gpu/drm/tiny/repaper.c b/drivers/gpu/drm/tiny/repaper.c index 2cee07a2e00b..007d9d59f01c 100644 --- a/drivers/gpu/drm/tiny/repaper.c +++ b/drivers/gpu/drm/tiny/repaper.c @@ -861,7 +861,6 @@ static const struct drm_simple_display_pipe_funcs repaper_pipe_funcs = { .enable = repaper_pipe_enable, .disable = repaper_pipe_disable, .update = repaper_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static int repaper_connector_get_modes(struct drm_connector *connector) diff --git a/drivers/gpu/drm/tiny/st7586.c b/drivers/gpu/drm/tiny/st7586.c index 05db980cc047..1be55bed609a 100644 --- a/drivers/gpu/drm/tiny/st7586.c +++ b/drivers/gpu/drm/tiny/st7586.c @@ -268,7 +268,6 @@ static const struct drm_simple_display_pipe_funcs st7586_pipe_funcs = { .enable = st7586_pipe_enable, .disable = st7586_pipe_disable, .update = st7586_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct drm_display_mode st7586_mode = { diff --git a/drivers/gpu/drm/tiny/st7735r.c b/drivers/gpu/drm/tiny/st7735r.c index ec9dc817a2cc..122320db5d38 100644 --- a/drivers/gpu/drm/tiny/st7735r.c +++ b/drivers/gpu/drm/tiny/st7735r.c @@ -136,7 +136,6 @@ static const struct drm_simple_display_pipe_funcs st7735r_pipe_funcs = { .enable = st7735r_pipe_enable, .disable = mipi_dbi_pipe_disable, .update = mipi_dbi_pipe_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, };
static const struct st7735r_cfg jd_t18003_t01_cfg = { diff --git a/drivers/gpu/drm/tve200/tve200_display.c b/drivers/gpu/drm/tve200/tve200_display.c index 50e1fb71869f..17b8c8dd169d 100644 --- a/drivers/gpu/drm/tve200/tve200_display.c +++ b/drivers/gpu/drm/tve200/tve200_display.c @@ -316,7 +316,6 @@ static const struct drm_simple_display_pipe_funcs tve200_display_funcs = { .enable = tve200_display_enable, .disable = tve200_display_disable, .update = tve200_display_update, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, .enable_vblank = tve200_display_enable_vblank, .disable_vblank = tve200_display_disable_vblank, }; diff --git a/drivers/gpu/drm/xen/xen_drm_front_kms.c b/drivers/gpu/drm/xen/xen_drm_front_kms.c index 371202ebe900..cfda74490765 100644 --- a/drivers/gpu/drm/xen/xen_drm_front_kms.c +++ b/drivers/gpu/drm/xen/xen_drm_front_kms.c @@ -302,7 +302,6 @@ static const struct drm_simple_display_pipe_funcs display_funcs = { .mode_valid = display_mode_valid, .enable = display_enable, .disable = display_disable, - .prepare_fb = drm_gem_simple_display_pipe_prepare_fb, .check = display_check, .update = display_update, };
Spotted while trying to convert panfrost to these.
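To spell out the calling convention this documents: callers hand in their reference and must not drop it themselves, not even on failure. Minimal sketch, with job->deps being an xarray like in the panfrost patches:

    struct dma_fence *fence = dma_fence_get(in_fence);
    int ret;

    ret = drm_gem_fence_array_add(&job->deps, fence);
    if (ret)
            return ret; /* no dma_fence_put() here, the reference was consumed */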
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: "Christian König" christian.koenig@amd.com Cc: Lucas Stach l.stach@pengutronix.de Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/drm_gem.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index ba2e64ed8b47..68deb1de8235 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1302,6 +1302,9 @@ EXPORT_SYMBOL(drm_gem_unlock_reservations); * @fence_array: array of dma_fence * for the job to block on. * @fence: the dma_fence to add to the list of dependencies. * + * This function consumes the reference for @fence both on success and error + * cases. + * * Returns: * 0 on success, or an error on failing to expand the array. */
On 22.06.21 at 18:55, Daniel Vetter wrote:
Spotted while trying to convert panfrost to these.
[...]
+ * This function consumes the reference for @fence both on success and error cases.
Oh, the latter is a bit ugly I think. But good to know.
Reviewed-by: Christian König christian.koenig@amd.com
On Wed, Jun 23, 2021 at 10:42:50AM +0200, Christian König wrote:
On 22.06.21 at 18:55, Daniel Vetter wrote:
Spotted while trying to convert panfrost to these.
[...]
Oh, the latter is a bit ugly I think. But good to know.
Reviewed-by: Christian König christian.koenig@amd.com
Merged to drm-misc-next, thanks for taking a look. Can you perhaps take a look at the drm/armada patch too, then I think I have reviews/acks for all of them?
Thanks, Daniel
On 24.06.21 at 14:41, Daniel Vetter wrote:
On Wed, Jun 23, 2021 at 10:42:50AM +0200, Christian König wrote:
[...]
Merged to drm-misc-next, thanks for taking a look. Can you perhaps take a look at the drm/armada patch too, then I think I have reviews/acks for all of them?
What are you talking about? I only see drm/armada patches for the irq stuff Thomas is working on.
Christian.
On Thu, Jun 24, 2021 at 02:48:54PM +0200, Christian König wrote:
On 24.06.21 at 14:41, Daniel Vetter wrote:
[...]
Merged to drm-misc-next, thanks for taking a look. Can you perhaps take a look at the drm/armada patch too, then I think I have reviews/acks for all of them?
What are you talking about? I only see drm/armada patches for the irq stuff Thomas is working on.
There was one in this series, but Maxime was quicker. I'm going to apply all the remaining ones now. After that I'll send out a patch set to add some dependency tracking to drm_sched_job so that there's not so much copypasta going on there. I stumbled over that when reviewing how we handle dependencies. -Daniel
On 24.06.21 at 15:32, Daniel Vetter wrote:
[...]
There was one in this series, but Maxime was quicker. I'm going to apply all the remaining ones now. After that I'll send out a patch set to add some dependency tracking to drm_sched_job so that there's not so much copypasta going on there. I stumbled over that when reviewing how we handle dependencies.
Do you mean a common container for dma_fence objects a drm_sched_job depends on?
Thanks, Christian.
On Thu, Jun 24, 2021 at 03:35:19PM +0200, Christian König wrote:
[...]
Do you mean a common container for dma_fence objects a drm_sched_job depends on?
Yup. Well the real usefulness is the interfaces, so that you can just grep for those when trying to figure out htf a driver handles its implicit dependencies. And amdgpu is unfortunately going to be a bit in the cold because it's special (but should be able to benefit too, just more than 1-2 patches to convert it over).
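Roughly this kind of interface (sketch only, names entirely subject to bikeshedding):

    int drm_sched_job_await_fence(struct drm_sched_job *job,
                                  struct dma_fence *fence);

    /* pulls in the implicit fences from a dma_resv as dependencies */
    int drm_sched_job_await_implicit(struct drm_sched_job *job,
                                     struct drm_gem_object *obj,
                                     bool write);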
Anyway I'm going to type the cover letter rsn. -Daniel
On 24.06.21 at 15:41, Daniel Vetter wrote:
On Thu, Jun 24, 2021 at 03:35:19PM +0200, Christian König wrote:
[...]
Do you mean a common container for dma_fence objects a drm_sched_job depends on?
Yup. Well the real usefulness is the interfaces, so that you can just grep for those when trying to figure out htf a driver handles its implicit dependencies. And amdgpu is unfortunately going to be a bit in the cold because it's special (but should be able to benefit too, just more than 1-2 patches to convert it over).
I had that on the TODO list for quite a while as well.
Essentially extracting what the dma_resv_list object is doing into a new object (but maybe without RCU).
Christian.
WARNING: Absolutely untested beyond "gcc isn't dying in agony".
Implicit fencing done properly needs to treat the implicit fencing slots like a funny kind of IPC mailbox. In other words it needs to be explicitly managed. This is the only way it will mesh well with explicit fencing userspace like vk, and it's also the bare minimum required to be able to manage anything else that wants to use the same buffer on multiple engines in parallel, and still be able to share it through implicit sync.
amdgpu completely lacks such a uapi. Fix this.
Luckily the concept of ignoring implicit fences exists already, and takes care of all the complexities of making sure that non-optional fences (like bo moves) are not ignored. This support was added in
commit 177ae09b5d699a5ebd1cafcee78889db968abf54 Author: Andres Rodriguez andresx7@gmail.com Date: Fri Sep 15 20:44:06 2017 -0400
drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2
Unfortunately it's the wrong semantics, because it's a bo flag and disables implicit sync on an allocated buffer completely.
We _do_ want implicit sync, but control it explicitly. For this we need a flag on the drm_file, so that a given userspace (like vulkan) can manage the implicit sync slots explicitly. The other side of the pipeline (compositor, other process or just different stage in a media pipeline in the same process) can then either do the same, or fully participate in the implicit sync as implemented by the kernel by default.
By building on the existing flag for buffers we avoid opening up any additional security concerns - anything this new flag here allows is already possible.
All drivers which support this concept of a userspace-specific opt-out of implicit sync have a flag in their CS ioctl, but in reality that turned out to be a bit too inflexible. See the discussion below, let's try to do a bit better for amdgpu.
This alone only allows us to completely avoid any stalls due to implicit sync; it does not yet allow us to use implicit sync as a strange form of IPC for sync_file.
For that we need two more pieces:
- a way to get the current implicit sync fences out of a buffer. Could be done in a driver ioctl, but everyone needs this, and generally a dma-buf is involved anyway to establish the sharing. So an ioctl on the dma-buf makes a ton more sense:
https://lore.kernel.org/dri-devel/20210520190007.534046-4-jason@jlekstrand.n...
Current upstream drivers solve this by having the opt-out flag on their CS ioctl. This has the downside that very often the CS which must actually stall for the implicit fence runs a while after the implicit fence point was logically sampled per the api spec (vk passes an explicit syncobj around for that afaiui), and so results in oversync. Converting the implicit sync fences into a snapshot sync_file is actually accurate.
- Similarly we need to be able to set the exclusive implicit fence. Current drivers again do this with a CS ioctl flag, again with the same problem that by the time the CS happens additional dependencies have been added. An explicit ioctl to only insert a sync_file (while respecting the rules for how exclusive and shared fence slots must be updated in struct dma_resv) is much better. This is proposed here:
https://lore.kernel.org/dri-devel/20210520190007.534046-5-jason@jlekstrand.n...
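Taken together, userspace would handle an implicitly synced winsys buffer roughly like this (sketch; struct and ioctl names follow Jason's proposals and might still change):

    /* at the logical fence point: snapshot the current implicit fences */
    struct dma_buf_export_sync_file export = { .flags = DMA_BUF_SYNC_READ };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &export);
    /* export.fd is now a sync_file, usable as an explicit dependency */

    /* once rendering completes: publish the fence for the next consumer */
    struct dma_buf_import_sync_file import = {
            .flags = DMA_BUF_SYNC_WRITE,
            .fd = render_done_sync_file_fd,
    };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import);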
These three pieces together allow userspace to fully control implicit fencing and remove all unnecessary stall points due to it.
Well, as much as the implicit fencing model fundamentally allows: there is only one set of fences, and you can choose to sync against only writers (exclusive slot) or against everyone. Hence suballocating multiple buffers or anything else like this is fundamentally not possible, and can only be fixed by a proper explicit fencing model.
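For reference, that model boils down to this (struct simplified, RCU annotations and locking dropped):

    struct dma_resv {
            struct dma_fence *fence_excl;  /* one writer, everyone syncs against it */
            struct dma_resv_list *fence;   /* shared slots, e.g. readers */
    };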
Aside from that caveat, this model gets implicit fencing as close to explicit fencing semantics as possible.
On the actual implementation I opted for a simple setparam ioctl, no locking (just atomic reads/writes) for simplicity. There is a nice flag parameter in the VM ioctl which we could use, except:
- it's not checked, so userspace likely passes garbage
- there's already a comment that userspace _does_ pass garbage in the priority field
So yeah unfortunately this flag parameter for setting vm flags is useless, and we need to hack up a new one.
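Userspace then opts out once at context init, roughly (sketch, using the uapi added below):

    struct drm_amdgpu_setparam setparam = {
            .param = AMDGPU_SETPARAM_NO_IMPLICIT_SYNC,
            .value = 1,
    };
    int ret = drmCommandWrite(fd, DRM_AMDGPU_SETPARAM,
                              &setparam, sizeof(setparam));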
v2: Explain why a new SETPARAM (Jason)
v3: Bas noticed I forgot to hook up the dependency-side shortcut. We need both, or this doesn't do much.
v4: Rebase over the amdgpu patch to always set the implicit sync fences.
Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Cc: Dave Airlie airlied@gmail.com Cc: Rob Clark robdclark@chromium.org Cc: Kristian H. Kristensen hoegsberg@google.com Cc: Michel Dänzer michel@daenzer.net Cc: Daniel Stone daniels@collabora.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Deepak R Varma mh12gx2825@gmail.com Cc: Chen Li chenli@uniontech.com Cc: Kevin Wang kevin1.wang@amd.com Cc: Dennis Li Dennis.Li@amd.com Cc: Luben Tuikov luben.tuikov@amd.com Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 +++++-- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 ++++++ include/uapi/drm/amdgpu_drm.h | 10 ++++++++++ 4 files changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 65df34c17264..c5386d13eb4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	struct amdgpu_bo *gds;
 	struct amdgpu_bo *gws;
 	struct amdgpu_bo *oa;
+	bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
 	int r;
 
 	INIT_LIST_HEAD(&p->validated);
@@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 
 		e->bo_va = amdgpu_vm_bo_find(vm, bo);
 
-		if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
+		if (bo->tbo.base.dma_buf &&
+		    !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) {
 			e->chain = dma_fence_chain_alloc();
 			if (!e->chain) {
 				r = -ENOMEM;
@@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
 	struct amdgpu_bo_list_entry *e;
+	bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
 	int r;
 
 	list_for_each_entry(e, &p->validated, tv.head) {
@@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 		struct dma_resv *resv = bo->tbo.base.resv;
 		enum amdgpu_sync_mode sync_mode;
 
-		sync_mode = amdgpu_bo_explicit_sync(bo) ?
+		sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ?
 			AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
 		r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode,
 				     &fpriv->vm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c080ba15ae77..f982626b5328 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv)
 	return 0;
 }
 
+int amdgpu_setparam_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *filp)
+{
+	struct drm_amdgpu_setparam *setparam = data;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+
+	switch (setparam->param) {
+	case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC:
+		if (setparam->value)
+			WRITE_ONCE(fpriv->vm.no_implicit_sync, true);
+		else
+			WRITE_ONCE(fpriv->vm.no_implicit_sync, false);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
@@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ddb85a85cbba..0e8c440c6303 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -321,6 +321,12 @@ struct amdgpu_vm {
 	bool			bulk_moveable;
 	/* Flag to indicate if VM is used for compute */
 	bool			is_compute_context;
+	/*
+	 * Flag to indicate whether implicit sync should always be skipped on
+	 * this context. We do not care about races at all, userspace is allowed
+	 * to shoot itself with implicit sync to its fullest liking.
+	 */
+	bool no_implicit_sync;
 };
 
 struct amdgpu_vm_manager {
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 0cbd1540aeac..9eae245c14d6 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_SETPARAM		0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_SETPARAM	DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam)
 
 /**
  * DOC: memory domains
@@ -306,6 +308,14 @@ union drm_amdgpu_sched {
 	struct drm_amdgpu_sched_in in;
 };
 
+#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC	1
+
+struct drm_amdgpu_setparam {
+	/* AMDGPU_SETPARAM_* */
+	__u32	param;
+	__u32	value;
+};
+
 /*
  * This is not a reliable API and you should expect it to fail for any
  * number of reasons and have fallback path that do not use userptr to
--
2.32.0.rc2
On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
So I think there is still a case missing in this implementation. Consider these 3 cases:
(format: a->b: b waits on a. Yes, I know arrows are hard)
explicit->explicit: This doesn't wait now, which is good.
implicit->explicit: This doesn't wait now, which is good.
explicit->implicit: This still waits, as the explicit submission still adds shared fences, and most things that set an exclusive fence for implicit sync will hence wait on it.
This is probably good enough for what radv needs now but also sounds like a risk wrt baking in new uapi behavior that we don't want to be the end result.
Within AMDGPU this is probably solvable in two ways:
1) Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences.
2) Have an EXPLICIT fence owner that is used for explicit submissions that is ignored by AMDGPU_SYNC_NE_OWNER.
But this doesn't solve cross-driver interactions here.
On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
Within AMDGPU this is probably solvable in two ways:
- Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences.
I'm not sure that works. I think the right fix is that radeonsi also switches to this model, with maybe a per-bo CS flag to indicate write access, to cut down on the number of ioctls that are needed otherwise on shared buffers. This per-bo flag would essentially select between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis.
The current amdgpu uapi just doesn't allow any other model without an explicit opt-in. So current implicit sync userspace just has to oversync, there's not much choice.
- Have an EXPLICIT fence owner that is used for explicit submissions that is ignored by AMDGPU_SYNC_NE_OWNER.
But this doesn't solve cross-driver interactions here.
Yeah cross-driver is still entirely unsolved, because amdgpu_bo_explicit_sync() on the bo didn't solve that either. -Daniel
On 23.06.21 at 14:18, Daniel Vetter wrote:
I'm not sure that works. I think the right fix is that radeonsi also switches to this model, with maybe a per-bo CS flag to indicate write access, to cut down on the number of ioctls that are needed otherwise on shared buffers. This per-bo flag would essentially select between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis.
Yeah, but I'm still not entirely sure why that approach isn't sufficient?
Problem with the per context or per vm flag is that you then don't get any implicit synchronization any more when another process starts using the buffer.
Yeah cross-driver is still entirely unsolved, because amdgpu_bo_explicit_sync() on the bo didn't solve that either.
Hui? You have lost me. Why is that still unsolved?
Regards, Christian.
On Wed, Jun 23, 2021 at 2:59 PM Christian König <christian.koenig@amd.com> wrote:
Problem with the per context or per vm flag is that you then don't get any implicit synchronization any more when another process starts using the buffer.
That is exactly what I want for Vulkan :)
Hui? You have lost me. Why is that still unsolved?
The part we're trying to solve with this patch is that Vulkan should not participate in any implicit sync at all wrt submissions (and then handle the implicit sync for WSI explicitly using the fence import/export stuff that Jason wrote). As long as we add shared fences to the dma_resv, we still participate in implicit sync (at the level of an implicit sync read), at least from the perspective of later jobs waiting on these fences.
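Spelled out as pseudo-code for the WSI present path, reusing the hypothetical helpers sketched earlier in the thread (all names provisional; get_render_done_sync_file() stands in for however the driver materializes the render-complete fence as a sync_file):

/* once at device init: this drm_file opts out of implicit sync */
amdgpu_opt_out_of_implicit_sync(drm_fd);

/* per vkQueuePresentKHR: hand the render-complete fence over to
 * implicit-sync consumers (compositor, other drivers) by installing
 * it as the exclusive fence of the swapchain image's dma-buf */
int render_done_fd = get_render_done_sync_file(queue);
dmabuf_import_implicit_fence(swapchain_image_dmabuf_fd, render_done_fd);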
On 23.06.21 at 15:38, Bas Nieuwenhuizen wrote:
On Wed, Jun 23, 2021 at 2:59 PM Christian König christian.koenig@amd.com wrote:
Am 23.06.21 um 14:18 schrieb Daniel Vetter:
On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen bas@basnieuwenhuizen.nl wrote:
On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
WARNING: Absolutely untested beyond "gcc isn't dying in agony".
Implicit fencing done properly needs to treat the implicit fencing slots like a funny kind of IPC mailbox. In other words it needs to be explicitly. This is the only way it will mesh well with explicit fencing userspace like vk, and it's also the bare minimum required to be able to manage anything else that wants to use the same buffer on multiple engines in parallel, and still be able to share it through implicit sync.
amdgpu completely lacks such an uapi. Fix this.
Luckily the concept of ignoring implicit fences exists already, and takes care of all the complexities of making sure that non-optional fences (like bo moves) are not ignored. This support was added in
commit 177ae09b5d699a5ebd1cafcee78889db968abf54 Author: Andres Rodriguez andresx7@gmail.com Date: Fri Sep 15 20:44:06 2017 -0400
drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2
Unfortuantely it's the wrong semantics, because it's a bo flag and disables implicit sync on an allocated buffer completely.
We _do_ want implicit sync, but control it explicitly. For this we need a flag on the drm_file, so that a given userspace (like vulkan) can manage the implicit sync slots explicitly. The other side of the pipeline (compositor, other process or just different stage in a media pipeline in the same process) can then either do the same, or fully participate in the implicit sync as implemented by the kernel by default.
By building on the existing flag for buffers we avoid any issues with opening up additional security concerns - anything this new flag here allows is already.
All drivers which supports this concept of a userspace-specific opt-out of implicit sync have a flag in their CS ioctl, but in reality that turned out to be a bit too inflexible. See the discussion below, let's try to do a bit better for amdgpu.
This alone only allows us to completely avoid any stalls due to implicit sync; it does not yet allow us to use implicit sync as a strange form of IPC for sync_file.
For that we need two more pieces:
- a way to get the current implicit sync fences out of a buffer. Could be done in a driver ioctl, but everyone needs this, and generally a dma-buf is involved anyway to establish the sharing. So an ioctl on the dma-buf makes a ton more sense:
  https://lore.kernel.org/...
  Current drivers in upstream solve this by having the opt-out flag on their CS ioctl. This has the downside that very often the CS which must actually stall for the implicit fence is run a while after the implicit fence point was logically sampled per the api spec (vk passes an explicit syncobj around for that afaiui), and so results in oversync. Converting the implicit sync fences into a snapshot sync_file is actually accurate.
- Similarly, we need to be able to set the exclusive implicit fence. Current drivers again do this with a CS ioctl flag, with again the same problem that by the time the CS happens, additional dependencies have been added. An explicit ioctl to only insert a sync_file (while respecting the rules for how exclusive and shared fence slots must be updated in struct dma_resv) is much better. This is proposed here:
  https://lore.kernel.org/...
(a hedged userspace sketch of these two pieces follows the changelog below)
These three pieces together allow userspace to fully control implicit fencing and remove all unnecessary stall points due to them.
Well, as much as the implicit fencing model fundamentally allows: There is only one set of fences, you can only choose to sync against only writers (exclusive slot), or everyone. Hence suballocating multiple buffers or anything else like this is fundamentally not possible, and can only be fixed by a proper explicit fencing model.
Aside from that caveat this model gets implicit fencing as close to explicit fencing semantics as possible:
On the actual implementation I opted for a simple setparam ioctl, no locking (just atomic reads/writes) for simplicity. There is a nice flag parameter in the VM ioctl which we could use, except:
- it's not checked, so userspace likely passes garbage
- there's already a comment that userspace _does_ pass garbage in the priority field
So yeah unfortunately this flag parameter for setting vm flags is useless, and we need to hack up a new one.
v2: Explain why a new SETPARAM (Jason)
v3: Bas noticed I forgot to hook up the dependency-side shortcut. We need both, or this doesn't do much.
v4: Rebase over the amdgpu patch to always set the implicit sync fences.
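For illustration, here is a minimal userspace sketch of the two dma-buf pieces referenced above. It assumes the sync_file export/import ioctls land roughly as proposed in the (truncated) lore links; the struct and ioctl names below follow that proposal and should be read as assumptions, not settled uapi at the time of this thread.

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Piece 1 (assumed uapi): snapshot the buffer's current implicit
 * fences into a sync_file fd, so a later submit can depend on them
 * explicitly instead of oversyncing at CS time. */
static int snapshot_implicit_fences(int dmabuf_fd)
{
	struct dma_buf_export_sync_file args = {
		.flags = DMA_BUF_SYNC_RW,	/* writers and readers */
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
		return -1;
	return args.fd;
}

/* Piece 2 (assumed uapi): install a sync_file as the buffer's
 * exclusive (write) implicit fence, respecting the dma_resv rules. */
static int publish_implicit_fence(int dmabuf_fd, int sync_file_fd)
{
	struct dma_buf_import_sync_file args = {
		.flags = DMA_BUF_SYNC_WRITE,
		.fd = sync_file_fd,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}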
So I think there is still a case missing in this implementation. Consider these 3 cases
(format: a->b: b waits on a. Yes, I know arrows are hard)
explicit->explicit: This doesn't wait now, which is good
implicit->explicit: This doesn't wait now, which is good
explicit->implicit: This still waits as the explicit submission still adds shared fences and most things that set an exclusive fence for implicit sync will hence wait on it.
This is probably good enough for what radv needs now but also sounds like a risk wrt baking in new uapi behavior that we don't want to be the end result.
Within AMDGPU this is probably solvable in two ways:
- Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences.
I'm not sure that works. I think the right fix is that radeonsi also switches to this model, with maybe a per-bo CS flag to indicate write access, to cut down on the number of ioctls that are needed otherwise on shared buffers. This per-bo flag would essentially select between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis.
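For concreteness, the per-buffer selection suggested here could look like the sketch below; the flag name is invented for illustration and is not existing amdgpu uapi.

/* Stand-ins for the kernel's amdgpu_sync_mode values. */
enum sync_mode_sketch { SYNC_EXPLICIT, SYNC_NE_OWNER };

#define BO_FLAG_IMPLICIT_WRITE (1u << 0)	/* hypothetical per-bo CS flag */

/* Buffers the submission writes keep full implicit sync against other
 * owners; everything else only syncs against explicitly passed fences. */
static enum sync_mode_sketch pick_sync_mode(unsigned int bo_cs_flags)
{
	return (bo_cs_flags & BO_FLAG_IMPLICIT_WRITE) ?
		SYNC_NE_OWNER : SYNC_EXPLICIT;
}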
Yeah, but I'm still not entirely sure why that approach isn't sufficient?
Problem with the per context or per vm flag is that you then don't get any implicit synchronization any more when another process starts using the buffer.
That is exactly what I want for Vulkan :)
Yeah, but as far as I know this is not something we can do.
See we have use cases like screen capture and debug which rely on that behavior.
The only thing we can do is to say on a per buffer flag that a buffer should not participate in implicit sync at all.
Regards, Christian.
The current amdgpu uapi just doesn't allow any other model without an explicit opt-in. So current implicit sync userspace just has to oversync, there's not much choice.
- Have an EXPLICIT fence owner that is used for explicit submissions and is ignored by AMDGPU_SYNC_NE_OWNER.
But this doesn't solve cross-driver interactions here.
Yeah cross-driver is still entirely unsolved, because amdgpu_bo_explicit_sync() on the bo didn't solve that either.
Huh? You have lost me. Why is that still unsolved?
The part we're trying to solve with this patch is that Vulkan should not participate in any implicit sync at all wrt submissions (and then handle the implicit sync for WSI explicitly using the fence import/export stuff that Jason wrote). As long as we add shared fences to the dma_resv, we still participate in implicit sync (at the level of an implicit sync read), at least from the perspective of later jobs waiting on these fences.
Regards, Christian.
-Daniel
Cc: mesa-dev@lists.freedesktop.org
Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
Cc: Dave Airlie airlied@gmail.com
Cc: Rob Clark robdclark@chromium.org
Cc: Kristian H. Kristensen hoegsberg@google.com
Cc: Michel Dänzer michel@daenzer.net
Cc: Daniel Stone daniels@collabora.com
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Deepak R Varma mh12gx2825@gmail.com
Cc: Chen Li chenli@uniontech.com
Cc: Kevin Wang kevin1.wang@amd.com
Cc: Dennis Li Dennis.Li@amd.com
Cc: Luben Tuikov luben.tuikov@amd.com
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  7 +++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  6 ++++++
 include/uapi/drm/amdgpu_drm.h           | 10 ++++++++++
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 65df34c17264..c5386d13eb4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	struct amdgpu_bo *gds;
 	struct amdgpu_bo *gws;
 	struct amdgpu_bo *oa;
+	bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
 	int r;

 	INIT_LIST_HEAD(&p->validated);
@@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,

 		e->bo_va = amdgpu_vm_bo_find(vm, bo);

-		if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
+		if (bo->tbo.base.dma_buf &&
+		    !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) {
 			e->chain = dma_fence_chain_alloc();
 			if (!e->chain) {
 				r = -ENOMEM;
@@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
 	struct amdgpu_bo_list_entry *e;
+	bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
 	int r;

 	list_for_each_entry(e, &p->validated, tv.head) {
@@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 		struct dma_resv *resv = bo->tbo.base.resv;
 		enum amdgpu_sync_mode sync_mode;

-		sync_mode = amdgpu_bo_explicit_sync(bo) ?
+		sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ?
 			AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
 		r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode,
 				     &fpriv->vm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c080ba15ae77..f982626b5328 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv)
 	return 0;
 }

+int amdgpu_setparam_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *filp)
+{
+	struct drm_amdgpu_setparam *setparam = data;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+
+	switch (setparam->param) {
+	case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC:
+		if (setparam->value)
+			WRITE_ONCE(fpriv->vm.no_implicit_sync, true);
+		else
+			WRITE_ONCE(fpriv->vm.no_implicit_sync, false);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
@@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };

 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ddb85a85cbba..0e8c440c6303 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -321,6 +321,12 @@ struct amdgpu_vm {
 	bool bulk_moveable;
 	/* Flag to indicate if VM is used for compute */
 	bool is_compute_context;
+	/*
+	 * Flag to indicate whether implicit sync should always be skipped on
+	 * this context. We do not care about races at all, userspace is
+	 * allowed to shoot itself with implicit sync to its fullest liking.
+	 */
+	bool no_implicit_sync;
 };

 struct amdgpu_vm_manager {
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 0cbd1540aeac..9eae245c14d6 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_SETPARAM		0x16

 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_SETPARAM	DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam)

 /**
  * DOC: memory domains
@@ -306,6 +308,14 @@ union drm_amdgpu_sched {
 	struct drm_amdgpu_sched_in in;
 };

+#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC	1
+
+struct drm_amdgpu_setparam {
+	/* AMDGPU_SETPARAM_* */
+	__u32 param;
+	__u32 value;
+};
+
 /*
  * This is not a reliable API and you should expect it to fail for any
  * number of reasons and have fallback path that do not use userptr to
--
2.32.0.rc2
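As a usage note for the RFC above: a userspace driver would flip the per-drm_file flag roughly as in this sketch. It targets the proposed uapi only (the SETPARAM names exist solely in the patch above); drmCommandWrite is libdrm's wrapper for driver-private ioctls.

#include <string.h>
#include <xf86drm.h>
#include "amdgpu_drm.h"	/* with the proposed SETPARAM additions above */

/* Opt this drm_file in or out of kernel-managed implicit sync. */
static int amdgpu_set_no_implicit_sync(int fd, int enable)
{
	struct drm_amdgpu_setparam sp;

	memset(&sp, 0, sizeof(sp));
	sp.param = AMDGPU_SETPARAM_NO_IMPLICIT_SYNC;
	sp.value = enable ? 1 : 0;	/* kernel treats any non-zero value as true */

	return drmCommandWrite(fd, DRM_AMDGPU_SETPARAM, &sp, sizeof(sp));
}

A vulkan driver would presumably call amdgpu_set_no_implicit_sync(fd, 1) once at device creation, before its first CS.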
On Wed, Jun 23, 2021 at 3:44 PM Christian König christian.koenig@amd.com wrote:
[snip]
That is exactly what I want for Vulkan :)
Yeah, but as far as I know this is not something we can do.
See we have use cases like screen capture and debug which rely on that behavior.
They will keep working, if (and only if) the vulkan side sets the winsys fences correctly. Also, everything else in vulkan aside from winsys is explicitly not synced at all, you have to import drm syncobj timeline on the gl side.
The only thing we can do is to say on a per buffer flag that a buffer should not participate in implicit sync at all.
Nah, this doesn't work. Because it's not a global decision, it's a local decision for the renderer. Vulkan wants to control implicit sync explicitly, and the kernel can't force more synchronization. If a buffer is shared as a winsys buffer between a vulkan client and a GL-using compositor, then you _have_ to use implicit sync on it. But vk needs to set the fences directly (and if the app gets it wrong, you get misrendering, but that is the specified behaviour of vulkan). -Daniel
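Tying this back to the fence export/import sketch earlier, here is a hedged outline of the winsys flow described here, reusing those two (assumed, not settled uapi) helpers:

/* From the earlier dma-buf sketch; prototypes repeated for context. */
int snapshot_implicit_fences(int dmabuf_fd);
int publish_implicit_fence(int dmabuf_fd, int sync_file_fd);

/* At present time: vk publishes its render-complete fence by hand, so
 * a GL compositor relying on implicit sync keeps working. */
static int vk_wsi_present_fences(int dmabuf_fd, int render_done_fd)
{
	return publish_implicit_fence(dmabuf_fd, render_done_fd);
}

/* At acquire time: sample whatever fences the compositor left behind
 * and turn them into an explicit dependency for the next submission. */
static int vk_wsi_acquire_fences(int dmabuf_fd)
{
	return snapshot_implicit_fences(dmabuf_fd);
}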
On 23.06.21 at 15:49, Daniel Vetter wrote:
[snip]
The only thing we can do is to say on a per buffer flag that a buffer should not participate in implicit sync at all.
Nah, this doesn't work. Because it's not a global decision, it's a local decision for the renderer. Vulkan wants to control implicit sync explicitly, and the kernel can't force more synchronization. If a buffer is shared as a winsys buffer between a vulkan client and a GL-using compositor, then you _have_ to use implicit sync on it. But vk needs to set the fences directly (and if the app gets it wrong, you get misrendering, but that is the specified behaviour of vulkan).
Yeah, but that's exactly what we tried to avoid.
Mhm, when we attach the flag to the process/VM then this would break the use case of VA-API and Vulkan in the same process.
But I think if you attach the flag to the context that should indeed work fine.
Christian.
-Daniel
Regards, Christian.
The current amdgpu uapi just doesn't allow any other model without an explicit opt-in. So current implicit sync userspace just has to oversync, there's not much choice.
- Have an EXPLICIT fence owner that is used for explicit submissions
that is ignored by AMDGPU_SYNC_NE_OWNER.
But this doesn't solve cross-driver interactions here.
Yeah cross-driver is still entirely unsolved, because amdgpu_bo_explicit_sync() on the bo didn't solve that either.
Hui? You have lost me. Why is that still unsolved?
The part we're trying to solve with this patch is Vulkan should not participate in any implicit sync at all wrt submissions (and then handle the implicit sync for WSI explicitly using the fence import/export stuff that Jason wrote). As long we add shared fences to the dma_resv we participate in implicit sync (at the level of an implicit sync read) still, at least from the perspective of later jobs waiting on these fences.
Regards, Christian.
-Daniel
> Cc: mesa-dev@lists.freedesktop.org > Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl > Cc: Dave Airlie airlied@gmail.com > Cc: Rob Clark robdclark@chromium.org > Cc: Kristian H. Kristensen hoegsberg@google.com > Cc: Michel Dänzer michel@daenzer.net > Cc: Daniel Stone daniels@collabora.com > Cc: Sumit Semwal sumit.semwal@linaro.org > Cc: "Christian König" christian.koenig@amd.com > Cc: Alex Deucher alexander.deucher@amd.com > Cc: Daniel Vetter daniel.vetter@ffwll.ch > Cc: Deepak R Varma mh12gx2825@gmail.com > Cc: Chen Li chenli@uniontech.com > Cc: Kevin Wang kevin1.wang@amd.com > Cc: Dennis Li Dennis.Li@amd.com > Cc: Luben Tuikov luben.tuikov@amd.com > Cc: linaro-mm-sig@lists.linaro.org > Signed-off-by: Daniel Vetter daniel.vetter@intel.com > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 +++++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 ++++++ > include/uapi/drm/amdgpu_drm.h | 10 ++++++++++ > 4 files changed, 42 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index 65df34c17264..c5386d13eb4a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, > struct amdgpu_bo *gds; > struct amdgpu_bo *gws; > struct amdgpu_bo *oa; > + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync); > int r; > > INIT_LIST_HEAD(&p->validated); > @@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, > > e->bo_va = amdgpu_vm_bo_find(vm, bo); > > - if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) { > + if (bo->tbo.base.dma_buf && > + !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) { > e->chain = dma_fence_chain_alloc(); > if (!e->chain) { > r = -ENOMEM; > @@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) > { > struct amdgpu_fpriv *fpriv = p->filp->driver_priv; > struct amdgpu_bo_list_entry *e; > + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync); > int r; > > list_for_each_entry(e, &p->validated, tv.head) { > @@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) > struct dma_resv *resv = bo->tbo.base.resv; > enum amdgpu_sync_mode sync_mode; > > - sync_mode = amdgpu_bo_explicit_sync(bo) ? > + sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ? 
> AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER; > r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode, > &fpriv->vm); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index c080ba15ae77..f982626b5328 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv) > return 0; > } > > +int amdgpu_setparam_ioctl(struct drm_device *dev, void *data, > + struct drm_file *filp) > +{ > + struct drm_amdgpu_setparam *setparam = data; > + struct amdgpu_fpriv *fpriv = filp->driver_priv; > + > + switch (setparam->param) { > + case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC: > + if (setparam->value) > + WRITE_ONCE(fpriv->vm.no_implicit_sync, true); > + else > + WRITE_ONCE(fpriv->vm.no_implicit_sync, false); > + break; > + default: > + return -EINVAL; > + } > + > + return 0; > +} > + > const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > @@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > + DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > }; > > static const struct drm_driver amdgpu_kms_driver = { > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > index ddb85a85cbba..0e8c440c6303 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > @@ -321,6 +321,12 @@ struct amdgpu_vm { > bool bulk_moveable; > /* Flag to indicate if VM is used for compute */ > bool is_compute_context; > + /* > + * Flag to indicate whether implicit sync should always be skipped on > + * this context. We do not care about races at all, userspace is allowed > + * to shoot itself with implicit sync to its fullest liking. 
> + */ > + bool no_implicit_sync; > }; > > struct amdgpu_vm_manager { > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > index 0cbd1540aeac..9eae245c14d6 100644 > --- a/include/uapi/drm/amdgpu_drm.h > +++ b/include/uapi/drm/amdgpu_drm.h > @@ -54,6 +54,7 @@ extern "C" { > #define DRM_AMDGPU_VM 0x13 > #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14 > #define DRM_AMDGPU_SCHED 0x15 > +#define DRM_AMDGPU_SETPARAM 0x16 > > #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create) > #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap) > @@ -71,6 +72,7 @@ extern "C" { > #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm) > #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle) > #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched) > +#define DRM_IOCTL_AMDGPU_SETPARAM DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam) > > /** > * DOC: memory domains > @@ -306,6 +308,14 @@ union drm_amdgpu_sched { > struct drm_amdgpu_sched_in in; > }; > > +#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC 1 > + > +struct drm_amdgpu_setparam { > + /* AMDGPU_SETPARAM_* */ > + __u32 param; > + __u32 value; > +}; > + > /* > * This is not a reliable API and you should expect it to fail for any > * number of reasons and have fallback path that do not use userptr to > -- > 2.32.0.rc2 >
On Wed, Jun 23, 2021 at 4:02 PM Christian König christian.koenig@amd.com wrote:
[snip]
Mhm, when we attach the flag to the process/VM then this would break the use case of VA-API and Vulkan in the same process.
But I think if you attach the flag to the context that should indeed work fine.
Yeah that's a question I have, whether the drm_file is shared within one process among everything, or whether radeonsi/libva/vk each have their own. If each has its own drm_file, then we should be fine, otherwise we need to figure out another place to put this (worst case as a CS extension that vk just sets on every submit).
Also yes this risks that a vk app which was violating the winsys spec will now break, which is why I think we should do this sooner rather than later. Otherwise the list of w/a we might need to apply in vk userspace will become very long :-( At least since this is purely opt-in from userspace, we only need to have the w/a list in userspace, where mesa has the infrastructure for that already. -Daniel
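For reference, "each have their own" would mean each userspace driver (radeonsi, libva, the vk driver) opens the render node itself and so gets a private drm_file; a minimal sketch, with an illustrative device path:

#include <fcntl.h>

/* Each open() of the render node creates a separate drm_file, so the
 * proposed no_implicit_sync flag would stay private to this driver. */
static int open_private_drm_fd(void)
{
	return open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);
}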
On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jun 23, 2021 at 4:02 PM Christian König christian.koenig@amd.com wrote:
Am 23.06.21 um 15:49 schrieb Daniel Vetter:
On Wed, Jun 23, 2021 at 3:44 PM Christian König christian.koenig@amd.com wrote:
Am 23.06.21 um 15:38 schrieb Bas Nieuwenhuizen:
On Wed, Jun 23, 2021 at 2:59 PM Christian König christian.koenig@amd.com wrote:
On 23.06.21 at 14:18, Daniel Vetter wrote: > On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen > bas@basnieuwenhuizen.nl wrote: >> On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter daniel.vetter@ffwll.ch wrote: >>> WARNING: Absolutely untested beyond "gcc isn't dying in agony". >>> >>> Implicit fencing done properly needs to treat the implicit fencing >>> slots like a funny kind of IPC mailbox. In other words it needs to be >>> explicitly managed. This is the only way it will mesh well with explicit >>> fencing userspace like vk, and it's also the bare minimum required to >>> be able to manage anything else that wants to use the same buffer on >>> multiple engines in parallel, and still be able to share it through >>> implicit sync. >>> >>> amdgpu completely lacks such an uapi. Fix this. >>> >>> Luckily the concept of ignoring implicit fences exists already, and >>> takes care of all the complexities of making sure that non-optional >>> fences (like bo moves) are not ignored. This support was added in >>> >>> commit 177ae09b5d699a5ebd1cafcee78889db968abf54 >>> Author: Andres Rodriguez andresx7@gmail.com >>> Date: Fri Sep 15 20:44:06 2017 -0400 >>> >>> drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2 >>> >>> Unfortunately it's the wrong semantics, because it's a bo flag and >>> disables implicit sync on an allocated buffer completely. >>> >>> We _do_ want implicit sync, but control it explicitly. For this we >>> need a flag on the drm_file, so that a given userspace (like vulkan) >>> can manage the implicit sync slots explicitly. The other side of the >>> pipeline (compositor, other process or just different stage in a media >>> pipeline in the same process) can then either do the same, or fully >>> participate in the implicit sync as implemented by the kernel by >>> default. >>> >>> By building on the existing flag for buffers we avoid any issues with >>> opening up additional security concerns - anything this new flag here >>> allows is already allowed. >>> >>> All drivers which support this concept of a userspace-specific >>> opt-out of implicit sync have a flag in their CS ioctl, but in reality >>> that turned out to be a bit too inflexible. See the discussion below, >>> let's try to do a bit better for amdgpu. >>> >>> This alone only allows us to completely avoid any stalls due to >>> implicit sync, it does not yet allow us to use implicit sync as a >>> strange form of IPC for sync_file. >>> >>> For that we need two more pieces: >>> >>> - a way to get the current implicit sync fences out of a buffer. Could >>> be done in a driver ioctl, but everyone needs this, and generally a >>> dma-buf is involved anyway to establish the sharing. So an ioctl on >>> the dma-buf makes a ton more sense: >>> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kerne... >>> >>> Current drivers in upstream solve this by having the opt-out flag >>> on their CS ioctl. This has the downside that very often the CS >>> which must actually stall for the implicit fence is run a while >>> after the implicit fence point was logically sampled per the api >>> spec (vk passes an explicit syncobj around for that afaiui), and so >>> results in oversync. Converting the implicit sync fences into a >>> snapshot sync_file is actually accurate. >>> >>> - Similarly we need to be able to set the exclusive implicit fence. >>> Current drivers again do this with a CS ioctl flag, with again the >>> same problem that by the time the CS happens additional dependencies >>> have been added. An explicit ioctl to only insert a sync_file (while >>> respecting the rules for how exclusive and shared fence slots must >>> be updated in struct dma_resv) is much better. This is proposed here: >>> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kerne... >>> >>> These three pieces together allow userspace to fully control implicit >>> fencing and remove all unnecessary stall points due to them. >>> >>> Well, as much as the implicit fencing model fundamentally allows: >>> There is only one set of fences, you can only choose to sync against >>> only writers (exclusive slot), or everyone. Hence suballocating >>> multiple buffers or anything else like this is fundamentally not >>> possible, and can only be fixed by a proper explicit fencing model. >>> >>> Aside from that caveat this model gets implicit fencing as close to >>> explicit fencing semantics as possible: >>> >>> On the actual implementation I opted for a simple setparam ioctl, no >>> locking (just atomic reads/writes) for simplicity. There is a nice >>> flag parameter in the VM ioctl which we could use, except: >>> - it's not checked, so userspace likely passes garbage >>> - there's already a comment that userspace _does_ pass garbage in the >>> priority field >>> So yeah unfortunately this flag parameter for setting vm flags is >>> useless, and we need to hack up a new one. >>> >>> v2: Explain why a new SETPARAM (Jason) >>> >>> v3: Bas noticed I forgot to hook up the dependency-side shortcut. We >>> need both, or this doesn't do much. >>> >>> v4: Rebase over the amdgpu patch to always set the implicit sync >>> fences. >> So I think there is still a case missing in this implementation. >> Consider these 3 cases: >> >> (format: a->b: b waits on a. Yes, I know arrows are hard) >> >> explicit->explicit: This doesn't wait now, which is good >> Implicit->explicit: This doesn't wait now, which is good >> explicit->implicit: This still waits as the explicit submission still >> adds shared fences and most things that set an exclusive fence for >> implicit sync will hence wait on it. >> >> This is probably good enough for what radv needs now but also sounds >> like a risk wrt baking in new uapi behavior that we don't want to be >> the end result. >> >> Within AMDGPU this is probably solvable in two ways: >> >> 1) Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences. > I'm not sure that works. I think the right fix is that radeonsi also > switches to this model, with maybe a per-bo CS flag to indicate > write access, to cut down on the number of ioctls that are needed > otherwise on shared buffers. This per-bo flag would essentially select > between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis. Yeah, but I'm still not entirely sure why that approach isn't sufficient?
Problem with the per context or per vm flag is that you then don't get any implicit synchronization any more when another process starts using the buffer.
That is exactly what I want for Vulkan :)
Yeah, but as far as I know this is not something we can do.
See we have use cases like screen capture and debug which rely on that behavior.
They will keep working, if (and only if) the vulkan side sets the winsys fences correctly. Also, everything else in vulkan aside from winsys is explicitly not synced at all, you have to import drm syncobj timeline on the gl side.
The only thing we can do is to say on a per buffer flag that a buffer should not participate in implicit sync at all.
Nah, this doesn't work, because it's not a global decision, it's a local decision for the renderer. Vulkan wants to control implicit sync explicitly, and the kernel can't force more synchronization. If a buffer is shared as a winsys buffer between a vulkan client and a gl-using compositor, then you _have_ to use implicit sync on it. But vk needs to set the fences directly (and if the app gets it wrong, you get misrendering, but that is the specified behaviour of vulkan).
Yeah, but that's exactly what we tried to avoid.
Mhm, when we attach the flag to the process/VM then this would break the use case of VA-API and Vulkan in the same process.
But I think if you attach the flag to the context that should indeed work fine.
Yeah that's a question I have, whether the drm_file is shared within one process among everything, or whether radeonsi/libva/vk each have their own. If each have their own drm_file, then we should be fine, otherwise we need to figure out another place to put this (worst case as a CS extension that vk just sets on every submit).
libdrm_amdgpu dedupes it all so we mostly end up with one drm_file per process (modulo minigbm on chromeos and modulo a master fd).
That said the current proposal is for the context right? And on the context this should pretty much work? So I'm not sure why this is the part we are discussing?
Also yes this risks that a vk app which was violating the winsys spec will now break, which is why I think we should do this sooner rather than later. Otherwise the list of w/a we might need to apply in vk userspace will become very long :-( At least since this is purely opt-in from userspace, we only need to have the w/a list in userspace, where mesa has the infrastructure for that already. -Daniel
Christian.
-Daniel
Regards, Christian.
> The current amdgpu uapi just doesn't allow any other model without an > explicit opt-in. So current implicit sync userspace just has to > oversync, there's not much choice. > >> 2) Have an EXPLICIT fence owner that is used for explicit submissions >> that is ignored by AMDGPU_SYNC_NE_OWNER. >> >> But this doesn't solve cross-driver interactions here. > Yeah cross-driver is still entirely unsolved, because > amdgpu_bo_explicit_sync() on the bo didn't solve that either. Hui? You have lost me. Why is that still unsolved?
The part we're trying to solve with this patch is that Vulkan should not participate in any implicit sync at all wrt submissions (and then handle the implicit sync for WSI explicitly using the fence import/export stuff that Jason wrote). As long as we add shared fences to the dma_resv we still participate in implicit sync (at the level of an implicit sync read), at least from the perspective of later jobs waiting on these fences.
Regards, Christian.
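For reference, the acquire/present flow alluded to above, sketched against the dma-buf sync_file export/import ioctls from Jason Ekstrand's proposed series linked earlier in the thread. These ioctls were still RFC at the time of this exchange, so the struct and ioctl names below follow that proposal and may change:

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Acquire: snapshot the buffer's current implicit fences as a sync_file. */
static int export_implicit_fences(int dmabuf_fd)
{
        struct dma_buf_export_sync_file args = {
                .flags = DMA_BUF_SYNC_RW,       /* readers and the writer */
        };

        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
                return -1;
        return args.fd; /* wait on this before touching the buffer */
}

/* Present: publish the render-done fence as the implicit write fence. */
static int import_render_fence(int dmabuf_fd, int sync_file_fd)
{
        struct dma_buf_import_sync_file args = {
                .flags = DMA_BUF_SYNC_WRITE,
                .fd = sync_file_fd,
        };

        return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}

This is how a Vulkan WSI implementation can keep a shared winsys buffer coherent with an implicit-sync compositor while all of its other submissions stay fully explicit.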
> -Daniel > >>> Cc: mesa-dev@lists.freedesktop.org >>> Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl >>> Cc: Dave Airlie airlied@gmail.com >>> Cc: Rob Clark robdclark@chromium.org >>> Cc: Kristian H. Kristensen hoegsberg@google.com >>> Cc: Michel Dänzer michel@daenzer.net >>> Cc: Daniel Stone daniels@collabora.com >>> Cc: Sumit Semwal sumit.semwal@linaro.org >>> Cc: "Christian König" christian.koenig@amd.com >>> Cc: Alex Deucher alexander.deucher@amd.com >>> Cc: Daniel Vetter daniel.vetter@ffwll.ch >>> Cc: Deepak R Varma mh12gx2825@gmail.com >>> Cc: Chen Li chenli@uniontech.com >>> Cc: Kevin Wang kevin1.wang@amd.com >>> Cc: Dennis Li Dennis.Li@amd.com >>> Cc: Luben Tuikov luben.tuikov@amd.com >>> Cc: linaro-mm-sig@lists.linaro.org >>> Signed-off-by: Daniel Vetter daniel.vetter@intel.com >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 +++++-- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 ++++++ >>> include/uapi/drm/amdgpu_drm.h | 10 ++++++++++ >>> 4 files changed, 42 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>> index 65df34c17264..c5386d13eb4a 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>> @@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, >>> struct amdgpu_bo *gds; >>> struct amdgpu_bo *gws; >>> struct amdgpu_bo *oa; >>> + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync); >>> int r; >>> >>> INIT_LIST_HEAD(&p->validated); >>> @@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, >>> >>> e->bo_va = amdgpu_vm_bo_find(vm, bo); >>> >>> - if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) { >>> + if (bo->tbo.base.dma_buf && >>> + !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) { >>> e->chain = dma_fence_chain_alloc(); >>> if (!e->chain) { >>> r = -ENOMEM; >>> @@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) >>> { >>> struct amdgpu_fpriv *fpriv = p->filp->driver_priv; >>> struct amdgpu_bo_list_entry *e; >>> + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync); >>> int r; >>> >>> list_for_each_entry(e, &p->validated, tv.head) { >>> @@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) >>> struct dma_resv *resv = bo->tbo.base.resv; >>> enum amdgpu_sync_mode sync_mode; >>> >>> - sync_mode = amdgpu_bo_explicit_sync(bo) ? >>> + sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ? 
>>> AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER; >>> r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode, >>> &fpriv->vm); >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> index c080ba15ae77..f982626b5328 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>> @@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv) >>> return 0; >>> } >>> >>> +int amdgpu_setparam_ioctl(struct drm_device *dev, void *data, >>> + struct drm_file *filp) >>> +{ >>> + struct drm_amdgpu_setparam *setparam = data; >>> + struct amdgpu_fpriv *fpriv = filp->driver_priv; >>> + >>> + switch (setparam->param) { >>> + case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC: >>> + if (setparam->value) >>> + WRITE_ONCE(fpriv->vm.no_implicit_sync, true); >>> + else >>> + WRITE_ONCE(fpriv->vm.no_implicit_sync, false); >>> + break; >>> + default: >>> + return -EINVAL; >>> + } >>> + >>> + return 0; >>> +} >>> + >>> const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { >>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), >>> DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), >>> @@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { >>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), >>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), >>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), >>> + DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), >>> }; >>> >>> static const struct drm_driver amdgpu_kms_driver = { >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h >>> index ddb85a85cbba..0e8c440c6303 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h >>> @@ -321,6 +321,12 @@ struct amdgpu_vm { >>> bool bulk_moveable; >>> /* Flag to indicate if VM is used for compute */ >>> bool is_compute_context; >>> + /* >>> + * Flag to indicate whether implicit sync should always be skipped on >>> + * this context. We do not care about races at all, userspace is allowed >>> + * to shoot itself with implicit sync to its fullest liking. 
>>> + */ >>> + bool no_implicit_sync; >>> }; >>> >>> struct amdgpu_vm_manager { >>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h >>> index 0cbd1540aeac..9eae245c14d6 100644 >>> --- a/include/uapi/drm/amdgpu_drm.h >>> +++ b/include/uapi/drm/amdgpu_drm.h >>> @@ -54,6 +54,7 @@ extern "C" { >>> #define DRM_AMDGPU_VM 0x13 >>> #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14 >>> #define DRM_AMDGPU_SCHED 0x15 >>> +#define DRM_AMDGPU_SETPARAM 0x16 >>> >>> #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create) >>> #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap) >>> @@ -71,6 +72,7 @@ extern "C" { >>> #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm) >>> #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle) >>> #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched) >>> +#define DRM_IOCTL_AMDGPU_SETPARAM DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam) >>> >>> /** >>> * DOC: memory domains >>> @@ -306,6 +308,14 @@ union drm_amdgpu_sched { >>> struct drm_amdgpu_sched_in in; >>> }; >>> >>> +#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC 1 >>> + >>> +struct drm_amdgpu_setparam { >>> + /* AMDGPU_SETPARAM_* */ >>> + __u32 param; >>> + __u32 value; >>> +}; >>> + >>> /* >>> * This is not a reliable API and you should expect it to fail for any >>> * number of reasons and have fallback path that do not use userptr to >>> -- >>> 2.32.0.rc2 >>>
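To make the uapi in the patch above concrete, a minimal userspace sketch of the opt-out, assuming AMDGPU_SETPARAM_NO_IMPLICIT_SYNC and struct drm_amdgpu_setparam exactly as proposed in this RFC (they are not upstream); drmIoctl() is the standard libdrm wrapper:

/* Sketch only: opt a drm_file out of kernel-managed implicit sync. */
#include <xf86drm.h>
#include "amdgpu_drm.h"

static int amdgpu_opt_out_of_implicit_sync(int drm_fd)
{
        struct drm_amdgpu_setparam sp = {
                .param = AMDGPU_SETPARAM_NO_IMPLICIT_SYNC,
                .value = 1,     /* kernel side is a plain atomic write */
        };

        return drmIoctl(drm_fd, DRM_IOCTL_AMDGPU_SETPARAM, &sp);
}

A Vulkan driver would call this once after opening its render node; every CS submitted on that drm_file afterwards then selects AMDGPU_SYNC_EXPLICIT instead of AMDGPU_SYNC_NE_OWNER in amdgpu_cs_sync_rings().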
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
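To spell out what the sync_mode selection in the patch quoted above changes, here is a boiled-down decision helper. This is an illustrative paraphrase under stated assumptions, not the actual amdgpu_sync_resv() code; kernel_fence, fence_owner and my_vm stand in for the real bookkeeping, and the other sync modes are omitted:

/* Illustrative paraphrase of the per-fence filtering, not real amdgpu code. */
static bool should_sync_to(bool kernel_fence, void *fence_owner,
                           void *my_vm, enum amdgpu_sync_mode mode)
{
        if (kernel_fence)
                return true;    /* bo moves etc. can never be skipped */
        if (mode == AMDGPU_SYNC_NE_OWNER)
                return fence_owner != my_vm;    /* implicit sync vs. others */
        return false;   /* AMDGPU_SYNC_EXPLICIT: skip all optional fences */
}

With the RFC, flipping vm.no_implicit_sync simply forces the AMDGPU_SYNC_EXPLICIT branch for every bo in the CS; shared fences are still added, only the waiting side is skipped, which is exactly the explicit->implicit gap Bas points out above.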
On Wed, Jun 23, 2021 at 04:58:27PM +0200, Bas Nieuwenhuizen wrote:
On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jun 23, 2021 at 4:02 PM Christian König christian.koenig@amd.com wrote:
On 23.06.21 at 15:49, Daniel Vetter wrote:
On Wed, Jun 23, 2021 at 3:44 PM Christian König christian.koenig@amd.com wrote:
On 23.06.21 at 15:38, Bas Nieuwenhuizen wrote:
[snip]
Yeah that's a question I have, whether the drm_file is shared within one process among everything, or whether radeonsi/libva/vk each have their own. If each have their own drm_file, then we should be fine, otherwise we need to figure out another place to put this (worst case as a CS extension that vk just sets on every submit).
libdrm_amdgpu dedupes it all so we mostly end up with one drm_file per process (modulo minigbm on chromeos and modulo a master fd).
That said the current proposal is for the context right? And on the context this should pretty much work? So I'm not sure why this is the part we are discussing?
It's on the fpriv->vm, so on the FD. I assumed vulkan at least would want to have its own private VM for this. And at a quick look I didn't see any other way to create a VM than to have an FD of your own.
If there's something else that means "gpu context with its own vm" then the flag would need to be moved there, pointers appreciated (but maybe someone with hw + userspace can do that quicker). -Daniel
[snip]
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On 23.06.21 at 17:03, Daniel Vetter wrote:
On Wed, Jun 23, 2021 at 04:58:27PM +0200, Bas Nieuwenhuizen wrote:
On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jun 23, 2021 at 4:02 PM Christian König christian.koenig@amd.com wrote:
On 23.06.21 at 15:49, Daniel Vetter wrote:
On Wed, Jun 23, 2021 at 3:44 PM Christian König christian.koenig@amd.com wrote:
[snip]
It's on the fpriv->vm, so on the FD. I assumed vulkan at least would want to have its own private VM for this. And at a quick look I didn't see any other way to create a VM than to have an FD of your own.
You can't have your own FD in libdrm_amdgpu userspace. We had a pretty hard design discussion about that already.
What you could do is to load your own copy of libdrm_amdgpu, but I won't recommend that.
Just putting the flag on the context instead of the VM is much cleaner as far as I can see anyway.
Christian.
[snip]
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Wed, Jun 23, 2021 at 05:07:17PM +0200, Christian König wrote:
On 23.06.21 at 17:03, Daniel Vetter wrote:
On Wed, Jun 23, 2021 at 04:58:27PM +0200, Bas Nieuwenhuizen wrote:
On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jun 23, 2021 at 4:02 PM Christian König christian.koenig@amd.com wrote:
On 23.06.21 at 15:49, Daniel Vetter wrote:
On Wed, Jun 23, 2021 at 3:44 PM Christian König christian.koenig@amd.com wrote: > Am 23.06.21 um 15:38 schrieb Bas Nieuwenhuizen: > > On Wed, Jun 23, 2021 at 2:59 PM Christian König > > christian.koenig@amd.com wrote: > > > Am 23.06.21 um 14:18 schrieb Daniel Vetter: > > > > On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen > > > > bas@basnieuwenhuizen.nl wrote: > > > > > On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter daniel.vetter@ffwll.ch wrote: > > > > > > WARNING: Absolutely untested beyond "gcc isn't dying in agony". > > > > > > > > > > > > Implicit fencing done properly needs to treat the implicit fencing > > > > > > slots like a funny kind of IPC mailbox. In other words it needs to be > > > > > > explicitly. This is the only way it will mesh well with explicit > > > > > > fencing userspace like vk, and it's also the bare minimum required to > > > > > > be able to manage anything else that wants to use the same buffer on > > > > > > multiple engines in parallel, and still be able to share it through > > > > > > implicit sync. > > > > > > > > > > > > amdgpu completely lacks such an uapi. Fix this. > > > > > > > > > > > > Luckily the concept of ignoring implicit fences exists already, and > > > > > > takes care of all the complexities of making sure that non-optional > > > > > > fences (like bo moves) are not ignored. This support was added in > > > > > > > > > > > > commit 177ae09b5d699a5ebd1cafcee78889db968abf54 > > > > > > Author: Andres Rodriguez andresx7@gmail.com > > > > > > Date: Fri Sep 15 20:44:06 2017 -0400 > > > > > > > > > > > > drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2 > > > > > > > > > > > > Unfortuantely it's the wrong semantics, because it's a bo flag and > > > > > > disables implicit sync on an allocated buffer completely. > > > > > > > > > > > > We _do_ want implicit sync, but control it explicitly. For this we > > > > > > need a flag on the drm_file, so that a given userspace (like vulkan) > > > > > > can manage the implicit sync slots explicitly. The other side of the > > > > > > pipeline (compositor, other process or just different stage in a media > > > > > > pipeline in the same process) can then either do the same, or fully > > > > > > participate in the implicit sync as implemented by the kernel by > > > > > > default. > > > > > > > > > > > > By building on the existing flag for buffers we avoid any issues with > > > > > > opening up additional security concerns - anything this new flag here > > > > > > allows is already. > > > > > > > > > > > > All drivers which supports this concept of a userspace-specific > > > > > > opt-out of implicit sync have a flag in their CS ioctl, but in reality > > > > > > that turned out to be a bit too inflexible. See the discussion below, > > > > > > let's try to do a bit better for amdgpu. > > > > > > > > > > > > This alone only allows us to completely avoid any stalls due to > > > > > > implicit sync, it does not yet allow us to use implicit sync as a > > > > > > strange form of IPC for sync_file. > > > > > > > > > > > > For that we need two more pieces: > > > > > > > > > > > > - a way to get the current implicit sync fences out of a buffer. Could > > > > > > be done in a driver ioctl, but everyone needs this, and generally a > > > > > > dma-buf is involved anyway to establish the sharing. So an ioctl on > > > > > > the dma-buf makes a ton more sense: > > > > > > > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kerne... 
> > > > > > > > > > > > Current drivers in upstream solves this by having the opt-out flag > > > > > > on their CS ioctl. This has the downside that very often the CS > > > > > > which must actually stall for the implicit fence is run a while > > > > > > after the implicit fence point was logically sampled per the api > > > > > > spec (vk passes an explicit syncobj around for that afaiui), and so > > > > > > results in oversync. Converting the implicit sync fences into a > > > > > > snap-shot sync_file is actually accurate. > > > > > > > > > > > > - Simillar we need to be able to set the exclusive implicit fence. > > > > > > Current drivers again do this with a CS ioctl flag, with again the > > > > > > same problems that the time the CS happens additional dependencies > > > > > > have been added. An explicit ioctl to only insert a sync_file (while > > > > > > respecting the rules for how exclusive and shared fence slots must > > > > > > be update in struct dma_resv) is much better. This is proposed here: > > > > > > > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kerne... > > > > > > > > > > > > These three pieces together allow userspace to fully control implicit > > > > > > fencing and remove all unecessary stall points due to them. > > > > > > > > > > > > Well, as much as the implicit fencing model fundamentally allows: > > > > > > There is only one set of fences, you can only choose to sync against > > > > > > only writers (exclusive slot), or everyone. Hence suballocating > > > > > > multiple buffers or anything else like this is fundamentally not > > > > > > possible, and can only be fixed by a proper explicit fencing model. > > > > > > > > > > > > Aside from that caveat this model gets implicit fencing as closely to > > > > > > explicit fencing semantics as possible: > > > > > > > > > > > > On the actual implementation I opted for a simple setparam ioctl, no > > > > > > locking (just atomic reads/writes) for simplicity. There is a nice > > > > > > flag parameter in the VM ioctl which we could use, except: > > > > > > - it's not checked, so userspace likely passes garbage > > > > > > - there's already a comment that userspace _does_ pass garbage in the > > > > > > priority field > > > > > > So yeah unfortunately this flag parameter for setting vm flags is > > > > > > useless, and we need to hack up a new one. > > > > > > > > > > > > v2: Explain why a new SETPARAM (Jason) > > > > > > > > > > > > v3: Bas noticed I forgot to hook up the dependency-side shortcut. We > > > > > > need both, or this doesn't do much. > > > > > > > > > > > > v4: Rebase over the amdgpu patch to always set the implicit sync > > > > > > fences. > > > > > So I think there is still a case missing in this implementation. > > > > > Consider these 3 cases > > > > > > > > > > (format: a->b: b waits on a. Yes, I know arrows are hard) > > > > > > > > > > explicit->explicit: This doesn't wait now, which is good > > > > > Implicit->explicit: This doesn't wait now, which is good > > > > > explicit->implicit : This still waits as the explicit submission still > > > > > adds shared fences and most things that set an exclusive fence for > > > > > implicit sync will hence wait on it. > > > > > > > > > > This is probably good enough for what radv needs now but also sounds > > > > > like a risk wrt baking in new uapi behavior that we don't want to be > > > > > the end result. 
> > > > > > > > > > Within AMDGPU this is probably solvable in two ways: > > > > > > > > > > 1) Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences. > > > > I'm not sure that works. I think the right fix is that radeonsi also > > > > switches to this model, with maybe a per-bo CS flag to set indicate > > > > write access, to cut down on the number of ioctls that are needed > > > > otherwise on shared buffers. This per-bo flag would essentially select > > > > between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis. > > > Yeah, but I'm still not entirely sure why that approach isn't sufficient? > > > > > > Problem with the per context or per vm flag is that you then don't get > > > any implicit synchronization any more when another process starts using > > > the buffer. > > That is exactly what I want for Vulkan :) > Yeah, but as far as I know this is not something we can do. > > See we have use cases like screen capture and debug which rely on that > behavior. They will keep working, if (and only if) the vulkan side sets the winsys fences correctly. Also, everything else in vulkan aside from winsys is explicitly not synced at all, you have to import drm syncobj timeline on the gl side.
> The only thing we can do is to say with a per buffer flag that a buffer
> should not participate in implicit sync at all.

Nah, this doesn't work. Because it's not a global decision, it's a local
decision for the renderer. Vulkan wants to control implicit sync
explicitly, and the kernel can't force more synchronization. If a buffer
is shared as a winsys buffer between a vulkan client and a gl-using
compositor, then you _have_ to use implicit sync on it. But vk needs to
set the fences directly (and if the app gets it wrong, you get
misrendering, but that is the specified behaviour of vulkan).
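Spelled out, that winsys dance would look roughly like this in userspace, assuming the dma-buf sync_file export/import ioctls from the links above land in approximately their proposed form (struct and ioctl names follow the proposal and may still change):

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Before handing a winsys buffer to an implicit-sync compositor: attach
 * the render-complete fence as the implicit write fence, so implicit
 * consumers wait on it. */
static int set_winsys_write_fence(int dmabuf_fd, int render_done_sync_file)
{
	struct dma_buf_import_sync_file args = {
		.flags = DMA_BUF_SYNC_WRITE,
		.fd = render_done_sync_file,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}

/* When acquiring the buffer again: snapshot the current implicit fences
 * into a sync_file at the api-visible sync point, instead of stalling at
 * the next CS. The returned fd can be imported into a vk semaphore. */
static int snapshot_winsys_fences(int dmabuf_fd)
{
	struct dma_buf_export_sync_file args = {
		.flags = DMA_BUF_SYNC_RW,
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
		return -1;
	return args.fd;
}

Everything else the vulkan driver submits would then carry no implicit fences at all.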
Yeah, but that's exactly what we tried to avoid.
Mhm, when we attach the flag to the process/VM then this would break the use case of VA-API and Vulkan in the same process.
But I think if you attach the flag to the context that should indeed work fine.
Yeah that's a question I have, whether the drm_file is shared within one process among everything, or whether radeonsi/libva/vk each have their own. If each have their own drm_file, then we should be fine, otherwise we need to figure out another place to put this (worst case as a CS extension that vk just sets on every submit).
libdrm_amdgpu dedupes it all so we mostly end up with one drm_file per process (modulo minigbm on chromeos and modulo a master fd).
That said the current proposal is for the context right? And on the context this should pretty much work? So I'm not sure why this is the part we are discussing?
It's on the fpriv->vm, so on the FD. I assumed vulkan at least would want to have its own private VM for this. And at a quick glance I didn't see any other way to create a VM than to have an FD of your own.
You can't have your own FD in libdrm_amdgpu userspace. We had a pretty hard design discussion about that already.
What you could do is to load your own copy of libdrm_amdgpu, but I won't recommend that.
Just putting the flag on the context instead of the VM is much cleaner as far as I can see anyway.
Helper for the blind? If you guys expect me to move that myself ... -Daniel
Christian.
If there's something else that means "gpu context with its own vm" then the flag would need to be moved there, pointers appreciated (but maybe someone with hw + userspace can do that quicker). -Daniel
Also yes this risks that a vk app which was violating the winsys spec will now break, which is why I think we should do this sooner rather than later. Otherwise the list of w/a we might need to apply in vk userspace will become very long :-( At least since this is purely opt-in from userspace, we only need to have the w/a list in userspace, where mesa has the infrastructure for that already. -Daniel
Christian.
-Daniel
> Regards,
> Christian.
>
> > > > The current amdgpu uapi just doesn't allow any other model without an
> > > > explicit opt-in. So current implicit sync userspace just has to
> > > > oversync, there's not much choice.
> > > >
> > > > > 2) Have an EXPLICIT fence owner that is used for explicit submissions
> > > > > that is ignored by AMDGPU_SYNC_NE_OWNER.
> > > > >
> > > > > But this doesn't solve cross-driver interactions here.
> > > > Yeah cross-driver is still entirely unsolved, because
> > > > amdgpu_bo_explicit_sync() on the bo didn't solve that either.
> > > Hui? You have lost me. Why is that still unsolved?
> > The part we're trying to solve with this patch is Vulkan should not
> > participate in any implicit sync at all wrt submissions (and then
> > handle the implicit sync for WSI explicitly using the fence
> > import/export stuff that Jason wrote). As long as we add shared fences to
> > the dma_resv we participate in implicit sync (at the level of an
> > implicit sync read) still, at least from the perspective of later jobs
> > waiting on these fences.
> >
> > > Regards,
> > > Christian.
> > >
> > > > -Daniel
> > > >
> > > > > > Cc: mesa-dev@lists.freedesktop.org
> > > > > > Cc: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
> > > > > > Cc: Dave Airlie airlied@gmail.com
> > > > > > Cc: Rob Clark robdclark@chromium.org
> > > > > > Cc: Kristian H. Kristensen hoegsberg@google.com
> > > > > > Cc: Michel Dänzer michel@daenzer.net
> > > > > > Cc: Daniel Stone daniels@collabora.com
> > > > > > Cc: Sumit Semwal sumit.semwal@linaro.org
> > > > > > Cc: "Christian König" christian.koenig@amd.com
> > > > > > Cc: Alex Deucher alexander.deucher@amd.com
> > > > > > Cc: Daniel Vetter daniel.vetter@ffwll.ch
> > > > > > Cc: Deepak R Varma mh12gx2825@gmail.com
> > > > > > Cc: Chen Li chenli@uniontech.com
> > > > > > Cc: Kevin Wang kevin1.wang@amd.com
> > > > > > Cc: Dennis Li Dennis.Li@amd.com
> > > > > > Cc: Luben Tuikov luben.tuikov@amd.com
> > > > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > > > Signed-off-by: Daniel Vetter daniel.vetter@intel.com
> > > > > > ---
> > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 +++++--
> > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++
> > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 ++++++
> > > > > > include/uapi/drm/amdgpu_drm.h | 10 ++++++++++
> > > > > > 4 files changed, 42 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > index 65df34c17264..c5386d13eb4a 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > @@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > > > > > struct amdgpu_bo *gds;
> > > > > > struct amdgpu_bo *gws;
> > > > > > struct amdgpu_bo *oa;
> > > > > > + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
> > > > > > int r;
> > > > > >
> > > > > > INIT_LIST_HEAD(&p->validated);
> > > > > > @@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > > > > >
> > > > > > e->bo_va = amdgpu_vm_bo_find(vm, bo);
> > > > > >
> > > > > > - if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> > > > > > + if (bo->tbo.base.dma_buf &&
> > > > > > + !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) {
> > > > > > e->chain = dma_fence_chain_alloc();
> > > > > > if (!e->chain) {
> > > > > > r = -ENOMEM;
> > > > > > @@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
> > > > > > {
> > > > > > struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
> > > > > > struct amdgpu_bo_list_entry *e;
> > > > > > + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
> > > > > > int r;
> > > > > >
> > > > > > list_for_each_entry(e, &p->validated, tv.head) {
> > > > > > @@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
> > > > > > struct dma_resv *resv = bo->tbo.base.resv;
> > > > > > enum amdgpu_sync_mode sync_mode;
> > > > > >
> > > > > > - sync_mode = amdgpu_bo_explicit_sync(bo) ?
> > > > > > + sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ?
> > > > > > AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
> > > > > > r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode,
> > > > > > &fpriv->vm);
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > index c080ba15ae77..f982626b5328 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > @@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv)
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > +int amdgpu_setparam_ioctl(struct drm_device *dev, void *data,
> > > > > > + struct drm_file *filp)
> > > > > > +{
> > > > > > + struct drm_amdgpu_setparam *setparam = data;
> > > > > > + struct amdgpu_fpriv *fpriv = filp->driver_priv;
> > > > > > +
> > > > > > + switch (setparam->param) {
> > > > > > + case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC:
> > > > > > + if (setparam->value)
> > > > > > + WRITE_ONCE(fpriv->vm.no_implicit_sync, true);
> > > > > > + else
> > > > > > + WRITE_ONCE(fpriv->vm.no_implicit_sync, false);
> > > > > > + break;
> > > > > > + default:
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
> > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> > > > > > @@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
> > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> > > > > > + DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> > > > > > };
> > > > > >
> > > > > > static const struct drm_driver amdgpu_kms_driver = {
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> > > > > > index ddb85a85cbba..0e8c440c6303 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> > > > > > @@ -321,6 +321,12 @@ struct amdgpu_vm {
> > > > > > bool bulk_moveable;
> > > > > > /* Flag to indicate if VM is used for compute */
> > > > > > bool is_compute_context;
> > > > > > + /*
> > > > > > + * Flag to indicate whether implicit sync should always be skipped on
> > > > > > + * this context. We do not care about races at all, userspace is allowed
> > > > > > + * to shoot itself with implicit sync to its fullest liking.
> > > > > > + */
> > > > > > + bool no_implicit_sync;
> > > > > > };
> > > > > >
> > > > > > struct amdgpu_vm_manager {
> > > > > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> > > > > > index 0cbd1540aeac..9eae245c14d6 100644
> > > > > > --- a/include/uapi/drm/amdgpu_drm.h
> > > > > > +++ b/include/uapi/drm/amdgpu_drm.h
> > > > > > @@ -54,6 +54,7 @@ extern "C" {
> > > > > > #define DRM_AMDGPU_VM 0x13
> > > > > > #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
> > > > > > #define DRM_AMDGPU_SCHED 0x15
> > > > > > +#define DRM_AMDGPU_SETPARAM 0x16
> > > > > >
> > > > > > #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
> > > > > > #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> > > > > > @@ -71,6 +72,7 @@ extern "C" {
> > > > > > #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
> > > > > > #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
> > > > > > #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> > > > > > +#define DRM_IOCTL_AMDGPU_SETPARAM DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam)
> > > > > >
> > > > > > /**
> > > > > > * DOC: memory domains
> > > > > > @@ -306,6 +308,14 @@ union drm_amdgpu_sched {
> > > > > > struct drm_amdgpu_sched_in in;
> > > > > > };
> > > > > >
> > > > > > +#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC 1
> > > > > > +
> > > > > > +struct drm_amdgpu_setparam {
> > > > > > + /* AMDGPU_SETPARAM_* */
> > > > > > + __u32 param;
> > > > > > + __u32 value;
> > > > > > +};
> > > > > > +
> > > > > > /*
> > > > > > * This is not a reliable API and you should expect it to fail for any
> > > > > > * number of reasons and have fallback path that do not use userptr to
> > > > > > --
> > > > > > 2.32.0.rc2
> > > > > >
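For completeness, this is how userspace would flip the switch if the RFC lands as-is; a minimal sketch using only the uapi added by the patch above (a real driver would go through drmIoctl() to retry on EINTR):

#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h> /* assumes the RFC's uapi additions are installed */

/* Opt this drm_file out of implicit sync, once at device init. */
static int amdgpu_disable_implicit_sync(int drm_fd)
{
	struct drm_amdgpu_setparam sp = {
		.param = AMDGPU_SETPARAM_NO_IMPLICIT_SYNC,
		.value = 1,
	};

	return ioctl(drm_fd, DRM_IOCTL_AMDGPU_SETPARAM, &sp);
}

Passing .value = 0 restores the kernel's default implicit sync behaviour, matching the WRITE_ONCE() pair in the handler above.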
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll....
Am 23.06.21 um 17:12 schrieb Daniel Vetter:
On Wed, Jun 23, 2021 at 05:07:17PM +0200, Christian König wrote:
Am 23.06.21 um 17:03 schrieb Daniel Vetter:
On Wed, Jun 23, 2021 at 04:58:27PM +0200, Bas Nieuwenhuizen wrote:
On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jun 23, 2021 at 4:02 PM Christian König christian.koenig@amd.com wrote:
[snip]
Just putting the flag on the context instead of the VM is much cleaner as far as I can see anyway.
Helper for the blind? If you guys expect me to move that myself ...
Add the flag to struct amdgpu_ctx; you can use amdgpu_ctx_ioctl() to set it. During CS it is then available as p->ctx.
If I'm not totally mistaken that is also what Bas had in mind with his comment.
Christian.
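A rough, untested sketch of what that suggestion could look like; AMDGPU_CTX_OP_SET_IMPLICIT_SYNC and the no_implicit_sync member are made-up names that do not exist upstream:

/* amdgpu_ctx.h: per-context flag instead of fpriv->vm.no_implicit_sync */
struct amdgpu_ctx {
	/* ... existing members ... */
	bool no_implicit_sync;	/* hypothetical */
};

/* amdgpu_ctx.c: hypothetical new op in amdgpu_ctx_ioctl()'s switch */
case AMDGPU_CTX_OP_SET_IMPLICIT_SYNC:
	ctx = amdgpu_ctx_get(fpriv, args->in.ctx_id);
	if (!ctx)
		return -EINVAL;
	/* racy by design, same as the VM-based flag in the RFC */
	WRITE_ONCE(ctx->no_implicit_sync, !!args->in.flags);
	amdgpu_ctx_put(ctx);
	break;

/* amdgpu_cs.c: during CS the context is already at hand as p->ctx */
bool no_implicit_sync = READ_ONCE(p->ctx->no_implicit_sync);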