Rework of my previous patchset which added support for GEM buffers backed by non-coherent memory to the ingenic-drm driver.
Having GEM buffers backed by non-coherent memory is interesting in the particular case where it is faster to render to a non-coherent buffer then sync the data cache, than to render to a write-combine buffer, and (by extension) much faster than using a shadow buffer. This is true for instance on some Ingenic SoCs, where even simple blits (e.g. memcpy) are about three times faster using this method.
For the record, the previous patchset was accepted for 5.10 then had to be reverted, as it conflicted with some changes made to the DMA API.
This new patchset is pretty different as it adds the functionality to the DRM core. The first three patches add variants to existing functions but with the "non-coherent memory" twist, exported as GPL symbols. The fourth patch adds a function to be used with the damage helpers. Finally, the last patch adds support for non-coherent GEM buffers to the ingenic-drm driver. The functionality is enabled through a module parameter, and is disabled by default.
Cheers, -Paul
Paul Cercueil (5): drm: Add and export function drm_gem_cma_create_noncoherent drm: Add and export function drm_gem_cma_dumb_create_noncoherent drm: Add and export function drm_gem_cma_mmap_noncoherent drm: Add and export function drm_gem_cma_sync_data drm/ingenic: Add option to alloc cached GEM buffers
drivers/gpu/drm/drm_gem_cma_helper.c | 223 +++++++++++++++++++--- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 + drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +- include/drm/drm_gem_cma_helper.h | 13 ++ 5 files changed, 273 insertions(+), 30 deletions(-)
This function can be used by drivers that need to create a GEM object with non-coherent backing memory.
Creating non-coherent CMA objects is useful on architectures where writing to a buffer with the non-coherent cache attribute set then invalidating the cache is faster than writing to the same buffer with the write-combine cache attribute set. This is the case for instance on some Ingenic SoCs.
v2: Add inline doc about why we need this, and improve commit message
Signed-off-by: Paul Cercueil paul@crapouillou.net --- drivers/gpu/drm/drm_gem_cma_helper.c | 76 +++++++++++++++++++++------- include/drm/drm_gem_cma_helper.h | 2 + 2 files changed, 61 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index 7942cf05cd93..917b092b23c2 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -90,21 +90,10 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size) return ERR_PTR(ret); }
-/** - * drm_gem_cma_create - allocate an object with the given size - * @drm: DRM device - * @size: size of the object to allocate - * - * This function creates a CMA GEM object and allocates a contiguous chunk of - * memory as backing store. The backing memory has the writecombine attribute - * set. - * - * Returns: - * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative - * error code on failure. - */ -struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm, - size_t size) +static struct drm_gem_cma_object * +drm_gem_cma_create_with_cache_param(struct drm_device *drm, + size_t size, + bool noncoherent) { struct drm_gem_cma_object *cma_obj; int ret; @@ -115,8 +104,16 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm, if (IS_ERR(cma_obj)) return cma_obj;
- cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr, - GFP_KERNEL | __GFP_NOWARN); + if (noncoherent) { + cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size, + &cma_obj->paddr, + DMA_TO_DEVICE, + GFP_KERNEL | __GFP_NOWARN); + + } else { + cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr, + GFP_KERNEL | __GFP_NOWARN); + } if (!cma_obj->vaddr) { drm_dbg(drm, "failed to allocate buffer with size %zu\n", size); @@ -130,6 +127,51 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm, drm_gem_object_put(&cma_obj->base); return ERR_PTR(ret); } + +/** + * drm_gem_cma_create_noncoherent - allocate an object with the given size + * and non-coherent cache attribute + * @drm: DRM device + * @size: size of the object to allocate + * + * This function creates a CMA GEM object and allocates a contiguous chunk of + * memory as backing store. The backing memory has the noncoherent attribute + * set. + * + * Creating non-coherent CMA objects is useful on architectures where writing + * to a buffer with the non-coherent cache attribute set then invalidating the + * cache is faster than writing to the same buffer with the write-combine cache + * attribute set. This is the case for instance on some Ingenic SoCs. + * + * Returns: + * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative + * error code on failure. + */ +struct drm_gem_cma_object * +drm_gem_cma_create_noncoherent(struct drm_device *drm, size_t size) +{ + return drm_gem_cma_create_with_cache_param(drm, size, true); +} +EXPORT_SYMBOL_GPL(drm_gem_cma_create_noncoherent); + +/** + * drm_gem_cma_create - allocate an object with the given size + * @drm: DRM device + * @size: size of the object to allocate + * + * This function creates a CMA GEM object and allocates a contiguous chunk of + * memory as backing store. The backing memory has the writecombine attribute + * set. + * + * Returns: + * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative + * error code on failure. + */ +struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm, + size_t size) +{ + return drm_gem_cma_create_with_cache_param(drm, size, false); +} EXPORT_SYMBOL_GPL(drm_gem_cma_create);
/** diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h index 0a9711caa3e8..360771f5f485 100644 --- a/include/drm/drm_gem_cma_helper.h +++ b/include/drm/drm_gem_cma_helper.h @@ -79,6 +79,8 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv, /* allocate physical memory */ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm, size_t size); +struct drm_gem_cma_object * +drm_gem_cma_create_noncoherent(struct drm_device *drm, size_t size);
extern const struct vm_operations_struct drm_gem_cma_vm_ops;
This function can be used by drivers to create dumb buffers with non-coherent backing memory.
Signed-off-by: Paul Cercueil paul@crapouillou.net --- drivers/gpu/drm/drm_gem_cma_helper.c | 37 +++++++++++++++++++++++++--- include/drm/drm_gem_cma_helper.h | 5 ++++ 2 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index 917b092b23c2..d100c5f9c140 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -181,6 +181,7 @@ EXPORT_SYMBOL_GPL(drm_gem_cma_create); * @drm: DRM device * @size: size of the object to allocate * @handle: return location for the GEM handle + * @noncoherent: allocate object with non-coherent cache attribute * * This function creates a CMA GEM object, allocating a physically contiguous * chunk of memory as backing store. The GEM object is then added to the list @@ -193,13 +194,13 @@ EXPORT_SYMBOL_GPL(drm_gem_cma_create); static struct drm_gem_cma_object * drm_gem_cma_create_with_handle(struct drm_file *file_priv, struct drm_device *drm, size_t size, - uint32_t *handle) + uint32_t *handle, bool noncoherent) { struct drm_gem_cma_object *cma_obj; struct drm_gem_object *gem_obj; int ret;
- cma_obj = drm_gem_cma_create(drm, size); + cma_obj = drm_gem_cma_create_with_cache_param(drm, size, noncoherent); if (IS_ERR(cma_obj)) return cma_obj;
@@ -276,7 +277,7 @@ int drm_gem_cma_dumb_create_internal(struct drm_file *file_priv, args->size = args->pitch * args->height;
cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size, - &args->handle); + &args->handle, false); return PTR_ERR_OR_ZERO(cma_obj); } EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create_internal); @@ -309,11 +310,39 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv, args->size = args->pitch * args->height;
cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size, - &args->handle); + &args->handle, false); return PTR_ERR_OR_ZERO(cma_obj); } EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create);
+/** + * drm_gem_cma_dumb_create_noncoherent - create a dumb buffer object with + * non-coherent cache attribute + * @file_priv: DRM file-private structure to create the dumb buffer for + * @drm: DRM device + * @args: IOCTL data + * + * Same as drm_gem_cma_dumb_create, but the dumb buffer object created has + * the non-coherent cache attribute set. + * + * Returns: + * 0 on success or a negative error code on failure. + */ +int drm_gem_cma_dumb_create_noncoherent(struct drm_file *file_priv, + struct drm_device *drm, + struct drm_mode_create_dumb *args) +{ + struct drm_gem_cma_object *cma_obj; + + args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8); + args->size = args->pitch * args->height; + + cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size, + &args->handle, true); + return PTR_ERR_OR_ZERO(cma_obj); +} +EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create_noncoherent); + const struct vm_operations_struct drm_gem_cma_vm_ops = { .open = drm_gem_vm_open, .close = drm_gem_vm_close, diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h index 360771f5f485..6b44e7492a63 100644 --- a/include/drm/drm_gem_cma_helper.h +++ b/include/drm/drm_gem_cma_helper.h @@ -76,6 +76,11 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv, struct drm_device *drm, struct drm_mode_create_dumb *args);
+/* create non-coherent memory region for DRM framebuffer */ +int drm_gem_cma_dumb_create_noncoherent(struct drm_file *file_priv, + struct drm_device *drm, + struct drm_mode_create_dumb *args); + /* allocate physical memory */ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm, size_t size);
This function can be used by drivers that need to mmap dumb buffers created with non-coherent backing memory.
v2: Use dma_to_phys() since cma_obj->paddr isn't a phys_addr_t but a dma_addr_t.
Signed-off-by: Paul Cercueil paul@crapouillou.net --- drivers/gpu/drm/drm_gem_cma_helper.c | 67 +++++++++++++++++++++++++--- include/drm/drm_gem_cma_helper.h | 1 + 2 files changed, 63 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index d100c5f9c140..e39b0464e19d 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -10,6 +10,7 @@ */
#include <linux/dma-buf.h> +#include <linux/dma-direct.h> #include <linux/dma-mapping.h> #include <linux/export.h> #include <linux/mm.h> @@ -42,10 +43,20 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = { .vm_ops = &drm_gem_cma_vm_ops, };
+static const struct drm_gem_object_funcs drm_gem_cma_noncoherent_funcs = { + .free = drm_gem_cma_free_object, + .print_info = drm_gem_cma_print_info, + .get_sg_table = drm_gem_cma_get_sg_table, + .vmap = drm_gem_cma_vmap, + .mmap = drm_gem_cma_mmap_noncoherent, + .vm_ops = &drm_gem_cma_vm_ops, +}; + /** * __drm_gem_cma_create - Create a GEM CMA object without allocating memory * @drm: DRM device * @size: size of the object to allocate + * @noncoherent: if true, will use non-coherent backed memory * * This function creates and initializes a GEM CMA object of the given size, * but doesn't allocate any memory to back the object. @@ -55,7 +66,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = { * error code on failure. */ static struct drm_gem_cma_object * -__drm_gem_cma_create(struct drm_device *drm, size_t size) +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool noncoherent) { struct drm_gem_cma_object *cma_obj; struct drm_gem_object *gem_obj; @@ -68,8 +79,12 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size) if (!gem_obj) return ERR_PTR(-ENOMEM);
- if (!gem_obj->funcs) - gem_obj->funcs = &drm_gem_cma_default_funcs; + if (!gem_obj->funcs) { + if (noncoherent) + gem_obj->funcs = &drm_gem_cma_noncoherent_funcs; + else + gem_obj->funcs = &drm_gem_cma_default_funcs; + }
cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
@@ -100,7 +115,7 @@ drm_gem_cma_create_with_cache_param(struct drm_device *drm,
size = round_up(size, PAGE_SIZE);
- cma_obj = __drm_gem_cma_create(drm, size); + cma_obj = __drm_gem_cma_create(drm, size, noncoherent); if (IS_ERR(cma_obj)) return cma_obj;
@@ -503,7 +518,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev, return ERR_PTR(-EINVAL);
/* Create a CMA GEM buffer. */ - cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size); + cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, false); if (IS_ERR(cma_obj)) return ERR_CAST(cma_obj);
@@ -579,6 +594,48 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) } EXPORT_SYMBOL_GPL(drm_gem_cma_mmap);
+/** + * drm_gem_cma_mmap_noncoherent - memory-map a CMA GEM object with + * non-coherent cache attribute + * @filp: file object + * @vma: VMA for the area to be mapped + * + * Just like drm_gem_cma_mmap, but for a GEM object backed by non-coherent + * memory. + * + * Returns: + * 0 on success or a negative error code on failure. + */ +int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj, + struct vm_area_struct *vma) +{ + struct drm_gem_cma_object *cma_obj; + unsigned long pfn; + int ret; + + /* + * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the + * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map + * the whole buffer. + */ + vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node); + vma->vm_flags &= ~VM_PFNMAP; + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); + + cma_obj = to_drm_gem_cma_obj(obj); + + pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev, cma_obj->paddr)); + + ret = remap_pfn_range(vma, vma->vm_start, pfn, + vma->vm_end - vma->vm_start, + vma->vm_page_prot); + if (ret) + drm_gem_vm_close(vma); + + return ret; +} +EXPORT_SYMBOL_GPL(drm_gem_cma_mmap_noncoherent); + /** * drm_gem_cma_prime_import_sg_table_vmap - PRIME import another driver's * scatter/gather table and get the virtual address of the buffer diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h index 6b44e7492a63..6a3f7e1312cc 100644 --- a/include/drm/drm_gem_cma_helper.h +++ b/include/drm/drm_gem_cma_helper.h @@ -107,6 +107,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev, struct sg_table *sgt); int drm_gem_cma_vmap(struct drm_gem_object *obj, struct dma_buf_map *map); int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma); +int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj, struct vm_area_struct *vma);
/** * DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE - CMA GEM driver operations
This function can be used by drivers that use damage clips and have CMA GEM objects backed by non-coherent memory. Calling this function in a plane's .atomic_update ensures that all the data in the backing memory have been written to RAM.
Signed-off-by: Paul Cercueil paul@crapouillou.net --- drivers/gpu/drm/drm_gem_cma_helper.c | 43 ++++++++++++++++++++++++++++ include/drm/drm_gem_cma_helper.h | 5 ++++ 2 files changed, 48 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index e39b0464e19d..fdae54a18670 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -18,9 +18,14 @@ #include <linux/slab.h>
#include <drm/drm.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_device.h> #include <drm/drm_drv.h> +#include <drm/drm_fourcc.h> +#include <drm/drm_fb_cma_helper.h> +#include <drm/drm_framebuffer.h> #include <drm/drm_gem_cma_helper.h> +#include <drm/drm_plane.h> #include <drm/drm_vma_manager.h>
/** @@ -684,3 +689,41 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev, return obj; } EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap); + +/** + * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory + * @dev: DRM device + * @old_state: Old plane state + * @state: New plane state + * + * This function can be used by drivers that use damage clips and have + * CMA GEM objects backed by non-coherent memory. Calling this function + * in a plane's .atomic_update ensures that all the data in the backing + * memory have been written to RAM. + */ +void drm_gem_cma_sync_data(struct device *dev, + struct drm_plane_state *old_state, + struct drm_plane_state *state) +{ + const struct drm_format_info *finfo = state->fb->format; + struct drm_atomic_helper_damage_iter iter; + unsigned int offset, i; + struct drm_rect clip; + dma_addr_t daddr; + + drm_atomic_helper_damage_iter_init(&iter, old_state, state); + + drm_atomic_for_each_plane_damage(&iter, &clip) { + for (i = 0; i < finfo->num_planes; i++) { + daddr = drm_fb_cma_get_gem_addr(state->fb, state, i); + + /* Ignore x1/x2 values, invalidate complete lines */ + offset = clip.y1 * state->fb->pitches[i]; + + dma_sync_single_for_device(dev, daddr + offset, + (clip.y2 - clip.y1) * state->fb->pitches[i], + DMA_TO_DEVICE); + } + } +} +EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data); diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h index 6a3f7e1312cc..cdd3fb456916 100644 --- a/include/drm/drm_gem_cma_helper.h +++ b/include/drm/drm_gem_cma_helper.h @@ -7,6 +7,7 @@ #include <drm/drm_gem.h>
struct drm_mode_create_dumb; +struct drm_plane_state;
/** * struct drm_gem_cma_object - GEM object backed by CMA memory allocations @@ -190,4 +191,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm, struct dma_buf_attachment *attach, struct sg_table *sgt);
+void drm_gem_cma_sync_data(struct device *dev, + struct drm_plane_state *old_state, + struct drm_plane_state *state); + #endif /* __DRM_GEM_CMA_HELPER_H__ */
With the module parameter ingenic-drm.cached_gem_buffers, it is possible to specify that we want GEM buffers backed by non-coherent memory.
This dramatically speeds up software rendering on Ingenic SoCs, even for tasks where write-combine memory should in theory be faster (e.g. simple blits).
Leave it disabled by default, since it is specific to one use-case (software rendering).
v2: Rework code to work with new DRM APIs regarding plane states
Signed-off-by: Paul Cercueil paul@crapouillou.net --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++++++++++++++++++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 ++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 ++++++- 3 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index d60e1eefc9d1..ba1ac0fcda74 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -9,6 +9,7 @@ #include <linux/component.h> #include <linux/clk.h> #include <linux/dma-mapping.h> +#include <linux/io.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/of_device.h> @@ -23,6 +24,7 @@ #include <drm/drm_color_mgmt.h> #include <drm/drm_crtc.h> #include <drm/drm_crtc_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> @@ -99,6 +101,11 @@ struct ingenic_drm { struct notifier_block clock_nb; };
+static bool ingenic_drm_cached_gem_buf; +module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, bool, 0400); +MODULE_PARM_DESC(cached_gem_buffers, + "Enable fully cached GEM buffers [default=false]"); + static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -410,6 +417,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, old_plane_state->fb->format->format != new_plane_state->fb->format->format)) crtc_state->mode_changed = true;
+ drm_atomic_helper_check_plane_damage(state, new_plane_state); + return 0; }
@@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct ingenic_drm *priv, } }
+void ingenic_drm_sync_data(struct device *dev, + struct drm_plane_state *old_state, + struct drm_plane_state *state) +{ + if (ingenic_drm_cached_gem_buf) + drm_gem_cma_sync_data(dev, old_state, state); +} + static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_drm *priv = drm_device_get_priv(plane->dev); + struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, + plane); struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane); struct drm_crtc_state *crtc_state; @@ -554,6 +573,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, u32 fourcc;
if (newstate && newstate->fb) { + ingenic_drm_sync_data(priv->dev, oldstate, newstate); + crtc_state = newstate->crtc->state;
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0); @@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc) regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0); }
+static struct drm_framebuffer * +ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file *file, + const struct drm_mode_fb_cmd2 *mode_cmd) +{ + if (ingenic_drm_cached_gem_buf) + return drm_gem_fb_create_with_dirty(dev, file, mode_cmd); + + return drm_gem_fb_create(dev, file, mode_cmd); +} + +static int ingenic_drm_gem_cma_dumb_create(struct drm_file *file_priv, + struct drm_device *drm, + struct drm_mode_create_dumb *args) +{ + if (ingenic_drm_cached_gem_buf) + return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args); + + return drm_gem_cma_dumb_create(file_priv, drm, args); +} + DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = { @@ -755,7 +796,7 @@ static const struct drm_driver ingenic_drm_driver_data = { .patchlevel = 0,
.fops = &ingenic_drm_fops, - DRM_GEM_CMA_DRIVER_OPS, + DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
.irq_handler = ingenic_drm_irq_handler, }; @@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs = };
static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = { - .fb_create = drm_gem_fb_create, + .fb_create = ingenic_drm_gem_fb_create, .output_poll_changed = drm_fb_helper_output_poll_changed, .atomic_check = drm_atomic_helper_check, .atomic_commit = drm_atomic_helper_commit, @@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
+ drm_plane_enable_fb_damage_clips(&priv->f1); + drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary, @@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
+ drm_plane_enable_fb_damage_clips(&priv->f0); + if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) { ret = component_bind_all(dev, drm); if (ret) { diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h index 1b4347f7f084..b6bca356e024 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm.h +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device *dev, struct drm_plane *plane, u32 fourcc); void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+void ingenic_drm_sync_data(struct device *dev, + struct drm_plane_state *old_state, + struct drm_plane_state *state); + extern struct platform_driver *ingenic_ipu_driver_ptr;
#endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */ diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 5ae6adab8306..7826eab044ba 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -20,6 +20,7 @@
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_cma_helper.h> #include <drm/drm_fourcc.h> @@ -285,6 +286,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane); + struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, + plane); struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane); const struct drm_format_info *finfo; @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL); }
+ ingenic_drm_sync_data(ipu->master, oldstate, newstate); + /* New addresses will be committed in vblank handler... */ ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0); if (finfo->num_planes > 1) @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc || !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay) - return 0; + goto out_check_damage;
/* Plane must be fully visible */ if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 || @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state)) - return 0; + goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, ipu->denom_w = denom_w; ipu->denom_h = denom_h;
+out_check_damage: + drm_atomic_helper_check_plane_damage(state, new_plane_state); + return 0; }
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d) return err; }
+ drm_plane_enable_fb_damage_clips(plane); + /* * Sharpness settings range is [0,32] * 0 : nearest-neighbor
On Sun, 7 Mar 2021 20:28:35 +0000 Paul Cercueil wrote:
With the module parameter ingenic-drm.cached_gem_buffers, it is possible to specify that we want GEM buffers backed by non-coherent memory.
This dramatically speeds up software rendering on Ingenic SoCs, even for tasks where write-combine memory should in theory be faster (e.g. simple blits).
Wondering if it is due to the tricks at [1].
If so, is dma_alloc_noncoherent() necessary in this patchset?
Christoph can you give us a concise lesson on noncoherency covering at least noncoherent device, noncoherent memory(used in this work), no coherent caching(in [1]), their links to speedup, and the thumb rule to handle noncoherency in workdays. It feels toe curling every time I see noncoherence going downtown with speedup hand in hand.
[1] Subject: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API https://lore.kernel.org/lkml/20210301085236.947011-7-hch@lst.de/#t
Leave it disabled by default, since it is specific to one use-case (software rendering).
v2: Rework code to work with new DRM APIs regarding plane states
Signed-off-by: Paul Cercueil paul@crapouillou.net
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++++++++++++++++++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 ++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 ++++++- 3 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index d60e1eefc9d1..ba1ac0fcda74 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -9,6 +9,7 @@ #include <linux/component.h> #include <linux/clk.h> #include <linux/dma-mapping.h> +#include <linux/io.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/of_device.h> @@ -23,6 +24,7 @@ #include <drm/drm_color_mgmt.h> #include <drm/drm_crtc.h> #include <drm/drm_crtc_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> @@ -99,6 +101,11 @@ struct ingenic_drm { struct notifier_block clock_nb; };
+static bool ingenic_drm_cached_gem_buf; +module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, bool, 0400); +MODULE_PARM_DESC(cached_gem_buffers,
"Enable fully cached GEM buffers [default=false]");
static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -410,6 +417,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, old_plane_state->fb->format->format != new_plane_state->fb->format->format)) crtc_state->mode_changed = true;
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct ingenic_drm *priv, } }
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state)
+{
- if (ingenic_drm_cached_gem_buf)
drm_gem_cma_sync_data(dev, old_state, state);
+}
static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane); struct drm_crtc_state *crtc_state;plane);
@@ -554,6 +573,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, u32 fourcc;
if (newstate && newstate->fb) {
ingenic_drm_sync_data(priv->dev, oldstate, newstate);
crtc_state = newstate->crtc->state;
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc) regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0); }
+static struct drm_framebuffer * +ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file *file,
const struct drm_mode_fb_cmd2 *mode_cmd)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
- return drm_gem_fb_create(dev, file, mode_cmd);
+}
+static int ingenic_drm_gem_cma_dumb_create(struct drm_file *file_priv,
struct drm_device *drm,
struct drm_mode_create_dumb *args)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args);
- return drm_gem_cma_dumb_create(file_priv, drm, args);
+}
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = { @@ -755,7 +796,7 @@ static const struct drm_driver ingenic_drm_driver_data = { .patchlevel = 0,
.fops = &ingenic_drm_fops,
- DRM_GEM_CMA_DRIVER_OPS,
DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
.irq_handler = ingenic_drm_irq_handler,
}; @@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs = };
static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
- .fb_create = drm_gem_fb_create,
- .fb_create = ingenic_drm_gem_fb_create, .output_poll_changed = drm_fb_helper_output_poll_changed, .atomic_check = drm_atomic_helper_check, .atomic_commit = drm_atomic_helper_commit,
@@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f1);
drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f0);
- if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) { ret = component_bind_all(dev, drm); if (ret) {
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h index 1b4347f7f084..b6bca356e024 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm.h +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device *dev, struct drm_plane *plane, u32 fourcc); void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state);
extern struct platform_driver *ingenic_ipu_driver_ptr;
#endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */ diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 5ae6adab8306..7826eab044ba 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -20,6 +20,7 @@
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_cma_helper.h> #include <drm/drm_fourcc.h> @@ -285,6 +286,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane); const struct drm_format_info *finfo;plane);
@@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL); }
- ingenic_drm_sync_data(ipu->master, oldstate, newstate);
- /* New addresses will be committed in vblank handler... */ ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0); if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc || !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
return 0;
goto out_check_damage;
/* Plane must be fully visible */ if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state))
return 0;
goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, ipu->denom_w = denom_w; ipu->denom_h = denom_h;
+out_check_damage:
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d) return err; }
- drm_plane_enable_fb_damage_clips(plane);
- /*
- Sharpness settings range is [0,32]
- 0 : nearest-neighbor
-- 2.30.1
Hi Hillf,
Le lun. 8 mars 2021 à 11:47, Hillf Danton hdanton@sina.com a écrit :
On Sun, 7 Mar 2021 20:28:35 +0000 Paul Cercueil wrote:
With the module parameter ingenic-drm.cached_gem_buffers, it is possible to specify that we want GEM buffers backed by non-coherent memory.
This dramatically speeds up software rendering on Ingenic SoCs, even for tasks where write-combine memory should in theory be faster (e.g. simple blits).
Wondering if it is due to the tricks at [1].
If so, is dma_alloc_noncoherent() necessary in this patchset?
You confuse non-contiguous with non-coherent, which are two different things.
Cheers, -Paul
Christoph can you give us a concise lesson on noncoherency covering at least noncoherent device, noncoherent memory(used in this work), no coherent caching(in [1]), their links to speedup, and the thumb rule to handle noncoherency in workdays. It feels toe curling every time I see noncoherence going downtown with speedup hand in hand.
[1] Subject: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API https://lore.kernel.org/lkml/20210301085236.947011-7-hch@lst.de/#t
Leave it disabled by default, since it is specific to one use-case (software rendering).
v2: Rework code to work with new DRM APIs regarding plane states
Signed-off-by: Paul Cercueil paul@crapouillou.net
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++++++++++++++++++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 ++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 ++++++- 3 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index d60e1eefc9d1..ba1ac0fcda74 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -9,6 +9,7 @@ #include <linux/component.h> #include <linux/clk.h> #include <linux/dma-mapping.h> +#include <linux/io.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/of_device.h> @@ -23,6 +24,7 @@ #include <drm/drm_color_mgmt.h> #include <drm/drm_crtc.h> #include <drm/drm_crtc_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> @@ -99,6 +101,11 @@ struct ingenic_drm { struct notifier_block clock_nb; };
+static bool ingenic_drm_cached_gem_buf; +module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, bool, 0400); +MODULE_PARM_DESC(cached_gem_buffers,
"Enable fully cached GEM buffers [default=false]");
static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -410,6 +417,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, old_plane_state->fb->format->format != new_plane_state->fb->format->format)) crtc_state->mode_changed = true;
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct ingenic_drm *priv, } }
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state)
+{
- if (ingenic_drm_cached_gem_buf)
drm_gem_cma_sync_data(dev, old_state, state);
+}
static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *oldstate =
drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate =plane);
drm_atomic_get_new_plane_state(state, plane); struct drm_crtc_state *crtc_state; @@ -554,6 +573,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, u32 fourcc;
if (newstate && newstate->fb) {
ingenic_drm_sync_data(priv->dev, oldstate, newstate);
crtc_state = newstate->crtc->state;
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc) regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0); }
+static struct drm_framebuffer * +ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file *file,
const struct drm_mode_fb_cmd2 *mode_cmd)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
- return drm_gem_fb_create(dev, file, mode_cmd);
+}
+static int ingenic_drm_gem_cma_dumb_create(struct drm_file *file_priv,
struct drm_device *drm,
struct drm_mode_create_dumb *args)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args);
- return drm_gem_cma_dumb_create(file_priv, drm, args);
+}
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = { @@ -755,7 +796,7 @@ static const struct drm_driver ingenic_drm_driver_data = { .patchlevel = 0,
.fops = &ingenic_drm_fops,
- DRM_GEM_CMA_DRIVER_OPS,
DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
.irq_handler = ingenic_drm_irq_handler, };
@@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs = };
static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
- .fb_create = drm_gem_fb_create,
- .fb_create = ingenic_drm_gem_fb_create, .output_poll_changed = drm_fb_helper_output_poll_changed, .atomic_check = drm_atomic_helper_check, .atomic_commit = drm_atomic_helper_commit,
@@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f1);
drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f0);
- if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) { ret = component_bind_all(dev, drm); if (ret) {
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h index 1b4347f7f084..b6bca356e024 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm.h +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device *dev, struct drm_plane *plane, u32 fourcc); void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state);
extern struct platform_driver *ingenic_ipu_driver_ptr;
#endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */ diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 5ae6adab8306..7826eab044ba 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -20,6 +20,7 @@
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_cma_helper.h> #include <drm/drm_fourcc.h> @@ -285,6 +286,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *oldstate =
drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate =plane);
drm_atomic_get_new_plane_state(state, plane); const struct drm_format_info *finfo; @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL); }
- ingenic_drm_sync_data(ipu->master, oldstate, newstate);
- /* New addresses will be committed in vblank handler... */ ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0); if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc || !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
return 0;
goto out_check_damage;
/* Plane must be fully visible */ if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state))
return 0;
goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, ipu->denom_w = denom_w; ipu->denom_h = denom_h;
+out_check_damage:
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d) return err; }
- drm_plane_enable_fb_damage_clips(plane);
- /*
- Sharpness settings range is [0,32]
- 0 : nearest-neighbor
-- 2.30.1
On Wed, 10 Mar 2021 19:01:01 +0000 Paul Cercueil wrote:
Le lun. 8 mars 2021 � 11:47, Hillf Danton hdanton@sina.com a �crit :
On Sun, 7 Mar 2021 20:28:35 +0000 Paul Cercueil wrote:
With the module parameter ingenic-drm.cached_gem_buffers, it is possible to specify that we want GEM buffers backed by non-coherent memory.
This dramatically speeds up software rendering on Ingenic SoCs, even for tasks where write-combine memory should in theory be faster (e.g. simple blits).
Wondering if it is due to the tricks at [1].
If so, is dma_alloc_noncoherent() necessary in this patchset?
You confuse non-contiguous with non-coherent, which are two different things.
You misunderstood me. From [1] we know coherent caching is arch thing, so your proposal is not mandatory on ARM IMHO - what baffles me is noncoherent back memory can speed up device, coherent ot not, regardless of arch. Can you point me to the reasons behind your speedup?
Cheers, -Paul
Christoph can you give us a concise lesson on noncoherency covering at least noncoherent device, noncoherent memory(used in this work), no coherent caching(in [1]), their links to speedup, and the thumb rule to handle noncoherency in workdays. It feels toe curling every time I see noncoherence going downtown with speedup hand in hand.
[1] Subject: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API https://lore.kernel.org/lkml/20210301085236.947011-7-hch@lst.de/#t
Leave it disabled by default, since it is specific to one use-case (software rendering).
v2: Rework code to work with new DRM APIs regarding plane states
Signed-off-by: Paul Cercueil paul@crapouillou.net
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++++++++++++++++++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 ++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 ++++++- 3 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index d60e1eefc9d1..ba1ac0fcda74 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -9,6 +9,7 @@ #include <linux/component.h> #include <linux/clk.h> #include <linux/dma-mapping.h> +#include <linux/io.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/of_device.h> @@ -23,6 +24,7 @@ #include <drm/drm_color_mgmt.h> #include <drm/drm_crtc.h> #include <drm/drm_crtc_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> @@ -99,6 +101,11 @@ struct ingenic_drm { struct notifier_block clock_nb; };
+static bool ingenic_drm_cached_gem_buf; +module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, bool, 0400); +MODULE_PARM_DESC(cached_gem_buffers,
"Enable fully cached GEM buffers [default=false]");
static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -410,6 +417,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, old_plane_state->fb->format->format != new_plane_state->fb->format->format)) crtc_state->mode_changed = true;
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct ingenic_drm *priv, } }
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state)
+{
- if (ingenic_drm_cached_gem_buf)
drm_gem_cma_sync_data(dev, old_state, state);
+}
static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *oldstate =
drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate =plane);
drm_atomic_get_new_plane_state(state, plane); struct drm_crtc_state *crtc_state; @@ -554,6 +573,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, u32 fourcc;
if (newstate && newstate->fb) {
ingenic_drm_sync_data(priv->dev, oldstate, newstate);
crtc_state = newstate->crtc->state;
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc) regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0); }
+static struct drm_framebuffer * +ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file *file,
const struct drm_mode_fb_cmd2 *mode_cmd)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
- return drm_gem_fb_create(dev, file, mode_cmd);
+}
+static int ingenic_drm_gem_cma_dumb_create(struct drm_file *file_priv,
struct drm_device *drm,
struct drm_mode_create_dumb *args)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args);
- return drm_gem_cma_dumb_create(file_priv, drm, args);
+}
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = { @@ -755,7 +796,7 @@ static const struct drm_driver ingenic_drm_driver_data = { .patchlevel = 0,
.fops = &ingenic_drm_fops,
- DRM_GEM_CMA_DRIVER_OPS,
DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
.irq_handler = ingenic_drm_irq_handler, };
@@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs = };
static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
- .fb_create = drm_gem_fb_create,
- .fb_create = ingenic_drm_gem_fb_create, .output_poll_changed = drm_fb_helper_output_poll_changed, .atomic_check = drm_atomic_helper_check, .atomic_commit = drm_atomic_helper_commit,
@@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f1);
drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f0);
- if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) { ret = component_bind_all(dev, drm); if (ret) {
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h index 1b4347f7f084..b6bca356e024 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm.h +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device *dev, struct drm_plane *plane, u32 fourcc); void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state);
extern struct platform_driver *ingenic_ipu_driver_ptr;
#endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */ diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 5ae6adab8306..7826eab044ba 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -20,6 +20,7 @@
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_cma_helper.h> #include <drm/drm_fourcc.h> @@ -285,6 +286,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *oldstate =
drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate =plane);
drm_atomic_get_new_plane_state(state, plane); const struct drm_format_info *finfo; @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL); }
- ingenic_drm_sync_data(ipu->master, oldstate, newstate);
- /* New addresses will be committed in vblank handler... */ ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0); if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc || !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
return 0;
goto out_check_damage;
/* Plane must be fully visible */ if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state))
return 0;
goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, ipu->denom_w = denom_w; ipu->denom_h = denom_h;
+out_check_damage:
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d) return err; }
- drm_plane_enable_fb_damage_clips(plane);
- /*
- Sharpness settings range is [0,32]
- 0 : nearest-neighbor
-- 2.30.1
Le jeu. 11 mars 2021 à 10:27, Hillf Danton hdanton@sina.com a écrit :
On Wed, 10 Mar 2021 19:01:01 +0000 Paul Cercueil wrote:
Le lun. 8 mars 2021 11:47, Hillf Danton hdanton@sina.com a crit :
On Sun, 7 Mar 2021 20:28:35 +0000 Paul Cercueil wrote:
With the module parameter ingenic-drm.cached_gem_buffers, it is possible to specify that we want GEM buffers backed by non-coherent memory.
This dramatically speeds up software rendering on Ingenic SoCs, even for tasks where write-combine memory should in theory be faster (e.g. simple blits).
Wondering if it is due to the tricks at [1].
If so, is dma_alloc_noncoherent() necessary in this patchset?
You confuse non-contiguous with non-coherent, which are two different things.
You misunderstood me. From [1] we know coherent caching is arch thing, so your proposal is not mandatory on ARM IMHO - what baffles me is noncoherent back memory can speed up device, coherent ot not, regardless of arch. Can you point me to the reasons behind your speedup?
Well, I did write in the cover letter that it would help *some* SoCs, so it is not meant to be a default setting. As you can see in the last patch, it is opt-in. Ingenic SoCs are MIPS SoCs, btw.
One thing I should add is that this speeds up *software* rendering (vs. write-combine). It makes perfect sense that it speeds up things like alpha-blending, but I can't explain why it speeds up standard blits. My guess is that cache line invalidation is just extremely fast on this SoC.
-Paul
Cheers, -Paul
Christoph can you give us a concise lesson on noncoherency covering at least noncoherent device, noncoherent memory(used in this work), no coherent caching(in [1]), their links to speedup, and the thumb rule to handle noncoherency in workdays. It feels toe curling every time I see noncoherence going downtown with speedup hand in hand.
[1] Subject: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API https://lore.kernel.org/lkml/20210301085236.947011-7-hch@lst.de/#t
Leave it disabled by default, since it is specific to one use-case (software rendering).
v2: Rework code to work with new DRM APIs regarding plane states
Signed-off-by: Paul Cercueil paul@crapouillou.net
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++++++++++++++++++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 ++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 ++++++- 3 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index d60e1eefc9d1..ba1ac0fcda74 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -9,6 +9,7 @@ #include <linux/component.h> #include <linux/clk.h> #include <linux/dma-mapping.h> +#include <linux/io.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/of_device.h> @@ -23,6 +24,7 @@ #include <drm/drm_color_mgmt.h> #include <drm/drm_crtc.h> #include <drm/drm_crtc_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> @@ -99,6 +101,11 @@ struct ingenic_drm { struct notifier_block clock_nb; };
+static bool ingenic_drm_cached_gem_buf; +module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, bool, 0400); +MODULE_PARM_DESC(cached_gem_buffers,
"Enable fully cached GEM buffers [default=false]");
static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -410,6 +417,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, old_plane_state->fb->format->format != new_plane_state->fb->format->format)) crtc_state->mode_changed = true;
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct ingenic_drm *priv, } }
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state)
+{
- if (ingenic_drm_cached_gem_buf)
drm_gem_cma_sync_data(dev, old_state, state);
+}
static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *oldstate =
drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate =plane);
drm_atomic_get_new_plane_state(state, plane); struct drm_crtc_state *crtc_state; @@ -554,6 +573,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, u32 fourcc;
if (newstate && newstate->fb) {
ingenic_drm_sync_data(priv->dev, oldstate, newstate);
crtc_state = newstate->crtc->state; addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc) regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0); }
+static struct drm_framebuffer * +ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file *file,
const struct drm_mode_fb_cmd2 *mode_cmd)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
- return drm_gem_fb_create(dev, file, mode_cmd);
+}
+static int ingenic_drm_gem_cma_dumb_create(struct drm_file *file_priv,
struct drm_device *drm,
struct drm_mode_create_dumb *args)
+{
- if (ingenic_drm_cached_gem_buf)
return drm_gem_cma_dumb_create_noncoherent(file_priv, drm,
args);
- return drm_gem_cma_dumb_create(file_priv, drm, args);
+}
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = { @@ -755,7 +796,7 @@ static const struct drm_driver ingenic_drm_driver_data = { .patchlevel = 0,
.fops = &ingenic_drm_fops,
- DRM_GEM_CMA_DRIVER_OPS,
DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
.irq_handler = ingenic_drm_irq_handler, }; @@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs
ingenic_drm_encoder_helper_funcs = };
static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
- .fb_create = drm_gem_fb_create,
- .fb_create = ingenic_drm_gem_fb_create, .output_poll_changed = drm_fb_helper_output_poll_changed, .atomic_check = drm_atomic_helper_check, .atomic_commit = drm_atomic_helper_commit,
@@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
- drm_plane_enable_fb_damage_clips(&priv->f1);
- drm_crtc_helper_add(&priv->crtc,
&ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return ret; }
drm_plane_enable_fb_damage_clips(&priv->f0);
if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) { ret = component_bind_all(dev, drm); if (ret) {
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h index 1b4347f7f084..b6bca356e024 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm.h +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device *dev, struct drm_plane *plane, u32 fourcc); void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+void ingenic_drm_sync_data(struct device *dev,
struct drm_plane_state *old_state,
struct drm_plane_state *state);
extern struct platform_driver *ingenic_ipu_driver_ptr;
#endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */ diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 5ae6adab8306..7826eab044ba 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -20,6 +20,7 @@
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> +#include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_cma_helper.h> #include <drm/drm_fourcc.h> @@ -285,6 +286,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *oldstate =
drm_atomic_get_old_plane_state(state,
struct drm_plane_state *newstate =plane);
drm_atomic_get_new_plane_state(state, plane); const struct drm_format_info *finfo; @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL); }
- ingenic_drm_sync_data(ipu->master, oldstate, newstate);
- /* New addresses will be committed in vblank handler... */ ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate,
0); if (finfo->num_planes > 1) @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc || !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
return 0;
goto out_check_damage;
/* Plane must be fully visible */ if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0
|| @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state))
return 0;
goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane, ipu->denom_w = denom_w; ipu->denom_h = denom_h;
+out_check_damage:
- drm_atomic_helper_check_plane_damage(state, new_plane_state);
- return 0;
}
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d) return err; }
- drm_plane_enable_fb_damage_clips(plane);
- /*
- Sharpness settings range is [0,32]
- 0 : nearest-neighbor
-- 2.30.1
Hi Paul,
having individual functions for each mode only makes sense if the decision is at compile time. But in patch 5, you're working around your earlier design by introducing in-driver helpers that select the correct CMA function.
In SHMEM helpers we have the flag map_wc in the GEM structure that selects the pages caching mode (wc vs uncached). I think CMA should use this design as well. Have a map_noncoherent flag in the CMA GEM object and set it from the driver's implementation of gem_create_object.
And in the long run, we could try to consolidate all drivers/helpers mapping flags in struct drm_gem_object.
Best regards Thomas
Am 07.03.21 um 21:28 schrieb Paul Cercueil:
Rework of my previous patchset which added support for GEM buffers backed by non-coherent memory to the ingenic-drm driver.
Having GEM buffers backed by non-coherent memory is interesting in the particular case where it is faster to render to a non-coherent buffer then sync the data cache, than to render to a write-combine buffer, and (by extension) much faster than using a shadow buffer. This is true for instance on some Ingenic SoCs, where even simple blits (e.g. memcpy) are about three times faster using this method.
For the record, the previous patchset was accepted for 5.10 then had to be reverted, as it conflicted with some changes made to the DMA API.
This new patchset is pretty different as it adds the functionality to the DRM core. The first three patches add variants to existing functions but with the "non-coherent memory" twist, exported as GPL symbols. The fourth patch adds a function to be used with the damage helpers. Finally, the last patch adds support for non-coherent GEM buffers to the ingenic-drm driver. The functionality is enabled through a module parameter, and is disabled by default.
Cheers, -Paul
Paul Cercueil (5): drm: Add and export function drm_gem_cma_create_noncoherent drm: Add and export function drm_gem_cma_dumb_create_noncoherent drm: Add and export function drm_gem_cma_mmap_noncoherent drm: Add and export function drm_gem_cma_sync_data drm/ingenic: Add option to alloc cached GEM buffers
drivers/gpu/drm/drm_gem_cma_helper.c | 223 +++++++++++++++++++--- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 + drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +- include/drm/drm_gem_cma_helper.h | 13 ++ 5 files changed, 273 insertions(+), 30 deletions(-)
Hi Thomas,
Le lun. 8 mars 2021 à 9:41, Thomas Zimmermann tzimmermann@suse.de a écrit :
Hi Paul,
having individual functions for each mode only makes sense if the decision is at compile time. But in patch 5, you're working around your earlier design by introducing in-driver helpers that select the correct CMA function.
In SHMEM helpers we have the flag map_wc in the GEM structure that selects the pages caching mode (wc vs uncached). I think CMA should use this design as well. Have a map_noncoherent flag in the CMA GEM object and set it from the driver's implementation of gem_create_object.
Is that a new addition? That severely reduces the patchset size, which is perfect.
I'll send a V3 then.
Cheers, -Paul
And in the long run, we could try to consolidate all drivers/helpers mapping flags in struct drm_gem_object.
Best regards Thomas
Am 07.03.21 um 21:28 schrieb Paul Cercueil:
Rework of my previous patchset which added support for GEM buffers backed by non-coherent memory to the ingenic-drm driver.
Having GEM buffers backed by non-coherent memory is interesting in the particular case where it is faster to render to a non-coherent buffer then sync the data cache, than to render to a write-combine buffer, and (by extension) much faster than using a shadow buffer. This is true for instance on some Ingenic SoCs, where even simple blits (e.g. memcpy) are about three times faster using this method.
For the record, the previous patchset was accepted for 5.10 then had to be reverted, as it conflicted with some changes made to the DMA API.
This new patchset is pretty different as it adds the functionality to the DRM core. The first three patches add variants to existing functions but with the "non-coherent memory" twist, exported as GPL symbols. The fourth patch adds a function to be used with the damage helpers. Finally, the last patch adds support for non-coherent GEM buffers to the ingenic-drm driver. The functionality is enabled through a module parameter, and is disabled by default.
Cheers, -Paul
Paul Cercueil (5): drm: Add and export function drm_gem_cma_create_noncoherent drm: Add and export function drm_gem_cma_dumb_create_noncoherent drm: Add and export function drm_gem_cma_mmap_noncoherent drm: Add and export function drm_gem_cma_sync_data drm/ingenic: Add option to alloc cached GEM buffers
drivers/gpu/drm/drm_gem_cma_helper.c | 223 +++++++++++++++++++--- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 + drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +- include/drm/drm_gem_cma_helper.h | 13 ++ 5 files changed, 273 insertions(+), 30 deletions(-)
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
Hi
Am 10.03.21 um 20:02 schrieb Paul Cercueil:
Hi Thomas,
Le lun. 8 mars 2021 à 9:41, Thomas Zimmermann tzimmermann@suse.de a écrit :
Hi Paul,
having individual functions for each mode only makes sense if the decision is at compile time. But in patch 5, you're working around your earlier design by introducing in-driver helpers that select the correct CMA function.
In SHMEM helpers we have the flag map_wc in the GEM structure that selects the pages caching mode (wc vs uncached). I think CMA should
Re-reading this, it should rather be WC and cached.
use this design as well. Have a map_noncoherent flag in the CMA GEM object and set it from the driver's implementation of gem_create_object.
Is that a new addition? That severely reduces the patchset size, which is perfect.
It was added a few months ago, around the time you send the first version of the patches at hand.
Originally, SHMEM uses write combining by default. And several drivers used default mapping flags instead (so usually cached). IIRC I streamlined everything and flipped the default. Most drivers can use cached mappings and only some require WC.
Recently there was also a patchset that added support for cached video RAM (or something like that). So at some point we could store these flags in drm_gem_object. For now, I'd just put a flag into drm_gem_cma_object.
I'll send a V3 then.
Great!
Best regards Thomas
Cheers, -Paul
And in the long run, we could try to consolidate all drivers/helpers mapping flags in struct drm_gem_object.
Best regards Thomas
Am 07.03.21 um 21:28 schrieb Paul Cercueil:
Rework of my previous patchset which added support for GEM buffers backed by non-coherent memory to the ingenic-drm driver.
Having GEM buffers backed by non-coherent memory is interesting in the particular case where it is faster to render to a non-coherent buffer then sync the data cache, than to render to a write-combine buffer, and (by extension) much faster than using a shadow buffer. This is true for instance on some Ingenic SoCs, where even simple blits (e.g. memcpy) are about three times faster using this method.
For the record, the previous patchset was accepted for 5.10 then had to be reverted, as it conflicted with some changes made to the DMA API.
This new patchset is pretty different as it adds the functionality to the DRM core. The first three patches add variants to existing functions but with the "non-coherent memory" twist, exported as GPL symbols. The fourth patch adds a function to be used with the damage helpers. Finally, the last patch adds support for non-coherent GEM buffers to the ingenic-drm driver. The functionality is enabled through a module parameter, and is disabled by default.
Cheers, -Paul
Paul Cercueil (5): drm: Add and export function drm_gem_cma_create_noncoherent drm: Add and export function drm_gem_cma_dumb_create_noncoherent drm: Add and export function drm_gem_cma_mmap_noncoherent drm: Add and export function drm_gem_cma_sync_data drm/ingenic: Add option to alloc cached GEM buffers
drivers/gpu/drm/drm_gem_cma_helper.c | 223 +++++++++++++++++++--- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++- drivers/gpu/drm/ingenic/ingenic-drm.h | 4 + drivers/gpu/drm/ingenic/ingenic-ipu.c | 14 +- include/drm/drm_gem_cma_helper.h | 13 ++ 5 files changed, 273 insertions(+), 30 deletions(-)
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
dri-devel@lists.freedesktop.org