This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and PVR EXA from https://github.com/maemo-leste/xf86-video-omap
Ivaylo Dimitrov (3): drm: omapdrm: simplify omap_gem_pin drm: omapdrm: Support exporting of non-contiguous GEM BOs drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- 3 files changed, 116 insertions(+), 90 deletions(-)
Move tiler related code to its own function.
Signed-off-by: Ivaylo Dimitrov ivo.g.dimitrov.75@gmail.com --- drivers/gpu/drm/omapdrm/omap_gem.c | 75 +++++++++++++++++++++----------------- 1 file changed, 42 insertions(+), 33 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c index b0fa174..bb12cb4 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem.c +++ b/drivers/gpu/drm/omapdrm/omap_gem.c @@ -750,6 +750,46 @@ void omap_gem_dma_sync_buffer(struct drm_gem_object *obj, } }
+static int omap_gem_pin_tiler(struct drm_gem_object *obj) +{ + struct omap_gem_object *omap_obj = to_omap_bo(obj); + u32 npages = obj->size >> PAGE_SHIFT; + enum tiler_fmt fmt = gem2fmt(omap_obj->flags); + struct tiler_block *block; + int ret; + + BUG_ON(omap_obj->block); + + if (omap_obj->flags & OMAP_BO_TILED_MASK) { + block = tiler_reserve_2d(fmt, omap_obj->width, omap_obj->height, + PAGE_SIZE); + } else { + block = tiler_reserve_1d(obj->size); + } + + if (IS_ERR(block)) { + ret = PTR_ERR(block); + dev_err(obj->dev->dev, "could not remap: %d (%d)\n", ret, fmt); + goto fail; + } + + /* TODO: enable async refill.. */ + ret = tiler_pin(block, omap_obj->pages, npages, omap_obj->roll, true); + if (ret) { + tiler_release(block); + dev_err(obj->dev->dev, "could not pin: %d\n", ret); + goto fail; + } + + omap_obj->dma_addr = tiler_ssptr(block); + omap_obj->block = block; + + DBG("got dma address: %pad", &omap_obj->dma_addr); + +fail: + return ret; +} + /** * omap_gem_pin() - Pin a GEM object in memory * @obj: the GEM object @@ -774,11 +814,6 @@ int omap_gem_pin(struct drm_gem_object *obj, dma_addr_t *dma_addr)
if (!omap_gem_is_contiguous(omap_obj) && priv->has_dmm) { if (refcount_read(&omap_obj->dma_addr_cnt) == 0) { - u32 npages = obj->size >> PAGE_SHIFT; - enum tiler_fmt fmt = gem2fmt(omap_obj->flags); - struct tiler_block *block; - - BUG_ON(omap_obj->block);
refcount_set(&omap_obj->dma_addr_cnt, 1);
@@ -786,35 +821,9 @@ int omap_gem_pin(struct drm_gem_object *obj, dma_addr_t *dma_addr) if (ret) goto fail;
- if (omap_obj->flags & OMAP_BO_TILED_MASK) { - block = tiler_reserve_2d(fmt, - omap_obj->width, - omap_obj->height, PAGE_SIZE); - } else { - block = tiler_reserve_1d(obj->size); - } - - if (IS_ERR(block)) { - ret = PTR_ERR(block); - dev_err(obj->dev->dev, - "could not remap: %d (%d)\n", ret, fmt); - goto fail; - } - - /* TODO: enable async refill.. */ - ret = tiler_pin(block, omap_obj->pages, npages, - omap_obj->roll, true); - if (ret) { - tiler_release(block); - dev_err(obj->dev->dev, - "could not pin: %d\n", ret); + ret = omap_gem_pin_tiler(obj); + if (ret) goto fail; - } - - omap_obj->dma_addr = tiler_ssptr(block); - omap_obj->block = block; - - DBG("got dma address: %pad", &omap_obj->dma_addr); } else { refcount_inc(&omap_obj->dma_addr_cnt); }
Currently code allocates non-scanout BOs from SHMEM and those objects are accessible to userspace by mmap(). However, on devices with no DMM (like OMAP3), the same objects are not accessible by kernel drivers that want to render to them as code refuses to export them. In turn this means that on devices with no DMM, all buffers must be allocated as scanout, otherwise only CPU can access them. On those devices, scanout buffers are allocated from CMA, making those allocations highly unreliable.
Fix that by implementing functionality to export SHMEM backed buffers on devices with no DMM. This makes CMA memory only being used when needed, instead for every buffer that has to be off-CPU rendered.
Tested on Motorola Droid4 and Nokia N900
Signed-off-by: Ivaylo Dimitrov ivo.g.dimitrov.75@gmail.com --- drivers/gpu/drm/omapdrm/omap_gem.c | 125 +++++++++++++++++------------- drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- 3 files changed, 73 insertions(+), 60 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c index bb12cb4..41c1a6d 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem.c +++ b/drivers/gpu/drm/omapdrm/omap_gem.c @@ -38,7 +38,7 @@ struct omap_gem_object { /** roll applied when mapping to DMM */ u32 roll;
- /** protects dma_addr_cnt, block, pages, dma_addrs and vaddr */ + /** protects pin_cnt, block, pages, dma_addrs and vaddr */ struct mutex lock;
/** @@ -50,24 +50,24 @@ struct omap_gem_object { * - buffers imported from dmabuf (with the OMAP_BO_MEM_DMABUF flag set) * if they are physically contiguous (when sgt->orig_nents == 1) * - * - buffers mapped through the TILER when dma_addr_cnt is not zero, in - * which case the DMA address points to the TILER aperture + * - buffers mapped through the TILER when pin_cnt is not zero, in which + * case the DMA address points to the TILER aperture * * Physically contiguous buffers have their DMA address equal to the * physical address as we don't remap those buffers through the TILER. * * Buffers mapped to the TILER have their DMA address pointing to the - * TILER aperture. As TILER mappings are refcounted (through - * dma_addr_cnt) the DMA address must be accessed through omap_gem_pin() - * to ensure that the mapping won't disappear unexpectedly. References - * must be released with omap_gem_unpin(). + * TILER aperture. As TILER mappings are refcounted (through pin_cnt) + * the DMA address must be accessed through omap_gem_pin() to ensure + * that the mapping won't disappear unexpectedly. References must be + * released with omap_gem_unpin(). */ dma_addr_t dma_addr;
/** - * # of users of dma_addr + * # of users */ - refcount_t dma_addr_cnt; + refcount_t pin_cnt;
/** * If the buffer has been imported from a dmabuf the OMAP_DB_DMABUF flag @@ -812,32 +812,28 @@ int omap_gem_pin(struct drm_gem_object *obj, dma_addr_t *dma_addr)
mutex_lock(&omap_obj->lock);
- if (!omap_gem_is_contiguous(omap_obj) && priv->has_dmm) { - if (refcount_read(&omap_obj->dma_addr_cnt) == 0) { + if (!omap_gem_is_contiguous(omap_obj)) { + if (refcount_read(&omap_obj->pin_cnt) == 0) {
- refcount_set(&omap_obj->dma_addr_cnt, 1); + refcount_set(&omap_obj->pin_cnt, 1);
ret = omap_gem_attach_pages(obj); if (ret) goto fail;
- ret = omap_gem_pin_tiler(obj); - if (ret) - goto fail; + if (priv->has_dmm) { + ret = omap_gem_pin_tiler(obj); + if (ret) + goto fail; + } } else { - refcount_inc(&omap_obj->dma_addr_cnt); + refcount_inc(&omap_obj->pin_cnt); } - - if (dma_addr) - *dma_addr = omap_obj->dma_addr; - } else if (omap_gem_is_contiguous(omap_obj)) { - if (dma_addr) - *dma_addr = omap_obj->dma_addr; - } else { - ret = -EINVAL; - goto fail; }
+ if (dma_addr) + *dma_addr = omap_obj->dma_addr; + fail: mutex_unlock(&omap_obj->lock);
@@ -856,27 +852,29 @@ static void omap_gem_unpin_locked(struct drm_gem_object *obj) struct omap_gem_object *omap_obj = to_omap_bo(obj); int ret;
- if (omap_gem_is_contiguous(omap_obj) || !priv->has_dmm) + if (omap_gem_is_contiguous(omap_obj)) return;
- if (refcount_dec_and_test(&omap_obj->dma_addr_cnt)) { + if (refcount_dec_and_test(&omap_obj->pin_cnt)) { if (omap_obj->sgt) { sg_free_table(omap_obj->sgt); kfree(omap_obj->sgt); omap_obj->sgt = NULL; } - ret = tiler_unpin(omap_obj->block); - if (ret) { - dev_err(obj->dev->dev, - "could not unpin pages: %d\n", ret); - } - ret = tiler_release(omap_obj->block); - if (ret) { - dev_err(obj->dev->dev, - "could not release unmap: %d\n", ret); + if (priv->has_dmm) { + ret = tiler_unpin(omap_obj->block); + if (ret) { + dev_err(obj->dev->dev, + "could not unpin pages: %d\n", ret); + } + ret = tiler_release(omap_obj->block); + if (ret) { + dev_err(obj->dev->dev, + "could not release unmap: %d\n", ret); + } + omap_obj->dma_addr = 0; + omap_obj->block = NULL; } - omap_obj->dma_addr = 0; - omap_obj->block = NULL; } }
@@ -909,7 +907,7 @@ int omap_gem_rotated_dma_addr(struct drm_gem_object *obj, u32 orient,
mutex_lock(&omap_obj->lock);
- if ((refcount_read(&omap_obj->dma_addr_cnt) > 0) && omap_obj->block && + if ((refcount_read(&omap_obj->pin_cnt) > 0) && omap_obj->block && (omap_obj->flags & OMAP_BO_TILED_MASK)) { *dma_addr = tiler_tsptr(omap_obj->block, orient, x, y); ret = 0; @@ -977,7 +975,8 @@ int omap_gem_put_pages(struct drm_gem_object *obj) return 0; }
-struct sg_table *omap_gem_get_sg(struct drm_gem_object *obj) +struct sg_table *omap_gem_get_sg(struct drm_gem_object *obj, + enum dma_data_direction dir) { struct omap_gem_object *omap_obj = to_omap_bo(obj); dma_addr_t addr; @@ -1002,28 +1001,44 @@ struct sg_table *omap_gem_get_sg(struct drm_gem_object *obj) goto err_unpin; }
- if (omap_obj->flags & OMAP_BO_TILED_MASK) { - enum tiler_fmt fmt = gem2fmt(omap_obj->flags); + if (addr) { + if (omap_obj->flags & OMAP_BO_TILED_MASK) { + enum tiler_fmt fmt = gem2fmt(omap_obj->flags);
- len = omap_obj->width << (int)fmt; - count = omap_obj->height; - stride = tiler_stride(fmt, 0); + len = omap_obj->width << (int)fmt; + count = omap_obj->height; + stride = tiler_stride(fmt, 0); + } else { + len = obj->size; + count = 1; + stride = 0; + } } else { - len = obj->size; - count = 1; - stride = 0; + count = obj->size >> PAGE_SHIFT; }
ret = sg_alloc_table(sgt, count, GFP_KERNEL); if (ret) goto err_free;
- for_each_sg(sgt->sgl, sg, count, i) { - sg_set_page(sg, phys_to_page(addr), len, offset_in_page(addr)); - sg_dma_address(sg) = addr; - sg_dma_len(sg) = len; + /* this must be after omap_gem_pin() to ensure we have pages attached */ + omap_gem_dma_sync_buffer(obj, dir); + + if (addr) { + for_each_sg(sgt->sgl, sg, count, i) { + sg_set_page(sg, phys_to_page(addr), len, + offset_in_page(addr)); + sg_dma_address(sg) = addr; + sg_dma_len(sg) = len;
- addr += stride; + addr += stride; + } + } else { + for_each_sg(sgt->sgl, sg, count, i) { + sg_set_page(sg, omap_obj->pages[i], PAGE_SIZE, 0); + sg_dma_address(sg) = omap_obj->dma_addrs[i]; + sg_dma_len(sg) = PAGE_SIZE; + } }
omap_obj->sgt = sgt; @@ -1133,7 +1148,7 @@ void omap_gem_describe(struct drm_gem_object *obj, struct seq_file *m) seq_printf(m, "%08x: %2d (%2d) %08llx %pad (%2d) %p %4d", omap_obj->flags, obj->name, kref_read(&obj->refcount), off, &omap_obj->dma_addr, - refcount_read(&omap_obj->dma_addr_cnt), + refcount_read(&omap_obj->pin_cnt), omap_obj->vaddr, omap_obj->roll);
if (omap_obj->flags & OMAP_BO_TILED_MASK) { @@ -1196,7 +1211,7 @@ static void omap_gem_free_object(struct drm_gem_object *obj) mutex_lock(&omap_obj->lock);
/* The object should not be pinned. */ - WARN_ON(refcount_read(&omap_obj->dma_addr_cnt) > 0); + WARN_ON(refcount_read(&omap_obj->pin_cnt) > 0);
if (omap_obj->pages) { if (omap_obj->flags & OMAP_BO_MEM_DMABUF) diff --git a/drivers/gpu/drm/omapdrm/omap_gem.h b/drivers/gpu/drm/omapdrm/omap_gem.h index 19209e3..4d44889 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem.h +++ b/drivers/gpu/drm/omapdrm/omap_gem.h @@ -82,7 +82,8 @@ int omap_gem_get_pages(struct drm_gem_object *obj, struct page ***pages, int omap_gem_rotated_dma_addr(struct drm_gem_object *obj, u32 orient, int x, int y, dma_addr_t *dma_addr); int omap_gem_tiled_stride(struct drm_gem_object *obj, u32 orient); -struct sg_table *omap_gem_get_sg(struct drm_gem_object *obj); +struct sg_table *omap_gem_get_sg(struct drm_gem_object *obj, + enum dma_data_direction dir); void omap_gem_put_sg(struct drm_gem_object *obj, struct sg_table *sgt);
#endif /* __OMAPDRM_GEM_H__ */ diff --git a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c index 57af3d9..8d15e07 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c +++ b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c @@ -23,13 +23,10 @@ static struct sg_table *omap_gem_map_dma_buf( { struct drm_gem_object *obj = attachment->dmabuf->priv; struct sg_table *sg; - sg = omap_gem_get_sg(obj); + sg = omap_gem_get_sg(obj, dir); if (IS_ERR(sg)) return sg;
- /* this must be after omap_gem_pin() to ensure we have pages attached */ - omap_gem_dma_sync_buffer(obj, dir); - return sg; }
On devices with DMM, all allocations are done through either DMM or TILER. DMM/TILER being a limited resource means that such allocations will start to fail before actual free memory is exhausted. What is even worse is that with time DMM/TILER space gets fragmented to the point that even if we have enough free DMM/TILER space and free memory, allocation fails because there is no big enough free block in DMM/TILER space.
Such failures can be easily observed with OMAP xorg DDX, for example - starting few GUI applications (so buffers for their windows are allocated) and then rotating landscape<->portrait while closing and opening new windows soon results in allocation failures.
Fix that by mapping buffers through DMM/TILER only when really needed, like, for scanout buffers.
Signed-off-by: Ivaylo Dimitrov ivo.g.dimitrov.75@gmail.com --- drivers/gpu/drm/omapdrm/omap_gem.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c index 41c1a6d..cf57179 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem.c +++ b/drivers/gpu/drm/omapdrm/omap_gem.c @@ -821,10 +821,12 @@ int omap_gem_pin(struct drm_gem_object *obj, dma_addr_t *dma_addr) if (ret) goto fail;
- if (priv->has_dmm) { - ret = omap_gem_pin_tiler(obj); - if (ret) - goto fail; + if (omap_obj->flags & OMAP_BO_SCANOUT) { + if (priv->has_dmm) { + ret = omap_gem_pin_tiler(obj); + if (ret) + goto fail; + } } } else { refcount_inc(&omap_obj->pin_cnt); @@ -861,6 +863,8 @@ static void omap_gem_unpin_locked(struct drm_gem_object *obj) kfree(omap_obj->sgt); omap_obj->sgt = NULL; } + if (!(omap_obj->flags & OMAP_BO_SCANOUT)) + return; if (priv->has_dmm) { ret = tiler_unpin(omap_obj->block); if (ret) {
Hi,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
On devices with DMM, all allocations are done through either DMM or TILER. DMM/TILER being a limited resource means that such allocations will start to fail before actual free memory is exhausted. What is even worse is that with time DMM/TILER space gets fragmented to the point that even if we have enough free DMM/TILER space and free memory, allocation fails because there is no big enough free block in DMM/TILER space.
Such failures can be easily observed with OMAP xorg DDX, for example - starting few GUI applications (so buffers for their windows are allocated) and then rotating landscape<->portrait while closing and opening new windows soon results in allocation failures.
Fix that by mapping buffers through DMM/TILER only when really needed, like, for scanout buffers.
Doesn't this break users that get a buffer from omapdrm and expect it to be contiguous?
Tomi
On 17.02.22 г. 14:46 ч., Tomi Valkeinen wrote:
Hi,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
On devices with DMM, all allocations are done through either DMM or TILER. DMM/TILER being a limited resource means that such allocations will start to fail before actual free memory is exhausted. What is even worse is that with time DMM/TILER space gets fragmented to the point that even if we have enough free DMM/TILER space and free memory, allocation fails because there is no big enough free block in DMM/TILER space.
Such failures can be easily observed with OMAP xorg DDX, for example - starting few GUI applications (so buffers for their windows are allocated) and then rotating landscape<->portrait while closing and opening new windows soon results in allocation failures.
Fix that by mapping buffers through DMM/TILER only when really needed, like, for scanout buffers.
Doesn't this break users that get a buffer from omapdrm and expect it to be contiguous?
If you mean dumb buffer, then no, this does not break users as dumb buffers are allocated as scanout:
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/omapdrm/omap_...
If you mean omap_bo allocated buffers, then if users want linear(scanout) buffer, then they request it explicitly by passing OMAP_BO_SCANOUT.
Ivo
gentle ping
On 19.01.22 г. 12:23 ч., Ivaylo Dimitrov wrote:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and PVR EXA from https://github.com/maemo-leste/xf86-video-omap
Ivaylo Dimitrov (3): drm: omapdrm: simplify omap_gem_pin drm: omapdrm: Support exporting of non-contiguous GEM BOs drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- 3 files changed, 116 insertions(+), 90 deletions(-)
Hi
Am 19.01.22 um 11:23 schrieb Ivaylo Dimitrov:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and PVR EXA from https://github.com/maemo-leste/xf86-video-omap
Ivaylo Dimitrov (3): drm: omapdrm: simplify omap_gem_pin drm: omapdrm: Support exporting of non-contiguous GEM BOs drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
With the little expertise I have with omapdrm:
Acked-by: Thomas Zimmermann tzimmermann@suse.de
drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- 3 files changed, 116 insertions(+), 90 deletions(-)
Hi Ivaylo,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
I think this is just hacking around the problem. The problem is that omapdrm is being used by some as a generic buffer allocator. Those users should be changed to use a their own allocator or a generic allocator. And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be scanout buffers.
Or do we have a regression in the driver? My understanding is that this has never really worked.
Tomi
Hi Tomi,
On 17.02.22 г. 15:03 ч., Tomi Valkeinen wrote:
Hi Ivaylo,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
I think this is just hacking around the problem. The problem is that omapdrm is being used by some as a generic buffer allocator. Those users
Well, the user of omap_bo interface I know is xf86-video-omap. Unless if by users you mean 'kernel users' which I know none.
I think that if 'we' are to teach xorg omap DDX (or any other user in that regard) to use GPU driver allocator for non-scanout buffers and omapdrm for scanout, it will become a mess. Not impossible though, just way more complicated than the $series. Also, why do omapdrm allow allocation of non-linear buffers and CPU (userspace) access to them, but refuses to export them to kernel drivers? Isn't that the whole point of DMABUF stuff? This is not consistent to me. The series fixes that inconsistency, nothing more.
should be changed to use a their own allocator or a generic allocator.
SGX driver/userspace has and uses its own allocator, however, I think there is more than that - what about TILER/VRFB? Do you say that SGX userspace shall be smart enough to requests TILER buffers from omapdrm when scanout buffer is requested and use its own allocator when not?
Actually I was thinking about something like that, and it is achievable now we have:
https://github.com/maemo-leste/sgx-ddk-um/blob/master/dbm/dbm.c (REed SGX 1.17 ddk gbm backend)
And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be scanout buffers.
And what about OMAP_BO_TILED_XX stuff? To me this is even more of a hack, but it is what it is.
Do I get it correctly that you want to get rid of omap_bo_new/_tiled and have only dumb buffers available in omapdrm? TBH this would be great, however I still don't see how a TILER/VRFB buffer would be allocated, given that flags in drm_mode_create_dumb is not used anywhere in the kernel(AFAIK). Unless all scanout buffers are allocated through TILER/VRFB (which is a good idea IMO).
Or do we have a regression in the driver? My understanding is that this has never really worked.
There are couple of patches in omapdrm that change around BO flags and their meaning so I think there is a regression, as the same userspace/DDX on linux 5.9 results in only 2 linear buffers being allocated, but as SGX driver has different version as well, I can't be 100% sure without going through a lengthy assessment of SGX driver/omapdrm code and patches since 5.9. Which I am not going to do as I don't see what the benefit will be.
Please consider this patch series as a fix to an inconsistency, as it is merely that, it does not really bring any new functionality in terms of what is allocated.
Thanks and regards, Ivo
On 17/02/2022 18:21, Ivaylo Dimitrov wrote:
Hi Tomi,
On 17.02.22 г. 15:03 ч., Tomi Valkeinen wrote:
Hi Ivaylo,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
I think this is just hacking around the problem. The problem is that omapdrm is being used by some as a generic buffer allocator. Those users
Well, the user of omap_bo interface I know is xf86-video-omap. Unless if by users you mean 'kernel users' which I know none.
I think that if 'we' are to teach xorg omap DDX (or any other user in that regard) to use GPU driver allocator for non-scanout buffers and omapdrm for scanout, it will become a mess. Not impossible though, just way more complicated than the $series. Also, why do omapdrm allow allocation of non-linear buffers and CPU (userspace) access to them, but refuses to export them to kernel drivers? Isn't that the whole point of DMABUF stuff? This is not consistent to me. The series fixes that inconsistency, nothing more.
should be changed to use a their own allocator or a generic allocator.
SGX driver/userspace has and uses its own allocator, however, I think there is more than that - what about TILER/VRFB? Do you say that SGX userspace shall be smart enough to requests TILER buffers from omapdrm when scanout buffer is requested and use its own allocator when not?
All I'm saying is that omapdrm should not support allocating buffers that are not usable by the omapdrm hardware. It doesn't make any sense.
Actually I was thinking about something like that, and it is achievable now we have:
https://github.com/maemo-leste/sgx-ddk-um/blob/master/dbm/dbm.c (REed SGX 1.17 ddk gbm backend)
And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be scanout buffers.
And what about OMAP_BO_TILED_XX stuff? To me this is even more of a hack, but it is what it is.
Yes, I agree, I don't think those OMAP_BO_TILED_* values should be exposed to userspace. But I also agree to the "it is what it is" =).
Do I get it correctly that you want to get rid of omap_bo_new/_tiled and have only dumb buffers available in omapdrm? TBH this would be great, however I still don't see how a TILER/VRFB buffer would be allocated, given that flags in drm_mode_create_dumb is not used anywhere in the kernel(AFAIK). Unless all scanout buffers are allocated through TILER/VRFB (which is a good idea IMO).
We can't get rid of those as they're userspace API.
Or do we have a regression in the driver? My understanding is that this has never really worked.
There are couple of patches in omapdrm that change around BO flags and their meaning so I think there is a regression, as the same userspace/DDX on linux 5.9 results in only 2 linear buffers being allocated, but as SGX driver has different version as well, I can't be 100% sure without going through a lengthy assessment of SGX driver/omapdrm code and patches since 5.9. Which I am not going to do as I don't see what the benefit will be.
Please consider this patch series as a fix to an inconsistency, as it is merely that, it does not really bring any new functionality in terms of what is allocated.
I've considered, and I think I agree. The design of omapdrm + tiler is broken in my opinion, but it's there, it has userspace APIs, and it's all old code. It's probably not worth the effort to try to clean it up, while still somehow keeping the old userspace working.
I've had these patchesin my work branch for a while and I haven't seen any issues. I'll keep them there for a bit longer and I'll look at these patches a bit more, but I think I'll merge them at some point.
Tomi
Hi,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and PVR EXA from https://github.com/maemo-leste/xf86-video-omap
Ivaylo Dimitrov (3): drm: omapdrm: simplify omap_gem_pin drm: omapdrm: Support exporting of non-contiguous GEM BOs drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- 3 files changed, 116 insertions(+), 90 deletions(-)
I have pushed this to drm-misc-next.
Tomi
Hi,
On 28.03.22 г. 12:46 ч., Tomi Valkeinen wrote:
Hi,
On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to various runtime allocation failures. The series enables daily usage of devices without exausting limited resources like CMA or DMM space if GPU rendering is needed.
The first patch doesn't bring any functional changes, it just moves some TILER/DMM related code to a separate function, to simplify the review of the next two patches.
The second patch allows off-CPU rendering to non-scanout buffers. Without that patch, it is basically impossible to use the driver allocated GEM buffers on OMAP3 for anything else but a basic CPU rendered examples as if we want GPU rendering, we must allocate buffers as scanout buffers, which are CMA allocated. CMA soon gets fragmented and we start seeing allocation failures. Such failres in Xorg cannot be handeled gracefully, so the system is basically unusable.
Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get fragmented with time, leading to allocation failures.
Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and PVR EXA from https://github.com/maemo-leste/xf86-video-omap
Ivaylo Dimitrov (3): drm: omapdrm: simplify omap_gem_pin drm: omapdrm: Support exporting of non-contiguous GEM BOs drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- 3 files changed, 116 insertions(+), 90 deletions(-)
I have pushed this to drm-misc-next.
Great, next is VRFB and TE support for Droid4 panel, as soon as I find some spare time :)
Ivo
dri-devel@lists.freedesktop.org