From: Rob Clark <rob@ti.com>
Support for DMM and tiled buffers. The DMM/TILER block in OMAP4+ SoCs remaps physically discontiguous buffers for the various DMA initiators (DSS, IVAHD, etc.) that otherwise require physically contiguous memory, and also provides support for tiled buffers.
See the descriptions in the following two patches for more details.
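As a quick orientation before the patches themselves, here is a sketch of how a caller might drive the new API (tiler_reserve_2d(), tiler_pin(), tiler_ssptr()). It is illustrative only: error paths are trimmed, the buffer dimensions and alignment are arbitrary, and the page array is assumed to come from the GEM layer.

#include "omap_dmm_tiler.h"

/* Illustrative only: remap a discontiguous 1920x1080 16bpp buffer
 * through the TILER so that DSS sees it at a contiguous address. */
static dma_addr_t example_map_for_dss(struct page **pages)
{
	struct tiler_block *block;

	/* reserve a 2D region in the TILER container */
	block = tiler_reserve_2d(TILFMT_16BIT, 1920, 1080, PAGE_SIZE);
	if (!block)
		return 0;

	/* program the PAT so the reserved slots point at our pages */
	if (tiler_pin(block, pages, true)) {
		tiler_release(block);
		return 0;
	}

	/* system-space address in the 16-bit view, usable by DSS */
	return tiler_ssptr(block);
}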
Andy Gross (1): drm/omap: DMM/TILER support for OMAP4+ platform
Rob Clark (1): drm/omap: add GEM support for tiled/dmm buffers
 drivers/staging/omapdrm/Makefile           |   10 +-
 drivers/staging/omapdrm/TODO               |    6 +
 drivers/staging/omapdrm/omap_dmm_priv.h    |  187 ++++++++
 drivers/staging/omapdrm/omap_dmm_tiler.c   |  672 ++++++++++++++++++++++++++
 drivers/staging/omapdrm/omap_dmm_tiler.h   |  130 +++++
 drivers/staging/omapdrm/omap_drm.h         |    2 +-
 drivers/staging/omapdrm/omap_drv.c         |   27 +-
 drivers/staging/omapdrm/omap_drv.h         |    3 +
 drivers/staging/omapdrm/omap_fb.c          |    2 +-
 drivers/staging/omapdrm/omap_gem.c         |  432 ++++++++++++++--
 drivers/staging/omapdrm/omap_gem_helpers.c |   55 +++
 drivers/staging/omapdrm/omap_priv.h        |    7 +-
 drivers/staging/omapdrm/tcm-sita.c         |  703 ++++++++++++++++++++++++++++
 drivers/staging/omapdrm/tcm-sita.h         |   95 ++++
 drivers/staging/omapdrm/tcm.h              |  326 +++++++++++++
 15 files changed, 2609 insertions(+), 48 deletions(-)
 create mode 100644 drivers/staging/omapdrm/omap_dmm_priv.h
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.c
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.h
 create mode 100644 drivers/staging/omapdrm/tcm-sita.c
 create mode 100644 drivers/staging/omapdrm/tcm-sita.h
 create mode 100644 drivers/staging/omapdrm/tcm.h
From: Andy Gross <andy.gross@ti.com>
The Dynamic Memory Manager (DMM) is a hardware block in OMAP4+ processors that contains at least one TILER instance. TILER, or Tiling and Isometric Lightweight Engine for Rotation, provides IOMMU capabilities through the use of a physical address translation (PAT) table. The TILER also provides zero-cost rotation and mirroring.
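The rotation and mirroring are "zero cost" because they fall out of address decoding rather than data movement: the orientation is encoded in the system address of each view. A hedged illustration, in terms of the TIL_ADDR() macro and orientation masks that omap_dmm_tiler.h defines:

#include "omap_dmm_tiler.h"

/* Illustrative only: tiler-space address of an X-mirrored 32-bit view.
 * TIL_ADDR() just ORs orientation and access-mode bits into the address;
 * no pixel data is touched to realize the mirroring. */
static u32 example_mirrored_addr(u32 offset)
{
	return TIL_ADDR(offset, MASK_X_INVERT, TILFMT_32BIT);
}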
The TILER provides both 1D and 2D access through different views, i.e. address ranges, that can be used to access the physical memory that has been mapped in through the PAT. Accesses through the 1D view are linear accesses to the underlying memory, while accesses through the 2D views are tiled, resulting in increased efficiency for 2D block transfers.
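One consequence of the fixed container geometry is that the row pitch of a 2D view is a property of the view, not of any particular buffer, as tiler_stride() in the patch below computes:

/* Row pitch of a 2D view, per tiler_stride() in this patch:
 *
 *     stride = 1 << (CONT_WIDTH_BITS + y_shft)
 *
 * With CONT_WIDTH_BITS = 14 this gives 16 KiB rows for the 8-bit view
 * (y_shft = 0) and 32 KiB rows for the 16/32-bit views (y_shft = 1). */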
The TILER address space is managed by a TILER container manager (TCM), which allocates the address space using the Simple Tiler Allocator (SiTA) algorithm. The purpose of the algorithm is to keep fragmentation of the address space as low as possible.
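From a caller's point of view the TCM interface added in tcm.h ends up looking like the sketch below. It is illustrative only; the real users are in omap_dmm_tiler.c, and the container dimensions are the ones omap_dmm_init() uses for OMAP4.

#include "tcm.h"

/* Illustrative only: reserve and free a 4x4-slot 2D area in a
 * SiTA-managed container. */
static int example_tcm_usage(void)
{
	struct tcm *tcm = sita_init(256, 128, NULL);
	struct tcm_area area;
	int ret;

	if (!tcm)
		return -ENOMEM;

	ret = tcm_reserve_2d(tcm, 4, 4, 0, &area);	/* align 0 means 1 */
	if (!ret)
		tcm_free(&area);	/* clears area.tcm on success */

	tcm_deinit(tcm);
	return ret;
}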
Signed-off-by: Andy Gross <andy.gross@ti.com>
Signed-off-by: Rob Clark <rob@ti.com>
---
 drivers/staging/omapdrm/Makefile         |   10 +-
 drivers/staging/omapdrm/TODO             |    1 +
 drivers/staging/omapdrm/omap_dmm_priv.h  |  187 ++++++++
 drivers/staging/omapdrm/omap_dmm_tiler.c |  672 ++++++++++++++++++++++++++++
 drivers/staging/omapdrm/omap_dmm_tiler.h |  130 ++++++
 drivers/staging/omapdrm/omap_drm.h       |    2 +-
 drivers/staging/omapdrm/omap_drv.c       |   21 +-
 drivers/staging/omapdrm/omap_priv.h      |    7 +-
 drivers/staging/omapdrm/tcm-sita.c       |  703 ++++++++++++++++++++++++++++++
 drivers/staging/omapdrm/tcm-sita.h       |   95 ++++
 drivers/staging/omapdrm/tcm.h            |  326 ++++++++++++++
 11 files changed, 2143 insertions(+), 11 deletions(-)
 create mode 100644 drivers/staging/omapdrm/omap_dmm_priv.h
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.c
 create mode 100644 drivers/staging/omapdrm/omap_dmm_tiler.h
 create mode 100644 drivers/staging/omapdrm/tcm-sita.c
 create mode 100644 drivers/staging/omapdrm/tcm-sita.h
 create mode 100644 drivers/staging/omapdrm/tcm.h
diff --git a/drivers/staging/omapdrm/Makefile b/drivers/staging/omapdrm/Makefile index 4aa9a2f..275054a 100644 --- a/drivers/staging/omapdrm/Makefile +++ b/drivers/staging/omapdrm/Makefile @@ -4,7 +4,15 @@ #
ccflags-y := -Iinclude/drm -Werror -omapdrm-y := omap_drv.o omap_crtc.o omap_encoder.o omap_connector.o omap_fb.o omap_fbdev.o omap_gem.o +omapdrm-y := omap_drv.o \ + omap_crtc.o \ + omap_encoder.o \ + omap_connector.o \ + omap_fb.o \ + omap_fbdev.o \ + omap_gem.o \ + omap_dmm_tiler.o \ + tcm-sita.o
# temporary: omapdrm-y += omap_gem_helpers.o diff --git a/drivers/staging/omapdrm/TODO b/drivers/staging/omapdrm/TODO index 17781c9..18677e7 100644 --- a/drivers/staging/omapdrm/TODO +++ b/drivers/staging/omapdrm/TODO @@ -22,6 +22,7 @@ TODO . Review DSS vs KMS mismatches. The omap_dss_device is sort of part encoder, part connector. Which results in a bit of duct tape to fwd calls from encoder to connector. Possibly this could be done a bit better. +. Add debugfs information for DMM/TILER
 Userspace:
 . git://github.com/robclark/xf86-video-omap.git

diff --git a/drivers/staging/omapdrm/omap_dmm_priv.h b/drivers/staging/omapdrm/omap_dmm_priv.h
new file mode 100644
index 0000000..65b990c
--- /dev/null
+++ b/drivers/staging/omapdrm/omap_dmm_priv.h
@@ -0,0 +1,187 @@
+/*
+ *
+ * Copyright (C) 2011 Texas Instruments Incorporated - http://www.ti.com/
+ * Author: Rob Clark <rob@ti.com>
+ *         Andy Gross <andy.gross@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef OMAP_DMM_PRIV_H
+#define OMAP_DMM_PRIV_H
+
+#define DMM_REVISION          0x000
+#define DMM_HWINFO            0x004
+#define DMM_LISA_HWINFO       0x008
+#define DMM_DMM_SYSCONFIG     0x010
+#define DMM_LISA_LOCK         0x01C
+#define DMM_LISA_MAP__0       0x040
+#define DMM_LISA_MAP__1       0x044
+#define DMM_TILER_HWINFO      0x208
+#define DMM_TILER_OR__0       0x220
+#define DMM_TILER_OR__1       0x224
+#define DMM_PAT_HWINFO        0x408
+#define DMM_PAT_GEOMETRY      0x40C
+#define DMM_PAT_CONFIG        0x410
+#define DMM_PAT_VIEW__0       0x420
+#define DMM_PAT_VIEW__1       0x424
+#define DMM_PAT_VIEW_MAP__0   0x440
+#define DMM_PAT_VIEW_MAP_BASE 0x460
+#define DMM_PAT_IRQ_EOI       0x478
+#define DMM_PAT_IRQSTATUS_RAW 0x480
+#define DMM_PAT_IRQSTATUS     0x490
+#define DMM_PAT_IRQENABLE_SET 0x4A0
+#define DMM_PAT_IRQENABLE_CLR 0x4B0
+#define DMM_PAT_STATUS__0     0x4C0
+#define DMM_PAT_STATUS__1     0x4C4
+#define DMM_PAT_STATUS__2     0x4C8
+#define DMM_PAT_STATUS__3     0x4CC
+#define DMM_PAT_DESCR__0      0x500
+#define DMM_PAT_DESCR__1      0x510
+#define DMM_PAT_DESCR__2      0x520
+#define DMM_PAT_DESCR__3      0x530
+#define DMM_PEG_HWINFO        0x608
+#define DMM_PEG_PRIO          0x620
+#define DMM_PEG_PRIO_PAT      0x640
+
+#define DMM_IRQSTAT_DST          (1<<0)
+#define DMM_IRQSTAT_LST          (1<<1)
+#define DMM_IRQSTAT_ERR_INV_DSC  (1<<2)
+#define DMM_IRQSTAT_ERR_INV_DATA (1<<3)
+#define DMM_IRQSTAT_ERR_UPD_AREA (1<<4)
+#define DMM_IRQSTAT_ERR_UPD_CTRL (1<<5)
+#define DMM_IRQSTAT_ERR_UPD_DATA (1<<6)
+#define DMM_IRQSTAT_ERR_LUT_MISS (1<<7)
+
+#define DMM_IRQSTAT_ERR_MASK (DMM_IRQSTAT_ERR_INV_DSC | \
+				DMM_IRQSTAT_ERR_INV_DATA | \
+				DMM_IRQSTAT_ERR_UPD_AREA | \
+				DMM_IRQSTAT_ERR_UPD_CTRL | \
+				DMM_IRQSTAT_ERR_UPD_DATA | \
+				DMM_IRQSTAT_ERR_LUT_MISS)
+
+#define DMM_PATSTATUS_READY         (1<<0)
+#define DMM_PATSTATUS_VALID         (1<<1)
+#define DMM_PATSTATUS_RUN           (1<<2)
+#define DMM_PATSTATUS_DONE          (1<<3)
+#define DMM_PATSTATUS_LINKED        (1<<4)
+#define DMM_PATSTATUS_BYPASSED      (1<<7)
+#define DMM_PATSTATUS_ERR_INV_DESCR (1<<10)
+#define DMM_PATSTATUS_ERR_INV_DATA  (1<<11)
+#define DMM_PATSTATUS_ERR_UPD_AREA  (1<<12)
+#define DMM_PATSTATUS_ERR_UPD_CTRL  (1<<13)
+#define DMM_PATSTATUS_ERR_UPD_DATA  (1<<14)
+#define DMM_PATSTATUS_ERR_ACCESS    (1<<15)
+
+#define DMM_PATSTATUS_ERR (DMM_PATSTATUS_ERR_INV_DESCR | \
+				DMM_PATSTATUS_ERR_INV_DATA | \
+				DMM_PATSTATUS_ERR_UPD_AREA | \
+				DMM_PATSTATUS_ERR_UPD_CTRL | \
+				DMM_PATSTATUS_ERR_UPD_DATA | \
+				DMM_PATSTATUS_ERR_ACCESS)
+
+
+
+enum {
+	PAT_STATUS,
+	PAT_DESCR
+};
+
+struct pat_ctrl {
+	u32 start:4;
+	u32 dir:4;
+	u32 lut_id:8;
+	u32 sync:12;
+	u32 ini:4;
+};
+
+struct pat {
+	uint32_t next_pa;
+	struct pat_area area;
+	struct pat_ctrl ctrl;
+	uint32_t data_pa;
+};
+
+#define DMM_FIXED_RETRY_COUNT 1000
+
+/* create refill buffer big enough to
refill all slots, plus 3 descriptors.. + * 3 descriptors is probably the worst-case for # of 2d-slices in a 1d area, + * but I guess you don't hit that worst case at the same time as full area + * refill + */ +#define DESCR_SIZE 128 +#define REFILL_BUFFER_SIZE ((4 * 128 * 256) + (3 * DESCR_SIZE)) + +struct dmm; + +struct dmm_txn { + void *engine_handle; + struct tcm *tcm; + + uint8_t *current_va; + dma_addr_t current_pa; + + struct pat *last_pat; +}; + +struct refill_engine { + int id; + struct dmm *dmm; + struct tcm *tcm; + + uint8_t *refill_va; + dma_addr_t refill_pa; + + /* only one trans per engine for now */ + struct dmm_txn txn; + + /* offset to lut associated with container */ + u32 *lut_offset; + + wait_queue_head_t wait_for_refill; + + struct list_head idle_node; +}; + +struct dmm { + struct device *dev; + void __iomem *base; + int irq; + + struct page *dummy_page; + dma_addr_t dummy_pa; + + void *refill_va; + dma_addr_t refill_pa; + + /* refill engines */ + struct semaphore engine_sem; + struct list_head idle_head; + struct refill_engine *engines; + int num_engines; + + /* container information */ + int container_width; + int container_height; + int lut_width; + int lut_height; + int num_lut; + + /* array of LUT - TCM containers */ + struct tcm **tcm; + + /* LUT table storage */ + u32 *lut; + + /* allocation list and lock */ + struct list_head alloc_head; + spinlock_t list_lock; +}; + +#endif diff --git a/drivers/staging/omapdrm/omap_dmm_tiler.c b/drivers/staging/omapdrm/omap_dmm_tiler.c new file mode 100644 index 0000000..9ed5215 --- /dev/null +++ b/drivers/staging/omapdrm/omap_dmm_tiler.c @@ -0,0 +1,672 @@ +/* + * DMM IOMMU driver support functions for TI OMAP processors. + * + * Author: Rob Clark rob@ti.com + * Andy Gross andy.gross@ti.com + * + * Copyright (C) 2011 Texas Instruments Incorporated - http://www.ti.com/ + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation version 2. + * + * This program is distributed "as is" WITHOUT ANY WARRANTY of any + * kind, whether express or implied; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ +#include <linux/init.h> +#include <linux/module.h> +#include <linux/platform_device.h> /* platform_device() */ +#include <linux/errno.h> +#include <linux/sched.h> +#include <linux/wait.h> +#include <linux/interrupt.h> +#include <linux/dma-mapping.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> +#include <linux/delay.h> +#include <linux/mm.h> +#include <linux/time.h> +#include <linux/list.h> +#include <linux/semaphore.h> + +#include "omap_dmm_tiler.h" +#include "omap_dmm_priv.h" + +/* mappings for associating views to luts */ +static struct tcm *containers[TILFMT_NFORMATS]; +static struct dmm *omap_dmm; + +/* Geometry table */ +#define GEOM(xshift, yshift, bytes_per_pixel) { \ + .x_shft = (xshift), \ + .y_shft = (yshift), \ + .cpp = (bytes_per_pixel), \ + .slot_w = 1 << (SLOT_WIDTH_BITS - (xshift)), \ + .slot_h = 1 << (SLOT_HEIGHT_BITS - (yshift)), \ + } + +static const struct { + uint32_t x_shft; /* unused X-bits (as part of bpp) */ + uint32_t y_shft; /* unused Y-bits (as part of bpp) */ + uint32_t cpp; /* bytes/chars per pixel */ + uint32_t slot_w; /* width of each slot (in pixels) */ + uint32_t slot_h; /* height of each slot (in pixels) */ +} geom[TILFMT_NFORMATS] = { + [TILFMT_8BIT] = GEOM(0, 0, 1), + [TILFMT_16BIT] = GEOM(0, 1, 2), + [TILFMT_32BIT] = GEOM(1, 1, 4), + [TILFMT_PAGE] = GEOM(SLOT_WIDTH_BITS, SLOT_HEIGHT_BITS, 1), +}; + + +/* lookup table for registers w/ per-engine instances */ +static const uint32_t reg[][4] = { + [PAT_STATUS] = {DMM_PAT_STATUS__0, DMM_PAT_STATUS__1, + DMM_PAT_STATUS__2, DMM_PAT_STATUS__3}, + [PAT_DESCR] = {DMM_PAT_DESCR__0, DMM_PAT_DESCR__1, + DMM_PAT_DESCR__2, DMM_PAT_DESCR__3}, +}; + +/* simple allocator to grab next 16 byte aligned memory from txn */ +static void *alloc_dma(struct dmm_txn *txn, size_t sz, dma_addr_t *pa) +{ + void *ptr; + struct refill_engine *engine = txn->engine_handle; + + /* dmm programming requires 16 byte aligned addresses */ + txn->current_pa = round_up(txn->current_pa, 16); + txn->current_va = (void *)round_up((long)txn->current_va, 16); + + ptr = txn->current_va; + *pa = txn->current_pa; + + txn->current_pa += sz; + txn->current_va += sz; + + BUG_ON((txn->current_va - engine->refill_va) > REFILL_BUFFER_SIZE); + + return ptr; +} + +/* check status and spin until wait_mask comes true */ +static int wait_status(struct refill_engine *engine, uint32_t wait_mask) +{ + struct dmm *dmm = engine->dmm; + uint32_t r = 0, err, i; + + i = DMM_FIXED_RETRY_COUNT; + while (true) { + r = readl(dmm->base + reg[PAT_STATUS][engine->id]); + err = r & DMM_PATSTATUS_ERR; + if (err) + return -EFAULT; + + if ((r & wait_mask) == wait_mask) + break; + + if (--i == 0) + return -ETIMEDOUT; + + udelay(1); + } + + return 0; +} + +irqreturn_t omap_dmm_irq_handler(int irq, void *arg) +{ + struct dmm *dmm = arg; + uint32_t status = readl(dmm->base + DMM_PAT_IRQSTATUS); + int i; + + /* ack IRQ */ + writel(status, dmm->base + DMM_PAT_IRQSTATUS); + + for (i = 0; i < dmm->num_engines; i++) { + if (status & DMM_IRQSTAT_LST) + wake_up_interruptible(&dmm->engines[i].wait_for_refill); + + status >>= 8; + } + + return IRQ_HANDLED; +} + +/** + * Get a handle for a DMM transaction + */ +static struct dmm_txn *dmm_txn_init(struct dmm *dmm, struct tcm *tcm) +{ + struct dmm_txn *txn = NULL; + struct refill_engine *engine = NULL; + + down(&dmm->engine_sem); + + /* grab an idle engine */ + spin_lock(&dmm->list_lock); + if (!list_empty(&dmm->idle_head)) { + engine = list_entry(dmm->idle_head.next, struct refill_engine, + idle_node); + 
list_del(&engine->idle_node); + } + spin_unlock(&dmm->list_lock); + + BUG_ON(!engine); + + txn = &engine->txn; + engine->tcm = tcm; + txn->engine_handle = engine; + txn->last_pat = NULL; + txn->current_va = engine->refill_va; + txn->current_pa = engine->refill_pa; + + return txn; +} + +/** + * Add region to DMM transaction. If pages or pages[i] is NULL, then the + * corresponding slot is cleared (ie. dummy_pa is programmed) + */ +static int dmm_txn_append(struct dmm_txn *txn, struct pat_area *area, + struct page **pages) +{ + dma_addr_t pat_pa = 0; + uint32_t *data; + struct pat *pat; + struct refill_engine *engine = txn->engine_handle; + int columns = (1 + area->x1 - area->x0); + int rows = (1 + area->y1 - area->y0); + int i = columns*rows; + u32 *lut = omap_dmm->lut + (engine->tcm->lut_id * omap_dmm->lut_width * + omap_dmm->lut_height) + + (area->y0 * omap_dmm->lut_width) + area->x0; + + pat = alloc_dma(txn, sizeof(struct pat), &pat_pa); + + if (txn->last_pat) + txn->last_pat->next_pa = (uint32_t)pat_pa; + + pat->area = *area; + pat->ctrl = (struct pat_ctrl){ + .start = 1, + .lut_id = engine->tcm->lut_id, + }; + + data = alloc_dma(txn, 4*i, &pat->data_pa); + + while (i--) { + data[i] = (pages && pages[i]) ? + page_to_phys(pages[i]) : engine->dmm->dummy_pa; + } + + /* fill in lut with new addresses */ + for (i = 0; i < rows; i++, lut += omap_dmm->lut_width) + memcpy(lut, &data[i*columns], columns * sizeof(u32)); + + txn->last_pat = pat; + + return 0; +} + +/** + * Commit the DMM transaction. + */ +static int dmm_txn_commit(struct dmm_txn *txn, bool wait) +{ + int ret = 0; + struct refill_engine *engine = txn->engine_handle; + struct dmm *dmm = engine->dmm; + + if (!txn->last_pat) { + dev_err(engine->dmm->dev, "need at least one txn\n"); + ret = -EINVAL; + goto cleanup; + } + + txn->last_pat->next_pa = 0; + + /* write to PAT_DESCR to clear out any pending transaction */ + writel(0x0, dmm->base + reg[PAT_DESCR][engine->id]); + + /* wait for engine ready: */ + ret = wait_status(engine, DMM_PATSTATUS_READY); + if (ret) { + ret = -EFAULT; + goto cleanup; + } + + /* kick reload */ + writel(engine->refill_pa, + dmm->base + reg[PAT_DESCR][engine->id]); + + if (wait) { + if (wait_event_interruptible_timeout(engine->wait_for_refill, + wait_status(engine, DMM_PATSTATUS_READY) == 0, + msecs_to_jiffies(1)) <= 0) { + dev_err(dmm->dev, "timed out waiting for done\n"); + ret = -ETIMEDOUT; + } + } + +cleanup: + spin_lock(&dmm->list_lock); + list_add(&engine->idle_node, &dmm->idle_head); + spin_unlock(&dmm->list_lock); + + up(&omap_dmm->engine_sem); + return ret; +} + +/* + * DMM programming + */ +static int fill(struct tcm_area *area, struct page **pages, bool wait) +{ + int ret = 0; + struct tcm_area slice, area_s; + struct dmm_txn *txn; + + txn = dmm_txn_init(omap_dmm, area->tcm); + if (IS_ERR_OR_NULL(txn)) + return PTR_ERR(txn); + + tcm_for_each_slice(slice, *area, area_s) { + struct pat_area p_area = { + .x0 = slice.p0.x, .y0 = slice.p0.y, + .x1 = slice.p1.x, .y1 = slice.p1.y, + }; + + ret = dmm_txn_append(txn, &p_area, pages); + if (ret) + goto fail; + + if (pages) + pages += tcm_sizeof(slice); + } + + ret = dmm_txn_commit(txn, wait); + +fail: + return ret; +} + +/* + * Pin/unpin + */ + +/* note: slots for which pages[i] == NULL are filled w/ dummy page + */ +int tiler_pin(struct tiler_block *block, struct page **pages, bool wait) +{ + int ret; + + ret = fill(&block->area, pages, wait); + + if (ret) + tiler_unpin(block); + + return ret; +} + +int tiler_unpin(struct tiler_block *block) +{ + return 
fill(&block->area, NULL, false); +} + +/* + * Reserve/release + */ +struct tiler_block *tiler_reserve_2d(enum tiler_fmt fmt, uint16_t w, + uint16_t h, uint16_t align) +{ + struct tiler_block *block = kzalloc(sizeof(*block), GFP_KERNEL); + u32 min_align = 128; + int ret; + + BUG_ON(!validfmt(fmt)); + + /* convert width/height to slots */ + w = DIV_ROUND_UP(w, geom[fmt].slot_w); + h = DIV_ROUND_UP(h, geom[fmt].slot_h); + + /* convert alignment to slots */ + min_align = max(min_align, (geom[fmt].slot_w * geom[fmt].cpp)); + align = ALIGN(align, min_align); + align /= geom[fmt].slot_w * geom[fmt].cpp; + + block->fmt = fmt; + + ret = tcm_reserve_2d(containers[fmt], w, h, align, &block->area); + if (ret) { + kfree(block); + return 0; + } + + /* add to allocation list */ + spin_lock(&omap_dmm->list_lock); + list_add(&block->alloc_node, &omap_dmm->alloc_head); + spin_unlock(&omap_dmm->list_lock); + + return block; +} + +struct tiler_block *tiler_reserve_1d(size_t size) +{ + struct tiler_block *block = kzalloc(sizeof(*block), GFP_KERNEL); + int num_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + if (!block) + return 0; + + block->fmt = TILFMT_PAGE; + + if (tcm_reserve_1d(containers[TILFMT_PAGE], num_pages, + &block->area)) { + kfree(block); + return 0; + } + + spin_lock(&omap_dmm->list_lock); + list_add(&block->alloc_node, &omap_dmm->alloc_head); + spin_unlock(&omap_dmm->list_lock); + + return block; +} + +/* note: if you have pin'd pages, you should have already unpin'd first! */ +int tiler_release(struct tiler_block *block) +{ + int ret = tcm_free(&block->area); + + if (block->area.tcm) + dev_err(omap_dmm->dev, "failed to release block\n"); + + spin_lock(&omap_dmm->list_lock); + list_del(&block->alloc_node); + spin_unlock(&omap_dmm->list_lock); + + kfree(block); + return ret; +} + +/* + * Utils + */ + +/* calculate the tiler space address of a pixel in a view orientation */ +static u32 tiler_get_address(u32 orient, enum tiler_fmt fmt, u32 x, u32 y) +{ + u32 x_bits, y_bits, tmp, x_mask, y_mask, alignment; + + x_bits = CONT_WIDTH_BITS - geom[fmt].x_shft; + y_bits = CONT_HEIGHT_BITS - geom[fmt].y_shft; + alignment = geom[fmt].x_shft + geom[fmt].y_shft; + + /* validate coordinate */ + x_mask = MASK(x_bits); + y_mask = MASK(y_bits); + + if (x < 0 || x > x_mask || y < 0 || y > y_mask) + return 0; + + /* account for mirroring */ + if (orient & MASK_X_INVERT) + x ^= x_mask; + if (orient & MASK_Y_INVERT) + y ^= y_mask; + + /* get coordinate address */ + if (orient & MASK_XY_FLIP) + tmp = ((x << y_bits) + y); + else + tmp = ((y << x_bits) + x); + + return TIL_ADDR((tmp << alignment), orient, fmt); +} + +dma_addr_t tiler_ssptr(struct tiler_block *block) +{ + BUG_ON(!validfmt(block->fmt)); + + return TILVIEW_8BIT + tiler_get_address(0, block->fmt, + block->area.p0.x * geom[block->fmt].slot_w, + block->area.p0.y * geom[block->fmt].slot_h); +} + +void tiler_align(enum tiler_fmt fmt, uint16_t *w, uint16_t *h) +{ + BUG_ON(!validfmt(fmt)); + *w = round_up(*w, geom[fmt].slot_w); + *h = round_up(*h, geom[fmt].slot_h); +} + +uint32_t tiler_stride(enum tiler_fmt fmt) +{ + BUG_ON(!validfmt(fmt)); + + return 1 << (CONT_WIDTH_BITS + geom[fmt].y_shft); +} + +size_t tiler_size(enum tiler_fmt fmt, uint16_t w, uint16_t h) +{ + tiler_align(fmt, &w, &h); + return geom[fmt].cpp * w * h; +} + +size_t tiler_vsize(enum tiler_fmt fmt, uint16_t w, uint16_t h) +{ + BUG_ON(!validfmt(fmt)); + return round_up(geom[fmt].cpp * w, PAGE_SIZE) * h; +} + +int omap_dmm_remove(void) +{ + struct tiler_block *block, *_block; + int i; + + if 
(omap_dmm) { + /* free all area regions */ + spin_lock(&omap_dmm->list_lock); + list_for_each_entry_safe(block, _block, &omap_dmm->alloc_head, + alloc_node) { + list_del(&block->alloc_node); + kfree(block); + } + spin_unlock(&omap_dmm->list_lock); + + for (i = 0; i < omap_dmm->num_lut; i++) + if (omap_dmm->tcm && omap_dmm->tcm[i]) + omap_dmm->tcm[i]->deinit(omap_dmm->tcm[i]); + kfree(omap_dmm->tcm); + + kfree(omap_dmm->engines); + if (omap_dmm->refill_va) + dma_free_coherent(omap_dmm->dev, + REFILL_BUFFER_SIZE * omap_dmm->num_engines, + omap_dmm->refill_va, + omap_dmm->refill_pa); + if (omap_dmm->dummy_page) + __free_page(omap_dmm->dummy_page); + + vfree(omap_dmm->lut); + + if (omap_dmm->irq != -1) + free_irq(omap_dmm->irq, omap_dmm); + + kfree(omap_dmm); + } + + return 0; +} + +int omap_dmm_init(struct drm_device *dev) +{ + int ret = -EFAULT, i; + struct tcm_area area = {0}; + u32 hwinfo, pat_geom, lut_table_size; + struct omap_drm_platform_data *pdata = dev->dev->platform_data; + + if (!pdata || !pdata->dmm_pdata) { + dev_err(dev->dev, "dmm platform data not present, skipping\n"); + return ret; + } + + omap_dmm = kzalloc(sizeof(*omap_dmm), GFP_KERNEL); + if (!omap_dmm) { + dev_err(dev->dev, "failed to allocate driver data section\n"); + goto fail; + } + + /* lookup hwmod data - base address and irq */ + omap_dmm->base = pdata->dmm_pdata->base; + omap_dmm->irq = pdata->dmm_pdata->irq; + omap_dmm->dev = dev->dev; + + if (!omap_dmm->base) { + dev_err(dev->dev, "failed to get dmm base address\n"); + goto fail; + } + + hwinfo = readl(omap_dmm->base + DMM_PAT_HWINFO); + omap_dmm->num_engines = (hwinfo >> 24) & 0x1F; + omap_dmm->num_lut = (hwinfo >> 16) & 0x1F; + omap_dmm->container_width = 256; + omap_dmm->container_height = 128; + + /* read out actual LUT width and height */ + pat_geom = readl(omap_dmm->base + DMM_PAT_GEOMETRY); + omap_dmm->lut_width = ((pat_geom >> 16) & 0xF) << 5; + omap_dmm->lut_height = ((pat_geom >> 24) & 0xF) << 5; + + /* initialize DMM registers */ + writel(0x88888888, omap_dmm->base + DMM_PAT_VIEW__0); + writel(0x88888888, omap_dmm->base + DMM_PAT_VIEW__1); + writel(0x80808080, omap_dmm->base + DMM_PAT_VIEW_MAP__0); + writel(0x80000000, omap_dmm->base + DMM_PAT_VIEW_MAP_BASE); + writel(0x88888888, omap_dmm->base + DMM_TILER_OR__0); + writel(0x88888888, omap_dmm->base + DMM_TILER_OR__1); + + ret = request_irq(omap_dmm->irq, omap_dmm_irq_handler, IRQF_SHARED, + "omap_dmm_irq_handler", omap_dmm); + + if (ret) { + dev_err(dev->dev, "couldn't register IRQ %d, error %d\n", + omap_dmm->irq, ret); + omap_dmm->irq = -1; + goto fail; + } + + /* enable some interrupts! 
*/ + writel(0xfefefefe, omap_dmm->base + DMM_PAT_IRQENABLE_SET); + + lut_table_size = omap_dmm->lut_width * omap_dmm->lut_height * + omap_dmm->num_lut; + + omap_dmm->lut = vmalloc(lut_table_size * sizeof(*omap_dmm->lut)); + if (!omap_dmm->lut) { + dev_err(dev->dev, "could not allocate lut table\n"); + ret = -ENOMEM; + goto fail; + } + + omap_dmm->dummy_page = alloc_page(GFP_KERNEL | __GFP_DMA32); + if (!omap_dmm->dummy_page) { + dev_err(dev->dev, "could not allocate dummy page\n"); + ret = -ENOMEM; + goto fail; + } + omap_dmm->dummy_pa = page_to_phys(omap_dmm->dummy_page); + + /* alloc refill memory */ + omap_dmm->refill_va = dma_alloc_coherent(dev->dev, + REFILL_BUFFER_SIZE * omap_dmm->num_engines, + &omap_dmm->refill_pa, GFP_KERNEL); + if (!omap_dmm->refill_va) { + dev_err(dev->dev, "could not allocate refill memory\n"); + goto fail; + } + + /* alloc engines */ + omap_dmm->engines = kzalloc( + omap_dmm->num_engines * sizeof(struct refill_engine), + GFP_KERNEL); + if (!omap_dmm->engines) { + dev_err(dev->dev, "could not allocate engines\n"); + ret = -ENOMEM; + goto fail; + } + + sema_init(&omap_dmm->engine_sem, omap_dmm->num_engines); + INIT_LIST_HEAD(&omap_dmm->idle_head); + for (i = 0; i < omap_dmm->num_engines; i++) { + omap_dmm->engines[i].id = i; + omap_dmm->engines[i].dmm = omap_dmm; + omap_dmm->engines[i].refill_va = omap_dmm->refill_va + + (REFILL_BUFFER_SIZE * i); + omap_dmm->engines[i].refill_pa = omap_dmm->refill_pa + + (REFILL_BUFFER_SIZE * i); + init_waitqueue_head(&omap_dmm->engines[i].wait_for_refill); + + list_add(&omap_dmm->engines[i].idle_node, &omap_dmm->idle_head); + } + + omap_dmm->tcm = kzalloc(omap_dmm->num_lut * sizeof(*omap_dmm->tcm), + GFP_KERNEL); + if (!omap_dmm->tcm) { + dev_err(dev->dev, "failed to allocate lut ptrs\n"); + ret = -ENOMEM; + goto fail; + } + + /* init containers */ + for (i = 0; i < omap_dmm->num_lut; i++) { + omap_dmm->tcm[i] = sita_init(omap_dmm->container_width, + omap_dmm->container_height, + NULL); + + if (!omap_dmm->tcm[i]) { + dev_err(dev->dev, "failed to allocate container\n"); + ret = -ENOMEM; + goto fail; + } + + omap_dmm->tcm[i]->lut_id = i; + } + + /* assign access mode containers to applicable tcm container */ + /* OMAP 4 has 1 container for all 4 views */ + containers[TILFMT_8BIT] = omap_dmm->tcm[0]; + containers[TILFMT_16BIT] = omap_dmm->tcm[0]; + containers[TILFMT_32BIT] = omap_dmm->tcm[0]; + containers[TILFMT_PAGE] = omap_dmm->tcm[0]; + + INIT_LIST_HEAD(&omap_dmm->alloc_head); + spin_lock_init(&omap_dmm->list_lock); + + area = (struct tcm_area) { + .is2d = true, + .tcm = NULL, + .p1.x = omap_dmm->container_width - 1, + .p1.y = omap_dmm->container_height - 1, + }; + + for (i = 0; i < lut_table_size; i++) + omap_dmm->lut[i] = omap_dmm->dummy_pa; + + /* initialize all LUTs to dummy page entries */ + for (i = 0; i < omap_dmm->num_lut; i++) { + area.tcm = omap_dmm->tcm[i]; + if (fill(&area, NULL, true)) + dev_err(omap_dmm->dev, "refill failed"); + } + + dev_info(omap_dmm->dev, "initialized all PAT entries\n"); + + return 0; + +fail: + omap_dmm_remove(); + return ret; +} diff --git a/drivers/staging/omapdrm/omap_dmm_tiler.h b/drivers/staging/omapdrm/omap_dmm_tiler.h new file mode 100644 index 0000000..7e63b6b --- /dev/null +++ b/drivers/staging/omapdrm/omap_dmm_tiler.h @@ -0,0 +1,130 @@ +/* + * + * Copyright (C) 2011 Texas Instruments Incorporated - http://www.ti.com/ + * Author: Rob Clark rob@ti.com + * Andy Gross andy.gross@ti.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of 
the GNU General Public License as + * published by the Free Software Foundation version 2. + * + * This program is distributed "as is" WITHOUT ANY WARRANTY of any + * kind, whether express or implied; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ +#ifndef OMAP_DMM_TILER_H +#define OMAP_DMM_TILER_H + +#include "omap_drv.h" +#include "tcm.h" + +enum tiler_fmt { + TILFMT_8BIT = 0, + TILFMT_16BIT, + TILFMT_32BIT, + TILFMT_PAGE, + TILFMT_NFORMATS +}; + +struct pat_area { + u32 x0:8; + u32 y0:8; + u32 x1:8; + u32 y1:8; +}; + +struct tiler_block { + struct list_head alloc_node; /* node for global block list */ + struct tcm_area area; /* area */ + enum tiler_fmt fmt; /* format */ +}; + +/* bits representing the same slot in DMM-TILER hw-block */ +#define SLOT_WIDTH_BITS 6 +#define SLOT_HEIGHT_BITS 6 + +/* bits reserved to describe coordinates in DMM-TILER hw-block */ +#define CONT_WIDTH_BITS 14 +#define CONT_HEIGHT_BITS 13 + +/* calculated constants */ +#define TILER_PAGE (1 << (SLOT_WIDTH_BITS + SLOT_HEIGHT_BITS)) +#define TILER_WIDTH (1 << (CONT_WIDTH_BITS - SLOT_WIDTH_BITS)) +#define TILER_HEIGHT (1 << (CONT_HEIGHT_BITS - SLOT_HEIGHT_BITS)) + +/* tiler space addressing bitfields */ +#define MASK_XY_FLIP (1 << 31) +#define MASK_Y_INVERT (1 << 30) +#define MASK_X_INVERT (1 << 29) +#define SHIFT_ACC_MODE 27 +#define MASK_ACC_MODE 3 + +#define MASK(bits) ((1 << (bits)) - 1) + +#define TILVIEW_8BIT 0x60000000u +#define TILVIEW_16BIT (TILVIEW_8BIT + VIEW_SIZE) +#define TILVIEW_32BIT (TILVIEW_16BIT + VIEW_SIZE) +#define TILVIEW_PAGE (TILVIEW_32BIT + VIEW_SIZE) +#define TILVIEW_END (TILVIEW_PAGE + VIEW_SIZE) + +/* create tsptr by adding view orientation and access mode */ +#define TIL_ADDR(x, orient, a)\ + ((u32) (x) | (orient) | ((a) << SHIFT_ACC_MODE)) + +/* externally accessible functions */ +int omap_dmm_init(struct drm_device *dev); +int omap_dmm_remove(void); + +/* pin/unpin */ +int tiler_pin(struct tiler_block *block, struct page **pages, bool wait); +int tiler_unpin(struct tiler_block *block); + +/* reserve/release */ +struct tiler_block *tiler_reserve_2d(enum tiler_fmt fmt, uint16_t w, uint16_t h, + uint16_t align); +struct tiler_block *tiler_reserve_1d(size_t size); +int tiler_release(struct tiler_block *block); + +/* utilities */ +dma_addr_t tiler_ssptr(struct tiler_block *block); +uint32_t tiler_stride(enum tiler_fmt fmt); +size_t tiler_size(enum tiler_fmt fmt, uint16_t w, uint16_t h); +size_t tiler_vsize(enum tiler_fmt fmt, uint16_t w, uint16_t h); +void tiler_align(enum tiler_fmt fmt, uint16_t *w, uint16_t *h); + + +/* GEM bo flags -> tiler fmt */ +static inline enum tiler_fmt gem2fmt(uint32_t flags) +{ + switch (flags & OMAP_BO_TILED) { + case OMAP_BO_TILED_8: + return TILFMT_8BIT; + case OMAP_BO_TILED_16: + return TILFMT_16BIT; + case OMAP_BO_TILED_32: + return TILFMT_32BIT; + default: + return TILFMT_PAGE; + } +} + +static inline bool validfmt(enum tiler_fmt fmt) +{ + switch (fmt) { + case TILFMT_8BIT: + case TILFMT_16BIT: + case TILFMT_32BIT: + case TILFMT_PAGE: + return true; + default: + return false; + } +} + +struct omap_dmm_platform_data { + void __iomem *base; + int irq; +}; + +#endif diff --git a/drivers/staging/omapdrm/omap_drm.h b/drivers/staging/omapdrm/omap_drm.h index 40167dd..f0ac34a 100644 --- a/drivers/staging/omapdrm/omap_drm.h +++ b/drivers/staging/omapdrm/omap_drm.h @@ -20,7 +20,7 @@ #ifndef __OMAP_DRM_H__ #define __OMAP_DRM_H__
-#include "drm.h" +#include <drm/drm.h>
/* Please note that modifications to all structs defined here are * subject to backwards-compatibility constraints. diff --git a/drivers/staging/omapdrm/omap_drv.c b/drivers/staging/omapdrm/omap_drv.c index cee0050..71de7cf 100644 --- a/drivers/staging/omapdrm/omap_drv.c +++ b/drivers/staging/omapdrm/omap_drv.c @@ -290,6 +290,7 @@ static unsigned int detect_connectors(struct drm_device *dev) static int omap_modeset_init(struct drm_device *dev) { const struct omap_drm_platform_data *pdata = dev->dev->platform_data; + struct omap_kms_platform_data *kms_pdata = NULL; struct omap_drm_private *priv = dev->dev_private; struct omap_dss_device *dssdev = NULL; int i, j; @@ -297,23 +298,27 @@ static int omap_modeset_init(struct drm_device *dev)
drm_mode_config_init(dev);
- if (pdata) { + if (pdata && pdata->kms_pdata) { + kms_pdata = pdata->kms_pdata; + /* if platform data is provided by the board file, use it to * control which overlays, managers, and devices we own. */ - for (i = 0; i < pdata->mgr_cnt; i++) { + for (i = 0; i < kms_pdata->mgr_cnt; i++) { struct omap_overlay_manager *mgr = - omap_dss_get_overlay_manager(pdata->mgr_ids[i]); + omap_dss_get_overlay_manager( + kms_pdata->mgr_ids[i]); create_encoder(dev, mgr); }
- for (i = 0; i < pdata->dev_cnt; i++) { + for (i = 0; i < kms_pdata->dev_cnt; i++) { struct omap_dss_device *dssdev = omap_dss_find_device( - (void *)pdata->dev_names[i], match_dev_name); + (void *)kms_pdata->dev_names[i], + match_dev_name); if (!dssdev) { dev_warn(dev->dev, "no such dssdev: %s\n", - pdata->dev_names[i]); + kms_pdata->dev_names[i]); continue; } create_connector(dev, dssdev); @@ -322,9 +327,9 @@ static int omap_modeset_init(struct drm_device *dev) connected_connectors = detect_connectors(dev);
j = 0; - for (i = 0; i < pdata->ovl_cnt; i++) { + for (i = 0; i < kms_pdata->ovl_cnt; i++) { struct omap_overlay *ovl = - omap_dss_get_overlay(pdata->ovl_ids[i]); + omap_dss_get_overlay(kms_pdata->ovl_ids[i]); create_crtc(dev, ovl, &j, connected_connectors); } } else { diff --git a/drivers/staging/omapdrm/omap_priv.h b/drivers/staging/omapdrm/omap_priv.h index f482d1e..c324709 100644 --- a/drivers/staging/omapdrm/omap_priv.h +++ b/drivers/staging/omapdrm/omap_priv.h @@ -30,7 +30,7 @@ * detected devices. This should be a good default behavior for most cases, * but yet there still might be times when you wish to do something different. */ -struct omap_drm_platform_data { +struct omap_kms_platform_data { int ovl_cnt; const int *ovl_ids; int mgr_cnt; @@ -39,4 +39,9 @@ struct omap_drm_platform_data { const char **dev_names; };
+struct omap_drm_platform_data { + struct omap_kms_platform_data *kms_pdata; + struct omap_dmm_platform_data *dmm_pdata; +}; + #endif /* __OMAP_DRM_H__ */ diff --git a/drivers/staging/omapdrm/tcm-sita.c b/drivers/staging/omapdrm/tcm-sita.c new file mode 100644 index 0000000..10d5ac3 --- /dev/null +++ b/drivers/staging/omapdrm/tcm-sita.c @@ -0,0 +1,703 @@ +/* + * tcm-sita.c + * + * SImple Tiler Allocator (SiTA): 2D and 1D allocation(reservation) algorithm + * + * Authors: Ravi Ramachandra r.ramachandra@ti.com, + * Lajos Molnar molnar@ti.com + * + * Copyright (C) 2009-2010 Texas Instruments, Inc. + * + * This package is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * THIS PACKAGE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED + * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE. + * + */ +#include <linux/slab.h> +#include <linux/spinlock.h> + +#include "tcm-sita.h" + +#define ALIGN_DOWN(value, align) ((value) & ~((align) - 1)) + +/* Individual selection criteria for different scan areas */ +static s32 CR_L2R_T2B = CR_BIAS_HORIZONTAL; +static s32 CR_R2L_T2B = CR_DIAGONAL_BALANCE; + +/********************************************* + * TCM API - Sita Implementation + *********************************************/ +static s32 sita_reserve_2d(struct tcm *tcm, u16 h, u16 w, u8 align, + struct tcm_area *area); +static s32 sita_reserve_1d(struct tcm *tcm, u32 slots, struct tcm_area *area); +static s32 sita_free(struct tcm *tcm, struct tcm_area *area); +static void sita_deinit(struct tcm *tcm); + +/********************************************* + * Main Scanner functions + *********************************************/ +static s32 scan_areas_and_find_fit(struct tcm *tcm, u16 w, u16 h, u16 align, + struct tcm_area *area); + +static s32 scan_l2r_t2b(struct tcm *tcm, u16 w, u16 h, u16 align, + struct tcm_area *field, struct tcm_area *area); + +static s32 scan_r2l_t2b(struct tcm *tcm, u16 w, u16 h, u16 align, + struct tcm_area *field, struct tcm_area *area); + +static s32 scan_r2l_b2t_one_dim(struct tcm *tcm, u32 num_slots, + struct tcm_area *field, struct tcm_area *area); + +/********************************************* + * Support Infrastructure Methods + *********************************************/ +static s32 is_area_free(struct tcm_area ***map, u16 x0, u16 y0, u16 w, u16 h); + +static s32 update_candidate(struct tcm *tcm, u16 x0, u16 y0, u16 w, u16 h, + struct tcm_area *field, s32 criteria, + struct score *best); + +static void get_nearness_factor(struct tcm_area *field, + struct tcm_area *candidate, + struct nearness_factor *nf); + +static void get_neighbor_stats(struct tcm *tcm, struct tcm_area *area, + struct neighbor_stats *stat); + +static void fill_area(struct tcm *tcm, + struct tcm_area *area, struct tcm_area *parent); + + +/*********************************************/ + +/********************************************* + * Utility Methods + *********************************************/ +struct tcm *sita_init(u16 width, u16 height, struct tcm_pt *attr) +{ + struct tcm *tcm; + struct sita_pvt *pvt; + struct tcm_area area = {0}; + s32 i; + + if (width == 0 || height == 0) + return NULL; + + tcm = kmalloc(sizeof(*tcm), GFP_KERNEL); + pvt = kmalloc(sizeof(*pvt), GFP_KERNEL); + if (!tcm || !pvt) + goto error; + + memset(tcm, 0, sizeof(*tcm)); + memset(pvt, 0, sizeof(*pvt)); + + /* 
Updating the pointers to SiTA implementation APIs */
+	tcm->height = height;
+	tcm->width = width;
+	tcm->reserve_2d = sita_reserve_2d;
+	tcm->reserve_1d = sita_reserve_1d;
+	tcm->free = sita_free;
+	tcm->deinit = sita_deinit;
+	tcm->pvt = (void *)pvt;
+
+	spin_lock_init(&(pvt->lock));
+
+	/* Creating tcm map */
+	pvt->map = kmalloc(sizeof(*pvt->map) * tcm->width, GFP_KERNEL);
+	if (!pvt->map)
+		goto error;
+
+	for (i = 0; i < tcm->width; i++) {
+		pvt->map[i] =
+			kmalloc(sizeof(**pvt->map) * tcm->height,
+				GFP_KERNEL);
+		if (pvt->map[i] == NULL) {
+			while (i--)
+				kfree(pvt->map[i]);
+			kfree(pvt->map);
+			goto error;
+		}
+	}
+
+	if (attr && attr->x <= tcm->width && attr->y <= tcm->height) {
+		pvt->div_pt.x = attr->x;
+		pvt->div_pt.y = attr->y;
+
+	} else {
+		/* Defaulting to 3:1 ratio on width for 2D area split */
+		/* Defaulting to 3:1 ratio on height for 2D and 1D split */
+		pvt->div_pt.x = (tcm->width * 3) / 4;
+		pvt->div_pt.y = (tcm->height * 3) / 4;
+	}
+
+	spin_lock(&(pvt->lock));
+	assign(&area, 0, 0, width - 1, height - 1);
+	fill_area(tcm, &area, NULL);
+	spin_unlock(&(pvt->lock));
+	return tcm;
+
+error:
+	kfree(tcm);
+	kfree(pvt);
+	return NULL;
+}
+
+static void sita_deinit(struct tcm *tcm)
+{
+	struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt;
+	struct tcm_area area = {0};
+	s32 i;
+
+	area.p1.x = tcm->width - 1;
+	area.p1.y = tcm->height - 1;
+
+	spin_lock(&(pvt->lock));
+	fill_area(tcm, &area, NULL);
+	spin_unlock(&(pvt->lock));
+
+	for (i = 0; i < tcm->width; i++)
+		kfree(pvt->map[i]);
+	kfree(pvt->map);
+	kfree(pvt);
+}
+
+/**
+ * Reserve a 1D area in the container
+ *
+ * @param num_slots	size of 1D area
+ * @param area		pointer to the area that will be populated with the
+ *			reserved area
+ *
+ * @return 0 on success, non-0 error value on failure.
+ */
+static s32 sita_reserve_1d(struct tcm *tcm, u32 num_slots,
+			   struct tcm_area *area)
+{
+	s32 ret;
+	struct tcm_area field = {0};
+	struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt;
+
+	spin_lock(&(pvt->lock));
+
+	/* Scanning entire container */
+	assign(&field, tcm->width - 1, tcm->height - 1, 0, 0);
+
+	ret = scan_r2l_b2t_one_dim(tcm, num_slots, &field, area);
+	if (!ret)
+		/* update map */
+		fill_area(tcm, area, area);
+
+	spin_unlock(&(pvt->lock));
+	return ret;
+}
+
+/**
+ * Reserve a 2D area in the container
+ *
+ * @param w	width
+ * @param h	height
+ * @param area	pointer to the area that will be populated with the reserved
+ *		area
+ *
+ * @return 0 on success, non-0 error value on failure.
+ */
+static s32 sita_reserve_2d(struct tcm *tcm, u16 h, u16 w, u8 align,
+			   struct tcm_area *area)
+{
+	s32 ret;
+	struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt;
+
+	/* not supporting more than 64 as alignment */
+	if (align > 64)
+		return -EINVAL;
+
+	/* we prefer 1, 32 and 64 as alignment */
+	align = align <= 1 ? 1 : align <= 32 ? 32 : 64;
+
+	spin_lock(&(pvt->lock));
+	ret = scan_areas_and_find_fit(tcm, w, h, align, area);
+	if (!ret)
+		/* update map */
+		fill_area(tcm, area, area);
+
+	spin_unlock(&(pvt->lock));
+	return ret;
+}
+
+/**
+ * Unreserve a previously allocated 2D or 1D area
+ * @param area	area to be freed
+ * @return 0 - success
+ */
+static s32 sita_free(struct tcm *tcm, struct tcm_area *area)
+{
+	struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt;
+
+	spin_lock(&(pvt->lock));
+
+	/* check that this is in fact an existing area */
+	WARN_ON(pvt->map[area->p0.x][area->p0.y] != area ||
+		pvt->map[area->p1.x][area->p1.y] != area);
+
+	/* Clear the contents of the associated tiles in the map */
+	fill_area(tcm, area, NULL);
+
+	spin_unlock(&(pvt->lock));
+
+	return 0;
+}
+
+/**
+ * Note: In general the coordinates in the scan field are relevant to the scan
+ * sweep directions. The scan origin (e.g. top-left corner) will always be
+ * the p0 member of the field.  Therefore, for a scan from top-left p0.x <= p1.x
+ * and p0.y <= p1.y; whereas, for a scan from bottom-right p1.x <= p0.x and p1.y
+ * <= p0.y
+ */
+
+/**
+ * Raster scan horizontally right to left from top to bottom to find a place for
+ * a 2D area of given size inside a scan field.
+ *
+ * @param w	width of desired area
+ * @param h	height of desired area
+ * @param align	desired area alignment
+ * @param area	pointer to the area that will be set to the best position
+ * @param field	area to scan (inclusive)
+ *
+ * @return 0 on success, non-0 error value on failure.
+ */
+static s32 scan_r2l_t2b(struct tcm *tcm, u16 w, u16 h, u16 align,
+			struct tcm_area *field, struct tcm_area *area)
+{
+	s32 x, y;
+	s16 start_x, end_x, start_y, end_y, found_x = -1;
+	struct tcm_area ***map = ((struct sita_pvt *)tcm->pvt)->map;
+	struct score best = {{0}, {0}, {0}, 0};
+
+	start_x = field->p0.x;
+	end_x = field->p1.x;
+	start_y = field->p0.y;
+	end_y = field->p1.y;
+
+	/* check scan area co-ordinates */
+	if (field->p0.x < field->p1.x ||
+	    field->p1.y < field->p0.y)
+		return -EINVAL;
+
+	/* check if allocation would fit in scan area */
+	if (w > LEN(start_x, end_x) || h > LEN(end_y, start_y))
+		return -ENOSPC;
+
+	/* adjust start_x and end_y, as allocation would not fit beyond */
+	start_x = ALIGN_DOWN(start_x - w + 1, align); /* - 1 to be inclusive */
+	end_y = end_y - h + 1;
+
+	/* check if allocation would still fit in scan area */
+	if (start_x < end_x)
+		return -ENOSPC;
+
+	/* scan field top-to-bottom, right-to-left */
+	for (y = start_y; y <= end_y; y++) {
+		for (x = start_x; x >= end_x; x -= align) {
+			if (is_area_free(map, x, y, w, h)) {
+				found_x = x;
+
+				/* update best candidate */
+				if (update_candidate(tcm, x, y, w, h, field,
+						     CR_R2L_T2B, &best))
+					goto done;
+
+				/* change upper x bound */
+				end_x = x + 1;
+				break;
+			} else if (map[x][y] && map[x][y]->is2d) {
+				/* step over 2D areas */
+				x = ALIGN(map[x][y]->p0.x - w + 1, align);
+			}
+		}
+
+		/* break if you find a free area shouldering the scan field */
+		if (found_x == start_x)
+			break;
+	}
+
+	if (!best.a.tcm)
+		return -ENOSPC;
+done:
+	assign(area, best.a.p0.x, best.a.p0.y, best.a.p1.x, best.a.p1.y);
+	return 0;
+}
+
+/**
+ * Raster scan horizontally left to right from top to bottom to find a place for
+ * a 2D area of given size inside a scan field.
+ * + * @param w width of desired area + * @param h height of desired area + * @param align desired area alignment + * @param area pointer to the area that will be set to the best position + * @param field area to scan (inclusive) + * + * @return 0 on success, non-0 error value on failure. + */ +static s32 scan_l2r_t2b(struct tcm *tcm, u16 w, u16 h, u16 align, + struct tcm_area *field, struct tcm_area *area) +{ + s32 x, y; + s16 start_x, end_x, start_y, end_y, found_x = -1; + struct tcm_area ***map = ((struct sita_pvt *)tcm->pvt)->map; + struct score best = {{0}, {0}, {0}, 0}; + + start_x = field->p0.x; + end_x = field->p1.x; + start_y = field->p0.y; + end_y = field->p1.y; + + /* check scan area co-ordinates */ + if (field->p1.x < field->p0.x || + field->p1.y < field->p0.y) + return -EINVAL; + + /* check if allocation would fit in scan area */ + if (w > LEN(end_x, start_x) || h > LEN(end_y, start_y)) + return -ENOSPC; + + start_x = ALIGN(start_x, align); + + /* check if allocation would still fit in scan area */ + if (w > LEN(end_x, start_x)) + return -ENOSPC; + + /* adjust end_x and end_y, as allocation would not fit beyond */ + end_x = end_x - w + 1; /* + 1 to be inclusive */ + end_y = end_y - h + 1; + + /* scan field top-to-bottom, left-to-right */ + for (y = start_y; y <= end_y; y++) { + for (x = start_x; x <= end_x; x += align) { + if (is_area_free(map, x, y, w, h)) { + found_x = x; + + /* update best candidate */ + if (update_candidate(tcm, x, y, w, h, field, + CR_L2R_T2B, &best)) + goto done; + /* change upper x bound */ + end_x = x - 1; + + break; + } else if (map[x][y] && map[x][y]->is2d) { + /* step over 2D areas */ + x = ALIGN_DOWN(map[x][y]->p1.x, align); + } + } + + /* break if you find a free area shouldering the scan field */ + if (found_x == start_x) + break; + } + + if (!best.a.tcm) + return -ENOSPC; +done: + assign(area, best.a.p0.x, best.a.p0.y, best.a.p1.x, best.a.p1.y); + return 0; +} + +/** + * Raster scan horizontally right to left from bottom to top to find a place + * for a 1D area of given size inside a scan field. + * + * @param num_slots size of desired area + * @param align desired area alignment + * @param area pointer to the area that will be set to the best + * position + * @param field area to scan (inclusive) + * + * @return 0 on success, non-0 error value on failure. + */ +static s32 scan_r2l_b2t_one_dim(struct tcm *tcm, u32 num_slots, + struct tcm_area *field, struct tcm_area *area) +{ + s32 found = 0; + s16 x, y; + struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt; + struct tcm_area *p; + + /* check scan area co-ordinates */ + if (field->p0.y < field->p1.y) + return -EINVAL; + + /** + * Currently we only support full width 1D scan field, which makes sense + * since 1D slot-ordering spans the full container width. 
+ */ + if (tcm->width != field->p0.x - field->p1.x + 1) + return -EINVAL; + + /* check if allocation would fit in scan area */ + if (num_slots > tcm->width * LEN(field->p0.y, field->p1.y)) + return -ENOSPC; + + x = field->p0.x; + y = field->p0.y; + + /* find num_slots consecutive free slots to the left */ + while (found < num_slots) { + if (y < 0) + return -ENOSPC; + + /* remember bottom-right corner */ + if (found == 0) { + area->p1.x = x; + area->p1.y = y; + } + + /* skip busy regions */ + p = pvt->map[x][y]; + if (p) { + /* move to left of 2D areas, top left of 1D */ + x = p->p0.x; + if (!p->is2d) + y = p->p0.y; + + /* start over */ + found = 0; + } else { + /* count consecutive free slots */ + found++; + if (found == num_slots) + break; + } + + /* move to the left */ + if (x == 0) + y--; + x = (x ? : tcm->width) - 1; + + } + + /* set top-left corner */ + area->p0.x = x; + area->p0.y = y; + return 0; +} + +/** + * Find a place for a 2D area of given size inside a scan field based on its + * alignment needs. + * + * @param w width of desired area + * @param h height of desired area + * @param align desired area alignment + * @param area pointer to the area that will be set to the best position + * + * @return 0 on success, non-0 error value on failure. + */ +static s32 scan_areas_and_find_fit(struct tcm *tcm, u16 w, u16 h, u16 align, + struct tcm_area *area) +{ + s32 ret = 0; + struct tcm_area field = {0}; + u16 boundary_x, boundary_y; + struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt; + + if (align > 1) { + /* prefer top-left corner */ + boundary_x = pvt->div_pt.x - 1; + boundary_y = pvt->div_pt.y - 1; + + /* expand width and height if needed */ + if (w > pvt->div_pt.x) + boundary_x = tcm->width - 1; + if (h > pvt->div_pt.y) + boundary_y = tcm->height - 1; + + assign(&field, 0, 0, boundary_x, boundary_y); + ret = scan_l2r_t2b(tcm, w, h, align, &field, area); + + /* scan whole container if failed, but do not scan 2x */ + if (ret != 0 && (boundary_x != tcm->width - 1 || + boundary_y != tcm->height - 1)) { + /* scan the entire container if nothing found */ + assign(&field, 0, 0, tcm->width - 1, tcm->height - 1); + ret = scan_l2r_t2b(tcm, w, h, align, &field, area); + } + } else if (align == 1) { + /* prefer top-right corner */ + boundary_x = pvt->div_pt.x; + boundary_y = pvt->div_pt.y - 1; + + /* expand width and height if needed */ + if (w > (tcm->width - pvt->div_pt.x)) + boundary_x = 0; + if (h > pvt->div_pt.y) + boundary_y = tcm->height - 1; + + assign(&field, tcm->width - 1, 0, boundary_x, boundary_y); + ret = scan_r2l_t2b(tcm, w, h, align, &field, area); + + /* scan whole container if failed, but do not scan 2x */ + if (ret != 0 && (boundary_x != 0 || + boundary_y != tcm->height - 1)) { + /* scan the entire container if nothing found */ + assign(&field, tcm->width - 1, 0, 0, tcm->height - 1); + ret = scan_r2l_t2b(tcm, w, h, align, &field, + area); + } + } + + return ret; +} + +/* check if an entire area is free */ +static s32 is_area_free(struct tcm_area ***map, u16 x0, u16 y0, u16 w, u16 h) +{ + u16 x = 0, y = 0; + for (y = y0; y < y0 + h; y++) { + for (x = x0; x < x0 + w; x++) { + if (map[x][y]) + return false; + } + } + return true; +} + +/* fills an area with a parent tcm_area */ +static void fill_area(struct tcm *tcm, struct tcm_area *area, + struct tcm_area *parent) +{ + s32 x, y; + struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt; + struct tcm_area a, a_; + + /* set area's tcm; otherwise, enumerator considers it invalid */ + area->tcm = tcm; + + tcm_for_each_slice(a, 
*area, a_) {
+		for (x = a.p0.x; x <= a.p1.x; ++x)
+			for (y = a.p0.y; y <= a.p1.y; ++y)
+				pvt->map[x][y] = parent;
+
+	}
+}
+
+/**
+ * Compares a candidate area to the current best area, and if it is a better
+ * fit, it updates the best to this one.
+ *
+ * @param x0, y0, w, h	top, left, width, height of candidate area
+ * @param field		scan field
+ * @param criteria	scan criteria
+ * @param best		best candidate and its scores
+ *
+ * @return 1 (true) if the candidate area is known to be the final best, so no
+ * more searching should be performed
+ */
+static s32 update_candidate(struct tcm *tcm, u16 x0, u16 y0, u16 w, u16 h,
+			    struct tcm_area *field, s32 criteria,
+			    struct score *best)
+{
+	struct score me;	/* score for area */
+
+	/*
+	 * NOTE: For horizontal bias we always give the first found, because our
+	 * scan is horizontal-raster-based and the first candidate will always
+	 * have the horizontal bias.
+	 */
+	bool first = criteria & CR_BIAS_HORIZONTAL;
+
+	assign(&me.a, x0, y0, x0 + w - 1, y0 + h - 1);
+
+	/* calculate score for current candidate */
+	if (!first) {
+		get_neighbor_stats(tcm, &me.a, &me.n);
+		me.neighs = me.n.edge + me.n.busy;
+		get_nearness_factor(field, &me.a, &me.f);
+	}
+
+	/* the 1st candidate is always the best */
+	if (!best->a.tcm)
+		goto better;
+
+	BUG_ON(first);
+
+	/* diagonal balance check */
+	if ((criteria & CR_DIAGONAL_BALANCE) &&
+	    best->neighs <= me.neighs &&
+	    (best->neighs < me.neighs ||
+	     /* this implies that neighs and occupied match */
+	     best->n.busy < me.n.busy ||
+	     (best->n.busy == me.n.busy &&
+	      /* check the nearness factor */
+	      best->f.x + best->f.y > me.f.x + me.f.y)))
+		goto better;
+
+	/* not better, keep going */
+	return 0;
+
+better:
+	/* save current area as best */
+	memcpy(best, &me, sizeof(me));
+	best->a.tcm = tcm;
+	return first;
+}
+
+/**
+ * Calculate the nearness factor of an area in a search field.  The nearness
+ * factor is smaller if the area is closer to the search origin.
+ */
+static void get_nearness_factor(struct tcm_area *field, struct tcm_area *area,
+				struct nearness_factor *nf)
+{
+	/**
+	 * Using signed math as field coordinates may be reversed if
+	 * search direction is right-to-left or bottom-to-top.
+	 */
+	nf->x = (s32)(area->p0.x - field->p0.x) * 1000 /
+		(field->p1.x - field->p0.x);
+	nf->y = (s32)(area->p0.y - field->p0.y) * 1000 /
+		(field->p1.y - field->p0.y);
+}
+
+/* get neighbor statistics */
+static void get_neighbor_stats(struct tcm *tcm, struct tcm_area *area,
+			       struct neighbor_stats *stat)
+{
+	s16 x = 0, y = 0;
+	struct sita_pvt *pvt = (struct sita_pvt *)tcm->pvt;
+
+	/* Clearing any existing values */
+	memset(stat, 0, sizeof(*stat));
+
+	/* process top & bottom edges */
+	for (x = area->p0.x; x <= area->p1.x; x++) {
+		if (area->p0.y == 0)
+			stat->edge++;
+		else if (pvt->map[x][area->p0.y - 1])
+			stat->busy++;
+
+		if (area->p1.y == tcm->height - 1)
+			stat->edge++;
+		else if (pvt->map[x][area->p1.y + 1])
+			stat->busy++;
+	}
+
+	/* process left & right edges */
+	for (y = area->p0.y; y <= area->p1.y; ++y) {
+		if (area->p0.x == 0)
+			stat->edge++;
+		else if (pvt->map[area->p0.x - 1][y])
+			stat->busy++;
+
+		if (area->p1.x == tcm->width - 1)
+			stat->edge++;
+		else if (pvt->map[area->p1.x + 1][y])
+			stat->busy++;
+	}
+}
diff --git a/drivers/staging/omapdrm/tcm-sita.h b/drivers/staging/omapdrm/tcm-sita.h
new file mode 100644
index 0000000..0444f86
--- /dev/null
+++ b/drivers/staging/omapdrm/tcm-sita.h
@@ -0,0 +1,95 @@
+/*
+ * tcm_sita.h
+ *
+ * SImple Tiler Allocator (SiTA) private structures.
+ * + * Author: Ravi Ramachandra r.ramachandra@ti.com + * + * Copyright (C) 2009-2011 Texas Instruments, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Texas Instruments Incorporated nor the names of + * its contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; + * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR + * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, + * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _TCM_SITA_H +#define _TCM_SITA_H + +#include "tcm.h" + +/* length between two coordinates */ +#define LEN(a, b) ((a) > (b) ? (a) - (b) + 1 : (b) - (a) + 1) + +enum criteria { + CR_MAX_NEIGHS = 0x01, + CR_FIRST_FOUND = 0x10, + CR_BIAS_HORIZONTAL = 0x20, + CR_BIAS_VERTICAL = 0x40, + CR_DIAGONAL_BALANCE = 0x80 +}; + +/* nearness to the beginning of the search field from 0 to 1000 */ +struct nearness_factor { + s32 x; + s32 y; +}; + +/* + * Statistics on immediately neighboring slots. Edge is the number of + * border segments that are also border segments of the scan field. Busy + * refers to the number of neighbors that are occupied. + */ +struct neighbor_stats { + u16 edge; + u16 busy; +}; + +/* structure to keep the score of a potential allocation */ +struct score { + struct nearness_factor f; + struct neighbor_stats n; + struct tcm_area a; + u16 neighs; /* number of busy neighbors */ +}; + +struct sita_pvt { + spinlock_t lock; /* spinlock to protect access */ + struct tcm_pt div_pt; /* divider point splitting container */ + struct tcm_area ***map; /* pointers to the parent area for each slot */ +}; + +/* assign coordinates to area */ +static inline +void assign(struct tcm_area *a, u16 x0, u16 y0, u16 x1, u16 y1) +{ + a->p0.x = x0; + a->p0.y = y0; + a->p1.x = x1; + a->p1.y = y1; +} + +#endif diff --git a/drivers/staging/omapdrm/tcm.h b/drivers/staging/omapdrm/tcm.h new file mode 100644 index 0000000..d273e3e --- /dev/null +++ b/drivers/staging/omapdrm/tcm.h @@ -0,0 +1,326 @@ +/* + * tcm.h + * + * TILER container manager specification and support functions for TI + * TILER driver. + * + * Author: Lajos Molnar molnar@ti.com + * + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Texas Instruments Incorporated nor the names of + * its contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; + * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR + * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, + * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef TCM_H +#define TCM_H + +struct tcm; + +/* point */ +struct tcm_pt { + u16 x; + u16 y; +}; + +/* 1d or 2d area */ +struct tcm_area { + bool is2d; /* whether area is 1d or 2d */ + struct tcm *tcm; /* parent */ + struct tcm_pt p0; + struct tcm_pt p1; +}; + +struct tcm { + u16 width, height; /* container dimensions */ + int lut_id; /* Lookup table identifier */ + + /* 'pvt' structure shall contain any tcm details (attr) along with + linked list of allocated areas and mutex for mutually exclusive access + to the list. It may also contain copies of width and height to notice + any changes to the publicly available width and height fields. */ + void *pvt; + + /* function table */ + s32 (*reserve_2d)(struct tcm *tcm, u16 height, u16 width, u8 align, + struct tcm_area *area); + s32 (*reserve_1d)(struct tcm *tcm, u32 slots, struct tcm_area *area); + s32 (*free) (struct tcm *tcm, struct tcm_area *area); + void (*deinit) (struct tcm *tcm); +}; + +/*============================================================================= + BASIC TILER CONTAINER MANAGER INTERFACE +=============================================================================*/ + +/* + * NOTE: + * + * Since some basic parameter checking is done outside the TCM algorithms, + * TCM implementation do NOT have to check the following: + * + * area pointer is NULL + * width and height fits within container + * number of pages is more than the size of the container + * + */ + +struct tcm *sita_init(u16 width, u16 height, struct tcm_pt *attr); + + +/** + * Deinitialize tiler container manager. + * + * @param tcm Pointer to container manager. + * + * @return 0 on success, non-0 error value on error. The call + * should free as much memory as possible and meaningful + * even on failure. Some error codes: -ENODEV: invalid + * manager. + */ +static inline void tcm_deinit(struct tcm *tcm) +{ + if (tcm) + tcm->deinit(tcm); +} + +/** + * Reserves a 2D area in the container. + * + * @param tcm Pointer to container manager. 
+ * @param height Height(in pages) of area to be reserved. + * @param width Width(in pages) of area to be reserved. + * @param align Alignment requirement for top-left corner of area. Not + * all values may be supported by the container manager, + * but it must support 0 (1), 32 and 64. + * 0 value is equivalent to 1. + * @param area Pointer to where the reserved area should be stored. + * + * @return 0 on success. Non-0 error code on failure. Also, + * the tcm field of the area will be set to NULL on + * failure. Some error codes: -ENODEV: invalid manager, + * -EINVAL: invalid area, -ENOMEM: not enough space for + * allocation. + */ +static inline s32 tcm_reserve_2d(struct tcm *tcm, u16 width, u16 height, + u16 align, struct tcm_area *area) +{ + /* perform rudimentary error checking */ + s32 res = tcm == NULL ? -ENODEV : + (area == NULL || width == 0 || height == 0 || + /* align must be a 2 power */ + (align & (align - 1))) ? -EINVAL : + (height > tcm->height || width > tcm->width) ? -ENOMEM : 0; + + if (!res) { + area->is2d = true; + res = tcm->reserve_2d(tcm, height, width, align, area); + area->tcm = res ? NULL : tcm; + } + + return res; +} + +/** + * Reserves a 1D area in the container. + * + * @param tcm Pointer to container manager. + * @param slots Number of (contiguous) slots to reserve. + * @param area Pointer to where the reserved area should be stored. + * + * @return 0 on success. Non-0 error code on failure. Also, + * the tcm field of the area will be set to NULL on + * failure. Some error codes: -ENODEV: invalid manager, + * -EINVAL: invalid area, -ENOMEM: not enough space for + * allocation. + */ +static inline s32 tcm_reserve_1d(struct tcm *tcm, u32 slots, + struct tcm_area *area) +{ + /* perform rudimentary error checking */ + s32 res = tcm == NULL ? -ENODEV : + (area == NULL || slots == 0) ? -EINVAL : + slots > (tcm->width * (u32) tcm->height) ? -ENOMEM : 0; + + if (!res) { + area->is2d = false; + res = tcm->reserve_1d(tcm, slots, area); + area->tcm = res ? NULL : tcm; + } + + return res; +} + +/** + * Free a previously reserved area from the container. + * + * @param area Pointer to area reserved by a prior call to + * tcm_reserve_1d or tcm_reserve_2d call, whether + * it was successful or not. (Note: all fields of + * the structure must match.) + * + * @return 0 on success. Non-0 error code on failure. Also, the tcm + * field of the area is set to NULL on success to avoid subsequent + * freeing. This call will succeed even if supplying + * the area from a failed reserved call. + */ +static inline s32 tcm_free(struct tcm_area *area) +{ + s32 res = 0; /* free succeeds by default */ + + if (area && area->tcm) { + res = area->tcm->free(area->tcm, area); + if (res == 0) + area->tcm = NULL; + } + + return res; +} + +/*============================================================================= + HELPER FUNCTION FOR ANY TILER CONTAINER MANAGER +=============================================================================*/ + +/** + * This method slices off the topmost 2D slice from the parent area, and stores + * it in the 'slice' parameter. The 'parent' parameter will get modified to + * contain the remaining portion of the area. If the whole parent area can + * fit in a 2D slice, its tcm pointer is set to NULL to mark that it is no + * longer a valid area. 
+ * + * @param parent Pointer to a VALID parent area that will get modified + * @param slice Pointer to the slice area that will get modified + */ +static inline void tcm_slice(struct tcm_area *parent, struct tcm_area *slice) +{ + *slice = *parent; + + /* check if we need to slice */ + if (slice->tcm && !slice->is2d && + slice->p0.y != slice->p1.y && + (slice->p0.x || (slice->p1.x != slice->tcm->width - 1))) { + /* set end point of slice (start always remains) */ + slice->p1.x = slice->tcm->width - 1; + slice->p1.y = (slice->p0.x) ? slice->p0.y : slice->p1.y - 1; + /* adjust remaining area */ + parent->p0.x = 0; + parent->p0.y = slice->p1.y + 1; + } else { + /* mark this as the last slice */ + parent->tcm = NULL; + } +} + +/* Verify if a tcm area is logically valid */ +static inline bool tcm_area_is_valid(struct tcm_area *area) +{ + return area && area->tcm && + /* coordinate bounds */ + area->p1.x < area->tcm->width && + area->p1.y < area->tcm->height && + area->p0.y <= area->p1.y && + /* 1D coordinate relationship + p0.x check */ + ((!area->is2d && + area->p0.x < area->tcm->width && + area->p0.x + area->p0.y * area->tcm->width <= + area->p1.x + area->p1.y * area->tcm->width) || + /* 2D coordinate relationship */ + (area->is2d && + area->p0.x <= area->p1.x)); +} + +/* see if a coordinate is within an area */ +static inline bool __tcm_is_in(struct tcm_pt *p, struct tcm_area *a) +{ + u16 i; + + if (a->is2d) { + return p->x >= a->p0.x && p->x <= a->p1.x && + p->y >= a->p0.y && p->y <= a->p1.y; + } else { + i = p->x + p->y * a->tcm->width; + return i >= a->p0.x + a->p0.y * a->tcm->width && + i <= a->p1.x + a->p1.y * a->tcm->width; + } +} + +/* calculate area width */ +static inline u16 __tcm_area_width(struct tcm_area *area) +{ + return area->p1.x - area->p0.x + 1; +} + +/* calculate area height */ +static inline u16 __tcm_area_height(struct tcm_area *area) +{ + return area->p1.y - area->p0.y + 1; +} + +/* calculate number of slots in an area */ +static inline u16 __tcm_sizeof(struct tcm_area *area) +{ + return area->is2d ? + __tcm_area_width(area) * __tcm_area_height(area) : + (area->p1.x - area->p0.x + 1) + (area->p1.y - area->p0.y) * + area->tcm->width; +} +#define tcm_sizeof(area) __tcm_sizeof(&(area)) +#define tcm_awidth(area) __tcm_area_width(&(area)) +#define tcm_aheight(area) __tcm_area_height(&(area)) +#define tcm_is_in(pt, area) __tcm_is_in(&(pt), &(area)) + +/* limit a 1D area to the first N pages */ +static inline s32 tcm_1d_limit(struct tcm_area *a, u32 num_pg) +{ + if (__tcm_sizeof(a) < num_pg) + return -ENOMEM; + if (!num_pg) + return -EINVAL; + + a->p1.x = (a->p0.x + num_pg - 1) % a->tcm->width; + a->p1.y = a->p0.y + ((a->p0.x + num_pg - 1) / a->tcm->width); + return 0; +} + +/** + * Iterate through 2D slices of a valid area. Behaves + * syntactically as a for(;;) statement. + * + * @param var Name of a local variable of type 'struct + * tcm_area *' that will get modified to + * contain each slice. + * @param area Pointer to the VALID parent area. This + * structure will not get modified + * throughout the loop. + * + */ +#define tcm_for_each_slice(var, area, safe) \ + for (safe = area, \ + tcm_slice(&safe, &var); \ + var.tcm; tcm_slice(&safe, &var)) + +#endif
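As an aside for reviewers unfamiliar with the interface: below is a minimal, hypothetical usage sketch of the TCM API above. It is not part of the patch, the container dimensions, area size, and alignment are made up, and it assumes tcm.h plus the usual kernel headers are in scope.

static void tcm_usage_example(void)
{
	struct tcm_area area, slice, safe;
	struct tcm *tcm;

	/* 256x128 slot container; NULL attr means default divider point */
	tcm = sita_init(256, 128, NULL);
	if (!tcm)
		return;

	/* 16x8 slots, top-left corner aligned to a 32-slot boundary */
	if (tcm_reserve_2d(tcm, 16, 8, 32, &area) == 0) {
		/* a 2D area is itself a single slice; for a 1D area this
		 * loop would visit each row-aligned 2D slice in turn */
		tcm_for_each_slice(slice, area, safe)
			pr_info("slice %dx%d\n",
				tcm_awidth(slice), tcm_aheight(slice));

		tcm_free(&area);	/* sets area.tcm to NULL on success */
	}

	tcm_deinit(tcm);
}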
From: Rob Clark rob@ti.com
TILER/DMM provides two features for omapdrm GEM objects: 1) providing a physically contiguous view to discontiguous memory for hw initiators that cannot otherwise support discontiguous buffers (DSS scanout, IVAHD video decode/encode, etc), and 2) providing untiling for 2d tiled buffers, which are used in some cases to provide rotation and reduce memory bandwidth for hw initiators that tend to access data in 2d block patterns.
For 2d tiled buffers, there are some additional complications when it comes to userspace mmap'ings. For non-tiled buffers, the original (potentially physically discontiguous) pages are used to back the mmap. For tiled buffers, we need to mmap via the tiler/dmm region to provide an unswizzled view of the buffer. But (a) the buffer is not necessarily pinned in TILER all the time (it can be unmapped when there is no DMA access to the buffer), and (b) even when it is pinned, it is not necessarily page aligned from the perspective of the CPU. And non-page-aligned userspace buffer mappings are evil.
To solve this, we reserve one or more small regions in each of the 2d containers when the driver is loaded to use as a "user-GART" where we can create a second page-aligned mapping of parts of the buffer being accessed from userspace. Page faulting is used to evict and remap different regions of whichever buffers are being accessed from userspace.
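To make the bookkeeping concrete, here is a toy model (hypothetical, not the patch code) of the round-down plus round-robin logic; the real version is fault_2d() in the diff below:

#include <stddef.h>

#define NUM_ENTRIES 2		/* NUM_USERGART_ENTRIES in the patch */

struct entry {
	int in_use;		/* stands in for entry->obj */
	size_t obj_pgoff;	/* first page of buffer mapped here */
};

static struct entry entries[NUM_ENTRIES];
static int last;		/* index of last used entry */

static struct entry *map_through_usergart(size_t pgoff, size_t height)
{
	struct entry *e = &entries[last];

	/* evict the previous occupant of this entry (the real code also
	 * calls unmap_mapping_range() to shoot down its CPU mappings),
	 * then round the faulting page offset down to a slot row */
	e->in_use = 1;
	e->obj_pgoff = pgoff - (pgoff % height);	/* round_down */

	last = (last + 1) % NUM_ENTRIES;		/* round-robin */
	return e;
}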
Signed-off-by: Rob Clark rob@ti.com --- drivers/staging/omapdrm/TODO | 5 + drivers/staging/omapdrm/omap_drv.c | 6 +- drivers/staging/omapdrm/omap_drv.h | 3 + drivers/staging/omapdrm/omap_fb.c | 2 +- drivers/staging/omapdrm/omap_gem.c | 432 +++++++++++++++++++++++++--- drivers/staging/omapdrm/omap_gem_helpers.c | 55 ++++ 6 files changed, 466 insertions(+), 37 deletions(-)
diff --git a/drivers/staging/omapdrm/TODO b/drivers/staging/omapdrm/TODO index 18677e7..55b1837 100644 --- a/drivers/staging/omapdrm/TODO +++ b/drivers/staging/omapdrm/TODO @@ -22,6 +22,11 @@ TODO . Review DSS vs KMS mismatches. The omap_dss_device is sort of part encoder, part connector. Which results in a bit of duct tape to fwd calls from encoder to connector. Possibly this could be done a bit better. +. Solve PM sequencing on resume. DMM/TILER must be reloaded before any + access is made from any component in the system. Which means on suspend + CRTC's should be disabled, and on resume the LUT should be reprogrammed + before CRTC's are re-enabled, to prevent DSS from trying to DMA from a + buffer mapped in DMM/TILER before LUT is reloaded. . Add debugfs information for DMM/TILER
Userspace: diff --git a/drivers/staging/omapdrm/omap_drv.c b/drivers/staging/omapdrm/omap_drv.c index 71de7cf..7ecf578 100644 --- a/drivers/staging/omapdrm/omap_drv.c +++ b/drivers/staging/omapdrm/omap_drv.c @@ -509,7 +509,7 @@ static int ioctl_gem_info(struct drm_device *dev, void *data, return -ENOENT; }
- args->size = obj->size; /* for now */ + args->size = omap_gem_mmap_size(obj); args->offset = omap_gem_mmap_offset(obj);
drm_gem_object_unreference_unlocked(obj); @@ -557,6 +557,8 @@ static int dev_load(struct drm_device *dev, unsigned long flags)
dev->dev_private = priv;
+ omap_gem_init(dev); + ret = omap_modeset_init(dev); if (ret) { dev_err(dev->dev, "omap_modeset_init failed: ret=%d\n", ret); @@ -589,8 +591,8 @@ static int dev_unload(struct drm_device *dev) drm_kms_helper_poll_fini(dev);
omap_fbdev_free(dev); - omap_modeset_free(dev); + omap_gem_deinit(dev);
kfree(dev->dev_private); dev->dev_private = NULL; diff --git a/drivers/staging/omapdrm/omap_drv.h b/drivers/staging/omapdrm/omap_drv.h index c8f2752..9d0783d 100644 --- a/drivers/staging/omapdrm/omap_drv.h +++ b/drivers/staging/omapdrm/omap_drv.h @@ -84,6 +84,8 @@ struct drm_connector *omap_framebuffer_get_next_connector( void omap_framebuffer_flush(struct drm_framebuffer *fb, int x, int y, int w, int h);
+void omap_gem_init(struct drm_device *dev); +void omap_gem_deinit(struct drm_device *dev);
struct drm_gem_object *omap_gem_new(struct drm_device *dev, union omap_gem_size gsize, uint32_t flags); @@ -109,6 +111,7 @@ int omap_gem_get_paddr(struct drm_gem_object *obj, dma_addr_t *paddr, bool remap); int omap_gem_put_paddr(struct drm_gem_object *obj); uint64_t omap_gem_mmap_offset(struct drm_gem_object *obj); +size_t omap_gem_mmap_size(struct drm_gem_object *obj);
static inline int align_pitch(int pitch, int width, int bpp) { diff --git a/drivers/staging/omapdrm/omap_fb.c b/drivers/staging/omapdrm/omap_fb.c index 82ed612..491be53 100644 --- a/drivers/staging/omapdrm/omap_fb.c +++ b/drivers/staging/omapdrm/omap_fb.c @@ -102,7 +102,7 @@ int omap_framebuffer_get_buffer(struct drm_framebuffer *fb, int x, int y, * dma_alloc_coherent()). But this should be ok because it * is only used by legacy fbdev */ - BUG_ON(!bo_vaddr); + BUG_ON(IS_ERR_OR_NULL(bo_vaddr)); *vaddr = bo_vaddr + offset; }
diff --git a/drivers/staging/omapdrm/omap_gem.c b/drivers/staging/omapdrm/omap_gem.c index bc1709c..1054da3 100644 --- a/drivers/staging/omapdrm/omap_gem.c +++ b/drivers/staging/omapdrm/omap_gem.c @@ -22,11 +22,13 @@ #include <linux/shmem_fs.h>
#include "omap_drv.h" +#include "omap_dmm_tiler.h"
/* remove these once drm core helpers are merged */ struct page ** _drm_gem_get_pages(struct drm_gem_object *obj, gfp_t gfpmask); void _drm_gem_put_pages(struct drm_gem_object *obj, struct page **pages, bool dirty, bool accessed); +int _drm_gem_create_mmap_offset_size(struct drm_gem_object *obj, size_t size);
/* * GEM buffer object implementation. @@ -45,9 +47,16 @@ struct omap_gem_object {
uint32_t flags;
+ /** width/height for tiled formats (rounded up to slot boundaries) */ + uint16_t width, height; + /** * If buffer is allocated physically contiguous, the OMAP_BO_DMA flag - * is set and the paddr is valid. + * is set and the paddr is valid. Also if the buffer is remapped in + * TILER and paddr_cnt > 0, then paddr is valid. But if you are using + * the physical address and OMAP_BO_DMA is not set, then you should + * be going thru omap_gem_{get,put}_paddr() to ensure the mapping is + * not removed from under your feet. * * Note that OMAP_BO_SCANOUT is a hint from userspace that DMA capable * buffer is requested, but doesn't mean that it is. Use the @@ -57,6 +66,16 @@ struct omap_gem_object { dma_addr_t paddr;
/** + * # of users of paddr + */ + uint32_t paddr_cnt; + + /** + * tiler block used when buffer is remapped in DMM/TILER. + */ + struct tiler_block *block; + + /** * Array of backing pages, if allocated. Note that pages are never * allocated for buffers originally allocated from contiguous memory */ @@ -91,6 +110,67 @@ struct omap_gem_object { } *sync; };
+/* To deal with userspace mmap'ings of 2d tiled buffers, which (a) are + * not necessarily pinned in TILER all the time, and (b) when they are + * they are not necessarily page aligned, we reserve one or more small + * regions in each of the 2d containers to use as a user-GART where we + * can create a second page-aligned mapping of parts of the buffer + * being accessed from userspace. + * + * Note that we could optimize slightly when we know that multiple + * tiler containers are backed by the same PAT.. but I'll leave that + * for later.. + */ +#define NUM_USERGART_ENTRIES 2 +struct usergart_entry { + struct tiler_block *block; /* the reserved tiler block */ + dma_addr_t paddr; + struct drm_gem_object *obj; /* the current pinned obj */ + pgoff_t obj_pgoff; /* page offset of obj currently + mapped in */ +}; +static struct { + struct usergart_entry entry[NUM_USERGART_ENTRIES]; + int height; /* height in rows */ + int height_shift; /* ilog2(height in rows) */ + int slot_shift; /* ilog2(width per slot) */ + int stride_pfn; /* stride in pages */ + int last; /* index of last used entry */ +} *usergart; + +static void evict_entry(struct drm_gem_object *obj, + enum tiler_fmt fmt, struct usergart_entry *entry) +{ + if (obj->dev->dev_mapping) { + size_t size = PAGE_SIZE * usergart[fmt].height; + loff_t off = omap_gem_mmap_offset(obj) + + (entry->obj_pgoff << PAGE_SHIFT); + unmap_mapping_range(obj->dev->dev_mapping, off, size, 1); + } + + entry->obj = NULL; +} + +/* Evict a buffer from usergart, if it is mapped there */ +static void evict(struct drm_gem_object *obj) +{ + struct omap_gem_object *omap_obj = to_omap_bo(obj); + + if (omap_obj->flags & OMAP_BO_TILED) { + enum tiler_fmt fmt = gem2fmt(omap_obj->flags); + int i; + + if (!usergart) + return; + + for (i = 0; i < NUM_USERGART_ENTRIES; i++) { + struct usergart_entry *entry = &usergart[fmt].entry[i]; + if (entry->obj == obj) + evict_entry(obj, fmt, entry); + } + } +} + /* GEM objects can either be allocated from contiguous memory (in which * case obj->filp==NULL), or w/ shmem backing (obj->filp!=NULL). But non * contiguous buffers can be remapped in TILER/DMM if they need to be @@ -142,7 +222,9 @@ uint64_t omap_gem_mmap_offset(struct drm_gem_object *obj) { if (!obj->map_list.map) { /* Make it mmapable */ - int ret = drm_gem_create_mmap_offset(obj); + size_t size = omap_gem_mmap_size(obj); + int ret = _drm_gem_create_mmap_offset_size(obj, size); + if (ret) { dev_err(obj->dev->dev, "could not allocate mmap offset"); return 0; @@ -152,6 +234,134 @@ uint64_t omap_gem_mmap_offset(struct drm_gem_object *obj) return (uint64_t)obj->map_list.hash.key << PAGE_SHIFT; }
+/** get mmap size */ +size_t omap_gem_mmap_size(struct drm_gem_object *obj) +{ + struct omap_gem_object *omap_obj = to_omap_bo(obj); + size_t size = obj->size; + + if (omap_obj->flags & OMAP_BO_TILED) { + /* for tiled buffers, the virtual size has stride rounded up + * to 4kb.. (to hide the fact that row n+1 might start 16kb or + * 32kb later!). But we don't back the entire buffer with + * pages, only the valid picture part.. so need to adjust for + * this in the size used to mmap and generate mmap offset + */ + size = tiler_vsize(gem2fmt(omap_obj->flags), + omap_obj->width, omap_obj->height); + } + + return size; +} + + +/* Normal handling for the case of faulting in non-tiled buffers */ +static int fault_1d(struct drm_gem_object *obj, + struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct omap_gem_object *omap_obj = to_omap_bo(obj); + unsigned long pfn; + pgoff_t pgoff; + + /* We don't use vmf->pgoff since that has the fake offset: */ + pgoff = ((unsigned long)vmf->virtual_address - + vma->vm_start) >> PAGE_SHIFT; + + if (omap_obj->pages) { + pfn = page_to_pfn(omap_obj->pages[pgoff]); + } else { + BUG_ON(!(omap_obj->flags & OMAP_BO_DMA)); + pfn = (omap_obj->paddr >> PAGE_SHIFT) + pgoff; + } + + VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address, + pfn, pfn << PAGE_SHIFT); + + return vm_insert_mixed(vma, (unsigned long)vmf->virtual_address, pfn); +} + +/* Special handling for the case of faulting in 2d tiled buffers */ +static int fault_2d(struct drm_gem_object *obj, + struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct omap_gem_object *omap_obj = to_omap_bo(obj); + struct usergart_entry *entry; + enum tiler_fmt fmt = gem2fmt(omap_obj->flags); + struct page *pages[64]; /* XXX is this too much to have on stack? */ + unsigned long pfn; + pgoff_t pgoff, base_pgoff; + void __user *vaddr; + int i, ret, slots; + + if (!usergart) + return -EFAULT; + + /* TODO: this fxn might need a bit tweaking to deal w/ tiled buffers + * that are wider than 4kb + */ + + /* We don't use vmf->pgoff since that has the fake offset: */ + pgoff = ((unsigned long)vmf->virtual_address - + vma->vm_start) >> PAGE_SHIFT; + + /* actual address we start mapping at is rounded down to previous slot + * boundary in the y direction: + */ + base_pgoff = round_down(pgoff, usergart[fmt].height); + vaddr = vmf->virtual_address - ((pgoff - base_pgoff) << PAGE_SHIFT); + entry = &usergart[fmt].entry[usergart[fmt].last]; + + slots = omap_obj->width >> usergart[fmt].slot_shift; + + /* evict previous buffer using this usergart entry, if any: */ + if (entry->obj) + evict_entry(entry->obj, fmt, entry); + + entry->obj = obj; + entry->obj_pgoff = base_pgoff; + + /* now convert base_pgoff to phys offset from virt offset: + */ + base_pgoff = (base_pgoff >> usergart[fmt].height_shift) * slots; + + /* map in pages. Note the height of the slot is also equal to the + * number of pages that need to be mapped in to fill 4kb wide CPU page. + * If the height is 64, then 64 pages fill a 4kb wide by 64 row region. + * Beyond the valid pixel part of the buffer, we set pages[i] to NULL to + * get a dummy page mapped in.. if someone reads/writes it they will get + * random/undefined content, but at least it won't be corrupting + * whatever other random page used to be mapped in, or other undefined + * behavior. 
+ */ + memcpy(pages, &omap_obj->pages[base_pgoff], + sizeof(struct page *) * slots); + memset(pages + slots, 0, + sizeof(struct page *) * (usergart[fmt].height - slots)); + + ret = tiler_pin(entry->block, pages, true); + if (ret) { + dev_err(obj->dev->dev, "failed to pin: %d\n", ret); + return ret; + } + + i = usergart[fmt].height; + pfn = entry->paddr >> PAGE_SHIFT; + + VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address, + pfn, pfn << PAGE_SHIFT); + + while (i--) { + vm_insert_mixed(vma, (unsigned long)vaddr, pfn); + pfn += usergart[fmt].stride_pfn; + vaddr += PAGE_SIZE; + } + + /* simple round-robin: */ + usergart[fmt].last = (usergart[fmt].last + 1) % NUM_USERGART_ENTRIES; + + return 0; +} + /** * omap_gem_fault - pagefault handler for GEM objects * @vma: the VMA of the GEM object @@ -171,8 +381,6 @@ int omap_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) struct omap_gem_object *omap_obj = to_omap_bo(obj); struct drm_device *dev = obj->dev; struct page **pages; - unsigned long pfn; - pgoff_t pgoff; int ret;
/* Make sure we don't parallel update on a fault, nor move or remove @@ -192,21 +400,11 @@ int omap_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) * probably trigger put_pages()? */
- /* We don't use vmf->pgoff since that has the fake offset: */ - pgoff = ((unsigned long)vmf->virtual_address - - vma->vm_start) >> PAGE_SHIFT; + if (omap_obj->flags & OMAP_BO_TILED) + ret = fault_2d(obj, vma, vmf); + else + ret = fault_1d(obj, vma, vmf);
- if (omap_obj->pages) { - pfn = page_to_pfn(omap_obj->pages[pgoff]); - } else { - BUG_ON(!(omap_obj->flags & OMAP_BO_DMA)); - pfn = (omap_obj->paddr >> PAGE_SHIFT) + pgoff; - } - - VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address, - pfn, pfn << PAGE_SHIFT); - - ret = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address, pfn);
fail: mutex_unlock(&dev->struct_mutex); @@ -308,8 +506,6 @@ int omap_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev, struct drm_gem_object *obj; int ret = 0;
- mutex_lock(&dev->struct_mutex); - /* GEM does all our handle to object mapping */ obj = drm_gem_object_lookup(dev, file, handle); if (obj == NULL) { @@ -322,7 +518,6 @@ int omap_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev, drm_gem_object_unreference_unlocked(obj);
fail: - mutex_unlock(&dev->struct_mutex); return ret; }
@@ -336,12 +531,61 @@ int omap_gem_get_paddr(struct drm_gem_object *obj, struct omap_gem_object *omap_obj = to_omap_bo(obj); int ret = 0;
- if (is_shmem(obj)) { - /* TODO: remap to TILER */ - return -ENOMEM; + mutex_lock(&obj->dev->struct_mutex); + + if (remap && is_shmem(obj)) { + if (omap_obj->paddr_cnt == 0) { + struct page **pages; + enum tiler_fmt fmt = gem2fmt(omap_obj->flags); + struct tiler_block *block; + BUG_ON(omap_obj->block); + + ret = get_pages(obj, &pages); + if (ret) + goto fail; + + + if (omap_obj->flags & OMAP_BO_TILED) { + block = tiler_reserve_2d(fmt, + omap_obj->width, + omap_obj->height, 0); + } else { + block = tiler_reserve_1d(obj->size); + } + + if (IS_ERR(block)) { + ret = PTR_ERR(block); + dev_err(obj->dev->dev, + "could not remap: %d (%d)\n", ret, fmt); + goto fail; + } + + /* TODO: enable async refill.. */ + ret = tiler_pin(block, pages, true); + if (ret) { + tiler_release(block); + dev_err(obj->dev->dev, + "could not pin: %d\n", ret); + goto fail; + } + + omap_obj->paddr = tiler_ssptr(block); + omap_obj->block = block; + + DBG("got paddr: %08x", omap_obj->paddr); + } + + omap_obj->paddr_cnt++; + + *paddr = omap_obj->paddr; + } else if (omap_obj->flags & OMAP_BO_DMA) { + *paddr = omap_obj->paddr; + } else { + ret = -EINVAL; }
- *paddr = omap_obj->paddr; +fail: + mutex_unlock(&obj->dev->struct_mutex);
return ret; } @@ -351,8 +595,30 @@ int omap_gem_get_paddr(struct drm_gem_object *obj, */ int omap_gem_put_paddr(struct drm_gem_object *obj) { - /* do something here when remap to TILER is used.. */ - return 0; + struct omap_gem_object *omap_obj = to_omap_bo(obj); + int ret = 0; + + mutex_lock(&obj->dev->struct_mutex); + if (omap_obj->paddr_cnt > 0) { + omap_obj->paddr_cnt--; + if (omap_obj->paddr_cnt == 0) { + ret = tiler_unpin(omap_obj->block); + if (ret) { + dev_err(obj->dev->dev, + "could not unpin pages: %d\n", ret); + goto fail; + } + ret = tiler_release(omap_obj->block); + if (ret) { + dev_err(obj->dev->dev, + "could not release unmap: %d\n", ret); + } + omap_obj->block = NULL; + } + } +fail: + mutex_unlock(&obj->dev->struct_mutex); + return ret; }
/* acquire pages when needed (for example, for DMA where physically @@ -396,13 +662,22 @@ int omap_gem_put_pages(struct drm_gem_object *obj) return 0; }
-/* Get kernel virtual address for CPU access.. only buffers that are - * allocated contiguously have a kernel virtual address, so this more - * or less only exists for omap_fbdev +/* Get kernel virtual address for CPU access.. this more or less only + * exists for omap_fbdev. This should be called with struct_mutex + * held. */ void *omap_gem_vaddr(struct drm_gem_object *obj) { struct omap_gem_object *omap_obj = to_omap_bo(obj); + WARN_ON(! mutex_is_locked(&obj->dev->struct_mutex)); + if (!omap_obj->vaddr) { + struct page **pages; + int ret = get_pages(obj, &pages); + if (ret) + return ERR_PTR(ret); + omap_obj->vaddr = vmap(pages, obj->size >> PAGE_SHIFT, + VM_MAP, pgprot_writecombine(PAGE_KERNEL)); + } return omap_obj->vaddr; }
@@ -670,6 +945,8 @@ void omap_gem_free_object(struct drm_gem_object *obj) struct drm_device *dev = obj->dev; struct omap_gem_object *omap_obj = to_omap_bo(obj);
+ evict(obj); + if (obj->map_list.map) { drm_gem_free_mmap_offset(obj); } @@ -682,6 +959,8 @@ void omap_gem_free_object(struct drm_gem_object *obj) if (!is_shmem(obj)) { dma_free_writecombine(dev->dev, obj->size, omap_obj->vaddr, omap_obj->paddr); + } else if (omap_obj->vaddr) { + vunmap(omap_obj->vaddr); } }
@@ -729,11 +1008,32 @@ struct drm_gem_object *omap_gem_new(struct drm_device *dev, int ret;
if (flags & OMAP_BO_TILED) { - /* TODO: not implemented yet */ - goto fail; - } + if (!usergart) { + dev_err(dev->dev, "Tiled buffers require DMM\n"); + goto fail; + } + + /* tiled buffers are always shmem paged backed.. when they are + * scanned out, they are remapped into DMM/TILER + */ + flags &= ~OMAP_BO_SCANOUT; + + /* currently don't allow cached buffers.. there is some caching + * stuff that needs to be handled better + */ + flags &= ~(OMAP_BO_CACHED|OMAP_BO_UNCACHED); + flags |= OMAP_BO_WC;
- size = PAGE_ALIGN(gsize.bytes); + /* align dimensions to slot boundaries... */ + tiler_align(gem2fmt(flags), + &gsize.tiled.width, &gsize.tiled.height); + + /* ...and calculate size based on aligned dimensions */ + size = tiler_size(gem2fmt(flags), + gsize.tiled.width, gsize.tiled.height); + } else { + size = PAGE_ALIGN(gsize.bytes); + }
omap_obj = kzalloc(sizeof(*omap_obj), GFP_KERNEL); if (!omap_obj) { @@ -754,6 +1054,11 @@ struct drm_gem_object *omap_gem_new(struct drm_device *dev,
omap_obj->flags = flags;
+ if (flags & OMAP_BO_TILED) { + omap_obj->width = gsize.tiled.width; + omap_obj->height = gsize.tiled.height; + } + if (flags & (OMAP_BO_DMA|OMAP_BO_EXT_MEM)) { ret = drm_gem_private_object_init(dev, obj, size); } else { @@ -772,3 +1077,62 @@ fail: } return NULL; } + +/* init/cleanup.. if DMM is used, we need to set some stuff up.. */ +void omap_gem_init(struct drm_device *dev) +{ + const enum tiler_fmt fmts[] = { + TILFMT_8BIT, TILFMT_16BIT, TILFMT_32BIT + }; + int i, j, ret; + + ret = omap_dmm_init(dev); + if (ret) { + /* DMM only supported on OMAP4 and later, so this isn't fatal */ + dev_warn(dev->dev, "omap_dmm_init failed, disabling DMM\n"); + return; + } + + usergart = kzalloc(3 * sizeof(*usergart), GFP_KERNEL); + + /* reserve 4k aligned/wide regions for userspace mappings: */ + for (i = 0; i < ARRAY_SIZE(fmts); i++) { + uint16_t h = 1, w = PAGE_SIZE >> i; + tiler_align(fmts[i], &w, &h); + /* note: since each region is 1 4kb page wide, and minimum + * number of rows, the height ends up being the same as the + * # of pages in the region + */ + usergart[i].height = h; + usergart[i].height_shift = ilog2(h); + usergart[i].stride_pfn = tiler_stride(fmts[i]) >> PAGE_SHIFT; + usergart[i].slot_shift = ilog2((PAGE_SIZE / h) >> i); + for (j = 0; j < NUM_USERGART_ENTRIES; j++) { + struct usergart_entry *entry = &usergart[i].entry[j]; + struct tiler_block *block = + tiler_reserve_2d(fmts[i], w, h, + PAGE_SIZE); + if (IS_ERR(block)) { + dev_err(dev->dev, + "reserve failed: %d, %d, %ld\n", + i, j, PTR_ERR(block)); + return; + } + entry->paddr = tiler_ssptr(block); + entry->block = block; + + DBG("%d:%d: %dx%d: paddr=%08x stride=%d", i, j, w, h, + entry->paddr, + usergart[i].stride_pfn << PAGE_SHIFT); + } + } +} + +void omap_gem_deinit(struct drm_device *dev) +{ + /* I believe we can rely on there being no more outstanding GEM + * objects which could depend on usergart/dmm at this point. + */ + omap_dmm_remove(); + kfree(usergart); +} diff --git a/drivers/staging/omapdrm/omap_gem_helpers.c b/drivers/staging/omapdrm/omap_gem_helpers.c index 1cfc12d..29275c7 100644 --- a/drivers/staging/omapdrm/omap_gem_helpers.c +++ b/drivers/staging/omapdrm/omap_gem_helpers.c @@ -112,3 +112,58 @@ void _drm_gem_put_pages(struct drm_gem_object *obj, struct page **pages,
drm_free_large(pages); } + +int +_drm_gem_create_mmap_offset_size(struct drm_gem_object *obj, size_t size) +{ + struct drm_device *dev = obj->dev; + struct drm_gem_mm *mm = dev->mm_private; + struct drm_map_list *list; + struct drm_local_map *map; + int ret = 0; + + /* Set the object up for mmap'ing */ + list = &obj->map_list; + list->map = kzalloc(sizeof(struct drm_map_list), GFP_KERNEL); + if (!list->map) + return -ENOMEM; + + map = list->map; + map->type = _DRM_GEM; + map->size = size; + map->handle = obj; + + /* Get a DRM GEM mmap offset allocated... */ + list->file_offset_node = drm_mm_search_free(&mm->offset_manager, + size / PAGE_SIZE, 0, 0); + + if (!list->file_offset_node) { + DRM_ERROR("failed to allocate offset for bo %d\n", obj->name); + ret = -ENOSPC; + goto out_free_list; + } + + list->file_offset_node = drm_mm_get_block(list->file_offset_node, + size / PAGE_SIZE, 0); + if (!list->file_offset_node) { + ret = -ENOMEM; + goto out_free_list; + } + + list->hash.key = list->file_offset_node->start; + ret = drm_ht_insert_item(&mm->offset_hash, &list->hash); + if (ret) { + DRM_ERROR("failed to add to map hash\n"); + goto out_free_mm; + } + + return 0; + +out_free_mm: + drm_mm_put_block(list->file_offset_node); +out_free_list: + kfree(list->map); + list->map = NULL; + + return ret; +}
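For context, the expected calling pattern for the new paddr API looks roughly like this (a hypothetical caller, not part of the patch):

/* Pin a GEM object for DMA, use the physical address, then unpin so
 * TILER space can be reclaimed when the last user drops its reference.
 */
static int do_dma(struct drm_gem_object *obj)
{
	dma_addr_t paddr;
	int ret;

	/* remap=true: map into DMM/TILER if not already DMA capable */
	ret = omap_gem_get_paddr(obj, &paddr, true);
	if (ret)
		return ret;

	/* ... program DSS/IVAHD/etc with 'paddr' ... */

	/* drop the reference; the last put may unpin from TILER */
	return omap_gem_put_paddr(obj);
}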
On Mon, 5 Dec 2011 19:19:22 -0600 Rob Clark rob.clark@linaro.org wrote:
- usergart = kzalloc(3 * sizeof(*usergart), GFP_KERNEL);
- /* reserve 4k aligned/wide regions for userspace mappings: */
- for (i = 0; i < ARRAY_SIZE(fmts); i++) {
- uint16_t h = 1, w = PAGE_SIZE >> i;
- tiler_align(fmts[i], &w, &h);
- /* note: since each region is 1 4kb page wide, and minimum
- * number of rows, the height ends up being the same as the
- * # of pages in the region
- */
- usergart[i].height = h;
- usergart[i].height_shift = ilog2(h);
Seems to be missing an allocation failure check
On Tue, Dec 6, 2011 at 5:30 AM, Alan Cox alan@lxorguk.ukuu.org.uk wrote:
On Mon, 5 Dec 2011 19:19:22 -0600 Rob Clark rob.clark@linaro.org wrote:
- usergart = kzalloc(3 * sizeof(*usergart), GFP_KERNEL);
- /* reserve 4k aligned/wide regions for userspace mappings: */
- for (i = 0; i < ARRAY_SIZE(fmts); i++) {
- uint16_t h = 1, w = PAGE_SIZE >> i;
- tiler_align(fmts[i], &w, &h);
- /* note: since each region is 1 4kb page wide, and minimum
- * number of rows, the height ends up being the same as the
- * # of pages in the region
- */
- usergart[i].height = h;
- usergart[i].height_shift = ilog2(h);
Seems to be missing an allocation failure check
oh, woops.. I've fixed that.. will give a couple more days to see if there are any other comments and then send v2..
BR, -R
Hi,
On Mon, 2011-12-05 at 19:19 -0600, Rob Clark wrote:
From: Rob Clark rob@ti.com
Support for DMM and tiled buffers. The DMM/TILER block in omap4+ SoC provides support for remapping physically discontiguous buffers for various DMA initiators (DSS, IVAHD, etc) which do not otherwise support non-physically contiguous buffers, as well as providing support for tiled buffers.
See the descriptions in the following two patches for more details.
Why is the tiler/dmm driver integrated into the drm driver?
Tomi
On Wed, Dec 7, 2011 at 3:31 AM, Tomi Valkeinen tomi.valkeinen@ti.com wrote:
Hi,
On Mon, 2011-12-05 at 19:19 -0600, Rob Clark wrote:
From: Rob Clark rob@ti.com
Support for DMM and tiled buffers. The DMM/TILER block in omap4+ SoC provides support for remapping physically discontiguous buffers for various DMA initiators (DSS, IVAHD, etc) which do not otherwise support non-physically contiguous buffers, as well as providing support for tiled buffers.
See the descriptions in the following two patches for more details.
Why is the tiler/dmm driver integrated into the drm driver?
Basically because of a big list of reasons to keep it integrated, and no good reason that I could think of to make it a standalone driver.
1) Because the function/usage is most like a GTT in other systems.. the usage is really graphics/multimedia related so GEM is a natural way to expose it to userspace. Other places we want to use tiler buffers, like camera, are neatly handled by using dmabuf to export the GEM buffer to a different device.
2) We went down the separate driver path in the past, and it really exposes a lot of problems. See the hacks that were done in the past to get wakeup/resume sequencing correct when tiler was a separate driver. (hint: the table of page addresses needs to be reprogrammed before any access to buffer mapped in DMM is done.. this can be accomplished quite simply by restoring the LUT before enabling any video pipes when it is in a single driver... although that is still in the TODO)
3) Doing some of the more advanced stuff, like page flipping using DMM's synchronized refill to update page addresses synchronized with scanout will, I think, end up being some kinda weird API.. I don't think I'd want to make that a public API exported by one driver consumed by another, but not such a problem if it is just something used internally by one driver.
4) The LUT isn't really big enough to be managed statically like we did in the past.. it needs to be managed dynamically, mapping and evicting buffers. This is something that is typical with other gfx drivers in their management of their GTT..
5) I wouldn't really want to duplicate the userspace mmap'ing games for 2d buffers in a lot of different drivers.
The code is structured such that it could be separated in the future, if there is any good reason to. But so far I've been unable to think of one.
BR, -R
Tomi
On Wed, 2011-12-07 at 07:29 -0600, Rob Clark wrote:
On Wed, Dec 7, 2011 at 3:31 AM, Tomi Valkeinen tomi.valkeinen@ti.com wrote:
Hi,
On Mon, 2011-12-05 at 19:19 -0600, Rob Clark wrote:
From: Rob Clark rob@ti.com
Support for DMM and tiled buffers. The DMM/TILER block in omap4+ SoC provides support for remapping physically discontiguous buffers for various DMA initiators (DSS, IVAHD, etc) which do not otherwise support non-physically contiguous buffers, as well as providing support for tiled buffers.
See the descriptions in the following two patches for more details.
Why is the tiler/dmm driver integrated into the drm driver?
Basically because of a big list of reasons to keep it integrated, and no good reason that I could think of to make it a standalone driver.
Well, I think it's good architecture to keep independent things separate. Also we have other display frameworks in the kernel than DRM.
1) Because the function/usage is most like a GTT in other systems.. the usage is really graphics/multimedia related so GEM is a natural way to expose it to userspace. Other places we want to use tiler buffers, like camera, are neatly handled by using dmabuf to export the GEM buffer to a different device.
2) We went down the separate driver path in the past, and it really exposes a lot of problems. See the hacks that were done in the past to get wakeup/resume sequencing correct when tiler was a separate driver. (hint: the table of page addresses needs to be reprogrammed before any access to buffer mapped in DMM is done.. this can be accomplished quite simply by restoring the LUT before enabling any video pipes when it is in a single driver... although that is still in the TODO)
3) Doing some of the more advanced stuff, like page flipping using DMM's synchronized refill to update page addresses synchronized with scanout will, I think, end up being some kinda weird API.. I don't think I'd want to make that a public API exported by one driver consumed by another, but not such a problem if it is just something used internally by one driver.
4) The LUT isn't really big enough to be managed statically like we did in the past.. it needs to be managed dynamically, mapping and evicting buffers. This is something that is typical with other gfx drivers in their management of their GTT..
5) I wouldn't really want to duplicate the userspace mmap'ing games for 2d buffers in a lot of different drivers.
I can't really argue your points as I'm not familiar with the problems with the tiler. So I'm not questioning the decision to integrate the tiler code into drm, just expressing my thoughts =).
If the tiler driver had only kernel internal API, and no userspace APIs, I don't see why it'd be much more difficult to have as a separate driver than integrated into DRM. Well, except if the tiler code itself uses features offered by DRM extensively, and having the tiler code as an independent driver would mean replicating all those features.
I guess it's not quite a fair comparison, but I'm comparing tiler to the vrfb driver (well, lib is perhaps a better word for it), which doesn't depend on any framework and can be used by any kernel component.
Tomi
On Thu, Dec 8, 2011 at 2:45 AM, Tomi Valkeinen tomi.valkeinen@ti.com wrote:
On Wed, 2011-12-07 at 07:29 -0600, Rob Clark wrote:
On Wed, Dec 7, 2011 at 3:31 AM, Tomi Valkeinen tomi.valkeinen@ti.com wrote:
Hi,
On Mon, 2011-12-05 at 19:19 -0600, Rob Clark wrote:
From: Rob Clark rob@ti.com
Support for DMM and tiled buffers. The DMM/TILER block in omap4+ SoC provides support for remapping physically discontiguous buffers for various DMA initiators (DSS, IVAHD, etc) which do not otherwise support non-physically contiguous buffers, as well as providing support for tiled buffers.
See the descriptions in the following two patches for more details.
Why is the tiler/dmm driver integrated into the drm driver?
Basically because of a big list of reasons to keep it integrated, and no good reason that I could think of to make it a standalone driver.
Well, I think it's good architecture to keep independent things separate. Also we have other display frameworks in the kernel than DRM.
I think we can split it out if we need to support other frameworks. It isn't that difficult to do. If we were to do that, I presume it would still reside in the gpu/ somewhere. But I think I'd rather wait until we have another framework that requires access.
1) Because the function/usage is most like a GTT in other systems.. the usage is really graphics/multimedia related so GEM is a natural way to expose it to userspace. Other places we want to use tiler buffers, like camera, are neatly handled by using dmabuf to export the GEM buffer to a different device.
2) We went down the separate driver path in the past, and it really exposes a lot of problems. See the hacks that were done in the past to get wakeup/resume sequencing correct when tiler was a separate driver. (hint: the table of page addresses needs to be reprogrammed before any access to buffer mapped in DMM is done.. this can be accomplished quite simply by restoring the LUT before enabling any video pipes when it is in a single driver... although that is still in the TODO)
3) Doing some of the more advanced stuff, like page flipping using DMM's synchronized refill to update page addresses synchronized with scanout will, I think, end up being some kinda weird API.. I don't think I'd want to make that a public API exported by one driver consumed by another, but not such a problem if it is just something used internally by one driver.
4) The LUT isn't really big enough to be managed statically like we did in the past.. it needs to be managed dynamically, mapping and evicting buffers. This is something that is typical with other gfx drivers in their management of their GTT..
5) I wouldn't really want to duplicate the userspace mmap'ing games for 2d buffers in a lot of different drivers.
I can't really argue your points as I'm not familiar with the problems with the tiler. So I'm not questioning the decision to integrate the tiler code into drm, just expressing my thoughts =).
If the tiler driver had only kernel internal API, and no userspace APIs, I don't see why it'd be much more difficult to have as a separate driver than integrated into DRM. Well, except if the tiler code itself uses features offered by DRM extensively, and having the tiler code as an independent driver would mean replicating all those features.
I guess it's not quite a fair comparison, but I'm comparing tiler to the vrfb driver (well, lib is perhaps a better word for it), which doesn't depend on any framework and can be used by any kernel component.
No, I appreciate your input. I had thoughts along the same line. But since I did not have any exported kernel functions, I figured that it should just sit underneath the omapdrm umbrella (so to speak).