[Rebased on v3.8-rc3 to avoid a conflict in drivers/gpu/drm/tegra/drm.h]
Hi,
This patch series introduces a number of useful features for the Tegra DRM driver. The first patch allows VBLANK support to be used in drivers that don't or can't use DRIVER_HAVE_IRQ for interrupt support.
Patch 2 adds support for the additional two planes available on Tegra hardware, while patch 3 implement .mode_set_base() to allow for some nice speed up when changing the framebuffer without actually changing the resolution. Patch 4 adds VBLANK support and finally patch 5 builds on it to provide the page-flipping IOCTL.
Thierry
Thierry Reding (5): drm: Allow vblank support without DRIVER_HAVE_IRQ drm/tegra: Add plane support drm/tegra: Implement .mode_set_base() drm/tegra: Implement VBLANK support drm/tegra: Implement page-flipping support
drivers/gpu/drm/drm_irq.c | 5 +- drivers/gpu/drm/tegra/dc.c | 475 ++++++++++++++++++++++++++++++++++---------- drivers/gpu/drm/tegra/dc.h | 2 +- drivers/gpu/drm/tegra/drm.c | 53 +++++ drivers/gpu/drm/tegra/drm.h | 37 ++++ 5 files changed, 461 insertions(+), 111 deletions(-)
Drivers that register interrupt handlers without the DRM core helpers don't initialize the .irq_enabled field and drm_dev_to_irq() may fail when called on them. This shouldn't preclude them from implementing the vblank IOCTL.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- drivers/gpu/drm/drm_irq.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 19c01ca..71f8205 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1218,8 +1218,9 @@ int drm_wait_vblank(struct drm_device *dev, void *data, int ret; unsigned int flags, seq, crtc, high_crtc;
- if ((!drm_dev_to_irq(dev)) || (!dev->irq_enabled)) - return -EINVAL; + if (drm_core_check_feature(dev, DRIVER_HAVE_IRQ)) + if ((!drm_dev_to_irq(dev)) || (!dev->irq_enabled)) + return -EINVAL;
if (vblwait->request.type & _DRM_VBLANK_SIGNAL) return -EINVAL;
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- drivers/gpu/drm/tegra/dc.c | 321 +++++++++++++++++++++++++++++++------------- drivers/gpu/drm/tegra/dc.h | 2 +- drivers/gpu/drm/tegra/drm.h | 29 ++++ 3 files changed, 259 insertions(+), 93 deletions(-)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 656b2e3..157e962 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -18,19 +18,104 @@ #include "drm.h" #include "dc.h"
-struct tegra_dc_window { - fixed20_12 x; - fixed20_12 y; - fixed20_12 w; - fixed20_12 h; - unsigned int outx; - unsigned int outy; - unsigned int outw; - unsigned int outh; - unsigned int stride; - unsigned int fmt; +struct tegra_plane { + struct drm_plane base; + unsigned int index; };
+static inline struct tegra_plane *to_tegra_plane(struct drm_plane *plane) +{ + return container_of(plane, struct tegra_plane, base); +} + +static int tegra_plane_update(struct drm_plane *plane, struct drm_crtc *crtc, + struct drm_framebuffer *fb, int crtc_x, + int crtc_y, unsigned int crtc_w, + unsigned int crtc_h, uint32_t src_x, + uint32_t src_y, uint32_t src_w, uint32_t src_h) +{ + unsigned long base = tegra_framebuffer_base(fb); + struct tegra_plane *p = to_tegra_plane(plane); + struct tegra_dc *dc = to_tegra_dc(crtc); + struct tegra_dc_window window; + + memset(&window, 0, sizeof(window)); + window.src.x = src_x >> 16; + window.src.y = src_y >> 16; + window.src.w = src_w >> 16; + window.src.h = src_h >> 16; + window.dst.x = crtc_x; + window.dst.y = crtc_y; + window.dst.w = crtc_w; + window.dst.h = crtc_h; + window.format = tegra_dc_format(fb->pixel_format); + window.bits_per_pixel = fb->bits_per_pixel; + window.stride = fb->pitches[0]; + window.base = base; + + return tegra_dc_setup_window(dc, p->index + 1, &window); +} + +static int tegra_plane_disable(struct drm_plane *plane) +{ + struct tegra_dc *dc = to_tegra_dc(plane->crtc); + struct tegra_plane *p = to_tegra_plane(plane); + unsigned int index = p->index + 1; + unsigned long value; + + value = WINDOW_A_SELECT << index; + tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER); + + value = tegra_dc_readl(dc, DC_WIN_WIN_OPTIONS); + value &= ~WIN_ENABLE; + tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS); + + value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index); + tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL); + + return 0; +} + +static void tegra_plane_destroy(struct drm_plane *plane) +{ + drm_plane_cleanup(plane); +} + +static const struct drm_plane_funcs tegra_plane_funcs = { + .update_plane = tegra_plane_update, + .disable_plane = tegra_plane_disable, + .destroy = tegra_plane_destroy, +}; + +static const uint32_t plane_formats[] = { + DRM_FORMAT_XRGB8888, + DRM_FORMAT_YUV422, +}; + +static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) +{ + unsigned int i; + int err = 0; + + for (i = 0; i < 2; i++) { + struct tegra_plane *plane; + + plane = devm_kzalloc(drm->dev, sizeof(*plane), GFP_KERNEL); + if (!plane) + return -ENOMEM; + + plane->index = i; + + err = drm_plane_init(drm, &plane->base, 1 << dc->pipe, + &tegra_plane_funcs, plane_formats, + ARRAY_SIZE(plane_formats), false); + if (err < 0) + return err; + } + + return 0; +} + static const struct drm_crtc_funcs tegra_crtc_funcs = { .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup, @@ -47,10 +132,11 @@ static bool tegra_crtc_mode_fixup(struct drm_crtc *crtc, return true; }
-static inline u32 compute_dda_inc(fixed20_12 inf, unsigned int out, bool v, +static inline u32 compute_dda_inc(unsigned int in, unsigned int out, bool v, unsigned int bpp) { fixed20_12 outf = dfixed_init(out); + fixed20_12 inf = dfixed_init(in); u32 dda_inc; int max;
@@ -80,9 +166,10 @@ static inline u32 compute_dda_inc(fixed20_12 inf, unsigned int out, bool v, return dda_inc; }
-static inline u32 compute_initial_dda(fixed20_12 in) +static inline u32 compute_initial_dda(unsigned int in) { - return dfixed_frac(in); + fixed20_12 inf = dfixed_init(in); + return dfixed_frac(inf); }
static int tegra_dc_set_timings(struct tegra_dc *dc, @@ -153,6 +240,111 @@ static int tegra_crtc_setup_clk(struct drm_crtc *crtc, return 0; }
+int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index, + const struct tegra_dc_window *window) +{ + unsigned h_offset, v_offset, h_size, v_size, h_dda, v_dda, bpp; + unsigned long value; + + bpp = window->bits_per_pixel / 8; + + value = WINDOW_A_SELECT << index; + tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER); + + tegra_dc_writel(dc, window->format, DC_WIN_COLOR_DEPTH); + tegra_dc_writel(dc, 0, DC_WIN_BYTE_SWAP); + + value = V_POSITION(window->dst.y) | H_POSITION(window->dst.x); + tegra_dc_writel(dc, value, DC_WIN_POSITION); + + value = V_SIZE(window->dst.h) | H_SIZE(window->dst.w); + tegra_dc_writel(dc, value, DC_WIN_SIZE); + + h_offset = window->src.x * bpp; + v_offset = window->src.y; + h_size = window->src.w * bpp; + v_size = window->src.h; + + value = V_PRESCALED_SIZE(v_size) | H_PRESCALED_SIZE(h_size); + tegra_dc_writel(dc, value, DC_WIN_PRESCALED_SIZE); + + h_dda = compute_dda_inc(window->src.w, window->dst.w, false, bpp); + v_dda = compute_dda_inc(window->src.h, window->dst.h, true, bpp); + + value = V_DDA_INC(v_dda) | H_DDA_INC(h_dda); + tegra_dc_writel(dc, value, DC_WIN_DDA_INC); + + h_dda = compute_initial_dda(window->src.x); + v_dda = compute_initial_dda(window->src.y); + + tegra_dc_writel(dc, h_dda, DC_WIN_H_INITIAL_DDA); + tegra_dc_writel(dc, v_dda, DC_WIN_V_INITIAL_DDA); + + tegra_dc_writel(dc, 0, DC_WIN_UV_BUF_STRIDE); + tegra_dc_writel(dc, 0, DC_WIN_BUF_STRIDE); + + tegra_dc_writel(dc, window->base, DC_WINBUF_START_ADDR); + tegra_dc_writel(dc, window->stride, DC_WIN_LINE_STRIDE); + tegra_dc_writel(dc, h_offset, DC_WINBUF_ADDR_H_OFFSET); + tegra_dc_writel(dc, v_offset, DC_WINBUF_ADDR_V_OFFSET); + + value = WIN_ENABLE; + + if (bpp < 24) + value |= COLOR_EXPAND; + + tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS); + + /* + * Disable blending and assume Window A is the bottom-most window, + * Window C is the top-most window and Window B is in the middle. + */ + tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_NOKEY); + tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_1WIN); + + switch (index) { + case 0: + tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_X); + tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_Y); + tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_3WIN_XY); + break; + + case 1: + tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_X); + tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_Y); + tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_3WIN_XY); + break; + + case 2: + tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_X); + tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_Y); + tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_3WIN_XY); + break; + } + + value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index); + tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL); + + return 0; +} + +unsigned int tegra_dc_format(uint32_t format) +{ + switch (format) { + case DRM_FORMAT_XRGB8888: + return WIN_COLOR_DEPTH_B8G8R8A8; + + case DRM_FORMAT_RGB565: + return WIN_COLOR_DEPTH_B5G6R5; + + default: + break; + } + + WARN(1, "unsupported pixel format %u, using default\n", format); + return WIN_COLOR_DEPTH_B8G8R8A8; +} + static int tegra_crtc_mode_set(struct drm_crtc *crtc, struct drm_display_mode *mode, struct drm_display_mode *adjusted, @@ -160,8 +352,7 @@ static int tegra_crtc_mode_set(struct drm_crtc *crtc, { struct tegra_framebuffer *fb = to_tegra_fb(crtc->fb); struct tegra_dc *dc = to_tegra_dc(crtc); - unsigned int h_dda, v_dda, bpp; - struct tegra_dc_window win; + struct tegra_dc_window window; unsigned long div, value; int err;
@@ -192,81 +383,23 @@ static int tegra_crtc_mode_set(struct drm_crtc *crtc, tegra_dc_writel(dc, value, DC_DISP_DISP_CLOCK_CONTROL);
/* setup window parameters */ - memset(&win, 0, sizeof(win)); - win.x.full = dfixed_const(0); - win.y.full = dfixed_const(0); - win.w.full = dfixed_const(mode->hdisplay); - win.h.full = dfixed_const(mode->vdisplay); - win.outx = 0; - win.outy = 0; - win.outw = mode->hdisplay; - win.outh = mode->vdisplay; - - switch (crtc->fb->pixel_format) { - case DRM_FORMAT_XRGB8888: - win.fmt = WIN_COLOR_DEPTH_B8G8R8A8; - break; - - case DRM_FORMAT_RGB565: - win.fmt = WIN_COLOR_DEPTH_B5G6R5; - break; - - default: - win.fmt = WIN_COLOR_DEPTH_B8G8R8A8; - WARN_ON(1); - break; - } - - bpp = crtc->fb->bits_per_pixel / 8; - win.stride = crtc->fb->pitches[0]; - - /* program window registers */ - value = WINDOW_A_SELECT; - tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER); - - tegra_dc_writel(dc, win.fmt, DC_WIN_COLOR_DEPTH); - tegra_dc_writel(dc, 0, DC_WIN_BYTE_SWAP); - - value = V_POSITION(win.outy) | H_POSITION(win.outx); - tegra_dc_writel(dc, value, DC_WIN_POSITION); - - value = V_SIZE(win.outh) | H_SIZE(win.outw); - tegra_dc_writel(dc, value, DC_WIN_SIZE); - - value = V_PRESCALED_SIZE(dfixed_trunc(win.h)) | - H_PRESCALED_SIZE(dfixed_trunc(win.w) * bpp); - tegra_dc_writel(dc, value, DC_WIN_PRESCALED_SIZE); - - h_dda = compute_dda_inc(win.w, win.outw, false, bpp); - v_dda = compute_dda_inc(win.h, win.outh, true, bpp); - - value = V_DDA_INC(v_dda) | H_DDA_INC(h_dda); - tegra_dc_writel(dc, value, DC_WIN_DDA_INC); - - h_dda = compute_initial_dda(win.x); - v_dda = compute_initial_dda(win.y); - - tegra_dc_writel(dc, h_dda, DC_WIN_H_INITIAL_DDA); - tegra_dc_writel(dc, v_dda, DC_WIN_V_INITIAL_DDA); - - tegra_dc_writel(dc, 0, DC_WIN_UV_BUF_STRIDE); - tegra_dc_writel(dc, 0, DC_WIN_BUF_STRIDE); - - tegra_dc_writel(dc, fb->obj->paddr, DC_WINBUF_START_ADDR); - tegra_dc_writel(dc, win.stride, DC_WIN_LINE_STRIDE); - tegra_dc_writel(dc, dfixed_trunc(win.x) * bpp, - DC_WINBUF_ADDR_H_OFFSET); - tegra_dc_writel(dc, dfixed_trunc(win.y), DC_WINBUF_ADDR_V_OFFSET); - - value = WIN_ENABLE; - - if (bpp < 24) - value |= COLOR_EXPAND; - - tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS); - - tegra_dc_writel(dc, 0xff00, DC_WIN_BLEND_NOKEY); - tegra_dc_writel(dc, 0xff00, DC_WIN_BLEND_1WIN); + memset(&window, 0, sizeof(window)); + window.src.x = 0; + window.src.y = 0; + window.src.w = mode->hdisplay; + window.src.h = mode->vdisplay; + window.dst.x = 0; + window.dst.y = 0; + window.dst.w = mode->hdisplay; + window.dst.h = mode->vdisplay; + window.format = tegra_dc_format(crtc->fb->pixel_format); + window.bits_per_pixel = crtc->fb->bits_per_pixel; + window.stride = crtc->fb->pitches[0]; + window.base = fb->obj->paddr; + + err = tegra_dc_setup_window(dc, 0, &window); + if (err < 0) + dev_err(dc->dev, "failed to enable root plane\n");
return 0; } @@ -588,7 +721,7 @@ static int tegra_dc_show_regs(struct seq_file *s, void *data) DUMP_REG(DC_WIN_BLEND_1WIN); DUMP_REG(DC_WIN_BLEND_2WIN_X); DUMP_REG(DC_WIN_BLEND_2WIN_Y); - DUMP_REG(DC_WIN_BLEND32WIN_XY); + DUMP_REG(DC_WIN_BLEND_3WIN_XY); DUMP_REG(DC_WIN_HP_FETCH_CONTROL); DUMP_REG(DC_WINBUF_START_ADDR); DUMP_REG(DC_WINBUF_START_ADDR_NS); @@ -690,6 +823,10 @@ static int tegra_dc_drm_init(struct host1x_client *client, return err; }
+ err = tegra_dc_add_planes(drm, dc); + if (err < 0) + return err; + if (IS_ENABLED(CONFIG_DEBUG_FS)) { err = tegra_dc_debugfs_init(dc, drm->primary); if (err < 0) diff --git a/drivers/gpu/drm/tegra/dc.h b/drivers/gpu/drm/tegra/dc.h index 99977b5..ccfc220 100644 --- a/drivers/gpu/drm/tegra/dc.h +++ b/drivers/gpu/drm/tegra/dc.h @@ -359,7 +359,7 @@ #define DC_WIN_BLEND_1WIN 0x710 #define DC_WIN_BLEND_2WIN_X 0x711 #define DC_WIN_BLEND_2WIN_Y 0x712 -#define DC_WIN_BLEND32WIN_XY 0x713 +#define DC_WIN_BLEND_3WIN_XY 0x713
#define DC_WIN_HP_FETCH_CONTROL 0x714
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h index 741b5dc..835f9a3 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/drm/tegra/drm.h @@ -28,6 +28,11 @@ static inline struct tegra_framebuffer *to_tegra_fb(struct drm_framebuffer *fb) return container_of(fb, struct tegra_framebuffer, base); }
+static inline unsigned long tegra_framebuffer_base(struct drm_framebuffer *fb) +{ + return to_tegra_fb(fb)->obj->paddr; +} + struct host1x { struct drm_device *drm; struct device *dev; @@ -118,6 +123,30 @@ static inline unsigned long tegra_dc_readl(struct tegra_dc *dc, return readl(dc->regs + (reg << 2)); }
+struct tegra_dc_window { + struct { + unsigned int x; + unsigned int y; + unsigned int w; + unsigned int h; + } src; + struct { + unsigned int x; + unsigned int y; + unsigned int w; + unsigned int h; + } dst; + unsigned int bits_per_pixel; + unsigned int format; + unsigned int stride; + unsigned long base; +}; + +/* from dc.c */ +extern unsigned int tegra_dc_format(uint32_t format); +extern int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index, + const struct tegra_dc_window *window); + struct tegra_output_ops { int (*enable)(struct tegra_output *output); int (*disable)(struct tegra_output *output);
Am Montag, den 14.01.2013, 17:05 +0100 schrieb Thierry Reding:
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+static int tegra_plane_disable(struct drm_plane *plane) +{
- struct tegra_dc *dc = to_tegra_dc(plane->crtc);
- struct tegra_plane *p = to_tegra_plane(plane);
- unsigned int index = p->index + 1;
- unsigned long value;
- value = WINDOW_A_SELECT << index;
- tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER);
- value = tegra_dc_readl(dc, DC_WIN_WIN_OPTIONS);
- value &= ~WIN_ENABLE;
- tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS);
- value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index);
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
This should be two separate writes to the register. I don't know how relevant this is on real HW, but the TRM states: "Restrictions: ACT_REQ cannot be programmed at the same time the corresponding "UPDATE" is programmed."
Better be safe than sorry and split it up. [...]
+int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index,
const struct tegra_dc_window *window)
+{
- unsigned h_offset, v_offset, h_size, v_size, h_dda, v_dda, bpp;
- unsigned long value;
- bpp = window->bits_per_pixel / 8;
- value = WINDOW_A_SELECT << index;
- tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER);
- tegra_dc_writel(dc, window->format, DC_WIN_COLOR_DEPTH);
- tegra_dc_writel(dc, 0, DC_WIN_BYTE_SWAP);
- value = V_POSITION(window->dst.y) | H_POSITION(window->dst.x);
- tegra_dc_writel(dc, value, DC_WIN_POSITION);
- value = V_SIZE(window->dst.h) | H_SIZE(window->dst.w);
- tegra_dc_writel(dc, value, DC_WIN_SIZE);
- h_offset = window->src.x * bpp;
- v_offset = window->src.y;
- h_size = window->src.w * bpp;
- v_size = window->src.h;
- value = V_PRESCALED_SIZE(v_size) | H_PRESCALED_SIZE(h_size);
- tegra_dc_writel(dc, value, DC_WIN_PRESCALED_SIZE);
- h_dda = compute_dda_inc(window->src.w, window->dst.w, false, bpp);
- v_dda = compute_dda_inc(window->src.h, window->dst.h, true, bpp);
- value = V_DDA_INC(v_dda) | H_DDA_INC(h_dda);
- tegra_dc_writel(dc, value, DC_WIN_DDA_INC);
- h_dda = compute_initial_dda(window->src.x);
- v_dda = compute_initial_dda(window->src.y);
- tegra_dc_writel(dc, h_dda, DC_WIN_H_INITIAL_DDA);
- tegra_dc_writel(dc, v_dda, DC_WIN_V_INITIAL_DDA);
- tegra_dc_writel(dc, 0, DC_WIN_UV_BUF_STRIDE);
- tegra_dc_writel(dc, 0, DC_WIN_BUF_STRIDE);
- tegra_dc_writel(dc, window->base, DC_WINBUF_START_ADDR);
- tegra_dc_writel(dc, window->stride, DC_WIN_LINE_STRIDE);
- tegra_dc_writel(dc, h_offset, DC_WINBUF_ADDR_H_OFFSET);
- tegra_dc_writel(dc, v_offset, DC_WINBUF_ADDR_V_OFFSET);
- value = WIN_ENABLE;
- if (bpp < 24)
value |= COLOR_EXPAND;
- tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS);
- /*
* Disable blending and assume Window A is the bottom-most window,
* Window C is the top-most window and Window B is in the middle.
*/
I would like to see the root window using WIN_C, so we only loose the least capable plane (WIN_A: no filtering or YUV conversion) when using a plane for the hardware cursor. Maybe you can fold this in, otherwise I'll send a patch on top of this series.
- tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_NOKEY);
- tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_1WIN);
- switch (index) {
- case 0:
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_X);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_Y);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_3WIN_XY);
break;
- case 1:
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_X);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_Y);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_3WIN_XY);
break;
- case 2:
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_X);
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_Y);
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_3WIN_XY);
break;
- }
- value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index);
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
Same comment as above.
- return 0;
+}
[...]
On Mon, Jan 14, 2013 at 06:03:44PM +0100, Lucas Stach wrote:
Am Montag, den 14.01.2013, 17:05 +0100 schrieb Thierry Reding:
[...]
- value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index);
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
This should be two separate writes to the register. I don't know how relevant this is on real HW, but the TRM states: "Restrictions: ACT_REQ cannot be programmed at the same time the corresponding "UPDATE" is programmed."
Better be safe than sorry and split it up.
It doesn't seem to make a difference, but I can split it up anyway.
[...]
- /*
* Disable blending and assume Window A is the bottom-most window,
* Window C is the top-most window and Window B is in the middle.
*/
I would like to see the root window using WIN_C, so we only loose the least capable plane (WIN_A: no filtering or YUV conversion) when using a plane for the hardware cursor. Maybe you can fold this in, otherwise I'll send a patch on top of this series.
On the other hand, doing so will loose a perfectly good video overlay plane.
[...]
- value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index);
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
Same comment as above.
Done. I'll fold a similar change into the .mode_set_base() patch and will also add a patch that converts the remaining occurrences in tegra_crtc_commit().
Thanks, Thierry
On 01/15/2013 12:05 AM, Thierry Reding wrote:
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
I think "hardware cursor" has specific meaning for Tegra(e.g: Tegra30 has a 32x32 24bpp or 64x64 2bpp hardware cursor). So you may change it to "hardware accelerated cursor"?
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+static const uint32_t plane_formats[] = {
- DRM_FORMAT_XRGB8888,
- DRM_FORMAT_YUV422,
I haven't found something related with YUV format in this patch set. For example, "tegra_dc_format" also doesn't take YUV into consideration. So remove this line.
+};
+static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) +{
- unsigned int i;
- int err = 0;
- for (i = 0; i < 2; i++) {
struct tegra_plane *plane;
plane = devm_kzalloc(drm->dev, sizeof(*plane), GFP_KERNEL);
if (!plane)
return -ENOMEM;
plane->index = i;
I suggest to change this line to: "plane->index = i + 1;". This makes the plane's index be consistent with Tegra's windows number. And also we don't need to worry about passing "plane->index + 1" to some functions which need to know which window is operating on.
err = drm_plane_init(drm, &plane->base, 1 << dc->pipe,
&tegra_plane_funcs, plane_formats,
ARRAY_SIZE(plane_formats), false);
if (err < 0)
return err;
- }
- return 0;
+}
[...]
+int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index,
const struct tegra_dc_window *window)
+{
- unsigned h_offset, v_offset, h_size, v_size, h_dda, v_dda, bpp;
- unsigned long value;
- bpp = window->bits_per_pixel / 8;
- value = WINDOW_A_SELECT << index;
- tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER);
- tegra_dc_writel(dc, window->format, DC_WIN_COLOR_DEPTH);
- tegra_dc_writel(dc, 0, DC_WIN_BYTE_SWAP);
- value = V_POSITION(window->dst.y) | H_POSITION(window->dst.x);
- tegra_dc_writel(dc, value, DC_WIN_POSITION);
- value = V_SIZE(window->dst.h) | H_SIZE(window->dst.w);
- tegra_dc_writel(dc, value, DC_WIN_SIZE);
- h_offset = window->src.x * bpp;
- v_offset = window->src.y;
- h_size = window->src.w * bpp;
- v_size = window->src.h;
- value = V_PRESCALED_SIZE(v_size) | H_PRESCALED_SIZE(h_size);
- tegra_dc_writel(dc, value, DC_WIN_PRESCALED_SIZE);
- h_dda = compute_dda_inc(window->src.w, window->dst.w, false, bpp);
- v_dda = compute_dda_inc(window->src.h, window->dst.h, true, bpp);
- value = V_DDA_INC(v_dda) | H_DDA_INC(h_dda);
- tegra_dc_writel(dc, value, DC_WIN_DDA_INC);
- h_dda = compute_initial_dda(window->src.x);
- v_dda = compute_initial_dda(window->src.y);
In current implementation, "compute_initial_dda" always returns zero. So why we need it? Although according to TRM, typically we set H/V initial dda to zero.
- tegra_dc_writel(dc, h_dda, DC_WIN_H_INITIAL_DDA);
- tegra_dc_writel(dc, v_dda, DC_WIN_V_INITIAL_DDA);
- tegra_dc_writel(dc, 0, DC_WIN_UV_BUF_STRIDE);
- tegra_dc_writel(dc, 0, DC_WIN_BUF_STRIDE);
- tegra_dc_writel(dc, window->base, DC_WINBUF_START_ADDR);
- tegra_dc_writel(dc, window->stride, DC_WIN_LINE_STRIDE);
- tegra_dc_writel(dc, h_offset, DC_WINBUF_ADDR_H_OFFSET);
- tegra_dc_writel(dc, v_offset, DC_WINBUF_ADDR_V_OFFSET);
- value = WIN_ENABLE;
- if (bpp < 24)
value |= COLOR_EXPAND;
- tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS);
- /*
* Disable blending and assume Window A is the bottom-most window,
* Window C is the top-most window and Window B is in the middle.
*/
- tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_NOKEY);
- tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_1WIN);
- switch (index) {
- case 0:
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_X);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_Y);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_3WIN_XY);
break;
- case 1:
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_X);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_2WIN_Y);
tegra_dc_writel(dc, 0x000000, DC_WIN_BLEND_3WIN_XY);
break;
- case 2:
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_X);
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_2WIN_Y);
tegra_dc_writel(dc, 0xffff00, DC_WIN_BLEND_3WIN_XY);
break;
- }
- value = (WIN_A_ACT_REQ << index) | (WIN_A_UPDATE << index);
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
- return 0;
+}
+unsigned int tegra_dc_format(uint32_t format) +{
- switch (format) {
- case DRM_FORMAT_XRGB8888:
return WIN_COLOR_DEPTH_B8G8R8A8;
Just for curious, why we choose "WIN_COLOR_DEPTH_B8G8R8A8" while not "WIN_COLOR_DEPTH_R8G8B8A8" here? I recall you and Stephen talked about this last year but I still don't know the reason.
- case DRM_FORMAT_RGB565:
return WIN_COLOR_DEPTH_B5G6R5;
- default:
break;
- }
- WARN(1, "unsupported pixel format %u, using default\n", format);
- return WIN_COLOR_DEPTH_B8G8R8A8;
+}
[...]
+/* from dc.c */ +extern unsigned int tegra_dc_format(uint32_t format); +extern int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index,
const struct tegra_dc_window *window);
struct tegra_output_ops { int (*enable)(struct tegra_output *output); int (*disable)(struct tegra_output *output);
Am Dienstag, den 15.01.2013, 17:53 +0800 schrieb Mark Zhang:
On 01/15/2013 12:05 AM, Thierry Reding wrote:
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
I think "hardware cursor" has specific meaning for Tegra(e.g: Tegra30 has a 32x32 24bpp or 64x64 2bpp hardware cursor). So you may change it to "hardware accelerated cursor"?
According to the TRM no Tegra has ARGB hardware cursor support, but only 2-color. So we talked about doing the hardware cursor by using a plane. If the TRM is wrong in this regard and we can get a ARGB cursor on Tegra 3 it would be nice to know.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+};
+static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) +{
- unsigned int i;
- int err = 0;
- for (i = 0; i < 2; i++) {
struct tegra_plane *plane;
plane = devm_kzalloc(drm->dev, sizeof(*plane), GFP_KERNEL);
if (!plane)
return -ENOMEM;
plane->index = i;
I suggest to change this line to: "plane->index = i + 1;". This makes the plane's index be consistent with Tegra's windows number. And also we don't need to worry about passing "plane->index + 1" to some functions which need to know which window is operating on.
Again, if we make WIN_C the root window, we can keep the plane index assignment as is and get rid of the "index + 1" passing.
On 01/15/2013 06:50 PM, Lucas Stach wrote:
Am Dienstag, den 15.01.2013, 17:53 +0800 schrieb Mark Zhang:
On 01/15/2013 12:05 AM, Thierry Reding wrote:
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
I think "hardware cursor" has specific meaning for Tegra(e.g: Tegra30 has a 32x32 24bpp or 64x64 2bpp hardware cursor). So you may change it to "hardware accelerated cursor"?
According to the TRM no Tegra has ARGB hardware cursor support, but only 2-color. So we talked about doing the hardware cursor by using a plane. If the TRM is wrong in this regard and we can get a ARGB cursor on Tegra 3 it would be nice to know.
Lucas, yes, TRM says "Hardware cursor is supported for 32x32 or for 64x64 2-bpp cursor.", but just as you can see, we can set cursor's foreground & background color by register "DC_DISP_CURSOR_FOREGROUND_0 " & "DC_DISP_CURSOR_BACKGROUND_0".
So I asked the expert in nvidia and here is the explanation of the hardware cursor:
"each pixel in the cursor is encoded by 2 bits. only 3 values are used per pixel: transparent, foreground, background.
when pixel is transparent - no pixel is displayed. (also known as a mask) when pixel is foreground - color of pixel is 24-bit value in DC_DISP_CURSOR_FOREGROUND_0. when pixel is background - color of pixel is 24-bit value in DC_DISP_CURSOR_BACKGROUND_0.
So I would still phrase it as a 2-bit cursor. It's a palette with 2 colors plus a 1-bit alpha. The palette entries are 24-bit."
Mark
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+};
+static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) +{
- unsigned int i;
- int err = 0;
- for (i = 0; i < 2; i++) {
struct tegra_plane *plane;
plane = devm_kzalloc(drm->dev, sizeof(*plane), GFP_KERNEL);
if (!plane)
return -ENOMEM;
plane->index = i;
I suggest to change this line to: "plane->index = i + 1;". This makes the plane's index be consistent with Tegra's windows number. And also we don't need to worry about passing "plane->index + 1" to some functions which need to know which window is operating on.
Again, if we make WIN_C the root window, we can keep the plane index assignment as is and get rid of the "index + 1" passing.
On Tue, Jan 15, 2013 at 05:53:03PM +0800, Mark Zhang wrote:
On 01/15/2013 12:05 AM, Thierry Reding wrote:
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
I think "hardware cursor" has specific meaning for Tegra(e.g: Tegra30 has a 32x32 24bpp or 64x64 2bpp hardware cursor). So you may change it to "hardware accelerated cursor"?
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+static const uint32_t plane_formats[] = {
- DRM_FORMAT_XRGB8888,
- DRM_FORMAT_YUV422,
I haven't found something related with YUV format in this patch set. For example, "tegra_dc_format" also doesn't take YUV into consideration. So remove this line.
Also note that YUV422 is a planar format. And since it's not the most common 4:2:2 format, my first guess would be that it's probably not what you wanted. YUYV or UYVY is more likely the one you're after.
On Tue, Jan 15, 2013 at 01:35:32PM +0200, Ville Syrjälä wrote:
On Tue, Jan 15, 2013 at 05:53:03PM +0800, Mark Zhang wrote:
On 01/15/2013 12:05 AM, Thierry Reding wrote:
Add support for the B and C planes which support RGB and YUV pixel formats and can be used as overlays or hardware cursor.
I think "hardware cursor" has specific meaning for Tegra(e.g: Tegra30 has a 32x32 24bpp or 64x64 2bpp hardware cursor). So you may change it to "hardware accelerated cursor"?
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+static const uint32_t plane_formats[] = {
- DRM_FORMAT_XRGB8888,
- DRM_FORMAT_YUV422,
I haven't found something related with YUV format in this patch set. For example, "tegra_dc_format" also doesn't take YUV into consideration. So remove this line.
Also note that YUV422 is a planar format. And since it's not the most common 4:2:2 format, my first guess would be that it's probably not what you wanted. YUYV or UYVY is more likely the one you're after.
Yes, I copied it from the TRM, which has YUV422 listed as non-planar format (it has YUV422, which is the planar variant). It isn't very specific about which variant YUV422 really is, though.
As Mark said, the window setup code can't handle planar formats yet and tegra_dc_format() doesn't convert between DRM and Tegra formats other than 32-bit and 16-bit RGB either, so maybe I should just drop it instead.
Thierry
The sequence for replacing the scanout buffer is much shorter than a full mode change operation so implementing this callback considerably speeds up cases where only a new framebuffer is to be scanned out.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- drivers/gpu/drm/tegra/dc.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 157e962..8dd7d8a 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -116,6 +116,25 @@ static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) return 0; }
+static int tegra_dc_set_base(struct tegra_dc *dc, int x, int y, + struct tegra_framebuffer *fb) +{ + unsigned long value; + + tegra_dc_writel(dc, WINDOW_A_SELECT, DC_CMD_DISPLAY_WINDOW_HEADER); + + value = fb->base.offsets[0] + y * fb->base.pitches[0] + + x * fb->base.bits_per_pixel / 8; + + tegra_dc_writel(dc, fb->obj->paddr + value, DC_WINBUF_START_ADDR); + + value = GENERAL_ACT_REQ | WIN_A_ACT_REQ; + value |= GENERAL_UPDATE | WIN_A_UPDATE; + tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL); + + return 0; +} + static const struct drm_crtc_funcs tegra_crtc_funcs = { .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup, @@ -404,6 +423,15 @@ static int tegra_crtc_mode_set(struct drm_crtc *crtc, return 0; }
+static int tegra_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb) +{ + struct tegra_framebuffer *fb = to_tegra_fb(crtc->fb); + struct tegra_dc *dc = to_tegra_dc(crtc); + + return tegra_dc_set_base(dc, x, y, fb); +} + static void tegra_crtc_prepare(struct drm_crtc *crtc) { struct tegra_dc *dc = to_tegra_dc(crtc); @@ -483,6 +511,7 @@ static const struct drm_crtc_helper_funcs tegra_crtc_helper_funcs = { .dpms = tegra_crtc_dpms, .mode_fixup = tegra_crtc_mode_fixup, .mode_set = tegra_crtc_mode_set, + .mode_set_base = tegra_crtc_mode_set_base, .prepare = tegra_crtc_prepare, .commit = tegra_crtc_commit, .load_lut = tegra_crtc_load_lut,
Am Montag, den 14.01.2013, 17:05 +0100 schrieb Thierry Reding:
The sequence for replacing the scanout buffer is much shorter than a full mode change operation so implementing this callback considerably speeds up cases where only a new framebuffer is to be scanned out.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
Aside from the small comment: Reviewed-by: Lucas Stach dev@lynxeye.de
drivers/gpu/drm/tegra/dc.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 157e962..8dd7d8a 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -116,6 +116,25 @@ static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) return 0; }
+static int tegra_dc_set_base(struct tegra_dc *dc, int x, int y,
struct tegra_framebuffer *fb)
+{
- unsigned long value;
- tegra_dc_writel(dc, WINDOW_A_SELECT, DC_CMD_DISPLAY_WINDOW_HEADER);
- value = fb->base.offsets[0] + y * fb->base.pitches[0] +
x * fb->base.bits_per_pixel / 8;
- tegra_dc_writel(dc, fb->obj->paddr + value, DC_WINBUF_START_ADDR);
- value = GENERAL_ACT_REQ | WIN_A_ACT_REQ;
- value |= GENERAL_UPDATE | WIN_A_UPDATE;
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
Make this two separate writes.
- return 0;
+}
static const struct drm_crtc_funcs tegra_crtc_funcs = { .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup, @@ -404,6 +423,15 @@ static int tegra_crtc_mode_set(struct drm_crtc *crtc, return 0; }
+static int tegra_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
struct drm_framebuffer *old_fb)
+{
- struct tegra_framebuffer *fb = to_tegra_fb(crtc->fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- return tegra_dc_set_base(dc, x, y, fb);
+}
static void tegra_crtc_prepare(struct drm_crtc *crtc) { struct tegra_dc *dc = to_tegra_dc(crtc); @@ -483,6 +511,7 @@ static const struct drm_crtc_helper_funcs tegra_crtc_helper_funcs = { .dpms = tegra_crtc_dpms, .mode_fixup = tegra_crtc_mode_fixup, .mode_set = tegra_crtc_mode_set,
- .mode_set_base = tegra_crtc_mode_set_base, .prepare = tegra_crtc_prepare, .commit = tegra_crtc_commit, .load_lut = tegra_crtc_load_lut,
On 01/15/2013 12:05 AM, Thierry Reding wrote:
The sequence for replacing the scanout buffer is much shorter than a full mode change operation so implementing this callback considerably speeds up cases where only a new framebuffer is to be scanned out.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
drivers/gpu/drm/tegra/dc.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 157e962..8dd7d8a 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -116,6 +116,25 @@ static int tegra_dc_add_planes(struct drm_device *drm, struct tegra_dc *dc) return 0; }
+static int tegra_dc_set_base(struct tegra_dc *dc, int x, int y,
struct tegra_framebuffer *fb)
+{
- unsigned long value;
- tegra_dc_writel(dc, WINDOW_A_SELECT, DC_CMD_DISPLAY_WINDOW_HEADER);
- value = fb->base.offsets[0] + y * fb->base.pitches[0] +
x * fb->base.bits_per_pixel / 8;
- tegra_dc_writel(dc, fb->obj->paddr + value, DC_WINBUF_START_ADDR);
Add one line here:
tegra_dc_writel(dc, fb->base.pitches[0], DC_WIN_LINE_STRIDE);
I mentioned in previous mail that the page-flip doesn't work well while multiple heads attached(in my test case, LVDS & HDMI are enabled). And I found out that this is because we didn't update the line stride according to the new FB.
But ideally, I think we don't need to update the line stride setting. The reason that we need it here is: We set a "incorrect" line stride intentionally when multiple-head is enabled.
I use a Tegra30 cardhu board for testing. The LVDS panel works on 1366x768 while the resolution of the HDMI is 1080p. The size of the FB created during KMS is 1080p. To make the LVDS & HDMI show correct, we set the dc0's(connected with LVDS) line stride to 7680(1920x4). So the final result is, the LVDS panel shows the (0,0)/(1366,768) portion in a 1080p FB.
But the video mode of the LVDS which reported to user space is still 1366x768. So the userspace program creates 2 FBs based on that and ask drm driver to flip them. So we need to update the line stride here.
Actually, I think we need to find a better solution when multi-head is enabled. For example, create a large FB and make two dcs display the different part of it(just like the XRANDR's extension mode). Or maybe we can take a look at other drm drivers for inspiration.
Mark
- value = GENERAL_ACT_REQ | WIN_A_ACT_REQ;
- value |= GENERAL_UPDATE | WIN_A_UPDATE;
- tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
- return 0;
+}
static const struct drm_crtc_funcs tegra_crtc_funcs = { .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup, @@ -404,6 +423,15 @@ static int tegra_crtc_mode_set(struct drm_crtc *crtc, return 0; }
+static int tegra_crtc_mode_set_base(struct drm_crtc *crtc, int x, int y,
struct drm_framebuffer *old_fb)
+{
- struct tegra_framebuffer *fb = to_tegra_fb(crtc->fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- return tegra_dc_set_base(dc, x, y, fb);
+}
static void tegra_crtc_prepare(struct drm_crtc *crtc) { struct tegra_dc *dc = to_tegra_dc(crtc); @@ -483,6 +511,7 @@ static const struct drm_crtc_helper_funcs tegra_crtc_helper_funcs = { .dpms = tegra_crtc_dpms, .mode_fixup = tegra_crtc_mode_fixup, .mode_set = tegra_crtc_mode_set,
- .mode_set_base = tegra_crtc_mode_set_base, .prepare = tegra_crtc_prepare, .commit = tegra_crtc_commit, .load_lut = tegra_crtc_load_lut,
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- drivers/gpu/drm/tegra/dc.c | 55 +++++++++++++++++++++++++++++++-------------- drivers/gpu/drm/tegra/drm.c | 44 ++++++++++++++++++++++++++++++++++++ drivers/gpu/drm/tegra/drm.h | 3 +++ 3 files changed, 85 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 8dd7d8a..3aa7ccc 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -135,6 +135,32 @@ static int tegra_dc_set_base(struct tegra_dc *dc, int x, int y, return 0; }
+void tegra_dc_enable_vblank(struct tegra_dc *dc) +{ + unsigned long value, flags; + + spin_lock_irqsave(&dc->lock, flags); + + value = tegra_dc_readl(dc, DC_CMD_INT_MASK); + value |= VBLANK_INT; + tegra_dc_writel(dc, value, DC_CMD_INT_MASK); + + spin_unlock_irqrestore(&dc->lock, flags); +} + +void tegra_dc_disable_vblank(struct tegra_dc *dc) +{ + unsigned long value, flags; + + spin_lock_irqsave(&dc->lock, flags); + + value = tegra_dc_readl(dc, DC_CMD_INT_MASK); + value &= ~VBLANK_INT; + tegra_dc_writel(dc, value, DC_CMD_INT_MASK); + + spin_unlock_irqrestore(&dc->lock, flags); +} + static const struct drm_crtc_funcs tegra_crtc_funcs = { .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup, @@ -375,6 +401,8 @@ static int tegra_crtc_mode_set(struct drm_crtc *crtc, unsigned long div, value; int err;
+ drm_vblank_pre_modeset(crtc->dev, dc->pipe); + err = tegra_crtc_setup_clk(crtc, mode, &div); if (err) { dev_err(dc->dev, "failed to setup clock for CRTC: %d\n", err); @@ -476,31 +504,23 @@ static void tegra_crtc_prepare(struct drm_crtc *crtc) tegra_dc_writel(dc, value, DC_DISP_DISP_MEM_HIGH_PRIORITY_TIMER);
value = VBLANK_INT | WIN_A_UF_INT | WIN_B_UF_INT | WIN_C_UF_INT; - tegra_dc_writel(dc, value, DC_CMD_INT_MASK); - - value = VBLANK_INT | WIN_A_UF_INT | WIN_B_UF_INT | WIN_C_UF_INT; tegra_dc_writel(dc, value, DC_CMD_INT_ENABLE); + + value = WIN_A_UF_INT | WIN_B_UF_INT | WIN_C_UF_INT; + tegra_dc_writel(dc, value, DC_CMD_INT_MASK); }
static void tegra_crtc_commit(struct drm_crtc *crtc) { struct tegra_dc *dc = to_tegra_dc(crtc); - unsigned long update_mask; unsigned long value;
- update_mask = GENERAL_ACT_REQ | WIN_A_ACT_REQ; - - tegra_dc_writel(dc, update_mask << 8, DC_CMD_STATE_CONTROL); - - value = tegra_dc_readl(dc, DC_CMD_INT_ENABLE); - value |= FRAME_END_INT; - tegra_dc_writel(dc, value, DC_CMD_INT_ENABLE); + value = GENERAL_ACT_REQ | WIN_A_ACT_REQ | + GENERAL_UPDATE | WIN_A_UPDATE;
- value = tegra_dc_readl(dc, DC_CMD_INT_MASK); - value |= FRAME_END_INT; - tegra_dc_writel(dc, value, DC_CMD_INT_MASK); + tegra_dc_writel(dc, value, DC_CMD_STATE_CONTROL);
- tegra_dc_writel(dc, update_mask, DC_CMD_STATE_CONTROL); + drm_vblank_post_modeset(crtc->dev, dc->pipe); }
static void tegra_crtc_load_lut(struct drm_crtc *crtc) @@ -517,7 +537,7 @@ static const struct drm_crtc_helper_funcs tegra_crtc_helper_funcs = { .load_lut = tegra_crtc_load_lut, };
-static irqreturn_t tegra_drm_irq(int irq, void *data) +static irqreturn_t tegra_dc_irq(int irq, void *data) { struct tegra_dc *dc = data; unsigned long status; @@ -862,7 +882,7 @@ static int tegra_dc_drm_init(struct host1x_client *client, dev_err(dc->dev, "debugfs setup failed: %d\n", err); }
- err = devm_request_irq(dc->dev, dc->irq, tegra_drm_irq, 0, + err = devm_request_irq(dc->dev, dc->irq, tegra_dc_irq, 0, dev_name(dc->dev), dc); if (err < 0) { dev_err(dc->dev, "failed to request IRQ#%u: %d\n", dc->irq, @@ -911,6 +931,7 @@ static int tegra_dc_probe(struct platform_device *pdev) if (!dc) return -ENOMEM;
+ spin_lock_init(&dc->lock); INIT_LIST_HEAD(&dc->list); dc->dev = &pdev->dev;
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 3a503c9..62f8121 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -40,6 +40,10 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) if (err < 0) return err;
+ err = drm_vblank_init(drm, drm->mode_config.num_crtc); + if (err < 0) + return err; + err = tegra_drm_fb_init(drm); if (err < 0) return err; @@ -89,6 +93,42 @@ static const struct file_operations tegra_drm_fops = { .llseek = noop_llseek, };
+static struct drm_crtc *tegra_crtc_from_pipe(struct drm_device *drm, int pipe) +{ + struct drm_crtc *crtc; + + list_for_each_entry(crtc, &drm->mode_config.crtc_list, head) { + struct tegra_dc *dc = to_tegra_dc(crtc); + + if (dc->pipe == pipe) + return crtc; + } + + return NULL; +} + +static int tegra_drm_enable_vblank(struct drm_device *drm, int pipe) +{ + struct drm_crtc *crtc = tegra_crtc_from_pipe(drm, pipe); + struct tegra_dc *dc = to_tegra_dc(crtc); + + if (!crtc) + return -ENODEV; + + tegra_dc_enable_vblank(dc); + + return 0; +} + +static void tegra_drm_disable_vblank(struct drm_device *drm, int pipe) +{ + struct drm_crtc *crtc = tegra_crtc_from_pipe(drm, pipe); + struct tegra_dc *dc = to_tegra_dc(crtc); + + if (crtc) + tegra_dc_disable_vblank(dc); +} + struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
+ .get_vblank_counter = drm_vblank_count, + .enable_vblank = tegra_drm_enable_vblank, + .disable_vblank = tegra_drm_disable_vblank, + .gem_free_object = drm_gem_cma_free_object, .gem_vm_ops = &drm_gem_cma_vm_ops, .dumb_create = drm_gem_cma_dumb_create, diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h index 835f9a3..ca742b2 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/drm/tegra/drm.h @@ -80,6 +80,7 @@ struct tegra_output;
struct tegra_dc { struct host1x_client client; + spinlock_t lock;
struct host1x *host1x; struct device *dev; @@ -146,6 +147,8 @@ struct tegra_dc_window { extern unsigned int tegra_dc_format(uint32_t format); extern int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index, const struct tegra_dc_window *window); +extern void tegra_dc_enable_vblank(struct tegra_dc *dc); +extern void tegra_dc_disable_vblank(struct tegra_dc *dc);
struct tegra_output_ops { int (*enable)(struct tegra_output *output);
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
thanks, -mario
- .enable_vblank = tegra_drm_enable_vblank,
- .disable_vblank = tegra_drm_disable_vblank,
- .gem_free_object = drm_gem_cma_free_object, .gem_vm_ops = &drm_gem_cma_vm_ops, .dumb_create = drm_gem_cma_dumb_create,
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
Am Dienstag, den 22.01.2013, 18:37 +0100 schrieb Mario Kleiner:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've looked this up in the TRM a while ago as I know we have the same problem in nouveau, but it seems there is no HW vblank counter on Tegra.
Mario, you know a fair bit more about this than I do, what is the preferred way of handling this if we are sure we are not able to implement anything meaningful here? Just return 0?
Regards, Lucas
On Tue, Jan 22, 2013 at 10:39 AM, Lucas Stach dev@lynxeye.de wrote:
Am Dienstag, den 22.01.2013, 18:37 +0100 schrieb Mario Kleiner:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've looked this up in the TRM a while ago as I know we have the same problem in nouveau, but it seems there is no HW vblank counter on Tegra.
Mario, you know a fair bit more about this than I do, what is the preferred way of handling this if we are sure we are not able to implement anything meaningful here? Just return 0?
Regards, Lucas
In my branch for the old non-DRM version of the tegra driver, I clock gate and power gate display when using a one-shot smart panel. So not only are there no more IRQs, but even if Tegra had a hardware vblank counter it would also be dead too. (it doesn't have one, but I could make the case to add one in the next chip if we could actually make use of it, given my previous statement, I don't think it will help)
How badly do people need this feature? Because I really do think smart panels are going to been the norm in a few years. A bit off-topic to Thierry's submission, but I'd really like to discourage apps from relying on this feature, does anyone else agree?
- Jon
On 22.01.13 19:49, Jon Mayo wrote:
On Tue, Jan 22, 2013 at 10:39 AM, Lucas Stach dev@lynxeye.de wrote:
Am Dienstag, den 22.01.2013, 18:37 +0100 schrieb Mario Kleiner:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've looked this up in the TRM a while ago as I know we have the same problem in nouveau, but it seems there is no HW vblank counter on Tegra.
Mario, you know a fair bit more about this than I do, what is the preferred way of handling this if we are sure we are not able to implement anything meaningful here? Just return 0?
Regards, Lucas
In my branch for the old non-DRM version of the tegra driver, I clock gate and power gate display when using a one-shot smart panel. So not only are there no more IRQs, but even if Tegra had a hardware vblank counter it would also be dead too. (it doesn't have one, but I could make the case to add one in the next chip if we could actually make use of it, given my previous statement, I don't think it will help)
How badly do people need this feature? Because I really do think smart panels are going to been the norm in a few years.
How do smart panels work? Any reference to learn about these?
A bit off-topic to
Thierry's submission, but I'd really like to discourage apps from relying on this feature, does anyone else agree?
The current swap scheduling is based on having an accurate software vblank counter. Atm. that counter is maintained in software, incremented during vblank irq. At irq off -> on transition we need a hw counter to reinitialize. And there is a timeout between dropping the last reference to a counter via drm_vblank_put() and actually disabling the vblank irq. If the timeout is long enough and a timing sensitive app is aware that vblank counters may be inaccurate/unreliable over long time periods, it can work around the problem.
My app, e.g., which is a very timing sensitive scientific application toolkit, only assumes that vblank counts are consistent over a short period of time. It queries the vblank count and time as a baseline, calculates a target vblank count for swap from it and then schedules the swap for that target count. If vblank irq's stay active at least between the query call and scheduling a swap, all is good, so a timeout to irq disable of a couple of video refresh cycles would be safe, even if a driver doesn't have a working hw counter.
I think at least some media-players and some of the open source vdpau or vaapi implementations and maybe some compositors may rely on this as well for audio-video sync etc.
If the vblank disable mechanism is too aggressive in its power saving (= too short timeout) or the app is very trusting of the specs being robustly implemented it will go wrong.
It's a tradeoff between power-saving and robustness of the implementation.
The other way around this would be to have some new kernel api that executes swaps based on a target system time instead of a target vblank count. I assume that, e.g., vdpau's presentation api uses time-based scheduling for that reason, to avoid dependency on vblank counters.
-mario
On 22.01.2013 21:59, Mario Kleiner wrote:
The current swap scheduling is based on having an accurate software vblank counter. Atm. that counter is maintained in software, incremented during vblank irq. At irq off -> on transition we need a hw counter to reinitialize. And there is a timeout between dropping the last reference to a counter via drm_vblank_put() and actually disabling the vblank irq. If the timeout is long enough and a timing sensitive app is aware that vblank counters may be inaccurate/unreliable over long time periods, it can work around the problem.
We have a hardware counter that can be used as VBLANK counter - host1x sync point register numbers 26 and 27. tegradrm already sets them up in dc init (DC_CMD_CONT_SYNCPT_VSYNC). Basic syncpt support (read, wait) is part of my patch set to implement 2D acceleration.
Terje
On Wed, Jan 23, 2013 at 10:02:20AM +0200, Terje Bergström wrote:
On 22.01.2013 21:59, Mario Kleiner wrote:
The current swap scheduling is based on having an accurate software vblank counter. Atm. that counter is maintained in software, incremented during vblank irq. At irq off -> on transition we need a hw counter to reinitialize. And there is a timeout between dropping the last reference to a counter via drm_vblank_put() and actually disabling the vblank irq. If the timeout is long enough and a timing sensitive app is aware that vblank counters may be inaccurate/unreliable over long time periods, it can work around the problem.
We have a hardware counter that can be used as VBLANK counter - host1x sync point register numbers 26 and 27. tegradrm already sets them up in dc init (DC_CMD_CONT_SYNCPT_VSYNC). Basic syncpt support (read, wait) is part of my patch set to implement 2D acceleration.
Are the syncpoints incremented even if the VBLANK interrupts are turned off? If that's the case they could indeed be used as a hardware counter rather than the fake approach used right now.
Maybe we should leave the code as it is right now and fix that up once the syncpoint support has been merged.
Thierry
On 11.02.2013 01:08, Thierry Reding wrote:
Are the syncpoints incremented even if the VBLANK interrupts are turned off? If that's the case they could indeed be used as a hardware counter rather than the fake approach used right now.
Maybe we should leave the code as it is right now and fix that up once the syncpoint support has been merged.
Yes, the sync point increment signal to host1x is independent of VBLANK interrupt.
Definitely not needed yet, so that was a comment for future improvement.
Terje
On 22.01.13 19:39, Lucas Stach wrote:
Am Dienstag, den 22.01.2013, 18:37 +0100 schrieb Mario Kleiner:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've looked this up in the TRM a while ago as I know we have the same problem in nouveau, but it seems there is no HW vblank counter on Tegra.
Mario, you know a fair bit more about this than I do, what is the preferred way of handling this if we are sure we are not able to implement anything meaningful here? Just return 0?
I think 0 would be better, just to make it clear to readers of the source code that this function is basically unimplemented. For correctness it doesn't matter what you return, drm_vblank_count() is ok as well.
If we wanted, we could probably implement a good enough emulated hw-vblank counter, at least on nouveau? If there is some on-chip clock register that is driven by the same hardware oscillator as the crtc's so it doesn't drift, we could store the clock time in the vblank_disable() hook, and some measured duration of a video refresh, wrt. that clock. Then in .get_vblank_counter() calculate how many vblanks have elapsed from (current_clock_time - vblank_off_clock_time) / frame_duration and increment our own private software vblank counter. Something along that line...
-mario
On Tue, Jan 22, 2013 at 11:20 AM, Mario Kleiner mario.kleiner.de@gmail.com wrote:
On 22.01.13 19:39, Lucas Stach wrote:
Am Dienstag, den 22.01.2013, 18:37 +0100 schrieb Mario Kleiner:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
.get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've looked this up in the TRM a while ago as I know we have the same problem in nouveau, but it seems there is no HW vblank counter on Tegra.
Mario, you know a fair bit more about this than I do, what is the preferred way of handling this if we are sure we are not able to implement anything meaningful here? Just return 0?
I think 0 would be better, just to make it clear to readers of the source code that this function is basically unimplemented. For correctness it doesn't matter what you return, drm_vblank_count() is ok as well.
If we wanted, we could probably implement a good enough emulated hw-vblank counter, at least on nouveau? If there is some on-chip clock register that is driven by the same hardware oscillator as the crtc's so it doesn't drift, we could store the clock time in the vblank_disable() hook, and some measured duration of a video refresh, wrt. that clock. Then in .get_vblank_counter() calculate how many vblanks have elapsed from (current_clock_time - vblank_off_clock_time) / frame_duration and increment our own private software vblank counter. Something along that line...
-mario
Looking through the T30 manuals, I see all the CLK_RST_CONTROLLER_CLK_SOURCE_xxx entries that support PLLD as a parent are display related. You won't find any timers or counters that use the same exact clock. Being out-of-sync is a real possibility, and something adaptive would have to be implement to hide the desync between a fake and a real vblank once you make the transition.
That said, I think being inaccurate here doesn't have a very high cost. Please give me some examples if you disagree though, I'd be interested in some good use cases to throw into my test plan.
- Jon
On 22.01.13 20:27, Jon Mayo wrote:
On Tue, Jan 22, 2013 at 11:20 AM, Mario Kleiner mario.kleiner.de@gmail.com wrote:
On 22.01.13 19:39, Lucas Stach wrote:
Am Dienstag, den 22.01.2013, 18:37 +0100 schrieb Mario Kleiner:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET |
DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
.get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've looked this up in the TRM a while ago as I know we have the same problem in nouveau, but it seems there is no HW vblank counter on Tegra.
Mario, you know a fair bit more about this than I do, what is the preferred way of handling this if we are sure we are not able to implement anything meaningful here? Just return 0?
I think 0 would be better, just to make it clear to readers of the source code that this function is basically unimplemented. For correctness it doesn't matter what you return, drm_vblank_count() is ok as well.
If we wanted, we could probably implement a good enough emulated hw-vblank counter, at least on nouveau? If there is some on-chip clock register that is driven by the same hardware oscillator as the crtc's so it doesn't drift, we could store the clock time in the vblank_disable() hook, and some measured duration of a video refresh, wrt. that clock. Then in .get_vblank_counter() calculate how many vblanks have elapsed from (current_clock_time - vblank_off_clock_time) / frame_duration and increment our own private software vblank counter. Something along that line...
-mario
Looking through the T30 manuals, I see all the CLK_RST_CONTROLLER_CLK_SOURCE_xxx entries that support PLLD as a parent are display related. You won't find any timers or counters that use the same exact clock. Being out-of-sync is a real possibility, and something adaptive would have to be implement to hide the desync between a fake and a real vblank once you make the transition.
That said, I think being inaccurate here doesn't have a very high cost. Please give me some examples if you disagree though, I'd be interested in some good use cases to throw into my test plan.
- Jon
Agreed. Fwiw at least i can't think of applications which would need stability over more than maybe a couple of seconds or a minute.
-mario
On Tue, Jan 22, 2013 at 06:37:39PM +0100, Mario Kleiner wrote:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've finally managed to find some time to look into this some more. The comment atop the drm_driver.get_vblank_counter() field actually suggests that drivers should set this to drm_vblank_count if no real hardware counter is supported.
Now it seems like we may get functionality to obtain the real VBLANK counter once the syncpoint support is merged, so maybe we can leave this as-is until then and replace it with a proper implementation at that point in time?
Alternatively I could use a small wrapper with an explicit comment that this should be implemented using the upcoming syncpoint support.
Thierry
On 02/11/2013 10:13 AM, Thierry Reding wrote:
On Tue, Jan 22, 2013 at 06:37:39PM +0100, Mario Kleiner wrote:
On 14.01.13 17:05, Thierry Reding wrote:
Implement support for the VBLANK IOCTL. Note that Tegra is somewhat special in this case because it doesn't use the generic IRQ support provided by the DRM core (DRIVER_HAVE_IRQ) but rather registers one interrupt handler for each display controller.
While at it, clean up the way that interrupts are enabled to ensure that the VBLANK interrupt only gets enabled when required.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
... snip ...
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, @@ -96,6 +136,10 @@ struct drm_driver tegra_drm_driver = { .open = tegra_drm_open, .lastclose = tegra_drm_lastclose,
- .get_vblank_counter = drm_vblank_count,
-> .get_vblank_counter = drm_vblank_count is a no-op.
.get_vblank_counter() is supposed to return some real hardware vblank counter value to reinitialize the software vblank counter at vbl irq enable time. That software counter gets queried via drm_vblank_count(). If you hook this up to drm_vblank_count() it essentially returns a constant, frozen vblank count, it has the same effect as returning zero or any other constant value -- You lose all vblank counter increments during vblank irq off time. The same problem is present in nouveau-kms.
I think it would be better to either implement a real hw counter query, or some function with a /* TODO: Implement me properly */ comment which returns zero, so it is clear something is missing here.
I've finally managed to find some time to look into this some more. The comment atop the drm_driver.get_vblank_counter() field actually suggests that drivers should set this to drm_vblank_count if no real hardware counter is supported.
It certainly works fine that way. I just think it hides that some implementation is missing there, whereas an extra no-op function makes that clear to the reader.
Now it seems like we may get functionality to obtain the real VBLANK counter once the syncpoint support is merged, so maybe we can leave this as-is until then and replace it with a proper implementation at that point in time?
Perfectly fine with me.
ciao, -mario
Alternatively I could use a small wrapper with an explicit comment that this should be implemented using the upcoming syncpoint support.
Thierry
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
All the necessary support bits like .mode_set_base() and VBLANK are now available, so page-flipping case easily be implemented on top.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de --- drivers/gpu/drm/tegra/dc.c | 72 +++++++++++++++++++++++++++++++++++++++++++++ drivers/gpu/drm/tegra/drm.c | 9 ++++++ drivers/gpu/drm/tegra/drm.h | 5 ++++ 3 files changed, 86 insertions(+)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 3aa7ccc..ce4e14a 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -161,7 +161,78 @@ void tegra_dc_disable_vblank(struct tegra_dc *dc) spin_unlock_irqrestore(&dc->lock, flags); }
+static void tegra_dc_finish_page_flip(struct tegra_dc *dc) +{ + struct drm_pending_vblank_event *event; + struct drm_device *drm = dc->base.dev; + unsigned long flags; + struct timeval now; + + spin_lock_irqsave(&drm->event_lock, flags); + event = dc->event; + dc->event = NULL; + spin_unlock_irqrestore(&drm->event_lock, flags); + + if (!event) + return; + + event->event.sequence = drm_vblank_count_and_time(drm, dc->pipe, &now); + event->event.tv_sec = now.tv_sec; + event->event.tv_usec = now.tv_usec; + + spin_lock_irqsave(&drm->event_lock, flags); + list_add_tail(&event->base.link, &event->base.file_priv->event_list); + wake_up_interruptible(&event->base.file_priv->event_wait); + spin_unlock_irqrestore(&drm->event_lock, flags); + + drm_vblank_put(drm, dc->pipe); +} + +void tegra_dc_cancel_page_flip(struct drm_crtc *crtc, struct drm_file *file) +{ + struct tegra_dc *dc = to_tegra_dc(crtc); + struct drm_device *drm = crtc->dev; + unsigned long flags; + + spin_lock_irqsave(&drm->event_lock, flags); + + if (dc->event && dc->event->base.file_priv == file) { + dc->event->base.destroy(&dc->event->base); + drm_vblank_put(drm, dc->pipe); + dc->event = NULL; + } + + spin_unlock_irqrestore(&drm->event_lock, flags); +} + +static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb, + struct drm_pending_vblank_event *event) +{ + struct tegra_framebuffer *newfb = to_tegra_fb(fb); + struct tegra_dc *dc = to_tegra_dc(crtc); + struct drm_device *drm = crtc->dev; + unsigned long flags; + + if (dc->event) + return -EBUSY; + + tegra_dc_set_base(dc, 0, 0, newfb); + + if (event) { + event->pipe = dc->pipe; + + spin_lock_irqsave(&drm->event_lock, flags); + dc->event = event; + spin_unlock_irqrestore(&drm->event_lock, flags); + + drm_vblank_get(drm, dc->pipe); + } + + return 0; +} + static const struct drm_crtc_funcs tegra_crtc_funcs = { + .page_flip = tegra_dc_page_flip, .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup, }; @@ -556,6 +627,7 @@ static irqreturn_t tegra_dc_irq(int irq, void *data) dev_dbg(dc->dev, "%s(): vertical blank\n", __func__); */ drm_handle_vblank(dc->base.dev, dc->pipe); + tegra_dc_finish_page_flip(dc); }
if (status & (WIN_A_UF_INT | WIN_B_UF_INT | WIN_C_UF_INT)) { diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 62f8121..7bab784 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -129,11 +129,20 @@ static void tegra_drm_disable_vblank(struct drm_device *drm, int pipe) tegra_dc_disable_vblank(dc); }
+static void tegra_drm_preclose(struct drm_device *drm, struct drm_file *file) +{ + struct drm_crtc *crtc; + + list_for_each_entry(crtc, &drm->mode_config.crtc_list, head) + tegra_dc_cancel_page_flip(crtc, file); +} + struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, .unload = tegra_drm_unload, .open = tegra_drm_open, + .preclose = tegra_drm_preclose, .lastclose = tegra_drm_lastclose,
.get_vblank_counter = drm_vblank_count, diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h index ca742b2..1f1cb37 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/drm/tegra/drm.h @@ -100,6 +100,9 @@ struct tegra_dc { struct drm_info_list *debugfs_files; struct drm_minor *minor; struct dentry *debugfs; + + /* page-flip handling */ + struct drm_pending_vblank_event *event; };
static inline struct tegra_dc *host1x_client_to_dc(struct host1x_client *client) @@ -149,6 +152,8 @@ extern int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index, const struct tegra_dc_window *window); extern void tegra_dc_enable_vblank(struct tegra_dc *dc); extern void tegra_dc_disable_vblank(struct tegra_dc *dc); +extern void tegra_dc_cancel_page_flip(struct drm_crtc *crtc, + struct drm_file *file);
struct tegra_output_ops { int (*enable)(struct tegra_output *output);
I'm testing the pageflip & vblank change on cardhu. Seems the HDMI doesn't work(LVDS only is OK). I'll let you know if I get something.
Mark On 01/15/2013 12:06 AM, Thierry Reding wrote:
All the necessary support bits like .mode_set_base() and VBLANK are now available, so page-flipping case easily be implemented on top.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
drivers/gpu/drm/tegra/dc.c | 72 +++++++++++++++++++++++++++++++++++++++++++++ drivers/gpu/drm/tegra/drm.c | 9 ++++++ drivers/gpu/drm/tegra/drm.h | 5 ++++ 3 files changed, 86 insertions(+)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 3aa7ccc..ce4e14a 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -161,7 +161,78 @@ void tegra_dc_disable_vblank(struct tegra_dc *dc) spin_unlock_irqrestore(&dc->lock, flags); }
+static void tegra_dc_finish_page_flip(struct tegra_dc *dc) +{
- struct drm_pending_vblank_event *event;
- struct drm_device *drm = dc->base.dev;
- unsigned long flags;
- struct timeval now;
- spin_lock_irqsave(&drm->event_lock, flags);
- event = dc->event;
- dc->event = NULL;
- spin_unlock_irqrestore(&drm->event_lock, flags);
- if (!event)
return;
- event->event.sequence = drm_vblank_count_and_time(drm, dc->pipe, &now);
- event->event.tv_sec = now.tv_sec;
- event->event.tv_usec = now.tv_usec;
- spin_lock_irqsave(&drm->event_lock, flags);
- list_add_tail(&event->base.link, &event->base.file_priv->event_list);
- wake_up_interruptible(&event->base.file_priv->event_wait);
- spin_unlock_irqrestore(&drm->event_lock, flags);
- drm_vblank_put(drm, dc->pipe);
+}
+void tegra_dc_cancel_page_flip(struct drm_crtc *crtc, struct drm_file *file) +{
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- spin_lock_irqsave(&drm->event_lock, flags);
- if (dc->event && dc->event->base.file_priv == file) {
dc->event->base.destroy(&dc->event->base);
drm_vblank_put(drm, dc->pipe);
dc->event = NULL;
- }
- spin_unlock_irqrestore(&drm->event_lock, flags);
+}
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
- tegra_dc_set_base(dc, 0, 0, newfb);
- if (event) {
event->pipe = dc->pipe;
spin_lock_irqsave(&drm->event_lock, flags);
dc->event = event;
spin_unlock_irqrestore(&drm->event_lock, flags);
drm_vblank_get(drm, dc->pipe);
- }
- return 0;
+}
static const struct drm_crtc_funcs tegra_crtc_funcs = {
- .page_flip = tegra_dc_page_flip, .set_config = drm_crtc_helper_set_config, .destroy = drm_crtc_cleanup,
}; @@ -556,6 +627,7 @@ static irqreturn_t tegra_dc_irq(int irq, void *data) dev_dbg(dc->dev, "%s(): vertical blank\n", __func__); */ drm_handle_vblank(dc->base.dev, dc->pipe);
tegra_dc_finish_page_flip(dc);
}
if (status & (WIN_A_UF_INT | WIN_B_UF_INT | WIN_C_UF_INT)) {
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 62f8121..7bab784 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -129,11 +129,20 @@ static void tegra_drm_disable_vblank(struct drm_device *drm, int pipe) tegra_dc_disable_vblank(dc); }
+static void tegra_drm_preclose(struct drm_device *drm, struct drm_file *file) +{
- struct drm_crtc *crtc;
- list_for_each_entry(crtc, &drm->mode_config.crtc_list, head)
tegra_dc_cancel_page_flip(crtc, file);
+}
struct drm_driver tegra_drm_driver = { .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM, .load = tegra_drm_load, .unload = tegra_drm_unload, .open = tegra_drm_open,
.preclose = tegra_drm_preclose, .lastclose = tegra_drm_lastclose,
.get_vblank_counter = drm_vblank_count,
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h index ca742b2..1f1cb37 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/drm/tegra/drm.h @@ -100,6 +100,9 @@ struct tegra_dc { struct drm_info_list *debugfs_files; struct drm_minor *minor; struct dentry *debugfs;
- /* page-flip handling */
- struct drm_pending_vblank_event *event;
};
static inline struct tegra_dc *host1x_client_to_dc(struct host1x_client *client) @@ -149,6 +152,8 @@ extern int tegra_dc_setup_window(struct tegra_dc *dc, unsigned int index, const struct tegra_dc_window *window); extern void tegra_dc_enable_vblank(struct tegra_dc *dc); extern void tegra_dc_disable_vblank(struct tegra_dc *dc); +extern void tegra_dc_cancel_page_flip(struct drm_crtc *crtc,
struct drm_file *file);
struct tegra_output_ops { int (*enable)(struct tegra_output *output);
Am Mittwoch, den 16.01.2013, 19:10 +0800 schrieb Mark Zhang:
I'm testing the pageflip & vblank change on cardhu. Seems the HDMI doesn't work(LVDS only is OK). I'll let you know if I get something.
Just to provide another data point: I'm running this series and don't see any failures with DVI output. Though I'm only run a single output, not some dual-head config.
Even vbltest from libdrm/tests works and shows correct refresh rate.
Regards, Lucas
On 01/16/2013 07:53 PM, Lucas Stach wrote:
Am Mittwoch, den 16.01.2013, 19:10 +0800 schrieb Mark Zhang:
I'm testing the pageflip & vblank change on cardhu. Seems the HDMI doesn't work(LVDS only is OK). I'll let you know if I get something.
Just to provide another data point: I'm running this series and don't see any failures with DVI output. Though I'm only run a single output, not some dual-head config.
Yes. HDMI only is OK. I met some problems when the LVDS & HDMI are enabled at the same time.
Mark
Even vbltest from libdrm/tests works and shows correct refresh rate.
Regards, Lucas
On 01/15/2013 12:06 AM, Thierry Reding wrote:
All the necessary support bits like .mode_set_base() and VBLANK are now available, so page-flipping case easily be implemented on top.
Signed-off-by: Thierry Reding thierry.reding@avionic-design.de
[...]
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
- tegra_dc_set_base(dc, 0, 0, newfb);
"tegra_dc_set_base" only updates the frame buffer start address to dc registers. I think this is not enough because it's possible that this new FB may trigger a full modeset while not just a fb change. For example, the "bpp" of the new FB differs from current setting in dc register.
So I suggest to trigger a full modeset here(although it's slower than fb change) or maybe you can explain why the FB change is enough.
Mark
- if (event) {
event->pipe = dc->pipe;
spin_lock_irqsave(&drm->event_lock, flags);
dc->event = event;
spin_unlock_irqrestore(&drm->event_lock, flags);
drm_vblank_get(drm, dc->pipe);
- }
- return 0;
+}
[...]
struct tegra_output_ops { int (*enable)(struct tegra_output *output);
On 14.01.2013 18:06, Thierry Reding wrote:
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
- tegra_dc_set_base(dc, 0, 0, newfb);
- if (event) {
event->pipe = dc->pipe;
spin_lock_irqsave(&drm->event_lock, flags);
dc->event = event;
spin_unlock_irqrestore(&drm->event_lock, flags);
drm_vblank_get(drm, dc->pipe);
- }
- return 0;
+}
The patch seems fine to me. I have a question for a follow-up.
In downstream dc driver we initiate a page flip, and assign a fence (syncpt id, value) to it. We return the fence to user space. Then when the actual page flip is done, dc increments the sync point.
User space can take the fence and use it for synchronizing graphics operations. In downstream, we use that fence to be able to submit operations to graphics units and synchronize them to the flip by adding a host wait. It improves performance, because we can prepare and send the graphics operations to hardware while flip is still happening.
Is this something we could do in tegra-drm, too? DRM's page flip IOCTL doesn't seem to have a way to pass a fence back from fence, so we'd need to find a way to pass the fence back to user space.
Of course, this has the prerequisite of having support for syncpts, which I hoped we would get to 3.9. The review for host1x patches seem to have stalled, so that's iffy.
Terje
On Tue, Jan 22, 2013 at 10:31:11AM +0200, Terje Bergström wrote:
On 14.01.2013 18:06, Thierry Reding wrote:
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
- tegra_dc_set_base(dc, 0, 0, newfb);
- if (event) {
event->pipe = dc->pipe;
spin_lock_irqsave(&drm->event_lock, flags);
dc->event = event;
spin_unlock_irqrestore(&drm->event_lock, flags);
drm_vblank_get(drm, dc->pipe);
- }
- return 0;
+}
The patch seems fine to me. I have a question for a follow-up.
In downstream dc driver we initiate a page flip, and assign a fence (syncpt id, value) to it. We return the fence to user space. Then when the actual page flip is done, dc increments the sync point.
User space can take the fence and use it for synchronizing graphics operations. In downstream, we use that fence to be able to submit operations to graphics units and synchronize them to the flip by adding a host wait. It improves performance, because we can prepare and send the graphics operations to hardware while flip is still happening.
Is this something we could do in tegra-drm, too? DRM's page flip IOCTL doesn't seem to have a way to pass a fence back from fence, so we'd need to find a way to pass the fence back to user space.
Of course, this has the prerequisite of having support for syncpts, which I hoped we would get to 3.9. The review for host1x patches seem to have stalled, so that's iffy.
Yes, I haven't found as much time as I would have liked to look at the latest patches. Perhaps there will be a time slot at the end of the week. I thought there was also the remaining issue with the memory allocator that Lucas (Cc'ed) was working on?
Thierry
Am Dienstag, den 22.01.2013, 09:57 +0100 schrieb Thierry Reding:
On Tue, Jan 22, 2013 at 10:31:11AM +0200, Terje Bergström wrote:
Of course, this has the prerequisite of having support for syncpts, which I hoped we would get to 3.9. The review for host1x patches seem to have stalled, so that's iffy.
Yes, I haven't found as much time as I would have liked to look at the latest patches. Perhaps there will be a time slot at the end of the week. I thought there was also the remaining issue with the memory allocator that Lucas (Cc'ed) was working on?
Yes, I haven't finished this work yet. I got side tracked by upstreaming the platform patches for the Colibri and some real life activities. I'll get back to this ASAP.
But even if I get this out real soon, I'm not really comfortable with speeding things to 3.9. I would like to review the userspace side of thing a lot more thoroughly, before committing to the interface. But maybe this can also happen in the 3.9 RC timeframe.
Regards, Lucas
On Tue, Jan 22, 2013 at 10:15:21AM +0100, Lucas Stach wrote:
Am Dienstag, den 22.01.2013, 09:57 +0100 schrieb Thierry Reding:
On Tue, Jan 22, 2013 at 10:31:11AM +0200, Terje Bergström wrote:
Of course, this has the prerequisite of having support for syncpts, which I hoped we would get to 3.9. The review for host1x patches seem to have stalled, so that's iffy.
Yes, I haven't found as much time as I would have liked to look at the latest patches. Perhaps there will be a time slot at the end of the week. I thought there was also the remaining issue with the memory allocator that Lucas (Cc'ed) was working on?
Yes, I haven't finished this work yet. I got side tracked by upstreaming the platform patches for the Colibri and some real life activities. I'll get back to this ASAP.
But even if I get this out real soon, I'm not really comfortable with speeding things to 3.9. I would like to review the userspace side of thing a lot more thoroughly, before committing to the interface. But maybe this can also happen in the 3.9 RC timeframe.
Maybe it'd make sense to finish up the various userspace parts first, so that we can test an accelerated DDX on top of the kernel patches before merging the host1x and gr2d patches.
I'm not quite sure if I remember correctly, but I think David mentioned something along the lines of requiring a working userspace that can be used to test the DRM interfaces as a prerequisite to getting this kind of code merged.
Thierry
On 22.01.2013 11:31, Thierry Reding wrote:
I'm not quite sure if I remember correctly, but I think David mentioned something along the lines of requiring a working userspace that can be used to test the DRM interfaces as a prerequisite to getting this kind of code merged.
That's why we have provided user space, test suite that tests all interfaces we are exporting, and a demo application that runs. If the prerequisite is a working DDX, that's obviously not enough.
Terje
Am Dienstag, den 22.01.2013, 11:44 +0200 schrieb Terje Bergström:
On 22.01.2013 11:31, Thierry Reding wrote:
I'm not quite sure if I remember correctly, but I think David mentioned something along the lines of requiring a working userspace that can be used to test the DRM interfaces as a prerequisite to getting this kind of code merged.
That's why we have provided user space, test suite that tests all interfaces we are exporting, and a demo application that runs. If the prerequisite is a working DDX, that's obviously not enough.
I think the test suite is enough to fulfil the formal requirement of having a working userspace. But still it would be nice to have at least some simple accel functions working in the DDX. Maybe just to see on how well the current design integrates into the DDX code.
So I wouldn't make a working DDX a hard requirement for merging G2D code, but I suspect that if the kernel code goes into 3.10 instead of 3.9, we'll just naturally get to the point of working DDX accel by the same time.
Regards, Lucas
On 22.01.2013 11:48, Lucas Stach wrote:
I think the test suite is enough to fulfil the formal requirement of having a working userspace. But still it would be nice to have at least some simple accel functions working in the DDX. Maybe just to see on how well the current design integrates into the DDX code.
So I wouldn't make a working DDX a hard requirement for merging G2D code, but I suspect that if the kernel code goes into 3.10 instead of 3.9, we'll just naturally get to the point of working DDX accel by the same time.
For me, the most important thing would be to nail down a baseline structure. That way new feature development would be unblocked (IOMMU, other host1x clients, context switching, wait bases come to mind) and we could start to synchronize downstream with upstream.
Biggest benefit of getting merged is that it's a pretty strong indication of a solid baseline. :-)
Terje
On 22.01.2013 11:15, Lucas Stach wrote:
But even if I get this out real soon, I'm not really comfortable with speeding things to 3.9. I would like to review the userspace side of thing a lot more thoroughly, before committing to the interface. But maybe this can also happen in the 3.9 RC timeframe.
Ok. I uploaded the libdrm patches to gitorious (git@gitorious.org:linux-host1x/libdrm-host1x.git). We sent out the user space patches more than a month ago, so I expect them to have fallen out of everybody's radar long ago.
If there's a need for us to resend the patches, let me know.
Terje
On 22.01.13 09:31, Terje Bergström wrote:
On 14.01.2013 18:06, Thierry Reding wrote:
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
Hi
I haven't read the Tegra programming manual or played with this, so maybe i'm wrong for some reason, but i think there is a race here between actual pageflip completion - latching newfb into the scanout base register and the completion routine that gets called from the vblank irq handler.
If this code gets called close to vblank or inside vblank, couldn't it happen that tegra_dc_set_base() programs a pageflip that gets executed immediately - the new fb gets latched into the crtc, but the corresponding vblank irq handler for the vblank of flip completion runs before the "if (event)" block can set dc->event = event;? Or vblank irq's are off because drm_vblank_get() is only called at the end of the routine? In both cases the completion routine would miss the correct vblank and only timestamp and emit the completion event 1 vblank after actual flip completion.
In that case it would be better to place tegra_dc_set_base() after the "if (event)" block and have some check inside the flip completion routine to make sure the pageflip completion event is only emitted if the scanout is really already latched with the newfb.
thanks, -mario
- tegra_dc_set_base(dc, 0, 0, newfb);
- if (event) {
event->pipe = dc->pipe;
spin_lock_irqsave(&drm->event_lock, flags);
dc->event = event;
spin_unlock_irqrestore(&drm->event_lock, flags);
drm_vblank_get(drm, dc->pipe);
- }
- return 0;
+}
On Tue, Jan 22, 2013 at 06:27:24PM +0100, Mario Kleiner wrote:
On 22.01.13 09:31, Terje Bergström wrote:
On 14.01.2013 18:06, Thierry Reding wrote:
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
Hi
I haven't read the Tegra programming manual or played with this, so maybe i'm wrong for some reason, but i think there is a race here between actual pageflip completion - latching newfb into the scanout base register and the completion routine that gets called from the vblank irq handler.
If this code gets called close to vblank or inside vblank, couldn't it happen that tegra_dc_set_base() programs a pageflip that gets executed immediately - the new fb gets latched into the crtc, but the corresponding vblank irq handler for the vblank of flip completion runs before the "if (event)" block can set dc->event = event;? Or vblank irq's are off because drm_vblank_get() is only called at the end of the routine? In both cases the completion routine would miss the correct vblank and only timestamp and emit the completion event 1 vblank after actual flip completion.
In that case it would be better to place tegra_dc_set_base() after the "if (event)" block and have some check inside the flip completion routine to make sure the pageflip completion event is only emitted if the scanout is really already latched with the newfb.
An easier solution might be to expand the scope of the lock a bit to encompass the call to tegra_dc_set_base() and extend until the end of tegra_dc_page_flip(). That should properly keep the page-flip completion from interfering, right?
spin_lock_irqsave(&drm->event_lock, flags);
tegra_dc_set_base(dc, 0, 0, newfb);
if (event) { event->pipe = dc->pipe; dc->event = event; drm_vblank_get(drm, dc->pipe); }
spin_unlock_irqrestore(&drm->event_lock, flags);
Thierry
On 02/11/2013 10:00 AM, Thierry Reding wrote:
On Tue, Jan 22, 2013 at 06:27:24PM +0100, Mario Kleiner wrote:
On 22.01.13 09:31, Terje Bergström wrote:
On 14.01.2013 18:06, Thierry Reding wrote:
+static int tegra_dc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
struct drm_pending_vblank_event *event)
+{
- struct tegra_framebuffer *newfb = to_tegra_fb(fb);
- struct tegra_dc *dc = to_tegra_dc(crtc);
- struct drm_device *drm = crtc->dev;
- unsigned long flags;
- if (dc->event)
return -EBUSY;
Hi
I haven't read the Tegra programming manual or played with this, so maybe i'm wrong for some reason, but i think there is a race here between actual pageflip completion - latching newfb into the scanout base register and the completion routine that gets called from the vblank irq handler.
If this code gets called close to vblank or inside vblank, couldn't it happen that tegra_dc_set_base() programs a pageflip that gets executed immediately - the new fb gets latched into the crtc, but the corresponding vblank irq handler for the vblank of flip completion runs before the "if (event)" block can set dc->event = event;? Or vblank irq's are off because drm_vblank_get() is only called at the end of the routine? In both cases the completion routine would miss the correct vblank and only timestamp and emit the completion event 1 vblank after actual flip completion.
In that case it would be better to place tegra_dc_set_base() after the "if (event)" block and have some check inside the flip completion routine to make sure the pageflip completion event is only emitted if the scanout is really already latched with the newfb.
An easier solution might be to expand the scope of the lock a bit to encompass the call to tegra_dc_set_base() and extend until the end of tegra_dc_page_flip(). That should properly keep the page-flip completion from interfering, right?
spin_lock_irqsave(&drm->event_lock, flags);
tegra_dc_set_base(dc, 0, 0, newfb);
if (event) { event->pipe = dc->pipe; dc->event = event; drm_vblank_get(drm, dc->pipe); }
spin_unlock_irqrestore(&drm->event_lock, flags);
Thierry
Yes, that would avoid races between page flip ioctl and the irq handler. But i think the tegra_dc_set_base() should go below the if (event) {} block, before the spin_unlock_irqrestore() so you can be sure that drm_vblank_get() has been called before tegra_dc_set_base() is executed. Otherwise vblank irq's may be off during the vblank in which the tegra_dc_set_base() takes effect and a flip complete would only get reported one scanout cycle later at the next vblank -> You'd signal a pageflip completion 1 frame too late.
You also still need to check in tegra_dc_finish_page_flip() if the new scanout buffer has really been latched before emitting the flip complete event. Otherwise it could happen that your execution of tegra_dc_page_flip() intersects with the vblank interval and manages to queue the event in time, so the finish_page_flip picks it up, but the register programming in tegra_dc_set_base() happened a bit too late, so it doesn't get latched in the same vblank but one vblank later --> You'd signal pageflip completion 1 frame too early.
I've just downloaded the Tegra-3 TRM and had a quick look. Section 20.5.1 "Display shadow registers". As far as i understand, if dc->event is pending in tegra_dc_finish_page_flip(), you could read back the ACTIVE copy of DC_WINBUF_START_ADDR by setting READ_MUX properly in DC_CMD_STATE_ACCESS_0, and compare its value against the ARM copy in DC_WINBUF_A_START_ADDR_NS_0. If both values match, the pageflip has happened and the dc->event can be dequeued and emitted.
That assumes that the ARM copy is latched into the ACTIVE copy only at leading edge of vblank. Section 20.5.1 says "...latching happens on the next frame boundary after (GENERAL/WIN_A/WIN_B/WIN_C)_ACT_REQ is programmed..." - not totally clear to me if this is the same as start of vblank?
-mario
dri-devel@lists.freedesktop.org