This series adds support for the gamma and CTM properties to VC4. I've sent the necessary register defs as part of the series this time around to give the full picture.
The CTM support is somewhat limited in that we can only enable it for one CRTC at a time and coefficients are S0.9 in hardware. The latter seems good enough for the various color corrections Android offers.
Eric Anholt (1):
  drm/vc4: Add some missing HVS register definitions.

Stefan Schake (3):
  drm/vc4: Expose gamma as atomic property
  drm/vc4: Add color transformation matrix (CTM) support
  drm/vc4: Restrict active CTM to one CRTC

 drivers/gpu/drm/vc4/vc4_crtc.c | 122 ++++++++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/vc4/vc4_regs.h |  96 ++++++++++++++++++++++++++++++++
 2 files changed, 209 insertions(+), 9 deletions(-)
From: Eric Anholt eric@anholt.net
At least the RGBA expand field we should have been setting, because we aren't expanding correctly for 565 -> 8888. Other registers are ones that may be interesting for various projects that have been discussed.
Signed-off-by: Eric Anholt eric@anholt.net
Cc: Stefan Schake stschake@gmail.com
---
v2: New in the series. Included here to keep kbuild robot happy and give the full picture.

 drivers/gpu/drm/vc4/vc4_regs.h | 96 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h
index a141496104a6..4af3e29d076a 100644
--- a/drivers/gpu/drm/vc4/vc4_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_regs.h
@@ -330,6 +330,21 @@
 #define SCALER_DISPCTRL0	0x00000040
 # define SCALER_DISPCTRLX_ENABLE	BIT(31)
 # define SCALER_DISPCTRLX_RESET	BIT(30)
+/* Generates a single frame when VSTART is seen and stops at the last
+ * pixel read from the FIFO.
+ */
+# define SCALER_DISPCTRLX_ONESHOT	BIT(29)
+/* Processes a single context in the dlist and then task switch,
+ * instead of an entire line.
+ */
+# define SCALER_DISPCTRLX_ONECTX	BIT(28)
+/* Set to have DISPSLAVE return 2 16bpp pixels and no status data. */
+# define SCALER_DISPCTRLX_FIFO32	BIT(27)
+/* Turns on output to the DISPSLAVE register instead of the normal
+ * FIFO.
+ */
+# define SCALER_DISPCTRLX_FIFOREG	BIT(26)
+
 # define SCALER_DISPCTRLX_WIDTH_MASK	VC4_MASK(23, 12)
 # define SCALER_DISPCTRLX_WIDTH_SHIFT	12
 # define SCALER_DISPCTRLX_HEIGHT_MASK	VC4_MASK(11, 0)
@@ -402,6 +417,68 @@
  */
 # define SCALER_GAMADDR_SRAMENB	BIT(30)

+#define SCALER_OLEDOFFS	0x00000080
+/* Clamps R to [16,235] and G/B to [16,240]. */
+# define SCALER_OLEDOFFS_YUVCLAMP	BIT(31)
+
+/* Chooses which display FIFO the matrix applies to. */
+# define SCALER_OLEDOFFS_DISPFIFO_MASK	VC4_MASK(25, 24)
+# define SCALER_OLEDOFFS_DISPFIFO_SHIFT	24
+# define SCALER_OLEDOFFS_DISPFIFO_DISABLED	0
+# define SCALER_OLEDOFFS_DISPFIFO_0	1
+# define SCALER_OLEDOFFS_DISPFIFO_1	2
+# define SCALER_OLEDOFFS_DISPFIFO_2	3
+
+/* Offsets are 8-bit 2s-complement. */
+# define SCALER_OLEDOFFS_RED_MASK	VC4_MASK(23, 16)
+# define SCALER_OLEDOFFS_RED_SHIFT	16
+# define SCALER_OLEDOFFS_GREEN_MASK	VC4_MASK(15, 8)
+# define SCALER_OLEDOFFS_GREEN_SHIFT	8
+# define SCALER_OLEDOFFS_BLUE_MASK	VC4_MASK(7, 0)
+# define SCALER_OLEDOFFS_BLUE_SHIFT	0
+
+/* The coefficients are S0.9 fractions. */
+#define SCALER_OLEDCOEF0	0x00000084
+# define SCALER_OLEDCOEF0_B_TO_R_MASK	VC4_MASK(29, 20)
+# define SCALER_OLEDCOEF0_B_TO_R_SHIFT	20
+# define SCALER_OLEDCOEF0_B_TO_G_MASK	VC4_MASK(19, 10)
+# define SCALER_OLEDCOEF0_B_TO_G_SHIFT	10
+# define SCALER_OLEDCOEF0_B_TO_B_MASK	VC4_MASK(9, 0)
+# define SCALER_OLEDCOEF0_B_TO_B_SHIFT	0
+
+#define SCALER_OLEDCOEF1	0x00000088
+# define SCALER_OLEDCOEF1_G_TO_R_MASK	VC4_MASK(29, 20)
+# define SCALER_OLEDCOEF1_G_TO_R_SHIFT	20
+# define SCALER_OLEDCOEF1_G_TO_G_MASK	VC4_MASK(19, 10)
+# define SCALER_OLEDCOEF1_G_TO_G_SHIFT	10
+# define SCALER_OLEDCOEF1_G_TO_B_MASK	VC4_MASK(9, 0)
+# define SCALER_OLEDCOEF1_G_TO_B_SHIFT	0
+
+#define SCALER_OLEDCOEF2	0x0000008c
+# define SCALER_OLEDCOEF2_R_TO_R_MASK	VC4_MASK(29, 20)
+# define SCALER_OLEDCOEF2_R_TO_R_SHIFT	20
+# define SCALER_OLEDCOEF2_R_TO_G_MASK	VC4_MASK(19, 10)
+# define SCALER_OLEDCOEF2_R_TO_G_SHIFT	10
+# define SCALER_OLEDCOEF2_R_TO_B_MASK	VC4_MASK(9, 0)
+# define SCALER_OLEDCOEF2_R_TO_B_SHIFT	0
+
+/* Slave addresses for DMAing from HVS composition output to other
+ * devices.  The top bits are valid only in !FIFO32 mode.
+ */
+#define SCALER_DISPSLAVE0	0x000000c0
+#define SCALER_DISPSLAVE1	0x000000c8
+#define SCALER_DISPSLAVE2	0x000000d0
+# define SCALER_DISPSLAVE_ISSUE_VSTART	BIT(31)
+# define SCALER_DISPSLAVE_ISSUE_HSTART	BIT(30)
+/* Set when the current line has been read and an HSTART is required. */
+# define SCALER_DISPSLAVE_EOL	BIT(26)
+/* Set when the display FIFO is empty. */
+# define SCALER_DISPSLAVE_EMPTY	BIT(25)
+/* Set when there is RGB data ready to read. */
+# define SCALER_DISPSLAVE_VALID	BIT(24)
+# define SCALER_DISPSLAVE_RGB_MASK	VC4_MASK(23, 0)
+# define SCALER_DISPSLAVE_RGB_SHIFT	0
+
 #define SCALER_GAMDATA	0x000000e0
 #define SCALER_DLIST_START	0x00002000
 #define SCALER_DLIST_SIZE	0x00004000
@@ -767,6 +844,10 @@ enum hvs_pixel_format {
 	HVS_PIXEL_FORMAT_YCBCR_YUV420_2PLANE = 9,
 	HVS_PIXEL_FORMAT_YCBCR_YUV422_3PLANE = 10,
 	HVS_PIXEL_FORMAT_YCBCR_YUV422_2PLANE = 11,
+	HVS_PIXEL_FORMAT_H264 = 12,
+	HVS_PIXEL_FORMAT_PALETTE = 13,
+	HVS_PIXEL_FORMAT_YUV444_RGB = 14,
+	HVS_PIXEL_FORMAT_AYUV444_RGB = 15,
 };

 /* Note: the LSB is the rightmost character shown.  Only valid for
@@ -800,12 +881,27 @@ enum hvs_pixel_format {
 #define SCALER_CTL0_TILING_128B	2
 #define SCALER_CTL0_TILING_256B_OR_T	3

+#define SCALER_CTL0_ALPHA_MASK	BIT(19)
 #define SCALER_CTL0_HFLIP	BIT(16)
 #define SCALER_CTL0_VFLIP	BIT(15)

+#define SCALER_CTL0_KEY_MODE_MASK	VC4_MASK(18, 17)
+#define SCALER_CTL0_KEY_MODE_SHIFT	17
+#define SCALER_CTL0_KEY_DISABLED	0
+#define SCALER_CTL0_KEY_LUMA_OR_COMMON_RGB	1
+#define SCALER_CTL0_KEY_MATCH	2 /* turn transparent */
+#define SCALER_CTL0_KEY_REPLACE	3 /* replace with value from key mask word 2 */
+
 #define SCALER_CTL0_ORDER_MASK	VC4_MASK(14, 13)
 #define SCALER_CTL0_ORDER_SHIFT	13

+#define SCALER_CTL0_RGBA_EXPAND_MASK	VC4_MASK(12, 11)
+#define SCALER_CTL0_RGBA_EXPAND_SHIFT	11
+#define SCALER_CTL0_RGBA_EXPAND_ZERO	0
+#define SCALER_CTL0_RGBA_EXPAND_LSB	1
+#define SCALER_CTL0_RGBA_EXPAND_MSB	2
+#define SCALER_CTL0_RGBA_EXPAND_ROUND	3
+
 #define SCALER_CTL0_SCL1_MASK	VC4_MASK(10, 8)
 #define SCALER_CTL0_SCL1_SHIFT	8
We are an atomic driver so the gamma LUT should also be exposed as a CRTC property through the DRM atomic color management. This will also take care of the legacy path for us.
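For readers comparing the old `>> 8` truncation with the `drm_color_lut_extract()` call used in this patch, here is a minimal userspace sketch of a round-to-nearest reduction from a 16-bit LUT entry to fewer bits. The name `lut_extract_sketch` is hypothetical, and the rounding behaviour is an assumption based on my reading of the helper, not a copy of the kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model: reduce a 16-bit LUT entry to
 * bit_precision bits (1..16), rounding to nearest and clamping.
 */
static uint32_t lut_extract_sketch(uint32_t user_input,
				   unsigned int bit_precision)
{
	uint32_t val = user_input;
	uint32_t max = 0xffffu >> (16 - bit_precision);

	if (bit_precision < 16) {
		/* Add half an output step, then drop the low bits. */
		val += 1u << (16 - bit_precision - 1);
		val >>= 16 - bit_precision;
	}

	/* Rounding can overflow by one at the top of the range. */
	return val > max ? max : val;
}
```

Under this model an input of 0x0080 becomes 1 at 8-bit precision, where a plain `>> 8` would give 0.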
Signed-off-by: Stefan Schake stschake@gmail.com
---
v2: Use drm_color_lut_size for LUT length

 drivers/gpu/drm/vc4/vc4_crtc.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index bf4667481935..239215cb3274 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -298,23 +298,21 @@ vc4_crtc_lut_load(struct drm_crtc *crtc)
 		HVS_WRITE(SCALER_GAMDATA, vc4_crtc->lut_b[i]);
 }

-static int
-vc4_crtc_gamma_set(struct drm_crtc *crtc, u16 *r, u16 *g, u16 *b,
-		   uint32_t size,
-		   struct drm_modeset_acquire_ctx *ctx)
+static void
+vc4_crtc_update_gamma_lut(struct drm_crtc *crtc)
 {
 	struct vc4_crtc *vc4_crtc = to_vc4_crtc(crtc);
+	struct drm_color_lut *lut = crtc->state->gamma_lut->data;
+	u32 length = drm_color_lut_size(crtc->state->gamma_lut);
 	u32 i;

-	for (i = 0; i < size; i++) {
-		vc4_crtc->lut_r[i] = r[i] >> 8;
-		vc4_crtc->lut_g[i] = g[i] >> 8;
-		vc4_crtc->lut_b[i] = b[i] >> 8;
+	for (i = 0; i < length; i++) {
+		vc4_crtc->lut_r[i] = drm_color_lut_extract(lut[i].red, 8);
+		vc4_crtc->lut_g[i] = drm_color_lut_extract(lut[i].green, 8);
+		vc4_crtc->lut_b[i] = drm_color_lut_extract(lut[i].blue, 8);
 	}

 	vc4_crtc_lut_load(crtc);
-
-	return 0;
 }

 static u32 vc4_get_fifo_full_level(u32 format)
@@ -699,6 +697,9 @@ static void vc4_crtc_atomic_flush(struct drm_crtc *crtc,
 	if (crtc->state->active && old_state->active)
 		vc4_crtc_update_dlist(crtc);

+	if (crtc->state->color_mgmt_changed && crtc->state->gamma_lut)
+		vc4_crtc_update_gamma_lut(crtc);
+
 	if (debug_dump_regs) {
 		DRM_INFO("CRTC %d HVS after:\n", drm_crtc_index(crtc));
 		vc4_hvs_dump_state(dev);
@@ -909,7 +910,7 @@ static const struct drm_crtc_funcs vc4_crtc_funcs = {
 	.reset = vc4_crtc_reset,
 	.atomic_duplicate_state = vc4_crtc_duplicate_state,
 	.atomic_destroy_state = vc4_crtc_destroy_state,
-	.gamma_set = vc4_crtc_gamma_set,
+	.gamma_set = drm_atomic_helper_legacy_gamma_set,
 	.enable_vblank = vc4_enable_vblank,
 	.disable_vblank = vc4_disable_vblank,
 };
@@ -1035,6 +1036,7 @@ static int vc4_crtc_bind(struct device *dev, struct device *master, void *data)
 	primary_plane->crtc = crtc;
 	vc4_crtc->channel = vc4_crtc->data->hvs_channel;
 	drm_mode_crtc_set_gamma_size(crtc, ARRAY_SIZE(vc4_crtc->lut_r));
+	drm_crtc_enable_color_mgmt(crtc, 0, false, crtc->gamma_size);

 	/* Set up some arbitrary number of planes.  We're not limited
 	 * by a set number of physical registers, just the space in
Stefan Schake stschake@gmail.com writes:
Don't we need to set things back to linear if gamma_lut is NULL? (maybe by updating the SCALER_DISPBKGND_GAMMA flag on the HVS channel)
Other than that, this looks great.
The hardware supports a CTM with S0.9 values. We therefore only allow a value of 1.0 or fractional only and reject all others with integer parts. This restriction is mostly inconsequential in practice since commonly used transformation matrices have all scalars <= 1.0.
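To make the number formats concrete, here is a small userspace sketch of the conversion and of the "1.0 or fractional only" rule described above. It assumes the DRM S31.32 value is sign-magnitude (sign in bit 63, integer part in bits 62..32, fraction in bits 31..0) and that the S0.9 value carries its sign in bit 9. The `_sketch` names and the two macros are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIGN_BIT	(1ULL << 63)
#define INT_BITS	0x7fffffff00000000ULL	/* bits 62..32 */

/* Hypothetical model of an S31.32 (sign-magnitude) to S0.9 conversion. */
static uint16_t s31_32_to_s0_9_sketch(uint64_t in)
{
	uint16_t r = (in & SIGN_BIT) ? (1u << 9) : 0;	/* sign goes to bit 9 */

	if (in & INT_BITS)
		r |= 0x1ff;			/* no integer bits: saturate */
	else
		r |= (in >> 23) & 0x1ff;	/* fractional bits 31..23 */

	return r;
}

/* Accept magnitudes up to and including exactly 1.0, reject larger ones. */
static bool ctm_coeff_ok_sketch(uint64_t in)
{
	return (in & ~SIGN_BIT) <= (1ULL << 32);
}
```

In this model 0.5 (`1ULL << 31`) maps to 0x100, and exactly 1.0 (`1ULL << 32`) is accepted but saturates to 0x1ff, i.e. 511/512, which is the closest value the format can represent.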
Signed-off-by: Stefan Schake stschake@gmail.com
---
v2: Simplify CTM atomic check (Ville)

 drivers/gpu/drm/vc4/vc4_crtc.c | 97 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 94 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 8d71098d00c4..bafb0102fe1d 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -315,6 +315,79 @@ vc4_crtc_update_gamma_lut(struct drm_crtc *crtc)
 	vc4_crtc_lut_load(crtc);
 }

+/* Converts a DRM S31.32 value to the HW S0.9 format. */
+static u16 vc4_crtc_s31_32_to_s0_9(u64 in)
+{
+	u16 r;
+
+	/* Sign bit. */
+	r = in & BIT_ULL(63) ? BIT(9) : 0;
+	/* We have zero integer bits so we can only saturate here. */
+	if ((in & GENMASK_ULL(62, 32)) > 0)
+		r |= GENMASK(8, 0);
+	/* Otherwise take the 9 most important fractional bits. */
+	else
+		r |= (in >> 23) & GENMASK(8, 0);
+	return r;
+}
+
+static void
+vc4_crtc_update_ctm(struct drm_crtc *crtc)
+{
+	struct drm_device *dev = crtc->dev;
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	struct vc4_crtc *vc4_crtc = to_vc4_crtc(crtc);
+	struct drm_color_ctm *ctm = crtc->state->ctm->data;
+
+	HVS_WRITE(SCALER_OLEDCOEF2,
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[0]),
+				SCALER_OLEDCOEF2_R_TO_R) |
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[3]),
+				SCALER_OLEDCOEF2_R_TO_G) |
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[6]),
+				SCALER_OLEDCOEF2_R_TO_B));
+	HVS_WRITE(SCALER_OLEDCOEF1,
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[1]),
+				SCALER_OLEDCOEF1_G_TO_R) |
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[4]),
+				SCALER_OLEDCOEF1_G_TO_G) |
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[7]),
+				SCALER_OLEDCOEF1_G_TO_B));
+	HVS_WRITE(SCALER_OLEDCOEF0,
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[2]),
+				SCALER_OLEDCOEF0_B_TO_R) |
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[5]),
+				SCALER_OLEDCOEF0_B_TO_G) |
+		  VC4_SET_FIELD(vc4_crtc_s31_32_to_s0_9(ctm->matrix[8]),
+				SCALER_OLEDCOEF0_B_TO_B));
+
+	/* Channel is 0-based but for DISPFIFO, 0 means disabled. */
+	HVS_WRITE(SCALER_OLEDOFFS, VC4_SET_FIELD(vc4_crtc->channel + 1,
+						 SCALER_OLEDOFFS_DISPFIFO));
+}
+
+/* Check if the CTM contains valid input.
+ *
+ * DRM exposes CTM with S31.32 scalars, but the HW only supports S0.9.
+ * We don't allow integer values >1, and 1 only without fractional part
+ * to handle the common 1.0 value.
+ */
+static int vc4_crtc_atomic_check_ctm(struct drm_crtc_state *state)
+{
+	struct drm_color_ctm *ctm = state->ctm->data;
+	u32 i;
+
+	for (i = 0; i < ARRAY_SIZE(ctm->matrix); i++) {
+		u64 val = ctm->matrix[i];
+
+		val &= ~BIT_ULL(63);
+		if (val > BIT_ULL(32))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static u32 vc4_get_fifo_full_level(u32 format)
 {
 	static const u32 fifo_len_bytes = 64;
@@ -621,6 +694,15 @@ static int vc4_crtc_atomic_check(struct drm_crtc *crtc,
 	if (hweight32(state->connector_mask) > 1)
 		return -EINVAL;

+	if (state->ctm) {
+		/* The CTM hardware has no integer bits, so we check
+		 * and reject scalars >1.0 that we have no chance of
+		 * approximating.
+		 */
+		if (vc4_crtc_atomic_check_ctm(state))
+			return -EINVAL;
+	}
+
 	drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, state)
 		dlist_count += vc4_plane_dlist_size(plane_state);

@@ -697,8 +779,17 @@ static void vc4_crtc_atomic_flush(struct drm_crtc *crtc,
 	if (crtc->state->active && old_state->active)
 		vc4_crtc_update_dlist(crtc);

-	if (crtc->state->color_mgmt_changed && crtc->state->gamma_lut)
-		vc4_crtc_update_gamma_lut(crtc);
+	if (crtc->state->color_mgmt_changed) {
+		if (crtc->state->gamma_lut)
+			vc4_crtc_update_gamma_lut(crtc);
+
+		if (crtc->state->ctm)
+			vc4_crtc_update_ctm(crtc);
+		/* We are transitioning to CTM disabled. */
+		else if (old_state->ctm)
+			HVS_WRITE(SCALER_OLEDOFFS,
+				  VC4_SET_FIELD(0, SCALER_OLEDOFFS_DISPFIFO));
+	}

 	if (debug_dump_regs) {
 		DRM_INFO("CRTC %d HVS after:\n", drm_crtc_index(crtc));
@@ -1036,7 +1127,7 @@ static int vc4_crtc_bind(struct device *dev, struct device *master, void *data)
 	primary_plane->crtc = crtc;
 	vc4_crtc->channel = vc4_crtc->data->hvs_channel;
 	drm_mode_crtc_set_gamma_size(crtc, ARRAY_SIZE(vc4_crtc->lut_r));
-	drm_crtc_enable_color_mgmt(crtc, 0, false, crtc->gamma_size);
+	drm_crtc_enable_color_mgmt(crtc, 0, true, crtc->gamma_size);

 	/* Set up some arbitrary number of planes.  We're not limited
 	 * by a set number of physical registers, just the space in
Stefan Schake stschake@gmail.com writes:
My primary concern with this patch is whether the OLEDCOEFFs apply before or after gamma, since atomic is specific about what order they happen in. I didn't find anything in the docs, so I'd have to pull up RTL to confirm.
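As a toy illustration of why the ordering matters (the atomic color pipeline applies the CTM before the gamma LUT), the sketch below runs one 8-bit value through a scaling stage and a non-linear stage in both orders. Both stages are made up for the example: `toy_gamma` is a simple squaring curve and `toy_ctm` is a single coefficient in 1/512 steps. Neither models the actual HVS behaviour.

```c
#include <stdint.h>

/* Toy non-linear "gamma" stage: square the 8-bit value and rescale. */
static uint8_t toy_gamma(uint8_t v)
{
	return (uint8_t)((v * v) / 255);
}

/* Toy one-coefficient "CTM" stage: scale by coeff/512, coeff <= 512. */
static uint8_t toy_ctm(uint8_t v, unsigned int coeff)
{
	return (uint8_t)((v * coeff) / 512);
}

/* Order expected by the atomic color pipeline. */
static uint8_t ctm_then_gamma(uint8_t v, unsigned int coeff)
{
	return toy_gamma(toy_ctm(v, coeff));
}

/* What the result would be if the hardware applied the stages reversed. */
static uint8_t gamma_then_ctm(uint8_t v, unsigned int coeff)
{
	return toy_ctm(toy_gamma(v), coeff);
}
```

With an input of 200 and a coefficient of 256 (0.5), the first order yields 39 and the second 78, so userspace would see visibly different output depending on where the matrix sits.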
I find splitting the if from else without braces weird and error-prone. Could we rewrite as:
+		if (crtc->state->ctm)
+			vc4_crtc_update_ctm(crtc);
+		else if (old_state->ctm) {
+			/* We are transitioning to CTM disabled. */
+			HVS_WRITE(SCALER_OLEDOFFS,
+				  VC4_SET_FIELD(0, SCALER_OLEDOFFS_DISPFIFO));
+		}
+	}
Also, I think there might be a problem if you swap from CTM on CRTC1 to CRTC0 in a single commit -- CRTC0's CTM setup ends up getting disabled by CRTC1.
I think this should go away once you start tracking who's doing CTM at the atomic state level, and put the SCALER_OLEDOFFS update into the top-level atomic flush.
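The swap problem can be reproduced in a toy model. The sketch below assumes CRTCs are flushed in index order and models the DISPFIFO field as a plain variable. All names here (`toy_crtc`, `flush_per_crtc`, `flush_top_level`) are hypothetical, and the top-level variant is only one possible shape for the fix, not the driver's implementation.

```c
#include <stdbool.h>

/* Toy model of the DISPFIFO field: 0 = disabled, channel + 1 otherwise. */
static unsigned int oledoffs_fifo;

struct toy_crtc {
	unsigned int channel;
	bool old_ctm;	/* CTM was set in the old state */
	bool new_ctm;	/* CTM is set in the new state */
};

/* Per-CRTC flush: clears the field when this CRTC drops its CTM. */
static void flush_per_crtc(const struct toy_crtc *c)
{
	if (c->new_ctm)
		oledoffs_fifo = c->channel + 1;
	else if (c->old_ctm)
		oledoffs_fifo = 0;
}

/* Top-level flush: decide the owner once for the whole commit. */
static void flush_top_level(const struct toy_crtc *crtcs, unsigned int n)
{
	unsigned int i, fifo = 0;

	for (i = 0; i < n; i++)
		if (crtcs[i].new_ctm)
			fifo = crtcs[i].channel + 1;
	oledoffs_fifo = fifo;
}

/* One commit that moves the CTM from CRTC1 to CRTC0. */
static const struct toy_crtc swap_commit[2] = {
	{ .channel = 0, .old_ctm = false, .new_ctm = true },
	{ .channel = 1, .old_ctm = true, .new_ctm = false },
};

static unsigned int swap_with_per_crtc_flush(void)
{
	oledoffs_fifo = 2;		/* CRTC1 owns the CTM beforehand */
	flush_per_crtc(&swap_commit[0]);
	flush_per_crtc(&swap_commit[1]);	/* clobbers CRTC0's setup */
	return oledoffs_fifo;
}

static unsigned int swap_with_top_level_flush(void)
{
	oledoffs_fifo = 2;
	flush_top_level(swap_commit, 2);
	return oledoffs_fifo;
}
```

In this model the per-CRTC path ends with the field at 0 (CTM disabled), while the top-level path ends with 1 (CRTC0's FIFO), which is the intended result of the swap.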
We only have one hardware block to do the CTM and need to reject attempts to enable it for multiple CRTCs simultaneously.
Signed-off-by: Stefan Schake stschake@gmail.com
---
v2: No change

 drivers/gpu/drm/vc4/vc4_crtc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index bafb0102fe1d..180b93ec447e 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -676,10 +676,17 @@ static enum drm_mode_status vc4_crtc_mode_valid(struct drm_crtc *crtc,
 	return MODE_OK;
 }

+static int vc4_crtc_get_ctm_fifo(struct vc4_dev *vc4)
+{
+	return VC4_GET_FIELD(HVS_READ(SCALER_OLEDOFFS),
+			     SCALER_OLEDOFFS_DISPFIFO);
+}
+
 static int vc4_crtc_atomic_check(struct drm_crtc *crtc,
 				 struct drm_crtc_state *state)
 {
 	struct vc4_crtc_state *vc4_state = to_vc4_crtc_state(state);
+	struct drm_crtc_state *old_state = crtc->state;
 	struct drm_device *dev = crtc->dev;
 	struct vc4_dev *vc4 = to_vc4_dev(dev);
 	struct drm_plane *plane;
@@ -701,6 +708,10 @@ static int vc4_crtc_atomic_check(struct drm_crtc *crtc,
 		 */
 		if (vc4_crtc_atomic_check_ctm(state))
 			return -EINVAL;
+
+		/* We can only enable CTM for one fifo or CRTC at a time */
+		if (!old_state->ctm && vc4_crtc_get_ctm_fifo(vc4))
+			return -EINVAL;
 	}

 	drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, state)
Hi Stefan,
On 25 March 2018 at 02:52, Stefan Schake stschake@gmail.com wrote:
This needs to be managed as a global resource through atomic state objects, rather than checking the current hardware state.
Cheers, Daniel
Hey Daniel,
On Sun, Mar 25, 2018 at 10:01 AM, Daniel Stone daniel@fooishbar.org wrote:
Do you mean as a property or some such that is accessible to userland or merely that this could be raced?
I haven't had much luck finding examples for resources shared between CRTCs in the current tree. My understanding here was that if userland commits on CRTC B after a check-only on A, we are no longer bound by the earlier result for the check-only. Otherwise, I would have to already commit my CTM block to one CRTC at check (possibly check only) time.
Thanks, Stefan
On Sun, Mar 25, 2018 at 08:14:35PM +0200, Stefan Schake wrote:
https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#handling-driver-privat...
since you only have one CTM it's a shared resource which internally needs to be tracked as a driver private thing.
Cheers, Daniel
On 26 March 2018 at 09:29, Daniel Vetter daniel@ffwll.ch wrote:
Indeed, the above is exactly what I meant. Checking based on the hardware status will falsely succeed if you go from having zero CRTCs using CTM, to multiple CRTCs using CTM, in a single atomic commit, as the hardware status won't have changed in time.
Cheers, Daniel
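The false success described above can be shown with a toy model that compares a check against hardware state with a check against a state object carried through the commit. Everything here is hypothetical (`toy_ctm_state`, `check_against_hw`, `check_against_state`). It is a sketch of the idea only and does not use the DRM private-object API.

```c
#include <stdbool.h>

/* Toy hardware register: 0 means no FIFO currently uses the CTM. */
static unsigned int hw_ctm_fifo;

/* Hypothetical driver-private state carried through one atomic commit. */
struct toy_ctm_state {
	unsigned int fifo;	/* 0 = unclaimed, channel + 1 otherwise */
};

/* Check against hardware: nothing is written during the check phase,
 * so every CRTC in the commit sees the same answer.
 */
static bool check_against_hw(unsigned int channel)
{
	(void)channel;
	return hw_ctm_fifo == 0;
}

/* Check against commit state: the first CRTC claims the block and any
 * other CRTC asking for it in the same commit is rejected.
 */
static bool check_against_state(struct toy_ctm_state *s, unsigned int channel)
{
	if (s->fifo && s->fifo != channel + 1)
		return false;
	s->fifo = channel + 1;
	return true;
}

/* Two CRTCs enable CTM in one commit; count how many checks pass. */
static unsigned int accepted_by_hw_check(void)
{
	return check_against_hw(0) + check_against_hw(1);
}

static unsigned int accepted_by_state_check(void)
{
	struct toy_ctm_state s = { .fifo = 0 };

	return check_against_state(&s, 0) + check_against_state(&s, 1);
}
```

Starting from no CRTC using the CTM, the hardware-based check accepts both CRTCs, while the state-based check accepts only the first.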