Amlogic uses a proprietary lossless image compression protocol and format for their hardware video codec accelerators, either video decoders or video input encoders.
It considerably reduces memory bandwidth while writing and reading frames in memory.
The underlying storage is considered to be 3 components, 8-bit or 10-bit per component, YCbCr 420, single plane:
- DRM_FORMAT_YUV420_8BIT
- DRM_FORMAT_YUV420_10BIT
This modifier will be notably added to DMA-BUF frames imported from the V4L2 Amlogic VDEC decoder.
At least two options are supported:
- Scatter mode: the buffer is filled with an IOMMU scatter table referring to the encoder's current memory layout. This mode is more efficient in terms of memory allocation, but frames are not dumpable and are only valid until the buffer is freed and back in control of the encoder.
- Memory saving: when the pixel depth is 8 bits per component, the size of the superblock can be reduced, thus saving memory.
This series adds the missing registers and updates the FBC decoder register content so that it can be committed by the CRTC code.
The Amlogic FBC has been tested with compressed content from the Amlogic HW VP9 decoder on S905X (GXL), S905D2 (G12A) and S905X3 (SM1) in 8bit (Scatter+Mem Saving on G12A/SM1, Mem Saving on GXL) and 10bit (Scatter on G12A/SM1, default on GXL).
It's expected to work as-is on GXM and G12B SoCs.
Changes since v1 at [1]:
- s/VD1_AXI_SEL_AFB/VD1_AXI_SEL_AFBC/ into meson_registers.h
[1] https://patchwork.freedesktop.org/series/73722/#rev1
Neil Armstrong (4):
  drm/fourcc: Add modifier definitions for describing Amlogic Video Framebuffer Compression
  drm/meson: add Amlogic Video FBC registers
  drm/meson: overlay: setup overlay for Amlogic FBC
  drm/meson: crtc: handle commit of Amlogic FBC frames

 drivers/gpu/drm/meson/meson_crtc.c      | 118 ++++++++---
 drivers/gpu/drm/meson/meson_drv.h       |  16 ++
 drivers/gpu/drm/meson/meson_overlay.c   | 257 +++++++++++++++++++++++-
 drivers/gpu/drm/meson/meson_registers.h |  22 ++
 include/uapi/drm/drm_fourcc.h           |  56 ++++++
 5 files changed, 431 insertions(+), 38 deletions(-)
Amlogic uses a proprietary lossless image compression protocol and format for their hardware video codec accelerators, either video decoders or video input encoders.
It considerably reduces memory bandwidth while writing and reading frames in memory.
The underlying storage is considered to be 3 components, 8-bit or 10-bit per component, YCbCr 420, single plane:
- DRM_FORMAT_YUV420_8BIT
- DRM_FORMAT_YUV420_10BIT
This modifier will be notably added to DMA-BUF frames imported from the V4L2 Amlogic VDEC decoder.
At least two options are supported:
- Scatter mode: the buffer is filled with an IOMMU scatter table referring to the encoder's current memory layout. This mode is more efficient in terms of memory allocation, but frames are not dumpable and are only valid until the buffer is freed and back in control of the encoder.
- Memory saving: when the pixel depth is 8 bits per component, the size of the superblock can be reduced, thus saving memory.
Signed-off-by: Neil Armstrong narmstrong@baylibre.com
---
 include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 8bc0b31597d8..8a6e87bacadb 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -309,6 +309,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
+#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
 
 /* add more to the end as needed */
 
@@ -804,6 +805,61 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
 
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
+ *
+ * The underlying storage is considered to be 3 components, 8bit or 10-bit
+ * per component YCbCr 420, single plane :
+ * - DRM_FORMAT_YUV420_8BIT
+ * - DRM_FORMAT_YUV420_10BIT
+ *
+ * The classic memory storage is composed of:
+ * - a body content organized in 64x32 superblocks with 4096 bytes per
+ *   superblock in default mode.
+ * - a 32 bytes per 128x64 header block
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT fourcc_mod_code(AMLOGIC, 0)
+
+/*
+ * Amlogic Video Framebuffer Compression Options
+ *
+ * Two optional features are available which may not supported/used on every
+ * SoCs and Compressed Framebuffer producers.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC(__modes) fourcc_mod_code(AMLOGIC, __modes)
+
+/*
+ * Amlogic FBC Scatter Memory layout
+ *
+ * Indicates the header contains IOMMU references to the compressed
+ * frames content to optimize memory access and layout.
+ * In this mode, only the header memory address is needed, thus the
+ * content memory organization is tied to the current producer
+ * execution and cannot be saved/dumped.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (1ULL << 0)
+
+/*
+ * Amlogic FBC Memory Saving mode
+ *
+ * Indicates the storage is packed when pixel size is multiple of word
+ * boudaries, i.e. 8bit should be stored in this mode to save allocation
+ * memory.
+ *
+ * This mode reduces body layout to 3072 bytes per 64x32 superblock and
+ * 3200 bytes per 64x32 superblock combined with scatter mode.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (1ULL << 1)
+
 #if defined(__cplusplus)
 }
 #endif
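To make the encoding above concrete, here is a minimal sketch of how userspace might compose and pick apart these modifiers. The macros are local copies of the definitions in the patch (with fourcc_mod_code() written the way drm_fourcc.h defines it, vendor ID in the top 8 bits) so that it builds without kernel headers. The two helper functions are hypothetical and not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copies of the definitions from the patch so the sketch builds
 * without kernel headers. fourcc_mod_code() follows the drm_fourcc.h
 * helper: vendor ID in the top 8 bits, vendor-specific value in the
 * low 56 bits.
 */
#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
#define fourcc_mod_code(vendor, val) \
	((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | \
	 ((val) & 0x00ffffffffffffffULL))

#define DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT fourcc_mod_code(AMLOGIC, 0)
#define DRM_FORMAT_MOD_AMLOGIC_FBC(__modes) fourcc_mod_code(AMLOGIC, __modes)
#define DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (1ULL << 0)
#define DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (1ULL << 1)

/* Hypothetical helper: does this modifier belong to the Amlogic vendor? */
static inline int amlogic_fbc_is_amlogic(uint64_t modifier)
{
	return (modifier >> 56) == DRM_FORMAT_MOD_VENDOR_AMLOGIC;
}

/* Hypothetical helper: extract the option bits of an Amlogic modifier. */
static inline uint64_t amlogic_fbc_options(uint64_t modifier)
{
	assert(amlogic_fbc_is_amlogic(modifier));
	return modifier & 0x00ffffffffffffffULL;
}
```

With these definitions, DRM_FORMAT_MOD_AMLOGIC_FBC(DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER | DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING) evaluates to 0x0a00000000000003, and amlogic_fbc_options() returns the two option bits again.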
On Fri, Feb 21, 2020 at 10:09 AM Neil Armstrong narmstrong@baylibre.com wrote:
...
I'm the main developer of the V4L2 video decoder (H264, VP9..) on amlogic platforms, which is a producer of such compressed frames.
Those modifiers suit well the combinations of options that can be applied to the frames when created. I also helped testing the following scenarios of decode+display on various SoCs:
- SM1: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (10-bit & 8-bit video)
- SM1: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER + DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (8-bit video)
- G12A: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (10-bit & 8-bit video)
- G12A: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER + DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (8-bit video)
- GXL: DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT (10-bit & 8-bit video)
- GXL: DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (8-bit video)
Reviewed-by: Maxime Jourdan mjourdan@baylibre.com
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
...
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
It can be an issue in two ways:
- If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
- If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
Neil mentioned the data contains a "header" that further specifies things, but there is no specification about the header itself. Therefore I don't think we can even know if the header contains something that acts like a sub-modifier or not.
All this sounds like the modifier definitions here are not enough to fully interpret the data. At the very least I would expect a reference to a document explaining the "header", or even better, a kernel ReST doc.
I wonder if this is at all suitable as a DRM format modifier as is. I have been assuming that a modifier together with all the usual FB parameters should be enough to interpret the stored data, but in this case I doubt it actually is.
I have no problem with proprietary data layouts as long as they are fully specified.
I do feel like I would not be able to write a software decoder for this set of modifiers given the details below.
Thanks, pq
+ *
+ * The underlying storage is considered to be 3 components, 8bit or 10-bit
+ * per component YCbCr 420, single plane :
+ * - DRM_FORMAT_YUV420_8BIT
+ * - DRM_FORMAT_YUV420_10BIT
+ *
+ * The classic memory storage is composed of:
+ * - a body content organized in 64x32 superblocks with 4096 bytes per
+ *   superblock in default mode.
+ * - a 32 bytes per 128x64 header block
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT fourcc_mod_code(AMLOGIC, 0)
+
+/*
+ * Amlogic Video Framebuffer Compression Options
+ *
+ * Two optional features are available which may not supported/used on every
+ * SoCs and Compressed Framebuffer producers.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC(__modes) fourcc_mod_code(AMLOGIC, __modes)
+
+/*
+ * Amlogic FBC Scatter Memory layout
+ *
+ * Indicates the header contains IOMMU references to the compressed
+ * frames content to optimize memory access and layout.
+ * In this mode, only the header memory address is needed, thus the
+ * content memory organization is tied to the current producer
+ * execution and cannot be saved/dumped.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (1ULL << 0)
+
+/*
+ * Amlogic FBC Memory Saving mode
+ *
+ * Indicates the storage is packed when pixel size is multiple of word
+ * boudaries, i.e. 8bit should be stored in this mode to save allocation
+ * memory.
+ *
+ * This mode reduces body layout to 3072 bytes per 64x32 superblock and
+ * 3200 bytes per 64x32 superblock combined with scatter mode.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (1ULL << 1)
+
 #if defined(__cplusplus)
 }
 #endif
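For reference, the only layout figures the quoted comment gives (64x32 superblocks of 4096, 3072 or 3200 bytes, and 32 bytes of header per 128x64 block) are enough for a rough size estimate and nothing more. The sketch below does that arithmetic. All function names are hypothetical, rounding partial blocks up is an assumption, and so is keeping 4096 bytes per superblock for scatter mode without memory saving, since the comment does not cover that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option bits, with the same values as in the patch. */
#define FBC_SCATTER    (1ULL << 0)
#define FBC_MEM_SAVING (1ULL << 1)

/* Hypothetical helper: integer division rounding up. */
static inline size_t fbc_div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

/*
 * Bytes per 64x32 superblock, using the three figures given in the patch.
 * Assumption: scatter mode without memory saving keeps the 4096-byte
 * default; the patch does not state this case explicitly.
 */
static inline size_t fbc_superblock_bytes(uint64_t options)
{
	if (options & FBC_MEM_SAVING)
		return (options & FBC_SCATTER) ? 3200 : 3072;
	return 4096;
}

/* Body size: one superblock per 64x32 pixels, partial blocks rounded up
 * (assumed). */
static inline size_t fbc_body_size(size_t width, size_t height,
				   uint64_t options)
{
	assert(width > 0 && height > 0);
	return fbc_div_round_up(width, 64) * fbc_div_round_up(height, 32) *
	       fbc_superblock_bytes(options);
}

/* Header size: 32 bytes per 128x64 pixels, partial blocks rounded up
 * (assumed). */
static inline size_t fbc_header_size(size_t width, size_t height)
{
	assert(width > 0 && height > 0);
	return fbc_div_round_up(width, 128) * fbc_div_round_up(height, 64) * 32;
}
```

Under these assumptions a 1920x1080 frame covers 1020 superblocks, i.e. 4177920 bytes of body in default mode or 3133440 bytes with memory saving, plus 8160 bytes of header. Nothing here says what the bytes inside a superblock or a header entry mean.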
Hi,
On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
...
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
It can be an issue in two ways:
If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
Playing devil's advocate, the comment sounds similar to I915_FORMAT_MOD_{X,Y}_TILED:
 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.
Isn't the statement that this for sharing between producer and decoder _on the same platform_ a similar clause with the same effect?
What advantage is there to exposing the gory details? For Arm AFBC it's necessary because IP on the SoC can be (likely to be) from different vendors with different capabilities.
If this is only for talking between Amlogic IP on the same SoC, and those devices support all the same "flavours", I don't see what is gained by making userspace care about internals.
Thanks, -Brian
Neil mentioned the data contains a "header" that further specifies things, but there is no specification about the header itself. Therefore I don't think we can even know if the header contains something that acts like a sub-modifier or not.
All this sounds like the modifier definitions here are not enough to fully interpret the data. At the very least I would expect a reference to a document explaining the "header", or even better, a kernel ReST doc.
I wonder if this is at all suitable as a DRM format modifier as is. I have been assuming that a modifier together with all the usual FB parameters should be enough to interpret the stored data, but in this case I have doubt it actually is.
I have no problem with proprietary data layouts as long as they are fully specified.
I do feel like I would not be able to write a software decoder for this set of modifiers given the details below.
Thanks, pq
On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey brian.starkey@arm.com wrote:
Hi,
On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
...
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
It can be an issue in two ways:
If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
Playing devil's advocate, the comment sounds similar to I915_FORMAT_MOD_{X,Y}_TILED:
 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.
Yeah which we regret now. We need to now roll out a new set of modifiers for at least some of the differences in these on the modern-ish chips (the old crap is pretty much lost cause anyway).
This was kinda a nasty hack to smooth things over since we have epic amounts of userspace, but it's really not a great idea (and no one else really has epic amounts of existing userspace that uses tiling flags everywhere, this is all new code). -Daniel
Isn't the statement that this for sharing between producer and decoder _on the same platform_ a similar clause with the same effect?
What advantage is there to exposing the gory details? For Arm AFBC it's necessary because IP on the SoC can be (likely to be) from different vendors with different capabilities.
If this is only for talking between Amlogic IP on the same SoC, and those devices support all the same "flavours", I don't see what is gained by making userspace care about internals.
The trouble is if you mix&match IP cores, and one of them supports flavours A, B, C and the other C, D, E. But all you have is a single magic modifier for "whatever the flavour is that soc prefers". So someone gets to stuff this in DT.
Also eventually, maybe, perhaps ARM does grow up into the client/server space with add-on pcie graphics, and at least for client you very often end up with integrated + add-in pcie gpu. At that point you really can't have magic per-soc modifiers anymore.
If people get confused I'm happy to add a "WARNING: This was a dumb idea for backwards compat with legacy code, no one with new stuff ever repeat it" to the i915 modifiers. -Daniel
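As an illustration of the mix&match case above, here is a sketch of the negotiation that explicit per-flavour modifiers make possible and that a single "whatever the SoC prefers" modifier does not. The flavour IDs are placeholders for illustration only, not real modifier values, and common_modifiers() is a hypothetical helper rather than an existing DRM or userspace API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flavour IDs for illustration only, not real modifiers. */
enum { FLAVOUR_A = 1, FLAVOUR_B, FLAVOUR_C, FLAVOUR_D, FLAVOUR_E };

/*
 * Hypothetical helper: copy into out[] every modifier listed by both the
 * producer and the consumer, in producer order. Returns the number of
 * common modifiers written (at most out_cap).
 */
static inline size_t common_modifiers(const uint64_t *producer,
				      size_t n_producer,
				      const uint64_t *consumer,
				      size_t n_consumer,
				      uint64_t *out, size_t out_cap)
{
	size_t n = 0;

	assert(out != NULL || out_cap == 0);
	for (size_t i = 0; i < n_producer && n < out_cap; i++) {
		for (size_t j = 0; j < n_consumer; j++) {
			if (producer[i] == consumer[j]) {
				out[n++] = producer[i];
				break;
			}
		}
	}
	return n;
}
```

With a producer advertising A, B, C and a consumer advertising C, D, E, the helper returns a single entry, C. If both sides only advertised one opaque per-SoC modifier, the two lists would compare equal even when the underlying flavours differ.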
Thanks, -Brian
Neil mentioned the data contains a "header" that further specifies things, but there is no specification about the header itself. Therefore I don't think we can even know if the header contains something that acts like a sub-modifier or not.
All this sounds like the modifier definitions here are not enough to fully interpret the data. At the very least I would expect a reference to a document explaining the "header", or even better, a kernel ReST doc.
I wonder if this is at all suitable as a DRM format modifier as is. I have been assuming that a modifier together with all the usual FB parameters should be enough to interpret the stored data, but in this case I have doubt it actually is.
I have no problem with proprietary data layouts as long as they are fully specified.
I do feel like I would not be able to write a software decoder for this set of modifiers given the details below.
Thanks, pq
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, 3 Mar 2020 12:37:16 +0100 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey brian.starkey@arm.com wrote:
Hi,
On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
...
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
It can be an issue in two ways:
If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
Playing devil's advocate, the comment sounds similar to I915_FORMAT_MOD_{X,Y}_TILED:
 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.
Yeah which we regret now. We need to now roll out a new set of modifiers for at least some of the differences in these on the modern-ish chips (the old crap is pretty much lost cause anyway).
This was kinda a nasty hack to smooth things over since we have epic amounts of userspace, but it's really not a great idea (and no one else really has epic amounts of existing userspace that uses tiling flags everywhere, this is all new code). -Daniel
Isn't the statement that this for sharing between producer and decoder _on the same platform_ a similar clause with the same effect?
What advantage is there to exposing the gory details? For Arm AFBC it's necessary because IP on the SoC can be (likely to be) from different vendors with different capabilities.
If this is only for talking between Amlogic IP on the same SoC, and those devices support all the same "flavours", I don't see what is gained by making userspace care about internals.
The trouble is if you mix&match IP cores, and one of them supports flavours A, B, C and the other C, D, E. But all you have is a single magic modifier for "whatever the flavour is that soc prefers". So someone gets to stuff this in DT.
Also eventually, maybe, perhaps ARM does grow up into the client/server space with add-on pcie graphics, and at least for client you very often end up with integrated + add-in pcie gpu. At that point you really can't have magic per-soc modifiers anymore.
Hi,
I also heard that Pipewire will copy buffers and modifiers verbatim from one machine to another when streaming across network, assuming that the same modifier means the same thing on all machines.[Citation needed]
If that is something that must not be done with DRM modifiers, then please contact them and document that.
Thanks, pq
If people get confused I'm happy to add a "WARNING: This was a dumb idea for backwards compat with legacy code, no one with new stuff ever repeat it" to the i915 modifers. -Daniel
Thanks, -Brian
Neil mentioned the data contains a "header" that further specifies things, but there is no specification about the header itself. Therefore I don't think we can even know if the header contains something that acts like a sub-modifier or not.
All this sounds like the modifier definitions here are not enough to fully interpret the data. At the very least I would expect a reference to a document explaining the "header", or even better, a kernel ReST doc.
I wonder if this is at all suitable as a DRM format modifier as is. I have been assuming that a modifier together with all the usual FB parameters should be enough to interpret the stored data, but in this case I have doubt it actually is.
I have no problem with proprietary data layouts as long as they are fully specified.
I do feel like I would not be able to write a software decoder for this set of modifiers given the details below.
Thanks, pq
On Tue, 3 Mar 2020 15:25:41 +0200 Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 3 Mar 2020 12:37:16 +0100 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey brian.starkey@arm.com wrote:
Hi,
On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
...
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
It can be an issue in two ways:
If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
Playing devil's advocate, the comment sounds similar to I915_FORMAT_MOD_{X,Y}_TILED:
 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.
Yeah which we regret now. We need to now roll out a new set of modifiers for at least some of the differences in these on the modern-ish chips (the old crap is pretty much lost cause anyway).
This was kinda a nasty hack to smooth things over since we have epic amounts of userspace, but it's really not a great idea (and no one else really has epic amounts of existing userspace that uses tiling flags everywhere, this is all new code). -Daniel
Isn't the statement that this is for sharing between producer and decoder _on the same platform_ a similar clause with the same effect?
What advantage is there to exposing the gory details? For Arm AFBC it's necessary because IP on the SoC can be (likely to be) from different vendors with different capabilities.
If this is only for talking between Amlogic IP on the same SoC, and those devices support all the same "flavours", I don't see what is gained by making userspace care about internals.
The trouble is if you mix&match IP cores, and one of them supports flavours A, B, C and the other C, D, E. But all you have is a single magic modifier for "whatever the flavour is that soc prefers". So someone gets to stuff this in DT.
Also eventually, maybe, perhaps ARM does grow up into the client/server space with add-on pcie graphics, and at least for client you very often end up with integrated + add-in pcie gpu. At that point you really can't have magic per-soc modifiers anymore.
Hi,
I also heard that Pipewire will copy buffers and modifiers verbatim from one machine to another when streaming across network, assuming that the same modifier means the same thing on all machines.[Citation needed]
If that is something that must not be done with DRM modifiers, then please contact them and document that.
Sorry, it's waypipe, not pipewire: https://gitlab.freedesktop.org/mstoeckl/waypipe/
Thanks, pq
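To make the mix-and-match concern quoted above concrete (one IP core supporting flavours A, B, C and another supporting C, D, E), here is a minimal sketch of how a modifier list intersection might look. The function name and the placeholder values are hypothetical and not taken from any DRM API; real userspace would obtain the lists from the kernel and the producer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy into `out` every modifier that appears in both
 * the producer list and the consumer list, keeping the producer's order.
 * Returns the number of common modifiers written (at most out_cap).
 */
size_t common_modifiers(const uint64_t *producer, size_t n_producer,
			const uint64_t *consumer, size_t n_consumer,
			uint64_t *out, size_t out_cap)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_producer && n_out < out_cap; i++) {
		for (size_t j = 0; j < n_consumer; j++) {
			if (producer[i] == consumer[j]) {
				out[n_out++] = producer[i];
				break;
			}
		}
	}
	return n_out;
}
```

With explicit flavours, {A, B, C} and {C, D, E} intersect to {C}, so both sides can agree on a layout. If both sides advertise only a single "whatever the SoC prefers" value, the intersection always succeeds even when the underlying layouts differ, which is the failure mode being discussed.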
On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:
...
I also heard that Pipewire will copy buffers and modifiers verbatim from one machine to another when streaming across network, assuming that the same modifier means the same thing on all machines.[Citation needed]
If that is something that must not be done with DRM modifiers, then please contact them and document that.
Sorry, it's waypipe, not pipewire: https://gitlab.freedesktop.org/mstoeckl/waypipe/
I do think this is very much something we want to make possible. They might pick a silly modifier (compression modifiers only compress bw, by necessity the lossless ones have to increase storage space so kinda dumb thing to push over the network if you don't add .xz or whatever on top).
I'm also hoping that intel's modifiers are definitely the one and only that we ever screwed up, and we should be getting those fixed in the near future too.
So maybe what we should do instead is add a comment to the modifier docs that this stuff _is_ supposed to be transferrable over networks and work. -Daniel
On Fri, 6 Mar 2020 11:13:28 +0100 Daniel Vetter daniel@ffwll.ch wrote:
...
I do think this is very much something we want to make possible. They might pick a silly modifier (compression modifiers only compress bw, by necessity the lossless ones have to increase storage space so kinda dumb thing to push over the network if you don't add .xz or whatever on top).
I'm also hoping that intel's modifiers are definitely the one and only that we ever screwed up, and we should be getting those fixed in the near future too.
So maybe what we should do instead is add a comment to the modifier docs that this stuff _is_ supposed to be transferrable over networks and work.
Personally I was not sure if it was so. Good to hear it is. Writing it down would be much appreciated.
While at it, could you also write down something about the requirements of memory layout documentation? What I mean is, is it required that the memory layout is publicly specified *somewhere* if not in the modifier doc itself?
It's not necessary for anyone to actually know the memory layout when the use cases only involve hardware access, but if there is no public spec I fear it would be easy to adopt an incompatible layout somewhere and never be able to notice until some rare case of interoperability mysteriously produces garbage.
Thanks, pq
Hi Pekka, Brian, Daniel,
On 06/03/2020 11:13, Daniel Vetter wrote:
On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:
On Tue, 3 Mar 2020 15:25:41 +0200 Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 3 Mar 2020 12:37:16 +0100 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey brian.starkey@arm.com wrote:
Hi,
On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
...
> +/*
> + * Amlogic Video Framebuffer Compression modifiers
> + *
> + * Amlogic uses a proprietary lossless image compression protocol and format
> + * for their hardware video codec accelerators, either video decoders or
> + * video input encoders.
> + *
> + * It considerably reduces memory bandwidth while writing and reading
> + * frames in memory.
> + * Implementation details may be platform and SoC specific, and shared
> + * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
This one is definitely only for the SCATTER modifier, not the DEFAULT and MEM_SAVING.
It can be an issue in two ways:
- If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
It's clearly not.
The DEFAULT and MEM_SAVING modifiers are clearly transferable, and their layout is extremely simple. While we don't have the memory compression algorithm, the memory layout is simple to describe and doesn't act as a sub-modifier.
The complexity lies in the SCATTER modifier, which describes an instantaneous, live memory layout that is not transferable and has an unknown and variable layout.
- If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
It's not the case for the DEFAULT and MEM_SAVING modifiers.
The SCATTER modifier is mandatory for the Amlogic G12A and G12B HW video decoder, but the same HW is capable of displaying the non-SCATTER buffer for example.
Playing devil's advocate, the comment sounds similar to I915_FORMAT_MOD_{X,Y}_TILED:
 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.
Yeah which we regret now. We need to now roll out a new set of modifiers for at least some of the differences in these on the modern-ish chips (the old crap is pretty much lost cause anyway).
This was kinda a nasty hack to smooth things over since we have epic amounts of userspace, but it's really not a great idea (and no one else really has epic amounts of existing userspace that uses tiling flags everywhere, this is all new code). -Daniel
Isn't the statement that this is for sharing between producer and decoder _on the same platform_ a similar clause with the same effect?
What advantage is there to exposing the gory details? For Arm AFBC it's necessary because IP on the SoC can be (likely to be) from different vendors with different capabilities.
If this is only for talking between Amlogic IP on the same SoC, and those devices support all the same "flavours", I don't see what is gained by making userspace care about internals.
The trouble is if you mix&match IP cores, and one of them supports flavours A, B, C and the other C, D, E. But all you have is a single magic modifier for "whatever the flavour is that soc prefers". So someone gets to stuff this in DT.
This is not the case here. Maybe I should make the "DEFAULT" modifier explicit, with a bit like "BASIC", to explicitly define support for the currently defined DEFAULT mode.
Also eventually, maybe, perhaps ARM does grow up into the client/server space with add-on pcie graphics, and at least for client you very often end up with integrated + add-in pcie gpu. At that point you really can't have magic per-soc modifiers anymore.
Hi,
I also heard that Pipewire will copy buffers and modifiers verbatim from one machine to another when streaming across network, assuming that the same modifier means the same thing on all machines.[Citation needed]
Transferring AFBC buffers doesn't sound like a good idea to me....
If that is something that must not be done with DRM modifiers, then please contact them and document that.
Sorry, it's waypipe, not pipewire: https://gitlab.freedesktop.org/mstoeckl/waypipe/
I do think this is very much something we want to make possible. They might pick a silly modifier (compression modifiers only compress bw, by necessity the lossless ones have to increase storage space so kinda dumb thing to push over the network if you don't add .xz or whatever on top).
AFBC and Amlogic FBC are not size-optimized compressions, but rather layout- and memory-access-optimized compressions. Without proper network size compression on top, transferring plain NV12 would be the same.
I'm also hoping that intel's modifiers are definitely the one and only that we ever screwed up, and we should be getting those fixed in the near future too.
I'd like that too.
So maybe what we should do instead is add a comment to the modifier docs that this stuff _is_ supposed to be transferrable over networks and work.
Only "SCATTER" is not transferable; the other options are definitely transferable, across 6 families and at least 15 different upstream-supported SoCs.
Should it be in the modifier description? Should I add a reserved bit in the Amlogic modifier space describing its non-transferable nature?
-Daniel
Neil
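As an illustration of the reserved-bit idea raised at the end of the message above, here is a minimal sketch. The vendor packing (vendor ID in bits 63:56) mirrors `fourcc_mod_code()` in drm_fourcc.h and the 0x0a vendor value comes from the patch; the `EXAMPLE_FBC_LOCAL_ONLY` bit, its position, and the helper names are purely hypothetical and are not part of the posted series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor value proposed by the patch under review. */
#define MOD_VENDOR_AMLOGIC 0x0aULL

/* Same packing as fourcc_mod_code(): vendor in bits 63:56, value below. */
#define MOD_CODE(vendor, val) \
	(((uint64_t)(vendor) << 56) | ((uint64_t)(val) & 0x00ffffffffffffffULL))

/* HYPOTHETICAL option bit marking a layout as valid on this machine only. */
#define EXAMPLE_FBC_LOCAL_ONLY (1ULL << 8)

/* Extract the vendor ID from a 64-bit modifier. */
static inline uint8_t mod_vendor(uint64_t modifier)
{
	return (uint8_t)(modifier >> 56);
}

/* True only for Amlogic modifiers carrying the hypothetical local-only bit. */
static inline bool example_amlogic_mod_is_local_only(uint64_t modifier)
{
	return mod_vendor(modifier) == MOD_VENDOR_AMLOGIC &&
	       (modifier & EXAMPLE_FBC_LOCAL_ONLY) != 0;
}
```

A proxy that forwards buffers between machines could use such a check to refuse scatter-style buffers, at the cost of having to understand one vendor-specific bit.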
On Fri, 6 Mar 2020 15:40:01 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
Hi Pekka, Brian, Daniel,
On 06/03/2020 11:13, Daniel Vetter wrote:
On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:
...
Sorry, it's waypipe, not pipewire: https://gitlab.freedesktop.org/mstoeckl/waypipe/
I do think this is very much something we want to make possible. They might pick a silly modifier (compression modifiers only compress bw, by necessity the lossless ones have to increase storage space so kinda dumb thing to push over the network if you don't add .xz or whatever on top).
AFBC and Amlogic FBC are not size-optimized compressions, but rather layout- and memory-access-optimized compressions. Without proper network size compression on top, transferring plain NV12 would be the same.
FWIW, waypipe is not intended to be the most efficient network streaming protocol, but it is intended to be a direct Wayland protocol proxy (X11 forwarding, anyone?), which means that it needs to be able to transmit also dmabuf buffers as is. It does not want to understand modifiers but just send opaque data.
It may or may not do lossless compression of the data it sends over the wire, but it will replicate the dmabuf on the remote end.
Or so I'm told.
Thanks, pq
On Tue, Mar 03, 2020 at 12:37:16PM +0100, Daniel Vetter wrote:
On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey brian.starkey@arm.com wrote:
Hi,
On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
On Fri, 21 Feb 2020 10:08:42 +0100 Neil Armstrong narmstrong@baylibre.com wrote:
Amlogic uses a proprietary lossless image compression protocol and format for their hardware video codec accelerators, either video decoders or video input encoders.
It considerably reduces memory bandwidth while writing and reading frames in memory.
The underlying storage is considered to be 3 components, 8bit or 10-bit per component, YCbCr 420, single plane :
- DRM_FORMAT_YUV420_8BIT
- DRM_FORMAT_YUV420_10BIT
This modifier will be notably added to DMA-BUF frames imported from the V4L2 Amlogic VDEC decoder.
At least two options are supported :
- Scatter mode: the buffer is filled with an IOMMU scatter table referring to the encoder's current memory layout. This mode is more efficient in terms of memory allocation, but frames are not dumpable and are only valid until the buffer is freed and back under the control of the encoder
- Memory saving: when the pixel bpp is 8b, the size of the superblock can be reduced, thus saving memory.
Signed-off-by: Neil Armstrong narmstrong@baylibre.com
 include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 8bc0b31597d8..8a6e87bacadb 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -309,6 +309,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
+#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a

 /* add more to the end as needed */

@@ -804,6 +805,61 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)

+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
Hi,
after a lengthy IRC discussion on #dri-devel, this "may be platform and SoC specific" is a problem.
It can be an issue in two ways:
If something in the data acts like a sub-modifier, then advertising support for one modifier does not really tell if the data layout is supported or not.
If you need to know the platform and/or SoC to be able to interpret the data, it means the modifier is ill-defined and cannot be used in inter-machine communication (e.g. Pipewire).
Playing devil's advocate, the comment sounds similar to I915_FORMAT_MOD_{X,Y}_TILED:
 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.
Yeah which we regret now. We need to now roll out a new set of modifiers for at least some of the differences in these on the modern-ish chips (the old crap is pretty much lost cause anyway).
This was kinda a nasty hack to smooth things over since we have epic amounts of userspace, but it's really not a great idea (and no one else really has epic amounts of existing userspace that uses tiling flags everywhere, this is all new code). -Daniel
Isn't the statement that this is for sharing between producer and decoder _on the same platform_ a similar clause with the same effect?
What advantage is there to exposing the gory details? For Arm AFBC it's necessary because IP on the SoC can be (likely to be) from different vendors with different capabilities.
If this is only for talking between Amlogic IP on the same SoC, and those devices support all the same "flavours", I don't see what is gained by making userspace care about internals.
The trouble is if you mix&match IP cores, and one of them supports flavours A, B, C and the other C, D, E. But all you have is a single magic modifier for "whatever the flavour is that soc prefers". So someone gets to stuff this in DT.
Yes, if incompatible support levels are possible, then they must be described, no disagreement there. That's why AFBC is so explicit.
Also eventually, maybe, perhaps ARM does grow up into the client/server space with add-on pcie graphics, and at least for client you very often end up with integrated + add-in pcie gpu. At that point you really can't have magic per-soc modifiers anymore.
I don't entirely agree. This is only relevant for modifiers which might be used between the PCIe GPU and the SoC (in your example). Per-SoC modifiers still work, they just lose meaning at the SoC boundary.
Looking at the description of DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER in particular, it sounds like that would never be shareable even if it had a more "complete" modifier.
If people get confused I'm happy to add a "WARNING: This was a dumb idea for backwards compat with legacy code, no one with new stuff ever repeat it" to the i915 modifiers. -Daniel
I think marking it as non-preferred (and why) would be a good idea, so as not to use it as an example.
Cheers, -Brian
Thanks, -Brian
Neil mentioned the data contains a "header" that further specifies things, but there is no specification about the header itself. Therefore I don't think we can even know if the header contains something that acts like a sub-modifier or not.
All this sounds like the modifier definitions here are not enough to fully interpret the data. At the very least I would expect a reference to a document explaining the "header", or even better, a kernel ReST doc.
I wonder if this is at all suitable as a DRM format modifier as is. I have been assuming that a modifier together with all the usual FB parameters should be enough to interpret the stored data, but in this case I doubt that it actually is.
I have no problem with proprietary data layouts as long as they are fully specified.
I do feel like I would not be able to write a software decoder for this set of modifiers given the details below.
Thanks, pq
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Add the registers of the VPU VD1 Amlogic FBC decoder module, and routing register.
Signed-off-by: Neil Armstrong narmstrong@baylibre.com
---
 drivers/gpu/drm/meson/meson_registers.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/meson/meson_registers.h b/drivers/gpu/drm/meson/meson_registers.h
index 8ea00546cd4e..08631fdfe4b9 100644
--- a/drivers/gpu/drm/meson/meson_registers.h
+++ b/drivers/gpu/drm/meson/meson_registers.h
@@ -144,10 +144,15 @@
 #define VIU_SW_RESET_OSD1 BIT(0)
 #define VIU_MISC_CTRL0 0x1a06
 #define VIU_CTRL0_VD1_AFBC_MASK 0x170000
+#define VIU_CTRL0_AFBC_TO_VD1 BIT(20)
 #define VIU_MISC_CTRL1 0x1a07
 #define MALI_AFBC_MISC GENMASK(15, 8)
 #define D2D3_INTF_LENGTH 0x1a08
 #define D2D3_INTF_CTRL0 0x1a09
+#define VD1_AFBCD0_MISC_CTRL 0x1a0a
+#define VD1_AXI_SEL_AFBC (1 << 12)
+#define AFBC_VD1_SEL (1 << 10)
+#define VD2_AFBCD1_MISC_CTRL 0x1a0b
 #define VIU_OSD1_CTRL_STAT 0x1a10
 #define VIU_OSD1_OSD_BLK_ENABLE BIT(0)
 #define VIU_OSD1_OSD_MEM_MODE_LINEAR BIT(2)
@@ -365,6 +370,23 @@
 #define VIU_OSD1_OETF_LUT_ADDR_PORT 0x1add
 #define VIU_OSD1_OETF_LUT_DATA_PORT 0x1ade
 #define AFBC_ENABLE 0x1ae0
+#define AFBC_MODE 0x1ae1
+#define AFBC_SIZE_IN 0x1ae2
+#define AFBC_DEC_DEF_COLOR 0x1ae3
+#define AFBC_CONV_CTRL 0x1ae4
+#define AFBC_LBUF_DEPTH 0x1ae5
+#define AFBC_HEAD_BADDR 0x1ae6
+#define AFBC_BODY_BADDR 0x1ae7
+#define AFBC_SIZE_OUT 0x1ae8
+#define AFBC_OUT_YSCOPE 0x1ae9
+#define AFBC_STAT 0x1aea
+#define AFBC_VD_CFMT_CTRL 0x1aeb
+#define AFBC_VD_CFMT_W 0x1aec
+#define AFBC_MIF_HOR_SCOPE 0x1aed
+#define AFBC_MIF_VER_SCOPE 0x1aee
+#define AFBC_PIXEL_HOR_SCOPE 0x1aef
+#define AFBC_PIXEL_VER_SCOPE 0x1af0
+#define AFBC_VD_CFMT_H 0x1af1

 /* vpp */
 #define VPP_DUMMY_DATA 0x1d00
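The overlay patch later in this series fills these registers through macros of the form `FIELD_PREP(GENMASK(hi, lo), value)`. For readers unfamiliar with the idiom, here is a minimal userspace sketch of that packing. `EX_GENMASK`, `ex_field_prep` and `ex_afbc_size_in` are simplified 32-bit stand-ins written for this illustration, not the kernel implementations; only the bit ranges (28:16 and 12:0 for AFBC_SIZE_IN) are taken from the patch.

```c
#include <stdint.h>

/* Simplified 32-bit stand-in for GENMASK(): bits hi..lo set, hi <= 31. */
#define EX_GENMASK(hi, lo) \
	((uint32_t)((~0u >> (31 - (hi))) & (~0u << (lo))))

/*
 * Simplified stand-in for FIELD_PREP(): shift the value up to the lowest
 * set bit of the mask, then drop any bits that fall outside the mask.
 * The mask must be non-zero.
 */
static inline uint32_t ex_field_prep(uint32_t mask, uint32_t value)
{
	unsigned int shift = 0;

	while (!((mask >> shift) & 1u))
		shift++;
	return (value << shift) & mask;
}

/* Mirrors AFBC_HSIZE_IN() | AFBC_VSIZE_IN() from the overlay patch. */
static inline uint32_t ex_afbc_size_in(uint32_t hsize, uint32_t vsize)
{
	return ex_field_prep(EX_GENMASK(28, 16), hsize) |
	       ex_field_prep(EX_GENMASK(12, 0), vsize);
}
```

For example, `ex_afbc_size_in(1920, 1088)` evaluates to 0x07800440. Note that in this sketch a value too large for its field is silently truncated by the mask.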
Setup the Amlogic FBC decoder for the VD1 video overlay plane.
The VD1 Amlogic FBC decoder is integrated in the pipeline like the YUV pixel reading/formatter, but uses a direct memory address instead.
The default mode needs to calculate the content body size, since the header is allocated after it.
The scatter mode needs simpler management: only the header is needed, since it contains an IOMMU scatter table to locate the superblocks in memory.
Signed-off-by: Neil Armstrong narmstrong@baylibre.com
---
 drivers/gpu/drm/meson/meson_drv.h     |  16 ++
 drivers/gpu/drm/meson/meson_overlay.c | 257 +++++++++++++++++++++++++-
 2 files changed, 265 insertions(+), 8 deletions(-)
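The diff that follows aligns the visible window to the decoder's block grid: the vertical range is rounded to multiples of 4 lines and the horizontal range to multiples of 32 pixels. Below is a small worked sketch of the vertical arithmetic only, following the `afbc_top`/`afbc_bottom` computation in the patch. The struct, function and macro names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Power-of-two rounding, simplified stand-ins for the kernel helpers. */
#define EX_ROUND_DOWN(x, a) ((x) & ~((a) - 1u))
#define EX_ROUND_UP(x, a)   (((x) + (a) - 1u) & ~((a) - 1u))

#define EX_BLOCK_LINES 4u	/* vertical granularity used by the patch */

struct ex_afbc_rows {
	uint32_t top;		/* first decoded line, multiple of 4 */
	uint32_t bottom;	/* one past the last decoded line, multiple of 4 */
	uint32_t blk_first;	/* first block row to fetch */
	uint32_t blk_last;	/* last block row to fetch */
	uint32_t pixel_first;	/* first visible line, relative to top */
	uint32_t pixel_last;	/* last visible line, relative to top */
};

/* Compute the decoded and visible vertical ranges for lines start..end. */
struct ex_afbc_rows ex_afbc_vertical_window(uint32_t vd_start_line,
					     uint32_t vd_end_line)
{
	struct ex_afbc_rows r;

	r.top = EX_ROUND_DOWN(vd_start_line, EX_BLOCK_LINES);
	r.bottom = EX_ROUND_UP(vd_end_line + 1u, EX_BLOCK_LINES);
	r.blk_first = r.top / EX_BLOCK_LINES;
	r.blk_last = r.bottom / EX_BLOCK_LINES - 1u;
	r.pixel_first = vd_start_line - r.top;
	r.pixel_last = vd_end_line - r.top;
	return r;
}
```

For a visible range of lines 6 to 9, this gives a decoded window of lines 4 to 12 (block rows 1 to 2), with the visible part at lines 2 to 5 inside that window. A full 1080-line frame (lines 0 to 1079) needs no padding and spans block rows 0 to 269.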
diff --git a/drivers/gpu/drm/meson/meson_drv.h b/drivers/gpu/drm/meson/meson_drv.h
index 04fdf3826643..da951964e988 100644
--- a/drivers/gpu/drm/meson/meson_drv.h
+++ b/drivers/gpu/drm/meson/meson_drv.h
@@ -80,6 +80,7 @@ struct meson_drm {

 		bool vd1_enabled;
 		bool vd1_commit;
+		bool vd1_afbc;
 		unsigned int vd1_planes;
 		uint32_t vd1_if0_gen_reg;
 		uint32_t vd1_if0_luma_x0;
@@ -105,6 +106,21 @@ struct meson_drm {
 		uint32_t vd1_height0;
 		uint32_t vd1_height1;
 		uint32_t vd1_height2;
+		uint32_t vd1_afbc_mode;
+		uint32_t vd1_afbc_en;
+		uint32_t vd1_afbc_head_addr;
+		uint32_t vd1_afbc_body_addr;
+		uint32_t vd1_afbc_conv_ctrl;
+		uint32_t vd1_afbc_dec_def_color;
+		uint32_t vd1_afbc_vd_cfmt_ctrl;
+		uint32_t vd1_afbc_vd_cfmt_w;
+		uint32_t vd1_afbc_vd_cfmt_h;
+		uint32_t vd1_afbc_mif_hor_scope;
+		uint32_t vd1_afbc_mif_ver_scope;
+		uint32_t vd1_afbc_size_out;
+		uint32_t vd1_afbc_pixel_hor_scope;
+		uint32_t vd1_afbc_pixel_ver_scope;
+		uint32_t vd1_afbc_size_in;
 		uint32_t vpp_pic_in_height;
 		uint32_t vpp_postblend_vd1_h_start_end;
 		uint32_t vpp_postblend_vd1_v_start_end;
diff --git a/drivers/gpu/drm/meson/meson_overlay.c b/drivers/gpu/drm/meson/meson_overlay.c
index 2468b0212d52..1fbb81732e9a 100644
--- a/drivers/gpu/drm/meson/meson_overlay.c
+++ b/drivers/gpu/drm/meson/meson_overlay.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2015 Amlogic, Inc. All rights reserved.
  */

+#define DEBUG
 #include <linux/bitfield.h>

 #include <drm/drm_atomic.h>
@@ -76,6 +77,84 @@
 #define VD_REGION24_START(value) FIELD_PREP(GENMASK(11, 0), value)
 #define VD_REGION13_END(value) FIELD_PREP(GENMASK(27, 16), value)
+/* AFBC_ENABLE */
+#define AFBC_DEC_ENABLE BIT(8)
+#define AFBC_FRM_START BIT(0)
+
+/* AFBC_MODE */
+#define AFBC_HORZ_SKIP_UV(value) FIELD_PREP(GENMASK(1, 0), value)
+#define AFBC_VERT_SKIP_UV(value) FIELD_PREP(GENMASK(3, 2), value)
+#define AFBC_HORZ_SKIP_Y(value) FIELD_PREP(GENMASK(5, 4), value)
+#define AFBC_VERT_SKIP_Y(value) FIELD_PREP(GENMASK(7, 6), value)
+#define AFBC_COMPBITS_YUV(value) FIELD_PREP(GENMASK(13, 8), value)
+#define AFBC_COMPBITS_8BIT 0
+#define AFBC_COMPBITS_10BIT (2 | (2 << 2) | (2 << 4))
+#define AFBC_BURST_LEN(value) FIELD_PREP(GENMASK(15, 14), value)
+#define AFBC_HOLD_LINE_NUM(value) FIELD_PREP(GENMASK(22, 16), value)
+#define AFBC_MIF_URGENT(value) FIELD_PREP(GENMASK(25, 24), value)
+#define AFBC_REV_MODE(value) FIELD_PREP(GENMASK(27, 26), value)
+#define AFBC_BLK_MEM_MODE BIT(28)
+#define AFBC_SCATTER_MODE BIT(29)
+#define AFBC_SOFT_RESET BIT(31)
+
+/* AFBC_SIZE_IN */
+#define AFBC_HSIZE_IN(value) FIELD_PREP(GENMASK(28, 16), value)
+#define AFBC_VSIZE_IN(value) FIELD_PREP(GENMASK(12, 0), value)
+
+/* AFBC_DEC_DEF_COLOR */
+#define AFBC_DEF_COLOR_Y(value) FIELD_PREP(GENMASK(29, 20), value)
+#define AFBC_DEF_COLOR_U(value) FIELD_PREP(GENMASK(19, 10), value)
+#define AFBC_DEF_COLOR_V(value) FIELD_PREP(GENMASK(9, 0), value)
+
+/* AFBC_CONV_CTRL */
+#define AFBC_CONV_LBUF_LEN(value) FIELD_PREP(GENMASK(11, 0), value)
+
+/* AFBC_LBUF_DEPTH */
+#define AFBC_DEC_LBUF_DEPTH(value) FIELD_PREP(GENMASK(27, 16), value)
+#define AFBC_MIF_LBUF_DEPTH(value) FIELD_PREP(GENMASK(11, 0), value)
+
+/* AFBC_OUT_XSCOPE/AFBC_SIZE_OUT */
+#define AFBC_HSIZE_OUT(value) FIELD_PREP(GENMASK(28, 16), value)
+#define AFBC_VSIZE_OUT(value) FIELD_PREP(GENMASK(12, 0), value)
+#define AFBC_OUT_HORZ_BGN(value) FIELD_PREP(GENMASK(28, 16), value)
+#define AFBC_OUT_HORZ_END(value) FIELD_PREP(GENMASK(12, 0), value)
+
+/* AFBC_OUT_YSCOPE */
+#define AFBC_OUT_VERT_BGN(value) FIELD_PREP(GENMASK(28, 16), value)
+#define AFBC_OUT_VERT_END(value) FIELD_PREP(GENMASK(12, 0), value)
+
+/* AFBC_VD_CFMT_CTRL */
+#define AFBC_HORZ_RPT_PIXEL0 BIT(23)
+#define AFBC_HORZ_Y_C_RATIO(value) FIELD_PREP(GENMASK(22, 21), value)
+#define AFBC_HORZ_FMT_EN BIT(20)
+#define AFBC_VERT_RPT_LINE0 BIT(16)
+#define AFBC_VERT_INITIAL_PHASE(value) FIELD_PREP(GENMASK(11, 8), value)
+#define AFBC_VERT_PHASE_STEP(value) FIELD_PREP(GENMASK(7, 1), value)
+#define AFBC_VERT_FMT_EN BIT(0)
+
+/* AFBC_VD_CFMT_W */
+#define AFBC_VD_V_WIDTH(value) FIELD_PREP(GENMASK(11, 0), value)
+#define AFBC_VD_H_WIDTH(value) FIELD_PREP(GENMASK(27, 16), value)
+
+/* AFBC_MIF_HOR_SCOPE */
+#define AFBC_MIF_BLK_BGN_H(value) FIELD_PREP(GENMASK(25, 16), value)
+#define AFBC_MIF_BLK_END_H(value) FIELD_PREP(GENMASK(9, 0), value)
+
+/* AFBC_MIF_VER_SCOPE */
+#define AFBC_MIF_BLK_BGN_V(value) FIELD_PREP(GENMASK(27, 16), value)
+#define AFBC_MIF_BLK_END_V(value) FIELD_PREP(GENMASK(11, 0), value)
+
+/* AFBC_PIXEL_HOR_SCOPE */
+#define AFBC_DEC_PIXEL_BGN_H(value) FIELD_PREP(GENMASK(28, 16), value)
+#define AFBC_DEC_PIXEL_END_H(value) FIELD_PREP(GENMASK(12, 0), value)
+
+/* AFBC_PIXEL_VER_SCOPE */
+#define AFBC_DEC_PIXEL_BGN_V(value) FIELD_PREP(GENMASK(28, 16), value)
+#define AFBC_DEC_PIXEL_END_V(value) FIELD_PREP(GENMASK(12, 0), value)
+
+/* AFBC_VD_CFMT_H */
+#define AFBC_VD_HEIGHT(value) FIELD_PREP(GENMASK(12, 0), value)
+
 struct meson_overlay {
 	struct drm_plane base;
 	struct meson_drm *priv;
@@ -157,6 +236,9 @@ static void meson_overlay_setup_scaler_params(struct meson_drm *priv,
 	unsigned int ratio_x, ratio_y;
 	int temp_height, temp_width;
 	unsigned int w_in, h_in;
+	int afbc_left, afbc_right;
+	int afbc_top_src, afbc_bottom_src;
+	int afbc_top, afbc_bottom;
 	int temp, start, end;
if (!crtc_state) { @@ -169,7 +251,7 @@ static void meson_overlay_setup_scaler_params(struct meson_drm *priv,
w_in = fixed16_to_int(state->src_w); h_in = fixed16_to_int(state->src_h); - crop_top = fixed16_to_int(state->src_x); + crop_top = fixed16_to_int(state->src_y); crop_left = fixed16_to_int(state->src_x);
video_top = state->crtc_y; @@ -243,6 +325,14 @@ static void meson_overlay_setup_scaler_params(struct meson_drm *priv, DRM_DEBUG("vsc startp %d endp %d start_lines %d end_lines %d\n", vsc_startp, vsc_endp, vd_start_lines, vd_end_lines);
+ afbc_top = round_down(vd_start_lines, 4); + afbc_bottom = round_up(vd_end_lines + 1, 4); + afbc_top_src = 0; + afbc_bottom_src = round_up(h_in + 1, 4); + + DRM_DEBUG("afbc top %d (src %d) bottom %d (src %d)\n", + afbc_top, afbc_top_src, afbc_bottom, afbc_bottom_src); + /* Horizontal */
start = video_left + video_width / 2 - ((w_in << 17) / ratio_x); @@ -278,6 +368,16 @@ static void meson_overlay_setup_scaler_params(struct meson_drm *priv, DRM_DEBUG("hsc startp %d endp %d start_lines %d end_lines %d\n", hsc_startp, hsc_endp, hd_start_lines, hd_end_lines);
+ if (hd_start_lines > 0 || (hd_end_lines < w_in)) { + afbc_left = 0; + afbc_right = round_up(w_in, 32); + } else { + afbc_left = round_down(hd_start_lines, 32); + afbc_right = round_up(hd_end_lines + 1, 32); + } + + DRM_DEBUG("afbc left %d right %d\n", afbc_left, afbc_right); + priv->viu.vpp_vsc_start_phase_step = ratio_y << 6;
priv->viu.vpp_vsc_ini_phase = vphase << 8; @@ -293,6 +393,35 @@ static void meson_overlay_setup_scaler_params(struct meson_drm *priv, VD_H_WIDTH(hd_end_lines - hd_start_lines + 1) | VD_V_WIDTH(hd_end_lines/2 - hd_start_lines/2 + 1);
+	priv->viu.vd1_afbc_vd_cfmt_w =
+			AFBC_VD_H_WIDTH(afbc_right - afbc_left) |
+			AFBC_VD_V_WIDTH(afbc_right / 2 - afbc_left / 2);
+
+	priv->viu.vd1_afbc_vd_cfmt_h =
+			AFBC_VD_HEIGHT((afbc_bottom - afbc_top) / 2);
+
+	priv->viu.vd1_afbc_mif_hor_scope = AFBC_MIF_BLK_BGN_H(afbc_left / 32) |
+				AFBC_MIF_BLK_END_H((afbc_right / 32) - 1);
+
+	priv->viu.vd1_afbc_mif_ver_scope = AFBC_MIF_BLK_BGN_V(afbc_top / 4) |
+				AFBC_MIF_BLK_END_V((afbc_bottom / 4) - 1);
+
+	priv->viu.vd1_afbc_size_out =
+			AFBC_HSIZE_OUT(afbc_right - afbc_left) |
+			AFBC_VSIZE_OUT(afbc_bottom - afbc_top);
+
+	priv->viu.vd1_afbc_pixel_hor_scope =
+			AFBC_DEC_PIXEL_BGN_H(hd_start_lines - afbc_left) |
+			AFBC_DEC_PIXEL_END_H(hd_end_lines - afbc_left);
+
+	priv->viu.vd1_afbc_pixel_ver_scope =
+			AFBC_DEC_PIXEL_BGN_V(vd_start_lines - afbc_top) |
+			AFBC_DEC_PIXEL_END_V(vd_end_lines - afbc_top);
+
+	priv->viu.vd1_afbc_size_in =
+			AFBC_HSIZE_IN(afbc_right - afbc_left) |
+			AFBC_VSIZE_IN(afbc_bottom_src - afbc_top_src);
+
 	priv->viu.vd1_if0_luma_y0 = VD_Y_START(vd_start_lines) |
 				    VD_Y_END(vd_end_lines);
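The window arithmetic in the hunks above can be checked outside the kernel. The following is a minimal userspace sketch, not driver code: `rnd_down`, `rnd_up`, `field_prep`, `struct afbc_window` and the `afbc_*` helpers are hypothetical stand-ins for `round_down()`, `round_up()` and `FIELD_PREP(GENMASK())`. It only models the case where the window follows the scaler start/end positions, rounded to the 32x4 granularity used by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Granularity used by the hunk above: 32 pixels wide, 4 lines high. */
#define SB_W 32
#define SB_H 4

/* Hypothetical stand-ins for the kernel's round_down()/round_up(). */
static inline int rnd_down(int x, int a) { return x - (x % a); }
static inline int rnd_up(int x, int a)   { return rnd_down(x + a - 1, a); }

/* Userspace stand-in for FIELD_PREP(GENMASK(hi, lo), value). */
static inline uint32_t field_prep(unsigned hi, unsigned lo, uint32_t value)
{
	uint32_t upper = (hi == 31) ? 0xffffffffu : ((1u << (hi + 1)) - 1u);
	uint32_t mask = upper & ~((1u << lo) - 1u);

	return (value << lo) & mask;
}

struct afbc_window {
	int left, right;	/* in pixels, right is exclusive */
	int top, bottom;	/* in lines, bottom is exclusive */
};

/* Round an inclusive pixel/line range outwards to block boundaries. */
static struct afbc_window afbc_align(int h_start, int h_end,
				     int v_start, int v_end)
{
	struct afbc_window w;

	assert(h_start >= 0 && h_end >= h_start);
	assert(v_start >= 0 && v_end >= v_start);

	w.left = rnd_down(h_start, SB_W);
	w.right = rnd_up(h_end + 1, SB_W);
	w.top = rnd_down(v_start, SB_H);
	w.bottom = rnd_up(v_end + 1, SB_H);
	return w;
}

/* AFBC_MIF_HOR_SCOPE: first block in bits 25:16, last block in bits 9:0. */
static uint32_t afbc_mif_hor_scope(const struct afbc_window *w)
{
	return field_prep(25, 16, w->left / SB_W) |
	       field_prep(9, 0, w->right / SB_W - 1);
}

/* AFBC_MIF_VER_SCOPE: first block in bits 27:16, last block in bits 11:0. */
static uint32_t afbc_mif_ver_scope(const struct afbc_window *w)
{
	return field_prep(27, 16, w->top / SB_H) |
	       field_prep(11, 0, w->bottom / SB_H - 1);
}
```

With an uncropped 1920x1080 source, `afbc_align(0, 1919, 0, 1079)` yields 60 block columns and 270 block rows, so the last-block fields hold 59 and 269.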
@@ -350,11 +479,63 @@ static void meson_overlay_atomic_update(struct drm_plane *plane,

 	spin_lock_irqsave(&priv->drm->event_lock, flags);

-	priv->viu.vd1_if0_gen_reg = VD_URGENT_CHROMA |
-				    VD_URGENT_LUMA |
-				    VD_HOLD_LINES(9) |
-				    VD_CHRO_RPT_LASTL_CTRL |
-				    VD_ENABLE;
+	if ((fb->modifier & DRM_FORMAT_MOD_AMLOGIC_FBC(0)) ==
+			    DRM_FORMAT_MOD_AMLOGIC_FBC(0)) {
+		priv->viu.vd1_afbc = true;
+
+		priv->viu.vd1_afbc_mode = AFBC_MIF_URGENT(3) |
+					  AFBC_HOLD_LINE_NUM(8) |
+					  AFBC_BURST_LEN(2);
+
+		if (fb->modifier & DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER)
+			priv->viu.vd1_afbc_mode |= AFBC_SCATTER_MODE;
+
+		if (fb->modifier & DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING)
+			priv->viu.vd1_afbc_mode |= AFBC_BLK_MEM_MODE;
+
+		priv->viu.vd1_afbc_en = 0x1600 | AFBC_DEC_ENABLE;
+
+		priv->viu.vd1_afbc_conv_ctrl = AFBC_CONV_LBUF_LEN(256);
+
+		priv->viu.vd1_afbc_dec_def_color = AFBC_DEF_COLOR_Y(1023);
+
+		/* 420: horizontal / 2, vertical / 4 */
+		priv->viu.vd1_afbc_vd_cfmt_ctrl = AFBC_HORZ_RPT_PIXEL0 |
+						  AFBC_HORZ_Y_C_RATIO(1) |
+						  AFBC_HORZ_FMT_EN |
+						  AFBC_VERT_RPT_LINE0 |
+						  AFBC_VERT_INITIAL_PHASE(12) |
+						  AFBC_VERT_PHASE_STEP(8) |
+						  AFBC_VERT_FMT_EN;
+
+		switch (fb->format->format) {
+		/* AFBC Only formats */
+		case DRM_FORMAT_YUV420_10BIT:
+			priv->viu.vd1_afbc_mode |=
+				AFBC_COMPBITS_YUV(AFBC_COMPBITS_10BIT);
+			priv->viu.vd1_afbc_dec_def_color |=
+				AFBC_DEF_COLOR_U(512) |
+				AFBC_DEF_COLOR_V(512);
+			break;
+		case DRM_FORMAT_YUV420_8BIT:
+			priv->viu.vd1_afbc_dec_def_color |=
+				AFBC_DEF_COLOR_U(128) |
+				AFBC_DEF_COLOR_V(128);
+			break;
+		}
+
+		priv->viu.vd1_if0_gen_reg = 0;
+		priv->viu.vd1_if0_canvas0 = 0;
+		priv->viu.viu_vd1_fmt_ctrl = 0;
+	} else {
+		priv->viu.vd1_afbc = false;
+
+		priv->viu.vd1_if0_gen_reg = VD_URGENT_CHROMA |
+					    VD_URGENT_LUMA |
+					    VD_HOLD_LINES(9) |
+					    VD_CHRO_RPT_LASTL_CTRL |
+					    VD_ENABLE;
+	}

 	/* Setup scaler params */
 	meson_overlay_setup_scaler_params(priv, plane, interlace_mode);
@@ -370,6 +551,7 @@ static void meson_overlay_atomic_update(struct drm_plane *plane,
 	priv->viu.vd1_if0_gen_reg2 = 0;
 	priv->viu.viu_vd1_fmt_ctrl = 0;

+	/* None will match for AFBC Only formats */
 	switch (fb->format->format) {
 	/* TOFIX DRM_FORMAT_RGB888 should be supported */
 	case DRM_FORMAT_YUYV:
@@ -488,13 +670,42 @@ static void meson_overlay_atomic_update(struct drm_plane *plane,
 		priv->viu.vd1_stride0 = fb->pitches[0];
 		priv->viu.vd1_height0 =
 			drm_format_info_plane_height(fb->format,
-						fb->height, 0);
+						     fb->height, 0);
 		DRM_DEBUG("plane 0 addr 0x%x stride %d height %d\n",
 			 priv->viu.vd1_addr0,
 			 priv->viu.vd1_stride0,
 			 priv->viu.vd1_height0);
 	}

+	if (priv->viu.vd1_afbc) {
+		if (priv->viu.vd1_afbc_mode & AFBC_SCATTER_MODE) {
+			/*
+			 * In Scatter mode, the header contains the physical
+			 * body content layout, thus the body content
+			 * size isn't needed.
+			 */
+			priv->viu.vd1_afbc_head_addr = priv->viu.vd1_addr0 >> 4;
+			priv->viu.vd1_afbc_body_addr = 0;
+		} else {
+			/* Default mode is 4k per superblock */
+			unsigned long block_size = 4096;
+			unsigned long body_size;
+
+			/* 8bit mem saving mode is 3072bytes per superblock */
+			if (priv->viu.vd1_afbc_mode & AFBC_BLK_MEM_MODE)
+				block_size = 3072;
+
+			body_size = (ALIGN(priv->viu.vd1_stride0, 64) / 64) *
+				    (ALIGN(priv->viu.vd1_height0, 32) / 32) *
+				    block_size;
+
+			priv->viu.vd1_afbc_body_addr = priv->viu.vd1_addr0 >> 4;
+			/* Header is after body content */
+			priv->viu.vd1_afbc_head_addr = (priv->viu.vd1_addr0 +
+							body_size) >> 4;
+		}
+	}
+
 	priv->viu.vd1_enabled = true;
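The body size and header placement computed in the hunk above can be reproduced with plain integers. This is a minimal userspace sketch under stated assumptions: `align_up`, `afbc_body_size`, `struct afbc_addrs` and `afbc_layout` are hypothetical names, `align_up` stands in for the kernel's `ALIGN()`, and the register values are simply the byte addresses shifted right by 4 as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Superblock sizes quoted in the patch comments. */
#define SB_BYTES_DEFAULT	4096u
#define SB_BYTES_MEM_SAVING	3072u

/* Userspace stand-in for the kernel's ALIGN() macro. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0);
	return (x + a - 1) / a * a;
}

/* Body size: one superblock per 64 stride units by 32 lines. */
static uint64_t afbc_body_size(uint32_t stride, uint32_t height,
			       bool mem_saving)
{
	uint64_t block_size = mem_saving ? SB_BYTES_MEM_SAVING
					 : SB_BYTES_DEFAULT;

	return (align_up(stride, 64) / 64) *
	       (align_up(height, 32) / 32) *
	       block_size;
}

struct afbc_addrs {
	uint32_t head;	/* value for the header base address register */
	uint32_t body;	/* value for the body base address register */
};

static struct afbc_addrs afbc_layout(uint64_t base, uint32_t stride,
				     uint32_t height, bool scatter,
				     bool mem_saving)
{
	struct afbc_addrs a;

	if (scatter) {
		/* The header describes the body layout, no body address. */
		a.head = (uint32_t)(base >> 4);
		a.body = 0;
	} else {
		/* Body first, header right after the body content. */
		a.body = (uint32_t)(base >> 4);
		a.head = (uint32_t)((base +
			afbc_body_size(stride, height, mem_saving)) >> 4);
	}
	return a;
}
```

For a stride of 1920 and a height of 1080 this gives 30 x 34 = 1020 superblocks, so 4177920 bytes of body in the default mode and 3133440 bytes in memory-saving mode.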
 	spin_unlock_irqrestore(&priv->drm->event_lock, flags);
@@ -531,6 +742,23 @@ static const struct drm_plane_helper_funcs meson_overlay_helper_funcs = {
 	.prepare_fb = drm_gem_fb_prepare_fb,
 };

+static bool meson_overlay_format_mod_supported(struct drm_plane *plane,
+					       u32 format, u64 modifier)
+{
+	if (modifier == DRM_FORMAT_MOD_LINEAR &&
+	    format != DRM_FORMAT_YUV420_8BIT &&
+	    format != DRM_FORMAT_YUV420_10BIT)
+		return true;
+
+	if ((modifier & DRM_FORMAT_MOD_AMLOGIC_FBC(0)) ==
+			DRM_FORMAT_MOD_AMLOGIC_FBC(0) &&
+	    (format == DRM_FORMAT_YUV420_8BIT ||
+	     format == DRM_FORMAT_YUV420_10BIT))
+		return true;
+
+	return false;
+}
+
 static const struct drm_plane_funcs meson_overlay_funcs = {
 	.update_plane = drm_atomic_helper_update_plane,
 	.disable_plane = drm_atomic_helper_disable_plane,
@@ -538,6 +766,7 @@ static const struct drm_plane_funcs meson_overlay_funcs = {
 	.reset = drm_atomic_helper_plane_reset,
 	.atomic_duplicate_state = drm_atomic_helper_plane_duplicate_state,
 	.atomic_destroy_state = drm_atomic_helper_plane_destroy_state,
+	.format_mod_supported = meson_overlay_format_mod_supported,
 };
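The format/modifier pairing enforced by `meson_overlay_format_mod_supported()` can be exercised in isolation. The sketch below is a userspace approximation, not the driver code: every `SK_`/`sk_` name is hypothetical, and the Amlogic vendor code (0x0a) and the two option bits are assumptions, since the real definitions live in the drm_fourcc.h patch of this series, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as fourcc_code()/fourcc_mod_code() in drm_fourcc.h. */
#define SK_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define SK_MOD_CODE(vendor, val) \
	(((uint64_t)(vendor) << 56) | ((val) & 0x00ffffffffffffffULL))

#define SK_FMT_NV12		SK_FOURCC('N', 'V', '1', '2')
#define SK_FMT_YUV420_8BIT	SK_FOURCC('Y', 'U', '0', '8')
#define SK_FMT_YUV420_10BIT	SK_FOURCC('Y', 'U', '1', '0')

#define SK_MOD_LINEAR		SK_MOD_CODE(0, 0)

/* ASSUMPTION: vendor code and option bits, not taken from this patch. */
#define SK_VENDOR_AMLOGIC	0x0a
#define SK_FBC_SCATTER		(1ULL << 0)
#define SK_FBC_MEM_SAVING	(1ULL << 1)
#define SK_MOD_AMLOGIC_FBC(opts) SK_MOD_CODE(SK_VENDOR_AMLOGIC, opts)

/* Same decision logic as the function added by the hunk above. */
static bool sk_format_mod_supported(uint32_t format, uint64_t modifier)
{
	bool fbc_only = format == SK_FMT_YUV420_8BIT ||
			format == SK_FMT_YUV420_10BIT;

	/* Linear buffers are accepted for every non-FBC format. */
	if (modifier == SK_MOD_LINEAR && !fbc_only)
		return true;

	/* FBC modifiers are accepted only for the two FBC-only formats. */
	if ((modifier & SK_MOD_AMLOGIC_FBC(0)) == SK_MOD_AMLOGIC_FBC(0) &&
	    fbc_only)
		return true;

	return false;
}
```

Under these assumed values the mask test accepts any modifier whose vendor byte contains the bits of 0x0a, for instance a vendor byte of 0x0b. Comparing the vendor byte for equality would be stricter, if that matters for future vendor codes.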
 static const uint32_t supported_drm_formats[] = {
@@ -549,6 +778,18 @@ static const uint32_t supported_drm_formats[] = {
 	DRM_FORMAT_YUV420,
 	DRM_FORMAT_YUV411,
 	DRM_FORMAT_YUV410,
+	DRM_FORMAT_YUV420_8BIT, /* Amlogic FBC Only */
+	DRM_FORMAT_YUV420_10BIT, /* Amlogic FBC Only */
+};
+
+static const uint64_t format_modifiers[] = {
+	DRM_FORMAT_MOD_AMLOGIC_FBC(DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER |
+				   DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING),
+	DRM_FORMAT_MOD_AMLOGIC_FBC(DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER),
+	DRM_FORMAT_MOD_AMLOGIC_FBC(DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING),
+	DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT,
+	DRM_FORMAT_MOD_LINEAR,
+	DRM_FORMAT_MOD_INVALID,
 };

 int meson_overlay_create(struct meson_drm *priv)
@@ -570,7 +811,7 @@ int meson_overlay_create(struct meson_drm *priv)
 				       &meson_overlay_funcs,
 				       supported_drm_formats,
 				       ARRAY_SIZE(supported_drm_formats),
-				       NULL,
+				       format_modifiers,
 				       DRM_PLANE_TYPE_OVERLAY,
 				       "meson_overlay_plane");

 	drm_plane_helper_add(plane, &meson_overlay_helper_funcs);
Since the VD1 Amlogic FBC decoder is now configured by the overlay driver, commit the right registers to decode the Amlogic FBC frame.
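The split described here, where the overlay code fills a software copy of the register values and the CRTC code writes them out later, can be modelled with a mock register file. This is only an illustrative sketch: `struct vd1_shadow`, `vd1_commit` and the `REG_*` indices are hypothetical, only three of the AFBC registers are modelled, and clearing the commit flag after the write is an assumption about code outside this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical indices into a mock register file. */
enum { REG_AFBC_ENABLE, REG_AFBC_MODE, REG_AFBC_HEAD_BADDR, REG_COUNT };

/* Software copy of the values prepared by the plane update. */
struct vd1_shadow {
	bool enabled;
	bool commit;
	bool afbc;
	uint32_t afbc_en;
	uint32_t afbc_mode;
	uint32_t afbc_head_addr;
};

/* Write the prepared values out, as the IRQ handler does at commit time. */
static void vd1_commit(struct vd1_shadow *s, uint32_t regs[REG_COUNT])
{
	if (!s->enabled || !s->commit)
		return;

	if (s->afbc) {
		regs[REG_AFBC_HEAD_BADDR] = s->afbc_head_addr;
		regs[REG_AFBC_ENABLE] = s->afbc_en;
		regs[REG_AFBC_MODE] = s->afbc_mode;
	} else {
		/* Linear path: make sure the FBC decoder is switched off. */
		regs[REG_AFBC_ENABLE] = 0;
	}

	/* ASSUMPTION: the flag is cleared once the values are written. */
	s->commit = false;
}
```

Switching a plane from a compressed to a linear framebuffer only clears the enable register in this model; the other AFBC registers keep their last values, which matches the else branch of the diff below.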
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
---
 drivers/gpu/drm/meson/meson_crtc.c | 118 +++++++++++++++++++++--------
 1 file changed, 88 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/meson/meson_crtc.c b/drivers/gpu/drm/meson/meson_crtc.c
index e66b6271ff58..d6dcfd654e9c 100644
--- a/drivers/gpu/drm/meson/meson_crtc.c
+++ b/drivers/gpu/drm/meson/meson_crtc.c
@@ -291,6 +291,10 @@ static void meson_crtc_enable_vd1(struct meson_drm *priv)
 			    VPP_VD1_PREBLEND | VPP_VD1_POSTBLEND |
 			    VPP_COLOR_MNG_ENABLE,
 			    priv->io_base + _REG(VPP_MISC));
+
+	writel_bits_relaxed(VIU_CTRL0_AFBC_TO_VD1,
+			    priv->viu.vd1_afbc ? VIU_CTRL0_AFBC_TO_VD1 : 0,
+			    priv->io_base + _REG(VIU_MISC_CTRL0));
 }

 static void meson_g12a_crtc_enable_vd1(struct meson_drm *priv)
@@ -300,6 +304,10 @@ static void meson_g12a_crtc_enable_vd1(struct meson_drm *priv)
 		       VD_BLEND_POSTBLD_SRC_VD1 |
 		       VD_BLEND_POSTBLD_PREMULT_EN,
 		       priv->io_base + _REG(VD1_BLEND_SRC_CTRL));
+
+	writel_relaxed(priv->viu.vd1_afbc ?
+		       (VD1_AXI_SEL_AFBC | AFBC_VD1_SEL) : 0,
+		       priv->io_base + _REG(VD1_AFBCD0_MISC_CTRL));
 }

 void meson_crtc_irq(struct meson_drm *priv)
@@ -383,36 +391,86 @@ void meson_crtc_irq(struct meson_drm *priv)
 	/* Update the VD1 registers */
 	if (priv->viu.vd1_enabled && priv->viu.vd1_commit) {

-		switch (priv->viu.vd1_planes) {
-		case 3:
-			meson_canvas_config(priv->canvas,
-					    priv->canvas_id_vd1_2,
-					    priv->viu.vd1_addr2,
-					    priv->viu.vd1_stride2,
-					    priv->viu.vd1_height2,
-					    MESON_CANVAS_WRAP_NONE,
-					    MESON_CANVAS_BLKMODE_LINEAR,
-					    MESON_CANVAS_ENDIAN_SWAP64);
-		/* fallthrough */
-		case 2:
-			meson_canvas_config(priv->canvas,
-					    priv->canvas_id_vd1_1,
-					    priv->viu.vd1_addr1,
-					    priv->viu.vd1_stride1,
-					    priv->viu.vd1_height1,
-					    MESON_CANVAS_WRAP_NONE,
-					    MESON_CANVAS_BLKMODE_LINEAR,
-					    MESON_CANVAS_ENDIAN_SWAP64);
-		/* fallthrough */
-		case 1:
-			meson_canvas_config(priv->canvas,
-					    priv->canvas_id_vd1_0,
-					    priv->viu.vd1_addr0,
-					    priv->viu.vd1_stride0,
-					    priv->viu.vd1_height0,
-					    MESON_CANVAS_WRAP_NONE,
-					    MESON_CANVAS_BLKMODE_LINEAR,
-					    MESON_CANVAS_ENDIAN_SWAP64);
+		if (priv->viu.vd1_afbc) {
+			writel_relaxed(priv->viu.vd1_afbc_head_addr,
+				       priv->io_base +
+				       _REG(AFBC_HEAD_BADDR));
+			writel_relaxed(priv->viu.vd1_afbc_body_addr,
+				       priv->io_base +
+				       _REG(AFBC_BODY_BADDR));
+			writel_relaxed(priv->viu.vd1_afbc_en,
+				       priv->io_base +
+				       _REG(AFBC_ENABLE));
+			writel_relaxed(priv->viu.vd1_afbc_mode,
+				       priv->io_base +
+				       _REG(AFBC_MODE));
+			writel_relaxed(priv->viu.vd1_afbc_size_in,
+				       priv->io_base +
+				       _REG(AFBC_SIZE_IN));
+			writel_relaxed(priv->viu.vd1_afbc_dec_def_color,
+				       priv->io_base +
+				       _REG(AFBC_DEC_DEF_COLOR));
+			writel_relaxed(priv->viu.vd1_afbc_conv_ctrl,
+				       priv->io_base +
+				       _REG(AFBC_CONV_CTRL));
+			writel_relaxed(priv->viu.vd1_afbc_size_out,
+				       priv->io_base +
+				       _REG(AFBC_SIZE_OUT));
+			writel_relaxed(priv->viu.vd1_afbc_vd_cfmt_ctrl,
+				       priv->io_base +
+				       _REG(AFBC_VD_CFMT_CTRL));
+			writel_relaxed(priv->viu.vd1_afbc_vd_cfmt_w,
+				       priv->io_base +
+				       _REG(AFBC_VD_CFMT_W));
+			writel_relaxed(priv->viu.vd1_afbc_mif_hor_scope,
+				       priv->io_base +
+				       _REG(AFBC_MIF_HOR_SCOPE));
+			writel_relaxed(priv->viu.vd1_afbc_mif_ver_scope,
+				       priv->io_base +
+				       _REG(AFBC_MIF_VER_SCOPE));
+			writel_relaxed(priv->viu.vd1_afbc_pixel_hor_scope,
+				       priv->io_base +
+				       _REG(AFBC_PIXEL_HOR_SCOPE));
+			writel_relaxed(priv->viu.vd1_afbc_pixel_ver_scope,
+				       priv->io_base +
+				       _REG(AFBC_PIXEL_VER_SCOPE));
+			writel_relaxed(priv->viu.vd1_afbc_vd_cfmt_h,
+				       priv->io_base +
+				       _REG(AFBC_VD_CFMT_H));
+		} else {
+			switch (priv->viu.vd1_planes) {
+			case 3:
+				meson_canvas_config(priv->canvas,
+						    priv->canvas_id_vd1_2,
+						    priv->viu.vd1_addr2,
+						    priv->viu.vd1_stride2,
+						    priv->viu.vd1_height2,
+						    MESON_CANVAS_WRAP_NONE,
+						    MESON_CANVAS_BLKMODE_LINEAR,
+						    MESON_CANVAS_ENDIAN_SWAP64);
+				/* fallthrough */
+			case 2:
+				meson_canvas_config(priv->canvas,
+						    priv->canvas_id_vd1_1,
+						    priv->viu.vd1_addr1,
+						    priv->viu.vd1_stride1,
+						    priv->viu.vd1_height1,
+						    MESON_CANVAS_WRAP_NONE,
+						    MESON_CANVAS_BLKMODE_LINEAR,
+						    MESON_CANVAS_ENDIAN_SWAP64);
+				/* fallthrough */
+			case 1:
+				meson_canvas_config(priv->canvas,
+						    priv->canvas_id_vd1_0,
+						    priv->viu.vd1_addr0,
+						    priv->viu.vd1_stride0,
+						    priv->viu.vd1_height0,
+						    MESON_CANVAS_WRAP_NONE,
+						    MESON_CANVAS_BLKMODE_LINEAR,
+						    MESON_CANVAS_ENDIAN_SWAP64);
+			}
+
+			writel_relaxed(0, priv->io_base + _REG(AFBC_ENABLE));
 		}

 		writel_relaxed(priv->viu.vd1_if0_gen_reg,