Hi Lionel,
On 21 January 2016 at 15:03, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
> Hi,
>
> This series introduces pipe-level color management through a set of properties attached to the CRTC. It also provides an implementation for some Intel platforms.
>
> This series is based on a previous set of patches by Shashank Sharma and takes into account the comments by Daniel Stone & Daniel Vetter.
This is a lot more tractable than previous series, thanks! I think a lot of the confusion I had around this was from the number of hardware-specific features stuffed into this, and the manner in which they were stuffed in. For example, with the previous series, it looks like you could configure both PRE_CTM and POST_CTM LUTs in 12-bit mode, which is impossible as the PRM suggests the only way to have both LUTs active is with split-gamma mode. (For anyone else looking at the Broadwell PRM, note the split-gamma mode describes the two LUTs completely backwards: the only thing that makes sense is for the pre-CTM LUT to have a range of 0..1.0, and the post-CTM LUT to have a range of -3.0..3.0, rather than the other way around.)
Now with everything just using split-gamma mode, I'm much happier with how this is looking. I took a look at some other architectures to see how this would fit, and also had a chat with Richard Hughes to clear some things up. AMD seems to support every possible mode under the sun, so should support any API we come up with. Most other architectures only implement a single gamma table (equivalent to the legacy gamma ramp), but there is one I have fairly detailed documentation for which also supports everything.
The degamma/colour-transform-matrix/gamma model is definitely a good one, and it seems like everyone agrees on a 3x3 matrix for CTM. So far, so good. What I worry about is the _values_ we put into the CTM.
Intel supports two quite fun properties of matrix output. Firstly, the range is (-3.0..3.0) rather than the (0.0..1.0) you might expect. Negative values are axis-mirrored, i.e. lut2_index = fabs(matrix_output), thus giving us a LUT range of (0.0..3.0). Secondly, whilst (0.0..1.0) is represented by linear LUT entries, the LUT values for (1.0..3.0) are calculated by linear interpolation between LUT entry #512 (i.e. that for 1.0) and a supernumerary entry #513 (the value for 3.0). I haven't seen this supported anywhere else, so I would tend towards mirroring the last in-range value into the supernumerary entry, i.e. emulating saturation of matrix output values above 1.0.
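To make that concrete, here's a rough C model of the lookup behaviour as I read it; the 514-entry layout (entries 0..512 linear over 0.0..1.0, entry 513 holding the 3.0 value) is my interpretation for illustration, not authoritative:

    #include <math.h>

    /* Model of the extended-range post-CTM LUT described above.
     * Assumes 514 entries: 0..512 cover 0.0..1.0 linearly, and
     * entry 513 holds the value for 3.0, with 1.0..3.0 linearly
     * interpolated between entries 512 and 513. */
    static double lut2_lookup(const double lut[514], double matrix_output)
    {
        double v = fabs(matrix_output); /* axis mirroring */

        if (v <= 1.0) {
            double pos = v * 512.0;
            int lo = (int)pos;

            if (lo >= 512)
                return lut[512];
            return lut[lo] + (pos - lo) * (lut[lo + 1] - lut[lo]);
        }

        if (v > 3.0)
            v = 3.0;
        /* overshoot segment: lerp between entries 512 and 513 */
        return lut[512] + ((v - 1.0) / 2.0) * (lut[513] - lut[512]);
    }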
I don't really know what to do about negative values as CTM output, since the doc I have here is silent on whether negative values are similarly axis-mirrored/sign-stripped, or whether they are instead clamped to 0.0. Either way, I'm not really sure it's behaviour we can rely upon to be portable.
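To spell out the two hypotheses (both unverified, purely illustrative):

    #include <math.h>

    /* Hypothesis 1: negative matrix output is axis-mirrored. */
    static double neg_mirrored(double matrix_output)
    {
        return fabs(matrix_output);
    }

    /* Hypothesis 2: negative matrix output is clamped to 0.0. */
    static double neg_clamped(double matrix_output)
    {
        return matrix_output < 0.0 ? 0.0 : matrix_output;
    }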
As a detail, the architecture I'm looking at has mixed granularity for the second (post-CTM) LUT: lower RGB-value entries have higher granularity (precision in LUT indexing), with lower granularity for higher entries. I don't think this is a problem though, since we can just decimate in the kernel (i.e. ignore every n'th LUT entry, to write a smaller LUT to hardware than we received from userspace).
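A sketch of that decimation, assuming (illustratively) that the userspace LUT size is an integer multiple of the hardware LUT size:

    #include <stddef.h>
    #include <stdint.h>

    /* Write every n'th entry of the userspace LUT into the smaller
     * hardware LUT; types and sizes are illustrative. */
    static void decimate_lut(uint16_t *hw_lut, size_t hw_len,
                             const uint16_t *user_lut, size_t user_len)
    {
        size_t i, step = user_len / hw_len;

        for (i = 0; i < hw_len; i++)
            hw_lut[i] = user_lut[i * step];
    }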
Anyway, beyond that, it seems there are a few things we agree on:
- optional pre-matrix ('degamma') per-channel LUT of variable length, but (from a userspace point of view) fixed precision, input & output ranges 0.0..1.0
- optional 3x3 matrix with input range [0.0, 1.0], with output values saturating to 1.0, and negative values producing undefined behaviour
- optional post-matrix ('gamma') per-channel LUT of variable length, but (from a userspace point of view) fixed precision, input & output ranges 0.0..1.0
This would mean missing some Intel-specific features, but whether or not this is actually required I don't really know. At least it seems like it would be enough to implement standard ICC profile correction from Weston in a hardware-independent manner.
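For reference, the agreed pipeline in plain floating point could be modelled as below; the names and the nearest-entry LUT sampling are purely illustrative, not any driver's actual code:

    #include <stddef.h>

    struct pipe_cm {
        const double *degamma[3];   /* per-channel, degamma_len entries */
        size_t degamma_len;
        double ctm[9];              /* row-major 3x3 matrix */
        const double *gamma[3];     /* per-channel, gamma_len entries */
        size_t gamma_len;
    };

    static double sample_lut(const double *lut, size_t len, double v)
    {
        size_t i = (size_t)(v * (double)(len - 1) + 0.5);

        return lut[i < len ? i : len - 1];
    }

    /* Apply degamma LUT -> 3x3 CTM -> gamma LUT to one normalised
     * (0.0..1.0) RGB pixel, saturating the matrix output to [0, 1]. */
    static void pipe_cm_apply(const struct pipe_cm *cm, double rgb[3])
    {
        double tmp[3];
        int i, j;

        for (i = 0; i < 3; i++)
            tmp[i] = sample_lut(cm->degamma[i], cm->degamma_len, rgb[i]);

        for (i = 0; i < 3; i++) {
            double acc = 0.0;

            for (j = 0; j < 3; j++)
                acc += cm->ctm[i * 3 + j] * tmp[j];
            /* negative output is undefined per the list above;
             * clamping is one safe choice */
            rgb[i] = acc < 0.0 ? 0.0 : (acc > 1.0 ? 1.0 : acc);
        }

        for (i = 0; i < 3; i++)
            rgb[i] = sample_lut(cm->gamma[i], cm->gamma_len, rgb[i]);
    }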
Thierry, Alex, did you have any comments or ideas on this?
Cheers, Daniel
-----Original Message-----
From: Daniel Stone [mailto:daniel@fooishbar.org]
Sent: Friday, January 22, 2016 10:05 AM
To: Lionel Landwerlin
Cc: intel-gfx; Thierry Reding; Deucher, Alexander; dri-devel
Subject: Re: [Intel-gfx] [PATCH 0/6] Pipe level color management

> [Daniel Stone's mail above quoted in full; trimmed]
>
> Thierry, Alex, did you have any comments or ideas on this?
Adding Harry to comment as he's more familiar with our display hardware color management.
Alex
On 22/01/16 15:04, Daniel Stone wrote:
> [...] Most other architectures only implement a single gamma table (equivalent to the legacy gamma ramp), but there is one I have fairly detailed documentation for which also supports everything.
There might be some interest from others in a single 12-bit post-CSC gamma LUT. So I was going to submit another series to enable this as a special case, just for some generations of the Intel hardware, once this work lands.

Obviously, in order to program the hardware in that mode, you would need a userspace specially tuned for the particular platform on which it's running. This mode wouldn't be exposed through the current set of properties.
> [...] Negative values are axis-mirrored, i.e. lut2_index = fabs(matrix_output), thus giving us a LUT range of (0.0..3.0). [...]
It's good you mention this, because I wrote a test assuming negative values would be clamped to (0.0..1.0) and the test passes :/ So maybe there is something fishy here...
> I don't really know what to do about negative values as CTM output, since the doc I have here is silent on whether negative values are similarly axis-mirrored/sign-stripped, or whether they are instead clamped to 0.0. Either way, I'm not really sure it's behaviour we can rely upon to be portable.
Maybe we should compute both and verify at least one works?
We also discussed briefly the multiplication results. My tests show the Intel hardware seems to truncate (round down) the results. Let's say you have a 255 pixel going through a 0.5 coefficient: you should get 127.5, which would round to 128; Intel hardware gives us 127. Same for a pixel at 255 and a coefficient of 0.25: you would get 63.75, so 64 rounded, and my results show 63.
Again, computing both hypotheses might be a solution.
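For reference, a small program modelling both hypotheses with an assumed 0.16 fixed-point coefficient format (the real hardware format may differ); truncation reproduces my observed 127/63, rounding gives the expected 128/64:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t half = 0x8000, quarter = 0x4000; /* 0.5, 0.25 in 0.16 */
        uint32_t px = 255;

        /* truncating multiply: prints 127, 63 */
        printf("trunc: %u %u\n",
               (px * half) >> 16, (px * quarter) >> 16);
        /* rounding multiply: prints 128, 64 */
        printf("round: %u %u\n",
               (px * half + 0x8000) >> 16, (px * quarter + 0x8000) >> 16);
        return 0;
    }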
Thanks for your feedback!
- Lionel
On Fri, Jan 22, 2016 at 04:06:15PM +0000, Lionel Landwerlin wrote:
> On 22/01/16 15:04, Daniel Stone wrote:
>> [...]
>
> There might be some interest from others in a single 12-bit post-CSC gamma LUT. So I was going to submit another series to enable this as a special case, just for some generations of the Intel hardware, once this work lands.
>
> Obviously, in order to program the hardware in that mode, you would need a userspace specially tuned for the particular platform on which it's running. This mode wouldn't be exposed through the current set of properties.
Yeah, the idea there is that we don't tell userspace about this explicitly, but it can still be used, i.e. a) in the config properties we advertise the sizes of the split gamma tables, b) but when userspace supplies an atomic request that matches the larger 12-bit gamma, and no pre-CTM gamma, then we'll use that.
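Roughly, with made-up names and sizes (not the actual i915 code), the check could look like:

    #include <stdbool.h>

    #define FULL_12BIT_GAMMA_SIZE 1024 /* illustrative */

    struct crtc_color_state { /* illustrative, not the DRM API */
        const void *degamma_lut;
        const void *gamma_lut;
        unsigned int gamma_lut_size;
    };

    /* Use the single 12-bit gamma mode when userspace supplied the
     * larger gamma table and no pre-CTM (degamma) table. */
    static bool use_12bit_gamma(const struct crtc_color_state *s)
    {
        return !s->degamma_lut && s->gamma_lut &&
               s->gamma_lut_size == FULL_12BIT_GAMMA_SIZE;
    }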
> [...]
>
>> As a detail, the architecture I'm looking at has mixed granularity for the second (post-CTM) LUT [...] I don't think this is a problem though, since we can just decimate in the kernel (i.e. ignore every n'th LUT entry, to write a smaller LUT to hardware than we received from userspace).
This was the original plan (iirc one of the BYT LUTs is like this too).
>> Anyway, beyond that, it seems there are a few things we agree on:
>> - optional pre-matrix ('degamma') per-channel LUT of variable length [...]
>> - optional 3x3 matrix with input range [0.0, 1.0], with output values saturating to 1.0, and negative values producing undefined behaviour
>> - optional post-matrix ('gamma') per-channel LUT of variable length [...]
Yeah, I thought the same here: we'll support 0.0..1.0 and anything that overshoots is platform-defined. I have no idea what to do with the overshoot LUT table entries, so I guess we'll just leave those until someone screams badly ;-)
Hi,
On 22 January 2016 at 16:21, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Fri, Jan 22, 2016 at 04:06:15PM +0000, Lionel Landwerlin wrote:
>> [...]
>
> Yeah, the idea there is that we don't tell userspace about this explicitly, but it can still be used, i.e. a) in the config properties we advertise the sizes of the split gamma tables, b) but when userspace supplies an atomic request that matches the larger 12-bit gamma, and no pre-CTM gamma, then we'll use that.
Right, exactly. My concern was mainly that it seemed incredibly easy to program broken state (12-bit LUT seemingly active both pre- and post-CTM). I honestly wasn't sure whether the code or my expectations were wrong though. But adding it back properly is just fine.
>>> [...]
>>
>> It's good you mention this, because I wrote a test assuming negative values would be clamped to (0.0..1.0) and the test passes :/ So maybe there is something fishy here...
Hmm, the doc explicitly mentions the axis-mirroring/sign-stripping, but is either so weird or so incorrect that I think you'd probably only be able to tell empirically. ;)
>>> I don't really know what to do about negative values as CTM output [...]
>>
>> Maybe we should compute both and verify at least one works?
Well, I'd have to write the entire CM support for that platform first, which might be a little while ... ;)
>> We also discussed briefly the multiplication results. My tests show the Intel hardware seems to truncate (round down) the results. [...]
>>
>> Again, computing both hypotheses might be a solution.
The docs I have here are silent on what happens. Not much we can do about it really, so I think we can just leave it as platform-dependent.
>>> As a detail, the architecture I'm looking at has mixed granularity for the second (post-CTM) LUT [...]
>
> This was the original plan (iirc one of the BYT LUTs is like this too).
Cool; it's the only thing I could think of which made any sense.
>>> Anyway, beyond that, it seems there are a few things we agree on: [...]
>
> Yeah, I thought the same here: we'll support 0.0..1.0 and anything that overshoots is platform-defined. I have no idea what to do with the overshoot LUT table entries, so I guess we'll just leave those until someone screams badly ;-)
Program the overshoot register (LUT entry #513) to the same value as the final in-range LUT entry (#512), and overshoot becomes saturation, i.e. the desired semantics. I'm not really sure what to do with undershoot (clamp to 0 or mirror axes), but given that's completely hardware-dependent and there doesn't seem to be much we can do about it, I'd leave it at documenting 'don't do that'.
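As a sketch, with the same illustrative 514-entry layout as before (entries 0..512 in range, entry 513 the overshoot value):

    #include <stdint.h>

    /* Copy the final in-range entry into the supernumerary overshoot
     * entry, so matrix outputs above 1.0 interpolate between two
     * identical values, i.e. saturate. */
    static void program_gamma_saturating(uint32_t hw_lut[514],
                                         const uint32_t user_lut[513])
    {
        int i;

        for (i = 0; i <= 512; i++)
            hw_lut[i] = user_lut[i];
        hw_lut[513] = user_lut[512];
    }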
Cheers, Daniel
Hi,
On 22 January 2016 at 15:04, Daniel Stone <daniel@fooishbar.org> wrote:
> [...]
> This is a lot more tractable than previous series, thanks!
One subtlety which just came up on IRC and should probably be documented is the interaction with the 'Broadcast Mode' property. Currently, Broadcast Mode logically sits on top of colour management: all tables passed into CM are assumed to be for 0-255 colour ranges, and setting limited BM instead clamps these tables at apply time to 16-235 (by manually adjusting them in the driver).
It'd be nice to document this, so someone doesn't implement it the other way around, or assume that CM trumps BM, or ...
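For the record, the apply-time adjustment amounts to remapping each full-range table output into limited range before it reaches the hardware; a sketch (rounding choice illustrative):

    #include <stdint.h>

    /* Remap a full-range (0-255) value into limited range (16-235). */
    static uint8_t full_to_limited(uint8_t v)
    {
        return (uint8_t)(16 + (v * (235 - 16) + 127) / 255);
    }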
Cheers, Daniel