On Wed, 10 Nov 2021 14:32:26 -0300 Igor Torrente igormtorrente@gmail.com wrote:
Hi Pekka,
On Tue, Nov 9, 2021 at 6:32 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 26 Oct 2021 08:34:00 -0300 Igor Torrente igormtorrente@gmail.com wrote:
Summary
This series of patches refactors some vkms components in order to introduce new formats to the planes and the writeback connector.
Now in the blend function, the plane's pixels are converted to ARGB16161616 and then blended together.
The CRC is calculated based on the ARGB16161616 buffer. If required, this buffer is then copied/converted to the writeback buffer format.
To handle the pixel conversion, new functions were added to convert from a specific format to ARGB16161616 (and in the reverse direction).
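To make the per-line flow described above concrete, here is a minimal user-space sketch. It is not the code from the patches: the struct and function names are hypothetical, only XRGB8888 is handled, and the blend assumes premultiplied alpha. It shows a line being converted to 16 bits per channel, blended, and converted back as a writeback step might do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intermediate pixel, 16 bits per channel (ARGB16161616). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Expand an 8-bit channel to 16 bits so that 0xff maps exactly to 0xffff. */
static uint16_t expand_8_to_16(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/* Reduce a 16-bit channel to 8 bits, rounding to nearest. */
static uint8_t reduce_16_to_8(uint16_t v)
{
	return (uint8_t)((v + 128u) / 257u);
}

/* Convert one line of XRGB8888; the X byte is ignored and alpha is opaque. */
static void xrgb8888_line_to_argb_u16(const uint32_t *src,
				      struct pixel_argb_u16 *dst, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		dst[i].a = 0xffff;
		dst[i].r = expand_8_to_16((uint8_t)(src[i] >> 16));
		dst[i].g = expand_8_to_16((uint8_t)(src[i] >> 8));
		dst[i].b = expand_8_to_16((uint8_t)src[i]);
	}
}

/* Assumes premultiplied alpha: out = src + dst * (1 - alpha), clamped. */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t v = (uint64_t)src * 0xffff + (uint64_t)dst * (0xffffu - alpha);

	v = (v + 0x7fff) / 0xffff;
	return (uint16_t)(v > 0xffff ? 0xffff : v);
}

/* Blend one source line over one destination line, in place. */
static void blend_line(const struct pixel_argb_u16 *src,
		       struct pixel_argb_u16 *dst, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint16_t alpha = src[i].a;

		dst[i].r = blend_channel(src[i].r, dst[i].r, alpha);
		dst[i].g = blend_channel(src[i].g, dst[i].g, alpha);
		dst[i].b = blend_channel(src[i].b, dst[i].b, alpha);
		dst[i].a = blend_channel(src[i].a, dst[i].a, alpha);
	}
}

/* Convert the blended line back to XRGB8888 (X byte written as 0xff). */
static void argb_u16_line_to_xrgb8888(const struct pixel_argb_u16 *src,
				      uint32_t *dst, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = 0xff000000u |
			 ((uint32_t)reduce_16_to_8(src[i].r) << 16) |
			 ((uint32_t)reduce_16_to_8(src[i].g) << 8) |
			 reduce_16_to_8(src[i].b);
}

/* Usage example: a two-pixel primary line with a cursor line on top. */
static void demo_blend_one_line(void)
{
	const uint32_t primary[2] = { 0x00112233, 0x00ffffff };
	/* Cursor: pixel 0 fully transparent, pixel 1 opaque mid-grey. */
	const struct pixel_argb_u16 cursor[2] = {
		{ 0, 0, 0, 0 },
		{ 0xffff, 0x8080, 0x8080, 0x8080 },
	};
	struct pixel_argb_u16 line[2];
	uint32_t writeback[2];

	xrgb8888_line_to_argb_u16(primary, line, 2);
	blend_line(cursor, line, 2);
	argb_u16_line_to_xrgb8888(line, writeback, 2);

	assert(writeback[0] == 0xff112233); /* primary shows through */
	assert(writeback[1] == 0xff808080); /* cursor replaces primary */
}
```

Working on whole lines keeps the format dispatch outside the inner loop, which is presumably where most of the gain over a per-pixel approach comes from.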
Tests
This patch series was tested using the following igt tests: -t ".*kms_plane.*" -t ".*kms_writeback.*" -t ".*kms_cursor_crc*" -t ".*kms_flip.*"
New tests passing
- pipe-A-cursor-size-change
- pipe-A-cursor-alpha-transparent
Performance
Following some optimizations proposed by Pekka Paalanen, the code now runs way faster than V1 and slightly faster than the current implementation.

Frametime:

| Implementation  | Current | Per-pixel (V1) | Per-line (V2) |
|:---------------:|:-------:|:--------------:|:-------------:|
| Frametime range | 8~22 ms | 32~56 ms       | 6~19 ms       |
| Average         | 10.0 ms | 35.8 ms        | 8.6 ms        |
Wow, that's much better than I expected.
What is your benchmark? That is, what program do you use and what operations does it trigger to produce these measurements? What are the sizes of all the planes/buffers involved? What kind of CPU was this run on?
1 and 2) I just measured the frametime of the IGT test ".*kms_cursor_crc*" using jiffies. I collected all the frametimes, put them into a spreadsheet, calculated some values and drew some histograms.
I mean, it is not the best benchmark, but at least it gives an idea of what is happening.
- The primary plane was 1024x768, but the cursor plane size varies between the tests. All XRGB_8888, if I'm not mistaken.
- I tested it on a QEMU VM running on an Intel Core i5 4440 at ~3.3 GHz.
Hi Igor,
alright, that analysis sounds fine, even though varying cursor plane size is casting some ambiguity on the results.
If you want to dig deeper into measuring this, I would suggest some scenarios if at all possible:
- large primary plane and large cursor plane with 100% overlap, to measure the raw pixel throughput
- large primary plane and small cursor plane with 100% overlap, to measure the efficiency of skipping pixels that do not need blending
- large primary plane and large cursor plane with only a little overlap (cursor largely off-screen), to measure the efficiency of skipping pixels that do not contribute to the end result at all
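To illustrate how differently the three scenarios above load the blend path, here is a rough sketch that counts the pixels where the cursor actually covers the primary plane. The 1024x768 primary size comes from this thread; the rectangle type, the helper names, and the cursor sizes and positions are made up for the example.

```c
#include <assert.h>

/* Hypothetical rectangle in CRTC coordinates; x2 and y2 are exclusive. */
struct rect {
	int x1, y1, x2, y2;
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Number of pixels inside r, or 0 if the rectangle is empty. */
static long rect_area(struct rect r)
{
	if (r.x2 <= r.x1 || r.y2 <= r.y1)
		return 0;
	return (long)(r.x2 - r.x1) * (r.y2 - r.y1);
}

/* Intersection of two rectangles; the result may be empty. */
static struct rect rect_intersect(struct rect a, struct rect b)
{
	struct rect r = {
		max_int(a.x1, b.x1), max_int(a.y1, b.y1),
		min_int(a.x2, b.x2), min_int(a.y2, b.y2),
	};
	return r;
}

/* Pixels that need blending: where the cursor overlaps the primary. */
static long blended_pixels(struct rect primary, struct rect cursor)
{
	return rect_area(rect_intersect(primary, cursor));
}

/* Usage example covering the three suggested scenarios. */
static void demo_scenarios(void)
{
	const struct rect primary = { 0, 0, 1024, 768 };
	/* Assumed cursor placements, chosen only for illustration. */
	const struct rect large_inside = { 0, 0, 512, 512 };
	const struct rect small_inside = { 100, 100, 164, 164 };
	const struct rect mostly_off = { 1000, 744, 1512, 1256 };

	assert(blended_pixels(primary, large_inside) == 512L * 512);
	assert(blended_pixels(primary, small_inside) == 64L * 64);
	assert(blended_pixels(primary, mostly_off) == 24L * 24);
}
```

Comparing frametimes against these counts would show whether the cost scales with the overlapping area or with the full plane sizes.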
But that's only curiosity; I think your existing benchmarks sound perfectly fine as the difference is so big.
Thanks, pq