On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote:
The display controller (DC) performs isochronous memory transfers and thus has a requirement for a minimum memory bandwidth that shall be fulfilled, otherwise framebuffer data can't be fetched fast enough, which results in a DC data-FIFO underflow followed by visual corruption.
[...]
+static unsigned long
+tegra_plane_overlap_mask(struct drm_crtc_state *state,
+                         const struct drm_plane_state *plane_state)
+{
+        const struct drm_plane_state *other_state;
+        const struct tegra_plane *tegra;
+        unsigned long overlap_mask = 0;
+        struct drm_plane *plane;
+        struct drm_rect rect;
+
+        if (!plane_state->visible || !plane_state->fb)
+                return 0;
+
+        drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
[...]
+        /*
+         * Data-prefetch FIFO will easily help to overcome temporal memory
+         * pressure if other plane overlaps with the cursor plane.
+         */
+        if (tegra_plane_is_cursor(plane_state) && overlap_mask)
+                return 0;
+
+        return overlap_mask;
+}
Since for the cursor plane this always returns 0, you could test tegra_plane_is_cursor() at the start of the function.
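I.e. something along these lines (untested, just moving the existing check up so the loop can be skipped for the cursor plane):

        if (!plane_state->visible || !plane_state->fb)
                return 0;

        /*
         * Data-prefetch FIFO will easily help to overcome temporal memory
         * pressure if other plane overlaps with the cursor plane.
         */
        if (tegra_plane_is_cursor(plane_state))
                return 0;

        drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
                ...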
+static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc,
+                                                 struct drm_atomic_state *state)
[...]
+        /*
+         * For overlapping planes pixel's data is fetched for each plane at
+         * the same time, hence bandwidths are accumulated in this case.
+         * This needs to be taken into account for calculating total bandwidth
+         * consumed by all planes.
+         *
+         * Here we get the overlapping state of each plane, which is a
+         * bitmask of plane indices telling with what planes there is an
+         * overlap. Note that bitmask[plane] includes BIT(plane) in order
+         * to make further code nicer and simpler.
+         */
+        drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) {
+                tegra_state = to_const_tegra_plane_state(plane_state);
+                tegra = to_tegra_plane(plane);
+
+                if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM))
+                        return -EINVAL;
+
+                plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth;
+                mask = tegra_plane_overlap_mask(new_state, plane_state);
+                overlap_mask[tegra->index] = mask;
+
+                if (hweight_long(mask) != 3)
+                        all_planes_overlap_simultaneously = false;
+        }
+        old_state = drm_atomic_get_old_crtc_state(state, crtc);
+        old_dc_state = to_const_dc_state(old_state);
+        /*
+         * Then we calculate maximum bandwidth of each plane state.
+         * The bandwidth includes the plane BW + BW of the "simultaneously"
+         * overlapping planes, where "simultaneously" means areas where DC
+         * fetches from the planes simultaneously during of scan-out process.
+         *
+         * For example, if plane A overlaps with planes B and C, but B and C
+         * don't overlap, then the peak bandwidth will be either in area where
+         * A-and-B or A-and-C planes overlap.
+         *
+         * The plane_peak_bw[] contains peak memory bandwidth values of
+         * each plane, this information is needed by interconnect provider
+         * in order to set up latency allowness based on the peak BW, see
+         * tegra_crtc_update_memory_bandwidth().
+         */
+        for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
+                overlap_bw = 0;
+
+                for_each_set_bit(k, &overlap_mask[i], 3) {
+                        if (k == i)
+                                continue;
+
+                        if (all_planes_overlap_simultaneously)
+                                overlap_bw += plane_peak_bw[k];
+                        else
+                                overlap_bw = max(overlap_bw, plane_peak_bw[k]);
+                }
+
+                new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
+
+                /*
+                 * If plane's peak bandwidth changed (for example plane isn't
+                 * overlapped anymore) and plane isn't in the atomic state,
+                 * then add plane to the state in order to have the bandwidth
+                 * updated.
+                 */
+                if (old_dc_state->plane_peak_bw[i] !=
+                    new_dc_state->plane_peak_bw[i]) {
+                        plane = tegra_crtc_get_plane_by_index(crtc, i);
+                        if (!plane)
+                                continue;
+
+                        plane_state = drm_atomic_get_plane_state(state, plane);
+                        if (IS_ERR(plane_state))
+                                return PTR_ERR(plane_state);
+                }
+        }
[...]
Does it matter to which channel (plane) the peak BW is attached? Would it still work if the first channel specified max(peak_bw of the overlapping planes) and the others only zeroes?
+        /*
+         * Horizontal downscale needs a lower memory latency, which roughly
+         * depends on the scaled width. Trying to tune latency of a memory
+         * client alone will likely result in a strong negative impact on
+         * other memory clients, hence we will request a higher bandwidth
+         * since latency depends on bandwidth. This allows to prevent memory
+         * FIFO underflows for a large plane downscales, meanwhile allowing
+         * display to share bandwidth fairly with other memory clients.
+         */
+        if (src_w > dst_w)
+                mul = (src_w - dst_w) * bpp / 2048 + 1;
+        else
+                mul = 1;
[...]
One point is still unexplained: why is the multiplier proportional to the *difference* between the src and dst widths? Also, I would expect the maximum (worst case) to be pixclock * read_size when src_w/dst_w >= read_size.
BTW, you could move this below the bandwidth calculation and apply it there: if (src_w > dst_w) peak_bandwidth *= ...; e.g.:
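Roughly (untested sketch, reusing the variables from the patch):

        /* bytes per frame, then average bandwidth in bytes/s */
        avg_bandwidth = (bpp * src_w * src_h + 7) / 8;
        avg_bandwidth *= drm_mode_vrefresh(&crtc_state->mode);

        /* mode.clock is in kHz, peak bandwidth in kbit/s */
        peak_bandwidth = crtc_state->mode.clock * bpp;

        /* apply the downscale multiplier to both figures */
        if (src_w > dst_w) {
                mul = (src_w - dst_w) * bpp / 2048 + 1;
                avg_bandwidth *= mul;
                peak_bandwidth *= mul;
        }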
+        /* average bandwidth in bytes/s */
+        avg_bandwidth = (bpp * src_w * src_h * mul + 7) / 8;
+        avg_bandwidth *= drm_mode_vrefresh(&crtc_state->mode);
+
+        /* mode.clock in kHz, peak bandwidth in kbit/s */
+        peak_bandwidth = crtc_state->mode.clock * bpp * mul;
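For a rough sense of scale (hypothetical numbers, not from the patch): a 1920x1080 ARGB8888 plane (bpp = 32, mul = 1) on a 60 Hz mode with mode.clock = 148500 kHz gives avg_bandwidth = (32 * 1920 * 1080 + 7) / 8 * 60 = 497664000 bytes/s (~498 MB/s) and peak_bandwidth = 148500 * 32 = 4752000 kbit/s (~594 MB/s).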
[...]
Best Regards
Michał Mirosław