Hi,
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 36 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 22 deletions(-)
Several DRM/KMS atomic commits can run in parallel if they affect different CRTC. These commits share the global HVS state, so we have some code to make sure we run commits in sequence. This synchronization code is one of the first thing that runs in vc4_atomic_commit_tail().
Another constraints we have is that we need to make sure the HVS clock gets a boost during the commit. That code relies on clk_set_min_rate and will remove the old minimum and set a new one. We also need another, temporary, minimum for the duration of the commit.
The algorithm is thus to set a temporary minimum, drop the previous one, do the commit, and finally set the minimum for the current mode.
However, the part that sets the temporary minimum and drops the older one runs before the commit synchronization code.
Thus, under the proper conditions, we can end up mixing up the minimums and ending up with the wrong one for our current step.
To avoid it, let's move the clock setup in the protected section.
Fixes: d7d96c00e585 ("drm/vc4: hvs: Boost the core clock during modeset") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index f0b3e4cf5bce..764ddb41a4ce 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -353,9 +353,6 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) vc4_hvs_mask_underrun(dev, vc4_crtc_state->assigned_channel); }
- if (vc4->hvs->hvs5) - clk_set_min_rate(hvs->core_clk, 500000000); - old_hvs_state = vc4_hvs_get_old_global_state(state); if (!old_hvs_state) return; @@ -377,6 +374,9 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) drm_err(dev, "Timed out waiting for commit\n"); }
+ if (vc4->hvs->hvs5) + clk_set_min_rate(hvs->core_clk, 500000000); + drm_atomic_helper_commit_modeset_disables(dev, state);
vc4_ctm_commit(vc4, state);
The HVS global state functions return an error pointer, but in most cases we check if it's NULL, possibly resulting in an invalid pointer dereference.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 764ddb41a4ce..3f780c195749 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -354,7 +354,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) }
old_hvs_state = vc4_hvs_get_old_global_state(state); - if (!old_hvs_state) + if (IS_ERR(old_hvs_state)) return;
for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { @@ -410,8 +410,8 @@ static int vc4_atomic_commit_setup(struct drm_atomic_state *state) unsigned int i;
hvs_state = vc4_hvs_get_new_global_state(state); - if (!hvs_state) - return -EINVAL; + if (WARN_ON(IS_ERR(hvs_state))) + return PTR_ERR(hvs_state);
for_each_new_crtc_in_state(state, crtc, crtc_state, i) { struct vc4_crtc_state *vc4_crtc_state = @@ -762,8 +762,8 @@ static int vc4_pv_muxing_atomic_check(struct drm_device *dev, unsigned int i;
hvs_new_state = vc4_hvs_get_global_state(state); - if (!hvs_new_state) - return -EINVAL; + if (IS_ERR(hvs_new_state)) + return PTR_ERR(hvs_new_state);
for (i = 0; i < ARRAY_SIZE(hvs_new_state->fifo_state); i++) if (!hvs_new_state->fifo_state[i].in_use)
Commit 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") introduced a global state for the HVS, with each FIFO storing the current CRTC commit so that we can properly synchronize commits.
However, the refcounting was off and we thus ended up leaking the drm_crtc_commit structure every commit. Add a drm_crtc_commit_put to prevent the leakage.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 3f780c195749..4847d1af399a 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -372,6 +372,8 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) ret = drm_crtc_commit_wait(old_hvs_state->fifo_state[channel].pending_commit); if (ret) drm_err(dev, "Timed out waiting for commit\n"); + + drm_crtc_commit_put(commit); }
if (vc4->hvs->hvs5)
Commit 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") introduced a wait on the previous commit done on a given HVS FIFO.
However, we never cleared that pointer once done. Since drm_crtc_commit_put can free the drm_crtc_commit structure directly if we were the last user, this means that it can lead to a use-after free if we were to duplicate the state, and that stale pointer would even be copied to the new state.
Set the pointer to NULL once we're done with the wait so that we don't carry over a pointer to a free'd structure.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 4847d1af399a..217a2009c651 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -374,6 +374,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) drm_err(dev, "Timed out waiting for commit\n");
drm_crtc_commit_put(commit); + old_hvs_state->fifo_state[channel].pending_commit = NULL; }
if (vc4->hvs->hvs5)
Our HVS global state, when duplicated, will also copy the pointer to the drm_crtc_commit (and increase the reference count) for each FIFO if the pointer is not NULL.
However, our atomic_setup function will overwrite that pointer without putting the reference back leading to a memory leak.
Since the commit is only relevant during the atomic commit process, it doesn't make sense to duplicate the reference to the commit anyway. Let's remove it.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 217a2009c651..6533f3360e75 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -671,12 +671,6 @@ vc4_hvs_channels_duplicate_state(struct drm_private_obj *obj)
for (i = 0; i < HVS_NUM_CHANNELS; i++) { state->fifo_state[i].in_use = old_state->fifo_state[i].in_use; - - if (!old_state->fifo_state[i].pending_commit) - continue; - - state->fifo_state[i].pending_commit = - drm_crtc_commit_get(old_state->fifo_state[i].pending_commit); }
return &state->base;
Our current code is supposed to serialise the commits by waiting for all the drm_crtc_commits associated to the previous HVS state.
However, assuming we have two CRTCs running and being configured and we configure each one alternatively, we end up in a situation where we're not waiting at all.
Indeed, starting with a state (state 0) where both CRTCs are running, and doing a commit (state 1) on the first CRTC (CRTC 0), we'll associate its commit to its assigned FIFO in vc4_hvs_state.
If we get a new commit (state 2), this time affecting the second CRTC (CRTC 1), the DRM core will allow both commits to execute in parallel (assuming they don't have any share resources).
Our code in vc4_atomic_commit_tail is supposed to make sure we only get one commit at a time and serialised by order of submission. It does so by using for_each_old_crtc_in_state, making sure that the CRTC has a FIFO assigned, is used, and has a commit pending. If it does, then we'll wait for the commit before going forward.
During the transition from state 0 to state 1, as our old CRTC state we get the CRTC 0 state 0, its commit, we wait for it, everything works fine.
During the transition from state 1 to state 2 though, the use of for_each_old_crtc_in_state is wrong. Indeed, while the code assumes it's returning the state of the CRTC in the old state (so CRTC 0 state 1), it actually returns the old state of the CRTC affected by the current commit, so CRTC 0 state 0 since it wasn't part of state 1.
Due to this, if we alternate between the configuration of CRTC 0 and CRTC 1, we never actually wait for anything since we should be waiting on the other every time, but it never is affected by the previous commit.
Change the logic to, at every commit, look at every FIFO in the previous HVS state, and if it's in use and has a commit associated to it, wait for that commit.
Fixes: 9ec03d7f1ed3 ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_kms.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 6533f3360e75..d7a2bcb7d723 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -337,10 +337,10 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) struct drm_device *dev = state->dev; struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_hvs *hvs = vc4->hvs; - struct drm_crtc_state *old_crtc_state; struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; struct vc4_hvs_state *old_hvs_state; + unsigned int channel; int i;
for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { @@ -357,15 +357,10 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state) if (IS_ERR(old_hvs_state)) return;
- for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) { - struct vc4_crtc_state *vc4_crtc_state = - to_vc4_crtc_state(old_crtc_state); - unsigned int channel = vc4_crtc_state->assigned_channel; + for (channel = 0; channel < HVS_NUM_CHANNELS; channel++) { + struct drm_crtc_commit *commit; int ret;
- if (channel == VC4_HVS_CHANNEL_DISABLED) - continue; - if (!old_hvs_state->fifo_state[channel].in_use) continue;
Maxime Ripard maxime@cerno.tech 於 2021年11月15日 週一 下午7:31寫道:
Hi,
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 36 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 22 deletions(-)
-- 2.33.1
Thanks to Maxime's information!
I tried to applied this patch series based on the latest mainline kernel at commit commit 8ab774587903 ("Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace"), which almost equals "tags/v5.16-rc1" and tested it on RPi 4B. However, the system hangs and becomes dead at the kernel message:
[drm] Initialized vc4 0.0.0 20140616 for gpu on minor 0
The full dmesg can be found at https://bugzilla.kernel.org/attachment.cgi?id=299603
If I revert the patch series to the original mainline kernel, system can boot up.
BR, Jian-Hong Pan
Hi,
On Wed, Nov 17, 2021 at 03:08:31PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月15日 週一 下午7:31寫道:
Hi,
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 36 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 22 deletions(-)
-- 2.33.1
Thanks to Maxime's information!
I tried to applied this patch series based on the latest mainline kernel at commit commit 8ab774587903 ("Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace"), which almost equals "tags/v5.16-rc1" and tested it on RPi 4B. However, the system hangs and becomes dead at the kernel message:
[drm] Initialized vc4 0.0.0 20140616 for gpu on minor 0
The full dmesg can be found at https://bugzilla.kernel.org/attachment.cgi?id=299603
If I revert the patch series to the original mainline kernel, system can boot up.
Can you share a bit more information on the boot setup you have? Do you have a display connected? If so, on both output or just a single one?
Thanks! Maxime
On Wed, Nov 17, 2021 at 09:24:54AM +0100, Maxime Ripard wrote:
Hi,
On Wed, Nov 17, 2021 at 03:08:31PM +0800, Jian-Hong Pan wrote:
Maxime Ripard maxime@cerno.tech 於 2021年11月15日 週一 下午7:31寫道:
Hi,
The conversion to DRM commit helpers (f3c420fe19f8, "drm/vc4: kms: Convert to atomic helpers") introduced a number of issues in corner cases, most of them showing themselves in the form of either a vblank timeout or use-after-free error.
These patches should fix most of them, some of them still being debugged.
Maxime
Maxime Ripard (6): drm/vc4: kms: Wait for the commit before increasing our clock rate drm/vc4: kms: Fix return code check drm/vc4: kms: Add missing drm_crtc_commit_put drm/vc4: kms: Clear the HVS FIFO commit pointer once done drm/vc4: kms: Don't duplicate pending commit drm/vc4: kms: Fix previous HVS commit wait
drivers/gpu/drm/vc4/vc4_kms.c | 36 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 22 deletions(-)
-- 2.33.1
Thanks to Maxime's information!
I tried to applied this patch series based on the latest mainline kernel at commit commit 8ab774587903 ("Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace"), which almost equals "tags/v5.16-rc1" and tested it on RPi 4B. However, the system hangs and becomes dead at the kernel message:
[drm] Initialized vc4 0.0.0 20140616 for gpu on minor 0
The full dmesg can be found at https://bugzilla.kernel.org/attachment.cgi?id=299603
If I revert the patch series to the original mainline kernel, system can boot up.
Can you share a bit more information on the boot setup you have? Do you have a display connected? If so, on both output or just a single one?
Nevermind, I found what the issue is. I'll send a v2 shortly.
Maxime
dri-devel@lists.freedesktop.org