On 2/13/19 7:21 PM, Helen Koike wrote:
On 2/13/19 5:18 PM, Nicholas Kazlauskas wrote:
The prepare_fb call always happens on new_plane_state.
The drm_atomic_helper_cleanup_planes checks to see if plane state pointer has changed when deciding to call cleanup_fb on either the new_plane_state or the old_plane_state.
For a non-async atomic commit the state pointer is swapped, so this helper calls prepare_fb on the new_plane_state and cleanup_fb on the old_plane_state. This makes sense, since we want to prepare the framebuffer we are going to use and cleanup the the framebuffer we are no longer using.
For the async atomic update helpers this differs. The async atomic update helpers perform in-place updates on the existing state. They call drm_atomic_helper_cleanup_planes but the state pointer is not swapped. This means that prepare_fb is called on the new_plane_state and cleanup_fb is called on the new_plane_state (not the old).
In the case where old_plane_state->fb == new_plane_state->fb then there should be no behavioral difference between an async update and a non-async commit. But there are issues that arise when old_plane_state->fb != new_plane_state->fb.
The first is that the new_plane_state->fb is immediately cleaned up after it has been prepared, so we're using a fb that we shouldn't be.
The second occurs during a sequence of async atomic updates and non-async regular atomic commits. Suppose there are two framebuffers being interleaved in a double-buffering scenario, fb1 and fb2:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
We call cleanup_fb on fb2 twice in this example scenario, and any further use will result in use-after-free.
The simple fix to this problem is to block framebuffer changes in the drm_atomic_helper_async_check function for now.
Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Harry Wentland harry.wentland@amd.com Cc: Andrey Grodzovsky andrey.grodzovsky@amd.com Cc: stable@vger.kernel.org # v4.14+ Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update") Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
drivers/gpu/drm/drm_atomic_helper.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 54e2ae614dcc..f4290f6b0c38 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1602,6 +1602,15 @@ int drm_atomic_helper_async_check(struct drm_device *dev, old_plane_state->crtc != new_plane_state->crtc) return -EINVAL;
- /*
* FIXME: Since prepare_fb and cleanup_fb are always called on
* the new_plane_state for async updates we need to block framebuffer
* changes. This prevents use of a fb that's been cleaned up and
* double cleanups from occuring.
*/
- if (old_plane_state->fb != new_plane_state->fb)
return -EINVAL;
- funcs = plane->helper_private; if (!funcs->atomic_async_update) return -EINVAL;
Hi,
Thank you for working on this.
I have two points I would like to raise:
- cursor update is too slow:
I am working on the v8 of https://lore.kernel.org/patchwork/patch/949264/ "drm/i915: update cursors asynchronously through atomic".
But with this patch applied, the following tests fails:
cursor-vs-flip-atomic-transitions-varying-size cursor-vs-flip-toggle cursor-vs-flip-varying-size
with errors of type:
"CRITICAL: completed 97 cursor updated in a period of 30 flips, we expect to complete approximately 15360 updates, with the threshold set at 7680"
I guess it's because what should be async updates are being turned into non-async updates (which is slower).
Well, I guess we need to work on a proper solution for the FIXME above, this is just an observation/note.
- I have another possibly related issue:
From the tests I am making, unless I did some mistake or miss interpretation (which is possible since I am new to drm), it seems that there is a problem if an async update re-uses the same fb from a previous non-async commit, for example:
- Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2,
- Async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1 #
double prepare fb1
Sorry, this is solved by this patch, I meant this:
- Non-async commit, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb2, - Async commit, oldfb = fb1, newfb = fb1, prepare fb1, cleanup fb1 #double prepare fb1
But I was told that async commit should block if there is a pending commit, so this scenario shouldn't happen, but there is still something off (please see below).
The cursor through atomic patch was "solving" this by bypassing the prepare call with a:
@@ -13495,6 +13503,9 @@ intel_prepare_plane_fb(struct drm_plane *plane, struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->state->fb); int ret;
if (new_state->state->async_update)
return 0;
if (old_obj) { struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(new_state->state,
Which looks really wrong to me, but without it, igt tests (plane_cursor_legacy, kms_cursor_legacy) fail, I get lots of I915_VMA_PIN_OVERFLOW, which suggest double pinning the frame buffers.
Please, ignore this error above, when I ran the test this patch (Block fb changes) wasn't applied when I got this error about pin overflow, but I still have issues:
With this patch (Block fb changes) applied I get several dmesg failures on WARN_ON(to_intel_plane_state(state)->vma) in the intel_plane_destroy_state() function.
https://people.collabora.com/~koike/results-5.0.0-rc1+-v8-tmp/html/problems....
But I don't get this errors with the hack below (that bypass intel_prepare_plane_fb if the buffer is already pinned).
I'm still not sure why, I would really appreciate if any one has any pointers.
Thanks Helen
I did this really ugly test to bypass the prepare call just if the buffer is already pinned: https://gitlab.collabora.com/koike/linux/commit/2d01d4af30347ff43fbac254d8ec...
With this hack igt tests pass.
As I am new to DRM, I would appreciate if someone could confirm that this is a real bug or please clarify if I am miss interpreting anything.
If this is a real bug, then if someone could give me pointers in how to properly fix this it would be great.
Thanks a lot Helen