On Tue, May 2, 2017 at 5:01 AM, Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 28, 2017 at 8:05 PM, Rob Clark robdclark@gmail.com wrote:
The ->preclose() hook is a good place to block for pending atomic updates. We can't do this in ->postclose(), as it needs to happen before drm_fb_release(). Otherwise, since we have already swapped state (in the case of a non-blocking atomic update), the plane_state->fb will be released and cleared before we wait for fences from the atomic-commit wq.
There are probably more complex solutions possible. But since an already scheduled atomic update, possibly blocking on already scheduled gpu/etc fences, will complete eventually (assuming nothing catches fire), the sanest thing seems to be to just block until already scheduled atomic updates complete before tearing things down.
Fixes:
WARNING: CPU: 1 PID: 69 at ../drivers/gpu/drm/drm_atomic_helper.c:1061 drm_atomic_helper_wait_for_fences+0xe0/0xf8
Modules linked in:

CPU: 1 PID: 69 Comm: kworker/1:1 Tainted: G W 4.11.0-rc8+ #1187
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Workqueue: events drm_mode_rmfb_work_fn
task: ffffffc036560d00 task.stack: ffffffc036550000
PC is at drm_atomic_helper_wait_for_fences+0xe0/0xf8
LR is at complete_commit.isra.1+0x44/0x1c0
pc : [<ffffff80084f6040>] lr : [<ffffff800854176c>] pstate: 20000145
sp : ffffffc036553b60
x29: ffffffc036553b60 x28: ffffffc0264e6a00
x27: ffffffc035659000 x26: 0000000000000000
x25: ffffffc0240e8000 x24: 0000000000000038
x23: 0000000000000000 x22: ffffff800858f200
x21: ffffffc0240e8000 x20: ffffffc02f56a800
x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: ffffffc00a192700
x13: 0000000000000004 x12: 0000000000000000
x11: ffffff80089a1690 x10: 00000000000008f0
x9 : ffffffc036553b20 x8 : ffffffc036561650
x7 : ffffffc03fe6cb40 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000002
x3 : ffffffc035659000 x2 : ffffffc0240e8c80
x1 : 0000000000000000 x0 : ffffffc02adbe588

---[ end trace 13aeec77c3fb55e2 ]---
Call trace:
Exception stack(0xffffffc036553990 to 0xffffffc036553ac0)
3980:                                   0000000000000000 0000008000000000
39a0: ffffffc036553b60 ffffff80084f6040 0000000000004ff0 0000000000000038
39c0: ffffffc0365539d0 ffffff800857e098 ffffffc036553a00 ffffff800857e1b0
39e0: ffffffc036553a10 ffffff800857c554 ffffffc0365e8400 ffffffc0365e8400
3a00: ffffffc036553a20 ffffff8008103358 000000000001aad7 ffffff800851b72c
3a20: ffffffc036553a50 ffffff80080e9228 ffffffc02adbe588 0000000000000000
3a40: ffffffc0240e8c80 ffffffc035659000 0000000000000002 0000000000000001
3a60: 0000000000000000 ffffffc03fe6cb40 ffffffc036561650 ffffffc036553b20
3a80: 00000000000008f0 ffffff80089a1690 0000000000000000 0000000000000004
3aa0: ffffffc00a192700 0000000000000000 0000000000000000 0000000000000000
[<ffffff80084f6040>] drm_atomic_helper_wait_for_fences+0xe0/0xf8
[<ffffff800854176c>] complete_commit.isra.1+0x44/0x1c0
[<ffffff8008541c64>] msm_atomic_commit+0x32c/0x350
[<ffffff8008516230>] drm_atomic_commit+0x50/0x60
[<ffffff8008517548>] drm_atomic_remove_fb+0x158/0x250
[<ffffff80085186d0>] drm_framebuffer_remove+0x50/0x158
[<ffffff8008518818>] drm_mode_rmfb_work_fn+0x40/0x58
[<ffffff80080d5668>] process_one_work+0x1d0/0x378
[<ffffff80080d5a54>] worker_thread+0x244/0x488
[<ffffff80080db7fc>] kthread+0xfc/0x128
[<ffffff8008082ec0>] ret_from_fork+0x10/0x50
Reported-by: Stanimir Varbanov stanimir.varbanov@linaro.org
Signed-off-by: Rob Clark robdclark@gmail.com
The hunk that removes the comment about ->preclose() is included in this patch to challenge the assumption that ->preclose() shouldn't exist ;-)
And I'm going to challenge your patch here. Fences, framebuffers, and atomic commits are all refcounted. If you go boom on them when userspace closes the fd, you have a refcount bug. We don't fix those by flushing stuff :-)
So, it isn't a refcounting bug, but something much funnier..
It seems that mdp5 had a custom plane state with its own dup_state fxn, pre-dating the addition of __drm_atomic_helper_plane_duplicate_state(), and when the helper was introduced it wasn't retrofitted. Which was all good until the fence ptr was added to the base plane_state struct. So this means that plane_state->fence was getting copied over into the duplicated plane_state.
So the atomic rmfb code would sometimes manage to copy the fence ptr if there was another pending update which had already swapped state but not yet committed.
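To make the above concrete, here is a rough userspace model of the two duplicate paths. It is only a sketch: every struct and function name below (model_fence, model_fb, model_plane_state, model_mdp5_plane_state, dup_state_plain_copy, model_helper_duplicate_base, dup_state_with_helper) is made up for illustration and is not the kernel or mdp5 code. The only behaviour assumed of the real helper is that it copies the base state, takes an fb reference, and clears the fence pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the DRM types; not the kernel definitions. */
struct model_fence { int signaled; };
struct model_fb { int refcount; };

struct model_plane_state {
	struct model_fb *fb;
	struct model_fence *fence;	/* should belong to one commit only */
};

/* Driver state that embeds the base state plus private fields. */
struct model_mdp5_plane_state {
	struct model_plane_state base;
	int zpos;
};

/* Models a custom dup that predates the helper: a plain copy of
 * everything, so base.fence is carried into the duplicate. */
static struct model_mdp5_plane_state *
dup_state_plain_copy(const struct model_mdp5_plane_state *cur)
{
	struct model_mdp5_plane_state *copy = malloc(sizeof(*copy));

	assert(copy);
	memcpy(copy, cur, sizeof(*copy));
	if (copy->base.fb)
		copy->base.fb->refcount++;
	return copy;
}

/* Models the helper's handling of the base state: copy it, take an
 * fb reference, and clear the fence so it is never inherited. */
static void
model_helper_duplicate_base(const struct model_plane_state *cur,
			    struct model_plane_state *copy)
{
	memcpy(copy, cur, sizeof(*copy));
	if (copy->fb)
		copy->fb->refcount++;
	copy->fence = NULL;
}

/* Models a custom dup retrofitted to go through the helper. */
static struct model_mdp5_plane_state *
dup_state_with_helper(const struct model_mdp5_plane_state *cur)
{
	struct model_mdp5_plane_state *copy = malloc(sizeof(*copy));

	assert(copy);
	memcpy(copy, cur, sizeof(*copy));	/* driver-private fields */
	model_helper_duplicate_base(&cur->base, &copy->base);
	return copy;
}
```

With the plain copy, a duplicate taken while another update's fence is still set in the current state keeps that stale fence. If the rmfb path then clears fb in the duplicate, it ends up with a fence but no fb, which is presumably the inconsistent state behind the WARN. Going through the helper-style path, the duplicate always starts with fence == NULL.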
BR, -R
Please add a pair of get/put() calls at the right place instead.
-Daniel
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch