The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
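For reference, wiring these hooks up in a driver looks roughly like this (sketch only, all foo_* names are hypothetical; note the commit-side vtable member in struct drm_plane_helper_funcs is called atomic_async_update):

#include <drm/drm_atomic.h>
#include <drm/drm_modeset_helper_vtables.h>
#include <drm/drm_plane.h>

static void foo_hw_cursor_move(struct drm_plane *plane); /* hypothetical hw helper */

static int foo_plane_atomic_async_check(struct drm_plane *plane,
					struct drm_plane_state *state)
{
	/* Only fast-path in-place updates of an already-enabled plane on
	 * the same CRTC; everything else falls back to a full commit. */
	if (!plane->state || !plane->state->crtc ||
	    plane->state->crtc != state->crtc)
		return -EINVAL;

	return 0;
}

static void foo_plane_atomic_async_update(struct drm_plane *plane,
					  struct drm_plane_state *new_state)
{
	/* Apply the new position in place, without a full commit. */
	plane->state->crtc_x = new_state->crtc_x;
	plane->state->crtc_y = new_state->crtc_y;

	foo_hw_cursor_move(plane);
}

static const struct drm_plane_helper_funcs foo_cursor_helper_funcs = {
	.atomic_async_check = foo_plane_atomic_async_check,
	.atomic_async_update = foo_plane_atomic_async_update,
};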
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
- /* - * Legacy cursor ioctls are completely unsynced, and userspace - * relies on that (by doing tons of cursor updates). - */ - if (old_state->legacy_cursor_update) - return; - for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue; @@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
- /* Legacy cursor updates are fully unsynced. */ - if (state->legacy_cursor_update) { - complete_all(&commit->flip_done); - continue; - } - if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
With the removal of helper support it doesn't do anything anymore. Also, we already have async plane update code in vc4.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Eric Anholt eric@anholt.net Cc: Maxime Ripard mripard@kernel.org --- drivers/gpu/drm/vc4/vc4_kms.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 149825ff5df8..bf0da77ab2e6 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -353,12 +353,6 @@ static int vc4_atomic_commit(struct drm_device *dev, return 0; }
- /* We know for sure we don't want an async update here. Set - * state->legacy_cursor_update to false to prevent - * drm_atomic_helper_setup_commit() from auto-completing - * commit->flip_done. - */ - state->legacy_cursor_update = false; ret = drm_atomic_helper_setup_commit(state, nonblock); if (ret) return ret;
On Wed, Oct 21, 2020 at 06:32:41PM +0200, Daniel Vetter wrote:
With the removal of helper support it doesn't do anything anymore. Also, we already have async plane update code in vc4.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Eric Anholt eric@anholt.net Cc: Maxime Ripard mripard@kernel.org
Acked-by: Maxime Ripard mripard@kernel.org
Maxime
It's the horror and shouldn't be used. I realized we're not clear on this in a discussion with Rob about what msm is doing to better support async commits.
v2: Refine existing todo item to include this (Thomas)
Cc: Sean Paul sean@poorly.run Cc: Rob Clark robdclark@gmail.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch --- Documentation/gpu/todo.rst | 4 ++++ include/drm/drm_atomic.h | 12 +++++++++++- 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst index 700637e25ecd..6b224ef14455 100644 --- a/Documentation/gpu/todo.rst +++ b/Documentation/gpu/todo.rst @@ -105,6 +105,10 @@ converted over to the new infrastructure. One issue with the helpers is that they require that drivers handle completion events for atomic commits correctly. But fixing these bugs is good anyway.
+Somewhat related is the legacy_cursor_update hack, which should be replaced with +the new atomic_async_check/commit functionality in the helpers in drivers that +still look at that flag. + Contact: Daniel Vetter, respective driver maintainers
Level: Advanced diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h index d07c851d255b..413fd0ca56a8 100644 --- a/include/drm/drm_atomic.h +++ b/include/drm/drm_atomic.h @@ -308,7 +308,6 @@ struct __drm_private_objs_state { * struct drm_atomic_state - the global state object for atomic updates * @ref: count of all references to this state (will not be freed until zero) * @dev: parent DRM device - * @legacy_cursor_update: hint to enforce legacy cursor IOCTL semantics * @async_update: hint for asynchronous plane update * @planes: pointer to array of structures with per-plane data * @crtcs: pointer to array of CRTC pointers @@ -336,6 +335,17 @@ struct drm_atomic_state { * drm_atomic_crtc_needs_modeset(). */ bool allow_modeset : 1; + /** + * @legacy_cursor_update: + * + * Hint to enforce legacy cursor IOCTL semantics. + * + * WARNING: This is thoroughly broken and pretty much impossible to + * implement correctly. Drivers must ignore this and should instead + * implement &drm_plane_helper_funcs.atomic_async_check and + * &drm_plane_helper_funcs.atomic_async_update hooks. New users of this + * flag are not allowed. + */ bool legacy_cursor_update : 1; bool async_update : 1; /**
On 2020-10-21 12:32 p.m., Daniel Vetter wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I'm fine with the idea but it looks like we need a modification to amdgpu to not break anything:
if (state->legacy_cursor_update) {
	/* ... */
	state->async_update = !drm_atomic_helper_async_check(dev, state);
We only check async update for legacy_cursor_updates here which won't cover the atomic path. I think it's safe to drop the check here but that should probably be done before or as part of this series.
Regards, Nicholas Kazlauskas
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
On Thu, Oct 22, 2020 at 09:36:23AM -0400, Kazlauskas, Nicholas wrote:
On 2020-10-21 12:32 p.m., Daniel Vetter wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
I'm fine with the idea but it looks like we need a modification to amdgpu to not break anything:
if (state->legacy_cursor_update) {
	/* ... */
	state->async_update = !drm_atomic_helper_async_check(dev, state);
We only check async update for legacy_cursor_updates here which won't cover the atomic path. I think it's safe to drop the check here but that should probably be done before or as part of this series.
This part is fine, you're essentially duplicating what the helpers are doing too. I'm not sure whether we should lift this to core atomic semantics or something else, but should be all ok as-is. Might still be good to test this in case something isn't 100% complete and amdgpu atomic commit still relies on legacy_cursor_update semantics somewhere.
But after this patch your atomic code and atomic helpers check/commit functions match (I think), so we /should/ be good.
Cheers, Daniel
Regards, Nicholas Kazlauskas
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
On Wed, Oct 21, 2020 at 9:32 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
This *completely* destroys fps when there is cursor movement, it would be a pretty bad regression, so nak
BR, -R
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Thu, Oct 22, 2020 at 10:02 AM Rob Clark robdclark@gmail.com wrote:
On Wed, Oct 21, 2020 at 9:32 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
This *completely* destroys fps when there is cursor movement, it would be a pretty bad regression, so nak
Which I *guess* is due to dpu not wiring up the plane->async_* funcs, effectively making cursor updates synchronous.. but it will take some time to sort out :-(
BR, -R
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Thu, Oct 22, 2020 at 7:22 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Oct 22, 2020 at 10:02 AM Rob Clark robdclark@gmail.com wrote:
On Wed, Oct 21, 2020 at 9:32 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
This *completely* destroys fps when there is cursor movement, it would be a pretty bad regression, so nak
Which I *guess* is due to dpu not wiring up the plane->async_* funcs, effectively making cursor updates synchronous.. but it will take some time to sort out :-(
Does something like the below (not even compile tested) get dpu back in order?
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 561bfa48841c..ec8b4f74da49 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -215,6 +215,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) /* async updates are limited to single-crtc updates: */ WARN_ON(crtc_mask != drm_crtc_mask(async_crtc));
+ complete_all(async_crtc->state->flip_done); + /* * Start timer if we don't already have an update pending * on this crtc:
That way we could perhaps still move ahead with removing the hacks from shared helpers, and msm-dpu can keep doing what it does. The other hunk is in a function that dpu code doesn't even use, so can't see how that would change anything. -Daniel
BR, -R
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Thu, Oct 22, 2020 at 12:16 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Thu, Oct 22, 2020 at 7:22 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Oct 22, 2020 at 10:02 AM Rob Clark robdclark@gmail.com wrote:
On Wed, Oct 21, 2020 at 9:32 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
This *completely* destroys fps when there is cursor movement, it would be a pretty bad regression, so nak
Which I *guess* is due to dpu not wiring up the plane->async_* funcs, effectively making cursor updates synchronous.. but it will take some time to sort out :-(
Does something like the below (not even compile tested) get dpu back in order?
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 561bfa48841c..ec8b4f74da49 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -215,6 +215,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) /* async updates are limited to single-crtc updates: */ WARN_ON(crtc_mask != drm_crtc_mask(async_crtc));
+	complete_all(async_crtc->state->flip_done);
+
/* * Start timer if we don't already have an update pending * on this crtc:
That way we could perhaps still move ahead with removing the hacks from shared helpers, and msm-dpu can keep doing what it does. The other hunk is in a function that dpu code doesn't even use, so can't see how that would change anything.
That causes massive explosions... angering WARN_ON(dpu_crtc->event);
Seems it is probably the right idea but the wrong place? I'll try to make some time in the next few days to look at this more, but juggling a bunch of different things atm (and I guess at any rate this won't be a 5.10 thing, so we have a bit of time)
BR, -R
-Daniel
BR, -R
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Fri, Oct 23, 2020 at 5:02 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Oct 22, 2020 at 12:16 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Thu, Oct 22, 2020 at 7:22 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Oct 22, 2020 at 10:02 AM Rob Clark robdclark@gmail.com wrote:
On Wed, Oct 21, 2020 at 9:32 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
This *completely* destroys fps when there is cursor movement, it would be a pretty bad regression, so nak
Which I *guess* is due to dpu not wiring up the plane->async_* funcs, effectively making cursor updates synchronous.. but it will take some time to sort out :-(
Does something like the below (not even compile tested) get dpu back in order?
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 561bfa48841c..ec8b4f74da49 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -215,6 +215,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) /* async updates are limited to single-crtc updates: */ WARN_ON(crtc_mask != drm_crtc_mask(async_crtc));
+	complete_all(async_crtc->state->flip_done);
+
/* * Start timer if we don't already have an update pending * on this crtc:
That way we could perhaps still move ahead with removing the hacks from shared helpers, and msm-dpu can keep doing what it does. The other hunk is in a function that dpu code doesn't even use, so can't see how that would change anything.
That causes massive explosions... angering WARN_ON(dpu_crtc->event);
Seems it is probably the right idea but the wrong place? I'll try to make some time in the next few days to look at this more, but juggling a bunch of different things atm (and I guess at any rate this won't be a 5.10 thing, so we have a bit of time)
Yeah we have time for this, legacy_cursor_update being a mistake in atomic has been a thorn to my ego for years, it's not going to get worse with a bit more time itching :-) It's more that I want to really make sure we don't spread this further, since the hacks in atomic helpers really break the entire commit helper model quite badly all over.
So maybe just an ack on the documentation patch in the interim, while we figure out something at leisure? I've also broken i915, so maybe meanwhile I'll figure out how to reapply the duct-tape there. -Daniel
BR, -R
-Daniel
BR, -R
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Fri, Oct 23, 2020 at 9:00 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Fri, Oct 23, 2020 at 5:02 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Oct 22, 2020 at 12:16 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Thu, Oct 22, 2020 at 7:22 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Oct 22, 2020 at 10:02 AM Rob Clark robdclark@gmail.com wrote:
On Wed, Oct 21, 2020 at 9:32 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
The stuff never really worked, and leads to lots of fun because it out-of-order frees atomic states. Which upsets KASAN, among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_commit hooks. Support for that for msm and vc4 landed. nouveau and i915 have their own commit routines, doing something similar.
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and don't either use the new async stuff or their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC, I think with amdgpu converted over to use atomic_async_check/commit done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where we have userspace expecting solid legacy cursor support _and_ they are using the atomic helpers in their full glory. So we can retire this.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
This *completely* destroys fps when there is cursor movement, it would be a pretty bad regression, so nak
Which I *guess* is due to dpu not wiring up the plane->async_* funcs, effectively making cursor updates synchronous.. but it will take some time to sort out :-(
Does something like the below (not even compile tested) get dpu back in order?
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 561bfa48841c..ec8b4f74da49 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -215,6 +215,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) /* async updates are limited to single-crtc updates: */ WARN_ON(crtc_mask != drm_crtc_mask(async_crtc));
+	complete_all(async_crtc->state->flip_done);
+
/* * Start timer if we don't already have an update pending * on this crtc:
That way we could perhaps still move ahead with removing the hacks from shared helpers, and msm-dpu can keep doing what it does. The other hunk is in a function that dpu code doesn't even use, so can't see how that would change anything.
That causes massive explosions... angering WARN_ON(dpu_crtc->event);
Seems it is probably the right idea but the wrong place? I'll try to make some time in the next few days to look at this more, but juggling a bunch of different things atm (and I guess at any rate this won't be a 5.10 thing, so we have a bit of time)
Yeah we have time for this, legacy_cursor_update being a mistake in atomic has been a thorn to my ego for years, it's not going to get worse with a bit more time itching :-) It's more that I want to really make sure we don't spread this further, since the hacks in atomic helpers really break the entire commit helper model quite badly all over.
So maybe just an ack on the documentation patch in the interim, while we figure out something at leisure? I've also broken i915, so maybe meanwhile I'll figure out how to reapply the duct-tape there.
yeah, makes sense, a-b for doc patch
BR, -R
-Daniel
BR, -R
-Daniel
BR, -R
drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
-	/*
-	 * Legacy cursor ioctls are completely unsynced, and userspace
-	 * relies on that (by doing tons of cursor updates).
-	 */
-	if (old_state->legacy_cursor_update)
-		return;
-
	for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue;
@@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
-	/* Legacy cursor updates are fully unsynced. */
-	if (state->legacy_cursor_update) {
-		complete_all(&commit->flip_done);
-		continue;
-	}
-
	if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL);
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
With the removal of helper support it doesn't do anything anymore. Also, we already have async plane update code in vc4.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Eric Anholt eric@anholt.net Cc: Maxime Ripard mripard@kernel.org --- drivers/gpu/drm/vc4/vc4_kms.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 149825ff5df8..bf0da77ab2e6 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -353,12 +353,6 @@ static int vc4_atomic_commit(struct drm_device *dev, return 0; }
- /* We know for sure we don't want an async update here. Set - * state->legacy_cursor_update to false to prevent - * drm_atomic_helper_setup_commit() from auto-completing - * commit->flip_done. - */ - state->legacy_cursor_update = false; ret = drm_atomic_helper_setup_commit(state, nonblock); if (ret) return ret;
It's the horror and shouldn't be used. I realized we're not clear on this in a discussion with Rob about what msm is doing to better support async commits.
v2: Refine existing todo item to include this (Thomas)
Cc: Sean Paul sean@poorly.run Cc: Rob Clark robdclark@gmail.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch --- Documentation/gpu/todo.rst | 4 ++++ include/drm/drm_atomic.h | 12 +++++++++++- 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst index 700637e25ecd..6b224ef14455 100644 --- a/Documentation/gpu/todo.rst +++ b/Documentation/gpu/todo.rst @@ -105,6 +105,10 @@ converted over to the new infrastructure. One issue with the helpers is that they require that drivers handle completion events for atomic commits correctly. But fixing these bugs is good anyway.
+Somewhat related is the legacy_cursor_update hack, which should be replaced with +the new atomic_async_check/commit functionality in the helpers in drivers that +still look at that flag. + Contact: Daniel Vetter, respective driver maintainers
Level: Advanced diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h index d07c851d255b..413fd0ca56a8 100644 --- a/include/drm/drm_atomic.h +++ b/include/drm/drm_atomic.h @@ -308,7 +308,6 @@ struct __drm_private_objs_state { * struct drm_atomic_state - the global state object for atomic updates * @ref: count of all references to this state (will not be freed until zero) * @dev: parent DRM device - * @legacy_cursor_update: hint to enforce legacy cursor IOCTL semantics * @async_update: hint for asynchronous plane update * @planes: pointer to array of structures with per-plane data * @crtcs: pointer to array of CRTC pointers @@ -336,6 +335,17 @@ struct drm_atomic_state { * drm_atomic_crtc_needs_modeset(). */ bool allow_modeset : 1; + /** + * @legacy_cursor_update: + * + * Hint to enforce legacy cursor IOCTL semantics. + * + * WARNING: This is thoroughly broken and pretty much impossible to + * implement correctly. Drivers must ignore this and should instead + * implement &drm_plane_helper_funcs.atomic_async_check and + * &drm_plane_helper_funcs.atomic_async_update hooks. New users of this + * flag are not allowed. + */ bool legacy_cursor_update : 1; bool async_update : 1; /**
fs_reclaim_acquire/release nicely catch recursion issues when allocating GFP_KERNEL memory against shrinkers (which gpu drivers tend to use to keep the excessive caches in check). For mmu notifier recursions we do have lockdep annotations since 23b68395c7c7 ("mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end").
But these only fire if a path actually results in some pte invalidation - for most small allocations that's very rarely the case. The other trouble is that pte invalidation can happen any time when __GFP_RECLAIM is set, which means really only GFP_ATOMIC is a safe choice; GFP_NOIO isn't good enough to avoid potential mmu notifier recursion.
I was pondering whether we should just do the general annotation, but there's always the risk of false positives. Plus I'm assuming that the core fs and io code is a lot better reviewed and tested than random mmu notifier code in drivers. Hence why I decided to only annotate for that specific case.
Furthermore even if we'd create a lockdep map for direct reclaim, we'd still need to explicitly pull in the mmu notifier map - there's a lot more places that do pte invalidation than just direct reclaim; these two contexts aren't the same.
Note that the mmu notifiers needing their own independent lockdep map is also the reason we can't hold them from fs_reclaim_acquire to fs_reclaim_release - it would nest with the acquisition in the pte invalidation code, causing a lockdep splat. And we can't remove the annotations from pte invalidation and all the other places since they're called from many other places than page reclaim. Hence we can only do the equivalent of might_lock, but on the raw lockdep map.
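In code, that might_lock equivalent is just an acquire immediately followed by a release on the raw map - the exact pattern the fs_reclaim_acquire() hunk below adds:

#ifdef CONFIG_MMU_NOTIFIER
	/* Record for lockdep that this context may end up inside
	 * invalidate_range_start/end, without actually holding anything
	 * across the annotation itself. */
	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
#endif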
With this we can also remove the lockdep priming added in 66204f1d2d1b ("mm/mmu_notifiers: prime lockdep") since the new annotations are strictly more powerful.
v2: Review from Thomas Hellstrom:
- unbotch the fs_reclaim context check, I accidentally inverted it, but it didn't blow up because I inverted it immediately
- fix compiling for !CONFIG_MMU_NOTIFIER
v3: Unbreak the PF_MEMALLOC_ context flags. Thanks to Qian for the report and Dave for explaining what I failed to see.
Cc: linux-fsdevel@vger.kernel.org Cc: Dave Chinner david@fromorbit.com Cc: Qian Cai cai@lca.pw Cc: linux-xfs@vger.kernel.org Cc: Thomas Hellström (Intel) thomas_os@shipmail.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Jason Gunthorpe jgg@mellanox.com Cc: linux-mm@kvack.org Cc: linux-rdma@vger.kernel.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- mm/mmu_notifier.c | 7 ------- mm/page_alloc.c | 31 ++++++++++++++++++++----------- 2 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 4fc918163dd3..6bf798373eb0 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -612,13 +612,6 @@ int __mmu_notifier_register(struct mmu_notifier *subscription, mmap_assert_write_locked(mm); BUG_ON(atomic_read(&mm->mm_users) <= 0);
- if (IS_ENABLED(CONFIG_LOCKDEP)) { - fs_reclaim_acquire(GFP_KERNEL); - lock_map_acquire(&__mmu_notifier_invalidate_range_start_map); - lock_map_release(&__mmu_notifier_invalidate_range_start_map); - fs_reclaim_release(GFP_KERNEL); - } - if (!mm->notifier_subscriptions) { /* * kmalloc cannot be called under mm_take_all_locks(), but we diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 780c8f023b28..2edd3fd447fa 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -57,6 +57,7 @@ #include <trace/events/oom.h> #include <linux/prefetch.h> #include <linux/mm_inline.h> +#include <linux/mmu_notifier.h> #include <linux/migrate.h> #include <linux/hugetlb.h> #include <linux/sched/rt.h> @@ -4207,10 +4208,8 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla static struct lockdep_map __fs_reclaim_map = STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
-static bool __need_fs_reclaim(gfp_t gfp_mask) +static bool __need_reclaim(gfp_t gfp_mask) { - gfp_mask = current_gfp_context(gfp_mask); - /* no reclaim without waiting on it */ if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) return false; @@ -4219,10 +4218,6 @@ static bool __need_fs_reclaim(gfp_t gfp_mask) if (current->flags & PF_MEMALLOC) return false;
- /* We're only interested __GFP_FS allocations for now */ - if (!(gfp_mask & __GFP_FS)) - return false; - if (gfp_mask & __GFP_NOLOCKDEP) return false;
@@ -4241,15 +4236,29 @@ void __fs_reclaim_release(void)
void fs_reclaim_acquire(gfp_t gfp_mask) { - if (__need_fs_reclaim(gfp_mask)) - __fs_reclaim_acquire(); + gfp_mask = current_gfp_context(gfp_mask); + + if (__need_reclaim(gfp_mask)) { + if (gfp_mask & __GFP_FS) + __fs_reclaim_acquire(); + +#ifdef CONFIG_MMU_NOTIFIER + lock_map_acquire(&__mmu_notifier_invalidate_range_start_map); + lock_map_release(&__mmu_notifier_invalidate_range_start_map); +#endif + + } } EXPORT_SYMBOL_GPL(fs_reclaim_acquire);
void fs_reclaim_release(gfp_t gfp_mask) { - if (__need_fs_reclaim(gfp_mask)) - __fs_reclaim_release(); + gfp_mask = current_gfp_context(gfp_mask); + + if (__need_reclaim(gfp_mask)) { + if (gfp_mask & __GFP_FS) + __fs_reclaim_release(); + } } EXPORT_SYMBOL_GPL(fs_reclaim_release); #endif
Extracted from slab.h, which seems to have the most complete version including the correct might_sleep() check. Roll it out to slob.c.
Motivated by a discussion with Paul about possibly changing call_rcu behaviour to allocate memory, but only roughly every 500th call.
There are a lot fewer places in the kernel that care about whether allocating memory is allowed or not (due to deadlocks with reclaim code) than places that care whether sleeping is allowed. But debugging these also tends to be a lot harder, so nice descriptive checks could come in handy. I might have some use eventually for annotations in drivers/gpu.
Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking does not consult the PF_MEMALLOC flags. But there is no flag equivalent for GFP_NOWAIT, hence this check can't go wrong due to memalloc_no*_save/restore contexts.
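As a usage illustration (hypothetical code, not part of this patch), the value of the annotation is that it fires on every call, including the fast paths that happen not to allocate:

#include <linux/sched/mm.h>
#include <linux/slab.h>

struct foo_cache {		/* hypothetical example structure */
	void *cached;
	size_t size;
};

static void *foo_get_scratch(struct foo_cache *c, gfp_t gfp)
{
	/* Fires under lockdep even when the fast path below means no
	 * allocation actually happens on this call. */
	might_alloc(gfp);

	if (c->cached)
		return c->cached;

	c->cached = kmalloc(c->size, gfp);
	return c->cached;
}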
Cc: Paul E. McKenney paulmck@kernel.org Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Michel Lespinasse walken@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Waiman Long longman@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Randy Dunlap rdunlap@infradead.org Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: Dave Chinner david@fromorbit.com Cc: Qian Cai cai@lca.pw Cc: linux-xfs@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- include/linux/sched/mm.h | 16 ++++++++++++++++ mm/slab.h | 5 +---- mm/slob.c | 6 ++---- 3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index f889e332912f..2b0037abac0b 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -205,6 +205,22 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { } static inline void fs_reclaim_release(gfp_t gfp_mask) { } #endif
+/** + * might_alloc - Marks possible allocation sites + * @gfp_mask: gfp_t flags that would be used to allocate + * + * Similar to might_sleep() and other annotations this can be used in functions + * that might allocate, but often don't. Compiles to nothing without + * CONFIG_LOCKDEP. Includes a conditional might_sleep() if @gfp allows blocking. + */ +static inline void might_alloc(gfp_t gfp_mask) +{ + fs_reclaim_acquire(gfp_mask); + fs_reclaim_release(gfp_mask); + + might_sleep_if(gfpflags_allow_blocking(gfp_mask)); +} + /** * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope. * diff --git a/mm/slab.h b/mm/slab.h index 6cc323f1313a..fedd789b2270 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -492,10 +492,7 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, { flags &= gfp_allowed_mask;
- fs_reclaim_acquire(flags); - fs_reclaim_release(flags); - - might_sleep_if(gfpflags_allow_blocking(flags)); + might_alloc(flags);
if (should_failslab(s, flags)) return NULL; diff --git a/mm/slob.c b/mm/slob.c index 7cc9805c8091..8d4bfa46247f 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -474,8 +474,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
gfp &= gfp_allowed_mask;
- fs_reclaim_acquire(gfp); - fs_reclaim_release(gfp); + might_alloc(gfp);
if (size < PAGE_SIZE - minalign) { int align = minalign; @@ -597,8 +596,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
flags &= gfp_allowed_mask;
- fs_reclaim_acquire(flags); - fs_reclaim_release(flags); + might_alloc(flags);
if (c->size < PAGE_SIZE) { b = slob_alloc(c->size, flags, c->align, node, 0);
On 10/23/20 2:21 PM, Daniel Vetter wrote:
Extracted from slab.h, which seems to have the most complete version including the correct might_sleep() check. Roll it out to slob.c.
Motivated by a discussion with Paul about possibly changing call_rcu behaviour to allocate memory, but only roughly every 500th call.
There are a lot fewer places in the kernel that care about whether allocating memory is allowed or not (due to deadlocks with reclaim code) than places that care whether sleeping is allowed. But debugging these also tends to be a lot harder, so nice descriptive checks could come in handy. I might have some use eventually for annotations in drivers/gpu.
Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking does not consult the PF_MEMALLOC flags. But there is no flag equivalent for GFP_NOWAIT, hence this check can't go wrong due to memalloc_no*_save/restore contexts.
Cc: Paul E. McKenney paulmck@kernel.org Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Michel Lespinasse walken@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Waiman Long longman@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Randy Dunlap rdunlap@infradead.org Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: Dave Chinner david@fromorbit.com Cc: Qian Cai cai@lca.pw Cc: linux-xfs@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Looks useful. Acked-by: Vlastimil Babka vbabka@suse.cz
On Fri, Oct 23, 2020 at 02:21:15PM +0200, Daniel Vetter wrote:
Extracted from slab.h, which seems to have the most complete version including the correct might_sleep() check. Roll it out to slob.c.
Motivated by a discussion with Paul about possibly changing call_rcu behaviour to allocate memory, but only roughly every 500th call.
There are a lot fewer places in the kernel that care about whether allocating memory is allowed or not (due to deadlocks with reclaim code) than places that care whether sleeping is allowed. But debugging these also tends to be a lot harder, so nice descriptive checks could come in handy. I might have some use eventually for annotations in drivers/gpu.
Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking does not consult the PF_MEMALLOC flags. But there is no flag equivalent for GFP_NOWAIT, hence this check can't go wrong due to memalloc_no*_save/restore contexts.
Cc: Paul E. McKenney paulmck@kernel.org Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Michel Lespinasse walken@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Waiman Long longman@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Randy Dunlap rdunlap@infradead.org Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: Dave Chinner david@fromorbit.com Cc: Qian Cai cai@lca.pw Cc: linux-xfs@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Nice!!!
Acked-by: Paul E. McKenney paulmck@kernel.org
include/linux/sched/mm.h | 16 ++++++++++++++++ mm/slab.h | 5 +---- mm/slob.c | 6 ++---- 3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index f889e332912f..2b0037abac0b 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -205,6 +205,22 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { } static inline void fs_reclaim_release(gfp_t gfp_mask) { } #endif
+/**
+ * might_alloc - Marks possible allocation sites
+ * @gfp_mask: gfp_t flags that would be used to allocate
+ *
+ * Similar to might_sleep() and other annotations this can be used in functions
+ * that might allocate, but often don't. Compiles to nothing without
+ * CONFIG_LOCKDEP. Includes a conditional might_sleep() if @gfp allows blocking.
+ */
+static inline void might_alloc(gfp_t gfp_mask)
+{
+	fs_reclaim_acquire(gfp_mask);
+	fs_reclaim_release(gfp_mask);
+
+	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
+}
+
 /**
  * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
  *
diff --git a/mm/slab.h b/mm/slab.h index 6cc323f1313a..fedd789b2270 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -492,10 +492,7 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, { flags &= gfp_allowed_mask;
-	fs_reclaim_acquire(flags);
-	fs_reclaim_release(flags);
-
-	might_sleep_if(gfpflags_allow_blocking(flags));
+	might_alloc(flags);
if (should_failslab(s, flags)) return NULL;
diff --git a/mm/slob.c b/mm/slob.c index 7cc9805c8091..8d4bfa46247f 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -474,8 +474,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
gfp &= gfp_allowed_mask;
-	fs_reclaim_acquire(gfp);
-	fs_reclaim_release(gfp);
+	might_alloc(gfp);
if (size < PAGE_SIZE - minalign) { int align = minalign;
@@ -597,8 +596,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
flags &= gfp_allowed_mask;
-	fs_reclaim_acquire(flags);
-	fs_reclaim_release(flags);
+	might_alloc(flags);
if (c->size < PAGE_SIZE) { b = slob_alloc(c->size, flags, c->align, node, 0);
-- 2.28.0
This is a bit disappointing since we need to split the annotations over all the different parts.
I was considering just leaking the critical section into the ->atomic_commit_tail callback of each driver. But that would mean we need to pass the fence_cookie into each driver (there's a total of 13 implementations of this hook right now), so that would be a bad flag day. And also a bit of a leaky abstraction.
Hence just do it function-by-function.
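For context, the annotation pattern being sprinkled here brackets the fence-signalling critical section (sketch only, foo_commit_tail is hypothetical):

#include <linux/dma-fence.h>

static void foo_commit_tail(void)
{
	bool fence_cookie = dma_fence_begin_signalling();

	/* Hardware programming that eventually leads to the flip fences
	 * being signalled; any wait in here that could also stall fence
	 * completion (memory reclaim, fence waits) becomes a lockdep
	 * splat. */

	dma_fence_end_signalling(fence_cookie);
}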
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 549a31e6042c..23013209d4bf 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1567,6 +1567,7 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done); void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state) { struct drm_device *dev = old_state->dev; + bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -1578,6 +1579,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie); + drm_atomic_helper_wait_for_vblanks(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state); @@ -1597,6 +1600,7 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_tail); void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state) { struct drm_device *dev = old_state->dev; + bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -1609,6 +1613,8 @@ void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie); + drm_atomic_helper_wait_for_vblanks(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state); @@ -1624,6 +1630,9 @@ static void commit_tail(struct drm_atomic_state *old_state) ktime_t start; s64 commit_time_ms; unsigned int i, new_self_refresh_mask = 0; + bool fence_cookie; + + fence_cookie = dma_fence_begin_signalling();
funcs = dev->mode_config.helper_private;
@@ -1652,6 +1661,8 @@ static void commit_tail(struct drm_atomic_state *old_state) if (new_crtc_state->self_refresh_active) new_self_refresh_mask |= BIT(i);
+ dma_fence_end_signalling(fence_cookie); + if (funcs && funcs->atomic_commit_tail) funcs->atomic_commit_tail(old_state); else @@ -1810,6 +1821,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, bool nonblock) { int ret; + bool fence_cookie;
if (state->async_update) { ret = drm_atomic_helper_prepare_planes(dev, state); @@ -1832,6 +1844,8 @@ int drm_atomic_helper_commit(struct drm_device *dev, if (ret) return ret;
+ fence_cookie = dma_fence_begin_signalling(); + if (!nonblock) { ret = drm_atomic_helper_wait_for_fences(dev, state, true); if (ret) @@ -1869,6 +1883,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, */
drm_atomic_state_get(state); + dma_fence_end_signalling(fence_cookie); if (nonblock) queue_work(system_unbound_wq, &state->commit_work); else @@ -1877,6 +1892,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, return 0;
err: + dma_fence_end_signalling(fence_cookie); drm_atomic_helper_cleanup_planes(dev, state); return ret; }
This is needed to signal the fences from page flips, so annotate it accordingly. We need to annotate the entire timer callback since if we get stuck anywhere in there, the timer stops, and hence fences stop. Just annotating the top part that does the vblank handling isn't enough.
Tested-by: Melissa Wen melissa.srw@gmail.com Reviewed-by: Rodrigo Siqueira rodrigosiqueiramelo@gmail.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Rodrigo Siqueira rodrigosiqueiramelo@gmail.com Cc: Haneen Mohammed hamohammed.sa@gmail.com Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/vkms/vkms_crtc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index e43e4e1b268a..8124d8f2ee15 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0+
+#include <linux/dma-fence.h> + #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_probe_helper.h> @@ -14,7 +16,9 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) struct drm_crtc *crtc = &output->crtc; struct vkms_crtc_state *state; u64 ret_overrun; - bool ret; + bool ret, fence_cookie; + + fence_cookie = dma_fence_begin_signalling();
ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer, output->period_ns); @@ -49,6 +53,8 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) DRM_DEBUG_DRIVER("Composer worker already queued\n"); }
+ dma_fence_end_signalling(fence_cookie); + return HRTIMER_RESTART; }
This is rather overkill since currently all drivers call this from hardirq (or at least timers). But maybe in the future we're going to have threaded irq handlers and whatnot; it doesn't hurt to be prepared. Plus this is an easy start for sprinkling these fence annotations into shared code.
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/drm_vblank.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index f135b79593dd..ba7e741764aa 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -24,6 +24,7 @@ * OTHER DEALINGS IN THE SOFTWARE. */
+#include <linux/dma-fence.h> #include <linux/export.h> #include <linux/kthread.h> #include <linux/moduleparam.h> @@ -1913,7 +1914,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) { struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; unsigned long irqflags; - bool disable_irq; + bool disable_irq, fence_cookie;
if (drm_WARN_ON_ONCE(dev, !drm_dev_has_vblank(dev))) return false; @@ -1921,6 +1922,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) if (drm_WARN_ON(dev, pipe >= dev->num_crtcs)) return false;
+ fence_cookie = dma_fence_begin_signalling(); + spin_lock_irqsave(&dev->event_lock, irqflags);
/* Need timestamp lock to prevent concurrent execution with @@ -1933,6 +1936,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) if (!vblank->enabled) { spin_unlock(&dev->vblank_time_lock); spin_unlock_irqrestore(&dev->event_lock, irqflags); + dma_fence_end_signalling(fence_cookie); return false; }
@@ -1959,6 +1963,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) if (disable_irq) vblank_disable_fn(&vblank->disable_timer);
+ dma_fence_end_signalling(fence_cookie); + return true; } EXPORT_SYMBOL(drm_handle_vblank);
I need a canary in a ttm-based atomic driver to make sure the dma_fence_begin/end_signalling annotations actually work.
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index bb1bc7f5d149..b05fecf06f25 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -57,6 +57,7 @@
#include "ivsrcid/ivsrcid_vislands30.h"
+#include <linux/dma-fence.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/version.h> @@ -7492,6 +7493,9 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state; int crtc_disable_count = 0; bool mode_set_reset_required = false; + bool fence_cookie; + + fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_update_legacy_modeset_state(dev, state); drm_atomic_helper_calc_timestamping_constants(state); @@ -7816,6 +7820,8 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) /* Signal HW programming completion */ drm_atomic_helper_commit_hw_done(state);
+ dma_fence_end_signalling(fence_cookie); + if (wait_for_vblank) drm_atomic_helper_wait_for_flip_done(dev, state);
Like the helpers, nothing special. Well, except not quite, because the critical section extends until after hw_done(), since that's the last thing which could hold up a subsequent atomic commit. That means the wait_for_flip_done is included, but that's not a problem, we're allowed to call dma_fence_wait() from signalling critical sections. Even on our own fence (which this does), it's just a bit confusing. But in a way those last 2 function calls are already part of the fence signalling critical section for the next atomic commit.
Reading this I'm wondering why komeda waits for flip_done() before calling hw_done(), which is a bit backwards (but hey hw can be special). Might be good to throw a comment in there that explains why, because the original commit that added this just doesn't.
Reviewed-by: James Qian Wang james.qian.wang@arm.com Cc: "James (Qian) Wang" james.qian.wang@arm.com Cc: Liviu Dudau liviu.dudau@arm.com Cc: Mihail Atanassov mihail.atanassov@arm.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/arm/display/komeda/komeda_kms.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c index 1f6682032ca4..cc5b5915bc5e 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c @@ -73,6 +73,7 @@ static struct drm_driver komeda_kms_driver = { static void komeda_kms_commit_tail(struct drm_atomic_state *old_state) { struct drm_device *dev = old_state->dev; + bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -85,6 +86,8 @@ static void komeda_kms_commit_tail(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie); + drm_atomic_helper_cleanup_planes(dev, old_state); }
Again needs to be put right after the call to drm_atomic_helper_commit_hw_done(), since that's the last thing which can hold up a subsequent atomic commit.
No surprises here.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: "James (Qian) Wang" james.qian.wang@arm.com Cc: Liviu Dudau liviu.dudau@arm.com Cc: Mihail Atanassov mihail.atanassov@arm.com --- drivers/gpu/drm/arm/malidp_drv.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/arm/malidp_drv.c b/drivers/gpu/drm/arm/malidp_drv.c index 69fee05c256c..26e60401a8e1 100644 --- a/drivers/gpu/drm/arm/malidp_drv.c +++ b/drivers/gpu/drm/arm/malidp_drv.c @@ -234,6 +234,7 @@ static void malidp_atomic_commit_tail(struct drm_atomic_state *state) struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state; int i; + bool fence_cookie = dma_fence_begin_signalling();
pm_runtime_get_sync(drm->dev);
@@ -260,6 +261,8 @@ static void malidp_atomic_commit_tail(struct drm_atomic_state *state)
malidp_atomic_commit_hw_done(state);
+ dma_fence_end_signalling(fence_cookie); + pm_runtime_put(drm->dev);
drm_atomic_helper_cleanup_planes(drm, state);
On Fri, Oct 23, 2020 at 02:21:21PM +0200, Daniel Vetter wrote:
Again needs to be put right after the call to drm_atomic_helper_commit_hw_done(), since that's the last thing which can hold up a subsequent atomic commit.
No surprises here.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: "James (Qian) Wang" james.qian.wang@arm.com Cc: Liviu Dudau liviu.dudau@arm.com
Acked-by: Liviu Dudau liviu.dudau@arm.com
Thanks for the patch!
Best regards, Liviu
One of these drivers that predates the nonblocking support in helpers, and hand-rolled its own thing. There's nothing driver-specific here, so we can just delete it all and replace it with the helper version.
Could also perhaps use the drm_mode_config_helper_suspend/resume stuff, for another few lines deleted (see the sketch after the diff). But I'm not looking at that stuff; I'm just going through all the atomic commit functions and making sure they have properly annotated dma-fence critical sections everywhere.
v2:
- Also delete the workqueue (Sam)
- Drop the @commit kerneldoc, I missed that one.
Reviewed-by: Sam Ravnborg sam@ravnborg.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sam Ravnborg sam@ravnborg.org Cc: Boris Brezillon bbrezillon@kernel.org Cc: Nicolas Ferre nicolas.ferre@microchip.com Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: Ludovic Desroches ludovic.desroches@microchip.com Cc: linux-arm-kernel@lists.infradead.org --- drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 107 +------------------ drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h | 7 -- 2 files changed, 2 insertions(+), 112 deletions(-)
diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c index 871293d1aeeb..03984932d174 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c @@ -557,103 +557,10 @@ static irqreturn_t atmel_hlcdc_dc_irq_handler(int irq, void *data) return IRQ_HANDLED; }
-struct atmel_hlcdc_dc_commit { - struct work_struct work; - struct drm_device *dev; - struct drm_atomic_state *state; -}; - -static void -atmel_hlcdc_dc_atomic_complete(struct atmel_hlcdc_dc_commit *commit) -{ - struct drm_device *dev = commit->dev; - struct atmel_hlcdc_dc *dc = dev->dev_private; - struct drm_atomic_state *old_state = commit->state; - - /* Apply the atomic update. */ - drm_atomic_helper_commit_modeset_disables(dev, old_state); - drm_atomic_helper_commit_planes(dev, old_state, 0); - drm_atomic_helper_commit_modeset_enables(dev, old_state); - - drm_atomic_helper_wait_for_vblanks(dev, old_state); - - drm_atomic_helper_cleanup_planes(dev, old_state); - - drm_atomic_state_put(old_state); - - /* Complete the commit, wake up any waiter. */ - spin_lock(&dc->commit.wait.lock); - dc->commit.pending = false; - wake_up_all_locked(&dc->commit.wait); - spin_unlock(&dc->commit.wait.lock); - - kfree(commit); -} - -static void atmel_hlcdc_dc_atomic_work(struct work_struct *work) -{ - struct atmel_hlcdc_dc_commit *commit = - container_of(work, struct atmel_hlcdc_dc_commit, work); - - atmel_hlcdc_dc_atomic_complete(commit); -} - -static int atmel_hlcdc_dc_atomic_commit(struct drm_device *dev, - struct drm_atomic_state *state, - bool async) -{ - struct atmel_hlcdc_dc *dc = dev->dev_private; - struct atmel_hlcdc_dc_commit *commit; - int ret; - - ret = drm_atomic_helper_prepare_planes(dev, state); - if (ret) - return ret; - - /* Allocate the commit object. */ - commit = kzalloc(sizeof(*commit), GFP_KERNEL); - if (!commit) { - ret = -ENOMEM; - goto error; - } - - INIT_WORK(&commit->work, atmel_hlcdc_dc_atomic_work); - commit->dev = dev; - commit->state = state; - - spin_lock(&dc->commit.wait.lock); - ret = wait_event_interruptible_locked(dc->commit.wait, - !dc->commit.pending); - if (ret == 0) - dc->commit.pending = true; - spin_unlock(&dc->commit.wait.lock); - - if (ret) - goto err_free; - - /* We have our own synchronization through the commit lock. */ - BUG_ON(drm_atomic_helper_swap_state(state, false) < 0); - - /* Swap state succeeded, this is the point of no return. */ - drm_atomic_state_get(state); - if (async) - queue_work(dc->wq, &commit->work); - else - atmel_hlcdc_dc_atomic_complete(commit); - - return 0; - -err_free: - kfree(commit); -error: - drm_atomic_helper_cleanup_planes(dev, state); - return ret; -} - static const struct drm_mode_config_funcs mode_config_funcs = { .fb_create = drm_gem_fb_create, .atomic_check = drm_atomic_helper_check, - .atomic_commit = atmel_hlcdc_dc_atomic_commit, + .atomic_commit = drm_atomic_helper_commit, };
static int atmel_hlcdc_dc_modeset_init(struct drm_device *dev) @@ -712,11 +619,6 @@ static int atmel_hlcdc_dc_load(struct drm_device *dev) if (!dc) return -ENOMEM;
- dc->wq = alloc_ordered_workqueue("atmel-hlcdc-dc", 0); - if (!dc->wq) - return -ENOMEM; - - init_waitqueue_head(&dc->commit.wait); dc->desc = match->data; dc->hlcdc = dev_get_drvdata(dev->dev->parent); dev->dev_private = dc; @@ -724,7 +626,7 @@ static int atmel_hlcdc_dc_load(struct drm_device *dev) ret = clk_prepare_enable(dc->hlcdc->periph_clk); if (ret) { dev_err(dev->dev, "failed to enable periph_clk\n"); - goto err_destroy_wq; + return ret; }
pm_runtime_enable(dev->dev); @@ -761,9 +663,6 @@ static int atmel_hlcdc_dc_load(struct drm_device *dev) pm_runtime_disable(dev->dev); clk_disable_unprepare(dc->hlcdc->periph_clk);
-err_destroy_wq: - destroy_workqueue(dc->wq); - return ret; }
@@ -771,7 +670,6 @@ static void atmel_hlcdc_dc_unload(struct drm_device *dev) { struct atmel_hlcdc_dc *dc = dev->dev_private;
- flush_workqueue(dc->wq); drm_kms_helper_poll_fini(dev); drm_atomic_helper_shutdown(dev); drm_mode_config_cleanup(dev); @@ -784,7 +682,6 @@ static void atmel_hlcdc_dc_unload(struct drm_device *dev)
pm_runtime_disable(dev->dev); clk_disable_unprepare(dc->hlcdc->periph_clk); - destroy_workqueue(dc->wq); }
static int atmel_hlcdc_dc_irq_postinstall(struct drm_device *dev) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h index 469d4507e576..5b5c774e0edf 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h @@ -331,9 +331,7 @@ struct atmel_hlcdc_dc_desc { * @crtc: CRTC provided by the display controller * @planes: instantiated planes * @layers: active HLCDC layers - * @wq: display controller workqueue * @suspend: used to store the HLCDC state when entering suspend - * @commit: used for async commit handling */ struct atmel_hlcdc_dc { const struct atmel_hlcdc_dc_desc *desc; @@ -341,15 +339,10 @@ struct atmel_hlcdc_dc { struct atmel_hlcdc *hlcdc; struct drm_crtc *crtc; struct atmel_hlcdc_layer *layers[ATMEL_HLCDC_MAX_LAYERS]; - struct workqueue_struct *wq; struct { u32 imr; struct drm_atomic_state *state; } suspend; - struct { - wait_queue_head_t wait; - bool pending; - } commit; };
extern struct atmel_hlcdc_formats atmel_hlcdc_plane_rgb_formats;
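For reference, the drm_mode_config_helper_suspend/resume conversion mentioned above (but deliberately not done in this patch) would look roughly like this — a sketch only, which ignores the driver's extra interrupt-mask save/restore, and with hypothetical function names:

#include <linux/device.h>
#include <drm/drm_drv.h>
#include <drm/drm_modeset_helper.h>

static int atmel_hlcdc_dc_drm_suspend(struct device *dev)
{
	struct drm_device *drm_dev = dev_get_drvdata(dev);

	/* Disables outputs and stashes the atomic state for resume. */
	return drm_mode_config_helper_suspend(drm_dev);
}

static int atmel_hlcdc_dc_drm_resume(struct device *dev)
{
	struct drm_device *drm_dev = dev_get_drvdata(dev);

	return drm_mode_config_helper_resume(drm_dev);
}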
drm_atomic_helper_commit_hw_done() is the last thing (no plane cleanup, apparently), so it's the entire function. And there's a nice comment explaining why the wait_for_flip_done is ahead, unlike the usual sequence.
Aside, I think since the atomic helpers do track plane disabling now separately this might no longer be a real problem since:
commit 21a01abbe32a3cbeb903378a24e504bfd9fe0648 Author: Maarten Lankhorst maarten.lankhorst@linux.intel.com Date: Mon Sep 4 12:48:37 2017 +0200
drm/atomic: Fix freeing connector/plane state too early by tracking commits, v3.
Plus the subsequent bugfixes of course, this was tricky to get right.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Philipp Zabel p.zabel@pengutronix.de Cc: Shawn Guo shawnguo@kernel.org Cc: Sascha Hauer s.hauer@pengutronix.de Cc: Pengutronix Kernel Team kernel@pengutronix.de Cc: Fabio Estevam festevam@gmail.com Cc: NXP Linux Team linux-imx@nxp.com Cc: linux-arm-kernel@lists.infradead.org --- drivers/gpu/drm/imx/imx-drm-core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/imx/imx-drm-core.c b/drivers/gpu/drm/imx/imx-drm-core.c index 7d00c49fd5a5..0a6db8eeb25d 100644 --- a/drivers/gpu/drm/imx/imx-drm-core.c +++ b/drivers/gpu/drm/imx/imx-drm-core.c @@ -80,6 +80,7 @@ static void imx_drm_atomic_commit_tail(struct drm_atomic_state *state) struct drm_plane_state *old_plane_state, *new_plane_state; bool plane_disabling = false; int i; + bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, state);
@@ -110,6 +111,7 @@ static void imx_drm_atomic_commit_tail(struct drm_atomic_state *state) }
drm_atomic_helper_commit_hw_done(state); + dma_fence_end_signalling(fence_cookie); }
static const struct drm_mode_config_helper_funcs imx_drm_mode_config_helpers = {
Nothing special, just put the end right after hw_done(). Note that in one path there's a wait for the flip/update to complete. But as far as I understand from comments and code that's only relevant for modesets, and skipped if there wasn't a modeset done on a given crtc.
For a bit more clarity, pull the hw_done() call out of the if/else; that way the flow is a bit clearer. But happy to shuffle this around as is seen fit.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Tomi Valkeinen tomi.valkeinen@ti.com --- drivers/gpu/drm/omapdrm/omap_drv.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c index 2e598b8b72af..2b82a708eca6 100644 --- a/drivers/gpu/drm/omapdrm/omap_drv.c +++ b/drivers/gpu/drm/omapdrm/omap_drv.c @@ -68,6 +68,7 @@ static void omap_atomic_commit_tail(struct drm_atomic_state *old_state) { struct drm_device *dev = old_state->dev; struct omap_drm_private *priv = dev->dev_private; + bool fence_cookie = dma_fence_begin_signalling();
priv->dispc_ops->runtime_get(priv->dispc);
@@ -90,8 +91,6 @@ static void omap_atomic_commit_tail(struct drm_atomic_state *old_state) omap_atomic_wait_for_completion(dev, old_state);
drm_atomic_helper_commit_planes(dev, old_state, 0); - - drm_atomic_helper_commit_hw_done(old_state); } else { /* * OMAP3 DSS seems to have issues with the work-around above, @@ -101,10 +100,12 @@ static void omap_atomic_commit_tail(struct drm_atomic_state *old_state) drm_atomic_helper_commit_planes(dev, old_state, 0);
drm_atomic_helper_commit_modeset_enables(dev, old_state); - - drm_atomic_helper_commit_hw_done(old_state); }
+ drm_atomic_helper_commit_hw_done(old_state); + + dma_fence_end_signalling(fence_cookie); + /* * Wait for completion of the page flips to ensure that old buffers * can't be touched by the hardware anymore before cleaning up planes.
On 23/10/2020 15:21, Daniel Vetter wrote:
Nothing special, just put the end right after hw_done(). Note that in one path there's a wait for the flip/update to complete. But as far as I understand from comments and code that's only relevant for modesets, and skipped if there wasn't a modeset done on a given crtc.
For a bit more clarity pull the hw_done() call out of the if/else, that way it's a bit clearer flow. But happy to shuffle this around as is seen fit.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Tomi Valkeinen tomi.valkeinen@ti.com
Reviewed-by: Tomi Valkeinen tomi.valkeinen@ti.com
Tomi
Ends right after drm_atomic_helper_commit_hw_done(), absolutely nothing fancy going on here.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Kieran Bingham kieran.bingham+renesas@ideasonboard.com Cc: linux-renesas-soc@vger.kernel.org --- drivers/gpu/drm/rcar-du/rcar_du_kms.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c index 72dda446355f..8d91140151cc 100644 --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c @@ -441,6 +441,7 @@ static void rcar_du_atomic_commit_tail(struct drm_atomic_state *old_state) struct drm_crtc_state *crtc_state; struct drm_crtc *crtc; unsigned int i; + bool fence_cookie = dma_fence_begin_signalling();
/* * Store RGB routing to DPAD0 and DPAD1, the hardware will be configured @@ -467,6 +468,7 @@ static void rcar_du_atomic_commit_tail(struct drm_atomic_state *old_state) drm_atomic_helper_commit_modeset_enables(dev, old_state);
drm_atomic_helper_commit_hw_done(old_state); + dma_fence_end_signalling(fence_cookie); drm_atomic_helper_wait_for_flip_done(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state);
Hi Daniel,
Thank you for the patch.
On Fri, Oct 23, 2020 at 02:21:25PM +0200, Daniel Vetter wrote:
Ends right after drm_atomic_helper_commit_hw_done(), absolutely nothing fancy going on here.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Kieran Bingham kieran.bingham+renesas@ideasonboard.com Cc: linux-renesas-soc@vger.kernel.org
I'm lacking context here, there's only one instance of a call to dma_fence_begin_signalling() in the subsystem, and no cover letter in the series to explain what's going on. Some information would help reviewing the patch :-)
Also, could you mention in the cover letter for the next version if you will merge the whole series, or expect individual maintainers to pick up the relevant patches ?
On Wed, Dec 16, 2020 at 2:29 AM Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
Hi Daniel,
Thank you for the patch.
On Fri, Oct 23, 2020 at 02:21:25PM +0200, Daniel Vetter wrote:
Ends right after drm_atomic_helper_commit_hw_done(), absolutely nothing fancy going on here.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Kieran Bingham kieran.bingham+renesas@ideasonboard.com Cc: linux-renesas-soc@vger.kernel.org
I'm lacking context here, there's only one instance of a call to dma_fence_begin_signalling() in the subsystem, and no cover letter in the series to explain what's going on. Some information would help reviewing the patch :-)
Also, could you mention in the cover letter for the next version if you will merge the whole series, or expect individual maintainers to pick up the relevant patches ?
This series was a misfire of git send-email. I figured that following up to 65 patches with "I'm sorry" doesn't help the spam problem, so I only did it once.
This patch was part of a proper series half a year ago, with cover letter and everything, and a few patches from that series have landed. I plan to resubmit all of this again this week. -Daniel
Again ends just after drm_atomic_helper_commit_hw_done(), but with the twist that we need to make sure we only annotate the custom version, and not the other clause, which just calls drm_atomic_helper_commit_tail_rpm() and is already annotated.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Thierry Reding thierry.reding@gmail.com Cc: Jonathan Hunter jonathanh@nvidia.com Cc: linux-tegra@vger.kernel.org --- drivers/gpu/drm/tegra/drm.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index f0f581cd345e..24f353c1cee8 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -65,11 +65,14 @@ static void tegra_atomic_commit_tail(struct drm_atomic_state *old_state) struct tegra_drm *tegra = drm->dev_private;
if (tegra->hub) { + bool fence_cookie = dma_fence_begin_signalling(); + drm_atomic_helper_commit_modeset_disables(drm, old_state); tegra_display_hub_atomic_commit(drm, old_state); drm_atomic_helper_commit_planes(drm, old_state, 0); drm_atomic_helper_commit_modeset_enables(drm, old_state); drm_atomic_helper_commit_hw_done(old_state); + dma_fence_end_signalling(fence_cookie); drm_atomic_helper_wait_for_vblanks(drm, old_state); drm_atomic_helper_cleanup_planes(drm, old_state); } else {
Ends right after hw_done(), totally standard case.
Acked-by: Jyri Sarha jsarha@ti.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jyri Sarha jsarha@ti.com Cc: Tomi Valkeinen tomi.valkeinen@ti.com --- drivers/gpu/drm/tidss/tidss_kms.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/tidss/tidss_kms.c b/drivers/gpu/drm/tidss/tidss_kms.c index 09485c7f0d6f..95f8e0f78e32 100644 --- a/drivers/gpu/drm/tidss/tidss_kms.c +++ b/drivers/gpu/drm/tidss/tidss_kms.c @@ -4,6 +4,8 @@ * Author: Tomi Valkeinen tomi.valkeinen@ti.com */
+#include <linux/dma-fence.h> + #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_bridge.h> @@ -26,6 +28,7 @@ static void tidss_atomic_commit_tail(struct drm_atomic_state *old_state) { struct drm_device *ddev = old_state->dev; struct tidss_device *tidss = to_tidss(ddev); + bool fence_cookie = dma_fence_begin_signalling();
dev_dbg(ddev->dev, "%s\n", __func__);
@@ -36,6 +39,7 @@ static void tidss_atomic_commit_tail(struct drm_atomic_state *old_state) drm_atomic_helper_commit_modeset_enables(ddev, old_state);
drm_atomic_helper_commit_hw_done(old_state); + dma_fence_end_signalling(fence_cookie); drm_atomic_helper_wait_for_flip_done(ddev, old_state);
drm_atomic_helper_cleanup_planes(ddev, old_state);
If the scheduler rt thread gets stuck on a mutex that we're holding while waiting for gpu workloads to complete, we have a problem.
Add dma-fence annotations so that lockdep can check this for us.
I've tried to review this quite carefully, and I think it's at the right spot. But obviously I'm no expert on the drm scheduler.
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 9a0d77a68018..f69abc4e70d3 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -764,9 +764,12 @@ static int drm_sched_main(void *param) { struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param; int r; + bool fence_cookie;
sched_set_fifo_low(current);
+ fence_cookie = dma_fence_begin_signalling(); + while (!kthread_should_stop()) { struct drm_sched_entity *entity = NULL; struct drm_sched_fence *s_fence; @@ -824,6 +827,9 @@ static int drm_sched_main(void *param)
wake_up(&sched->job_scheduled); } + + dma_fence_end_signalling(fence_cookie); + return 0; }
This is a bit tricky: since ->notifier_lock is held while calling dma_fence_wait, we must ensure that the read side (i.e. dma_fence_begin_signalling) is on the same side as well. If we mix this up, lockdep complains, and that's again why we want to have these annotations.
A nice side effect of this is that, because of the fs_reclaim priming for dma_fence_enable, lockdep now automatically checks for us that nothing in here allocates memory, without even running any userptr workloads.
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index d50b63a93d37..3b3999225e31 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1212,6 +1212,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, struct amdgpu_job *job; uint64_t seq; int r; + bool fence_cookie;
job = p->job; p->job = NULL; @@ -1226,6 +1227,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, */ mutex_lock(&p->adev->notifier_lock);
+ fence_cookie = dma_fence_begin_signalling(); + /* If userptr are invalidated after amdgpu_cs_parser_bos(), return * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl. */ @@ -1262,12 +1265,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence); + dma_fence_end_signalling(fence_cookie); mutex_unlock(&p->adev->notifier_lock);
return 0;
error_abort: drm_sched_job_cleanup(&job->base); + dma_fence_end_signalling(fence_cookie); mutex_unlock(&p->adev->notifier_lock);
error_unlock:
My dma-fence lockdep annotations caught an inversion because we allocate memory where we really shouldn't:
kmem_cache_alloc+0x2b/0x6d0 amdgpu_fence_emit+0x30/0x330 [amdgpu] amdgpu_ib_schedule+0x306/0x550 [amdgpu] amdgpu_job_run+0x10f/0x260 [amdgpu] drm_sched_main+0x1b9/0x490 [gpu_sched] kthread+0x12e/0x150
Trouble right now is that lockdep only validates against GFP_FS, which would be good enough for shrinkers. But for mmu_notifiers we actually need !GFP_ATOMIC, since they can be called from any page laundering, even if GFP_NOFS or GFP_NOIO are set.
I guess we should improve the lockdep annotations for fs_reclaim_acquire/release.
Ofc the real fix is to properly preallocate this fence and stuff it into the amdgpu job structure (a rough sketch follows below). But GFP_ATOMIC gets the lockdep splat out of the way.
v2: Two more allocations in scheduler paths.
First one:
__kmalloc+0x58/0x720 amdgpu_vmid_grab+0x100/0xca0 [amdgpu] amdgpu_job_dependency+0xf9/0x120 [amdgpu] drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched] drm_sched_main+0xf9/0x490 [gpu_sched]
Second one:
kmem_cache_alloc+0x2b/0x6d0 amdgpu_sync_fence+0x7e/0x110 [amdgpu] amdgpu_vmid_grab+0x86b/0xca0 [amdgpu] amdgpu_job_dependency+0xf9/0x120 [amdgpu] drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched] drm_sched_main+0xf9/0x490 [gpu_sched]
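The "real fix" mentioned above could look roughly like this — a hypothetical sketch (struct and slab names made up, not the actual amdgpu code), showing the fence being allocated at submit time so the scheduler's run path never hits the allocator:

#include <linux/slab.h>

/* All names here are hypothetical, for illustration only. */
struct my_fence;

static struct kmem_cache *my_fence_slab;

struct my_job {
	struct my_fence *fence;	/* preallocated, consumed by the run path */
};

static int my_job_init(struct my_job *job)
{
	/*
	 * Allocate in the submit ioctl, where blocking GFP_KERNEL
	 * allocations are still legal, instead of in job_run(),
	 * which sits inside the fence signalling critical section.
	 */
	job->fence = kmem_cache_zalloc(my_fence_slab, GFP_KERNEL);
	if (!job->fence)
		return -ENOMEM;

	return 0;
}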
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index fe2d495d08ab..09614b325b5f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, uint32_t seq; int r;
- fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL); + fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC); if (fence == NULL) return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c index 7521f4ab55de..2a4cde7cd746 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm, if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait)) return amdgpu_sync_fence(sync, ring->vmid_wait);
- fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL); + fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC); if (!fences) return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c index 8ea6c49529e7..af22b526cec9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f) if (amdgpu_sync_add_later(sync, f)) return 0;
- e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL); + e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC); if (!e) return -ENOMEM;
In the face of unprivileged userspace being able to submit bogus gpu workloads, the kernel needs gpu timeout and reset (tdr) to guarantee that dma_fences actually complete. Annotate this worker to make sure we don't have any accidental locking inversions or other problems lurking.
Originally this was part of the overall scheduler annotation patch. But amdgpu has some glorious inversions here:
- grabs console_lock
- does a full modeset, which grabs all kinds of locks (drm_modeset_lock, dma_resv_lock) which can deadlock with dma_fence_wait held inside them
- almost minor at that point, but the modeset code also allocates memory
These all look like they'll be very hard to fix properly; the hardware seems to require a full display reset with any gpu recovery.
Hence split out as a separate patch.
Since amdgpu isn't the only hardware driver that needs to reset the display (at least gen2/3 on intel have the same problem) we need a generic solution for this. There's two tricks we could steal from drm/i915 and lift to dma-fence:
- The big whack, aka force-complete all fences. i915 does this for all pending jobs if the reset is somehow stuck. Trouble is we'd need to do this for all fences in the entire system, and just the book-keeping for that will be fun. Plus lots of drivers use fences for all kinds of internal stuff like memory management, so unconditionally resetting all of them doesn't work.
I'm also hoping that with these fence annotations we could enlist lockdep in finding the last offenders causing deadlocks, and we could remove this get-out-of-jail trick.
- The more feasible approach (across drivers at least as part of the dma_fence contract) is what drm/i915 does for gen2/3: When we need to reset the display we wake up all dma_fence_wait_interruptible calls, or well at least the equivalent of those in i915 internally.
Relying on ioctl restart we force all other threads to release their locks, which means the tdr thread is guaranteed to be able to get them. I think we could implement this at the dma_fence level, including proper lockdep annotations.
dma_fence_begin_tdr():
- must be nested within a dma_fence_begin/end_signalling section
- will wake up all interruptible (but not the non-interruptible) dma_fence_wait() calls and force them to complete with a -ERESTARTSYS errno code. All new interruptible calls to dma_fence_wait() will immediately fail with the same error code.
dma_fence_end_tdr():
- this will convert dma_fence_wait() calls back to normal.
Of course interrupting dma_fence_wait is only ok if the caller specified that, which means we need to split the annotations into interruptible and non-interruptible version. If we then make sure that we only use interruptible dma_fence_wait() calls while holding drm_modeset_lock we can grab them in tdr code, and allow display resets. Doing the same for dma_resv_lock might be a lot harder, so buffer updates must be avoided.
What's worse, we're not going to be able to make the dma_fence_wait calls in mmu-notifiers interruptible; that doesn't work. So allocating memory still won't be allowed, even in tdr sections. Plus obviously we can use this trick only in tdr, it is rather intrusive.
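None of this exists; purely as a strawman, usage of the proposed calls could look like the following sketch — everything here is hypothetical, including dma_fence_begin/end_tdr() and my_reset_display_and_gpu():

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

/* Hypothetical tdr flow, assuming the dma_fence_begin/end_tdr()
 * API sketched above actually existed. */
static void my_tdr_work(struct work_struct *work)
{
	/* tdr must nest inside a fence signalling critical section. */
	bool fence_cookie = dma_fence_begin_signalling();
	bool tdr_cookie = dma_fence_begin_tdr();

	/*
	 * Interruptible dma_fence_wait() calls now return -ERESTARTSYS,
	 * so other threads back off their drm_modeset_lock and we can
	 * take it here to reset the display together with the gpu.
	 */
	my_reset_display_and_gpu();

	dma_fence_end_tdr(tdr_cookie);
	dma_fence_end_signalling(fence_cookie);
}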
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/scheduler/sched_main.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index f69abc4e70d3..ae0d5ceca49a 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -281,9 +281,12 @@ static void drm_sched_job_timedout(struct work_struct *work) { struct drm_gpu_scheduler *sched; struct drm_sched_job *job; + bool fence_cookie;
sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+ fence_cookie = dma_fence_begin_signalling(); + /* Protects against concurrent deletion in drm_sched_get_cleanup_job */ spin_lock(&sched->job_list_lock); job = list_first_entry_or_null(&sched->ring_mirror_list, @@ -315,6 +318,8 @@ static void drm_sched_job_timedout(struct work_struct *work) spin_lock(&sched->job_list_lock); drm_sched_start_timeout(sched); spin_unlock(&sched->job_list_lock); + + dma_fence_end_signalling(fence_cookie); }
/**
To improve coverage, also annotate the gpu reset code itself, since that's called from other places than drm/scheduler (which is already annotated). Annotations nest, so this doesn't break anything, and it allows easier testing.
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e8b41756c9f9..029a026ecfa9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4496,6 +4496,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, int i, r = 0; bool need_emergency_restart = false; bool audio_suspended = false; + bool fence_cookie; + + fence_cookie = dma_fence_begin_signalling();
/** * Special case: RAS triggered and full reset isn't supported @@ -4529,6 +4532,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", job ? job->base.id : -1, hive->hive_id); amdgpu_put_xgmi_hive(hive); + dma_fence_end_signalling(fence_cookie); return 0; } mutex_lock(&hive->hive_lock); @@ -4541,8 +4545,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, */ INIT_LIST_HEAD(&device_list); if (adev->gmc.xgmi.num_physical_nodes > 1) { - if (!hive) + if (!hive) { + dma_fence_end_signalling(fence_cookie); return -ENODEV; + } if (!list_is_first(&adev->gmc.xgmi.head, &hive->device_list)) list_rotate_to_front(&adev->gmc.xgmi.head, &hive->device_list); device_list_handle = &hive->device_list; @@ -4556,8 +4562,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, if (!amdgpu_device_lock_adev(tmp_adev, hive)) { dev_info(tmp_adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress", job ? job->base.id : -1); - r = 0; - goto skip_recovery; }
/* @@ -4699,6 +4703,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (r) dev_info(adev->dev, "GPU reset end with ret = %d\n", r); + dma_fence_end_signalling(fence_cookie); return r; }
This is one from the department of "maybe play lottery if you hit this, karma compensation might work". Or at least lockdep ftw!
This reverts commit 565d1941557756a584ac357d945bc374d5fcd1d0.
It's not quite as low-risk as the commit message claims, because this grabs console_lock, which might be held when we allocate memory, which might never happen because the dma_fence_wait() is stuck waiting on our gpu reset:
[ 136.763714] ====================================================== [ 136.763714] WARNING: possible circular locking dependency detected [ 136.763715] 5.7.0-rc3+ #346 Tainted: G W [ 136.763716] ------------------------------------------------------ [ 136.763716] kworker/2:3/682 is trying to acquire lock: [ 136.763716] ffffffff8226f140 (console_lock){+.+.}-{0:0}, at: drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763723] but task is already holding lock: [ 136.763724] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763726] which lock already depends on the new lock.
[ 136.763726] the existing dependency chain (in reverse order) is: [ 136.763727] -> #2 (dma_fence_map){++++}-{0:0}: [ 136.763730] __dma_fence_might_wait+0x41/0xb0 [ 136.763732] dma_resv_lockdep+0x171/0x202 [ 136.763734] do_one_initcall+0x5d/0x2f0 [ 136.763736] kernel_init_freeable+0x20d/0x26d [ 136.763738] kernel_init+0xa/0xfb [ 136.763740] ret_from_fork+0x27/0x50 [ 136.763740] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 136.763743] fs_reclaim_acquire.part.0+0x25/0x30 [ 136.763745] kmem_cache_alloc_trace+0x2e/0x6e0 [ 136.763747] device_create_groups_vargs+0x52/0xf0 [ 136.763747] device_create+0x49/0x60 [ 136.763749] fb_console_init+0x25/0x145 [ 136.763750] fbmem_init+0xcc/0xe2 [ 136.763750] do_one_initcall+0x5d/0x2f0 [ 136.763751] kernel_init_freeable+0x20d/0x26d [ 136.763752] kernel_init+0xa/0xfb [ 136.763753] ret_from_fork+0x27/0x50 [ 136.763753] -> #0 (console_lock){+.+.}-{0:0}: [ 136.763755] __lock_acquire+0x1241/0x23f0 [ 136.763756] lock_acquire+0xad/0x370 [ 136.763757] console_lock+0x47/0x70 [ 136.763761] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763809] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.763850] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.763851] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.763852] process_one_work+0x23c/0x580 [ 136.763853] worker_thread+0x50/0x3b0 [ 136.763854] kthread+0x12e/0x150 [ 136.763855] ret_from_fork+0x27/0x50 [ 136.763855] other info that might help us debug this:
[ 136.763856] Chain exists of: console_lock --> fs_reclaim --> dma_fence_map
[ 136.763857] Possible unsafe locking scenario:
[ 136.763857] CPU0 CPU1 [ 136.763857] ---- ---- [ 136.763857] lock(dma_fence_map); [ 136.763858] lock(fs_reclaim); [ 136.763858] lock(dma_fence_map); [ 136.763858] lock(console_lock); [ 136.763859] *** DEADLOCK ***
[ 136.763860] 4 locks held by kworker/2:3/682: [ 136.763860] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763862] #1: ffffc90000cafe58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763863] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763865] #3: ffff8887ab621748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x5ab/0xe7b [amdgpu] [ 136.763914] stack backtrace: [ 136.763915] CPU: 2 PID: 682 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346 [ 136.763916] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018 [ 136.763918] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 136.763919] Call Trace: [ 136.763922] dump_stack+0x8f/0xd0 [ 136.763924] check_noncircular+0x162/0x180 [ 136.763926] __lock_acquire+0x1241/0x23f0 [ 136.763927] lock_acquire+0xad/0x370 [ 136.763932] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763933] ? mark_held_locks+0x2d/0x80 [ 136.763934] ? _raw_spin_unlock_irqrestore+0x46/0x60 [ 136.763936] console_lock+0x47/0x70 [ 136.763940] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763944] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763993] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.764036] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.764038] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.764040] process_one_work+0x23c/0x580 [ 136.764041] worker_thread+0x50/0x3b0 [ 136.764042] ? process_one_work+0x580/0x580 [ 136.764044] kthread+0x12e/0x150 [ 136.764045] ? kthread_create_worker_on_cpu+0x70/0x70 [ 136.764046] ret_from_fork+0x27/0x50
Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 029a026ecfa9..935116614884 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4330,8 +4330,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, if (r) goto out;
- amdgpu_fbdev_set_suspend(tmp_adev, 0); - /* * The GPU enters bad state once faulty pages * by ECC has reached the threshold, and ras @@ -4590,8 +4588,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, */ amdgpu_unregister_gpu_instance(tmp_adev);
- amdgpu_fbdev_set_suspend(tmp_adev, 1); - /* disable ras on ALL IPs */ if (!need_emergency_restart && amdgpu_device_ip_need_full_reset(tmp_adev))
i915 does tons of allocations from this worker, which lockdep catches.
Also, generic infrastructure like this, with big potential impact on how dma_fence and other cross-driver contracts work, really should be reviewed on dri-devel. Implementing custom wheels for everything within the driver is a classic case of a "platform problem" [1], which in upstream we really shouldn't have.
Since there's no quick way to solve these splats (dma_fence_work is used a bunch in basic buffer management and command submission) like there was for amdgpu, I'm giving up at this point. Annotating the i915 scheduler and gpu reset code would be interesting, but since lockdep is one-shot we can't see what surprises would lurk there.
1: https://lwn.net/Articles/443531/ Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Christian König christian.koenig@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/i915/i915_sw_fence_work.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index a3a81bb8f2c3..5b74acadaef5 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -17,12 +17,15 @@ static void fence_work(struct work_struct *work) { struct dma_fence_work *f = container_of(work, typeof(*f), work); int err; + bool fence_cookie;
+ fence_cookie = dma_fence_begin_signalling(); err = f->ops->work(f); if (err) dma_fence_set_error(&f->dma, err);
fence_complete(f); + dma_fence_end_signalling(fence_cookie); dma_fence_put(&f->dma); }
This reverts commit 7fbc6eca6f0db8244cddc39471d71471a2f30254.
I need working lockdep back. --- drivers/gpu/drm/i915/i915_sw_fence_work.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index 5b74acadaef5..a3a81bb8f2c3 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -17,15 +17,12 @@ static void fence_work(struct work_struct *work) { struct dma_fence_work *f = container_of(work, typeof(*f), work); int err; - bool fence_cookie;
- fence_cookie = dma_fence_begin_signalling(); err = f->ops->work(f); if (err) dma_fence_set_error(&f->dma, err);
fence_complete(f); - dma_fence_end_signalling(fence_cookie); dma_fence_put(&f->dma); }
Purely conjecture, but I think the original locking inversion with the legacy page flip code, between flipping and ttm's bo move function, shouldn't exist anymore with atomic: with atomic, the bo pinning and the actual modeset commit are completely separated in the code paths.
This annotation was originally added in
commit 060810d7abaabcab282e062c595871d661561400 Author: Ben Skeggs bskeggs@redhat.com Date: Mon Jul 8 14:15:51 2013 +1000
drm/nouveau: fix locking issues in page flipping paths
due to
commit b580c9e2b7ba5030a795aa2fb73b796523d65a78 Author: Maarten Lankhorst m.b.lankhorst@gmail.com Date: Thu Jun 27 13:48:18 2013 +0200
drm/nouveau: make flipping lockdep safe
Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst m.b.lankhorst@gmail.com Cc: Ben Skeggs bskeggs@redhat.com Cc: Dave Airlie airlied@gmail.com Cc: nouveau@lists.freedesktop.org --- I might be totally wrong, so this definitely needs testing :-)
Cheers, Daniel --- drivers/gpu/drm/nouveau/nouveau_bo.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 70b6f3b1ae85..c04a808664f8 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -775,7 +775,10 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, return ret; }
- mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING); + if (drm_drv_uses_atomic_modeset(drm->dev)) + mutex_lock(&cli->mutex); + else + mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING); ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, ctx->interruptible); if (ret == 0) { ret = drm->ttm.move(chan, bo, &bo->mem, new_reg);
This isn't actually protecting anything, because:
- when running, ttm_resource_manager->use_type is protected through vmw_private->reservation_semaphore against concurrent execbuf or, well, anything else that might evict or reserve buffers
- during suspend/resume there's nothing else running, hence vmw_pm_freeze and vmw_pm_restore do not need to take the same lock
- this also holds for the SVGA_REG_ENABLE register write
Hence it is safe to just remove that spinlock.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: VMware Graphics linux-graphics-maintainer@vmware.com Cc: Roland Scheidegger sroland@vmware.com --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 10 +--------- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1 - 2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 4860370740e0..6ecdd1df311b 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -677,7 +677,6 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset) spin_lock_init(&dev_priv->hw_lock); spin_lock_init(&dev_priv->waiter_lock); spin_lock_init(&dev_priv->cap_lock); - spin_lock_init(&dev_priv->svga_lock); spin_lock_init(&dev_priv->cursor_lock);
for (i = vmw_res_context; i < vmw_res_max; ++i) { @@ -1194,12 +1193,10 @@ static void __vmw_svga_enable(struct vmw_private *dev_priv) { struct ttm_resource_manager *man = ttm_manager_type(&dev_priv->bdev, TTM_PL_VRAM);
- spin_lock(&dev_priv->svga_lock); if (!ttm_resource_manager_used(man)) { vmw_write(dev_priv, SVGA_REG_ENABLE, SVGA_REG_ENABLE); ttm_resource_manager_set_used(man, true); } - spin_unlock(&dev_priv->svga_lock); }
/** @@ -1225,14 +1222,12 @@ static void __vmw_svga_disable(struct vmw_private *dev_priv) { struct ttm_resource_manager *man = ttm_manager_type(&dev_priv->bdev, TTM_PL_VRAM);
- spin_lock(&dev_priv->svga_lock); if (ttm_resource_manager_used(man)) { ttm_resource_manager_set_used(man, false); vmw_write(dev_priv, SVGA_REG_ENABLE, SVGA_REG_ENABLE_HIDE | SVGA_REG_ENABLE_ENABLE); } - spin_unlock(&dev_priv->svga_lock); }
/** @@ -1259,17 +1254,14 @@ void vmw_svga_disable(struct vmw_private *dev_priv) */ vmw_kms_lost_device(dev_priv->dev); ttm_write_lock(&dev_priv->reservation_sem, false); - spin_lock(&dev_priv->svga_lock); if (ttm_resource_manager_used(man)) { ttm_resource_manager_set_used(man, false); - spin_unlock(&dev_priv->svga_lock); if (ttm_resource_manager_evict_all(&dev_priv->bdev, man)) DRM_ERROR("Failed evicting VRAM buffers.\n"); vmw_write(dev_priv, SVGA_REG_ENABLE, SVGA_REG_ENABLE_HIDE | SVGA_REG_ENABLE_ENABLE); - } else - spin_unlock(&dev_priv->svga_lock); + } ttm_write_unlock(&dev_priv->reservation_sem); }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index b45becbb00f8..266096728520 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -597,7 +597,6 @@ struct vmw_private {
bool stealth; bool enable_fb; - spinlock_t svga_lock;
/** * PM management.
The other way round is a bit inconsistent (though not buggy in any way). This is prep work so that ttm_resource_manager_set_used can assert that the resource manager is empty.
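In other words, the disable path should drain the manager first and only then mark it unused, so that a later assert in ttm_resource_manager_set_used can rely on empty LRU lists. A sketch of the intended ordering:

	/* empty the LRU lists first ... */
	if (ttm_resource_manager_evict_all(&dev_priv->bdev, man))
		DRM_ERROR("Failed evicting VRAM buffers.\n");
	/* ... then mark the manager unused; it is now provably empty */
	ttm_resource_manager_set_used(man, false);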
Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 6ecdd1df311b..68206d0a1237 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1255,9 +1255,9 @@ void vmw_svga_disable(struct vmw_private *dev_priv) vmw_kms_lost_device(dev_priv->dev); ttm_write_lock(&dev_priv->reservation_sem, false); if (ttm_resource_manager_used(man)) { - ttm_resource_manager_set_used(man, false); if (ttm_resource_manager_evict_all(&dev_priv->bdev, man)) DRM_ERROR("Failed evicting VRAM buffers.\n"); + ttm_resource_manager_set_used(man, false); vmw_write(dev_priv, SVGA_REG_ENABLE, SVGA_REG_ENABLE_HIDE | SVGA_REG_ENABLE_ENABLE);
ttm_resource_manager->use_type is only used for runtime changes by vmwgfx. I think ideally we'd push this functionality into drivers - ttm itself does not provide any locking to guarantee this is safe, so the only way this can work at runtime is if the driver provides additional guarantees. vmwgfx does that through vmw_private->reservation_sem. Therefore supporting this feature in shared code feels a bit misplaced.
As a first step add a WARN_ON to make sure the resource manager is empty. This is just to make sure I actually understand correctly what vmwgfx is doing, and to make sure an eventual subsequent refactor doesn't break anything.
This check should also be useful for other drivers, to make sure they haven't leaked anything.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Christian Koenig christian.koenig@amd.com Cc: Huang Rui ray.huang@amd.com --- include/drm/ttm/ttm_resource.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h index f48a70d39ac5..789ec477b607 100644 --- a/include/drm/ttm/ttm_resource.h +++ b/include/drm/ttm/ttm_resource.h @@ -191,6 +191,10 @@ struct ttm_resource { static inline void ttm_resource_manager_set_used(struct ttm_resource_manager *man, bool used) { + int i; + + for (i = 0; i < TTM_MAX_BO_PRIORITY; i++) + WARN_ON(!list_empty(&man->lru[i])); man->use_type = used; }
On 23.10.20 14:21, Daniel Vetter wrote:
> ttm_resource_manager->use_type is only used for runtime changes by vmwgfx. [...]
I'm pretty sure that this will trigger for vmwgfx. But that's what it is supposed to do, isn't it?
Reviewed-by: Christian König christian.koenig@amd.com
On Fri, Oct 23, 2020 at 4:54 PM Christian König christian.koenig@amd.com wrote:
> I'm pretty sure that this will trigger for vmwgfx. But that's what it is supposed to do, isn't it?
Yeah, this is an accidental dump of my wip pile, and it's not done yet at all. Please disregard (at least for now). -Daniel
On Fri, Oct 23, 2020 at 04:56:20PM +0200, Daniel Vetter wrote:
>> I'm pretty sure that this will trigger for vmwgfx. But that's what it is supposed to do, isn't it?
> Yeah, this is an accidental dump of my wip pile, and it's not done yet at all. Please disregard (at least for now). -Daniel
>> Reviewed-by: Christian König christian.koenig@amd.com
Ok decided to submit these 3 patches finally, including the 2 vmwgfx fixes which should avoid the splat. I included your r-b, pls complain if that's not ok anymore.
Thanks, Daniel
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Way back it was a reasonable assumption that iomem mappings never change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carveouts to cma regions. This means if we miss the unmap the pfn might contain pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
- even /dev/mem now invalidates mappings when the kernel requests that iomem region when CONFIG_IO_STRICT_DEVMEM is set, see commit 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")
Accessing pfns obtained from ptes without holding all the locks is therefore no longer a good idea. Fix this.
Since zpci_memcpy_from|toio seems to not do anything nefarious with locks, we just need to open code get_pfn and follow_pfn and make sure we drop the locks only after we're done. The write function also needs the copy_from_user moved earlier, since we can't take userspace faults while holding the mmap sem.
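Distilled, the new write-side sequence looks roughly like this (a sketch with the error unwinding trimmed; the full version is in the diff below):

	mmap_read_lock(current->mm);
	vma = find_vma(current->mm, mmio_addr);
	if (vma && (vma->vm_flags & (VM_IO | VM_PFNMAP)) &&
	    (vma->vm_flags & VM_WRITE) &&
	    !follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl)) {
		io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
					   (mmio_addr & ~PAGE_MASK));
		ret = zpci_memcpy_toio(io_addr, buf, length);
		/* the pfn stays valid until the pagetable lock is dropped */
		pte_unmap_unlock(ptep, ptl);
	}
	mmap_read_unlock(current->mm);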
v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL like before (Gerald)
v3: Polish commit message (Niklas)
Reviewed-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Dan Williams dan.j.williams@intel.com Cc: Kees Cook keescook@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Niklas Schnelle schnelle@linux.ibm.com Cc: Gerald Schaefer gerald.schaefer@linux.ibm.com Cc: linux-s390@vger.kernel.org Cc: Niklas Schnelle schnelle@linux.ibm.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++----------------- 1 file changed, 57 insertions(+), 41 deletions(-)
diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c index 401cf670a243..1a6adbc68ee8 100644 --- a/arch/s390/pci/pci_mmio.c +++ b/arch/s390/pci/pci_mmio.c @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst, return rc; }
-static long get_pfn(unsigned long user_addr, unsigned long access, - unsigned long *pfn) -{ - struct vm_area_struct *vma; - long ret; - - mmap_read_lock(current->mm); - ret = -EINVAL; - vma = find_vma(current->mm, user_addr); - if (!vma) - goto out; - ret = -EACCES; - if (!(vma->vm_flags & access)) - goto out; - ret = follow_pfn(vma, user_addr, pfn); -out: - mmap_read_unlock(current->mm); - return ret; -} - SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr, const void __user *, user_buffer, size_t, length) { u8 local_buf[64]; void __iomem *io_addr; void *buf; - unsigned long pfn; + struct vm_area_struct *vma; + pte_t *ptep; + spinlock_t *ptl; long ret;
if (!zpci_is_enabled()) @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr, * We only support write access to MIO capable devices if we are on * a MIO enabled system. Otherwise we would have to check for every * address if it is a special ZPCI_ADDR and would have to do - * a get_pfn() which we don't need for MIO capable devices. Currently + * a pfn lookup which we don't need for MIO capable devices. Currently * ISM devices are the only devices without MIO support and there is no * known need for accessing these from userspace. */ @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr, } else buf = local_buf;
- ret = get_pfn(mmio_addr, VM_WRITE, &pfn); + ret = -EFAULT; + if (copy_from_user(buf, user_buffer, length)) + goto out_free; + + mmap_read_lock(current->mm); + ret = -EINVAL; + vma = find_vma(current->mm, mmio_addr); + if (!vma) + goto out_unlock_mmap; + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) + goto out_unlock_mmap; + ret = -EACCES; + if (!(vma->vm_flags & VM_WRITE)) + goto out_unlock_mmap; + + ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl); if (ret) - goto out; - io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | + goto out_unlock_mmap; + + io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
- ret = -EFAULT; if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) - goto out; - - if (copy_from_user(buf, user_buffer, length)) - goto out; + goto out_unlock_pt;
ret = zpci_memcpy_toio(io_addr, buf, length); -out: +out_unlock_pt: + pte_unmap_unlock(ptep, ptl); +out_unlock_mmap: + mmap_read_unlock(current->mm); +out_free: if (buf != local_buf) kfree(buf); return ret; @@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr, u8 local_buf[64]; void __iomem *io_addr; void *buf; - unsigned long pfn; + struct vm_area_struct *vma; + pte_t *ptep; + spinlock_t *ptl; long ret;
if (!zpci_is_enabled()) @@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr, * We only support read access to MIO capable devices if we are on * a MIO enabled system. Otherwise we would have to check for every * address if it is a special ZPCI_ADDR and would have to do - * a get_pfn() which we don't need for MIO capable devices. Currently + * a pfn lookup which we don't need for MIO capable devices. Currently * ISM devices are the only devices without MIO support and there is no * known need for accessing these from userspace. */ @@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr, buf = local_buf; }
- ret = get_pfn(mmio_addr, VM_READ, &pfn); + mmap_read_lock(current->mm); + ret = -EINVAL; + vma = find_vma(current->mm, mmio_addr); + if (!vma) + goto out_unlock_mmap; + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) + goto out_unlock_mmap; + ret = -EACCES; + if (!(vma->vm_flags & VM_WRITE)) + goto out_unlock_mmap; + + ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl); if (ret) - goto out; - io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK)); + goto out_unlock_mmap; + + io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) | + (mmio_addr & ~PAGE_MASK));
if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) { ret = -EFAULT; - goto out; + goto out_unlock_pt; } ret = zpci_memcpy_fromio(buf, io_addr, length); - if (ret) - goto out; - if (copy_to_user(user_buffer, buf, length)) + +out_unlock_pt: + pte_unmap_unlock(ptep, ptl); +out_unlock_mmap: + mmap_read_unlock(current->mm); + + if (!ret && copy_to_user(user_buffer, buf, length)) ret = -EFAULT;
-out: if (buf != local_buf) kfree(buf); return ret;
On Fri, Oct 23, 2020 at 02:21:40PM +0200, Daniel Vetter wrote:
> Way back it was a reasonable assumption that iomem mappings never change the pfn range they point at. [...]
Apologies for this random patch bomb, typoed git send-email command :-/
v4 of this will come properly on Monday or so. -Daniel
All we need is a pages array; pin_user_pages_fast can give us that directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
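The conversion follows a simple pin/unpin pattern (a sketch distilled from the diff below):

	pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	pinned = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
				     pages);
	if (pinned != npages) {
		/* partial pins must be released before bailing out */
		if (pinned > 0)
			unpin_user_pages(pages, pinned);
		kvfree(pages);
		return pinned < 0 ? pinned : -EFAULT;
	}

	/* ... build the sg table and map the pages ... */

	/* teardown: mark dirty and unpin in one go, then free the array */
	unpin_user_pages_dirty_lock(pages, npages, true);
	kvfree(pages);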
Reviewed-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Inki Dae inki.dae@samsung.com Cc: Joonyoung Shim jy0922.shim@samsung.com Cc: Seung-Woo Kim sw0312.kim@samsung.com Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Kukjin Kim kgene@kernel.org Cc: Krzysztof Kozlowski krzk@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v2: Use unpin_user_pages_dirty_lock (John) --- drivers/gpu/drm/exynos/Kconfig | 1 - drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++-------------- 2 files changed, 20 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig index 6417f374b923..43257ef3c09d 100644 --- a/drivers/gpu/drm/exynos/Kconfig +++ b/drivers/gpu/drm/exynos/Kconfig @@ -88,7 +88,6 @@ comment "Sub-drivers" config DRM_EXYNOS_G2D bool "G2D" depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST - select FRAME_VECTOR help Choose this option if you want to use Exynos G2D for DRM.
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c index 967a5cdc120e..ecede41af9b9 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c @@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr { dma_addr_t dma_addr; unsigned long userptr; unsigned long size; - struct frame_vector *vec; + struct page **pages; + unsigned int npages; struct sg_table *sgt; atomic_t refcount; bool in_pool; @@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d, bool force) { struct g2d_cmdlist_userptr *g2d_userptr = obj; - struct page **pages;
if (!obj) return; @@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d, dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt, DMA_BIDIRECTIONAL, 0);
- pages = frame_vector_pages(g2d_userptr->vec); - if (!IS_ERR(pages)) { - int i; - - for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++) - set_page_dirty_lock(pages[i]); - } - put_vaddr_frames(g2d_userptr->vec); - frame_vector_destroy(g2d_userptr->vec); + unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages, + true); + kvfree(g2d_userptr->pages);
if (!g2d_userptr->out_of_list) list_del_init(&g2d_userptr->list); @@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d, offset = userptr & ~PAGE_MASK; end = PAGE_ALIGN(userptr + size); npages = (end - start) >> PAGE_SHIFT; - g2d_userptr->vec = frame_vector_create(npages); - if (!g2d_userptr->vec) { + g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages), + GFP_KERNEL); + if (!g2d_userptr->pages) { ret = -ENOMEM; goto err_free; }
- ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE, - g2d_userptr->vec); + ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE, + g2d_userptr->pages); if (ret != npages) { DRM_DEV_ERROR(g2d->dev, "failed to get user pages from userptr.\n"); if (ret < 0) - goto err_destroy_framevec; - ret = -EFAULT; - goto err_put_framevec; - } - if (frame_vector_to_pages(g2d_userptr->vec) < 0) { + goto err_destroy_pages; + npages = ret; ret = -EFAULT; - goto err_put_framevec; + goto err_unpin_pages; } + g2d_userptr->npages = npages;
sgt = kzalloc(sizeof(*sgt), GFP_KERNEL); if (!sgt) { ret = -ENOMEM; - goto err_put_framevec; + goto err_unpin_pages; }
ret = sg_alloc_table_from_pages(sgt, - frame_vector_pages(g2d_userptr->vec), + g2d_userptr->pages, npages, offset, size, GFP_KERNEL); if (ret < 0) { DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n"); @@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d, err_free_sgt: kfree(sgt);
-err_put_framevec: - put_vaddr_frames(g2d_userptr->vec); +err_unpin_pages: + unpin_user_pages(g2d_userptr->pages, npages);
-err_destroy_framevec: - frame_vector_destroy(g2d_userptr->vec); +err_destroy_pages: + kvfree(g2d_userptr->pages);
err_free: kfree(g2d_userptr);
The exynos g2d interface is very unusual, but it looks like the userptr objects are persistent. Hence they need FOLL_LONGTERM.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Inki Dae inki.dae@samsung.com Cc: Joonyoung Shim jy0922.shim@samsung.com Cc: Seung-Woo Kim sw0312.kim@samsung.com Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Kukjin Kim kgene@kernel.org Cc: Krzysztof Kozlowski krzk@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c index ecede41af9b9..1e0c5a7f206e 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c @@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d, goto err_free; }
- ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE, + ret = pin_user_pages_fast(start, npages, + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, g2d_userptr->pages); if (ret != npages) { DRM_DEV_ERROR(g2d->dev,
All we need is a pages array; pin_user_pages_fast can give us that directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
Reviewed-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay oded.gabbay@gmail.com Cc: Omer Shpigelman oshpigelman@habana.ai Cc: Ofir Bitton obitton@habana.ai Cc: Tomer Tayar ttayar@habana.ai Cc: Moti Haimovski mhaimovski@habana.ai Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Pawel Piskorski ppiskorski@habana.ai Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v2: Use unpin_user_pages_dirty_lock (John) v3: Update kerneldoc (Oded) --- drivers/misc/habanalabs/Kconfig | 1 - drivers/misc/habanalabs/common/habanalabs.h | 6 ++- drivers/misc/habanalabs/common/memory.c | 49 ++++++++------------- 3 files changed, 22 insertions(+), 34 deletions(-)
diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig index 8eb5d38c618e..2f04187f7167 100644 --- a/drivers/misc/habanalabs/Kconfig +++ b/drivers/misc/habanalabs/Kconfig @@ -6,7 +6,6 @@ config HABANA_AI tristate "HabanaAI accelerators (habanalabs)" depends on PCI && HAS_IOMEM - select FRAME_VECTOR select DMA_SHARED_BUFFER select GENERIC_ALLOCATOR select HWMON diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h index edbd627b29d2..41af090b3e6a 100644 --- a/drivers/misc/habanalabs/common/habanalabs.h +++ b/drivers/misc/habanalabs/common/habanalabs.h @@ -870,7 +870,8 @@ struct hl_ctx_mgr { * struct hl_userptr - memory mapping chunk information * @vm_type: type of the VM. * @job_node: linked-list node for hanging the object on the Job's list. - * @vec: pointer to the frame vector. + * @pages: pointer to struct page array + * @npages: size of @pages array * @sgt: pointer to the scatter-gather table that holds the pages. * @dir: for DMA unmapping, the direction must be supplied, so save it. * @debugfs_list: node in debugfs list of command submissions. @@ -881,7 +882,8 @@ struct hl_ctx_mgr { struct hl_userptr { enum vm_type_t vm_type; /* must be first */ struct list_head job_node; - struct frame_vector *vec; + struct page **pages; + unsigned int npages; struct sg_table *sgt; enum dma_data_direction dir; struct list_head debugfs_list; diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index 5ff4688683fd..327b64479f97 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size, return -EFAULT; }
- userptr->vec = frame_vector_create(npages); - if (!userptr->vec) { + userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages), + GFP_KERNEL); + if (!userptr->pages) { dev_err(hdev->dev, "Failed to create frame vector\n"); return -ENOMEM; }
- rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE, - userptr->vec); + rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE, + userptr->pages);
if (rc != npages) { dev_err(hdev->dev, "Failed to map host memory, user ptr probably wrong\n"); if (rc < 0) - goto destroy_framevec; + goto destroy_pages; + npages = rc; rc = -EFAULT; - goto put_framevec; - } - - if (frame_vector_to_pages(userptr->vec) < 0) { - dev_err(hdev->dev, - "Failed to translate frame vector to pages\n"); - rc = -EFAULT; - goto put_framevec; + goto put_pages; } + userptr->npages = npages;
rc = sg_alloc_table_from_pages(userptr->sgt, - frame_vector_pages(userptr->vec), - npages, offset, size, GFP_ATOMIC); + userptr->pages, + npages, offset, size, GFP_ATOMIC); if (rc < 0) { dev_err(hdev->dev, "failed to create SG table from pages\n"); - goto put_framevec; + goto put_pages; }
return 0;
-put_framevec: - put_vaddr_frames(userptr->vec); -destroy_framevec: - frame_vector_destroy(userptr->vec); +put_pages: + unpin_user_pages(userptr->pages, npages); +destroy_pages: + kvfree(userptr->pages); return rc; }
@@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size, */ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr) { - struct page **pages; - hl_debugfs_remove_userptr(hdev, userptr);
if (userptr->dma_mapped) @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr) userptr->sgt->nents, userptr->dir);
- pages = frame_vector_pages(userptr->vec); - if (!IS_ERR(pages)) { - int i; - - for (i = 0; i < frame_vector_count(userptr->vec); i++) - set_page_dirty_lock(pages[i]); - } - put_vaddr_frames(userptr->vec); - frame_vector_destroy(userptr->vec); + unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true); + kvfree(userptr->pages);
list_del(&userptr->job_node);
These are persistent, not just for the duration of a dma operation.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay oded.gabbay@gmail.com Cc: Omer Shpigelman oshpigelman@habana.ai Cc: Ofir Bitton obitton@habana.ai Cc: Tomer Tayar ttayar@habana.ai Cc: Moti Haimovski mhaimovski@habana.ai Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Pawel Piskorski ppiskorski@habana.ai Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/misc/habanalabs/common/memory.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index 327b64479f97..767d3644c033 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size, return -ENOMEM; }
- rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE, + rc = pin_user_pages_fast(start, npages, + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, userptr->pages);
if (rc != npages) {
This is used by media/videobuf2 for persistent dma mappings, not just for a single dma operation that is then freed again, so it needs FOLL_LONGTERM.
Unfortunately the current pin_user_pages_locked() doesn't support FOLL_LONGTERM due to locking issues. Rework the code to pull the pin_user_pages path out of the mmap_sem critical section, as suggested by Jason.
By relying entirely on the vma checks in pin_user_pages and follow_pfn (for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
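The resulting control flow in get_vaddr_frames() is roughly (sketch only; the real version is in the diff below):

	/*
	 * Fast path for page-backed memory: pin entirely outside of
	 * mmap_read_lock, which is what makes FOLL_LONGTERM possible.
	 */
	ret = pin_user_pages_fast(start, nr_frames,
				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
				  (struct page **)(vec->ptrs));
	if (ret > 0) {
		vec->got_ref = true;
		vec->is_pfns = false;
		return ret;
	}

	/*
	 * Slow path for VM_IO | VM_PFNMAP mappings: raw pfn walk under the
	 * mmap lock; pin_user_pages_fast has already rejected these vmas.
	 */
	mmap_read_lock(mm);
	/* ... find_vma_intersection() + follow_pfn() loop ... */
	mmap_read_unlock(mm);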
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Pawel Osciak pawel@osciak.com Cc: Marek Szyprowski m.szyprowski@samsung.com Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Tomasz Figa tfiga@chromium.org Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v2: Streamline the code and further simplify the loop checks (Jason) --- mm/frame_vector.c | 50 ++++++++++++++--------------------------------- 1 file changed, 15 insertions(+), 35 deletions(-)
diff --git a/mm/frame_vector.c b/mm/frame_vector.c index 10f82d5643b6..d44779e56313 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, struct vm_area_struct *vma; int ret = 0; int err; - int locked;
if (nr_frames == 0) return 0; @@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
start = untagged_addr(start);
- mmap_read_lock(mm); - locked = 1; - vma = find_vma_intersection(mm, start, start + 1); - if (!vma) { - ret = -EFAULT; - goto out; - } - - /* - * While get_vaddr_frames() could be used for transient (kernel - * controlled lifetime) pinning of memory pages all current - * users establish long term (userspace controlled lifetime) - * page pinning. Treat get_vaddr_frames() like - * get_user_pages_longterm() and disallow it for filesystem-dax - * mappings. - */ - if (vma_is_fsdax(vma)) { - ret = -EOPNOTSUPP; - goto out; - } - - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { + ret = pin_user_pages_fast(start, nr_frames, + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, + (struct page **)(vec->ptrs)); + if (ret > 0) { vec->got_ref = true; vec->is_pfns = false; - ret = pin_user_pages_locked(start, nr_frames, - gup_flags, (struct page **)(vec->ptrs), &locked); - goto out; + goto out_unlocked; }
+ mmap_read_lock(mm); vec->got_ref = false; vec->is_pfns = true; do { unsigned long *nums = frame_vector_pfns(vec);
+ vma = find_vma_intersection(mm, start, start + 1); + if (!vma) + break; + while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) { err = follow_pfn(vma, start, &nums[ret]); if (err) { @@ -92,17 +76,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, start += PAGE_SIZE; ret++; } - /* - * We stop if we have enough pages or if VMA doesn't completely - * cover the tail page. - */ - if (ret >= nr_frames || start < vma->vm_end) + /* Bail out if VMA doesn't completely cover the tail page. */ + if (start < vma->vm_end) break; - vma = find_vma_intersection(mm, start, start + 1); - } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)); + } while (ret < nr_frames); out: - if (locked) - mmap_read_unlock(mm); + mmap_read_unlock(mm); +out_unlocked: if (!ret) ret = -EFAULT; if (ret > 0)
videobuf2 is the only remaining user, so move the frame vector code into the media subsystem. This also garbage collects the CONFIG_FRAME_VECTOR symbol from all over the tree (well, just one other place: somehow the omap media driver still had it in its Kconfig, despite not using it).
Reviewed-by: John Hubbard jhubbard@nvidia.com Acked-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Pawel Osciak pawel@osciak.com Cc: Marek Szyprowski m.szyprowski@samsung.com Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Tomasz Figa tfiga@chromium.org Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v3: - Create a new frame_vector.h header for this (Mauro) --- drivers/media/common/videobuf2/Kconfig | 1 - drivers/media/common/videobuf2/Makefile | 1 + .../media/common/videobuf2}/frame_vector.c | 2 + drivers/media/platform/omap/Kconfig | 1 - include/linux/mm.h | 42 ----------------- include/media/frame_vector.h | 47 +++++++++++++++++++ include/media/videobuf2-core.h | 1 + mm/Kconfig | 3 -- mm/Makefile | 1 - 9 files changed, 51 insertions(+), 48 deletions(-) rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%) create mode 100644 include/media/frame_vector.h
diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig index edbc99ebba87..d2223a12c95f 100644 --- a/drivers/media/common/videobuf2/Kconfig +++ b/drivers/media/common/videobuf2/Kconfig @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
config VIDEOBUF2_MEMOPS tristate - select FRAME_VECTOR
config VIDEOBUF2_DMA_CONTIG tristate diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile index 77bebe8b202f..54306f8d096c 100644 --- a/drivers/media/common/videobuf2/Makefile +++ b/drivers/media/common/videobuf2/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 videobuf2-common-objs := videobuf2-core.o +videobuf2-common-objs += frame_vector.o
ifeq ($(CONFIG_TRACEPOINTS),y) videobuf2-common-objs += vb2-trace.o diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c similarity index 99% rename from mm/frame_vector.c rename to drivers/media/common/videobuf2/frame_vector.c index d44779e56313..6590987c14bd 100644 --- a/mm/frame_vector.c +++ b/drivers/media/common/videobuf2/frame_vector.c @@ -8,6 +8,8 @@ #include <linux/pagemap.h> #include <linux/sched.h>
+#include <media/frame_vector.h> + /** * get_vaddr_frames() - map virtual addresses to pfns * @start: starting user address diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig index f73b5893220d..de16de46c0f4 100644 --- a/drivers/media/platform/omap/Kconfig +++ b/drivers/media/platform/omap/Kconfig @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT depends on VIDEO_V4L2 select VIDEOBUF2_DMA_CONTIG select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3 - select FRAME_VECTOR help V4L2 Display driver support for OMAP2/3 based boards. diff --git a/include/linux/mm.h b/include/linux/mm.h index 16b799a0522c..acd60fbf1a5a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, struct task_struct *task, bool bypass_rlim);
-/* Container for pinned pfns / pages */ -struct frame_vector { - unsigned int nr_allocated; /* Number of frames we have space for */ - unsigned int nr_frames; /* Number of frames stored in ptrs array */ - bool got_ref; /* Did we pin pages by getting page ref? */ - bool is_pfns; /* Does array contain pages or pfns? */ - void *ptrs[]; /* Array of pinned pfns / pages. Use - * pfns_vector_pages() or pfns_vector_pfns() - * for access */ -}; - -struct frame_vector *frame_vector_create(unsigned int nr_frames); -void frame_vector_destroy(struct frame_vector *vec); -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns, - unsigned int gup_flags, struct frame_vector *vec); -void put_vaddr_frames(struct frame_vector *vec); -int frame_vector_to_pages(struct frame_vector *vec); -void frame_vector_to_pfns(struct frame_vector *vec); - -static inline unsigned int frame_vector_count(struct frame_vector *vec) -{ - return vec->nr_frames; -} - -static inline struct page **frame_vector_pages(struct frame_vector *vec) -{ - if (vec->is_pfns) { - int err = frame_vector_to_pages(vec); - - if (err) - return ERR_PTR(err); - } - return (struct page **)(vec->ptrs); -} - -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec) -{ - if (!vec->is_pfns) - frame_vector_to_pfns(vec); - return (unsigned long *)(vec->ptrs); -} - struct kvec; int get_kernel_pages(const struct kvec *iov, int nr_pages, int write, struct page **pages); diff --git a/include/media/frame_vector.h b/include/media/frame_vector.h new file mode 100644 index 000000000000..1ed0cd64510d --- /dev/null +++ b/include/media/frame_vector.h @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _MEDIA_FRAME_VECTOR_H +#define _MEDIA_FRAME_VECTOR_H + +/* Container for pinned pfns / pages in frame_vector.c */ +struct frame_vector { + unsigned int nr_allocated; /* Number of frames we have space for */ + unsigned int nr_frames; /* Number of frames stored in ptrs array */ + bool got_ref; /* Did we pin pages by getting page ref? */ + bool is_pfns; /* Does array contain pages or pfns? */ + void *ptrs[]; /* Array of pinned pfns / pages. Use + * pfns_vector_pages() or pfns_vector_pfns() + * for access */ +}; + +struct frame_vector *frame_vector_create(unsigned int nr_frames); +void frame_vector_destroy(struct frame_vector *vec); +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns, + unsigned int gup_flags, struct frame_vector *vec); +void put_vaddr_frames(struct frame_vector *vec); +int frame_vector_to_pages(struct frame_vector *vec); +void frame_vector_to_pfns(struct frame_vector *vec); + +static inline unsigned int frame_vector_count(struct frame_vector *vec) +{ + return vec->nr_frames; +} + +static inline struct page **frame_vector_pages(struct frame_vector *vec) +{ + if (vec->is_pfns) { + int err = frame_vector_to_pages(vec); + + if (err) + return ERR_PTR(err); + } + return (struct page **)(vec->ptrs); +} + +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec) +{ + if (!vec->is_pfns) + frame_vector_to_pfns(vec); + return (unsigned long *)(vec->ptrs); +} + +#endif /* _MEDIA_FRAME_VECTOR_H */ diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index bbb3f26fbde9..d045e3a5a1d8 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -18,6 +18,7 @@ #include <linux/dma-buf.h> #include <linux/bitops.h> #include <media/media-request.h> +#include <media/frame_vector.h>
#define VB2_MAX_FRAME (32) #define VB2_MAX_PLANES (8) diff --git a/mm/Kconfig b/mm/Kconfig index 6c974888f86f..da6c943fe9f1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -815,9 +815,6 @@ config DEVICE_PRIVATE memory; i.e., memory that is only accessible from the device (or group of devices). You likely also want to select HMM_MIRROR.
-config FRAME_VECTOR - bool - config ARCH_USES_HIGH_VMA_FLAGS bool config ARCH_HAS_PKEYS diff --git a/mm/Makefile b/mm/Makefile index d5649f1c12c0..a025fd6c6afd 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o obj-$(CONFIG_USERFAULTFD) += userfaultfd.o obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
Way back it was a reasonable assumption that iomem mappings never change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carveouts to cma regions. This means if we miss the unmap the pfn might contain pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
- even /dev/mem now invalidates mappings when the kernel requests that iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")
Accessing pfns obtained from ptes without holding all the locks is therefore no longer a good idea. Fix this.
Since ioremap might need to manipulate pagetables too we need to drop the pt lock and have a retry loop if we raced.
While at it, also add kerneldoc and improve the comment for the vm_ops->access function. It's for accessing, not for moving the memory from iomem to system memory, as the old comment seemed to suggest.
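The core of the fix is a sample/recheck loop around the pte (a sketch; error paths and the prot/phys_addr derivation are trimmed, the full version is in the diff below):

	retry:
		if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
			return -EINVAL;
		pte = *ptep;
		pte_unmap_unlock(ptep, ptl);	/* ioremap may sleep/allocate */

		maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);

		if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
			goto out_unmap;
		if (!pte_same(pte, *ptep)) {
			/* the mapping changed under us, start over */
			pte_unmap_unlock(ptep, ptl);
			iounmap(maddr);
			goto retry;
		}
		/* pt lock held: the pfn cannot be revoked during the copy */
		memcpy_toio(maddr + offset, buf, len);
		pte_unmap_unlock(ptep, ptl);
		iounmap(maddr);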
References: 28b2ee20c7cb ("access_process_vm device memory infrastructure") Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Dan Williams dan.j.williams@intel.com Cc: Kees Cook keescook@chromium.org Cc: Benjamin Herrensmidt benh@kernel.crashing.org Cc: Dave Airlie airlied@linux.ie Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v2: Fix inversion in the retry check (John).
v4: While at it, use offset_in_page (Chris Wilson) --- include/linux/mm.h | 3 ++- mm/memory.c | 46 +++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index acd60fbf1a5a..2a16631c1fda 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -566,7 +566,8 @@ struct vm_operations_struct { vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
/* called by access_process_vm when get_user_pages() fails, typically - * for use by special VMAs that can switch between memory and hardware + * for use by special VMAs. See also generic_access_phys() for a generic + * implementation useful for any iomem mapping. */ int (*access)(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write); diff --git a/mm/memory.c b/mm/memory.c index eeae590e526a..1b46eae3b703 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4840,28 +4840,68 @@ int follow_phys(struct vm_area_struct *vma, return ret; }
+/** + * generic_access_phys - generic implementation for iomem mmap access + * @vma: the vma to access + * @addr: userspace address, not relative offset within @vma + * @buf: buffer to read/write + * @len: length of transfer + * @write: set to FOLL_WRITE when writing, otherwise reading + * + * This is a generic implementation for &vm_operations_struct.access for an + * iomem mapping. This callback is used by access_process_vm() when the @vma is + * not page based. + */ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write) { resource_size_t phys_addr; unsigned long prot = 0; void __iomem *maddr; - int offset = addr & (PAGE_SIZE-1); + pte_t *ptep, pte; + spinlock_t *ptl; + int offset = offset_in_page(addr); + int ret = -EINVAL; + + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) + return -EINVAL; + +retry: + if (follow_pte(vma->vm_mm, addr, &ptep, &ptl)) + return -EINVAL; + pte = *ptep; + pte_unmap_unlock(ptep, ptl);
- if (follow_phys(vma, addr, write, &prot, &phys_addr)) + prot = pgprot_val(pte_pgprot(pte)); + phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT; + + if ((write & FOLL_WRITE) && !pte_write(pte)) return -EINVAL;
maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot); if (!maddr) return -ENOMEM;
+ if (follow_pte(vma->vm_mm, addr, &ptep, &ptl)) + goto out_unmap; + + if (!pte_same(pte, *ptep)) { + pte_unmap_unlock(ptep, ptl); + iounmap(maddr); + + goto retry; + } + if (write) memcpy_toio(maddr + offset, buf, len); else memcpy_fromio(buf, maddr + offset, len); + ret = len; + pte_unmap_unlock(ptep, ptl); +out_unmap: iounmap(maddr);
- return len; + return ret; } EXPORT_SYMBOL_GPL(generic_access_phys); #endif
Way back it was a reasonable assumption that iomem mappings never change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carveouts to cma regions. This means if we miss the unmap the pfn might contain pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
- even /dev/mem now invalidates mappings when the kernel requests that iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")
Accessing pfns obtained from ptes without holding all the locks is therefore no longer a good idea.
Unfortunately there are some users where this is not fixable (like v4l userptr of iomem mappings), or where fixing it involves a pile of work (vfio type1 iommu). For now annotate these as unsafe and splat appropriately.
This patch adds an unsafe_follow_pfn, which later patches will then roll out to all appropriate places.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- include/linux/mm.h | 2 ++ mm/memory.c | 32 +++++++++++++++++++++++++++++++- mm/nommu.c | 17 +++++++++++++++++ security/Kconfig | 13 +++++++++++++ 4 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index 2a16631c1fda..ec8c90928fc9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address, pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp); int follow_pfn(struct vm_area_struct *vma, unsigned long address, unsigned long *pfn); +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, + unsigned long *pfn); int follow_phys(struct vm_area_struct *vma, unsigned long address, unsigned int flags, unsigned long *prot, resource_size_t *phys); int generic_access_phys(struct vm_area_struct *vma, unsigned long addr, diff --git a/mm/memory.c b/mm/memory.c index 1b46eae3b703..9a2ec07ff20b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4788,7 +4788,12 @@ EXPORT_SYMBOL(follow_pte_pmd); * @address: user virtual address * @pfn: location to store found PFN * - * Only IO mappings and raw PFN mappings are allowed. + * Only IO mappings and raw PFN mappings are allowed. Note that callers must + * ensure coherency with pte updates by using a &mmu_notifier to follow updates. + * If this is not feasible, or the access to the @pfn is only very short term, + * use follow_pte_pmd() and hold the pagetable lock for the duration of + * the access instead. Any caller not following these requirements must use + * unsafe_follow_pfn() instead. * * Return: zero and the pfn at @pfn on success, -ve otherwise. */ @@ -4811,6 +4816,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, } EXPORT_SYMBOL(follow_pfn);
+/** + * unsafe_follow_pfn - look up PFN at a user virtual address + * @vma: memory mapping + * @address: user virtual address + * @pfn: location to store found PFN + * + * Only IO mappings and raw PFN mappings are allowed. + * + * Returns zero and the pfn at @pfn on success, -ve otherwise. + */ +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, + unsigned long *pfn) +{ +#ifdef CONFIG_STRICT_FOLLOW_PFN + pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n"); + return -EINVAL; +#else + WARN_ONCE(1, "unsafe follow_pfn usage\n"); + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + + return follow_pfn(vma, address, pfn); +#endif +} +EXPORT_SYMBOL(unsafe_follow_pfn); + #ifdef CONFIG_HAVE_IOREMAP_PROT int follow_phys(struct vm_area_struct *vma, unsigned long address, unsigned int flags, diff --git a/mm/nommu.c b/mm/nommu.c index 75a327149af1..3db2910f0d64 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, } EXPORT_SYMBOL(follow_pfn);
+/** + * unsafe_follow_pfn - look up PFN at a user virtual address + * @vma: memory mapping + * @address: user virtual address + * @pfn: location to store found PFN + * + * Only IO mappings and raw PFN mappings are allowed. + * + * Returns zero and the pfn at @pfn on success, -ve otherwise. + */ +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, + unsigned long *pfn) +{ + return follow_pfn(vma, address, pfn); +} +EXPORT_SYMBOL(unsafe_follow_pfn); + LIST_HEAD(vmap_area_list);
void vfree(const void *addr) diff --git a/security/Kconfig b/security/Kconfig index 7561f6f99f1d..48945402e103 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH If you wish for all usermode helper programs to be disabled, specify an empty string here (i.e. "").
+config STRICT_FOLLOW_PFN + bool "Disable unsafe use of follow_pfn" + depends on MMU + help + Some functionality in the kernel follows userspace mappings to iomem + ranges in an unsafe manner. Examples include v4l userptr for zero-copy + buffer sharing. + + If this option is switched on, such access is rejected. Only disable + this option when you must run userspace which requires this. + + If in doubt, say Y. + source "security/selinux/Kconfig" source "security/smack/Kconfig" source "security/tomoyo/Kconfig"
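For illustration, a minimal sketch of what a caller looks like after this change. The foo_* wrapper is hypothetical; only unsafe_follow_pfn() itself comes from this patch:

#include <linux/mm.h>

/*
 * Hypothetical legacy lookup path: resolve a user virtual address in a
 * VM_IO | VM_PFNMAP vma to a pfn, with the caller holding mmap_lock.
 * With CONFIG_STRICT_FOLLOW_PFN this now fails with -EINVAL; without
 * it, it taints the kernel and WARNs once, then behaves like
 * follow_pfn().
 */
static int foo_resolve_user_pfn(struct vm_area_struct *vma,
				unsigned long uaddr, unsigned long *pfn)
{
	return unsafe_follow_pfn(vma, uaddr, pfn);
}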
The media model assumes that buffers are all preallocated, so that when a media pipeline is running we never miss a deadline because the buffers aren't allocated or available.
This means we cannot fix the v4l follow_pfn usage through mmu_notifier, without breaking how this all works. The only real fix is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and tell everyone to cut over to dma-buf memory sharing for zerocopy.
Userptr for normal memory will keep working as-is; this only affects the zerocopy userptr usage enabled in 50ac952d2263 ("[media] videobuf2-dma-sg: Support io userptr operations on io memory").
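For reference, the userspace side of the suggested dma-buf replacement looks roughly like the sketch below. This is only an illustration against the stock V4L2 uapi; the fds and the elided error handling are assumptions, not part of this patch:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Import a dma-buf fd (e.g. exported by a DRM driver) instead of
 * passing a userptr that points at mmap'ed io memory. */
static int import_dmabuf_buffer(int v4l2_fd, int dmabuf_fd)
{
	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;

	memset(&req, 0, sizeof(req));
	req.count = 1;
	req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_DMABUF;
	if (ioctl(v4l2_fd, VIDIOC_REQBUFS, &req) < 0)
		return -1;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory = V4L2_MEMORY_DMABUF;
	buf.index = 0;
	buf.m.fd = dmabuf_fd;
	return ioctl(v4l2_fd, VIDIOC_QBUF, &buf);
}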
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Pawel Osciak pawel@osciak.com Cc: Marek Szyprowski m.szyprowski@samsung.com Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Tomasz Figa tfiga@chromium.org Cc: Laurent Dufour ldufour@linux.ibm.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: Michel Lespinasse walken@google.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v3: - Reference the commit that enabled the zerocopy userptr use case to make it abundantly clear that this patch only affects that, and not normal memory userptr. The old commit message already explained that normal memory userptr is unaffected, but I guess that was not clear enough. --- drivers/media/common/videobuf2/frame_vector.c | 2 +- drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c index 6590987c14bd..e630494da65c 100644 --- a/drivers/media/common/videobuf2/frame_vector.c +++ b/drivers/media/common/videobuf2/frame_vector.c @@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, break;
while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) { - err = follow_pfn(vma, start, &nums[ret]); + err = unsafe_follow_pfn(vma, start, &nums[ret]); if (err) { if (ret == 0) ret = err; diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c index 52312ce2ba05..821c4a76ab96 100644 --- a/drivers/media/v4l2-core/videobuf-dma-contig.c +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem, user_address = untagged_baddr;
while (pages_done < (mem->size >> PAGE_SHIFT)) { - ret = follow_pfn(vma, user_address, &this_pfn); + ret = unsafe_follow_pfn(vma, user_address, &this_pfn); if (ret) break;
The code seems to stuff these pfns into iommu page tables (or something like that, I didn't follow the details), but there's no mmu_notifier to ensure that access is synchronized with pte updates.
Hence mark these as unsafe. This means that with CONFIG_STRICT_FOLLOW_PFN, these will be rejected.
The real fix is to wire up an mmu_notifier ... somehow. Probably that means any invalidation is a fatal fault for this vfio device, but then this shouldn't ever happen if userspace is reasonable.
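To illustrate the shape such a fix could take, a purely hypothetical sketch; the foo_* names and the policy of treating every invalidation as fatal are assumptions, not vfio code:

#include <linux/mmu_notifier.h>

struct foo_device {
	struct mmu_notifier mn;
	/* iommu mapping state would live here */
};

static void foo_stop_dma_in_range(struct foo_device *foo,
				  unsigned long start, unsigned long end)
{
	/* Hypothetical: tear down iommu mappings overlapping [start, end)
	 * and stop the device, since the looked-up pfns may be stale. */
}

static int foo_invalidate_range_start(struct mmu_notifier *mn,
				      const struct mmu_notifier_range *range)
{
	struct foo_device *foo = container_of(mn, struct foo_device, mn);

	foo_stop_dma_in_range(foo, range->start, range->end);
	return 0;
}

static const struct mmu_notifier_ops foo_mn_ops = {
	.invalidate_range_start = foo_invalidate_range_start,
};

/* Registered per-mm, e.g. at open time:
 * foo->mn.ops = &foo_mn_ops; mmu_notifier_register(&foo->mn, mm); */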
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Alex Williamson alex.williamson@redhat.com Cc: Cornelia Huck cohuck@redhat.com Cc: kvm@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/vfio/vfio_iommu_type1.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 5fbf0c1f7433..a4d53f3d0a35 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm, { int ret;
- ret = follow_pfn(vma, vaddr, pfn); + ret = unsafe_follow_pfn(vma, vaddr, pfn); if (ret) { bool unlocked = false;
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm, if (ret) return ret;
- ret = follow_pfn(vma, vaddr, pfn); + ret = unsafe_follow_pfn(vma, vaddr, pfn); }
return ret;
There are three ways to access PCI BARs from userspace: /dev/mem, sysfs files, and the old proc interface. Two of them check against iomem_is_exclusive(), but proc never did. And with CONFIG_IO_STRICT_DEVMEM this starts to matter, since we don't want random userspace having access to PCI BARs while a driver is loaded and using them.
Fix this by adding the same iomem_is_exclusive() check we already have on the sysfs side in pci_mmap_resource().
References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges") Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas bhelgaas@google.com Cc: linux-pci@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v2: Improve commit message (Bjorn) --- drivers/pci/proc.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index d35186b01d98..3a2f90beb4cb 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma) else return -EINVAL; } + + if (dev->resource[i].flags & IORESOURCE_MEM && + iomem_is_exclusive(dev->resource[i].start)) + return -EINVAL; + ret = pci_mmap_page_range(dev, i, vma, fpriv->mmap_state, write_combine); if (ret < 0)
When we care about pagecache maintenance, we need to make sure that both f_mapping and i_mapping point at the right mapping.
But for iomem mappings we only care about the virtual/pte side of things, so f_mapping is enough. Also, setting inode->i_mapping was confusing me as a driver maintainer, since in e.g. drivers/gpu we don't do that. Per Dan this seems to be copypasta from places which do care about pagecache consistency, but it is not needed here. Hence remove it for slightly less confusion.
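As a sketch of why f_mapping alone is enough here (foo_* names are hypothetical; the pattern is the one this series relies on for iomem revocation):

#include <linux/fs.h>
#include <linux/ioport.h>
#include <linux/mm.h>

static struct address_space *foo_shared_mapping;

/* Open: link the file into the shared address_space so its vmas can
 * be found later. i_mapping is left alone - there is no pagecache. */
static int foo_open(struct inode *inode, struct file *filp)
{
	filp->f_mapping = foo_shared_mapping;
	return 0;
}

/* Revoke: zap the ptes of every vma linked into that mapping. */
static void foo_revoke(struct resource *res)
{
	unmap_mapping_range(foo_shared_mapping, res->start,
			    resource_size(res), 1);
}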
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Reviewed-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/char/mem.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c index abd4ffdc8cde..5502f56f3655 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp) * revocations when drivers want to take over a /dev/mem mapped * range. */ - inode->i_mapping = devmem_inode->i_mapping; filp->f_mapping = inode->i_mapping;
return 0;
We want all iomem mmaps to consistently revoke ptes when the kernel takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the pci bar mmaps available through procfs and sysfs, which currently do not revoke mappings.
To prepare for this, move the code from the /dev/mem driver (drivers/char/mem.c) to kernel/resource.c.
Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: David Hildenbrand david@redhat.com Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v3: - add barrier for consistency and document why we don't have to check for NULL (Jason) v4 - Adjust comments to reflect the general nature of this iomem revoke code now (Dan) --- drivers/char/mem.c | 85 +--------------------------------- include/linux/ioport.h | 6 +-- kernel/resource.c | 101 ++++++++++++++++++++++++++++++++++++++++- 3 files changed, 102 insertions(+), 90 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 5502f56f3655..53338aad8d28 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -31,9 +31,6 @@ #include <linux/uio.h> #include <linux/uaccess.h> #include <linux/security.h> -#include <linux/pseudo_fs.h> -#include <uapi/linux/magic.h> -#include <linux/mount.h>
#ifdef CONFIG_IA64 # include <linux/efi.h> @@ -809,42 +806,6 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig) return ret; }
-static struct inode *devmem_inode; - -#ifdef CONFIG_IO_STRICT_DEVMEM -void revoke_devmem(struct resource *res) -{ - /* pairs with smp_store_release() in devmem_init_inode() */ - struct inode *inode = smp_load_acquire(&devmem_inode); - - /* - * Check that the initialization has completed. Losing the race - * is ok because it means drivers are claiming resources before - * the fs_initcall level of init and prevent /dev/mem from - * establishing mappings. - */ - if (!inode) - return; - - /* - * The expectation is that the driver has successfully marked - * the resource busy by this point, so devmem_is_allowed() - * should start returning false, however for performance this - * does not iterate the entire resource range. - */ - if (devmem_is_allowed(PHYS_PFN(res->start)) && - devmem_is_allowed(PHYS_PFN(res->end))) { - /* - * *cringe* iomem=relaxed says "go ahead, what's the - * worst that can happen?" - */ - return; - } - - unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1); -} -#endif - static int open_port(struct inode *inode, struct file *filp) { int rc; @@ -864,7 +825,7 @@ static int open_port(struct inode *inode, struct file *filp) * revocations when drivers want to take over a /dev/mem mapped * range. */ - filp->f_mapping = inode->i_mapping; + filp->f_mapping = iomem_get_mapping();
return 0; } @@ -995,48 +956,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
static struct class *mem_class;
-static int devmem_fs_init_fs_context(struct fs_context *fc) -{ - return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM; -} - -static struct file_system_type devmem_fs_type = { - .name = "devmem", - .owner = THIS_MODULE, - .init_fs_context = devmem_fs_init_fs_context, - .kill_sb = kill_anon_super, -}; - -static int devmem_init_inode(void) -{ - static struct vfsmount *devmem_vfs_mount; - static int devmem_fs_cnt; - struct inode *inode; - int rc; - - rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt); - if (rc < 0) { - pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc); - return rc; - } - - inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb); - if (IS_ERR(inode)) { - rc = PTR_ERR(inode); - pr_err("Cannot allocate inode for /dev/mem: %d\n", rc); - simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt); - return rc; - } - - /* - * Publish /dev/mem initialized. - * Pairs with smp_load_acquire() in revoke_devmem(). - */ - smp_store_release(&devmem_inode, inode); - - return 0; -} - static int __init chr_dev_init(void) { int minor; @@ -1058,8 +977,6 @@ static int __init chr_dev_init(void) */ if ((minor == DEVPORT_MINOR) && !arch_has_dev_port()) continue; - if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0) - continue;
device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor), NULL, devlist[minor].name); diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 6c2b06fe8beb..8ffb61b36606 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -302,11 +302,7 @@ struct resource *devm_request_free_mem_region(struct device *dev, struct resource *request_free_mem_region(struct resource *base, unsigned long size, const char *name);
-#ifdef CONFIG_IO_STRICT_DEVMEM -void revoke_devmem(struct resource *res); -#else -static inline void revoke_devmem(struct resource *res) { }; -#endif +extern struct address_space *iomem_get_mapping(void);
#endif /* __ASSEMBLY__ */ #endif /* _LINUX_IOPORT_H */ diff --git a/kernel/resource.c b/kernel/resource.c index 841737bbda9e..a800acbc578c 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -18,12 +18,15 @@ #include <linux/spinlock.h> #include <linux/fs.h> #include <linux/proc_fs.h> +#include <linux/pseudo_fs.h> #include <linux/sched.h> #include <linux/seq_file.h> #include <linux/device.h> #include <linux/pfn.h> #include <linux/mm.h> +#include <linux/mount.h> #include <linux/resource_ext.h> +#include <uapi/linux/magic.h> #include <asm/io.h>
@@ -1112,6 +1115,58 @@ resource_size_t resource_alignment(struct resource *res)
static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
+static struct inode *iomem_inode; + +#ifdef CONFIG_IO_STRICT_DEVMEM +static void revoke_iomem(struct resource *res) +{ + /* pairs with smp_store_release() in iomem_init_inode() */ + struct inode *inode = smp_load_acquire(&iomem_inode); + + /* + * Check that the initialization has completed. Losing the race + * is ok because it means drivers are claiming resources before + * the fs_initcall level of init and prevent iomem_get_mapping users + * from establishing mappings. + */ + if (!inode) + return; + + /* + * The expectation is that the driver has successfully marked + * the resource busy by this point, so devmem_is_allowed() + * should start returning false, however for performance this + * does not iterate the entire resource range. + */ + if (devmem_is_allowed(PHYS_PFN(res->start)) && + devmem_is_allowed(PHYS_PFN(res->end))) { + /* + * *cringe* iomem=relaxed says "go ahead, what's the + * worst that can happen?" + */ + return; + } + + unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1); +} +struct address_space *iomem_get_mapping(void) +{ + /* + * This function is only called from file open paths, hence guaranteed + * that fs_initcalls have completed and no need to check for NULL. But + * since revoke_iomem can be called before the initcall we still need + * the barrier to appease checkers. + */ + return smp_load_acquire(&iomem_inode)->i_mapping; +} +#else +static void revoke_iomem(struct resource *res) {} +struct address_space *iomem_get_mapping(void) +{ + return NULL; +} +#endif + /** * __request_region - create a new busy resource region * @parent: parent resource descriptor @@ -1179,7 +1234,7 @@ struct resource * __request_region(struct resource *parent, write_unlock(&resource_lock);
if (res && orig_parent == &iomem_resource) - revoke_devmem(res); + revoke_iomem(res);
return res; } @@ -1713,4 +1768,48 @@ static int __init strict_iomem(char *str) return 1; }
+static int iomem_fs_init_fs_context(struct fs_context *fc) +{ + return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM; +} + +static struct file_system_type iomem_fs_type = { + .name = "iomem", + .owner = THIS_MODULE, + .init_fs_context = iomem_fs_init_fs_context, + .kill_sb = kill_anon_super, +}; + +static int __init iomem_init_inode(void) +{ + static struct vfsmount *iomem_vfs_mount; + static int iomem_fs_cnt; + struct inode *inode; + int rc; + + rc = simple_pin_fs(&iomem_fs_type, &iomem_vfs_mount, &iomem_fs_cnt); + if (rc < 0) { + pr_err("Cannot mount iomem pseudo filesystem: %d\n", rc); + return rc; + } + + inode = alloc_anon_inode(iomem_vfs_mount->mnt_sb); + if (IS_ERR(inode)) { + rc = PTR_ERR(inode); + pr_err("Cannot allocate inode for iomem: %d\n", rc); + simple_release_fs(&iomem_vfs_mount, &iomem_fs_cnt); + return rc; + } + + /* + * Publish iomem revocation inode initialized. + * Pairs with smp_load_acquire() in revoke_iomem(). + */ + smp_store_release(&iomem_inode, inode); + + return 0; +} + +fs_initcall(iomem_init_inode); + __setup("iomem=", strict_iomem);
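With this in place the revoke path is triggered by the ordinary resource claim in any driver; a hypothetical sketch:

#include <linux/ioport.h>

/* request_mem_region() lands in __request_region() with
 * &iomem_resource as parent, which now calls revoke_iomem(). Under
 * CONFIG_IO_STRICT_DEVMEM this zaps existing user ptes for the range
 * established through the shared iomem inode. */
static int foo_claim_bar(resource_size_t start, resource_size_t size)
{
	if (!request_mem_region(start, size, "foo"))
		return -EBUSY;
	return 0;
}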
We want to be able to revoke pci mmaps so that the same access rules apply as for /dev/mem. Revoke support for devmem was added in 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region").
The simplest way to achieve this is by having the same filp->f_mapping for all mappings, so that unmap_mapping_range can find them all, no matter through which file they've been created. Since this must be set at open time we need sysfs support for this.
Add an optional mapping parameter to bin_attr, which is only consulted when there's also an mmap callback, since without mmap support allowing adjustment of ->f_mapping makes no sense.
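A usage sketch from a hypothetical driver (attribute and function names are made up); note that ->mapping is only honored because ->mmap is also set:

#include <linux/ioport.h>
#include <linux/sizes.h>
#include <linux/sysfs.h>

static int foo_bar_mmap(struct file *filp, struct kobject *kobj,
			struct bin_attribute *attr,
			struct vm_area_struct *vma)
{
	/* Hypothetical: io_remap_pfn_range() the BAR into vma here. */
	return 0;
}

static struct bin_attribute foo_bar_attr = {
	.attr = { .name = "foo_bar", .mode = 0600 },
	.size = SZ_4K,
	.mmap = foo_bar_mmap,
};

static int foo_register(struct kobject *kobj)
{
	/* Set at runtime: every open of this file then shares one
	 * address_space, so unmap_mapping_range() finds them all. */
	foo_bar_attr.mapping = iomem_get_mapping();
	return sysfs_create_bin_file(kobj, &foo_bar_attr);
}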
Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas bhelgaas@google.com Cc: linux-pci@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: "Rafael J. Wysocki" rafael@kernel.org Cc: Christian Brauner christian.brauner@ubuntu.com Cc: "David S. Miller" davem@davemloft.net Cc: Michael Ellerman mpe@ellerman.id.au Cc: Sourabh Jain sourabhjain@linux.ibm.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mauro Carvalho Chehab mchehab+huawei@kernel.org Cc: Nayna Jain nayna@linux.ibm.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- fs/sysfs/file.c | 11 +++++++++++ include/linux/sysfs.h | 2 ++ 2 files changed, 13 insertions(+)
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index eb6897ab78e7..9d8ccdb000e3 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -169,6 +169,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of, return battr->mmap(of->file, kobj, battr, vma); }
+static int sysfs_kf_bin_open(struct kernfs_open_file *of) +{ + struct bin_attribute *battr = of->kn->priv; + + if (battr->mapping) + of->file->f_mapping = battr->mapping; + + return 0; +} + void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr) { struct kernfs_node *kn = kobj->sd, *tmp; @@ -240,6 +250,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = { .read = sysfs_kf_bin_read, .write = sysfs_kf_bin_write, .mmap = sysfs_kf_bin_mmap, + .open = sysfs_kf_bin_open, };
int sysfs_add_file_mode_ns(struct kernfs_node *parent, diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h index 34e84122f635..a17a474d1601 100644 --- a/include/linux/sysfs.h +++ b/include/linux/sysfs.h @@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name)
struct file; struct vm_area_struct; +struct address_space;
struct bin_attribute { struct attribute attr; size_t size; void *private; + struct address_space *mapping; ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *, char *, loff_t, size_t); ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/mem zaps ptes when the kernel requests exclusive access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses.
Except there are two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole.
For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at ->open time:
- for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only one file per device, and access to a specific resource first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm.
A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There are only two ways in-tree to support mmap of ioports: generic pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling its own. Both approaches support ioport mmap through a special pfn range and not through magic pte attributes. Aliasing is therefore not a problem.
The only difference in access checks left is that sysfs PCI mmap does not check for CAP_RAWIO. I'm not really sure whether that should be added or not.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Kees Cook keescook@chromium.org Cc: Dan Williams dan.j.williams@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas bhelgaas@google.com Cc: linux-pci@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch -- v2: - Totally new approach: Adjust filp->f_mapping at open time. Note that this now works on all architectures, not just those support ARCH_GENERIC_PCI_MMAP_RESOURCE --- drivers/pci/pci-sysfs.c | 4 ++++ drivers/pci/proc.c | 1 + 2 files changed, 5 insertions(+)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 6d78df981d41..cee38fcb4a86 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -928,6 +928,7 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_io->read = pci_read_legacy_io; b->legacy_io->write = pci_write_legacy_io; b->legacy_io->mmap = pci_mmap_legacy_io; + b->legacy_io->mapping = iomem_get_mapping(); pci_adjust_legacy_attr(b, pci_mmap_io); error = device_create_bin_file(&b->dev, b->legacy_io); if (error) @@ -940,6 +941,7 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_mem->size = 1024*1024; b->legacy_mem->attr.mode = 0600; b->legacy_mem->mmap = pci_mmap_legacy_mem; + b->legacy_mem->mapping = iomem_get_mapping(); pci_adjust_legacy_attr(b, pci_mmap_mem); error = device_create_bin_file(&b->dev, b->legacy_mem); if (error) @@ -1155,6 +1157,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) res_attr->mmap = pci_mmap_resource_uc; } } + if (res_attr->mmap) + res_attr->mapping = iomem_get_mapping(); res_attr->attr.name = res_attr_name; res_attr->attr.mode = 0600; res_attr->size = pci_resource_len(pdev, num); diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index 3a2f90beb4cb..9bab07302bbf 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file) fpriv->write_combine = 0;
file->private_data = fpriv; + file->f_mapping = iomem_get_mapping();
return 0; }
On Fri, Oct 23, 2020 at 02:21:12PM +0200, Daniel Vetter wrote:
With the removal of helper support it doesn't do anything anymore. Also, we already have async plane update code in vc4.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Eric Anholt eric@anholt.net Cc: Maxime Ripard mripard@kernel.org
Apologies for this patch bomb, typoed git send-email command. -Daniel
drivers/gpu/drm/vc4/vc4_kms.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c index 149825ff5df8..bf0da77ab2e6 100644 --- a/drivers/gpu/drm/vc4/vc4_kms.c +++ b/drivers/gpu/drm/vc4/vc4_kms.c @@ -353,12 +353,6 @@ static int vc4_atomic_commit(struct drm_device *dev, return 0; }
- /* We know for sure we don't want an async update here. Set
- * state->legacy_cursor_update to false to prevent
- * drm_atomic_helper_setup_commit() from auto-completing
- * commit->flip_done.
- */
- state->legacy_cursor_update = false; ret = drm_atomic_helper_setup_commit(state, nonblock); if (ret) return ret;
-- 2.28.0
The stuff never really worked, and leads to lots of fun because it frees atomic states out of order, which upsets KASAN among other things.
For async updates we now have a more solid solution with the ->atomic_async_check and ->atomic_async_update hooks. Support for these has landed in msm and vc4. nouveau and i915 have their own commit routines, doing something similar.
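For reference, the driver-side shape of those hooks; a hypothetical foo_* sketch, not any in-tree driver:

#include <drm/drm_modeset_helper_vtables.h>
#include <linux/kernel.h>

static int foo_plane_atomic_async_check(struct drm_plane *plane,
					struct drm_plane_state *state)
{
	/* Only allow async updates of an already enabled plane. */
	if (!plane->state || !plane->state->crtc)
		return -EINVAL;
	return 0;
}

static void foo_plane_atomic_async_update(struct drm_plane *plane,
					  struct drm_plane_state *new_state)
{
	/* Apply the new fb/position directly, bypassing the full commit
	 * machinery and its completion tracking. */
	swap(plane->state->fb, new_state->fb);
	plane->state->crtc_x = new_state->crtc_x;
	plane->state->crtc_y = new_state->crtc_y;
}

static const struct drm_plane_helper_funcs foo_plane_helper_funcs = {
	.atomic_async_check = foo_plane_atomic_async_check,
	.atomic_async_update = foo_plane_atomic_async_update,
};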
For everyone else it's probably better to remove the use-after-free bug, and encourage folks to use the async support instead. The affected drivers which register a legacy cursor plane and use neither the new async support nor their own commit routine are: amdgpu, atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
Inspired by an amdgpu bug report.
v2: Drop RFC. I think with amdgpu converted over to use atomic_async_check/update, done in
commit 674e78acae0dfb4beb56132e41cbae5b60f7d662 Author: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Date: Wed Dec 5 14:59:07 2018 -0500
drm/amd/display: Add fast path for cursor plane updates
we don't have any driver anymore where userspace expects solid legacy cursor support _and_ the driver uses the atomic helpers in their full glory. So we can retire this.
v3: Paper over the msm and i915 regressions. The complete_all is the only thing missing afaict.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199425 Cc: mikita.lipski@amd.com Cc: Michel Dänzer michel@daenzer.net Cc: harry.wentland@amd.com Cc: Rob Clark robdclark@gmail.com Cc: "Kazlauskas, Nicholas" nicholas.kazlauskas@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- drivers/gpu/drm/drm_atomic_helper.c | 13 ------------- drivers/gpu/drm/i915/display/intel_display.c | 13 +++++++++++++ drivers/gpu/drm/msm/msm_atomic.c | 2 ++ 3 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index a7bcb4b4586c..549a31e6042c 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1481,13 +1481,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev, int i, ret; unsigned crtc_mask = 0;
- /* - * Legacy cursor ioctls are completely unsynced, and userspace - * relies on that (by doing tons of cursor updates). - */ - if (old_state->legacy_cursor_update) - return; - for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, new_crtc_state, i) { if (!new_crtc_state->active) continue; @@ -2106,12 +2099,6 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, continue; }
- /* Legacy cursor updates are fully unsynced. */ - if (state->legacy_cursor_update) { - complete_all(&commit->flip_done); - continue; - } - if (!new_crtc_state->event) { commit->event = kzalloc(sizeof(*commit->event), GFP_KERNEL); diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 130303e0298a..4257ad7c69c7 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -16122,6 +16122,19 @@ static int intel_atomic_commit(struct drm_device *dev, state->base.legacy_cursor_update = false; }
+ /* + * FIXME: Cut over to (async) commit helpers instead of hand-rolling + * everything. + */ + if (state->base.legacy_cursor_update) { + struct intel_crtc_state *new_crtc_state; + struct intel_crtc *crtc; + int i; + + for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) + complete_all(&new_crtc_state->uapi.commit->flip_done); + } + ret = intel_atomic_prepare_commit(state); if (ret) { drm_dbg_atomic(&dev_priv->drm, diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 561bfa48841c..e0599b16a4ef 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -215,6 +215,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) /* async updates are limited to single-crtc updates: */ WARN_ON(crtc_mask != drm_crtc_mask(async_crtc));
+ complete_all(&async_crtc->state->commit->flip_done); + /* * Start timer if we don't already have an update pending * on this crtc: