Hi all,
We're having some good fun with the i915 mmu notifier (it deadlocks), and I think it'd be very useful to have a bunch more runtime debug checks to catch screw-ups.
I'm also working on some lockdep improvements in gpu code (better annotations and stuff like that). Together with this series here this seems to catch a lot of bugs pretty much instantly, which previously took hours/days of CI workloads to reproduce. Plus now you get nice backtraces and the kernel keeps working, whereas without this it's real deadlocks with piles of stuck processes (the deadlock needed at least 3 processes, but generally it took more to close the loop, plus everyone piling in on top).
If this looks like a good idea I'm happy to polish it for merging.
Thanks, Daniel
Daniel Vetter (3):
  mm: Check if mmu notifier callbacks are allowed to fail
  mm, notifier: Catch sleeping/blocking for !blockable
  mm, notifier: Add a lockdep map for invalidate_range_start
 include/linux/mmu_notifier.h |  7 +++++++
 mm/mmu_notifier.c            | 17 ++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..59e102589a25 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
 					!blockable ? "non-" : "");
+				WARN(blockable,"%pS callback failure not allowed\n",
+				     mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
Quoting Daniel Vetter (2018-11-22 16:51:04)
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
Most callers could handle the failure correctly. It looks like the failure was not propagated for convenience. -Chris
On Thu, Nov 22, 2018 at 04:53:34PM +0000, Chris Wilson wrote:
Quoting Daniel Vetter (2018-11-22 16:51:04)
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
Most callers could handle the failure correctly. It looks like the failure was not propagated for convenience.
I have no idea whether the mm is semantically ok if pte shootdown doesn't work for all sorts of strange reasons. From the commit that introduced the error code it sounded like this was very much only ok in the limited case of an already killed process, in the oom killer path, where it's really only about trying to free any kind of memory. And where the process is gone already, so the semantics of what exactly happens don't matter that much anymore.
And even if a lot more paths could support some kind of error recovery (they'd need to restart stuff, at least for your i915 patch to work I think), as long as we have paths where that's not allowed I think it's good to catch any bugs where a nonzero errno is erroneously returned. -Daniel
On Fri 23-11-18 09:49:34, Daniel Vetter wrote:
On Thu, Nov 22, 2018 at 04:53:34PM +0000, Chris Wilson wrote:
Quoting Daniel Vetter (2018-11-22 16:51:04)
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
Most callers could handle the failure correctly. It looks like the failure was not propagated for convenience.
I have no idea whether the mm is semantically ok if pte shootdown doesn't work for all sorts of strange reasons. From the commit that introduced the error code it sounded like this was very much only ok in the limited case of an already killed process, in the oom killer path, where it's really only about trying to free any kind of memory. And where the process is gone already, so the semantics of what exactly happens don't matter that much anymore.
Yes this was indeed the case. There is still the exit path which would do the rest of the work so we are not leaving anything behind.
Am 22.11.18 um 17:51 schrieb Daniel Vetter:
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
 mm/mmu_notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..59e102589a25 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
 					!blockable ? "non-" : "");
+				WARN(blockable,"%pS callback failure not allowed\n",
+				     mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
What does WARN give you over the existing pr_info? Is the backtrace really that interesting?
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..59e102589a25 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
 					!blockable ? "non-" : "");
+				WARN(blockable,"%pS callback failure not allowed\n",
+				     mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
--
2.19.1
On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
What does WARN give you over the existing pr_info? Is the backtrace really that interesting?
Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable)
	pr_warn(...)
else
	pr_info(...)

WARN() is simply my go-to tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.
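Concretely, something like this in __mmu_notifier_invalidate_range_start() (sketch only, reusing the existing _ret/blockable locals from the patch above):

	if (_ret) {
		if (blockable)
			pr_warn("%pS callback failed with %d in blockable context\n",
				mn->ops->invalidate_range_start, _ret);
		else
			pr_info("%pS callback failed with %d in non-blockable context\n",
				mn->ops->invalidate_range_start, _ret);
		ret = _ret;
	}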
If you wonder where all the info level stuff happens that we have to ignore: suspend/resume is a primary culprit (fairly important for gfx/desktops), but there's a bunch of other places. Even if we ignore everything at info and below we still need filters because some drivers are a bit too trigger-happy (i915 definitely included I guess, so everyone contributes to this problem).
Cheers, Daniel
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..59e102589a25 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
 					!blockable ? "non-" : "");
+				WARN(blockable,"%pS callback failure not allowed\n",
+				     mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
--
2.19.1
--
Michal Hocko
SUSE Labs
On Fri 23-11-18 13:30:57, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
What does WARN give you over the existing pr_info? Is the backtrace really that interesting?
Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable)
	pr_warn(...)
else
	pr_info(...)

WARN() is simply my go-to tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.
I wouldn't mind s@pr_info@pr_warn@
If you wonder where all the info level stuff happens that we have to ignore: suspend/resume is a primary culprit (fairly important for gfx/desktops), but there's a bunch of other places. Even if we ignore everything at info and below we still need filters because some drivers are a bit too trigger-happy (i915 definitely included I guess, so everyone contributes to this problem).
Thanks for the clarification.
On Fri, Nov 23, 2018 at 1:43 PM Michal Hocko mhocko@kernel.org wrote:
On Fri 23-11-18 13:30:57, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
What does WARN give you over the existing pr_info? Is the backtrace really that interesting?
Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable)
	pr_warn(...)
else
	pr_info(...)

WARN() is simply my go-to tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.
I wouldn't mind s@pr_info@pr_warn@
Well that's too much, because then it would misfire in the oom testcase, where failing is ok (desirable even, we want to avoid blocking after all). So it needs to be a switch (or else we need to filter it in the results, and that's a bit of a maintenance headache from a CI pov). -Daniel
If you wonder where all the info level stuff happens that we have to ignore: suspend/resume is a primary culprit (fairly important for gfx/desktops), but there's a bunch of other places. Even if we ignore everything at info and below we still need filters because some drivers are a bit too trigger-happy (i915 definitely included I guess, so everyone contributes to this problem).
Thanks for the clarification.
Michal Hocko
SUSE Labs
On Fri 23-11-18 14:15:11, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 1:43 PM Michal Hocko mhocko@kernel.org wrote:
On Fri 23-11-18 13:30:57, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.
What does WARN give you over the existing pr_info? Is the backtrace really that interesting?
Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable)
	pr_warn(...)
else
	pr_info(...)

WARN() is simply my go-to tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.
I wouldn't mind s@pr_info@pr_warn@
Well that's too much, because then it would misfire in the oom testcase, where failing is ok (desirable even, we want to avoid blocking after all). So it needs to be a switch (or else we need to filter it in the results, and that's a bit of a maintenance headache from a CI pov).
I thought the failures should be rare enough that warning about them can actually be useful. E.g. in the oom case we can live with the failure because we want to release _some_ memory, but knowing about a callback that prevents us from going the full way might be interesting.
But I do not really feel strongly about this. I find WARN a bit of an abuse because the trace is unlikely to help us much. If you want to make the verbosity depend on the blockable context then I will surely not stand in the way.
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
Am 22.11.18 um 17:51 schrieb Daniel Vetter:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
Just for the sake of better documenting this, how about adding this to include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) \
	for ((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())
(Just from the back of my head, might contain peanuts and/or hints of errors).
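A variant that should actually terminate might be (equally untested; leans on GNU statement expressions and a C99-style declaration in the for loop, and break/goto out of the block would still leak the preempt count):

#define disallow_sleeping_if(cond) \
	for (bool __once = ({ if (cond) preempt_disable(); true; }); \
	     __once; \
	     __once = ({ if (cond) preempt_enable(); false; }))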
Christian.
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:
Am 22.11.18 um 17:51 schrieb Daniel Vetter:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
Just for the sake of better documenting this, how about adding this to include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) \
	for ((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())
(Just from the back of my head, might contain peanuts and/or hints of errors).
I think these magic for blocks aren't used in the kernel. goto breaks them, and we use goto a lot. I think a disallow/allow_sleep() pair with the conditional preempt_disable/enable() calls would be nice though. I can do that if the overall idea sticks.
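Something like this pair is what I have in mind (sketch only, helper names invented here):

static inline void disallow_sleep(bool cond)
{
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && cond)
		preempt_disable();
}

static inline void allow_sleep(bool cond)
{
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && cond)
		preempt_enable();
}

The notifier callback would then be bracketed with disallow_sleep(!blockable); ... allow_sleep(!blockable);. -Daniel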
Christian.
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
Am 23.11.18 um 09:46 schrieb Daniel Vetter:
On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:
Am 22.11.18 um 17:51 schrieb Daniel Vetter:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
Just for the sake of better documenting this, how about adding this to include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) \
	for ((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())
(Just from the back of my head, might contain peanuts and/or hints of errors).
I think these magic for blocks aren't used in the kernel. goto breaks them, and we use goto a lot.
Yeah, good argument.
I think a disallow/allow_sleep() pair with the conditional preempt_disable/enable() calls would be nice though. I can do that if the overall idea sticks.
Sounds like a good idea to me as well.
Christian.
-Daniel
Christian.
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Yeah, it is quite ugly. Especially because it makes DEBUG config behavior much different. So is this really worth it? Has this already discovered any existing bug?
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
--
2.19.1
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Yeah, it is quite ugly. Especially because it makes DEBUG config behavior much different. So is this really worth it? Has this already discovered any existing bug?
Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way too unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.
There have been patches floating around with this kind of bug I think, and the callchains we're dealing with are fairly deep. I don't trust review to reliably catch this kind of failure; that's why I'm looking into tools to better validate this stuff and augment review.
And yes it's ugly :-/
Wrt the behavior difference: I guess we could put another counter into the task struct, and change might_sleep() to check it. All under CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable side effect. My worry with that is that people will spot it, and abuse it in creative ways that do affect semantics. See horrors like drm_can_sleep() (and I'm sure gfx folks are not the only ones who seriously lacked taste here).
Up to the experts really how to best paint this shed I think.
Thanks, Daniel
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
--
2.19.1
--
Michal Hocko
SUSE Labs
On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Yeah, it is quite ugly. Especially because it makes DEBUG config behavior much different. So is this really worth it? Has this already discovered any existing bug?
Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way too unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.
Create a task with oom_score_adj = 1000 and trigger the oom killer via sysrq and you should get a predictable oom invocation and execution.
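In userspace that's roughly (sketch; needs root and CONFIG_MAGIC_SYSRQ, and a real test would fire the sysrq from a separate process so the prepared victim is still around to be selected):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd;

	/* make this task the preferred oom victim */
	fd = open("/proc/self/oom_score_adj", O_WRONLY);
	write(fd, "1000", 4);
	close(fd);

	/* 'f' asks the kernel to invoke the oom killer */
	fd = open("/proc/sysrq-trigger", O_WRONLY);
	write(fd, "f", 1);
	close(fd);

	return 0;
}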
[...]
Wrt the behavior difference: I guess we could put another counter into the task struct, and change might_sleep() to check it. All under CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable side effect. My worry with that is that people will spot it, and abuse it in creative ways that do affect semantics. See horrors like drm_can_sleep() (and I'm sure gfx folks are not the only ones who seriously lacked taste here).
Up to the experts really how to best paint this shed I think.
Actually I'd like a way to say non_block_{begin,end} and have might_sleep fire inside that context.
On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko mhocko@kernel.org wrote:
On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Yeah, it is quite ugly. Especially because it makes DEBUG config behavior much different. So is this really worth it? Has this already discovered any existing bug?
Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way too unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.
Create a task with oom_score_adj = 1000 and trigger the oom killer via sysrq and you should get a predictable oom invocation and execution.
Ah right. We kinda do that already in an attempt to get the tests killed without the runner, for accidental oom. Just didn't think about this in the context of intentionally firing the oom. I'll try whether I can bake up some new subtest in our userptr/mmu-notifier testcases.
[...]
Wrt the behavior difference: I guess we could put another counter into the task struct, and change might_sleep() to check it. All under CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable side effect. My worry with that is that people will spot it, and abuse it in creative ways that do affect semantics. See horrors like drm_can_sleep() (and I'm sure gfx folks are not the only ones who seriously lacked taste here).
Up to the experts really how to best paint this shed I think.
Actually I'd like a way to say non_block_{begin,end} and have might_sleep fire inside that context.
Ok, I'll respin with these (introduced in a separate patch).
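Rough sketch of what I'll respin with (assumes a new non_block_count member in struct task_struct, with ___might_sleep() then complaining whenever that counter is non-zero, the same way it does for atomic context):

#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
static inline void non_block_start(void)
{
	current->non_block_count++;
}

static inline void non_block_end(void)
{
	WARN_ON(current->non_block_count-- == 0);
}
#else
static inline void non_block_start(void) { }
static inline void non_block_end(void) { }
#endif

-Daniel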
On 23/11/2018 13:12, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko mhocko@kernel.org wrote:
On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply buried where review can't catch it.
I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.
Yeah, it is quite ugly. Especially because it makes DEBUG config behavior much different. So is this really worth it? Has this already discovered any existing bug?
Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way too unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.
Create a task with oom_score_adj = 1000 and trigger the oom killer via sysrq and you should get a predictable oom invocation and execution.
Ah right. We kinda do that already in an attempt to get the tests killed without the runner, for accidental oom. Just didn't think about this in the context of intentionally firing the oom. I'll try whether I can bake up some new subtest in our userptr/mmu-notifier testcases.
Very handy trick - I think I will try applying it in the shrinker area as well.
Regards,
Tvrtko
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.
A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliably hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.
By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single callchain while testing.
Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invalidate_range_start callback. If there's interest, we should probably roll this out to all of them. But my understanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.
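For reference, the "prep it, and then munmap() it" part is just this from userspace (sketch; the driver-specific step that makes the driver register a notifier, e.g. an i915 userptr ioctl, is elided):

#include <sys/mman.h>

#define SZ (2 << 20)

static void provoke_notifier(void)
{
	void *ptr = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* driver-specific: make the driver register an mmu notifier
	 * covering [ptr, ptr + SZ), e.g. by creating a userptr object */

	munmap(ptr, SZ); /* runs invalidate_range_start on exactly that range */
}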
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/mmu_notifier.h | 7 +++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

 #ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+		      _RET_IP_);
 	if (mm_has_notifiers(mm))
 		__mmu_notifier_invalidate_range_start(mm, start, end, true);
+	mutex_release(&__mmu_notifier_invalidate_range_start_map, 1, _RET_IP_);
 }

 static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 4d282cfb296e..c6e797927376 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -23,6 +23,13 @@
 /* global SRCU for all MMs */
 DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+	.name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
 /*
  * This function allows mmu_notifier::release callback to delay a call to
  * a function that will free appropriate resources. The function must be
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.
A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliably hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.
By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single callchain while testing.
Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invalidate_range_start callback. If there's interest, we should probably roll this out to all of them. But my understanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.
Thanks, Daniel
 include/linux/mmu_notifier.h | 7 +++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

 #ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+		      _RET_IP_);
 	if (mm_has_notifiers(mm))
 		__mmu_notifier_invalidate_range_start(mm, start, end, true);
+	mutex_release(&__mmu_notifier_invalidate_range_start_map, 1, _RET_IP_);
 }

 static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 4d282cfb296e..c6e797927376 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -23,6 +23,13 @@
 /* global SRCU for all MMs */
 DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+	.name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
 /*
  * This function allows mmu_notifier::release callback to delay a call to
  * a function that will free appropriate resources. The function must be
--
2.19.1
Quoting Daniel Vetter (2018-11-27 07:49:18)
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.
A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliably hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.
By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single callchain while testing.
Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invalidate_range_start callback. If there's interest, we should probably roll this out to all of them. But my understanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.
Thanks, Daniel
 include/linux/mmu_notifier.h | 7 +++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

 #ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+		      _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()? -Chris
On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Daniel Vetter (2018-11-27 07:49:18)
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.
A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliably hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.
By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single callchain while testing.
Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invalidate_range_start callback. If there's interest, we should probably roll this out to all of them. But my understanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.
Thanks, Daniel
 include/linux/mmu_notifier.h | 7 +++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

 #ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+		      _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()?
Read lock critical sections can't create any dependencies against any other read lock critical section of the same lock. Switching this to a read lock would just render the annotation pointless (unless you include at least some write lock critical section somewhere, but I have no idea where you'd do that). A read lock that you only ever take for reading essentially doesn't do anything at all.
So not clear on why you're suggesting this?
It's the exact same idea as fs_reclaim: inserting a fake lock to tie all possible callchains to a given function together with all possible callchains from that function. Of course this is only valid if all NxM combinations could happen in theory. For fs_reclaim that's true because direct reclaim can pick anything it wants to shrink/evict. For mmu notifiers that's true as long as we assume any mmu notifier can be in use by any process, which only depends upon sufficiently contrived/evil userspace.
I guess I could use lock_map_acquire/release() wrappers for this like fs_reclaim does, which would be a bit clearer. -Daniel
Quoting Daniel Vetter (2018-11-27 17:28:43)
On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Daniel Vetter (2018-11-27 07:49:18)
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.
A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliably hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.
By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single callchain while testing.
Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invalidate_range_start callback. If there's interest, we should probably roll this out to all of them. But my understanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.
Thanks, Daniel
 include/linux/mmu_notifier.h | 7 +++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

 #ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+		      _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()?
Read lock critical sections can't create any dependencies against any other read lock critical section of the same lock. Switching this to a read lock would just render the annotation pointless (unless you include at least some write lock critical section somewhere, but I have no idea where you'd do that). A read lock that you only ever take for reading essentially doesn't do anything at all.
So not clear on why you're suggesting this?
Just that it's not acting as a mutex, so emulating one looks wrong. -Chris
On Tue, Nov 27, 2018 at 05:33:58PM +0000, Chris Wilson wrote:
Quoting Daniel Vetter (2018-11-27 17:28:43)
On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Daniel Vetter (2018-11-27 07:49:18)
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.
A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliably hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.
By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single callchain while testing.
Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invalidate_range_start callback. If there's interest, we should probably roll this out to all of them. But my understanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.
Thanks, Daniel
 include/linux/mmu_notifier.h | 7 +++++++
 mm/mmu_notifier.c            | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

 #ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
+	mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+		      _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()?
Read lock critical sections can't create any dependencies against any other read lock critical section of the same lock. Switching this to a read lock would just render the annotation pointless (unless you include at least some write lock critical section somewhere, but I have no idea where you'd do that). A read lock that you only ever take for reading essentially doesn't do anything at all.
So not clear on why you're suggesting this?
Just that it's not acting as a mutex, so emulating one looks wrong.
Ok, I think switching to lock_map_acquire/release should address that.
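I.e. the inline wrapper would become (sketch):

static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
				  unsigned long start, unsigned long end)
{
	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
	if (mm_has_notifiers(mm))
		__mmu_notifier_invalidate_range_start(mm, start, end, true);
	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
}

-Daniel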