[PATCH 0/3] RFC: mmu notifier debug checks

List overview All Threads
Download

newer

older

[CI v11 01/23] drm/dsc: Modify DRM...

[PATCH 1/7] gpu: host1x: Resize...

Daniel Vetter

22 Nov 2018 22 Nov '18

4:51 p.m.

Hi all,

We're having some good fun with the i915 mmu notifier (it deadlocks), and I think it'd be very useful to have a bunch more runtime debug checks to catch screw-ups.

I'm also working on some lockdep improvements in gpu code (better annotations and stuff like that). Together with this series here this seems to catch a lot of bugs pretty much instantly, which previously took hours/days of CI workloads to reproduce. Plus now you get nice backtraces and the kernel keeps working, whereas without this it's real deadlocks with piles of stuck processes (the deadlock needed at least 3 processes, but generally it took more to close the loop, plus everyone piling in on top).

If this looks like a good idea I'm happy to polish it for merging.

Thanks, Daniel

Daniel Vetter (3): mm: Check if mmu notifier callbacks are allowed to fail mm, notifier: Catch sleeping/blocking for !blockable mm, notifier: Add a lockdep map for invalidate_range_start

include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 17 ++++++++++++++++- 2 files changed, 23 insertions(+), 1 deletion(-)

-- 2.19.1

Show replies by date

Daniel Vetter

22 Nov 22 Nov

4:51 p.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: David Rientjes rientjes@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Cc: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- mm/mmu_notifier.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 5119ff846769..59e102589a25 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret, !blockable ? "non-" : ""); + WARN(blockable,"%pS callback failure not allowed\n", + mn->ops->invalidate_range_start); ret = _ret; } }

-- 2.19.1

Chris Wilson

4:53 p.m.

New subject: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

Quoting Daniel Vetter (2018-11-22 16:51:04)

...

Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

Most callers could handle the failure correctly. It looks like the failure was not propagated for convenience. -Chris

Daniel Vetter

23 Nov 23 Nov

8:49 a.m.

New subject: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Thu, Nov 22, 2018 at 04:53:34PM +0000, Chris Wilson wrote:

...

Quoting Daniel Vetter (2018-11-22 16:51:04)

...
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

Most callers could handle the failure correctly. It looks like the failure was not propagated for convenience.

I have no idea whether the mm is semantically ok if pte shootdown doesn't work for all sorts of strange reasons. From the commit that introduced the error code it souded like this was very much only ok in the limited case of an already killed process, in the oom killer path, where it's really only about trying to free any kind of memory. And where the process is gone already, so semantics of what exactly happens don't matter that much anymore.

And even if a lot more paths could support some kind of error recovery (they'd need to restart stuff, at least for your i915 patch to work I think), as long as we have paths where that's not allowed I think it's good to catch any bugs where a nonzero errno is errornously returned. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Michal Hocko

11:14 a.m.

New subject: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri 23-11-18 09:49:34, Daniel Vetter wrote:

...

On Thu, Nov 22, 2018 at 04:53:34PM +0000, Chris Wilson wrote:

...
Quoting Daniel Vetter (2018-11-22 16:51:04)

...
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

Most callers could handle the failure correctly. It looks like the failure was not propagated for convenience.

I have no idea whether the mm is semantically ok if pte shootdown doesn't work for all sorts of strange reasons. From the commit that introduced the error code it souded like this was very much only ok in the limited case of an already killed process, in the oom killer path, where it's really only about trying to free any kind of memory. And where the process is gone already, so semantics of what exactly happens don't matter that much anymore.

Yes this was indeed the case. There is still the exit path which would do the rest of the work so we are not leaving anything behind.

-- Michal Hocko SUSE Labs

Koenig, Christian

22 Nov 22 Nov

6:50 p.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

Am 22.11.18 um 17:51 schrieb Daniel Vetter:

...

Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: David Rientjes rientjes@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Cc: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com

Acked-by: Christian König christian.koenig@amd.com

...

mm/mmu_notifier.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 5119ff846769..59e102589a25 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret, !blockable ? "non-" : "");
		WARN(blockable,"%pS callback failure not allowed\n",
		     mn->ops->invalidate_range_start);
	ret = _ret;
}
}

Michal Hocko

23 Nov 23 Nov

11:15 a.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Thu 22-11-18 17:51:04, Daniel Vetter wrote:

...

Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

What does WARN give you more than the existing pr_info? Is really backtrace that interesting?

...

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: David Rientjes rientjes@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Cc: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 5119ff846769..59e102589a25 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret, !blockable ? "non-" : "");
		WARN(blockable,"%pS callback failure not allowed\n",
		     mn->ops->invalidate_range_start);
	ret = _ret;
}
}
-- 2.19.1

-- Michal Hocko SUSE Labs

Daniel Vetter

12:30 p.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:

...

On Thu 22-11-18 17:51:04, Daniel Vetter wrote:

...
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

What does WARN give you more than the existing pr_info? Is really backtrace that interesting?

Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable) pr_warn(...) else pr_info(...)

WARN() is simply my goto tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.

If you wonder where all the info level stuff happens that we have to ignore: suspend/resume is a primary culprit (fairly important for gfx/desktops), but there's a bunch of other places. Even if we ignore everything at info and below we still need filters because some drivers are a bit too trigger-happy (i915 definitely included I guess, so everyone contributes to this problem).

Cheers, Daniel

...

...
Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: David Rientjes rientjes@google.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Cc: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 5119ff846769..59e102589a25 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret, !blockable ? "non-" : "");
		WARN(blockable,"%pS callback failure not allowed\n",
		     mn->ops->invalidate_range_start);
	ret = _ret;
}
}
-- 2.19.1
-- Michal Hocko SUSE Labs

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Michal Hocko

12:43 p.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri 23-11-18 13:30:57, Daniel Vetter wrote:

...

On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:

...
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:

...
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

What does WARN give you more than the existing pr_info? Is really backtrace that interesting?

Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable) pr_warn(...) else pr_info(...)

WARN() is simply my goto tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.

I wouldn't mind s@pr_info@pr_warn@

...

If you wonder where all the info level stuff happens that we have to ignore: suspend/resume is a primary culprit (fairly important for gfx/desktops), but there's a bunch of other places. Even if we ignore everything at info and below we still need filters because some drivers are a bit too trigger-happy (i915 definitely included I guess, so everyone contributes to this problem).

Thanks for the clarification.

-- Michal Hocko SUSE Labs

Daniel Vetter

1:15 p.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri, Nov 23, 2018 at 1:43 PM Michal Hocko mhocko@kernel.org wrote:

...

On Fri 23-11-18 13:30:57, Daniel Vetter wrote:

...
On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:

...
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:

...
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

What does WARN give you more than the existing pr_info? Is really backtrace that interesting?

Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable) pr_warn(...) else pr_info(...)

WARN() is simply my goto tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.

I wouldn't mind s@pr_info@pr_warn@

Well that's too much, because then it would misfire in the oom testcase, where failing is ok (desireble even, we want to avoid blocking after all). So needs to be a switch (or else we need to filter it in results, and that's a bit a maintenance headache from a CI pov). -Danile

...

...
If you wonder where all the info level stuff happens that we have to ignore: suspend/resume is a primary culprit (fairly important for gfx/desktops), but there's a bunch of other places. Even if we ignore everything at info and below we still need filters because some drivers are a bit too trigger-happy (i915 definitely included I guess, so everyone contributes to this problem).

Thanks for the clarification.

Michal Hocko SUSE Labs

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Michal Hocko

1:30 p.m.

New subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri 23-11-18 14:15:11, Daniel Vetter wrote:

...

On Fri, Nov 23, 2018 at 1:43 PM Michal Hocko mhocko@kernel.org wrote:

...
On Fri 23-11-18 13:30:57, Daniel Vetter wrote:

...
On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:

...
On Thu 22-11-18 17:51:04, Daniel Vetter wrote:

...
Just a bit of paranoia, since if we start pushing this deep into callchains it's hard to spot all places where an mmu notifier implementation might fail when it's not allowed to.

What does WARN give you more than the existing pr_info? Is really backtrace that interesting?

Automated tools have to ignore everything at info level (there's too much of that). I guess I could do something like

if (blockable) pr_warn(...) else pr_info(...)

WARN() is simply my goto tool for getting something at warning level dumped into dmesg. But I think the pr_warn with the callback function should be enough indeed.

I wouldn't mind s@pr_info@pr_warn@

Well that's too much, because then it would misfire in the oom testcase, where failing is ok (desireble even, we want to avoid blocking after all). So needs to be a switch (or else we need to filter it in results, and that's a bit a maintenance headache from a CI pov).

I thought the failure should be rare enough that warning about them can be actually useful. E.g. in the oom case we can live with the failure because we want to release _some_ memory but know about a callback that prevents us to go the full way might be interesting.

But I do not really feel strongly about this. I find WARN a bit abuse because the trace is unlikely going to help us much. If you want to make a verbosity depending on the blockable context then I will surely not stand in the way.

-- Michal Hocko SUSE Labs

Daniel Vetter

22 Nov 22 Nov

4:51 p.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: "Christian König" christian.koenig@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- mm/mmu_notifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 59e102589a25..4d282cfb296e 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) { - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable); + int _ret; + + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable) + preempt_disable(); + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable); + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable) + preempt_enable(); if (_ret) { pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret,

-- 2.19.1

Koenig, Christian

6:55 p.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

Am 22.11.18 um 17:51 schrieb Daniel Vetter:

...

We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: "Christian König" christian.koenig@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 59e102589a25..4d282cfb296e 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) {
	int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	int _ret;
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_disable();
	_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_enable();

Just for the sake of better documenting this how about adding this to include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) for((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())

(Just from the back of my head, might contain peanuts and/or hints of errors).

Christian.

...

	if (_ret) {
		pr_info("%pS callback failed with %d in %sblockable context.\n",
				mn->ops->invalidate_range_start, _ret,

Daniel Vetter

23 Nov 23 Nov

8:46 a.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:

...

Am 22.11.18 um 17:51 schrieb Daniel Vetter:

...
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: "Christian König" christian.koenig@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 59e102589a25..4d282cfb296e 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) {
	int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	int _ret;
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_disable();
	_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_enable();
Just for the sake of better documenting this how about adding this to include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) for((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())

(Just from the back of my head, might contain peanuts and/or hints of errors).

I think these magic for blocks aren't used in the kernel. goto breaks them, and we use goto a lot. I think a disallow/allow_sleep() pair with the conditional preept_disable/enable() calls would be nice though. I can do that if the overall idea sticks. -Daniel

...

Christian.

...

	if (_ret) {
		pr_info("%pS callback failed with %d in %sblockable context.\n",
				mn->ops->invalidate_range_start, _ret,

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Christian König

10:14 a.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

Am 23.11.18 um 09:46 schrieb Daniel Vetter:

...

On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:

...
Am 22.11.18 um 17:51 schrieb Daniel Vetter:

...
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: "Christian König" christian.koenig@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 59e102589a25..4d282cfb296e 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) {
	int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	int _ret;
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_disable();
	_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_enable();
Just for the sake of better documenting this how about adding this to include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) for((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())

(Just from the back of my head, might contain peanuts and/or hints of errors).
I think these magic for blocks aren't used in the kernel. goto breaks them, and we use goto a lot.

Yeah, good argument.

...

I think a disallow/allow_sleep() pair with the conditional preept_disable/enable() calls would be nice though. I can do that if the overall idea sticks.

Sounds like a good idea to me as well.

Christian.

...

-Daniel

...

Christian.

...

		if (_ret) {
			pr_info("%pS callback failed with %d in %sblockable context.\n",
					mn->ops->invalidate_range_start, _ret,

Michal Hocko

11:12 a.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Thu 22-11-18 17:51:05, Daniel Vetter wrote:

...

We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config bahavior much different. So is this really worth it? Has this already discovered any existing bug?

...

Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: "Christian König" christian.koenig@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 59e102589a25..4d282cfb296e 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) {
	int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	int _ret;
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_disable();
	_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_enable();
if (_ret) {
	pr_info("%pS callback failed with %d in %sblockable context.\n",
			mn->ops->invalidate_range_start, _ret,
-- 2.19.1

-- Michal Hocko SUSE Labs

Daniel Vetter

12:38 p.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:

...

On Thu 22-11-18 17:51:05, Daniel Vetter wrote:

...
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config bahavior much different. So is this really worth it? Has this already discovered any existing bug?

Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way to unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.

There's been patches floating with this kind of bug I think, and the call chains we're dealing with a fairly deep. I don't trust review to reliably catch this kind of fail, that's why I'm looking into tools to better validat this stuff to augment review.

And yes it's ugly :-/

Wrt the behavior difference: I guess we could put another counter into the task struct, and change might_sleep() to check it. All under CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable sideeffect. My worry with that is that people will spot it, and abuse it in creative ways that do affect semantics. See horrors like drm_can_sleep() (and I'm sure gfx folks are not the only ones who seriously lacked taste here).

Up to the experts really how to best paint this shed I think.

Thanks, Daniel

...

...
Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: "Christian König" christian.koenig@amd.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: "Jérôme Glisse" jglisse@redhat.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

mm/mmu_notifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 59e102589a25..4d282cfb296e 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm, id = srcu_read_lock(&srcu); hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { if (mn->ops->invalidate_range_start) {
	int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	int _ret;
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_disable();
	_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
		preempt_enable();
if (_ret) {
	pr_info("%pS callback failed with %d in %sblockable context.\n",
			mn->ops->invalidate_range_start, _ret,
-- 2.19.1
-- Michal Hocko SUSE Labs

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Michal Hocko

12:46 p.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Fri 23-11-18 13:38:38, Daniel Vetter wrote:

...

On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:

...
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:

...
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config bahavior much different. So is this really worth it? Has this already discovered any existing bug?

Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way to unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.

Create a task with oom_score_adj = 1000 and trigger the oom killer via sysrq and you should get a predictable oom invocation and execution.

[...]

...

Wrt the behavior difference: I guess we could put another counter into the task struct, and change might_sleep() to check it. All under CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable sideeffect. My worry with that is that people will spot it, and abuse it in creative ways that do affect semantics. See horrors like drm_can_sleep() (and I'm sure gfx folks are not the only ones who seriously lacked taste here).

Up to the experts really how to best paint this shed I think.

Actually I like a way to say non_block_{begin,end} and might_sleep firing inside that context.

-- Michal Hocko SUSE Labs

Daniel Vetter

1:12 p.m.

New subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko mhocko@kernel.org wrote:

...

On Fri 23-11-18 13:38:38, Daniel Vetter wrote:

...
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:

...
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:

...
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config bahavior much different. So is this really worth it? Has this already discovered any existing bug?

Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way to unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.

Create a task with oom_score_adj = 1000 and trigger the oom killer via sysrq and you should get a predictable oom invocation and execution.

Ah right. We kinda do that already in an attempt to get the tests killed without the runner, for accidental oom. Just didn't think about this in the context of intentionally firing the oom. I'll try whether I can bake up some new subtest in our userptr/mmu-notifier testcases.

...

[...]

...
Wrt the behavior difference: I guess we could put another counter into the task struct, and change might_sleep() to check it. All under CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable sideeffect. My worry with that is that people will spot it, and abuse it in creative ways that do affect semantics. See horrors like drm_can_sleep() (and I'm sure gfx folks are not the only ones who seriously lacked taste here).

Up to the experts really how to best paint this shed I think.

Actually I like a way to say non_block_{begin,end} and might_sleep firing inside that context.

Ok, I'll respin with these (introduced in a separate patch). -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Tvrtko Ursulin

1:23 p.m.

New subject: [Intel-gfx] [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On 23/11/2018 13:12, Daniel Vetter wrote:

...

On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko mhocko@kernel.org wrote:

...
On Fri 23-11-18 13:38:38, Daniel Vetter wrote:

...
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:

...
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:

...
We need to make sure implementations don't cheat and don't have a possible schedule/blocking point deeply burried where review can't catch it.

I'm not sure whether this is the best way to make sure all the might_sleep() callsites trigger, and it's a bit ugly in the code flow. But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config bahavior much different. So is this really worth it? Has this already discovered any existing bug?

Given that we need an oom trigger to hit this we're not hitting this in CI (oom is just way to unpredictable to even try). I'd kinda like to also add some debug interface so I can provoke an oom kill of a specially prepared process, to make sure we can reliably exercise this path without killing the kernel accidentally. We do similar tricks for our shrinker already.

Create a task with oom_score_adj = 1000 and trigger the oom killer via sysrq and you should get a predictable oom invocation and execution.

Ah right. We kinda do that already in an attempt to get the tests killed without the runner, for accidental oom. Just didn't think about this in the context of intentionally firing the oom. I'll try whether I can bake up some new subtest in our userptr/mmu-notifier testcases.

Very handy trick - I think I will think of applying it in the shrinker area as well.

Regards,

Tvrtko

Daniel Vetter

22 Nov 22 Nov

4:51 p.m.

New subject: [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliable hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invaliate_range_start callback. If there's interest, we should probably roll this out to all of them. But my undestanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.

Cc: Andrew Morton akpm@linux-foundation.org Cc: David Rientjes rientjes@google.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com --- include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 7 +++++++ 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 9893a6432adf..a39ba218dbbe 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map; +#endif + /* * The mmu notifier_mm structure is allocated and installed in * mm->mmu_notifier_mm inside the mm_take_all_locks() protected @@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, unsigned long start, unsigned long end) { + mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0, + _RET_IP_); if (mm_has_notifiers(mm)) __mmu_notifier_invalidate_range_start(mm, start, end, true); + mutex_release(&__mmu_notifier_invalidate_range_start_map, 1, _RET_IP_); }

static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm, diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 4d282cfb296e..c6e797927376 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -23,6 +23,13 @@ /* global SRCU for all MMs */ DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP +struct lockdep_map __mmu_notifier_invalidate_range_start_map = { + .name = "mmu_notifier_invalidate_range_start" +}; +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map); +#endif + /* * This function allows mmu_notifier::release callback to delay a call to * a function that will free appropriate resources. The function must be

-- 2.19.1

Daniel Vetter

27 Nov 27 Nov

7:49 a.m.

New subject: [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:

...

This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliable hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invaliate_range_start callback. If there's interest, we should probably roll this out to all of them. But my undestanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.

Cc: Andrew Morton akpm@linux-foundation.org Cc: David Rientjes rientjes@google.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.

Thanks, Daniel

...

include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 7 +++++++ 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 9893a6432adf..a39ba218dbbe 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map; +#endif

/*

The mmu notifier_mm structure is allocated and installed in

mm->mmu_notifier_mm inside the mm_take_all_locks() protected

@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, unsigned long start, unsigned long end) {
mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
      _RET_IP_);
if (mm_has_notifiers(mm)) __mmu_notifier_invalidate_range_start(mm, start, end, true);
mutex_release(&__mmu_notifier_invalidate_range_start_map, 1, _RET_IP_);
}

static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm, diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 4d282cfb296e..c6e797927376 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -23,6 +23,13 @@ /* global SRCU for all MMs */ DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP +struct lockdep_map __mmu_notifier_invalidate_range_start_map = {

.name = "mmu_notifier_invalidate_range_start"

+}; +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map); +#endif

/*

This function allows mmu_notifier::release callback to delay a call to

a function that will free appropriate resources. The function must be

-- 2.19.1

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Chris Wilson

4:49 p.m.

New subject: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

Quoting Daniel Vetter (2018-11-27 07:49:18)

...

On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:

...
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliable hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invaliate_range_start callback. If there's interest, we should probably roll this out to all of them. But my undestanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.

Cc: Andrew Morton akpm@linux-foundation.org Cc: David Rientjes rientjes@google.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.

Thanks, Daniel

...
include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 7 +++++++ 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 9893a6432adf..a39ba218dbbe 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map; +#endif

/*

The mmu notifier_mm structure is allocated and installed in

mm->mmu_notifier_mm inside the mm_take_all_locks() protected

@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, unsigned long start, unsigned long end) {
mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
              _RET_IP_);

Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()? -Chris

Daniel Vetter

5:28 p.m.

New subject: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson chris@chris-wilson.co.uk wrote:

...

Quoting Daniel Vetter (2018-11-27 07:49:18)

...
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:

...
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliable hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invaliate_range_start callback. If there's interest, we should probably roll this out to all of them. But my undestanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.

Cc: Andrew Morton akpm@linux-foundation.org Cc: David Rientjes rientjes@google.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.

Thanks, Daniel

...
include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 7 +++++++ 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 9893a6432adf..a39ba218dbbe 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map; +#endif

/*

The mmu notifier_mm structure is allocated and installed in

mm->mmu_notifier_mm inside the mm_take_all_locks() protected

@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, unsigned long start, unsigned long end) {
mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
              _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()?

read lock critical sections can't create any dependencies against any other read lock critical section of the same lock. Switching this to a read lock would just render the annotation pointless (if you don't include at least some write lock critical section somewhere, but I have no idea where you'd do that). A read lock that you only ever take for reading essentially doesn't do anything at all.

So not clear on why you're suggesting this?

It's the exact same idea like fs_reclaim of intserting a fake lock to tie all possible callchains to a given functions together with all possible callchains from that function. Of course this is only valid if all NxM combinations could happen in theory. For fs_reclaim that's true because direct reclaim can pick anything it wants to shrink/evict. For mmu notifier that's true as long as we assume any mmu notifier can be in use by any process, which only depends upon sufficiently contrived/evil userspace.

I guess I could use lock_map_acquire/release() wrappers for this like fs_reclaim, would be a bit more clear. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Chris Wilson

5:33 p.m.

New subject: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

Quoting Daniel Vetter (2018-11-27 17:28:43)

...

On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson chris@chris-wilson.co.uk wrote:

...
Quoting Daniel Vetter (2018-11-27 07:49:18)

...
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:

...
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliable hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invaliate_range_start callback. If there's interest, we should probably roll this out to all of them. But my undestanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.

Cc: Andrew Morton akpm@linux-foundation.org Cc: David Rientjes rientjes@google.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.

Thanks, Daniel

...
include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 7 +++++++ 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 9893a6432adf..a39ba218dbbe 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map; +#endif

/*

The mmu notifier_mm structure is allocated and installed in

mm->mmu_notifier_mm inside the mm_take_all_locks() protected

@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, unsigned long start, unsigned long end) {
mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
              _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()?
read lock critical sections can't create any dependencies against any other read lock critical section of the same lock. Switching this to a read lock would just render the annotation pointless (if you don't include at least some write lock critical section somewhere, but I have no idea where you'd do that). A read lock that you only ever take for reading essentially doesn't do anything at all.

So not clear on why you're suggesting this?

Just that it's not acting as a mutex, so emulating one looks wrong. -Chris

Daniel Vetter

5:39 p.m.

New subject: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

On Tue, Nov 27, 2018 at 05:33:58PM +0000, Chris Wilson wrote:

...

Quoting Daniel Vetter (2018-11-27 17:28:43)

...
On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson chris@chris-wilson.co.uk wrote:

...
Quoting Daniel Vetter (2018-11-27 07:49:18)

...
On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:

...
This is a similar idea to the fs_reclaim fake lockdep lock. It's fairly easy to provoke a specific notifier to be run on a specific range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for all the various callchains that might lead to them. But both at the same time is really hard to reliable hit, especially when you want to exercise paths like direct reclaim or compaction, where it's not easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep to see a lot more dependencies, without having to actually hit them in a single challchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled this out for the invaliate_range_start callback. If there's interest, we should probably roll this out to all of them. But my undestanding of core mm is seriously lacking, and I'm not clear on whether we need a lockdep map for each callback, or whether some can be shared.

Cc: Andrew Morton akpm@linux-foundation.org Cc: David Rientjes rientjes@google.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: "Christian König" christian.koenig@amd.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com

Any comments on this one here? This is really the main ingredient for catching deadlocks in mmu notifier callbacks. The other two patches are more the icing on the cake.

Thanks, Daniel

...
include/linux/mmu_notifier.h | 7 +++++++ mm/mmu_notifier.c | 7 +++++++ 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 9893a6432adf..a39ba218dbbe 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map; +#endif

/*

The mmu notifier_mm structure is allocated and installed in

mm->mmu_notifier_mm inside the mm_take_all_locks() protected

@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm, static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, unsigned long start, unsigned long end) {
mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
              _RET_IP_);
Would not lock_acquire_shared() be more appropriate, i.e. treat this as a rwsem_acquire_read()?
read lock critical sections can't create any dependencies against any other read lock critical section of the same lock. Switching this to a read lock would just render the annotation pointless (if you don't include at least some write lock critical section somewhere, but I have no idea where you'd do that). A read lock that you only ever take for reading essentially doesn't do anything at all.

So not clear on why you're suggesting this?
Just that it's not acting as a mutex, so emulating one looks wrong.

Ok, I think switching to lock_map_acquire/release should address that. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

2363

Age (days ago)

2368

Last active (days ago)

dri-devel@lists.freedesktop.org

25 comments

7 participants

tags (0)

participants (7)

Chris Wilson
Christian König
Daniel Vetter
Daniel Vetter
Koenig, Christian
Michal Hocko
Tvrtko Ursulin