On Mon, Sep 14 2020 at 13:59, Linus Torvalds wrote:
On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner tglx@linutronix.de wrote:
Recently merged code does:
gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
Looks obviously correct, except for the fact that preemptible() is unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in that code use GFP_ATOMIC on such kernels.
I don't think this is a good reason to entirely get rid of the no-preempt thing.
I did not say that this is a good reason. It just illustrates the problem.
The above is just garbage. It's bogus. You can't do it.
Blaming the no-preempt code for this bug is extremely unfair, imho.
I'm not blaming the no-preempt code. I'm blaming inconsistency and there is no real good argument for inconsistent behaviour, TBH.
And the no-preempt code does help make for much better code generation for simple spinlocks.
Yes it does generate better code, but I tried hard to spot a difference in various metrics exposed by perf. It's all in the noise and I only can spot a difference when the actual preemption check after the decrement which still depends on CONFIG_PREEMPT is in place, but that's not the case for PREEMPT_NONE or PREEMPT_VOLUNTARY kernels where the decrement is just a decrement w/o any conditional behind it.
Where is that horribly buggy recent code? It's not in that exact format, certainly, since 'grep' doesn't find it.
Bah, that was stuff in next which got dropped again.
But just look at any check which uses preemptible(), especially those which check !preemptible():
In the X86 #GP handler we have:
/* * To be potentially processing a kprobe fault and to trust the result * from kprobe_running(), we have to be non-preemptible. */ if (!preemptible() && kprobe_running() && kprobe_fault_handler(regs, X86_TRAP_GP)) goto exit;
and a similar check in the S390 code in kprobe_exceptions_notify(). That all magically 'works' because that code might have been actually tested with lockdep enabled which enforces PREEMPT_COUNT...
The SG code has some interesting usage as well:
if (miter->__flags & SG_MITER_ATOMIC) { WARN_ON_ONCE(preemptible()); kunmap_atomic(miter->addr);
How is that WARN_ON_ONCE() supposed to catch anything? Especially as calling code does:
flags = SG_MITER_TO_SG; if (!preemptible()) flags |= SG_MITER_ATOMIC;
which is equally useless on kernels which have PREEMPT_COUNT=n.
There are bugs which are related to in_atomic() or other in_***() usage all over the place as well.
Inconsistency at the core level is a clear recipe for disaster and at some point we have to bite the bullet and accept that consistency is more important than the non measurable 3 cycles?
Thanks,
tglx