On Wed, Sep 23, 2020 at 11:52:51AM -0400, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 10:40:32 +0200 peterz@infradead.org wrote:
> > However, with migrate_disable() we can have each task preempted in a migrate_disable() region, worse we can stack them all on the _same_ CPU (super ridiculous odds, sure). And then we end up only able to run one task, with the rest of the CPUs picking their nose.
> What if we just made migrate_disable() a local_lock() available for !RT?
Can't, neither migrate_disable() nor migrate_enable() is allowed to block -- which is what makes their implementation so 'interesting'.
> This should lower the SHC in theory, if you can't have stacked migrate disables on the same CPU.
See this email in that other thread (you're on Cc too IIRC):
https://lkml.kernel.org/r/20200923170809.GY1362448@hirez.programming.kicks-a...
I think that if we 'frob' the balance PULL, we'll end up with something similar.
Whichever way around we turn this thing, the migrate_disable() runtime (we'll have to add a tracer for that) will be an interference term on the lower priority task, exactly like preempt_disable() would be. We'll just not exclude a higher priority task from running.
AFAICT; the best case is a single migrate_disable() nesting, where a higher priority task preempts in a migrate_disable() section -- this is per design.
When this preempted task becomes eligible to run under the ideal model (IOW it becomes one of the M highest priority tasks), it might still have to wait for the preemptee's migrate_disable() section to complete. It thereby suffers interference for exactly the duration of that migrate_disable() section.
Per this argument, the change from preempt_disable() to migrate_disable() gets us:
- higher priority tasks gain reduced wake-up latency
- lower priority tasks are unchanged and are subject to the exact same interference term as if the higher priority task were using preempt_disable().
Since we've already established this term is unbounded, any task but the highest priority task is basically buggered.
TL;DR, if we get balancing fixed and achieve (near) the optimal case above, migrate_disable() is an overall win. But it's provably non-deterministic as long as the migrate_disable() sections are non-deterministic.
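To put rough numbers on the stacking scenario quoted at the top, here is a toy user-space model in Python (not kernel code). It assumes a set of tasks that were all preempted inside a migrate_disable() region on the same CPU, with nothing else runnable, and compares that against the same work being free to spread over the CPUs. The function names and the greedy longest-first placement used for the comparison are made up for illustration only.

```python
def makespan_pinned(remaining):
    """Time until every section has completed when all tasks are pinned to
    one CPU: the remaining migrate_disable() sections run back to back."""
    return sum(remaining)


def makespan_free(remaining, ncpus):
    """Time until every section has completed if the same work could be
    spread over ncpus CPUs (greedy longest-first, an assumption of this
    sketch and not what the scheduler does)."""
    loads = [0] * ncpus
    for r in sorted(remaining, reverse=True):
        # Place the next-longest section on the least loaded CPU.
        i = loads.index(min(loads))
        loads[i] += r
    return max(loads)


def idle_cpu_time(remaining, ncpus):
    """CPU time wasted on the other ncpus - 1 CPUs while the pinned
    sections drain one after another (assumes no other runnable work)."""
    return (ncpus - 1) * makespan_pinned(remaining)


# Four tasks stacked on one CPU of a 4-CPU machine, with these remaining
# section lengths (arbitrary time units).
sections = [3, 5, 2, 4]
print(makespan_pinned(sections))    # 14
print(makespan_free(sections, 4))   # 5
print(idle_cpu_time(sections, 4))   # 42
```

The gap between the two makespans grows with the section lengths, which is the same point as above: unless the sections themselves are bounded, so is nothing else.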
The reason this all mostly works in practice is (I think) because:
- People care most about the higher prio RT tasks and craft them to mostly avoid the migrate_disable() infected code.
- The preemption scenario is most pronounced at higher utilization, and I suspect this is fairly rare to begin with.
- And while many of these migrate_disable() regions are unbounded in theory, in practice they're often fairly reasonable.
So my current todo list is:
- Change RT PULL
- Change DL PULL
- Add migrate_disable() tracer; exactly like preempt/irqoff, except measuring task-runtime instead of cpu-time.
- Add a mode that measures actual interference.
- Add a traceevent to detect preemption in migrate_disable().
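As a rough illustration of what "task-runtime instead of cpu-time" could mean for such a tracer, here is a small user-space model in Python. It is only a sketch of the accounting, not a proposed kernel implementation: the class, its method names and the timestamp units are all hypothetical, and it counts every switch-out inside a section without telling preemption apart from voluntary blocking.

```python
class MigrateDisableTracer:
    """Toy per-task accounting for one migrate_disable() section at a time."""

    def __init__(self):
        self.depth = 0          # nesting depth of migrate_disable()
        self.start = None       # timestamp of the outermost migrate_disable()
        self.run_since = None   # when the task last started running in the section
        self.runtime = 0        # time actually spent running inside the section
        self.switch_outs = 0    # times the task was switched out inside the section

    def migrate_disable(self, ts):
        self.depth += 1
        if self.depth == 1:     # outermost call opens the section
            self.start = ts
            self.run_since = ts
            self.runtime = 0
            self.switch_outs = 0

    def sched_out(self, ts):
        if self.depth > 0:      # switched out while migration is disabled
            self.runtime += ts - self.run_since
            self.switch_outs += 1

    def sched_in(self, ts):
        if self.depth > 0:      # resume charging runtime to the section
            self.run_since = ts

    def migrate_enable(self, ts):
        """Return (wall_time, task_runtime, switch_outs) when the outermost
        section closes, or None for an inner (nested) enable."""
        self.depth -= 1
        if self.depth > 0:
            return None
        self.runtime += ts - self.run_since
        return (ts - self.start, self.runtime, self.switch_outs)


t = MigrateDisableTracer()
t.migrate_disable(100)
t.sched_out(130)            # switched out 30 units into the section
t.sched_in(180)             # back on the CPU 50 units later
print(t.migrate_enable(200))    # (100, 50, 1)
```

In this example the section spans 100 time units but the task only ran for 50 of them; a preempt/irqoff style measurement would see the former, whereas the task-runtime number is the one that shows up as the interference term discussed above.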
And then I suppose I should twist Daniel's arm to update his model to include these scenarios and numbers.