On Sun, Sep 20 2020 at 09:57, Linus Torvalds wrote:
On Sun, Sep 20, 2020 at 1:49 AM Thomas Gleixner tglx@linutronix.de wrote: Btw, looking at the stack code, Ithink your new implementation of it is a bit scary:
static inline int kmap_atomic_idx_push(void) {
int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
int idx = current->kmap_ctrl.idx++;
and now that 'current->kmap_ctrl.idx' is not atomic wrt
(a) NMI's (this may be ok, maybe we never do kmaps in NMIs, and with nesting I think it's fine anyway - the NMI will undo whatever it did)
Right. Nesting should be a non issue, but I don't think we have kmap_atomic() in NMI context.
(b) the prev/next switch
And that (b) part worries me. You do the kmap_switch_temporary() to switch the entries, but you do that *separately* from actually switching 'current' to the new value.
So kmap_switch_temporary() looks safe, but I don't think it actually is. Because while it first unmaps the old entries and then remaps the new ones, an interrupt can come in, and at that point it matters what is *CURRENT*.
And regardless of whether 'current' is 'prev' or 'next', that kmap_switch_temporary() loop may be doing the wrong thing, depending on which one had the deeper stack. The interrupt will be using whatever "current->kmap_ctrl.idx" is, but that might overwrite entries that are in the process of being restored (if current is still 'prev', but kmap_switch_temporary() is in the "restore @next's kmaps" pgase), or it might stomp on entries that have been pte_clear()'ed by the 'prev' thing.
Duh yes. Never thought about that.
Alternatively, that process counter would need about a hundred lines of commentary about exactly why it's safe. Because I don't think it is.
I think the more obvious solution is to split the whole exercise:
schedule() prepare_switch() unmap()
switch_to()
finish_switch() map()
That's safe because neither the unmap() nor the map() code changes kmap_ctrl.idx. So if there is an interrupt coming in between unmap() and switch_to() then a kmap_local() there will use the next entry. So we could even do the unmap() with interrupts enabled (preemption disabled). Same for the map() part.
To explain that we need only a few lines of commentry methinks.
Thanks,
tglx