On Tue, Feb 15, 2022 at 10:37 PM Damien Le Moal damien.lemoal@opensource.wdc.com wrote:
Yeah, Matthew pointed out the same thing for another use-case, where it looks like DEPT is looking at the state at the wrong point (not at the scheduling point, but at prepare_to_sleep()).
This ata_port_wait() is the exact same pattern, ie we have
spin_lock_irqsave(ap->lock, flags);
while (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) { prepare_to_wait(&ap->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE); spin_unlock_irqrestore(ap->lock, flags); schedule();
and DEPT has incorrectly taken it to mean that 'ap->lock' is held during the wait, when it is actually released before actually waiting.
For the spin-locks, this is all very obvious (because they'd have been caught long ago by much simpler debug code), but the same prepare_to_wait -> wait pattern can most definitely happen with sleeping locks too, so they are all slightly suspect.
And yes, the detailed reports are hard to read because the locations are given as "ata_port_wait_eh+0x52/0xc0". Running them through scripts/decode_stacktrace.sh to turn them into filename and line numbers - and also sort out inlining - would help a lot.
Byungchul, could you fix those two issues? Some of your reports may well be entirely valid, but the hard-to-read hex offsets and the knowledge that at least some of them are confused about how prepare_to_wait -> wait actually works makes the motivation to look at the details much less..
Linus
On Wed, 16 Feb 2022 10:09:14 -0800 Linus Torvalds torvalds@linux-foundation.org wrote:
Hi Byungchul,
I'm not sure what your tool is using to attach to the kernel to analyze the events (perhaps tracepoints?). But you can have the prepare_to_wait event just flag the task is about to wait, and then attach to the schedule (sched_switch) event to denote that it actually waited.
Of course have the finish_wait() clear the flag.
-- Steve
On Wed, Feb 16, 2022 at 01:33:18PM -0500, Steven Rostedt wrote:
(Sorry for late reply, which was because of my email client issue.)
Thank you for the hint. However, unfortunately, I didn't use tracepoint for that. However, the key idea is the thing that I should take!
Thanks a lot!
Thanks, Byungchul
-- Steve
torvalds@linux-foundation.org wrote:
Of couse, that's what I should do. Thanks for your feedback.
dri-devel@lists.freedesktop.org