Re: [REPORT] syscall reboot + umh + firmware fallback

12 May 2022


Hello,
Just took a look out of curiosity.
On Thu, May 12, 2022 at 02:25:57PM +0900, Byungchul Park wrote:
...
PROCESS A	PROCESS B	WORKER C
__do_sys_reboot()
   	__do_sys_reboot()
 mutex_lock(&system_transition_mutex)
 ...		 mutex_lock(&system_transition_mutex) <- stuck
   	 ...
   			request_firmware_work_func()
   			 _request_firmware()
   			  firmware_fallback_sysfs()
   			   usermodehelper_read_lock_wait()
   			    down_read(&umhelper_sem)
   			   ...
   			   fw_load_sysfs_fallback()
   			    fw_sysfs_wait_timeout()
   			     wait_for_completion_killable_timeout(&fw_st->completion) <- stuck
 kernel_halt()
  __usermodehelper_disable()
   down_write(&umhelper_sem) <- stuck

All the 3 contexts are stuck at this point.
PROCESS A	PROCESS B	WORKER C
...
   up_write(&umhelper_sem)
 ...
 mutex_unlock(&system_transition_mutex) <- cannot wake up B
 ...
 kernel_halt()
  notifier_call_chain()
   hw_shutdown_notify()
    kill_pending_fw_fallback_reqs()
     __fw_load_abort()
      complete_all(&fw_st->completion) <- cannot wake up C

		   ...
		   usermodehelper_read_unlock()
		    up_read(&umhelper_sem) <- cannot wake up A

I'm not sure I'm reading it correctly, but it looks like the "process B" column
is superfluous, given that it's waiting on the same lock to do the same thing
that A is already doing (besides, you can't really halt the machine twice).
What it's reporting seems to be an ABBA deadlock between A waiting on
umhelper_sem and C waiting on fw_st->completion. The report seems spurious:
1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
   to make forward progress because it will unstick itself once the timeout
   expires.
2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
   The fw loader can be, and mainly should be, woken up by firmware loading
   actually completing instead of being aborted.
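
To make the two points above concrete, here is a minimal user-space sketch in
C with pthreads. It is not the kernel implementation: struct completion_sketch
and the *_sketch functions are hypothetical names made up for illustration,
and the "killable" part of the kernel wait is left out. It only shows that a
timed wait returns by itself once the deadline passes, and that any caller of
the wake function (an abort path or a normal load completion) releases the
waiter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* User-space stand-in for a completion; hypothetical, not kernel code. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            done;
};

static void init_completion_sketch(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/*
 * Rough analogue of complete_all(): mark the event done and wake every
 * waiter. Any context may call this, whether it aborts the request or
 * finishes it normally.
 */
static void complete_all_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Rough analogue of a timed completion wait. Returns true if the completion
 * was signalled, false if timeout_ms elapsed first. The waiter makes forward
 * progress either way, so nobody has to wake it up.
 */
static bool wait_for_completion_timeout_sketch(struct completion_sketch *c,
					       long timeout_ms)
{
	struct timespec deadline;
	bool done;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		/* Non-zero means the deadline passed (or an error): stop waiting. */
		if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) != 0)
			break;
	}
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```

Calling wait_for_completion_timeout_sketch() on a freshly initialized
completion returns false after the timeout with no waker involved; calling it
after complete_all_sketch() returns true immediately.
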
I guess the reason why B shows up there is that the operation order is such
that, just between A and C, the complete_all() takes place before
__usermodehelper_disable(), so the whole thing kinda doesn't make sense, as
you can't block a past operation by a future one. Inserting process B
introduces the reverse ordering.
Thanks.
-- 
tejun
