On 2021-08-16 12:20 p.m., Quan, Evan wrote:
[AMD Official Use Only]
Hi Michel,
The patch seems reasonable to me(especially the cancel_delayed_work_sync() part). However, can you explain more about the code below? What's the race issue here exactly?
- /* mutex_lock could deadlock with cancel_delayed_work_sync in amdgpu_gfx_off_ctrl. */
- if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) {
/* If there's a bug which causes amdgpu_gfx_off_ctrl to be called with enable=true
* when adev->gfx.gfx_off_req_count is already 0, we might race with that.
* Re-schedule to make sure gfx off will be re-enabled in the HW eventually.
*/
schedule_delayed_work(&adev->gfx.gfx_off_delay_work, AMDGPU_GFX_OFF_DELAY_ENABLE);
return;
- }
If amdgpu_gfx_off_ctrl was called with enable=true when adev->gfx.gfx_off_req_count == 0 already, it could have prevented amdgpu_device_delay_enable_gfx_off from locking the mutex.
v3 solves this by only scheduling the work when adev->gfx.gfx_off_req_count transitions from 1 to 0, which means it no longer needs to lock the mutex.