Hi,
On Wed, Jun 8, 2022 at 9:13 AM Rob Clark <robdclark@gmail.com> wrote:
From: Rob Clark <robdclark@chromium.org>
I've seen a few crashes like:
CPU: 0 PID: 216 Comm: A618-worker Tainted: G        W         5.4.196 #7
Hardware name: Google Wormdingler rev1+ INX panel board (DT)
pstate: 20c00009 (nzCv daif +PAN +UAO)
pc : msm_readl+0x14/0x34
lr : a6xx_gpu_busy+0x40/0x80
sp : ffffffc011b93ad0
x29: ffffffc011b93ad0 x28: ffffffe77cba3000 x27: 0000000000000001
x26: ffffffe77bb4c4ac x25: ffffffa2f227dfa0 x24: ffffffa2f22aab28
x23: 0000000000000000 x22: ffffffa2f22bf020 x21: ffffffa2f22bf000
x20: ffffffc011b93b10 x19: ffffffc011bd4110 x18: 000000000000000e
x17: 0000000000000004 x16: 000000000000000c x15: 000001be3a969450
x14: 0000000000000400 x13: 00000000000101d6 x12: 0000000034155555
x11: 0000000000000001 x10: 0000000000000000 x9 : 0000000100000000
x8 : ffffffc011bd4000 x7 : 0000000000000000 x6 : 0000000000000007
x5 : ffffffc01d8b38f0 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffffffc011bd4110
Call trace:
 msm_readl+0x14/0x34
 a6xx_gpu_busy+0x40/0x80
 msm_devfreq_get_dev_status+0x70/0x1d0
 devfreq_simple_ondemand_func+0x34/0x100
 update_devfreq+0x50/0xe8
 qos_notifier_call+0x2c/0x64
 qos_max_notifier_call+0x1c/0x2c
 notifier_call_chain+0x58/0x98
 __blocking_notifier_call_chain+0x74/0x84
 blocking_notifier_call_chain+0x38/0x48
 pm_qos_update_target+0xf8/0x19c
 freq_qos_apply+0x54/0x6c
 apply_constraint+0x60/0x104
 __dev_pm_qos_update_request+0xb4/0x184
 dev_pm_qos_update_request+0x38/0x58
 msm_devfreq_idle_work+0x34/0x40
 kthread_worker_fn+0x144/0x1c8
 kthread+0x140/0x284
 ret_from_fork+0x10/0x18
Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
---[ end trace f6309767a42d0831 ]---
Which smells a lot like touching hw after power collapse. This seems a bit like a race/timing issue elsewhere, since the pm_runtime_get_if_in_use() in a6xx_gpu_busy() should have kept us from touching the hw if it wasn't powered.
I dunno if we want to change the commit message since I think my patch [1] addresses the above problem?
[1] https://lore.kernel.org/r/20220609094716.v2.1.Ie846c5352bc307ee4248d7cab998a...
But we've seen cases where the idle_work scheduled by msm_devfreq_idle() ends up racing with the resume path. Which, again, shouldn't be a problem other than causing unnecessary freq changes.
v2: Only move the runpm _put_autosuspend, and not the _mark_last_busy()
Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210927152928.831245-1-robdclark@gmail.com
 drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
In any case, your patch fixes the potential WARN_ON and seems like the right thing to do, so:
Reviewed-by: Douglas Anderson <dianders@chromium.org>