On Wed, Dec 22, 2021 at 01:17, Lucas Stach <l.stach@pengutronix.de> wrote:
Some GPU-heavy test programs manage to trigger the hangcheck quite often. If there are no other GPU users in the system and the test program exhibits a very regular structure in the command streams being submitted, we can end up with two distinct submits that both trigger the hangcheck while the FE is in a very similar address range. This leads the hangcheck to believe that the GPU is stuck, while in reality the GPU is already busy working on a different job. To avoid those spurious GPU resets, also remember and consider the last completed fence seqno in the hangcheck.
Reported-by: Joerg Albert <joerg.albert@iav.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
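The hangcheck change described in the commit message can be sketched roughly as follows. This is a minimal standalone illustration, not the driver code: the struct, the function name and the 16-byte progress window are hypothetical assumptions. Each check samples the FE DMA address and the last completed fence seqno, and the GPU is only treated as stuck when neither has meaningfully advanced since the previous sample.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window (in bytes) within which forward FE movement is not
 * counted as progress; the value is an assumption for illustration. */
#define HANGCHECK_PROGRESS_WINDOW 16

/* Hypothetical state remembered between two hangcheck samples. */
struct hangcheck_state {
	uint32_t last_dma_addr; /* FE DMA address seen at the previous check */
	uint32_t last_fence;    /* last completed fence seqno at the previous check */
};

/*
 * Returns true if the GPU looks stuck: no new fence has completed since the
 * previous check and the FE address only moved forward by a few bytes.
 * Otherwise records the new sample and returns false.
 */
static bool hangcheck_looks_hung(struct hangcheck_state *s,
				 uint32_t dma_addr, uint32_t completed_fence)
{
	/* Signed difference so that a jump backwards is detected as change. */
	int64_t change = (int64_t)dma_addr - (int64_t)s->last_dma_addr;

	/* A newly completed fence means a job finished, so even if the FE sits
	 * at a similar address, it is now working on a different job. */
	if (completed_fence != s->last_fence ||
	    change < 0 || change > HANGCHECK_PROGRESS_WINDOW) {
		s->last_dma_addr = dma_addr;
		s->last_fence = completed_fence;
		return false;
	}

	return true;
}
```

Dropping the `completed_fence` comparison from the condition reproduces the spurious case from the commit message: two samples taken during different jobs whose FE addresses happen to be close together would be reported as a hang.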