Re: [PATCH 1/4] drm/v3d: Delay the scheduler timeout if we're still making progress.

5 Jul 2018


Lucas Stach <l.stach@pengutronix.de> writes:
...
On Tuesday, 03.07.2018, at 10:05 -0700, Eric Anholt wrote:
...
>> GTF-GLES2.gtf.GL.acos.acos_float_vert_xvary submits jobs that take 4
>> seconds at maximum resolution, but we still want to reset quickly if a
>> job is really hung.  Sample the CL's current address and the return
>> address (since we call into tile lists repeatedly) and if either has
>> changed then assume we've made progress.
>
> So this means you are doubling your timeout? AFAICS the first time
> you hit the timeout handler, the cached ctca and ctra values will
> probably always differ from the current values. Maybe this warrants a
> mention in the commit message, as it's changing the behavior of the
> scheduler timeout.

I suppose that doubles the minimum timeout, but I don't think there's
any principled choice behind that value.
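
For readers following along, here is a minimal userspace sketch of the
progress check being discussed. It is not the patch itself: the struct,
its field names, and the assumption that the cached values start at
zero are all hypothetical, chosen only to illustrate why the first
timeout always counts as "progress".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-job state; names are illustrative, not from the patch. */
struct job_progress {
    uint32_t timedout_ctca; /* CL current address cached at the last timeout */
    uint32_t timedout_ctra; /* CL return address cached at the last timeout */
};

/*
 * Called from a (simulated) timeout handler with freshly sampled
 * register values. Returns true if either address moved since the
 * previous timeout, in which case the caller would re-arm the timer
 * instead of resetting the GPU.
 */
static bool job_made_progress(struct job_progress *job,
                              uint32_t ctca, uint32_t ctra)
{
    if (job->timedout_ctca != ctca || job->timedout_ctra != ctra) {
        /* Remember the sample so the next timeout compares against it. */
        job->timedout_ctca = ctca;
        job->timedout_ctra = ctra;
        return true;
    }
    return false;
}
```

With zero-initialized cached values (an assumption), the first call for
any running job reports progress, so even a truly hung job survives one
extra timeout period; that is the doubling of the minimum timeout
mentioned above.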
...
> Also how easy is it for userspace to construct such an infinite loop in
> the CL? Thinking about a rogue client DoSing the GPU while exploiting
> this check in the timeout handler to stay under the radar...

You'd need to have a big enough CL that you don't sample the same
location twice in a row, but otherwise it's trivial and equivalent to a
V3D33 igt case I wrote.  I don't think we as the kernel particularly
care to protect from that case, though -- it's mainly "does a broken
WebGL shader take down your desktop?" which we will still be protecting
from.  If you wanted to protect from a general userspace attacker, you
could have a maximum 1 minute timeout or something, but I'm not sure
your life is actually much better when you let an arbitrary number of
clients submit many jobs to round-robin through, each of which has a long
timeout like that.
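
The "maximum 1 minute timeout" idea could be sketched roughly as
follows. This is only an illustration of the suggestion, not something
the patch does: the struct, the enum, the function name, and the
period/cap parameters are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum timeout_action { TIMEOUT_EXTEND, TIMEOUT_RESET };

/* Hypothetical per-job state for a capped progress check. */
struct capped_job {
    uint32_t last_ctca;  /* CL current address seen at the last timeout */
    uint32_t last_ctra;  /* CL return address seen at the last timeout */
    uint32_t elapsed_ms; /* total time this job has been given so far */
};

/*
 * Decide what to do when the timer fires. The job is extended only if
 * it moved since the last sample AND its accumulated runtime is still
 * below max_ms; a looping CL that keeps "making progress" is therefore
 * reset once the cap is reached.
 */
static enum timeout_action on_timeout(struct capped_job *job,
                                      uint32_t ctca, uint32_t ctra,
                                      uint32_t period_ms, uint32_t max_ms)
{
    bool progressed = job->last_ctca != ctca || job->last_ctra != ctra;

    job->elapsed_ms += period_ms;
    job->last_ctca = ctca;
    job->last_ctra = ctra;

    if (!progressed || job->elapsed_ms >= max_ms)
        return TIMEOUT_RESET;
    return TIMEOUT_EXTEND;
}
```

With a 500 ms period and a 60000 ms cap (both made-up numbers), a job
whose addresses change on every sample is extended 119 times and reset
on the 120th timeout. As noted above, such a cap bounds a single job
but does little against many clients each submitting long-running jobs.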
