On 05/02/2021, Eric Anholt eric@anholt.net wrote:
On Thu, Feb 4, 2021 at 10:09 AM Chema Casanova jmcasanova@igalia.com wrote:
On 3/9/20 18:48, Yukimasa Sugizaki wrote:
From: Yukimasa Sugizaki ysugi@idein.jp
The default timeout is 500 ms which is too short for some workloads including Piglit. Adding this parameter will help users to run heavier tasks.
Signed-off-by: Yukimasa Sugizaki ysugi@idein.jp
drivers/gpu/drm/v3d/v3d_sched.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index feef0c749e68..983efb018560 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -19,11 +19,17 @@ */
#include <linux/kthread.h> +#include <linux/moduleparam.h>
#include "v3d_drv.h" #include "v3d_regs.h" #include "v3d_trace.h"
+static uint timeout = 500; +module_param(timeout, uint, 0444); +MODULE_PARM_DESC(timeout,
"Timeout for a job in ms (0 means infinity and default is 500
ms)");
I think that parameter definition could be included at v3d_drv.c as other drivers do. Then we would have all future parameters together. In that case we would need also to include the extern definition at v3d_drv.h for the driver variable.
The param name could be more descriptive like "sched_timeout_ms" in the lima driver.
I wonder if it would make sense to have different timeouts parameters for different type of jobs we have. At least this series come from the need additional time to complete compute jobs. This is what amdgpu does, but we probably don't need something such complex.
I think it would also make sense to increase our default value for the compute jobs.
What do you think?
In any case I think that having this general scheduling timeout parameter as other drivers do is a good idea.
I agree with your observations. I'm going to add bin_timeout_ms, render_timeout_ms, tfu_timeout_ms, v3d_timeout_ms, and cache_clean_timeout_ms parameters to v3d_drv.c instead of the sole timeout parameter. Though not sophisticated, this should be enough.
Having a param for being able to test if the job completes eventually is a good idea, but if tests are triggering timeouts, then you probably need to investigate what's going on in the driver -- it's not like you want .5second unpreemptible jobs to be easy to produce.
Also, for CS, I wonder if we have another reg besides CSD_CURRENT_CFG4 we could be looking at for "we're making progress on the job". Half a second with no discernible progress sounds like a bug.
If binning/render/TFU/cache_clean jobs keep running over 0.5 seconds, then it should be a bug because they normally complete processing within the period. However, a CSD job can do anything. For example, SaschaWillems's ray tracing example requires about 0.7 seconds to compose a frame with compute shader with Igalia's Vulkan driver. Another example is our matrix computation library for QPU: https://github.com/Idein/qmkl6 It requires about 1.1 seconds to multiply two 2048x2048 matrices. Because in general we do not know what is the expected behavior of a QPU program and what is not, we also cannot detect a progress the program is making. This is why we want to have the timeout parameters to be able to be modified.
Regards, Y. Sugizaki.