The loop is scanning until the original max_ip (size of the BO), but we want to not examine any code after the PROG_END's delay slots. There was a block trying to do that, except that we had some early continue statements if the signal wasn't a PROG_END or a BRANCH.
The failure mode would be that a valid shader is rejected because some undefined memory after the PROG_END slots is parsed as a branch and the rest of its setup is illegal. I haven't seen this in the wild, but valgrind was complaining when about this up in the userland simulator mode.
Signed-off-by: Eric Anholt eric@anholt.net --- drivers/gpu/drm/vc4/vc4_validate_shaders.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_validate_shaders.c b/drivers/gpu/drm/vc4/vc4_validate_shaders.c index 2543cf5b8b51..917321ce832f 100644 --- a/drivers/gpu/drm/vc4/vc4_validate_shaders.c +++ b/drivers/gpu/drm/vc4/vc4_validate_shaders.c @@ -608,9 +608,7 @@ static bool vc4_validate_branches(struct vc4_shader_validation_state *validation_state) { uint32_t max_branch_target = 0; - bool found_shader_end = false; int ip; - int shader_end_ip = 0; int last_branch = -2;
for (ip = 0; ip < validation_state->max_ip; ip++) { @@ -621,8 +619,13 @@ vc4_validate_branches(struct vc4_shader_validation_state *validation_state) uint32_t branch_target_ip;
if (sig == QPU_SIG_PROG_END) { - shader_end_ip = ip; - found_shader_end = true; + /* There are two delay slots after program end is + * signaled that are still executed, then we're + * finished. validation_state->max_ip is the + * instruction after the last valid instruction in the + * program. + */ + validation_state->max_ip = ip + 3; continue; }
@@ -676,15 +679,9 @@ vc4_validate_branches(struct vc4_shader_validation_state *validation_state) } set_bit(after_delay_ip, validation_state->branch_targets); max_branch_target = max(max_branch_target, after_delay_ip); - - /* There are two delay slots after program end is signaled - * that are still executed, then we're finished. - */ - if (found_shader_end && ip == shader_end_ip + 2) - break; }
- if (max_branch_target > shader_end_ip) { + if (max_branch_target > validation_state->max_ip - 3) { DRM_ERROR("Branch landed after QPU_SIG_PROG_END"); return false; }
The validation for it ends up being quite simple, but I hadn't got around to it before merging the driver. For backwards compatibility, we also need to add a flag so that the userspace GL driver can easily tell if the kernel will allow ETC1 textures (on an old kernel, it will continue to convert to RGBA8)
Signed-off-by: Eric Anholt eric@anholt.net --- drivers/gpu/drm/vc4/vc4_drv.c | 1 + drivers/gpu/drm/vc4/vc4_validate.c | 7 +++++++ include/uapi/drm/vc4_drm.h | 1 + 3 files changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c index 8703f56b7947..b087404c2784 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.c +++ b/drivers/gpu/drm/vc4/vc4_drv.c @@ -78,6 +78,7 @@ static int vc4_get_param_ioctl(struct drm_device *dev, void *data, pm_runtime_put(&vc4->v3d->pdev->dev); break; case DRM_VC4_PARAM_SUPPORTS_BRANCHES: + case DRM_VC4_PARAM_SUPPORTS_ETC1: args->value = true; break; default: diff --git a/drivers/gpu/drm/vc4/vc4_validate.c b/drivers/gpu/drm/vc4/vc4_validate.c index 26503e307438..e18f88203d32 100644 --- a/drivers/gpu/drm/vc4/vc4_validate.c +++ b/drivers/gpu/drm/vc4/vc4_validate.c @@ -644,6 +644,13 @@ reloc_tex(struct vc4_exec_info *exec, cpp = 1; break; case VC4_TEXTURE_TYPE_ETC1: + /* ETC1 is arranged as 64-bit blocks, where each block is 4x4 + * pixels. + */ + cpp = 8; + width = (width + 3) >> 2; + height = (height + 3) >> 2; + break; case VC4_TEXTURE_TYPE_BW1: case VC4_TEXTURE_TYPE_A4: case VC4_TEXTURE_TYPE_A1: diff --git a/include/uapi/drm/vc4_drm.h b/include/uapi/drm/vc4_drm.h index ad7edc3edf7c..69caa21f0cb2 100644 --- a/include/uapi/drm/vc4_drm.h +++ b/include/uapi/drm/vc4_drm.h @@ -286,6 +286,7 @@ struct drm_vc4_get_hang_state { #define DRM_VC4_PARAM_V3D_IDENT1 1 #define DRM_VC4_PARAM_V3D_IDENT2 2 #define DRM_VC4_PARAM_SUPPORTS_BRANCHES 3 +#define DRM_VC4_PARAM_SUPPORTS_ETC1 4
struct drm_vc4_get_param { __u32 param;
dri-devel@lists.freedesktop.org