Injecting probe errors for MMIO send, CT send to make probe flow more robust.
Use i915_probe_error to report probe injection errors.
Thanneeru Srinivasulu (4): drm/i915/huc: Use i915_probe_error to report early CTB failures drm/i915/huc: Use i915_probe_error to report early HuC failures drm/i915/guc: Inject probe errors for MMIO send drm/i915/guc: Inject probe errors for CT send
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 ++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 12 ++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 ++-- 3 files changed, 16 insertions(+), 4 deletions(-)
Replace DRM_ERROR with CT_PROBE_ERROR to report early CTB failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 0a3504bc0b61..83764db0fd6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -191,8 +191,8 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err)) - CT_ERROR(ct, "Failed to register %s buffer (%pe)\n", - guc_ct_buffer_type_to_str(type), ERR_PTR(err)); + CT_PROBE_ERROR(ct, "Failed to register %s buffer (%pe)\n", + guc_ct_buffer_type_to_str(type), ERR_PTR(err)); return err; }
On Mon, Oct 11, 2021 at 08:51:03PM +0530, Thanneeru Srinivasulu wrote:
Replace DRM_ERROR with CT_PROBE_ERROR to report early CTB failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 0a3504bc0b61..83764db0fd6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -191,8 +191,8 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err))
CT_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
CT_PROBE_ERROR(ct, "Failed to register %s buffer (%pe)\n",
return err;guc_ct_buffer_type_to_str(type), ERR_PTR(err));
}
-- 2.25.1
On Mon, 11 Oct 2021, Matthew Brost matthew.brost@intel.com wrote:
On Mon, Oct 11, 2021 at 08:51:03PM +0530, Thanneeru Srinivasulu wrote:
Replace DRM_ERROR with CT_PROBE_ERROR to report early CTB failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 0a3504bc0b61..83764db0fd6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -191,8 +191,8 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err))
CT_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
CT_PROBE_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
Please tell me why we are adding not just i915-specific logging helpers, but file specific ones?
To be honest I'd like to see all of the CT_ERROR, CT_DEBUG, CT_PROBE_ERROR macros just gone.
BR, Jani.
return err; }
-- 2.25.1
On Tue, 12 Oct 2021, Jani Nikula jani.nikula@linux.intel.com wrote:
On Mon, 11 Oct 2021, Matthew Brost matthew.brost@intel.com wrote:
On Mon, Oct 11, 2021 at 08:51:03PM +0530, Thanneeru Srinivasulu wrote:
Replace DRM_ERROR with CT_PROBE_ERROR to report early CTB failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 0a3504bc0b61..83764db0fd6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -191,8 +191,8 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err))
CT_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
CT_PROBE_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
Please tell me why we are adding not just i915-specific logging helpers, but file specific ones?
To be honest I'd like to see all of the CT_ERROR, CT_DEBUG, CT_PROBE_ERROR macros just gone.
For that matter, why has GEM_BUG_ON spread like a disease to display and intel_uncore.c too?
BR, Jani.
BR, Jani.
return err; }
-- 2.25.1
On 12.10.2021 18:16, Jani Nikula wrote:
On Mon, 11 Oct 2021, Matthew Brost matthew.brost@intel.com wrote:
On Mon, Oct 11, 2021 at 08:51:03PM +0530, Thanneeru Srinivasulu wrote:
Replace DRM_ERROR with CT_PROBE_ERROR to report early CTB failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 0a3504bc0b61..83764db0fd6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -191,8 +191,8 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err))
CT_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
CT_PROBE_ERROR(ct, "Failed to register %s buffer (%pe)\n",
guc_ct_buffer_type_to_str(type), ERR_PTR(err));
Please tell me why we are adding not just i915-specific logging helpers, but file specific ones?
To be honest I'd like to see all of the CT_ERROR, CT_DEBUG, CT_PROBE_ERROR macros just gone.
the reason for CT_DEBUG is that it can be quite noisy so we must have an easy option to compile it out on non-debug configs, can't just replace that helper with drm_dbg or i915_dbg (that we don't have) as it will be available likely on I915_DEBUG config, while we want more fine control.
use of file (or component) level helpers allows us to simplify the code (no need to repeat long i915->drm lookup from component pointer) and we may provide common prefix and/or classification of the messages.
extra bonus, especially useful after introduction of multi-gt support, will be possibility of augmenting message to include gt identifier, without the need to update all existing places if they were using i915- or drm- level functions directly.
for this last feature, likely "gt" specific intel_gt_err|probe_err|dbg helpers will do the job as well, so if someone introduce them, I'm happy to convert CT_ERROR calls to these new helpers if really really needed.
-Michal
BR, Jani.
return err; }
-- 2.25.1
Replace DRM_ERROR with i915_probe_error to report early HuC failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index ff4b6869b80b..ff0f5b9130c9 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -179,7 +179,7 @@ int intel_huc_auth(struct intel_huc *huc) ret = intel_guc_auth_huc(guc, intel_guc_ggtt_offset(guc, huc->rsa_data)); if (ret) { - DRM_ERROR("HuC: GuC did not ack Auth request %d\n", ret); + i915_probe_error(gt->i915, "HuC: GuC did not ack Auth request %d\n", ret); goto fail; }
@@ -190,7 +190,7 @@ int intel_huc_auth(struct intel_huc *huc) huc->status.value, 2, 50, NULL); if (ret) { - DRM_ERROR("HuC: Firmware not verified %d\n", ret); + i915_probe_error(gt->i915, "HuC: Firmware not verified %d\n", ret); goto fail; }
On Mon, Oct 11, 2021 at 08:51:04PM +0530, Thanneeru Srinivasulu wrote:
Replace DRM_ERROR with i915_probe_error to report early HuC failures.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index ff4b6869b80b..ff0f5b9130c9 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -179,7 +179,7 @@ int intel_huc_auth(struct intel_huc *huc) ret = intel_guc_auth_huc(guc, intel_guc_ggtt_offset(guc, huc->rsa_data)); if (ret) {
DRM_ERROR("HuC: GuC did not ack Auth request %d\n", ret);
goto fail; }i915_probe_error(gt->i915, "HuC: GuC did not ack Auth request %d\n", ret);
@@ -190,7 +190,7 @@ int intel_huc_auth(struct intel_huc *huc) huc->status.value, 2, 50, NULL); if (ret) {
DRM_ERROR("HuC: Firmware not verified %d\n", ret);
goto fail; }i915_probe_error(gt->i915, "HuC: Firmware not verified %d\n", ret);
-- 2.25.1
Injecting probe errors -ENXIO for MMIO send.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 8f8182bf7c11..490d66712afc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -403,6 +403,10 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *request, u32 len, int i; int ret;
+ ret = i915_inject_probe_error(i915, -ENXIO); + if (ret) + return ret; + GEM_BUG_ON(!len); GEM_BUG_ON(len > guc->send_regs.count);
On Mon, Oct 11, 2021 at 08:51:05PM +0530, Thanneeru Srinivasulu wrote:
Injecting probe errors -ENXIO for MMIO send.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 8f8182bf7c11..490d66712afc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -403,6 +403,10 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *request, u32 len, int i; int ret;
- ret = i915_inject_probe_error(i915, -ENXIO);
- if (ret)
return ret;
- GEM_BUG_ON(!len); GEM_BUG_ON(len > guc->send_regs.count);
-- 2.25.1
Inject probe errors -ENXIO, -EBUSY for CT send.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 83764db0fd6d..8ffef3abd3da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -765,6 +765,14 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 status = ~0; /* undefined */ int ret;
+ ret = i915_inject_probe_error(ct_to_i915(ct), -ENXIO); + if (ret) + return ret; + + ret = i915_inject_probe_error(ct_to_i915(ct), -EBUSY); + if (ret) + return ret; + if (unlikely(!ct->enabled)) { struct intel_guc *guc = ct_to_guc(ct); struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
On Mon, Oct 11, 2021 at 08:51:06PM +0530, Thanneeru Srinivasulu wrote:
Inject probe errors -ENXIO, -EBUSY for CT send.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 83764db0fd6d..8ffef3abd3da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -765,6 +765,14 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 status = ~0; /* undefined */ int ret;
- ret = i915_inject_probe_error(ct_to_i915(ct), -ENXIO);
- if (ret)
return ret;
I don't see where -ENXIO is returned during an error that we handle unless I am missing something. If we don't return -ENXIO anywhere else I don't think we need to inject this error.
Matt
- ret = i915_inject_probe_error(ct_to_i915(ct), -EBUSY);
- if (ret)
return ret;
- if (unlikely(!ct->enabled)) { struct intel_guc *guc = ct_to_guc(ct); struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
-- 2.25.1
On 11.10.2021 20:00, Matthew Brost wrote:
On Mon, Oct 11, 2021 at 08:51:06PM +0530, Thanneeru Srinivasulu wrote:
Inject probe errors -ENXIO, -EBUSY for CT send.
Signed-off-by: Thanneeru Srinivasulu thanneeru.srinivasulu@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 83764db0fd6d..8ffef3abd3da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -765,6 +765,14 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 status = ~0; /* undefined */ int ret;
- ret = i915_inject_probe_error(ct_to_i915(ct), -ENXIO);
- if (ret)
return ret;
I don't see where -ENXIO is returned during an error that we handle unless I am missing something. If we don't return -ENXIO anywhere else I don't think we need to inject this error.
but the point of this exercise is not to handle such error but to gracefully abort probe without panic or leaks. note that we are already using -ENXIO in many other injected failure points (mostly in uc code)
thus for me above change is also fine and the whole series is:
Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com
-Michal
Matt
- ret = i915_inject_probe_error(ct_to_i915(ct), -EBUSY);
- if (ret)
return ret;
- if (unlikely(!ct->enabled)) { struct intel_guc *guc = ct_to_guc(ct); struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
-- 2.25.1
dri-devel@lists.freedesktop.org