Hi All,
So, this patch was already submitted a while ago, and back then it got one comment, from Chris Wilson:
Basically you want a limit on the frequency of ... pcode access? As presented, you ultimately do not trust any write and the only solution is to disable all writes. No RPS whatsoever, run at max and hope rc6 works (maybe even decrease the rc6 threshold).
One of the ideas we floated was moving the pcode access to a worker and ratelimiting the updates.
It has now finally seen some testing by users affected by the infamous "intel_idle.max_cstate=1 required to prevent crashes" bug: https://bugzilla.kernel.org/show_bug.cgi?id=109051
So far, the reports are that it makes the system stable for 2/3 of the users and is not effective for the other 1/3.
Now that we have some test results, I believe it is worthwhile to get this simple patch merged into mainline, hence this re-submission.
Regards,
Hans
Bay Trail devices are known to hang when the GPU frequency is changed often; this is discussed at great length in: https://bugzilla.kernel.org/show_bug.cgi?id=109051
Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3") is an attempt to work around this. Several users in bko109051 report that an earlier version of that patch, v1 (https://bugzilla.kernel.org/attachment.cgi?id=251471), works better for them, and that they still see hangs with the merged v3.
Comparing the two versions shows that they are indeed not equivalent: v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps(), as v3 does, it also contains these modifications to i915_irq.c:
 	if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
 		if (!vlv_c0_above(dev_priv, &dev_priv->rps.down_ei, &now,
-				  dev_priv->rps.down_threshold))
+				  VLV_RP_DOWN_EI_THRESHOLD))
 			events |= GEN6_PM_RP_DOWN_THRESHOLD;
 		dev_priv->rps.down_ei = now;
 	}

 	if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
 		if (vlv_c0_above(dev_priv, &dev_priv->rps.up_ei, &now,
-				 dev_priv->rps.up_threshold))
+				 VLV_RP_UP_EI_THRESHOLD))
 			events |= GEN6_PM_RP_UP_THRESHOLD;
 		dev_priv->rps.up_ei = now;
 	}
These use less aggressive up/down thresholds, which results in fewer GEN6_PM_RP_*_THRESHOLD events and thus in fewer calls to intel_set_rps() -> valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ), the last of which is the likely cause of the hang.
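A simplified, hypothetical model of why wider thresholds mean fewer frequency requests: each evaluation interval counts as UP when the GPU was busy for at least the up threshold (in percent), and as DOWN when it was busy for less than the down threshold, mirroring the vlv_c0_above() / !vlv_c0_above() tests in the snippet above. busy_above(), count_events() and the busy-percentage trace are illustrative inventions, not driver code; the driver's vlv_c0_above() works on sampled hardware counters rather than percentages, and UP and DOWN events come from separate evaluation-interval interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vlv_c0_above(): true when busy time is at
 * least threshold_pct percent of the elapsed time in the interval. */
static bool busy_above(uint64_t busy, uint64_t elapsed, unsigned int threshold_pct)
{
    if (elapsed == 0)
        return false; /* no interval sampled yet */
    return busy * 100 >= elapsed * threshold_pct;
}

struct rps_events {
    unsigned int up;   /* intervals that would ask for a higher frequency */
    unsigned int down; /* intervals that would ask for a lower frequency */
};

/* Classify each interval of a busy-percentage trace (elapsed time is
 * normalised to 100 per interval). Each counted event stands for one
 * potential intel_set_rps() call, i.e. one Punit frequency request. */
static struct rps_events count_events(const unsigned int *busy_pct, size_t n,
                                      unsigned int up_pct, unsigned int down_pct)
{
    struct rps_events ev = { 0, 0 };

    /* The band between down_pct and up_pct is the "do nothing" zone. */
    assert(up_pct > down_pct);

    for (size_t i = 0; i < n; i++) {
        if (busy_above(busy_pct[i], 100, up_pct))
            ev.up++;
        else if (!busy_above(busy_pct[i], 100, down_pct))
            ev.down++;
    }
    return ev;
}
```

With the made-up trace {72, 88, 95, 80, 68, 91, 76, 84, 65, 89}, count_events() reports 2 UP and 2 DOWN events for the 90/70 pair, versus 4 and 4 for a hypothetical narrower 85/80 band: in this example the wider hysteresis band halves the number of potential PUNIT_REG_GPU_FREQ_REQ writes.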
This commit hardcodes the threshold_up and threshold_down values for Bay Trail to less aggressive values, reducing the number of clock frequency changes and thus avoiding the hangs some people are still seeing with the merged fix.
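Setting the thresholds in intel_pm.c can replace v1's i915_irq.c change, assuming (as the i915_irq.c snippet above suggests) that the code after the skip_hw_write label stores threshold_up/threshold_down into dev_priv->rps.up_threshold and dev_priv->rps.down_threshold, the fields the interrupt handler compares against. The sketch below models only that control flow; struct rps_state, set_rps_thresholds() and the hw_writes counter are hypothetical stand-ins, not the driver's types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Values this patch adds to i915_reg.h. */
#define VLV_RP_UP_EI_THRESHOLD   90
#define VLV_RP_DOWN_EI_THRESHOLD 70

/* Hypothetical stand-in for the RPS state kept in dev_priv. */
struct rps_state {
    int up_threshold;        /* models dev_priv->rps.up_threshold */
    int down_threshold;      /* models dev_priv->rps.down_threshold */
    unsigned int hw_writes;  /* simulated GEN6_RP_* register writes */
};

/* Simplified model of the patched gen6_set_rps_thresholds() flow.
 * threshold_up/threshold_down are the values picked by the power-level
 * heuristics; on Valleyview they are overridden and the hardware writes
 * are skipped, but the values are still stored for the IRQ handler. */
static void set_rps_thresholds(struct rps_state *rps, bool is_valleyview,
                               int threshold_up, int threshold_down)
{
    if (is_valleyview) {
        threshold_up = VLV_RP_UP_EI_THRESHOLD;
        threshold_down = VLV_RP_DOWN_EI_THRESHOLD;
        goto skip_hw_write;
    }

    rps->hw_writes++; /* stands in for the I915_WRITE(GEN6_RP_*) calls */

skip_hw_write:
    /* Sanity check: the up threshold must sit above the down threshold. */
    assert(threshold_up > threshold_down);
    rps->up_threshold = threshold_up;
    rps->down_threshold = threshold_down;
}
```

With is_valleyview set, the stored pair is always 90/70 and no register write is counted, whatever the heuristics chose; otherwise the caller's values are stored after the simulated write.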
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/gpu/drm/i915/i915_reg.h | 3 +++
 drivers/gpu/drm/i915/intel_pm.c | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 505c605eff98..acafc8408e43 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1390,6 +1390,9 @@ enum i915_power_well_id {
 #define   VLV_BIAS_CPU_125_SOC_875	(6 << 2)
 #define   CHV_BIAS_CPU_50_SOC_50	(3 << 2)
 
+#define VLV_RP_UP_EI_THRESHOLD			90
+#define VLV_RP_DOWN_EI_THRESHOLD		70
+
 /* vlv2 north clock has */
 #define CCK_FUSE_REG				0x8
 #define  CCK_FUSE_HPLL_FREQ_MASK		0x3
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 1db79a860b96..b06379622301 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6116,8 +6116,11 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	/* When byt can survive without system hang with dynamic
 	 * sw freq adjustments, this restriction can be lifted.
 	 */
-	if (IS_VALLEYVIEW(dev_priv))
+	if (IS_VALLEYVIEW(dev_priv)) {
+		threshold_up = VLV_RP_UP_EI_THRESHOLD;
+		threshold_down = VLV_RP_DOWN_EI_THRESHOLD;
 		goto skip_hw_write;
+	}
 
 	I915_WRITE(GEN6_RP_UP_EI,
 		   GT_INTERVAL_FROM_US(dev_priv, ei_up));
dri-devel@lists.freedesktop.org