Noticed this while fixing some unrelated issues with NAKs being dropped - we don't wait nearly long enough to receive ACTs from MST hubs in some situations. Also, we take the time to refactor this function a bit.
This fixes some ACT timeouts I observed on an EVGA MST hub with i915.
Lyude Paul (4): drm/dp_mst: Improve kdocs for drm_dp_check_act_status() drm/dp_mst: Reformat drm_dp_check_act_status() a bit drm/dp_mst: Increase ACT retry timeout to 3s drm/dp_mst: Print errors on ACT timeouts
drivers/gpu/drm/drm_dp_mst_topology.c | 71 +++++++++++++++++---------- 1 file changed, 44 insertions(+), 27 deletions(-)
No functional changes.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Sean Paul sean@poorly.run --- drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 5b205aea58d4..828ca63cc576 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -4462,10 +4462,14 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
/** - * drm_dp_check_act_status() - Check ACT handled status. + * drm_dp_check_act_status() - Polls for ACT handled status. * @mgr: manager to use * - * Check the payload status bits in the DPCD for ACT handled completion. + * Tries waiting for the MST hub to finish updating it's payload table by + * polling for the ACT handled bit. + * + * Returns: + * 0 if the ACT was handled in time, negative error code on failure. */ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) {
Just add a bit more line wrapping, get rid of some extraneous whitespace, remove an unneeded goto label, and move around some variable declarations. No functional changes here.
Signed-off-by: Lyude Paul lyude@redhat.com [this isn't a fix, but it's needed for the fix that comes after this] Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)") Cc: Sean Paul sean@poorly.run Cc: stable@vger.kernel.org # v3.17+ --- drivers/gpu/drm/drm_dp_mst_topology.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 828ca63cc576..c83adbdfc1cd 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -4473,33 +4473,31 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr, */ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) { + int count = 0, ret; u8 status; - int ret; - int count = 0;
do { - ret = drm_dp_dpcd_readb(mgr->aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status); - + ret = drm_dp_dpcd_readb(mgr->aux, + DP_PAYLOAD_TABLE_UPDATE_STATUS, + &status); if (ret < 0) { - DRM_DEBUG_KMS("failed to read payload table status %d\n", ret); - goto fail; + DRM_DEBUG_KMS("failed to read payload table status %d\n", + ret); + return ret; }
if (status & DP_PAYLOAD_ACT_HANDLED) break; count++; udelay(100); - } while (count < 30);
if (!(status & DP_PAYLOAD_ACT_HANDLED)) { - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", status, count); - ret = -EINVAL; - goto fail; + DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", + status, count); + return -EINVAL; } return 0; -fail: - return ret; } EXPORT_SYMBOL(drm_dp_check_act_status);
Currently we only poll for an ACT up to 30 times, with a busy-wait delay of 100µs between each attempt - giving us a timeout of 2900µs. While this might seem sensible, it would appear that in certain scenarios it can take dramatically longer then that for us to receive an ACT. On one of the EVGA MST hubs that I have available, I observed said hub sometimes taking longer then a second before signalling the ACT. These delays mostly seem to occur when previous sideband messages we've sent are NAKd by the hub, however it wouldn't be particularly surprising if it's possible to reproduce times like this simply by introducing branch devices with large LCTs since payload allocations have to take effect on every downstream device up to the payload's target.
So, instead of just retrying 30 times we poll for the ACT for up to 3ms, and additionally use usleep_range() to avoid a very long and rude busy-wait. Note that the previous retry count of 30 appears to have been arbitrarily chosen, as I can't find any mention of a recommended timeout or retry count for ACTs in the DisplayPort 2.0 specification. This also goes for the range we were previously using for udelay(), although I suspect that was just copied from the recommended delay for link training on SST devices.
Changes since v1: * Use readx_poll_timeout() instead of open-coding timeout loop - Sean Paul
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)") Cc: Sean Paul sean@poorly.run Cc: stable@vger.kernel.org # v3.17+ --- drivers/gpu/drm/drm_dp_mst_topology.c | 57 ++++++++++++++++----------- 1 file changed, 34 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index c83adbdfc1cd..ce61964baa7c 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -27,6 +27,7 @@ #include <linux/kernel.h> #include <linux/sched.h> #include <linux/seq_file.h> +#include <linux/iopoll.h>
#if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) #include <linux/stacktrace.h> @@ -4460,43 +4461,53 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr, return ret; }
+static int do_get_act_status(struct drm_dp_aux *aux) +{ + int ret; + u8 status; + + ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status); + if (ret < 0) + return ret; + + return status; +}
/** * drm_dp_check_act_status() - Polls for ACT handled status. * @mgr: manager to use * * Tries waiting for the MST hub to finish updating it's payload table by - * polling for the ACT handled bit. + * polling for the ACT handled bit for up to 3 seconds (yes-some hubs really + * take that long). * * Returns: * 0 if the ACT was handled in time, negative error code on failure. */ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) { - int count = 0, ret; - u8 status; - - do { - ret = drm_dp_dpcd_readb(mgr->aux, - DP_PAYLOAD_TABLE_UPDATE_STATUS, - &status); - if (ret < 0) { - DRM_DEBUG_KMS("failed to read payload table status %d\n", - ret); - return ret; - } - - if (status & DP_PAYLOAD_ACT_HANDLED) - break; - count++; - udelay(100); - } while (count < 30); - - if (!(status & DP_PAYLOAD_ACT_HANDLED)) { - DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n", - status, count); + /* + * There doesn't seem to be any recommended retry count or timeout in + * the MST specification. Since some hubs have been observed to take + * over 1 second to update their payload allocations under certain + * conditions, we use a rather large timeout value. + */ + const int timeout_ms = 3000; + int ret, status; + + ret = readx_poll_timeout(do_get_act_status, mgr->aux, status, + status & DP_PAYLOAD_ACT_HANDLED || status < 0, + 100, timeout_ms * USEC_PER_MSEC); + if (ret < 0 && status >= 0) { + DRM_DEBUG_KMS("Failed to get ACT bit %d after %dms\n", + status, timeout_ms); return -EINVAL; + } else if (status < 0) { + DRM_DEBUG_KMS("Failed to read payload table status: %d\n", + status); + return status; } + return 0; } EXPORT_SYMBOL(drm_dp_check_act_status);
On Mon, Apr 6, 2020 at 6:13 PM Lyude Paul lyude@redhat.com wrote:
Currently we only poll for an ACT up to 30 times, with a busy-wait delay of 100µs between each attempt - giving us a timeout of 2900µs. While this might seem sensible, it would appear that in certain scenarios it can take dramatically longer then that for us to receive an ACT. On one of the EVGA MST hubs that I have available, I observed said hub sometimes taking longer then a second before signalling the ACT. These delays mostly seem to occur when previous sideband messages we've sent are NAKd by the hub, however it wouldn't be particularly surprising if it's possible to reproduce times like this simply by introducing branch devices with large LCTs since payload allocations have to take effect on every downstream device up to the payload's target.
So, instead of just retrying 30 times we poll for the ACT for up to 3ms, and additionally use usleep_range() to avoid a very long and rude busy-wait. Note that the previous retry count of 30 appears to have been arbitrarily chosen, as I can't find any mention of a recommended timeout or retry count for ACTs in the DisplayPort 2.0 specification. This also goes for the range we were previously using for udelay(), although I suspect that was just copied from the recommended delay for link training on SST devices.
Changes since v1:
- Use readx_poll_timeout() instead of open-coding timeout loop - Sean Paul
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)") Cc: Sean Paul sean@poorly.run Cc: stable@vger.kernel.org # v3.17+
drivers/gpu/drm/drm_dp_mst_topology.c | 57 ++++++++++++++++----------- 1 file changed, 34 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index c83adbdfc1cd..ce61964baa7c 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -27,6 +27,7 @@ #include <linux/kernel.h> #include <linux/sched.h> #include <linux/seq_file.h> +#include <linux/iopoll.h>
#if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) #include <linux/stacktrace.h> @@ -4460,43 +4461,53 @@ static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr, return ret; }
+static int do_get_act_status(struct drm_dp_aux *aux) +{
int ret;
u8 status;
ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
if (ret < 0)
return ret;
return status;
+}
/**
- drm_dp_check_act_status() - Polls for ACT handled status.
- @mgr: manager to use
- Tries waiting for the MST hub to finish updating it's payload table by
- polling for the ACT handled bit.
- polling for the ACT handled bit for up to 3 seconds (yes-some hubs really
*/
- take that long).
- Returns:
- 0 if the ACT was handled in time, negative error code on failure.
int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) {
int count = 0, ret;
u8 status;
do {
ret = drm_dp_dpcd_readb(mgr->aux,
DP_PAYLOAD_TABLE_UPDATE_STATUS,
&status);
if (ret < 0) {
DRM_DEBUG_KMS("failed to read payload table status %d\n",
ret);
return ret;
}
if (status & DP_PAYLOAD_ACT_HANDLED)
break;
count++;
udelay(100);
} while (count < 30);
if (!(status & DP_PAYLOAD_ACT_HANDLED)) {
DRM_DEBUG_KMS("failed to get ACT bit %d after %d retries\n",
status, count);
/*
* There doesn't seem to be any recommended retry count or timeout in
* the MST specification. Since some hubs have been observed to take
* over 1 second to update their payload allocations under certain
* conditions, we use a rather large timeout value.
*/
const int timeout_ms = 3000;
int ret, status;
ret = readx_poll_timeout(do_get_act_status, mgr->aux, status,
status & DP_PAYLOAD_ACT_HANDLED || status < 0,
100, timeout_ms * USEC_PER_MSEC);
In v1 the usleep range was 100 -> 1000, in v2 it's going to be 51 -> 100. Perhaps bump this up to 200?
if (ret < 0 && status >= 0) {
DRM_DEBUG_KMS("Failed to get ACT bit %d after %dms\n",
status, timeout_ms);
I still think status should be base 16 when printed
With those nits addressed,
Reviewed-by: Sean Paul sean@poorly.run
return -EINVAL;
} else if (status < 0) {
DRM_DEBUG_KMS("Failed to read payload table status: %d\n",
status);
return status; }
return 0;
} EXPORT_SYMBOL(drm_dp_check_act_status); -- 2.25.1
Although it's not unexpected for drm_dp_check_act_status() to fail due to DPCD read failures (as the hub may have just been unplugged suddenly), timeouts are a bit more worrying as they either mean we need a longer timeout value, or we aren't setting up payload allocations properly. So, let's start printing errors on timeouts.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Sean Paul sean@poorly.run --- drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index ce61964baa7c..0cbeb0f5c834 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -4499,10 +4499,14 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) status & DP_PAYLOAD_ACT_HANDLED || status < 0, 100, timeout_ms * USEC_PER_MSEC); if (ret < 0 && status >= 0) { - DRM_DEBUG_KMS("Failed to get ACT bit %d after %dms\n", - status, timeout_ms); + DRM_ERROR("Failed to get ACT after %dms, last status: %02x\n", + timeout_ms, status); return -EINVAL; } else if (status < 0) { + /* + * Failure here isn't unexpected - the hub may have + * just been unplugged + */ DRM_DEBUG_KMS("Failed to read payload table status: %d\n", status); return status;
On Mon, Apr 6, 2020 at 6:13 PM Lyude Paul lyude@redhat.com wrote:
Although it's not unexpected for drm_dp_check_act_status() to fail due to DPCD read failures (as the hub may have just been unplugged suddenly), timeouts are a bit more worrying as they either mean we need a longer timeout value, or we aren't setting up payload allocations properly. So, let's start printing errors on timeouts.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Sean Paul sean@poorly.run
Patches 1,2,4 are still
Reviewed-by: Sean Paul sean@poorly.run
drivers/gpu/drm/drm_dp_mst_topology.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index ce61964baa7c..0cbeb0f5c834 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -4499,10 +4499,14 @@ int drm_dp_check_act_status(struct drm_dp_mst_topology_mgr *mgr) status & DP_PAYLOAD_ACT_HANDLED || status < 0, 100, timeout_ms * USEC_PER_MSEC); if (ret < 0 && status >= 0) {
DRM_DEBUG_KMS("Failed to get ACT bit %d after %dms\n",
status, timeout_ms);
DRM_ERROR("Failed to get ACT after %dms, last status: %02x\n",
timeout_ms, status); return -EINVAL; } else if (status < 0) {
/*
* Failure here isn't unexpected - the hub may have
* just been unplugged
*/ DRM_DEBUG_KMS("Failed to read payload table status: %d\n", status); return status;
-- 2.25.1
dri-devel@lists.freedesktop.org