From: Hans Verkuil hans.verkuil@cisco.com
The first patch adds new status flags to indicate when a pending message is aborted because the CEC adapter is unconfigured, and when a transmit times out (this indicates a driver bug).
The second and third patch fix a minor issue with the adv HDMI receivers: if the EDID goes away, then the physical address also becomes invalid.
The fourth patch fixes a race condition in the omap4 CEC driver that causes a transmit time out. The final patch drops the code in the omap4 CEC driver that attempts to set the number of retransmits: those register bits are read-only, so the code is pointless.
There are no dependencies between these patches, although the first and fourth patch relate to the same problem. With the new transmit TIMEOUT status I hope that it will be easier to catch driver bugs like that earlier since this bug remained hidden for too long.
Regards,
Hans
Hans Verkuil (5): cec: add new tx/rx status bits to detect aborts/timeouts adv7604: when the EDID is cleared, unconfigure CEC as well adv7842: when the EDID is cleared, unconfigure CEC as well omapdrm/dss/hdmi4_cec.c: clear TX FIFO before transmit_done omapdrm/dss/hdmi4_cec.c: don't set the retransmit count
.../media/uapi/cec/cec-ioc-receive.rst | 25 ++++++- drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 38 +++++------ drivers/media/cec/cec-adap.c | 66 +++++-------------- drivers/media/i2c/adv7604.c | 4 +- drivers/media/i2c/adv7842.c | 4 +- include/uapi/linux/cec.h | 3 + 6 files changed, 67 insertions(+), 73 deletions(-)
From: Hans Verkuil hans.verkuil@cisco.com
If the HDMI cable is disconnected or the CEC adapter is manually unconfigured, then all pending transmits and wait-for-replies are aborted. Signal this with new status bits (CEC_RX/TX_STATUS_ABORTED).
If due to (usually) a driver bug a transmit never ends (i.e. the transmit_done was never called by the driver), then when this times out the message is marked with CEC_TX_STATUS_TIMEOUT.
This should not happen and is an indication of a driver bug.
Without a separate status bit for this it was impossible to detect this from userspace.
The 'transmit timed out' kernel message is now a warning, so this should be more prominent in the kernel log as well.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com --- .../media/uapi/cec/cec-ioc-receive.rst | 25 ++++++- drivers/media/cec/cec-adap.c | 66 +++++-------------- include/uapi/linux/cec.h | 3 + 3 files changed, 44 insertions(+), 50 deletions(-)
diff --git a/Documentation/media/uapi/cec/cec-ioc-receive.rst b/Documentation/media/uapi/cec/cec-ioc-receive.rst index e964074cd15b..b25e48afaa08 100644 --- a/Documentation/media/uapi/cec/cec-ioc-receive.rst +++ b/Documentation/media/uapi/cec/cec-ioc-receive.rst @@ -16,10 +16,10 @@ CEC_RECEIVE, CEC_TRANSMIT - Receive or transmit a CEC message Synopsis ========
-.. c:function:: int ioctl( int fd, CEC_RECEIVE, struct cec_msg *argp ) +.. c:function:: int ioctl( int fd, CEC_RECEIVE, struct cec_msg *argp ) :name: CEC_RECEIVE
-.. c:function:: int ioctl( int fd, CEC_TRANSMIT, struct cec_msg *argp ) +.. c:function:: int ioctl( int fd, CEC_TRANSMIT, struct cec_msg *argp ) :name: CEC_TRANSMIT
Arguments @@ -272,6 +272,19 @@ View On' messages from initiator 0xf ('Unregistered') to destination 0 ('TV'). - The transmit failed after one or more retries. This status bit is mutually exclusive with :ref:`CEC_TX_STATUS_OK <CEC-TX-STATUS-OK>`. Other bits can still be set to explain which failures were seen. + * .. _`CEC-TX-STATUS-ABORTED`: + + - ``CEC_TX_STATUS_ABORTED`` + - 0x40 + - The transmit was aborted due to an HDMI disconnect, or the adapter + was unconfigured, or a transmit was interrupted, or the driver + returned an error when attempting to start a transmit. + * .. _`CEC-TX-STATUS-TIMEOUT`: + + - ``CEC_TX_STATUS_TIMEOUT`` + - 0x80 + - The transmit timed out. This should not normally happen and this + indicates a driver problem.
.. tabularcolumns:: |p{5.6cm}|p{0.9cm}|p{11.0cm}| @@ -300,6 +313,14 @@ View On' messages from initiator 0xf ('Unregistered') to destination 0 ('TV'). - The message was received successfully but the reply was ``CEC_MSG_FEATURE_ABORT``. This status is only set if this message was the reply to an earlier transmitted message. + * .. _`CEC-RX-STATUS-ABORTED`: + + - ``CEC_RX_STATUS_ABORTED`` + - 0x08 + - The wait for a reply to an earlier transmitted message was aborted + because the HDMI cable was disconnected, the adapter was unconfigured + or the :ref:`CEC_TRANSMIT <CEC_RECEIVE>` that waited for a + reply was interrupted.
diff --git a/drivers/media/cec/cec-adap.c b/drivers/media/cec/cec-adap.c index 030b2602faf0..2ebb53fd4800 100644 --- a/drivers/media/cec/cec-adap.c +++ b/drivers/media/cec/cec-adap.c @@ -341,7 +341,7 @@ static void cec_data_completed(struct cec_data *data) * * This function is called with adap->lock held. */ -static void cec_data_cancel(struct cec_data *data) +static void cec_data_cancel(struct cec_data *data, u8 tx_status) { /* * It's either the current transmit, or it is a pending @@ -356,13 +356,11 @@ static void cec_data_cancel(struct cec_data *data) }
if (data->msg.tx_status & CEC_TX_STATUS_OK) { - /* Mark the canceled RX as a timeout */ data->msg.rx_ts = ktime_get_ns(); - data->msg.rx_status = CEC_RX_STATUS_TIMEOUT; + data->msg.rx_status = CEC_RX_STATUS_ABORTED; } else { - /* Mark the canceled TX as an error */ data->msg.tx_ts = ktime_get_ns(); - data->msg.tx_status |= CEC_TX_STATUS_ERROR | + data->msg.tx_status |= tx_status | CEC_TX_STATUS_MAX_RETRIES; data->msg.tx_error_cnt++; data->attempts = 0; @@ -390,15 +388,15 @@ static void cec_flush(struct cec_adapter *adap) while (!list_empty(&adap->transmit_queue)) { data = list_first_entry(&adap->transmit_queue, struct cec_data, list); - cec_data_cancel(data); + cec_data_cancel(data, CEC_TX_STATUS_ABORTED); } if (adap->transmitting) - cec_data_cancel(adap->transmitting); + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED);
/* Cancel the pending timeout work. */ list_for_each_entry_safe(data, n, &adap->wait_queue, list) { if (cancel_delayed_work(&data->work)) - cec_data_cancel(data); + cec_data_cancel(data, CEC_TX_STATUS_OK); /* * If cancel_delayed_work returned false, then * the cec_wait_timeout function is running, @@ -474,12 +472,13 @@ int cec_thread_func(void *_adap) * so much traffic on the bus that the adapter was * unable to transmit for CEC_XFER_TIMEOUT_MS (2.1s). */ - dprintk(1, "%s: message %*ph timed out\n", __func__, + pr_warn("cec-%s: message %*ph timed out\n", adap->name, adap->transmitting->msg.len, adap->transmitting->msg.msg); adap->tx_timeouts++; /* Just give up on this. */ - cec_data_cancel(adap->transmitting); + cec_data_cancel(adap->transmitting, + CEC_TX_STATUS_TIMEOUT); goto unlock; }
@@ -530,7 +529,7 @@ int cec_thread_func(void *_adap) /* Tell the adapter to transmit, cancel on error */ if (adap->ops->adap_transmit(adap, data->attempts, signal_free_time, &data->msg)) - cec_data_cancel(data); + cec_data_cancel(data, CEC_TX_STATUS_ABORTED);
unlock: mutex_unlock(&adap->lock); @@ -702,8 +701,6 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, { struct cec_data *data; u8 last_initiator = 0xff; - unsigned int timeout; - int res = 0;
msg->rx_ts = 0; msg->tx_ts = 0; @@ -845,48 +842,21 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, if (!block) return 0;
- /* - * If we don't get a completion before this time something is really - * wrong and we time out. - */ - timeout = CEC_XFER_TIMEOUT_MS; - /* Add the requested timeout if we have to wait for a reply as well */ - if (msg->timeout) - timeout += msg->timeout; - /* * Release the lock and wait, retake the lock afterwards. */ mutex_unlock(&adap->lock); - res = wait_for_completion_killable_timeout(&data->c, - msecs_to_jiffies(timeout)); + wait_for_completion_killable(&data->c); mutex_lock(&adap->lock);
- if (data->completed) { - /* The transmit completed (possibly with an error) */ - *msg = data->msg; - kfree(data); - return 0; - } - /* - * The wait for completion timed out or was interrupted, so mark this - * as non-blocking and disconnect from the filehandle since it is - * still 'in flight'. When it finally completes it will just drop the - * result silently. - */ - data->blocking = false; - if (data->fh) - list_del(&data->xfer_list); - data->fh = NULL; + /* Cancel the transmit if it was interrupted */ + if (!data->completed) + cec_data_cancel(data, CEC_TX_STATUS_ABORTED);
- if (res == 0) { /* timed out */ - /* Check if the reply or the transmit failed */ - if (msg->timeout && (msg->tx_status & CEC_TX_STATUS_OK)) - msg->rx_status = CEC_RX_STATUS_TIMEOUT; - else - msg->tx_status = CEC_TX_STATUS_MAX_RETRIES; - } - return res > 0 ? 0 : res; + /* The transmit completed (possibly with an error) */ + *msg = data->msg; + kfree(data); + return 0; }
/* Helper function to be used by drivers and this framework. */ diff --git a/include/uapi/linux/cec.h b/include/uapi/linux/cec.h index 097fcd812471..3094af68b6e7 100644 --- a/include/uapi/linux/cec.h +++ b/include/uapi/linux/cec.h @@ -152,10 +152,13 @@ static inline void cec_msg_set_reply_to(struct cec_msg *msg, #define CEC_TX_STATUS_LOW_DRIVE (1 << 3) #define CEC_TX_STATUS_ERROR (1 << 4) #define CEC_TX_STATUS_MAX_RETRIES (1 << 5) +#define CEC_TX_STATUS_ABORTED (1 << 6) +#define CEC_TX_STATUS_TIMEOUT (1 << 7)
#define CEC_RX_STATUS_OK (1 << 0) #define CEC_RX_STATUS_TIMEOUT (1 << 1) #define CEC_RX_STATUS_FEATURE_ABORT (1 << 2) +#define CEC_RX_STATUS_ABORTED (1 << 3)
static inline int cec_msg_status_is_ok(const struct cec_msg *msg) {
From: Hans Verkuil hans.verkuil@cisco.com
When there is no EDID the CEC adapter should be unconfigured as well. So call cec_phys_addr_invalidate() when this happens.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com --- drivers/media/i2c/adv7604.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/media/i2c/adv7604.c b/drivers/media/i2c/adv7604.c index 668be2bca57a..3376d5cb05d5 100644 --- a/drivers/media/i2c/adv7604.c +++ b/drivers/media/i2c/adv7604.c @@ -2284,8 +2284,10 @@ static int adv76xx_set_edid(struct v4l2_subdev *sd, struct v4l2_edid *edid) state->aspect_ratio.numerator = 16; state->aspect_ratio.denominator = 9;
- if (!state->edid.present) + if (!state->edid.present) { state->edid.blocks = 0; + cec_phys_addr_invalidate(state->cec_adap); + }
v4l2_dbg(2, debug, sd, "%s: clear EDID pad %d, edid.present = 0x%x\n", __func__, edid->pad, state->edid.present);
From: Hans Verkuil hans.verkuil@cisco.com
When there is no EDID the CEC adapter should be unconfigured as well. So call cec_phys_addr_invalidate() when this happens.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com --- drivers/media/i2c/adv7842.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/media/i2c/adv7842.c b/drivers/media/i2c/adv7842.c index 4f8fbdd00e35..71fe56565f75 100644 --- a/drivers/media/i2c/adv7842.c +++ b/drivers/media/i2c/adv7842.c @@ -786,8 +786,10 @@ static int edid_write_hdmi_segment(struct v4l2_subdev *sd, u8 port) /* Disable I2C access to internal EDID ram from HDMI DDC ports */ rep_write_and_or(sd, 0x77, 0xf3, 0x00);
- if (!state->hdmi_edid.present) + if (!state->hdmi_edid.present) { + cec_phys_addr_invalidate(state->cec_adap); return 0; + }
pa = cec_get_edid_phys_addr(edid, 256, &spa_loc); err = cec_phys_addr_validate(pa, &pa, NULL);
From: Hans Verkuil hans.verkuil@cisco.com
The TX FIFO has to be cleared if the transmit failed due to e.g. a NACK condition, otherwise the hardware will keep trying to transmit the message.
An attempt was made to do this, but it was done after the call to cec_transmit_done, which can cause a race condition since the call to cec_transmit_done can cause a new transmit to be issued, and then attempting to clear the TX FIFO will actually clear the new transmit instead of the old transmit and the new transmit simply never happens.
By clearing the FIFO before transmit_done is called this race is fixed.
Note that there is no reason to clear the FIFO if the transmit was successful, so the attempt to clear the FIFO in that case was dropped.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com --- drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 35 ++++++++++++------------- 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index 340383150fb9..dee66a5101b5 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -106,6 +106,22 @@ static void hdmi_cec_received_msg(struct hdmi_core_data *core) } }
+static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) +{ + struct hdmi_core_data *core = cec_get_drvdata(adap); + int retry = HDMI_CORE_CEC_RETRY; + int temp; + + REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7); + while (retry) { + temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3); + if (FLD_GET(temp, 7, 7) == 0) + break; + retry--; + } + return retry != 0; +} + void hdmi4_cec_irq(struct hdmi_core_data *core) { u32 stat0 = hdmi_read_reg(core->base, HDMI_CEC_INT_STATUS_0); @@ -117,36 +133,19 @@ void hdmi4_cec_irq(struct hdmi_core_data *core) if (stat0 & 0x20) { cec_transmit_done(core->adap, CEC_TX_STATUS_OK, 0, 0, 0, 0); - REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7); } else if (stat1 & 0x02) { u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
+ hdmi_cec_clear_tx_fifo(core->adap); cec_transmit_done(core->adap, CEC_TX_STATUS_NACK | CEC_TX_STATUS_MAX_RETRIES, 0, (dbg3 >> 4) & 7, 0, 0); - REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7); } if (stat0 & 0x02) hdmi_cec_received_msg(core); }
-static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) -{ - struct hdmi_core_data *core = cec_get_drvdata(adap); - int retry = HDMI_CORE_CEC_RETRY; - int temp; - - REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7); - while (retry) { - temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3); - if (FLD_GET(temp, 7, 7) == 0) - break; - retry--; - } - return retry != 0; -} - static bool hdmi_cec_clear_rx_fifo(struct cec_adapter *adap) { struct hdmi_core_data *core = cec_get_drvdata(adap);
Tomi,
Can you review this patch and the next? They should go to 4.20. This patch in particular is a nasty one, hard to reproduce.
This patch should also be Cc-ed to stable for 4.15 and up.
Tracking down randomly disappearing CEC transmits was no fun :-(
Regards,
Hans
On 10/04/18 11:08, Hans Verkuil wrote:
From: Hans Verkuil hans.verkuil@cisco.com
The TX FIFO has to be cleared if the transmit failed due to e.g. a NACK condition, otherwise the hardware will keep trying to transmit the message.
An attempt was made to do this, but it was done after the call to cec_transmit_done, which can cause a race condition since the call to cec_transmit_done can cause a new transmit to be issued, and then attempting to clear the TX FIFO will actually clear the new transmit instead of the old transmit and the new transmit simply never happens.
By clearing the FIFO before transmit_done is called this race is fixed.
Note that there is no reason to clear the FIFO if the transmit was successful, so the attempt to clear the FIFO in that case was dropped.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com
drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 35 ++++++++++++------------- 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index 340383150fb9..dee66a5101b5 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -106,6 +106,22 @@ static void hdmi_cec_received_msg(struct hdmi_core_data *core) } }
+static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) +{
- struct hdmi_core_data *core = cec_get_drvdata(adap);
- int retry = HDMI_CORE_CEC_RETRY;
- int temp;
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
- while (retry) {
temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
if (FLD_GET(temp, 7, 7) == 0)
break;
retry--;
- }
- return retry != 0;
+}
void hdmi4_cec_irq(struct hdmi_core_data *core) { u32 stat0 = hdmi_read_reg(core->base, HDMI_CEC_INT_STATUS_0); @@ -117,36 +133,19 @@ void hdmi4_cec_irq(struct hdmi_core_data *core) if (stat0 & 0x20) { cec_transmit_done(core->adap, CEC_TX_STATUS_OK, 0, 0, 0, 0);
} else if (stat1 & 0x02) { u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
cec_transmit_done(core->adap, CEC_TX_STATUS_NACK | CEC_TX_STATUS_MAX_RETRIES, 0, (dbg3 >> 4) & 7, 0, 0);hdmi_cec_clear_tx_fifo(core->adap);
} if (stat0 & 0x02) hdmi_cec_received_msg(core);REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
}
-static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) -{
- struct hdmi_core_data *core = cec_get_drvdata(adap);
- int retry = HDMI_CORE_CEC_RETRY;
- int temp;
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
- while (retry) {
temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
if (FLD_GET(temp, 7, 7) == 0)
break;
retry--;
- }
- return retry != 0;
-}
static bool hdmi_cec_clear_rx_fifo(struct cec_adapter *adap) { struct hdmi_core_data *core = cec_get_drvdata(adap);
On 05/10/18 17:13, Hans Verkuil wrote:
Tomi,
Can you review this patch and the next? They should go to 4.20. This patch in particular is a nasty one, hard to reproduce.
This patch should also be Cc-ed to stable for 4.15 and up.
Done. There's no dependency from the omapdrm patches to the first patch, is there? I.e. I can apply these via omapdrm?
Tomi
On 10/08/2018 02:52 PM, Tomi Valkeinen wrote:
On 05/10/18 17:13, Hans Verkuil wrote:
Tomi,
Can you review this patch and the next? They should go to 4.20. This patch in particular is a nasty one, hard to reproduce.
This patch should also be Cc-ed to stable for 4.15 and up.
Done. There's no dependency from the omapdrm patches to the first patch, is there? I.e. I can apply these via omapdrm?
Correct.
Regards,
Hans
Hi Hans,
On 04/10/18 12:08, Hans Verkuil wrote:
From: Hans Verkuil hans.verkuil@cisco.com
The TX FIFO has to be cleared if the transmit failed due to e.g. a NACK condition, otherwise the hardware will keep trying to transmit the message.
An attempt was made to do this, but it was done after the call to cec_transmit_done, which can cause a race condition since the call to cec_transmit_done can cause a new transmit to be issued, and then attempting to clear the TX FIFO will actually clear the new transmit instead of the old transmit and the new transmit simply never happens.
By clearing the FIFO before transmit_done is called this race is fixed.
Note that there is no reason to clear the FIFO if the transmit was successful, so the attempt to clear the FIFO in that case was dropped.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com
drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 35 ++++++++++++------------- 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index 340383150fb9..dee66a5101b5 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -106,6 +106,22 @@ static void hdmi_cec_received_msg(struct hdmi_core_data *core) } }
+static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) +{
- struct hdmi_core_data *core = cec_get_drvdata(adap);
- int retry = HDMI_CORE_CEC_RETRY;
- int temp;
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
- while (retry) {
temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
if (FLD_GET(temp, 7, 7) == 0)
break;
This is fine, but as you're using the helper macros already, there's REG_GET:
REG_GET(core->base, HDMI_CEC_DBG_3, 7, 7)
which removes the need for temp. Are you sure this works reliably? Usually when polling a register bit, I like to measure real-world-time in some way to ensure I actually poll for a certain amount of time.
And just a matter of opinion, but I would've written:
while (retry) { if (!REG_GET(..)) return true; retry--; }
return false;
retry--;
- }
- return retry != 0;
+}
void hdmi4_cec_irq(struct hdmi_core_data *core) { u32 stat0 = hdmi_read_reg(core->base, HDMI_CEC_INT_STATUS_0); @@ -117,36 +133,19 @@ void hdmi4_cec_irq(struct hdmi_core_data *core) if (stat0 & 0x20) { cec_transmit_done(core->adap, CEC_TX_STATUS_OK, 0, 0, 0, 0);
} else if (stat1 & 0x02) { u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
hdmi_cec_clear_tx_fifo(core->adap);
Would a dev_err be ok here?
Tomi
On 10/08/2018 02:45 PM, Tomi Valkeinen wrote:
Hi Hans,
On 04/10/18 12:08, Hans Verkuil wrote:
From: Hans Verkuil hans.verkuil@cisco.com
The TX FIFO has to be cleared if the transmit failed due to e.g. a NACK condition, otherwise the hardware will keep trying to transmit the message.
An attempt was made to do this, but it was done after the call to cec_transmit_done, which can cause a race condition since the call to cec_transmit_done can cause a new transmit to be issued, and then attempting to clear the TX FIFO will actually clear the new transmit instead of the old transmit and the new transmit simply never happens.
By clearing the FIFO before transmit_done is called this race is fixed.
Note that there is no reason to clear the FIFO if the transmit was successful, so the attempt to clear the FIFO in that case was dropped.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com
drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 35 ++++++++++++------------- 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index 340383150fb9..dee66a5101b5 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -106,6 +106,22 @@ static void hdmi_cec_received_msg(struct hdmi_core_data *core) } }
+static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) +{
- struct hdmi_core_data *core = cec_get_drvdata(adap);
- int retry = HDMI_CORE_CEC_RETRY;
- int temp;
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
- while (retry) {
temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
if (FLD_GET(temp, 7, 7) == 0)
break;
This is fine, but as you're using the helper macros already, there's REG_GET:
REG_GET(core->base, HDMI_CEC_DBG_3, 7, 7)
which removes the need for temp. Are you sure this works reliably? Usually when polling a register bit, I like to measure real-world-time in some way to ensure I actually poll for a certain amount of time.
I'll add a bit of debugging to double-check but as far as I remember this is very fast and adding delays is overkill.
FYI: we (Cisco) use this code in our products and we'd have seen it if this would fail.
And just a matter of opinion, but I would've written:
while (retry) { if (!REG_GET(..)) return true; retry--; }
return false;
retry--;
- }
- return retry != 0;
+}
In this patch I just moved up the hdmi_cec_clear_tx_fifo so I can use it in hdmi4_cec_irq. I rather not make any changes to that function.
Unless you object I prefer to make a new patch for 4.21 to improve it.
void hdmi4_cec_irq(struct hdmi_core_data *core) { u32 stat0 = hdmi_read_reg(core->base, HDMI_CEC_INT_STATUS_0); @@ -117,36 +133,19 @@ void hdmi4_cec_irq(struct hdmi_core_data *core) if (stat0 & 0x20) { cec_transmit_done(core->adap, CEC_TX_STATUS_OK, 0, 0, 0, 0);
} else if (stat1 & 0x02) { u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
hdmi_cec_clear_tx_fifo(core->adap);
Would a dev_err be ok here?
Hmm. I would prefer a dev_err_once. Chances are that if this ever fails, it might fail continuously (as in: something is very seriously wrong), and you don't want that in an irq function.
Regards,
Hans
On 08/10/18 15:55, Hans Verkuil wrote:
On 10/08/2018 02:45 PM, Tomi Valkeinen wrote:
Hi Hans,
On 04/10/18 12:08, Hans Verkuil wrote:
From: Hans Verkuil hans.verkuil@cisco.com
The TX FIFO has to be cleared if the transmit failed due to e.g. a NACK condition, otherwise the hardware will keep trying to transmit the message.
An attempt was made to do this, but it was done after the call to cec_transmit_done, which can cause a race condition since the call to cec_transmit_done can cause a new transmit to be issued, and then attempting to clear the TX FIFO will actually clear the new transmit instead of the old transmit and the new transmit simply never happens.
By clearing the FIFO before transmit_done is called this race is fixed.
Note that there is no reason to clear the FIFO if the transmit was successful, so the attempt to clear the FIFO in that case was dropped.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com
drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 35 ++++++++++++------------- 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index 340383150fb9..dee66a5101b5 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -106,6 +106,22 @@ static void hdmi_cec_received_msg(struct hdmi_core_data *core) } }
+static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap) +{
- struct hdmi_core_data *core = cec_get_drvdata(adap);
- int retry = HDMI_CORE_CEC_RETRY;
- int temp;
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
- while (retry) {
temp = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
if (FLD_GET(temp, 7, 7) == 0)
break;
This is fine, but as you're using the helper macros already, there's REG_GET:
REG_GET(core->base, HDMI_CEC_DBG_3, 7, 7)
which removes the need for temp. Are you sure this works reliably? Usually when polling a register bit, I like to measure real-world-time in some way to ensure I actually poll for a certain amount of time.
I'll add a bit of debugging to double-check but as far as I remember this is very fast and adding delays is overkill.
I agree, it's very unlikely that this would not work. Loops like this just make me a bit uneasy =).
As we will catch the error via the return value, I think it's not worth adding real-time tracking here, unless problems start to show up.
FYI: we (Cisco) use this code in our products and we'd have seen it if this would fail.
And just a matter of opinion, but I would've written:
while (retry) { if (!REG_GET(..)) return true; retry--; }
return false;
retry--;
- }
- return retry != 0;
+}
In this patch I just moved up the hdmi_cec_clear_tx_fifo so I can use it in hdmi4_cec_irq. I rather not make any changes to that function.
Unless you object I prefer to make a new patch for 4.21 to improve it.
Sounds fine.
void hdmi4_cec_irq(struct hdmi_core_data *core) { u32 stat0 = hdmi_read_reg(core->base, HDMI_CEC_INT_STATUS_0); @@ -117,36 +133,19 @@ void hdmi4_cec_irq(struct hdmi_core_data *core) if (stat0 & 0x20) { cec_transmit_done(core->adap, CEC_TX_STATUS_OK, 0, 0, 0, 0);
} else if (stat1 & 0x02) { u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
hdmi_cec_clear_tx_fifo(core->adap);
Would a dev_err be ok here?
Hmm. I would prefer a dev_err_once. Chances are that if this ever fails, it might fail continuously (as in: something is very seriously wrong), and you don't want that in an irq function.
That's ok. As long as there's some indication that we're having an issue. Or do you think some other part will break and print errors, and it's easy to pinpoint to the fifo clearing? Well, I don't even know how the fifo clearing could break....
Tomi
From: Hans Verkuil hans.verkuil@cisco.com
The HDMI_CEC_DBG_3 register does have a retransmit count, but you can't write to it, those bits are read-only.
So drop the attempt to set the retransmit count, since it doesn't work.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com --- drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index dee66a5101b5..00407f1995a8 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -280,9 +280,6 @@ static int hdmi_cec_adap_transmit(struct cec_adapter *adap, u8 attempts, hdmi_write_reg(core->base, HDMI_CEC_INT_STATUS_1, HDMI_CEC_RETRANSMIT_CNT_INT_MASK);
- /* Set the retry count */ - REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, attempts - 1, 6, 4); - /* Set the initiator addresses */ hdmi_write_reg(core->base, HDMI_CEC_TX_INIT, cec_msg_initiator(msg));
On 04/10/18 12:09, Hans Verkuil wrote:
From: Hans Verkuil hans.verkuil@cisco.com
The HDMI_CEC_DBG_3 register does have a retransmit count, but you can't write to it, those bits are read-only.
So drop the attempt to set the retransmit count, since it doesn't work.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com
drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index dee66a5101b5..00407f1995a8 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -280,9 +280,6 @@ static int hdmi_cec_adap_transmit(struct cec_adapter *adap, u8 attempts, hdmi_write_reg(core->base, HDMI_CEC_INT_STATUS_1, HDMI_CEC_RETRANSMIT_CNT_INT_MASK);
- /* Set the retry count */
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, attempts - 1, 6, 4);
I presume there's no harm in having a different retry count in the HW than what was requested via the API?
Reviewed-by: Tomi Valkeinen tomi.valkeinen@ti.com
Tomi
On 10/08/2018 02:47 PM, Tomi Valkeinen wrote:
On 04/10/18 12:09, Hans Verkuil wrote:
From: Hans Verkuil hans.verkuil@cisco.com
The HDMI_CEC_DBG_3 register does have a retransmit count, but you can't write to it, those bits are read-only.
So drop the attempt to set the retransmit count, since it doesn't work.
Signed-off-by: Hans Verkuil hans.verkuil@cisco.com
drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c index dee66a5101b5..00407f1995a8 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c @@ -280,9 +280,6 @@ static int hdmi_cec_adap_transmit(struct cec_adapter *adap, u8 attempts, hdmi_write_reg(core->base, HDMI_CEC_INT_STATUS_1, HDMI_CEC_RETRANSMIT_CNT_INT_MASK);
- /* Set the retry count */
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, attempts - 1, 6, 4);
I presume there's no harm in having a different retry count in the HW than what was requested via the API?
Correct.
Some CEC HW implementations expect that the software calculates the retry count, but the omap4 does this in hardware (not quite optimally, I'm afraid) and those just ignore the argument.
Regards,
Hans
Reviewed-by: Tomi Valkeinen tomi.valkeinen@ti.com
Tomi
dri-devel@lists.freedesktop.org