Hi Brian,
On Mon, Mar 18, 2019 at 04:59:21PM +0000, Brian Starkey wrote:
Hi Laurent,
Sorry for the delay, I was travelling last week and didn't find a chance to digest your diagrams (thanks for all the detail!)
No worries, and thank you for the time you're spending on this. In the meantime I've found a solution that solves the race, using an entirely different mechanism. It's all explained in v6 of the patch series.
On Fri, Mar 08, 2019 at 02:24:40PM +0200, Laurent Pinchart wrote:
On Thu, Mar 07, 2019 at 12:28:08PM +0000, Brian Starkey wrote:
On Wed, Mar 06, 2019 at 08:22:44PM +0200, Laurent Pinchart wrote:
[snip]
I can always queue a new one, but I have no way of telling if the newly queued list raced with the frame end interrupt.
There's another register I'm not using that contains a shadow copy of the registers list DMA address. It mirrors the address programmed by the driver when there is no DMA of the registers list in progress, and contains the address of the registers list being DMA'ed when a DMA is in progress. I don't think I can do much with it, as reading it either before or after reprogramming (to check whether I can reprogram, or whether the reprogramming has been taken into account) is racy.
When you say it mirrors the address programmed, is that latched when the update is accepted, or it will update the moment you program the address register?
It is latched when the update is processed, at the next vblank following the address programming. The timing diagram below shows vblank, the UPD bit I use for synchronization, the registers list address register exposed to the CPU, and the internal address used to DMA the registers list. The address register is written four times to show how the hardware reacts, but in practice there will be a single write per frame, right after the beginning of vblank.
                               DMA starts   DMA ends
                                    |           |
                                    V           V
                                     ___________
VBLANK        ______________________|           |________
                    ________________     ________________
UPD           _____|                |___|
               _____ ______ _____________ ________ _______
ADDR register __A__X__B___X______C______X___D____X___E___
              ______________________ ____________________
ADDR internal ___________A__________X__________C_________
I can reprogram the address any number of times I want before the vblank, but the update bit mechanism only lets me protect the race related to the first write. For any subsequent write I won't be able to tell whether it was complete before the hardware started the frame, or arrived too late.
I assume the latter or you would have thought of this yourself (that seems like a really strange/not-useful behaviour!). But if it is the former you could:
- In writeback start-of-frame, create a copy of the register list, disabling writeback.
- Write the address of this copy to the register
If/when an atomic commit comes in before you service the next end-of-frame:
- Write the address of the new register list
- Check the status register. If the "pending" bit is zero, you know you had a potential race.
- Check the DMA address register. If it corresponds to the new scene, the new commit won the race, otherwise it's been delayed by a frame.
The issue here is that there's a potential race if the pending (UPD) bit reads one too. If the commit arrives just before vblank, but the address is written just after vblank (after the UPD bit has been reset to 0 by the hardware, but before the vblank interrupt is processed), then the commit won't be applied yet, and I will have no way to know. Compare the two following scenarios:
                  [1]       [2]            [3]     [4]       [5]
                   |         |              |       |         |
                   V         V              V       V         V
                              _________                         _________
VBLANK        _______________|         |______________________|         |__
                    _________                _________________
UPD           _____|         |______________|                 |____________
               _____ ________________________ _______ ______________________
ADDR register __A__X___________B____________X___C___X__________D___________
              _______________ ________________________________ ____________
ADDR internal _______A_______X_______________B________________X_____D______

[1] Atomic commit, registers list address write, with writeback enabled
[2] DMA starts for first atomic commit
[3] Writeback disable registers list address write
[4] Next atomic commit, with writeback disabled
[5] DMA starts for second atomic commit
                  [1]       [2]            [3]               [4] [5]
                   |         |              |                 |   |
                   V         V              V                 V   V
                              _________                         _________
VBLANK        _______________|         |______________________|         |__
                    _________                _________________     ________
UPD           _____|         |______________|                 |___|
               _____ ________________________ _____________________ ________
ADDR register __A__X___________B____________X__________C__________X___D____
              _______________ ________________________________ ____________
ADDR internal _______A_______X_______________B________________X_____C______

[1] Atomic commit, registers list address write, with writeback enabled
[2] DMA starts for first atomic commit
[3] Writeback disable registers list address write
[4] DMA starts for writeback disable registers list (3)
[5] Next atomic commit, with writeback disabled, performed right after
    vblank but before the vblank interrupt is serviced
The UPD bit is 1 after writing the ADDR register the second time in both cases. Furthermore, if [4] and [5] are very close in the second case, the UPD bit may read 1 just before [5] if the read comes before [4]:
    read UPD bit;
    /* VBLANK [4] */
    write ADDR register;
I thus can't rely on UPD = 1 before the write meaning that the write was performed before vblank, and I can't rely either on the UPD bit after write, as it's 1 in both cases.
My mistake, I got the UPD bit the wrong way around. I'm still not entirely sure why you can't use "ADDR internal" to determine which side won the race. It shows 'B' in the first case, and 'C' in the second.
Because ADDR internal isn't available to the CPU :-(
When a new commit comes, unconditionally:
- Write new address
- Read status
if status.UPD == 0 --> You know for sure your new commit was just latched.
if status.UPD == 1 --> You need to check ADDR internal to see which of
                       these three happened:

1) Your first case happened. We're somewhere in the middle of the frame.
   ADDR internal will show 'B', and you know commit 'D' is going on-screen
   at the next vblank.
2) Your second case happened. The new commit raced with the latching of
   writeback-disable and "lost". ADDR internal will show 'C', and the new
   commit is delayed by a frame.
3) (Teeny tiny small window) In-between reading status and ADDR internal,
   the new commit was latched. ADDR internal will show 'D'. You know the
   new commit "won" so treat it the same as if UPD == 0 (which it will
   be, now).
Anyway, it's all moot now that you've found the chained lists thing - that sounds ideal. I'll take a look at the new series shortly.
It's a neat hardware feature, yes. We were already using it for a different purpose, I should have thought about it for writeback too from the very beginning.
Please note I've sent a pull request for v7 as it has been fully reviewed. Nonetheless, if you find issues, I can fix them on top.
I initially thought I could protect against the race using the following procedure.
Single atomic commit, no IRQ delay
                  [1]       [2]            [3]               [4]
                   |         |              |                 |
                   V         V              V                 V
                              _________                         _________
VBLANK        _______________|         |______________________|         |__
                    _________
UPD           _____|         |_____________________________________________
               _____ _________________________________________________________
ADDR register __A__X___________________________B___________________________
              _______________ _______________________________________________
ADDR internal _______A_______X______________________B______________________

[1] Atomic commit, registers list address write, with writeback enabled
[2] DMA starts for first atomic commit
[3] Frame start, disable writeback in-place in registers list B
[4] DMA starts for "patched" registers list, disables writeback
Two atomic commits, no IRQ delay
                  [1]       [2]            [3]      [4]      [5]
                   |         |              |        |        |
                   V         V              V        V        V
                              _________                         _________
VBLANK        _______________|         |______________________|         |__
                    _________                _________________
UPD           _____|         |______________|                 |____________
               _____ ________________________ ______________________________
ADDR register __A__X___________B____________X______________C_______________
              _______________ ________________________________ ____________
ADDR internal _______A_______X_______________B________________X_____C______

[1] Atomic commit, registers list address write, with writeback enabled
[2] DMA starts for first atomic commit
[3] Next atomic commit, registers list address write, with writeback disabled
[4] Frame start, disable writeback in-place in registers list B
[5] DMA starts for second atomic commit, disables writeback
[3] and [4] can happen in any order, as long as they both come before [5]. If [3] comes after [5], we're back to the previous case (Single atomic commit, no IRQ delay).
Single atomic commit, IRQ delay
                  [1]       [2]            [3]               [4]  [5]
                   |         |              |                 |    |
                   V         V              V                 V    V
                              _________                         _________
VBLANK        _______________|         |______________________|         |__
                    _________
UPD           _____|         |_____________________________________________
               _____ _________________________________________________________
ADDR register __A__X___________________________B___________________________
              _______________ _______________________________________________
ADDR internal _______A_______X______________________B______________________

[1] Atomic commit, registers list address write, with writeback enabled
[2] DMA starts for first atomic commit
[3] Frame start, IRQ is delayed to [5]
[4] DMA starts for unmodified registers list, writeback still enabled
[5] Disable writeback in-place in registers list B, too late
Here I need to detect that [5] was delayed until after [4], and thus delay the completion of the writeback job by one frame. This could be done by checking the vblank interrupt status bit: if it is set, then vblank occurred and raced with [5]. However, the same condition can also arise when no race occurred, if processing of the vblank interrupt for [2] is delayed until [3]. Both the vblank interrupt and the frame start interrupt status bits will then be set, indicating a potential race.
The time between [2] and [3] is very short compared to the time between [3] and [4] and to interrupt latency in general, so we would have lots of false positives.
You don't happen to have a DMA engine trigger or something you could use to do the register list modification at a guaranteed time do you?
Not that I know of, unfortunately.
Are you always going to be protected by an IOMMU, preventing the writeback from trashing physical memory? If that's not the case, then the race can have pretty dire consequences.
If the IOMMU is enabled in DT, yes. It's a system-level decision.
Well, it's your driver at the end of the day. But for me, a known race-condition which would cause random memory corruption sounds like a really Bad Thing. Halving frame-rate on systems with no IOMMU seems preferable to me.
It is a really bad thing. I think the decision should be taken by Renesas though.