On Tue, Jan 12, 2016 at 1:13 PM, Chris Wilson chris@chris-wilson.co.uk wrote:
That is a continual worry. To try and assuage that fear, I sent 8x flush gpu writes between the end of the copy and setting the "I'm done" flag. The definition of the GPU flush is that it both flushes all previous writes before it completes and only after it completes does it do the post-sync write (before moving onto the next command). The spec is always a bit hazy on what order the memory writes will be visible on the CPU though.
Sending the 8x GPU flushes before marking "I'm done" did not fix the corruption.
Ok. So assuming the GPU flushes are supposed to work, it should be all good.
So the reason you see the old content may just be that the GPU writes are still buffered on the GPU. And your adding a clflushopt on the same address just changes the timing enough that you don't see the memory ordering problem any more (or it's just much harder to see; it might still be there).
Indeed. So I replaced the post-clflush_cache_range() clflush() with a udelay(10) instead, and the corruption vanished. Putting the udelay(10) before the clflush_cache_range() does not fix the corruption.
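For readers following along, the three orderings tried above can be sketched in user space. This is only an illustration under assumptions: flush_range() is a hypothetical stand-in modeled loosely on clflush_cache_range() (fence, clflush per line, fence), delay_10us() is a busy-wait stand-in for udelay(10), the 64-byte line size is assumed, and none of it touches a GPU, so it cannot reproduce the corruption itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2, x86 only) */

#define CACHELINE 64u    /* assumed cache line size */

/* The three experiments described in the thread (names are hypothetical). */
enum variant {
    EXTRA_CLFLUSH,   /* flush range, then one more clflush: corruption hidden */
    DELAY_AFTER,     /* flush range, then ~10us delay: corruption vanished */
    DELAY_BEFORE     /* ~10us delay, then flush range: corruption remained */
};

/* Busy-wait roughly 10 microseconds; a stand-in for udelay(10). */
static void delay_10us(void)
{
    clock_t start = clock();

    while ((double)(clock() - start) / CLOCKS_PER_SEC < 10e-6)
        ;
}

/* Fence, clflush every line covering [addr, addr + size), fence again.
 * Returns the number of cache lines flushed. */
static size_t flush_range(const void *addr, size_t size)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + size;
    size_t lines = 0;

    assert(size > 0);
    _mm_mfence();
    for (; p < end; p += CACHELINE) {
        _mm_clflush((const void *)p);
        lines++;
    }
    _mm_mfence();
    return lines;
}

/* Run one of the three orderings; returns the lines flushed by the range flush. */
static size_t run_variant(const void *addr, size_t size, enum variant v)
{
    size_t lines;

    if (v == DELAY_BEFORE)
        delay_10us();
    lines = flush_range(addr, size);
    if (v == EXTRA_CLFLUSH)
        _mm_clflush(addr);
    else if (v == DELAY_AFTER)
        delay_10us();
    return lines;
}
```

Only the position of the delay (or extra flush) relative to the range flush differs between the variants, which is why the results read like a timing effect rather than a missing flush.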
Odd.
passes, I'm inclined to point the finger at the mb() following the clflush_cache_range().
We have an entirely unrelated discussion about the value of "mfence" as a memory barrier.
Mind trying to just make the memory barrier (in arch/x86/include/asm/barrier.h) be a locked op instead?
The docs say "Executions of the CLFLUSHOPT instruction are ordered with respect to fence instructions and to locked read-modify-write instructions; ..", so the mfence should be plenty good enough. But nobody sane uses mfence for memory ordering (that's the other discussion we're having), since a locked rmw instruction is faster.
So maybe it's a CPU bug. I'd still consider a GPU memory ordering bug *way* more likely (the CPU core tends to be better validated in my experience), but since you're trying odd things anyway, try changing the "mfence" to "lock; addl $0,0(%%rsp)" instead.
I doubt it makes any difference, but ..
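As a rough illustration of the two barrier flavors being compared, here is a user-space sketch using GCC/Clang inline asm. The function names barrier_mfence() and barrier_locked() are hypothetical, this is not the kernel's actual mb() definition in arch/x86/include/asm/barrier.h, and the non-x86-64 fallback is only there so the sketch compiles elsewhere.

```c
#include <assert.h>

#if defined(__x86_64__)
/* Full barrier via the dedicated fence instruction. */
static inline void barrier_mfence(void)
{
    __asm__ __volatile__("mfence" ::: "memory");
}

/* Full barrier via a locked read-modify-write on the top of the stack.
 * Adding 0 leaves the stack word unchanged; only the flags are clobbered. */
static inline void barrier_locked(void)
{
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
}
#else
/* Assumption: on other architectures, fall back to the compiler builtin. */
static inline void barrier_mfence(void) { __sync_synchronize(); }
static inline void barrier_locked(void) { __sync_synchronize(); }
#endif

/* Store a value, issue one of the barriers, then read it back. */
static int store_then_load(volatile int *p, int v, int use_locked)
{
    *p = v;
    if (use_locked)
        barrier_locked();
    else
        barrier_mfence();
    return *p;
}
```

Both variants order earlier stores before later loads on the issuing CPU; the experiment suggested above is simply to swap one for the other and see whether the corruption changes.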
Linus