On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
During testing we observed that the last cacheline was not being flushed from a
mb() for (addr = addr & -clflush_size; addr < end; addr += clflush_size) clflushopt(); mb()
loop (where the initial addr and end were not cacheline aligned).
Changing the loop from addr < end to addr <= end, or replacing the clflushopt() with clflush() both fixed the testcase. Hinting that GCC was miscompling the assembly within the loop and specifically the alternative within clflushopt() was confusing the loop optimizer.
Adding a barrier() into clflushopt() is enough for GCC to dtrt, but solving why GCC is not seeing the constraints from the alternative_io() would be smarter...
Hmm, would something like adding the memory clobber to the alternative_io() definition work?
--- diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 7bfc85bbb8ff..d923e5dacdb1 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -207,7 +207,7 @@ static inline int alternatives_text_reserved(void *start, void *end) /* Like alternative_input, but with a single output argument */ #define alternative_io(oldinstr, newinstr, feature, output, input...) \ asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \ - : output : "i" (0), ## input) + : output : "i" (0), ## input : "memory")
/* Like alternative_io, but for replacing a direct call with another one. */ #define alternative_call(oldfunc, newfunc, feature, output, input...) \