On January 11, 2016 3:28:01 AM PST, Chris Wilson chris@chris-wilson.co.uk wrote:
On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote:
On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson
chris@chris-wilson.co.uk wrote:
On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
On 01/07/16 14:29, H. Peter Anvin wrote:
I would be very interested in knowing if replacing the final
clflushopt
with a clflush would resolve your problems (in which case the
last mb()
shouldn't be necessary either.)
Nevermind. CLFLUSH is not ordered with regards to CLFLUSHOPT to
the
same cache line.
Could you add a sync_cpu(); call to the end (can replace the final
mb())
and see if that helps your case?
s/sync_cpu()/sync_core()/
No. I still see failures on Baytrail and Braswell (Pineview is not affected) with the final mb() replaced with sync_core(). I can
reproduce
failures on Pineview by tweaking the clflush_cache_range()
parameters,
so I am fairly confident that it is validating the current code.
iirc sync_core() is cpuid, a heavy serialising instruction, an alternative to mfence. Is there anything that else I can infer
about
the nature of my bug from this result?
No clue, but I don't know much about the underlying architecture.
Can you try clflush_cache_ranging one cacheline less and then
manually
doing clflushopt; mb on the last cache line, just to make sure that the helper is really doing the right thing? You could also try clflush instead of clflushopt to see if that makes a difference.
I had looked at increasing the range over which clflush_cache_range() runs (using roundup/rounddown by cache lines), but it took something like +/- 256 bytes to pass all the tests. And also did s/clflushopt/clflush/ to confirm that made no differnce.
Bizarrely,
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 6000ad7..cf074400 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned int size) for (; p < vend; p += clflush_size) clflushopt(p);
clflushopt(vend-1); mb();
} EXPORT_SYMBOL_GPL(clflush_cache_range);
works like a charm. -Chris
That clflushopt touches a cache line already touched and therefore serializes with it.