A recent change to LLVM improved conformance with OpenCL floating-point semantics at some cost in performance. That might explain at least some of the difference.