FWIW, this only takes a few milliseconds on my systems. You'd have to profile where the time is spent on your system, but it's more likely somewhere between glamor and Mesa / LLVM than in the kernel.
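As a rough first cut before reaching for a real profiler, something like the sketch below can show whether a process is burning its CPU time in userspace (where glamor, Mesa and LLVM run) or in the kernel. It assumes Linux with /proc mounted; the function names `cpu_times` and `cpu_split` are made up for illustration, and the PID to watch is whatever process is actually slow on your system.

```python
import os
import time


def cpu_times(pid):
    """Return (user_seconds, system_seconds) consumed so far by process `pid`."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name (field 2) is parenthesised and may contain spaces,
    # so only split the part that follows the last ')'.
    fields = stat.rsplit(")", 1)[1].split()
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    # fields[0] is field 3 (state), so utime/stime (fields 14/15) are at 11/12.
    utime, stime = int(fields[11]), int(fields[12])
    return utime / ticks_per_second, stime / ticks_per_second


def cpu_split(pid, interval):
    """Sample `pid` twice, `interval` seconds apart, and return the deltas."""
    user0, system0 = cpu_times(pid)
    time.sleep(interval)
    user1, system1 = cpu_times(pid)
    return user1 - user0, system1 - system0
```

Calling `cpu_split(pid, 5.0)` while the slow operation is running gives the user and system CPU seconds used in that window. If nearly all of it is user time, the kernel is probably not the culprit, but this cannot tell glamor, Mesa and LLVM apart; for that you would still need a sampling profiler such as perf.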