The vsynced blit path for presentation is implemented by *stalling* GPU command processing. That's a big hammer, and it might be related to your problems. You can skip the stall (at the cost of tearing in some cases) by disabling the SwapBuffersWait driver option. Have you tried that, just for testing?
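
In case it helps, here is a minimal xorg.conf sketch for that test. It assumes you are on the intel DDX; the radeon driver documents an option of the same name, so change the Driver line if that is what you run. The Identifier is just a placeholder, and if you already have a Device section, add only the Option line to it.

```
Section "Device"
    # "Card0" is a placeholder identifier, not taken from your setup
    Identifier "Card0"
    # Assumed driver; use "radeon" instead if that applies to you
    Driver     "intel"
    # Don't stall for vertical sync on buffer swaps (may tear)
    Option     "SwapBuffersWait" "false"
EndSection
```

You'll need to restart the X server for the change to take effect, and the Xorg log should show whether the option was picked up.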