https://bugzilla.kernel.org/show_bug.cgi?id=119631
--- Comment #12 from Axel Davy vebveb@hotmail.fr ---
thread_submit delays submitting a buffer to the X server until all rendering to that buffer has finished.
This is useful for DRI PRIME, because the kernel does not yet have all the pieces needed to make the other card wait for the rendering. I thought that in your case the async flip might be taking too long because it had to wait for the rendering to finish, but apparently that is not the issue.
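
To illustrate the idea behind thread_submit, here is a rough conceptual sketch in Python. It is not the actual driver code: the class SubmitThread, the present callback and the threading.Event used as a stand-in for a GPU fence are all hypothetical names chosen for this example. It only shows the general pattern of a helper thread that holds a buffer back until its rendering is reported done, so the application thread never blocks on it.

```python
import queue
import threading


class SubmitThread:
    """Hypothetical model of delayed buffer submission (not real driver code)."""

    def __init__(self, present):
        # 'present' stands in for handing the buffer to the X server.
        self._present = present
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, buffer_id, render_done):
        # Returns immediately; the caller is not blocked by rendering.
        self._pending.put((buffer_id, render_done))

    def close(self):
        # Ask the worker to stop after draining earlier submissions.
        self._pending.put(None)
        self._worker.join()

    def _run(self):
        while True:
            item = self._pending.get()
            if item is None:
                return
            buffer_id, render_done = item
            # Stand-in for waiting on a GPU fence: block until rendering is done.
            render_done.wait()
            self._present(buffer_id)
```

In this sketch, buffers are presented strictly in submission order, and a buffer whose rendering is still in flight holds back the ones queued behind it. That is a simplification made for the example, not a claim about how the real implementation orders its work.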