https://bugs.freedesktop.org/show_bug.cgi?id=97879
--- Comment #68 from Marek Olšák <maraeo@gmail.com> ---
The time during the long hangs is spent in an upload/decompression thread that the game uses. That thread doesn't make any GL calls.
My theory, which is now confirmed, was that the game maps buffers in VRAM and does read-modify-write on that memory in that thread. We know that direct CPU access to VRAM is uncached and very slow unless it is a series of sequential writes, which the CPU write-combines (write combining acts like a write-only cache). Games typically upload data using a sequential copy/fill, which is why we had not seen this issue before.
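To illustrate the difference between the two access patterns, here is a minimal sketch. It is not the game's code: the function names are hypothetical, `mapped` stands in for the pointer returned by a buffer mapping, and it assumes the application keeps a CPU-side copy (`base`) of the data it wants to modify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Slow pattern: read-modify-write directly on the mapped pointer.
 * If `mapped` points at uncached VRAM, every load of mapped[i] is an
 * uncached read across the bus. */
void patch_in_place(uint8_t *mapped, const uint8_t *delta, size_t n)
{
    assert(mapped && delta);
    for (size_t i = 0; i < n; i++)
        mapped[i] = (uint8_t)(mapped[i] + delta[i]); /* reads the mapping */
}

/* Friendlier pattern: do the read-modify-write in ordinary cached memory,
 * then push the result with one sequential, write-only copy.
 * `base` is an assumed CPU-side copy of the buffer contents.
 * Returns 0 on success, -1 if the staging allocation fails. */
int patch_via_staging(uint8_t *mapped, const uint8_t *base,
                      const uint8_t *delta, size_t n)
{
    assert(mapped && base && delta);
    if (n == 0)
        return 0;

    uint8_t *staging = malloc(n);
    if (!staging)
        return -1;

    memcpy(staging, base, n);                 /* cached reads */
    for (size_t i = 0; i < n; i++)
        staging[i] = (uint8_t)(staging[i] + delta[i]);

    memcpy(mapped, staging, n);               /* the mapping is only written */
    free(staging);
    return 0;
}
```

Both functions produce the same bytes. The second one never reads through the mapped pointer, so on a VRAM mapping it should stay on the write-combined path described above.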
If the game used the GL_MAP_READ_BIT flag, the driver would return a mapping in RAM with caching enabled. You'll also get that if you set GL_CLIENT_STORAGE_BIT in glBufferStorage.
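As a rough sketch of how an application could pick its flags, assuming the two flags named above are GL_MAP_READ_BIT and GL_CLIENT_STORAGE_BIT from ARB_buffer_storage: the helper `storage_flags_for` is hypothetical, and the constants are redefined only so the snippet compiles without GL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in the OpenGL 4.4 headers; guarded so that real GL
 * headers take precedence when they are included first. */
#ifndef GL_MAP_READ_BIT
#define GL_MAP_READ_BIT 0x0001
#endif
#ifndef GL_MAP_WRITE_BIT
#define GL_MAP_WRITE_BIT 0x0002
#endif
#ifndef GL_CLIENT_STORAGE_BIT
#define GL_CLIENT_STORAGE_BIT 0x0200
#endif

/* Hypothetical helper: choose glBufferStorage flags from how the CPU
 * intends to touch the mapping. */
unsigned storage_flags_for(bool cpu_reads, bool cpu_writes)
{
    /* A mapping needs at least one of read or write access. */
    assert(cpu_reads || cpu_writes);

    unsigned flags = 0;
    if (cpu_writes)
        flags |= GL_MAP_WRITE_BIT;
    if (cpu_reads) {
        /* Per the comment above, either bit alone should yield cached RAM
         * with this driver; setting both makes the intent explicit. */
        flags |= GL_MAP_READ_BIT | GL_CLIENT_STORAGE_BIT;
    }
    return flags;
}
```

A thread that does read-modify-write on the mapping would then create its buffer with something like `glBufferStorage(GL_ARRAY_BUFFER, size, NULL, storage_flags_for(true, true))`. Where the buffer actually ends up is still the driver's decision.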
For completeness, you can also get a mapping in RAM that's uncached. Its CPU and GPU performance is somewhere between that of VRAM and cached RAM. We use it for streaming because the GPU has faster access to RAM that is not cached by the CPU.