I was able to track the TTM issues down to a very small coherent DMA memory pool (4MB). With the kernel options "coherent_pool=128M cma=256M", all the code stressing texture upload/download works fine at the expected performance. I'll give it a try next week to see whether this also fixes the issues with GL_ARB_buffer_storage.
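
For anyone wanting to double-check that the options actually made it onto the running kernel's command line, here is a minimal sketch in Python. The helper names (parse_size, pool_options) are made up for illustration, and the size parsing is simplified: it only handles plain K/M/G suffixes and ignores anything after an "@" in cma=.

```python
import re

def parse_size(value):
    """Convert a size string such as '128M' or '4096K' to bytes (simplified: K/M/G only)."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized size: {value!r}")
    factor = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}[m.group(2).upper()]
    return int(m.group(1)) * factor

def pool_options(cmdline):
    """Return the coherent_pool= and cma= sizes (in bytes) found on a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("coherent_pool", "cma"):
            # Assumption: drop an optional '@placement' suffix and keep only the size.
            found[key] = parse_size(value.split("@")[0])
    return found
```

Feeding it the contents of /proc/cmdline after a reboot should show both entries if the boot loader passed them through. On kernels built with CMA support, the CmaTotal and CmaFree lines in /proc/meminfo give a second way to see the reserved size.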