On Thu, Aug 18, 2016 at 4:23 AM, Michel Dänzer <michel@daenzer.net> wrote:
> Maybe the rasterization as two triangles results in bad PCIe bandwidth utilization. Using the asynchronous DMA engine for these transfers would probably be ideal, but having the 3D engine rasterize a single rectangle (either using the rectangle primitive or a large triangle with scissor) might already help.
There is only one thing that's bad for PCIe when the surface is linear: the 3D engine. Disabling all but the first shader engine and all but the first 2 RBs should improve performance for blits from VRAM to GTT. The closed driver does that, but I don't remember if the destination must be linear, must be in GTT, or both. In any case, SDMA should still be the best for VRAM->GTT blits.
Marek