On 08/18/2016 04:32 AM, Michel Dänzer wrote:
On 18/08/16 08:51 AM, Mario Kleiner wrote:
That's what the ati-ddx/amdgpu-ddx does at the moment, as it detects the mismatch in tiling flags and uses the DRI3/Present copy path instead of the pageflip path. The problem is that the server's Present implementation doesn't request a vsync'ed start of the copy operation [...]
It waits for vblank before starting the copy.
Yes, a vblank event triggers the present_execute in the server. But the latency from vblank event dispatch to the copy command packet hitting the GPU is still far too high to avoid tearing. I tried again and couldn't find a single Intel/AMD/NVidia GPU here that doesn't tear more or less badly, depending on load, with DRI3/Present copy swaps. Even TearFree wouldn't be good enough for my kind of applications, as crucial timing/timestamps could still frequently be off by at least 1 frame.
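To illustrate why that latency matters, here is a back-of-the-envelope sketch in Python. It is only a toy model, not how any driver or the server actually works: it assumes the scanout beam and the copy both sweep the frame top to bottom at constant speed, and every number in it (60 Hz refresh, 0.5 ms vblank, 1 ms copy duration) as well as the function name `classify_swap` is a made-up assumption for illustration.

```python
def classify_swap(latency_ms, copy_ms, refresh_hz=60.0, vblank_ms=0.5):
    """Toy model of a copy-swap racing the scanout beam.

    latency_ms: delay from start of vblank until the copy starts (assumed).
    copy_ms:    time the copy needs to sweep the whole frame (assumed).
    Returns (status, tear_fraction); tear_fraction is the position of the
    tear as a fraction of screen height, or None if there is no tear.
    """
    period_ms = 1000.0 / refresh_hz
    active_ms = period_ms - vblank_ms  # time the beam spends on visible lines

    # For a line at height fraction y, the beam reads it at
    # vblank_ms + y * active_ms and the copy writes it at
    # latency_ms + y * copy_ms. The line shows the new frame if the copy
    # got there first, i.e. if the difference below is >= 0.
    lead_top = vblank_ms - latency_ms                # lead at y = 0
    lead_bottom = lead_top + (active_ms - copy_ms)   # lead at y = 1

    if lead_top >= 0 and lead_bottom >= 0:
        return ("clean", None)   # copy stays ahead of the beam everywhere
    if lead_top < 0 and lead_bottom < 0:
        return ("late", None)    # beam stays ahead: old frame shown, one frame late
    # The lead changes sign somewhere on screen: that is the tear line.
    tear_fraction = -lead_top / (active_ms - copy_ms)
    return ("tear", tear_fraction)


if __name__ == "__main__":
    for latency in (0.3, 2.0, 20.0):
        print(latency, classify_swap(latency_ms=latency, copy_ms=1.0))
```

In this model a copy that starts inside the vblank interval and runs faster than the beam comes out clean, while a copy that starts even a millisecond or two after scanout has begun overtakes the beam somewhere on screen and tears there. A copy that starts very late never catches the beam, so the frame shows up whole but one refresh late, which is the timestamp problem mentioned above.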
There is this other approach from NVidia's Alex Goins for their proprietary driver, whose patches landed in the X-Server 1.19 master branch a couple of weeks ago. I haven't read his patches in detail yet, and so far I couldn't successfully test them with the reference implementation in modesetting ddx 1.19. AFAIK, there the display GPU exports a pair of scanout-friendly, page-flipping-compatible dmabufs (I assume linear, contiguous, accessible by the display engines),
FWIW, that wouldn't be possible with our "older" GPUs which can't scan out from GTT: A BO can be either shared with another GPU or scanout friendly, not both at the same time.
Ok, good to know.
and the offload GPU imports those and renders into them. That saves one extra copy, so it should be somewhat more efficient.
Using two shared buffers actually isn't as efficient as possible wrt inter-GPU bandwidth.
Out of interest, why? You'd have only one detiling copy VRAM -> RAM? Or is it about switching some kind of GTT mappings with two buffers that is inefficient?
Setting it up seems to be more involved and less flexible though. So far I couldn't get it to work here for testing. Maybe bugs, maybe mistakes on my side, maybe I just have the wrong hardware for it.
Yeah, my impression has been it's a rather complicated solution geared towards the Intel iGPU + proprietary nVidia use case.
Setting up output source/output sink is not fun, as I have now learned; it is rather clumsy and complex compared to render offload. I hope the real thing will come with some fool-proof one-click setup GUI, otherwise I don't have great hopes, given the technical skill level of my users. I still haven't managed to get it working, not even with the new Nvidia proprietary beta drivers on a real Optimus laptop.
-mario