Trying to figure out which e-mail in this mess is the right one to reply to....
On Tue, Apr 27, 2021 at 12:31 PM Lucas Stach <l.stach@pengutronix.de> wrote:
> Hi,
>
> On Tuesday, 27.04.2021 at 09:26 -0400, Marek Olšák wrote:
> > Ok. So that would only make the following use cases broken for now:
> > - amd render -> external gpu
Assuming said external GPU doesn't support memory fences. If we do amdgpu and i915 at the same time, that covers basically most of the external GPU use-cases. Of course, we'd want to convert nouveau as well for the rest.
> > - amd video encode -> network device
>
> FWIW, "only" breaking amd render -> external gpu will make us pretty unhappy, as we have some cases where we are combining an AMD APU with an FPGA-based graphics card. I can't go into the specifics of this use-case too much, but basically the AMD graphics is rendering content that gets composited on top of a live video pipeline running through the FPGA.
I think it's worth taking a step back and asking what's actually being proposed here before we freak out too much. If we do go this route, it doesn't mean that your FPGA use-case can't work, it just means it won't work out of the box anymore. You'll have to separate execution and memory dependencies inside your FPGA driver. That's still not great, but it's not as bad as you maybe made it sound.
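To make that split concrete, here's a rough sketch of what it could look like on the kernel side. Everything fpga_* below is invented for illustration (I obviously don't know what your driver looks like); the point is just that "wait for the GPU to finish rendering" comes in as an explicit execution dependency from userspace, while "don't move or free these pages while the FPGA is reading them" stays a kernel dma_fence on the buffer's reservation object:

/*
 * Hypothetical sketch only -- the fpga_* names and the job struct are
 * invented for illustration, not taken from any real driver.
 */
#include <linux/dma-buf.h>
#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <drm/drm_file.h>
#include <drm/drm_syncobj.h>

struct fpga_composite_job {
	struct dma_buf   *overlay;    /* buffer imported from amdgpu */
	struct dma_fence *done_fence; /* signals when the FPGA is done reading */
};

static int fpga_submit_composite(struct drm_file *file,
				 struct fpga_composite_job *job,
				 u32 in_syncobj)
{
	struct dma_fence *exec_dep;
	int ret;

	/*
	 * Execution dependency: "don't composite before the GPU has finished
	 * rendering".  It comes in explicitly from userspace (a syncobj here)
	 * instead of being fished out of the dma-buf's implicit fences.  A
	 * real driver would hand this to its scheduler rather than block.
	 */
	ret = drm_syncobj_find_fence(file, in_syncobj, 0, 0, &exec_dep);
	if (ret)
		return ret;
	ret = dma_fence_wait(exec_dep, true);
	dma_fence_put(exec_dep);
	if (ret)
		return ret;

	/*
	 * Memory dependency: "don't move or free these pages while the FPGA
	 * is still scanning them out".  That one stays a kernel dma_fence
	 * attached to the buffer's reservation object.
	 */
	ret = dma_resv_lock(job->overlay->resv, NULL);
	if (ret)
		return ret;
	dma_resv_add_excl_fence(job->overlay->resv, job->done_fence);
	dma_resv_unlock(job->overlay->resv);

	/* ... actually queue the work on the hardware ... */
	return 0;
}

Whether the execution dependency arrives as a syncobj, a sync_file, or a user-mode fence doesn't really matter for the point here; what matters is that it's no longer implicitly pulled out of the reservation object.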
> > What about the case when we get a buffer from an external device and we're supposed to make it "busy" when we are using it, and the external device wants to wait until we stop using it? Is it something that can happen, thus turning "external -> amd" into "external <-> amd"?
>
> Zero-copy texture sampling from a video input certainly appreciates this very much. Trying to pass the render fence through the various layers of userspace to be able to tell when the video input can reuse a buffer is a great experience in yak shaving. Allowing the video input to reuse the buffer as soon as the read dma_fence from the GPU is signaled is much more straightforward.
Oh, it's definitely worse than that. Every window system interaction is bi-directional. The X server has to wait on the client before compositing from it, and the client has to wait on X before re-using that back-buffer. Of course, we can break that latter dependency by doing a full CPU wait, but that's going to mean either more latency or reserving more back buffers. There's no good clean way to claim that any of this is one-directional.
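Just to illustrate what that full CPU wait ends up looking like in practice (a sketch, not anyone's actual window-system code, and wait_for_buffer_idle() is a made-up helper): as far as I understand the dma-buf poll semantics, POLLOUT blocks until every fence attached to the buffer, including the server's read, has signaled, which is the earliest point the client can safely scribble on that back-buffer again.

/*
 * Illustrative helper: block on the CPU until all implicit fences on a
 * dma-buf (readers and writers) have signaled, i.e. until the compositor
 * is done texturing from our old back-buffer.
 */
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

static bool wait_for_buffer_idle(int dmabuf_fd)
{
	struct pollfd pfd = {
		.fd = dmabuf_fd,
		.events = POLLOUT, /* all fences signaled -> safe to write again */
	};
	int ret;

	do {
		ret = poll(&pfd, 1, -1 /* no timeout */);
	} while (ret < 0 && errno == EINTR);

	return ret == 1;
}

With double buffering that wait sits right in the middle of the frame loop, which is where the extra latency comes from; the alternative is to allocate a third back-buffer so there's usually an idle one available, which is the "reserving more back buffers" option.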
--Jason