On 20.04.21 at 16:53, Daniel Stone wrote:
Deadlock mitigation to recover from
segfaults:
- The kernel knows which process is obliged
to signal which fence. This information is
part of the Present request and supplied by
userspace.
- If the producer crashes, the kernel
signals the submit fence, so that the consumer
can make forward progress.
- If the consumer crashes, the kernel
signals the return fence, so that the producer
can reclaim the buffer.
- A GPU hang signals all fences. Other
deadlocks will be handled like GPU hangs.
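
To make the rules above concrete, here is a minimal toy model of the
bookkeeping they imply. It is only a sketch: FenceTracker, present(),
process_exited() and gpu_hang() are hypothetical names invented for
illustration, not an existing kernel interface, and a real implementation
would live in the kernel rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class Fence:
    name: str
    signaler_pid: int     # process obliged to signal this fence
    signaled: bool = False
    forced: bool = False  # True if signaled on the owner's behalf


class FenceTracker:
    """Hypothetical stand-in for the kernel-side bookkeeping."""

    def __init__(self):
        self.fences = []

    def present(self, producer_pid, consumer_pid):
        # The Present request names who must signal which fence.
        submit = Fence("submit", producer_pid)
        ret = Fence("return", consumer_pid)
        self.fences += [submit, ret]
        return submit, ret

    def _force(self, fence):
        if not fence.signaled:
            fence.signaled = True
            fence.forced = True

    def process_exited(self, pid):
        # Producer crash -> submit fence signaled;
        # consumer crash -> return fence signaled.
        for fence in self.fences:
            if fence.signaler_pid == pid:
                self._force(fence)

    def gpu_hang(self):
        # A GPU hang (or any other detected deadlock) signals everything.
        for fence in self.fences:
            self._force(fence)
```

Note that a forced signal only unblocks the waiter; it says nothing about
whether the buffer contents are valid, which is why the sketch records a
separate forced flag.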
Another thought: with completely arbitrary
userspace fencing, none of this is helpful either.
If the compositor can't rule out that a hostile
client has submitted a fence which will never be
signaled, then it can't block waiting on it, so it
already needs infrastructure to handle something
like this.
That already handles the crashed-client case,
because if the client crashes, then its connection
will be dropped, which will trigger the compositor
to destroy all its resources anyway, including any
pending waits.
That's exactly the problem. A compositor isn't immediately
informed that the client crashed; instead, it is still
referencing the buffer and trying to use it for
compositing.
If the compositor no longer has a guarantee that the
buffer will be ready for composition in a reasonable amount
of time (which dma_fence gives us, and this proposal does
not appear to give us), then the compositor isn't trying to
use the buffer for compositing, it's waiting asynchronously
on a notification that the fence has signaled before it
attempts to use the buffer.
Marek's initial suggestion is that the kernel signal the
fence, which would unblock composition (and presumably show
garbage on screen, or at best jump back to old content).
My position is that the compositor will know the process
has crashed anyway - because its socket has been closed - at
which point we destroy all the client's resources including
its windows and buffers regardless. Signaling the fence
doesn't give us any value here, _unless_ the compositor is
just blindly waiting for the fence to signal ... which it
can't do because there's no guarantee the fence will ever
signal.
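
As a rough illustration of the asynchronous wait described above, the
following sketch polls both the client connection and a fence notification
and treats a closed socket as the end of the wait. It assumes a POSIX
system; wait_for_buffer() is a hypothetical helper, and a plain pipe stands
in for whatever file descriptor a real fence would expose.

```python
import os
import selectors
import socket


def wait_for_buffer(client_sock, fence_fd, timeout=1.0):
    """Return "ready", "client-gone" or "pending" without blocking forever."""
    sel = selectors.DefaultSelector()
    sel.register(client_sock, selectors.EVENT_READ, "client")
    sel.register(fence_fd, selectors.EVENT_READ, "fence")
    try:
        tags = {key.data for key, _ in sel.select(timeout)}
        # A readable socket that yields zero bytes means the peer closed it,
        # e.g. because the client crashed. Peek so real requests stay queued.
        if "client" in tags and client_sock.recv(1, socket.MSG_PEEK) == b"":
            return "client-gone"  # destroy the client's resources and waits
        if "fence" in tags:
            return "ready"        # safe to use the buffer for composition
        return "pending"          # keep showing the old content for now
    finally:
        sel.close()
```

The disconnect check deliberately comes first: once the socket is closed,
the client's windows and buffers are destroyed regardless, so whether the
fence was signaled at that point no longer matters to the compositor.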