Hello all,
For a while now, I've been working to fix tearing with PRIME. This is my second version of the solution, revised according to Daniel Vetter's and Daniel Stone's suggestions.
Rather than using fence callbacks to explicitly trigger flipping, fences are now used to block flipping/atomic commits for up to a timeout of 96 ms, based on Daniel Stone's non-upstreamed patch to the Exynos DRM driver. I have two patches: one that implements fencing for i915's legacy mmio_flip path, and one for atomic commits as future-proofing. Currently the mmio_flip path is the one ultimately used by the X patches, due to the lack of asynchronous atomic modesetting support in i915.
With my synchronization patches to X, it is possible to export two shared buffers per crtc instead of just one. The sink driver uses the legacy drmModePageFlip() to flip between the buffers, as the rest of the driver has yet to be ported to atomics. In the pageflip/vblank event handler, the sink driver requests a present from the source using the new X ABI function pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully, it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits until the next vblank and tries again.
When the source driver presents on a given buffer, it first attaches a fence. The source driver is responsible for either using software signaling or hardware semaphore-backed fences to ensure the fence is signaled when the present is finished. If the sink's DRM driver implements fencing in the flipping path, it will guarantee that the flip won't occur until the present has finished.
This means that DRM drivers that don't implement fencing in their flipping paths won't be able to guarantee 100% tear-free PRIME with my X patches. However, the good news is that even without fencing, tearing is rare. Generally presenting finishes before the next vblank, so there is no need to wait on the fence. The X patches are a drastic improvement with or without fencing, but the fencing is nonetheless important to guarantee tear-free operation under all conditions.
To give some greater context, I've uploaded my branches for DRM and the X server to GitHub. I'll move forward with upstreaming the X changes if and when these DRM patches go in.
DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree: https://github.com/GoinsWithTheWind/xserver-prime-sync
(branch agoins-prime-v2)
Thanks, Alex @ NVIDIA Linux Driver Team
Alex Goins (2):
  i915: wait for fences in mmio_flip()
  i915: wait for fences in atomic commit

 drivers/gpu/drm/i915/intel_display.c | 37 ++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
If a buffer is backed by dmabuf, wait on its reservation object's fences before flipping.
Signed-off-by: Alex Goins <agoins@nvidia.com>
---
 drivers/gpu/drm/i915/intel_display.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..1485640 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
 #include <linux/dma_remapping.h>
+#include <linux/reservation.h>
+#include <linux/dma-buf.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11170,10 +11172,21 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc)
 static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 {
 	struct drm_device *dev = intel_crtc->base.dev;
+	struct drm_i915_gem_object *pending_flip_obj =
+		intel_crtc->unpin_work->pending_flip_obj;
 	u32 start_vbl_count;
intel_mark_page_flip_active(intel_crtc);
+	/* For framebuffer backed by dmabuf, wait for fence */
+	mutex_lock(&dev->object_name_lock);
+	if (pending_flip_obj->base.dma_buf) {
+		reservation_object_wait_timeout_rcu(
+			pending_flip_obj->base.dma_buf->resv,
+			true, false, msecs_to_jiffies(96));
+	}
+	mutex_unlock(&dev->object_name_lock);
+
 	intel_pipe_update_start(intel_crtc, &start_vbl_count);
if (INTEL_INFO(dev)->gen >= 9)
On Fri, Oct 30, 2015 at 06:03:50PM -0700, Alex Goins wrote:
If a buffer is backed by dmabuf, wait on its reservation object's fences before flipping.
Signed-off-by: Alex Goins <agoins@nvidia.com>

 drivers/gpu/drm/i915/intel_display.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..1485640 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
 #include <linux/dma_remapping.h>
+#include <linux/reservation.h>
+#include <linux/dma-buf.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11170,10 +11172,21 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc)
 static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 {
 	struct drm_device *dev = intel_crtc->base.dev;
struct drm_i915_gem_object *pending_flip_obj =
intel_crtc->unpin_work->pending_flip_obj;
u32 start_vbl_count;
intel_mark_page_flip_active(intel_crtc);
/* For framebuffer backed by dmabuf, wait for fence */
mutex_lock(&dev->object_name_lock);
The lock here is unfortunate. I thought once a dmabuf is attached to an object, it persists until the object is destroyed, so afaict the lock here is unnecessary (as it only protects against a userspace race in attaching a dmabuf).
- if (pending_flip_obj->base.dma_buf) {
reservation_object_wait_timeout_rcu(
Side-question, are these fences exclusive or do we track read/write? -Chris
- /* For framebuffer backed by dmabuf, wait for fence */
- mutex_lock(&dev->object_name_lock);
The lock here is unfortunate. I thought once a dmabuf is attached to an object, it persists until the object is destroyed, so afaict the lock here is unnecessary (as it only protects against a userspace race in attaching a dmabuf).
You're probably right. I'll send out a v3 patch set with the lock removed if there are no other comments.
- if (pending_flip_obj->base.dma_buf) {
reservation_object_wait_timeout_rcu(
Side-question, are these fences exclusive or do we track read/write?
They are exclusive, with the sink X driver using vblank events to explicitly request the source X driver to write. If you guys want to track read/write I could switch over to shared fences, but for my use case only exclusive fences are necessary.
Thanks, Alex
On Mon, Nov 09, 2015 at 09:42:05PM +0000, Alexander Goins wrote:
- /* For framebuffer backed by dmabuf, wait for fence */
- mutex_lock(&dev->object_name_lock);
The lock here is unfortunate. I thought once a dmabuf is attached to an object, it persists until the object is destroyed, so afaict the lock here is unnecessary (as it only protects against a userspace race in attaching a dmabuf).
You're probably right. I'll send out a v3 patch set with the lock removed if there are no other comments.
- if (pending_flip_obj->base.dma_buf) {
reservation_object_wait_timeout_rcu(
Side-question, are these fences exclusive or do we track read/write?
They are exclusive, with the sink X driver using vblank events to explicitly request the source X driver to write. If you guys want to track read/write I could switch over to shared fences, but for my use case only exclusive fences are necessary.
Reservations track reads/writes, and here we should only wait for the exclusive fence (i.e. pending writes) and not also for reads. Doing that would result in piles of unnecessary stalls all over. We need a different wait_timeout call here.
Also fences should always eventually signal (worst case with a hangcheck fallback that force-signals everything that died in a gpu hang), so I don't think a timeout is the correct approach.
What we do need, otoh, is for this wait to be interruptible (at least for the new atomic path). -Daniel
All of that sounds easily done by changing reservation_object_wait_timeout_rcu() args wait_all to FALSE, intr to TRUE, and timeout to MAX_SCHEDULE_TIMEOUT.
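Concretely, that would turn the wait into something like the following (a sketch of the prospective v3 change, not a tested patch; `ret` handling elided):

```c
/* v3 sketch: wait only on the exclusive fence (pending writes),
 * interruptibly, with no artificial timeout. */
ret = reservation_object_wait_timeout_rcu(
		obj->base.dma_buf->resv,
		false,                  /* wait_all: exclusive fence only */
		true,                   /* intr: interruptible wait */
		MAX_SCHEDULE_TIMEOUT);  /* rely on fences always signaling */
```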
Thanks, Alex
For each buffer backed by dmabuf, wait on its reservation object's fences before committing.
Signed-off-by: Alex Goins <agoins@nvidia.com>
---
 drivers/gpu/drm/i915/intel_display.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1485640..3e6d588 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13097,6 +13097,8 @@ static int intel_atomic_commit(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc;
 	struct drm_crtc_state *crtc_state;
+	struct drm_plane *plane;
+	struct drm_plane_state *plane_state;
 	int ret = 0;
 	int i;
 	bool any_ms = false;
@@ -13112,6 +13114,28 @@ static int intel_atomic_commit(struct drm_device *dev,
drm_atomic_helper_swap_state(dev, state);
+	/* For all framebuffers backed by dmabuf, wait for fence */
+	for_each_plane_in_state(state, plane, plane_state, i) {
+		struct drm_framebuffer *fb;
+		struct drm_i915_gem_object *obj;
+
+		fb = plane->state->fb;
+		if (!fb)
+			continue;
+
+		obj = intel_fb_obj(fb);
+		if (!obj)
+			continue;
+
+		mutex_lock(&obj->base.dev->object_name_lock);
+		if (obj->base.dma_buf) {
+			reservation_object_wait_timeout_rcu(
+				obj->base.dma_buf->resv,
+				true, false, msecs_to_jiffies(96));
+		}
+		mutex_unlock(&obj->base.dev->object_name_lock);
+	}
+
 	for_each_crtc_in_state(state, crtc, crtc_state, i) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
On 31-10-15 at 02:03, Alex Goins wrote:
For each buffer backed by dmabuf, wait on its reservation object's fences before committing.
Signed-off-by: Alex Goins <agoins@nvidia.com>

 drivers/gpu/drm/i915/intel_display.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1485640..3e6d588 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13097,6 +13097,8 @@ static int intel_atomic_commit(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc;
 	struct drm_crtc_state *crtc_state;
+	struct drm_plane *plane;
+	struct drm_plane_state *plane_state;
 	int ret = 0;
 	int i;
 	bool any_ms = false;
@@ -13112,6 +13114,28 @@ static int intel_atomic_commit(struct drm_device *dev,
drm_atomic_helper_swap_state(dev, state);
+	/* For all framebuffers backed by dmabuf, wait for fence */
+	for_each_plane_in_state(state, plane, plane_state, i) {
+		struct drm_framebuffer *fb;
+		struct drm_i915_gem_object *obj;
+
+		fb = plane->state->fb;
+		if (!fb)
+			continue;
+
+		obj = intel_fb_obj(fb);
+		if (!obj)
+			continue;
+
+		mutex_lock(&obj->base.dev->object_name_lock);
+		if (obj->base.dma_buf) {
+			reservation_object_wait_timeout_rcu(
+				obj->base.dma_buf->resv,
+				true, false, msecs_to_jiffies(96));
+		}
+		mutex_unlock(&obj->base.dev->object_name_lock);
+	}
+
 	for_each_crtc_in_state(state, crtc, crtc_state, i) {
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
Could you put the wait inside prepare_plane_fb?
Could you put the wait inside prepare_plane_fb?
Yeah, that works, and it gets rid of the need to loop over the planes. I wait on the fence before taking the struct_mutex lock, like so:
@@ -13369,6 +13345,13 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 	if (!obj)
 		return 0;

+	/* For framebuffer backed by dmabuf, wait for fence */
+	if (obj->base.dma_buf) {
+		reservation_object_wait_timeout_rcu(
+			obj->base.dma_buf->resv,
+			true, false, msecs_to_jiffies(96));
+	}
+
 	mutex_lock(&dev->struct_mutex);
I'll send out the revised patch in the v3 patch set if there are no further comments.
Thanks, Alex