[PATCH 1/2] Revert "drm/radeon: remove drm_vblank_get|put from pflip handling"


Michel Dänzer

17 Jun 2014

10:12 a.m.

From: Michel Dänzer michel.daenzer@amd.com

This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.

drm_vblank_get() is necessary to ensure the DRM vblank counter value is up to date in drm_send_vblank_event().

Seems to fix weston hangs waiting for page flips to complete.

Signed-off-by: Michel Dänzer michel.daenzer@amd.com
---
 drivers/gpu/drm/radeon/radeon_display.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 2a8b9f1..97d7a80 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -357,6 +357,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)

 	spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);

+	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
 	radeon_fence_unref(&work->fence);
 	radeon_irq_kms_pflip_irq_get(rdev, work->crtc_id);
 	queue_work(radeon_crtc->flip_queue, &work->unpin_work);
@@ -459,6 +460,12 @@ static void radeon_flip_work_func(struct work_struct *__work)
 		base &= ~7;
 	}

+	r = drm_vblank_get(crtc->dev, radeon_crtc->crtc_id);
+	if (r) {
+		DRM_ERROR("failed to get vblank before flip\n");
+		goto pflip_cleanup;
+	}
+
 	/* We borrow the event spin lock for protecting flip_work */
 	spin_lock_irqsave(&crtc->dev->event_lock, flags);

@@ -473,6 +480,16 @@ static void radeon_flip_work_func(struct work_struct *__work)

 	return;

+pflip_cleanup:
+	if (unlikely(radeon_bo_reserve(work->new_rbo, false) != 0)) {
+		DRM_ERROR("failed to reserve new rbo in error path\n");
+		goto cleanup;
+	}
+	if (unlikely(radeon_bo_unpin(work->new_rbo) != 0)) {
+		DRM_ERROR("failed to unpin new rbo in error path\n");
+	}
+	radeon_bo_unreserve(work->new_rbo);
+
 cleanup:
 	drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
 	radeon_fence_unref(&work->fence);

-- 
2.0.0
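
The error handling added by this patch relies on stacked labels: pflip_cleanup undoes the pin and then falls through into the existing cleanup label. A minimal stand-alone sketch of that pattern follows. It is an illustration only: struct flip_state, flip_with_cleanup() and the boolean failure flags are hypothetical stand-ins for the driver's buffer objects and are not part of radeon.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pinned new buffer and the old buffer reference. */
struct flip_state {
	bool new_bo_pinned;
	bool old_bo_referenced;
};

/* Returns 0 on success; on failure, undoes only what had been set up so far. */
static int flip_with_cleanup(struct flip_state *s, bool pin_fails,
			     bool vblank_get_fails)
{
	s->old_bo_referenced = true;

	if (pin_fails)
		goto cleanup;		/* nothing pinned yet */
	s->new_bo_pinned = true;

	if (vblank_get_fails)
		goto pflip_cleanup;	/* the pin must be undone as well */

	return 0;

pflip_cleanup:
	s->new_bo_pinned = false;
	/* fall through into the common cleanup */
cleanup:
	s->old_bo_referenced = false;
	return -1;
}
```

The later a failure occurs, the higher the label it jumps to, so each resource is released exactly once and in reverse order of acquisition.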


Michel Dänzer

17 Jun 2014

10:12 a.m.

New subject: [PATCH 2/2] drm/radeon: Fix radeon_irq_kms_pflip_irq_get/put() imbalance

From: Michel Dänzer michel.daenzer@amd.com

Fixes a regression in 3.16-rc1 compared to 3.15.

The unbalanced calls would presumably result in the page flip interrupts never getting disabled once they are enabled.

Signed-off-by: Michel Dänzer michel.daenzer@amd.com
---
 drivers/gpu/drm/radeon/radeon_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 97d7a80..8b575a4 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -359,7 +359,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)

 	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
 	radeon_fence_unref(&work->fence);
-	radeon_irq_kms_pflip_irq_get(rdev, work->crtc_id);
+	radeon_irq_kms_pflip_irq_put(rdev, work->crtc_id);
 	queue_work(radeon_crtc->flip_queue, &work->unpin_work);
 }

-- 
2.0.0
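
The imbalance described in this commit message can be illustrated with a small user-space model. This is only a sketch, assuming that the page flip interrupt stays enabled while a per-CRTC reference count is non-zero and that one reference is taken when a flip is queued. struct pflip_irq, pflip_irq_get(), pflip_irq_put() and enabled_after_flip() are hypothetical names, not the radeon functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the interrupt is enabled while refcount > 0. */
struct pflip_irq {
	int refcount;
	bool enabled;
};

static void pflip_irq_get(struct pflip_irq *irq)
{
	if (irq->refcount++ == 0)
		irq->enabled = true;	/* first user turns the interrupt on */
}

static void pflip_irq_put(struct pflip_irq *irq)
{
	assert(irq->refcount > 0);	/* a put without a matching get is a bug */
	if (--irq->refcount == 0)
		irq->enabled = false;	/* last user turns it off again */
}

/*
 * Model one flip: take a reference when the flip is queued, then run
 * `completion` when it finishes. Returns whether the interrupt is still
 * enabled afterwards.
 */
static bool enabled_after_flip(void (*completion)(struct pflip_irq *))
{
	struct pflip_irq irq = { 0, false };

	pflip_irq_get(&irq);	/* flip queued */
	completion(&irq);	/* flip completed */
	return irq.enabled;
}
```

Passing pflip_irq_get as the completion step models the code before this patch and leaves the interrupt enabled; passing pflip_irq_put models the fix and leaves it disabled.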

Christian König

11:41 a.m.

On 17.06.2014 12:12, Michel Dänzer wrote:

> From: Michel Dänzer michel.daenzer@amd.com
>
> This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.
>
> drm_vblank_get() is necessary to ensure the DRM vblank counter value is up to date in drm_send_vblank_event().
>
> Seems to fix weston hangs waiting for page flips to complete.
>
> Signed-off-by: Michel Dänzer michel.daenzer@amd.com

Both patches are: Reviewed-by: Christian König christian.koenig@amd.com


Alex Deucher

1:45 p.m.

On Tue, Jun 17, 2014 at 7:41 AM, Christian König deathsimple@vodafone.de wrote:


> Both patches are: Reviewed-by: Christian König christian.koenig@amd.com

Both applied to my fixes tree.

Alex

dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel

Michel Dänzer

18 Jun 2014

5:53 a.m.

On 17.06.2014 20:41, Christian König wrote:


> Both patches are: Reviewed-by: Christian König christian.koenig@amd.com

Thank you.

Looking into these issues has got me thinking about the use of the page flip interrupt: If the page flip interrupt arrives before the corresponding vertical blank interrupt, the DRM vblank counter will be lower than expected by 1 in drm_send_vblank_event(). I suspect this is the cause of

(WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc [x-1] < target_msc [x]

messages in the X log file which have been popping up in bug reports lately. This also results in 0s being returned to the client for the MSC and timestamp of the swap completion, which could cause all kinds of bad behaviour.

The easy way to avoid that would be to stop using the page flip interrupt for this again. Could there be another solution for the issues you addressed by using it?

If not, another issue I encountered in 3.15 is that radeon_crtc_handle_flip() is called unconditionally when a page flip interrupt arrives. If the flip was already handled (presumably from the vertical blank interrupt), the BUG_ON() in drm_vblank_put() triggers a panic. This happened to me with weston.

This is presumably not an issue in 3.16 because radeon_crtc_handle_flip() now bails early if radeon_crtc->flip_work == NULL.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
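
The ordering problem described in this message can be modelled in a few lines of user-space C. This is only a sketch, under the assumption that the vblank counter advances solely in the vertical blank interrupt handler and that the flip completion event samples the counter at the moment the page flip interrupt is handled. struct crtc_model and the functions below are hypothetical and are not DRM or radeon code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CRTC: the counter only advances on vblank. */
struct crtc_model {
	uint32_t vblank_count;
};

static void on_vblank_irq(struct crtc_model *crtc)
{
	crtc->vblank_count++;
}

/* MSC that a flip completion event would carry if sampled right now. */
static uint32_t on_pflip_irq(const struct crtc_model *crtc)
{
	return crtc->vblank_count;
}

/*
 * Report the MSC for a flip that targets the next vblank, with the two
 * interrupts handled in either order.
 */
static uint32_t reported_msc(struct crtc_model *crtc, bool pflip_first)
{
	uint32_t msc;

	if (pflip_first) {
		msc = on_pflip_irq(crtc);	/* counter not bumped yet */
		on_vblank_irq(crtc);
	} else {
		on_vblank_irq(crtc);
		msc = on_pflip_irq(crtc);
	}
	return msc;
}
```

Starting from a count of x - 1, the vblank-first order reports x, while the pflip-first order reports x - 1, which matches the shape of the "impossible msc [x-1] < target_msc [x]" warning.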

Christian König

9:14 a.m.

On 18.06.2014 07:53, Michel Dänzer wrote:


> Looking into these issues has got me thinking about the use of the page flip interrupt: If the page flip interrupt arrives before the corresponding vertical blank interrupt, the DRM vblank counter will be lower than expected by 1 in drm_send_vblank_event(). I suspect this is the cause of
>
> (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc [x-1] < target_msc [x]
>
> messages in the X log file which have been popping up in bug reports lately. This also results in 0s being returned to the client for the MSC and timestamp of the swap completion, which could cause all kinds of bad behaviour.

First of all thanks for looking into it. Are you getting this on 3.16 or 3.15?

I don't think that the pflip irq is thrown earlier than the vblank, but on 3.16 it might actually be that we program the flip so fast into the hardware that we do it one frame earlier than planned.

> The easy way to avoid that would be to stop using the page flip interrupt for this again. Could there be another solution for the issues you addressed by using it?

The original problem was that programming the flip in the vblank event doesn't work reliably because of the underlying hardware double buffering. We just can't tell whether the flip will complete in this frame or whether the vblank interrupt was processed so late that it will happen in the next frame.

We could just busy loop until either the pending bit or the bit for the update period becomes zero, but even busy waiting for the pending bit to go up in an interrupt handler like we did before is quite questionable.

In addition, using the pflip interrupt enables us to sync to the hblank as well, or not at all, by changing just a few register bits. It's also a prerequisite for switching to a non-constant sync rate. So I would like to keep it and try to fix the issues we are seeing instead.

> If not, another issue I encountered in 3.15 is that radeon_crtc_handle_flip() is called unconditionally when a page flip interrupt arrives. If the flip was already handled (presumably from the vertical blank interrupt), the BUG_ON() in drm_vblank_put() triggers a panic. This happened to me with weston.

Calling radeon_crtc_handle_flip() multiple times shouldn't be a problem; that can happen with the old code as well. Setting unpin_work to NULL under a spin lock protects us from that case.

But take a look at the 3.15 version of radeon_crtc_page_flip() instead! We first set "unpin_work", release the spin lock and *then* reserve and pin the BO. If I'm not completely wrong, there is a race condition here: if the vblank interrupt happens before the rest of the function has run, all kinds of bad things can happen.

The only thing protecting us from that is that the vblank interrupt is turned on only at the end of the function, but the vblank interrupt can already have been turned on for other reasons as well.

> This is presumably not an issue in 3.16 because radeon_crtc_handle_flip() now bails early if radeon_crtc->flip_work == NULL.

Thanks, Christian.

Michel Dänzer

23 Jun 2014

9:34 a.m.

On 18.06.2014 18:14, Christian König wrote:


> First of all thanks for looking into it. Are you getting this on 3.16 or 3.15?

I haven't actually run into this myself yet. I thought I'd seen it in several bug reports, but right now I can only find https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems to include the page flipping changes from 3.16.

> I don't think that the pflip irq is thrown earlier than the vblank, but on 3.16 it might actually be that we program the flip so fast into the hardware that we do it one frame earlier than planned.

So userspace is notified of the previous vertical blank period and calls the page flip ioctl in response, which then manages to program the scanout address update into the hardware before the scanout address update is latched during the previous vertical blank period?

To avoid that scenario, one possibility might be to check if we're in vertical blank before calling radeon_page_flip(), and if so sleep for 1ms or so before trying again? That might unnecessarily delay flips on other CRTCs though...

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

Christian König

12:45 p.m.

On 23.06.2014 11:34, Michel Dänzer wrote:


> So userspace is notified of the previous vertical blank period and calls the page flip ioctl in response, which then manages to program the scanout address update into the hardware before the scanout address update is latched during the previous vertical blank period?

Yes correct. That at least sounds like the most likely explanation to me.

> To avoid that scenario, one possibility might be to check if we're in vertical blank before calling radeon_page_flip(), and if so sleep for 1ms or so before trying again? That might unnecessarily delay flips on other CRTCs though...

It won't delay the other CRTCs because each CRTC has its own kernel thread, but it won't be optimal either.

Going to try to reproduce the bug with 3.16, Christian.

Dieter Nützel

7:46 p.m.

On 23.06.2014 11:34, Michel Dänzer wrote:


> I haven't actually run into this myself yet. I thought I'd seen it in several bug reports, but right now I can only find https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems to include the page flipping changes from 3.16.

With 3.16-rc2 I now get it on my RV730 AGP, as in the above bug report. But only the lines in Xorg.0.log; no signs of any damage or error in use.

With 3.15 and 3.16 (rc2 only) my system is rock solid.

I've also tried 3.15-rc7 + Christian's pflip rework (with a little manual adjustment). It was solid, but I saw the reported flip/black distortion in the lower part of the screen during the KWin 4.13 cube screen effect (rotation). Your fix for 3.16-rc1 fixed that.

Before 3.15/3.16-rcX I got some hangs from time to time during system boot. Nothing in the logs except an SSD RAID1 rebuild. Maybe it was MD related and not r600/DRM.

3.16-rcX (3.15-rc7 + pflip patches) seems to be more responsive than 3.15 for me.

The first and latest attachments from bug #80141 https://bugs.freedesktop.org/attachment.cgi?id=101605 show the same.

Where should I add/send my Xorg.0.log?

Cheers, Dieter


Dieter Nützel

8:32 p.m.

On 23.06.2014 21:46, Dieter Nützel wrote:


Addendum:

I can reliably generate such lines in Xorg.0.log with the KWin cube desktop effect.

Rotating the desktops with the mouse wheel or the screen switcher produces a new entry in Xorg.0.log. When it happens I notice ('see') a flip delay.

[ 9893.183] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 594382 < target_msc 594383
[ 10859.753] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 652497 < target_msc 652498
[ 10915.719] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 655863 < target_msc 655864
[ 10916.817] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 655929 < target_msc 655930
[ 10925.843] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 656472 < target_msc 656473
[ 10926.774] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 656528 < target_msc 656529
[ 10965.519] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 658859 < target_msc 658860
[ 11081.878] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 665846 < target_msc 665847


Michel Dänzer

24 Jun 2014

10:05 a.m.

On 24.06.2014 05:32, Dieter Nützel wrote:


> I've also tried 3.15-rc7 + Christian's pflip rework (with a little manual adjustment). It was solid, but I saw the reported flip/black distortion in the lower part of the screen during the KWin 4.13 cube screen effect (rotation). Your fix for 3.16-rc1 fixed that.

That's good to hear.

> I can reliably generate such lines in Xorg.0.log with the KWin cube desktop effect.
>
> Rotating the desktops with the mouse wheel or the screen switcher produces a new entry in Xorg.0.log. When it happens I notice ('see') a flip delay.

I was only able to reproduce it a couple of times even with that, but not at all yet with the patch below. Does it help for you as well?

> I don't think that the pflip irq is thrown earlier than the vblank, but on 3.16 it might actually be that we program the flip so fast into the hardware that we do it one frame earlier than planned.
>
> So userspace is notified of the previous vertical blank period and calls the page flip ioctl in response, which then manages to program the scanout address update into the hardware before the scanout address update is latched during the previous vertical blank period?

I think there's another possible scenario:

1. Userspace submits page flip intended for MSC x
2. The vertical blank interrupt is triggered for MSC x => radeon_crtc_handle_vblank() => radeon_crtc_handle_flip()
3. Userspace submits page flip intended for MSC (x + 1)
4. The page flip interrupt is triggered for the previous flip => radeon_crtc_handle_flip() => drm_send_vblank_event(). The second flip hasn't actually executed yet, and the event has MSC x instead of (x + 1) as expected by userspace.

If that is the case, only actually enabling and handling the page flip interrupt when a flip is pending might also avoid it. I can hack that up tomorrow, if Christian doesn't beat me to it.

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 8b575a4..8350f8c 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -336,14 +336,19 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 	struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[crtc_id];
 	struct radeon_flip_work *work;
 	unsigned long flags;
+	struct timeval vblank_time;
+	u32 vblank_seq;

 	/* this can happen at init */
 	if (radeon_crtc == NULL)
 		return;

+	vblank_seq = drm_vblank_count_and_time(rdev->ddev, crtc_id, &vblank_time);
+
 	spin_lock_irqsave(&rdev->ddev->event_lock, flags);
 	work = radeon_crtc->flip_work;
-	if (work == NULL) {
+	if (work == NULL ||
+	    (vblank_seq - work->event->event.sequence) > (1<<23)) {
 		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 		return;
 	}
@@ -379,6 +384,7 @@ static void radeon_flip_work_func(struct work_struct *__work)

 	struct drm_crtc *crtc = &radeon_crtc->base;
 	struct drm_framebuffer *fb = work->fb;
+	struct timeval vblank_time;

 	uint32_t tiling_flags, pitch_pixels;
 	uint64_t base;
@@ -466,6 +472,10 @@ static void radeon_flip_work_func(struct work_struct *__work)
 		goto pflip_cleanup;
 	}

+	work->event->event.sequence =
+		drm_vblank_count_and_time(crtc->dev, radeon_crtc->crtc_id,
+					  &vblank_time) + 1;
+
 	/* We borrow the event spin lock for protecting flip_work */
 	spin_lock_irqsave(&crtc->dev->event_lock, flags);

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
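
The extra condition in radeon_crtc_handle_flip() in the patch above compares two sequence numbers in a way that tolerates counter wraparound. A sketch of that comparison in isolation, assuming both values are unsigned 32-bit counters as in the patch; seq_not_reached() is a hypothetical helper name that does not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if `now` has not yet reached `target`, treating both as 32-bit
 * counters that may wrap. The 1 << 23 window mirrors the constant used
 * in the patch: an unsigned difference larger than that is read as
 * "now is still behind target".
 */
static bool seq_not_reached(uint32_t now, uint32_t target)
{
	return (uint32_t)(now - target) > (UINT32_C(1) << 23);
}
```

With now = 100 and target = 101 the subtraction wraps to 0xFFFFFFFF, which is far above 1 << 23, so the handler would return early. Once the counter reaches 101 the difference is 0 and the flip is completed. The same holds across a wrap, for example now = 0xFFFFFFFF with target = 0.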

Dieter Nützel

7:58 p.m.

On 24.06.2014 12:05, Michel Dänzer wrote:


> I was only able to reproduce it a couple of times even with that, but not at all yet with the patch below. Does it help for you as well?

Will try in the next run.

My daughter generated a kernel crash for us. ;-) She wanted to open a zoomed image of a new Waveboard in Konqi for her girlfriends...

But I could only take pictures with my mobile: kernel BUG at drivers/gpu/drm/drm_irq.c:976! I will send one; I have two more.

Greetings, Dieter

Second try, this time without the image. Let me know where I should add it, please.

...

...
...
...
...
I don't think that the pflip irq is thrown earlier than the vblank, but on 3.16 it might actually be that we program the flip so fast into the hardware that we do it one frame earlier than planned.

So userspace is notified of the previous vertical blank period and calls the page flip ioctl in response, which then manages to program the scanout address update into the hardware before the scanout address update is latched during the previous vertical blank period?

I think there's another possible scenario:

Userspace submits page flip intended for MSC x

The vertical blank interrupt is triggered for MSC x => radeon_crtc_handle_vblank() => radeon_crtc_handle_flip()

Userspace submits page flip intended for MSC (x + 1)

The page flip interrupt is triggered for the previous flip => radeon_crtc_handle_flip() => drm_send_vblank_event(). The second

flip hasn't actually executed yet, and the event has MSC x instead of (x

as expected by userspace.

If that is the case, only actually enabling and handling the page flip interrupt when a flip is pending might also avoid it. I can hack that up tomorrow, if Christian doesn't beat me to it.

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 8b575a4..8350f8c 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -336,14 +336,19 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 	struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[crtc_id];
 	struct radeon_flip_work *work;
 	unsigned long flags;
+	struct timeval vblank_time;
+	u32 vblank_seq;
 
 	/* this can happen at init */
 	if (radeon_crtc == NULL)
 		return;
+	vblank_seq = drm_vblank_count_and_time(rdev->ddev, crtc_id,
+					       &vblank_time);
 	spin_lock_irqsave(&rdev->ddev->event_lock, flags);
 	work = radeon_crtc->flip_work;
-	if (work == NULL) {
+	if (work == NULL ||
+	    (vblank_seq - work->event->event.sequence) > (1<<23)) {
 		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 		return;
 	}
@@ -379,6 +384,7 @@ static void radeon_flip_work_func(struct work_struct *__work)
 	struct drm_crtc *crtc = &radeon_crtc->base;
 	struct drm_framebuffer *fb = work->fb;
+	struct timeval vblank_time;
 
 	uint32_t tiling_flags, pitch_pixels;
 	uint64_t base;
@@ -466,6 +472,10 @@ static void radeon_flip_work_func(struct work_struct *__work)
 		goto pflip_cleanup;
 	}
+	work->event->event.sequence =
+		drm_vblank_count_and_time(crtc->dev, radeon_crtc->crtc_id,
+					  &vblank_time) + 1;
 	/* We borrow the event spin lock for protecting flip_work */
 	spin_lock_irqsave(&crtc->dev->event_lock, flags);
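
The comparison added to radeon_crtc_handle_flip() in the patch above relies on unsigned wraparound. A minimal standalone sketch of that comparison, assuming a 32-bit counter; the helper name `flip_target_reached` is hypothetical and only the `1 << 23` window is taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-aware "has the counter reached the target yet?" test.
 * Unsigned subtraction is modulo 2^32, so a counter that is still
 * slightly behind the target yields a huge difference instead of a
 * negative one. The patch returns early in exactly that case; this
 * helper returns the opposite condition (true means "reached").
 */
static bool flip_target_reached(uint32_t vblank_seq, uint32_t target_seq)
{
	return (uint32_t)(vblank_seq - target_seq) <= (UINT32_C(1) << 23);
}
```

With a target of 101, a counter value of 100 gives a difference of 0xFFFFFFFF and the flip is treated as not yet due, while 101 or 102 give 0 or 1 and the flip is completed. The same holds across the wrap: a target of 0 is not reached at UINT32_MAX but is reached at 0.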

Dieter Nützel

9:52 p.m.

On 24.06.2014 12:05, Michel Dänzer wrote:

...

On 24.06.2014 05:32, Dieter Nützel wrote:

...
On 23.06.2014 21:46, Dieter Nützel wrote:

...
On 23.06.2014 11:34, Michel Dänzer wrote:

...
On 18.06.2014 18:14, Christian König wrote:

...
On 18.06.2014 07:53, Michel Dänzer wrote:

...
(WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc [x-1] < target_msc [x]

messages in the X log file which have been popping up in bug reports lately. This also results in 0s being returned to the client for the MSC and timestamp of the swap completion, which could cause all kinds of bad behaviour.

First of all thanks for looking into it. Are you getting this on 3.16 or 3.15?

I haven't actually run into this myself yet. I thought I'd seen it in several bug reports, but right now I can only find https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems to include the page flipping changes from 3.16.

With 3.16-rc2 I get it now on my RV730 AGP as in the above bug report. But only the lines in Xorg.0.log. NO signs of any damage/error in use.

With 3.15 and 3.16 (rc2 only) my system has been rock solid.

I've tried 3.15-rc7 + Christian's pflip rework (with a little manual work), too. It was solid, but I saw the reported flip/black distortion in the lower part during the KWin 4.13 cube screen effect (rotation). Your fix for 3.16-rc1 fixed that.

That's good to hear.

...
I can reliably generate such lines in Xorg.0.log with the KWin cube desktop effect.

Rotate screens with the mouse wheel or screen switcher => new entry in Xorg.0.log. When it happens I notice ('see') a flip delay.

I was only able to reproduce it a couple of times even with that, but not at all yet with the patch below. Does it help for you as well?

You have my Tested-by: for it. I can't reproduce it any longer with your patch below, even though it didn't apply on top of 3.16-rc2, but most of the time I know what I'm doing... ;-)

Now for a little football watching!

Cheers, Dieter

...

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 8b575a4..8350f8c 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -336,14 +336,19 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 	struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[crtc_id];
 	struct radeon_flip_work *work;
 	unsigned long flags;
+	struct timeval vblank_time;
+	u32 vblank_seq;
 
 	/* this can happen at init */
 	if (radeon_crtc == NULL)
 		return;
+	vblank_seq = drm_vblank_count_and_time(rdev->ddev, crtc_id,
+					       &vblank_time);
 	spin_lock_irqsave(&rdev->ddev->event_lock, flags);
 	work = radeon_crtc->flip_work;
-	if (work == NULL) {
+	if (work == NULL ||
+	    (vblank_seq - work->event->event.sequence) > (1<<23)) {
 		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 		return;
 	}
@@ -379,6 +384,7 @@ static void radeon_flip_work_func(struct work_struct *__work)
 	struct drm_crtc *crtc = &radeon_crtc->base;
 	struct drm_framebuffer *fb = work->fb;
+	struct timeval vblank_time;
 
 	uint32_t tiling_flags, pitch_pixels;
 	uint64_t base;
@@ -466,6 +472,10 @@ static void radeon_flip_work_func(struct work_struct *__work)
 		goto pflip_cleanup;
 	}
+	work->event->event.sequence =
+		drm_vblank_count_and_time(crtc->dev, radeon_crtc->crtc_id,
+					  &vblank_time) + 1;
 	/* We borrow the event spin lock for protecting flip_work */
 	spin_lock_irqsave(&crtc->dev->event_lock, flags);

dri-devel@lists.freedesktop.org

participants (4)

Alex Deucher
Christian König
Dieter Nützel
Michel Dänzer