After the commit def35e7c5926 ("drm/vkms: Bugfix extra vblank frame") some of the crc tests started to fail in the vkms with the following error:
[drm:drm_crtc_add_crc_entry [drm]] *ERROR* Overflow of CRC buffer, userspace reads too slow. [drm] failed to queue vkms_crc_work_handle ...
The aforementioned commit fixed the extra vblank added by `drm_crtc_arm_vblank_event()` which is invoked inside `vkms_crtc_atomic_flush()` if the vblank event count was zero, otherwise `drm_crtc_send_vblank_event()` is invoked. The fix was implemented in `vkms_get_vblank_timestamp()` by subtracting one period from the current timestamp, as the code snippet below illustrates:
if (!in_vblank_irq) *vblank_time -= output->period_ns;
The above fix works well when `drm_crtc_arm_vblank_event()` is invoked. However, it does not properly work when `drm_crtc_send_vblank_event()` executes since it subtracts the correct timestamp, which it shouldn't. In this case, the `drm_crtc_accurate_vblank_count()` function will returns the wrong frame number, which generates the aforementioned error. Such decrease in `get_vblank_timestamp()` produce a negative number in the following calculation within `drm_update_vblank_count()`:
u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time));
After this operation, the DIV_ROUND_CLOSEST_ULL macro is invoked using diff_ns with a negative number, which generates an undefined result; therefore, the returned frame is a huge and incorrect number. Finally, the code below is part of the `vkms_crc_work_handle()`, note that the while loop depends on the returned value from `drm_crtc_accurate_vblank_count()` which may cause the loop to take a long time to finish in case of huge value.
frame_end = drm_crtc_accurate_vblank_count(crtc); while (frame_start <= frame_end) drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
This commit fixes this issue by checking if the vblank timestamp corresponding to the current software vblank counter is equal to the current vblank; if they are equal, it means that `drm_crtc_send_vblank_event()` was invoked and vkms does not need to discount the extra vblank, otherwise, `drm_crtc_arm_vblank_event()` was executed and vkms have to discount the extra vblank. This fix made the CRC tests work again whereas keep all tests from kms_flip working as well.
V2: Update commit message
Signed-off-by: Rodrigo Siqueira rodrigosiqueiramelo@gmail.com Signed-off-by: Shayenne Moura shayenneluzmoura@gmail.com --- drivers/gpu/drm/vkms/vkms_crtc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index 7508815fac11..3ce60e66673e 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -74,9 +74,13 @@ bool vkms_get_vblank_timestamp(struct drm_device *dev, unsigned int pipe, { struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev); struct vkms_output *output = &vkmsdev->output; + struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
*vblank_time = output->vblank_hrtimer.node.expires;
+ if (*vblank_time == vblank->time) + return true; + if (!in_vblank_irq) *vblank_time -= output->period_ns;
On Tue, Jun 04, 2019 at 11:45:43PM -0300, Rodrigo Siqueira wrote:
After the commit def35e7c5926 ("drm/vkms: Bugfix extra vblank frame") some of the crc tests started to fail in the vkms with the following error:
[drm:drm_crtc_add_crc_entry [drm]] *ERROR* Overflow of CRC buffer, userspace reads too slow. [drm] failed to queue vkms_crc_work_handle ...
The aforementioned commit fixed the extra vblank added by `drm_crtc_arm_vblank_event()` which is invoked inside `vkms_crtc_atomic_flush()` if the vblank event count was zero, otherwise `drm_crtc_send_vblank_event()` is invoked. The fix was implemented in `vkms_get_vblank_timestamp()` by subtracting one period from the current timestamp, as the code snippet below illustrates:
if (!in_vblank_irq) *vblank_time -= output->period_ns;
The above fix works well when `drm_crtc_arm_vblank_event()` is invoked. However, it does not properly work when `drm_crtc_send_vblank_event()` executes since it subtracts the correct timestamp, which it shouldn't. In this case, the `drm_crtc_accurate_vblank_count()` function will returns the wrong frame number, which generates the aforementioned error. Such decrease in `get_vblank_timestamp()` produce a negative number in the following calculation within `drm_update_vblank_count()`:
u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time));
After this operation, the DIV_ROUND_CLOSEST_ULL macro is invoked using diff_ns with a negative number, which generates an undefined result; therefore, the returned frame is a huge and incorrect number. Finally, the code below is part of the `vkms_crc_work_handle()`, note that the while loop depends on the returned value from `drm_crtc_accurate_vblank_count()` which may cause the loop to take a long time to finish in case of huge value.
frame_end = drm_crtc_accurate_vblank_count(crtc); while (frame_start <= frame_end) drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
This commit fixes this issue by checking if the vblank timestamp corresponding to the current software vblank counter is equal to the current vblank; if they are equal, it means that `drm_crtc_send_vblank_event()` was invoked and vkms does not need to discount the extra vblank, otherwise, `drm_crtc_arm_vblank_event()` was executed and vkms have to discount the extra vblank. This fix made the CRC tests work again whereas keep all tests from kms_flip working as well.
V2: Update commit message
Signed-off-by: Rodrigo Siqueira rodrigosiqueiramelo@gmail.com Signed-off-by: Shayenne Moura shayenneluzmoura@gmail.com
Thanks a lot for typing up this commit message. I'm still not following what's going on (this stuff is tricky), but now I think I can at least ask useful scenarios.
For my understanding: Things go wrong when in the function vkms_crtc_atomic_flush() the call to drm_crtc_vblank_get() returns 0, and we call drm_crtc_send_vblank_event() directly? I'm not 100% from your description above whether that's the failure case where everything blows up.
The other part I'm having a hard time understanding: If we are in the case where we call drm_crtc_send_vblank_event() directly, how do we end up in the vkms_get_vblank_timestamp().
From what I can see there's not caller to where we sample a new vblank ...
I think a full backtrace that hits the new condition you're adding to understand when exactly we're hitting it would be perfect.
drivers/gpu/drm/vkms/vkms_crtc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index 7508815fac11..3ce60e66673e 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -74,9 +74,13 @@ bool vkms_get_vblank_timestamp(struct drm_device *dev, unsigned int pipe, { struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev); struct vkms_output *output = &vkmsdev->output;
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
*vblank_time = output->vblank_hrtimer.node.expires;
if (*vblank_time == vblank->time)
return true;
Ok, I think with the above I do now have a rough idea what's going wrong. I think that would be a bug in the drm_vblank.c. Or at least it could be, I think I first need to better understand still what's going on here to decided that.
I have a bit a hunch this is fallout from our vblank fudging, but I guess we'll see.
It's definitely clear that things blow up if the vblank time somehow goes backwards, but I don't understand yet how that's possible. -Daniel
- if (!in_vblank_irq) *vblank_time -= output->period_ns;
-- 2.21.0
dri-devel@lists.freedesktop.org