On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
Hi,
since kernel 4.4 I'm unable to login to kde5 desktop (on openSUSE Tumbleweed). There's a screen with progressbar showing the startup, which normally fades away after reaching 100%. But with kernel 4.4, the progress gets stuck somewhere between 1/2 and 3/4 (not always the same). Top shows that kwin is using few % of CPU's but mostly sleeps in poll(). When I kill it from another console, I see that everything has actually started up, just the progressbar screen was obscuring it. The windows obviously don't have decorations etc. Starting kwin manually again shows me again the progressbar screen at the same position.
Hmm. Sounds like it could then be waiting for a vblank in the distant future. There's that 1<<23 limit in the code though, but even with that we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.
Stuff to try might include enabling drm.debug=0x2f, though that'll generate a lot of stuff. Another option would be to use the drm vblank tracepoints to try and catch what seq number it's waiting for and where we're at currently. Or I suppose you could just hack up drm_wait_vblank() to print an error message or something if the requested seq number is in the future by, say, more than a few seconds, and if that's the case then we could try to figure out why that happens.
I have suspected that kwin is waiting for some event, but nevertheless tried bisecting the kernel between 4.3 and 4.4, which lead to:
# first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use vblank timestamps to guesstimate how many vblanks were missed
I can confirm that 4.4 works if I revert the following commits: 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v3)"
d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2)"
The sha1s don't seem to match what I have, so not sure which kernel tree you have, but looking at the radeon commit at least one thing immediately caught my attention;
+ /* Bump counter if we are at >= leading edge of vblank, + * but before vsync where vpos would turn negative and + * the hw counter really increments. + */ + if (vpos >= 0) + count++;
It's rather hard to see what it's really doing since the custom flags to the get_scanout_position now cause it return non-standard things. But if I'm reading things correctly it should really say something like:
if (vpos >= 0 && vpos < (vsync_start - vblank_start)) count++;
Hmm. Actually even that might not be correct since it could be using the "fake" vblank start here, so might be it'd need to be something like:
if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines) count++;
Also might be worth a shot to just ignore the hw frame counter. Eg.:
index e266ffc520d2..db728580549a 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -492,7 +492,6 @@ static struct drm_driver kms_driver = { .lastclose = radeon_driver_lastclose_kms, .set_busid = drm_pci_set_busid, .unload = radeon_driver_unload_kms, - .get_vblank_counter = radeon_get_vblank_counter_kms, .enable_vblank = radeon_enable_vblank_kms, .disable_vblank = radeon_disable_vblank_kms, .get_vblank_timestamp = radeon_get_vblank_timestamp_kms, diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c index 979f3bf65f2c..3c5fcab74152 100644 --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c @@ -152,11 +152,6 @@ int radeon_driver_irq_postinstall_kms(struct drm_device *dev) { struct radeon_device *rdev = dev->dev_private;
- if (ASIC_IS_AVIVO(rdev)) - dev->max_vblank_count = 0x00ffffff; - else - dev->max_vblank_count = 0x001fffff; - return 0; }
assuming I'm reading the code correctly.
31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank timestamps from the irq handler"
ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"
4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to guesstimate how many vblanks were missed"
All clean reverts, just needs some fixup on top to use abs() instead of abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.
Unfortunately I don't know if this is a kernel problem or kwin problem. I tried to CC maintainers of both, advices what to try or what info to provide welcome. The card is "CAICOS" with 1GB memory.
Thanks, Vlastimil
On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
On 21.01.2016 15:38, Michel Dänzer wrote:
On 21.01.2016 14:31, Mario Kleiner wrote:
On 01/21/2016 04:43 AM, Michel Dänzer wrote:
On 21.01.2016 05:32, Mario Kleiner wrote:
So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville?
The whole logic has changed and the software counter updates are now driven all the time by the hw counter.
BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand...
I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, [...]
... just like drm_vblank_pre/post_modeset. That those were broken is a regression which needs to be fixed anyway. I don't think switching to drm_vblank_on/off is suitable for stable trees.
Even more so since as I mentioned, there is (has been since at least about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
Hm, never noticed you reported that. I thought the reason for not picking up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace where it tried to use vblank waits on a disabled pipe?
Can you please point me at the vblank on/off jump bug please?
Thanks, Daniel
On 21.01.2016 16:58, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
On 21.01.2016 15:38, Michel Dänzer wrote:
On 21.01.2016 14:31, Mario Kleiner wrote:
On 01/21/2016 04:43 AM, Michel Dänzer wrote:
On 21.01.2016 05:32, Mario Kleiner wrote:
So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville?
The whole logic has changed and the software counter updates are now driven all the time by the hw counter.
BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand...
I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, [...]
... just like drm_vblank_pre/post_modeset. That those were broken is a regression which needs to be fixed anyway. I don't think switching to drm_vblank_on/off is suitable for stable trees.
Even more so since as I mentioned, there is (has been since at least about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
Hm, never noticed you reported that. I thought the reason for not picking up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace where it tried to use vblank waits on a disabled pipe?
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
I don't know why it didn't get picked up.
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
On 21.01.2016 15:38, Michel Dänzer wrote:
On 21.01.2016 14:31, Mario Kleiner wrote:
On 01/21/2016 04:43 AM, Michel Dänzer wrote:
On 21.01.2016 05:32, Mario Kleiner wrote: > > So the problem is that AMDs hardware frame counters reset to > zero during a modeset. The old DRM code dealt with drivers doing that by > keeping vblank irqs enabled during modesets and incrementing vblank > count by one during each vblank irq, i think that's what > drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville?
The whole logic has changed and the software counter updates are now driven all the time by the hw counter.
BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand...
I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, [...]
... just like drm_vblank_pre/post_modeset. That those were broken is a regression which needs to be fixed anyway. I don't think switching to drm_vblank_on/off is suitable for stable trees.
Even more so since as I mentioned, there is (has been since at least about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
Hm, never noticed you reported that. I thought the reason for not picking up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace where it tried to use vblank waits on a disabled pipe?
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
I don't know why it didn't get picked up.
Yeah, checking my tree your ack is indeed in there. I think I'll resend them.
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug. -Daniel
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
I suspect this may not be evident with current Intel hardware because dev->max_vblank_count = 0xffffffff, which makes the wraparound code in drm_update_vblank_count a no-op. Maybe you can reproduce it if you artificially set a lower max_vblank_count in the driver.
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
dev->max_vblank_count = 0xffffffff, which makes the wraparound code in drm_update_vblank_count a no-op. Maybe you can reproduce it if you artificially set a lower max_vblank_count in the driver.
-- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
On 01/22/2016 04:18 PM, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Fwiw, testing the HD-57570 single display with my patch that uses drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show hardware counter reset to zero as expected, but no jumps of software vblank counter. So with that vblank_off/on placement it seems to work nicely here.
-mario
dev->max_vblank_count = 0xffffffff, which makes the wraparound code in drm_update_vblank_count a no-op. Maybe you can reproduce it if you artificially set a lower max_vblank_count in the driver.
-- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
On 01/22/2016 07:29 PM, Mario Kleiner wrote:
On 01/22/2016 04:18 PM, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Fwiw, testing the HD-57570 single display with my patch that uses drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show hardware counter reset to zero as expected, but no jumps of software vblank counter. So with that vblank_off/on placement it seems to work nicely here.
-mario
I spoke too early. The jump doesn't happen when i change video modes - video resolution / refresh rate etc, despite hw counter reset. But if i just disable and then reenable a display, the software counter jumps.
-mario
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote:
Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
On 21.01.2016 16:58, Daniel Vetter wrote: > > Can you please point me at the vblank on/off jump bug please?
AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote:
On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: > On 21.01.2016 16:58, Daniel Vetter wrote: >> >> Can you please point me at the vblank on/off jump bug please? > > AFAIR I originally reported it in response to > http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html > , but I can't find that in the archives, so maybe that was just on IRC. > See > http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html > . Basically, I ran into the bug fixed by your patch because the counter > jumped forward on every DPMS off, so it hit the 32-bit boundary after > just a few days.
Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
-mario
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote: > On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >> On 21.01.2016 16:58, Daniel Vetter wrote: >>> >>> Can you please point me at the vblank on/off jump bug please? >> >> AFAIR I originally reported it in response to >> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >> , but I can't find that in the archives, so maybe that was just on IRC. >> See >> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >> . Basically, I ran into the bug fixed by your patch because the counter >> jumped forward on every DPMS off, so it hit the 32-bit boundary after >> just a few days. > > Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
-mario
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
Readding Daniel, which somehow got dropped from the cc.
On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: > > [ Trimming KDE folks from Cc ] > > On 21.01.2016 19:09, Daniel Vetter wrote: >> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>> On 21.01.2016 16:58, Daniel Vetter wrote: >>>> >>>> Can you please point me at the vblank on/off jump bug please? >>> >>> AFAIR I originally reported it in response to >>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>> , but I can't find that in the archives, so maybe that was just on IRC. >>> See >>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>> . Basically, I ran into the bug fixed by your patch because the counter >>> jumped forward on every DPMS off, so it hit the 32-bit boundary after >>> just a few days. >> >> Ok, so just uncovered the overflow bug. > > Not sure what you mean by "just", but to be clear: The drm_vblank_on/off > counter jumping bug (similar to the bug this thread is about), which > exposed the overflow bug, is still alive and kicking in 4.5. It seems > to happen when turning off the CRTC: > > [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 > [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 > [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] > [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 > [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 > [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 > [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 > [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 > [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] > [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 > [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't.
I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon:
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 607f493..d739d93 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe) spin_lock_irqsave(&dev->event_lock, irqflags);
spin_lock(&dev->vbl_lock); - vblank_disable_and_save(dev, pipe); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled); + + if (vblank->enabled) + vblank_disable_and_save(dev, pipe); wake_up(&vblank->queue);
/* @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe) return;
spin_lock_irqsave(&dev->vbl_lock, irqflags); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled); + /* Drop our private "prevent drm_vblank_get" refcount */ if (vblank->inmodeset) { atomic_dec(&vblank->refcount);
Another, maybe better, approach might be to no-op redundant calls to drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to drm_vblank_on() iff !vblank->inmodeset.
-mario
-mario
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
Readding Daniel, which somehow got dropped from the cc.
On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote: >On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: >> >>[ Trimming KDE folks from Cc ] >> >>On 21.01.2016 19:09, Daniel Vetter wrote: >>>On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>>>On 21.01.2016 16:58, Daniel Vetter wrote: >>>>> >>>>>Can you please point me at the vblank on/off jump bug please? >>>> >>>>AFAIR I originally reported it in response to >>>>http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>>>, but I can't find that in the archives, so maybe that was just on IRC. >>>>See >>>>http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>>>. Basically, I ran into the bug fixed by your patch because the counter >>>>jumped forward on every DPMS off, so it hit the 32-bit boundary after >>>>just a few days. >>> >>>Ok, so just uncovered the overflow bug. >> >>Not sure what you mean by "just", but to be clear: The drm_vblank_on/off >>counter jumping bug (similar to the bug this thread is about), which >>exposed the overflow bug, is still alive and kicking in 4.5. It seems >>to happen when turning off the CRTC: >> >>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 >>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 > >Not sure what bug we're talking about here, but here the hw counter >clearly jumps backwards. > >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 > >Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
>These things just don't happen on i915 because drm_vblank_off() and >drm_vblank_on() are always called around the times when the hw counter >might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't.
I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon:
This might be due to the legacy helpers, which just love to redundantly disable stuff that's off already. The problem I see with no-oping these out is that for atomic drivers (which really should get this right) this might paper over bugs: E.g. when you forget to call _off() when disabling the crtc, then calling _on() twice in a row is indeed a serious bug. Similar when you forget to call _on() and have multiple _off() calls in a row.
So not sure what to do here. -Daniel
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 607f493..d739d93 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe) spin_lock_irqsave(&dev->event_lock, irqflags);
spin_lock(&dev->vbl_lock);
vblank_disable_and_save(dev, pipe);
DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
vblank->enabled);
if (vblank->enabled)
vblank_disable_and_save(dev, pipe); wake_up(&vblank->queue); /*
@@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe) return;
spin_lock_irqsave(&dev->vbl_lock, irqflags);
DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
vblank->enabled);
/* Drop our private "prevent drm_vblank_get" refcount */ if (vblank->inmodeset) { atomic_dec(&vblank->refcount);
Another, maybe better, approach might be to no-op redundant calls to drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to drm_vblank_on() iff !vblank->inmodeset.
-mario
-mario
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
On 01/25/2016 07:51 PM, Daniel Vetter wrote:
On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
Readding Daniel, which somehow got dropped from the cc.
On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote: > On 23.01.2016 00:18, Ville Syrjälä wrote: >> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: >>> >>> [ Trimming KDE folks from Cc ] >>> >>> On 21.01.2016 19:09, Daniel Vetter wrote: >>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>>>> On 21.01.2016 16:58, Daniel Vetter wrote: >>>>>> >>>>>> Can you please point me at the vblank on/off jump bug please? >>>>> >>>>> AFAIR I originally reported it in response to >>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>>>> , but I can't find that in the archives, so maybe that was just on IRC. >>>>> See >>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>>>> . Basically, I ran into the bug fixed by your patch because the counter >>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after >>>>> just a few days. >>>> >>>> Ok, so just uncovered the overflow bug. >>> >>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off >>> counter jumping bug (similar to the bug this thread is about), which >>> exposed the overflow bug, is still alive and kicking in 4.5. It seems >>> to happen when turning off the CRTC: >>> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 >> >> Not sure what bug we're talking about here, but here the hw counter >> clearly jumps backwards. >> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 >> >> Same here. > > At least one of the jumps is expected, because this is around turning > off the CRTC for DPMS off. Don't know yet why there are two jumps back > though. > > >> These things just don't happen on i915 because drm_vblank_off() and >> drm_vblank_on() are always called around the times when the hw counter >> might get reset. Or at least that's how it should be. > > Which is of course the idea of Daniel's patch (which is what I'm getting > the above with) or Mario's patch as well, but clearly something's still > wrong. It's certainly possible that it's something in the driver, but > since calling drm_vblank_pre/post_modeset from the same places seems to > work fine (ignoring the regression discussed in this thread)... Do > drm_vblank_on/off require something else to handle this correctly? > >
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't.
I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon:
This might be due to the legacy helpers, which just love to redundantly disable stuff that's off already. The problem I see with no-oping these out is that for atomic drivers (which really should get this right) this might paper over bugs: E.g. when you forget to call _off() when disabling the crtc, then calling _on() twice in a row is indeed a serious bug. Similar when you forget to call _on() and have multiple _off() calls in a row.
So not sure what to do here. -Daniel
Yes, the legacy helpers cause two calls to dpms off if one disables a display. First during display disable as intended. Then when one reenables the display during modesetting as part of crtc_funcs->prepare() - at least on radeon.
Maybe the minimum thing that would help is to just check for vblank->inmodeset in drm_vblank_off(). If that would be the case we'd know it is a redundant call and could no-op it and do a WARN_ON(vblank->inmodeset)?
drm_vblank_on() i don't know how to treat, but that one calls drm_reset_vblank_timestamp() which should be less problematic if called redundantly.
Now the patch i want to try next to fix the drm_vblank_pre/post_modeset regression in Linux 4.4/4.5 is to add a ...
if ((diff > 1) && vblank->inmodeset) diff = 1;
... to the bottom of drm_update_vblank_count(). That should hopefully restore the pre/post_modeset behavior as close to the original behavior as possible. As a side effect it would also prevent the counter jump caused by redundant calls to drm_vblank_off().
-mario
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 607f493..d739d93 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe) spin_lock_irqsave(&dev->event_lock, irqflags);
spin_lock(&dev->vbl_lock);
vblank_disable_and_save(dev, pipe);
DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
vblank->enabled);
if (vblank->enabled)
vblank_disable_and_save(dev, pipe); wake_up(&vblank->queue); /*
@@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe) return;
spin_lock_irqsave(&dev->vbl_lock, irqflags);
DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
vblank->enabled);
/* Drop our private "prevent drm_vblank_get" refcount */ if (vblank->inmodeset) { atomic_dec(&vblank->refcount);
Another, maybe better, approach might be to no-op redundant calls to drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to drm_vblank_on() iff !vblank->inmodeset.
-mario
-mario
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario
On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:
On 01/25/2016 07:51 PM, Daniel Vetter wrote:
On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
Readding Daniel, which somehow got dropped from the cc.
On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: > > >On 01/25/2016 05:15 AM, Michel Dänzer wrote: >>On 23.01.2016 00:18, Ville Syrjälä wrote: >>>On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: >>>> >>>>[ Trimming KDE folks from Cc ] >>>> >>>>On 21.01.2016 19:09, Daniel Vetter wrote: >>>>>On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>>>>>On 21.01.2016 16:58, Daniel Vetter wrote: >>>>>>> >>>>>>>Can you please point me at the vblank on/off jump bug please? >>>>>> >>>>>>AFAIR I originally reported it in response to >>>>>>http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>>>>>, but I can't find that in the archives, so maybe that was just on IRC. >>>>>>See >>>>>>http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>>>>>. Basically, I ran into the bug fixed by your patch because the counter >>>>>>jumped forward on every DPMS off, so it hit the 32-bit boundary after >>>>>>just a few days. >>>>> >>>>>Ok, so just uncovered the overflow bug. >>>> >>>>Not sure what you mean by "just", but to be clear: The drm_vblank_on/off >>>>counter jumping bug (similar to the bug this thread is about), which >>>>exposed the overflow bug, is still alive and kicking in 4.5. It seems >>>>to happen when turning off the CRTC: >>>> >>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 >>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 >>> >>>Not sure what bug we're talking about here, but here the hw counter >>>clearly jumps backwards. >>> >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 >>> >>>Same here. >> >>At least one of the jumps is expected, because this is around turning >>off the CRTC for DPMS off. Don't know yet why there are two jumps back >>though. >> >> >>>These things just don't happen on i915 because drm_vblank_off() and >>>drm_vblank_on() are always called around the times when the hw counter >>>might get reset. Or at least that's how it should be. >> >>Which is of course the idea of Daniel's patch (which is what I'm getting >>the above with) or Mario's patch as well, but clearly something's still >>wrong. It's certainly possible that it's something in the driver, but >>since calling drm_vblank_pre/post_modeset from the same places seems to >>work fine (ignoring the regression discussed in this thread)... Do >>drm_vblank_on/off require something else to handle this correctly? >> >> > >I suspect it is because vblank_disable_and_save calls >drm_update_vblank_count() unconditionally, even if vblank irqs are >already off. > >So on a manual display disable -> reenable you get something like > >At disable: > >Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> >vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes >final count. > >Then the crtc is shut down and its hw counter resets to zero. > >At reenable: > >Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> >atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> >drm_vblank_off -> vblank_disable_and_save -> A pointless >drm_update_vblank_count() while the hw counter is already reset to zero >--> Unwanted counter jump. > > >The problem doesn't happen on a pure modeset to a different video >resolution/refresh rate, as then we only have one call into >atombios_crtc_dpms(DPMS_OFF). > >I think the fix is to fix vblank_disable_and_save() to only call >drm_update_vblank_count() if vblank irqs get actually disabled, not on >no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't.
I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon:
This might be due to the legacy helpers, which just love to redundantly disable stuff that's off already. The problem I see with no-oping these out is that for atomic drivers (which really should get this right) this might paper over bugs: E.g. when you forget to call _off() when disabling the crtc, then calling _on() twice in a row is indeed a serious bug. Similar when you forget to call _on() and have multiple _off() calls in a row.
So not sure what to do here. -Daniel
Yes, the legacy helpers cause two calls to dpms off if one disables a display. First during display disable as intended. Then when one reenables the display during modesetting as part of crtc_funcs->prepare() - at least on radeon.
Maybe the minimum thing that would help is to just check for vblank->inmodeset in drm_vblank_off(). If that would be the case we'd know it is a redundant call and could no-op it and do a WARN_ON(vblank->inmodeset)?
I have that here locally, blows up all over the place on radoen. And also would blow up everywhere else.
I was thinking of adding the vblank->inmodeset check to radeon/amdgpu, with a note why it's needed (legacy crtc helpers just suck).
drm_vblank_on() i don't know how to treat, but that one calls drm_reset_vblank_timestamp() which should be less problematic if called redundantly.
I think even legacy crtc helpers don't enable stuff again if it's not been disabled before. So on drm_vblank_on() we might be able to put a WARN_ON in place ...
Now the patch i want to try next to fix the drm_vblank_pre/post_modeset regression in Linux 4.4/4.5 is to add a ...
if ((diff > 1) && vblank->inmodeset) diff = 1;
... to the bottom of drm_update_vblank_count(). That should hopefully restore the pre/post_modeset behavior as close to the original behavior as possible. As a side effect it would also prevent the counter jump caused by redundant calls to drm_vblank_off().
Hm, can we just frob pre/post_modeset only with some checks? I'd like to not put that kind of "I have no idea about my hw state" hacks into the new helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to enforce correct usage, which would be a real bummer imo. -Daniel
On 01/25/2016 09:32 PM, Daniel Vetter wrote:
On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:
On 01/25/2016 07:51 PM, Daniel Vetter wrote:
On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
Readding Daniel, which somehow got dropped from the cc.
On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote: > On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: >> >> >> On 01/25/2016 05:15 AM, Michel Dänzer wrote: >>> On 23.01.2016 00:18, Ville Syrjälä wrote: >>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: >>>>> >>>>> [ Trimming KDE folks from Cc ] >>>>> >>>>> On 21.01.2016 19:09, Daniel Vetter wrote: >>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote: >>>>>>>> >>>>>>>> Can you please point me at the vblank on/off jump bug please? >>>>>>> >>>>>>> AFAIR I originally reported it in response to >>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>>>>>> , but I can't find that in the archives, so maybe that was just on IRC. >>>>>>> See >>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>>>>>> . Basically, I ran into the bug fixed by your patch because the counter >>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after >>>>>>> just a few days. >>>>>> >>>>>> Ok, so just uncovered the overflow bug. >>>>> >>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off >>>>> counter jumping bug (similar to the bug this thread is about), which >>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems >>>>> to happen when turning off the CRTC: >>>>> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 >>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 >>>> >>>> Not sure what bug we're talking about here, but here the hw counter >>>> clearly jumps backwards. >>>> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 >>>> >>>> Same here. >>> >>> At least one of the jumps is expected, because this is around turning >>> off the CRTC for DPMS off. Don't know yet why there are two jumps back >>> though. >>> >>> >>>> These things just don't happen on i915 because drm_vblank_off() and >>>> drm_vblank_on() are always called around the times when the hw counter >>>> might get reset. Or at least that's how it should be. >>> >>> Which is of course the idea of Daniel's patch (which is what I'm getting >>> the above with) or Mario's patch as well, but clearly something's still >>> wrong. It's certainly possible that it's something in the driver, but >>> since calling drm_vblank_pre/post_modeset from the same places seems to >>> work fine (ignoring the regression discussed in this thread)... Do >>> drm_vblank_on/off require something else to handle this correctly? >>> >>> >> >> I suspect it is because vblank_disable_and_save calls >> drm_update_vblank_count() unconditionally, even if vblank irqs are >> already off. >> >> So on a manual display disable -> reenable you get something like >> >> At disable: >> >> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> >> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes >> final count. >> >> Then the crtc is shut down and its hw counter resets to zero. >> >> At reenable: >> >> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> >> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> >> drm_vblank_off -> vblank_disable_and_save -> A pointless >> drm_update_vblank_count() while the hw counter is already reset to zero >> --> Unwanted counter jump. >> >> >> The problem doesn't happen on a pure modeset to a different video >> resolution/refresh rate, as then we only have one call into >> atombios_crtc_dpms(DPMS_OFF). >> >> I think the fix is to fix vblank_disable_and_save() to only call >> drm_update_vblank_count() if vblank irqs get actually disabled, not on >> no-op calls. I will try that now. > > It does that on purpose. Otherwise the vblank counter would appear to > have stalled while the interrupt was off. >
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't.
I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon:
This might be due to the legacy helpers, which just love to redundantly disable stuff that's off already. The problem I see with no-oping these out is that for atomic drivers (which really should get this right) this might paper over bugs: E.g. when you forget to call _off() when disabling the crtc, then calling _on() twice in a row is indeed a serious bug. Similar when you forget to call _on() and have multiple _off() calls in a row.
So not sure what to do here. -Daniel
Yes, the legacy helpers cause two calls to dpms off if one disables a display. First during display disable as intended. Then when one reenables the display during modesetting as part of crtc_funcs->prepare() - at least on radeon.
Maybe the minimum thing that would help is to just check for vblank->inmodeset in drm_vblank_off(). If that would be the case we'd know it is a redundant call and could no-op it and do a WARN_ON(vblank->inmodeset)?
I have that here locally, blows up all over the place on radoen. And also would blow up everywhere else.
You mean the WARN_ON causes ugliness? The no-op on redundant calls to drm_vblank_off would hopefully not blow up anything but prevent blow ups?
The problem is that there should not ever be a call to the drm_update_vblank_count() function once a crtc is in modeset/dpms off etc., not only because of the hw vblank counters being reset to zero, but also because vblank timestamps computed may be wrong, going backwards etc. That could again confuse clients.
I was thinking of adding the vblank->inmodeset check to radeon/amdgpu, with a note why it's needed (legacy crtc helpers just suck).
Maybe you could do that check in radeon/amdgpu, but still also leave it in drm_vblank_off()? If all kms drivers properly avoid redundant calls as part of legacy paths then the WARN_ON and no-op in drm_vblank_off should not ever trigger, unless there is a real bug, right? In which case it should hopefully prevent worse things like a hanging composited desktop, or login, and instead just make noise in the kernel log?
drm_vblank_on() i don't know how to treat, but that one calls drm_reset_vblank_timestamp() which should be less problematic if called redundantly.
I think even legacy crtc helpers don't enable stuff again if it's not been disabled before. So on drm_vblank_on() we might be able to put a WARN_ON in place ...
Hm, logging thist stuff here during modesets and display dis/enable i see lots of drm_vblank_on that come in "pairs" only about two dozen msecs apart.
Now the patch i want to try next to fix the drm_vblank_pre/post_modeset regression in Linux 4.4/4.5 is to add a ...
if ((diff > 1) && vblank->inmodeset) diff = 1;
... to the bottom of drm_update_vblank_count(). That should hopefully restore the pre/post_modeset behavior as close to the original behavior as possible. As a side effect it would also prevent the counter jump caused by redundant calls to drm_vblank_off().
Hm, can we just frob pre/post_modeset only with some checks? I'd like to not put that kind of "I have no idea about my hw state" hacks into the new helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to enforce correct usage, which would be a real bummer imo. -Daniel
We could check for only (vblank->inmodeset & 0x2) to only apply it to the legacy pre/post path, trusting that the drm_vblank_off/on path will be made robust in a different way, e.g., by the stuff discussed above and careful implementation in each kms driver that uses those. Atm. radeon doesn't use off/on, so your enablement patch set can make sure it does the right thing from the beginning.
rockchip-kms may need similar treatment to radeon to avoid redundant calls.
Btw. how the patch to drm_update_vblank_count() close to the bottom would actually look is more like:
if ((diff > 1) && ((vblank->inmodeset & 0x2) || (flags & DRM_CALLED_FROM_VBLIRQ))) diff = 1;
Another problem we have is that the implementation of the vblank timestamp doublebuffer with our custom sequence locking is only save against concurrent readers as long as the increment for store_vblank() is +1, ie. diff = 1 in that code.
Other diff values like 2 would cause us to write to the other timestamp slot that is potentially read by some concurrent readers at the same time and they would get corrupted timestamps.
Now outside vblank irq drm_update_vblank_count() is only supposed to be called while there aren't any concurrent readers ie., vblank->refcount == 0, and the way drm_vblank_on/off are implemented this seems to be the case, so anything goes.
if called from drm_handle_vblank() aka (flags & DRM_CALLED_FROM_VBLIRQ), we have to assume that there are concurrent readers, because vblank irqs are usually only kept active if vblank->refcount is > 0. This means only +1 increments are allowed. In the past this was always the case, but with the new implementation since Linux 4.4, it could happen that we get diff > 1 if the vblank irq gets deferred by more than 1 video refresh cycle, e.g., due to long irq off periods, maybe preemption on a RT_PREEMPT kernel, long held locks, firmware triggered SMI's etc. That would cause bad corruption of timestamps.
So unless or until we also rewrite the timestamp caching, we need that extra protection against diff > 1 in vblank irq.
-mario
On Mon, Jan 25, 2016 at 10:42 PM, Mario Kleiner mario.kleiner.de@gmail.com wrote:
Now the patch i want to try next to fix the drm_vblank_pre/post_modeset regression in Linux 4.4/4.5 is to add a ...
if ((diff > 1) && vblank->inmodeset) diff = 1;
... to the bottom of drm_update_vblank_count(). That should hopefully restore the pre/post_modeset behavior as close to the original behavior as possible. As a side effect it would also prevent the counter jump caused by redundant calls to drm_vblank_off().
Hm, can we just frob pre/post_modeset only with some checks? I'd like to not put that kind of "I have no idea about my hw state" hacks into the new helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to enforce correct usage, which would be a real bummer imo. -Daniel
We could check for only (vblank->inmodeset & 0x2) to only apply it to the legacy pre/post path, trusting that the drm_vblank_off/on path will be made robust in a different way, e.g., by the stuff discussed above and careful implementation in each kms driver that uses those. Atm. radeon doesn't use off/on, so your enablement patch set can make sure it does the right thing from the beginning.
rockchip-kms may need similar treatment to radeon to avoid redundant calls.
Btw. how the patch to drm_update_vblank_count() close to the bottom would actually look is more like:
if ((diff > 1) && ((vblank->inmodeset & 0x2) || (flags & DRM_CALLED_FROM_VBLIRQ))) diff = 1;
Yeah I think that should work as a short-term fix for radoen. When you do that, can you pls do a second patch to give the magic 0x2 and 0x1 some meaning? Otherwise this is super-hard to understand code ... -Daniel
On 01/21/2016 09:28 AM, Mario Kleiner wrote:
... just like drm_vblank_pre/post_modeset. That those were broken is a regression which needs to be fixed anyway. I don't think switching to drm_vblank_on/off is suitable for stable trees.
Looking at Vlastimil's original post again, I'd say the most likely culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many vblanks were missed").
Yeah, this is what I bisected to.
Yes, i think reverting that one alone would likely fix it by reverting to the old vblank update logic.
Yep I said in the original mail that reverting on top of 4.4 fixed it. Well not just this single commit, but also some patches on top (e.g. radeon and amdgpu adaptations to that commit, IIRC it wouldn't have compiled otherwise).
Once drm_vblank_off is called, drm_vblank_get will no-op and return an error, so clients can't enable vblank irqs during the modeset - pageflip ioctl and waitvblank ioctl would fail while a modeset happens - hopefully userspace handles this correctly everywhere.
We've fixed xf86-video-ati for this.
I'll hack up a patch for demonstration now.
You're a bit late to that party. :)
http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
Oops. Just sent out my little (so far untested) creations. Yes, they are essentially the same as Daniel's patches. The only addition is to also fix that other potential small race i describe by slightly moving the xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put imbalance in radeon_pm if vblank_on/off would be used.
Thanks, I'll test.
-mario
dri-devel@lists.freedesktop.org