On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
On 01/25/2016 05:15 AM, Michel Dänzer wrote:
On 23.01.2016 00:18, Ville Syrjälä wrote:
On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
[ Trimming KDE folks from Cc ]
On 21.01.2016 19:09, Daniel Vetter wrote: > On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >> On 21.01.2016 16:58, Daniel Vetter wrote: >>> >>> Can you please point me at the vblank on/off jump bug please? >> >> AFAIR I originally reported it in response to >> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >> , but I can't find that in the archives, so maybe that was just on IRC. >> See >> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >> . Basically, I ran into the bug fixed by your patch because the counter >> jumped forward on every DPMS off, so it hit the 32-bit boundary after >> just a few days. > > Ok, so just uncovered the overflow bug.
Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC:
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards.
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
Same here.
At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though.
These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be.
Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly?
I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off.
So on a manual display disable -> reenable you get something like
At disable:
Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count.
Then the crtc is shut down and its hw counter resets to zero.
At reenable:
Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump.
The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF).
I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now.
It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off.
Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all careful to not call that function if vblanks are already disabled. The only exception is drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms drivers which have resetting hw counters or other problematic behaviour during modesets etc. then this will break. E.g., calling the vblank timestamping stuff is also not safe/well-defined during modesets when the timestamping constants are not (yet) updated to reflect the new mode timing of the modeset in progress.
The idea is to maintain the appearance that the counter ticks all the time as long as the crtc is active. While that may not be really required in case if no one is currently interested in the vblank counter, I think it's a nice thing to have just to make the behaviour of the counter consistent.
As far as calling drm_vblank_off() after the hw counter got reset, well, that not correct. It should be called before the reset.
-mario
Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine.
-mario