On Thu, Nov 19, 2015 at 10:06:01PM +0200, Ville Syrjälä wrote:
On Thu, Nov 19, 2015 at 05:44:51PM -0200, Paulo Zanoni wrote:
2014-05-26 11:26 GMT-03:00 ville.syrjala@linux.intel.com:
From: Ville Syrjälä ville.syrjala@linux.intel.com
Now that the vblank races are plugged, we can opt out of using the vblank disable timer and just let vblank interrupts get disabled immediately when the last reference is dropped.
Gen2 is the exception since it has no hardware frame counter.
Hi
Remember last week's FBC vblank optimization patch that had an erroneous drm_crtc_vblank_get() instead of drm_crtc_vblank_count()? After I fixed the bug in the patch I realized that it was the unbalanced vblank_get() call that moved PC state residency up.
I did some experiments, and on my specific BDW machine, after running "powertop --auto-tune", I get about 15-25% PC7 residency without FBC. If I revert this patch, the number jumps to 40-45%. With FBC, the PC7 residency goes from 60-70% to 85-90% when I revert this patch. I'm running just an idle Cinnamon with an open terminal.
So, since the commit message lacks more details, what are the downsides of reverting this patch? What are the advantages of opting out of the vblank timer? I see my desktop does tons and tons of vblank get/put calls per second, so the disable timer makes a lot of sense.
"Idle" desktop :(
Really the immediate disable should save power. Where are these tons of vblank get/puts coming from actually? I would assume you'd get a handful per frame at most, and that when you're actually doing something. On an idle system I would expect nothing at all happens during most frames.
Yes, with the immediate disable we end up with a few get/put per frame of rendering, as userspace queries and queues the next flip/swap. With more clients, there are more opportunities for queries prior to queuing the swap. It really shouldn't be more than a handful per frame if all the clients are vrefresh limited. (The oddity was the vblank_mode=0 rendering where we still maintained the handful of get/put per frame...)
I also wish there was some easy way to check how this patch (or its revert) affect a bunch of different workloads...
Maybe add the PC residencies to the power meter in intel-gpu-overlay, alongside the RC residencies?
(Also CCing Chris for insightful comments on performance)
IIRC Chris had a patch to not disable the interrupt immediately when the refcount drops to 0, but instead delay the disable until the next interrupt actually happens. But I guess it didn't go in? Probably I should have reviewed it but didn't. It sounds like a decent idea to me in any case for the active use case.
The discussion went off into the barriers around the vblank counter, and I forgot to bring it up again. I think even Mario was happy enough about it.
Paulo, you may want to try
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=ef97f4...
or you can change the KEEPALIVES value in src/sna/sna_dri2.c for a similar effect. -Chris