Hello,
Using an Atom E3845 board, we hit a pretty bad performance regression when upgrading from 3.19 to 4.0-rc6. With the help of git bisect, I traced it back to commit 78a42377. Reverting this commit and the subsequent related commits (b9ffd80, 71745376, etc.) fixes the regression for me.
Without those patches, I can play 8-9 1080p MPEG2 streams; with them, it's down to 5-6.
I tested using a libdrm checkout from Feb 16 and the latest git master of libva, libva-intel-driver and gst-plugins-vaapi. The "identity drop-probability=1" element drops every buffer so that nothing is displayed; the test therefore measures decoding performance only.
Pure decode, single stream not displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! identity drop-probability=1 ! vaapisink
With kernel 3.18.0-rc7-01052-g493018d:
real 0m11.429s
user 0m6.516s
sys  0m1.640s
With kernel 3.18.0-rc7-01053-g78a4237:
real 0m12.694s
user 0m6.744s
sys  0m2.680s
8 simultaneous streams displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0
With kernel 3.18.0-rc7-01052-g493018d:
real 2m45.317s
user 1m21.296s
sys  0m51.080s
With kernel 3.18.0-rc7-01053-g78a4237:
real 3m1.275s
user 1m24.336s
sys  1m38.360s
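To put the two bisect points side by side, the relative change per `time` field can be computed with a short Python sketch. The helper names (`to_seconds`, `pct_change`, `RUNS`) are illustrative only; the figures are copied from the single-stream and 8-stream runs above, with "before" being 01052-g493018d and "after" being 01053-g78a4237.

```python
"""Relative change in bash `time` output between the two bisect points."""


def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '2m45.317s' to seconds."""
    minutes, rest = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest)


def pct_change(before: str, after: str) -> float:
    """Percentage change from `before` to `after` (positive means slower)."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100.0


# (before, after) pairs copied from the benchmark results in this thread.
RUNS = {
    "1 stream": {
        "real": ("0m11.429s", "0m12.694s"),
        "user": ("0m6.516s", "0m6.744s"),
        "sys": ("0m1.640s", "0m2.680s"),
    },
    "8 streams": {
        "real": ("2m45.317s", "3m1.275s"),
        "user": ("1m21.296s", "1m24.336s"),
        "sys": ("0m51.080s", "1m38.360s"),
    },
}

if __name__ == "__main__":
    for name, fields in RUNS.items():
        for field, (before, after) in fields.items():
            print(f"{name:9} {field:4} {pct_change(before, after):+6.1f}%")
```

By this arithmetic, real time rises by about 11% (1 stream) and 10% (8 streams) and user time by under 4%, while sys time rises by about 63% and 93% respectively.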
On Thu, Apr 09, 2015 at 09:00:43PM -0400, Olivier Crête wrote:
Can you please test
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=for-olivier-crete
on your setup.
First test http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&... to get a baseline against nightly, since that contains some fine-tuning of the batch allocations, which is significant for libva on Atom (only double-clflushing one or two pages per batch rather than 128). Then test http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&... to see whether the command parser tuning helps.
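A rough sense of why flushing "one or two pages rather than 128" matters: `clflush` operates on one cache line at a time, so the instruction count scales with the bytes flushed. The sketch below assumes 4 KiB pages, 64-byte cache lines and the same number of flush passes (two, reading "double clflushing" literally) in both cases; none of these constants come from the patches themselves.

```python
# Rough clflush instruction count per batch. All constants are assumptions.
PAGE_SIZE = 4096   # assumed: 4 KiB x86 pages
CACHE_LINE = 64    # assumed: 64-byte cache lines on this Atom
FLUSH_PASSES = 2   # assumed: "double clflushing" = each page flushed twice


def clflushes_per_batch(pages: int) -> int:
    """clflush instructions needed to flush `pages` pages FLUSH_PASSES times."""
    return FLUSH_PASSES * pages * (PAGE_SIZE // CACHE_LINE)


if __name__ == "__main__":
    for pages in (1, 2, 128):
        print(pages, clflushes_per_batch(pages))
```

Under these assumptions, going from 128 pages to one or two pages per batch cuts the flush work by a factor of 64 to 128, regardless of the exact number of passes, as long as it is the same in both cases.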
Hope this helps,
-Chris
Hello,
Thanks for the quick reply!
With my real use-cases:
1. 9x 720p60 mpeg2 videos
   - 4.0-rc6: ~12 frames per second are on time
   - 4.0-rc6 + reverts: a stable 45 frames per second are on time
   - 044307a9: 40-45 frames per second are on time
   - 0a24802a: 45-46 frames per second are on time
2. 1080i30 mpeg2 videos
   - 4.0-rc6: 5 videos
   - 044307a9: 10 videos
   - 0a24802a: 10 videos
So you beat my baseline too. Good job, and thanks a lot! Any chance you can sneak this into 4.0?
Olivier
On Fri, 2015-04-10 at 07:23 +0100, Chris Wilson wrote:
dri-devel@lists.freedesktop.org