https://bugs.freedesktop.org/show_bug.cgi?id=71448
Priority: medium
Bug ID: 71448
Assignee: dri-devel@lists.freedesktop.org
Summary: [UVD] qvdpautest is very slow on radeonsi (HD 7950)
Severity: normal
Classification: Unclassified
OS: All
Reporter: darkbasic@linuxsystems.it
Hardware: Other
Status: NEW
Version: XOrg CVS
Component: DRM/Radeon
Product: DRI
http://bpaste.net/show/148239/
kernel is 3.13 (~agd5f drm-next-3.13). The whole graphic stack is from git except xorg-server which is 1.14.3.
--- Comment #1 from Vladimir Ysikov grantipak@gmail.com --- ArchLinux x32; kernel 3.12; llvm - svn; mesa - git; xorg-server 1.14.4; Radeon HD 7950
qvdpautest 0.5.2
AMD Phenom(tm) 9550 Quad-Core Processor
Unknown GPU
VDPAU API version : 1
VDPAU implementation : G3DVL VDPAU Driver Shared Library version 1.0
MPEG DECODING (1920x1080): 19 frames/s
MPEG DECODING (1280x720): 19 frames/s
H264 DECODING (1920x1080): 15 frames/s
H264 DECODING (1280x720): 16 frames/s
MPEG4 DECODING (1920x1080): 15 frames/s
MIXER WEAVE (1920x1080): 3293 frames/s
MIXER BOB (1920x1080): 3878 fields/s
MIXER TEMPORAL (1920x1080): 3884 fields/s
MIXER TEMPORAL + IVTC (1920x1080): 3881 fields/s
MIXER TEMPORAL + SKIP_CHROMA (1920x1080): 3895 fields/s
MIXER TEMPORAL_SPATIAL (1920x1080): 3881 fields/s
MIXER TEMPORAL_SPATIAL + IVTC (1920x1080): 3885 fields/s
MIXER TEMPORAL_SPATIAL + SKIP_CHROMA (1920x1080): 3888 fields/s
MIXER TEMPORAL_SPATIAL (720x576 video to 1920x1080 display): 3439 fields/s
MULTITHREADED MPEG DECODING (1920x1080): 76 frames/s
MULTITHREADED MIXER TEMPORAL (1920x1080): 3930 fields/s
--- Comment #2 from Alex Deucher agd5f@yahoo.com --- Make sure dpm is enabled. Add radeon.dpm=1 to the kernel command line in grub.
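One way to confirm that the option actually reached the running kernel is to look for it in the kernel command line. The following is only a minimal Python sketch: the helper names `radeon_dpm_setting` and `read_cmdline` are made up for illustration, and keeping the last occurrence when the option is given more than once is an assumption about precedence, not something stated in this report.

```python
from typing import Optional


def radeon_dpm_setting(cmdline: str) -> Optional[str]:
    """Return the value given for radeon.dpm on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            # Assumption: if the option is repeated, keep the last occurrence.
            value = token.split("=", 1)[1]
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().strip()
```

On a Linux machine, `radeon_dpm_setting(read_cmdline())` would be expected to return "1" once the grub change has taken effect, and None if the option is missing.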
--- Comment #3 from Mike Lothian mike@fireburn.co.uk --- I'm not sure if it's related but make sure your xserver is patched to work with the latest mesa fd1b24a93e ("glx: Add support for the new DRI")
--- Comment #4 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #0)
In the future, please attach the output rather than referring to an external site that may go away at some point.
kernel is 3.13 (~agd5f drm-next-3.13). The whole graphic stack is from git except xorg-server which is 1.14.3.
from your log: FATAL: get_bits failed : No backend implementation could be loaded.!!
There's some problem with your build.
--- Comment #5 from Christian König deathsimple@vodafone.de --- (In reply to comment #4)
FATAL: get_bits failed : No backend implementation could be loaded.!!
There's some problem with your build.
That message is normal, just a function we haven't implemented yet.
But I agree the numbers look like you are on the bootup clocks for UVD/graphics or something is going wrong with dpm.
--- Comment #6 from darkbasic darkbasic@linuxsystems.it --- dpm is enabled of course (because I set radeon.dpm=1 and because 3.13 should have dpm enabled by default afaik). When using UVD with dpm set to auto it switches from the lowest state to the highest lots of times, again and again. *Anyway* when I did run the attached benchmark I forced dpm to "high" before starting.
--- Comment #7 from darkbasic darkbasic@linuxsystems.it --- Mike, I'm upgrading to 1.14.4 and patching with glx: Add support for the new DRI loader entrypoint: http://cgit.freedesktop.org/xorg/xserver/commit/?id=7ecfab47eb221dbb996ea6c0...
--- Comment #8 from darkbasic darkbasic@linuxsystems.it --- I applied 'glx: Add support for the new DRI loader entrypoint' to xorg-server-1.14.4 and I updated the rest of the graphic stack to latest snapshot from git master: nothing changes.
While the test was running I got a "Bus error":
Fontconfig warning: "/etc/fonts/conf.d/50-user.conf", line 14: reading configurations from ~/.fonts.conf is deprecated.
qvdpautest 0.5.2
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Unknown GPU
VDPAU API version : 1
VDPAU implementation : G3DVL VDPAU Driver Shared Library version 1.0
FATAL: get_bits failed : No backend implementation could be loaded.!!
MPEG DECODING (1920x1080): 8 frames/s
MPEG DECODING (1280x720): 5 frames/s
Bus error
I also noticed that even though I ran "echo high > /sys/devices/pci0000:00/0000:00:1c.6/0000:03:00.0/power_dpm_force_performance_level" I still get lots of power state switches in dmesg.
Please see attached dmesg.
I also noticed lots of "HDMI: ELD buf size is 0, force 128" and "HDMI: invalid ELD data byte 0" in my dmesg. Maybe something audio related? Monitor is attached using DVI, not HDMI.
--- Comment #9 from darkbasic darkbasic@linuxsystems.it --- Created attachment 89017 --> https://bugs.freedesktop.org/attachment.cgi?id=89017&action=edit dmesg
dmesg after running qvdpautest
--- Comment #10 from Vadim Girlin ptpzz@yandex.ru --- This is probably related to dpm and gpu clocks - if I run "vblank_mode=0 glxgears" in parallel with the benchmark the results are significantly better for me:
w/o gears:
MPEG DECODING (1920x1080): 13 frames/s
MPEG DECODING (1280x720): 13 frames/s
H264 DECODING (1920x1080): 12 frames/s
H264 DECODING (1280x720): 13 frames/s
with gears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 118 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 92 frames/s
(In reply to comment #6)
*Anyway* when I did run the attached benchmark I forced dpm to "high" before starting.
Setting power_dpm_force_performance_level to "high" doesn't really work for me in this case - AFAICS the driver resets it back to "auto" when the benchmark starts, probably when switching to uvd state.
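To put a number on the difference between the two runs in comment #10, the result lines can be parsed and divided. This is a rough Python sketch, not part of qvdpautest: the names `parse_results` and `speedups` are hypothetical, and the line format is assumed from the output quoted in this report.

```python
import re

# One qvdpautest result line, e.g. "MPEG DECODING (1920x1080): 13 frames/s".
RESULT_RE = re.compile(r"^(?P<name>.+?):\s+(?P<rate>\d+)\s+(?:frames|fields)/s$")

# Figures copied from comment #10.
WITHOUT_GEARS = """\
MPEG DECODING (1920x1080): 13 frames/s
MPEG DECODING (1280x720): 13 frames/s
H264 DECODING (1920x1080): 12 frames/s
H264 DECODING (1280x720): 13 frames/s
"""

WITH_GEARS = """\
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 118 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 92 frames/s
"""


def parse_results(text):
    """Map each test name to its reported rate; lines that do not match are skipped."""
    results = {}
    for line in text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results[match.group("name")] = int(match.group("rate"))
    return results


def speedups(baseline, loaded):
    """Return loaded/baseline for every test present in both runs."""
    return {
        name: loaded[name] / rate
        for name, rate in baseline.items()
        if name in loaded and rate > 0
    }


for name, factor in speedups(parse_results(WITHOUT_GEARS), parse_results(WITH_GEARS)).items():
    print(f"{name}: {factor:.2f}x")
```

With these figures every decode test comes out between roughly 4x and 9x faster while glxgears is running, which is far more than run-to-run noise would explain.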
--- Comment #11 from Grigori Goronzy greg@chown.ath.cx --- (In reply to comment #5)
(In reply to comment #4)
FATAL: get_bits failed : No backend implementation could be loaded.!!
There's some problem with your build.
That message is normal, just a function we haven't implemented yet.
As far as I can see it's actually illegal API usage in qvdpautest. It's trying to read from uninitialized video surfaces, which is not guaranteed to work. Swapping around the order of tests so that it does the PutBits test first fixes it.
--- Comment #12 from Christian König deathsimple@vodafone.de --- (In reply to comment #8)
Also I noticed that despite I did "echo high > /sys/devices/pci0000:00/0000:00:1c.6/0000:03:00.0/power_dpm_force_performance_level" I still get lots of power states switching in dmesg.
As Vadim correctly noted, forcing any power state doesn't work here, because we need to switch to the UVD power state anyway.
BTW: Is this a regression?
--- Comment #13 from Christian König deathsimple@vodafone.de --- (In reply to comment #10)
This is probably related to dpm and gpu clocks - if I run "vblank_mode=0 glxgears" in parallel with the benchmark the results are significantly better for me:
w/o gears:
MPEG DECODING (1920x1080): 13 frames/s
MPEG DECODING (1280x720): 13 frames/s
H264 DECODING (1920x1080): 12 frames/s
H264 DECODING (1280x720): 13 frames/s
with gears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 118 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 92 frames/s
^^ very valuable comment, thx.
So do I get that right that generating 3D load affects UVD decoding performance here?
--- Comment #14 from Christian König deathsimple@vodafone.de --- (In reply to comment #11)
(In reply to comment #5)
(In reply to comment #4)
FATAL: get_bits failed : No backend implementation could be loaded.!!
There's some problem with your build.
That message is normal, just a function we haven't implemented yet.
As far as I can see it's actually illegal API usage in qvdpautest. It's trying to read from uninitialized video surfaces, which is not guaranteed to work. Swapping around the order of tests so that it does the PutBits test first fixes it.
Thx for the info. qvdpautest is badly written in many respects (takes too much time, is inaccurate etc...). Would be nice if somebody could sit down and either write something new from scratch or start to improve it.
Some rather stupid command-line tool with a couple of options for testing different decoding profile and output methods should be perfectly sufficient.
--- Comment #15 from darkbasic darkbasic@linuxsystems.it --- Here is mine while running glxgears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 16 frames/s
H264 DECODING (1280x720): 91 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 72 frames/s
No, it isn't a regression: it never worked for me.
--- Comment #16 from Vadim Girlin ptpzz@yandex.ru --- (In reply to comment #13)
So do I get that right that generation 3D load affects UVD decoding performance here?
Yes, I think the explanation of the difference in benchmark results is that glxgears triggers a higher gpu power level. Probably the benchmark alone simply doesn't provide enough load for that, or maybe something is wrong with the dpm logic. Here is what I see while running the benchmark:
w/o gears:
power level 0 sclk: 45000 mclk: 120000 vddc: 900 vddci: 975 pcie gen: 2
with gears:
power level 2 sclk: 100000 mclk: 120000 vddc: 1219 vddci: 975 pcie gen: 2
uvd clocks are the same in both cases:
uvd vclk: 72000 dclk: 56000
when the system is completely idle I see the following values:
uvd vclk: 0 dclk: 0
power level 0 sclk: 30000 mclk: 15000 vddc: 825 vddci: 850 pcie gen: 2
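Power-state lines like the ones quoted in comment #16 are easier to compare once they are turned into key/value pairs. The Python sketch below is only an illustration: `parse_pm_line` and `changed_fields` are hypothetical helpers, the field names are taken from the lines quoted in this report, and nothing is assumed about the units of the values.

```python
import re

# Fields that appear in the power-state lines quoted in this report.
FIELD_RE = re.compile(r"(sclk|mclk|vddc|vddci|vclk|dclk|pcie gen):\s*(\d+)")
LEVEL_RE = re.compile(r"power level (\d+)")


def parse_pm_line(line):
    """Turn one power-state line into a dict of integer fields."""
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    level = LEVEL_RE.search(line)
    if level:
        fields["power level"] = int(level.group(1))
    return fields


def changed_fields(before, after):
    """Return {field: (old, new)} for fields present in both dicts that differ."""
    return {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }


without_gears = parse_pm_line(
    "power level 0 sclk: 45000 mclk: 120000 vddc: 900 vddci: 975 pcie gen: 2"
)
with_gears = parse_pm_line(
    "power level 2 sclk: 100000 mclk: 120000 vddc: 1219 vddci: 975 pcie gen: 2"
)
print(changed_fields(without_gears, with_gears))
```

For the two lines from comment #16 only the power level, sclk and vddc differ; mclk, vddci and the PCIe generation are identical in both runs.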
--- Comment #17 from darkbasic darkbasic@linuxsystems.it --- Anyway even with glxgears it's far from being perfect. See this video: http://www.youtube.com/watch?v=aM3aRiKgxwM
--- Comment #18 from Alex Deucher agd5f@yahoo.com --- With kernel 3.13, the driver retains the user selected performance level across state changes. Additionally, when using a UVD state, the sclk and mclk are always forced to their highest levels. This isn't reflected in the debugfs output since that just prints the unpatched power state. Does plain video playback work ok (i.e., not qvdpautest)?
--- Comment #19 from darkbasic darkbasic@linuxsystems.it --- Alex, from the tests I did, 3.13 doesn't seem to behave the way you described.
With power state set to "high", desktop effect OFF and no glxgears:
MPEG DECODING (1920x1080): 5 frames/s
MPEG DECODING (1280x720): 21 frames/s
H264 DECODING (1920x1080): 8 frames/s
H264 DECODING (1280x720): 5 frames/s
With power state set to "high", desktop effect OFF and glxgears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 16 frames/s
H264 DECODING (1280x720): 91 frames/s
With power state set to "high", desktop effect ON and glxgears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 91 frames/s
It seems even glxgears wasn't able to keep highest power state in all tests: enabling desktop effects was enough to keep higher state in the second-last test.
What do you mean by "plain video playback"? If you mean without using vdpau it works flawlessly.
--- Comment #20 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #19)
What do you mean by "plain video playback"? If you mean without using vdpau it works flawlessly.
Just play a video with vdpau using mplayer or some other app that supports vdpau. I'm wondering if perhaps the way qvdpautest works causes the driver to switch between power states too often so the clocks never get a chance to stabilize.
--- Comment #21 from darkbasic darkbasic@linuxsystems.it --- No, I have the very same problem with mplayer2 + vdpau and adobe flash + vdpau.
--- Comment #22 from Vladimir Ysikov grantipak@gmail.com --- (In reply to comment #14)
Some rather stupid command-line tool with a couple of options for testing different decoding profile and output methods should be perfectly sufficient.
mplayer -benchmark ?
http://www.w6rz.net/1080p25.zip
kwin desktop effect ON and no glxgears
mplayer -vo gl -benchmark -nosound 1080p25.ts
...
BENCHMARKs: VC: 79.531s VO: 27.184s A: 0.000s Sys: 2.558s = 109.273s
BENCHMARK%: VC: 72.7814% VO: 24.8773% A: 0.0000% Sys: 2.3413% = 100.0000%
kwin desktop effect ON and no glxgears
mplayer -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
...
BENCHMARKs: VC: 2.425s VO: 38.371s A: 0.000s Sys: 3.190s = 43.986s
BENCHMARK%: VC: 5.5141% VO: 87.2335% A: 0.0000% Sys: 7.2523% = 100.0000%
kwin desktop effect ON and glxgears
mplayer -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
...
BENCHMARKs: VC: 2.449s VO: 38.074s A: 0.000s Sys: 3.748s = 44.271s
BENCHMARK%: VC: 5.5325% VO: 86.0010% A: 0.0000% Sys: 8.4665% = 100.0000%
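The BENCHMARK% line is just each timing from the BENCHMARKs line divided by the total, so the split between decoding (VC) and output (VO) can be recomputed from the seconds alone. A minimal Python sketch, assuming the line layout quoted in comment #22; `parse_benchmark` and `shares` are names invented here:

```python
import re

# Matches the per-component timings, e.g. "VC: 79.531s"; the trailing total is skipped.
TIME_RE = re.compile(r"\b(VC|VO|A|Sys):\s*([\d.]+)s")


def parse_benchmark(line):
    """Extract the VC/VO/A/Sys timings (in seconds) from a BENCHMARKs line."""
    return {key: float(value) for key, value in TIME_RE.findall(line)}


def shares(times):
    """Express each timing as a percentage of the summed total."""
    total = sum(times.values())
    return {key: 100.0 * value / total for key, value in times.items()}


software = parse_benchmark(
    "BENCHMARKs: VC: 79.531s VO: 27.184s A: 0.000s Sys: 2.558s = 109.273s"
)
vdpau = parse_benchmark(
    "BENCHMARKs: VC: 2.425s VO: 38.371s A: 0.000s Sys: 3.190s = 43.986s"
)
for label, run in (("-vo gl", software), ("-vo vdpau", vdpau)):
    print(label, {key: round(value, 2) for key, value in shares(run).items()})
```

The recomputed shares agree with mplayer's own BENCHMARK% figures to within rounding of the printed seconds. They show the time moving from VC in the software-decoded run to VO in the vdpau runs.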
--- Comment #23 from darkbasic darkbasic@linuxsystems.it --- I used to use the mplayer benchmark before switching to qvdpautest. Unfortunately the results are not comparable because other people need to have the very same videos (which tend to go offline after some months). qvdpautest is the right way to go in my opinion, we just need someone to fix the remaining bugs.
--- Comment #24 from darkbasic darkbasic@linuxsystems.it --- kwin desktop effects ON and glxgears
mplayer2 -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
BENCHMARKs: VC: 5.073s VO: 105.829s A: 0.000s Sys: 3.263s = 114.164s
BENCHMARK%: VC: 4.4433% VO: 92.6989% A: 0.0000% Sys: 2.8578% = 100.0000%
Mplayer shows the behaviour I wanted to show with the previous video better: it takes *A LOT* of time to start the benchmark, but then it's quite fast. At least while glxgears is running: without glxgears it takes ages.
Here is a second video showing the lag I'm talking about: http://www.youtube.com/watch?v=BDhB61U9S0A As you can see, once it starts, decoding is quite fast.
--- Comment #25 from darkbasic darkbasic@linuxsystems.it --- DPM still doesn't work with 3.14-rc0 :(:(:(
--- Comment #26 from Alex Deucher agd5f@yahoo.com --- Does forcing the power state to high help? As root:
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
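Because earlier comments report the level being reset behind the user's back, it can help to read the node back after writing it. The following Python sketch wraps the same sysfs file as the echo command above. The helper names are hypothetical, the card0 path is taken from that command and may differ on other systems, and the list of accepted values ("auto", "low", "high") is stated here as an assumption about the radeon driver rather than something this report confirms.

```python
from pathlib import Path

# Path taken from the echo command in comment #26; card0 may not be the right card.
DEFAULT_NODE = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# Assumption: the values the radeon driver accepts for this node.
VALID_LEVELS = ("auto", "low", "high")


def read_level(node=DEFAULT_NODE):
    """Return the performance level currently reported by the sysfs node."""
    return Path(node).read_text().strip()


def force_level(level, node=DEFAULT_NODE):
    """Write a performance level (needs root on real hardware) and read it back."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level {level!r}, expected one of {VALID_LEVELS}")
    Path(node).write_text(level + "\n")
    # Reading back shows whether the value stuck or was reset in the meantime.
    return read_level(node)
```

Calling `read_level()` again while the benchmark is running would show whether the driver has gone back to "auto", as described in comment #10.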
--- Comment #27 from darkbasic darkbasic@linuxsystems.it --- Power state high doesn't help.
qvdpautest + 3.14-rc0 + AUTO + KWIN OFF:
MPEG DECODING (1920x1080): 8 frames/s
MPEG DECODING (1280x720): 9 frames/s
H264 DECODING (1920x1080): 8 frames/s
H264 DECODING (1280x720): 8 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 8 frames/s
qvdpautest + 3.14-rc0 + HIGH + KWIN OFF: the same
qvdpautest + 3.14-rc0 + HIGH + KWIN ON + glxgears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 90 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 71 frames/s
Fortunately I noticed a great improvement with 3.14: I don't have the huge lag before the start of the benchmark and glxgears doesn't freeze anymore like in http://www.youtube.com/watch?v=BDhB61U9S0A
--- Comment #28 from Christian König deathsimple@vodafone.de --- Created attachment 93072 --> https://bugs.freedesktop.org/attachment.cgi?id=93072&action=edit Fix.
Sorry that it took me so long to find this. It's a rather simple issue that the IRQ support for UVD on SI wasn't activated.
With this patch in place I now get 52fps with 1080p H264 decoding.
--- Comment #29 from Vladimir Ysikov grantipak@gmail.com --- (In reply to comment #28)
Created attachment 93072 [details] Fix.
Sorry that it took me so long to find this. It's a rather simple issue that the IRQ support for UVD on SI wasn't activated.
With this patch in place I now get 52fps with 1080p H264 decoding.
With this patch I get gpu lockup and Xorg crash. kernel 3.13.1, mesa-git, llvm-svn, Archlinux x86.
--- Comment #30 from Vladimir Ysikov grantipak@gmail.com --- Created attachment 93084 --> https://bugs.freedesktop.org/attachment.cgi?id=93084&action=edit dmesg output
--- Comment #31 from Vladimir Ysikov grantipak@gmail.com --- Created attachment 93085 --> https://bugs.freedesktop.org/attachment.cgi?id=93085&action=edit Xorg log
--- Comment #32 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #29)
With this patch I get gpu lockup and Xorg crash. kernel 3.13.1, mesa-git, llvm-svn, Archlinux x86.
Are you sure this lockup isn't caused by some other upgrade you did such as mesa? The patch works fine here.
--- Comment #33 from darkbasic darkbasic@linuxsystems.it --- I confirm the patch works on top of drm-next:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 91 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 72 frames/s
Any chance to get it merged in time for 3.14?
--- Comment #34 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #33)
Any chance to get it merged in time for 3.14?
Yes, it'll show up in 3.14 and the stable kernels.
Christian König deathsimple@vodafone.de changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |FIXED
--- Comment #35 from Christian König deathsimple@vodafone.de --- Sounds like we can close this bug.
The GPU lockup of the GFX ring seems to be unrelated, please open up a new bugreport if that's really a regression.
--- Comment #36 from Vladimir Ysikov grantipak@gmail.com --- (In reply to comment #32)
(In reply to comment #29)
With this patch I get gpu lockup and Xorg crash. kernel 3.13.1, mesa-git, llvm-svn, Archlinux x86.
Are you sure this lockup isn't caused by some other upgrade you did such as mesa? The patch works fine here.
Yes, I am sure. I ran qvdpautest before and after upgrading the kernel. But now I have run the test again several times and all is fine, no more gpu lockup.
If this happens again I will open a new bug.
Christian König deathsimple@vodafone.de changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |RESOLVED|CLOSED
--- Comment #37 from Christian König deathsimple@vodafone.de --- Ok, thanks a lot. Looks like we can close it.
darkbasic darkbasic@linuxsystems.it changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |CLOSED  |REOPENED
Resolution|FIXED   |---
--- Comment #38 from darkbasic darkbasic@linuxsystems.it --- We need to reopen, it hangs for me too.
Here is a video which hangs 100% of the time: https://mega.co.nz/#!eQhSjJQR!EEe8-taN5IspIu-RW0WQzmvKzc5fkCn282kS5ugZ_as
Play with mplayer2 -vo vdpau, -vc ffmpeg12vdpau,ffwmv3vdpau,ffvc1vdpau,ffh264vdpau,ffodivxvdpau, PlanetEarthBirds.mkv
Christian König deathsimple@vodafone.de changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |REOPENED|RESOLVED
Resolution|---     |FIXED
--- Comment #39 from Christian König deathsimple@vodafone.de --- As already mentioned, please open up a new bugreport, because that's clearly a different issue.
--- Comment #40 from darkbasic darkbasic@linuxsystems.it --- Here it is: https://bugs.freedesktop.org/show_bug.cgi?id=74335