https://bugs.freedesktop.org/show_bug.cgi?id=83436
--- Comment #17 from smoki smoki00790@gmail.com ---
OK, I found that -mtune=generic is the culprit for the performance :). I played a little with -mtune to find the minimum setting this code needs to run fast:
-mtune=i586 = slow
-mtune=pentium = slow
-mtune=pentium-mmx = slow
-mtune=pentium-pro = fast
-mtune=i686 = fast
-mtune=pentium3 = fast
etc...
So -mtune=generic seems to tune for a lower CPU target than this code needs to run fast.
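
For anyone who wants to repeat the comparison, here is a rough sketch of how the sweep over -mtune values could be scripted. It only builds the compiler command lines and does not run or time anything. The source file name bench.c, the output names, and the -O2 flag are assumptions and not taken from this bug; note also that GCC's documented spelling for the Pentium Pro value is pentiumpro, which is what the sketch uses.

```python
# Hypothetical helper: builds one gcc command line per -mtune value.
# "bench.c", the "bench-<value>" output names and -O2 are placeholders,
# not details from the bug report.
MTUNE_VALUES = [
    "i586", "pentium", "pentium-mmx",   # reported slow
    "pentiumpro", "i686", "pentium3",   # reported fast
    "generic",                          # the setting under suspicion
]


def build_command(mtune, source="bench.c", compiler="gcc"):
    """Return the argv list that compiles `source` with the given -mtune value."""
    return [compiler, "-O2", f"-mtune={mtune}", source, "-o", f"bench-{mtune}"]


def sweep_commands(values=MTUNE_VALUES):
    """Return one command line per -mtune value, in the order given."""
    return [build_command(value) for value in values]


if __name__ == "__main__":
    # Print the commands so they can be inspected or pasted into a shell.
    for argv in sweep_commands():
        print(" ".join(argv))
```

Each resulting binary would then be timed on the same workload, so that the only thing changing between runs is the -mtune value.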