Actually nothing to do with the -O3, but with the -mtune -march it seems. On Debian default is -mtune=generic -march=i586 maybe that is the problem, if i pass -mtune=native -march=native then it performs fine in most cases :) But not all, glretrace sometimes perform slow sometimes fast... so build system seems borked currently even more on 32bit :)