I'm not too familiar with VAAPI, but for transcoding you really need efficient pipelining between the decode and the encode. If decoded frames are copied through the CPU in the middle, performance won't be great. We generally recommend using GStreamer with OpenMAX in tunneled mode, so that there are no extra copies between the decode and encode stages of the transcode. I'm not sure whether VAAPI supports something like this.
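For what it's worth, here is a rough sketch of the kind of pipeline I mean, written as a small Python helper that builds a `gst-launch-1.0` command line. It assumes the gst-omx plugin is installed (that is where `omxh264dec` and `omxh264enc` come from) and that the input is H.264 in an MP4 container. The helper name and the file paths are made up for illustration. Whether the decoder and encoder actually share buffers without a CPU copy depends on the platform's OpenMAX implementation, so treat this as a starting point rather than a guaranteed zero-copy setup.

```python
def build_transcode_args(src, dst):
    """Build a gst-launch-1.0 argument list for an H.264 -> H.264 transcode.

    Hypothetical helper: the decoder feeds the encoder directly, with no
    CPU-side element (e.g. videoconvert) placed between them.
    """
    elements = [
        ["filesrc", f"location={src}"],   # read the input file
        ["qtdemux"],                      # assumption: MP4/MOV container
        ["h264parse"],
        ["omxh264dec"],                   # OpenMAX decoder from gst-omx
        ["omxh264enc"],                   # OpenMAX encoder from gst-omx
        ["h264parse"],
        ["qtmux"],
        ["filesink", f"location={dst}"],  # write the output file
    ]
    args = ["gst-launch-1.0"]
    for i, element in enumerate(elements):
        if i > 0:
            args.append("!")              # gst-launch link operator
        args.extend(element)
    return args


# Example with placeholder paths (nothing is executed here).
args = build_transcode_args("in.mp4", "out.mp4")
```

To actually run it, pass the list to `subprocess.run(args, check=True)` on a machine where those elements are available. The key point is simply that nothing sits between `omxh264dec` and `omxh264enc` that would force frames back through system memory.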