Third would be having a firewall in the 2D driver that checks the stream and ensures all registers that accept addresses are written with values derived from dmabufs. I haven't tried implementing this, but it'd involve a lookup table in the kernel and the CPU reading through the command stream. Offsets and sizes would also need to be validated. There would be a performance hit.
This is the standard mechanism, and what exynos does as well.
The per-process VM method is also used as an extension to this on some hardware.
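
For illustration only, a rough sketch of what such a firewall check could look like, written as plain C rather than real driver code. Everything here is made up for the example: the struct names, the (reg, value, len) stream format, and the idea that each address write carries its own access length. A real driver would parse the hardware's actual opcodes and get buffer ranges from its dmabuf/GEM bookkeeping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a buffer this client may touch (e.g. backed by a dmabuf). */
struct fw_buf {
	uint64_t base;	/* device address of the mapping */
	uint64_t size;	/* size in bytes */
};

/* Hypothetical: one register write taken from the command stream. */
struct fw_cmd {
	uint32_t reg;	/* register offset being written */
	uint64_t value;	/* value being written */
	uint64_t len;	/* bytes accessed from 'value' (address regs only) */
};

/* Lookup table check: does this register accept an address? */
static bool fw_is_addr_reg(const uint32_t *addr_regs, size_t nregs,
			   uint32_t reg)
{
	for (size_t i = 0; i < nregs; i++)
		if (addr_regs[i] == reg)
			return true;
	return false;
}

/* Is [addr, addr + len) fully inside one of the client's buffers? */
static bool fw_range_ok(const struct fw_buf *bufs, size_t nbufs,
			uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < nbufs; i++) {
		const struct fw_buf *b = &bufs[i];

		/* Written this way to avoid overflow in addr + len. */
		if (addr >= b->base && len <= b->size &&
		    addr - b->base <= b->size - len)
			return true;
	}
	return false;
}

/*
 * Walk the stream on the CPU. Returns 0 if every write to an address
 * register points inside a known buffer, -1 on the first violation.
 */
static int fw_validate(const struct fw_cmd *cmds, size_t ncmds,
		       const uint32_t *addr_regs, size_t nregs,
		       const struct fw_buf *bufs, size_t nbufs)
{
	for (size_t i = 0; i < ncmds; i++) {
		if (!fw_is_addr_reg(addr_regs, nregs, cmds[i].reg))
			continue;	/* not an address register */
		if (!fw_range_ok(bufs, nbufs, cmds[i].value, cmds[i].len))
			return -1;
	}
	return 0;
}
```

The linear scans are where the performance hit comes from: every submitted command is read by the CPU and checked against the tables before the hardware sees it.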
Dave.