So the MMU fault is somehow specific to what I'm doing. Interesting.
I think I found the issue: the MMU "flush and sync" is not good enough in some cases.
What the Vivante kernel driver does for MMUv2, after mapping some kinds of buffer objects (apparently those tagged INDEX and VERTEX, which includes shader code and CL buffers), is:
- Send MMU flush command (like we do)
- Add a notify event "resume" (they hardwire event 29 for this)
- Add an END command to the command buffer so that the FE stops
- Remember where to continue
Then in the interrupt handler:
- If the "resume" notify event comes in
  - Wait for FE to be idle
  - Restart the FE at the remembered position
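
To make the sequence concrete, here is a rough user-space model of it in C. This is only a sketch, not the Vivante code or a DRM patch: the opcodes, struct gpu_sim, and all function names are made up for illustration; only event number 29 comes from the description above. To keep it short it assumes a single hard flush in flight at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for the model; NOT the real FE command encoding. */
enum cmd_op { CMD_MMU_FLUSH, CMD_EVENT, CMD_END, CMD_DRAW };

struct cmd {
    enum cmd_op op;
    uint32_t arg;
};

#define RESUME_EVENT 29u /* the event the Vivante driver hardwires */
#define RING_SIZE 64

struct gpu_sim {
    struct cmd ring[RING_SIZE];
    size_t len;              /* number of commands queued so far */
    size_t pc;               /* next command the FE will fetch */
    size_t resume_pc;        /* remembered restart position */
    bool resume_armed;       /* a hard flush is waiting for its event */
    bool fe_running;
    uint32_t pending_events; /* bitmask of raised notify events */
    unsigned draws_executed;
};

static void emit(struct gpu_sim *g, enum cmd_op op, uint32_t arg)
{
    assert(g->len < RING_SIZE);
    g->ring[g->len].op = op;
    g->ring[g->len].arg = arg;
    g->len++;
}

/* Submission side: flush, notify, END, then remember where to continue. */
static void emit_hard_mmu_flush(struct gpu_sim *g)
{
    assert(!g->resume_armed); /* model limitation: one in flight */
    emit(g, CMD_MMU_FLUSH, 0);
    emit(g, CMD_EVENT, RESUME_EVENT);
    emit(g, CMD_END, 0);
    g->resume_pc = g->len;    /* first command after the END */
    g->resume_armed = true;
}

/* Model of the FE: fetch commands until an END or the end of the queue. */
static void fe_run(struct gpu_sim *g)
{
    g->fe_running = true;
    while (g->fe_running && g->pc < g->len) {
        struct cmd c = g->ring[g->pc++];
        switch (c.op) {
        case CMD_EVENT:
            g->pending_events |= 1u << c.arg;
            break;
        case CMD_END:
            g->fe_running = false;
            break;
        case CMD_DRAW:
            g->draws_executed++;
            break;
        case CMD_MMU_FLUSH:
            break; /* nothing to model here */
        }
    }
    g->fe_running = false;
}

/* Interrupt side: on the "resume" event, restart the FE where we left off.
 * Returns true if the event was handled. */
static bool irq_handler(struct gpu_sim *g)
{
    if (!(g->pending_events & (1u << RESUME_EVENT)))
        return false;
    g->pending_events &= ~(1u << RESUME_EVENT);
    assert(!g->fe_running); /* a real driver would wait for FE idle here */
    g->pc = g->resume_pc;
    g->resume_armed = false;
    fe_run(g);
    return true;
}
```

Queue a draw, call emit_hard_mmu_flush(), queue another draw, then fe_run(): the FE stops after the first draw, and the second one only executes once irq_handler() has seen the resume event.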
This is implemented in "pause" here:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc...
gcvPAGE_TABLE_DIRTY_BIT_FE is set here:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc...
endAfterFlushMmuCache is set here:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc...
The interrupt notification is handled here:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc...
I hacked this into the DRM driver and have been running my test for quite some time, bumping against the tail end of the address range many times, without any MMU faults.
My proposal is to add a bo flag for buffers that need this kind of "hard" MMU reset (not all of them do, e.g. textures don't), and if their iova mapping requires an MMU flush, do the above stop-and-start ritual (in case of MMUv2).
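
Roughly, the decision I have in mind would look like the following. Again just a sketch: BO_FLAG_HARD_MMU_FLUSH, the enums, and pick_flush() are hypothetical names, not existing driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bo flag: buffer needs the "hard" MMU reset when (re)mapped. */
#define BO_FLAG_HARD_MMU_FLUSH (1u << 0)

enum mmu_version { MMU_V1 = 1, MMU_V2 = 2 };

enum flush_kind {
    FLUSH_NONE,             /* mapping did not dirty the MMU state */
    FLUSH_NORMAL,           /* the existing "flush and sync" */
    FLUSH_STOP_AND_RESTART  /* flush + resume event + END + FE restart */
};

/* Pick the flush strategy for one buffer object in a submit. */
static enum flush_kind pick_flush(enum mmu_version ver, uint32_t bo_flags,
                                  bool mapping_needs_flush)
{
    assert(ver == MMU_V1 || ver == MMU_V2);

    if (!mapping_needs_flush)
        return FLUSH_NONE;
    if (ver == MMU_V2 && (bo_flags & BO_FLAG_HARD_MMU_FLUSH))
        return FLUSH_STOP_AND_RESTART;
    return FLUSH_NORMAL;
}
```

So textures (no flag) keep the cheap path, and the stop-and-restart cost is only paid for flagged buffers on MMUv2 when their mapping actually changed.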
Wladimir