On 11/18/2011 03:56 PM, Jerome Glisse wrote:
On Fri, Nov 18, 2011 at 03:30:03PM +0100, Thomas Hellstrom wrote:
On 11/18/2011 02:15 PM, Ben Skeggs wrote:
On Fri, 2011-11-18 at 08:57 +0100, Thomas Hellstrom wrote:
Jerome,
I don't like this change for the following reasons
-snip-
One can take advantage of move notify callback but, there are corner case where bind/unbind might be call without move notify being call (in error path mostly). So to make sure that each virtual address space have a consistent view of wether a buffer object is backed or not by system page it's better to pass the buffer object to bind/unbind.
As I discussed previously with Jerome on IRC, I believe the move_notify hook is sufficient. I fixed a couple of issues back with it back when I implemented support for this in nouveau.
Jerome did point out two issues however, which I believe can be solved easily enough.
The first was that the error path after move_notify() has been called doesn't reverse the move_notify() too, leaving incorrect page table entries. This, I think, could easily be solved by swapping bo->mem and mem, and re-calling move_notify() to have the driver reverse whatever it did.
The second issue is that apparently move_notify() doesn't get called in the destroy path. Nouveau's GEM layer takes care of this for our userspace bos, and we don't use any kernel bos that are mapped into a channel's address space so we don't hit it. This can probably be solved easily enough too I expect.
Ben.
OK. I understand. Surely if a move_notify is missing somewhere, that can easily be fixed.
It might be good if we can outline how the virtual tables are set up. In my world, the following would happen:
- Pre command submission:
a) reserve bo b) call ttm_bo_validate to set placement. move_notify will tear down any existing GPU page tables for the bo. c) Set up GPU page tables. d) Command submission e) unreserve_bo().
- When eviction happens:
a) reserve bo b) move_notify is called to tear down page tables c) change placement d) Unreserve bo.
Let's say an error occurs in 1d) Why would you need to undo 1c?
Similarly if an error occurs in 2c) Why would you need to undo 2b)?
Error is in ttm_bo_handle_move_mem move_notify is call before we do the move but the move might fail.
But even if the move fails in 1b) isn't it safe to just leave the virtual GPU map unbound, since GPU page tables will *always* be set up when placement is complete in 1c?
For destroy issue is when destroying a gtt buffer, it will just be unbind, no call to ttm_bo_handle_move_mem which is the only function calling back move_notify. (see ttm_bo_release_list).
Yes, here a fix is needed.
I will fix this 2 corner case. I just wanted to make things symetrical. By hooking up the virtual address space update with bind/unbind
Note that at one point in the future the dream i have is no more reserve, iommu page fault which is coming up (PASID process address space id, with ATS address translation service and PRI page request interface) would allow such things. Only synchronization will be needed when moving object from system ram to vram or vice versa.
Indeed, but then TTM will probably be overkill, and something simpler can be implemented.
Cheers, Jerome
Thanks, Thomas