On 28-11-12 14:21, Thomas Hellstrom wrote:
On 11/28/2012 01:15 PM, Maarten Lankhorst wrote:
On 28-11-12 12:54, Thomas Hellstrom wrote:
On 11/28/2012 12:25 PM, Maarten Lankhorst wrote:
By removing the unlocking of the lru lock and retaking it immediately, a race is removed where the bo is taken off the swap list or the lru list between the unlock and relock. As such the cleanup_refs code can be simplified: it will attempt a non-blocking ttm_bo_wait, and if that fails it will either drop the locks and perform a blocking wait, or return an error if no_wait_gpu was set.
The need for looping is also eliminated, since swapout and evict_mem_first will always follow the destruction path, so no new fence is allowed to be attached. As far as I can see this may already have been the case, but the unlocking / relocking required a complicated loop to deal with re-reservation.
The downside is that ttm_bo_cleanup_memtype_use is no longer called with the reservation held, so drivers must be aware that move_notify with a NULL parameter doesn't require a reservation.
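Very roughly, the new flow looks like this (a simplified sketch only, not the actual patch: refcounting, the swap list handling and the corner case of losing a race after the blocking wait are glossed over):

	/* Sketch: called with lru_lock held and the bo reserved. */
	static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
				       bool interruptible, bool no_wait_gpu)
	{
		struct ttm_bo_device *bdev = bo->bdev;
		struct ttm_bo_global *glob = bo->glob;
		int ret;

		spin_lock(&bdev->fence_lock);
		ret = ttm_bo_wait(bo, false, false, true); /* non-blocking */
		spin_unlock(&bdev->fence_lock);

		/* unreserve; the rest of the cleanup runs without it */
		atomic_set(&bo->reserved, 0);
		wake_up_all(&bo->event_queue);

		if (ret && no_wait_gpu) {
			spin_unlock(&glob->lru_lock);
			return ret;	/* busy, and we may not block */
		}

		if (ret) {
			/*
			 * Drop the lru lock and do a blocking wait. The bo
			 * is on the destruction path, so no new fence can
			 * be attached and no looping is needed.
			 */
			spin_unlock(&glob->lru_lock);

			spin_lock(&bdev->fence_lock);
			ret = ttm_bo_wait(bo, false, interruptible, false);
			spin_unlock(&bdev->fence_lock);
			if (ret)
				return ret;

			spin_lock(&glob->lru_lock);
			if (list_empty(&bo->ddestroy)) {
				/* someone else finished the cleanup */
				spin_unlock(&glob->lru_lock);
				return 0;
			}
		}

		list_del_init(&bo->ddestroy);
		ttm_bo_del_from_lru(bo);
		spin_unlock(&glob->lru_lock);

		ttm_bo_cleanup_memtype_use(bo); /* reservation not held */
		return 0;
	}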
Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not immediately clear from this patch.
Because we would be holding the reservation while waiting, with the object still on the swap and lru lists. That would defeat the whole purpose of keeping the object on multiple lists, and it would break current code that assumes bo's on those lists can always be reserved.
Also, the if (ret && !no_wait_gpu) path has to drop the reservation and the lru lock, and isn't guaranteed to be able to retake the reservation. Maybe that could be guaranteed now, but I'm sure it would not be the case if reservations were shared across devices.
The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation, and if the wait goes wrong, re-adds the bo to the LRU lists and returns an error.
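Roughly this pattern (simplified from ttm_mem_evict_first; signatures approximate, refcounting elided):

	/* bo is reserved and glob->lru_lock is held */
	ttm_bo_del_from_lru(bo);	/* take it off the LRU lists */
	spin_unlock(&glob->lru_lock);	/* drop the lock, keep the reservation */

	ret = ttm_bo_evict(bo, interruptible, no_wait_gpu);

	/*
	 * On failure this unreserve re-adds the bo to the LRU lists and
	 * we return the error; on success the bo is re-added in its new
	 * placement.
	 */
	ttm_bo_unreserve(bo);
	return ret;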
If you really want to, we could hang on to the reservation in the !no_wait_gpu path; the wait shouldn't ever fail there, so I suppose leaving the bo off the lru lists, and not re-adding it to any list if the wait does fail, is fine. It's still on the ddestroy list in that case, so not adding it back to the other lists is harmless.
But I thought it was a feature that the bo still appeared on the lists while not being reserved, since in that case swapout and evict_first wouldn't need to hunt down another buffer to swap out or free, but could instead wait for the ddestroy wait to finish, possibly with multiple waiters.
Isn't it possible to do the same in the !no_wait_gpu case?

/Thomas
Sure, we wouldn't even need to re-add it, since only having the bo on the ddestroy list isn't catastrophic. I was just worried because it is still a behavioral change.
I'm tempted to measure that though, and use perf events to see how often contentions occur, how often buffers get evicted and loaded back in again, etc.
Before we start worrying about any optimizations, we should get solid instrumentation in place.
For example, I think evict_first and swapout could walk the ddestroy list first: every buffer there that can be reserved is tested with a non-blocking wait. If the wait succeeds, the function returns early; if not, the buffer is unreserved again without touching any list. This would make ttm prefer non-blocking destruction of bo's before the normal code runs, but without measurements it's no more than a nice thought experiment.
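Something along these lines (a hypothetical, unmeasured sketch; glob, bdev, bo and ret as usual, list_kref handling left out, and the early-out reuses the new cleanup_refs from this patch, which drops the lru lock itself):

	/* in evict_first/swapout, before the normal LRU scan */
	spin_lock(&glob->lru_lock);
	list_for_each_entry(bo, &bdev->ddestroy, ddestroy) {
		if (ttm_bo_reserve_locked(bo, false, true, false, 0))
			continue;	/* reserved by someone else */

		spin_lock(&bdev->fence_lock);
		ret = ttm_bo_wait(bo, false, false, true); /* non-blocking */
		spin_unlock(&bdev->fence_lock);

		if (!ret) {
			/* idle: finish destruction, which frees memory */
			return ttm_bo_cleanup_refs(bo, false, true);
		}

		/* still busy: unreserve without touching any list */
		atomic_set(&bo->reserved, 0);
		wake_up_all(&bo->event_queue);
	}
	spin_unlock(&glob->lru_lock);
	/* nothing idle on ddestroy: fall back to the normal LRU walk */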
~Maarten