I was trying to clean ttm up a little so my changes would be less invasive, and to simplify the code for debuggability. During testing I noticed the following weirdnesses:
- ttm_mem_evict_first ignores no_wait_gpu if the buffer is on the ddestroy list. If you follow the code, it will effectively spin in ttm_mem_evict_first if a bo is on the list and no_wait_gpu is true. This makes it very hard to change this function into something more sane; what is the desired effect here, and could it perhaps be changed?
I was working on a commit that removes fence_lock, but killing it off requires some kind of defined behavior for this case. Unless we leave this in place as the expected behavior...
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
- effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Also it seems ttm_bo_move_ttm, ttm_bo_move_memcpy and ttm_bo_move_accel_cleanup don't use some of their arguments, so could those be dropped?
~Maarten
On 10/11/2012 04:50 PM, Maarten Lankhorst wrote:
I was trying to clean ttm up a little so my changes would be less invasive, and simplify the code for debuggability. During testing I noticed the following weirdnesses:
- ttm_mem_evict_first ignores no_wait_gpu if the buffer is on the ddestroy list. If you follow the code, it will effectively spin in ttm_mem_evict_first if a bo is on the list and no_wait_gpu is true.
Yes, you're right. This is a bug. At first glance it looks like the code should return unconditionally after the kref_put().
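Roughly, that would make the ddestroy branch of ttm_mem_evict_first look something like the sketch below (the surrounding code is paraphrased from memory of the current ttm_bo.c, so take it as a sketch rather than the actual fix):

	if (!list_empty(&bo->ddestroy)) {
		ret = ttm_bo_cleanup_refs(bo, interruptible,
					  no_wait_reserve, no_wait_gpu);
		kref_put(&bo->list_kref, ttm_bo_release_list);

		/* Return unconditionally, so a no_wait_gpu caller gets the
		 * error back instead of spinning on the same ddestroy entry. */
		return ret;
	}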
I was working on a commit that removes fence_lock, but killing it off requires some kind of defined behavior for this case. Unless we leave this in place as the expected behavior...
I don't quite follow you? If you mean defined behavior for the fence lock, it is that it protects the bo::sync_obj and bo::sync_obj_arg of *all* buffers. It was previously one lock per buffer. The locking order is that it should be taken before the lru lock, but looking at the code it seems things could be simplified quite a bit the other way around...
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Also it seems ttm_bo_move_ttm, ttm_bo_move_memcpy and ttm_bo_move_accel_cleanup don't use some of their arguments, so could those be dropped?
Yes, but they are designed so that they could be plugged into the driver::move interface, so if you change the argument list you should do it on driver::move as well, and make sure they all have the same argument list.
~Maarten
/Thomas
On 11-10-12 18:57, Thomas Hellstrom wrote:
On 10/11/2012 04:50 PM, Maarten Lankhorst wrote:
I was trying to clean ttm up a little so my changes would be less invasive, and simplify the code for debuggability. During testing I noticed the following weirdnesses:
- ttm_mem_evict_first ignores no_wait_gpu if the buffer is on the ddestroy list. If you follow the code, it will effectively spin in ttm_mem_evict_first if a bo is on the list and no_wait_gpu is true.
Yes, you're right. This is a bug. At first glance it looks like the code should return unconditionally after the kref_put().
I was working on a commit that removes fence_lock, but killing it off requires some kind of defined behavior for this case. Unless we leave this in place as the expected behavior...
I don't quite follow you? If you mean defined behavior for the fence lock, it is that it protects the bo::sync_obj and bo::sync_obj_arg of *all* buffers. It was previously one lock per buffer. The locking order is that it should be taken before the lru lock, but looking at the code it seems things could be simplified quite a bit the other way around...
I mean that fence_lock can be killed off if you put slightly more care into ttm_bo_cleanup_refs and ttm_bo_cleanup_refs_or_queue. Those are the only 2 functions in my tree that seem to call ttm_bo_wait without a reservation, so I fixed those up instead.
In ttm_bo_cleanup_refs_or_queue you can simply change the order around, and abort the reservation if a non-blocking ttm_bo_wait fails, without touching the lru lists.
ttm_bo_cleanup_refs is slightly trickier, but to preserve current behavior I just did this: take the reservation first, then try ttm_bo_wait without blocking; if it returns -EBUSY and no_wait_gpu is unset, take the fence pointer and then undo the reservation without touching the lru lists. If it doesn't return -EBUSY, the wait succeeded and we should retry the function from the start.
So maybe slightly more complexity in ttm_bo_cleanup_refs (not really, the function was already doing something similar, I just moved things around) and an entire spinlock gone. :)
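In rough pseudo-C the flow I describe above would look something like this (a sketch only; the helpers for dropping the reservation without touching the lru lists and for handling the fence are placeholders, not the actual patch):

	static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
				       bool interruptible,
				       bool no_wait_reserve,
				       bool no_wait_gpu)
	{
		void *sync_obj;
		int ret;

	retry:
		/* Take the reservation first, so ttm_bo_wait is never
		 * called on an unreserved bo. */
		ret = ttm_bo_reserve(bo, interruptible, no_wait_reserve, false, 0);
		if (unlikely(ret != 0))
			return ret;

		/* Probe the fence without blocking. */
		ret = ttm_bo_wait(bo, false, false, true);
		if (ret == -EBUSY) {
			if (no_wait_gpu) {
				unreserve_keep_off_lru(bo);	/* placeholder helper */
				return ret;
			}

			/* Take the fence pointer, undo the reservation without
			 * touching the lru lists, wait, then start over. */
			sync_obj = sync_obj_ref(bo->sync_obj);	/* placeholder */
			unreserve_keep_off_lru(bo);
			sync_obj_wait(sync_obj, interruptible);	/* placeholder */
			sync_obj_unref(&sync_obj);
			goto retry;
		}

		/* Idle: continue with the existing destruction path, still
		 * holding the reservation. */
		return cleanup_and_destroy(bo);			/* placeholder */
	}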
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
An unexpected bonus from adding this skipping would be that the atomic lru_list removal on unreserve guarantee becomes a lot easier to drop while keeping behavior exactly the same. Only the swapout code would need to be adjusted to try the whole list. Maybe ttm_bo_delayed_delete with remove_all = false would change behavior slightly too, but this seems less likely, as it could only ever happen on delayed destruction of imported dma-bufs.
I'm still mentally debugging the code so there's no final version yet; I'm trying to ensure that the changes would not alter existing behavior.
Also it seems ttm_bo_move_ttm, ttm_bo_move_memcpy and ttm_bo_move_accel_cleanup don't use some of their arguments, so could those be dropped?
Yes, but they are designed so that they could be plugged into the driver::move interface, so if you change the argument list you should do it on driver::move as well, and make sure they all have the same argument list.
Not true; ttm_bo_move_memcpy has no interruptible parameter, so it's not compatible with driver::move. The drivers that do use this function have set up their own wrapper that calls it instead, so I was thinking of dropping those parameters from memcpy since they're unused, and the drivers that could point their move member at it don't do so, so there is no point in keeping the compatibility...
~Maarten
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
Hmm, where?
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from an LRU list but from the delayed delete list. Could you describe such a deadlock case? (There is a bug in there, though, in that the reservation is not released if the buffer is no longer on the ddestroy list here):
	if (unlikely(ret != 0) || list_empty(&bo->ddestroy)) {
		spin_unlock(&glob->lru_lock);
		return ret;
	}
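The fix would presumably be along these lines (just a sketch; whether the reservation can be released like this under the lru lock in that path is an assumption):

	if (unlikely(ret != 0) || list_empty(&bo->ddestroy)) {
		if (ret == 0) {
			/* We hold the reservation but the bo already left
			 * ddestroy; release it instead of leaking it. */
			atomic_set(&bo->reserved, 0);
			wake_up_all(&bo->event_queue);
		}
		spin_unlock(&glob->lru_lock);
		return ret;
	}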
An unexpected bonus from adding this skipping would be that the atomic lru_list removal on unreserve guarantee becomes a lot easier to drop while keeping behavior exactly the same. Only the swapout code would need to be adjusted to try the whole list. Maybe ttm_bo_delayed_delete with remove_all = false would change behavior slightly too, but this seems less likely, as it could only ever happen on delayed destruction of imported dma-bufs.
May I kindly remind you that the atomic lru_list removal stays in TTM until we've had a high level design discussion weighing it against an extended reservation object API.
Not true; ttm_bo_move_memcpy has no interruptible parameter, so it's not compatible with driver::move. The drivers that do use this function have set up their own wrapper that calls it instead, so I was thinking of dropping those parameters from memcpy since they're unused, and the drivers that could point their move member at it don't do so, so there is no point in keeping the compatibility...
OK, then I'm OK with parameter changes in those functions.
~Maarten
On 10/11/2012 09:26 PM, Thomas Hellstrom wrote:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from an LRU list but from the delayed delete list. Could you describe such a deadlock case? (There is a bug in there, though, in that the reservation is not released if the buffer is no longer on the ddestroy list here):
	if (unlikely(ret != 0) || list_empty(&bo->ddestroy)) {
		spin_unlock(&glob->lru_lock);
		return ret;
	}
Actually I was looking at older code. That bug is already fixed.
/Thomas
On 11-10-12 21:26, Thomas Hellstrom wrote:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking; cleanup_refs_or_queue will toss it on the deferred list if the reservation fails. Behavior doesn't change just because I changed the order around.
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it is not a deadlock lies in semantics lockdep knows nothing about.
An unexpected bonus from adding this skipping would be that the atomic lru_list removal on unreserve guarantee becomes a lot easier to drop while keeping behavior exactly the same. Only the swapout code would need to be adjusted to try the whole list. Maybe ttm_bo_delayed_delete with remove_all = false would change behavior slightly too, but this seems less likely, as it could only ever happen on delayed destruction of imported dma-bufs.
May I kindly remind you that the atomic lru_list removal stays in TTM until we've had a high level design discussion weighing it against an extended reservation object API.
I'm aware, but I still wanted to see if it was possible or not in a clean way for testing, so I don't ask for things that would be impossible to do or too involved to do in a safe manner.
Will you make it to UDS-r?
~Maarten
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
On 11-10-12 21:26, Thomas Hellstrom wrote:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking; cleanup_refs_or_queue will toss it on the deferred list if the reservation fails. Behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned about the Nvidia case where IIRC the wait was called both with and without reservation.
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct; it's missing a check for whether the bo is still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve, given that the buffer *IS* on the ddestroy list, is if the current reserver returned early because someone started an accelerated eviction, which can't happen currently. The code needs fixing up though.
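The missing check would be something along these lines after the waiting reserve returns (a sketch; the exact lock juggling here is an assumption):

	/* A waiting reserve may have slept; re-take the lru lock and make
	 * sure the bo is still on the ddestroy list, otherwise someone else
	 * already cleaned it up while we were waiting. */
	spin_lock(&glob->lru_lock);
	if (list_empty(&bo->ddestroy)) {
		atomic_set(&bo->reserved, 0);
		wake_up_all(&bo->event_queue);
		spin_unlock(&glob->lru_lock);
		return 0;
	}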
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it is not a deadlock lies in semantics lockdep knows nothing about.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
/Thomas
On 12-10-12 07:57, Thomas Hellstrom wrote:
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
On 11-10-12 21:26, Thomas Hellstrom wrote:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking; cleanup_refs_or_queue will toss it on the deferred list if the reservation fails. Behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned about the Nvidia case where IIRC the wait was called both with and without reservation.
I fixed this up in my tree; there were 2 places where nouveau did this, and I added a WARN_ON if ttm_bo_wait was ever called without a reservation.
Another thing that can go wrong is unpinning in the destruction path; this will give lockdep warnings because of a reservation <-> lock inversion, but it can easily be fixed by doing nouveau_bo_unpin() before calling drm_gem_object_unreference_unlocked, and it's easy to detect without lockdep too. It might be possible to trigger a deadlock with it, but I didn't try that too hard...
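Concretely, the two changes are along these lines (a sketch from memory; the reserved check and the nouveau field names may differ from the actual patches):

	/* In ttm_bo_wait(): catch callers that don't hold the reservation. */
	WARN_ON(!atomic_read(&bo->reserved));

	/* In the nouveau destruction paths: unpin before dropping the last
	 * GEM reference, so there is no reservation <-> lock inversion during
	 * final destruction. */
	nouveau_bo_unpin(nvbo);
	drm_gem_object_unreference_unlocked(nvbo->gem);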
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct; it's missing a check for whether the bo is still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve, given that the buffer *IS* on the ddestroy list, is if the current reserver returned early because someone started an accelerated eviction, which can't happen currently. The code needs fixing up though.
bo gets put on ddestroy list, delayed destroy handler gets reservation and we try to get a reservation at the same time in ttm_mem_evict_first, losing out.
Unlikely? Yes, but I don't see how it is impossible.
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it is not a deadlock lies in semantics lockdep knows nothing about.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
This could be a real deadlock if ttm_mem_evict_first was called with an evictable buffer on the ddestroy list that was shared with another device, and another bo that is shared too was held at that time.
The only fix is to either stop doing the blocking, or to disable or deliberately weaken the lockdep annotations for this case, at the cost of lockdep not being able to detect this situation were it ever to occur, since at the moment it is probably not a triggerable deadlock.
~Maarten
On 10/12/2012 09:49 AM, Maarten Lankhorst wrote:
I suppose a small race will remain, though.
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct; it's missing a check for whether the bo is still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve, given that the buffer *IS* on the ddestroy list, is if the current reserver returned early because someone started an accelerated eviction, which can't happen currently. The code needs fixing up though.
bo gets put on ddestroy list, delayed destroy handler gets reservation and we try to get a reservation at the same time in ttm_mem_evict_first, losing out.
Unlikely? Yes, but I don't see how it is impossible.
Anyone getting the reservation will remove the buffer from the ddestroy list atomically. That means that another caller that finds the buffer on the ddestroy list will get the reservation without waiting. The exception is if *anyone* gets the reservation without removing it from the ddestroy list. That would trigger catastrophic failure, but I don't think that's possible with the current code.
So yes, the code is broken, but I don't think it breaks anything currently.
/Thomas
On 12-10-12 07:57, Thomas Hellstrom wrote:
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
On 11-10-12 21:26, Thomas Hellstrom wrote:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking; cleanup_refs_or_queue will toss it on the deferred list if the reservation fails. Behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned about the Nvidia case where IIRC the wait was called both with and without reservation.
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct; it's missing a check for whether the bo is still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve, given that the buffer *IS* on the ddestroy list, is if the current reserver returned early because someone started an accelerated eviction, which can't happen currently. The code needs fixing up though.
effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:
                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it is not a deadlock lies in semantics lockdep knows nothing about.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
Lockdep works on classes of lock, not necessarily individual locks.
Doing 2 bo_reserves of any bo would count as a possible deadlock, no matter whether you always take them in a certain order or not.
To make multi-object reservation work, the fix is to add a ticket "lock" under which all the reservation objects are nested. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as a deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as a deadlock too. See below for some examples I was using to test.
Since it's counted as a normal lock, normal lockdep rules regarding locking apply, so if you hold a lock then take a reservation, and then do it the other way around it is counted as a potential deadlock.
Also, you can't simply reset the history for a single object, because lockdep acts on classes of locks, not individual locks. Resetting the state would mean lockdep gets thoroughly confused, since it would no longer know about currently held reservations by any task or any cpu, so please don't.
~Maarten
Below are some tests I was doing to show the different kinds of things lockdep will catch. I used them to do some lockdep selftests on reservation objects.
Doing spinlock, ticket, try and block tests; the spinlock tests invert the lock order between a spinlock and the other possibilities. Because the reservation is a lock type, those tests will fail: FAILURE means lockdep caught an error, SUCCESS means lockdep thinks what you do is ok:
syntax for object_reserve: int object_reserve(object, interruptible, no_wait, ticket);
static void reservation_test_fail_reserve(void)
{
	struct reservation_ticket t;
	struct reservation_object o;

	reservation_object_init(&o);
	reservation_ticket_init(&t);
	t.seqno++;

	object_reserve(&o, false, false, &t);
	/* No lockdep test, pure API */
	WARN_ON(object_reserve(&o, false, true, &t) != -EDEADLK);
	t.seqno--;
	WARN_ON(object_reserve(&o, false, true, &t) != -EBUSY);
	t.seqno += 2;
	WARN_ON(object_reserve(&o, false, true, &t) != -EAGAIN);
	object_unreserve(&o, NULL);

	reservation_ticket_fini(&t);
}

static void reservation_test_two_tickets(void)
{
	struct reservation_ticket t, t2;

	reservation_ticket_init(&t);
	reservation_ticket_init(&t2);

	reservation_ticket_fini(&t2);
	reservation_ticket_fini(&t);
}

static void reservation_test_ticket_unreserve_twice(void)
{
	struct reservation_ticket t;

	reservation_ticket_init(&t);
	reservation_ticket_fini(&t);
	reservation_ticket_fini(&t);
}

static void reservation_test_object_unreserve_twice(void)
{
	struct reservation_object o;

	reservation_object_init(&o);
	object_reserve(&o, false, false, NULL);
	object_unreserve(&o, NULL);
	object_unreserve(&o, NULL);
}

static void reservation_test_fence_nest_unreserved(void)
{
	struct reservation_object o;

	reservation_object_init(&o);

	spin_lock_nest_lock(&lock_A, &o);
	spin_unlock(&lock_A);
}

static void reservation_test_ticket_block(void)
{
	struct reservation_ticket t;
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);
	reservation_ticket_init(&t);

	object_reserve(&o, false, false, &t);
	object_reserve(&o2, false, false, NULL);
	object_unreserve(&o2, NULL);
	object_unreserve(&o, &t);

	reservation_ticket_fini(&t);
}

static void reservation_test_ticket_try(void)
{
	struct reservation_ticket t;
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);
	reservation_ticket_init(&t);

	object_reserve(&o, false, false, &t);
	object_reserve(&o2, false, true, NULL);
	object_unreserve(&o2, NULL);
	object_unreserve(&o, &t);

	reservation_ticket_fini(&t);
}

static void reservation_test_ticket_ticket(void)
{
	struct reservation_ticket t;
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);
	reservation_ticket_init(&t);

	object_reserve(&o, false, false, &t);
	object_reserve(&o2, false, false, &t);
	object_unreserve(&o2, &t);
	object_unreserve(&o, &t);

	reservation_ticket_fini(&t);
}

static void reservation_test_try_block(void)
{
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);

	object_reserve(&o, false, true, NULL);
	object_reserve(&o2, false, false, NULL);
	object_unreserve(&o2, NULL);
	object_unreserve(&o, NULL);
}

static void reservation_test_try_try(void)
{
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);

	object_reserve(&o, false, true, NULL);
	object_reserve(&o2, false, true, NULL);
	object_unreserve(&o2, NULL);
	object_unreserve(&o, NULL);
}

static void reservation_test_try_ticket(void)
{
	struct reservation_ticket t;
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);

	object_reserve(&o, false, true, NULL);
	reservation_ticket_init(&t);

	object_reserve(&o2, false, false, &t);
	object_unreserve(&o2, &t);
	object_unreserve(&o, NULL);

	reservation_ticket_fini(&t);
}

static void reservation_test_block_block(void)
{
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);

	object_reserve(&o, false, false, NULL);
	object_reserve(&o2, false, false, NULL);
	object_unreserve(&o2, NULL);
	object_unreserve(&o, NULL);
}

static void reservation_test_block_try(void)
{
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);

	object_reserve(&o, false, false, NULL);
	object_reserve(&o2, false, true, NULL);
	object_unreserve(&o2, NULL);
	object_unreserve(&o, NULL);
}

static void reservation_test_block_ticket(void)
{
	struct reservation_ticket t;
	struct reservation_object o, o2;

	reservation_object_init(&o);
	reservation_object_init(&o2);

	object_reserve(&o, false, false, NULL);
	reservation_ticket_init(&t);

	object_reserve(&o2, false, false, &t);
	object_unreserve(&o2, &t);
	object_unreserve(&o, NULL);

	reservation_ticket_fini(&t);
}

static void reservation_test_fence_block(void)
{
	struct reservation_object o;

	reservation_object_init(&o);
	spin_lock(&lock_A);
	spin_unlock(&lock_A);

	object_reserve(&o, false, false, NULL);
	spin_lock(&lock_A);
	spin_unlock(&lock_A);
	object_unreserve(&o, NULL);

	spin_lock(&lock_A);
	object_reserve(&o, false, false, NULL);
	object_unreserve(&o, NULL);
	spin_unlock(&lock_A);
}

static void reservation_test_fence_try(void)
{
	struct reservation_object o;

	reservation_object_init(&o);
	spin_lock(&lock_A);
	spin_unlock(&lock_A);

	object_reserve(&o, false, true, NULL);
	spin_lock(&lock_A);
	spin_unlock(&lock_A);
	object_unreserve(&o, NULL);

	spin_lock(&lock_A);
	object_reserve(&o, false, true, NULL);
	object_unreserve(&o, NULL);
	spin_unlock(&lock_A);
}

static void reservation_test_fence_ticket(void)
{
	struct reservation_ticket t;
	struct reservation_object o;

	reservation_object_init(&o);
	spin_lock(&lock_A);
	spin_unlock(&lock_A);

	reservation_ticket_init(&t);

	object_reserve(&o, false, false, &t);
	spin_lock(&lock_A);
	spin_unlock(&lock_A);
	object_unreserve(&o, &t);

	spin_lock(&lock_A);
	object_reserve(&o, false, false, &t);
	object_unreserve(&o, &t);
	spin_unlock(&lock_A);

	reservation_ticket_fini(&t);
}

static void reservation_tests(void)
{
	printk(" --------------------------------------------------------------------------\n");
	printk(" | Reservation tests |\n");
	printk(" ---------------------\n");

	print_testname("reservation api failures");
	dotest(reservation_test_fail_reserve, SUCCESS, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("reserving two tickets");
	dotest(reservation_test_two_tickets, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("unreserve ticket twice");
	dotest(reservation_test_ticket_unreserve_twice, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("unreserve object twice");
	dotest(reservation_test_object_unreserve_twice, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("spinlock nest unreserved");
	dotest(reservation_test_fence_nest_unreserved, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");

	printk(" -----------------------------------------------------\n");
	printk(" |block | try |ticket|\n");
	printk(" -----------------------------------------------------\n");

	print_testname("ticket");
	dotest(reservation_test_ticket_block, FAILURE, LOCKTYPE_RESERVATION);
	dotest(reservation_test_ticket_try, SUCCESS, LOCKTYPE_RESERVATION);
	dotest(reservation_test_ticket_ticket, SUCCESS, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("try");
	dotest(reservation_test_try_block, FAILURE, LOCKTYPE_RESERVATION);
	dotest(reservation_test_try_try, SUCCESS, LOCKTYPE_RESERVATION);
	dotest(reservation_test_try_ticket, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("block");
	dotest(reservation_test_block_block, FAILURE, LOCKTYPE_RESERVATION);
	dotest(reservation_test_block_try, SUCCESS, LOCKTYPE_RESERVATION);
	dotest(reservation_test_block_ticket, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");

	print_testname("spinlock");
	dotest(reservation_test_fence_block, FAILURE, LOCKTYPE_RESERVATION);
	dotest(reservation_test_fence_try, SUCCESS, LOCKTYPE_RESERVATION);
	dotest(reservation_test_fence_ticket, FAILURE, LOCKTYPE_RESERVATION);
	printk("\n");
}
On 10/12/2012 12:09 PM, Maarten Lankhorst wrote:
On 12-10-12 07:57, Thomas Hellstrom wrote:
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
On 11-10-12 21:26, Thomas Hellstrom wrote:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
Anyway, if you plan to remove the fence lock and protect it with reserve, you must make sure that a waiting reserve is never done in a destruction path. I think this mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking; cleanup_refs_or_queue will toss it on the deferred list if the reservation fails. Behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned about the Nvidia case where IIRC the wait was called both with and without reservation.
- no_wait_reserve is ignored if no_wait_gpu is false: ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but subsequently it will do a wait_unreserved if no_wait_gpu is false.
I'm planning on removing this argument and acting as if it is always true, since nothing on the lru list should currently fail to reserve.
Yes, since all buffers that are reserved are removed from the LRU list, there should never be a waiting reserve on them, so no_wait_reserve can be removed from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose a small race will remain, though.
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct; it's missing a check for whether the bo is still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve, given that the buffer *IS* on the ddestroy list, is if the current reserver returned early because someone started an accelerated eviction, which can't happen currently. The code needs fixing up though.
- effectively unlimited callchain between some functions that all go through ttm_mem_evict_first:

                                     /------------------------\
ttm_mem_evict_first - ttm_bo_evict -                          - ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first
                                     \ ttm_bo_handle_move_mem /

I'm not surprised that there was a deadlock before, it seems to me it would be pretty suicidal to ever do a blocking reserve on any of those lists, lockdep would be all over you for this.
Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be a BUG.
But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it is not a deadlock lies in semantics lockdep knows nothing about.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
Lockdep works on classes of lock, not necessarily individual locks.
Doing 2 bo_reserves of any bo would count as a possible deadlock, no matter whether you always take them in a certain order or not.
So you mean that if I bo_reserve A and then bo_reserve B (which is used only when binding A to the GPU), lockdep will complain even if nobody ever bo_reserves B before A? That will make it impossible to use BOs as page tables for GPU binding for example.
To make multi-object reservation work, the fix is to add a ticket "lock" under which all the reservation objects are nested. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as a deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as a deadlock too. See below for some examples I was using to test.
But if a ticket lock can be used to abstract a locking sequence that we know will never deadlock, why can't we abstract locking from a list with atomic list removal with another "lock", and make sure we get the order right between them?
Since it's counted as a normal lock, normal lockdep rules regarding locking apply, so if you hold a lock then take a reservation, and then do it the other way around it is counted as a potential deadlock.
Also, you can't simply reset the history for a single object, because lockdep acts on classes of locks, not individual locks. Resetting the state would mean lockdep gets thoroughly confused, since it would no longer know about currently held reservations by any task or any cpu, so please don't.
~Maarten
Below are some tests I was doing to show the different kinds of things lockdep will catch. I used them to do some lockdep selftests on reservation objects.
I'll take a look and see if I can digest this :)
/Thomas
Doing spinlock, ticket, try, block tests, the spinlock tests are inverting lock between spinlock and the other possibilities. Because the reservation is a locktype, those tests will fail, FAILURE that lockdep caught an error. SUCCESS means lockdep thinks what you do is ok:
syntax for object_reserve: int object_reserve(object, interruptible, no_wait, ticket);
static void reservation_test_fail_reserve(void) { struct reservation_ticket t; struct reservation_object o;
reservation_object_init(&o); reservation_ticket_init(&t); t.seqno++;
object_reserve(&o, false, false, &t); /* No lockdep test, pure API */ WARN_ON(object_reserve(&o, false, true, &t) != -EDEADLK); t.seqno--; WARN_ON(object_reserve(&o, false, true, &t) != -EBUSY); t.seqno += 2; WARN_ON(object_reserve(&o, false, true, &t) != -EAGAIN); object_unreserve(&o, NULL);
reservation_ticket_fini(&t); }
static void reservation_test_two_tickets(void) { struct reservation_ticket t, t2;
reservation_ticket_init(&t); reservation_ticket_init(&t2);
reservation_ticket_fini(&t2); reservation_ticket_fini(&t); }
static void reservation_test_ticket_unreserve_twice(void) { struct reservation_ticket t;
reservation_ticket_init(&t); reservation_ticket_fini(&t); reservation_ticket_fini(&t); }
static void reservation_test_object_unreserve_twice(void) { struct reservation_object o;
reservation_object_init(&o); object_reserve(&o, false, false, NULL); object_unreserve(&o, NULL); object_unreserve(&o, NULL); }
static void reservation_test_fence_nest_unreserved(void) { struct reservation_object o;
reservation_object_init(&o);
spin_lock_nest_lock(&lock_A, &o); spin_unlock(&lock_A); }
static void reservation_test_ticket_block(void)
{
        struct reservation_ticket t;
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);
        reservation_ticket_init(&t);

        object_reserve(&o, false, false, &t);
        object_reserve(&o2, false, false, NULL);
        object_unreserve(&o2, NULL);
        object_unreserve(&o, &t);

        reservation_ticket_fini(&t);
}

static void reservation_test_ticket_try(void)
{
        struct reservation_ticket t;
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);
        reservation_ticket_init(&t);

        object_reserve(&o, false, false, &t);
        object_reserve(&o2, false, true, NULL);
        object_unreserve(&o2, NULL);
        object_unreserve(&o, &t);

        reservation_ticket_fini(&t);
}

static void reservation_test_ticket_ticket(void)
{
        struct reservation_ticket t;
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);
        reservation_ticket_init(&t);

        object_reserve(&o, false, false, &t);
        object_reserve(&o2, false, false, &t);
        object_unreserve(&o2, &t);
        object_unreserve(&o, &t);

        reservation_ticket_fini(&t);
}

static void reservation_test_try_block(void)
{
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, true, NULL);
        object_reserve(&o2, false, false, NULL);
        object_unreserve(&o2, NULL);
        object_unreserve(&o, NULL);
}

static void reservation_test_try_try(void)
{
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, true, NULL);
        object_reserve(&o2, false, true, NULL);
        object_unreserve(&o2, NULL);
        object_unreserve(&o, NULL);
}

static void reservation_test_try_ticket(void)
{
        struct reservation_ticket t;
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, true, NULL);
        reservation_ticket_init(&t);

        object_reserve(&o2, false, false, &t);
        object_unreserve(&o2, &t);
        object_unreserve(&o, NULL);

        reservation_ticket_fini(&t);
}
static void reservation_test_block_block(void)
{
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, false, NULL);
        object_reserve(&o2, false, false, NULL);
        object_unreserve(&o2, NULL);
        object_unreserve(&o, NULL);
}

static void reservation_test_block_try(void)
{
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, false, NULL);
        object_reserve(&o2, false, true, NULL);
        object_unreserve(&o2, NULL);
        object_unreserve(&o, NULL);
}

static void reservation_test_block_ticket(void)
{
        struct reservation_ticket t;
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, false, NULL);
        reservation_ticket_init(&t);

        object_reserve(&o2, false, false, &t);
        object_unreserve(&o2, &t);
        object_unreserve(&o, NULL);

        reservation_ticket_fini(&t);
}

static void reservation_test_fence_block(void)
{
        struct reservation_object o;

        reservation_object_init(&o);
        spin_lock(&lock_A);
        spin_unlock(&lock_A);

        object_reserve(&o, false, false, NULL);
        spin_lock(&lock_A);
        spin_unlock(&lock_A);
        object_unreserve(&o, NULL);

        spin_lock(&lock_A);
        object_reserve(&o, false, false, NULL);
        object_unreserve(&o, NULL);
        spin_unlock(&lock_A);
}

static void reservation_test_fence_try(void)
{
        struct reservation_object o;

        reservation_object_init(&o);
        spin_lock(&lock_A);
        spin_unlock(&lock_A);

        object_reserve(&o, false, true, NULL);
        spin_lock(&lock_A);
        spin_unlock(&lock_A);
        object_unreserve(&o, NULL);

        spin_lock(&lock_A);
        object_reserve(&o, false, true, NULL);
        object_unreserve(&o, NULL);
        spin_unlock(&lock_A);
}

static void reservation_test_fence_ticket(void)
{
        struct reservation_ticket t;
        struct reservation_object o;

        reservation_object_init(&o);
        spin_lock(&lock_A);
        spin_unlock(&lock_A);

        reservation_ticket_init(&t);

        object_reserve(&o, false, false, &t);
        spin_lock(&lock_A);
        spin_unlock(&lock_A);
        object_unreserve(&o, &t);

        spin_lock(&lock_A);
        object_reserve(&o, false, false, &t);
        object_unreserve(&o, &t);
        spin_unlock(&lock_A);

        reservation_ticket_fini(&t);
}
static void reservation_tests(void)
{
        printk("  --------------------------------------------------------------------------\n");
        printk("  | Reservation tests |\n");
        printk("  ---------------------\n");

        print_testname("reservation api failures");
        dotest(reservation_test_fail_reserve, SUCCESS, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("reserving two tickets");
        dotest(reservation_test_two_tickets, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("unreserve ticket twice");
        dotest(reservation_test_ticket_unreserve_twice, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("unreserve object twice");
        dotest(reservation_test_object_unreserve_twice, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("spinlock nest unreserved");
        dotest(reservation_test_fence_nest_unreserved, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");

        printk("  -----------------------------------------------------\n");
        printk("                                 |block | try  |ticket|\n");
        printk("  -----------------------------------------------------\n");

        print_testname("ticket");
        dotest(reservation_test_ticket_block, FAILURE, LOCKTYPE_RESERVATION);
        dotest(reservation_test_ticket_try, SUCCESS, LOCKTYPE_RESERVATION);
        dotest(reservation_test_ticket_ticket, SUCCESS, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("try");
        dotest(reservation_test_try_block, FAILURE, LOCKTYPE_RESERVATION);
        dotest(reservation_test_try_try, SUCCESS, LOCKTYPE_RESERVATION);
        dotest(reservation_test_try_ticket, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("block");
        dotest(reservation_test_block_block, FAILURE, LOCKTYPE_RESERVATION);
        dotest(reservation_test_block_try, SUCCESS, LOCKTYPE_RESERVATION);
        dotest(reservation_test_block_ticket, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");

        print_testname("spinlock");
        dotest(reservation_test_fence_block, FAILURE, LOCKTYPE_RESERVATION);
        dotest(reservation_test_fence_try, SUCCESS, LOCKTYPE_RESERVATION);
        dotest(reservation_test_fence_ticket, FAILURE, LOCKTYPE_RESERVATION);
        printk("\n");
}
Op 15-10-12 14:27, Thomas Hellstrom schreef:
On 10/12/2012 12:09 PM, Maarten Lankhorst wrote:
Op 12-10-12 07:57, Thomas Hellstrom schreef:
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
Op 11-10-12 21:26, Thomas Hellstrom schreef:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
> Anyway, if you plan to remove the fence lock and protect it with reserve, you must > make sure that a waiting reserve is never done in a destruction path. I think this > mostly concerns the nvidia driver. Well I don't think any lock should ever be held during destruction time,
What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking, cleanup_refs_or_queue will toss it on the deferred list if reservation fails, behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned about the Nvidia case where IIRC the wait was called both with and without reservation.
>> - no_wait_reserve is ignored if no_wait_gpu is false >> ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but >> subsequently it will do a wait_unreserved if no_wait_gpu is false. >> I'm planning on removing this argument and act like it is always true, since >> nothing on the lru list should fail to reserve currently. > Yes, since all buffers that are reserved are removed from the LRU list, there > should never be a waiting reserve on them, so no_wait_reserve can be removed > from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain. I suppose there will stay a small race though,
Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct, it's missing a check for still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve given that the buffer *IS* on the ddestroy list is if the current reserver returned early because someone started an accelerated eviction which can't happen currently. The code needs fixing up though.
>> - effectively unlimited callchain between some functions that all go through >> ttm_mem_evict_first: >> >> /------------------------\ >> ttm_mem_evict_first - ttm_bo_evict - -ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first >> \ ttm_bo_handle_move_mem / >> I'm not surprised that there was a deadlock before, it seems to me it would >> be pretty suicidal to ever do a blocking reserve on any of those lists, >> lockdep would be all over you for this. > Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth > and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. > What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted > to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be > a BUG. > > But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. > But there should be no waiting reserves in the eviction path currently. Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, since it seems all the callers of this function assume that ttm_mem_evict_first can only fail if there is really nothing more to free and blocking nested would really upset lockdep and leave you open to the same deadlocks.
I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it is not a deadlock is that you know the exact semantics of why it cannot happen.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
Lockdep works on classes of lock, not necessarily individual locks.
Doing 2 bo_reserve's of any bo would count as possible deadlock, no matter if you always take them in a certain order or not.
So you mean that if I bo_reserve A and then bo_reserve B (which is used only when binding A to the GPU), lockdep will complain even if nobody ever bo_reserves B before A? That will make it impossible to use BOs as page tables for GPU binding for example.
As far as I can tell nobody does it like that: page tables are simply initialized on channel creation, pinned in memory and kept like that, while the host serializes with its own vm locking.
To make multi-object reservation work, the fix is to add a ticket "lock" of which all the reservation objects are a nested lock of. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as deadlock too. See below for some examples I was using to test.
But if a ticket lock can be used to abstract a locking sequence that we know will never deadlock, why can't we abstract locking from a list with atomic list removal with another "lock", and make sure we get the order right between them?
No, see the test below, lockdep won't be fooled by your diversions that easily!! :-)
However, if the following rules are valid, life becomes a lot easier:
1. items on the ddestroy list are not allowed to have a new sync object attached, only reservations for destruction of this object are allowed.
2. the only valid thing at this point left is unmapping from vm space and destruction of buffer backing in memory or vram
3. All reservations outside of ttm_bo_cleanup_refs(or_queue) are forbidden. (XXX: check if an exception needed for drivers to accomplish this destruction?)
Adding a WARN_ON(!list_empty(&bo->ddestroy)); to ttm_bo_reserve and ttm_eu_reserve_buffers should be enough to catch violators of the above rules.
If those rules hold, destruction becomes a lot more straightforward. ttm_bo_swapout and ttm_mem_evict_first can both do the reserve with lru_lock held, which will be guaranteed to succeed, and the buffer is also guaranteed to be on the ddestroy list still since we haven't dropped the lock yet.
So with a reservation, lru_lock held, and entry on the ddestroy list still alive:
Do a trywait on the bo. If it returns -EBUSY and no_wait_gpu is unset, get a reference to bo->sync_obj and unreserve.
If -EBUSY was returned and no_wait_gpu is set, unlock lru_lock and return error.
If -EBUSY and no_wait_gpu was not set, unlock lru_lock, do a wait on our copy of bo->sync_obj, and unref it, return if wait fails. If wait hasn't failed, retake lru_lock.
Check if the ddestroy entry is gone. If so, someone else was faster, return 0. If not, simply remove the bo from all lists without reservation, unref bo->sync_obj**, and perform the remaining cleanup.
As far as I can see, this wouldn't leave open a race even if 2 threads do the same, although the second thread might return 0 to signal success before the backing is gone, but this can also happen right now. It's even harmless, since in those cases the functions calling these functions would simply call them one more time, and this time the destroyed buffer would definitely not be on the swap/lru lists any more.
It would also keep current behavior the same as far as I can tell, where multiple waiters could wait for the same bo to be destroyed.
**) Yes strictly speaking a violation of fence rules, but in this case the buffer already dropped to 0 references, and there's a BUG_ON in ttm_bo_release_list that would get triggered otherwise. Can alternatively be fixed by doing a BUG_ON in ttm_bo_release_list if there is a sync_obj that's not in the signaled state.
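Spelled out as a rough sketch of the steps above (illustration only, and a simplification of the actual ttm_bo_cleanup_refs patch further down; it is written as a fragment with bo, glob, driver, interruptible and no_wait_gpu in scope, and unreserve() stands in for the atomic_set(&bo->reserved, 0) / wake_up_all(&bo->event_queue) pair):

        /* Entered with the bo reserved, lru_lock held, bo still on ddestroy. */
        ret = ttm_bo_wait(bo, false, false, true);              /* trywait */
        if (ret && no_wait_gpu) {
                unreserve(bo);                                  /* stand-in, see above */
                spin_unlock(&glob->lru_lock);
                return ret;                                     /* -EBUSY to the caller */
        } else if (ret) {
                void *sync_obj = driver->sync_obj_ref(&bo->sync_obj);

                unreserve(bo);
                spin_unlock(&glob->lru_lock);

                ret = driver->sync_obj_wait(sync_obj, false, interruptible);
                driver->sync_obj_unref(&sync_obj);
                if (ret)
                        return ret;
                spin_lock(&glob->lru_lock);                     /* retake and re-check */
        } else {
                unreserve(bo);
        }

        if (list_empty(&bo->ddestroy)) {                        /* someone was faster */
                spin_unlock(&glob->lru_lock);
                return 0;
        }

        /* remove the bo from the lru/swap/ddestroy lists without reserving,
         * unref bo->sync_obj and do the remaining cleanup */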
The test I mentioned:
static void reservation_test_block_ticket(void)
{
        struct reservation_ticket t;
        struct reservation_object o, o2;

        reservation_object_init(&o);
        reservation_object_init(&o2);

        object_reserve(&o, false, false, NULL);
        reservation_ticket_init(&t);

        object_reserve(&o2, false, false, &t);
        object_unreserve(&o2, &t);
        object_unreserve(&o, NULL);

        reservation_ticket_fini(&t);
}
(snip)
dotest(reservation_test_block_ticket, FAILURE, LOCKTYPE_RESERVATION);
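For contrast, a sequence lockdep does accept looks roughly like the sketch below. It reuses the object_reserve()/object_unreserve() selftest wrappers from the listing above and mirrors reservation_test_ticket_ticket; the error handling a real caller needs (backing off and retrying on -EAGAIN, for instance) is left out:

        /* Sketch only: both reservations nest under the ticket "lock",
         * so lockdep treats this as a safe, ordered multi-object reserve. */
        static void example_ticketed_reserve(struct reservation_object *a,
                                             struct reservation_object *b)
        {
                struct reservation_ticket t;

                reservation_ticket_init(&t);            /* acquire the ticket "lock" */

                object_reserve(a, false, false, &t);    /* nested under the ticket */
                object_reserve(b, false, false, &t);

                /* ... queue work touching both objects ... */

                object_unreserve(b, &t);
                object_unreserve(a, &t);

                reservation_ticket_fini(&t);            /* release the ticket "lock" */
        }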
Op 15-10-12 17:37, Maarten Lankhorst schreef:
To make multi-object reservation work, the fix is to add a ticket "lock" of which all the reservation objects are a nested lock of. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as deadlock too. See below for some examples I was using to test.
But if a ticket lock can be used to abstract a locking sequence that we know will never deadlock, why can't we abstract locking from a list with atomic list removal with another "lock", and make sure we get the order right between them?
No, see the test below, lockdep won't be fooled by your diversions that easily!! :-)
However, if the following rules are valid, life becomes a lot easier:
- items on the ddestroy list are not allowed to have a new sync object attached, only reservations for destruction of this object are allowed.
- the only valid thing at this point left is unmapping from vm space and destruction of buffer backing in memory or vram
- All reservations outside of ttm_bo_cleanup_refs(or_queue) are forbidden. (XXX: check if an exception needed for drivers to accomplish this destruction?)
Adding a WARN_ON(!list_empty(&bo->ddestroy)); to ttm_bo_reserve and ttm_eu_reserve_buffers should be enough to catch violators of the above rules.
If those rules hold, destruction becomes a lot more straightforward. ttm_bo_swapout and ttm_mem_evict_first can both do the reserve with lru_lock held, which will be guaranteed to succeed, and the buffer is also guaranteed to be on the ddestroy list still since we haven't dropped the lock yet.
So with a reservation, lru_lock held, and entry on the ddestroy list still alive:
Do a trywait on the bo, if it returns -EBUSY, and no_wait_gpu is unset, get a reference to bo->sync_obj. unreserve.
If -EBUSY was returned and no_wait_gpu is set, unlock lru_lock and return error.
If -EBUSY and no_wait_gpu was not set, unlock lru_lock, do a wait on our copy of bo->sync_obj, and unref it, return if wait fails. If wait hasn't failed, retake lru_lock.
Check if the ddestroy entry is gone. If so, someone else was faster, return 0. If not, simply remove the bo from all lists without reservation, unref bo->sync_obj**, and perform the remaining cleanup.
As far as I can see, this wouldn't leave open a race even if 2 threads do the same, although the second thread might return 0 to signal success before the backing is gone, but this can also happen right now. It's even harmless, since in those cases the functions calling these functions would simply call them one more time, and this time the destroyed buffer would definitely not be on the swap/lru lists any more.
It would also keep current behavior the same as far as I can tell, where multiple waiters could wait for the same bo to be destroyed.
**) Yes strictly speaking a violation of fence rules, but in this case the buffer already dropped to 0 references, and there's a BUG_ON in ttm_bo_release_list that would get triggered otherwise. Can alternatively be fixed by doing a BUG_ON in ttm_bo_release_list if there is a sync_obj that's not in the signaled state.
Attached 3 followup patches to show what I mean, they're based on my tree, which means with all patches applied that I posted on friday. This is not thoroughly tested, but I think it should work.
First is fixing radeon not to crash on calling move_notify from release_list. Second moves the memory cleanup to ttm_bo_cleanup_memtype_use, which is more readable and a more logical place to put it. Third cleans up how ttm_bo_cleanup_refs is called.
Also available on http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
If this looks ok I'll send those below out when the rest of the patches I posted on friday are reviewed. :-)
======
drm/radeon: allow move_notify to be called without reservation
The few places that care should have those checks instead. This allows destruction of bo-backed memory without a reservation.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com --- drivers/gpu/drm/radeon/radeon_gart.c | 1 - drivers/gpu/drm/radeon/radeon_object.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index 0ee5e29..701b215 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -1044,7 +1044,6 @@ void radeon_vm_bo_invalidate(struct radeon_device *rdev,
 {
        struct radeon_bo_va *bo_va;

-       BUG_ON(!radeon_bo_is_reserved(bo));
        list_for_each_entry(bo_va, &bo->va, bo_list) {
                bo_va->valid = false;
        }
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 5b959b6..fa954d7 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -539,7 +539,7 @@ void radeon_bo_get_tiling_flags(struct radeon_bo *bo,
 int radeon_bo_check_tiling(struct radeon_bo *bo, bool has_moved,
                                bool force_drop)
 {
-       BUG_ON(!radeon_bo_is_reserved(bo));
+       BUG_ON(!radeon_bo_is_reserved(bo) && !force_drop);

        if (!(bo->tiling_flags & RADEON_TILING_SURFACE))
                return 0;
======
drm/ttm: remove ttm_bo_cleanup_memtype_use
move to release_list instead
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com --- drivers/gpu/drm/ttm/ttm_bo.c | 45 ++++++++++++------------------------------ 1 file changed, 13 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index dd6a3e6..9b95e96 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -146,8 +146,16 @@ static void ttm_bo_release_list(struct kref *list_kref) BUG_ON(!list_empty(&bo->lru)); BUG_ON(!list_empty(&bo->ddestroy));
- if (bo->ttm) + if (bo->bdev->driver->move_notify) + bo->bdev->driver->move_notify(bo, NULL); + + if (bo->ttm) { + ttm_tt_unbind(bo->ttm); ttm_tt_destroy(bo->ttm); + bo->ttm = NULL; + } + ttm_bo_mem_put(bo, &bo->mem); + atomic_dec(&bo->glob->bo_count); if (bo->destroy) bo->destroy(bo); @@ -465,35 +473,6 @@ out_err: return ret; }
-/** - * Call bo::reserved. - * Will release GPU memory type usage on destruction. - * This is the place to put in driver specific hooks to release - * driver private resources. - * Will release the bo::reserved lock. - */ - -static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo) -{ - if (bo->bdev->driver->move_notify) - bo->bdev->driver->move_notify(bo, NULL); - - if (bo->ttm) { - ttm_tt_unbind(bo->ttm); - ttm_tt_destroy(bo->ttm); - bo->ttm = NULL; - } - ttm_bo_mem_put(bo, &bo->mem); - - atomic_set(&bo->reserved, 0); - - /* - * Make processes trying to reserve really pick it up. - */ - smp_mb__after_atomic_dec(); - wake_up_all(&bo->event_queue); -} - static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) { struct ttm_bo_device *bdev = bo->bdev; @@ -517,8 +496,9 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
put_count = ttm_bo_del_from_lru(bo);
+ atomic_set(&bo->reserved, 0); + wake_up_all(&bo->event_queue); spin_unlock(&glob->lru_lock); - ttm_bo_cleanup_memtype_use(bo);
ttm_bo_list_ref_sub(bo, put_count, true);
@@ -608,8 +588,9 @@ retry: list_del_init(&bo->ddestroy); ++put_count;
+ atomic_set(&bo->reserved, 0); + wake_up_all(&bo->event_queue); spin_unlock(&glob->lru_lock); - ttm_bo_cleanup_memtype_use(bo);
ttm_bo_list_ref_sub(bo, put_count, true);
======
drm/ttm: add sense to ttm_bo_cleanup_refs
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com --- drivers/gpu/drm/ttm/ttm_bo.c | 98 ++++++++++++++------------------ drivers/gpu/drm/ttm/ttm_execbuf_util.c | 1 2 files changed, 45 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 9b95e96..34d4bb1 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -295,6 +295,8 @@ int ttm_bo_reserve(struct ttm_buffer_object *bo, int put_count = 0; int ret;
+ WARN_ON(!list_empty_careful(&bo->ddestroy)); + spin_lock(&glob->lru_lock); ret = ttm_bo_reserve_locked(bo, interruptible, no_wait, use_sequence, sequence); @@ -522,14 +524,15 @@ queue: * If bo idle, remove from delayed- and lru lists, and unref. * If not idle, do nothing. * + * Must be called with lru_lock and reservation held, this function + * will drop both before returning. + * * @interruptible Any sleeps should occur interruptibly. - * @no_wait_reserve Never wait for reserve. Return -EBUSY instead. * @no_wait_gpu Never wait for gpu. Return -EBUSY instead. */
static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, bool interruptible, - bool no_wait_reserve, bool no_wait_gpu) { struct ttm_bo_device *bdev = bo->bdev; @@ -537,53 +540,45 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, struct ttm_bo_global *glob = bo->glob; int put_count; int ret = 0; - void *sync_obj; - -retry: - spin_lock(&glob->lru_lock);
- ret = ttm_bo_reserve_locked(bo, interruptible, - no_wait_reserve, false, 0); - - if (unlikely(ret != 0)) { - spin_unlock(&glob->lru_lock); - return ret; - } + ret = ttm_bo_wait(bo, false, false, true);
- if (unlikely(list_empty(&bo->ddestroy))) { + if (ret && no_wait_gpu) { atomic_set(&bo->reserved, 0); wake_up_all(&bo->event_queue); spin_unlock(&glob->lru_lock); - return 0; - } - ret = ttm_bo_wait(bo, false, false, true); - - if (ret) { - if (no_wait_gpu) { - atomic_set(&bo->reserved, 0); - wake_up_all(&bo->event_queue); - spin_unlock(&glob->lru_lock); - return ret; - } + return ret; + } else if (ret) { + void *sync_obj;
/** - * Take a reference to the fence and unreserve, if the wait - * was succesful and no new sync_obj was attached, - * ttm_bo_wait in retry will return ret = 0, and end the loop. + * Take a reference to the fence and unreserve, + * at this point the buffer should be dead, so + * no new sync objects can be attached. */ - sync_obj = driver->sync_obj_ref(&bo->sync_obj); atomic_set(&bo->reserved, 0); wake_up_all(&bo->event_queue); spin_unlock(&glob->lru_lock);
- ret = driver->sync_obj_wait(bo->sync_obj, false, interruptible); + ret = driver->sync_obj_wait(sync_obj, false, interruptible); driver->sync_obj_unref(&sync_obj); if (ret) return ret; - goto retry; + spin_lock(&glob->lru_lock); + } else { + atomic_set(&bo->reserved, 0); + wake_up_all(&bo->event_queue); + } + + if (unlikely(list_empty(&bo->ddestroy))) { + spin_unlock(&glob->lru_lock); + return 0; }
+ if (bo->sync_obj) + driver->sync_obj_unref(&bo->sync_obj); + put_count = ttm_bo_del_from_lru(bo); list_del_init(&bo->ddestroy); ++put_count; @@ -625,9 +620,12 @@ static int ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all) kref_get(&nentry->list_kref); }
- spin_unlock(&glob->lru_lock); - ret = ttm_bo_cleanup_refs(entry, false, !remove_all, - !remove_all); + ret = ttm_bo_reserve_locked(entry, false, !remove_all); + if (ret) + spin_unlock(&glob->lru_lock); + else + ret = ttm_bo_cleanup_refs(entry, false, !remove_all); + kref_put(&entry->list_kref, ttm_bo_release_list); entry = nentry;
@@ -778,18 +776,6 @@ retry: bo = list_first_entry(&man->lru, struct ttm_buffer_object, lru); kref_get(&bo->list_kref);
- if (!list_empty(&bo->ddestroy)) { - spin_unlock(&glob->lru_lock); - ret = ttm_bo_cleanup_refs(bo, interruptible, - true, no_wait_gpu); - kref_put(&bo->list_kref, ttm_bo_release_list); - - if (likely(ret == 0 || ret == -ERESTARTSYS)) - return ret; - - goto retry; - } - ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
if (unlikely(ret == -EBUSY)) { @@ -808,6 +794,12 @@ retry: goto retry; }
+ if (!list_empty(&bo->ddestroy)) { + ret = ttm_bo_cleanup_refs(bo, interruptible, no_wait_gpu); + kref_put(&bo->list_kref, ttm_bo_release_list); + return ret; + } + put_count = ttm_bo_del_from_lru(bo); spin_unlock(&glob->lru_lock);
@@ -1718,14 +1710,6 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink) struct ttm_buffer_object, swap); kref_get(&bo->list_kref);
- if (!list_empty(&bo->ddestroy)) { - spin_unlock(&glob->lru_lock); - (void) ttm_bo_cleanup_refs(bo, false, false, false); - kref_put(&bo->list_kref, ttm_bo_release_list); - spin_lock(&glob->lru_lock); - continue; - } - /** * Reserve buffer. Since we unlock while sleeping, we need * to re-check that nobody removed us from the swap-list while @@ -1741,6 +1725,12 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink) } }
+ if (!list_empty(&bo->ddestroy)) { + ret = ttm_bo_cleanup_refs(bo, false, false); + kref_put(&bo->list_kref, ttm_bo_release_list); + return ret; + } + BUG_ON(ret != 0); put_count = ttm_bo_del_from_lru(bo); spin_unlock(&glob->lru_lock); diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c index 51b5e97..3ea2457 100644 --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c @@ -153,6 +153,7 @@ retry: struct ttm_buffer_object *bo = entry->bo;
retry_this_bo: + WARN_ON(!list_empty_careful(&bo->ddestroy)); ret = ttm_bo_reserve_locked(bo, true, true, true, val_seq); switch (ret) { case 0:
On Mon, Oct 15, 2012 at 08:34:17PM +0200, Maarten Lankhorst wrote:
Op 15-10-12 17:37, Maarten Lankhorst schreef:
To make multi-object reservation work, the fix is to add a ticket "lock" of which all the reservation objects are a nested lock of. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as deadlock too. See below for some examples I was using to test.
But if a ticket lock can be used to abstract a locking sequence that we know will never deadlock, why can't we abstract locking from a list with atomic list removal with another "lock", and make sure we get the order right between them?
No, see the test below, lockdep won't be fooled by your diversions that easily!! :-)
However, if the following rules are valid, life becomes a lot easier:
- items on the ddestroy list are not allowed to have a new sync object attached, only reservations for destruction of this object are allowed.
- the only valid thing at this point left is unmapping from vm space and destruction of buffer backing in memory or vram
- All reservations outside of ttm_bo_cleanup_refs(or_queue) are forbidden. (XXX: check if an exception needed for drivers to accomplish this destruction?)
Adding a WARN_ON(!list_empty(&bo->ddestroy)); to ttm_bo_reserve and ttm_eu_reserve_buffers should be enough to catch violators of the above rules.
If those rules hold, destruction becomes a lot more straightforward. ttm_bo_swapout and ttm_mem_evict_first can both do the reserve with lru_lock held, which will be guaranteed to succeed, and the buffer is also guaranteed to be on the ddestroy list still since we haven't dropped the lock yet.
So with a reservation, lru_lock held, and entry on the ddestroy list still alive:
Do a trywait on the bo, if it returns -EBUSY, and no_wait_gpu is unset, get a reference to bo->sync_obj. unreserve.
If -EBUSY was returned and no_wait_gpu is set, unlock lru_lock and return error.
If -EBUSY and no_wait_gpu was not set, unlock lru_lock, do a wait on our copy of bo->sync_obj, and unref it, return if wait fails. If wait hasn't failed, retake lru_lock.
Check if the ddestroy entry is gone. If so, someone else was faster, return 0. If not, simply remove the bo from all lists without reservation, unref bo->sync_obj**, and perform the remaining cleanup.
As far as I can see, this wouldn't leave open a race even if 2 threads do the same, although the second thread might return 0 to signal success before the backing is gone, but this can also happen right now. It's even harmless, since in those cases the functions calling these functions would simply call them one more time, and this time the destroyed buffer would definitely not be on the swap/lru lists any more.
It would also keep current behavior the same as far as I can tell, where multiple waiters could wait for the same bo to be destroyed.
**) Yes strictly speaking a violation of fence rules, but in this case the buffer already dropped to 0 references, and there's a BUG_ON in ttm_bo_release_list that would get triggered otherwise. Can alternatively be fixed by doing a BUG_ON in ttm_bo_release_list if there is a sync_obj that's not in the signaled state.
Attached 3 followup patches to show what I mean, they're based on my tree, which means with all patches applied that I posted on friday. This is not thoroughly tested, but I think it should work.
First is fixing radeon not to crash on calling move_notify from release_list. Second moves the memory cleanup to ttm_bo_cleanup_memtype_use, which is more readable and a more logical place to put it. Third cleans up how ttm_bo_cleanup_refs is called.
Also available on http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
If this looks ok I'll send those below out when the rest of the patches I posted on friday are reviewed. :-)
The bo_va list is protected by the reservation. I am not against removing the check, as it should be ok in the cleanup path to not hold the reservation, assuming no other thread will call into radeon with this same bo.
Cheers, Jerome Glisse
(snip, patches quoted in full above)
On 10/15/2012 05:37 PM, Maarten Lankhorst wrote:
Op 15-10-12 14:27, Thomas Hellstrom schreef:
On 10/12/2012 12:09 PM, Maarten Lankhorst wrote:
Op 12-10-12 07:57, Thomas Hellstrom schreef:
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
Op 11-10-12 21:26, Thomas Hellstrom schreef:
On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
>> Anyway, if you plan to remove the fence lock and protect it with reserve, you must >> make sure that a waiting reserve is never done in a destruction path. I think this >> mostly concerns the nvidia driver. > Well I don't think any lock should ever be held during destruction time, What I mean is, that *something* needs to protect the fence pointer. Currently it's the fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor Nvidia should, when a resource is about to be freed, be forced to *block* waiting for reserve just to access the fence pointer. When and if you have a solution that fulfills those requirements, I'm ready to review it.
It's not blocking, cleanup_refs_or_queue will toss it on the deferred list if reservation fails, behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned about the Nvidia case where IIRC the wait was called both with and without reservation.
>>> - no_wait_reserve is ignored if no_wait_gpu is false >>> ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but >>> subsequently it will do a wait_unreserved if no_wait_gpu is false. >>> I'm planning on removing this argument and act like it is always true, since >>> nothing on the lru list should fail to reserve currently. >> Yes, since all buffers that are reserved are removed from the LRU list, there >> should never be a waiting reserve on them, so no_wait_reserve can be removed >> from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain. > I suppose there will stay a small race though, Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't get reserved away from under you.
Yes, that code isn't fully correct, it's missing a check for still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve given that the buffer *IS* on the ddestroy list is if the current reserver returned early because someone started an accelerated eviction which can't happen currently. The code needs fixing up though.
>>> - effectively unlimited callchain between some functions that all go through >>> ttm_mem_evict_first: >>> >>> /------------------------\ >>> ttm_mem_evict_first - ttm_bo_evict - -ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first >>> \ ttm_bo_handle_move_mem / >>> I'm not surprised that there was a deadlock before, it seems to me it would >>> be pretty suicidal to ever do a blocking reserve on any of those lists, >>> lockdep would be all over you for this. >> Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth >> and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. >> What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted >> to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be >> a BUG. >> >> But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. >> But there should be no waiting reserves in the eviction path currently. > Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. > Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, > since it seems all the callers of this function assume that ttm_mem_evict_first can only fail > if there is really nothing more to free and blocking nested would really upset lockdep > and leave you open to the same deadlocks. I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, because the buffer about to be reserved is always *last* in a reservation sequence, and the reservation is always released (or the buffer destroyed) before trying to reserve another buffer. Technically the buffer is not looked up from a LRU list but from the delayed delete list. Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it would not be a deadlock is if you know the exact semantics of why.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
Lockdep works on classes of lock, not necessarily individual locks.
Doing 2 bo_reserve's of any bo would count as possible deadlock, no matter if you always take them in a certain order or not.
So you mean that if I bo_reserve A and then bo_reserve B (which is used only when binding A to the GPU), lockdep will complain even if nobody ever bo_reserves B before A? That will make it impossible to use BOs as page tables for GPU binding for example.
As far as I can tell nobody does it like that: page tables are simply initialized on channel creation, pinned in memory and kept like that, while the host serializes with its own vm locking.
I don't think the fact that nobody's using a feature yet is a valid argument to say it will never be used. With that argument a lot of code could go away. Including reservation objects... ;)
Reserving multiple buffer objects in a pre-determined order is a perfectly valid thing to do. For example an execbuf implementation could ditch the ticketed reserve and instead do a quick sort of the buffer objects in pointer value order. And FWIW vmware have a couple of patches pending for a future "hardware" revision that implement GPU bind using buffer objects for page tables; the code becomes neat and we can use the delayed delete mechanism to avoid stalling at unbind time.
So from my pov, if lockdep fails to handle that situation, the lockdep implementation is incomplete.
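A sketch of that pointer-order alternative (illustration only: reserve_in_pointer_order() and the bos array are made up for the example; ttm_bo_reserve()/ttm_bo_unreserve() and sort() are the existing kernel interfaces):

        #include <linux/sort.h>

        static int cmp_bo_ptr(const void *a, const void *b)
        {
                const struct ttm_buffer_object *x = *(const struct ttm_buffer_object **)a;
                const struct ttm_buffer_object *y = *(const struct ttm_buffer_object **)b;

                return x < y ? -1 : (x > y ? 1 : 0);
        }

        /* Sort by pointer value, then reserve in that fixed order, so no two
         * tasks can ever reserve the same pair of buffers in opposite order. */
        static int reserve_in_pointer_order(struct ttm_buffer_object **bos, int n)
        {
                int i, ret;

                sort(bos, n, sizeof(*bos), cmp_bo_ptr, NULL);
                for (i = 0; i < n; i++) {
                        ret = ttm_bo_reserve(bos[i], true, false, false, 0);
                        if (ret) {
                                while (--i >= 0)
                                        ttm_bo_unreserve(bos[i]);
                                return ret;
                        }
                }
                return 0;
        }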
To make multi-object reservation work, the fix is to add a ticket "lock" of which all the reservation objects are a nested lock of. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as deadlock too. See below for some examples I was using to test.
But if a ticket lock can be used to abstract a locking sequence that we know will never deadlock, why can't we abstract locking from a list with atomic list removal with another "lock", and make sure we get the order right between them?
No, see the test below, lockdep won't be fooled by your diversions that easily!! :-)
It's not a diversion, it's an attempt to abstract a valid locking scenario.
I think you got me wrong. In the analogy of mutex_lock_nested() (see http://www.mjmwired.net/kernel/Documentation/lockdep-design.txt, line 153):

enum reservation_class { RES_TICKET, RES_LRU, RES_DDESTROY };
/Thomas
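For reference, the annotation this analogy points at works roughly as below. The struct and function are made up for illustration; only mutex_lock_nested() and SINGLE_DEPTH_NESTING are existing kernel API, and a reservation counterpart carrying such a reservation_class does not exist today:

        #include <linux/mutex.h>

        struct node {
                struct mutex lock;
                struct node *child;
        };

        /* lockdep normally flags taking two locks of the same class; annotating
         * the inner acquisition with a different subclass tells it the nesting
         * is intended and safe.  An enum reservation_class could play the same
         * role for bo reservations. */
        static void lock_parent_and_child(struct node *parent)
        {
                mutex_lock(&parent->lock);
                mutex_lock_nested(&parent->child->lock, SINGLE_DEPTH_NESTING);

                /* ... operate on both ... */

                mutex_unlock(&parent->child->lock);
                mutex_unlock(&parent->lock);
        }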
Op 15-10-12 20:40, Thomas Hellstrom schreef:
On 10/15/2012 05:37 PM, Maarten Lankhorst wrote:
Op 15-10-12 14:27, Thomas Hellstrom schreef:
On 10/12/2012 12:09 PM, Maarten Lankhorst wrote:
Op 12-10-12 07:57, Thomas Hellstrom schreef:
On 10/11/2012 10:55 PM, Maarten Lankhorst wrote:
Op 11-10-12 21:26, Thomas Hellstrom schreef: > On 10/11/2012 08:42 PM, Maarten Lankhorst wrote: > >>> Anyway, if you plan to remove the fence lock and protect it with reserve, you must >>> make sure that a waiting reserve is never done in a destruction path. I think this >>> mostly concerns the nvidia driver. >> Well I don't think any lock should ever be held during destruction time, > What I mean is, that *something* needs to protect the fence pointer. Currently it's the > fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor > Nvidia should, when a resource is about to be freed, be forced to *block* waiting for > reserve just to access the fence pointer. When and if you have a solution that fulfills > those requirements, I'm ready to review it. It's not blocking, cleanup_refs_or_queue will toss it on the deferred list if reservation fails, behavior doesn't change just because I changed the order around.
Well, I haven't looked into the code in detail yet. If you say it's non-blocking I believe you. I was actually more concerned abut the Nvidia case where IIRC the wait was called both with and without reservation.
>>>> - no_wait_reserve is ignored if no_wait_gpu is false >>>> ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but >>>> subsequently it will do a wait_unreserved if no_wait_gpu is false. >>>> I'm planning on removing this argument and act like it is always true, since >>>> nothing on the lru list should fail to reserve currently. >>> Yes, since all buffers that are reserved are removed from the LRU list, there >>> should never be a waiting reserve on them, so no_wait_reserve can be removed >>> from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain. >> I suppose there will stay a small race though, > Hmm, where? When you enter the ddestroy path, you drop the lock and hope the buffer doesn't reserved away from under you.
Yes, that code isn't fully correct, it's missing a check for still on ddestroy after a waiting reserve. However, the only chance of a waiting reserve given that the buffer *IS* on the ddestroy list is if the current reserver returned early because someone started an accelerated eviction which can't happen currently. The code needs fixing up though.
>>>> - effectively unlimited callchain between some functions that all go through >>>> ttm_mem_evict_first: >>>> >>>> /------------------------\ >>>> ttm_mem_evict_first - ttm_bo_evict - -ttm_bo_mem_space - ttm_bo_mem_force_space - ttm_mem_evict_first >>>> \ ttm_bo_handle_move_mem / >>>> I'm not surprised that there was a deadlock before, it seems to me it would >>>> be pretty suicidal to ever do a blocking reserve on any of those lists, >>>> lockdep would be all over you for this. >>> Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth >>> and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one. >>> What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted >>> to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be >>> a BUG. >>> >>> But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock. >>> But there should be no waiting reserves in the eviction path currently. >> Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve. >> Fixing this might mean that ttm_mem_evict_first may need to become more aggressive, >> since it seems all the callers of this function assume that ttm_mem_evict_first can only fail >> if there is really nothing more to free and blocking nested would really upset lockdep >> and leave you open to the same deadlocks. > I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock, > because the buffer about to be reserved is always *last* in a reservation sequence, and the > reservation is always released (or the buffer destroyed) before trying to reserve another buffer. > Technically the buffer is not looked up from a LRU list but from the delayed delete list. > Could you describe such a deadlock case? The only interesting case for this is ttm_mem_evict_first, and while it may not technically be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it would not be a deadlock is if you know the exact semantics of why.
Interesting. I guess that must be because of the previous reservation history for that buffer? Let's say we were to reinitialize the lockdep history for the reservation object when it was put on the ddestroy list, I assume lockdep would keep quiet, because there are never any other bo reservations while such a buffer is reserved?
Lockdep works on classes of lock, not necessarily individual locks.
Doing 2 bo_reserve's of any bo would count as possible deadlock, no matter if you always take them in a certain order or not.
So you mean that if I bo_reserve A and then bo_reserve B (which is used only when binding A to the GPU), lockdep will complain even if nobody ever bo_reserves B before A? That will make it impossible to use BOs as page tables for GPU binding for example.
As far as I can tell nobody does it like that: page tables are simply initialized on channel creation, pinned in memory and kept like that, while the host serializes with its own vm locking.
I don't think the fact that nobody's using a feature yet is a valid argument to say it will never be used. With that argument a lot of code could go away. Including reservation objects... ;)
Reserving multiple buffer objects in a pre-determined order is a perfectly valid thing to do. For example an execbuf implementation could ditch the ticketed reserve and instead do a quick sort of the buffer objects in pointer value order. And FWIW vmware have a couple of patches pending for a future "hardware" revision that implement GPU bind using buffer objects for page tables; the code becomes neat and we can use the delayed delete mechanism to avoid stalling at unbind time. So from my pow, if lockdep fails to handle that situation, the lockdep implementation is incomplete.
Well lockdep can only distinguish between classes. It would be possible to add a type of reservation_ticket that is basically identical to ttm_bo_reserve with no ticket now, except lockdep won't be able to warn you if you shoot yourself in the foot while doing it.
This is unfortunately a limitation of lockdep, since it can only warn about combinations of lock classes/subclasses. As such, ttm objects A and B look identical to lockdep; this is why you need the reservation_ticket dance, and it is also the only reason why lockdep is not even slower than it already is.
However it won't be entirely useless; it will still warn about everything else that can go wrong. The only check that will be disabled is the one where you mess up the reservation order in your execbuffer implementation, so if that is the only place where you want to do those things it wouldn't be hard to debug anyway.
To make multi-object reservation work, the fix is to add a ticket "lock" of which all the reservation objects are a nested lock of. Since in this case the ticket lock would prevent deadlocks, this is acceptable. Having 2 ticket 'locks' at the same time would count as deadlock, rightfully. If you hold a reservation from a ticket, then try to reserve without a ticket, it counts as deadlock too. See below for some examples I was using to test.
But if a ticket lock can be used to abstract a locking sequence that we know will never deadlock, why can't we abstract locking from a list with atomic list removal with another "lock", and make sure we get the order right between them?
No, see the test below, lockdep won't be fooled by your diversions that easily!! :-)
It's not a diversion, it's an attempt to abstract a valid locking scenario.
I think you got me wrong. In the analogy of mutex_lock_nested(), (see http://www.mjmwired.net/kernel/Documentation/lockdep-design.txt
Oh that, basically you create a new subclass, but it's not needed for core ttm, and won't help you with trying to do the a, b, c reservations in fixed order, unless you want to give each buffer its own class. This is inadvisable though: it will render a lot of lockdep checks useless, run you against the 80 lock limit of lockdep on multi object reservations and consume a whole lot of extra memory for all other scenarios, so please don't. Especially not after all the time I just spent getting rid of any dangerous blocking reservation in ttm_bo.c in the first place! :-)
btw what is ttm_bo_fbdev_io for? Just noticed that when I checked if I missed any dangerous ttm_bo_reserve, but it doesn't seem to be used anywhere.
~Maarten