On 07.05.2012 17:23, Jerome Glisse wrote:
On Mon, May 7, 2012 at 7:42 AM, Christian Königdeathsimple@vodafone.de wrote:
A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen.
We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU.
If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate.
If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete.
v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary.
Signed-off-by: Christian Königdeathsimple@vodafone.de Signed-off-by: Jerome Glissejglisse@redhat.com
This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change:
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Nope that won't work correctly, "last" is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the "prev" of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer.
What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand).
Your patch here can enter in infinite loop and never return holding the lock. See below.
[SNIP]
- } while (radeon_sa_bo_next_hole(sa_manager, fences));
Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes.
Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely).
Cheers, Christian.
I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above.
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Cheers, Jerome
On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse j.glisse@gmail.com wrote:
On 07.05.2012 17:23, Jerome Glisse wrote:
On Mon, May 7, 2012 at 7:42 AM, Christian Königdeathsimple@vodafone.de wrote:
A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen.
We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU.
If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate.
If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete.
v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary.
Signed-off-by: Christian Königdeathsimple@vodafone.de Signed-off-by: Jerome Glissejglisse@redhat.com
This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change:
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Nope that won't work correctly, "last" is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the "prev" of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer.
What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand).
Your patch here can enter in infinite loop and never return holding the lock. See below.
[SNIP]
- } while (radeon_sa_bo_next_hole(sa_manager, fences));
Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes.
Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely).
Cheers, Christian.
I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above.
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Cheers, Jerome
Of course by tracking oldest it defeat the algo so updated patch : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Just fix the corner case of list of single entry.
Cheers, Jerome
On 07.05.2012 20:52, Jerome Glisse wrote:
On Mon, May 7, 2012 at 1:59 PM, Jerome Glissej.glisse@gmail.com wrote:
On 07.05.2012 17:23, Jerome Glisse wrote:
On Mon, May 7, 2012 at 7:42 AM, Christian Königdeathsimple@vodafone.de wrote:
A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen.
We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU.
If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate.
If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete.
v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary.
Signed-off-by: Christian Königdeathsimple@vodafone.de Signed-off-by: Jerome Glissejglisse@redhat.com
This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change:
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Nope that won't work correctly, "last" is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the "prev" of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer.
What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand).
Your patch here can enter in infinite loop and never return holding the lock. See below.
[SNIP]
} while (radeon_sa_bo_next_hole(sa_manager, fences));
Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes.
Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely).
Cheers, Christian.
I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above.
No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of the flist, so it only considers holes that have a signaled fence and so can be freed.
Having multiple threads allocate objects that can't be freed yet will just result in empty flists, and so radeon_sa_bo_next_hole will return false, resulting in calling radeon_fence_wait_any with an empty fence list, which in turn will result in an ENOENT and abortion of allocation (ok maybe we should catch that and return -ENOMEM instead).
So even the corner cases should now be handled fine.
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Cheers, Jerome
Of course by tracking oldest it defeat the algo so updated patch : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Just fix the corner case of list of single entry.
That still won't work correctly, cause the corner case isn't that there is just one allocation left on the list, the corner case is that we need to be able to allocate something before the first sa_bo, just consider the following with your current implementation:
B F F F F 1 2 3 4 F E
B is the beginning of the buffer. F is free space. 1,2,3,4 are allocations. E is the end of the buffer.
So lets say that we have an allocation that won't fit in the free space between "4" and "E", now even if if radeon_sa_next_hole sets "last" to 1, we aren't able to allocate anything at the beginning of the buffer...
Christian.
On Mon, May 7, 2012 at 4:38 PM, Christian König deathsimple@vodafone.de wrote:
On 07.05.2012 20:52, Jerome Glisse wrote:
On Mon, May 7, 2012 at 1:59 PM, Jerome Glissej.glisse@gmail.com wrote:
On 07.05.2012 17:23, Jerome Glisse wrote:
On Mon, May 7, 2012 at 7:42 AM, Christian Königdeathsimple@vodafone.de wrote:
A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen.
We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU.
If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate.
If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete.
v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary.
Signed-off-by: Christian Königdeathsimple@vodafone.de Signed-off-by: Jerome Glissejglisse@redhat.com
This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change:
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Nope that won't work correctly, "last" is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the "prev" of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer.
What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand).
Your patch here can enter in infinite loop and never return holding the lock. See below.
[SNIP]
- } while (radeon_sa_bo_next_hole(sa_manager, fences));
Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes.
Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely).
Cheers, Christian.
I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above.
No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of the flist, so it only considers holes that have a signaled fence and so can be freed.
Having multiple threads allocate objects that can't be freed yet will just result in empty flists, and so radeon_sa_bo_next_hole will return false, resulting in calling radeon_fence_wait_any with an empty fence list, which in turn will result in an ENOENT and abortion of allocation (ok maybe we should catch that and return -ENOMEM instead).
So even the corner cases should now be handled fine.
No, there is still infinite loop possible with gpu lockup, i am against the while (next_hole) using for on 2 iteration looks a lot better and it avoids sa allocator possibly looping too much (because it can loop a lot more than RADEON_NUM_RINGS, the maximum number of loop is sa_manager->size/4).
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Cheers, Jerome
Of course by tracking oldest it defeat the algo so updated patch :
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
Just fix the corner case of list of single entry.
That still won't work correctly, cause the corner case isn't that there is just one allocation left on the list, the corner case is that we need to be able to allocate something before the first sa_bo, just consider the following with your current implementation:
B F F F F 1 2 3 4 F E
B is the beginning of the buffer. F is free space. 1,2,3,4 are allocations. E is the end of the buffer.
So lets say that we have an allocation that won't fit in the free space between "4" and "E", now even if if radeon_sa_next_hole sets "last" to 1, we aren't able to allocate anything at the beginning of the buffer...
Christian.
Yes that isn't handled
Cheers, Jerome
On 07.05.2012 23:28, Jerome Glisse wrote:
On Mon, May 7, 2012 at 4:38 PM, Christian Königdeathsimple@vodafone.de wrote:
On 07.05.2012 20:52, Jerome Glisse wrote:
On Mon, May 7, 2012 at 1:59 PM, Jerome Glissej.glisse@gmail.com wrote:
On 07.05.2012 17:23, Jerome Glisse wrote:
Your patch here can enter in infinite loop and never return holding the lock. See below.
[SNIP]
> + } while (radeon_sa_bo_next_hole(sa_manager, fences)); Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes.
Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely).
Cheers, Christian.
I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above.
No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of the flist, so it only considers holes that have a signaled fence and so can be freed.
Having multiple threads allocate objects that can't be freed yet will just result in empty flists, and so radeon_sa_bo_next_hole will return false, resulting in calling radeon_fence_wait_any with an empty fence list, which in turn will result in an ENOENT and abortion of allocation (ok maybe we should catch that and return -ENOMEM instead).
So even the corner cases should now be handled fine.
No, there is still infinite loop possible with gpu lockup, i am against the while (next_hole) using for on 2 iteration looks a lot better and it avoids sa allocator possibly looping too much (because it can loop a lot more than RADEON_NUM_RINGS, the maximum number of loop is sa_manager->size/4).
I'm still pretty sure that there isn't the possibility for an infinite loop, so please explain further where exactly the problem is. radeon_sa_bo_next_hole will return true as long as it can find AND remove an allocation with an already signaled fence, and since nobody else can add allocations while we are in the loop we sooner or later run out of allocations and so the loop ends.
Also what the loop does is just cleaning up all the already signaled allocations, and it doesn't matter if there is one allocation or a million, we need to clean them up anyway. So aborting the loop and trying to wait for anything to be signaled makes no sense at all, and aborting the whole allocation at this point makes even less sense, cause that only delays the work that needs to be done anyway (freeing the allocations) to a later call to radeon_sa_bo_new.
What could make sense is limiting how often we are waiting for some fences, since while waiting we release the lock and then other processes can jump in and grab what we wanted to have by waiting for something to happen.
Cheers, Christian.
On Tue, May 8, 2012 at 6:23 AM, Christian König deathsimple@vodafone.de wrote:
On 07.05.2012 23:28, Jerome Glisse wrote:
On Mon, May 7, 2012 at 4:38 PM, Christian Königdeathsimple@vodafone.de wrote:
On 07.05.2012 20:52, Jerome Glisse wrote:
On Mon, May 7, 2012 at 1:59 PM, Jerome Glissej.glisse@gmail.com wrote:
On 07.05.2012 17:23, Jerome Glisse wrote: > > Your patch here can enter in infinite loop and never return holding > the lock. See below. > > [SNIP] > >> + } while (radeon_sa_bo_next_hole(sa_manager, >> fences)); > > Here you can infinite loop, in the case there is a bunch of hole in > the allocator but none of them allow to full fill the allocation. > radeon_sa_bo_next_hole will keep returning true looping over and over > on all the all. That's why i only restrict my patch to 2 hole > skeeping > and then fails the allocation or try to wait. I believe sadly we need > an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes.
Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely).
Cheers, Christian.
I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above.
No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of the flist, so it only considers holes that have a signaled fence and so can be freed.
Having multiple threads allocate objects that can't be freed yet will just result in empty flists, and so radeon_sa_bo_next_hole will return false, resulting in calling radeon_fence_wait_any with an empty fence list, which in turn will result in an ENOENT and abortion of allocation (ok maybe we should catch that and return -ENOMEM instead).
So even the corner cases should now be handled fine.
No, there is still infinite loop possible with gpu lockup, i am against the while (next_hole) using for on 2 iteration looks a lot better and it avoids sa allocator possibly looping too much (because it can loop a lot more than RADEON_NUM_RINGS, the maximum number of loop is sa_manager->size/4).
I'm still pretty sure that there isn't the possibility for an infinite loop, so please explain further where exactly the problem is. radeon_sa_bo_next_hole will return true as long as it can find AND remove an allocation with an already signaled fence, and since nobody else can add allocations while we are in the loop we sooner or later run out of allocations and so the loop ends.
Yeah you right
Also what the loop does is just cleaning up all the already signaled allocations, and it doesn't matter if there is one allocation or a million, we need to clean them up anyway. So aborting the loop and trying to wait for anything to be signaled makes no sense at all, and aborting the whole allocation at this point makes even less sense, cause that only delays the work that needs to be done anyway (freeing the allocations) to a later call to radeon_sa_bo_new.
What could make sense is limiting how often we are waiting for some fences, since while waiting we release the lock and then other processes can jump in and grab what we wanted to have by waiting for something to happen.
Cheers, Christian.
Still i don't want to loop more than necessary, it's bad, i am ok with : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
If there is fence signaled it will retry 2 times at most, otherwise it will go to wait and that way better. Because with your while loop the worst case is something proportional to the manager size given it's 1Mo it can loop for a long long time.
Cheers, Jerome
On 08.05.2012 16:55, Jerome Glisse wrote:
Still i don't want to loop more than necessary, it's bad, i am ok with : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-a...
If there is fence signaled it will retry 2 times at most, otherwise it will go to wait and that way better. Because with your while loop the worst case is something proportional to the manager size given it's 1Mo it can loop for a long long time.
Yeah, this loop can indeed consume quite some time. Ok then let's at least give every ring a chance to supply some holes, otherwise I fear that we might not even found something worth to wait for after only 2 tries.
Going to send that out after figuring out why the patchset still causes texture corruptions on my system.
Christian.
dri-devel@lists.freedesktop.org