From: Michel Dänzer <michel.daenzer@amd.com>
I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but AFAICT it does.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 9c92976..d23b0dd 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -179,9 +179,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * improve fragmentation quality.
 	 * 512kb was measured as the most optimal number.
 	 */
-	if (!((rbo->flags & RADEON_GEM_CPU_ACCESS) &&
-	      (rbo->placements[i].flags & TTM_PL_FLAG_VRAM)) &&
-	    rbo->tbo.mem.size > 512 * 1024) {
+	if (rbo->tbo.mem.size > 512 * 1024) {
 		for (i = 0; i < c; i++) {
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
 		}
From: Michel Dänzer <michel.daenzer@amd.com>
If the BO should be placed at the top of the area, we should start looking for holes from the top.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 964387f..1e93f6c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
@@ -67,15 +68,16 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	if (!node)
 		return -ENOMEM;
 
-	if (place->flags & TTM_PL_FLAG_TOPDOWN)
+	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
+		sflags |= DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
+	}
 
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 						  mem->page_alignment, 0,
 						  place->fpfn, lpfn,
-						  DRM_MM_SEARCH_BEST,
-						  aflags);
+						  sflags, aflags);
 	spin_unlock(&rman->lock);
 
 	if (unlikely(ret)) {
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 1e93f6c..aa0bd05 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -69,7 +69,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 		return -ENOMEM;
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
-		sflags |= DRM_MM_SEARCH_BELOW;
+		sflags = DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
 	}
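[To make the difference between the two search flags concrete, here is a small userspace model. The hole list and sizes are invented for illustration; drm_mm actually walks its hole stack rather than a sorted array, but the effect the patch cares about is the same.]

/* Toy userspace model of the two search strategies. Not drm_mm code. */
#include <stdio.h>

struct hole { unsigned long start, size; };

/* DRM_MM_SEARCH_BEST-style: the smallest hole that fits, wherever it is. */
static const struct hole *search_best(const struct hole *h, int n,
				      unsigned long size)
{
	const struct hole *best = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (h[i].size >= size && (!best || h[i].size < best->size))
			best = &h[i];
	return best;
}

/* DRM_MM_SEARCH_BELOW-style: the first fitting hole, scanning from the top. */
static const struct hole *search_below(const struct hole *h, int n,
				       unsigned long size)
{
	int i;

	for (i = n - 1; i >= 0; i--)	/* h[] is sorted by start address */
		if (h[i].size >= size)
			return &h[i];
	return NULL;
}

int main(void)
{
	/* The snuggest fit sits at the very bottom of the address space. */
	const struct hole holes[] = { { 0, 8 }, { 64, 32 }, { 192, 16 } };

	printf("best fit: hole at %lu\n", search_best(holes, 3, 8)->start);
	printf("below:    hole at %lu\n", search_below(holes, 3, 8)->start);
	return 0;
}

[Best fit returns the hole at 0, the very bottom of the area, even though the BO was flagged TTM_PL_FLAG_TOPDOWN; the below-search returns the hole at 192.]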
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
Cheers, Daniel
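[For reference, a condensed sketch of the scan pattern Daniel describes, modeled on how i915's eviction code (i915_gem_evict_something) used the drm_mm scan helpers around this time. The function name evict_for_hole is made up, the vma/list names are i915's, and a TTM version would need its own lists and locking — which is exactly the question Thomas raises below.]

static int evict_for_hole(struct i915_address_space *vm,
			  unsigned long size, unsigned alignment)
{
	struct i915_vma *vma, *next;
	LIST_HEAD(unwind_list);
	LIST_HEAD(eviction_list);
	bool found = false;

	drm_mm_init_scan(&vm->mm, size, alignment, 0);

	/* Walk the LRU, adding nodes to the scan until they add up to a hole. */
	list_for_each_entry(vma, &vm->inactive_list, mm_list) {
		list_add(&vma->exec_list, &unwind_list);
		if (drm_mm_scan_add_block(&vma->node)) {
			found = true;	/* big enough hole assembled */
			break;
		}
	}

	/*
	 * Every scanned node must be removed again, in reverse order of
	 * addition (unwind_list is LIFO). drm_mm_scan_remove_block()
	 * returns true for the nodes that actually make up the hole.
	 */
	list_for_each_entry_safe(vma, next, &unwind_list, exec_list) {
		if (drm_mm_scan_remove_block(&vma->node) && found)
			list_move(&vma->exec_list, &eviction_list);
		else
			list_del_init(&vma->exec_list);
	}
	if (!found)
		return -ENOSPC;

	/* Evict only the buffers backing the hole, not the whole LRU. */
	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
		list_del_init(&vma->exec_list);
		i915_vma_unbind(vma);
	}
	return 0;
}

[The key property is that only the buffers whose nodes form the hole end up on eviction_list; the rest of the LRU is left alone.]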
On 28.10.2014 20:10, Daniel Vetter wrote:
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Sent out a new patch 4 which nukes it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
I recently made some changes in TTM and radeon to alleviate that particular issue, but the scan roster does sound useful. Unfortunately, modifying TTM to use it is a bit too involved for me right now. Any volunteers?
On 10/28/2014 12:10 PM, Daniel Vetter wrote:
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
Cheers, Daniel
And I think I've commented each time why that is currently not possible, or at least not worthwhile, with TTM ;).
Although the idea is great, even if you make a nice list of suitable buffers to evict, as soon as you drop the LRU spinlock, that list is invalid. Worse, while you start evicting your list, other processes may steal part of the hole you've created.
At some point we should perhaps figure out how to protect each managed area with a separate mutex, but that would create trouble if, for example, evicting from VRAM to managed GART.
/Thomas
On Thu, Oct 30, 2014 at 04:07:05PM +0100, Thomas Hellstrom wrote:
On 10/28/2014 12:10 PM, Daniel Vetter wrote:
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
Cheers, Daniel
And I think I've commented each time why that is currently not possible, or at least not worthwhile, with TTM ;).
Although the idea is great, even if you make a nice list of suitable buffers to evict, as soon as you drop the LRU spinlock, that list is invalid. Worse, while you start evicting your list, other processes may steal part of the hole you've created.
At some point we should perhaps figure out how to protect each managed area with a separate mutex, but that would create trouble if, for example, evicting from VRAM to managed GART.
Yeah, you need to do the entire scanning/evicting under the lru lock, which means you need to jump over all the buffers you can't immediately trylock. Which also means that you need a slowpath fallback that rips out everything in case the buffer doesn't fit anywhere. Wrt the lru lock you need to at least override the generic placement/eviction code, since you need a bit more logic and state-keeping for drm_mm than what's currently exposed in the super-generic ttm managed area allocator (which apparently could also manage an idr). But it shouldn't be more fuss than a new vfunc with the current code as the default implementation.
For lru lock recursion (due to vram->gtt eviction or similar) I guess we either rework stuff to allow the entire eviction process under the lru lock, or keep the drm_mm allocations on some separate list until you've evicted everything (dropping the lock meanwhile) and then mass-free the entire list under the lru lock and immediately do your allocation.
I'm not sure whether the long hold times for the lru lock really would be a problem (they'll happen even when you drop it for actual eviction since walking the lru lists is costly when you jump over lots of buffers that don't get you towards your goal), since evicting all buffers is probably going to thrash the system much worse anyway. And for the worst case a regular sched for lru lock contention or preemption should prevent interactivity disasters for parallel cmd submission.
So yeah, some details to take care of, but I don't think there's anything insurmountable. It's just that I don't have time (nor justification) for it. And it will indeed be quite a bit of work to get going. -Daniel
From: Michel Dänzer <michel.daenzer@amd.com>
For placing a BO strictly from the bottom up, i.e. in the lowest hole which can hold it.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 5 +++++
 include/drm/ttm/ttm_placement.h      | 5 ++++-
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index aa0bd05..a64397b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -73,6 +73,11 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 		aflags = DRM_MM_CREATE_TOP;
 	}
 
+	if (place->flags & TTM_PL_FLAG_BOTTOMUP) {
+		sflags = DRM_MM_SEARCH_DEFAULT;
+		aflags = DRM_MM_CREATE_DEFAULT;
+	}
+
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 						  mem->page_alignment, 0,
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index 8ed44f9..c9ae86a 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -66,7 +66,9 @@
  * TTM_PL_FLAG_NO_EVICT means that the buffer may never
  * be evicted to make room for other buffers.
  * TTM_PL_FLAG_TOPDOWN requests to be placed from the
- * top of the memory area, instead of the bottom.
+ * top of the memory area.
+ * TTM_PL_FLAG_BOTTOMUP requests to be placed from the
+ * bottom of the memory area.
  */
 
 #define TTM_PL_FLAG_CACHED      (1 << 16)
@@ -75,6 +77,7 @@
 #define TTM_PL_FLAG_SHARED      (1 << 20)
 #define TTM_PL_FLAG_NO_EVICT    (1 << 21)
 #define TTM_PL_FLAG_TOPDOWN     (1 << 22)
+#define TTM_PL_FLAG_BOTTOMUP    (1 << 23)
 
 #define TTM_PL_MASK_CACHING     (TTM_PL_FLAG_CACHED | \
 				 TTM_PL_FLAG_UNCACHED | \
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
This patch replaces the previous patches 4 and 5.
Not sure about the function header comment hunk, maybe what it says is true even with DRM_MM_SEARCH_BELOW?
 drivers/gpu/drm/drm_mm.c             | 36 +++++-------------------------------
 drivers/gpu/drm/ttm/ttm_bo_manager.c |  2 +-
 include/drm/drm_mm.h                 |  3 +--
 3 files changed, 7 insertions(+), 34 deletions(-)
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 04a209e..3e0ef8c 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -409,16 +409,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 					       enum drm_mm_search_flags flags)
 {
 	struct drm_mm_node *entry;
-	struct drm_mm_node *best;
 	unsigned long adj_start;
 	unsigned long adj_end;
-	unsigned long best_size;
 
 	BUG_ON(mm->scanned_blocks);
 
-	best = NULL;
-	best_size = ~0UL;
-
 	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
 			       flags & DRM_MM_SEARCH_BELOW) {
 		unsigned long hole_size = adj_end - adj_start;
@@ -429,19 +424,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 			continue;
 		}
 
-		if (!check_free_hole(adj_start, adj_end, size, alignment))
-			continue;
-
-		if (!(flags & DRM_MM_SEARCH_BEST))
+		if (check_free_hole(adj_start, adj_end, size, alignment))
 			return entry;
-
-		if (hole_size < best_size) {
-			best = entry;
-			best_size = hole_size;
-		}
 	}
 
-	return best;
+	return NULL;
 }
 
 static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
@@ -453,16 +440,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 					       enum drm_mm_search_flags flags)
 {
 	struct drm_mm_node *entry;
-	struct drm_mm_node *best;
 	unsigned long adj_start;
 	unsigned long adj_end;
-	unsigned long best_size;
 
 	BUG_ON(mm->scanned_blocks);
 
-	best = NULL;
-	best_size = ~0UL;
-
 	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
 			       flags & DRM_MM_SEARCH_BELOW) {
 		unsigned long hole_size = adj_end - adj_start;
@@ -478,19 +460,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 			continue;
 		}
 
-		if (!check_free_hole(adj_start, adj_end, size, alignment))
-			continue;
-
-		if (!(flags & DRM_MM_SEARCH_BEST))
+		if (check_free_hole(adj_start, adj_end, size, alignment))
 			return entry;
-
-		if (hole_size < best_size) {
-			best = entry;
-			best_size = hole_size;
-		}
 	}
 
-	return best;
+	return NULL;
 }
 
 /**
@@ -679,7 +653,7 @@ EXPORT_SYMBOL(drm_mm_scan_add_block);
  * corrupted.
  *
  * When the scan list is empty, the selected memory nodes can be freed. An
- * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BEST will then
+ * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BELOW will then
  * return the just freed block (because its at the top of the free_stack list).
 *
 * Returns:
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index aa0bd05..42c2aa9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,7 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
-	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_DEFAULT;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index a24addf..f38f7a0 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -46,8 +46,7 @@
 
 enum drm_mm_search_flags {
 	DRM_MM_SEARCH_DEFAULT =		0,
-	DRM_MM_SEARCH_BEST =		1 << 0,
-	DRM_MM_SEARCH_BELOW =		1 << 1,
+	DRM_MM_SEARCH_BELOW =		1 << 0,
 };
 
 enum drm_mm_allocator_flags {
On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Applied this series to my -next tree.
Thanks,
Alex
On 01.11.2014 03:10, Alex Deucher wrote:
On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Applied this series to my -next tree.
Thanks. Unfortunately, the removal of DRM_MM_SEARCH_BEST caused performance regressions, presumably due to excessive eviction from VRAM (see https://bugs.freedesktop.org/show_bug.cgi?id=84662#c49). I can reproduce a significant negative effect with the UE4 Elemental demo on my development machine. Please drop this patch.
The original patches 4 & 5 have the same effect, so they shouldn't be applied either. Maybe we can try this again once TTM is using the MM scan roster for eviction.
On Tue, Nov 4, 2014 at 3:47 AM, Michel Dänzer <michel@daenzer.net> wrote:
On 01.11.2014 03:10, Alex Deucher wrote:
On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Applied this series to my -next tree.
Thanks. Unfortunately, the removal of DRM_MM_SEARCH_BEST caused performance regressions, presumably due to excessive eviction from VRAM (see https://bugs.freedesktop.org/show_bug.cgi?id=84662#c49). I can reproduce a significant negative effect with the UE4 Elemental demo on my development machine. Please drop this patch.
The original patches 4 & 5 have the same effect, so they shouldn't be applied either. Maybe we can try this again once TTM is using the MM scan roster for eviction.
I dropped this patch.
Thanks,
Alex
From: Michel Dänzer <michel.daenzer@amd.com>
We want to allocate small BOs strictly from the bottom up, just as large BOs are allocated strictly from the top down.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index d23b0dd..d9f7a02 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -180,9 +180,11 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * 512kb was measured as the most optimal number.
 	 */
 	if (rbo->tbo.mem.size > 512 * 1024) {
-		for (i = 0; i < c; i++) {
+		for (i = 0; i < c; i++)
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
-		}
+	} else {
+		for (i = 0; i < c; i++)
+			rbo->placements[i].flags |= TTM_PL_FLAG_BOTTOMUP;
 	}
 }
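[Taken together with the earlier TOPDOWN patch, the resulting policy can be illustrated with a toy model. The VRAM capacity and BO sizes below are invented, and real placement goes through drm_mm rather than two bump pointers; this only shows the intended segregation.]

/* Toy model of the placement policy from this series: BOs larger than
 * 512KB go top-down, smaller ones bottom-up. Illustration only. */
#include <stdio.h>

#define VRAM_SIZE (16UL << 20)
#define THRESHOLD (512UL * 1024)

int main(void)
{
	unsigned long bottom = 0, top = VRAM_SIZE;
	unsigned long sizes[] = { 64 << 10, 4 << 20, 128 << 10, 2 << 20 };
	int i;

	for (i = 0; i < 4; i++) {
		unsigned long offset;

		if (sizes[i] > THRESHOLD) {	/* TTM_PL_FLAG_TOPDOWN */
			top -= sizes[i];
			offset = top;
		} else {			/* TTM_PL_FLAG_BOTTOMUP */
			offset = bottom;
			bottom += sizes[i];
		}
		printf("BO of %7lu KiB at offset %8lu KiB\n",
		       sizes[i] >> 10, offset >> 10);
	}
	return 0;
}

[The small BOs end up packed at offsets 0 and 64 KiB while the large ones stack down from the top, so a hole left by a freed large BO stays adjacent to other large allocations instead of fragmenting the small-BO region.]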
On Tue, 28 Oct 2014 18:35:02 +0900 Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but AFAICT it does.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
This series is
Reviewed-by: Lauri Kasanen <cand@gmx.com>
- Lauri