From: Michel Dänzer <michel.daenzer@amd.com>
I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but AFAICT it does.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 9c92976..d23b0dd 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -179,9 +179,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * improve fragmentation quality.
 	 * 512kb was measured as the most optimal number.
 	 */
-	if (!((rbo->flags & RADEON_GEM_CPU_ACCESS) &&
-	      (rbo->placements[i].flags & TTM_PL_FLAG_VRAM)) &&
-	    rbo->tbo.mem.size > 512 * 1024) {
+	if (rbo->tbo.mem.size > 512 * 1024) {
 		for (i = 0; i < c; i++) {
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
 		}
From: Michel Dänzer <michel.daenzer@amd.com>
If the BO should be placed at the top of the area, we should start looking for holes from the top.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 964387f..1e93f6c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
@@ -67,15 +68,16 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	if (!node)
 		return -ENOMEM;
 
-	if (place->flags & TTM_PL_FLAG_TOPDOWN)
+	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
+		sflags |= DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
+	}
 
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 						  mem->page_alignment, 0,
 						  place->fpfn, lpfn,
-						  DRM_MM_SEARCH_BEST,
-						  aflags);
+						  sflags, aflags);
 	spin_unlock(&rman->lock);
 
 	if (unlikely(ret)) {
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 1e93f6c..aa0bd05 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -69,7 +69,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 		return -ENOMEM;
 
 	if (place->flags & TTM_PL_FLAG_TOPDOWN) {
-		sflags |= DRM_MM_SEARCH_BELOW;
+		sflags = DRM_MM_SEARCH_BELOW;
 		aflags = DRM_MM_CREATE_TOP;
 	}
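[To make the difference between the two search flags concrete, here is a small userspace model. The hole list and sizes are invented for illustration; drm_mm actually walks its hole stack rather than a sorted array, but the effect the patch cares about is the same.]

/* Toy userspace model of the two search strategies. Not drm_mm code. */
#include <stdio.h>

struct hole { unsigned long start, size; };

/* DRM_MM_SEARCH_BEST-style: the smallest hole that fits, wherever it is. */
static const struct hole *search_best(const struct hole *h, int n,
				      unsigned long size)
{
	const struct hole *best = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (h[i].size >= size && (!best || h[i].size < best->size))
			best = &h[i];
	return best;
}

/* DRM_MM_SEARCH_BELOW-style: the first fitting hole, scanning from the top. */
static const struct hole *search_below(const struct hole *h, int n,
				       unsigned long size)
{
	int i;

	for (i = n - 1; i >= 0; i--)	/* h[] is sorted by start address */
		if (h[i].size >= size)
			return &h[i];
	return NULL;
}

int main(void)
{
	/* The snuggest fit sits at the very bottom of the address space. */
	const struct hole holes[] = { { 0, 8 }, { 64, 32 }, { 192, 16 } };

	printf("best fit: hole at %lu\n", search_best(holes, 3, 8)->start);
	printf("below:    hole at %lu\n", search_below(holes, 3, 8)->start);
	return 0;
}

[Best fit returns the hole at 0, the very bottom of the area, even though the BO was flagged TTM_PL_FLAG_TOPDOWN; the below-search returns the hole at 192.]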
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
Cheers, Daniel
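[For reference, a condensed sketch of the scan pattern Daniel describes, modeled on how i915's eviction code (i915_gem_evict_something) used the drm_mm scan helpers around this time. The function name evict_for_hole is made up, the vma/list names are i915's, and a TTM version would need its own lists and locking — which is exactly the question Thomas raises below.]

static int evict_for_hole(struct i915_address_space *vm,
			  unsigned long size, unsigned alignment)
{
	struct i915_vma *vma, *next;
	LIST_HEAD(unwind_list);
	LIST_HEAD(eviction_list);
	bool found = false;

	drm_mm_init_scan(&vm->mm, size, alignment, 0);

	/* Walk the LRU, adding nodes to the scan until they add up to a hole. */
	list_for_each_entry(vma, &vm->inactive_list, mm_list) {
		list_add(&vma->exec_list, &unwind_list);
		if (drm_mm_scan_add_block(&vma->node)) {
			found = true;	/* big enough hole assembled */
			break;
		}
	}

	/*
	 * Every scanned node must be removed again, in reverse order of
	 * addition (unwind_list is LIFO). drm_mm_scan_remove_block()
	 * returns true for the nodes that actually make up the hole.
	 */
	list_for_each_entry_safe(vma, next, &unwind_list, exec_list) {
		if (drm_mm_scan_remove_block(&vma->node) && found)
			list_move(&vma->exec_list, &eviction_list);
		else
			list_del_init(&vma->exec_list);
	}
	if (!found)
		return -ENOSPC;

	/* Evict only the buffers backing the hole, not the whole LRU. */
	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
		list_del_init(&vma->exec_list);
		i915_vma_unbind(vma);
	}
	return 0;
}

[The key property is that only the buffers whose nodes form the hole end up on eviction_list; the rest of the LRU is left alone.]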
On 28.10.2014 20:10, Daniel Vetter wrote:
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Sent out a new patch 4 which nukes it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
I recently made some changes in TTM and radeon to alleviate that particular issue, but the scan roster does sound useful. Unfortunately, modifying TTM to use it is a bit too involved for me right now. Any volunteers?
On 10/28/2014 12:10 PM, Daniel Vetter wrote:
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
Cheers, Daniel
And I think I've commented each time why that is currently not possible, or at least not worthwhile, with TTM ;).
Although the idea is great, even if you make a nice list of suitable buffers to evict, as soon as you drop the LRU spinlock, that list is invalid. Worse, while you start evicting your list, other processes may steal part of the hole you've created.
At some point we should perhaps figure out how to protect each managed area with a separate mutex, but that would create trouble if, for example, evicting from VRAM to managed GART.
/Thomas
On Thu, Oct 30, 2014 at 04:07:05PM +0100, Thomas Hellstrom wrote:
On 10/28/2014 12:10 PM, Daniel Vetter wrote:
On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That seems against the idea of TTM_PL_FLAG_TOPDOWN:
- The smallest hole may be in the overall bottom of the area
- If the hole isn't much larger than the BO, it doesn't make much difference whether the BO is placed at the bottom or at the top of the hole
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
tbh I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.
Another one that I've advertised a bunch of times already is the scan roster in drm_mm.c: Currently ttm just evicts until there's a big enough hole, which is fairly awful if you have quasi-segmented memory like with top-down/bottom-up schemes and different ranges for different units. With the roster you just walk the lru and build up potential holes until there's a suitable one, and then only evict those buffers. Which means if you have a certain range of memory under very high pressure (e.g. the 256M which UVD can use or whatever it is), then you won't thrash all the other vram too.
Cheers, Daniel
And I think I've commented each time why that is currently not possible, or at least not worthwhile, with TTM ;).
Although the idea is great, even if you make a nice list of suitable buffers to evict, as soon as you drop the LRU spinlock, that list is invalid. Worse, while you start evicting your list, other processes may steal part of the hole you've created.
At some point we should perhaps figure out how to protect each managed area with a separate mutex, but that would create trouble if, for example, evicting from VRAM to managed GART.
Yeah, you need to do the entire scanning/evicting under the lru lock, which means you need to jump over all the buffers you can't immediately trylock. Which also means that you need a slowpath fallback that rips out everything in case the buffer doesn't fit anywhere. Wrt the lru lock you need to at least override the generic placement/eviction code, since you need a bit more logic and state-keeping for drm_mm than what's currently exposed in the super-generic ttm managed area allocator (which apparently could also manage an idr). But it shouldn't be more fuss than a new vfunc with the current code as the default implementation.
For lru lock recursion (due to vram->gtt eviction or similar) I guess we either rework stuff to allow the entire eviction process under the lru lock, or keep the drm_mm allocations on some separate list until you've evicted everything (dropping the lock meanwhile) and then mass-free the entire list under the lru lock and immediately do your allocation.
I'm not sure whether the long hold times for the lru lock really would be a problem (they'll happen even when you drop it for actual eviction since walking the lru lists is costly when you jump over lots of buffers that don't get you towards your goal), since evicting all buffers is probably going to thrash the system much worse anyway. And for the worst case a regular sched for lru lock contention or preemption should prevent interactivity disasters for parallel cmd submission.
So yeah, some details to take care of, but I don't think there's anything insurmountable. It's just that I don't have time (nor justification) for it. And it will indeed be quite a bit of work to get going. -Daniel
From: Michel Dänzer <michel.daenzer@amd.com>
For placing a BO strictly from the bottom up, i.e. in the lowest hole which can hold it.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 5 +++++
 include/drm/ttm/ttm_placement.h      | 5 ++++-
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index aa0bd05..a64397b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -73,6 +73,11 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 		aflags = DRM_MM_CREATE_TOP;
 	}
 
+	if (place->flags & TTM_PL_FLAG_BOTTOMUP) {
+		sflags = DRM_MM_SEARCH_DEFAULT;
+		aflags = DRM_MM_CREATE_DEFAULT;
+	}
+
 	spin_lock(&rman->lock);
 	ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
 						  mem->page_alignment, 0,
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index 8ed44f9..c9ae86a 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -66,7 +66,9 @@
  * TTM_PL_FLAG_NO_EVICT means that the buffer may never
  * be evicted to make room for other buffers.
  * TTM_PL_FLAG_TOPDOWN requests to be placed from the
- * top of the memory area, instead of the bottom.
+ * top of the memory area.
+ * TTM_PL_FLAG_BOTTOMUP requests to be placed from the
+ * bottom of the memory area.
  */
 
 #define TTM_PL_FLAG_CACHED      (1 << 16)
@@ -75,6 +77,7 @@
 #define TTM_PL_FLAG_SHARED      (1 << 20)
 #define TTM_PL_FLAG_NO_EVICT    (1 << 21)
 #define TTM_PL_FLAG_TOPDOWN     (1 << 22)
+#define TTM_PL_FLAG_BOTTOMUP    (1 << 23)
 
 #define TTM_PL_MASK_CACHING     (TTM_PL_FLAG_CACHED | \
 				 TTM_PL_FLAG_UNCACHED | \
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
This patch replaces the previous patches 4 and 5.
Not sure about the function header comment hunk, maybe what it says is true even with DRM_MM_SEARCH_BELOW?
 drivers/gpu/drm/drm_mm.c             | 36 +++++-------------------------------
 drivers/gpu/drm/ttm/ttm_bo_manager.c |  2 +-
 include/drm/drm_mm.h                 |  3 +--
 3 files changed, 7 insertions(+), 34 deletions(-)
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 04a209e..3e0ef8c 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -409,16 +409,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 					       enum drm_mm_search_flags flags)
 {
 	struct drm_mm_node *entry;
-	struct drm_mm_node *best;
 	unsigned long adj_start;
 	unsigned long adj_end;
-	unsigned long best_size;
 
 	BUG_ON(mm->scanned_blocks);
 
-	best = NULL;
-	best_size = ~0UL;
-
 	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
 			       flags & DRM_MM_SEARCH_BELOW) {
 		unsigned long hole_size = adj_end - adj_start;
@@ -429,19 +424,11 @@ static struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 			continue;
 		}
 
-		if (!check_free_hole(adj_start, adj_end, size, alignment))
-			continue;
-
-		if (!(flags & DRM_MM_SEARCH_BEST))
+		if (check_free_hole(adj_start, adj_end, size, alignment))
 			return entry;
-
-		if (hole_size < best_size) {
-			best = entry;
-			best_size = hole_size;
-		}
 	}
 
-	return best;
+	return NULL;
 }
 
 static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
@@ -453,16 +440,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 					       enum drm_mm_search_flags flags)
 {
 	struct drm_mm_node *entry;
-	struct drm_mm_node *best;
 	unsigned long adj_start;
 	unsigned long adj_end;
-	unsigned long best_size;
 
 	BUG_ON(mm->scanned_blocks);
 
-	best = NULL;
-	best_size = ~0UL;
-
 	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end,
 			       flags & DRM_MM_SEARCH_BELOW) {
 		unsigned long hole_size = adj_end - adj_start;
@@ -478,19 +460,11 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 			continue;
 		}
 
-		if (!check_free_hole(adj_start, adj_end, size, alignment))
-			continue;
-
-		if (!(flags & DRM_MM_SEARCH_BEST))
+		if (check_free_hole(adj_start, adj_end, size, alignment))
 			return entry;
-
-		if (hole_size < best_size) {
-			best = entry;
-			best_size = hole_size;
-		}
 	}
 
-	return best;
+	return NULL;
 }
 
 /**
@@ -679,7 +653,7 @@ EXPORT_SYMBOL(drm_mm_scan_add_block);
  * corrupted.
  *
  * When the scan list is empty, the selected memory nodes can be freed. An
- * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BEST will then
+ * immediately following drm_mm_search_free with !DRM_MM_SEARCH_BELOW will then
  * return the just freed block (because its at the top of the free_stack list).
 *
 * Returns:
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index aa0bd05..42c2aa9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,7 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
 	struct drm_mm *mm = &rman->mm;
 	struct drm_mm_node *node = NULL;
-	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
+	enum drm_mm_search_flags sflags = DRM_MM_SEARCH_DEFAULT;
 	enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
 	unsigned long lpfn;
 	int ret;
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index a24addf..f38f7a0 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -46,8 +46,7 @@
 
 enum drm_mm_search_flags {
 	DRM_MM_SEARCH_DEFAULT =		0,
-	DRM_MM_SEARCH_BEST =		1 << 0,
-	DRM_MM_SEARCH_BELOW =		1 << 1,
+	DRM_MM_SEARCH_BELOW =		1 << 0,
 };
 
 enum drm_mm_allocator_flags {
On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Applied this series to my -next tree.
Thanks,
Alex
On 01.11.2014 03:10, Alex Deucher wrote:
On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Applied this series to my -next tree.
Thanks. Unfortunately, the removal of DRM_MM_SEARCH_BEST caused performance regressions, presumably due to excessive eviction from VRAM (see https://bugs.freedesktop.org/show_bug.cgi?id=84662#c49). I can reproduce a significant negative effect with the UE4 Elemental demo on my development machine. Please drop this patch.
The original patches 4 & 5 have the same effect, so they shouldn't be applied either. Maybe we can try this again once TTM is using the MM scan roster for eviction.
On Tue, Nov 4, 2014 at 3:47 AM, Michel Dänzer <michel@daenzer.net> wrote:
On 01.11.2014 03:10, Alex Deucher wrote:
On Thu, Oct 30, 2014 at 4:36 AM, Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
In the words of Daniel Vetter:
«I think SEARCH_BEST is pretty much always a bad idea - it rips apart allocations from the same execbuf, and usually those get recycled around the same time. Which means you'll just fragment your mm even more if you try to find the best hole instead of just picking one and stuffing the entire execbuf into it. So imo we might as well just kill it.»
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Applied this series to my -next tree.
Thanks. Unfortunately, the removal of DRM_MM_SEARCH_BEST caused performance regressions, presumably due to excessive eviction from VRAM (see https://bugs.freedesktop.org/show_bug.cgi?id=84662#c49). I can reproduce a significant negative effect with the UE4 Elemental demo on my development machine. Please drop this patch.
The original patches 4 & 5 have the same effect, so they shouldn't be applied either. Maybe we can try this again once TTM is using the MM scan roster for eviction.
I dropped this patch.
Thanks,
Alex
From: Michel Dänzer <michel.daenzer@amd.com>
We want to allocate small BOs strictly from the bottom up, just as large BOs are allocated strictly from the top down.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index d23b0dd..d9f7a02 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -180,9 +180,11 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 	 * 512kb was measured as the most optimal number.
 	 */
 	if (rbo->tbo.mem.size > 512 * 1024) {
-		for (i = 0; i < c; i++) {
+		for (i = 0; i < c; i++)
 			rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
-		}
+	} else {
+		for (i = 0; i < c; i++)
+			rbo->placements[i].flags |= TTM_PL_FLAG_BOTTOMUP;
 	}
 }
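[Taken together with the earlier TOPDOWN patch, the resulting policy can be illustrated with a toy model. The VRAM capacity and BO sizes below are invented, and real placement goes through drm_mm rather than two bump pointers; this only shows the intended segregation.]

/* Toy model of the placement policy from this series: BOs larger than
 * 512KB go top-down, smaller ones bottom-up. Illustration only. */
#include <stdio.h>

#define VRAM_SIZE (16UL << 20)
#define THRESHOLD (512UL * 1024)

int main(void)
{
	unsigned long bottom = 0, top = VRAM_SIZE;
	unsigned long sizes[] = { 64 << 10, 4 << 20, 128 << 10, 2 << 20 };
	int i;

	for (i = 0; i < 4; i++) {
		unsigned long offset;

		if (sizes[i] > THRESHOLD) {	/* TTM_PL_FLAG_TOPDOWN */
			top -= sizes[i];
			offset = top;
		} else {			/* TTM_PL_FLAG_BOTTOMUP */
			offset = bottom;
			bottom += sizes[i];
		}
		printf("BO of %7lu KiB at offset %8lu KiB\n",
		       sizes[i] >> 10, offset >> 10);
	}
	return 0;
}

[The small BOs end up packed at offsets 0 and 64 KiB while the large ones stack down from the top, so a hole left by a freed large BO stays adjacent to other large allocations instead of fragmenting the small-BO region.]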
On Tue, 28 Oct 2014 18:35:02 +0900 Michel Dänzer <michel@daenzer.net> wrote:
From: Michel Dänzer <michel.daenzer@amd.com>
I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but AFAICT it does.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
This series is
Reviewed-by: Lauri Kasanen <cand@gmx.com>
- Lauri