So it seems there were several issues with the various ttm pools. The obvious one is fixed in patch 2, which addresses the always-empty pool syndrome. However, the pool sizes are kind of crazy, and because some pools previously never actually filled up we might never have experienced the ill effects of the crazy maximum limit. This is what the first patch addresses.
The last patch takes this further so that under memory pressure the pool size is divided by 2 each time the shrinker runs on a pool. There is a timeout that restores the pool size on the next allocation. The idea here is that memory pressure should not last; if it does last, the shrinker will keep minimizing the pool size, and things are probably already sluggish once we hit the shrinker path anyway.
Of course, because this fixes things in ttm memory allocation, it needs careful testing. So before pushing anything I would like to see more people testing this.
Cheers, Jérôme
From: Jérôme Glisse jglisse@redhat.com
Due to a bug in the code it appears that some of the pools were never properly used and always stayed empty. Before fixing that bug, this patch sets a sensible limit on pool size. The magic 64MB number was nominated.
This is obviously a somewhat arbitrary number, but the intent of the ttm pool is to minimize page allocation cost, especially when allocating pages that will be marked to be excluded from cpu cache mechanisms. We assume that mostly small buffers that are constantly allocated/deallocated might suffer from core memory allocation overhead as well as cache status changes. These are the assumptions behind the 64MB value.
This obviously needs some serious testing, including monitoring of pool size.
Signed-off-by: Jérôme Glisse jglisse@redhat.com
Cc: Mario Kleiner mario.kleiner.de@gmail.com
Cc: Michel Dänzer michel@daenzer.net
Cc: Thomas Hellstrom thellstrom@vmware.com
Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
---
 drivers/gpu/drm/ttm/ttm_memory.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index dbc2def..73b2ded 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -38,6 +38,16 @@
 #include <linux/slab.h>
 
 #define TTM_MEMORY_ALLOC_RETRIES 4
 
+/* Have a maximum of 64MB of memory inside the pool.
+ *
+ * This is obviously a somewhat arbitrary number but the intent of the ttm
+ * pool is to minimize page alloc cost especially when allocating pages that
+ * will be marked to be excluded from cpu cache mechanisms. We assume that
+ * mostly small buffers that are constantly allocated/deallocated might suffer
+ * from core memory allocation overhead as well as cache status changes.
+ * These are the assumptions behind the 64MB value.
+ */
+#define MAX_POOL_SIZE	(64UL << 20UL)
 struct ttm_mem_zone {
 	struct kobject kobj;
@@ -363,6 +373,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
 	int ret;
 	int i;
 	struct ttm_mem_zone *zone;
+	unsigned long max_pool_size;
 
 	spin_lock_init(&glob->lock);
 	glob->swap_queue = create_singlethread_workqueue("ttm_swap");
@@ -393,8 +404,9 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
 		pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
 			zone->name, (unsigned long long)zone->max_mem >> 10);
 	}
-	ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
-	ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
+	max_pool_size = min(glob->zone_kernel->max_mem >> 3UL, MAX_POOL_SIZE);
+	ttm_page_alloc_init(glob, max_pool_size / (2 * PAGE_SIZE));
+	ttm_dma_page_alloc_init(glob, max_pool_size / (2 * PAGE_SIZE));
 	return 0;
 out_no_zone:
 	ttm_mem_global_release(glob);
On 13.08.2014 12:52, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
Due to a bug in the code it appears that some of the pools were never properly used and always stayed empty. Before fixing that bug, this patch sets a sensible limit on pool size. The magic 64MB number was nominated.
This is obviously a somewhat arbitrary number, but the intent of the ttm pool is to minimize page allocation cost, especially when allocating pages that will be marked to be excluded from cpu cache mechanisms. We assume that mostly small buffers that are constantly allocated/deallocated might suffer from core memory allocation overhead as well as cache status changes. These are the assumptions behind the 64MB value.
This obviously needs some serious testing, including monitoring of pool size.
Signed-off-by: Jérôme Glisse jglisse@redhat.com
[...]
@@ -393,8 +404,9 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
 		pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
 			zone->name, (unsigned long long)zone->max_mem >> 10);
 	}
-	ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
-	ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
+	max_pool_size = min(glob->zone_kernel->max_mem >> 3UL, MAX_POOL_SIZE);
This introduces a 'comparison of distinct pointer types lacks a cast' warning for me.
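A minimal sketch of one way to silence it, assuming zone->max_mem is a u64 while MAX_POOL_SIZE stays an unsigned long constant: min() rejects mixed types, but the typed min_t() variant compares both operands as the same type.

	/* min() type-checks its arguments; min_t() casts both to one type.
	 * Assumes the resulting pool size always fits in an unsigned long. */
	max_pool_size = min_t(unsigned long,
			      glob->zone_kernel->max_mem >> 3UL,
			      MAX_POOL_SIZE);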
From: Jérôme Glisse jglisse@redhat.com
Current code never allowed the page pool to actually fill up anyway. This fixes it and also allows the pool to grow over its limit until it grows beyond the batch size for allocation and deallocation.
Signed-off-by: Jérôme Glisse jglisse@redhat.com
Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com
Tested-by: Michel Dänzer michel@daenzer.net
Cc: Thomas Hellstrom thellstrom@vmware.com
Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index c96db43..a076ff3 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
-		npages = count;
-		if (pool->npages_free > _manager->options.max_size) {
+		if (pool->npages_free >= (_manager->options.max_size +
+					  NUM_PAGES_TO_ALLOC))
 			npages = pool->npages_free - _manager->options.max_size;
-			/* free at least NUM_PAGES_TO_ALLOC number of pages
-			 * to reduce calls to set_memory_wb */
-			if (npages < NUM_PAGES_TO_ALLOC)
-				npages = NUM_PAGES_TO_ALLOC;
-		}
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
On Tue, Aug 12, 2014 at 11:52:05PM -0400, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
Current code never allowed the page pool to actually fill up anyway. This fixes it and also allows the pool to grow over its limit until it grows beyond the batch size for allocation and deallocation.
Signed-off-by: Jérôme Glisse jglisse@redhat.com Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com Tested-by: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index c96db43..a076ff3 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
npages = count;
if (pool->npages_free > _manager->options.max_size) {
if (pool->npages_free >= (_manager->options.max_size +
NUM_PAGES_TO_ALLOC)) npages = pool->npages_free - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
} spin_unlock_irqrestore(&pool->lock, irq_flags);}
-- 1.9.3
Hi Jérôme,
On 13.08.2014 12:52, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
Current code never allowed the page pool to actually fill up anyway. This fixes it and also allows the pool to grow over its limit until it grows beyond the batch size for allocation and deallocation.
Signed-off-by: Jérôme Glisse jglisse@redhat.com Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com Tested-by: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index c96db43..a076ff3 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
npages = count;
if (pool->npages_free > _manager->options.max_size) {
if (pool->npages_free >= (_manager->options.max_size +
NUM_PAGES_TO_ALLOC)) npages = pool->npages_free - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
} spin_unlock_irqrestore(&pool->lock, irq_flags);}
Colleagues of mine have measured significant performance gains for some workloads with this patch. Without it, a lot of CPU cycles are spent changing the caching attributes of pages on allocation.
Note that the performance effect seems to mostly disappear when applying patch 1 as well, so apparently 64MB is too small for the max pool size.
Is there any chance this patch could be applied without the controversial patch 3? If not, do you have ideas for addressing the concerns raised against patch 3?
On Mon, Jul 06, 2015 at 06:11:29PM +0900, Michel Dänzer wrote:
Hi Jérôme,
On 13.08.2014 12:52, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
Current code never allowed the page pool to actually fill up anyway. This fixes it and also allows the pool to grow over its limit until it grows beyond the batch size for allocation and deallocation.
Signed-off-by: Jérôme Glisse jglisse@redhat.com Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com Tested-by: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index c96db43..a076ff3 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
npages = count;
if (pool->npages_free > _manager->options.max_size) {
if (pool->npages_free >= (_manager->options.max_size +
NUM_PAGES_TO_ALLOC)) npages = pool->npages_free - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
} spin_unlock_irqrestore(&pool->lock, irq_flags);}
Colleagues of mine have measured significant performance gains for some workloads with this patch. Without it, a lot of CPU cycles are spent changing the caching attributes of pages on allocation.
Note that the performance effect seems to mostly disappear when applying patch 1 as well, so apparently 64MB is too small for the max pool size.
Is there any chance this patch could be applied without the controversial patch 3? If not, do you have ideas for addressing the concerns raised against patch 3?
Wahou, now I need to find the keys to the DeLorean to travel back in time.
This patch is a fix and should be applied without 1 or 3, because today the pool is basically always emptied and never fills up. But as Thomas pointed out, there are already a bit too many pools across the stack. The proper solution would be to work something out at the mm level or in the architecture code (I assume AMD is mostly interested in x86 on that front).
Here the whole issue is really about allocating pages with WC/UC cache properties. Changing the cache properties of a page is really costly on several levels; for instance the kernel needs to break the huge linear mapping and populate the lower page-table levels to remap the page with the proper cache attribute inside the kernel mapping.
As far as I remember the kernel never goes back to a huge page mapping when restoring a page's cache attribute, which means that little by little over uptime you lose the whole huge page mapping benefit for everything and you waste more memory.
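As a rough illustration of the cost being discussed (a sketch, not the actual TTM code path; it assumes x86's set_pages_array_wc()/set_pages_array_wb() helpers), every write-combined page that is not recycled through a pool pays for a cache-attribute change on allocation and again on free, and the first change is what splits the kernel's huge linear mapping:

#include <linux/gfp.h>
#include <asm/cacheflush.h>	/* set_pages_array_wc/wb on x86 */

/* Allocate one WC page the slow way, i.e. what happens without a pool. */
static struct page *alloc_wc_page_slow(void)
{
	struct page *p = alloc_page(GFP_KERNEL);

	if (!p)
		return NULL;
	/* Splits the huge kernel linear mapping covering this page. */
	if (set_pages_array_wc(&p, 1)) {
		__free_page(p);
		return NULL;
	}
	return p;
}

/* Freeing pays again: the page must go back to write-back first. */
static void free_wc_page_slow(struct page *p)
{
	set_pages_array_wb(&p, 1);
	__free_page(p);
}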
Anyway, I just wanted to dump my recollection here and how I think patches 1 & 3 should be replaced: by simply moving this allocation infrastructure down into core mm code, where it should always have been.
In the meantime I think we need to apply this patch, as it is really a fix to make the code actually do what the comments and design pretend it does.
Cheers, Jérôme
On 07.07.2015 01:10, Jerome Glisse wrote:
On Mon, Jul 06, 2015 at 06:11:29PM +0900, Michel Dänzer wrote:
Hi Jérôme,
On 13.08.2014 12:52, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
Current code never allowed the page pool to actually fill up anyway. This fixes it and also allows the pool to grow over its limit until it grows beyond the batch size for allocation and deallocation.
Signed-off-by: Jérôme Glisse jglisse@redhat.com Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com Tested-by: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index c96db43..a076ff3 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
npages = count;
if (pool->npages_free > _manager->options.max_size) {
if (pool->npages_free >= (_manager->options.max_size +
NUM_PAGES_TO_ALLOC)) npages = pool->npages_free - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
} spin_unlock_irqrestore(&pool->lock, irq_flags);}
Colleagues of mine have measured significant performance gains for some workloads with this patch. Without it, a lot of CPU cycles are spent changing the caching attributes of pages on allocation.
Note that the performance effect seems to mostly disappear when applying patch 1 as well, so apparently 64MB is too small for the max pool size.
Is there any chance this patch could be applied without the controversial patch 3? If not, do you have ideas for addressing the concerns raised against patch 3?
Wahou, now I need to find the keys to the DeLorean to travel back in time.
This patch is a fix and should be applied without 1 or 3, because today the pool is basically always emptied and never fills up. But as Thomas pointed out, there are already a bit too many pools across the stack. The proper solution would be to work something out at the mm level or in the architecture code (I assume AMD is mostly interested in x86 on that front).
Here the whole issue is really about allocating pages with WC/UC cache properties. Changing the cache properties of a page is really costly on several levels; for instance the kernel needs to break the huge linear mapping and populate the lower page-table levels to remap the page with the proper cache attribute inside the kernel mapping.
As far as I remember the kernel never goes back to a huge page mapping when restoring a page's cache attribute, which means that little by little over uptime you lose the whole huge page mapping benefit for everything and you waste more memory.
That sounds pretty bad, but this patch might actually help a little for that by reducing the amount of huge page mappings that need to be broken up?
In the meantime I think we need to apply this patch, as it is really a fix to make the code actually do what the comments and design pretend it does.
I agree.
BTW, maybe it should be split up into the actual fix (removing the npages assignment) and the NUM_PAGES_TO_ALLOC related simplification?
On Tue, Jul 07, 2015 at 03:39:29PM +0900, Michel Dänzer wrote:
On 07.07.2015 01:10, Jerome Glisse wrote:
On Mon, Jul 06, 2015 at 06:11:29PM +0900, Michel Dänzer wrote:
Hi Jérôme,
On 13.08.2014 12:52, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
Current code never allowed the page pool to actually fill up anyway. This fixes it and also allows the pool to grow over its limit until it grows beyond the batch size for allocation and deallocation.
Signed-off-by: Jérôme Glisse jglisse@redhat.com Reviewed-by: Mario Kleiner mario.kleiner.de@gmail.com Tested-by: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index c96db43..a076ff3 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
npages = count;
if (pool->npages_free > _manager->options.max_size) {
if (pool->npages_free >= (_manager->options.max_size +
NUM_PAGES_TO_ALLOC)) npages = pool->npages_free - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
} spin_unlock_irqrestore(&pool->lock, irq_flags);}
Colleagues of mine have measured significant performance gains for some workloads with this patch. Without it, a lot of CPU cycles are spent changing the caching attributes of pages on allocation.
Note that the performance effect seems to mostly disappear when applying patch 1 as well, so apparently 64MB is too small for the max pool size.
Is there any chance this patch could be applied without the controversial patch 3? If not, do you have ideas for addressing the concerns raised against patch 3?
Wahou, now I need to find the keys to the DeLorean to travel back in time.
This patch is a fix and should be applied without 1 or 3, because today the pool is basically always emptied and never fills up. But as Thomas pointed out, there are already a bit too many pools across the stack. The proper solution would be to work something out at the mm level or in the architecture code (I assume AMD is mostly interested in x86 on that front).
Here the whole issue is really about allocating pages with WC/UC cache properties. Changing the cache properties of a page is really costly on several levels; for instance the kernel needs to break the huge linear mapping and populate the lower page-table levels to remap the page with the proper cache attribute inside the kernel mapping.
As far as I remember the kernel never goes back to a huge page mapping when restoring a page's cache attribute, which means that little by little over uptime you lose the whole huge page mapping benefit for everything and you waste more memory.
That sounds pretty bad, but this patch might actually help a little for that by reducing the amount of huge page mappings that need to be broken up?
Not really; to limit huge page mapping breakage you would need to allocate pages in the same physical cluster so that only one huge page mapping needs to be broken. It would be a bit like the DMA32 or DMA16 physical ranges, so this would obviously need some work at the MM level. At the ttm level this can be more or less implemented using the GFP_DMA32 flag on page allocation, but doing that puts more pressure on the first 4G of memory, and I think nowadays, with web browsers consuming several GB of textures, you probably do not want to do that.
In the meantime I think we need to apply this patch, as it is really a fix to make the code actually do what the comments and design pretend it does.
I agree.
BTW, maybe it should be split up into the actual fix (removing the npages assignment) and the NUM_PAGES_TO_ALLOC related simplification?
This would make two really small patches; the patch is small as it is :) But why not.
Cheers, Jérôme
On 08.07.2015 02:41, Jerome Glisse wrote:
On Tue, Jul 07, 2015 at 03:39:29PM +0900, Michel Dänzer wrote:
On 07.07.2015 01:10, Jerome Glisse wrote:
In the meantime I think we need to apply this patch, as it is really a fix to make the code actually do what the comments and design pretend it does.
I agree.
BTW, maybe it should be split up into the actual fix (removing the npages assignment) and the NUM_PAGES_TO_ALLOC related simplification?
This would make two really small patches; the patch is small as it is :) But why not.
It's not about size but about having one commit for each logical change. It took me a while to realize that the NUM_PAGES_TO_ALLOC changes aren't directly related to the fix, and I've seen the same thing happen to others.
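For illustration, the fix-only half of such a split would be tiny, something like the following (a sketch based on the quoted hunk, not a posted patch; it assumes npages is initialised to 0 earlier in the function): just drop the unconditional npages assignment and leave the NUM_PAGES_TO_ALLOC batching as a separate cleanup.

--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -953,7 +953,6 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
-		npages = count;
 		if (pool->npages_free > _manager->options.max_size) {
 			npages = pool->npages_free - _manager->options.max_size;
 			/* free at least NUM_PAGES_TO_ALLOC number of pages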
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Signed-off-by: Jérôme Glisse jglisse@redhat.com
Cc: Mario Kleiner mario.kleiner.de@gmail.com
Cc: Michel Dänzer michel@daenzer.net
Cc: Thomas Hellstrom thellstrom@vmware.com
Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c     | 35 +++++++++++++++++++++++++-------
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++--
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 09874d6..ab41adf 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -68,6 +68,8 @@
  * @list: Pool of free uc/wc pages for fast reuse.
  * @gfp_flags: Flags to pass for alloc_page.
  * @npages: Number of pages in pool.
+ * @cur_max_size: Current maximum size for the pool.
+ * @shrink_timeout: Timeout for pool maximum size restriction.
  */
 struct ttm_page_pool {
 	spinlock_t lock;
@@ -76,6 +78,8 @@ struct ttm_page_pool {
 	gfp_t gfp_flags;
 	unsigned npages;
 	char *name;
+	unsigned cur_max_size;
+	unsigned long last_shrink;
 	unsigned long nfrees;
 	unsigned long nrefills;
 };
@@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool,
 	pool->nfrees += freed_pages;
 }
 
+static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
+{
+	if (time_before(jiffies, pool->shrink_timeout))
+		return;
+	/* In case we reached zero bounce back to 512 pages. */
+	pool->cur_max_size = max(pool->cur_max_size << 1, 512);
+	pool->cur_max_size = min(pool->cur_max_size,
+				 _manager->options.max_size);
+}
+
 /**
  * Free pages from pool.
  *
@@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (shrink_pages == 0)
 			break;
 		pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
+		/* No matter what make sure the pool do not grow in next second. */
+		pool->cur_max_size = pool->cur_max_size >> 1;
+		pool->shrink_timeout = jiffies + HZ;
 		shrink_pages = ttm_page_pool_free(pool, nr_free,
 						  sc->gfp_mask);
 		freed += nr_free - shrink_pages;
@@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags,
 	}
 	/* Check that we don't go over the pool limit */
 	npages = 0;
-	if (pool->npages > _manager->options.max_size) {
-		npages = pool->npages - _manager->options.max_size;
-		/* free at least NUM_PAGES_TO_ALLOC number of pages
-		 * to reduce calls to set_memory_wb */
-		if (npages < NUM_PAGES_TO_ALLOC)
-			npages = NUM_PAGES_TO_ALLOC;
-	}
+	/*
+	 * Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
+	 * set_memory_wb.
+	 */
+	if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
+		npages = pool->npages - pool->cur_max_size;
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
 	if (npages)
 		ttm_page_pool_free(pool, npages, GFP_KERNEL);
@@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
 		return 0;
 	}
 
+	/* Update pool size in case shrinker limited it. */
+	ttm_pool_update_max_size(pool);
+
 	/* combine zero flag to pool flags */
 	gfp_flags |= pool->gfp_flags;
@@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags,
 	pool->npages = pool->nfrees = 0;
 	pool->gfp_flags = flags;
 	pool->name = name;
+	pool->cur_max_size = _manager->options.max_size;
+	pool->shrink_timeout = jiffies;
 }
 
 int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index a076ff3..80b10aa 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -93,6 +93,8 @@ enum pool_type {
 * @size: Size used during DMA allocation.
 * @npages_free: Count of available pages for re-use.
 * @npages_in_use: Count of pages that are in use.
+ * @cur_max_size: Current maximum size for the pool.
+ * @shrink_timeout: Timeout for pool maximum size restriction.
 * @nfrees: Stats when pool is shrinking.
 * @nrefills: Stats when the pool is grown.
 * @gfp_flags: Flags to pass for alloc_page.
@@ -110,6 +112,8 @@ struct dma_pool {
 	unsigned size;
 	unsigned npages_free;
 	unsigned npages_in_use;
+	unsigned cur_max_size;
+	unsigned long last_shrink;
 	unsigned long nfrees; /* Stats when shrunk. */
 	unsigned long nrefills; /* Stats when grown. */
 	gfp_t gfp_flags;
@@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page)
 	kfree(d_page);
 	d_page = NULL;
 }
+
+static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool)
+{
+	if (time_before(jiffies, pool->shrink_timeout))
+		return;
+	/* In case we reached zero bounce back to 512 pages. */
+	pool->cur_max_size = max(pool->cur_max_size << 1, 512);
+	pool->cur_max_size = min(pool->cur_max_size,
+				 _manager->options.max_size);
+}
+
 static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
 {
 	struct dma_page *d_page;
@@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags,
 	pool->size = PAGE_SIZE;
 	pool->type = type;
 	pool->nrefills = 0;
+	pool->cur_max_size = _manager->options.max_size;
+	pool->shrink_timeout = jiffies;
 	p = pool->name;
 	for (i = 0; i < 5; i++) {
 		if (type & t[i]) {
@@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 		}
 	}
 
+	/* Update pool size in case shrinker limited it. */
+	ttm_dma_pool_update_max_size(pool);
+
 	INIT_LIST_HEAD(&ttm_dma->pages_list);
 	for (i = 0; i < ttm->num_pages; ++i) {
 		ret = ttm_dma_pool_get_pages(pool, ttm_dma, i);
@@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
-		if (pool->npages_free >= (_manager->options.max_size +
+		if (pool->npages_free >= (pool->cur_max_size +
 					  NUM_PAGES_TO_ALLOC))
-			npages = pool->npages_free - _manager->options.max_size;
+			npages = pool->npages_free - pool->cur_max_size;
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		/* Do it in round-robin fashion. */
 		if (++idx < pool_offset)
 			continue;
+		/* No matter what make sure the pool do not grow in next second. */
+		p->pool->cur_max_size = p->pool->cur_max_size >> 1;
+		p->pool->shrink_timeout = jiffies + HZ;
 		nr_free = shrink_pages;
 		shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free,
 						      sc->gfp_mask);
On 13.08.2014 12:52, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Signed-off-by: Jérôme Glisse jglisse@redhat.com
[...]
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 09874d6..ab41adf 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -68,6 +68,8 @@
  * @list: Pool of free uc/wc pages for fast reuse.
  * @gfp_flags: Flags to pass for alloc_page.
  * @npages: Number of pages in pool.
+ * @cur_max_size: Current maximum size for the pool.
+ * @shrink_timeout: Timeout for pool maximum size restriction.
  */
 struct ttm_page_pool {
 	spinlock_t lock;
@@ -76,6 +78,8 @@ struct ttm_page_pool {
 	gfp_t gfp_flags;
 	unsigned npages;
 	char *name;
+	unsigned cur_max_size;
+	unsigned long last_shrink;
s/last_shrink/shrink_timeout/
Looks like maybe you posted an untested stale set of patches?
@@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool,
 	pool->nfrees += freed_pages;
 }
 
+static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
+{
+	if (time_before(jiffies, pool->shrink_timeout))
+		return;
+	/* In case we reached zero bounce back to 512 pages. */
+	pool->cur_max_size = max(pool->cur_max_size << 1, 512);
Another 'comparison of distinct pointer types lacks a cast' warning.
Both issues apply to ttm_page_alloc_dma.c as well.
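For that second warning, a sketch of a possible fix, assuming cur_max_size stays an unsigned field: the 512 literal is an int, so the typed max_t()/min_t() helpers keep both comparisons in a single type (the same applies to the ttm_page_alloc_dma.c copy).

	/* In case we reached zero bounce back to 512 pages. */
	pool->cur_max_size = max_t(unsigned, pool->cur_max_size << 1, 512);
	pool->cur_max_size = min_t(unsigned, pool->cur_max_size,
				   _manager->options.max_size);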
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
/Thomas
Signed-off-by: Jérôme Glisse jglisse@redhat.com Cc: Mario Kleiner mario.kleiner.de@gmail.com Cc: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++------- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++-- 2 files changed, 53 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 09874d6..ab41adf 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -68,6 +68,8 @@
- @list: Pool of free uc/wc pages for fast reuse.
- @gfp_flags: Flags to pass for alloc_page.
- @npages: Number of pages in pool.
- @cur_max_size: Current maximum size for the pool.
*/
- @shrink_timeout: Timeout for pool maximum size restriction.
struct ttm_page_pool { spinlock_t lock; @@ -76,6 +78,8 @@ struct ttm_page_pool { gfp_t gfp_flags; unsigned npages; char *name;
- unsigned cur_max_size;
- unsigned long last_shrink; unsigned long nfrees; unsigned long nrefills;
}; @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, pool->nfrees += freed_pages; }
+static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) +{
- if (time_before(jiffies, pool->shrink_timeout))
return;
- /* In case we reached zero bounce back to 512 pages. */
- pool->cur_max_size = max(pool->cur_max_size << 1, 512);
- pool->cur_max_size = min(pool->cur_max_size,
_manager->options.max_size);
+}
/**
- Free pages from pool.
@@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) if (shrink_pages == 0) break; pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
/* No matter what make sure the pool do not grow in next second. */
pool->cur_max_size = pool->cur_max_size >> 1;
shrink_pages = ttm_page_pool_free(pool, nr_free, sc->gfp_mask); freed += nr_free - shrink_pages;pool->shrink_timeout = jiffies + HZ;
@@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, } /* Check that we don't go over the pool limit */ npages = 0;
- if (pool->npages > _manager->options.max_size) {
npages = pool->npages - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
- }
- /*
* Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
* set_memory_wb.
*/
- if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
spin_unlock_irqrestore(&pool->lock, irq_flags); if (npages) ttm_page_pool_free(pool, npages, GFP_KERNEL);npages = pool->npages - pool->cur_max_size;
@@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, return 0; }
- /* Update pool size in case shrinker limited it. */
- ttm_pool_update_max_size(pool);
- /* combine zero flag to pool flags */ gfp_flags |= pool->gfp_flags;
@@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags, pool->npages = pool->nfrees = 0; pool->gfp_flags = flags; pool->name = name;
- pool->cur_max_size = _manager->options.max_size;
- pool->shrink_timeout = jiffies;
}
int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index a076ff3..80b10aa 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -93,6 +93,8 @@ enum pool_type {
- @size: Size used during DMA allocation.
- @npages_free: Count of available pages for re-use.
- @npages_in_use: Count of pages that are in use.
- @cur_max_size: Current maximum size for the pool.
- @shrink_timeout: Timeout for pool maximum size restriction.
- @nfrees: Stats when pool is shrinking.
- @nrefills: Stats when the pool is grown.
- @gfp_flags: Flags to pass for alloc_page.
@@ -110,6 +112,8 @@ struct dma_pool { unsigned size; unsigned npages_free; unsigned npages_in_use;
- unsigned cur_max_size;
- unsigned long last_shrink; unsigned long nfrees; /* Stats when shrunk. */ unsigned long nrefills; /* Stats when grown. */ gfp_t gfp_flags;
@@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) kfree(d_page); d_page = NULL; }
+static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool) +{
- if (time_before(jiffies, pool->shrink_timeout))
return;
- /* In case we reached zero bounce back to 512 pages. */
- pool->cur_max_size = max(pool->cur_max_size << 1, 512);
- pool->cur_max_size = min(pool->cur_max_size,
_manager->options.max_size);
+}
static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) { struct dma_page *d_page; @@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, pool->size = PAGE_SIZE; pool->type = type; pool->nrefills = 0;
- pool->cur_max_size = _manager->options.max_size;
- pool->shrink_timeout = jiffies; p = pool->name; for (i = 0; i < 5; i++) { if (type & t[i]) {
@@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev) } }
- /* Update pool size in case shrinker limited it. */
- ttm_dma_pool_update_max_size(pool);
- INIT_LIST_HEAD(&ttm_dma->pages_list); for (i = 0; i < ttm->num_pages; ++i) { ret = ttm_dma_pool_get_pages(pool, ttm_dma, i);
@@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
if (pool->npages_free >= (_manager->options.max_size +
if (pool->npages_free >= (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
npages = pool->npages_free - _manager->options.max_size;
} spin_unlock_irqrestore(&pool->lock, irq_flags);npages = pool->npages_free - pool->cur_max_size;
@@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) /* Do it in round-robin fashion. */ if (++idx < pool_offset) continue;
/* No matter what make sure the pool do not grow in next second. */
p->pool->cur_max_size = p->pool->cur_max_size >> 1;
nr_free = shrink_pages; shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, sc->gfp_mask);p->pool->shrink_timeout = jiffies + HZ;
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
/Thomas
Signed-off-by: Jérôme Glisse jglisse@redhat.com Cc: Mario Kleiner mario.kleiner.de@gmail.com Cc: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++------- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++-- 2 files changed, 53 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 09874d6..ab41adf 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -68,6 +68,8 @@
- @list: Pool of free uc/wc pages for fast reuse.
- @gfp_flags: Flags to pass for alloc_page.
- @npages: Number of pages in pool.
- @cur_max_size: Current maximum size for the pool.
*/
- @shrink_timeout: Timeout for pool maximum size restriction.
struct ttm_page_pool { spinlock_t lock; @@ -76,6 +78,8 @@ struct ttm_page_pool { gfp_t gfp_flags; unsigned npages; char *name;
- unsigned cur_max_size;
- unsigned long last_shrink; unsigned long nfrees; unsigned long nrefills;
}; @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, pool->nfrees += freed_pages; }
+static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) +{
- if (time_before(jiffies, pool->shrink_timeout))
return;
- /* In case we reached zero bounce back to 512 pages. */
- pool->cur_max_size = max(pool->cur_max_size << 1, 512);
- pool->cur_max_size = min(pool->cur_max_size,
_manager->options.max_size);
+}
/**
- Free pages from pool.
@@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) if (shrink_pages == 0) break; pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
/* No matter what make sure the pool do not grow in next second. */
pool->cur_max_size = pool->cur_max_size >> 1;
shrink_pages = ttm_page_pool_free(pool, nr_free, sc->gfp_mask); freed += nr_free - shrink_pages;pool->shrink_timeout = jiffies + HZ;
@@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, } /* Check that we don't go over the pool limit */ npages = 0;
- if (pool->npages > _manager->options.max_size) {
npages = pool->npages - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
- }
- /*
* Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
* set_memory_wb.
*/
- if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
spin_unlock_irqrestore(&pool->lock, irq_flags); if (npages) ttm_page_pool_free(pool, npages, GFP_KERNEL);npages = pool->npages - pool->cur_max_size;
@@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, return 0; }
- /* Update pool size in case shrinker limited it. */
- ttm_pool_update_max_size(pool);
- /* combine zero flag to pool flags */ gfp_flags |= pool->gfp_flags;
@@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags, pool->npages = pool->nfrees = 0; pool->gfp_flags = flags; pool->name = name;
- pool->cur_max_size = _manager->options.max_size;
- pool->shrink_timeout = jiffies;
}
int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index a076ff3..80b10aa 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -93,6 +93,8 @@ enum pool_type {
- @size: Size used during DMA allocation.
- @npages_free: Count of available pages for re-use.
- @npages_in_use: Count of pages that are in use.
- @cur_max_size: Current maximum size for the pool.
- @shrink_timeout: Timeout for pool maximum size restriction.
- @nfrees: Stats when pool is shrinking.
- @nrefills: Stats when the pool is grown.
- @gfp_flags: Flags to pass for alloc_page.
@@ -110,6 +112,8 @@ struct dma_pool { unsigned size; unsigned npages_free; unsigned npages_in_use;
- unsigned cur_max_size;
- unsigned long last_shrink; unsigned long nfrees; /* Stats when shrunk. */ unsigned long nrefills; /* Stats when grown. */ gfp_t gfp_flags;
@@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) kfree(d_page); d_page = NULL; }
+static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool) +{
- if (time_before(jiffies, pool->shrink_timeout))
return;
- /* In case we reached zero bounce back to 512 pages. */
- pool->cur_max_size = max(pool->cur_max_size << 1, 512);
- pool->cur_max_size = min(pool->cur_max_size,
_manager->options.max_size);
+}
static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) { struct dma_page *d_page; @@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, pool->size = PAGE_SIZE; pool->type = type; pool->nrefills = 0;
- pool->cur_max_size = _manager->options.max_size;
- pool->shrink_timeout = jiffies; p = pool->name; for (i = 0; i < 5; i++) { if (type & t[i]) {
@@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev) } }
- /* Update pool size in case shrinker limited it. */
- ttm_dma_pool_update_max_size(pool);
- INIT_LIST_HEAD(&ttm_dma->pages_list); for (i = 0; i < ttm->num_pages; ++i) { ret = ttm_dma_pool_get_pages(pool, ttm_dma, i);
@@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
if (pool->npages_free >= (_manager->options.max_size +
if (pool->npages_free >= (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
npages = pool->npages_free - _manager->options.max_size;
} spin_unlock_irqrestore(&pool->lock, irq_flags);npages = pool->npages_free - pool->cur_max_size;
@@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) /* Do it in round-robin fashion. */ if (++idx < pool_offset) continue;
/* No matter what make sure the pool do not grow in next second. */
p->pool->cur_max_size = p->pool->cur_max_size >> 1;
nr_free = shrink_pages; shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, sc->gfp_mask);p->pool->shrink_timeout = jiffies + HZ;
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
/Thomas
/Thomas
Signed-off-by: Jérôme Glisse jglisse@redhat.com Cc: Mario Kleiner mario.kleiner.de@gmail.com Cc: Michel Dänzer michel@daenzer.net Cc: Thomas Hellstrom thellstrom@vmware.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++------- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++-- 2 files changed, 53 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 09874d6..ab41adf 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -68,6 +68,8 @@
- @list: Pool of free uc/wc pages for fast reuse.
- @gfp_flags: Flags to pass for alloc_page.
- @npages: Number of pages in pool.
- @cur_max_size: Current maximum size for the pool.
*/
- @shrink_timeout: Timeout for pool maximum size restriction.
struct ttm_page_pool { spinlock_t lock; @@ -76,6 +78,8 @@ struct ttm_page_pool { gfp_t gfp_flags; unsigned npages; char *name;
- unsigned cur_max_size;
- unsigned long last_shrink; unsigned long nfrees; unsigned long nrefills;
}; @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, pool->nfrees += freed_pages; }
+static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) +{
- if (time_before(jiffies, pool->shrink_timeout))
return;
- /* In case we reached zero bounce back to 512 pages. */
- pool->cur_max_size = max(pool->cur_max_size << 1, 512);
- pool->cur_max_size = min(pool->cur_max_size,
_manager->options.max_size);
+}
/**
- Free pages from pool.
@@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) if (shrink_pages == 0) break; pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
/* No matter what make sure the pool do not grow in next second. */
pool->cur_max_size = pool->cur_max_size >> 1;
shrink_pages = ttm_page_pool_free(pool, nr_free, sc->gfp_mask); freed += nr_free - shrink_pages;pool->shrink_timeout = jiffies + HZ;
@@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, } /* Check that we don't go over the pool limit */ npages = 0;
- if (pool->npages > _manager->options.max_size) {
npages = pool->npages - _manager->options.max_size;
/* free at least NUM_PAGES_TO_ALLOC number of pages
* to reduce calls to set_memory_wb */
if (npages < NUM_PAGES_TO_ALLOC)
npages = NUM_PAGES_TO_ALLOC;
- }
- /*
* Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
* set_memory_wb.
*/
- if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
spin_unlock_irqrestore(&pool->lock, irq_flags); if (npages) ttm_page_pool_free(pool, npages, GFP_KERNEL);npages = pool->npages - pool->cur_max_size;
@@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, return 0; }
- /* Update pool size in case shrinker limited it. */
- ttm_pool_update_max_size(pool);
- /* combine zero flag to pool flags */ gfp_flags |= pool->gfp_flags;
@@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags, pool->npages = pool->nfrees = 0; pool->gfp_flags = flags; pool->name = name;
- pool->cur_max_size = _manager->options.max_size;
- pool->shrink_timeout = jiffies;
}
int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index a076ff3..80b10aa 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -93,6 +93,8 @@ enum pool_type {
- @size: Size used during DMA allocation.
- @npages_free: Count of available pages for re-use.
- @npages_in_use: Count of pages that are in use.
- @cur_max_size: Current maximum size for the pool.
- @shrink_timeout: Timeout for pool maximum size restriction.
- @nfrees: Stats when pool is shrinking.
- @nrefills: Stats when the pool is grown.
- @gfp_flags: Flags to pass for alloc_page.
@@ -110,6 +112,8 @@ struct dma_pool { unsigned size; unsigned npages_free; unsigned npages_in_use;
- unsigned cur_max_size;
- unsigned long last_shrink; unsigned long nfrees; /* Stats when shrunk. */ unsigned long nrefills; /* Stats when grown. */ gfp_t gfp_flags;
@@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) kfree(d_page); d_page = NULL; }
+static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool) +{
- if (time_before(jiffies, pool->shrink_timeout))
return;
- /* In case we reached zero bounce back to 512 pages. */
- pool->cur_max_size = max(pool->cur_max_size << 1, 512);
- pool->cur_max_size = min(pool->cur_max_size,
_manager->options.max_size);
+}
static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) { struct dma_page *d_page; @@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, pool->size = PAGE_SIZE; pool->type = type; pool->nrefills = 0;
- pool->cur_max_size = _manager->options.max_size;
- pool->shrink_timeout = jiffies; p = pool->name; for (i = 0; i < 5; i++) { if (type & t[i]) {
@@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev) } }
- /* Update pool size in case shrinker limited it. */
- ttm_dma_pool_update_max_size(pool);
- INIT_LIST_HEAD(&ttm_dma->pages_list); for (i = 0; i < ttm->num_pages; ++i) { ret = ttm_dma_pool_get_pages(pool, ttm_dma, i);
@@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
if (pool->npages_free >= (_manager->options.max_size +
if (pool->npages_free >= (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
npages = pool->npages_free - _manager->options.max_size;
} spin_unlock_irqrestore(&pool->lock, irq_flags);npages = pool->npages_free - pool->cur_max_size;
@@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) /* Do it in round-robin fashion. */ if (++idx < pool_offset) continue;
/* No matter what make sure the pool do not grow in next second. */
p->pool->cur_max_size = p->pool->cur_max_size >> 1;
nr_free = shrink_pages; shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, sc->gfp_mask);p->pool->shrink_timeout = jiffies + HZ;
Hi
On Wed, Aug 13, 2014 at 2:35 PM, Thomas Hellstrom thellstrom@vmware.com wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
gem->filp points to a fresh shmem file, which itself is limited like dmabuf. That should suffice, right?
Thanks David
On 08/13/2014 02:40 PM, David Herrmann wrote:
Hi
On Wed, Aug 13, 2014 at 2:35 PM, Thomas Hellstrom thellstrom@vmware.com wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize pool size so that memory we just shrank is not added right back again as the very next thing.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is bumped back up if the next allocation happens more than one second after the last shrink. The one second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
gem->filp points to a fresh shmem file, which itself is limited like dmabuf. That should suffice, right?
Thanks David
I'm thinking of situations where you have a gem name and open a new handle. It allocates a new unaccounted idr object. Admittedly you'd have to open a hell of a lot of new handles to stress the system, but that's an example of the situation I'm thinking of. Similarly perhaps if you create a gem handle from a prime file-descriptor but I haven't looked at that code in detail.
Thanks
/Thomas
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
My comment really was about balancing mm users under the assumption that they're all unlimited. -Daniel
On 13/08/14 16:01, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
My comment really was about balancing mm users under the assumption that they're all unlimited. -Daniel
I think the point you brought up becomes very important for compute (HSA) processes. I still don't know how to distinguish between legitimate use of GPU local memory and misbehaving/malicious processes.
We have a requirement that HSA processes will be allowed to allocate and pin GPU local memory. They do it through an ioctl. In the kernel driver, we keep an accounting of those memory allocations, meaning that I can print a list of all the objects that were allocated by a certain process, per device. Therefore, in theory, I can reclaim any object, but that will probably break the userspace app. If the app is misbehaving/malicious then that's ok, I guess. But how do I know that? And what prevents that malicious app from re-spawning and doing the same allocation again?
Oded
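The per-process accounting Oded describes could be imagined as something like the sketch below; all names are made up for illustration, the real driver obviously differs:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Hypothetical bookkeeping: one record per pinned allocation, linked into a
 * per-process structure so the driver can enumerate (or reclaim) everything a
 * given process has pinned on a device.
 */
struct hsa_alloc {
	struct list_head node;
	u64 gpu_addr;
	u64 size;
};

struct hsa_process {
	struct list_head allocs;      /* all pinned allocations of this process */
	struct list_head global_node; /* link in a driver-global process list */
	u64 total_bytes;
	spinlock_t lock;
};

static void hsa_account_alloc(struct hsa_process *p, struct hsa_alloc *a)
{
	/* Called from the allocate-and-pin ioctl path. */
	spin_lock(&p->lock);
	list_add_tail(&a->node, &p->allocs);
	p->total_bytes += a->size;
	spin_unlock(&p->lock);
}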
On 08/13/2014 04:09 PM, Oded Gabbay wrote:
On 13/08/14 16:01, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
My comment really was about balancing mm users under the assumption that they're all unlimited. -Daniel
I think the point you brought up becomes very important for compute (HSA) processes. I still don't know how to distinguish between legitimate use of GPU local memory and misbehaving/malicious processes.
We have a requirement that HSA processes will be allowed to allocate and pin GPU local memory. They do it through an ioctl. In the kernel driver, we keep an accounting of those memory allocations, meaning that I can print a list of all the objects that were allocated by a certain process, per device. Therefore, in theory, I can reclaim any object, but that will probably break the userspace app. If the app is misbehaving/malicious then that's ok, I guess. But how do I know that? And what prevents that malicious app from re-spawning and doing the same allocation again?
If you have per-process accounting of all those memory allocations and you need to reclaim and there's no nice way of doing it, you should probably do it like the kernel OOM killer: You simply kill the process that is most likely to bring back most memory (or use any other heuristic). Typically the kernel OOM killer does that when all swap space is exhausted, probably assuming that the process that uses most swap space is most likely to be malicious, if there are any malicious processes.
If the process respawns, and you run out of resources again, repeat the process.
/Thomas
Oded
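Thomas's OOM-killer-style suggestion, sketched against the hypothetical hsa_process bookkeeping above (purely illustrative; the heuristic and the kill policy are exactly what is being debated):

static struct hsa_process *pick_pin_victim(struct list_head *processes)
{
	struct hsa_process *p, *victim = NULL;
	u64 worst = 0;

	/* Heuristic: the process with the largest pinned total is the one
	 * most likely to give back the most memory (and, we guess, the most
	 * likely to be misbehaving).
	 */
	list_for_each_entry(p, processes, global_node) {
		if (p->total_bytes > worst) {
			worst = p->total_bytes;
			victim = p;
		}
	}
	/* The caller would then kill the owning task, as the OOM killer does,
	 * and repeat the selection if resources run out again.
	 */
	return victim;
}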
On Wed, Aug 13, 2014 at 05:09:49PM +0300, Oded Gabbay wrote:
On 13/08/14 16:01, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
My comment really was about balancing mm users under the assumption that they're all unlimited. -Daniel
I think the point you brought up becomes very important for compute (HSA) processes. I still don't know how to distinguish between legitimate use of GPU local memory and misbehaving/malicious processes.
We have a requirement that HSA processes will be allowed to allocate and pin GPU local memory. They do it through an ioctl. In the kernel driver, we keep an accounting of those memory allocations, meaning that I can print a list of all the objects that were allocated by a certain process, per device. Therefore, in theory, I can reclaim any object, but that will probably break the userspace app. If the app is misbehaving/malicious then that's ok, I guess. But how do I know that? And what prevents that malicious app from re-spawning and doing the same allocation again?
You can't do that in the kernel; these are policy decisions, which are userspace's job. What we instead need is to properly track memory allocations so that memory limits can be set with cgroups. With SVM you get that for free. Without SVM we need some work in that area, since currently the memory accounting for gem/ttm drivers is broken.
The other bit is limits for wasting gpu time, and I guess for that we want a new gpu time cgroup system so that users can set soft/hard limits for different gpgpu tasks on servers. -Daniel
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
My comment really was about balancing mm users under the assumption that they're all unlimited.
Yeah, sorry for stealing the thread. I usually bring this up now and again but nowadays with an exponential backoff.
-Daniel
Thomas
On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
I think in i915 we're fairly close on that minimal standard - interactions with shrinkers and oom logic work decently. It starts to fall apart though when we've actually run out of memory - if the real memory hog is a gpu process the oom killer won't notice all that memory since it's not accounted against processes correctly.
I don't agree that gpu process should be punished in general compared to other subsystems in the kernel. If the user wants to use 90% of all memory for gpu tasks then I want to make that possible, even if it means that everything else thrashes horribly. And as long as the system recovers and rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a fairly arbitrary (tunable) setting to limit system memory consumption, but I might be wrong on that.
My comment really was about balancing mm users under the assumption that they're all unlimited.
Yeah, sorry for stealing the thread. I usually bring this up now and again but nowadays with an exponential backoff.
Oh I'd love to see some cgroups or similar tracking so that server users could set sane per-process/user/task limits on how much memory/gpu time that group is allowed to consume. It's just that I haven't seen real demand for this and so couldn't make the time available to implement it. So thus far my goal is to make everything work nicely for unlimited tasks right up to the point where the OOM killer needs to step in. Past that everything starts to fall apart, but thus far that was good enough for desktop usage.
Maybe WebGL will finally make this important enough so that we can fix it for real ... -Daniel
On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
I think in i915 we're fairly close on that minimal standard - interactions with shrinkers and oom logic work decently. It starts to fall apart though when we've actually run out of memory - if the real memory hog is a gpu process the oom killer won't notice all that memory since it's not accounted against processes correctly.
I don't agree that gpu process should be punished in general compared to other subsystems in the kernel. If the user wants to use 90% of all memory for gpu tasks then I want to make that possible, even if it means that everything else thrashes horribly. And as long as the system recovers and rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a fairly arbitrary (tunable) setting to limit system memory consumption, but I might be wrong on that.
Yes, it currently limits you to half of memory, but at least we would like to make it tuneable since there are a lot of use cases where the user wants to use 90% of memory for GPU tasks at the expense of everything else.
Alex
My comment really was about balancing mm users under the assumption that they're all unlimited.
Yeah, sorry for stealing the thread. I usually bring this up now and again but nowadays with an exponential backoff.
Oh I'd love to see some cgroups or similar tracking so that server users could set sane per-process/user/task limits on how much memory/gpu time that group is allowed to consume. It's just that I haven't seen real demand for this and so couldn't make the time available to implement it. So thus far my goal is to make everything work nicely for unlimited tasks right up to the point where the OOM killer needs to step in. Past that everything starts to fall apart, but thus far that was good enough for desktop usage.
Maybe WebGL will finally make this important enough so that we can fix it for real ...
-Daniel
On Wed, Aug 13, 2014 at 12:30:45PM -0400, Alex Deucher wrote:
On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
I think in i915 we're fairly close on that minimal standard - interactions with shrinkers and oom logic work decently. It starts to fall apart though when we've actually run out of memory - if the real memory hog is a gpu process the oom killer won't notice all that memory since it's not accounted against processes correctly.
I don't agree that gpu process should be punished in general compared to other subsystems in the kernel. If the user wants to use 90% of all memory for gpu tasks then I want to make that possible, even if it means that everything else thrashes horribly. And as long as the system recovers and rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a fairly arbitrary (tunable) setting to limit system memory consumption, but I might be wrong on that.
Yes, it currently limits you to half of memory, but at least we would like to make it tuneable since there are a lot of use cases where the user wants to use 90% of memory for GPU tasks at the expense of everything else.
Ime a lot of fun stuff starts to happen when you go there. We have piles of memory thrashing testcases and generally had lots of fun with our shrinker, so I think until you've really beaten onto those paths in ttm+radeon I'd keep the limit where it is. -Daniel
On Wed, Aug 13, 2014 at 6:38 PM, Daniel Vetter daniel@ffwll.ch wrote:
Yes, it currently limits you to half of memory, but at least we would like to make it tuneable since there are a lot of user cases where the user wants to use 90% of memory for GPU tasks at the expense of everything else.
Ime a lot of fun stuff starts to happen when you go there. We have piles of memory thrashing testcases and generally had lots of fun with our shrinker, so I think until you've really beaten onto those paths in ttm+radeon I'd keep the limit where it is.
One example that already starts to bite if you go above 50% is that by default the dirty pagecache is limited to 40% of memory. Above that you start to stall in writeback, but gpus are really fast at re-dirtying memory. So we've seen cases where the core mm OOMed with essentially 90% of memory under writeback and piles of free swap. Waiting a few seconds for the SSD to catch up would have gotten it out of that tight spot without killing any process. One side effect of such fun is that memory allocations start to fail in really interesting places, and you need to pile in hacks to make it all a bit more synchronous to avoid the core mm freaking out. -Daniel
On 08/13/2014 06:30 PM, Alex Deucher wrote:
On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
I think in i915 we're fairly close on that minimal standard - interactions with shrinkers and oom logic work decently. It starts to fall apart though when we've actually run out of memory - if the real memory hog is a gpu process the oom killer won't notice all that memory since it's not accounted against processes correctly.
I don't agree that gpu process should be punished in general compared to other subsystems in the kernel. If the user wants to use 90% of all memory for gpu tasks then I want to make that possible, even if it means that everything else thrashes horribly. And as long as the system recovers and rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a fairly arbitrary (tunable) setting to limit system memory consumption, but I might be wrong on that.
Yes, it currently limits you to half of memory, but at least we would like to make it tuneable since there are a lot of use cases where the user wants to use 90% of memory for GPU tasks at the expense of everything else.
Alex
It's in /sys/devices/virtual/drm/ttm/memory_accounting/*
Run-time tunable, but you should probably write an app to tune it if you want to hand it out to users, since if you up the limit, you probably want to modify a number of values.
zone_memory: ro: Total memory in the zone.
used_memory: ro: Currently pinned memory.
available_memory: rw: Allocation limit.
emergency_memory: rw: Allocation limit for CAP_SYS_ADMIN.
swap_limit: rw: Swapper thread starts at this limit.
/Thomas
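A trivial version of the tuning app Thomas mentions might look like the sketch below; the exact file layout and units under that sysfs directory depend on the kernel, so nothing is hard-coded here:

#include <stdio.h>
#include <stdlib.h>

/* Write a new value to one of the files under
 * /sys/devices/virtual/drm/ttm/memory_accounting/.  A real tool would adjust
 * several of the values together, as Thomas notes.
 */
int main(int argc, char **argv)
{
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <sysfs-file> <new-value>\n", argv[0]);
		return 1;
	}
	f = fopen(argv[1], "w");
	if (!f) {
		perror(argv[1]);
		return 1;
	}
	fprintf(f, "%lu\n", strtoul(argv[2], NULL, 0));
	return fclose(f) ? 1 : 0;
}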
On 08/13/2014 06:24 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
I think in i915 we're fairly close on that minimal standard - interactions with shrinkers and oom logic work decently. It starts to fall apart though when we've actually run out of memory - if the real memory hog is a gpu process the oom killer won't notice all that memory since it's not accounted against processes correctly.
I don't agree that gpu process should be punished in general compared to other subsystems in the kernel. If the user wants to use 90% of all memory for gpu tasks then I want to make that possible, even if it means that everything else thrashes horribly. And as long as the system recovers and rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a fairly arbitrary (tunable) setting to limit system memory consumption, but I might be wrong on that.
No, that's correct, or rather it's intended to limit pinned unreclaimable system memory (though part of what's unreclaimable could actually be made reclaimable if we'd implement another shrinker level).
My comment really was about balancing mm users under the assumption that they're all unlimited.
Yeah, sorry for stealing the thread. I usually bring this up now and again but nowadays with an exponential backoff.
Oh I'd love to see some cgroups or similar tracking so that server users could set sane per-process/user/task limits on how much memory/gpu time that group is allowed to consume. It's just that I haven't seen real demand for this and so couldn't make the time available to implement it. So thus far my goal is to make everything work nicely for unlimited tasks right up to the point where the OOM killer needs to step in. Past that everything starts to fall apart, but thus far that was good enough for desktop usage.
Well, I'm not sure if things have changed, but last time (a couple of years ago) I looked at this situation (kernel out of physical memory but a fair amount of swap space left), the OOM killer was never invoked, so a number of more or less critical kernel systems (disk I/O, paging, networking) were getting -ENOMEM and hitting rarely tested error paths. That's a state you don't want to have the kernel in. Now, the OOM algorithm may of course have changed since then.
My point is that with unaccounted constructs like gem-open-from-name it should be easy for any unprivileged authenticated gem client to pin all kernel physical memory, put the kernel in that state and keep it there, and IMO a kernel-userspace interface shouldn't allow that.
/Thomas
Maybe WebGL will finally make this important enough so that we can fix it for real ... -Daniel
On Wed, 13 Aug 2014 17:13:56 +0200 Thomas Hellstrom thellstrom@vmware.com wrote:
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
From: Jérôme Glisse jglisse@redhat.com
When experiencing memory pressure we want to minimize the pool size so that memory we just shrank is not immediately added back to the pool.
This divides the maximum pool size for each device by 2 each time the pool has to shrink. The limit is restored if the next allocation happens more than one second after the last shrink. The one-second delay is obviously an arbitrary choice.
Jérôme,
I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues.
We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from.
Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. -Daniel
Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way.
Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection.
Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet.
In its simplest form (like in TTM, if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU users from exhausting all system physical memory and causing grief for other kernel systems, but not from causing grief for other GPU users. I think that's the minimum level that's intended, for example, for the struct file accounting as well.
My comment really was about balancing mm users under the assumption that they're all unlimited.
Yeah, sorry for stealing the thread. I usually bring this up now and again but nowadays with an exponential backoff.
Yeah I agree we're missing some good limits stuff in i915 and DRM in general probably. Chris started looking at this awhile back, but I haven't seen anything recently. Tying into the ulimits/rlimits might make sense, and at the very least we need to account for things properly so we can add new limits where needed.