Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Otherwise we are doing redundant work. Especially since the 'unbind' and 'unpopulate' have been merged and nouveau driver ends up calling it quite excessivly. On a GeForce 8600 GT with Gnome Shell (GNOME 3) we end up spending about 54% CPU time in __change_page_attr_set_clr checking the page flags when I move the mouse cursor all over the screen.
The callgraph (annotated) looks as so before this patch:
53.29% gnome-shell [kernel.kallsyms] [k] static_protections | --- static_protections | |--91.80%-- __change_page_attr_set_clr | change_page_attr_set_clr | set_pages_array_wb | | | |--96.55%-- ttm_dma_unpopulate | | nouveau_ttm_tt_unpopulate | | ttm_tt_destroy | | ttm_bo_cleanup_memtype_use | | ttm_bo_release | | kref_put | | ttm_bo_unref | | nouveau_gem_object_del | | drm_gem_object_free | | kref_put | | drm_gem_object_unreference_unlocked | | drm_gem_object_handle_unreference_unlocked.part.1 | | drm_gem_handle_delete | | drm_gem_close_ioctl | | drm_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | __GI___ioctl | | | --3.45%-- ttm_dma_pages_put | ttm_dma_page_pool_free | ttm_dma_unpopulate | nouveau_ttm_tt_unpopulate | ttm_tt_destroy | ttm_bo_cleanup_memtype_use | ttm_bo_release | kref_put | ttm_bo_unref | nouveau_gem_object_del | drm_gem_object_free | kref_put | drm_gem_object_unreference_unlocked | drm_gem_object_handle_unreference_unlocked.part.1 | drm_gem_handle_delete | drm_gem_close_ioctl | drm_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | __GI___ioctl | --8.20%-- change_page_attr_set_clr set_pages_array_wb | |--93.76%-- ttm_dma_unpopulate | nouveau_ttm_tt_unpopulate | ttm_tt_destroy | ttm_bo_cleanup_memtype_use | ttm_bo_release | kref_put | ttm_bo_unref | nouveau_gem_object_del | drm_gem_object_free | kref_put | drm_gem_object_unreference_unlocked | drm_gem_object_handle_unreference_unlocked.part.1 | drm_gem_handle_delete | drm_gem_close_ioctl | drm_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | __GI___ioctl | --6.24%-- ttm_dma_pages_put ttm_dma_page_pool_free ttm_dma_unpopulate nouveau_ttm_tt_unpopulate ttm_tt_destroy ttm_bo_cleanup_memtype_use ttm_bo_release kref_put ttm_bo_unref nouveau_gem_object_del drm_gem_object_free kref_put drm_gem_object_unreference_unlocked drm_gem_object_handle_unreference_unlocked.part.1 drm_gem_handle_delete drm_gem_close_ioctl drm_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath __GI___ioctl
and after this patch all of that disappears.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com --- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index 6678abc..6c06d0b 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -384,7 +384,9 @@ static void ttm_dma_pages_put(struct dma_pool *pool, struct list_head *d_pages, { struct dma_page *d_page, *tmp;
- if (npages && set_pages_array_wb(pages, npages)) + /* Don't set WB on WB page pool. */ + if (npages && !(pool->type & IS_CACHED) && + set_pages_array_wb(pages, npages)) pr_err(TTM_PFX "%s: Failed to set %d pages to wb!\n", pool->dev_name, npages);
@@ -396,7 +398,8 @@ static void ttm_dma_pages_put(struct dma_pool *pool, struct list_head *d_pages,
static void ttm_dma_page_put(struct dma_pool *pool, struct dma_page *d_page) { - if (set_pages_array_wb(&d_page->p, 1)) + /* Don't set WB on WB page pool. */ + if (!(pool->type & IS_CACHED) && set_pages_array_wb(&d_page->p, 1)) pr_err(TTM_PFX "%s: Failed to set %d pages to wb!\n", pool->dev_name, 1);
The code to figure out how many pages to shrink the pool ends up capping the 'count' at _manager->options.max_size - which is OK. Except that the 'count' is also used when accounting for how many pages are recycled - which we end up with the invalid values. This fixes it by using a different value for the amount of pages to shrink.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com --- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index 6c06d0b..e57aa24 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) struct dma_page *d_page, *next; enum pool_type type; bool is_cached = false; - unsigned count = 0, i; + unsigned count = 0, i, npages; unsigned long irq_flags;
type = ttm_to_type(ttm->page_flags, ttm->caching_state); @@ -971,11 +971,13 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) pool->npages_in_use -= count; if (is_cached) { pool->nfrees += count; + npages = count; } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list); + npages = count; if (pool->npages_free > _manager->options.max_size) { - count = pool->npages_free - _manager->options.max_size; + npages = pool->npages_free - _manager->options.max_size; } } spin_unlock_irqrestore(&pool->lock, irq_flags); @@ -1000,8 +1002,8 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) }
/* shrink pool if necessary */ - if (count) - ttm_dma_page_pool_free(pool, count); + if (npages) + ttm_dma_page_pool_free(pool, npages); ttm->state = tt_unpopulated; } EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
On Mon, Dec 12, 2011 at 3:09 PM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:
The code to figure out how many pages to shrink the pool ends up capping the 'count' at _manager->options.max_size - which is OK. Except that the 'count' is also used when accounting for how many pages are recycled - which we end up with the invalid values. This fixes it by using a different value for the amount of pages to shrink.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index 6c06d0b..e57aa24 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) struct dma_page *d_page, *next; enum pool_type type; bool is_cached = false;
- unsigned count = 0, i;
- unsigned count = 0, i, npages;
unsigned long irq_flags;
type = ttm_to_type(ttm->page_flags, ttm->caching_state); @@ -971,11 +971,13 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) pool->npages_in_use -= count; if (is_cached) { pool->nfrees += count;
- npages = count;
This is wrong, i thought we didn't wanted to shrink cached page, your 3/3 patch remove this line. Thought i am kind of wondering why we don't shrink the cached pool, i can't remember the reason.
} else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
- npages = count;
if (pool->npages_free > _manager->options.max_size) {
- count = pool->npages_free - _manager->options.max_size;
- npages = pool->npages_free - _manager->options.max_size;
} } spin_unlock_irqrestore(&pool->lock, irq_flags); @@ -1000,8 +1002,8 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) }
/* shrink pool if necessary */
- if (count)
- ttm_dma_page_pool_free(pool, count);
- if (npages)
- ttm_dma_page_pool_free(pool, npages);
ttm->state = tt_unpopulated; } EXPORT_SYMBOL_GPL(ttm_dma_unpopulate); -- 1.7.7.3
Otherwise Reviewed-by:Jerome Glisse jglisse@redhat.com
On Wed, Dec 21, 2011 at 10:17:36AM -0500, Jerome Glisse wrote:
On Mon, Dec 12, 2011 at 3:09 PM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:
The code to figure out how many pages to shrink the pool ends up capping the 'count' at _manager->options.max_size - which is OK. Except that the 'count' is also used when accounting for how many pages are recycled - which we end up with the invalid values. This fixes it by using a different value for the amount of pages to shrink.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index 6c06d0b..e57aa24 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) struct dma_page *d_page, *next; enum pool_type type; bool is_cached = false;
- unsigned count = 0, i;
- unsigned count = 0, i, npages;
unsigned long irq_flags;
type = ttm_to_type(ttm->page_flags, ttm->caching_state); @@ -971,11 +971,13 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) pool->npages_in_use -= count; if (is_cached) { pool->nfrees += count;
- npages = count;
This is wrong, i thought we didn't wanted to shrink cached page, your 3/3 patch remove this line. Thought i am kind of wondering why we don't shrink the cached pool, i can't remember the reason.
We delete the cached ones:
988 if (is_cached) { 989 list_for_each_entry_safe(d_page, next, &ttm_dma->pages_list, page_list) { 990 ttm_mem_global_free_page(ttm->glob->mem_glob, 991 d_page->p); 992 ttm_dma_page_put(pool, d_page); 993 }
And then we end up calling in ttm_dma_page_pool_free for (count) pages that we had already deleted. So it ends up being a bit redundant.
The 3/3 patch fixes that (by removing the call to ttm_dma_page_pool_free for is_cached. This patch (2/3) tries to preserve the logic if it is either is_cached or !is_cached.
} else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list);
- npages = count;
if (pool->npages_free > _manager->options.max_size) {
- count = pool->npages_free - _manager->options.max_size;
- npages = pool->npages_free - _manager->options.max_size;
} } spin_unlock_irqrestore(&pool->lock, irq_flags); @@ -1000,8 +1002,8 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) }
/* shrink pool if necessary */
- if (count)
- ttm_dma_page_pool_free(pool, count);
- if (npages)
- ttm_dma_page_pool_free(pool, npages);
ttm->state = tt_unpopulated; } EXPORT_SYMBOL_GPL(ttm_dma_unpopulate); -- 1.7.7.3
Otherwise Reviewed-by:Jerome Glisse jglisse@redhat.com
Thanks!
We would free the page pool - even if it was for cached pages (which are deleted from the pool - so there are no free pages).
Also we also missed the opportunity to batch the amount of pages to free (similar to how ttm_page_alloc.c does it). This reintroduces the code that was lost during rebasing.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com --- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index e57aa24..156ddcd 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) struct dma_page *d_page, *next; enum pool_type type; bool is_cached = false; - unsigned count = 0, i, npages; + unsigned count = 0, i, npages = 0; unsigned long irq_flags;
type = ttm_to_type(ttm->page_flags, ttm->caching_state); @@ -971,13 +971,16 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) pool->npages_in_use -= count; if (is_cached) { pool->nfrees += count; - npages = count; } else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list); npages = count; if (pool->npages_free > _manager->options.max_size) { npages = pool->npages_free - _manager->options.max_size; + /* free at least NUM_PAGES_TO_ALLOC number of pages + * to reduce calls to set_memory_wb */ + if (npages < NUM_PAGES_TO_ALLOC) + npages = NUM_PAGES_TO_ALLOC; } } spin_unlock_irqrestore(&pool->lock, irq_flags); @@ -1001,7 +1004,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) ttm_dma->dma_address[i] = 0; }
- /* shrink pool if necessary */ + /* shrink pool if necessary (only on !is_cached pools)*/ if (npages) ttm_dma_page_pool_free(pool, npages); ttm->state = tt_unpopulated;
On Mon, Dec 12, 2011 at 3:09 PM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:
We would free the page pool - even if it was for cached pages (which are deleted from the pool - so there are no free pages).
Also we also missed the opportunity to batch the amount of pages to free (similar to how ttm_page_alloc.c does it). This reintroduces the code that was lost during rebasing.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index e57aa24..156ddcd 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) struct dma_page *d_page, *next; enum pool_type type; bool is_cached = false;
- unsigned count = 0, i, npages;
- unsigned count = 0, i, npages = 0;
unsigned long irq_flags;
type = ttm_to_type(ttm->page_flags, ttm->caching_state); @@ -971,13 +971,16 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) pool->npages_in_use -= count; if (is_cached) { pool->nfrees += count;
- npages = count;
See patch 2/3 comment otherwise Reviewed-by: Jerome Glisse jglisse@redhat.com
} else { pool->npages_free += count; list_splice(&ttm_dma->pages_list, &pool->free_list); npages = count; if (pool->npages_free > _manager->options.max_size) { npages = pool->npages_free - _manager->options.max_size;
- /* free at least NUM_PAGES_TO_ALLOC number of pages
- * to reduce calls to set_memory_wb */
- if (npages < NUM_PAGES_TO_ALLOC)
- npages = NUM_PAGES_TO_ALLOC;
} } spin_unlock_irqrestore(&pool->lock, irq_flags); @@ -1001,7 +1004,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) ttm_dma->dma_address[i] = 0; }
- /* shrink pool if necessary */
- /* shrink pool if necessary (only on !is_cached pools)*/
if (npages) ttm_dma_page_pool_free(pool, npages); ttm->state = tt_unpopulated; -- 1.7.7.3
On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Gonna review those today and test them.
Cheers, Jerome
On 12/13/2011 05:07 PM, Jerome Glisse wrote:
On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Gonna review those today and test them.
Cheers, Jerome
Hi!
I'm not whether any drivers are still using the AGP backend? Calling unpopulate / (previous clear) each time unbind is done should be quite inefficient with that one, as AGP sets up its own data structures and copies page tables on each populate. That should really be avoided unless there is a good reason to have it.
/Thomas
On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
On 12/13/2011 05:07 PM, Jerome Glisse wrote:
On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Gonna review those today and test them.
Cheers, Jerome
Hi!
I'm not whether any drivers are still using the AGP backend?
Uh, probably they do if the cards are AGP? The problem I encountered was with an PCIe Nvidia card:
01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
Calling unpopulate / (previous clear) each time unbind is done should be quite inefficient with that one, as AGP sets up its own data structures and copies page tables on each populate. That should really be avoided unless there is a good reason to have it.
nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that were causing the unpopulate calls. It did happen _a lot_ when I moved the cursor madly.
On 12/13/2011 05:40 PM, Konrad Rzeszutek Wilk wrote:
On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
On 12/13/2011 05:07 PM, Jerome Glisse wrote:
On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Gonna review those today and test them.
Cheers, Jerome
Hi!
I'm not whether any drivers are still using the AGP backend?
Uh, probably they do if the cards are AGP? The problem I encountered was with an PCIe Nvidia card:
01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
Calling unpopulate / (previous clear) each time unbind is done should be quite inefficient with that one, as AGP sets up its own data structures and copies page tables on each populate. That should really be avoided unless there is a good reason to have it.
nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that were causing the unpopulate calls. It did happen _a lot_ when I moved the cursor madly.
Konrad, Jerome Was there a resolution to this. If the ttm_tt rewrite results in unpopulate being called more often than before, and that causes performance regressions, that must be fixed as soon as possible.
/Thomas
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Dec 19, 2011 at 08:51:15PM +0100, Thomas Hellstrom wrote:
On 12/13/2011 05:40 PM, Konrad Rzeszutek Wilk wrote:
On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
On 12/13/2011 05:07 PM, Jerome Glisse wrote:
On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Gonna review those today and test them.
Cheers, Jerome
Hi!
I'm not whether any drivers are still using the AGP backend?
Uh, probably they do if the cards are AGP? The problem I encountered was with an PCIe Nvidia card:
01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
Calling unpopulate / (previous clear) each time unbind is done should be quite inefficient with that one, as AGP sets up its own data structures and copies page tables on each populate. That should really be avoided unless there is a good reason to have it.
nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that were causing the unpopulate calls. It did happen _a lot_ when I moved the cursor madly.
Konrad, Jerome Was there a resolution to this. If the ttm_tt rewrite results in unpopulate being called more often than before, and that causes performance regressions, that must be fixed as soon as possible.
I am waiting for Jerome to look over my patches. Next week I was going to setup a test-bed with an Radeon and Nvidia AGP card and bootup two kernels - one without the ttm_bind/ttm_populate squashing and one with it.
Then run some benchmark code.. Thought not sure what kind of benchmark code I should run? Any thoughts? The regression I observed were just by moving the mouse around but that is not very "scientfic".
openareana? alien? etuxracer? Any of them offer some timed demo mode to get an idea of performance impact?
On Tue, Dec 13, 2011 at 11:40:31AM -0500, Konrad Rzeszutek Wilk wrote:
On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
On 12/13/2011 05:07 PM, Jerome Glisse wrote:
On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
Jerome pointed me to some accounting error in the DMA API debugging code and while I can't figure it out yet, I did notice some extreme slowness - which is due to the nouveau driver calling the unpopulate (now that unbind + unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2 did not have those issues but I can't recall).
Anyhow these patches fix the 50% perf regression I saw and also some minor bugs that I noticed.
Gonna review those today and test them.
Cheers, Jerome
Hi!
I'm not whether any drivers are still using the AGP backend?
Uh, probably they do if the cards are AGP? The problem I encountered was with an PCIe Nvidia card:
01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
Calling unpopulate / (previous clear) each time unbind is done should be quite inefficient with that one, as AGP sets up its own data structures and copies page tables on each populate. That should really be avoided unless there is a good reason to have it.
nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that were causing the unpopulate calls. It did happen _a lot_ when I moved the cursor madly.
Looked at that and i am pretty sure that those function can't trigger unpopulate event (but i might have missed a path). I tested before and after the ttm change and those function are as time consuming as before.
I think nouveau is doing something stupid in userspace when it comes to cursor (might be the ddx or just gnome3 or X server haven't tracked that down).
Anyway, bottom line is i believe we don't call unpopulate more than before the ttm changes. It seems to be the same from code inspection, previously each time we called clear we also freed the pages so clear is equivalent to unpopulate. In theory we don't call unpopulate on each unbind but given the current user we do (only path were unbind is not followed by unpopulate is ttm_bo_move_ttm same as before the patch so no change there).
So, Konrad patch to avoid calling set_pages_array_wb when the page is cached is a proper fix, this is what happened before, and this is what the ttm change broke. I believe there is no performance regression with this ttm change on radeon or nouveau (tested only with gnome3+nexuiz on nv50 and r600,r700,evergreen).
Cheers, Jerome
Hi!
I'm not whether any drivers are still using the AGP backend?
Actually the openchrome projects KMS kernel uses a AGP backend.
Calling unpopulate / (previous clear) each time unbind is done should be quite inefficient with that one, as AGP sets up its own data structures and copies page tables on each populate. That should really be avoided unless there is a good reason to have it.
/Thomas
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
dri-devel@lists.freedesktop.org