The patch is compile-tested only and not very intrusive. It should be applied on top of the latest TTM patches.
Besides reduced CPU usage on SMP kernels, there is the benefit of using shared code. It will also ease the implementation of concurrent CS, due to the deadlock prevention mechanisms.
If time allows, I'll write a patch to do the same for Nouveau, although Nouveau does things a little differently than the other TTM-aware drivers.
Rather than re-implementing this in the Radeon driver, use the execbuf / cs / pushbuf utilities that come with TTM. This comes with an even greater benefit now that many spinlocks have been optimized away.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/radeon/radeon.h        |    4 +-
 drivers/gpu/drm/radeon/radeon_cs.c     |   17 ++++++----
 drivers/gpu/drm/radeon/radeon_object.c |   55 ++-----------------------------
 drivers/gpu/drm/radeon/radeon_object.h |    3 --
 4 files changed, 16 insertions(+), 63 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 73f600d..c718986 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -69,6 +69,7 @@
 #include <ttm/ttm_bo_driver.h>
 #include <ttm/ttm_placement.h>
 #include <ttm/ttm_module.h>
+#include <ttm/ttm_execbuf_util.h>

 #include "radeon_family.h"
 #include "radeon_mode.h"
@@ -259,13 +260,12 @@ struct radeon_bo {
 };

 struct radeon_bo_list {
-	struct list_head	list;
+	struct ttm_validate_buffer tv;
 	struct radeon_bo	*bo;
 	uint64_t		gpu_offset;
 	unsigned		rdomain;
 	unsigned		wdomain;
 	u32			tiling_flags;
-	bool			reserved;
 };

 /*
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 6d64a27..35b5eb8 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -77,13 +77,13 @@ int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		p->relocs_ptr[i] = &p->relocs[i];
 		p->relocs[i].robj = p->relocs[i].gobj->driver_private;
 		p->relocs[i].lobj.bo = p->relocs[i].robj;
-		p->relocs[i].lobj.rdomain = r->read_domains;
 		p->relocs[i].lobj.wdomain = r->write_domain;
+		p->relocs[i].lobj.rdomain = r->read_domains;
+		p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
 		p->relocs[i].handle = r->handle;
 		p->relocs[i].flags = r->flags;
-		INIT_LIST_HEAD(&p->relocs[i].lobj.list);
 		radeon_bo_list_add_object(&p->relocs[i].lobj,
-					&p->validated);
+					  &p->validated);
 		}
 	}
 	return radeon_bo_list_validate(&p->validated);
@@ -189,10 +189,13 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 {
 	unsigned i;

-	if (!error && parser->ib) {
-		radeon_bo_list_fence(&parser->validated, parser->ib->fence);
-	}
-	radeon_bo_list_unreserve(&parser->validated);
+
+	if (!error && parser->ib)
+		ttm_eu_fence_buffer_objects(&parser->validated,
+					    parser->ib->fence);
+	else
+		ttm_eu_backoff_reservation(&parser->validated);
+
 	if (parser->relocs != NULL) {
 		for (i = 0; i < parser->nrelocs; i++) {
 			if (parser->relocs[i].gobj)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 8af5ae4..abf98d0 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -292,34 +292,9 @@ void radeon_bo_list_add_object(struct radeon_bo_list *lobj,
 				struct list_head *head)
 {
 	if (lobj->wdomain) {
-		list_add(&lobj->list, head);
+		list_add(&lobj->tv.head, head);
 	} else {
-		list_add_tail(&lobj->list, head);
-	}
-}
-
-int radeon_bo_list_reserve(struct list_head *head)
-{
-	struct radeon_bo_list *lobj;
-	int r;
-
-	list_for_each_entry(lobj, head, list){
-		r = radeon_bo_reserve(lobj->bo, false);
-		if (unlikely(r != 0))
-			return r;
-		lobj->reserved = true;
-	}
-	return 0;
-}
-
-void radeon_bo_list_unreserve(struct list_head *head)
-{
-	struct radeon_bo_list *lobj;
-
-	list_for_each_entry(lobj, head, list) {
-		/* only unreserve object we successfully reserved */
-		if (lobj->reserved && radeon_bo_is_reserved(lobj->bo))
-			radeon_bo_unreserve(lobj->bo);
+		list_add_tail(&lobj->tv.head, head);
 	}
 }

@@ -330,14 +305,11 @@ int radeon_bo_list_validate(struct list_head *head)
 	u32 domain;
 	int r;

-	list_for_each_entry(lobj, head, list) {
-		lobj->reserved = false;
-	}
-	r = radeon_bo_list_reserve(head);
+	r = ttm_eu_reserve_buffers(head);
 	if (unlikely(r != 0)) {
 		return r;
 	}
-	list_for_each_entry(lobj, head, list) {
+	list_for_each_entry(lobj, head, tv.head) {
 		bo = lobj->bo;
 		if (!bo->pin_count) {
 			domain = lobj->wdomain ? lobj->wdomain : lobj->rdomain;
@@ -360,25 +332,6 @@ int radeon_bo_list_validate(struct list_head *head)
 	return 0;
 }

-void radeon_bo_list_fence(struct list_head *head, void *fence)
-{
-	struct radeon_bo_list *lobj;
-	struct radeon_bo *bo;
-	struct radeon_fence *old_fence = NULL;
-
-	list_for_each_entry(lobj, head, list) {
-		bo = lobj->bo;
-		spin_lock(&bo->tbo.bdev->fence_lock);
-		old_fence = (struct radeon_fence *)bo->tbo.sync_obj;
-		bo->tbo.sync_obj = radeon_fence_ref(fence);
-		bo->tbo.sync_obj_arg = NULL;
-		spin_unlock(&bo->tbo.bdev->fence_lock);
-		if (old_fence) {
-			radeon_fence_unref(&old_fence);
-		}
-	}
-}
-
 int radeon_bo_fbdev_mmap(struct radeon_bo *bo,
 			 struct vm_area_struct *vma)
 {
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 7885d07..3b143c0 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -151,10 +151,7 @@ extern int radeon_bo_init(struct radeon_device *rdev);
 extern void radeon_bo_fini(struct radeon_device *rdev);
 extern void radeon_bo_list_add_object(struct radeon_bo_list *lobj,
 				      struct list_head *head);
-extern int radeon_bo_list_reserve(struct list_head *head);
-extern void radeon_bo_list_unreserve(struct list_head *head);
 extern int radeon_bo_list_validate(struct list_head *head);
-extern void radeon_bo_list_fence(struct list_head *head, void *fence);
 extern int radeon_bo_fbdev_mmap(struct radeon_bo *bo,
 				struct vm_area_struct *vma);
 extern int radeon_bo_set_tiling_flags(struct radeon_bo *bo,
On Wed, Nov 17, 2010 at 7:38 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> The patch is compile-tested only and not very intrusive. It should be applied on top of the latest TTM patches.
> Besides reduced CPU usage on SMP kernels, there is the benefit of using shared code. It will also ease the implementation of concurrent CS, due to the deadlock prevention mechanisms.
> If time allows, I'll write a patch to do the same for Nouveau, although Nouveau does things a little differently than the other TTM-aware drivers.
Tested on rs780 and it seems to work properly, but I haven't really stress-tested it beyond a few games. I see no perf or CPU-use improvement, but it could very well be hidden by other costs.
Note that we can't really have concurrent CS due to the way the checking works. Anyway, you can add my reviewed-by.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Tested-by: Jerome Glisse <jglisse@redhat.com>
On 11/17/2010 08:11 PM, Jerome Glisse wrote:
> On Wed, Nov 17, 2010 at 7:38 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> The patch is compile-tested only and not very intrusive. It should be applied on top of the latest TTM patches.
>> Besides reduced CPU usage on SMP kernels, there is the benefit of using shared code. It will also ease the implementation of concurrent CS, due to the deadlock prevention mechanisms.
>> If time allows, I'll write a patch to do the same for Nouveau, although Nouveau does things a little differently than the other TTM-aware drivers.
> Tested on rs780 and it seems to work properly, but I haven't really stress-tested it beyond a few games. I see no perf or CPU-use improvement, but it could very well be hidden by other costs.
Probably. I would have expected that a CPU-bound app, for example ipers running at 100% CPU before the patch, would have seen some frame-rate improvement.
> Note that we can't really have concurrent CS due to the way the checking works. Anyway, you can add my reviewed-by.
Thanks, Jerome.

/Thomas
> Reviewed-by: Jerome Glisse <jglisse@redhat.com>
> Tested-by: Jerome Glisse <jglisse@redhat.com>