Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
On Thu, Dec 22, 2011 at 4:56 PM, James Simmons jsimmons@infradead.org wrote:
Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
Happens on AGP radeons as well: https://bugs.freedesktop.org/show_bug.cgi?id=43719
Alex
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
Happens on AGP radeons as well: https://bugs.freedesktop.org/show_bug.cgi?id=43719
So I'm not crazy, so this needs to be fixed. Here is what my understanding of the TTM layer is. My impression is that struct ttm_bo_driver handles multiple domains, AGP, pcie etc and in each method you have to handle each specific domain you support. Also *move gives the impression of moving between these different domains. Where as for struct ttm_backend_func was more for allocating from a specific domain. Also I never saw a clear way to handle multiple backends. For example my AGP systems can also do pci dma as well.
On Thu, Dec 22, 2011 at 8:19 PM, James Simmons jsimmons@infradead.org wrote:
Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
Happens on AGP radeons as well: https://bugs.freedesktop.org/show_bug.cgi?id=43719
So I'm not crazy, so this needs to be fixed. Here is what my understanding of the TTM layer is. My impression is that struct ttm_bo_driver handles multiple domains, AGP, pcie etc and in each method you have to handle each specific domain you support. Also *move gives the impression of moving between these different domains. Where as for struct ttm_backend_func was more for allocating from a specific domain. Also I never saw a clear way to handle multiple backends. For example my AGP systems can also do pci dma as well.
Yes this need to be fixed, i am likely guilty, i tested agp along the way but i must have miss something. If no one beat me to it i will get back to this on january.
Cheers, Jerome
On Fri, Dec 23, 2011 at 01:19:43AM +0000, James Simmons wrote:
Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
Happens on AGP radeons as well: https://bugs.freedesktop.org/show_bug.cgi?id=43719
So I'm not crazy, so this needs to be fixed. Here is what my understanding of the TTM layer is. My impression is that struct ttm_bo_driver handles multiple domains, AGP, pcie etc and in each method you have to handle each specific domain you support. Also *move gives the impression of moving between these different domains. Where as for struct ttm_backend_func was more for allocating from a specific domain. Also I never saw a clear way to handle multiple backends. For example my AGP systems can also do pci dma as well.
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Cheers, Jerome
On Tue, Jan 03, 2012 at 05:43:40PM -0500, Jerome Glisse wrote:
On Fri, Dec 23, 2011 at 01:19:43AM +0000, James Simmons wrote:
Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
Happens on AGP radeons as well: https://bugs.freedesktop.org/show_bug.cgi?id=43719
So I'm not crazy, so this needs to be fixed. Here is what my understanding of the TTM layer is. My impression is that struct ttm_bo_driver handles multiple domains, AGP, pcie etc and in each method you have to handle each specific domain you support. Also *move gives the impression of moving between these different domains. Where as for struct ttm_backend_func was more for allocating from a specific domain. Also I never saw a clear way to handle multiple backends. For example my AGP systems can also do pci dma as well.
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Hey Jerome,
Was going to look at this week and see how it performs before (and after) the squash ttm bind+populate operation. Any thoughts of what benchmarks I should run?
Fyi, I've some NVidia AGP cards. We both live in Boston so could meet up.
And I could ask you about the patches I sent - not clear if you want to me squash the patch 2 and 3 together or just redo them diffrently.
Cheers, Jerome
From 99a6eee89a43bce155a93ab6d71ea041aac10680 Mon Sep 17 00:00:00 2001
From: Jerome Glisse jglisse@redhat.com Date: Tue, 3 Jan 2012 17:37:37 -0500 Subject: [PATCH] ttm: fix agp since ttm tt rework
ttm tt rework modified the way we allocate and populate the ttm_tt structure, the AGP side was missing some bit to properly work. Fix those and fix radeon and nouveau AGP support.
Tested on radeon only so far.
Signed-off-by: Jerome Glisse jglisse@redhat.com
drivers/gpu/drm/nouveau/nouveau_bo.c | 13 +++++++++++++ drivers/gpu/drm/radeon/radeon_ttm.c | 11 +++++++++++ drivers/gpu/drm/ttm/ttm_agp_backend.c | 17 +++++++++++++++++ drivers/gpu/drm/ttm/ttm_tt.c | 2 ++ include/drm/ttm/ttm_bo_driver.h | 2 ++ 5 files changed, 45 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 8cf4a48..724b41a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1066,6 +1066,12 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) dev_priv = nouveau_bdev(ttm->bdev); dev = dev_priv->dev;
+#if __OS_HAS_AGP
- if (dev_priv->gart_info.type == NOUVEAU_GART_AGP) {
return ttm_agp_tt_populate(ttm);
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { return ttm_dma_populate((void *)ttm, dev->dev); @@ -1105,6 +1111,13 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) dev_priv = nouveau_bdev(ttm->bdev); dev = dev_priv->dev;
+#if __OS_HAS_AGP
- if (dev_priv->gart_info.type == NOUVEAU_GART_AGP) {
ttm_agp_tt_unpopulate(ttm);
return;
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { ttm_dma_unpopulate((void *)ttm, dev->dev); diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index b0ebaf8..dc73d77 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -588,6 +588,11 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm) return 0;
rdev = radeon_get_rdev(ttm->bdev); +#if __OS_HAS_AGP
- if (rdev->flags & RADEON_IS_AGP) {
return ttm_agp_tt_populate(ttm);
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { @@ -624,6 +629,12 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm) unsigned i;
rdev = radeon_get_rdev(ttm->bdev); +#if __OS_HAS_AGP
- if (rdev->flags & RADEON_IS_AGP) {
ttm_agp_tt_unpopulate(ttm);
return;
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c index 14ebd36..747c141 100644 --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c @@ -31,6 +31,7 @@
#include "ttm/ttm_module.h" #include "ttm/ttm_bo_driver.h" +#include "ttm/ttm_page_alloc.h" #ifdef TTM_HAS_AGP #include "ttm/ttm_placement.h" #include <linux/agp_backend.h> @@ -97,6 +98,7 @@ static void ttm_agp_destroy(struct ttm_tt *ttm)
if (agp_be->mem) ttm_agp_unbind(ttm);
- ttm_tt_fini(ttm); kfree(agp_be);
}
@@ -129,4 +131,19 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_bo_device *bdev, } EXPORT_SYMBOL(ttm_agp_tt_create);
+int ttm_agp_tt_populate(struct ttm_tt *ttm) +{
- if (ttm->state != tt_unpopulated)
return 0;
- return ttm_pool_populate(ttm);
+} +EXPORT_SYMBOL(ttm_agp_tt_populate);
+void ttm_agp_tt_unpopulate(struct ttm_tt *ttm) +{
- ttm_pool_unpopulate(ttm);
+} +EXPORT_SYMBOL(ttm_agp_tt_unpopulate);
#endif diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58e1fa1..2f75d20 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -191,6 +191,7 @@ int ttm_tt_init(struct ttm_tt *ttm, struct ttm_bo_device *bdev, ttm->page_flags = page_flags; ttm->dummy_read_page = dummy_read_page; ttm->state = tt_unpopulated;
ttm->swap_storage = NULL;
ttm_tt_alloc_page_directory(ttm); if (!ttm->pages) {
@@ -222,6 +223,7 @@ int ttm_dma_tt_init(struct ttm_dma_tt *ttm_dma, struct ttm_bo_device *bdev, ttm->page_flags = page_flags; ttm->dummy_read_page = dummy_read_page; ttm->state = tt_unpopulated;
ttm->swap_storage = NULL;
INIT_LIST_HEAD(&ttm_dma->pages_list); ttm_dma_tt_alloc_page_directory(ttm_dma);
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 2be8891..d43e892 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -1030,6 +1030,8 @@ extern struct ttm_tt *ttm_agp_tt_create(struct ttm_bo_device *bdev, struct agp_bridge_data *bridge, unsigned long size, uint32_t page_flags, struct page *dummy_read_page); +int ttm_agp_tt_populate(struct ttm_tt *ttm); +void ttm_agp_tt_unpopulate(struct ttm_tt *ttm); #endif
#endif
1.7.5.4
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Jan 03, 2012 at 05:59:33PM -0500, Konrad Rzeszutek Wilk wrote:
On Tue, Jan 03, 2012 at 05:43:40PM -0500, Jerome Glisse wrote:
On Fri, Dec 23, 2011 at 01:19:43AM +0000, James Simmons wrote:
Hi!
I updated the openchrome tree and while testing on the AGP system discovered some interesting problems with the new TTM changes. The problems center around the ttm_tt_[un]populate which I modeled after the radeon and nouveau driver. First problem I noticed was on a AGP system that my ttm_tt_populate function would oops. Tracking it down I found the problem was the ttm_agp_tt_create calls ttm_tt_init instead of ttm_dma_tt_init so once my ttm_tt_populate function would attempt to touch the dma_address it would oops. The second issue is the assumption of the cast for struct ttm_tt in both the populate and unpopulate function. For the AGP case the proper case would be to ttm_agp_backend. At this point my assumption is the ttm_bo_move function has to be rewritten to handle the AGP case to avoid calling ttm_tt_bind and in all cases ttm_tt_bind needs to be avoided. Looking at the radeon and nouveau drivers I don't see that testing happening. Am I just missing something?
Happens on AGP radeons as well: https://bugs.freedesktop.org/show_bug.cgi?id=43719
So I'm not crazy, so this needs to be fixed. Here is what my understanding of the TTM layer is. My impression is that struct ttm_bo_driver handles multiple domains, AGP, pcie etc and in each method you have to handle each specific domain you support. Also *move gives the impression of moving between these different domains. Where as for struct ttm_backend_func was more for allocating from a specific domain. Also I never saw a clear way to handle multiple backends. For example my AGP systems can also do pci dma as well.
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Hey Jerome,
Was going to look at this week and see how it performs before (and after) the squash ttm bind+populate operation. Any thoughts of what benchmarks I should run?
OK, Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
I ran it on my radeon AGP cards and they performed nicely. I do need to run it with the nouveau AGP card tomorrow (barring any move_notify issues).
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Thanks for the fix up. I was wondering if this purposed change could be done instead. Its the same as your fix up except that you pass in the ttm_buffer_object for tt_create. The reason is I really like to have the ability to have more than one back end to grab a bunch pf pages from. Currently its AGP or something else. On some of my embedded devices the AGP space is not very large and can be exhausted very quickly. Those embedded devices have DMA engines I could use instead.
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 8cf4a48..58ad508 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -349,10 +349,11 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) }
static struct ttm_tt * -nouveau_ttm_tt_create(struct ttm_bo_device *bdev, +nouveau_ttm_tt_create(struct ttm_buffer_object *bo, unsigned long size, uint32_t page_flags, struct page *dummy_read_page) { + struct ttm_bo_device *bdev = bo->bdev; struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev;
@@ -1066,6 +1067,12 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) dev_priv = nouveau_bdev(ttm->bdev); dev = dev_priv->dev;
+#if __OS_HAS_AGP + if (dev_priv->gart_info.type == NOUVEAU_GART_AGP) { + return ttm_agp_tt_populate(ttm); + } +#endif + #ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { return ttm_dma_populate((void *)ttm, dev->dev); @@ -1105,6 +1112,13 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) dev_priv = nouveau_bdev(ttm->bdev); dev = dev_priv->dev;
+#if __OS_HAS_AGP + if (dev_priv->gart_info.type == NOUVEAU_GART_AGP) { + ttm_agp_tt_unpopulate(ttm); + return; + } +#endif + #ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { ttm_dma_unpopulate((void *)ttm, dev->dev); diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index b0ebaf8..655f8e9 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -549,10 +549,12 @@ static struct ttm_backend_func radeon_backend_func = { .destroy = &radeon_ttm_backend_destroy, };
-struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev, - unsigned long size, uint32_t page_flags, +struct ttm_tt *radeon_ttm_tt_create(struct ttm_buffer_object *bo, + unsigned long size, + uint32_t page_flags, struct page *dummy_read_page) { + struct ttm_bo_device *bdev = bo->bdev; struct radeon_device *rdev; struct radeon_ttm_tt *gtt;
@@ -588,6 +590,11 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm) return 0;
rdev = radeon_get_rdev(ttm->bdev); +#if __OS_HAS_AGP + if (rdev->flags & RADEON_IS_AGP) { + return ttm_agp_tt_populate(ttm); + } +#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { @@ -624,6 +631,12 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm) unsigned i;
rdev = radeon_get_rdev(ttm->bdev); +#if __OS_HAS_AGP + if (rdev->flags & RADEON_IS_AGP) { + ttm_agp_tt_unpopulate(ttm); + return; + } +#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c index 14ebd36..747c141 100644 --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c @@ -31,6 +31,7 @@
#include "ttm/ttm_module.h" #include "ttm/ttm_bo_driver.h" +#include "ttm/ttm_page_alloc.h" #ifdef TTM_HAS_AGP #include "ttm/ttm_placement.h" #include <linux/agp_backend.h> @@ -97,6 +98,7 @@ static void ttm_agp_destroy(struct ttm_tt *ttm)
if (agp_be->mem) ttm_agp_unbind(ttm); + ttm_tt_fini(ttm); kfree(agp_be); }
@@ -129,4 +131,19 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_bo_device *bdev, } EXPORT_SYMBOL(ttm_agp_tt_create);
+int ttm_agp_tt_populate(struct ttm_tt *ttm) +{ + if (ttm->state != tt_unpopulated) + return 0; + + return ttm_pool_populate(ttm); +} +EXPORT_SYMBOL(ttm_agp_tt_populate); + +void ttm_agp_tt_unpopulate(struct ttm_tt *ttm) +{ + ttm_pool_unpopulate(ttm); +} +EXPORT_SYMBOL(ttm_agp_tt_unpopulate); + #endif diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 2f0eab6..1adc149 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -338,7 +338,7 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (zero_alloc) page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; case ttm_bo_type_kernel: - bo->ttm = bdev->driver->ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT, + bo->ttm = bdev->driver->ttm_tt_create(bo, bo->num_pages << PAGE_SHIFT, page_flags, glob->dummy_read_page); if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58e1fa1..2f75d20 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -191,6 +191,7 @@ int ttm_tt_init(struct ttm_tt *ttm, struct ttm_bo_device *bdev, ttm->page_flags = page_flags; ttm->dummy_read_page = dummy_read_page; ttm->state = tt_unpopulated; + ttm->swap_storage = NULL;
ttm_tt_alloc_page_directory(ttm); if (!ttm->pages) { @@ -222,6 +223,7 @@ int ttm_dma_tt_init(struct ttm_dma_tt *ttm_dma, struct ttm_bo_device *bdev, ttm->page_flags = page_flags; ttm->dummy_read_page = dummy_read_page; ttm->state = tt_unpopulated; + ttm->swap_storage = NULL;
INIT_LIST_HEAD(&ttm_dma->pages_list); ttm_dma_tt_alloc_page_directory(ttm_dma); diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 2be8891..6b67c37 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -318,7 +318,7 @@ struct ttm_bo_driver { /** * ttm_tt_create * - * @bdev: pointer to a struct ttm_bo_device: + * @bo: pointer to a struct ttm_bo: * @size: Size of the data needed backing. * @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags. * @dummy_read_page: See struct ttm_bo_device. @@ -328,7 +328,7 @@ struct ttm_bo_driver { * Returns: * NULL: Out of memory. */ - struct ttm_tt *(*ttm_tt_create)(struct ttm_bo_device *bdev, + struct ttm_tt *(*ttm_tt_create)(struct ttm_buffer_object *bo, unsigned long size, uint32_t page_flags, struct page *dummy_read_page); @@ -1030,6 +1030,8 @@ extern struct ttm_tt *ttm_agp_tt_create(struct ttm_bo_device *bdev, struct agp_bridge_data *bridge, unsigned long size, uint32_t page_flags, struct page *dummy_read_page); +int ttm_agp_tt_populate(struct ttm_tt *ttm); +void ttm_agp_tt_unpopulate(struct ttm_tt *ttm); #endif
#endif
On Wed, Jan 4, 2012 at 10:36 AM, James Simmons jsimmons@infradead.org wrote:
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Thanks for the fix up. I was wondering if this purposed change could be done instead. Its the same as your fix up except that you pass in the ttm_buffer_object for tt_create. The reason is I really like to have the ability to have more than one back end to grab a bunch pf pages from. Currently its AGP or something else. On some of my embedded devices the AGP space is not very large and can be exhausted very quickly. Those embedded devices have DMA engines I could use instead.
This change violate the layer separation ttm wish to preserve see : http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg16443.html
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
Cheers, Jerome
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 8cf4a48..58ad508 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -349,10 +349,11 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val) }
static struct ttm_tt * -nouveau_ttm_tt_create(struct ttm_bo_device *bdev, +nouveau_ttm_tt_create(struct ttm_buffer_object *bo, unsigned long size, uint32_t page_flags, struct page *dummy_read_page) {
- struct ttm_bo_device *bdev = bo->bdev;
struct drm_nouveau_private *dev_priv = nouveau_bdev(bdev); struct drm_device *dev = dev_priv->dev;
@@ -1066,6 +1067,12 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) dev_priv = nouveau_bdev(ttm->bdev); dev = dev_priv->dev;
+#if __OS_HAS_AGP
- if (dev_priv->gart_info.type == NOUVEAU_GART_AGP) {
- return ttm_agp_tt_populate(ttm);
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { return ttm_dma_populate((void *)ttm, dev->dev); @@ -1105,6 +1112,13 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) dev_priv = nouveau_bdev(ttm->bdev); dev = dev_priv->dev;
+#if __OS_HAS_AGP
- if (dev_priv->gart_info.type == NOUVEAU_GART_AGP) {
- ttm_agp_tt_unpopulate(ttm);
- return;
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { ttm_dma_unpopulate((void *)ttm, dev->dev); diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index b0ebaf8..655f8e9 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -549,10 +549,12 @@ static struct ttm_backend_func radeon_backend_func = { .destroy = &radeon_ttm_backend_destroy, };
-struct ttm_tt *radeon_ttm_tt_create(struct ttm_bo_device *bdev,
- unsigned long size, uint32_t page_flags,
+struct ttm_tt *radeon_ttm_tt_create(struct ttm_buffer_object *bo,
- unsigned long size,
- uint32_t page_flags,
struct page *dummy_read_page) {
- struct ttm_bo_device *bdev = bo->bdev;
struct radeon_device *rdev; struct radeon_ttm_tt *gtt;
@@ -588,6 +590,11 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm) return 0;
rdev = radeon_get_rdev(ttm->bdev); +#if __OS_HAS_AGP
- if (rdev->flags & RADEON_IS_AGP) {
- return ttm_agp_tt_populate(ttm);
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { @@ -624,6 +631,12 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm) unsigned i;
rdev = radeon_get_rdev(ttm->bdev); +#if __OS_HAS_AGP
- if (rdev->flags & RADEON_IS_AGP) {
- ttm_agp_tt_unpopulate(ttm);
- return;
- }
+#endif
#ifdef CONFIG_SWIOTLB if (swiotlb_nr_tbl()) { diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c index 14ebd36..747c141 100644 --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c @@ -31,6 +31,7 @@
#include "ttm/ttm_module.h" #include "ttm/ttm_bo_driver.h" +#include "ttm/ttm_page_alloc.h" #ifdef TTM_HAS_AGP #include "ttm/ttm_placement.h" #include <linux/agp_backend.h> @@ -97,6 +98,7 @@ static void ttm_agp_destroy(struct ttm_tt *ttm)
if (agp_be->mem) ttm_agp_unbind(ttm);
- ttm_tt_fini(ttm);
kfree(agp_be); }
@@ -129,4 +131,19 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_bo_device *bdev, } EXPORT_SYMBOL(ttm_agp_tt_create);
+int ttm_agp_tt_populate(struct ttm_tt *ttm) +{
- if (ttm->state != tt_unpopulated)
- return 0;
- return ttm_pool_populate(ttm);
+} +EXPORT_SYMBOL(ttm_agp_tt_populate);
+void ttm_agp_tt_unpopulate(struct ttm_tt *ttm) +{
- ttm_pool_unpopulate(ttm);
+} +EXPORT_SYMBOL(ttm_agp_tt_unpopulate);
#endif diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 2f0eab6..1adc149 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -338,7 +338,7 @@ static int ttm_bo_add_ttm(struct ttm_buffer_object *bo, bool zero_alloc) if (zero_alloc) page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC; case ttm_bo_type_kernel:
- bo->ttm = bdev->driver->ttm_tt_create(bdev, bo->num_pages << PAGE_SHIFT,
- bo->ttm = bdev->driver->ttm_tt_create(bo, bo->num_pages << PAGE_SHIFT,
page_flags, glob->dummy_read_page); if (unlikely(bo->ttm == NULL)) ret = -ENOMEM; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58e1fa1..2f75d20 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -191,6 +191,7 @@ int ttm_tt_init(struct ttm_tt *ttm, struct ttm_bo_device *bdev, ttm->page_flags = page_flags; ttm->dummy_read_page = dummy_read_page; ttm->state = tt_unpopulated;
- ttm->swap_storage = NULL;
ttm_tt_alloc_page_directory(ttm); if (!ttm->pages) { @@ -222,6 +223,7 @@ int ttm_dma_tt_init(struct ttm_dma_tt *ttm_dma, struct ttm_bo_device *bdev, ttm->page_flags = page_flags; ttm->dummy_read_page = dummy_read_page; ttm->state = tt_unpopulated;
- ttm->swap_storage = NULL;
INIT_LIST_HEAD(&ttm_dma->pages_list); ttm_dma_tt_alloc_page_directory(ttm_dma); diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 2be8891..6b67c37 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -318,7 +318,7 @@ struct ttm_bo_driver { /** * ttm_tt_create *
- * @bdev: pointer to a struct ttm_bo_device:
- * @bo: pointer to a struct ttm_bo:
* @size: Size of the data needed backing. * @page_flags: Page flags as identified by TTM_PAGE_FLAG_XX flags. * @dummy_read_page: See struct ttm_bo_device. @@ -328,7 +328,7 @@ struct ttm_bo_driver { * Returns: * NULL: Out of memory. */
- struct ttm_tt *(*ttm_tt_create)(struct ttm_bo_device *bdev,
- struct ttm_tt *(*ttm_tt_create)(struct ttm_buffer_object *bo,
unsigned long size, uint32_t page_flags, struct page *dummy_read_page); @@ -1030,6 +1030,8 @@ extern struct ttm_tt *ttm_agp_tt_create(struct ttm_bo_device *bdev, struct agp_bridge_data *bridge, unsigned long size, uint32_t page_flags, struct page *dummy_read_page); +int ttm_agp_tt_populate(struct ttm_tt *ttm); +void ttm_agp_tt_unpopulate(struct ttm_tt *ttm); #endif
#endif
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Thanks for the fix up. I was wondering if this purposed change could be done instead. Its the same as your fix up except that you pass in the ttm_buffer_object for tt_create. The reason is I really like to have the ability to have more than one back end to grab a bunch pf pages from. Currently its AGP or something else. On some of my embedded devices the AGP space is not very large and can be exhausted very quickly. Those embedded devices have DMA engines I could use instead.
This change violate the layer separation ttm wish to preserve see : http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg16443.html
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
The question is how does one know which domain you want in tt_create. Currenty drivers are using there dev_priv but if you have have more than one option available how does one choose? Would you be okay with passing in a domain flag?
On Wed, Jan 4, 2012 at 11:04 AM, James Simmons jsimmons@infradead.org wrote:
Attached is patch to fix this, so sorry about that, i must have lost my agp change along the way when working on the patchset. This patch is not extensively tested, i will do more testing tomorrow on more gpu, might even found an nvidia agp i can try. Again sorry for breaking this.
Thanks for the fix up. I was wondering if this purposed change could be done instead. Its the same as your fix up except that you pass in the ttm_buffer_object for tt_create. The reason is I really like to have the ability to have more than one back end to grab a bunch pf pages from. Currently its AGP or something else. On some of my embedded devices the AGP space is not very large and can be exhausted very quickly. Those embedded devices have DMA engines I could use instead.
This change violate the layer separation ttm wish to preserve see : http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg16443.html
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
The question is how does one know which domain you want in tt_create. Currenty drivers are using there dev_priv but if you have have more than one option available how does one choose? Would you be okay with passing in a domain flag?
Well i agree that something would be usefull there so the driver know which bind/unbind function it should use. Thomas i would prefer passing the bo to the tt_create callback but a flag is the minimum we need.
Cheers, Jerome
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
The question is how does one know which domain you want in tt_create. Currenty drivers are using there dev_priv but if you have have more than one option available how does one choose? Would you be okay with passing in a domain flag?
Well i agree that something would be usefull there so the driver know which bind/unbind function it should use. Thomas i would prefer passing the bo to the tt_create callback but a flag is the minimum we need.
We can discuss this after the merge widow. Jerome your patch does fix a regression whereas my proposal is a enhancement.
On 01/06/2012 04:51 PM, James Simmons wrote:
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
The question is how does one know which domain you want in tt_create. Currenty drivers are using there dev_priv but if you have have more than one option available how does one choose? Would you be okay with passing in a domain flag?
Well i agree that something would be usefull there so the driver know which bind/unbind function it should use. Thomas i would prefer passing the bo to the tt_create callback but a flag is the minimum we need.
We can discuss this after the merge widow. Jerome your patch does fix a regression whereas my proposal is a enhancement.
..Back from parental leave.
I'm not 100% sure I understand the problem correctly, but I assume the problem is that you receive a "bind" request for pages allocated out of the wrong DMA pool? Is that correct?
In that case we need to make both the following happen: 1) The backend should be required to supply a fallback that can bind anyway by allocating the correct page type and copy. 2) I'm OK with providing a hint to tt_create. IIRC we're passing bo::mem to tt_bind. Would it be ok to pass the same to tt_create?
/Thomas
On Mon, Jan 09, 2012 at 10:07:02AM +0100, Thomas Hellstrom wrote:
On 01/06/2012 04:51 PM, James Simmons wrote:
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
The question is how does one know which domain you want in tt_create. Currenty drivers are using there dev_priv but if you have have more than one option available how does one choose? Would you be okay with passing in a domain flag?
Well i agree that something would be usefull there so the driver know which bind/unbind function it should use. Thomas i would prefer passing the bo to the tt_create callback but a flag is the minimum we need.
We can discuss this after the merge widow. Jerome your patch does fix a regression whereas my proposal is a enhancement.
..Back from parental leave.
I'm not 100% sure I understand the problem correctly, but I assume the problem is that you receive a "bind" request for pages allocated out of the wrong DMA pool? Is that correct?
In that case we need to make both the following happen:
- The backend should be required to supply a fallback that can bind
anyway by allocating the correct page type and copy. 2) I'm OK with providing a hint to tt_create. IIRC we're passing bo::mem to tt_bind. Would it be ok to pass the same to tt_create?
/Thomas
Issue here is that different backend (AGP or PCI DMA or someother communication way btw GPU and system memory) needs to use different page allocator/backend. Thus tt_create need either to know about the bo or at least about bo:mem.
Cheers, Jerome
On Mon, Jan 09, 2012 at 10:07:02AM +0100, Thomas Hellstrom wrote:
On 01/06/2012 04:51 PM, James Simmons wrote:
You can achieve what you want by either adding a new domain so you would have system, vram, agp, pcidma and object can be bound to one and only one. Or you can hack your own agp ttm backend that could bind bo to agp or pci or both at the same time depending on what you want to achieve.
The question is how does one know which domain you want in tt_create. Currenty drivers are using there dev_priv but if you have have more than one option available how does one choose? Would you be okay with passing in a domain flag?
Well i agree that something would be usefull there so the driver know which bind/unbind function it should use. Thomas i would prefer passing the bo to the tt_create callback but a flag is the minimum we need.
We can discuss this after the merge widow. Jerome your patch does fix a regression whereas my proposal is a enhancement.
..Back from parental leave.
Congratulations!
I'm not 100% sure I understand the problem correctly, but I assume the problem is that you receive a "bind" request for pages allocated out of the wrong DMA pool? Is that correct?
In that case we need to make both the following happen:
- The backend should be required to supply a fallback that can bind
anyway by allocating the correct page type and copy. 2) I'm OK with providing a hint to tt_create. IIRC we're passing bo::mem to tt_bind. Would it be ok to pass the same to tt_create?
/Thomas
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
dri-devel@lists.freedesktop.org