Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend & ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Konrad given some of the upstream nouveau change, your patchserie conflict and would lead to revert some nouveau fixes. I believe the intention is to get the ttm-dma code into 3.3 (3.2 seems late but dunno). If 3.3 is the aim than i will rebase your patch on top of this serie, this should lower the number of change your serie needed.
Note that this is early code, only compile tested, i just wanted to get feedback to know if anyone has objection on that.
For quick review, first 5 patches can be ignored, they are cleanup and dead code removal or small fixes. The real action is in the last patch.
Next set is to move page allocation through ttm_helper into the driver thus making the whole dma stuff a driver business, driver who don't care about dma (like vmw) could ignore it, while driver that have to would deal with it through ttm_helper.
Cheers, Jerome
On Tue, Nov 01, 2011 at 09:11:37PM -0400, Jerome Glisse wrote:
Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend & ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Konrad given some of the upstream nouveau change, your patchserie conflict and would lead to revert some nouveau fixes. I believe the intention is to get the ttm-dma code into 3.3 (3.2 seems late but dunno). If 3.3 is the aim than i will rebase your patch on top of this
Yup. 3.3.
serie, this should lower the number of change your serie needed.
OK.
Note that this is early code, only compile tested, i just wanted to get feedback to know if anyone has objection on that.
Let me take a look then at them tomorrow.
For quick review, first 5 patches can be ignored, they are cleanup and dead code removal or small fixes. The real action is in the last patch.
Next set is to move page allocation through ttm_helper into the driver thus making the whole dma stuff a driver business, driver who don't care about dma (like vmw) could ignore it, while driver that have to would deal with it through ttm_helper.
Cheers, Jerome
On Tue, Nov 1, 2011 at 9:29 PM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:
On Tue, Nov 01, 2011 at 09:11:37PM -0400, Jerome Glisse wrote:
Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend & ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Konrad given some of the upstream nouveau change, your patchserie conflict and would lead to revert some nouveau fixes. I believe the intention is to get the ttm-dma code into 3.3 (3.2 seems late but dunno). If 3.3 is the aim than i will rebase your patch on top of this
Yup. 3.3.
So basicly the issue is that nouveau now abuse the ttm dma_address array to store the result of the pci map page, issue is then to either code your ttm page code or pci unmap when unmapping. This can be worked around, i have half a rebase patch already. But i believe if ttm_backend & ttm_tt are merge and with proper page callback helper, nouveau could be more easily adapted.
Anyway both approach can be taken either you patch on top or bottom. As it's somewhat different issue.
Cheers, Jerome
On Tue, Nov 01, 2011 at 11:05:49PM -0400, Jerome Glisse wrote:
On Tue, Nov 1, 2011 at 9:29 PM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:
On Tue, Nov 01, 2011 at 09:11:37PM -0400, Jerome Glisse wrote:
Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend & ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Konrad given some of the upstream nouveau change, your patchserie conflict and would lead to revert some nouveau fixes. I believe the intention is to get the ttm-dma code into 3.3 (3.2 seems late but dunno). If 3.3 is the aim than i will rebase your patch on top of this
Yup. 3.3.
So basicly the issue is that nouveau now abuse the ttm dma_address array to store the result of the pci map page, issue is then to either code your ttm page code or pci unmap when unmapping. This can be
Right. The crux is that you can't do pci unmap on coherent pages, so there has to be some bool to figure out if your are doing ttm page code or pci map/unmap.
worked around, i have half a rebase patch already. But i believe if ttm_backend & ttm_tt are merge and with proper page callback helper, nouveau could be more easily adapted.
Ok.
Anyway both approach can be taken either you patch on top or bottom.
I would recommend ttm page code at the bottom as I do not think there is going to be much change to that code - but your code is still fresh and you might want to redo it some way or another way, fix bugs, etc - which could mean extra merge conflict issues if the ttm page code is on top.
Thought you could also ditch some of the patches (the ones that deal with the drivers): 0003-ttm-radeon-nouveau-Check-the-DMA-address-from-TTM-ag.patch
That should make the amount of changes to nouveau|radeon smaller..
These could also be expunged: 0004-ttm-Wrap-ttm_-put-get-_pages-and-extract-GFP_-and-ca.patch 0005-ttm-Get-rid-of-temporary-scaffolding.patch
?
As it's somewhat different issue.
Right, git rebase/merge conflict resolution work - or hopefully no work :-)
On 11/02/2011 02:11 AM, Jerome Glisse wrote:
Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend& ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Jerome, I'm positive to the idea. Since ttm_tt is supposed to consist of code that is shared between the backends, and backends now have callbacks for almost everything ttm_tt could probably go away.
I'd haven't looked at the patches yet, though. I try to do that soonish and get back.
/Thomas
On Tue, Nov 01, 2011 at 09:11:37PM -0400, Jerome Glisse wrote:
Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend & ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Konrad given some of the upstream nouveau change, your patchserie conflict and would lead to revert some nouveau fixes. I believe the intention is to get the ttm-dma code into 3.3 (3.2 seems late but dunno). If 3.3 is the aim than i will rebase your patch on top of this serie, this should lower the number of change your serie needed.
Note that this is early code, only compile tested, i just wanted to get feedback to know if anyone has objection on that.
For quick review, first 5 patches can be ignored, they are cleanup and dead code removal or small fixes. The real action is in the last patch.
Next set is to move page allocation through ttm_helper into the driver thus making the whole dma stuff a driver business, driver who don't care about dma (like vmw) could ignore it, while driver that have to would deal with it through ttm_helper.
I took a look at them (1-3 are good. Please put Reviewed-by on them). The
[PATCH 4/6] drm/ttm: convert page allocation to use page ptr array instead of list
has a interesting side effect - previously the list was used so on the get_pages the order of pages was:
a,b,c,d,e,f
and when one was putting pages back, the list was iterated as so:
for (i = 0; i < ttm->num_pages; ++i) { .. list_add(&cur_page->lru, &h); /* list_add is like a stack, items are put in the front */ } ttm_put_pages(&h,..);
which meant that the list would have the items in: f,e,d,c,b,a
order. The TTM pool code would iterate over those in that order and call __free_page. Which means some performance drawback when the memory had to be iterated forward and then backwards, thought it probably was cached in the L3 cache, so probably no biggie.
I don't know why it was done that way, but I wrote the TTM DMA code to be optimized for that (and it has the code to reverse direction - b/c the testcases I used initially had the a,b,c,d,e,f order when doing get_pages and put_pages) - but I can alter the code to do it in the forward fashion instead of the reverse fashion.. Better yet, it removes around 70 lines of code from the TTM DMA.
Anyhow, it might be noting that in the commit description and perhaps make a bug-fix for the put_pages being called in the error path instead of ttm_put_pages as a seperate patch as you suggested.
[PATCH 6/6] drm/ttm: merge ttm_backend and ttm_tt
That is going to be a bit tricky. You are using the pci_map_page, which should not be used if the pages have been allocated via the pci_alloc_coherent (they are already mapped). Perhaps a simple check such as:
if !(ttm->be && ttm->be->func && ttm->be->func->get_pages) { ttm->dma_address[i] = pci_map_page(dev->pdev, ...) }
That long ttm->be && ttm->be->func can probably be wrapped in a nice macro and be put in ttm_page_alloc.h as:
diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h index daf5db6..39d6076 100644 --- a/include/drm/ttm/ttm_page_alloc.h +++ b/include/drm/ttm/ttm_page_alloc.h @@ -30,6 +30,12 @@ #include "ttm_memory.h"
#ifdef CONFIG_SWIOTLB +static inline bool ttm_dma_inuse(struct ttm_tt *ttm) +{ + if (ttm && ttm->be && ttm->be->func && ttm->be->func->get_pages) + return true; + return false; +} extern bool ttm_dma_override(struct ttm_backend_func *be);
extern int ttm_dma_disable; @@ -47,6 +53,11 @@ void ttm_dma_page_alloc_fini(void); extern int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data); #else #define ttm_dma_disable (1) + +static inline bool ttm_dma_inuse(struct ttm_tt *ttm) +{ + return false; +} static inline bool ttm_dma_override(struct ttm_backend_func *be) { return false;
Hmm, I am also not clear what you are using as base? 3.1 does not seem to have a whole bunch of the things your code is doing. Like the be->func->clear call from nouveau_sgdma.c.
What base should I be looking at? Is this the base that Dave has for 3.2?
On Wed, Nov 02, 2011 at 11:04:43AM -0400, Konrad Rzeszutek Wilk wrote:
On Tue, Nov 01, 2011 at 09:11:37PM -0400, Jerome Glisse wrote:
Hi,
So attached is patch serie (sorry for that i am away of my normal mail config) which ultimately merge ttm_backend & ttm_tt it allows to get rid of data duplication btw the two, especialy btw ttm_tt and driver specific backend. So net result is less 300lines of code accross ttm and driver.
Konrad given some of the upstream nouveau change, your patchserie conflict and would lead to revert some nouveau fixes. I believe the intention is to get the ttm-dma code into 3.3 (3.2 seems late but dunno). If 3.3 is the aim than i will rebase your patch on top of this serie, this should lower the number of change your serie needed.
Note that this is early code, only compile tested, i just wanted to get feedback to know if anyone has objection on that.
For quick review, first 5 patches can be ignored, they are cleanup and dead code removal or small fixes. The real action is in the last patch.
Next set is to move page allocation through ttm_helper into the driver thus making the whole dma stuff a driver business, driver who don't care about dma (like vmw) could ignore it, while driver that have to would deal with it through ttm_helper.
I took a look at them (1-3 are good. Please put Reviewed-by on them). The
[PATCH 4/6] drm/ttm: convert page allocation to use page ptr array instead of list
has a interesting side effect - previously the list was used so on the get_pages the order of pages was:
a,b,c,d,e,f
and when one was putting pages back, the list was iterated as so:
for (i = 0; i < ttm->num_pages; ++i) { .. list_add(&cur_page->lru, &h); /* list_add is like a stack, items are put in the front */ } ttm_put_pages(&h,..);
which meant that the list would have the items in: f,e,d,c,b,a
order. The TTM pool code would iterate over those in that order and call __free_page. Which means some performance drawback when the memory had to be iterated forward and then backwards, thought it probably was cached in the L3 cache, so probably no biggie.
I don't know why it was done that way, but I wrote the TTM DMA code to be optimized for that (and it has the code to reverse direction - b/c the testcases I used initially had the a,b,c,d,e,f order when doing get_pages and put_pages) - but I can alter the code to do it in the forward fashion instead of the reverse fashion.. Better yet, it removes around 70 lines of code from the TTM DMA.
Order in which we put them back on the list can be easily change, either by use of add_tail or by iterating array from end to begining. i am not sure how much this can impact things.
Anyhow, it might be noting that in the commit description and perhaps make a bug-fix for the put_pages being called in the error path instead of ttm_put_pages as a seperate patch as you suggested.
[PATCH 6/6] drm/ttm: merge ttm_backend and ttm_tt
That is going to be a bit tricky. You are using the pci_map_page, which should not be used if the pages have been allocated via the pci_alloc_coherent (they are already mapped). Perhaps a simple check such as:
if !(ttm->be && ttm->be->func && ttm->be->func->get_pages) { ttm->dma_address[i] = pci_map_page(dev->pdev, ...) }
Your dma change are not suppose to be on top of that, idea is to add yet another callbacks populate+free_page which will do either pci map page or just use your code (ie use dma alloc and store dma addr into the array). So there is a missing piece here before adding your dma code. I just wanted to keep ttm_tt & ttm_backend merge as simple as possible without major change. Adding the populate + get_page callback sounded like good candidate for another patch.
That long ttm->be && ttm->be->func can probably be wrapped in a nice macro and be put in ttm_page_alloc.h as:
diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h index daf5db6..39d6076 100644 --- a/include/drm/ttm/ttm_page_alloc.h +++ b/include/drm/ttm/ttm_page_alloc.h @@ -30,6 +30,12 @@ #include "ttm_memory.h"
#ifdef CONFIG_SWIOTLB +static inline bool ttm_dma_inuse(struct ttm_tt *ttm) +{
- if (ttm && ttm->be && ttm->be->func && ttm->be->func->get_pages)
return true;
- return false;
+} extern bool ttm_dma_override(struct ttm_backend_func *be);
extern int ttm_dma_disable; @@ -47,6 +53,11 @@ void ttm_dma_page_alloc_fini(void); extern int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data); #else #define ttm_dma_disable (1)
+static inline bool ttm_dma_inuse(struct ttm_tt *ttm) +{
- return false;
+} static inline bool ttm_dma_override(struct ttm_backend_func *be) { return false;
Hmm, I am also not clear what you are using as base? 3.1 does not seem to have a whole bunch of the things your code is doing. Like the be->func->clear call from nouveau_sgdma.c.
What base should I be looking at? Is this the base that Dave has for 3.2?
Should apply on top of linus master kernel as of yesterday.
I will do some testing and add the populate+free page callback and resend.
Cheers, Jerome
I don't know why it was done that way, but I wrote the TTM DMA code to be optimized for that (and it has the code to reverse direction - b/c the testcases I used initially had the a,b,c,d,e,f order when doing get_pages and put_pages) - but I can alter the code to do it in the forward fashion instead of the reverse fashion.. Better yet, it removes around 70 lines of code from the TTM DMA.
Order in which we put them back on the list can be easily change, either by use of add_tail or by iterating array from end to begining. i am not sure how much this can impact things.
Neither am I. I can run some perf numbers when playing tuxracer and see if there is a disadvantage/advantage.
Anyhow, it might be noting that in the commit description and perhaps make a bug-fix for the put_pages being called in the error path instead of ttm_put_pages as a seperate patch as you suggested.
[PATCH 6/6] drm/ttm: merge ttm_backend and ttm_tt
That is going to be a bit tricky. You are using the pci_map_page, which should not be used if the pages have been allocated via the pci_alloc_coherent (they are already mapped). Perhaps a simple check such as:
if !(ttm->be && ttm->be->func && ttm->be->func->get_pages) { ttm->dma_address[i] = pci_map_page(dev->pdev, ...) }
Your dma change are not suppose to be on top of that, idea is to add yet another callbacks populate+free_page which will do either pci map page or just use your code (ie use dma alloc and store dma addr into the array). So there is a missing piece here before adding your dma code. I just wanted to keep ttm_tt & ttm_backend merge as simple as possible without major change. Adding the populate + get_page callback sounded like good candidate for another patch.
<nods>
What base should I be looking at? Is this the base that Dave has for 3.2?
Should apply on top of linus master kernel as of yesterday.
Oh.. very very *very* fresh.
dri-devel@lists.freedesktop.org