Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com --- drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -173,6 +173,7 @@ struct ion_dma_buf_attachment { struct device *dev; struct sg_table *table; struct list_head list; + bool mapped:1; };
static int ion_dma_buf_attach(struct dma_buf *dmabuf, @@ -195,6 +196,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf, a->table = table; a->dev = attachment->dev; INIT_LIST_HEAD(&a->list); + a->mapped = false;
attachment->priv = a;
@@ -231,6 +233,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, direction)) return ERR_PTR(-ENOMEM);
+ a->mapped = true; + return table; }
@@ -238,6 +242,10 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *table, enum dma_data_direction direction) { + struct ion_dma_buf_attachment *a = attachment->priv; + + a->mapped = false; + dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction); }
@@ -297,6 +305,8 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) { + if (!a->mapped) + continue; dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents, direction); } @@ -320,6 +330,8 @@ static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) { + if (!a->mapped) + continue; dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents, direction); } -- 2.17.1
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Tue, Apr 14, 2020 at 03:46:27PM +0200, Ørjan Eide wrote:
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Does not work for kernel patches, sorry, now deleted :(
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com --- drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -173,6 +173,7 @@ struct ion_dma_buf_attachment { struct device *dev; struct sg_table *table; struct list_head list; + bool mapped:1; };
static int ion_dma_buf_attach(struct dma_buf *dmabuf, @@ -195,6 +196,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf, a->table = table; a->dev = attachment->dev; INIT_LIST_HEAD(&a->list); + a->mapped = false;
attachment->priv = a;
@@ -231,6 +233,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, direction)) return ERR_PTR(-ENOMEM);
+ a->mapped = true; + return table; }
@@ -238,6 +242,10 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *table, enum dma_data_direction direction) { + struct ion_dma_buf_attachment *a = attachment->priv; + + a->mapped = false; + dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction); }
@@ -297,6 +305,8 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) { + if (!a->mapped) + continue; dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents, direction); } @@ -320,6 +330,8 @@ static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) { + if (!a->mapped) + continue; dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents, direction); }
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
thanks,
greg k-h
On Tue, Apr 14, 2020 at 04:28:10PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, �rjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: �rjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
It looks like the new dma-heaps have the same issue as ion. The heap-helpers also do dma_sync_sg_for_device() unconditionally on end_cpu_access which may happen before dma_map_sg(), leading to use of the 0 dma_address in the sg list of a, yet unmapped, attachment.
It could be fixed in dma-heaps just like this patch does for ion. Is patch a valid way to fix this problem? Or, should this rather be handled in dma-buf core by tracking the mapped state of attachments there?
On Tue, Apr 14, 2020 at 9:11 AM Ørjan Eide orjan.eide@arm.com wrote:
On Tue, Apr 14, 2020 at 04:28:10PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, �rjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: �rjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
It looks like the new dma-heaps have the same issue as ion. The heap-helpers also do dma_sync_sg_for_device() unconditionally on end_cpu_access which may happen before dma_map_sg(), leading to use of the 0 dma_address in the sg list of a, yet unmapped, attachment.
Yea, the dma-buf heaps code came from the ION logic, so it likely has the same faults.
It could be fixed in dma-heaps just like this patch does for ion. Is patch a valid way to fix this problem? Or, should this rather be handled in dma-buf core by tracking the mapped state of attachments there?
In the short-term, I'd definitely prefer to have a fix to dmabuf heaps rather then ION, but I also agree that long term it probably shouldn't just be up to the dma-buf exporter (as there are other dmabuf exporters that may have it wrong too), and that we need to address some DMA API expectations/limitations to better handle multiple device pipelines. (I actually gave a talk last fall on some of the issues I see around it: https://youtu.be/UsEVoWD_o0c )
thanks -john
On Tue, Apr 14, 2020 at 7:28 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
I agree that we shouldn't be taking further (non-security/cleanup) patches to the ION code.
I'd like to give developers a little bit of a transition period (I was thinking a year, but really just one LTS release that has both would do) where they can move their ION heaps over to dmabuf heaps and test both against the same tree.
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
That sound ok?
thanks -john
On Tue, Apr 14, 2020 at 09:41:31PM -0700, John Stultz wrote:
On Tue, Apr 14, 2020 at 7:28 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
I agree that we shouldn't be taking further (non-security/cleanup) patches to the ION code.
I'd like to give developers a little bit of a transition period (I was thinking a year, but really just one LTS release that has both would do) where they can move their ION heaps over to dmabuf heaps and test both against the same tree.
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
No one ever notices "depreciated" things, they only notice if the code is no longer there :)
So I'm all for just deleting it and seeing who even notices...
thanks,
greg k-h
On Thu, Apr 16, 2020 at 12:25:08PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 09:41:31PM -0700, John Stultz wrote:
On Tue, Apr 14, 2020 at 7:28 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
I agree that we shouldn't be taking further (non-security/cleanup) patches to the ION code.
I'd like to give developers a little bit of a transition period (I was thinking a year, but really just one LTS release that has both would do) where they can move their ION heaps over to dmabuf heaps and test both against the same tree.
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
No one ever notices "depreciated" things, they only notice if the code is no longer there :)
So I'm all for just deleting it and seeing who even notices...
+1 on just deleting ion and watching if anyone notices. In case you're typing that patch, here's my:
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
-Daniel
On Thu, Apr 16, 2020 at 12:25:08PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 09:41:31PM -0700, John Stultz wrote:
On Tue, Apr 14, 2020 at 7:28 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
Only sync the sg-list of an Ion dma-buf attachment when the attachment is actually mapped on the device.
dma-bufs may be synced at any time. It can be reached from user space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when syncs may be attempted, and dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() may not be paired.
Since the sg_list's dma_address isn't set up until the buffer is used on the device, and dma_map_sg() is called on it, the dma_address will be NULL if sync is attempted on the dma-buf before it's mapped on a device.
Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")) this was a problem as the dma-api (at least the swiotlb_dma_ops on arm64) would use the potentially invalid dma_address. How that failed depended on how the device handled physical address 0. If 0 was a valid address to physical ram, that page would get flushed a lot, while the actual pages in the buffer would not get synced correctly. While if 0 is an invalid physical address it may cause a fault and trigger a crash.
In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code"), as this moved the dma-api to use the page pointer in the sg_list, and (for Ion buffers at least) this will always be valid if the sg_list exists at all.
But, this issue is re-introduced in v5.3 with commit 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old behaviour and picks the dma_address that may be invalid.
dma-buf core doesn't ensure that the buffer is mapped on the device, and thus have a valid sg_list, before calling the exporter's begin_cpu_access.
Signed-off-by: Ørjan Eide orjan.eide@arm.com
drivers/staging/android/ion/ion.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
Resubmit without disclaimer, sorry about that.
This seems to be part of a bigger issue where dma-buf exporters assume that their dma-buf begin_cpu_access and end_cpu_access callbacks have a certain guaranteed behavior, which isn't ensured by dma-buf core.
This patch fixes this in ion only, but it also needs to be fixed for other exporters, either handled like this in each exporter, or in dma-buf core before calling into the exporters.
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 38b51eace4f9..7b752ba0cb6d 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
Now that we have the dma-buff stuff in the tree, do we even need the ion code in the kernel anymore? Can't we delete it now?
I agree that we shouldn't be taking further (non-security/cleanup) patches to the ION code.
I'd like to give developers a little bit of a transition period (I was thinking a year, but really just one LTS release that has both would do) where they can move their ION heaps over to dmabuf heaps and test both against the same tree.
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
No one ever notices "depreciated" things, they only notice if the code is no longer there :)
So I'm all for just deleting it and seeing who even notices...
Agreed. Acked-by: Christian Brauner christian.brauner@ubuntu.com
On Mon, Apr 20, 2020 at 1:22 AM Christian Brauner christian.brauner@ubuntu.com wrote:
On Thu, Apr 16, 2020 at 12:25:08PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 09:41:31PM -0700, John Stultz wrote:
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
No one ever notices "depreciated" things, they only notice if the code is no longer there :)
So I'm all for just deleting it and seeing who even notices...
Agreed.
I mean, I get there's not much love for ION in staging, and I too am eager to see it go, but I also feel like in the discussions around submitting the dmabuf heaps at talks, etc, that there was clear value in removing ION after a short time so that folks could transition being able to test both implementations against the same kernel so performance regressions, etc could be worked out.
I am actively getting many requests for help for vendors who are looking at dmabuf heaps and are starting the transition process, and I'm trying my best to motivate them to directly work within the community so their needed heap functionality can go upstream. But it's going to be a process, and their first attempts aren't going to magically land upstream. I think being able to really compare their implementations as they iterate and push things upstream will help in order to be able to have upstream solutions that are also properly functional for production usage.
The dmabuf heaps have been in an official kernel now for all of three weeks. So yea, we can "delete [ION] and see who even notices", but I worry that may seem a bit like contempt for the folks doing the work on transitioning over, which doesn't help getting them to participate within the community.
thanks -john
On Mon, Apr 20, 2020 at 01:03:39PM -0700, John Stultz wrote:
On Mon, Apr 20, 2020 at 1:22 AM Christian Brauner christian.brauner@ubuntu.com wrote:
On Thu, Apr 16, 2020 at 12:25:08PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 09:41:31PM -0700, John Stultz wrote:
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
No one ever notices "depreciated" things, they only notice if the code is no longer there :)
So I'm all for just deleting it and seeing who even notices...
Agreed.
I mean, I get there's not much love for ION in staging, and I too am eager to see it go, but I also feel like in the discussions around submitting the dmabuf heaps at talks, etc, that there was clear value in removing ION after a short time so that folks could transition being able to test both implementations against the same kernel so performance regressions, etc could be worked out.
I am actively getting many requests for help for vendors who are looking at dmabuf heaps and are starting the transition process, and I'm trying my best to motivate them to directly work within the community so their needed heap functionality can go upstream. But it's going to be a process, and their first attempts aren't going to magically land upstream. I think being able to really compare their implementations as they iterate and push things upstream will help in order to be able to have upstream solutions that are also properly functional for production usage.
But we are not accepting any new ion allocators or changes at the moment, so I don't see how the ion code in the kernel is helping/hurting anything here.
There has been a bunch of changes to the ion code recently, in the Android kernel trees, in order to get a lot of the different manufacturer "forks" of ion back together into one place. But again, those patches are not going to be sent upstream for merging so how is ion affecting the dmabuf code at all here?
The dmabuf heaps have been in an official kernel now for all of three weeks. So yea, we can "delete [ION] and see who even notices", but I worry that may seem a bit like contempt for the folks doing the work on transitioning over, which doesn't help getting them to participate within the community.
But they aren't participating in the community today as no one is touching the ion code. So I fail to see how keeping a dead-end-version of ion in the kernel tree really affects anyone these days.
thanks,
greg k-h
On Tue, Apr 21, 2020 at 10:05:44AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 20, 2020 at 01:03:39PM -0700, John Stultz wrote:
On Mon, Apr 20, 2020 at 1:22 AM Christian Brauner christian.brauner@ubuntu.com wrote:
On Thu, Apr 16, 2020 at 12:25:08PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 14, 2020 at 09:41:31PM -0700, John Stultz wrote:
But I do think we can mark it as deprecated and let folks know that around the end of the year it will be deleted.
No one ever notices "depreciated" things, they only notice if the code is no longer there :)
So I'm all for just deleting it and seeing who even notices...
Agreed.
I mean, I get there's not much love for ION in staging, and I too am eager to see it go, but I also feel like in the discussions around submitting the dmabuf heaps at talks, etc, that there was clear value in removing ION after a short time so that folks could transition being able to test both implementations against the same kernel so performance regressions, etc could be worked out.
I am actively getting many requests for help for vendors who are looking at dmabuf heaps and are starting the transition process, and I'm trying my best to motivate them to directly work within the community so their needed heap functionality can go upstream. But it's going to be a process, and their first attempts aren't going to magically land upstream. I think being able to really compare their implementations as they iterate and push things upstream will help in order to be able to have upstream solutions that are also properly functional for production usage.
But we are not accepting any new ion allocators or changes at the moment, so I don't see how the ion code in the kernel is helping/hurting anything here.
There has been a bunch of changes to the ion code recently, in the Android kernel trees, in order to get a lot of the different manufacturer "forks" of ion back together into one place. But again, those patches are not going to be sent upstream for merging so how is ion affecting the dmabuf code at all here?
The dmabuf heaps have been in an official kernel now for all of three weeks. So yea, we can "delete [ION] and see who even notices", but I worry that may seem a bit like contempt for the folks doing the work on transitioning over, which doesn't help getting them to participate within the community.
But they aren't participating in the community today as no one is touching the ion code. So I fail to see how keeping a dead-end-version of ion in the kernel tree really affects anyone these days.
So, any thoughts here? What's the timeline for ion being able to be removed that you are comfortable with?
thanks,
greg k-h
On Fri, Jul 3, 2020 at 12:03 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 21, 2020 at 10:05:44AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 20, 2020 at 01:03:39PM -0700, John Stultz wrote:
The dmabuf heaps have been in an official kernel now for all of three weeks. So yea, we can "delete [ION] and see who even notices", but I worry that may seem a bit like contempt for the folks doing the work on transitioning over, which doesn't help getting them to participate within the community.
But they aren't participating in the community today as no one is touching the ion code. So I fail to see how keeping a dead-end-version of ion in the kernel tree really affects anyone these days.
So, any thoughts here? What's the timeline for ion being able to be removed that you are comfortable with?
Sorry for the slow reply. So my earlier plan was to drop it after the next LTS?
thanks -john
On Tue, Jul 07, 2020 at 08:43:30PM -0700, John Stultz wrote:
On Fri, Jul 3, 2020 at 12:03 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 21, 2020 at 10:05:44AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 20, 2020 at 01:03:39PM -0700, John Stultz wrote:
The dmabuf heaps have been in an official kernel now for all of three weeks. So yea, we can "delete [ION] and see who even notices", but I worry that may seem a bit like contempt for the folks doing the work on transitioning over, which doesn't help getting them to participate within the community.
But they aren't participating in the community today as no one is touching the ion code. So I fail to see how keeping a dead-end-version of ion in the kernel tree really affects anyone these days.
So, any thoughts here? What's the timeline for ion being able to be removed that you are comfortable with?
Sorry for the slow reply. So my earlier plan was to drop it after the next LTS?
Ok, fair enough, we can wait until January.
thanks,
greg k-h
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
@@ -238,6 +242,10 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *table, enum dma_data_direction direction) {
- struct ion_dma_buf_attachment *a = attachment->priv;
- a->mapped = false;
Possibly a stupid question but here we're not holding a lock. Is concurrency an issue?
- dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
}
@@ -297,6 +305,8 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) {
if (!a->mapped)
dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents, direction); }continue;
regards, dan carpenter
On Thu, Apr 16, 2020 at 12:49:56PM +0300, Dan Carpenter wrote:
On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
@@ -238,6 +242,10 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *table, enum dma_data_direction direction) {
- struct ion_dma_buf_attachment *a = attachment->priv;
- a->mapped = false;
Possibly a stupid question but here we're not holding a lock. Is concurrency an issue?
Thanks for taking a look.
Here and in ion_map_dma_buf(), where mapped is set, this should be safe. The callers must synchronize calls to map/unmap, and ensure that they are called in pairs.
I think there may be a potential issue between where mapped is set here, and where it's read in ion_dma_buf_{begin,end}_cpu_access(). Coherency issues the mapped bool won't blow up, at worst the contents of the buffer may be invalid as cache synchronization may have been skipped. This is still an improvement as before it would either crash or sync the wrong page repeatedly.
The mapped state is tied to the dma_map_sg() call, so we would need take the buffer->lock around both dma_map_sg and setting mapped to be sure that the buffer will always have been synced.
- dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
}
@@ -297,6 +305,8 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) {
if (!a->mapped)
dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents, direction); }continue;
dri-devel@lists.freedesktop.org