The current settings leave the DRM device's dma_ops field NULL, which makes it use the dummy DMA ops on arm64 and return an error whenever we try to import a buffer. Call of_dma_configure() with a NULL node (since the device is not spawned from the device tree) so that arch_setup_dma_ops() is called and sets the default swiotlb DMA ops.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index d347188bf8f4..bc0555adecaf 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -9,6 +9,7 @@

 #include <linux/host1x.h>
 #include <linux/iommu.h>
+#include <linux/of_device.h>

 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
@@ -990,6 +991,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
 		return -ENOMEM;

 	dev_set_drvdata(&dev->dev, drm);
+	of_dma_configure(drm->dev, NULL);

 	err = drm_dev_register(drm, 0);
 	if (err < 0)
The default DMA mask covers a 32-bit address range, but tegradrm can address more than that. Set the DMA mask to the actual addressable range to avoid the use of unneeded bounce buffers.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Thierry, I am not absolutely sure whether the size is correct and applies to all Tegra generations - please let me know if this needs to be reworked.

 drivers/gpu/drm/tegra/drm.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index bc0555adecaf..503fc9e73521 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -992,6 +992,7 @@ static int host1x_drm_probe(struct host1x_device *dev)

 	dev_set_drvdata(&dev->dev, drm);
 	of_dma_configure(drm->dev, NULL);
+	dma_set_mask(drm->dev, DMA_BIT_MASK(34));

 	err = drm_dev_register(drm, 0);
 	if (err < 0)
On Tue, Feb 23, 2016 at 03:25:54PM +0900, Alexandre Courbot wrote:
> The default DMA mask covers a 32-bit address range, but tegradrm can
> address more than that. Set the DMA mask to the actual addressable
> range to avoid the use of unneeded bounce buffers.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>
> Thierry, I am not absolutely sure whether the size is correct and
> applies to all Tegra generations - please let me know if this needs
> to be reworked.
>
> drivers/gpu/drm/tegra/drm.c | 1 +
> 1 file changed, 1 insertion(+)
This kind of depends on whether or not the device is behind an IOMMU. If it is, then the IOMMU DMA mask would apply, which can be derived from the number of address bits that the IOMMU can handle. The SMMU supports 32 address bits on Tegra30 and Tegra114, and 34 address bits on more recent generations.
I think for now it's safer to leave the DMA mask at the default (32 bit) to avoid the need to distinguish between IOMMU and non-IOMMU devices.
Thierry
On 02/23/2016 08:04 AM, Thierry Reding wrote:
> On Tue, Feb 23, 2016 at 03:25:54PM +0900, Alexandre Courbot wrote:
>> The default DMA mask covers a 32-bit address range, but tegradrm
>> can address more than that. Set the DMA mask to the actual
>> addressable range to avoid the use of unneeded bounce buffers.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> Thierry, I am not absolutely sure whether the size is correct and
>> applies to all Tegra generations - please let me know if this
>> needs to be reworked.
>>
>> drivers/gpu/drm/tegra/drm.c | 1 +
>> 1 file changed, 1 insertion(+)
>
> This kind of depends on whether or not the device is behind an
> IOMMU. If it is, then the IOMMU DMA mask would apply, which can be
> derived from the number of address bits that the IOMMU can handle.
> The SMMU supports 32 address bits on Tegra30 and Tegra114, 34
> address bits on more recent generations.
>
> I think for now it's safer to leave the DMA mask at the default (32
> bit) to avoid the need to distinguish between IOMMU and non-IOMMU
> devices.
The GPUs after Tegra114 can choose per access whether they're using the IOMMU or not. The interface is 34 bits wide, so physical addresses can be 34 bits. IOMMU addresses are limited by the Tegra SMMU to 32 bits for gk20a; gm20b can use 34 bits if the SMMU is configured to combine four ASIDs together.
On Tue, Feb 23, 2016 at 08:18:26AM -0800, Terje Bergstrom wrote:
> On 02/23/2016 08:04 AM, Thierry Reding wrote:
>> On Tue, Feb 23, 2016 at 03:25:54PM +0900, Alexandre Courbot wrote:
>>> The default DMA mask covers a 32-bit address range, but tegradrm
>>> can address more than that. Set the DMA mask to the actual
>>> addressable range to avoid the use of unneeded bounce buffers.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>> Thierry, I am not absolutely sure whether the size is correct and
>>> applies to all Tegra generations - please let me know if this
>>> needs to be reworked.
>>>
>>> drivers/gpu/drm/tegra/drm.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>
>> This kind of depends on whether or not the device is behind an
>> IOMMU. If it is, then the IOMMU DMA mask would apply, which can be
>> derived from the number of address bits that the IOMMU can handle.
>> The SMMU supports 32 address bits on Tegra30 and Tegra114, 34
>> address bits on more recent generations.
>>
>> I think for now it's safer to leave the DMA mask at the default
>> (32 bit) to avoid the need to distinguish between IOMMU and
>> non-IOMMU devices.
>
> The GPUs after Tegra114 can choose per access whether they're using
> the IOMMU or not. The interface is 34 bits wide, so physical
> addresses can be 34 bits. IOMMU addresses are limited by the Tegra
> SMMU to 32 bits for gk20a; gm20b can use 34 bits if the SMMU is
> configured to combine four ASIDs together.
This particular patch sets up the DMA mask for the display engines. But yes, most of the above holds true for that case as well, except that as far as I know there is no mechanism to have the display engines choose per access whether or not to use the SMMU.
Thierry
On 02/24/2016 01:04 AM, Thierry Reding wrote:
> On Tue, Feb 23, 2016 at 03:25:54PM +0900, Alexandre Courbot wrote:
>> The default DMA mask covers a 32-bit address range, but tegradrm
>> can address more than that. Set the DMA mask to the actual
>> addressable range to avoid the use of unneeded bounce buffers.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>
>> Thierry, I am not absolutely sure whether the size is correct and
>> applies to all Tegra generations - please let me know if this
>> needs to be reworked.
>>
>> drivers/gpu/drm/tegra/drm.c | 1 +
>> 1 file changed, 1 insertion(+)
>
> This kind of depends on whether or not the device is behind an
> IOMMU. If it is, then the IOMMU DMA mask would apply, which can be
> derived from the number of address bits that the IOMMU can handle.
> The SMMU supports 32 address bits on Tegra30 and Tegra114, 34
> address bits on more recent generations.
>
> I think for now it's safer to leave the DMA mask at the default (32
> bit) to avoid the need to distinguish between IOMMU and non-IOMMU
> devices.
Leaving it that way makes it (almost) impossible to import buffers on TX1. Patch 1 sets the DMA ops to swiotlb, so at least after this we actually try to import the buffer. However, any page that is higher than the 32-bit range will be bounced. If you are lucky, you won't notice it (even though I don't think it is acceptable to bounce data to be displayed), but most of the time the swiotlb bounce area will fill up and the import will fail with the following message:
drm drm: swiotlb buffer is full (sz: 294912 bytes)
So we should really try and fix this. The issue is, how do you detect whether you are behind an IOMMU? The DCs have an iommus property, but the DRM device (which does the importing) does not. And when we import a buffer into tegradrm, nothing guarantees that it is for display. So far I cannot think of a better heuristic than "assume 32 bits on < t124 and 34 bits afterwards".
On Tue, Feb 23, 2016 at 03:25:53PM +0900, Alexandre Courbot wrote:
> The current settings leave the DRM device's dma_ops field NULL,
> which makes it use the dummy DMA ops on arm64 and return an error
> whenever we try to import a buffer. Call of_dma_configure() with a
> NULL node (since the device is not spawned from the device tree)
> so that arch_setup_dma_ops() is called and sets the default
> swiotlb DMA ops.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>
> drivers/gpu/drm/tegra/drm.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index d347188bf8f4..bc0555adecaf 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -9,6 +9,7 @@
>  #include <linux/host1x.h>
>  #include <linux/iommu.h>
> +#include <linux/of_device.h>
>
>  #include <drm/drm_atomic.h>
>  #include <drm/drm_atomic_helper.h>
> @@ -990,6 +991,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
>  		return -ENOMEM;
>
>  	dev_set_drvdata(&dev->dev, drm);
> +	of_dma_configure(drm->dev, NULL);
>
>  	err = drm_dev_register(drm, 0);
>  	if (err < 0)
Looking at the various pieces, I think this really belongs in host1x_device_add() (see drivers/gpu/host1x/bus.c) where it can replace the open-coded setting of DMA and coherent DMA masks. Also why can't we pass the correct device tree node here? The DRM device is a virtual device that hangs off the host1x device, so I think it could use the same device tree node as the host1x device.
Something like the below (untested).
Thierry
--- >8 ---
diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index c2e7fba370bb..d46d26a574da 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -17,6 +17,7 @@

 #include <linux/host1x.h>
 #include <linux/of.h>
+#include <linux/of_device.h>
 #include <linux/slab.h>

 #include "bus.h"
@@ -393,9 +394,8 @@ static int host1x_device_add(struct host1x *host1x,
 	INIT_LIST_HEAD(&device->list);
 	device->driver = driver;

-	device->dev.coherent_dma_mask = host1x->dev->coherent_dma_mask;
-	device->dev.dma_mask = &device->dev.coherent_dma_mask;
 	dev_set_name(&device->dev, "%s", driver->driver.name);
+	of_dma_configure(&device->dev, host1x->dev->of_node);
 	device->dev.release = host1x_device_release;
 	device->dev.bus = &host1x_bus_type;
 	device->dev.parent = host1x->dev;
On 02/24/2016 12:28 AM, Thierry Reding wrote:
> On Tue, Feb 23, 2016 at 03:25:53PM +0900, Alexandre Courbot wrote:
>> The current settings leave the DRM device's dma_ops field NULL,
>> which makes it use the dummy DMA ops on arm64 and return an error
>> whenever we try to import a buffer. Call of_dma_configure() with a
>> NULL node (since the device is not spawned from the device tree)
>> so that arch_setup_dma_ops() is called and sets the default
>> swiotlb DMA ops.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>
>> drivers/gpu/drm/tegra/drm.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index d347188bf8f4..bc0555adecaf 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/host1x.h>
>>  #include <linux/iommu.h>
>> +#include <linux/of_device.h>
>>
>>  #include <drm/drm_atomic.h>
>>  #include <drm/drm_atomic_helper.h>
>> @@ -990,6 +991,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
>>  		return -ENOMEM;
>>
>>  	dev_set_drvdata(&dev->dev, drm);
>> +	of_dma_configure(drm->dev, NULL);
>
> Looking at the various pieces, I think this really belongs in
> host1x_device_add() (see drivers/gpu/host1x/bus.c) where it can
> replace the open-coded setting of DMA and coherent DMA masks. Also,
> why can't we pass the correct device tree node here? The DRM device
> is a virtual device that hangs off the host1x device, so I think it
> could use the same device tree node as the host1x device.
>
> Something like the below (untested).
You're right, that looks like a much better place to do this. of_dma_configure() is called at the bus level (platform and PCI), so it makes sense to do it from host1x too.