Hi all,
Here's a quick v2 with the tags so far picked up and some inline commentary about the shareability domains for the pagetable code.
Robin.
Robin Murphy (3): iommu/io-pgtable-arm: Support coherency for Mali LPAE drm/panfrost: Support cache-coherent integrations arm64: dts: meson: Describe G12b GPU as coherent
arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++ drivers/gpu/drm/panfrost/panfrost_device.h | 1 + drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 6 files changed, 20 insertions(+), 1 deletion(-)
Midgard GPUs have ACE-Lite master interfaces which allows systems to integrate them in an I/O-coherent manner. It seems that from the GPU's viewpoint, the rest of the system is its outer shareable domain, and so even when snoop signals are wired up, they are only emitted for outer shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does indeed get coherent pagetable walks working nicely for the coherent T620 in the Arm Juno SoC.
Reviewed-by: Steven Price steven.price@arm.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com --- drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index dc7bcf858b6d..b4072a18e45d 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, << ARM_LPAE_PTE_ATTRINDX_SHIFT); }
- if (prot & IOMMU_CACHE) + /* + * Also Mali has its own notions of shareability wherein its Inner + * domain covers the cores within the GPU, and its Outer domain is + * "outside the GPU" (i.e. either the Inner or System domain in CPU + * terms, depending on coherency). + */ + if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_SH_IS; else pte |= ARM_LPAE_PTE_SH_OS; @@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) | ARM_MALI_LPAE_TTBR_READ_INNER | ARM_MALI_LPAE_TTBR_ADRMODE_TABLE; + if (cfg->coherent_walk) + cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER; + return &data->iop;
out_free_data:
On Tue, Sep 22, 2020 at 03:16:48PM +0100, Robin Murphy wrote:
Midgard GPUs have ACE-Lite master interfaces which allows systems to integrate them in an I/O-coherent manner. It seems that from the GPU's viewpoint, the rest of the system is its outer shareable domain, and so even when snoop signals are wired up, they are only emitted for outer shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does indeed get coherent pagetable walks working nicely for the coherent T620 in the Arm Juno SoC.
Reviewed-by: Steven Price steven.price@arm.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com
drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index dc7bcf858b6d..b4072a18e45d 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, << ARM_LPAE_PTE_ATTRINDX_SHIFT); }
- if (prot & IOMMU_CACHE)
- /*
* Also Mali has its own notions of shareability wherein its Inner
* domain covers the cores within the GPU, and its Outer domain is
* "outside the GPU" (i.e. either the Inner or System domain in CPU
* terms, depending on coherency).
*/
- if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_SH_IS; else pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) | ARM_MALI_LPAE_TTBR_READ_INNER | ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
- if (cfg->coherent_walk)
cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
Acked-by: Will Deacon will@kernel.org
I'm assuming I'm not the right person to merge this, and it needs to go alongside the other patches in this series.
Will
On Tue, 22 Sep 2020 15:16:48 +0100 Robin Murphy robin.murphy@arm.com wrote:
Midgard GPUs have ACE-Lite master interfaces which allows systems to integrate them in an I/O-coherent manner. It seems that from the GPU's viewpoint, the rest of the system is its outer shareable domain, and so even when snoop signals are wired up, they are only emitted for outer shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does indeed get coherent pagetable walks working nicely for the coherent T620 in the Arm Juno SoC.
Reviewed-by: Steven Price steven.price@arm.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com
drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index dc7bcf858b6d..b4072a18e45d 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, << ARM_LPAE_PTE_ATTRINDX_SHIFT); }
- if (prot & IOMMU_CACHE)
- /*
* Also Mali has its own notions of shareability wherein its Inner
* domain covers the cores within the GPU, and its Outer domain is
* "outside the GPU" (i.e. either the Inner or System domain in CPU
* terms, depending on coherency).
*/
- if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_SH_IS; else pte |= ARM_LPAE_PTE_SH_OS;
Actually, it still doesn't work on s922x :-/. For it to work I correctly, I need to drop the outer shareable flag here.
@@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) | ARM_MALI_LPAE_TTBR_READ_INNER | ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
- if (cfg->coherent_walk)
cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
- return &data->iop;
out_free_data:
On 05/10/2020 15:50, Boris Brezillon wrote:
On Tue, 22 Sep 2020 15:16:48 +0100 Robin Murphy robin.murphy@arm.com wrote:
Midgard GPUs have ACE-Lite master interfaces which allows systems to integrate them in an I/O-coherent manner. It seems that from the GPU's viewpoint, the rest of the system is its outer shareable domain, and so even when snoop signals are wired up, they are only emitted for outer shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does indeed get coherent pagetable walks working nicely for the coherent T620 in the Arm Juno SoC.
Reviewed-by: Steven Price steven.price@arm.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com
drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index dc7bcf858b6d..b4072a18e45d 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, << ARM_LPAE_PTE_ATTRINDX_SHIFT); }
- if (prot & IOMMU_CACHE)
- /*
* Also Mali has its own notions of shareability wherein its Inner
* domain covers the cores within the GPU, and its Outer domain is
* "outside the GPU" (i.e. either the Inner or System domain in CPU
* terms, depending on coherency).
*/
- if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_SH_IS; else pte |= ARM_LPAE_PTE_SH_OS;
Actually, it still doesn't work on s922x :-/. For it to work I correctly, I need to drop the outer shareable flag here.
The logic here does seem a bit odd. Originally it was:
IOMMU_CACHE -> Inner shared (value 3) !IOMMU_CACHE -> Outer shared (value 2)
For Mali we're forcing everything to the second option. But Mali being Mali doesn't do things the same as LPAE, so for Mali we have:
0 - not shared 1 - reserved 2 - inner(*) and outer shareable 3 - inner shareable only
(*) where "inner" means internal to the GPU, and "outer" means shared with the CPU "inner". Very confusing!
So originally we had: IOMMU_CACHE -> not shared with CPU (only internally in the GPU) !IOMMU_CACHE -> shared with CPU
The change above gets us to "always shared", dropping the SH_OS bit would get us to not even shareable between cores (which doesn't sound like what we want).
It's not at all clear to me why the change helps, but I suspect we want at least "inner" shareable.
Steve
@@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) | ARM_MALI_LPAE_TTBR_READ_INNER | ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
if (cfg->coherent_walk)
cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
return &data->iop;
out_free_data:
On Mon, 5 Oct 2020 16:16:32 +0100 Steven Price steven.price@arm.com wrote:
On 05/10/2020 15:50, Boris Brezillon wrote:
On Tue, 22 Sep 2020 15:16:48 +0100 Robin Murphy robin.murphy@arm.com wrote:
Midgard GPUs have ACE-Lite master interfaces which allows systems to integrate them in an I/O-coherent manner. It seems that from the GPU's viewpoint, the rest of the system is its outer shareable domain, and so even when snoop signals are wired up, they are only emitted for outer shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does indeed get coherent pagetable walks working nicely for the coherent T620 in the Arm Juno SoC.
Reviewed-by: Steven Price steven.price@arm.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com
drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index dc7bcf858b6d..b4072a18e45d 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, << ARM_LPAE_PTE_ATTRINDX_SHIFT); }
- if (prot & IOMMU_CACHE)
- /*
* Also Mali has its own notions of shareability wherein its Inner
* domain covers the cores within the GPU, and its Outer domain is
* "outside the GPU" (i.e. either the Inner or System domain in CPU
* terms, depending on coherency).
*/
- if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_SH_IS; else pte |= ARM_LPAE_PTE_SH_OS;
Actually, it still doesn't work on s922x :-/. For it to work I correctly, I need to drop the outer shareable flag here.
The logic here does seem a bit odd. Originally it was:
IOMMU_CACHE -> Inner shared (value 3) !IOMMU_CACHE -> Outer shared (value 2)
For Mali we're forcing everything to the second option. But Mali being Mali doesn't do things the same as LPAE, so for Mali we have:
0 - not shared 1 - reserved 2 - inner(*) and outer shareable 3 - inner shareable only
(*) where "inner" means internal to the GPU, and "outer" means shared with the CPU "inner". Very confusing!
So originally we had: IOMMU_CACHE -> not shared with CPU (only internally in the GPU) !IOMMU_CACHE -> shared with CPU
The change above gets us to "always shared", dropping the SH_OS bit would get us to not even shareable between cores (which doesn't sound like what we want).
Thanks for this explanation.
It's not at all clear to me why the change helps, but I suspect we want at least "inner" shareable.
Right. Looks like all this was caused by a bad conflict resolution during a rebase. Sorry for the noise :-/.
When the GPU's ACE-Lite interface is fully wired up and capable of snooping CPU caches, it may be described as "dma-coherent" in devicetree, which will already inform the DMA layer not to perform unnecessary cache maintenance. However, we still need to ensure that the GPU uses the appropriate cacheable outer-shareable attributes in order to generate the requisite snoop signals, and that CPU mappings don't create a mismatch by using a non-cacheable type either.
Reviewed-by: Steven Price steven.price@arm.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com --- drivers/gpu/drm/panfrost/panfrost_device.h | 1 + drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 4 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index c30c719a8059..b31f45315e96 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -84,6 +84,7 @@ struct panfrost_device { /* pm_domains for devices with more than one. */ struct device *pm_domain_devs[MAX_PM_DOMAINS]; struct device_link *pm_domain_links[MAX_PM_DOMAINS]; + bool coherent;
struct panfrost_features features; const struct panfrost_compatible *comp; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ada51df9a7a3..2a6f2f716b2f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev) if (!pfdev->comp) return -ENODEV;
+ pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT; + /* Allocate and initialze the DRM device. */ ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev); if (IS_ERR(ddev)) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 33355dd302f1..cdf1a8754eba 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { */ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size) { + struct panfrost_device *pfdev = dev->dev_private; struct panfrost_gem_object *obj;
obj = kzalloc(sizeof(*obj), GFP_KERNEL); @@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t INIT_LIST_HEAD(&obj->mappings.list); mutex_init(&obj->mappings.lock); obj->base.base.funcs = &panfrost_gem_funcs; + obj->base.map_cached = pfdev->coherent;
return &obj->base.base; } diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index e8f7b11352d2..8852fd378f7a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv) .pgsize_bitmap = SZ_4K | SZ_2M, .ias = FIELD_GET(0xff, pfdev->features.mmu_features), .oas = FIELD_GET(0xff00, pfdev->features.mmu_features), + .coherent_walk = pfdev->coherent, .tlb = &mmu_tlb_ops, .iommu_dev = pfdev->dev, };
According to a downstream commit I found in the Khadas vendor kernel, the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows how to handle this properly) we should describe it as such. Otherwise the mismatch leads to all manner of fun with mismatched attributes and inadvertently snooping stale data from caches, which would account for at least some of the brokenness observed on this platform.
Reviewed-by: Neil Armstrong narmstrong@baylibre.com Tested-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Robin Murphy robin.murphy@arm.com --- arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi index 9b8548e5f6e5..ee8fcae9f9f0 100644 --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi @@ -135,3 +135,7 @@ map1 { }; }; }; + +&mali { + dma-coherent; +};
Series is:
Acked-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
On Tue, Sep 22, 2020 at 03:16:47PM +0100, Robin Murphy wrote:
Hi all,
Here's a quick v2 with the tags so far picked up and some inline commentary about the shareability domains for the pagetable code.
Robin.
Robin Murphy (3): iommu/io-pgtable-arm: Support coherency for Mali LPAE drm/panfrost: Support cache-coherent integrations arm64: dts: meson: Describe G12b GPU as coherent
arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++ drivers/gpu/drm/panfrost/panfrost_device.h | 1 + drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 6 files changed, 20 insertions(+), 1 deletion(-)
-- 2.28.0.dirty
On 22/09/2020 16:16, Robin Murphy wrote:
Hi all,
Here's a quick v2 with the tags so far picked up and some inline commentary about the shareability domains for the pagetable code.
Robin.
Robin Murphy (3): iommu/io-pgtable-arm: Support coherency for Mali LPAE drm/panfrost: Support cache-coherent integrations arm64: dts: meson: Describe G12b GPU as coherent
arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++ drivers/gpu/drm/panfrost/panfrost_device.h | 1 + drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/iommu/io-pgtable-arm.c | 11 ++++++++++- 6 files changed, 20 insertions(+), 1 deletion(-)
Applying to drm-misc-next
Thanks !
dri-devel@lists.freedesktop.org