Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU SMMU. After email discussions [1] we opted to make a arm-smmu implementation for specifically for the Adreno GPU and use that to enable split pagetable support and later other implementation specific bits that we need.
On the hardware side this is very close to the same code from before [2] only the TTBR1 quirk is turned on by the implementation and not a domain attribute. In drm/msm we use the returned size of the aperture as a clue to let us know which virtual address space we should use for global memory objects.
There are two open items that you should be aware of. First, in the implementation specific code we have to check the compatible string of the device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4). I went back and forth trying to decide if I wanted to use the compatible string or the SID as the filter and settled on the compatible string but I could be talked out of it.
The other open item is that in drm/msm the hardware only uses 49 bits of the address space but arm-smmu expects the address to be sign extended all the way to 64 bits. This isn't a problem normally unless you look at the hardware registers that contain a IOVA and then the upper bits will be zero. I opted to restrict the internal drm/msm IOVA range to only 49 bits and then sign extend right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought that matching the hardware would be less confusing when debugging a hang.
v9: Fix bot-detected merge conflict v7: Add attached device to smmu_domain to pass to implementation specific functions
[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html [2] https://patchwork.kernel.org/patch/11482591/
Jordan Crouse (7): iommu/arm-smmu: Pass io-pgtable config to implementation specific function iommu/arm-smmu: Add support for split pagetables dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU iommu/arm-smmu: Add a pointer to the attached device to smmu_domain iommu/arm-smmu: Add implementation for the adreno GPU SMMU drm/msm: Set the global virtual address range from the IOMMU domain arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU
.../devicetree/bindings/iommu/arm,smmu.yaml | 4 ++ arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++++- drivers/gpu/drm/msm/msm_iommu.c | 7 +++ drivers/iommu/arm-smmu-impl.c | 6 ++- drivers/iommu/arm-smmu-qcom.c | 45 ++++++++++++++++++- drivers/iommu/arm-smmu.c | 38 +++++++++++----- drivers/iommu/arm-smmu.h | 30 ++++++++++--- 8 files changed, 120 insertions(+), 25 deletions(-)
Use the aperture settings from the IOMMU domain to set up the virtual address range for the GPU. This allows us to transparently deal with IOMMU side features (like split pagetables).
Signed-off-by: Jordan Crouse jcrouse@codeaurora.org ---
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++++++++++-- drivers/gpu/drm/msm/msm_iommu.c | 7 +++++++ 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 5db06b590943..3e717c1ebb7f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu, struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type); struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu); struct msm_gem_address_space *aspace; + u64 start, size;
- aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M, - 0xffffffff - SZ_16M); + /* + * Use the aperture start or SZ_16M, whichever is greater. This will + * ensure that we align with the allocated pagetable range while still + * allowing room in the lower 32 bits for GMEM and whatnot + */ + start = max_t(u64, SZ_16M, iommu->geometry.aperture_start); + size = iommu->geometry.aperture_end - start + 1; + + aspace = msm_gem_address_space_create(mmu, "gpu", + start & GENMASK(48, 0), size);
if (IS_ERR(aspace) && !IS_ERR(mmu)) mmu->funcs->destroy(mmu); diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 3a381a9674c9..1b6635504069 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova, struct msm_iommu *iommu = to_msm_iommu(mmu); size_t ret;
+ /* The arm-smmu driver expects the addresses to be sign extended */ + if (iova & BIT_ULL(48)) + iova |= GENMASK_ULL(63, 49); + ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot); WARN_ON(!ret);
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len) { struct msm_iommu *iommu = to_msm_iommu(mmu);
+ if (iova & BIT_ULL(48)) + iova |= GENMASK_ULL(63, 49); + iommu_unmap(iommu->domain, iova, len);
return 0;
On Fri, Jun 26, 2020 at 1:01 PM Jordan Crouse jcrouse@codeaurora.org wrote:
Use the aperture settings from the IOMMU domain to set up the virtual address range for the GPU. This allows us to transparently deal with IOMMU side features (like split pagetables).
Signed-off-by: Jordan Crouse jcrouse@codeaurora.org
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++++++++++-- drivers/gpu/drm/msm/msm_iommu.c | 7 +++++++ 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 5db06b590943..3e717c1ebb7f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu, struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type); struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu); struct msm_gem_address_space *aspace;
u64 start, size;
aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
0xffffffff - SZ_16M);
/*
* Use the aperture start or SZ_16M, whichever is greater. This will
* ensure that we align with the allocated pagetable range while still
* allowing room in the lower 32 bits for GMEM and whatnot
*/
start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
size = iommu->geometry.aperture_end - start + 1;
aspace = msm_gem_address_space_create(mmu, "gpu",
start & GENMASK(48, 0), size);
hmm, I kinda think this isn't going to play well for the 32b gpus (pre-a5xx).. possibly we should add address space size to 'struct adreno_info'?
Or I guess it is always going to be the same for all devices within a generation? So it could just be passed in to adreno_gpu_init()
Hopefully that makes things smoother if we someday had more than 48bits..
BR, -R
if (IS_ERR(aspace) && !IS_ERR(mmu)) mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 3a381a9674c9..1b6635504069 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova, struct msm_iommu *iommu = to_msm_iommu(mmu); size_t ret;
/* The arm-smmu driver expects the addresses to be sign extended */
if (iova & BIT_ULL(48))
iova |= GENMASK_ULL(63, 49);
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot); WARN_ON(!ret);
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len) { struct msm_iommu *iommu = to_msm_iommu(mmu);
if (iova & BIT_ULL(48))
iova |= GENMASK_ULL(63, 49);
iommu_unmap(iommu->domain, iova, len); return 0;
-- 2.17.1
Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
On Sat, Jun 27, 2020 at 10:10:14AM -0700, Rob Clark wrote:
On Fri, Jun 26, 2020 at 1:01 PM Jordan Crouse jcrouse@codeaurora.org wrote:
Use the aperture settings from the IOMMU domain to set up the virtual address range for the GPU. This allows us to transparently deal with IOMMU side features (like split pagetables).
Signed-off-by: Jordan Crouse jcrouse@codeaurora.org
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++++++++++-- drivers/gpu/drm/msm/msm_iommu.c | 7 +++++++ 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 5db06b590943..3e717c1ebb7f 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu, struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type); struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu); struct msm_gem_address_space *aspace;
u64 start, size;
aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
0xffffffff - SZ_16M);
/*
* Use the aperture start or SZ_16M, whichever is greater. This will
* ensure that we align with the allocated pagetable range while still
* allowing room in the lower 32 bits for GMEM and whatnot
*/
start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
size = iommu->geometry.aperture_end - start + 1;
aspace = msm_gem_address_space_create(mmu, "gpu",
start & GENMASK(48, 0), size);
hmm, I kinda think this isn't going to play well for the 32b gpus (pre-a5xx).. possibly we should add address space size to 'struct adreno_info'?
I checked and qcom-iommu sets the aperture correctly so this should be okay for everybody. To be honest, I'm nots sure if we even need to mask the start to 49 bits. It seems that all of the iommu implementations do the right thing. Of course it would be worth a check if you have a 4xx handy.
Or I guess it is always going to be the same for all devices within a generation? So it could just be passed in to adreno_gpu_init()
We can do that easily if we are worried about it (see also: a2xx). I just figured this might save us a bit of code.
Hopefully that makes things smoother if we someday had more than 48bits..
We'll be at 49 bits for as far ahead as I can see. 49 bits has a special meaning in the SMMU so it is a natural fit for the GPU hardware. If we change in N generations we can just shift to a family specific function at that point.
Jordan
BR, -R
if (IS_ERR(aspace) && !IS_ERR(mmu)) mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 3a381a9674c9..1b6635504069 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova, struct msm_iommu *iommu = to_msm_iommu(mmu); size_t ret;
/* The arm-smmu driver expects the addresses to be sign extended */
if (iova & BIT_ULL(48))
iova |= GENMASK_ULL(63, 49);
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot); WARN_ON(!ret);
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len) { struct msm_iommu *iommu = to_msm_iommu(mmu);
if (iova & BIT_ULL(48))
iova |= GENMASK_ULL(63, 49);
iommu_unmap(iommu->domain, iova, len); return 0;
-- 2.17.1
Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Hi Will, Robin,
On 2020-06-27 01:30, Jordan Crouse wrote:
Another iteration of the split-pagetable support for arm-smmu and the Adreno GPU SMMU. After email discussions [1] we opted to make a arm-smmu implementation for specifically for the Adreno GPU and use that to enable split pagetable support and later other implementation specific bits that we need.
On the hardware side this is very close to the same code from before [2] only the TTBR1 quirk is turned on by the implementation and not a domain attribute. In drm/msm we use the returned size of the aperture as a clue to let us know which virtual address space we should use for global memory objects.
There are two open items that you should be aware of. First, in the implementation specific code we have to check the compatible string of the device so that we only enable TTBR1 for the GPU (SID 0) and not the GMU (SID 4). I went back and forth trying to decide if I wanted to use the compatible string or the SID as the filter and settled on the compatible string but I could be talked out of it.
The other open item is that in drm/msm the hardware only uses 49 bits of the address space but arm-smmu expects the address to be sign extended all the way to 64 bits. This isn't a problem normally unless you look at the hardware registers that contain a IOVA and then the upper bits will be zero. I opted to restrict the internal drm/msm IOVA range to only 49 bits and then sign extend right before calling iommu_map / iommu_unmap. This is a bit wonky but I thought that matching the hardware would be less confusing when debugging a hang.
v9: Fix bot-detected merge conflict v7: Add attached device to smmu_domain to pass to implementation specific functions
[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-May/044537.html [2] https://patchwork.kernel.org/patch/11482591/
Jordan Crouse (7): iommu/arm-smmu: Pass io-pgtable config to implementation specific function iommu/arm-smmu: Add support for split pagetables dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU iommu/arm-smmu: Add a pointer to the attached device to smmu_domain iommu/arm-smmu: Add implementation for the adreno GPU SMMU drm/msm: Set the global virtual address range from the IOMMU domain arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU
.../devicetree/bindings/iommu/arm,smmu.yaml | 4 ++ arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++++- drivers/gpu/drm/msm/msm_iommu.c | 7 +++ drivers/iommu/arm-smmu-impl.c | 6 ++- drivers/iommu/arm-smmu-qcom.c | 45 ++++++++++++++++++- drivers/iommu/arm-smmu.c | 38 +++++++++++----- drivers/iommu/arm-smmu.h | 30 ++++++++++--- 8 files changed, 120 insertions(+), 25 deletions(-)
Any chance reviewing this?
Thanks, Sai
dri-devel@lists.freedesktop.org