[PATCHv8 0/8] System Cache support for GPU and required SMMU support

List overview All Threads
Download

newer

older

Re: [PATCH 000/141] Fix...

[radeon-alex:drm-next 219/295]...

Sai Prakash Ranjan

17 Nov 2020 17 Nov '20

2:30 p.m.

Some hardware variants contain a system cache or the last level cache(llc). This cache is typically a large block which is shared by multiple clients on the SOC. GPU uses the system cache to cache both the GPU data buffers(like textures) as well the SMMU pagetables. This helps with improved render performance as well as lower power consumption by reducing the bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices which then be used by multiple SOC clients. This patch series is an effort to enable and use two of those slices preallocated for the GPU, one for the GPU data buffers and another for the GPU SMMU hardware pagetables.

Patch 1 - Patch 6 adds system cache support in SMMU and GPU driver. Patch 7 and 8 are minor cleanups for arm-smmu impl.

Changes in v8: * Introduce a generic domain attribute for pagetable config (Will) * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will) * Move non-strict mode to use new struct domain_attr_io_pgtbl_config (Will)

Changes in v7: * Squash Jordan's patch to support MMU500 targets * Rebase on top of for-joerg/arm-smmu/updates and Jordan's short series for adreno-smmu impl

Changes in v6: * Move table to arm-smmu-qcom (Robin)

Changes in v5: * Drop cleanup of blank lines since it was intentional (Robin) * Rebase again on top of msm-next-pgtables as it moves pretty fast

Changes in v4: * Drop IOMMU_SYS_CACHE prot flag * Rebase on top of https://gitlab.freedesktop.org/drm/msm/-/tree/msm-next-pgtables

Changes in v3: * Fix domain attribute setting to before iommu_attach_device() * Fix few code style and checkpatch warnings * Rebase on top of Jordan's latest split pagetables and per-instance pagetables support

Changes in v2: * Addressed review comments and rebased on top of Jordan's split pagetables series

Jordan Crouse (1): drm/msm/a6xx: Add support for using system cache on MMU500 based targets

Sai Prakash Ranjan (5): iommu/io-pgtable-arm: Add support to use system cache iommu/arm-smmu: Add domain attribute for pagetable configuration iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg iommu: arm-smmu-impl: Use table to list QCOM implementations iommu: arm-smmu-impl: Add a space before open parenthesis

Sharat Masetty (2): drm/msm: rearrange the gpu_rmw() function drm/msm/a6xx: Add support for using system cache(LLC)

drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 109 +++++++++++++++++++++ drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 5 + drivers/gpu/drm/msm/adreno/adreno_gpu.c | 17 ++++ drivers/gpu/drm/msm/msm_drv.c | 8 ++ drivers/gpu/drm/msm/msm_drv.h | 1 + drivers/gpu/drm/msm/msm_gpu.h | 5 +- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 11 +-- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 21 +++- drivers/iommu/arm/arm-smmu/arm-smmu.c | 30 +++++- drivers/iommu/arm/arm-smmu/arm-smmu.h | 3 +- drivers/iommu/io-pgtable-arm.c | 10 +- include/linux/io-pgtable.h | 8 ++ include/linux/iommu.h | 1 + 13 files changed, 203 insertions(+), 26 deletions(-)

base-commit: a29bbb0861f487a5e144dc997a9f71a36c7a2404

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Show replies by date

Sai Prakash Ranjan

17 Nov 17 Nov

2:30 p.m.

New subject: [PATCHv8 1/8] iommu/io-pgtable-arm: Add support to use system cache

Add a quirk IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to override the attributes set in TCR for the page table walker when using system cache.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org --- drivers/iommu/io-pgtable-arm.c | 10 ++++++++-- include/linux/io-pgtable.h | 4 ++++ 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index a7a9bc08dcd1..7c9ea9d7874a 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -761,7 +761,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)

if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NON_STRICT | - IO_PGTABLE_QUIRK_ARM_TTBR1)) + IO_PGTABLE_QUIRK_ARM_TTBR1 | + IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)) return NULL;

data = arm_lpae_alloc_pgtable(cfg); @@ -773,10 +774,15 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) tcr->sh = ARM_LPAE_TCR_SH_IS; tcr->irgn = ARM_LPAE_TCR_RGN_WBWA; tcr->orgn = ARM_LPAE_TCR_RGN_WBWA; + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) + goto out_free_data; } else { tcr->sh = ARM_LPAE_TCR_SH_OS; tcr->irgn = ARM_LPAE_TCR_RGN_NC; - tcr->orgn = ARM_LPAE_TCR_RGN_NC; + if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)) + tcr->orgn = ARM_LPAE_TCR_RGN_NC; + else + tcr->orgn = ARM_LPAE_TCR_RGN_WBWA; }

tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 4cde111e425b..a9a2c59fab37 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -86,6 +86,9 @@ struct io_pgtable_cfg { * * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table * for use in the upper half of a split address space. + * + * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the attributes set in TCR for + * the page table walker when using system cache. */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) @@ -93,6 +96,7 @@ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_ARM_MTK_EXT BIT(3) #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4) #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5) + #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6) unsigned long quirks; unsigned long pgsize_bitmap; unsigned int ias;

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Will Deacon

23 Nov 23 Nov

3:06 p.m.

New subject: [PATCHv8 1/8] iommu/io-pgtable-arm: Add support to use system cache

On Tue, Nov 17, 2020 at 08:00:40PM +0530, Sai Prakash Ranjan wrote:

...

Please can you reword this to say:

"Override the outer-cacheability attributes set in the TCR for a non-coherent page-table walker."

Will

Sai Prakash Ranjan

4:41 p.m.

New subject: [PATCHv8 1/8] iommu/io-pgtable-arm: Add support to use system cache

On 2020-11-23 20:36, Will Deacon wrote:

...

On Tue, Nov 17, 2020 at 08:00:40PM +0530, Sai Prakash Ranjan wrote:

...
Add a quirk IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to override the attributes set in TCR for the page table walker when using system cache.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org

drivers/iommu/io-pgtable-arm.c | 10 ++++++++-- include/linux/io-pgtable.h | 4 ++++ 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index a7a9bc08dcd1..7c9ea9d7874a 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -761,7 +761,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)

if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NON_STRICT |
	    IO_PGTABLE_QUIRK_ARM_TTBR1))
	    IO_PGTABLE_QUIRK_ARM_TTBR1 |
	    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
return NULL;

data = arm_lpae_alloc_pgtable(cfg);
@@ -773,10 +774,15 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) tcr->sh = ARM_LPAE_TCR_SH_IS; tcr->irgn = ARM_LPAE_TCR_RGN_WBWA; tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
	goto out_free_data;
} else { tcr->sh = ARM_LPAE_TCR_SH_OS; tcr->irgn = ARM_LPAE_TCR_RGN_NC;
tcr->orgn = ARM_LPAE_TCR_RGN_NC;
if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
	tcr->orgn = ARM_LPAE_TCR_RGN_NC;
else
	tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
}

tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 4cde111e425b..a9a2c59fab37 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -86,6 +86,9 @@ struct io_pgtable_cfg { * * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table * for use in the upper half of a split address space.
*
* IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the attributes set in 
TCR for
*	the page table walker when using system cache.
Please can you reword this to say:

"Override the outer-cacheability attributes set in the TCR for a non-coherent page-table walker."

Sure, thanks.

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

17 Nov 17 Nov

2:30 p.m.

New subject: [PATCHv8 2/8] iommu/arm-smmu: Add domain attribute for pagetable configuration

Add iommu domain attribute for pagetable configuration which initially will be used to set quirks like for system cache aka last level cache to be used by client drivers like GPU to set right attributes for caching the hardware pagetables into the system cache and later can be extended to include other page table configuration data.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 25 +++++++++++++++++++++++++ drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + include/linux/io-pgtable.h | 4 ++++ include/linux/iommu.h | 1 + 4 files changed, 31 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 0f28a8614da3..7b05782738e2 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -789,6 +789,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (smmu_domain->non_strict) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

+ if (smmu_domain->pgtbl_cfg.quirks) + pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks; + pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { ret = -ENOMEM; @@ -1511,6 +1514,12 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_IO_PGTABLE_CFG: { + struct domain_attr_io_pgtbl_cfg *pgtbl_cfg = data; + *pgtbl_cfg = smmu_domain->pgtbl_cfg; + + return 0; + } default: return -ENODEV; } @@ -1551,6 +1560,22 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; break; + case DOMAIN_ATTR_IO_PGTABLE_CFG: { + struct domain_attr_io_pgtbl_cfg *pgtbl_cfg = data; + + if (smmu_domain->smmu) { + ret = -EPERM; + goto out_unlock; + } + + if (!pgtbl_cfg) { + ret = -ENODEV; + goto out_unlock; + } + + smmu_domain->pgtbl_cfg = *pgtbl_cfg; + break; + } default: ret = -ENODEV; } diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h index 04288b6fc619..18fbed376afb 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h @@ -364,6 +364,7 @@ enum arm_smmu_domain_stage { struct arm_smmu_domain { struct arm_smmu_device *smmu; struct io_pgtable_ops *pgtbl_ops; + struct domain_attr_io_pgtbl_cfg pgtbl_cfg; const struct iommu_flush_ops *flush_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index a9a2c59fab37..686b37d48743 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -212,6 +212,10 @@ struct io_pgtable {

#define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)

+struct domain_attr_io_pgtbl_cfg { + unsigned long quirks; +}; + static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop) { iop->cfg.tlb->tlb_flush_all(iop->cookie); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b95a6f8db6ff..ffaa389ea128 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -118,6 +118,7 @@ enum iommu_attr { DOMAIN_ATTR_FSL_PAMUV1, DOMAIN_ATTR_NESTING, /* two stages of translation */ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, + DOMAIN_ATTR_IO_PGTABLE_CFG, DOMAIN_ATTR_MAX, };

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Will Deacon

23 Nov 23 Nov

3:18 p.m.

New subject: [PATCHv8 2/8] iommu/arm-smmu: Add domain attribute for pagetable configuration

On Tue, Nov 17, 2020 at 08:00:41PM +0530, Sai Prakash Ranjan wrote:

...

Do we really need to check this? If somebody passed us a NULL pointer then they have a bug and we don't check this for other domain attributes afaict.

...

nit: Can you rename this to 'struct io_pgtable_domain_attr' please?

Will

Sai Prakash Ranjan

4:42 p.m.

New subject: [PATCHv8 2/8] iommu/arm-smmu: Add domain attribute for pagetable configuration

On 2020-11-23 20:48, Will Deacon wrote:

...

On Tue, Nov 17, 2020 at 08:00:41PM +0530, Sai Prakash Ranjan wrote:

...
Add iommu domain attribute for pagetable configuration which initially will be used to set quirks like for system cache aka last level cache to be used by client drivers like GPU to set right attributes for caching the hardware pagetables into the system cache and later can be extended to include other page table configuration data.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org

drivers/iommu/arm/arm-smmu/arm-smmu.c | 25 +++++++++++++++++++++++++ drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + include/linux/io-pgtable.h | 4 ++++ include/linux/iommu.h | 1 + 4 files changed, 31 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 0f28a8614da3..7b05782738e2 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -789,6 +789,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (smmu_domain->non_strict) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
if (smmu_domain->pgtbl_cfg.quirks)
pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { ret = -ENOMEM;
@@ -1511,6 +1514,12 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0;
case DOMAIN_ATTR_IO_PGTABLE_CFG: {
	struct domain_attr_io_pgtbl_cfg *pgtbl_cfg = data;
	*pgtbl_cfg = smmu_domain->pgtbl_cfg;
	return 0;
}
default: return -ENODEV; }
@@ -1551,6 +1560,22 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_SMMU_DOMAIN_S1; break;
case DOMAIN_ATTR_IO_PGTABLE_CFG: {
	struct domain_attr_io_pgtbl_cfg *pgtbl_cfg = data;
	if (smmu_domain->smmu) {
		ret = -EPERM;
		goto out_unlock;
	}
	if (!pgtbl_cfg) {
Do we really need to check this? If somebody passed us a NULL pointer then they have a bug and we don't check this for other domain attributes afaict.

True, I'll drop it.

...

Done, thanks.

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

17 Nov 17 Nov

2:30 p.m.

New subject: [PATCHv8 3/8] iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg

Now that we have a struct domain_attr_io_pgtbl_cfg with quirks, use that for non_strict mode as well thereby removing the need for more members of arm_smmu_domain in the future.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 7 ++----- drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 - 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 7b05782738e2..5f066a1b7221 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -786,9 +786,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, goto out_clear_smmu; }

- if (smmu_domain->non_strict) - pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; - if (smmu_domain->pgtbl_cfg.quirks) pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;

@@ -1527,7 +1524,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; + *(int *)data = smmu_domain->pgtbl_cfg.quirks; return 0; default: return -ENODEV; @@ -1583,7 +1580,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - smmu_domain->non_strict = *(int *)data; + smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; break; default: ret = -ENODEV; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h index 18fbed376afb..caae543ea077 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h @@ -368,7 +368,6 @@ struct arm_smmu_domain { const struct iommu_flush_ops *flush_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; - bool non_strict; struct mutex init_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ struct iommu_domain domain;

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Will Deacon

23 Nov 23 Nov

3:19 p.m.

New subject: [PATCHv8 3/8] iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg

On Tue, Nov 17, 2020 at 08:00:42PM +0530, Sai Prakash Ranjan wrote:

...

Probably better to compare with IO_PGTABLE_QUIRK_NON_STRICT here even though we only support this one quirk for DMA domains atm.

Will

Sai Prakash Ranjan

4:43 p.m.

New subject: [PATCHv8 3/8] iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg

On 2020-11-23 20:49, Will Deacon wrote:

...

Ok will do, thanks.

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

17 Nov 17 Nov

2:30 p.m.

New subject: [PATCHv8 4/8] drm/msm: rearrange the gpu_rmw() function

From: Sharat Masetty smasetty@codeaurora.org

The register read-modify-write construct is generic enough that it can be used by other subsystems as needed, create a more generic rmw() function and have the gpu_rmw() use this new function.

Signed-off-by: Sharat Masetty smasetty@codeaurora.org Reviewed-by: Jordan Crouse jcrouse@codeaurora.org Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org --- drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++ drivers/gpu/drm/msm/msm_drv.h | 1 + drivers/gpu/drm/msm/msm_gpu.h | 5 +---- 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 49685571dc0e..a1e22b974b77 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -180,6 +180,14 @@ u32 msm_readl(const void __iomem *addr) return val; }

+void msm_rmw(void __iomem *addr, u32 mask, u32 or) +{ + u32 val = msm_readl(addr); + + val &= ~mask; + msm_writel(val | or, addr); +} + struct msm_vblank_work { struct work_struct work; int crtc_id; diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index b9dd8f8f4887..655b3b0424a1 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -478,6 +478,7 @@ void __iomem *msm_ioremap_quiet(struct platform_device *pdev, const char *name, const char *dbgname); void msm_writel(u32 data, void __iomem *addr); u32 msm_readl(const void __iomem *addr); +void msm_rmw(void __iomem *addr, u32 mask, u32 or);

struct msm_gpu_submitqueue; int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx); diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h index 6c9e1fdc1a76..b2b419277953 100644 --- a/drivers/gpu/drm/msm/msm_gpu.h +++ b/drivers/gpu/drm/msm/msm_gpu.h @@ -246,10 +246,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)

static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or) { - uint32_t val = gpu_read(gpu, reg); - - val &= ~mask; - gpu_write(gpu, reg, val | or); + msm_rmw(gpu->mmio + (reg << 2), mask, or); }

static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

2:30 p.m.

New subject: [PATCHv8 5/8] drm/msm/a6xx: Add support for using system cache(LLC)

From: Sharat Masetty smasetty@codeaurora.org

The last level system cache can be partitioned to 32 different slices of which GPU has two slices preallocated. One slice is used for caching GPU buffers and the other slice is used for caching the GPU SMMU pagetables. This talks to the core system cache driver to acquire the slice handles, configure the SCID's to those slices and activates and deactivates the slices upon GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use of the system cache to set the right TCR attributes. GPU then has the ability to override a few cacheability parameters which it does to override write-allocate to write-no-allocate as the GPU hardware does not benefit much from it.

DOMAIN_ATTR_IO_PGTABLE_CFG is another domain level attribute used by the IOMMU driver for pagetable configuration which will be used to set a quirk initially to set the right attributes to cache the hardware pagetables into the system cache.

Signed-off-by: Sharat Masetty smasetty@codeaurora.org [saiprakash.ranjan: fix to set attr before device attach to iommu and rebase] Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 83 +++++++++++++++++++++++++ drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 4 ++ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 17 +++++ 3 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 948f3656c20c..95c98c642876 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -8,7 +8,9 @@ #include "a6xx_gpu.h" #include "a6xx_gmu.xml.h"

+#include <linux/bitfield.h> #include <linux/devfreq.h> +#include <linux/soc/qcom/llcc-qcom.h>

#define GPU_PAS_ID 13

@@ -1022,6 +1024,79 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu) return IRQ_HANDLED; }

+static void a6xx_llc_rmw(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 mask, u32 or) +{ + return msm_rmw(a6xx_gpu->llc_mmio + (reg << 2), mask, or); +} + +static void a6xx_llc_write(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 value) +{ + return msm_writel(value, a6xx_gpu->llc_mmio + (reg << 2)); +} + +static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu) +{ + llcc_slice_deactivate(a6xx_gpu->llc_slice); + llcc_slice_deactivate(a6xx_gpu->htw_llc_slice); +} + +static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) +{ + u32 cntl1_regval = 0; + + if (IS_ERR(a6xx_gpu->llc_mmio)) + return; + + if (!llcc_slice_activate(a6xx_gpu->llc_slice)) { + u32 gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice); + + gpu_scid &= 0x1f; + cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10) | + (gpu_scid << 15) | (gpu_scid << 20); + } + + if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) { + u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice); + + gpuhtw_scid &= 0x1f; + cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + } + + if (cntl1_regval) { + /* + * Program the slice IDs for the various GPU blocks and GPU MMU + * pagetables + */ + a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); + + /* + * Program cacheability overrides to not allocate cache lines on + * a write miss + */ + a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + } +} + +static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu) +{ + llcc_slice_putd(a6xx_gpu->llc_slice); + llcc_slice_putd(a6xx_gpu->htw_llc_slice); +} + +static void a6xx_llc_slices_init(struct platform_device *pdev, + struct a6xx_gpu *a6xx_gpu) +{ + a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx"); + if (IS_ERR(a6xx_gpu->llc_mmio)) + return; + + a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU); + a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW); + + if (IS_ERR(a6xx_gpu->llc_slice) && IS_ERR(a6xx_gpu->htw_llc_slice)) + a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL); +} + static int a6xx_pm_resume(struct msm_gpu *gpu) { struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); @@ -1038,6 +1113,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)

msm_gpu_resume_devfreq(gpu);

+ a6xx_llc_activate(a6xx_gpu); + return 0; }

@@ -1048,6 +1125,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)

trace_msm_gpu_suspend(0);

+ a6xx_llc_deactivate(a6xx_gpu); + devfreq_suspend_device(gpu->devfreq.devfreq);

return a6xx_gmu_stop(a6xx_gpu); @@ -1091,6 +1170,8 @@ static void a6xx_destroy(struct msm_gpu *gpu) drm_gem_object_put(a6xx_gpu->shadow_bo); }

+ a6xx_llc_slices_destroy(a6xx_gpu); + a6xx_gmu_remove(a6xx_gpu);

adreno_gpu_cleanup(adreno_gpu); @@ -1209,6 +1290,8 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev) if (info && info->revn == 650) adreno_gpu->base.hw_apriv = true;

+ a6xx_llc_slices_init(pdev, a6xx_gpu); + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1); if (ret) { a6xx_destroy(&(a6xx_gpu->base.base)); diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h index 3eeebf6a754b..9e6079af679c 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h @@ -28,6 +28,10 @@ struct a6xx_gpu { uint32_t *shadow;

bool has_whereami; + + void __iomem *llc_mmio; + void *llc_slice; + void *htw_llc_slice; };

#define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 458b5b26d3c2..df63f3fe639d 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -16,6 +16,7 @@ #include <linux/soc/qcom/mdt_loader.h> #include <soc/qcom/ocmem.h> #include "adreno_gpu.h" +#include "a6xx_gpu.h" #include "msm_gem.h" #include "msm_mmu.h"

@@ -189,6 +190,9 @@ struct msm_gem_address_space * adreno_iommu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); + struct domain_attr_io_pgtbl_cfg pgtbl_cfg; struct iommu_domain *iommu; struct msm_mmu *mmu; struct msm_gem_address_space *aspace; @@ -198,7 +202,20 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu, if (!iommu) return NULL;

+ /* + * This allows GPU to set the bus attributes required to use system + * cache on behalf of the iommu page table walker. + */ + if (!IS_ERR(a6xx_gpu->htw_llc_slice)) { + pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; + iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg); + } + mmu = msm_iommu_new(&pdev->dev, iommu); + if (IS_ERR(mmu)) { + iommu_domain_free(iommu); + return ERR_CAST(mmu); + }

/* * Use the aperture start or SZ_16M, whichever is greater. This will

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

2:30 p.m.

New subject: [PATCHv8 6/8] drm/msm/a6xx: Add support for using system cache on MMU500 based targets

From: Jordan Crouse jcrouse@codeaurora.org

GPU targets with an MMU-500 attached have a slightly different process for enabling system cache. Use the compatible string on the IOMMU phandle to see if an MMU-500 is attached and modify the programming sequence accordingly.

Signed-off-by: Jordan Crouse jcrouse@codeaurora.org Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 +++++++++++++++++++++------ drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 + 2 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 95c98c642876..3f8b92da8cba 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1042,6 +1042,8 @@ static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)

static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) { + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base; + struct msm_gpu *gpu = &adreno_gpu->base; u32 cntl1_regval = 0;

if (IS_ERR(a6xx_gpu->llc_mmio)) @@ -1055,11 +1057,17 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) (gpu_scid << 15) | (gpu_scid << 20); }

+ /* + * For targets with a MMU500, activate the slice but don't program the + * register. The XBL will take care of that. + */ if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) { - u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice); + if (!a6xx_gpu->have_mmu500) { + u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);

- gpuhtw_scid &= 0x1f; - cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + gpuhtw_scid &= 0x1f; + cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid); + } }

if (cntl1_regval) { @@ -1067,13 +1075,20 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu) * Program the slice IDs for the various GPU blocks and GPU MMU * pagetables */ - a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); - - /* - * Program cacheability overrides to not allocate cache lines on - * a write miss - */ - a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + if (a6xx_gpu->have_mmu500) + gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL1, GENMASK(24, 0), + cntl1_regval); + else { + a6xx_llc_write(a6xx_gpu, + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval); + + /* + * Program cacheability overrides to not allocate cache + * lines on a write miss + */ + a6xx_llc_rmw(a6xx_gpu, + REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03); + } } }

@@ -1086,10 +1101,21 @@ static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu) static void a6xx_llc_slices_init(struct platform_device *pdev, struct a6xx_gpu *a6xx_gpu) { + struct device_node *phandle; + a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx"); if (IS_ERR(a6xx_gpu->llc_mmio)) return;

+ /* + * There is a different programming path for targets with an mmu500 + * attached, so detect if that is the case + */ + phandle = of_parse_phandle(pdev->dev.of_node, "iommus", 0); + a6xx_gpu->have_mmu500 = (phandle && + of_device_is_compatible(phandle, "arm,mmu-500")); + of_node_put(phandle); + a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU); a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h index 9e6079af679c..e793d329e77b 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h @@ -32,6 +32,7 @@ struct a6xx_gpu { void __iomem *llc_mmio; void *llc_slice; void *htw_llc_slice; + bool have_mmu500; };

#define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

2:30 p.m.

New subject: [PATCHv8 7/8] iommu: arm-smmu-impl: Use table to list QCOM implementations

Use table and of_match_node() to match qcom implementation instead of multiple of_device_compatible() calls for each QCOM SMMU implementation.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org Acked-by: Will Deacon will@kernel.org --- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 9 +-------- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 21 ++++++++++++++++----- drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 - 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c index 7fed89c9d18a..26e2734eb4d7 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c @@ -214,14 +214,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) return nvidia_smmu_impl_init(smmu);

- if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || - of_device_is_compatible(np, "qcom,sc7180-smmu-500") || - of_device_is_compatible(np, "qcom,sm8150-smmu-500") || - of_device_is_compatible(np, "qcom,sm8250-smmu-500")) - return qcom_smmu_impl_init(smmu); - - if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu")) - return qcom_adreno_smmu_impl_init(smmu); + smmu = qcom_smmu_impl_init(smmu);

if (of_device_is_compatible(np, "marvell,ap806-smmu-500")) smmu->impl = &mrvl_mmu500_impl; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index d0636c803a36..add1859b2899 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -318,12 +318,23 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu, return &qsmmu->smmu; }

+static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { + { .compatible = "qcom,sc7180-smmu-500" }, + { .compatible = "qcom,sdm845-smmu-500" }, + { .compatible = "qcom,sm8150-smmu-500" }, + { .compatible = "qcom,sm8250-smmu-500" }, + { } +}; + struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu) { - return qcom_smmu_create(smmu, &qcom_smmu_impl); -} + const struct device_node *np = smmu->dev->of_node;

-struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu) -{ - return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl); + if (of_match_node(qcom_smmu_impl_of_match, np)) + return qcom_smmu_create(smmu, &qcom_smmu_impl); + + if (of_device_is_compatible(np, "qcom,adreno-smmu")) + return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl); + + return smmu; } diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h index caae543ea077..7db81c7c7833 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h @@ -523,7 +523,6 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu); struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu); struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu); -struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);

void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx); int arm_mmu500_reset(struct arm_smmu_device *smmu);

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sai Prakash Ranjan

2:30 p.m.

New subject: [PATCHv8 8/8] iommu: arm-smmu-impl: Add a space before open parenthesis

Fix the checkpatch warning for space required before the open parenthesis.

Signed-off-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org Acked-by: Will Deacon will@kernel.org --- drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c index 26e2734eb4d7..136872e77195 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c @@ -12,7 +12,7 @@

static int arm_smmu_gr0_ns(int offset) { - switch(offset) { + switch (offset) { case ARM_SMMU_GR0_sCR0: case ARM_SMMU_GR0_sACR: case ARM_SMMU_GR0_sGFSR:

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Will Deacon

23 Nov 23 Nov

3:21 p.m.

On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:

...

Modulo some minor comments I've made, this looks good to me. What is the plan for merging it? I can take the IOMMU parts, but patches 4-6 touch the MSM GPU driver and I'd like to avoid conflicts with that.

Will

Sai Prakash Ranjan

5:01 p.m.

On 2020-11-23 20:51, Will Deacon wrote:

...

SMMU bits are pretty much independent and GPU relies on the domain attribute and the quirk exposed, so as long as SMMU changes go in first it should be good. Rob?

Thanks, Sai

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Rob Clark

7:22 p.m.

On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:

...

I suppose one option would be to split out the patch that adds the attribute into it's own patch, and merge that both thru drm and iommu?

If Will/Robin dislike that approach, I'll pick up the parts of the drm patches which don't depend on the new attribute for v5.11 and the rest for v5.12.. or possibly a second late v5.11 pull req if airlied doesn't hate me too much for it.

Going forward, I think we will have one or two more co-dependent series, like the smmu iova fault handler improvements that Jordan posted. So I would like to hear how Will and Robin prefer to handle those.

BR, -R

...

Sai Prakash Ranjan

24 Nov 24 Nov

4:02 a.m.

On 2020-11-24 00:52, Rob Clark wrote:

...

Ok I can split out domain attr and quirk into its own patch if Will is fine with that approach.

...

Thanks, Sai

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Will Deacon

11:10 a.m.

On Tue, Nov 24, 2020 at 09:32:54AM +0530, Sai Prakash Ranjan wrote:

...

Why don't I just queue the first two patches on their own branch and we both pull that?

Will

Rob Clark

7:05 p.m.

On Tue, Nov 24, 2020 at 3:10 AM Will Deacon will@kernel.org wrote:

...

On Tue, Nov 24, 2020 at 09:32:54AM +0530, Sai Prakash Ranjan wrote:

...
On 2020-11-24 00:52, Rob Clark wrote:

...
On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:

...
On 2020-11-23 20:51, Will Deacon wrote:

...
On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:

...
Some hardware variants contain a system cache or the last level cache(llc). This cache is typically a large block which is shared by multiple clients on the SOC. GPU uses the system cache to cache both the GPU data buffers(like textures) as well the SMMU pagetables. This helps with improved render performance as well as lower power consumption by reducing the bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices which then be used by multiple SOC clients. This patch series is an effort to enable and use two of those slices preallocated for the GPU, one for the GPU data buffers and another for the GPU SMMU hardware pagetables.

Patch 1 - Patch 6 adds system cache support in SMMU and GPU driver. Patch 7 and 8 are minor cleanups for arm-smmu impl.

Changes in v8:

Introduce a generic domain attribute for pagetable config (Will)

Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will)

Move non-strict mode to use new struct domain_attr_io_pgtbl_config

(Will)

Modulo some minor comments I've made, this looks good to me. What is the plan for merging it? I can take the IOMMU parts, but patches 4-6 touch the MSM GPU driver and I'd like to avoid conflicts with that.

SMMU bits are pretty much independent and GPU relies on the domain attribute and the quirk exposed, so as long as SMMU changes go in first it should be good. Rob?

I suppose one option would be to split out the patch that adds the attribute into it's own patch, and merge that both thru drm and iommu?

Ok I can split out domain attr and quirk into its own patch if Will is fine with that approach.

Why don't I just queue the first two patches on their own branch and we both pull that?

Ok, that works for me. I normally base msm-next on -rc1 but I guess as long as we base the branch on the older or our two -next branches, that should work out nicely

BR, -R

Will Deacon

9:43 p.m.

On Tue, Nov 24, 2020 at 11:05:39AM -0800, Rob Clark wrote:

...

Turns out we're getting a v10 of Sai's stuff, so I've asked him to split patch two up anyway. Then I'll make a branch based on -rc1 that we can both pull.

Will

Rob Clark

10:08 p.m.

On Tue, Nov 24, 2020 at 1:43 PM Will Deacon will@kernel.org wrote:

...

Sounds good, thx

BR, -R

1635

Age (days ago)

1642

Last active (days ago)

dri-devel@lists.freedesktop.org

22 comments

3 participants

tags (0)

participants (3)

Rob Clark
Sai Prakash Ranjan
Will Deacon