Some hardware variants contain a system-level cache, also referred to as the last level cache (LLC). This cache is typically a large block shared by multiple clients on the SoC. The GPU uses the system cache to cache both GPU data buffers (like textures) and the SMMU pagetables. This improves render performance and lowers power consumption by reducing bus traffic to system memory.
The system cache architecture allows the cache to be split into slices which can then be used by multiple SoC clients. This patch series is an effort to enable and use two of those slices preallocated for the GPU: one for the GPU data buffers and another for the GPU SMMU hardware pagetables.
This patch series depends on the core LLCC driver which was submitted to the mailing list (https://patchwork.kernel.org/patch/10184935/) and on SMMU support for the upstream hint, which Vivek will post to the lists soon.
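For readers unfamiliar with the pending LLCC interface, the client-side calls this series relies on look roughly like the following. This is a sketch inferred from their use in patch 5 of this series; the exact signatures are subject to the ongoing llcc driver review:

	/* Sketch of the llcc client API as consumed later in this series */
	void *slice = llcc_slice_getd(&pdev->dev, "gpu"); /* by DT slice name */

	if (!IS_ERR(slice)) {
		u32 scid = llcc_get_slice_id(slice); /* SCID to program into the GPU */

		llcc_slice_activate(slice);          /* enable on GPU power up */
		llcc_slice_deactivate(slice);        /* disable on power collapse */
		llcc_slice_putd(slice);              /* release on teardown */
	}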
Sharat Masetty (5):
  drm/msm: rearrange the gpu_rmw() function
  arm64: dts: sdm845: Add support for GPU LLCC
  drm/msm/adreno: Add registers in the GPU CX domain
  drm/msm: Pass mmu features to generic layers
  drm/msm/A6xx: Add support for using system cache (LLC)
 arch/arm64/boot/dts/qcom/sdm845.dtsi    |   8 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx.xml.h   |   4 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 162 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |   9 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |   4 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c           |   8 ++
 drivers/gpu/drm/msm/msm_drv.h           |   1 +
 drivers/gpu/drm/msm/msm_gpu.c           |   6 +-
 drivers/gpu/drm/msm/msm_gpu.h           |   6 +-
 drivers/gpu/drm/msm/msm_iommu.c         |  13 +++
 drivers/gpu/drm/msm/msm_mmu.h           |  16 ++++
 15 files changed, 231 insertions(+), 14 deletions(-)
--
1.9.1
The register read-modify-write construct is generic enough to be used by other subsystems as needed. Create a more generic msm_rmw() function and have gpu_rmw() use this new function.
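For illustration, a hypothetical caller of the new helper and the open-coded sequence it replaces (the expansion mirrors the code moved in this patch):

	/* Update a 4-bit field in place: clear bits [3:0], then OR in 0x3 */
	msm_rmw(addr, 0xF, 0x3);

	/* ...which is equivalent to: */
	u32 val = msm_readl(addr);
	val &= ~0xF;
	msm_writel(val | 0x3, addr);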
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +----
 3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index fbad854..a08b7d2 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -149,6 +149,14 @@ u32 msm_readl(const void __iomem *addr)
 	return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+	u32 val = msm_readl(addr);
+
+	val &= ~mask;
+	msm_writel(val | or, addr);
+}
+
 struct vblank_event {
 	struct list_head node;
 	int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 0a653dd..7e71354 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -314,6 +314,7 @@ void __iomem *msm_ioremap(struct platform_device *pdev, const char *name,
 		const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index b9b86ef..96058d2 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -194,10 +194,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-	uint32_t val = gpu_read(gpu, reg);
-
-	val &= ~mask;
-	gpu_write(gpu, reg, val | or);
+	msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
On Fri, Mar 23, 2018 at 12:49:47PM +0530, Sharat Masetty wrote:
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Add client side bindings required for the GPU to use the last level system cache. Also add a register range in the GPU CX domain.
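For context, patch 5 of this series consumes these bindings roughly as follows (names taken from that patch):

	/* Slice descriptors looked up by the names in cache-slice-names */
	llc->gpu_llc_slice = llcc_slice_getd(&pdev->dev, "gpu");
	llc->gpuhtw_llc_slice = llcc_slice_getd(&pdev->dev, "gpuhtw");

	/* The new CX register range looked up by its reg-names entry */
	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");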
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index eb0a1b2..7e2d938 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -887,8 +887,8 @@
 			compatible = "qcom,adreno-630.2", "qcom,adreno";
 			#stream-id-cells = <16>;
 
-			reg = <0x5000000 0x40000>;
-			reg-names = "kgsl_3d0_reg_memory";
+			reg = <0x5000000 0x40000>, <0x509e000 0x10>;
+			reg-names = "kgsl_3d0_reg_memory", "cx_mem";
 
 			/*
 			 * Look ma, no clocks! The GPU clocks and power are controlled
@@ -898,6 +898,10 @@
 			interrupts = <0 300 0>;
 			interrupt-names = "kgsl_3d0_irq";
 
+			/* GPU related llc slices */
+			cache-slice-names = "gpu", "gpuhtw";
+			cache-slices = <&llcc 12>, <&llcc 11>;
+
 			iommus = <&kgsl_smmu 0>;
 
 			operating-points-v2 = <&gpu_opp_table>;
On Fri, Mar 23, 2018 at 12:49:48PM +0530, Sharat Masetty wrote:
Add client side bindings required for the GPU to use the last level system cache. Also add a register range in the GPU CX domain.
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Also, these should go to the devicetree lists for review (but maybe wait until the other changes have gotten further through the process).
On 4/4/2018 2:52 AM, Jordan Crouse wrote:
Thanks Jordan for the review and the reminder, I will send this specific patch out to the devicetree mailing list for review.
Add the registers needed for configuring the system cache slice info and other parameters in the GPU.
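Note that these appear to be dword offsets within the separate CX MISC region ("cx_mem" in the previous patch) rather than the main GPU register space; a sketch of how patch 5 reaches them:

	/* llc->mmio maps the "cx_mem" region; reg is a dword offset */
	msm_rmw(llc->mmio + (REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1 << 2),
		mask, or);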
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx.xml.h | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx.xml.h b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
index 17d1241..29ce813 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
@@ -1596,5 +1596,9 @@ static inline uint32_t A6XX_CP_PROTECT_REG_MASK_LEN(uint32_t val)
 
 #define REG_A6XX_PDC_GPU_SEQ_MEM_0				0x000a0000
 
+#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0		0x00000001
+
+#define REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1		0x00000002
+
 #endif /* A6XX_XML */
On Fri, Mar 23, 2018 at 12:49:49PM +0530, Sharat Masetty wrote:
Add the registers needed for configuring the system cache slice info and other parameters in the GPU.
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Allow different Adreno targets to pass target-specific MMU features to the generic layers. This will help conditionally configure certain IOMMU features for certain Adreno targets.
Also add a few simple helper functions to support a bitmask of features that a specific MMU implementation supports; the intended flow is sketched below.
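As an example of the intended flow (a sketch using the feature bit that patch 5 of this series adds):

	/* Target code: pass the desired MMU features at init time */
	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4,
			MMU_FEATURE_USE_SYSTEM_CACHE);

	/* MMU implementation: check the bit before acting on it */
	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
		prot |= IOMMU_USE_UPSTREAM_HINT;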
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c           |  6 ++++--
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_mmu.h           | 13 +++++++++++++
 9 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 1dd84d3..a7a8573 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -492,7 +492,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 2884b1b..5e7e15d6 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -574,7 +574,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1, 0);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index a4f68af..c9e06ff 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1295,7 +1295,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	check_speed_bin(&pdev->dev);
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index e83b066..bd50674 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1040,7 +1040,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a6xx_registers;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 6657461..a87ec6b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -557,7 +557,8 @@ static int adreno_get_pwrlevels(struct device *dev,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu,
-		const struct adreno_gpu_funcs *funcs, int nr_rings)
+		const struct adreno_gpu_funcs *funcs, int nr_rings,
+		unsigned long mmu_features)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -576,6 +577,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu_config.va_end = 0xffffffff;
 
 	adreno_gpu_config.nr_rings = nr_rings;
+	adreno_gpu_config.mmu_features = mmu_features;
 
 	adreno_get_pwrlevels(&pdev->dev, gpu);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index bb9affd..19eda65 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -225,7 +225,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
-		int nr_rings);
+		int nr_rings, unsigned long mmu_features);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 int adreno_load_fw(struct adreno_gpu *adreno_gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index ce8e781..c7f616c 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -704,7 +704,7 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 static struct msm_gem_address_space *
 msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
-		uint64_t va_start, uint64_t va_end)
+		uint64_t va_start, uint64_t va_end, unsigned long mmu_features)
 {
 	struct iommu_domain *iommu;
 	struct msm_gem_address_space *aspace;
@@ -732,6 +732,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 		return ERR_CAST(aspace);
 	}
 
+	msm_mmu_set_feature(aspace->mmu, mmu_features);
+
 	ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0);
 	if (ret) {
 		msm_gem_address_space_put(aspace);
@@ -815,7 +817,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	msm_devfreq_init(gpu);
 
 	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
-		config->va_start, config->va_end);
+		config->va_start, config->va_end, config->mmu_features);
 
 	if (gpu->aspace == NULL)
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 96058d2..dff9973 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -34,6 +34,7 @@ struct msm_gpu_config {
 	uint64_t va_start;
 	uint64_t va_end;
 	unsigned int nr_rings;
+	unsigned long mmu_features;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index aa2c5d4..85df78d 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -35,6 +35,7 @@ struct msm_mmu {
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	unsigned long features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
@@ -54,4 +55,16 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 	mmu->handler = handler;
 }
 
+static inline void msm_mmu_set_feature(struct msm_mmu *mmu,
+		unsigned long feature)
+{
+	mmu->features |= feature;
+}
+
+static inline bool msm_mmu_has_feature(struct msm_mmu *mmu,
+		unsigned long feature)
+{
+	return (mmu->features & feature) ? true : false;
+}
+
 #endif /* __MSM_MMU_H__ */
On Fri, Mar 23, 2018 at 12:49:50PM +0530, Sharat Masetty wrote:
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
The last level system cache can be partitioned into 32 different slices, of which the GPU has two slices preallocated. The "gpu" slice is used for caching GPU buffers and the "gpuhtw" slice is used for caching the GPU SMMU pagetables. This patch talks to the core system cache driver to acquire the slice handles, configures the SCIDs for those slices, and activates/deactivates the slices upon GPU power collapse and restore.
Some support from the IOMMU driver is also needed to make use of the system cache. IOMMU_UPSTREAM_HINT is a buffer protection flag which enables caching GPU data buffers in the system cache with memory attributes such as outer cacheable, read-allocate and write-allocate. The GPU can then override a few cacheability parameters, which it does to change write-allocate to write-no-allocate, as the GPU hardware does not benefit much from it. Similarly, DOMAIN_ATTR_USE_UPSTREAM_HINT is a domain level attribute used by the IOMMU driver to set the right attributes to cache the hardware pagetables in the system cache.
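As a worked example of the CNTL1 register layout programmed below (assuming the slice IDs end up matching the sdm845 DT entries in patch 2, i.e. SCID 12 for "gpu" and SCID 11 for "gpuhtw"):

	/*
	 * Five 5-bit SCID fields for the GPU blocks (CP, TP, VFD, CCU and
	 * the UBWC FLAG cache), plus one field for the pagetable walker:
	 *
	 *   gpu SCID 12    -> (12 << 0) | (12 << 5) | (12 << 10) |
	 *                     (12 << 15) | (12 << 20)
	 *   gpuhtw SCID 11 -> (11 << 25)
	 */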
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 162 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   9 ++
 drivers/gpu/drm/msm/msm_iommu.c       |  13 +++
 drivers/gpu/drm/msm/msm_mmu.h         |   3 +
 4 files changed, 186 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index bd50674..e4554eb 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -13,6 +13,7 @@
 
 #include <linux/qcom_scm.h>
 #include <linux/soc/qcom/mdt_loader.h>
+#include <linux/soc/qcom/llcc-qcom.h>
 
 #include "msm_gem.h"
 #include "msm_mmu.h"
@@ -913,6 +914,154 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
 	~0
 };
 
+#define A6XX_LLC_NUM_GPU_SCIDS		5
+#define A6XX_GPU_LLC_SCID_NUM_BITS	5
+
+#define A6XX_GPU_LLC_SCID_MASK \
+	((1 << (A6XX_LLC_NUM_GPU_SCIDS * A6XX_GPU_LLC_SCID_NUM_BITS)) - 1)
+
+#define A6XX_GPUHTW_LLC_SCID_SHIFT	25
+#define A6XX_GPUHTW_LLC_SCID_MASK \
+	(((1 << A6XX_GPU_LLC_SCID_NUM_BITS) - 1) << A6XX_GPUHTW_LLC_SCID_SHIFT)
+
+static inline void a6xx_gpu_cx_rmw(struct a6xx_llc *llc,
+		u32 reg, u32 mask, u32 or)
+{
+	msm_rmw(llc->mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_deactivate(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_llc *llc = &a6xx_gpu->llc;
+
+	llcc_slice_deactivate(llc->gpu_llc_slice);
+	llcc_slice_deactivate(llc->gpuhtw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_llc *llc = &a6xx_gpu->llc;
+
+	if (!llc->mmio)
+		return;
+
+	if (llc->gpu_llc_slice)
+		if (!llcc_slice_activate(llc->gpu_llc_slice))
+			/* Program the sub-cache ID for all GPU blocks */
+			a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPU_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+					A6XX_GPU_LLC_SCID_MASK));
+
+	if (llc->gpuhtw_llc_slice)
+		if (!llcc_slice_activate(llc->gpuhtw_llc_slice))
+			/* Program the sub-cache ID for GPU pagetables */
+			a6xx_gpu_cx_rmw(llc,
+				REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_1,
+				A6XX_GPUHTW_LLC_SCID_MASK,
+				(llc->cntl1_regval &
+					A6XX_GPUHTW_LLC_SCID_MASK));
+
+	/* Program cacheability overrides */
+	a6xx_gpu_cx_rmw(llc, REG_A6XX_GPU_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF,
+		llc->cntl0_regval);
+}
+
+void a6xx_llc_slices_destroy(struct a6xx_llc *llc)
+{
+	if (llc->mmio) {
+		iounmap(llc->mmio);
+		llc->mmio = NULL;
+	}
+
+	llcc_slice_putd(llc->gpu_llc_slice);
+	llc->gpu_llc_slice = NULL;
+
+	llcc_slice_putd(llc->gpuhtw_llc_slice);
+	llc->gpuhtw_llc_slice = NULL;
+}
+
+static int a6xx_llc_slices_init(struct platform_device *pdev,
+		struct a6xx_llc *llc)
+{
+	int i;
+
+	/* Get the system cache slice descriptor for GPU and GPUHTWs */
+	llc->gpu_llc_slice = llcc_slice_getd(&pdev->dev, "gpu");
+	if (IS_ERR(llc->gpu_llc_slice))
+		llc->gpu_llc_slice = NULL;
+
+	llc->gpuhtw_llc_slice = llcc_slice_getd(&pdev->dev, "gpuhtw");
+	if (IS_ERR(llc->gpuhtw_llc_slice))
+		llc->gpuhtw_llc_slice = NULL;
+
+	if (llc->gpu_llc_slice == NULL && llc->gpuhtw_llc_slice == NULL)
+		return -1;
+
+	/* Map registers */
+	llc->mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+	if (IS_ERR(llc->mmio)) {
+		llc->mmio = NULL;
+		a6xx_llc_slices_destroy(llc);
+		return -1;
+	}
+
+	/*
+	 * Setup GPU system cache CNTL0 and CNTL1 register values.
+	 * These values will be programmed every time the GPU comes out
+	 * of power collapse as these are non-retention registers.
+	 */
+
+	/*
+	 * CNTL0 provides options to override the settings for the
+	 * read and write allocation policies for the LLC. These
+	 * overrides are global for all memory transactions from
+	 * the GPU.
+	 *
+	 * 0x3: read-no-alloc-overridden = 0
+	 *      read-no-alloc = 0 - Allocate lines on read miss
+	 *      write-no-alloc-overridden = 1
+	 *      write-no-alloc = 1 - Do not allocate lines on write miss
+	 */
+	llc->cntl0_regval = 0x03;
+
+	/*
+	 * CNTL1 is used to specify SCID for (CP, TP, VFD, CCU and UBWC
+	 * FLAG cache) GPU blocks. This value will be passed along with
+	 * the address for any memory transaction from GPU to identify
+	 * the sub-cache for that transaction.
+	 *
+	 * Currently there is only one SCID allocated for all GPU blocks.
+	 * Hence set the same SCID for all the blocks.
+	 */
+	if (llc->gpu_llc_slice) {
+		u32 gpu_scid = llcc_get_slice_id(llc->gpu_llc_slice);
+
+		for (i = 0; i < A6XX_LLC_NUM_GPU_SCIDS; i++)
+			llc->cntl1_regval |=
+				gpu_scid << (A6XX_GPU_LLC_SCID_NUM_BITS * i);
+	}
+
+	/*
+	 * Set SCID for GPU IOMMU. This will be used to access
+	 * page tables that are cached in LLC.
+	 */
+	if (llc->gpuhtw_llc_slice) {
+		u32 gpuhtw_scid = llcc_get_slice_id(llc->gpuhtw_llc_slice);
+
+		llc->cntl1_regval |=
+			gpuhtw_scid << A6XX_GPUHTW_LLC_SCID_SHIFT;
+	}
+
+	return 0;
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -923,6 +1072,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
 	gpu->needs_hw_init = true;
 
+	/* Activate LLC slices */
+	a6xx_llc_activate(gpu);
+
 	return ret;
 }
 
@@ -931,6 +1083,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
+	/* Deactivate LLC slices */
+	a6xx_llc_deactivate(gpu);
+
 	/*
 	 * Make sure the GMU is idle before continuing (because some transitions
 	 * may use VBIF
@@ -993,6 +1148,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_unreference_unlocked(a6xx_gpu->sqe_bo);
 	}
 
+	a6xx_llc_slices_destroy(&a6xx_gpu->llc);
+
 	a6xx_gmu_remove(a6xx_gpu);
 
 	adreno_gpu_cleanup(adreno_gpu);
@@ -1040,7 +1197,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a6xx_registers;
 	adreno_gpu->reg_offsets = a6xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4, 0);
+	ret = a6xx_llc_slices_init(pdev, &a6xx_gpu->llc);
+
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4,
+			ret ? 0 : MMU_FEATURE_USE_SYSTEM_CACHE);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 21ab701..392c426 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -21,6 +21,14 @@
 
 extern bool hang_debug;
 
+struct a6xx_llc {
+	void __iomem *mmio;
+	void *gpu_llc_slice;
+	void *gpuhtw_llc_slice;
+	u32 cntl0_regval;
+	u32 cntl1_regval;
+};
+
 struct a6xx_gpu {
 	struct adreno_gpu base;
 
@@ -46,6 +54,7 @@ struct a6xx_gpu {
 	uint64_t scratch_iova;
 
 	struct a6xx_gmu gmu;
+	struct a6xx_llc llc;
 };
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1ab629b..6c03eda 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -39,6 +39,16 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	int ret;
+	int gpu_htw_llc = 1;
+
+	/*
+	 * This allows GPU to set the bus attributes required
+	 * to use system cache on behalf of the iommu page table
+	 * walker.
+	 */
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		iommu_domain_set_attr(iommu->domain,
+				DOMAIN_ATTR_USE_UPSTREAM_HINT, &gpu_htw_llc);
 
 	pm_runtime_get_suppliers(mmu->dev);
 	ret = iommu_attach_device(iommu->domain, mmu->dev);
@@ -63,6 +73,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;
 
+	if (msm_mmu_has_feature(mmu, MMU_FEATURE_USE_SYSTEM_CACHE))
+		prot |= IOMMU_USE_UPSTREAM_HINT;
+
 	pm_runtime_get_suppliers(mmu->dev);
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	pm_runtime_put_suppliers(mmu->dev);
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 85df78d..257bdea 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -30,6 +30,9 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };
 
+/* MMU features */
+#define MMU_FEATURE_USE_SYSTEM_CACHE (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
On Fri, Mar 23, 2018 at 12:49:51PM +0530, Sharat Masetty wrote:
This has a dependency on the LLCC driver, and the API to that may change (it is under review now). When it does, this will naturally have to change as well, but that'll be a minor tweak and won't affect the functionality of this driver, so pending those changes..
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
On 4/4/2018 2:54 AM, Jordan Crouse wrote:
Thanks for the review Jordan. Vivek will also submit the SMMU changes for the UPSTREAM_HINT support to the mailing list soon. So once the dependencies are sorted out, I will review and submit a fresh patch set if needed.
Hi Sharat,
On 3/23/2018 12:49 PM, Sharat Masetty wrote:
Couple of minor nits. Please see comments inline below.
static?
If you move this ioremap to the start of the function, then you wouldn't need to call a6xx_llc_slices_destroy().
regards
Vivek