On Wed, Mar 17, 2021 at 2:33 AM Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
Hi Rob,
On 2021-03-16 22:46, Rob Clark wrote:
<snip>...
When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped on the CPU side, and if so with what attributes? Rob said "writecombine for everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
Currently userspace asks for everything to be WC, so pgprot_writecombine().
The kernel doesn't enforce this, but so far it provides no UAPI to do anything useful with non-coherent cached mappings (although there is interest in supporting this).
Btw, I'm looking at a benchmark (gl_driver2_off) where (after some other in-flight optimizations land) we end up bottlenecked on writing to WC cmdstream buffers. I assume that in the current state, WC writes go all the way to main memory rather than just to the system cache?
Oh, I guess this (mentioned earlier in the thread) is what I really want for this benchmark:
https://android-review.googlesource.com/c/kernel/common/+/1549097/3
You can also check whether system cache lines are allocated for the GPU with the patch in https://crrev.com/c/2766723
With the above patch applied, run:
  cat /sys/kernel/debug/llcc_stats/llcc_scid_status
The SCIDs for the GPU are listed in include/linux/soc/qcom/llcc-qcom.h
Actually, for the benchmark I was referring to, it is the *CPU* that is bottlenecked on writes to writecombine mappings, so I think what I want is for CPU mappings to be able to use the system cache.
BR, -R