On Tue, Sep 21, 2021 at 04:06:00PM +0300, Ville Syrjälä wrote:
On Mon, Sep 20, 2021 at 10:47:08PM -0700, Lucas De Marchi wrote:
On Wed, Sep 15, 2021 at 12:29:12PM -0700, John Harrison wrote:
On 9/15/2021 12:24, Belgaumkar, Vinay wrote:
On 9/14/2021 12:51 PM, Lucas De Marchi wrote:
The clflush calls here aren't doing anything, since we are not writing something and then flushing the cache lines to make it visible to the GuC. The intention here seems to be to make sure whatever the GuC has written is visible to the CPU before we read it. However, a clflush from the CPU side is the wrong instruction to use for that.
Is there a right instruction to use? Either we need to verify that no flush of any kind is required here, or we need to use whatever the correct flush mechanism is.
How can there be a right instruction? If the GuC needs to flush, then the GuC needs to do it; there is nothing to be done by the CPU.
Flushing the CPU cache line here does nothing to guarantee that what was written by the GuC has hit memory by the time we read it. I'm not sure why it was originally added, but since it was added by Vinay and he reviewed this patch, I'm assuming he also agrees with removing it.
clflush == writeback + invalidate. The invalidate is the important part when the CPU has to read something written by something else that's not cache coherent.
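To make the read side concrete, here is a minimal sketch of the pattern under discussion, assuming a GuC-written descriptor mapped into the CPU address space (the struct and field names are hypothetical, just for illustration; drm_clflush_virt_range() is the DRM helper that issues clflush on x86):

#include <linux/compiler.h>
#include <linux/types.h>
#include <drm/drm_cache.h>

/* Hypothetical GuC-written descriptor; only the ordering matters. */
struct guc_status_desc {
	u32 head;
	u32 tail;
};

static u32 read_guc_head(struct guc_status_desc *desc)
{
	/*
	 * clflush = writeback + invalidate. If the CPU holds a stale
	 * copy of this line, the invalidate half discards it, so the
	 * load below fetches what the GuC actually wrote to memory.
	 * If the GuC instead wrote into a CPU-invisible cache of its
	 * own, no CPU-side instruction can help; the GuC itself would
	 * have to write that data back.
	 */
	drm_clflush_virt_range(desc, sizeof(*desc));

	return READ_ONCE(desc->head);
}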
Although the invalidate would be the important part, how would that work if there is still a writeback? Wouldn't we be overwriting whatever was written by the other side? Or are we relying on the fact that we shouldn't be writing to this cacheline, so we know it's never dirty?
Now, I have no idea if the GuC has its own (CPU-invisible) caches or not. If it does, then it will need to trigger a writeback. But regardless, if the GuC bypasses the CPU caches, the CPU will need to invalidate before it reads anything, in case it has stale data sitting in its cache.
Indeed, thanks... but another case would be if the caches are kept coherent through snooping. Do you know what the cache architecture is between the GuC and the CPU?
Another question comes to mind, but first some context: I'm looking at this in order to support other architectures besides x86; the only platforms where this would be relevant are the discrete ones (I'm currently running an arm64 guest on qemu with PCI passthrough). I see that for dgfx, intel_guc_allocate_vma() uses i915_gem_object_create_lmem() instead of i915_gem_object_create_shmem(). Would that make a difference?
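For reference, the allocation path I mean looks roughly like this (a simplified sketch from memory; the real code has allocation flags and error handling that I'm omitting, and details vary between kernel versions):

struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
{
	struct intel_gt *gt = guc_to_gt(guc);
	struct drm_i915_gem_object *obj;

	/*
	 * On discrete parts the GuC objects live in device-local
	 * memory (lmem) instead of shmem-backed system memory, so
	 * the CPU mapping is typically write-combined rather than
	 * cached, which changes what kind of flushing (if any) the
	 * CPU side needs.
	 */
	if (HAS_LMEM(gt->i915))
		obj = i915_gem_object_create_lmem(gt->i915, size, 0);
	else
		obj = i915_gem_object_create_shmem(gt->i915, size);

	if (IS_ERR(obj))
		return ERR_CAST(obj);

	/* ... pin the object and map it into the GGTT as before ... */
}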
thanks,
Lucas De Marchi