On 2020-12-18 06:14, Chen Li wrote: [...]
No, not performance. See, standards like OpenGL and Vulkan, as well as VA-API and VDPAU, require that you can mmap() device memory and execute memset/memcpy on the memory from userspace.
If your ARM-based board can't do that for some reason, then you can't use the hardware with that board.
If the VRAM lives in a prefetchable PCI BAR then on most sane Arm-based systems I believe it should be possible to mmap() it to userspace with the Normal memory type, where unaligned accesses and such are allowed, as opposed to the Device memory type intended for MMIO mappings, which has more restrictions but stricter ordering guarantees.
Hi, Robin. I cannot understand why it allows unaligned accesses. A prefetchable PCI BAR should also be MMIO, and accesses to it will end up as Device memory, so why does this allow unaligned access?
Because even Device-GRE is a bit too restrictive to expose to userspace that's likely to expect it to behave as regular memory, so, for better or worse, we use MT_NORMAL_NC for pgprot_writecombine().
Regardless of what happens elsewhere though, if something is mapped *into the kernel* with ioremap(), then it is fundamentally wrong per the kernel memory model to reference that mapping directly without using I/O accessors. That is not specific to any individual architecture, and Sparse should be screaming about it already. I guess in this case the UVD code needs to pay more attention to whether radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not.
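To make the point about I/O accessors concrete, here is a minimal userspace sketch, not kernel code and not the actual radeon/TTM fix. It assumes the mapping helper hands back a flag saying whether the pointer came from an ioremap()-style mapping, and picks the write routine accordingly. struct mapped_buf, fake_memset_io() and buf_clear() are hypothetical names made up for the illustration; in the kernel the corresponding accessor would be memset_io().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a kmap result: a pointer plus a flag
 * recording whether it refers to an ioremap()-style mapping. */
struct mapped_buf {
    void *vaddr;
    bool is_iomem;
};

/* Userspace imitation of an I/O accessor: byte-sized volatile stores
 * only, so every access is naturally aligned and nothing can be turned
 * into a block-zeroing instruction. The real memset_io() is smarter. */
static void fake_memset_io(volatile void *dst, int c, size_t n)
{
    volatile uint8_t *p = dst;

    while (n--)
        *p++ = (uint8_t)c;
}

/* Clear n bytes, choosing the routine from the mapping type instead of
 * calling plain memset() on whatever pointer came back. */
static void buf_clear(struct mapped_buf *buf, size_t n)
{
    assert(buf != NULL && buf->vaddr != NULL);

    if (buf->is_iomem)
        fake_memset_io(buf->vaddr, 0, n);
    else
        memset(buf->vaddr, 0, n);
}
```

The design choice is only that the decision lives in one helper, so callers never dereference a possibly-iomem pointer directly.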
(I'm assuming the initial fault was memset() with 0 trying to perform "DC ZVA" on a Device-type mapping from ioremap() - FYI a stacktrace on its own without the rest of the error dump showing what actually triggered it isn't overly useful)
Robin.
Why might it be 'DC ZVA'? I'm not sure of the PC at the initial kernel fault in memset, but I captured the PC of the userspace crash: stp (128-bit) or a NEON str (also 128-bit) to the render node (/dev/dri/renderD128).
As I said it was an assumption. I guessed at it being more likely to be an MMU fault than an external abort, and given the size and the fact that it's a variable initialisation guessed at it being slightly more likely to hit the ZVA special-case rather than being unaligned. Looking again, I guess starting at an odd-numbered 32-bit element might lead to an unaligned store of XZR, but either way it doesn't really matter - what it showed is it clearly *could* be an MMU fault because TTM seems to be a bit careless with iomem pointers.
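The unaligned-XZR case comes down to simple arithmetic, sketched below. The assumptions are mine, not something shown in the thread: the array base is 8-byte aligned, and the compiler zeroes two adjacent 32-bit elements with one 64-bit store. store_is_aligned() and elem_offset() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Is a store of `size` bytes at byte offset `off` naturally aligned?
 * Offsets are relative to a base assumed to be 8-byte aligned. */
static bool store_is_aligned(size_t off, size_t size)
{
    assert(size != 0);
    return (off % size) == 0;
}

/* Byte offset of 32-bit element `idx` from the start of the array. */
static size_t elem_offset(size_t idx)
{
    return idx * sizeof(uint32_t);
}
```

With these, store_is_aligned(elem_offset(2), 8) is true while store_is_aligned(elem_offset(3), 8) is false: a 64-bit store starting at an odd-numbered 32-bit element lands at an offset that is 4 modulo 8. On a Normal mapping that is harmless, but on a Device-type mapping an unaligned access raises an alignment fault.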
That said, if you're also getting external aborts from your host bridge not liking 128-bit transactions, then as Christian says you're probably going to have a bad time on this platform either way.
Robin.