On Fri, 18 Dec 2020 16:10:12 +0800, Christian König wrote:
On 18.12.20 at 04:51, Chen Li wrote:
[SNIP]
If your ARM base board can't do that for some reason, then you can't use the hardware with that board.
Good to know, thanks! BTW, have you ever seen or heard of boards like mine which cannot mmap device memory correctly from userspace?
Unfortunately yes. We haven't been able to figure out what exactly goes wrong in those cases.
OK, one more question: does only the E8860 have this issue, or do all Radeon cards?
This applies to all hardware with dedicated memory which needs to be mapped to userspace.
That includes all graphics hardware from AMD as well as NVidia and probably a whole bunch of other PCIe devices.
Can MMIO on these devices work fine in kernel space? I cannot see the difference here, except that user space should use an uncacheable mmap to map virtual memory to device space (though I don't know how to use an uncacheable mmap), while the kernel uses an uncached ioremap.
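As a rough illustration of the user-space side of that question, here is a minimal Python sketch, assuming Linux and a device file that supports mmap. The helper names map_device_file and read_u32 are made up for this example. The point it tries to show is that user space normally just asks for a shared mapping; whether the pages end up uncached or write-combined is decided by the kernel code behind the file, not by a flag passed to mmap().

```python
import mmap
import os
import struct


def map_device_file(path, length):
    """Map `length` bytes of `path` as a shared, read/write mapping.

    Hypothetical helper: no caching mode is requested here. For a device
    file the kernel's mmap handler picks the page attributes.
    """
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # The mapping keeps its own duplicate of the descriptor.
        os.close(fd)


def read_u32(mapping, offset):
    """Read a little-endian 32-bit value at `offset` in the mapping.

    Good enough for a sketch; Python makes no promise about the width of
    the underlying memory access, which real register I/O cares about.
    """
    return struct.unpack_from("<I", mapping, offset)[0]
```

On a Linux system the path would typically be one of the PCI resource files under /sys/bus/pci/devices/<device>/ (for example resource0, with <device> left as a placeholder here), opened as root. The same function also works on an ordinary file, which is handy for trying it out without hardware.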
The graphics address remapping table (GART), also known as the graphics aperture remapping table or graphics translation table (GTT), is an I/O memory management unit (IOMMU) used by Accelerated Graphics Port (AGP) and PCI Express (PCIe) graphics cards.
GART or GTT refers to the translation tables graphics hardware use to access system memory.
Something like 15 years ago we used the IOMMU functionality from AGP to implement that. But modern hardware (PCIe) uses some specialized hardware in the GPU for that.
Regards, Christian.
Good to know, thanks! So the modern GART/GTT is like a TLB, and the IOMMU is focused on translating addresses and does not manage the TLB.
You are getting closer in your understanding, but the TLB is the translation lookaside buffer: basically a cache of recent VM translations, which is present in all page table translations (GART, IOMMU, CPU, etc.).
The key difference is where the page table translation happens on modern hardware:
1. For the GART/GTT it is inside the GPU, to translate between GPU internal and bus addresses.
2. For the IOMMU it is inside the root complex of the PCIe, to translate between bus addresses and physical addresses.
3. For CPU page tables it is inside the CPU core, to translate between virtual addresses and physical addresses.
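The three translation points above, each with its own TLB in front of the page table, can be modelled roughly as follows. This is only a toy sketch in Python: the Translator class, the 4 KiB page size and all page numbers are made up for illustration and say nothing about real hardware layouts.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class Translator:
    """One translation stage: a page table with a TLB caching recent lookups."""

    def __init__(self, name, page_table):
        self.name = name
        self.page_table = page_table  # source page number -> target page number
        self.tlb = {}                 # cache of translations already looked up
        self.walks = 0                # page table lookups that missed the TLB

    def translate(self, address):
        page, offset = divmod(address, PAGE_SIZE)
        if page not in self.tlb:
            # TLB miss: consult the page table (KeyError models a fault).
            self.walks += 1
            self.tlb[page] = self.page_table[page]
        return self.tlb[page] * PAGE_SIZE + offset


# Hypothetical mappings, one table per place where translation happens.
gart = Translator("GART (in the GPU)", {0x10: 0x200})            # GPU -> bus
iommu = Translator("IOMMU (in the root complex)", {0x200: 0x8000})  # bus -> physical
cpu = Translator("CPU page tables (in the core)", {0x7F0: 0x8000})  # virtual -> physical


def gpu_access(gpu_address):
    """A GPU access to system memory passes through the GART, then the IOMMU."""
    bus_address = gart.translate(gpu_address)
    return iommu.translate(bus_address)


def cpu_access(virtual_address):
    """A CPU access only goes through the CPU page tables."""
    return cpu.translate(virtual_address)
```

With these tables, gpu_access(0x10 * PAGE_SIZE + 0x34) and cpu_access(0x7F0 * PAGE_SIZE + 0x34) end up at the same physical address, and repeating either call is served from the TLB without another page table lookup.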
Regards, Christian.
Awesome explanation! Thanks a ton!