On Thu, Sep 5, 2019 at 3:30 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Sep 5, 2019 at 12:05 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Sep 5, 2019 at 10:03 AM Rob Clark robdclark@gmail.com wrote:
On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy robin.murphy@arm.com wrote:
On 04/09/2019 01:12, Rob Clark wrote:
On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam festevam@gmail.com wrote:
Hi Jonathan,
On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek jonathan@marek.ca wrote: > > Hi, > > I tried this and it works with patches 4+5 from Rob's series and > changing gpummu to use sg_phys(sg) instead of sg->dma_address > (dma_address isn't set now that dma_map_sg isn't used).
Thanks for testing it. I haven't had a chance to test it yet.
Rob,
I assume your series is targeted to 5.4, correct?
maybe, although Christoph Hellwig didn't seem like a big fan of exposing cache ops, and would rather add a new allocation API for uncached pages.. so I'm not entirely sure what the way forward will be.
TBH, the use of map/unmap looked reasonable in the context of "start/stop using these pages for stuff which may include DMA", so even if it was cheekily ignoring sg->dma_address I'm not sure I'd really consider it "abuse" - in comparison, using sync without a prior map unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG will be rendered useless with false positives if this driver is active while trying to debug something else.
The warning referenced in 0036bc73ccbe represents something being unmapped which didn't match a corresponding map - from what I can make of get_pages()/put_pages() it looks like that would need msm_obj->flags or msm_obj->sgt to change during the lifetime of the object, neither of which sounds like a thing that should legitimately happen. Are you sure this isn't all just hiding a subtle bug elsewhere? After all, if what was being unmapped wasn't right, who says that what's now being synced is?
Correct, msm_obj->flags/sgt should not change.
I reverted the various patches, and went back to the original setup that used dma_{map,unmap}_sg() to reproduce the original issue that prompted the change in the first place. It is a pretty massive flood of splats, which pretty quickly overflowed the dmesg ring buffer, so I might be missing some things, but I'll poke around some more.
The one thing I wonder about, what would happen if the buffer is allocated and dma_map_sg() called before drm/msm attaches it's own iommu_domains, and then dma_unmap_sg() afterwards. We aren't actually ever using the iommu domain that DMA API is creating for the device, so all the extra iommu_map/unmap (and tlb flush) is at best unnecessary. But I'm not sure if it could be having some unintended side effects that cause this sort of problem.
it seems like every time (or at least every time we splat), we end up w/ iova=fffffffffffff000 .. which doesn't sound likely to be right. Although from just looking at the dma-iommu.c code, I'm not sure how this happens. And adding some printk's results in enough traces that I can't boot for some reason..
Ok, I see better what is going on.. at least on the kernel that I'm using on the yoga c630 laptop, where I have a patch[1] to skip domain attach. That results in to_smmu_domain(domain)->pgtbl_ops being null, so arm_smmu_map() fails. So we skip __finalise_sg() which sets the sg_dma_address(). Which causes the failure on unmap.
That said, I'm pretty sure I've seen (or had reported) a similar splat (although maybe not so frequent) on devices without that patch (where the bootloader isn't enabling scanout). I'll have to switch over to a different device that doesn't light up display from bootloader, so that I can drop that skip-domain-attach patch
All that said, this would be much easier if I could do the cache operations without all this unneeded iommu stuff. (Not to mention the unnecessary TLB flushes that I suspect are also happening.)
fwiw, with https://patchwork.freedesktop.org/series/63096/ we could go back to simply using dma_{map,unmap}_sg() in all cases, as the iommu dma_ops would no longer get in the way.
BR, -R