On Thu, Sep 5, 2019 at 10:03 AM Rob Clark <robdclark@gmail.com> wrote:
On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy <robin.murphy@arm.com> wrote:
On 04/09/2019 01:12, Rob Clark wrote:
On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam <festevam@gmail.com> wrote:
Hi Jonathan,
On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek <jonathan@marek.ca> wrote:
Hi,
I tried this, and it works with patches 4 and 5 from Rob's series, plus changing gpummu to use sg_phys(sg) instead of sg->dma_address (dma_address is no longer set now that dma_map_sg() isn't used).
Thanks for testing it. I haven't had a chance to test it yet.
Rob,
I assume your series is targeted to 5.4, correct?
Maybe, although Christoph Hellwig didn't seem like a big fan of exposing cache ops and would rather add a new allocation API for uncached pages, so I'm not entirely sure what the way forward will be.
TBH, the use of map/unmap looked reasonable in the context of "start/stop using these pages for stuff which may include DMA", so even if it was cheekily ignoring sg->dma_address I'm not sure I'd really consider it "abuse". In comparison, using sync without a prior map unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG will be rendered useless with false positives if this driver is active while trying to debug something else.
The warning referenced in 0036bc73ccbe represents something being unmapped which didn't match a corresponding map - from what I can make of get_pages()/put_pages() it looks like that would need msm_obj->flags or msm_obj->sgt to change during the lifetime of the object, neither of which sounds like a thing that should legitimately happen. Are you sure this isn't all just hiding a subtle bug elsewhere? After all, if what was being unmapped wasn't right, who says that what's now being synced is?
Correct, msm_obj->flags/sgt should not change.
I reverted the various patches and went back to the original setup that used dma_{map,unmap}_sg() to reproduce the original issue that prompted the change in the first place. It is a massive flood of splats, which quickly overflowed the dmesg ring buffer, so I might be missing some things, but I'll poke around some more.
The one thing I wonder about: what would happen if the buffer is allocated and dma_map_sg() called before drm/msm attaches its own iommu_domains, and dma_unmap_sg() is called afterwards? We aren't ever actually using the iommu domain that the DMA API creates for the device, so all the extra iommu_map/unmap (and TLB flush) is at best unnecessary. But I'm not sure if it could be having some unintended side effects that cause this sort of problem.
It seems like every time (or at least every time we splat), we end up with iova=fffffffffffff000, which doesn't seem likely to be right, although from just looking at the dma-iommu.c code I'm not sure how this happens. And adding some printk's results in enough traces that I can't boot, for some reason.
BR, -R