Hi,
I am getting the following crash when booting the adreno driver on i.MX53 running a 5.3-rc6 kernel.
This error does not occur with 5.2.
Before I start running a bisect, I am wondering if anyone has any ideas about this issue.
Thanks,
Fabio Estevam
[ 2.083249] 8<--- cut here ---
[ 2.086460] Unable to handle kernel paging request at virtual address 50001000
[ 2.094174] pgd = (ptrval)
[ 2.096911] [50001000] *pgd=00000000
[ 2.100606] Internal error: Oops: 805 [#1] SMP ARM
[ 2.105412] Modules linked in:
[ 2.108487] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc6-00271-g9f159ae07f07 #4
[ 2.116411] Hardware name: Freescale i.MX53 (Device Tree Support)
[ 2.122538] PC is at v7_dma_clean_range+0x20/0x38
[ 2.127254] LR is at __dma_page_cpu_to_dev+0x28/0x90
[ 2.132226] pc : [<c011c76c>] lr : [<c01181c4>] psr: 20000013
[ 2.138500] sp : d80b5a88 ip : de96c000 fp : d840ce6c
[ 2.143732] r10: 00000000 r9 : 00000001 r8 : d843e010
[ 2.148964] r7 : 00000000 r6 : 00008000 r5 : ddb6c000 r4 : 00000000
[ 2.155500] r3 : 0000003f r2 : 00000040 r1 : 50008000 r0 : 50001000
[ 2.162037] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 2.169180] Control: 10c5387d Table: 70004019 DAC: 00000051
[ 2.174934] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[ 2.180949] Stack: (0xd80b5a88 to 0xd80b6000)
[ 2.185319] 5a80: c011c7bc d8491780 d840ce6c d849b380 00000000 c011822c
[ 2.193509] 5aa0: c0d01a18 c0118abc c0118a78 d84a0200 00000008 c1308908 d838e800 d849a4a8
[ 2.201697] 5ac0: d8491780 c06699b4 ffffffff ffffffff 00000000 d8491600 d80b5b20 d84a0200
[ 2.209886] 5ae0: d8491780 d8491600 d80b5b20 d8491600 d849a4a8 d84a0200 00000003 d84a0358
[ 2.218077] 5b00: c1308908 d8491600 d849a4a8 d8491780 d840ce6c c066a55c c1308908 c066a104
[ 2.226266] 5b20: 01001000 00000000 d84a0200 10700ac6 d849a480 d84a0200 00000000 d8491600
[ 2.234455] 5b40: 00000000 e0845000 c1308908 c066a72c d849a480 d840ce6c d840ce00 c1308908
[ 2.242643] 5b60: 00000000 c066b584 d849a488 d849a4a8 00000000 c1308908 d840ce6c c066ff40
[ 2.250832] 5b80: d849a488 d849a4a8 00000000 c1308908 00000000 d81b4000 00000000 e0845000
[ 2.259021] 5ba0: d838e800 c1308908 d8491600 10700ac6 d80b5bc8 d840ce00 d840ce6c 00000001
[ 2.267210] 5bc0: 00000000 e0845000 d838e800 c066ece4 01000000 00000000 10ff0000 00000000
[ 2.275399] 5be0: c1308908 00000001 d81b4000 00000000 01000000 00000000 00000001 10700ac6
[ 2.283587] 5c00: c0d6d564 d840ce00 d81b4010 00000001 d81b4000 c0d6d564 c1308908 d80b5c48
[ 2.291777] 5c20: d838e800 c061f9cc c1029dec d80b5c48 d838e800 00000000 00000000 c13e8788
[ 2.299965] 5c40: ffffffff c1308928 c102a234 00000000 01000000 00000000 10ff0000 00000000
[ 2.308154] 5c60: 00000001 00000000 a0000013 10700ac6 c13b7658 d840ce00 d838e800 d81b4000
[ 2.316343] 5c80: d840ce00 c1308908 00000002 d838f800 00000000 c0620514 00000001 10700ac6
[ 2.324531] 5ca0: d8496440 00000000 d81b4010 c1aa1c00 d838e800 c061e070 00000000 00000000
[ 2.332720] 5cc0: 00000000 c0d6c534 df56cf34 000000c8 00000000 10700ac6 d81b4010 00000000
[ 2.340909] 5ce0: 00000000 d8496440 d838e800 c103acd0 d8496280 00000000 c1380488 c06a3e10
[ 2.349097] 5d00: 00000000 00000000 ffffffff d838f800 d838e800 d843e010 d8496440 c1308908
[ 2.357286] 5d20: 00000000 d83f9640 c1380488 c0668554 00000006 00000007 c13804d4 d83f9640
[ 2.365475] 5d40: c1380488 c017ec18 d80c0000 c0c43e40 d843e010 d8496440 00000001 c0182a94
[ 2.373665] 5d60: 60000013 10700ac6 d843e010 d8496280 d8496400 00000018 d8496440 00000001
[ 2.381854] 5d80: c13804d4 d83f9640 c1380488 c06a4280 c1380488 00000000 c0d764f8 d8496440
[ 2.390044] 5da0: c1380488 d843e010 c0d764f8 c1308908 00000000 00000000 c13ef300 c06a44f0
[ 2.398232] 5dc0: c0d8a0dc dffcc6f0 d843e010 dffcc6f0 00000000 d843e010 00000000 c06680b8
[ 2.406421] 5de0: d84988c0 d83f9640 d84988c0 d84989a0 d8498230 10700ac6 00000001 d843e010
[ 2.414610] 5e00: 00000000 c137eec0 00000000 c137eec0 00000000 00000000 c13ef300 c06ac1a0
[ 2.422799] 5e20: d843e010 c1aa40dc c1aa40e0 00000000 c137eec0 c06aa014 d843e010 c137eec0
[ 2.430988] 5e40: c137eec0 c1308908 c13e9880 c13e85d4 00000000 c06aa368 c1308908 c13e9880
[ 2.439178] 5e60: c13e85d4 d843e010 00000000 c137eec0 c1308908 c13e9880 c13e85d4 c06aa618
[ 2.447367] 5e80: 00000000 c137eec0 d843e010 c06aa6a4 00000000 c137eec0 c06aa620 c06a844c
[ 2.455556] 5ea0: d80888d4 d80888a4 d84914d0 10700ac6 d80888d4 c137eec0 d8494f00 c1380d28
[ 2.463745] 5ec0: 00000000 c06a946c c105f3d4 c1308908 00000000 c137eec0 c1308908 00000000
[ 2.471934] 5ee0: c125fdd0 c06ab304 c1308928 c1308908 00000000 c0103178 00000109 00000000
[ 2.480123] 5f00: dffffc6e dffffc00 c1126860 00000109 00000109 c014dc88 c11253ac c10607a0
[ 2.488312] 5f20: 00000000 00000006 00000006 00000000 c12adeec dffffc6e 00000000 10700ac6
[ 2.496501] 5f40: c1308f18 10700ac6 00000007 c13e9880 c13ef300 c1294850 c1308928 c12ae4c4
[ 2.504690] 5f60: 00000000 c12011f8 00000006 00000006 00000000 c120066c 00000000 00000109
[ 2.512878] 5f80: 00000000 00000000 c0c3bb28 00000000 00000000 00000000 00000000 00000000
[ 2.521066] 5fa0: 00000000 c0c3bb30 00000000 c01010b4 00000000 00000000 00000000 00000000
[ 2.529255] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.537443] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 2.545640] [<c011c76c>] (v7_dma_clean_range) from [<c01181c4>] (__dma_page_cpu_to_dev+0x28/0x90)
[ 2.554526] [<c01181c4>] (__dma_page_cpu_to_dev) from [<c0118abc>] (arm_dma_sync_sg_for_device+0x44/0x64)
[ 2.564121] [<c0118abc>] (arm_dma_sync_sg_for_device) from [<c06699b4>] (get_pages+0x1ac/0x214)
[ 2.572834] [<c06699b4>] (get_pages) from [<c066a55c>] (msm_gem_get_and_pin_iova+0xb0/0x13c)
[ 2.581284] [<c066a55c>] (msm_gem_get_and_pin_iova) from [<c066a72c>] (_msm_gem_kernel_new+0x38/0xa8)
[ 2.590515] [<c066a72c>] (_msm_gem_kernel_new) from [<c066b584>] (msm_gem_kernel_new+0x24/0x2c)
[ 2.599230] [<c066b584>] (msm_gem_kernel_new) from [<c066ff40>] (msm_ringbuffer_new+0x68/0x140)
[ 2.607940] [<c066ff40>] (msm_ringbuffer_new) from [<c066ece4>] (msm_gpu_init+0x430/0x5fc)
[ 2.616220] [<c066ece4>] (msm_gpu_init) from [<c061f9cc>] (adreno_gpu_init+0x16c/0x298)
[ 2.624236] [<c061f9cc>] (adreno_gpu_init) from [<c0620514>] (a2xx_gpu_init+0x84/0x104)
[ 2.632252] [<c0620514>] (a2xx_gpu_init) from [<c061e070>] (adreno_bind+0x190/0x274)
[ 2.640018] [<c061e070>] (adreno_bind) from [<c06a3e10>] (component_bind_all+0xe8/0x22c)
[ 2.648124] [<c06a3e10>] (component_bind_all) from [<c0668554>] (msm_drm_bind+0xf4/0x610)
[ 2.656315] [<c0668554>] (msm_drm_bind) from [<c06a4280>] (try_to_bring_up_master+0x158/0x198)
[ 2.664940] [<c06a4280>] (try_to_bring_up_master) from [<c06a44f0>] (component_master_add_with_match+0xb8/0xf8)
[ 2.675042] [<c06a44f0>] (component_master_add_with_match) from [<c06680b8>] (msm_pdev_probe+0x214/0x28c)
[ 2.684630] [<c06680b8>] (msm_pdev_probe) from [<c06ac1a0>] (platform_drv_probe+0x48/0x98)
[ 2.692908] [<c06ac1a0>] (platform_drv_probe) from [<c06aa014>] (really_probe+0xec/0x2cc)
[ 2.701099] [<c06aa014>] (really_probe) from [<c06aa368>] (driver_probe_device+0x5c/0x164)
[ 2.709376] [<c06aa368>] (driver_probe_device) from [<c06aa618>] (device_driver_attach+0x58/0x60)
[ 2.718259] [<c06aa618>] (device_driver_attach) from [<c06aa6a4>] (__driver_attach+0x84/0xc0)
[ 2.726796] [<c06aa6a4>] (__driver_attach) from [<c06a844c>] (bus_for_each_dev+0x70/0xb4)
[ 2.734985] [<c06a844c>] (bus_for_each_dev) from [<c06a946c>] (bus_add_driver+0x154/0x1e0)
[ 2.743262] [<c06a946c>] (bus_add_driver) from [<c06ab304>] (driver_register+0x74/0x108)
[ 2.751369] [<c06ab304>] (driver_register) from [<c0103178>] (do_one_initcall+0x80/0x32c)
[ 2.759560] [<c0103178>] (do_one_initcall) from [<c12011f8>] (kernel_init_freeable+0x2e4/0x3c8)
[ 2.768278] [<c12011f8>] (kernel_init_freeable) from [<c0c3bb30>] (kernel_init+0x8/0x114)
[ 2.776469] [<c0c3bb30>] (kernel_init) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[ 2.784046] Exception stack(0xd80b5fb0 to 0xd80b5ff8)
[ 2.789107] 5fa0: 00000000 00000000 00000000 00000000
[ 2.797295] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.805482] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 2.812111] Code: e1a02312 e2423001 e1c00003 e320f000 (ee070f3a)
[ 2.818319] ---[ end trace cdc18b3504e6a4f8 ]---
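For readers decoding the header of the oops, the value in "Oops: 805" is the ARM fault status register. The following is a minimal sketch of how its fields can be pulled apart, assuming the ARMv7 short-descriptor layout (status in bits 3:0 plus bit 10, write flag in bit 11). The helper names are hypothetical and only the two translation-fault codes are decoded.

```c
#include <stdint.h>

/* Field layout of the ARMv7 short-descriptor fault status register. */
#define FSR_FS3_0  0x0fu        /* fault status, low four bits */
#define FSR_FS4    (1u << 10)   /* fault status, fifth bit */
#define FSR_WRITE  (1u << 11)   /* set when the access was reported as a write */

/* Hypothetical helper: assemble the 5-bit fault status code. */
static unsigned fsr_fault_status(uint32_t fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Hypothetical helper: non-zero if the faulting access was a write. */
static int fsr_is_write(uint32_t fsr)
{
    return (fsr & FSR_WRITE) != 0;
}

/* Hypothetical helper: name the fault; only two codes are covered here. */
static const char *fsr_describe(uint32_t fsr)
{
    switch (fsr_fault_status(fsr)) {
    case 0x05: return "section translation fault";
    case 0x07: return "page translation fault";
    default:   return "other (not decoded in this sketch)";
    }
}
```

Under these assumptions, 0x805 decodes to a first-level (section) translation fault on an access reported as a write, which is consistent with the "*pgd=00000000" line in the log.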
Hi Robin,
On Mon, Sep 2, 2019 at 11:45 AM Robin Murphy <robin.murphy@arm.com> wrote:
Try 0036bc73ccbe - that looks like something that CONFIG_DMA_API_DEBUG should have been screaming about anyway.
Thanks for your suggestion.
I can successfully boot after reverting the following commits:
commit 141db5703c887f46957615cd6616ca28fe4691e0 (HEAD)
Author: Fabio Estevam <festevam@gmail.com>
Date:   Mon Sep 2 14:58:18 2019 -0300

    Revert "drm/msm: stop abusing dma_map/unmap for cache"

    This reverts commit 0036bc73ccbe7e600a3468bf8e8879b122252274.

commit fa5b1f620f2984c254877d6049214c39c24c8207
Author: Fabio Estevam <festevam@gmail.com>
Date:   Mon Sep 2 14:56:01 2019 -0300

    Revert "drm/msm: Use the correct dma_sync calls in msm_gem"

    This reverts commit 3de433c5b38af49a5fc7602721e2ab5d39f1e69c.
Rob,
What would be the recommended approach for fixing this?
Thanks
On Mon, Sep 2, 2019 at 11:03 AM Fabio Estevam <festevam@gmail.com> wrote:
We need a direct way to handle cache, so we can stop trying to trick DMA API into doing what we want.
Something like this is what I had in mind:
https://patchwork.freedesktop.org/series/65211/
I guess I could respin that. I'm not really sure of any other way to have things working on the different combinations of archs and dma_ops that we have. Lately fixing one has been breaking another.
BR, -R
Hi Jonathan,
On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek <jonathan@marek.ca> wrote:
Thanks for testing it. I haven't had a chance to test it yet.
Rob,
I assume your series is targeted at 5.4, correct?
If this is the case, what should we do about the i.MX5 regression in 5.3?
Would a revert of the two commits be acceptable in 5.3 in order to avoid the regression?
Please advise.
Thanks
On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam <festevam@gmail.com> wrote:
Maybe, although Christoph Hellwig didn't seem like a big fan of exposing cache ops, and would rather add a new allocation API for uncached pages, so I'm not entirely sure what the way forward will be.
In the meantime, it is a bit ugly, but I guess something like this should work:
--------------------
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 7263f4373f07..5a6a79fbc9d6 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -52,7 +52,7 @@ static void sync_for_device(struct msm_gem_object *msm_obj)
 {
 	struct device *dev = msm_obj->base.dev->dev;
 
-	if (get_dma_ops(dev)) {
+	if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
 		dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
 			msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 	} else {
@@ -65,7 +65,7 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj)
 {
 	struct device *dev = msm_obj->base.dev->dev;
 
-	if (get_dma_ops(dev)) {
+	if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
 		dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
 			msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 	} else {
--------------------
BR, -R
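The effect of the extra IS_ENABLED(CONFIG_ARM64) condition in the patch can be summarised with a small userspace model. This is only a sketch: the enum and function names are hypothetical, and because the else branch is elided in the quoted diff, the model simply calls it the "fallback" path (presumably the older dma_map_sg()/dma_unmap_sg() approach mentioned elsewhere in the thread).

```c
#include <stdbool.h>

/* Hypothetical labels for the two branches in sync_for_device()/sync_for_cpu(). */
enum sync_path {
    SYNC_PATH_DMA_SYNC,  /* dma_sync_sg_for_device() / dma_sync_sg_for_cpu() */
    SYNC_PATH_FALLBACK   /* the else branch, not shown in the quoted diff */
};

/*
 * Model of the patched condition:
 *   get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)
 * has_dma_ops stands in for a non-NULL get_dma_ops(dev),
 * is_arm64 for the compile-time CONFIG_ARM64 check.
 */
static enum sync_path choose_sync_path(bool has_dma_ops, bool is_arm64)
{
    return (has_dma_ops && is_arm64) ? SYNC_PATH_DMA_SYNC : SYNC_PATH_FALLBACK;
}
```

In this model a 32-bit ARM device that has dma_ops, such as the i.MX53 in the backtrace where arm_dma_sync_sg_for_device() was reached, moves from the dma_sync path to the fallback path, while arm64 devices with dma_ops keep the dma_sync path.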
Hi Rob,
On Tue, Sep 3, 2019 at 9:12 PM Rob Clark <robdclark@gmail.com> wrote:
In the mean time, it is a bit ugly, but I guess something like this should work:
Yes, this works on an i.MX53 board, thanks:
Tested-by: Fabio Estevam <festevam@gmail.com>
Is this something you could submit for 5.3?
Thanks
On 04/09/2019 01:12, Rob Clark wrote:
TBH, the use of map/unmap looked reasonable in the context of "start/stop using these pages for stuff which may include DMA", so even if it was cheekily ignoring sg->dma_address I'm not sure I'd really consider it "abuse" - in comparison, using sync without a prior map unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG will be rendered useless with false positives if this driver is active while trying to debug something else.
The warning referenced in 0036bc73ccbe represents something being unmapped which didn't match a corresponding map - from what I can make of get_pages()/put_pages() it looks like that would need msm_obj->flags or msm_obj->sgt to change during the lifetime of the object, neither of which sounds like a thing that should legitimately happen. Are you sure this isn't all just hiding a subtle bug elsewhere? After all, if what was being unmapped wasn't right, who says that what's now being synced is?
Robin.
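The rule Robin describes (a sync is only valid on a buffer that has a live mapping, and an unmap must match an earlier map) can be illustrated with a toy userspace model of that kind of bookkeeping. This is a sketch under loud assumptions: it is not the CONFIG_DMA_API_DEBUG implementation, every name in it is hypothetical, and buffers are identified only by pointer.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 16

/* Toy tracker: remembers which buffers currently have a live mapping. */
struct dma_debug_model {
    const void *mapped[MAX_TRACKED];
    size_t count;
    unsigned violations;  /* number of API misuses seen so far */
};

/* Record a mapping; returns false only if the toy table is full. */
static bool model_map(struct dma_debug_model *m, const void *sgl)
{
    if (m->count == MAX_TRACKED)
        return false;
    m->mapped[m->count++] = sgl;
    return true;
}

/* A sync is valid only if the buffer is currently mapped. */
static bool model_sync(struct dma_debug_model *m, const void *sgl)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->mapped[i] == sgl)
            return true;
    m->violations++;  /* sync without a prior map */
    return false;
}

/* An unmap must match an earlier map of the same buffer. */
static bool model_unmap(struct dma_debug_model *m, const void *sgl)
{
    for (size_t i = 0; i < m->count; i++) {
        if (m->mapped[i] == sgl) {
            m->mapped[i] = m->mapped[--m->count];
            return true;
        }
    }
    m->violations++;  /* unmap of something that was never mapped */
    return false;
}
```

In this model, calling model_sync() on a buffer before model_map() counts as a violation every time, which mirrors the concern that a driver syncing unmapped buffers would drown a debug facility in reports.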
On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy <robin.murphy@arm.com> wrote:
Correct, msm_obj->flags/sgt should not change.
I reverted the various patches, and went back to the original setup that used dma_{map,unmap}_sg() to reproduce the original issue that prompted the change in the first place. It is a pretty massive flood of splats, which pretty quickly overflowed the dmesg ring buffer, so I might be missing some things, but I'll poke around some more.
The one thing I wonder about is what would happen if the buffer is allocated and dma_map_sg() is called before drm/msm attaches its own iommu_domains, and dma_unmap_sg() is then called afterwards. We aren't actually ever using the iommu domain that DMA API is creating for the device, so all the extra iommu_map/unmap (and tlb flush) is at best unnecessary. But I'm not sure if it could be having some unintended side effects that cause this sort of problem.
BR, -R
On Thu, Sep 5, 2019 at 10:03 AM Rob Clark <robdclark@gmail.com> wrote:
It seems like every time (or at least every time we splat), we end up with iova=fffffffffffff000, which doesn't sound likely to be right. Although from just looking at the dma-iommu.c code, I'm not sure how this happens. And adding some printks results in enough traces that I can't boot for some reason.
BR, -R
On Thu, Sep 5, 2019 at 12:05 PM Rob Clark <robdclark@gmail.com> wrote:
Ok, I see better what is going on, at least on the kernel that I'm using on the yoga c630 laptop, where I have a patch[1] to skip domain attach. That results in to_smmu_domain(domain)->pgtbl_ops being null, so arm_smmu_map() fails. So we skip __finalise_sg(), which sets sg_dma_address(), and that causes the failure on unmap.
That said, I'm pretty sure I've seen (or had reported) a similar splat (although maybe not so frequent) on devices without that patch (where the bootloader isn't enabling scanout). I'll have to switch over to a different device that doesn't light up display from bootloader, so that I can drop that skip-domain-attach patch.
All that said, this would be much easier if I could do the cache operations without all this unneeded iommu stuff. (Not to mention the unnecessary TLB flushes that I suspect are also happening.)
[1] https://patchwork.kernel.org/patch/11038793/
BR, -R
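The failure chain described above, together with the iova=fffffffffffff000 value seen earlier in the thread, can be walked through with a small userspace model. It is a sketch under one explicit assumption that the thread does not confirm: that after the failed map the scatterlist entry still holds the all-ones DMA_MAPPING_ERROR sentinel, and that the unmap path rounds the address down to a 4 KiB page. All names prefixed with model_ or MODEL_ are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the sentinel left in the entry is all-ones, like DMA_MAPPING_ERROR. */
#define MODEL_DMA_MAPPING_ERROR (~(uint64_t)0)
#define MODEL_PAGE_SIZE         UINT64_C(4096)

/* Stand-in for one scatterlist entry; only the DMA address matters here. */
struct model_sg {
    uint64_t dma_address;
};

/* Stand-in for the IOMMU map step: it fails when there are no page-table ops. */
static bool model_iommu_map(bool have_pgtbl_ops)
{
    return have_pgtbl_ops;
}

/*
 * Model of the map path: if the IOMMU map fails, the step that would
 * store a real address in the entry is skipped.
 */
static bool model_map_sg(struct model_sg *sg, bool have_pgtbl_ops, uint64_t iova)
{
    sg->dma_address = MODEL_DMA_MAPPING_ERROR;
    if (!model_iommu_map(have_pgtbl_ops))
        return false;          /* finalise step never runs */
    sg->dma_address = iova;    /* what the finalise step would have stored */
    return true;
}

/* Model of the unmap path: page-align whatever address the entry holds. */
static uint64_t model_unmap_iova(const struct model_sg *sg)
{
    return sg->dma_address & ~(MODEL_PAGE_SIZE - 1);
}
```

Under these assumptions an unmap after a failed map operates on 0xfffffffffffff000, which matches the bogus value reported in the thread; whether this is the actual mechanism on real hardware is not established here.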
On Thu, Sep 5, 2019 at 3:30 PM Rob Clark <robdclark@gmail.com> wrote:
fwiw, with https://patchwork.freedesktop.org/series/63096/ we could go back to simply using dma_{map,unmap}_sg() in all cases, as the iommu dma_ops would no longer get in the way.
BR, -R
On 05/09/2019 20:05, Rob Clark wrote:
Right, one of the semi-intentional side-effects of 43c5bf11a610 is that iommu-dma no longer interferes with unmanaged domains - it will still go and make its own redundant mappings in the unattached default domain, but as long as the DMA API usage is fundamentally sound then it shouldn't actually get in the way.
Yeah, that's a bogus IOVA for sure, so regardless of how we actually make Adreno happy it would still be interesting to figure out how it came about. Do you see any WARNs from io-pgtable-arm before the one from __iommu_dma_unmap()?
Robin.
dri-devel@lists.freedesktop.org