On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
`BUG: unable to handle page fault for address: 00000000003a8290` and the fact that it crashed in `_raw_spin_lock_irqsave` suggest that memory (perhaps dev->dma_io_tlb_mem) was corrupted? dev->dma_io_tlb_mem should be set here (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/dri...) via device_initialize().
I'm less sure about this. 'dma_io_tlb_mem' should be pointing at 'io_tlb_default_mem', which is a page-aligned allocation from memblock. The spinlock is at offset 0x24 in that structure, and looking at the register dump from the crash:
Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
Jun 29 18:28:42 hp-4300G kernel: Call Trace:
Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
That is consistent with R11 holding the 'dma_io_tlb_mem' pointer and RDX pointing at the spinlock (R11 + 0x24). Yet RAX is holding junk, and CR2 shows that the fault was on that same value :/
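The register arithmetic behind that reading can be checked mechanically. Below is a small C sketch that encodes the values from the oops and the 0x24 lock offset mentioned above. The helper names are hypothetical, and 4 KiB pages plus the x86-64 convention that kernel-half addresses have bit 63 set are assumptions of the sketch, not anything taken from the swiotlb code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values copied verbatim from the oops quoted above. */
static const uint64_t oops_rax = 0x00000000003a8290ULL;
static const uint64_t oops_rdx = 0xffff89005653f024ULL;
static const uint64_t oops_r11 = 0xffff89005653f000ULL;
static const uint64_t oops_cr2 = 0x00000000003a8290ULL;

/* Assumption taken from the thread: the spinlock sits at offset 0x24
 * in the structure 'dma_io_tlb_mem' points at. Assumption: 4 KiB pages. */
#define LOCK_OFFSET  0x24ULL
#define PAGE_MASK_4K 0xfffULL

/* A page-aligned base has its low 12 bits clear. */
bool base_is_page_aligned(uint64_t base)
{
    return (base & PAGE_MASK_4K) == 0;
}

/* True if 'lock' is exactly LOCK_OFFSET bytes above 'base'. */
bool points_at_lock(uint64_t base, uint64_t lock)
{
    return lock - base == LOCK_OFFSET;
}

/* True if the faulting address (CR2) equals the given register value. */
bool fault_is_on(uint64_t reg, uint64_t cr2)
{
    return reg == cr2;
}

/* Assumption: on x86-64, kernel-half addresses (including the direct
 * map) have bit 63 set; low values such as RAX here do not. */
bool is_kernel_half(uint64_t addr)
{
    return (addr >> 63) != 0;
}
```

Fed the values from the dump, these helpers report that R11 is page-aligned, RDX sits exactly 0x24 bytes above it, CR2 equals RAX, and RAX is not a kernel-half address, which fits the reading that RAX holds junk rather than a stale kernel pointer.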
I agree that enabling KASAN would be a good idea, but I also think we probably need to get some more information out of swiotlb_tbl_map_single() to see what exactly is going wrong in there.
I can certainly enable KASAN; if there are any debug prints I can add or anything else I can dump, let me know!
I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built an x86 defconfig, and ran it on my laptop. However, it seems to work fine!
Please can you share your .config?
Will