Hi Claire,
On Mon, Jul 05, 2021 at 03:29:34PM +0800, Claire Chang wrote:
Looking at the logs, the use-after-free bug looked somehow relevant (and it's nvme again. Qian's crash is about nvme too):
[ 2.468288] BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.468288] Read of size 8 at addr ffff8881d7830000 by task swapper/0/0
[ 2.468288] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1
[ 2.468288] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
[ 2.468288] Call Trace:
[ 2.468288] <IRQ>
[ 2.479433] dump_stack+0x9c/0xcf
[ 2.479433] print_address_description.constprop.0+0x18/0x130
[ 2.479433] ? __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.479433] kasan_report.cold+0x7f/0x111
[ 2.479433] ? __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.479433] __iommu_dma_unmap_swiotlb+0x64/0xb0
[ 2.479433] nvme_pci_complete_rq+0x73/0x130
[ 2.479433] blk_complete_reqs+0x6f/0x80
[ 2.479433] __do_softirq+0xfc/0x3be
[ 2.479433] irq_exit_rcu+0xce/0x120
[ 2.479433] common_interrupt+0x80/0xa0
[ 2.479433] </IRQ>
[ 2.479433] asm_common_interrupt+0x1e/0x40
[ 2.479433] RIP: 0010:cpuidle_enter_state+0xf9/0x590
I wonder if this ended up unmapping something wrong and messing up the dev->dma_io_tlb_mem (i.e. io_tlb_default_mem)?
Could you try this patch on top of 7d31f1c65cc9? This patch helps check if we try to unmap the wrong address.
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b7f76bca89bf..5ac08d50a394 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -613,6 +613,21 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
 			      size_t mapping_size, enum dma_data_direction dir,
 			      unsigned long attrs)
 {
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+	unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
+	int index;
+
+	if (!is_swiotlb_buffer(dev, tlb_addr - offset)) {
+		dev_err(dev, "%s: attempt to unmap invalid address (0x%llx, offset=%u)\n", __func__, tlb_addr, offset);
+		return;
+	}
+
+	index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
+	if (mem->slots[index].orig_addr == INVALID_PHYS_ADDR) {
+		dev_err(dev, "%s: memory is not mapped before (0x%llx, offset=%u)\n", __func__, tlb_addr, offset);
+		return;
+	}
+
 	/*
 	 * First, sync the memory before unmapping the entry
 	 */
It might be useful to have CONFIG_SLUB_DEBUG=y, CONFIG_SLUB_DEBUG_ON=y and line numbers (scripts/decode_stacktrace.sh) too.
Thank you so much for helping!
Please find attached the logs, both decoded and undecoded, from a kernel built with CONFIG_KASAN=y and CONFIG_SLUB_DEBUG_ON=y, with the requested patch applied on top of 7d31f1c65cc9.
If there is any further information I can provide, please let me know!
Cheers,
Nathan