On Tue, Jun 22, 2010 at 3:59 PM, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
On Mon, 21 Jun 2010 17:19:43 -0400 Matt Turner <mattst88@gmail.com> wrote:
Michael Cree and I have been debugging FDO bug 26403 [1]. I tried booting with `radeon.test=1` and found this, which I think is related:
[drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
[drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
[snip]
[drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
[drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
pci_map_single failed: could not allocate dma page tables
[drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
[TTM] Couldn't bind backend.
radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
[drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
Error while testing BO move.
From what I can see, the call chain is:

- radeon_test_moves calls radeon_ttm_backend_bind through a callback
- radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
- radeon_gart.c:radeon_gart_bind calls pci_map_page
- pci_map_page on alpha is alpha_pci_map_page
- alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
- pci_map_single_1 calls iommu_arena_alloc
- iommu_arena_alloc calls iommu_arena_find_pages
- iommu_arena_find_pages fails (returns a negative value)
- iommu_arena_alloc in turn returns a negative value
- pci_map_single_1 prints the "could not allocate dma page tables" error and returns 0
- alpha_pci_map_page returns that 0 from pci_map_single_1
- radeon_gart_bind sees the failure and returns non-0; the error path prints "*ERROR* failed to bind 128 pages at 0x0FF02000"
This happens in the latest git, right?
Is this a regression (what kernel version worked)?
Seems that the IOMMU can't find 128 contiguous pages. It's likely one of:

- the IOMMU space is exhausted (possibly someone isn't freeing IOMMU
  space), or
- the mapping parameters (such as align) aren't appropriate, so the
  IOMMU can't find space.
I don't think KMS drivers have ever worked on alpha, so it's not a regression. They work fine on x86 and powerpc, and sparc has been run at least once.
I suspect we are simply hitting the limits of the IOMMU. How big an address space does it handle? Graphics drivers generally try to bind a lot of things to the GART.
It might be worth limiting the PCIGART in radeon to 32MB to see if a lower limit helps.
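For anyone wanting to try that experiment, the cap would look something like the fragment below. This is illustrative only; the exact place in the radeon init code where gtt_size is set may differ between kernel versions:

```c
/* In the radeon memory-controller setup, cap the GART aperture so
 * fewer IOMMU entries are ever needed (hypothetical placement). */
rdev->mc.gtt_size = 32 * 1024 * 1024;	/* 32MB instead of the default */
```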
Dave.
Is this the cause of the bug we're seeing in the report [1]?
Anyone know what's going wrong here?
I've attached a patch to print the debug info about the mapping parameters.
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index d1dbd9a..17cf0d8 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
 	/* Search for N empty ptes */
 	ptes = arena->ptes;
 	mask = max(align, arena->align_entry) - 1;
+
+	printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
+	       n, mask, align);
+
 	p = iommu_arena_find_pages(dev, arena, n, mask);
 	if (p < 0) {
 		spin_unlock_irqrestore(&arena->lock, flags);