Greetings,
Kernel bound workloads seem to trigger the below for whatever reason. I only see this when beating up NFS. There was a kworker wakeup latency issue, but with a bandaid applied to fix that up, I can still trigger this.
[ 1313.811031] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 1313.811035] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
[ 1313.811038] CPU: 6 PID: 3026 Comm: Xorg Tainted: G E 4.15.0.g1291a0d5-master #355
[ 1313.811040] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 1313.811041] Call Trace:
[ 1313.811049] dump_stack+0x7c/0xb6
[ 1313.811053] swiotlb_alloc_coherent+0x13f/0x150
[ 1313.811060] ttm_dma_pool_alloc_new_pages+0x106/0x3c0 [ttm]
[ 1313.811066] ttm_dma_pool_get_pages+0x10a/0x1e0 [ttm]
[ 1313.811070] ttm_dma_populate+0x21f/0x2f0 [ttm]
[ 1313.811075] ttm_tt_bind+0x2f/0x60 [ttm]
[ 1313.811079] ttm_bo_handle_move_mem+0x51f/0x580 [ttm]
[ 1313.811084] ? ttm_bo_handle_move_mem+0x5/0x580 [ttm]
[ 1313.811088] ttm_bo_validate+0x10c/0x120 [ttm]
[ 1313.811092] ? ttm_bo_validate+0x5/0x120 [ttm]
[ 1313.811106] ? drm_mode_setcrtc+0x20e/0x540 [drm]
[ 1313.811109] ttm_bo_init_reserved+0x290/0x490 [ttm]
[ 1313.811114] ttm_bo_init+0x52/0xb0 [ttm]
[ 1313.811141] ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811163] nouveau_bo_new+0x465/0x5e0 [nouveau]
[ 1313.811184] ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811203] nouveau_gem_new+0x66/0x110 [nouveau]
[ 1313.811223] ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811241] nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
[ 1313.811249] drm_ioctl_kernel+0x64/0xb0 [drm]
[ 1313.811257] drm_ioctl+0x2a4/0x360 [drm]
[ 1313.811276] ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811285] ? drm_ioctl+0x5/0x360 [drm]
[ 1313.811304] nouveau_drm_ioctl+0x50/0xb0 [nouveau]
[ 1313.811308] do_vfs_ioctl+0x90/0x690
[ 1313.811311] ? do_vfs_ioctl+0x5/0x690
[ 1313.811313] SyS_ioctl+0x3b/0x70
[ 1313.811316] entry_SYSCALL_64_fastpath+0x1f/0x91
[ 1313.811320] RIP: 0033:0x7f3234746227
[ 1313.811321] RSP: 002b:00007ffc3ace0408 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 1313.811324] RAX: ffffffffffffffda RBX: 00000000025515d0 RCX: 00007f3234746227
[ 1313.811325] RDX: 00007ffc3ace0460 RSI: 00000000c0306480 RDI: 000000000000000b
[ 1313.811326] RBP: 0000000000824120 R08: 0000000002548f80 R09: 00000000025490d0
[ 1313.811328] R10: 0000000000000000 R11: 0000000000003246 R12: 000000000000093d
[ 1313.811329] R13: 0000000002aff74c R14: 0000000000824150 R15: 0000000000000000
On 12/18/17 7:06 PM, Mike Galbraith wrote:
> Greetings,
> Kernel bound workloads seem to trigger the below for whatever reason. I only see this when beating up NFS. There was a kworker wakeup latency issue, but with a bandaid applied to fix that up, I can still trigger this.
Hi,
I have seen this one as well on my system, but I could not find an easy way to trigger it for bisecting. If you can trigger it conveniently, a bisect would be nice!
Greetings,
Tobias
On Mon, 2017-12-18 at 20:01 +0100, Tobias Klausmann wrote:
> Hi,
> I have seen this one as well on my system, but I could not find an easy way to trigger it for bisecting. If you can trigger it conveniently, a bisect would be nice!
Workload permitting. To reproduce, mount your box via NFS, cd to somewhere in the NFS mount, and just run bonnie -s <memory size>. There, maybe you'll beat me to it. I hope so; I have multiple kernels doing the annoying "baby birds in a nest" thing at me literally endlessly :)
-Mike
On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
> Hi,
> I have seen this one as well on my system, but I could not find an easy way to trigger it for bisecting. If you can trigger it conveniently, a bisect would be nice!
I'm seeing this (with the amdgpu and radeon drivers) when restic takes a backup, creating memory pressure. I happen to have just finished bisecting, the result is:
648bc3574716400acc06f99915815f80d9563783 is the first bad commit
commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König christian.koenig@amd.com
Date: Thu Jul 6 09:59:43 2017 +0200
drm/ttm: add transparent huge page support for DMA allocations v2
Try to allocate huge pages when it makes sense.
v2: fix comment and use ifdef
On 2017-12-19 11:37 AM, Michel Dänzer wrote:
> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a backup, creating memory pressure. I happen to have just finished bisecting, the result is:
> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
> commit 648bc3574716400acc06f99915815f80d9563783
> Author: Christian König christian.koenig@amd.com
> Date: Thu Jul 6 09:59:43 2017 +0200
> drm/ttm: add transparent huge page support for DMA allocations v2
> Try to allocate huge pages when it makes sense.
> v2: fix comment and use ifdef
BTW, I haven't noticed any bad effects other than the dmesg splats, so maybe it's just noise about transient failures for which there is a proper fallback in place.
On 19.12.2017 at 11:39, Michel Dänzer wrote:
> BTW, I haven't noticed any bad effects other than the dmesg splats, so maybe it's just noise about transient failures for which there is a proper fallback in place.
Yeah, I think that is exactly what happens here.
We try to allocate a huge page, but fail and so fall back to using multiple 4k pages instead.
Going to send out a patch to suppress the warning.
Thanks for bisecting this,
Christian.
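The fallback described in this reply can be sketched in userspace C. This is a toy model and not the TTM or swiotlb code: `demo_pool`, `demo_alloc_huge`, `demo_populate_2mib` and the `quiet` flag are hypothetical names invented for illustration. It only shows why a failed 2097152-byte attempt is harmless when a path using 2097152 / 4096 = 512 small pages exists, and why a caller that has such a fallback might reasonably ask the allocator not to print anything on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BASE_PAGE_SIZE 4096u      /* 4 KiB page */
#define HUGE_PAGE_SIZE 2097152u   /* 2 MiB, the size reported in the log */

/* Hypothetical pool: tracks how many contiguous 2 MiB runs are still free. */
struct demo_pool {
    unsigned huge_runs_free;
    unsigned warnings;            /* number of failure messages printed */
};

/* Try to take one contiguous huge run. On failure, print a message unless
 * the caller passed quiet=true because it has a fallback of its own. */
static bool demo_alloc_huge(struct demo_pool *pool, bool quiet)
{
    if (pool->huge_runs_free == 0) {
        if (!quiet) {
            fprintf(stderr, "demo: buffer is full (sz: %u bytes)\n",
                    HUGE_PAGE_SIZE);
            pool->warnings++;
        }
        return false;
    }
    pool->huge_runs_free--;
    return true;
}

/* Back 2 MiB of memory: one huge page if possible, otherwise fall back to
 * base pages. Returns the number of separate pages used. */
static unsigned demo_populate_2mib(struct demo_pool *pool, bool quiet)
{
    assert(pool != NULL);
    if (demo_alloc_huge(pool, quiet))
        return 1;
    return HUGE_PAGE_SIZE / BASE_PAGE_SIZE;   /* fallback: 512 pages */
}
```

With one free huge run, the first call returns 1 and every later call returns 512. Whether a message is printed on those later calls depends only on `quiet`, which mirrors the idea of suppressing the warning for an allocation that is allowed to fail.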
On Tue, Dec 19, 2017 at 8:45 AM, Christian König ckoenig.leichtzumerken@gmail.com wrote:
> Yeah, I think that is exactly what happens here.
> We try to allocate a huge page, but fail and so fall back to using multiple 4k pages instead.
> Going to send out a patch to suppress the warning.
Hi Christian,
Did you ever send out such a patch? I didn't see one on the list, but perhaps I missed it. One definitely hasn't made it upstream yet. (I just hit the issue myself with Linus's tree from last night.)
Thanks,
-ilia
On Sun, 2017-12-31 at 13:27 -0500, Ilia Mirkin wrote:
> Hi Christian,
> Did you ever send out such a patch? I didn't see one on the list, but perhaps I missed it. One definitely hasn't made it upstream yet. (I just hit the issue myself with Linus's tree from last night.)
Actually, that wants a bit more methinks, because while the stack dump goes away, you still get spammed, it just comes in smaller chunks.
-Mike
On Sun, Dec 31, 2017 at 3:53 PM, Mike Galbraith efault@gmx.de wrote:
> Actually, that wants a bit more methinks, because while the stack dump goes away, you still get spammed, it just comes in smaller chunks.
OK, well this has to either be fixed or reverted. Right now it's complaining all the time for me after like a day of uptime.
-ilia
On 01.01.2018 at 19:08, Ilia Mirkin wrote:
> OK, well this has to either be fixed or reverted. Right now it's complaining all the time for me after like a day of uptime.
I've already sent out a patch to Konrad Rzeszutek Wilk and he wanted to queue that up.
But there is another warning I'm currently working on; I just didn't have time for it during the holidays.
Regards,
Christian.
dri-devel@lists.freedesktop.org