I am running a memory allocation failure injection test on 3.19-rc5, and there appears to be a memory corruption bug in the ttm or vmwgfx code.
---------- Crash pattern 1 start ---------- [ 80.751971] [TTM] Failed allocating page table [ 83.000393] BUG: unable to handle kernel NULL pointer dereference at (null) [ 83.004392] IP: [<ffffffff811b65a9>] __fput+0x39/0x1e0 [ 83.006944] PGD 7acd2067 PUD 7b0c7067 PMD 0 [ 83.009240] Oops: 0000 [#1] SMP [ 83.010940] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel dm_mirror ghash_clmulni_intel dm_region_hash aesni_intel dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev vmw_balloon microcode serio_raw pcspkr parport_pc shpchp parport vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput sd_mod ata_generic pata_acpi mptspi scsi_transport_spi mptscsih ata_piix e1000 mptbase libata floppy [ 83.038033] CPU: 2 PID: 8795 Comm: sh Tainted: G W OE 3.19.0-rc5+ #28 [ 83.039666] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 83.042110] task: ffff88007a220000 ti: ffff880052048000 task.ti: ffff880052048000 [ 83.043865] RIP: 0010:[<ffffffff811b65a9>] [<ffffffff811b65a9>] __fput+0x39/0x1e0 [ 83.045665] RSP: 0018:ffff88005204bea8 EFLAGS: 00010297 [ 83.046895] RAX: 0000000000000000 RBX: ffff88007aff3500 RCX: 0000000000000a0a [ 83.048595] RDX: 000000000002801d RSI: 000000000000000a RDI: ffff88007aff3500 [ 83.050254] RBP: ffff88005204bee8 R08: ffff88007cbfd000 R09: 0000000180080006 [ 83.051848] R10: 0000000000000000 R11: ffffea0001f2fe00 R12: ffffffff81e6c040 [ 83.053515] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 83.055156] FS: 0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 [ 83.057000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 83.058328] CR2: 0000000000000000 CR3: 000000007b0bc000 CR4: 00000000000407e0 [ 83.060004] Stack: [ 83.060482] ffff88007af0de48 ffff88007af0dc00 ffff88007af0de48 0000000000000000 [ 83.062285] ffffffff81e6c040 ffff88007a220610 ffff88007a220000 0000000000000000 [ 83.064115] ffff88005204bef8 ffffffff811b679e ffff88005204bf28 ffffffff81088f6f [ 83.065956] Call Trace: [ 83.066544] [<ffffffff811b679e>] ____fput+0xe/0x10 [ 83.067738] [<ffffffff81088f6f>] task_work_run+0xaf/0xf0 [ 83.068971] [<ffffffff81013c5a>] do_notify_resume+0x7a/0x90 [ 83.070307] [<ffffffff816a6d87>] int_signal+0x12/0x17 [ 83.071464] Code: 55 41 54 53 48 89 fb 48 83 ec 18 4c 8b 7f 18 4c 8b 77 10 4c 8b 6f 20 e8 06 c7 4e 00 8b 53 44 4c 8b 53 20 89 d0 83 e0 02 83 f8 01 <41> 0f b7 02 45 19 e4 41 83 e4 08 41 83 c4 08 44 89 e1 66 25 00 [ 83.077450] RIP [<ffffffff811b65a9>] __fput+0x39/0x1e0 [ 83.078729] RSP <ffff88005204bea8> [ 83.079522] CR2: 0000000000000000
crash> bt -l PID: 8795 TASK: ffff88007a220000 CPU: 2 COMMAND: "sh" #0 [ffff88005204ba70] machine_kexec at ffffffff8104ef62 /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320 #1 [ffff88005204bac0] crash_kexec at ffffffff810ed983 /usr/src/linux/kernel/kexec.c: 1482 #2 [ffff88005204bb90] oops_end at ffffffff810176e8 /usr/src/linux/arch/x86/kernel/dumpstack.c: 231 #3 [ffff88005204bbc0] no_context at ffffffff8169af1f /usr/src/linux/arch/x86/mm/fault.c: 724 #4 [ffff88005204bc20] __bad_area_nosemaphore at ffffffff8169aff6 /usr/src/linux/arch/x86/mm/fault.c: 804 #5 [ffff88005204bc70] bad_area at ffffffff8169b31f /usr/src/linux/arch/x86/mm/fault.c: 833 #6 [ffff88005204bca0] __do_page_fault at ffffffff81059b37 /usr/src/linux/arch/x86/mm/fault.c: 1213 #7 [ffff88005204bdc0] do_page_fault at ffffffff81059c11 /usr/src/linux/arch/x86/mm/fault.c: 1295 #8 [ffff88005204bdf0] page_fault at ffffffff816a8a28 /usr/src/linux/arch/x86/kernel/entry_64.S: 1283 [exception RIP: __fput+57] RIP: ffffffff811b65a9 RSP: ffff88005204bea8 RFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff88007aff3500 RCX: 0000000000000a0a RDX: 000000000002801d RSI: 000000000000000a RDI: ffff88007aff3500 RBP: ffff88005204bee8 R8: ffff88007cbfd000 R9: 0000000180080006 R10: 0000000000000000 R11: ffffea0001f2fe00 R12: ffffffff81e6c040 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88005204bef0] ____fput at ffffffff811b679e /usr/src/linux/fs/file_table.c: 245 #10 [ffff88005204bf00] task_work_run at ffffffff81088f6f /usr/src/linux/kernel/task_work.c: 125 #11 [ffff88005204bf30] do_notify_resume at ffffffff81013c5a /usr/src/linux/include/linux/tracehook.h: 190 #12 [ffff88005204bf50] int_signal at ffffffff816a6d87 /usr/src/linux/arch/x86/kernel/entry_64.S: 587 RIP: 00007f1361d5f420 RSP: 00007fff77be5740 RFLAGS: 00000200 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 
RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 000000000000003b CS: 0033 SS: 002b WARNING: possibly bogus exception frame ---------- Crash pattern 1 end ----------
---------- Crash pattern 2 start ---------- [ 227.647021] [TTM] Failed allocating page table [ 227.875795] BUG: unable to handle kernel NULL pointer dereference at (null) [ 227.877714] IP: [<ffffffff81594c57>] skb_queue_tail+0x37/0x60 [ 227.879107] PGD 78adc067 PUD 78ada067 PMD 0 [ 227.880186] Oops: 0002 [#1] SMP [ 227.881017] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mirror aesni_intel dm_region_hash dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev vmw_balloon microcode parport_pc serio_raw pcspkr parport vmw_vmci shpchp i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput ata_generic pata_acpi sd_mod ata_piix libata mptspi scsi_transport_spi e1000 mptscsih mptbase floppy [ 227.898988] CPU: 2 PID: 610 Comm: Xorg Tainted: G W OE 3.19.0-rc5+ #28 [ 227.900691] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 227.903162] task: ffff8800788c6040 ti: ffff8800792d8000 task.ti: ffff8800792d8000 [ 227.904884] RIP: 0010:[<ffffffff81594c57>] [<ffffffff81594c57>] skb_queue_tail+0x37/0x60 [ 227.906816] RSP: 0018:ffff8800792dbbc8 EFLAGS: 00010046 [ 227.908056] RAX: 0000000000000292 RBX: ffff88007cbc6d10 RCX: 0000000000000000 [ 227.909718] RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffff88007cbc6d24 [ 227.911376] RBP: ffff8800792dbbe8 R08: 0000000000000292 R09: 0180000002800000 [ 227.913027] R10: 0000000700020008 R11: 0000000000000000 R12: ffff88007b65aa00 [ 227.914690] R13: ffff88007cbc6d24 R14: 0000000000000000 R15: ffff88007cbc6c80 [ 227.916356] FS: 00007f3d07740980(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 [ 227.918232] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 227.919559] CR2: 0000000000000000 CR3: 0000000078add000 CR4: 00000000000407e0 [ 227.921261] Stack: [ 227.921744] 0000000000000078 ffff88007b65aa00 0000000000000078 0000000000000000 [ 227.923618] ffff8800792dbca8 ffffffff816491bd ffff88007cbc6d10 ffff8800792dbd10 [ 227.925427] 0000007800000000 ffff8800792dbcc8 0000000000000078 ffff88007cbc6f78 [ 227.927271] Call Trace: [ 227.927872] [<ffffffff816491bd>] unix_stream_sendmsg+0x1dd/0x430 [ 227.929301] [<ffffffff8158c0c3>] sock_aio_write+0x103/0x140 [ 227.930638] [<ffffffff811b42ec>] do_sync_readv_writev+0x4c/0x80 [ 227.932047] [<ffffffff811b5c95>] do_readv_writev+0x1e5/0x280 [ 227.933406] [<ffffffff8101fe4b>] ? __restore_xstate_sig+0x8b/0x680 [ 227.934865] [<ffffffff81104424>] ? __audit_syscall_entry+0xb4/0x110 [ 227.936371] [<ffffffff811b5db9>] vfs_writev+0x39/0x50 [ 227.937565] [<ffffffff811b5eea>] SyS_writev+0x4a/0xd0 [ 227.938777] [<ffffffff816a6d6c>] ? 
int_check_syscall_exit_work+0x34/0x3d [ 227.940364] [<ffffffff816a6ae9>] system_call_fastpath+0x12/0x17 [ 227.941775] Code: 8d 6f 14 41 54 49 89 f4 53 48 89 fb 4c 89 ef 48 83 ec 08 e8 dc 1a 11 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 09 17 11 00 48 83 c4 08 5b [ 227.947880] RIP [<ffffffff81594c57>] skb_queue_tail+0x37/0x60 [ 227.949297] RSP <ffff8800792dbbc8> [ 227.950112] CR2: 0000000000000000
crash> bt -l PID: 610 TASK: ffff8800788c6040 CPU: 2 COMMAND: "Xorg" #0 [ffff8800792db790] machine_kexec at ffffffff8104ef62 /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320 #1 [ffff8800792db7e0] crash_kexec at ffffffff810ed983 /usr/src/linux/kernel/kexec.c: 1482 #2 [ffff8800792db8b0] oops_end at ffffffff810176e8 /usr/src/linux/arch/x86/kernel/dumpstack.c: 231 #3 [ffff8800792db8e0] no_context at ffffffff8169af1f /usr/src/linux/arch/x86/mm/fault.c: 724 #4 [ffff8800792db940] __bad_area_nosemaphore at ffffffff8169aff6 /usr/src/linux/arch/x86/mm/fault.c: 804 #5 [ffff8800792db990] bad_area at ffffffff8169b31f /usr/src/linux/arch/x86/mm/fault.c: 833 #6 [ffff8800792db9c0] __do_page_fault at ffffffff81059b37 /usr/src/linux/arch/x86/mm/fault.c: 1213 #7 [ffff8800792dbae0] do_page_fault at ffffffff81059c11 /usr/src/linux/arch/x86/mm/fault.c: 1295 #8 [ffff8800792dbb10] page_fault at ffffffff816a8a28 /usr/src/linux/arch/x86/kernel/entry_64.S: 1283 [exception RIP: skb_queue_tail+55] RIP: ffffffff81594c57 RSP: ffff8800792dbbc8 RFLAGS: 00010046 RAX: 0000000000000292 RBX: ffff88007cbc6d10 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffff88007cbc6d24 RBP: ffff8800792dbbe8 R8: 0000000000000292 R9: 0180000002800000 R10: 0000000700020008 R11: 0000000000000000 R12: ffff88007b65aa00 R13: ffff88007cbc6d24 R14: 0000000000000000 R15: ffff88007cbc6c80 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff8800792dbbf0] unix_stream_sendmsg at ffffffff816491bd /usr/src/linux/net/unix/af_unix.c: 1711 #10 [ffff8800792dbcb0] sock_aio_write at ffffffff8158c0c3 /usr/src/linux/net/socket.c: 955 #11 [ffff8800792dbd90] do_sync_readv_writev at ffffffff811b42ec /usr/src/linux/fs/read_write.c: 697 #12 [ffff8800792dbe20] do_readv_writev at ffffffff811b5c95 /usr/src/linux/fs/read_write.c: 851 #13 [ffff8800792dbf20] vfs_writev at ffffffff811b5db9 /usr/src/linux/fs/read_write.c: 893 #14 [ffff8800792dbf30] sys_writev at ffffffff811b5eea /usr/src/linux/fs/read_write.c: 926 #15 
[ffff8800792dbf80] system_call_fastpath at ffffffff816a6ae9 /usr/src/linux/arch/x86/kernel/entry_64.S: 423 RIP: 00007f3d056223c0 RSP: 00007ffff316be40 RFLAGS: 00003293 RAX: ffffffffffffffda RBX: ffffffff816a6ae9 RCX: ffffffffffffffff RDX: 0000000000000001 RSI: 00007ffff316af90 RDI: 0000000000000014 RBP: 0000000001d59be0 R8: 0000000000000000 R9: 0000000000000004 R10: 00000000ffffffff R11: 0000000000003293 R12: 00007f3d077406a0 R13: 0000000000000001 R14: 00007ffff316af90 R15: 0000000000000000 ORIG_RAX: 0000000000000014 CS: 0033 SS: 002b ---------- Crash pattern 2 end ----------
---------- Crash pattern 3 start ---------- [ 88.675004] [TTM] Failed allocating page table [ 88.678152] BUG: unable to handle kernel paging request at ffff8801531d77c0 [ 88.679845] IP: [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0 [ 88.681221] PGD 1f2b067 PUD 0 [ 88.682000] Oops: 0002 [#1] SMP [ 88.682838] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mirror dm_region_hash aesni_intel dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev vmw_balloon microcode serio_raw pcspkr parport_pc shpchp vmw_vmci parport i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput sd_mod ata_generic pata_acpi e1000 ata_piix libata mptspi scsi_transport_spi mptscsih mptbase floppy [ 88.701377] CPU: 0 PID: 3904 Comm: gnome-shell Tainted: G W OE 3.19.0-rc5+ #31 [ 88.703292] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 88.705840] task: ffff880079e05780 ti: ffff88007918c000 task.ti: ffff88007918c000 [ 88.707575] RIP: 0010:[<ffffffff815964b5>] [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0 [ 88.709601] RSP: 0018:ffff88007918faa8 EFLAGS: 00010246 [ 88.710884] RAX: 00000000ffffffff RBX: ffff8800531d7700 RCX: 00000000ffffffff [ 88.712584] RDX: ffff8801531d77c0 RSI: 0000000000000000 RDI: ffff8800531d77c8 [ 88.714260] RBP: ffff88007918faf8 R08: 00000000ffffffc0 R09: 0000000000000200 [ 88.715927] R10: ffffffff8159639e R11: ffff88007f803700 R12: ffff8800531d7800 [ 88.717648] R13: 00000000ffffffff R14: ffff88007f803700 R15: 0000000000000100 [ 88.719327] FS: 00007fcafd8aaa00(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 88.721216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.722548] CR2: ffff8801531d77c0 CR3: 00000000790ae000 CR4: 00000000000407f0 [ 88.724257] Stack: [ 88.724761] ffff88007a2cdf00 000000007a01b800 0000000000000246 00ff88007918fcd8 [ 88.726741] ffff88007918fae8 0000000000000003 0000000000000000 ffff88007918fba8 [ 88.728578] ffff88007a01b800 0000000000000000 ffff88007918fb58 ffffffff81596d5c [ 88.730582] Call Trace: [ 88.731243] [<ffffffff81596d5c>] alloc_skb_with_frags+0x5c/0x1e0 [ 88.732725] [<ffffffff811c99bc>] ? do_sys_poll+0x12c/0x5b0 [ 88.734208] [<ffffffff815910b6>] sock_alloc_send_pskb+0x196/0x250 [ 88.735710] [<ffffffff8159b887>] ? skb_copy_datagram_from_iter+0xe7/0x200 [ 88.737361] [<ffffffff8164ba07>] ? wait_for_unix_gc+0x27/0xa0 [ 88.738784] [<ffffffff8164928a>] unix_stream_sendmsg+0x2aa/0x430 [ 88.740213] [<ffffffff8158c0c3>] sock_aio_write+0x103/0x140 [ 88.741610] [<ffffffff811c8860>] ? poll_select_copy_remaining+0x130/0x130 [ 88.743278] [<ffffffff811b42ec>] do_sync_readv_writev+0x4c/0x80 [ 88.744721] [<ffffffff811b5c95>] do_readv_writev+0x1e5/0x280 [ 88.746109] [<ffffffff8158bf9d>] ? SYSC_recvfrom+0x13d/0x160 [ 88.747452] [<ffffffff81104424>] ? 
__audit_syscall_entry+0xb4/0x110 [ 88.748992] [<ffffffff811b5db9>] vfs_writev+0x39/0x50 [ 88.750192] [<ffffffff811b5eea>] SyS_writev+0x4a/0xd0 [ 88.751423] [<ffffffff811046b6>] ? __audit_syscall_exit+0x236/0x2e0 [ 88.753121] [<ffffffff816a6ae9>] system_call_fastpath+0x12/0x17 [ 88.754650] Code: b6 83 90 00 00 00 83 e0 f7 09 c8 b9 ff ff ff ff 85 f6 88 83 90 00 00 00 b8 ff ff ff ff 66 89 8b c2 00 00 00 66 89 83 c6 00 00 00 <48> c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 c7 42 10 00 00 [ 88.761554] RIP [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0 [ 88.763077] RSP <ffff88007918faa8> [ 88.763978] CR2: ffff8801531d77c0
crash> bt -l PID: 3904 TASK: ffff880079e05780 CPU: 0 COMMAND: "gnome-shell" #0 [ffff88007918f690] machine_kexec at ffffffff8104ef62 /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320 #1 [ffff88007918f6e0] crash_kexec at ffffffff810ed983 /usr/src/linux/kernel/kexec.c: 1482 #2 [ffff88007918f7b0] oops_end at ffffffff810176e8 /usr/src/linux/arch/x86/kernel/dumpstack.c: 231 #3 [ffff88007918f7e0] no_context at ffffffff8169af1f /usr/src/linux/arch/x86/mm/fault.c: 724 #4 [ffff88007918f840] __bad_area_nosemaphore at ffffffff8169aff6 /usr/src/linux/arch/x86/mm/fault.c: 804 #5 [ffff88007918f890] bad_area_nosemaphore at ffffffff8169b162 /usr/src/linux/arch/x86/mm/fault.c: 812 #6 [ffff88007918f8a0] __do_page_fault at ffffffff810596f8 /usr/src/linux/arch/x86/mm/fault.c: 1277 #7 [ffff88007918f9c0] do_page_fault at ffffffff81059c11 /usr/src/linux/arch/x86/mm/fault.c: 1295 #8 [ffff88007918f9f0] page_fault at ffffffff816a8a28 /usr/src/linux/arch/x86/kernel/entry_64.S: 1283 [exception RIP: __alloc_skb+357] RIP: ffffffff815964b5 RSP: ffff88007918faa8 RFLAGS: 00010246 RAX: 00000000ffffffff RBX: ffff8800531d7700 RCX: 00000000ffffffff RDX: ffff8801531d77c0 RSI: 0000000000000000 RDI: ffff8800531d77c8 RBP: ffff88007918faf8 R8: 00000000ffffffc0 R9: 0000000000000200 R10: ffffffff8159639e R11: ffff88007f803700 R12: ffff8800531d7800 R13: 00000000ffffffff R14: ffff88007f803700 R15: 0000000000000100 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88007918fb00] alloc_skb_with_frags at ffffffff81596d5c /usr/src/linux/net/core/skbuff.c: 4386 #10 [ffff88007918fb60] sock_alloc_send_pskb at ffffffff815910b6 /usr/src/linux/net/core/sock.c: 1826 #11 [ffff88007918fbf0] unix_stream_sendmsg at ffffffff8164928a /usr/src/linux/net/unix/af_unix.c: 1682 #12 [ffff88007918fcb0] sock_aio_write at ffffffff8158c0c3 /usr/src/linux/net/socket.c: 955 #13 [ffff88007918fd90] do_sync_readv_writev at ffffffff811b42ec /usr/src/linux/fs/read_write.c: 697 #14 [ffff88007918fe20] do_readv_writev at ffffffff811b5c95 
/usr/src/linux/fs/read_write.c: 851 #15 [ffff88007918ff20] vfs_writev at ffffffff811b5db9 /usr/src/linux/fs/read_write.c: 893 #16 [ffff88007918ff30] sys_writev at ffffffff811b5eea /usr/src/linux/fs/read_write.c: 926 #17 [ffff88007918ff80] system_call_fastpath at ffffffff816a6ae9 /usr/src/linux/arch/x86/kernel/entry_64.S: 423 RIP: 00007fcaf3c273c0 RSP: 00007fffadd91330 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: ffffffff816a6ae9 RCX: 00007fffadd91360 RDX: 0000000000000002 RSI: 00007fffadd914b0 RDI: 0000000000000006 RBP: 0000000000b5c230 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 00007fffadd91428 R13: 00007fffadd91424 R14: 0000000000b5c248 R15: 0000000000000001 ORIG_RAX: 0000000000000014 CS: 0033 SS: 002b ---------- Crash pattern 3 end ----------
---------- Failed memory allocation start ----------
 0xffffffff81199850 : __kmalloc+0x0/0x280 [kernel]
        /usr/src/linux/mm/slub.c:3247
 0xffffffff814676fa : ttm_tt_init+0x8a/0xb0 [kernel]
        /usr/src/linux/include/linux/slab.h:524
        /usr/src/linux/include/linux/slab.h:535
        /usr/src/linux/include/drm/drm_mem_util.h:38
        /usr/src/linux/drivers/gpu/drm/ttm/ttm_tt.c:53
        /usr/src/linux/drivers/gpu/drm/ttm/ttm_tt.c:200
 0xffffffff8147caa6 : vmw_ttm_tt_create+0x76/0xb0 [kernel]
        /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c:700
 0xffffffff81467b8d : ttm_bo_add_ttm+0x9d/0xe0 [kernel]
        /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:238
 0xffffffff8146a2ff : ttm_bo_validate+0x14f/0x1f0 [kernel]
        /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:1067
 0xffffffff8146a5d4 : ttm_bo_init+0x234/0x470 [kernel]
        /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:1167
 0xffffffff8147ae9e : vmw_dmabuf_init+0x13e/0x240 [kernel]
        /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:435
 0xffffffff8147b0cb : vmw_user_dmabuf_alloc+0x8b/0x120 [kernel]
        /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:503
 0xffffffff8147b202 : vmw_dmabuf_alloc_ioctl+0x52/0xb0 [kernel]
        /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:698
 0xffffffff814497a4 : drm_ioctl+0x1a4/0x630 [kernel]
        /usr/src/linux/drivers/gpu/drm/drm_ioctl.c:727
 0xffffffff814773c9 : vmw_generic_ioctl+0x169/0x260 [kernel]
        /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1073
 0xffffffff814774f5 : vmw_unlocked_ioctl+0x15/0x20 [kernel]
        /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1084
 0xffffffff811c7c18 : do_vfs_ioctl+0x2f8/0x510 [kernel]
        /usr/src/linux/fs/ioctl.c:44
        /usr/src/linux/fs/ioctl.c:602
 0xffffffff811c7e71 : sys_ioctl+0x41/0x80 [kernel]
        /usr/src/linux/include/linux/file.h:38
        /usr/src/linux/fs/ioctl.c:618
        /usr/src/linux/fs/ioctl.c:608
 0xffffffff816a6ae9 : system_call_fastpath+0x12/0x17 [kernel]
        /usr/src/linux/arch/x86/kernel/entry_64.S:423
---------- Failed memory allocation end ----------
If I skip the ttm_tt_destroy() call, this bug no longer occurs. Therefore, I guess that the memory corruption is caused by the destroy function being called on a partially initialized ttm object.
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -199,8 +199,8 @@ int ttm_tt_init(struct ttm_tt *ttm, struct ttm_bo_device *bdev,
 	ttm_tt_alloc_page_directory(ttm);
 	if (!ttm->pages) {
-		ttm_tt_destroy(ttm);
-		pr_err("Failed allocating page table\n");
+		//ttm_tt_destroy(ttm);
+		pr_err("Failed allocating page table, but skip ttm_tt_destroy()\n");
 		return -ENOMEM;
 	}
 	return 0;
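To illustrate the guess above, here is a minimal user-space sketch of one way that tearing down an object inside its own failing init function could go wrong: if the caller also runs its normal error-path cleanup, the object is released twice. This is only an assumption about the failure mode, not a confirmed analysis of the TTM code. All names (toy_tt, toy_tt_init_eager, toy_tt_init_lazy, toy_create) are hypothetical, and the sketch counts teardown calls instead of actually freeing the object twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a TTM-like object; not kernel code. */
struct toy_tt {
    void **pages;       /* page directory, NULL if allocation failed */
    int release_count;  /* how many times teardown ran on this object */
};

/* Teardown. In a real driver this might free the object itself, so running
 * it twice would be a double free. Here we only count the calls. */
static void toy_tt_destroy(struct toy_tt *tt)
{
    free(tt->pages);
    tt->pages = NULL;
    tt->release_count++;
}

/* Allocation helper with a switch that simulates an injected failure. */
static void toy_alloc_pages(struct toy_tt *tt, int inject_failure)
{
    tt->pages = inject_failure ? NULL : calloc(4, sizeof(*tt->pages));
}

/* Pattern under suspicion: init tears the object down on its own failure. */
static int toy_tt_init_eager(struct toy_tt *tt, int inject_failure)
{
    toy_alloc_pages(tt, inject_failure);
    if (!tt->pages) {
        toy_tt_destroy(tt);
        return -1;
    }
    return 0;
}

/* Alternative: init only reports the error and leaves cleanup to the caller. */
static int toy_tt_init_lazy(struct toy_tt *tt, int inject_failure)
{
    toy_alloc_pages(tt, inject_failure);
    return tt->pages ? 0 : -1;
}

/* Caller that always releases the object exactly once on its side,
 * whether init succeeded or not. Returns the total number of teardowns. */
static int toy_create(int (*init)(struct toy_tt *, int), int inject_failure)
{
    struct toy_tt tt = { NULL, 0 };

    (void)init(&tt, inject_failure);
    toy_tt_destroy(&tt);
    return tt.release_count;
}
```

In this toy model, toy_create(toy_tt_init_eager, 1) reports two teardowns for one object, while toy_create(toy_tt_init_lazy, 1) reports one. Whether the real ttm_tt_init() error path behaves this way depends on what the driver's destroy callback and the caller actually free.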
I can reproduce this problem on kernels at least as far back as 3.13.0. I don't know whether this problem is specific to the vmwgfx code, because I have only tested CentOS 7 with a GUI environment on VMware Player 6.
I think you can reproduce this problem by starting the SystemTap script shown below and then switching screens using Ctrl-Alt-F1 through Ctrl-Alt-F7.
---------- Reproducer start ----------
# stap -g -e 'global is_target%;
probe begin { printf("Probe start!\n"); }
probe module("ttm").function("ttm_tt_init") { is_target[tid()] = 1; }
probe module("ttm").function("ttm_tt_init").return { is_target[tid()] = 0; }
probe kernel.function("__kmalloc") {
  if (($flags & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} && is_target[tid()]) {
    print_backtrace(); $size = 1 << 30; exit();
  }
}
probe end { delete is_target; }'
---------- Reproducer end ----------
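The condition in the __kmalloc probe selects allocations that are allowed to sleep (__GFP_WAIT set) but are not marked as must-succeed (__GFP_NOFAIL clear), and only while the current thread is inside ttm_tt_init(). For such a call the script overwrites the requested size with 1 << 30 so that the allocation fails. A small C sketch of the same mask test follows. The TOY_GFP_* bit values and the function name are illustrative assumptions, since the real __GFP_* constants depend on the kernel version.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real __GFP_* constants depend on the
 * kernel version and must be taken from its headers. */
#define TOY_GFP_WAIT   0x10u
#define TOY_GFP_NOFAIL 0x800u

/* Hypothetical helper mirroring the probe condition: inject a failure only
 * for sleepable allocations that are not flagged as must-succeed, and only
 * while the current thread is marked as being inside the target function. */
static bool toy_may_inject_failure(unsigned int flags, bool in_target)
{
    /* Keep just the two bits of interest, then require WAIT set, NOFAIL clear. */
    return in_target &&
           (flags & (TOY_GFP_NOFAIL | TOY_GFP_WAIT)) == TOY_GFP_WAIT;
}
```

Comparing the masked value against TOY_GFP_WAIT checks both bits in a single expression: other flag bits are ignored, and a request carrying the no-fail bit is never selected.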
I can also reproduce the problem below using 3.10.0-123.9.3.el7.x86_64, though it might be a different problem from the one above.
---------- Crash pattern 4 start ---------- [TTM] Failed allocating page table ------------[ cut here ]------------ WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0() list_add corruption. prev->next should be next (ffff88007af4cd98), but was (null). (prev=ffff88007ac881f0). Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper vmw_balloon ablk_helper cryptd serio_raw parport_pc i2c_piix4 parport vmw_vmci pcspkr dm_mirror shpchp dm_region_hash dm_log mperf dm_mod nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi crc32c_intel vmwgfx mptspi ttm scsi_transport_spi mptscsih ahci ata_piix libahci drm mptbase libata e1000 i2c_core floppy [last unloaded: stap_bad36894e80d53e8ee72ce3ee48a27ac_3394] CPU: 0 PID: 849 Comm: Xorg Tainted: GF W O-------------- 3.10.0-123.9.3.el7.x86_64 #1 Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 ffff88007984da10 00000000da42f7a4 ffff88007984d9c8 ffffffff815e239b ffff88007984da00 ffffffff8105dee1 ffff88007ac881f0 ffff88007af4cd98 ffff88007ac881f0 0000000000000282 ffff88007984db98 ffff88007984da68 Call Trace: [<ffffffff815e239b>] dump_stack+0x19/0x1b [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80 [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffff812cfeec>] __list_add+0xac/0xc0 [<ffffffffa01a56e9>] vmw_fence_create+0xd9/0x130 [vmwgfx] [<ffffffffa0197ef8>] vmw_execbuf_fence_commands+0xc8/0x120 [vmwgfx] [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx] [<ffffffff81194585>] ? __kmalloc+0x55/0x230 [<ffffffffa0199af8>] do_dmabuf_dirty_sou.isra.9+0x328/0x3c0 [vmwgfx] [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm] [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm] [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx] [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm] [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm] [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx] [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b ---[ end trace a993c155f4775b96 ]--- ------------[ cut here ]------------ WARNING: at lib/list_debug.c:36 __list_add+0x8a/0xc0() list_add double add: new=ffff88007ac881f0, prev=ffff88007ac881f0, next=ffff88007af4cd98. 
Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper vmw_balloon ablk_helper cryptd serio_raw parport_pc i2c_piix4 parport vmw_vmci pcspkr dm_mirror shpchp dm_region_hash dm_log mperf dm_mod nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi crc32c_intel vmwgfx mptspi ttm scsi_transport_spi mptscsih ahci ata_piix libahci drm mptbase libata e1000 i2c_core floppy [last unloaded: stap_bad36894e80d53e8ee72ce3ee48a27ac_3394] CPU: 0 PID: 849 Comm: Xorg Tainted: GF W O-------------- 3.10.0-123.9.3.el7.x86_64 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 ffff88007984da10 00000000da42f7a4 ffff88007984d9c8 ffffffff815e239b ffff88007984da00 ffffffff8105dee1 ffff88007ac881f0 ffff88007af4cd98 ffff88007ac881f0 0000000000000282 ffff88007984db98 ffff88007984da68 Call Trace: [<ffffffff815e239b>] dump_stack+0x19/0x1b [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80 [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffff812cfeca>] __list_add+0x8a/0xc0 [<ffffffffa01a56e9>] vmw_fence_create+0xd9/0x130 [vmwgfx] [<ffffffffa0197ef8>] vmw_execbuf_fence_commands+0xc8/0x120 [vmwgfx] [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx] [<ffffffff81194585>] ? __kmalloc+0x55/0x230 [<ffffffffa0199af8>] do_dmabuf_dirty_sou.isra.9+0x328/0x3c0 [vmwgfx] [<ffffffffa00da00c>] ? 
ttm_read_lock+0x2c/0xd0 [ttm] [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm] [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx] [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm] [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm] [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx] [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b ---[ end trace a993c155f4775b97 ]--- INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=60019 jiffies, g=6722, c=6721, q=0) sending NMI to all CPUs: NMI backtrace for cpu 0 CPU: 0 PID: 849 Comm: Xorg Tainted: GF W O-------------- 3.10.0-123.9.3.el7.x86_64 #1 Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 task: ffff880077655b00 ti: ffff88007984c000 task.ti: ffff88007984c000 RIP: 0010:[<ffffffff8108ece5>] [<ffffffff8108ece5>] __wake_up_common+0x5/0x90 RSP: 0018:ffff88007984d9d0 EFLAGS: 00000046 RAX: 0000000000000046 RBX: ffff88007ac88220 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007ac88220 RBP: ffff88007984da00 R08: 0000000000000000 R09: ffff88007f617320 R10: ffffea000173f700 R11: ffffffffa01a462d R12: 0000000000000046 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 FS: 00007faaaca78980(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007faaa4b3c000 CR3: 000000007baaf000 CR4: 00000000000407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffff81090af9 ffff88007af4cd80 ffff88007ac881e0 ffff88007ac881f0 ffff88007984da48 ffff88007ac881e0 ffff88007984da88 ffffffffa01a524b ffffc90008680018 ffff88007af4cda8 ffff88007af4cdd8 0000000000000292 Call Trace: [<ffffffff81090af9>] ? __wake_up+0x39/0x50 [<ffffffffa01a524b>] vmw_fences_update+0x11b/0x220 [vmwgfx] [<ffffffffa01a2568>] vmw_update_seqno+0x48/0x50 [vmwgfx] [<ffffffffa01a2073>] vmw_fifo_send_fence+0x93/0xe0 [vmwgfx] [<ffffffffa0197e85>] vmw_execbuf_fence_commands+0x55/0x120 [vmwgfx] [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx] [<ffffffffa01998d0>] do_dmabuf_dirty_sou.isra.9+0x100/0x3c0 [vmwgfx] [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm] [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm] [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx] [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm] [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm] [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540 [<ffffffff812c0e64>] ? 
timerqueue_del+0x24/0x70 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx] [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b Code: 49 0f af c0 e9 64 ff ff ff 0f 1f 44 00 00 44 8d 4a ff 31 c0 45 31 c0 4d 63 c9 e9 4e ff ff ff 0f 1f 80 00 00 00 00 66 66 66 66 90 <55> 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 ---------- Crash pattern 4 end ----------
Hi,
I've encountered a "BUG: unable to handle kernel NULL pointer dereference" oops whose call stack resembles your pattern 2. Before it happened, I got a lot of memory allocation failure warnings. My kernel is 3.10.0-327.62.1.el7.x86_64.
Since you mentioned it may be a bug in drm/ttm, I checked drm/ttm for a possible patch that fixes this problem, but found nothing. Could you please tell me whether there has been any progress on the problem you detected?
Best wishes,
Jinxiang, Gu
On 2020/07/14 18:13, Gu Jinxiang wrote:
> I've encountered [BUG: unable to handle kernel NULL pointer dereference at] which has call stack like your pattern2. And before this happended, I got a lot of memory allocation failure warnings. And my kernel is 3.10.0-327.62.1.el7.x86_64.
> Since, you mentioned it may be a bug of drm/tmm. So, I checked drm/ttm for possible patch to fix this problem, but found nothing. Could you please tell me is there any progress of this problem that you detected.
I'm not aware of any progress on https://patchwork.kernel.org/patch/5681611/ .
On Wed, 15 Jul 2020 at 17:00, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2020/07/14 18:13, Gu Jinxiang wrote:
>> I've encountered [BUG: unable to handle kernel NULL pointer dereference at] which has call stack like your pattern2. And before this happended, I got a lot of memory allocation failure warnings. And my kernel is 3.10.0-327.62.1.el7.x86_64.
>> Since, you mentioned it may be a bug of drm/tmm. So, I checked drm/ttm for possible patch to fix this problem, but found nothing. Could you please tell me is there any progress of this problem that you detected.
> I'm not aware of any progress on https://patchwork.kernel.org/patch/5681611/ .
I just found this email. I have hopefully fixed this issue in my drm-next tree with
https://patchwork.freedesktop.org/patch/380782/
Dave.
dri-devel@lists.freedesktop.org