https://bugzilla.kernel.org/show_bug.cgi?id=204181
--- Comment #30 from Andrey Grodzovsky (andrey.grodzovsky@amd.com) ---
(In reply to Sergey Kondakov from comment #27)
Created attachment 284153: dmesg_2019-08-04-amdgpu-new_dereference-with-shadowprimary
So, I've been explicitly disabling the "EnablePageFlip" and "TearFree" options as a workaround for the original dereference. I then decided to try out "ShadowPrimary" while fiddling with mvtools' motion-interpolation optimization in mpv, since page flipping is disabled anyway. The result was ANOTHER null pointer dereference, mere seconds after login:

BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 3272 Comm: X:cs0 Tainted: G IO 5.2.5-1407.g79b6a9c-HSF #1 openSUSE Tumbleweed
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
Code: 89 08 48 8d 4a 40 48 89 48 08 48 89 42 40 48 8b 78 f0 c6 40 10 00 4c 8b a7 80 06 00 00 4d 85 e4 74 08 4d 8b a4 24 40 04 00 00 <4d> 8b 6c 24 08 31 f6 49 8b 95 80 06 00 00 48 85 d2 74 0f 48 8b 92
RSP: 0018:ffffafc2478aba10 EFLAGS: 00010246
RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
FS: 00007f3ee03d7700(0000) GS:ffff98746de00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0
Call Trace:
 amdgpu_cs_vm_handling+0x308/0x440 [amdgpu]
 amdgpu_cs_ioctl+0x154/0xa10 [amdgpu]
 ? amdgpu_cs_vm_handling+0x440/0x440 [amdgpu]
 drm_ioctl_kernel+0xaa/0xf0
 drm_ioctl+0x208/0x385
 ? amdgpu_cs_vm_handling+0x440/0x440 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x59/0x70
 ? preempt_count_sub+0x98/0xe0
 ? _raw_spin_unlock_irqrestore+0x46/0x70
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 do_vfs_ioctl+0x3ed/0x720
 ? __fget+0xf9/0x1b0
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x66/0xc0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f3ee641c7c7
Code: 00 00 90 48 8b 05 d1 86 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 86 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f3ee03d6a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f3ee03d6a70 RCX: 00007f3ee641c7c7
RDX: 00007f3ee03d6a70 RSI: 00000000c0186444 RDI: 000000000000000e
RBP: 00000000c0186444 R08: 00007f3ee03d6b80 R09: 0000000000000020
R10: 00007f3ee03d6b80 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000e R14: 000055d55e6f8bf0 R15: 000055d55e6f91a8
Modules linked in: af_packet xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss msr bnep it87 hwmon_vid zram amd64_edac_mod edac_mce_amd kvm_amd kvm rc_avermedia tuner_simple tuner_types irqbypass tuner tda7432 btusb btrtl btbcm btintel tvaudio msp3400 bluetooth snd_usb_audio ath9k joydev bttv ath9k_common snd_usbmidi_lib tea575x ath9k_hw tveeprom snd_rawmidi videobuf_dma_sg mxm_wmi wmi_bmof pcspkr ath videobuf_core snd_seq_device k10temp fam15h_power rc_core snd_hda_codec_realtek v4l2_common snd_hda_codec_generic sp5100_tco snd_hda_codec_hdmi ledtrig_audio mac80211 amdgpu videodev media i2c_piix4 snd_hda_intel cfg80211 snd_hda_codec r8169 snd_hda_core realtek snd_hwdep libphy snd_pcm gpu_sched rfkill ttm mac_hid hid_generic usbhid uas usb_storage ohci_pci serio_raw sd_mod ehci_pci ohci_hcd ehci_hcd xhci_pci xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs
CR2: 0000000000000008
---[ end trace a7f0ed14134a76ad ]---
RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
Code: 89 08 48 8d 4a 40 48 89 48 08 48 89 42 40 48 8b 78 f0 c6 40 10 00 4c 8b a7 80 06 00 00 4d 85 e4 74 08 4d 8b a4 24 40 04 00 00 <4d> 8b 6c 24 08 31 f6 49 8b 95 80 06 00 00 48 85 d2 74 0f 48 8b 92
RSP: 0018:ffffafc2478aba10 EFLAGS: 00010246
RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
FS: 00007f3ee03d7700(0000) GS:ffff98746de00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0
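A reading aid for the oops, as a minimal sketch. In the `Code:` line the byte in angle brackets starts the faulting instruction. `4d 8b 6c 24 08` decodes on x86-64 to `mov r13, [r12+0x8]`. That decoding is done by hand here and is not stated in the report. With R12 = 0, that load targets address 0x8, the same value shown in CR2 and in "address: 0000000000000008". The helper names below are hypothetical, and the snippet only checks that arithmetic against the register dump copied from the oops.

```python
import re

# Register lines copied from the faulting kernel frame of the oops above.
OOPS_REGS = """\
RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0
"""


def parse_registers(text):
    """Collect 'NAME: <16 hex digits>' pairs into a {name: int} dict."""
    regs = {}
    for name, value in re.findall(r"\b([A-Z][A-Z0-9]{1,3}): ([0-9a-f]{16})\b", text):
        regs.setdefault(name, int(value, 16))  # keep the first occurrence
    return regs


def effective_address(regs, base, disp):
    """Address read by a [base + disp] memory operand (64-bit wraparound)."""
    return (regs[base] + disp) & 0xFFFFFFFFFFFFFFFF


regs = parse_registers(OOPS_REGS)
# Hand-decoded faulting instruction: <4d> 8b 6c 24 08 -> mov r13, [r12+0x8]
addr = effective_address(regs, "R12", 0x8)
print(f"R12={regs['R12']:#x} -> load from {addr:#x} (CR2={regs['CR2']:#x})")
```

In other words, some pointer that the code expected to be valid was NULL in R12, and the crash is the read of the field at offset 8 behind it. Which source-level pointer that is can only be answered from a build with debug info.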
Sergey, I tried to reproduce your latest issue on Ellesmere (Polaris 10) with "ShadowPrimary" enabled and page flipping disabled, and didn't observe any crash. If you built your own kernel, can you give me the output of the following?

Run gdb on amdgpu.ko from the top of your kernel build tree:

gdb drivers/gpu/drm/amd/amdgpu/amdgpu.ko

Then, at the gdb prompt, run:

list *(amdgpu_vm_update_directories+0xe7)
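The two gdb steps can also be run non-interactively. The following is a minimal Python sketch. It assumes the `RIP:` line format shown in this oops and a module path relative to the kernel build tree. The helper name `gdb_list_argv` is hypothetical. The sketch extracts `function+offset` from the RIP line and builds a `gdb -batch -ex ...` command line. `list` can only print source lines if amdgpu.ko was built with debug info.

```python
import re
import shlex

# Matches a symbolized kernel RIP line, e.g.
#   RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
# "+0xe7" is the byte offset into the function, "/0x260" is the function's
# size, and "[amdgpu]" names the module that contains it.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<sym>[A-Za-z_][A-Za-z0-9_.]*)"
    r"\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def gdb_list_argv(rip_line, module_path):
    """Build a gdb argv that runs `list *(sym+off)` and exits."""
    m = RIP_RE.search(rip_line)
    if m is None:
        # e.g. the user-space "RIP: 0033:0x7f3ee641c7c7" line has no symbol
        raise ValueError(f"no symbolized kernel RIP in: {rip_line!r}")
    expr = f"list *({m['sym']}+{m['off']})"
    # -batch makes gdb exit after running the -ex commands
    return ["gdb", "-batch", "-ex", expr, module_path]


argv = gdb_list_argv(
    "RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]",
    "drivers/gpu/drm/amd/amdgpu/amdgpu.ko",
)
print(shlex.join(argv))
# prints: gdb -batch -ex 'list *(amdgpu_vm_update_directories+0xe7)' drivers/gpu/drm/amd/amdgpu/amdgpu.ko
```

The kernel source tree also ships `scripts/faddr2line`, which accepts the same `function+offset/size` form directly and may be a convenient alternative.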