Hi folks,

Two weeks ago, after commit 22051d9c4a57 reached my system, the following error started to occur at random: "gnome-shell: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0" Symptoms: the screen goes blank, as if entering power-saving mode, and the computer cannot be woken for a few minutes.
I ran a bisect, and the first bad commit appears to be 476e955dd679. The full bisect logs are here: https://mega.nz/#F!kgYFxAIb!v1tcHANPy2ns1lh4LQLeIg
I reported my finding to the amd-gfx mailing list, but no one answered. Until yesterday, I thought this was a bug in the amdgpu driver. But yesterday, after the next occurrence of the error, the system hung completely, this time with a different error:
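For reference, a minimal sketch of how large the failed request was. It assumes the usual 4 KiB base page size on x86_64; the helper name order_to_bytes is only illustrative and is not a kernel function.

```python
# Assumption: 4 KiB base pages, which is the common x86_64 configuration.
PAGE_SIZE = 4096


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size of a buddy-allocator request of the given order.

    An order-N allocation asks for 2**N physically contiguous pages.
    """
    return (1 << order) * page_size


# "order:4" from the gnome-shell allocation failure message.
size = order_to_bytes(4)
print(f"order:4 -> {size} bytes ({size // 1024} KiB of contiguous memory)")
```

So the failing request was for 16 contiguous pages, which is the kind of allocation that can fail once memory becomes fragmented.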
[ 3225.317560] BUG: unable to handle page fault for address: 000000000000c9f4
[ 3225.317562] #PF: supervisor read access in kernel mode
[ 3225.317563] #PF: error_code(0x0000) - not-present page
[ 3225.317565] PGD 0 P4D 0
[ 3225.317567] Oops: 0000 [#1] SMP NOPTI
[ 3225.317571] CPU: 2 PID: 12717 Comm: Xorg Tainted: G W 5.3.0-0.rc2.git4.1.fc31.x86_64 #1
[ 3225.317572] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2406 06/21/2019
[ 3225.317625] RIP: 0010:dc_resource_state_copy_construct+0x18/0xf0 [amdgpu]
[ 3225.317627] Code: 00 49 83 c4 01 44 39 e0 7f b5 5b 5d 41 5c 41 5d c3 c3 0f 1f 44 00 00 41 56 ba f8 c9 00 00 41 55 41 54 49 89 f4 55 4c 89 e5 53 <44> 8b ae f4 c9 00 00 48 89 fe 4c 89 e7 e8 16 86 48 f7 49 8d 84 24
[ 3225.317630] RSP: 0018:ffffb439c3e377d0 EFLAGS: 00010246
[ 3225.317631] RAX: ffff9b0ba19a0000 RBX: ffffffffc08380b0 RCX: 0000000000000006
[ 3225.317633] RDX: 000000000000c9f8 RSI: 0000000000000000 RDI: ffff9b0ab7fc0000
[ 3225.317635] RBP: 0000000000000000 R08: 000002eef3c694b7 R09: 0000000000000000
[ 3225.317636] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3225.317638] R13: ffff9b0bb5381000 R14: ffff9b09acc68598 R15: ffff9b09acc68540
[ 3225.317640] FS: 00007fdde56cbf00(0000) GS:ffff9b0bba400000(0000) knlGS:0000000000000000
[ 3225.317641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3225.317643] CR2: 000000000000c9f4 CR3: 00000007382ee000 CR4: 00000000003406e0
[ 3225.317644] Call Trace:
[ 3225.317714] amdgpu_dm_atomic_commit_tail.cold+0xad/0xe1 [amdgpu]
[ 3225.317719] ? lockdep_hardirqs_on+0xf0/0x180
[ 3225.317723] ? debug_check_no_obj_freed+0x107/0x1d8
[ 3225.317786] ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
[ 3225.317850] ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
[ 3225.317855] ? kfree+0x1b6/0x3b0
[ 3225.317918] ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
[ 3225.317923] ? __lock_acquire+0x247/0x1910
[ 3225.317928] ? find_held_lock+0x32/0x90
[ 3225.317931] ? mark_held_locks+0x50/0x80
[ 3225.317934] ? _raw_spin_unlock_irq+0x29/0x40
[ 3225.317937] ? lockdep_hardirqs_on+0xf0/0x180
[ 3225.317939] ? _raw_spin_unlock_irq+0x29/0x40
[ 3225.317942] ? wait_for_completion_timeout+0x75/0x190
[ 3225.317954] ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3225.317960] commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3225.317968] drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[ 3225.317975] drm_atomic_helper_disable_plane+0x82/0xb0 [drm_kms_helper]
[ 3225.317994] drm_mode_cursor_universal+0x12c/0x240 [drm]
[ 3225.318011] drm_mode_cursor_common+0xd8/0x230 [drm]
[ 3225.318026] ? drm_mode_setplane+0x1a0/0x1a0 [drm]
[ 3225.318038] drm_mode_cursor_ioctl+0x4d/0x70 [drm]
[ 3225.318049] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 3225.318061] drm_ioctl+0x208/0x390 [drm]
[ 3225.318075] ? drm_mode_setplane+0x1a0/0x1a0 [drm]
[ 3225.318079] ? lockdep_hardirqs_on+0xf0/0x180
[ 3225.318145] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 3225.318164] do_vfs_ioctl+0x411/0x750
[ 3225.318175] ksys_ioctl+0x5e/0x90
[ 3225.318179] __x64_sys_ioctl+0x16/0x20
[ 3225.318188] do_syscall_64+0x5c/0xb0
[ 3225.318191] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3225.318194] RIP: 0033:0x7fdde5b4007b
[ 3225.318203] Code: 0f 1e fa 48 8b 05 0d 9e 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 9d 0c 00 f7 d8 64 89 01 48
[ 3225.318209] RSP: 002b:00007ffec481a6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3225.318213] RAX: ffffffffffffffda RBX: 00007ffec481a710 RCX: 00007fdde5b4007b
[ 3225.318215] RDX: 00007ffec481a710 RSI: 00000000c01c64a3 RDI: 000000000000000e
[ 3225.318217] RBP: 00000000c01c64a3 R08: 0000000000000080 R09: 0000000000000000
[ 3225.318218] R10: 0000000000000004 R11: 0000000000000246 R12: 00000000000006f1
[ 3225.318220] R13: 000000000000000e R14: 000056201b5b5490 R15: 000056201bbe7820
[ 3225.318225] Modules linked in: macvtap macvlan tap rfcomm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic ledtrig_audio kvm_amd rtwpci snd_hda_codec_hdmi rtw88 kvm snd_hda_intel snd_usb_audio snd_hda_codec mac80211 snd_hda_core snd_usbmidi_lib irqbypass snd_rawmidi uvcvideo snd_hwdep snd_seq videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 crct10dif_pclmul snd_seq_device videobuf2_common btrtl crc32_pclmul eeepc_wmi snd_pcm btbcm btintel asus_wmi xpad snd_timer sparse_keymap
[ 3225.318261] videodev ff_memless bluetooth joydev ghash_clmulni_intel cfg80211 video snd mc k10temp wmi_bmof soundcore ecdh_generic sp5100_tco ecc rfkill ccp i2c_piix4 libarc4 gpio_amdpt gpio_generic acpi_cpufreq binfmt_misc ip_tables hid_logitech_hidpp amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper drm igb crc32c_intel dca i2c_algo_bit hid_logitech_dj nvme nvme_core wmi pinctrl_amd
[ 3225.318283] CR2: 000000000000c9f4
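As a rough cross-check of the oops above, here is a small sketch that relates the faulting instruction to the fault address. It takes the seven bytes starting at the <44> marker in the "Code:" line, reads them as a REX prefix, opcode, and ModRM byte followed by a 32-bit displacement (that is, a load from [RSI + disp32]), and adds the RSI value from the register dump. The byte layout is my reading of the encoding, not the output of a disassembler, and the names FAULT_INSN and disp32 are only illustrative.

```python
import struct

# Seven bytes starting at the <44> marker in the "Code:" line of the oops.
# Assumed layout: REX prefix, opcode, ModRM byte, then a little-endian disp32.
FAULT_INSN = bytes.fromhex("448baef4c90000")

RSI = 0x0000000000000000  # RSI from the register dump
CR2 = 0x000000000000C9F4  # faulting address reported in CR2


def disp32(insn):
    """Return the signed 32-bit displacement that follows the ModRM byte."""
    return struct.unpack_from("<i", insn, 3)[0]


fault_addr = RSI + disp32(FAULT_INSN)
print(f"displacement = {disp32(FAULT_INSN):#x}")
print(f"RSI + displacement = {fault_addr:#x}, matches CR2: {fault_addr == CR2}")
```

If this reading is right, the fault address is simply offset 0xc9f4 from a zero RSI, so it looks like dc_resource_state_copy_construct dereferenced a NULL pointer at a large structure offset rather than a random bad address.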
Whenever I see an "SMP NOPTI" error, I suspect that something has gone wrong in memory management, so I decided to ask for help on the linux-mm mailing list. In any case, for reasons unknown to me, the AMD developers have ignored my report.
Thanks.
--
Best Regards,
Mike Gavrilov.