On Wed, Sep 04, 2019 at 09:17:16AM +0800, Hillf Danton wrote:
Daniel Vetter daniel@ffwll.ch
It is now 11:01 pm and the "gnome shell stuck" warning has not appeared since 19:17. So it looks like the issue happens only when the computer is locked and the monitor is in power-save mode.
I'd bet on runtime pm or some other power saving feature in amdgpu shutting the interrupt handling down before we've handled all the interrupts. That would then result in a stuck fence.
Do we already know which fence is stuck?
Any light you could shed on how to collect/print that info would be welcome. Say, line:xxx-yyy in path/to/amdgpu/zzz.c
Extend your backtrace warning slightly like
WARN(r, "we're stuck on fence %pS\n", fence->ops);
Also adding Harry and Alex, I'm not really working on amdgpu ... -Daniel
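To make the suggestion above concrete, here is a minimal userspace sketch of the pattern: wait on a fence and, if the wait fails, report which fence implementation it belongs to. In the kernel, WARN() with %pS prints the symbol that the fence->ops pointer resolves to. This mock cannot resolve symbols, so it uses a driver-name callback as a stand-in. Every mock_* name and wait_and_report() are hypothetical and exist only for this illustration; none of this is amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct mock_fence;

/* Hypothetical, cut-down stand-in for a fence ops table. */
struct mock_fence_ops {
	const char *(*get_driver_name)(const struct mock_fence *f);
};

/* Hypothetical, cut-down stand-in for a fence object. */
struct mock_fence {
	const struct mock_fence_ops *ops;
	bool signaled;
};

static const char *mock_amdgpu_driver_name(const struct mock_fence *f)
{
	(void)f;
	return "amdgpu";
}

static const struct mock_fence_ops mock_amdgpu_fence_ops = {
	.get_driver_name = mock_amdgpu_driver_name,
};

/* Returns 0 if the fence has signaled, -1 otherwise (no real sleeping here). */
static int mock_fence_wait(const struct mock_fence *f)
{
	return f->signaled ? 0 : -1;
}

/*
 * Waits on the fence. On failure, writes a warning that identifies the
 * fence implementation into buf; on success, buf is left empty.
 * Returns the result of the wait.
 */
static int wait_and_report(const struct mock_fence *f, char *buf, size_t len)
{
	int r;

	assert(f && f->ops && buf && len > 0);
	buf[0] = '\0';
	r = mock_fence_wait(f);
	if (r)
		snprintf(buf, len, "we're stuck on fence %s",
			 f->ops->get_driver_name(f));
	return r;
}
```

The point of identifying the ops table is to learn which driver or subsystem created the fence that never signals, which narrows down where to look next.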
On Wed, 4 Sep 2019 at 13:37, Daniel Vetter daniel@ffwll.ch wrote:
Extend your backtrace warning slightly like
WARN(r, "we're stuck on fence %pS\n", fence->ops);
Also adding Harry and Alex, I'm not really working on amdgpu ...
[ 3511.998320] ------------[ cut here ]------------
[ 3511.998714] we're stuck on fence amdgpu_fence_ops+0x0/0xffffffffffffc220 [amdgpu]
[ 3511.998991] WARNING: CPU: 10 PID: 1811 at drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:332 amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
[ 3511.999009] Modules linked in: rfcomm fuse xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat edac_mce_amd kvm_amd snd_hda_codec_realtek rtwpci snd_hda_codec_generic kvm ledtrig_audio snd_hda_codec_hdmi uvcvideo rtw88 videobuf2_vmalloc snd_hda_intel videobuf2_memops videobuf2_v4l2 irqbypass snd_usb_audio snd_hda_codec videobuf2_common crct10dif_pclmul snd_usbmidi_lib crc32_pclmul mac80211 snd_rawmidi videodev snd_hda_core ghash_clmulni_intel btusb snd_hwdep btrtl snd_seq btbcm btintel snd_seq_device eeepc_wmi bluetooth xpad joydev mc snd_pcm
[ 3511.999076] asus_wmi ff_memless cfg80211 sparse_keymap video wmi_bmof ecdh_generic snd_timer ecc sp5100_tco k10temp snd i2c_piix4 ccp rfkill soundcore libarc4 gpio_amdpt gpio_generic acpi_cpufreq binfmt_misc ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper drm crc32c_intel igb dca nvme i2c_algo_bit nvme_core wmi pinctrl_amd
[ 3511.999126] CPU: 10 PID: 1811 Comm: Xorg Not tainted 5.3.0-0.rc6.git2.1c.fc32.x86_64 #1
[ 3511.999131] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2703 08/20/2019
[ 3511.999253] RIP: 0010:amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
[ 3511.999278] Code: fe ff ff 31 c0 c3 48 89 ef e8 36 29 04 cb 84 c0 74 08 48 89 ef e8 8a a9 21 cb 48 8b 75 08 48 c7 c7 2c 16 86 c0 e8 82 b8 b9 ca <0f> 0b b8 ea ff ff ff 5d c3 e8 ec 57 c3 ca 84 c0 0f 85 6f ff ff ff
[ 3511.999282] RSP: 0018:ffffb9c04170f798 EFLAGS: 00210282
[ 3511.999288] RAX: 0000000000000000 RBX: ffff8d2ce5205a80 RCX: 0000000000000006
[ 3511.999292] RDX: 0000000000000007 RSI: ffff8d2c5bea4070 RDI: ffff8d2cfb5d9e00
[ 3511.999296] RBP: ffff8d28becae480 R08: 00000331b36fd503 R09: 0000000000000000
[ 3511.999299] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d2ce5200000
[ 3511.999303] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8d2ce1540000
[ 3511.999308] FS: 00007f59a5bc6f00(0000) GS:ffff8d2cfb400000(0000) knlGS:0000000000000000
[ 3511.999311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3511.999315] CR2: 00001108bc475960 CR3: 000000075bf32000 CR4: 00000000003406e0
[ 3511.999319] Call Trace:
[ 3511.999394] amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
[ 3511.999503] dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[ 3511.999609] dce12_update_clocks+0xd8/0x110 [amdgpu]
[ 3511.999712] dc_commit_state+0x414/0x590 [amdgpu]
[ 3511.999725] ? find_held_lock+0x32/0x90
[ 3511.999832] amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
[ 3511.999844] ? reacquire_held_locks+0xed/0x210
[ 3511.999859] ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
[ 3511.999866] ? find_held_lock+0x32/0x90
[ 3511.999872] ? find_held_lock+0x32/0x90
[ 3511.999881] ? __lock_acquire+0x247/0x1910
[ 3511.999893] ? find_held_lock+0x32/0x90
[ 3511.999901] ? mark_held_locks+0x50/0x80
[ 3511.999907] ? _raw_spin_unlock_irq+0x29/0x40
[ 3511.999913] ? lockdep_hardirqs_on+0xf0/0x180
[ 3511.999919] ? _raw_spin_unlock_irq+0x29/0x40
[ 3511.999924] ? wait_for_completion_timeout+0x75/0x190
[ 3511.999952] ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3511.999966] commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3511.999979] drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[ 3512.000002] drm_mode_atomic_ioctl+0x793/0x9b0 [drm]
[ 3512.000014] ? __lock_acquire+0x247/0x1910
[ 3512.000044] ? drm_atomic_set_property+0xa50/0xa50 [drm]
[ 3512.000066] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 3512.000088] drm_ioctl+0x208/0x390 [drm]
[ 3512.000108] ? drm_atomic_set_property+0xa50/0xa50 [drm]
[ 3512.000120] ? lockdep_hardirqs_on+0xf0/0x180
[ 3512.000205] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 3512.000216] do_vfs_ioctl+0x411/0x750
[ 3512.000229] ksys_ioctl+0x5e/0x90
[ 3512.000237] __x64_sys_ioctl+0x16/0x20
[ 3512.000242] do_syscall_64+0x5c/0xb0
[ 3512.000249] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3512.000254] RIP: 0033:0x7f59a603d00b
[ 3512.000259] Code: 0f 1e fa 48 8b 05 7d 9e 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4d 9e 0c 00 f7 d8 64 89 01 48
[ 3512.000263] RSP: 002b:00007ffc493bcc08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3512.000267] RAX: ffffffffffffffda RBX: 00007ffc493bcc50 RCX: 00007f59a603d00b
[ 3512.000271] RDX: 00007ffc493bcc50 RSI: 00000000c03864bc RDI: 000000000000000e
[ 3512.000275] RBP: 00000000c03864bc R08: 000055aa62e41d00 R09: 0000000000000001
[ 3512.000278] R10: 0000000000000001 R11: 0000000000000246 R12: 000055aa61a99d00
[ 3512.000282] R13: 000000000000000e R14: 000055aa628f7430 R15: 000055aa62e34540
[ 3512.000297] irq event stamp: 258283232
[ 3512.000303] hardirqs last enabled at (258283231): [<ffffffff8b170beb>] console_unlock+0x46b/0x5d0
[ 3512.000309] hardirqs last disabled at (258283232): [<ffffffff8b0038da>] trace_hardirqs_off_thunk+0x1a/0x20
[ 3512.000314] softirqs last enabled at (258282448): [<ffffffff8be0035d>] __do_softirq+0x35d/0x45d
[ 3512.000319] softirqs last disabled at (258282413): [<ffffffff8b0f1e57>] irq_exit+0xf7/0x100
[ 3512.000323] ---[ end trace 55ed0c80b95aef99 ]---
On Thu, Sep 5, 2019 at 12:27 AM Mikhail Gavrilov mikhail.v.gavrilov@gmail.com wrote:
On Wed, 4 Sep 2019 at 13:37, Daniel Vetter daniel@ffwll.ch wrote:
Extend your backtrace warning slightly like
WARN(r, "we're stuck on fence %pS\n", fence->ops);
Also adding Harry and Alex, I'm not really working on amdgpu ...
[ 3511.998320] ------------[ cut here ]------------
[ 3511.998714] we're stuck on fence amdgpu_fence_ops+0x0/0xffffffffffffc220 [amdgpu]
I think those fences are only emitted for CS, not display related. Adding Christian König. -Daniel
On Thu, 5 Sep 2019 at 12:58, Daniel Vetter daniel@ffwll.ch wrote:
I think those fences are only emitted for CS, not display related. Adding Christian König.
A fresher kernel log with 5.3-rc7: the issue still happens. https://pastebin.com/tyxkWJYV
-- Best Regards, Mike Gavrilov.
-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
I agree with Daniel's analysis.
It looks like the problem is simply that PM turns off a block before all work is done on that block.
Have you opened a bug report yet? If not, that would certainly help, because it is really hard to extract all the necessary information from this mail thread.
Regards, Christian.
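The ordering problem described above can be sketched with a toy model. This is a purely illustrative userspace mock, assuming a hardware block that only completes work while it is powered. The struct and all mock_* functions are hypothetical and do not correspond to amdgpu's power-management code. The sketch contrasts powering off without checking for outstanding work against draining first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hardware block; not an amdgpu structure. */
struct mock_block {
	bool powered;           /* block is powered and can make progress */
	unsigned int pending;   /* jobs submitted but not yet completed */
	unsigned int completed; /* jobs whose completion was observed */
};

/* Completes one pending job. Makes progress only while the block is powered. */
static bool mock_block_step(struct mock_block *b)
{
	assert(b);
	if (!b->powered || b->pending == 0)
		return false;
	b->pending--;
	b->completed++;
	return true;
}

/* Problematic ordering: power off without looking at pending work. */
static void mock_power_off_unchecked(struct mock_block *b)
{
	assert(b);
	b->powered = false;
}

/*
 * Safer ordering: drain all pending work, then power off.
 * Returns 0 on success, -1 if work is still pending and cannot complete.
 */
static int mock_power_off_drained(struct mock_block *b)
{
	assert(b);
	while (mock_block_step(b))
		;
	if (b->pending)
		return -1;
	b->powered = false;
	return 0;
}
```

With the unchecked variant, any job still pending at power-off can never complete, which is the analogue of a fence that never signals. With the drained variant, the block is only turned off once nothing is outstanding.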
On 08.09.19 at 23:24, Mikhail Gavrilov wrote:
On Thu, 5 Sep 2019 at 12:58, Daniel Vetter daniel@ffwll.ch wrote:
I think those fences are only emitted for CS, not display related. Adding Christian König.
A fresher kernel log with 5.3-rc7: the issue still happens. https://pastebin.com/tyxkWJYV
-- Best Regards, Mike Gavrilov.
On Mon, 9 Sep 2019 at 14:15, Koenig, Christian Christian.Koenig@amd.com wrote:
I agree with Daniel's analysis.
It looks like the problem is simply that PM turns off a block before all work is done on that block.
Have you opened a bug report yet? If not, that would certainly help, because it is really hard to extract all the necessary information from this mail thread.
https://bugs.freedesktop.org/show_bug.cgi?id=111689 Will this do?
-- Best Regards, Mike Gavrilov.
dri-devel@lists.freedesktop.org