Hi,
yesterday I updated the kernel from 5.13.7 to 5.14.9 and found it broke suspend-to-RAM. The machine displays a few messages on the text console after resume but hangs when switching to X11.
The hardware: Asus P8H77-V with Intel Core i5-3550 CPU, display connected via DP.
DP1 connected primary 1920x1200+0+0 (normal left inverted right x axis y axis) 520mm x 320mm
   1920x1200     59.95*+
I bisected it and the offending commit is the totally unlikely and innocent looking
commit b3484d2b03e4c940a9598aa841a52d69729c582a
Author: Javier Martinez Canillas <javierm@redhat.com>
Date:   Tue May 25 17:13:13 2021 +0200

    drm/fb-helper: improve DRM fbdev emulation device names
Now I'm running 5.14.9 with this commit reverted and suspend works.
In /var/log/kern.log I found this on suspend:
[ 34.002252][ T3455] ------------[ cut here ]------------
[ 34.002256][ T3455] i915 0000:00:02.0: drm_WARN_ON((intel_de_read(dev_priv, intel_dp->output_reg) & (1 << 31)) == 0)
[ 34.002274][ T3455] WARNING: CPU: 0 PID: 3455 at drivers/gpu/drm/i915/display/g4x_dp.c:431 intel_dp_link_down.isra.0+0x2e7/0x390
[ 34.002285][ T3455] Modules linked in: kvm_intel kvm irqbypass ehci_pci xhci_pci ehci_hcd xhci_hcd
[ 34.002304][ T3455] CPU: 0 PID: 3455 Comm: kworker/u8:27 Not tainted 5.14.9 #29
[ 34.002309][ T3455] Hardware name: System manufacturer System Product Name/P8H77-V, BIOS 1905 10/27/2014
[ 34.002312][ T3455] Workqueue: events_unbound async_run_entry_fn
[ 34.002320][ T3455] RIP: 0010:intel_dp_link_down.isra.0+0x2e7/0x390
[ 34.002326][ T3455] Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 d2 19 05 00 48 c7 c1 e8 53 8c 88 4c 89 e2 48 c7 c7 7b 83 89 88 48 89 c6 e8 d1 d0 52 00 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 7d 08 4c
[ 34.002330][ T3455] RSP: 0018:ffffaa1dc1be3a88 EFLAGS: 00010282
[ 34.002335][ T3455] RAX: 0000000000000000 RBX: ffffa15a45f28000 RCX: 0000000000000000
[ 34.002338][ T3455] RDX: 0000000000000001 RSI: ffffffff8714af2f RDI: ffffffff8714af2f
[ 34.002341][ T3455] RBP: 0000000080180344 R08: 00000007eab15173 R09: 0000000000000001
[ 34.002344][ T3455] R10: 0000000000080000 R11: 0000000000000000 R12: ffffa15a40f87180
[ 34.002347][ T3455] R13: ffffa15a464e0000 R14: ffffa15a593c6000 R15: 0000000000000001
[ 34.002350][ T3455] FS: 0000000000000000(0000) GS:ffffa15d4f800000(0000) knlGS:0000000000000000
[ 34.002354][ T3455] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.002357][ T3455] CR2: 000056000d53cf60 CR3: 000000015fa2a006 CR4: 00000000001706f0
[ 34.002361][ T3455] Call Trace:
[ 34.002373][ T3455] g4x_post_disable_dp+0x2e/0x110
[ 34.002380][ T3455] intel_encoders_post_disable+0x73/0x80
[ 34.002391][ T3455] ilk_crtc_disable+0x96/0x3a0
[ 34.002402][ T3455] intel_old_crtc_state_disables+0x5c/0x110
[ 34.002412][ T3455] intel_atomic_commit_tail+0xdcc/0x1410
[ 34.002434][ T3455] intel_atomic_commit+0x332/0x3b0
[ 34.002443][ T3455] drm_atomic_helper_disable_all+0x175/0x190
[ 34.002453][ T3455] drm_atomic_helper_suspend+0xa6/0x200
[ 34.002474][ T3455] intel_display_suspend+0x23/0x50
[ 34.002480][ T3455] i915_drm_suspend+0x42/0xe0
[ 34.002488][ T3455] pci_pm_suspend+0x74/0x160
[ 34.002496][ T3455] ? pci_pm_freeze+0xb0/0xb0
[ 34.002500][ T3455] dpm_run_callback+0x6f/0x170
[ 34.002512][ T3455] __device_suspend+0x110/0x4b0
[ 34.002521][ T3455] async_suspend+0x1b/0x90
[ 34.002529][ T3455] async_run_entry_fn+0x2e/0x110
[ 34.002535][ T3455] process_one_work+0x2c9/0x550
[ 34.002551][ T3455] worker_thread+0x4f/0x3e0
[ 34.002556][ T3455] ? rescuer_thread+0x340/0x340
[ 34.002563][ T3455] kthread+0x14a/0x170
[ 34.002569][ T3455] ? set_kthread_struct+0x40/0x40
[ 34.002578][ T3455] ret_from_fork+0x22/0x30
[ 34.002600][ T3455] ---[ end trace 2058ff589e8cbd78 ]---
and after resume:
[ 44.513409][ C0] i915 0000:00:02.0: [drm] *ERROR* uncleared pch fifo underrun on pch transcoder A
[ 44.514557][ C0] i915 0000:00:02.0: [drm] *ERROR* PCH transcoder A FIFO underrun
[ 54.997256][ T3520] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:45:pipe A] flip_done timed out
Scratching my head about it I found one hint in /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler:
using_kms() { grep -q -E '(nouveau|drm)fb' /proc/fb; }
So the ABI change in /proc/fb causes the pm-utils scripts to skip the --quirk-no-chvt and apply other quirks, /var/log/pm-suspend.log says:
Running hook /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler suspend suspend:
No quirk database entry for this system, using default.
/usr/lib/pm-utils/sleep.d/98video-quirk-db-handler suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler resume suspend:
Saving last known working quirks: --quirk-vbe-post --quirk-dpms-on --quirk-dpms-suspend --quirk-vbestate-restore --quirk-vbemode-restore --quirk-vga-mode-3
/usr/lib/pm-utils/sleep.d/98video-quirk-db-handler resume suspend: success.
whereas in the normal case it's
Running hook /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler suspend suspend:
Kernel modesetting video driver detected, not using quirks.
/usr/lib/pm-utils/sleep.d/98video-quirk-db-handler suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler resume suspend:
/usr/lib/pm-utils/sleep.d/98video-quirk-db-handler resume suspend: success.
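The detection failure can be reproduced without suspending. The following is only a minimal sketch: it copies the grep test from using_kms() into a hypothetical helper, using_kms_in, that reads a file passed as an argument instead of /proc/fb. The two sample strings ("0 i915drmfb" and "0 i915") are assumed /proc/fb contents before and after the commit, not output captured from this machine.

```shell
#!/bin/sh
# Same pattern as using_kms() in the pm-utils hook, but applied to a file
# given as $1 so it can be tried without touching /proc/fb.
# using_kms_in is a hypothetical name used only for this illustration.
using_kms_in() { grep -q -E '(nouveau|drm)fb' "$1"; }

tmp=$(mktemp -d)
# Assumed sample contents, not taken from the report:
printf '0 i915drmfb\n' > "$tmp/fb.old"   # name still carries the "drmfb" suffix
printf '0 i915\n'      > "$tmp/fb.new"   # suffix dropped by the commit

for f in "$tmp/fb.old" "$tmp/fb.new"; do
    if using_kms_in "$f"; then
        echo "$(basename "$f"): KMS detected, quirks would be skipped"
    else
        echo "$(basename "$f"): KMS not detected, quirks would be applied"
    fi
done
rm -rf "$tmp"
```

With these sample strings the pattern matches only the old-style name, which is consistent with the two pm-suspend.log excerpts above.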
Johannes
On Thu, Oct 07, 2021 at 09:25:58AM +0200, Johannes Stezenbach wrote:
> Scratching my head about it I found one hint in /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler:
> using_kms() { grep -q -E '(nouveau|drm)fb' /proc/fb; }
> So the ABI change in /proc/fb causes the pm-utils scripts to skip the --quirk-no-chvt and apply other quirks, /var/log/pm-suspend.log says:
Nasty. This pm-utils quirk stuff really has no business existing IMO, and so I recommend nuking pm-utils from your system as soon as possible. Back when I still had it on my machines (due to some silly dependency I think), I just created empty override files in /etc/pm/ to permanently disable all the quirks.
But as long as people might be using it I guess we need some kind of revert/fix to put the "drmfb" back into the name. Javier?
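The override approach mentioned above could look roughly like the following. This is a sketch under assumptions, not the exact files used: it assumes that pm-utils lets a file in /etc/pm/sleep.d shadow a hook of the same name in /usr/lib/pm-utils/sleep.d and skips it when it is not executable. ETC_SLEEP_D and disable_hook are hypothetical names; ETC_SLEEP_D defaults to a scratch directory so the sketch is harmless to run, and only the hook named in this thread is touched.

```shell
#!/bin/sh
# Directory holding the override files. On a real system this would be
# /etc/pm/sleep.d (assumption; writing there requires root). It defaults to
# a scratch directory so the sketch can be run safely.
ETC_SLEEP_D=${ETC_SLEEP_D:-$(mktemp -d)}

# Create an empty, non-executable file with the same name as the hook.
# disable_hook is a hypothetical helper for this illustration.
disable_hook() {
    : > "$ETC_SLEEP_D/$1" && chmod a-x "$ETC_SLEEP_D/$1"
}

disable_hook 98video-quirk-db-handler
ls -l "$ETC_SLEEP_D"
```

To try it against the real directory one would run it as root with ETC_SLEEP_D=/etc/pm/sleep.d, after checking the pm-utils documentation shipped with the distribution.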
Hello,
On 10/7/21 14:38, Ville Syrjälä wrote:
[snip]
>> So the ABI change in /proc/fb causes the pm-utils scripts to skip the --quirk-no-chvt and apply other quirks, /var/log/pm-suspend.log says:
> Nasty. This pm-utils quirk stuff really has no business existing IMO, and so I recommend nuking pm-utils from your system as soon as possible. Back when I still had it on my machines (due to some silly dependency I think), I just created empty override files in /etc/pm/ to permanently disable all the quirks.
> But as long as people might be using it I guess we need some kind of revert/fix to put the "drmfb" back into the name. Javier?
Yes, the change was just cosmetic because we had confusing names such as "simpledrmdrmfb". When it was proposed, the agreement was that /proc/fb shouldn't be considered an ABI but now we found that people are using it.
So I agree that it would be better to revert this patch. Johannes, will you post a revert or do you want me to do it?
Best regards,
Hi Javier,
On Thu, Oct 07, 2021 at 03:01:46PM +0200, Javier Martinez Canillas wrote:
> Yes, the change was just cosmetic because we had confusing names such as "simpledrmdrmfb". When it was proposed, the agreement was that /proc/fb shouldn't be considered an ABI but now we found that people are using it.
> So I agree that it would be better to revert this patch. Johannes, will you post a revert or do you want me to do it?
Please do it.
Thanks, Johannes
dri-devel@lists.freedesktop.org