[Bug 199959] New: amdgpu, regression?: system freezes after resume

bugzilla-daemon＠bugzilla.kernel.org

7 Jun 2018

12:45 a.m.

https://bugzilla.kernel.org/show_bug.cgi?id=199959

Bug ID: 199959
Summary: amdgpu, regression?: system freezes after resume
Product: Drivers
Version: 2.5
Kernel Version: 4.17
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: mezin.alexander@gmail.com
Regression: No

Created attachment 276359 --> https://bugzilla.kernel.org/attachment.cgi?id=276359&action=edit failed resume - journalctl output

After suspend and resume, I see the lock screen, but the mouse cursor doesn't move and pressing keys doesn't seem to change anything (I can't perform a VT switch either).

Sapphire Radeon RX 580 Pulse 8 GB
Two displays connected through DisplayPort: Dell P2415Q and LG 27UD69P
Cinnamon desktop (Xorg)
Arch Linux

Happens on kernels 4.16.13 and 4.17 (even with amdgpu.dc=0). Doesn't happen with kernel 4.14.48 (or earlier 4.14.* kernels).

-- You are receiving this mail because: You are watching the assignee of the bug.

bugzilla-daemon＠bugzilla.kernel.org

7 Jun

3:03 a.m.

New subject: [Bug 199959] amdgpu, regression?: system freezes after resume

--- Comment #1 from Alexander Mezin (mezin.alexander@gmail.com) --- Just tested 4.15.15 and 4.16. On 4.15.15, suspend and resume work fine; on 4.16, the system freezes even with amdgpu.dc=0.

Note that Arch has the following in the kernel config for both 4.15 and 4.16:

CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_PRE_VEGA=y
# CONFIG_DRM_AMD_DC_FBC is not set
CONFIG_DRM_AMD_DC_DCN1_0=y

bugzilla-daemon＠bugzilla.kernel.org

3:04 a.m.

--- Comment #2 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276363 --> https://bugzilla.kernel.org/attachment.cgi?id=276363&action=edit Failed resume - 4.16, amdgpu.dc=0

bugzilla-daemon＠bugzilla.kernel.org

3:05 a.m.

Alexander Mezin (mezin.alexander@gmail.com) changed:

What       |Removed |Added
----------------------------------------------------------------------------
Regression |No      |Yes

bugzilla-daemon＠bugzilla.kernel.org

3:26 a.m.

Alexander Mezin (mezin.alexander@gmail.com) changed:

What           |Removed |Added
----------------------------------------------------------------------------
Kernel Version |4.17    |4.16

bugzilla-daemon＠bugzilla.kernel.org

7:54 a.m.

Christian König (christian.koenig@amd.com) changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |christian.koenig@amd.com

--- Comment #3 from Christian König (christian.koenig@amd.com) --- Standard question: Can you bisect?

The logs don't show anything suspicious, so without a bisect it is probably really hard to guess what this could be.

bugzilla-daemon＠bugzilla.kernel.org

8 Jun

4:13 a.m.

--- Comment #4 from Alexander Mezin (mezin.alexander@gmail.com) --- Commit d6895ad39f3b396be199f5b6fdfb8cde4be7bbf7 seems to be the cause. Resume works on 4.16 if I revert that single commit (tested on 4.16.0, 4.16.13, with both amdgpu.dc=0 and amdgpu.dc=1).

bugzilla-daemon＠bugzilla.kernel.org

7:53 a.m.

--- Comment #5 from Christian König (christian.koenig@amd.com) --- Ok, well that is interesting.

Please provide the output of "sudo cat /proc/iomem" and "lspci -t -v -nn".

In the meantime I will try to reproduce the issue here.

bugzilla-daemon＠bugzilla.kernel.org

11:13 a.m.

--- Comment #6 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276391 --> https://bugzilla.kernel.org/attachment.cgi?id=276391&action=edit lspci -t -v -nn

bugzilla-daemon＠bugzilla.kernel.org

11:14 a.m.

--- Comment #7 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276393 --> https://bugzilla.kernel.org/attachment.cgi?id=276393&action=edit /proc/iomem

bugzilla-daemon＠bugzilla.kernel.org

12:02 p.m.

--- Comment #8 from Christian König (christian.koenig@amd.com) --- Mhm, I've tried the same ASIC (Polaris 10, 8 GB) in an AMD Threadripper system, and here suspend/resume works fine.

So the only explanation I have is that this is some strange issue with PCI BAR resizing and Intel hardware.

Is the system completely unresponsive after resume, or can you at least ping it over the network?

bugzilla-daemon＠bugzilla.kernel.org

9 Jun

12:36 a.m.

--- Comment #9 from Alexander Mezin (mezin.alexander@gmail.com) --- It seems that only the GPU is hung; I can even SSH into the machine. But things like restarting gdm/Xorg or unplugging the monitor didn't "fix" it, and "shutdown -h now" didn't work.

bugzilla-daemon＠bugzilla.kernel.org

3:31 a.m.

--- Comment #10 from Alexander Mezin (mezin.alexander@gmail.com) --- Actually, sometimes the mouse pointer moves and only freezes after I press a few keys or click a few times. Also, sometimes there is just a colored pattern in the background instead of the lock screen. With GNOME on Wayland it takes a bit longer to break: after resume I see the desktop, but after a few clicks/key presses I see artifacts, and eventually everything freezes.

And just in case:
- The problem also occurs with only one monitor connected.
- On Windows on the same machine, suspend and resume work without any problems.

bugzilla-daemon＠bugzilla.kernel.org

5:18 a.m.

--- Comment #11 from Alexander Mezin (mezin.alexander@gmail.com) --- I literally have no idea what I'm doing, but adding 'amdgpu_device_resize_fb_bar(adev);' line to all 'gmc_v?_?_resume()' (because I don't know which version is used for my card) "fixed" it somehow. Resume works, but there are some artifacts on screen during resume (they flash only once and then disappear). Before 'amdgpu_device_resize_fb_bar' was introduced, there were no artifacts at all.

bugzilla-daemon＠bugzilla.kernel.org

5:47 a.m.

--- Comment #12 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276415 --> https://bugzilla.kernel.org/attachment.cgi?id=276415&action=edit dmesg: resume with device_resize_fb_bar() in gmc_v?_?_resume()

bugzilla-daemon＠bugzilla.kernel.org

9:37 a.m.

--- Comment #13 from Christian König (christian.koenig@amd.com) --- (In reply to Alexander Mezin from comment #11)

...

> I literally have no idea what I'm doing, but adding 'amdgpu_device_resize_fb_bar(adev);' line to all 'gmc_v?_?_resume()' (because I don't know which version is used for my card) "fixed" it somehow. Resume works, but there are some artifacts on screen during resume (they flash only once and then disappear). Before 'amdgpu_device_resize_fb_bar' was introduced, there were no artifacts at all.

Hehe, yeah, that was a really nice test, and it confirms my suspicion about what's going wrong here.

Because you resized the BAR once more after resume, the resources in the address space are freed up and allocated again:

[ 212.484672] amdgpu 0000:65:00.0: BAR 2: releasing [mem 0xe200000000-0xe2001fffff 64bit pref]
[ 212.484673] amdgpu 0000:65:00.0: BAR 0: releasing [mem 0xe000000000-0xe1ffffffff 64bit pref]
[ 212.484683] pcieport 0000:64:00.0: BAR 15: releasing [mem 0xe000000000-0xe2ffffffff 64bit pref]

[ 212.484691] pcieport 0000:64:00.0: BAR 15: assigned [mem 0xe000000000-0xe2ffffffff 64bit pref]
[ 212.484692] amdgpu 0000:65:00.0: BAR 0: assigned [mem 0xe000000000-0xe1ffffffff 64bit pref]
[ 212.484697] amdgpu 0000:65:00.0: BAR 2: assigned [mem 0xe200000000-0xe2001fffff 64bit pref]

Since it allocates exactly the same addresses that were freed up before, the real issue is not the addresses themselves but the fact that the hardware config isn't saved during suspend/resume.

That strongly looks like a bug in the BIOS and/or the Linux PCI subsystem driver for Intel hardware to me.

I will try to narrow this down with a few patches on Monday, but don't expect any quick fix.
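The ranges in the log above are printed as inclusive [start-end] pairs, so a range's size is end - start + 1. The minimal C sketch below (the helper names are hypothetical, not kernel API) applies that to the logged values: BAR 0 spans 8 GiB, BAR 2 spans 2 MiB, and the bridge window on 64:00.0 (BAR 15) spans 12 GiB and contains both device BARs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: size of an inclusive [start, end] memory range,
 * as printed in "[mem 0xSTART-0xEND ...]" kernel log lines. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}

/* Hypothetical helper: nonzero if [start, end] lies entirely inside
 * the window [win_start, win_end]. */
static int range_within(uint64_t start, uint64_t end,
                        uint64_t win_start, uint64_t win_end)
{
    return start >= win_start && end <= win_end;
}

/* Print the sizes (in MiB) of the three ranges from the resume log. */
void print_resume_ranges(void)
{
    printf("amdgpu BAR 0:   %llu MiB\n",
           (unsigned long long)(range_size(0xe000000000ULL, 0xe1ffffffffULL) >> 20));
    printf("amdgpu BAR 2:   %llu MiB\n",
           (unsigned long long)(range_size(0xe200000000ULL, 0xe2001fffffULL) >> 20));
    printf("bridge BAR 15:  %llu MiB\n",
           (unsigned long long)(range_size(0xe000000000ULL, 0xe2ffffffffULL) >> 20));
}
```

Because the same ranges are released and reassigned, nothing in the address layout changes; the sizes only confirm that the window still has room for the full 8 GiB BAR.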

bugzilla-daemon＠bugzilla.kernel.org

11 Jun

2:22 p.m.

--- Comment #14 from Christian König (christian.koenig@amd.com) --- Created attachment 276471 --> https://bugzilla.kernel.org/attachment.cgi?id=276471&action=edit Testing patch

Please test if this patch helps as well.

It limits the work done during resume to reprogramming BAR 0 & 2 and not the bridge.

bugzilla-daemon＠bugzilla.kernel.org

12 Jun

12:40 a.m.

--- Comment #15 from Alexander Mezin (mezin.alexander@gmail.com) --- No, it doesn't change anything; the system freezes on resume.

bugzilla-daemon＠bugzilla.kernel.org

8:12 a.m.

--- Comment #16 from Christian König (christian.koenig@amd.com) --- So the problem seems to be the bridge then.

Please provide me with the output of the following commands, once before you suspended, once after you resumed without any change and once after you resumed with your hack to resize the BAR once more:

sudo setpci -s 64:00.0 COMMAND PREF_MEMORY_BASE PREF_MEMORY_LIMIT PREF_BASE_UPPER32 PREF_LIMIT_UPPER32
sudo lspci -s 64:00.0 -vvvv

bugzilla-daemon＠bugzilla.kernel.org

13 Jun

2:38 a.m.

--- Comment #17 from Alexander Mezin (mezin.alexander@gmail.com) --- setpci - exactly the same output in all 3 cases (verified with 'diff' to be sure): 0407 0001 fff1 000000e0 000000e2
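The setpci values can be decoded by hand. Assuming the standard PCI-to-PCI bridge register layout (bits 15:4 of the 16-bit prefetchable base/limit registers carry address bits 31:20, the low nibble of the base flags 64-bit support, the limit's low 20 address bits are implicitly all ones, and the two UPPER32 registers supply bits 63:32), the sketch below reconstructs the bridge's prefetchable window. The struct and function names are hypothetical. With the values above it yields 0xe000000000-0xe2ffffffff, the same range as the BAR 15 window in the log from comment #13, so the bridge registers themselves look unchanged across resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a bridge's prefetchable memory window. */
struct pref_window {
    uint64_t base;   /* first address forwarded by the bridge */
    uint64_t limit;  /* last address forwarded (inclusive) */
    bool is_64bit;   /* low nibble of the base register == 1 */
};

static struct pref_window decode_pref_window(uint16_t pref_base,
                                             uint16_t pref_limit,
                                             uint32_t base_upper32,
                                             uint32_t limit_upper32)
{
    struct pref_window w;

    /* Bits 15:4 of each 16-bit register hold address bits 31:20. */
    w.base = ((uint64_t)base_upper32 << 32) |
             ((uint64_t)(pref_base & 0xfff0) << 16);

    /* The window is 1 MiB granular: the limit's low 20 bits are all ones. */
    w.limit = ((uint64_t)limit_upper32 << 32) |
              ((uint64_t)(pref_limit & 0xfff0) << 16) | 0xfffffULL;

    w.is_64bit = (pref_base & 0xf) == 0x1;
    return w;
}
```

For example, decode_pref_window(0x0001, 0xfff1, 0x000000e0, 0x000000e2) corresponds to the output of comment #17 (the leading 0407 is the COMMAND register and is not part of the window).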

bugzilla-daemon＠bugzilla.kernel.org

2:39 a.m.

--- Comment #18 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276517 --> https://bugzilla.kernel.org/attachment.cgi?id=276517&action=edit lspci before suspend

bugzilla-daemon＠bugzilla.kernel.org

2:39 a.m.

--- Comment #19 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276519 --> https://bugzilla.kernel.org/attachment.cgi?id=276519&action=edit lspci after resume, no hack

bugzilla-daemon＠bugzilla.kernel.org

2:39 a.m.

--- Comment #20 from Alexander Mezin (mezin.alexander@gmail.com) --- Created attachment 276521 --> https://bugzilla.kernel.org/attachment.cgi?id=276521&action=edit lspci after resume with hack

bugzilla-daemon＠bugzilla.kernel.org

3:42 a.m.

--- Comment #21 from Alexander Mezin (mezin.alexander@gmail.com) --- Not sure if it'll help, but I've added more logging here:

--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -436,6 +436,8 @@ int pci_resize_resource(struct pci_dev *dev, int resno, int size)
 	if (ret)
 		return ret;
 
+	pci_info(dev, "BAR %d: resized from %d to %d", resno, old, size);
+
 	res->end = res->start + pci_rebar_size_to_bytes(size) - 1;
 
 	/* Check if the new config works by trying to assign everything. */

And suspend/resume with the "re-resize" hack shows this:

amdgpu 0000:65:00.0: BAR 0: resized from 8 to 13

(this message appears in dmesg two times, first one on boot, second one during resume, exactly the same message in both cases)
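The numbers in "resized from 8 to 13" are encoded sizes, not byte counts. Assuming the encoding implemented by the kernel's pci_rebar_size_to_bytes() used in the diff above (encoded size n means 2^n MB, i.e. 1 << (n + 20) bytes), 8 corresponds to 256 MB and 13 to 8 GB, which matches the 8 GB BAR 0 range in the earlier logs. The helpers below are hypothetical re-implementations for illustration, not the kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the Resizable BAR size encoding:
 * encoded size n means 2^n MB, i.e. 1 << (n + 20) bytes. */
static uint64_t rebar_size_to_bytes(int size)
{
    assert(size >= 0 && size <= 43);  /* keep the shift below 64 bits */
    return 1ULL << (size + 20);
}

/* Inverse mapping: the encoded size for an exact power-of-two BAR size,
 * or -1 if the byte count has no encoding. */
static int rebar_bytes_to_size(uint64_t bytes)
{
    for (int n = 0; n <= 43; n++) {
        if (rebar_size_to_bytes(n) == bytes)
            return n;
    }
    return -1;
}
```

Feeding the BAR 0 range from the log (0xe000000000-0xe1ffffffff, i.e. 8 GiB) into rebar_bytes_to_size() gives 13, the same value the extra pci_info() line prints at boot and after resume.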

bugzilla-daemon＠bugzilla.kernel.org

14 Jun

8:38 a.m.

--- Comment #22 from Christian König (christian.koenig@amd.com) --- Your debugging efforts are better than mine.

Please provide the output of "sudo setpci -s 65:00.0 ECAP15.l ECAP15+4.l ECAP15+8.l" once before suspend and once after suspend without any changes (e.g. when the problem happens).

bugzilla-daemon＠bugzilla.kernel.org

9:55 a.m.

--- Comment #23 from Alexander Mezin (mezin.alexander@gmail.com) --- (In reply to Christian König from comment #22)

...

> Your debugging efforts are better than mine.
>
> Please provide the output of "sudo setpci -s 65:00.0 ECAP15.l ECAP15+4.l ECAP15+8.l" once before suspend and once after suspend without any changes (e.g. when the problem happens).

before suspend: 27010015 0003f000 00000d20

after resume: 27010015 0003f000 00000820
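A rough decoding of these three dwords, assuming the PCIe Resizable BAR extended capability layout: capability ID 0x15 in the low 16 bits of the header; a bitmap of supported sizes in bits 23:4 of the capability register (bit n + 4 set means 2^n MB is supported); and, in the control register, the BAR index in bits 2:0, the number of resizable BARs in bits 7:5 and the encoded current size in bits 12:8. Under that assumption only the encoded size changes: 13 (8 GB) before suspend versus 8 (256 MB) after resume, while 256 MB through 8 GB are all advertised as supported. That would fit a device that came back from suspend at its smaller BAR size while the address space stayed laid out for 8 GB. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded Resizable BAR control register (ECAP15+8). */
struct rebar_ctrl {
    unsigned bar_idx;   /* bits 2:0  - BAR this control register governs */
    unsigned nbars;     /* bits 7:5  - number of resizable BARs */
    unsigned size_enc;  /* bits 12:8 - current size, encoded as 2^n MB */
};

static struct rebar_ctrl decode_rebar_ctrl(uint32_t ctrl)
{
    struct rebar_ctrl c;
    c.bar_idx = ctrl & 0x7;
    c.nbars = (ctrl >> 5) & 0x7;
    c.size_enc = (ctrl >> 8) & 0x1f;
    return c;
}

/* Capability register (ECAP15+4): bit (n + 4) set means 2^n MB is supported. */
static bool rebar_size_supported(uint32_t cap, unsigned n)
{
    return n < 20 && ((cap >> (n + 4)) & 1u);
}

/* Extended capability header (ECAP15+0): capability ID in bits 15:0. */
static unsigned ecap_id(uint32_t header)
{
    return header & 0xffff;
}
```

Applied to the values above, decode_rebar_ctrl(0x00000d20) and decode_rebar_ctrl(0x00000820) differ only in size_enc, and ecap_id(0x27010015) confirms the dword really is the Resizable BAR capability.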

bugzilla-daemon＠bugzilla.kernel.org

10:02 a.m.

--- Comment #24 from Christian König (christian.koenig@amd.com) --- Created attachment 276547 --> https://bugzilla.kernel.org/attachment.cgi?id=276547&action=edit Possible fix

In this case please try the attached patch and see if it helps.

bugzilla-daemon＠bugzilla.kernel.org

10:17 a.m.

--- Comment #25 from Alexander Mezin (mezin.alexander@gmail.com) --- Yes, it works

dmesg: [ 34.330683] amdgpu 0000:65:00.0: Test 0 from 8 to 13

bugzilla-daemon＠bugzilla.kernel.org

19 Jun

2:13 p.m.

Joern Hoffmann (j.hoffmann@quapona.com) changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |j.hoffmann@quapona.com

--- Comment #26 from Joern Hoffmann (j.hoffmann@quapona.com) --- For me, it works too.

dmesg | grep amdgpu:

[ 3.437098] [drm] amdgpu kernel modesetting enabled.
[ 3.442103] fb: switching to amdgpudrmfb from EFI VGA
[ 3.442234] amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
[ 3.443795] amdgpu 0000:01:00.0: BAR 2: releasing [mem 0xd0000000-0xd01fffff 64bit pref]
[ 3.443797] amdgpu 0000:01:00.0: BAR 0: releasing [mem 0xc0000000-0xcfffffff 64bit pref]
[ 3.443822] amdgpu 0000:01:00.0: BAR 0: assigned [mem 0x2200000000-0x23ffffffff 64bit pref]
[ 3.443827] amdgpu 0000:01:00.0: BAR 2: assigned [mem 0x2100000000-0x21001fffff 64bit pref]
[ 3.443849] amdgpu 0000:01:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 3.443850] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 3.443917] [drm] amdgpu: 8192M of VRAM memory ready
[ 3.443918] [drm] amdgpu: 8192M of GTT memory ready.
[ 4.239650] fbcon: amdgpudrmfb (fb0) is primary device
[ 4.323338] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[ 4.340440] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:01:00.0 on minor 0
[ 10.704309] amdgpu 0000:01:00.0: 00000000a78be373 unpin not necessary
[ 10.704310] amdgpu 0000:01:00.0: 00000000a78be373 unpin not necessary
[ 10.704310] amdgpu 0000:01:00.0: 000000006047af5e unpin not necessary
[ 10.704311] amdgpu 0000:01:00.0: 000000002d9a27ec unpin not necessary
[ 11.443673] amdgpu 0000:01:00.0: Test 0 from 8 to 13

bugzilla-daemon＠bugzilla.kernel.org

30 Jun

1:46 a.m.

--- Comment #27 from Alexander Mezin (mezin.alexander@gmail.com) --- So the patch will only land in 4.19. Are you going to fix the regression (in amdgpu) for 4.15-4.18 somehow?

bugzilla-daemon＠bugzilla.kernel.org

26 Aug

12:34 a.m.

Aleksandr Mezin (mezin.alexander@gmail.com) changed:

--- Comment #28 from Aleksandr Mezin (mezin.alexander@gmail.com) --- Seems to be fixed in 4.18.5 by backport


dri-devel@lists.freedesktop.org
