https://bugs.freedesktop.org/show_bug.cgi?id=92836
Bug ID: 92836
Summary: amdgpu does not resume properly from suspend
Product: DRI
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: David@WalkerStreet.info
My laptop does not resume properly after a suspend. The problem seems to be with the amdgpu kernel module, and the failure is often accompanied by a long stream of errors in dmesg. Here's a sampling:
[ 1494.980561] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0b020504
[ 1494.980561] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 1494.980561] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[ 1494.980561] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[ 1494.995478] systemd-journald[498]: /dev/kmsg buffer overrun, some messages lost.
[ 1494.980561] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0b0a4004
[ 1494.980561] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 1494.980561] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[ 1494.980561] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[ 1494.995486] systemd-journald[498]: /dev/kmsg buffer overrun, some messages lost.
Any ideas? I'm running kernel 4.2.4-1-default on openSUSE Tumbleweed. Here's some hwinfo data:
09: PCI 01.0: 0300 VGA compatible controller (VGA)
  [Created at pci.366]
  Unique ID: vSkL.bMI5Iw7ysWD
  SysFS ID: /devices/pci0000:00/0000:00:01.0
  SysFS BusID: 0000:00:01.0
  Hardware Class: graphics card
  Model: "ATI Carrizo"
  Vendor: pci 0x1002 "ATI Technologies Inc"
  Device: pci 0x9874 "Carrizo"
  SubVendor: pci 0x103c "Hewlett-Packard Company"
  SubDevice: pci 0x80af
  Revision: 0xc5
  Driver: "amdgpu"
  Driver Modules: "drm"
  Memory Range: 0xe0000000-0xefffffff (ro,non-prefetchable)
  Memory Range: 0xf0000000-0xf07fffff (ro,non-prefetchable)
  I/O Ports: 0xf000-0xf0ff (rw)
  Memory Range: 0xff700000-0xff73ffff (rw,non-prefetchable)
  Memory Range: 0xff740000-0xff75ffff (ro,non-prefetchable,disabled)
  IRQ: 47 (129945 events)
  Module Alias: "pci:v00001002d00009874sv0000103Csd000080AFbc03sc00i00"
  Driver Info #0:
    Driver Status: amdgpu is active
    Driver Activation Cmd: "modprobe amdgpu"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
--- Comment #1 from Alex Deucher alexdeucher@gmail.com ---
Can you try kernel 4.3?
--- Comment #2 from Alex Deucher alexdeucher@gmail.com ---
Please attach your xorg log and dmesg output.
--- Comment #3 from David Walker David@WalkerStreet.info ---
Created attachment 119452
  --> https://bugs.freedesktop.org/attachment.cgi?id=119452&action=edit
dmesg output from 4.3.0-1.g7b374a4-default
--- Comment #4 from David Walker David@WalkerStreet.info ---
Created attachment 119453
  --> https://bugs.freedesktop.org/attachment.cgi?id=119453&action=edit
Xorg.0.log from 4.3.0-1.g7b374a4-default
--- Comment #5 from David Walker David@WalkerStreet.info ---
I've attached dmesg and Xorg.0.log files for 4.3.0-1.g7b374a4. It appears that the GPU faults have gone away, but the visual symptom is still the same: the screen is blank after a resume.
You'll also note that there are a *lot* of "xhci_hcd 0000:00:10.0: WARN Successful completion on short TX" messages in the dmesg output. I suspect they're unrelated, but they don't appear under 4.2.3-1.4.
--- Comment #6 from Alex Deucher alexdeucher@gmail.com ---
Does booting with acpi_osi=Linux on the kernel command line help? A lot of new laptops use D3cold to support Windows 10, which Linux in general doesn't support at the moment.
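A misspelled parameter such as `apci_osi=Linux` is not recognized by the kernel, so it can be worth confirming which spelling actually reached the running kernel before concluding that `acpi_osi=Linux` has no effect. The running kernel's command line is exposed in `/proc/cmdline`. Below is a minimal Python sketch, assuming Python 3.9+. The helper names `kernel_params` and `has_kernel_param` are hypothetical and do not belong to any existing tool.

```python
import shlex
from pathlib import Path


def kernel_params(cmdline: str) -> list[str]:
    """Split a kernel command line into tokens, honoring double quotes."""
    try:
        return shlex.split(cmdline)
    except ValueError:
        # Unbalanced quotes: fall back to plain whitespace splitting.
        return cmdline.split()


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` (e.g. "acpi_osi=Linux") appears as a whole token."""
    return param in kernel_params(cmdline)


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        cmdline = proc_cmdline.read_text()
        # Check both the correct spelling and the common typo.
        for candidate in ("acpi_osi=Linux", "apci_osi=Linux"):
            print(f"{candidate}: {has_kernel_param(cmdline, candidate)}")
```

`shlex.split` is used so that a quoted form such as `acpi_osi="Linux"` is normalized to `acpi_osi=Linux` before the comparison.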
--- Comment #7 from David Walker David@WalkerStreet.info ---
apci_osi=Linux doesn't seem to help. I have found that it does recover sometimes, albeit rarely, and more often when running GNOME with Wayland rather than X11, and sometimes only after control-alt-backspace. I haven't done all that much testing, though, so this may all simply be coincidence.
Any other debugging I could do?
--- Comment #8 from David Walker David@WalkerStreet.info ---
Created attachment 119962
  --> https://bugs.freedesktop.org/attachment.cgi?id=119962&action=edit
suspend1.dmesg from 4.3.0-6.g6b3b033-default
--- Comment #9 from David Walker David@WalkerStreet.info ---
Created attachment 119963
  --> https://bugs.freedesktop.org/attachment.cgi?id=119963&action=edit
suspend2.dmesg from 4.3.0-6.g6b3b033-default
--- Comment #10 from David Walker David@WalkerStreet.info ---
Created attachment 119964
  --> https://bugs.freedesktop.org/attachment.cgi?id=119964&action=edit
suspend3.dmesg from 4.3.0-6.g6b3b033-default
--- Comment #11 from David Walker David@WalkerStreet.info ---
Created attachment 119965
  --> https://bugs.freedesktop.org/attachment.cgi?id=119965&action=edit
Xorg.0.log from 4.3.0-6.g6b3b033-default
--- Comment #12 from David Walker David@WalkerStreet.info ---
Over the past couple of Tumbleweed kernel upgrades (most recently kernel-default-4.3.0-6.1.g6b3b033), I've noticed that resumes sometimes succeed, and that if a resume fails, another one or two suspend/resume cycles will result in a successful resume. FYI, I'm also using ucode-amd-20151109git-35.1 and kernel-firmware-20151109git-35.1.
I have attached Xorg.0.log and the following three "dmesg -c" outputs:
suspend1.dmesg - before any suspends
suspend2.dmesg - after two suspends, the first of which failed
suspend3.dmesg - after one suspend that succeeded
--- Comment #13 from Vedran Miletić vedran@miletic.net ---
David, can you try a newer kernel? I get the same errors with Fedora kernel 4.5.4 on
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] [1002:6938] (rev f1)
but the errors are non-fatal, i.e. there is graphics corruption but no hangs. Restarting X removes graphics corruption.
--- Comment #14 from David Walker David@WalkerStreet.info ---
(In reply to Vedran Miletić from comment #13)
> David, can you try a newer kernel? I get the same errors with Fedora kernel 4.5.4 on
>
> 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] [1002:6938] (rev f1)
>
> but the errors are non-fatal, i.e. there is graphics corruption but no hangs. Restarting X removes graphics corruption.
Sorry, Vedran, but I ended up buying another laptop (this time from a company that specializes in Linux laptops), so I no longer have a testbed for this.
Alex Deucher alexdeucher@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID