https://bugs.freedesktop.org/show_bug.cgi?id=99881
Bug ID: 99881
Summary: Lockup/Freezes on Laptop with switchable graphics
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: matthew@tech3.me
Created attachment 129781 --> https://bugs.freedesktop.org/attachment.cgi?id=129781&action=edit dmesg log
Hi,
I have an HP Pavilion dv6-3111sa laptop (circa 2010) with 2 GPUs:
01:05.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RS880M [Mobility Radeon HD 4225/4250] [1002:9712]
02:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430/5450/5470] [1002:68e0] (rev ff)
I am running Ubuntu 16.04.2 with kernel Ubuntu 4.8.0-36.36~16.04.1-generic 4.8.11.
The screen usually freezes for a fraction of a second and then again a few seconds later. It may do this several times. In addition, the computer usually locks up before/after graphical login requiring a hard shutdown, although it doesn't always lock up. It seems to be preventing the computer from shutting down normally as well.
This appears in dmesg output whenever a freeze occurs:
[ 186.427140] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 186.431201] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 186.431293] radeon 0000:02:00.0: WB enabled
[ 186.431301] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff958c0f4f3c00
[ 186.431306] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff958c0f4f3c0c
[ 186.431703] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffad3d81a1c418
[ 186.447926] [drm] ring test on 0 succeeded in 1 usecs
[ 186.447934] [drm] ring test on 3 succeeded in 2 usecs
[ 186.634582] [drm] ring test on 5 succeeded in 1 usecs
[ 186.634592] [drm] UVD initialized successfully.
[ 186.634648] [drm] ib test on ring 0 succeeded in 0 usecs
[ 186.634686] [drm] ib test on ring 3 succeeded in 0 usecs
[ 186.805724] [drm] ib test on ring 5 succeeded
[ 186.838322] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 186.942052] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
[ 196.033454] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 196.646111] snd_hda_intel 0000:02:00.1: Cannot lock devices!
Adding radeon.runpm=0 to my boot cmdline solves the issues as a workaround.
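To confirm that the workaround is actually in effect on a running system, one option is to look for the parameter on the kernel command line. The following is a minimal Python sketch, assuming the command line can be read from /proc/cmdline; the helper names kernel_param and read_cmdline are made up for illustration and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def kernel_param(cmdline: str, name: str):
    """Return the value of `name=value` in a kernel command line, or None.

    Hypothetical helper: it only splits on whitespace and '=', which is
    enough for simple parameters such as radeon.runpm=0.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # if the parameter is repeated, this sketch keeps the last one
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (assumes a Linux /proc)."""
    return Path(path).read_text()
```

Calling kernel_param(read_cmdline(), "radeon.runpm") should give "0" when the workaround is active and None when the parameter is absent.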
With previous Ubuntu/kernel versions, the main issue was the freezing, which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However, with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops, and the freezes do not occur at regular intervals.
I'm not sure from the dmesg block whether this is a graphics or a sound issue. There are also some ACPI errors in the dmesg log, so maybe it is a firmware problem or faulty hardware? I tried some lower-level debugging previously but couldn't conclude anything.
Thanks for any assistance.
--- Comment #1 from Matthew Fox matthew@tech3.me --- Created attachment 129782 --> https://bugs.freedesktop.org/attachment.cgi?id=129782&action=edit lspci.log
Michel Dänzer michel@daenzer.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #129781|text/x-log                  |text/plain
          mime type|                            |
--- Comment #2 from Michel Dänzer michel@daenzer.net --- It sounds like you have the environment variable DRI_PRIME=1 set for all applications?
Those dmesg messages are normal when the dedicated GPU is powered up, which takes some time. With runpm enabled, it's powered off automatically when nothing uses it for a while.
--- Comment #3 from Matthew Fox matthew@tech3.me --- Hi Michel,
Just a slight correction to my description - I am running Ubuntu Gnome 16.04.2.
This is a fresh install and I have not set that env var anywhere. Where could I check for that?
Does that mean with radeon.runpm=0 the laptop would be using more power & generating more heat?
Thanks
Matthew
--- Comment #4 from Michel Dänzer michel@daenzer.net --- (In reply to Matthew Fox from comment #3)
> This is a fresh install and I have not set that env var anywhere. Where could I check for that?
What does
env | grep DRI_
say?
> Does that mean with radeon.runpm=0 the laptop would be using more power & generating more heat?
Yes (assuming the dedicated GPU is off most of the time with runpm on).
--- Comment #5 from Matthew Fox matthew@tech3.me ---
That printed nothing.
My session with runtime pm enabled (no radeon.runpm=0 in cmdline) had been running for a couple of hours without problem (apart from a bit of freezing at the start). However, just after running that command, some new radeon errors appeared in dmesg that I haven't seen before. I think they were ring test failures. The PC has locked up now anyway so I can only hard shut it down. I was switching ttys with CTRL+ALT at the same time which might have caused it.
--- Comment #6 from Michel Dänzer michel@daenzer.net --- Note that you should run
env | grep DRI_
in an X terminal, not in a console TTY.
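The reason the terminal matters is that each process carries its own environment, so a variable set for the graphical session may not be visible from a console login. As a rough illustration, the Python sketch below filters DRI_* variables out of the NUL-separated data that Linux exposes in /proc/<pid>/environ. The function name dri_variables is hypothetical, and reading another process's environ file is assumed to be permitted (normally only for processes owned by the same user).

```python
def dri_variables(environ_blob: bytes) -> dict:
    """Extract variables named DRI_* from a /proc/<pid>/environ style blob.

    Entries are NUL-separated NAME=value pairs. Unlike `env | grep DRI_`,
    this only matches on the variable name, not on the value.
    """
    found = {}
    for entry in environ_blob.split(b"\0"):
        name, sep, value = entry.partition(b"=")
        if sep and name.startswith(b"DRI_"):
            found[name.decode(errors="replace")] = value.decode(errors="replace")
    return found
```

For the current process, open("/proc/self/environ", "rb").read() is one way to obtain such a blob; an empty result corresponds to the grep command printing nothing.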
--- Comment #7 from Matthew Fox matthew@tech3.me --- (In reply to Michel Dänzer from comment #6)
Same result in both :/
--- Comment #8 from Michel Dänzer michel@daenzer.net --- Please attach the corresponding Xorg log file.
--- Comment #9 from Matthew Fox matthew@tech3.me --- (In reply to Michel Dänzer from comment #8)
> Please attach the corresponding Xorg log file.
Hi,
The only Xorg logs I have are for my new session. They weren't in /var/log/ but
/home/matthew/.local/share/xorg/Xorg.1.log
/var/lib/gdm3/.local/share/xorg/Xorg.0.log
for some reason. They are attached.
Also attached is a dmesg log for my current session.
When I said:
> With previous Ubuntu/kernel versions, the main issue was the freezing, which would happen every seven seconds with the corresponding dmesg block. This would continue ad infinitum, although on rare occasions it would stop after many freezes. However, with my current kernel this pattern doesn't seem to occur - it freezes a few times before the freezing stops, and the freezes do not occur at regular intervals.
- the regular pattern does seem to apply to my current kernel after all. From the current dmesg.log, 'Disabling via vga_switcheroo' happened at 14, 33, 41 and finally 48 seconds (about eight seconds apart, except 14-33):
[ 14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 15.586313] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 15.588655] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 15.588728] radeon 0000:02:00.0: WB enabled
[ 15.588731] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[ 15.588733] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[ 15.589099] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[ 15.605265] [drm] ring test on 0 succeeded in 1 usecs
[ 15.605270] [drm] ring test on 3 succeeded in 2 usecs
[ 15.791907] [drm] ring test on 5 succeeded in 1 usecs
[ 15.791914] [drm] UVD initialized successfully.
[ 15.791956] [drm] ib test on ring 0 succeeded in 0 usecs
[ 15.791986] [drm] ib test on ring 3 succeeded in 0 usecs
[ 16.482332] [drm] ib test on ring 5 succeeded
[ 16.515177] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 16.619344] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[ 33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 33.389563] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[ 34.733597] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 34.735932] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 34.736006] radeon 0000:02:00.0: WB enabled
[ 34.736009] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[ 34.736011] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[ 34.736378] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[ 34.753251] [drm] ring test on 0 succeeded in 1 usecs
[ 34.753256] [drm] ring test on 3 succeeded in 2 usecs
[ 34.939919] [drm] ring test on 5 succeeded in 1 usecs
[ 34.939926] [drm] UVD initialized successfully.
[ 34.939969] [drm] ib test on ring 0 succeeded in 0 usecs
[ 34.940006] [drm] ib test on ring 3 succeeded in 0 usecs
[ 35.617560] [drm] ib test on ring 5 succeeded
[ 35.650390] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 35.753848] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[ 41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 41.325632] snd_hda_intel 0000:02:00.1: Cannot lock devices!
[ 42.665278] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[ 42.667593] [drm] PCIE GART of 512M enabled (table at 0x000000000014C000).
[ 42.667666] radeon 0000:02:00.0: WB enabled
[ 42.667670] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff90180fa71c00
[ 42.667671] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff90180fa71c0c
[ 42.668038] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffbdbf41a1c418
[ 42.684185] [drm] ring test on 0 succeeded in 1 usecs
[ 42.684189] [drm] ring test on 3 succeeded in 2 usecs
[ 42.870780] [drm] ring test on 5 succeeded in 1 usecs
[ 42.870784] [drm] UVD initialized successfully.
[ 42.870821] [drm] ib test on ring 0 succeeded in 0 usecs
[ 42.870850] [drm] ib test on ring 3 succeeded in 0 usecs
[ 43.553259] [drm] ib test on ring 5 succeeded
[ 43.582109] snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
[ 43.685717] snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535

[ 48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 49.261324] snd_hda_intel 0000:02:00.1: Cannot lock devices!
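The spacing between these power-down events can be measured directly from the log rather than read off by eye. The following is a small Python sketch of that arithmetic; the function names power_down_times and intervals are invented for illustration, and the sample text contains only the four 'Disabling via vga_switcheroo' lines quoted above.

```python
import re

# Matches the dmesg timestamp of each "Disabling via vga_switcheroo" message.
EVENT = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+snd_hda_intel \S+ Disabling via vga_switcheroo"
)


def power_down_times(dmesg_text: str):
    """Return the timestamps (in seconds) of all power-down messages."""
    return [float(m.group(1)) for m in EVENT.finditer(dmesg_text)]


def intervals(times):
    """Return the gaps between consecutive timestamps, rounded to 1 ms."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


sample = """\
[ 14.146303] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 33.089549] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 41.025246] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
[ 48.960919] snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
"""

print(intervals(power_down_times(sample)))
```

On the four events above this gives one gap of about 18.9 seconds followed by two gaps of about 7.9 seconds each, which supports the impression of a regular cycle once it gets going.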
--- Comment #10 from Matthew Fox matthew@tech3.me --- Created attachment 129783 --> https://bugs.freedesktop.org/attachment.cgi?id=129783&action=edit dmesg.log 2
--- Comment #11 from Matthew Fox matthew@tech3.me --- Created attachment 129784 --> https://bugs.freedesktop.org/attachment.cgi?id=129784&action=edit Xorg.1.log
--- Comment #12 from Matthew Fox matthew@tech3.me --- Created attachment 129785 --> https://bugs.freedesktop.org/attachment.cgi?id=129785&action=edit Xorg.0.log
Matthew Fox matthew@tech3.me changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #129782|text/x-log                  |text/plain
          mime type|                            |
--- Comment #13 from Michel Dänzer michel@daenzer.net --- Please attach the output of xrandr.
With runpm enabled, if you run xrandr, does the dedicated GPU turn on and the corresponding messages appear in dmesg?
--- Comment #14 from Matthew Fox matthew@tech3.me --- Hi,
It's rare that the PC doesn't lock up with runpm enabled so I've only been able to test this a couple of times.
In the first try, the PC had stabilized (stopped freezing) after a while. I then ran xrandr. Immediately afterwards I ran cat /sys/kernel/debug/vgaswitcheroo/switch, which showed that the discrete GPU had powered up. dmesg showed 1 block of GPU initialization lines. A few seconds later vgaswitcheroo/switch showed the discrete GPU as being off. dmesg also showed 2 or 3 blocks of the GPU initialization. It looked like the GPU was being enabled and disabled repeatedly. The computer then locked up a few seconds later. I don't have any logs for this session.
In the second try, the PC had stabilized. I ran xrandr and vgaswitcheroo/switch had changed from 'DynOff' to 'DynPwr' for the discrete gpu. dmesg showed 1 block of the gpu initialization. The computer locked up a few seconds later. The logs I have were captured straight after xrandr had run so the 'dmesg after' log only shows one of the gpu initialization blocks but I suspect the gpu was being enabled and disabled repeatedly before the PC locked up. I wasn't able to run dmesg again before the lockup to confirm.
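For repeated checks of this kind it can help to parse the switch file instead of reading it by eye. The Python sketch below assumes the usual one-line-per-client layout of /sys/kernel/debug/vgaswitcheroo/switch (index, client type, a '+' marker for the active client, power state, PCI address, separated by colons). The sample text is illustrative for this laptop's PCI addresses and is not taken from the attachments; SwitchClient and parse_switch are hypothetical names.

```python
from typing import NamedTuple


class SwitchClient(NamedTuple):
    index: int
    kind: str     # e.g. "IGD" or "DIS"
    active: bool  # True when the entry is marked with '+'
    power: str    # e.g. "Pwr", "Off", "DynPwr", "DynOff"
    pci: str      # PCI address, which itself contains colons


def parse_switch(text: str):
    """Parse the contents of the vgaswitcheroo switch file (assumed layout)."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split only on the first four colons so the PCI address stays whole.
        index, kind, active, power, pci = line.split(":", 4)
        clients.append(
            SwitchClient(int(index), kind, active == "+", power, pci.strip())
        )
    return clients


# Illustrative content only, assuming the dGPU is runtime-suspended.
sample_switch = "0:IGD:+:Pwr:0000:01:05.0\n1:DIS: :DynOff:0000:02:00.0\n"
for client in parse_switch(sample_switch):
    print(client.kind, client.power, client.pci)
```

Polling the file with such a parser before and after running xrandr would show the transition from 'DynOff' to 'DynPwr' described above. Reading the debugfs file normally requires root.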
--- Comment #15 from Matthew Fox matthew@tech3.me --- Created attachment 129808 --> https://bugs.freedesktop.org/attachment.cgi?id=129808&action=edit xrandr.log
--- Comment #16 from Matthew Fox matthew@tech3.me --- Created attachment 129809 --> https://bugs.freedesktop.org/attachment.cgi?id=129809&action=edit Xorg log before xrandr
--- Comment #17 from Matthew Fox matthew@tech3.me --- Created attachment 129810 --> https://bugs.freedesktop.org/attachment.cgi?id=129810&action=edit Xorg log after xrandr
--- Comment #18 from Matthew Fox matthew@tech3.me --- Created attachment 129811 --> https://bugs.freedesktop.org/attachment.cgi?id=129811&action=edit vgaswitcheroo switch before xrandr
--- Comment #19 from Matthew Fox matthew@tech3.me --- Created attachment 129812 --> https://bugs.freedesktop.org/attachment.cgi?id=129812&action=edit vgaswitcheroo switch after xrandr
--- Comment #20 from Matthew Fox matthew@tech3.me --- Created attachment 129813 --> https://bugs.freedesktop.org/attachment.cgi?id=129813&action=edit dmesg before xrandr
--- Comment #21 from Matthew Fox matthew@tech3.me --- Created attachment 129814 --> https://bugs.freedesktop.org/attachment.cgi?id=129814&action=edit dmesg after xrandr
--- Comment #22 from Michel Dänzer michel@daenzer.net --- I suspect what happens is that some client occasionally asks the X server to probe the connected displays, similar to xrandr. This powers up the dGPU, in order to probe its display connectors. That takes some time, during which the X server freezes.
Assuming you don't need the dGPU display outputs, adding the below to /etc/X11/xorg.conf may serve as a workaround. You can still use the dGPU for applications by setting the environment variable DRI_PRIME=1.
Section "ServerFlags"
    Option "AutoAddGPU" "off"
EndSection

Section "Device"
    Identifier "Device0"
    Option "AccelMethod" "glamor"
    Option "DRI" "3"
EndSection
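With this configuration the dGPU is only used when an application is started with DRI_PRIME=1 in its environment. As a minimal Python sketch of doing that for a single program rather than for the whole session (the helper names prime_env and run_on_dgpu are hypothetical, and the command passed in is whatever application should render on the dGPU):

```python
import os
import subprocess


def prime_env(base=None):
    """Return a copy of the environment with DRI_PRIME=1 added."""
    env = dict(os.environ if base is None else base)
    env["DRI_PRIME"] = "1"
    return env


def run_on_dgpu(argv):
    """Run one command with DRI_PRIME=1; the caller's environment is untouched."""
    return subprocess.run(argv, env=prime_env(), check=False)
```

For example, run_on_dgpu(["glxinfo"]) would start glxinfo on the dGPU, assuming that tool is installed. Because the variable is only set for the child process, the rest of the session keeps using the integrated GPU.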
--- Comment #23 from Matthew Fox matthew@tech3.me --- That workaround doesn't seem to have any effect so I'll run with radeon.runpm=0
Thanks for your help anyway.
--- Comment #24 from Michel Dänzer michel@daenzer.net --- (In reply to Matthew Fox from comment #23)
> That workaround doesn't seem to have any effect [...]
At the very least, it should have visible effects in the Xorg log file and xrandr output. Please attach those with the attempted workaround.
--- Comment #25 from Matthew Fox matthew@tech3.me --- Hi,
/etc/X11/xorg.conf didn't exist so I created it with the contents you specified.
So I'm now running with runpm enabled and the xorg.conf in place.
--- Comment #26 from Matthew Fox matthew@tech3.me --- Created attachment 129854 --> https://bugs.freedesktop.org/attachment.cgi?id=129854&action=edit xrandr.log 2
--- Comment #27 from Matthew Fox matthew@tech3.me --- Created attachment 129855 --> https://bugs.freedesktop.org/attachment.cgi?id=129855&action=edit Xorg.0.log
--- Comment #28 from Matthew Fox matthew@tech3.me --- Created attachment 129856 --> https://bugs.freedesktop.org/attachment.cgi?id=129856&action=edit Xorg.1.log
--- Comment #29 from Matthew Fox matthew@tech3.me --- Created attachment 129857 --> https://bugs.freedesktop.org/attachment.cgi?id=129857&action=edit dmesg
--- Comment #30 from Matthew Fox matthew@tech3.me --- Just to confirm, the freezes and hard lockups still occur, along with the corresponding messages in dmesg, which I also attached.
This may be more sound related but I previously found in the kernel source (file http://lxr.free-electrons.com/source/sound/pci/hda/hda_intel.c?v=4.8):
static int register_vga_switcheroo(struct azx *chip)
{
        struct hda_intel *hda = container_of(chip, struct hda_intel, chip);
        int err;

        if (!hda->use_vga_switcheroo)
                return 0;
        /* FIXME: currently only handling DIS controller
         * is there any machine with two switchable HDMI audio controllers?
         */
        err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
                                                   VGA_SWITCHEROO_DIS);
        if (err < 0)
                return err;
        hda->vga_switcheroo_registered = 1;

        /* register as an optimus hdmi audio power domain */
        vga_switcheroo_init_domain_pm_optimus_hdmi_audio(chip->card->dev,
                                                         &hda->hdmi_pm_domain);
        return 0;
}
In dmesg, these lines always appear along with the gpu init lines:
snd_hda_intel 0000:02:00.1: Enabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: CORB reset timeout#2, CORBRP = 65535
snd_hda_intel 0000:02:00.1: Disabling via vga_switcheroo
snd_hda_intel 0000:02:00.1: Cannot lock devices!
'CORB reset timeout#2, CORBRP = 65535' appears red in dmesg and 'Cannot lock devices!' appears white in dmesg.
0000:02:00.1 is the audio device attached to the discrete GPU (the discrete GPU itself is 02:00.0).
From lspci, there's another audio device:
00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) [1002:4383] (rev 40)
Now, the comment in the function above asks '...is there any machine with two switchable HDMI audio controllers?' - I wonder if that's the case here? If so, it might be what is causing the problems and the associated sound messages in dmesg.
--- Comment #31 from Michel Dänzer michel@daenzer.net --- (In reply to Matthew Fox from comment #30)
> Just to confirm, the freezes and hard lockups still occur, along with the corresponding messages in dmesg, which I also attached.
Weird; the xrandr output and Xorg log file show that the workaround is working as intended, Xorg is no longer using the dGPU; not sure why it's still getting powered on.
I'm not sure about the sound messages, but I'd guess they're a symptom of the dGPU powering on, not its cause. You could check whether radeon.audio=0 on the kernel command line makes any difference though, just in case.
--- Comment #32 from Alex Deucher alexdeucher@gmail.com --- Does your kernel have this patch? http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers...
--- Comment #33 from Matthew Fox matthew@tech3.me --- Hi,
With runpm enabled & radeon.audio=0, the computer locks up requiring a hard shutdown.
With runpm enabled & radeon.audio=0 & xorg.conf workaround, ditto. Sometimes, however, the computer instead locks up for 10 seconds or so, during which caps lock still toggles on/off but pressed keys are not printed on screen. The mouse cursor moves on screen but clicks do not register. After the freeze, the key presses and mouse clicks that were held back are processed.
Alex - yes it does.
Martin Peres martin.peres@free.fr changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED
--- Comment #34 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/774.