https://bugs.freedesktop.org/show_bug.cgi?id=75127
Priority: medium
Bug ID: 75127
Assignee: dri-devel@lists.freedesktop.org
Summary: Radeon SUMO: atombios stuck executing
Severity: major
Classification: Unclassified
OS: Linux (All)
Reporter: sandy.8925@gmail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: unspecified
Component: DRM/Radeon
Product: DRI
Created attachment 94246 --> https://bugs.freedesktop.org/attachment.cgi?id=94246&action=edit linux_kernel_3.14-rc2_dmesg
I have a laptop with a Radeon HD6520G GPU.
I am running 64-bit Arch Linux with the Linux 3.14-rc2 kernel and Mesa 10.0.3.
During shutdown, suspend and resume, the GPU hangs and the kernel logs these errors:
[drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[drm:atom_execute_table_locked] *ERROR* atombios stuck executing D05E (len 62, WS 0, PS 0) @ 0xD07A
Previously, shutdown worked fine and the laptop also suspended quickly.
Now, however, suspend and shutdown take a long time and the error messages above appear.
--- Comment #1 from Sandeep sandy.8925@gmail.com --- The dmesg output attached above (linux_kernel_3.14-rc2_dmesg) was captured while suspending and resuming the laptop.
--- Comment #2 from Sandeep sandy.8925@gmail.com --- Sorry, I made a mistake: this log is from the Linux 3.13 kernel. I will re-upload the same file with the correct kernel version in the name.
Sandeep sandy.8925@gmail.com changed:
What                          |Removed |Added
----------------------------------------------
Attachment #94246 is obsolete |0       |1
--- Comment #3 from Sandeep sandy.8925@gmail.com --- Created attachment 94248 --> https://bugs.freedesktop.org/attachment.cgi?id=94248&action=edit linux_kernel_3.13.3_dmesg
Alex Deucher agd5f@yahoo.com changed:
What    |Removed                               |Added
------------------------------------------------------------------------
Summary |Radeon SUMO: atombios stuck executing |Radeon SUMO: GPU reset
--- Comment #4 from Alex Deucher agd5f@yahoo.com --- Is this a regression? If so, can you narrow down which component you changed that caused it? The atombios messages look to be a side effect of a GPU reset.
--- Comment #5 from Sandeep sandy.8925@gmail.com --- (In reply to comment #4)
Is this a regression? If so, can you narrow down which component you changed that caused it? The atombios messages look to be a side effect of a GPU reset.
I think it's a regression, since the GPU didn't hang on shutdown, suspend or resume before. It did sometimes hang when switching between VTs and when playing some games fullscreen. I'm not sure exactly when the shutdown, suspend and resume hangs started, but I think it was after installing the Linux 3.13 kernel.
I will try using older kernel versions to see if the problem exists there as well.
--- Comment #6 from Sandeep sandy.8925@gmail.com --- Compiled and installed Linux 3.12.12 kernel. No GPU hang problems occur for suspend and resume. Works fine (other than the fact that the laptop's display connected through LVDS is blank).
I will post the dmesg output soon.
--- Comment #7 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #6)
Compiled and installed Linux 3.12.12 kernel. No GPU hang problems occur for suspend and resume. Works fine (other than the fact that the laptop's display connected through LVDS is blank).
Does disabling dpm help? Boot with radeon.dpm=0 on the kernel command line in grub. If not, can you bisect the kernel with git to find out what commit caused the regression?
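Comment #7 asks for `radeon.dpm=0` on the kernel command line in GRUB (comment #15 later does the same for `radeon.runpm=0`). A minimal sketch of making such a change persistent, assuming GRUB 2 with its defaults in `/etc/default/grub` (as on the reporter's Arch Linux install) and the options kept in `GRUB_CMDLINE_LINUX_DEFAULT`; the helper name `add_kernel_param` is hypothetical. The generated configuration still has to be rebuilt afterwards, for example with `grub-mkconfig -o /boot/grub/grub.cfg`, and the machine rebooted.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter such as radeon.dpm=0 to
# GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file (normally
# /etc/default/grub). Assumes the variable is assigned on a single,
# uncommented line with double quotes, e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet".
add_kernel_param() {
    file=$1
    param=$2

    # First matching line only; commented-out lines start with '#' and are skipped.
    line=$(sed -n '/^GRUB_CMDLINE_LINUX_DEFAULT=/{p;q;}' "$file")
    [ -n "$line" ] || return 1

    # Strip the variable name, the '=' and the surrounding double quotes.
    value=${line#*=\"}
    value=${value%\"}

    # Leave the file alone if the parameter is already present.
    case " $value " in
        *" $param "*) return 0 ;;
    esac

    if [ -n "$value" ]; then new="$value $param"; else new=$param; fi

    # Rewrite only the first matching line, via a temporary file.
    awk -v new="$new" '
        !done && /^GRUB_CMDLINE_LINUX_DEFAULT=/ {
            print "GRUB_CMDLINE_LINUX_DEFAULT=\"" new "\""; done = 1; next
        }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Calling it twice with the same parameter is harmless, so it can be rerun while switching between `radeon.dpm=0` and `radeon.runpm=0` experiments.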
--- Comment #8 from Sandeep sandy.8925@gmail.com --- Created attachment 94979 --> https://bugs.freedesktop.org/attachment.cgi?id=94979&action=edit linux_kernel_3.12.12_dmesg
Suspend, resume and shutdown work fine here
--- Comment #9 from Sandeep sandy.8925@gmail.com --- Unfortunately disabling dpm did not help.
I set radeon.dpm=0 and booted the Linux 3.13.5 kernel, and the same problems still occurred. Will attach dmesg output shortly.
I will try bisecting.
--- Comment #10 from Sandeep sandy.8925@gmail.com --- Created attachment 94980 --> https://bugs.freedesktop.org/attachment.cgi?id=94980&action=edit linux_kernel_3.13.5_dmesg_dpm_disabled
--- Comment #11 from Sandeep sandy.8925@gmail.com --- I've started using git bisect to find the bad commit(s)
I am bisecting between 3.12 and 3.13 (as tagged)
Results so far: 42a2d923cc349583ebf6fdd52a7d35e1c2f7e6bd - good
--- Comment #12 from Sandeep sandy.8925@gmail.com --- I'm still bisecting, should be done after a few more revisions.
--- Comment #13 from Sandeep sandy.8925@gmail.com --- Still have 2-3 more revisions to test.
I suspect it is most likely this commit: 10ebc0bc09344ab6310309169efc73dfe6c23d72
--- Comment #14 from Sandeep sandy.8925@gmail.com --- Confirmed:
commit 10ebc0bc09344ab6310309169efc73dfe6c23d72
is the first bad commit, where the problems start.
--- Comment #15 from Alex Deucher agd5f@yahoo.com --- Please try these patches: http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-3.14&id=9ba... http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-3.14&id=784...
You can also force runpm off by booting with radeon.runpm=0 on the kernel command line in grub.
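After rebooting with `radeon.runpm=0` (or `radeon.dpm=0`), it is worth confirming that the option actually reached the driver before drawing conclusions from a test. A sketch, assuming the radeon driver exposes `runpm` and `dpm` as read-only module parameters under `/sys/module/radeon/parameters/` (where -1 is usually the automatic default); the helper `radeon_cmdline_opts` is hypothetical and only filters a command-line string.

```shell
#!/bin/sh
# Hypothetical helper: print every radeon.* option in a kernel command-line
# string, one per line. Runs in a subshell so 'set -f' (no globbing while
# splitting the string into words) does not leak into the caller.
radeon_cmdline_opts() (
    set -f
    for word in $1; do
        case $word in
            radeon.*) printf '%s\n' "$word" ;;
        esac
    done
)

# What the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    radeon_cmdline_opts "$(cat /proc/cmdline)"
fi

# What the driver actually uses. Assumes the radeon module is loaded and
# exposes these parameters under /sys/module/radeon/parameters/.
for p in runpm dpm; do
    f=/sys/module/radeon/parameters/$p
    if [ -r "$f" ]; then
        printf '%s=%s\n' "$p" "$(cat "$f")"
    fi
done
```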
--- Comment #16 from Sandeep sandy.8925@gmail.com --- Setting radeon.runpm=0 helped. Suspend, resume work correctly now.
Which kernel version should I apply the patches to and test with? Latest git commit (3.14-git), or stable 3.13.x kernel code?
--- Comment #17 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #16)
Setting radeon.runpm=0 helped. Suspend, resume work correctly now.
Which kernel version should I apply the patches to and test with? Latest git commit (3.14-git), or stable 3.13.x kernel code?
They are against 3.14, but they should apply to 3.13 as well.
--- Comment #18 from Sandeep sandy.8925@gmail.com --- Unfortunately, those patches did not help. The GPU hang still occurs (I tested without setting radeon.runpm=0).
I applied the patches against the 3.13.6 kernel.
--- Comment #19 from Sandeep sandy.8925@gmail.com --- The GPU reset still occurs on Linux kernel 3.14 as well.
Alex Deucher agd5f@yahoo.com changed:
What    |Removed                |Added
-----------------------------------------------------------------------
Summary |Radeon SUMO: GPU reset |runpm hang with PowerXpress/hybrid laptop
--- Comment #20 from Alex Deucher agd5f@yahoo.com --- You have a
Alex Deucher agd5f@yahoo.com changed:
What |Removed |Added
------------------------------------
CC   |        |kh3095@yandex.ru
--- Comment #21 from Alex Deucher agd5f@yahoo.com --- *** Bug 77082 has been marked as a duplicate of this bug. ***
Alex Deucher agd5f@yahoo.com changed:
What     |Removed |Added
------------------------------------------------------------------
See Also |        |https://bugzilla.kernel.org/show_bug.cgi?id=72701
--- Comment #22 from Alex Deucher agd5f@yahoo.com --- It seems runpm is not working properly on your system. Booting with radeon.runpm=0 reverts back to the 3.12 behavior (PX dGPUs are not dynamically powered down). Did manually powering on/off the dGPU via debugfs ever work on your system? See the "Forcing the power state of the devices" section of this page: http://nouveau.freedesktop.org/wiki/Optimus/ for how to test that.
--- Comment #23 from Sandeep sandy.8925@gmail.com --- Turning off the dedicated GPU works fine, turning off the GPU doesn't.
The dedicated GPU is a Radeon HD 6650M. The kernel identifies it as a TURKS GPU.
--- Comment #24 from Sandeep sandy.8925@gmail.com --- Oops, typo in last comment. When I turn off the GPU using:
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
and then try to turn on the GPU using:
echo ON > /sys/kernel/debug/vgaswitcheroo/switch
GPU reset messages are printed in the kernel.
For example:
[ 7213.870052] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7213.870055] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing E2F6 (len 2585, WS 4, PS 4) @ 0xE9E0
[ 7213.904826] [drm:radeon_dp_link_train_cr] *ERROR* clock recovery reached max voltage
[ 7213.904827] [drm:radeon_dp_link_train_cr] *ERROR* clock recovery failed
[ 7567.068285] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 7567.068289] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing E2F6 (len 2585, WS 4, PS 4) @ 0xE9E0
[ 7567.103047] [drm:radeon_dp_link_train_cr] *ERROR* clock recovery reached max voltage
[ 7567.103048] [drm:radeon_dp_link_train_cr] *ERROR* clock recovery failed
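When comparing kernels and patches, as this report does repeatedly, a quick count of the error messages quoted above in each saved log can make the differences easier to see. A sketch; the function name `count_hang_signatures` is hypothetical, and the patterns are simply the messages from the original report and comment #24. It counts matching lines, one message per line as `dmesg` prints them.

```shell
#!/bin/sh
# Hypothetical helper: count kernel-log lines (read from stdin) that match
# the error messages quoted in this report, so logs from different kernels
# or patches can be compared at a glance.
count_hang_signatures() {
    awk '
        /atombios stuck in loop/   { loop++ }
        /atombios stuck executing/ { stuck++ }
        /clock recovery failed/    { cr++ }
        END {
            printf "atombios_stuck_in_loop=%d\n", loop
            printf "atombios_stuck_executing=%d\n", stuck
            printf "dp_clock_recovery_failed=%d\n", cr
        }
    '
}

# Example: kernel messages from the current boot.
# dmesg | count_hang_signatures
```

On a systemd-based system such as Arch Linux, the kernel log of the previous boot (for example, one that hung on shutdown) can be fed in with `journalctl -k -b -1 | count_hang_signatures`.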
--- Comment #25 from Alex Deucher agd5f@yahoo.com --- Created attachment 97007 --> https://bugs.freedesktop.org/attachment.cgi?id=97007&action=edit possible fix
Does the attached kernel patch help?
--- Comment #26 from Sandeep sandy.8925@gmail.com --- Unfortunately, the problem still occurs even with the new patches. I applied them against the latest source code of the kernel from git, after this commit: 18a1a7a1d862ae0794a0179473d08a414dd49234
I still get GPU reset messages even on startup.
Alex Deucher agd5f@yahoo.com changed:
What                          |Removed |Added
----------------------------------------------
Attachment #97007 is obsolete |0       |1
--- Comment #27 from Alex Deucher agd5f@yahoo.com --- Created attachment 97099 --> https://bugs.freedesktop.org/attachment.cgi?id=97099&action=edit possible fix
Updated patch.
--- Comment #28 from Sandeep sandy.8925@gmail.com --- No, unfortunately GPU reset still occurs on startup, suspend, resume and shutdown.
The laptop did suspend faster than in earlier attempts, though; maybe the GPU was able to break out of the reset cycle sooner.
Alex Deucher agd5f@yahoo.com changed:
What                          |Removed |Added
----------------------------------------------
Attachment #97099 is obsolete |0       |1
--- Comment #29 from Alex Deucher agd5f@yahoo.com --- Created attachment 97106 --> https://bugs.freedesktop.org/attachment.cgi?id=97106&action=edit possible fix
fix a stupid typo.
--- Comment #30 from kh3095@yandex.ru --- Patch v3 (applied to 3.13.7) doesn't work for me. Again the same messages:
[ 20.528628] pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
[ 20.528807] pciehp 0000:00:03.0:pcie04: Cannot add device at 0000:02:00
--- Comment #31 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #30)
Patch v3 (applied to 3.13.7) doesn't work for me. Again the same messages:
[ 20.528628] pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
[ 20.528807] pciehp 0000:00:03.0:pcie04: Cannot add device at 0000:02:00
Please attach your dmesg output with the patch applied.
--- Comment #32 from kh3095@yandex.ru --- Created attachment 97179 --> https://bugs.freedesktop.org/attachment.cgi?id=97179&action=edit dmesg, linux 3.13.7, patched with v3
Here you are...
Alex Deucher agd5f@yahoo.com changed:
What                          |Removed |Added
----------------------------------------------
Attachment #97106 is obsolete |0       |1
--- Comment #33 from Alex Deucher agd5f@yahoo.com --- Created attachment 97193 --> https://bugs.freedesktop.org/attachment.cgi?id=97193&action=edit possible fix
New patch.
--- Comment #34 from Sandeep sandy.8925@gmail.com --- Even with the latest patch applied (https://bugs.freedesktop.org/attachment.cgi?id=97193) the problem still occurs.
The system does recover from the reset faster than before though - suspends and resumes in a few seconds now, whereas earlier it would take a few tens of seconds to snap out of the reset cycle.
--- Comment #35 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #34)
Even with the latest patch applied (https://bugs.freedesktop.org/attachment.cgi?id=97193) the problem still occurs.
The system does recover from the reset faster than before though - suspends and resumes in a few seconds now, whereas earlier it would take a few tens of seconds to snap out of the reset cycle.
Please attach your dmesg output with the patch applied. It shouldn't try and auto suspend or reset the integrated card at all. Somehow it seems like runtime pm is still getting applied to the integrated card.
https://bugs.freedesktop.org/show_bug.cgi?id=75127
--- Comment #36 from Alex Deucher agd5f@yahoo.com --- oh, wait, that's the dGPU that is resetting, not the integrated chip. Does removing radeon.dpm=1 from your kernel command line in grub help?
--- Comment #37 from Sandeep sandy.8925@gmail.com --- (In reply to comment #36)
oh, wait, that's the dGPU that is resetting, not the integrated chip. Does removing radeon.dpm=1 from your kernel command line in grub help?
I will try that now.
--- Comment #38 from Sandeep sandy.8925@gmail.com --- Results:
Startup (full restart) - no GPU reset
Suspend - GPU reset, but recovers quickly
Resume - GPU reset, and takes a long time to recover
--- Comment #39 from Alex Deucher agd5f@yahoo.com --- Does disabling dpm help (radeon.dpm=0)? If not, any chance you could bisect? Also, please attach your dmesg output with the latest patch applied.
Sandeep sandy.8925@gmail.com changed:
What                          |Removed |Added
----------------------------------------------
Attachment #94248 is obsolete |0       |1
Attachment #94980 is obsolete |0       |1
--- Comment #40 from Sandeep sandy.8925@gmail.com --- Created attachment 97203 --> https://bugs.freedesktop.org/attachment.cgi?id=97203&action=edit dmesg, linux 3.15-git
linux 3.15-git-ce7613db2d + patch v4
radeon module parameters at default settings
--- Comment #41 from Sandeep sandy.8925@gmail.com --- With radeon.dpm=0 and no other module parameters for radeon
Results:
Startup (full restart) - no GPU reset
Suspend - GPU reset, but recovers quickly
Resume - GPU reset, and takes a long time to recover
--- Comment #42 from Sandeep sandy.8925@gmail.com --- What exactly do I need to bisect, i.e. what are the starting and ending commits?
--- Comment #43 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #42)
What exactly do I need to bisect, i.e. what are the starting and ending commits?
git bisect start
git bisect good <commit id or tag>
git bisect bad <commit id or tag>
At this point git will check out the commit halfway between these two. Test it and report back:
git bisect good    # if that commit works
git bisect bad     # if that commit is broken
git will check out the next halfway point. Repeat until it's done. Once you've found the problematic commit:
git bisect reset   # resets your tree back to where you were before you started bisecting
E.g., if it was working in 3.12 and broke in 3.13:
git bisect start
git bisect good v3.12
git bisect bad v3.13
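Git can also drive the good/bad loop itself with `git bisect run <cmd>`, which marks a commit good when the command exits 0, skips it on exit code 125, and marks it bad on any other code from 1 to 127. For a suspend/resume hang like this one each step still needs a reboot on the real laptop, so the sketch below only demonstrates the mechanics on a throwaway repository; the file `feature.txt`, the `BUG` marker and the function name `demo_bisect` are made up for illustration. In the kernel case the start would look like `git bisect start v3.13 v3.12` (bad first, then good).

```shell
#!/bin/sh
# Demonstrates 'git bisect run' on a throwaway repository. Commits 1-3 are
# "good"; commit 4 adds a made-up BUG marker that commits 5-6 keep. The test
# command exits 0 (good) while the marker is absent and 1 (bad) once present.
demo_bisect() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name demo
    git config user.email demo@example.com
    git config commit.gpgsign false

    for i in 1 2 3 4 5 6; do
        echo "version $i" > feature.txt
        if [ "$i" -ge 4 ]; then
            echo BUG >> feature.txt
        fi
        git add feature.txt
        git commit -q -m "commit $i"
    done

    # Same as "git bisect start; git bisect bad HEAD; git bisect good HEAD~5".
    git bisect start HEAD HEAD~5 > /dev/null
    git bisect run sh -c '! grep -q BUG feature.txt' > /dev/null

    # When bisection ends, refs/bisect/bad points at the first bad commit.
    git log -1 --format=%s refs/bisect/bad

    git bisect reset > /dev/null 2>&1
    cd /
    rm -rf "$repo"
)

demo_bisect    # prints: commit 4
```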
--- Comment #44 from Sandeep sandy.8925@gmail.com --- Ok, but what should the good and bad commits for the bisect be?
I had already done a bisection earlier and found that the commit adding and enabling runtime power management was where the problems began.
Sandeep sandy.8925@gmail.com changed:
What                          |Removed |Added
----------------------------------------------
Attachment #94979 is obsolete |0       |1
Attachment #97179 is obsolete |0       |1
Attachment #97203 is obsolete |0       |1
--- Comment #45 from Sandeep sandy.8925@gmail.com --- Created attachment 117623 --> https://bugs.freedesktop.org/attachment.cgi?id=117623&action=edit Partial kernel logs with full atom debug info
Martin Peres martin.peres@free.fr changed:
What       |Removed |Added
------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |MOVED
--- Comment #46 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/446.