https://bugs.freedesktop.org/show_bug.cgi?id=99907
Bug ID: 99907
Summary: linux-firmware 2017-02-17 update causes varying breaks in AMDGPU for recent cards
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: saunders.52@wright.edu
Currently on Arch Linux, after an update to the linux-firmware package (20170217.12987ca-1) shipped, there have been reports of various issues, ranging from power management failing to, in my case (AMD Radeon RX 460), Xorg failing to work at all: it either blinks and goes back to a frozen VT as the GPU hangs, or the GPU hangs on a full-screen corruption of some kind. This is broken on both kernel 4.9.11 and 4.10 in my testing, on Xorg 1.19.1. The system still responds to SSH connections, but fails to shut down properly when a shutdown is attempted over SSH. Tracker link: https://bugs.archlinux.org/task/53042
I've traced it back to a specific commit to linux-firmware, 7a110b85a46d7f884f4ac712ff52e02ed57234bd, https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/com..., pushed to the git repo on 2017-02-17, which updates a large subset of the firmware images used by AMDGPU. Since this is a set of binary files, I'm not sure how to test it further to provide any more useful information.
Apologies if this is the wrong place to report a firmware issue, but I was unsure where to file it otherwise.
--- Comment #1 from saunders.52@wright.edu --- There's a dmesg set on the Arch bug tracker provided for the power management failure case:
Previous firmware revision: https://bugs.archlinux.org/task/53042?getfile=15004
Current firmware revision: https://bugs.archlinux.org/task/53042?getfile=15005
When I looked at my Xorg logs over SSH at the time, there were no differences from a successful Xorg start on the previous firmware. I didn't think to take a dmesg capture, and I've got a large workload running on my machine right now. I can probably experiment with getting the logs for my case in a day or two.
--- Comment #2 from Alex Deucher alexdeucher@gmail.com --- Does the firmware here fix the issue?
https://people.freedesktop.org/~agd5f/radeon_ucode/polaris/
--- Comment #3 from saunders.52@wright.edu --- I tried the firmware you linked and the problems persisted (GPU hang when starting Xorg). Since the machine still responds over SSH, I took the opportunity to capture my Xorg and kernel logs, which I will attach. For the record, the symptoms are the same with AMDGPU with my standard config (DRI 3, TearFree), a blank config file, and with Modesetting.
--- Comment #4 from saunders.52@wright.edu --- Created attachment 129843 --> https://bugs.freedesktop.org/attachment.cgi?id=129843&action=edit 4.9.11 kernel log with old functioning firmware.
--- Comment #5 from saunders.52@wright.edu --- Created attachment 129844 --> https://bugs.freedesktop.org/attachment.cgi?id=129844&action=edit 4.9.11 kernel log with new malfunctioning firmware.
--- Comment #6 from saunders.52@wright.edu --- Created attachment 129845 --> https://bugs.freedesktop.org/attachment.cgi?id=129845&action=edit Xorg 1.19.1 log with new malfunctioning firmware.
--- Comment #7 from saunders.52@wright.edu --- Created attachment 129846 --> https://bugs.freedesktop.org/attachment.cgi?id=129846&action=edit Xorg 1.19.1 log with old functional firmware.
--- Comment #8 from Sasan sasy360@gmail.com --- RX 460 user here. Same issue. Kernel panic and backtrace messages in my log file might help.
--- Comment #9 from Sasan sasy360@gmail.com --- Created attachment 129852 --> https://bugs.freedesktop.org/attachment.cgi?id=129852&action=edit journal log
--- Comment #10 from Alex Deucher alexdeucher@gmail.com --- I've reverted the polaris 11 changes in the firmware git tree. Just waiting for them to land.
--- Comment #11 from saunders.52@wright.edu --- Okay, I'm glad to hear that. There are still people on Polaris10 and other cards having power management failures - someone's card doubled in idle temperature.
--- Comment #12 from saunders.52@wright.edu --- Arch has a testing update (linux-firmware-20170217.12987ca-2) that's the same git revision that was causing problems, but with the troublesome AMD commits reverted, and this has fixed both my RX 460 GPU hang and the issues with power management on an R9 Fury.
--- Comment #13 from Alex Deucher alexdeucher@gmail.com --- Does the new firmware work properly with kernel 4.10 or newer?
--- Comment #14 from saunders.52@wright.edu --- Which new firmware? The one you linked earlier in the discussion, or the new setup with the one git commit reverted?
--- Comment #15 from Alex Deucher alexdeucher@gmail.com --- (In reply to saunders.52 from comment #14)
> Which new firmware? The one you linked earlier in the discussion, or the new setup with the one git commit reverted?
Either or both.
--- Comment #16 from saunders.52@wright.edu --- (In reply to Alex Deucher from comment #15)
> (In reply to saunders.52 from comment #14)
> > Which new firmware? The one you linked earlier in the discussion, or the new setup with the one git commit reverted?
> Either or both.
The git commit reversion should be similar enough to a manual change I tried with both 4.9 and 4.10 (trying the old firmware manually) that I can almost certainly say it would work. I haven't tried the other, and won't be able to for about 5 hours (away from the desktop in question).
--- Comment #17 from Alex Deucher alexdeucher@gmail.com --- (In reply to saunders.52 from comment #16)
> The GIT commit reversion should be similar enough to a manual change I tried with both 4.9 and 4.10 I can almost certainly say it would work (trying the old firmware manually). I haven't tried the other, and won't be able to for about 5 hours (away from the desktop in question).
So the new firmware works in 4.10, but not in 4.9?
--- Comment #18 from saunders.52@wright.edu --- (In reply to Alex Deucher from comment #17)
> (In reply to saunders.52 from comment #16)
> > The GIT commit reversion should be similar enough to a manual change I tried with both 4.9 and 4.10 I can almost certainly say it would work (trying the old firmware manually). I haven't tried the other, and won't be able to for about 5 hours (away from the desktop in question).
> So the new firmware works in 4.10, but not in 4.9?
The old firmware works in 4.10. The new firmware hasn't been tested by me outside of 4.9.
--- Comment #19 from saunders.52@wright.edu --- Well, the one you linked above didn't work in 4.9. The one shipping in the repos that is getting reverted (20170217.12987ca-1) didn't work in either 4.9 or 4.10. The oldest of the three (the one shipped originally as part of 20161222.4b9559f) is stable in both.
Are there some version numbers I can refer to these by to make this less insanely confusing?
--- Comment #20 from saunders.52@wright.edu --- And I didn't check the one you linked in 4.10. I think.
--- Comment #21 from Alex Deucher alexdeucher@gmail.com --- (In reply to saunders.52 from comment #19)
> Are there some version numbers I can refer to these by to make this less insanely confusing?
The 5th dword in each binary is the version.
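As a rough illustration of the version lookup described above, the following Python sketch reads the 5th 32-bit word of a firmware file. It assumes a dword is 4 bytes and is stored little-endian (the thread does not state the byte order), and the function names `fifth_dword` and `firmware_version` are hypothetical helpers, not part of any AMD or kernel tooling.

```python
import struct

DWORD_SIZE = 4      # assumption: a dword is 32 bits wide
VERSION_INDEX = 4   # the 5th dword, counting from zero


def fifth_dword(data: bytes) -> int:
    """Return the 5th 32-bit word of `data`, read as little-endian (assumed)."""
    offset = VERSION_INDEX * DWORD_SIZE
    if len(data) < offset + DWORD_SIZE:
        raise ValueError("need at least 20 bytes to read the 5th dword")
    return struct.unpack_from("<I", data, offset)[0]


def firmware_version(path: str) -> int:
    """Read the first 20 bytes of the file at `path` and return its 5th dword."""
    with open(path, "rb") as f:
        return fifth_dword(f.read((VERSION_INDEX + 1) * DWORD_SIZE))
```

Calling `firmware_version()` on each candidate file and printing the result in hex gives one number per binary to compare. On a little-endian machine, default `hexdump` output groups bytes into 16-bit units, so a value this sketch reports as 0x00000080 should show up there as `0080 0000`.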
--- Comment #22 from saunders.52@wright.edu --- Okay, assuming I'm reading this right with hexdump... On an RX 460 (4 GB):
Old Committed Version (0080 0000): Works on 4.9 and 4.10.
New Committed Version, Now Uncommitted (0083 0000): Does not work on 4.9 and 4.10.
Download Version (0086 0000): Tested on 4.9, where it doesn't work. Probably not tested on 4.10 (I don't remember.)
--- Comment #23 from saunders.52@wright.edu --- I was able to get back to the machine in question sooner than I thought. The version you have for download in comment #2 (0086 0000) does not work on 4.10 either; it has the same crash issue.
Martin Peres martin.peres@free.fr changed:
What      |Removed |Added
----------------------------------------------------------------------------
Resolution|---     |MOVED
Status    |NEW     |RESOLVED
--- Comment #24 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/142.