https://bugs.freedesktop.org/show_bug.cgi?id=100666
--- Comment #13 from Alex Deucher alexdeucher@gmail.com ---
(In reply to Luke McKee from comment #11)
In this case it was on topic. The link explains how to use the fancontrol script from lm_sensors to work around fan control issues. When I first posted here, I saw on another ticket that dc=1 fixed the fancontrol issues. I finally got dc=1 working, and it still doesn't resolve the dpm fancontrol issues on my platform.
dc and powerplay are largely independent. One is generally unlikely to affect the other.
Running https://github.com/kobalicek/amdtweak as root fails:
# ./amdtweak --card 0 --verbose --extract-bios /tmp/amdbios.bin
The sysfs also shows that the powerplay tables are not correct.
I'm not familiar with that tool or how it goes about attempting to fetch the vbios. The driver uses several mechanisms to fetch it, depending on the platform. It's possible that the tool does something weird to fetch the vbios, and it's possible that it incorrectly interprets some of the vbios tables.
[ 4969.713277] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c3fff window]
[ 4969.713283] caller pci_map_rom+0x66/0xf0 mapping multiple BARs
[ 4969.713289] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
This last message is from the pci subsystem and is harmless. If the driver were not able to load the vbios, it would fail to load.
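For reference, a PCI expansion ROM image starts with the 16-bit signature 0xaa55 (stored little-endian, so the first two bytes are 0x55 0xaa), and that is the value the message says it expected. A minimal sketch of the same comparison, not the kernel's code; the function names and sample buffers here are hypothetical, for illustration only:

```python
import struct

PCI_ROM_SIGNATURE = 0xAA55  # the value the kernel message says it expects


def rom_signature(image: bytes) -> int:
    """Return the 16-bit little-endian value at the start of a ROM image."""
    if len(image) < 2:
        raise ValueError("image too short to contain a ROM signature")
    (value,) = struct.unpack_from("<H", image, 0)
    return value


def has_valid_rom_signature(image: bytes) -> bool:
    """True if the image starts with the PCI expansion ROM signature."""
    return rom_signature(image) == PCI_ROM_SIGNATURE


# Hypothetical buffers: one with a proper header, one that reads back all ones.
good = bytes([0x55, 0xAA]) + bytes(14)
all_ones = b"\xff" * 16

print(has_valid_rom_signature(good))      # True
print(hex(rom_signature(all_ones)))       # 0xffff
```

A read of 0xffff typically means nothing answered at that address through that particular access path, which says nothing about the other mechanisms the driver can use to fetch the vbios.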
If it can't read its powerplay table because it can't read the bios, maybe that's why there are all these problems.
The driver is able to load the vbios image just fine. If it wasn't able to, or if there was a major problem with one of the tables, the driver would fail to load.
(In reply to Alex Deucher from comment #9)
Please stop posting this on every bug report.
https://bugs.freedesktop.org/show_bug.cgi?id=100666#c0
Also, the users above on this ticket wouldn't have seen any powerplay messages when they grepped their dmesg, because they grepped for radeon instead of amdgpu.
[ 10.124232] amdgpu: [powerplay] failed to send message 309 ret is 254
[ 10.124248] amdgpu: [powerplay] failed to send pre message 14e ret is 254
There are lots of reasons an smu message might fail. Just because you see an smu message failure does not mean you are seeing the same issue as someone else. It's like a GPU hang. There are lots of potential root causes.
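When comparing reports, it helps to pull the amdgpu powerplay lines out of dmesg rather than grepping for radeon. A minimal sketch, assuming the message format shown in the log lines quoted in this comment; the function name and regular expression are illustrative, and the radeon line in the sample is made up to show that it is skipped:

```python
import re

# Matches lines such as:
#   amdgpu: [powerplay] failed to send message 309 ret is 254
#   amdgpu: [powerplay] failed to send pre message 14e ret is 254
SMU_FAILURE = re.compile(
    r"amdgpu: \[powerplay\] failed to send (?P<kind>pre message|message) "
    r"(?P<msg>[0-9a-fA-F]+) ret is (?P<ret>\d+)"
)


def find_smu_failures(dmesg_text: str):
    """Return (kind, message id, return code) for each powerplay send failure.

    The message id is kept as the string the driver printed; the return
    code is assumed to be printed in decimal.
    """
    return [
        (m["kind"], m["msg"], int(m["ret"]))
        for m in SMU_FAILURE.finditer(dmesg_text)
    ]


sample = (
    "[ 10.124232] amdgpu: [powerplay] failed to send message 309 ret is 254\n"
    "[ 10.124248] amdgpu: [powerplay] failed to send pre message 14e ret is 254\n"
    "[ 10.200000] radeon: some unrelated line\n"  # hypothetical, not from the report
)

for failure in find_smu_failures(sample):
    print(failure)
```

Matching lines only show that an SMU message failed; they do not identify the root cause, so two reports with the same output may still be different bugs.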