https://bugzilla.kernel.org/show_bug.cgi?id=193651
Bug ID: 193651
Summary: Amdgpu error messages at boot with Amd RX460
Product: Drivers
Version: 2.5
Kernel Version: 4.11-wip
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: low
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: fin4478@hotmail.com
Regression: No
Created attachment 253581 --> https://bugzilla.kernel.org/attachment.cgi?id=253581&action=edit dmesg logfile
I have a Gigabyte RX460 2GB GPU card, Debian testing with Xfce, and the agd5f drm-next-4.11-wip kernel, downloaded and compiled as of today. The computer works OK, but the dmesg command shows the following boot errors, which might interest the amdgpu driver developers. The amdgpu IB tests fail when my home partition is mounted:
[ 7.001953] [drm] ib test on ring 12 succeeded
[ 7.055163] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
[ 8.011874] [drm:0xffffffffa01360ce] *ERROR* amdgpu: IB test timed out.
[ 8.011910] [drm:0xffffffffa00e1b4b] *ERROR* amdgpu: failed testing IB on ring 13 (-110).
[ 8.011943] [drm:0xffffffffa00be574] *ERROR* ib ring test failed (-110).
Some powerplay errors:
[ 4.888584] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[ 4.891452] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[ 4.894807] amdgpu: [powerplay] failed to send message 309 ret is 254
[ 4.894824] amdgpu: [powerplay] failed to send pre message 14e ret is 254
BIOS recognition errors:
[ 4.729628] [drm] BIOS signature incorrect 20 7
[ 4.729635] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
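For anyone who wants to pull the same kinds of lines out of their own log, here is a minimal sketch. It assumes the kernel log is fed on stdin and that the patterns below, taken from the messages quoted in this report, are the interesting ones. The function name amdgpu_errors is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the amdgpu-related error lines
# from a kernel log read on stdin. The patterns are copied from the
# messages in this report and are not an exhaustive list.
amdgpu_errors() {
    grep -E '\*ERROR\*|\[powerplay\]|BIOS signature incorrect|Invalid PCI ROM header signature'
}

# Typical use on a live system (reading dmesg may require root):
#   dmesg | amdgpu_errors
```

Lines such as "ib test on ring 12 succeeded" are dropped, so only the failures remain.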
Steven A. Falco stevenfalco@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stevenfalco@gmail.com
--- Comment #1 from Steven A. Falco stevenfalco@gmail.com --- Created attachment 253671 --> https://bugzilla.kernel.org/attachment.cgi?id=253671&action=edit dmesg log
--- Comment #2 from Steven A. Falco stevenfalco@gmail.com --- Created attachment 253681 --> https://bugzilla.kernel.org/attachment.cgi?id=253681&action=edit xorg log file
--- Comment #3 from Steven A. Falco stevenfalco@gmail.com --- Created attachment 253691 --> https://bugzilla.kernel.org/attachment.cgi?id=253691&action=edit journal log file
--- Comment #4 from Steven A. Falco stevenfalco@gmail.com --- Created attachment 253701 --> https://bugzilla.kernel.org/attachment.cgi?id=253701&action=edit /var/log/messages
--- Comment #5 from Steven A. Falco stevenfalco@gmail.com --- I began having problems with my AMD GPU when Fedora 25 switched from their 4.8.16-300.fc25 kernel to a 4.9.3 kernel, as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1414025
The initial symptom was that there was no kernel frame buffer, so the system dropped back to using an accelerated video interface.
With the latest Fedora kernel (4.9.6-200.fc25), the system eventually runs normally, but it takes upwards of 6 minutes for the system to boot. As shown in the files I attached, I too get many messages of the form:
[ 346.235933] failed to send pre message 148 ret is 0
[ 346.455587] failed to send message 148 ret is 0
I'd like the importance of this bug raised to medium or high, as it is a clear regression from the 4.8.16 kernel to the 4.9.3 kernel.
--- Comment #6 from Steven A. Falco stevenfalco@gmail.com --- Typo in the above comment:
s/an accelerated video/an un-accelerated video/
Alex Deucher alexdeucher@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #7 from Alex Deucher alexdeucher@gmail.com --- Does using the new ucode here help?
https://people.freedesktop.org/~agd5f/radeon_ucode/polaris/
--- Comment #8 from fin4478@hotmail.com --- Alex, thanks for the new firmware. There are still BIOS recognition errors at boot, but otherwise everything is OK.
[ 3.461112] [drm] BIOS signature incorrect 20 7
[ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
Steven, you are using a Tonga GPU with the radeon kernel driver, which fails at boot, so the system falls back to the VESA driver. The amdgpu driver supports Tonga, but you need to build a custom 4.11-wip kernel. Stock distribution kernels do not have stable amdgpu code.
Creating a custom kernel in Debian:
Get the source with the command:
git clone -b drm-next-4.11-wip git://people.freedesktop.org/~agd5f/linux
The kernel configuration files of the official Debian kernels are available in /boot, named after the kernel release. Copy one into the linux directory as .config. Connect all your devices and run the command: make localmodconfig. You can also use the command make defconfig to create an initial .config file.
Run make xconfig and check that you have enabled Reroute Broken IRQ, Virtualization KVM and the 300Hz CPU timer. I also disabled Swap, Kernel Debug, CPU Freq scaling and CPU handling in ACPI, and used the BIOS to control the CPU and devices. Under drivers->graphics->amdgpu, enable CIK support for a GCN 1.1 GPU and SI support for a GCN 1.0 GPU.
Create the Debian kernel package:
export CONCURRENCY_LEVEL=4
fakeroot make-kpkg --initrd kernel_image
Install the kernel package with Gdebi. To make the custom kernel boot, add this line to /etc/initramfs-tools/modules:
unix
Then run:
sudo update-initramfs -u
and reboot.
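The steps above can be sketched as a small script. This is only an outline under assumptions: the helper names run and build_wip_kernel are hypothetical, the Debian config is assumed to live at /boot/config-$(uname -r), and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the build steps described in this comment.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical driver function: $1 = git branch, $2 = parallel jobs.
build_wip_kernel() {
    branch=$1
    jobs=$2
    run git clone -b "$branch" "git://people.freedesktop.org/~agd5f/linux"
    run cd linux
    # Assumption: the running Debian kernel's config is a good starting point.
    run cp "/boot/config-$(uname -r)" .config
    # Trims the config to the modules currently loaded; may ask questions.
    run make localmodconfig
    run env CONCURRENCY_LEVEL="$jobs" fakeroot make-kpkg --initrd kernel_image
}

# Example: build_wip_kernel drm-next-4.11-wip 4
```

Export DRY_RUN=0 before calling build_wip_kernel to actually run the commands. The make xconfig step is left out because it is interactive.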
--- Comment #9 from fin4478@hotmail.com --- After updating the firmware I still have powerplay errors:
[ 3.574222] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[ 3.577052] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
--- Comment #10 from Steven A. Falco stevenfalco@gmail.com --- Thanks for the information on building a new kernel. I'll give that a try. I'm running Fedora 25, but I think I can follow your Debian instructions.
--- Comment #11 from Alex Deucher alexdeucher@gmail.com --- (In reply to fin4478 from comment #8)
> Alex, thanks for the new firmware. Still Bios recognition errors at boot, but otherwise ok.
> [ 3.461112] [drm] BIOS signature incorrect 20 7
> [ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
This is harmless. The driver tries several methods to fetch the vbios image. The driver would not load at all if it failed to fetch the vbios image.
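One way to check this on your own machine is to look for the driver's init message rather than the ROM warnings. A minimal sketch, assuming a successful load logs a line starting with "[drm] Initialized amdgpu" (as in the log quoted in comment #24); the function name amdgpu_initialized is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the kernel log on stdin shows that
# the amdgpu driver finished initializing.
# Assumption: a successful load prints "[drm] Initialized amdgpu ...".
amdgpu_initialized() {
    grep -q '\[drm\] Initialized amdgpu'
}

# Typical use:
#   dmesg | amdgpu_initialized && echo "amdgpu loaded"
```

If this succeeds while the "Invalid PCI ROM header signature" line is also present, the vbios was fetched by one of the other methods.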
--- Comment #12 from Steven A. Falco stevenfalco@gmail.com --- Created attachment 253891 --> https://bugzilla.kernel.org/attachment.cgi?id=253891&action=edit New dmesg file
--- Comment #13 from Steven A. Falco stevenfalco@gmail.com --- I successfully built a custom kernel. It appears to be working well. Thanks for the help!
I included a new dmesg.log file because I still see messages like:
[ 9.719278] amdgpu: [powerplay] failed to send pre message 15b ret is 0
[ 10.158327] amdgpu: [powerplay] failed to send message 15b ret is 0
Are these harmless or do they indicate a problem?
--- Comment #14 from Steven A. Falco stevenfalco@gmail.com --- One other error message I just noticed:
[ 5.538117] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
Milo (milomak@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |milomak@gmail.com
--- Comment #15 from Milo (milomak@gmail.com) --- (In reply to fin4478 from comment #8)
> Alex, thanks for the new firmware. There are still BIOS recognition errors at boot, but otherwise everything is OK.
> [ 3.461112] [drm] BIOS signature incorrect 20 7
> [ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
> Steven, you are using a Tonga GPU with the radeon kernel driver, which fails at boot, so the system falls back to the VESA driver. The amdgpu driver supports Tonga, but you need to build a custom 4.11-wip kernel. Stock distribution kernels do not have stable amdgpu code.
> Creating a custom kernel in Debian:
> Get the source with the command:
> git clone -b drm-next-4.11-wip git://people.freedesktop.org/~agd5f/linux
> The kernel configuration files of the official Debian kernels are available in /boot, named after the kernel release. Copy one into the linux directory as .config. Connect all your devices and run the command: make localmodconfig. You can also use the command make defconfig to create an initial .config file.
> Run make xconfig and check that you have enabled Reroute Broken IRQ, Virtualization KVM and the 300Hz CPU timer. I also disabled Swap, Kernel Debug, CPU Freq scaling and CPU handling in ACPI, and used the BIOS to control the CPU and devices. Under drivers->graphics->amdgpu, enable CIK support for a GCN 1.1 GPU and SI support for a GCN 1.0 GPU.
> Create the Debian kernel package:
> export CONCURRENCY_LEVEL=4
> fakeroot make-kpkg --initrd kernel_image
> Install the kernel package with Gdebi. To make the custom kernel boot, add this line to /etc/initramfs-tools/modules:
> unix
> Then run:
> sudo update-initramfs -u
> and reboot.
I tried the above but noticed that it fails when trying to build headers.
If I am trying to build a 4.10.5 kernel, could I copy files from what was downloaded when I ran the git command?
--- Comment #16 from fin4478@hotmail.com --- (In reply to Milo from comment #15)
> I tried the above but notice that it fails when trying to build headers.
You do not need kernel headers unless you are using some DKMS drivers. Currently there is a temperature bug in the wip kernel. The 4.11-rc3 kernel from kernel.org works.
The headers build fails for some kernel versions because the REPORTING-BUGS file is missing. Create the file in the linux directory.
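A minimal sketch of that workaround, assuming the file only needs to exist and can be empty. The file name REPORTING-BUGS is the one confirmed in the follow-up comment; the function name ensure_reporting_bugs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create an empty REPORTING-BUGS file in the kernel
# source tree ($1, default: current directory) if it is missing.
# Assumption: the packaging step only checks that the file exists.
ensure_reporting_bugs() {
    tree=${1:-.}
    [ -d "$tree" ] || { echo "not a directory: $tree" >&2; return 1; }
    [ -e "$tree/REPORTING-BUGS" ] || : > "$tree/REPORTING-BUGS"
}

# Example: ensure_reporting_bugs linux
```

An existing file is left untouched, so running this on a tree that already ships the file is harmless.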
--- Comment #17 from Milo (milomak@gmail.com) --- (In reply to fin4478 from comment #16)
> (In reply to Milo from comment #15)
> > I tried the above but noticed that it fails when trying to build headers.
> You do not need kernel headers unless you are using some DKMS drivers. Currently there is a temperature bug in the wip kernel. The 4.11-rc3 kernel from kernel.org works.
> The headers build fails for some kernel versions because the REPORTING-BUGS file is missing. Create the file in the linux directory.
Thanks.
Yes I do have some dkms packages installed that need headers. As you suggested, creating REPORTING-BUGS solved the headers build fail.
So it built as version 4.10.0-rc5-gec3fa8e6ca19. When I booted from it, after 20 minutes the screen was still blank so I rebooted.
I had built 4.9.10 from kernel.org before, which eventually booted into X after 10 minutes. I used that .config, along with your comments in comment #8, to build 4.10.0-rc5-gec3fa8e6ca19.
--- Comment #18 from fin4478@hotmail.com --- (In reply to Milo from comment #17)
> So it built as version 4.10.0-rc5-gec3fa8e6ca19. When I booted from it, after 20 minutes the screen was still blank so I rebooted.
All your software must be in sync, so use Debian testing Xfce, the Oibaf PPA yakkety version, and the latest kernels. I will post my kernel 4.11-rc config.
--- Comment #19 from fin4478@hotmail.com --- Created attachment 255553 --> https://bugzilla.kernel.org/attachment.cgi?id=255553&action=edit Kernel 4.11-rc3 config file for RX400 series, add drivers for your hardware
--- Comment #20 from Christian Lanig (christian.lanig@gmail.com) --- This bug report is partly a duplicate of that one: https://bugs.freedesktop.org/show_bug.cgi?id=100443
I'm getting the same AVFS/Powerplay messages; updating the firmware didn't help.
The topic headline is very unspecific and the replies appear very confusing to me. Has this issue been solved or not? Does a custom Kernel change the messages? - I'm using the newest Ubuntu mainline Kernel. Is something wrong with the configuration used to build this Kernel by the Kernel team? How is this issue related to Tonga GPUs? Polaris is the first dGPU with AVFS. How many issues do we count here and which replies belong to which issue?
Perhaps someone could make a summary or something - that would be very pleasant.
--- Comment #21 from Steven A. Falco (stevenfalco@gmail.com) --- As I previously reported, building a custom kernel as suggested in comment 8 allows me to use my video card.
I do continue to get the powerplay error messages, but aside from slowing down boot a little, they don't seem to do any harm.
--- Comment #22 from Christian Lanig (christian.lanig@gmail.com) --- Thanks for clarification. That means building the Kernel was not a possible fix for the messages but for getting the driver to start with Tonga.
So the AVFS issue and the missing/wrong value in the voltage dependency table still exist, as do your remaining Powerplay messages.
--- Comment #23 from Milo (milomak@gmail.com) --- (In reply to Christian Lanig from comment #22)
> Thanks for clarification. That means building the Kernel was not a possible fix for the messages but for getting the driver to start with Tonga.
> So the AVFS issue and the missing/wrong value in the voltage dependency table still exist, as do your remaining Powerplay messages.
i didn't have the avfs issue but the other two on my r9 m390 and booting taking as long as 10 minutes when i moved from kernel 4.8 to 4.9/4.10
having built 4.11-rc4, my boot times are back down to around 30 seconds but the messages still persist in dmesg. and it seems i have added some ib related messages.
so i was just confirming that from my perspective there is an issue that was solved though not the messages that led me here.
--- Comment #24 from Steven A. Falco (stevenfalco@gmail.com) --- With kernel 4.11.3-200.fc25.x86_64, I no longer need a custom kernel to use my video card. I still see messages like:
[ 13.599542] [drm] ib test on ring 13 succeeded
[ 13.606627] [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
[ 14.500572] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 14.983369] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 15.120567] amdgpu: [powerplay] failed to send pre message 155 ret is 0
[ 15.609965] amdgpu: [powerplay] failed to send message 155 ret is 0
[ 16.014478] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 16.165919] amdgpu: [powerplay] failed to send pre message 15b ret is 0
[ 16.570511] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 16.721456] amdgpu: [powerplay] failed to send message 15b ret is 0
[ 17.498715] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 17.951062] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 18.843123] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 19.295427] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 20.187154] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 20.639852] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 21.531857] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 21.984781] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 21.998448] [drm] Initialized amdgpu 3.10.0 20150101 for 0000:05:00.0 on minor 0
but at least the card is otherwise functional.
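To see which message IDs fail most often in a log like the one above, the lines can be tallied. A minimal sketch, assuming every such line ends in "<id> ret is <code>" as in the quoted messages; the function name powerplay_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "[powerplay] failed to send (pre) message"
# lines per message id, reading a kernel log on stdin.
# Assumption: each matching line ends with "<id> ret is <code>".
powerplay_failures() {
    awk '/\[powerplay\] failed to send (pre )?message/ {
             id = $(NF - 3)      # the field just before "ret is <code>"
             count[id]++
         }
         END { for (id in count) print id, count[id] }' | LC_ALL=C sort
}

# Typical use:
#   dmesg | powerplay_failures
```

Each output line is a message id followed by how many times it failed, with "pre message" and "message" failures counted together.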
--- Comment #25 from Milo (milomak@gmail.com) --- trying to install a 4.13 kernel and boot times are back to more than 5 minutes (the failed to send message appears for more than 5 minutes when i run journalctl -e)
journalctl log - https://pastebin.com/7dScPxNn
--- Comment #26 from fin4478@hotmail.com --- Milo, try adding amdgpu.audio=0 to the kernel command line.
--- Comment #27 from Milo (milomak@gmail.com) --- I have added it to /etc/default/grub as follows:
GRUB_CMDLINE_LINUX_DEFAULT="nointremap quiet amdgpu.audio=0"
and then ran update-grub.
however it still takes more than 5 minutes to boot
On Mon, Oct 23, 2017 at 8:11 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #26 from fin4478@hotmail.com ---
> Milo, try to set amdgpu.audio=0 to the kernel command line.
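Since the GRUB setting only takes effect on the next boot, it can be worth confirming that the parameter really reached the running kernel. A minimal sketch that checks /proc/cmdline; the function name has_kernel_param is hypothetical, and the optional second argument exists only so the check can be tried on a sample string.

```shell
#!/bin/sh
# Hypothetical check: succeeds if parameter $1 appears as a whole word
# on the kernel command line. $2 optionally overrides the command line
# (default: the contents of /proc/cmdline).
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use after rebooting:
#   has_kernel_param amdgpu.audio=0 && echo "parameter is active"
```

If the check fails after a reboot, the boot entry in use is probably not the one update-grub regenerated.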
--- Comment #28 from fin4478@hotmail.com --- (In reply to Alex Deucher from comment #11)
> (In reply to fin4478 from comment #8)
> > Alex, thanks for the new firmware. Still Bios recognition errors at boot, but otherwise ok.
> > [ 3.461112] [drm] BIOS signature incorrect 20 7
> > [ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
> This is harmless. The driver tries several methods to fetch the vbios image. The driver would not load at all if it failed to fetch the vbios image.
All messages that use the dev_err function slow down booting and make it look ugly. AMD should get this fix into the PCI driver: change the "Invalid PCI ROM header signature" message in drivers/pci/rom.c to use the dev_info function.
fin4478@hotmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX
jacky (jackysen422@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jackysen422@gmail.com
--- Comment #29 from jacky (jackysen422@gmail.com) --- I was able to deadlock that system by opening too many tabs in Chromium, but that is not what those patches should have solved. nohang/earlyoom would have probably helped if it was used.
dri-devel@lists.freedesktop.org