https://bugs.freedesktop.org/show_bug.cgi?id=110360
Bug ID: 110360
Summary: AMD system hits AMD-Vi: Completion-Wait loop timed out on Acer Squirtle_SR laptop
Product: DRI
Version: XOrg git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: jian-hong@endlessm.com
Created attachment 143905
--> https://bugs.freedesktop.org/attachment.cgi?id=143905&action=edit
The error log
We have an Acer Squirtle_SR laptop equipped with an AMD A9-9420e RADEON R5, 5 COMPUTE CORES 2C+3G and an [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900]. We tested it with Linux kernel 5.1.0-rc4. The system hits the following error and then hangs:
Apr 09 11:28:57 endless kernel: AMD-Vi: Completion-Wait loop timed out
Apr 09 11:28:57 endless kernel: iwlwifi 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xff814000 flags=0x0050]
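When going through a long dmesg for events like the one above, it can help to pull the fields out programmatically. The following is only a minimal sketch: the helper name `parse_amd_vi_event` and the regular expression are assumptions written against the single log line quoted in this report, and the message format may differ on other kernel versions.

```python
import re

# Hypothetical helper; the pattern is modelled only on the log line quoted in
# this bug report and is not taken from any kernel or distribution tooling.
EVENT_RE = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[(?P<event>\w+) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_amd_vi_event(line):
    """Return the fields of an AMD-Vi event line as a dict, or None if no match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the hexadecimal fields to integers for easier comparison.
    for key in ("domain", "address", "flags"):
        fields[key] = int(fields[key], 16)
    return fields


# The sample is the second log line from this report.
sample = (
    "Apr 09 11:28:57 endless kernel: iwlwifi 0000:03:00.0: AMD-Vi: "
    "Event logged [IO_PAGE_FAULT domain=0x0000 address=0xff814000 flags=0x0050]"
)
event = parse_amd_vi_event(sample)
```

In this sketch the device that the IOMMU blames for the fault ends up in `event["driver"]` and `event["bdf"]`, which makes it easy to see that the logged fault is attributed to the iwlwifi device rather than to the GPU.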
In the worst case, disk blocks may be corrupted, and we have to reinstall the system if fsck cannot repair them.
If we blacklist the amdgpu module, the system does not hit the error, but it then has no GUI and only shows the console.
If iommu=soft is appended to the kernel command line, the system works fine.
jian-hong@endlessm.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Hardware|Other   |x86-64 (AMD64)
Priority|medium  |high
OS      |All     |Linux (All)
--- Comment #1 from jian-hong@endlessm.com ---
The [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900]:
01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900] (rev c3)
        Subsystem: Acer Incorporated [ALI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1025:1217]
        Physical Slot: 0
        Flags: bus master, fast devsel, latency 0, IRQ 44
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 3000 [size=256]
        Memory at d1400000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at d1440000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
Alex Deucher alexdeucher@gmail.com changed:
What                        |Removed    |Added
----------------------------------------------------------------------------
Attachment #143905 mime type|text/x-log |text/plain
--- Comment #2 from Alex Deucher alexdeucher@gmail.com ---
Does booting with amdgpu.runpm=0 on the kernel command line in grub help?
--- Comment #3 from jian-hong@endlessm.com ---
Created attachment 143916
--> https://bugs.freedesktop.org/attachment.cgi?id=143916&action=edit
The dmesg of disabled amdgpu's runpm
(In reply to Alex Deucher from comment #2)
> Does booting with amdgpu.runpm=0 on the kernel command line in grub help?
System boots correctly with amdgpu.runpm=0 on the kernel command line.
Alex Deucher alexdeucher@gmail.com changed:
What                        |Removed    |Added
----------------------------------------------------------------------------
Attachment #143916 mime type|text/x-log |text/plain
Alex Deucher alexdeucher@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
See Also|        |https://bugzilla.kernel.org/show_bug.cgi?id=194521
--- Comment #4 from jian-hong@endlessm.com ---
Created attachment 143930
--> https://bugs.freedesktop.org/attachment.cgi?id=143930&action=edit
The dmesg of disabled pci ats
I also tested with 'pci=noats' on the kernel command line, as mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=194521#c24. The system also boots fine.
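Three kernel command line parameters have been reported as workarounds in this bug so far: iommu=soft, amdgpu.runpm=0 and pci=noats. When comparing dmesg logs from different boots, it is easy to lose track of which one was active. The sketch below is one way to check a command line string for them; the function name `active_workarounds` is hypothetical, and the assumption that `iommu=` and `pci=` values may be comma-separated lists comes from general kernel parameter conventions, not from this report.

```python
# Hypothetical helper for checking which of the workaround parameters from
# this bug report appear in a kernel command line string.

def active_workarounds(cmdline):
    """Return the workaround parameters found in cmdline, in order of appearance."""
    found = []
    for param in cmdline.split():
        key, _, value = param.partition("=")
        # Assumption: iommu= and pci= may carry several comma-separated options.
        options = value.split(",")
        if key == "iommu" and "soft" in options:
            found.append("iommu=soft")
        elif key == "pci" and "noats" in options:
            found.append("pci=noats")
        elif key == "amdgpu.runpm" and value == "0":
            found.append("amdgpu.runpm=0")
    return found


# Illustrative command line; the BOOT_IMAGE and root values are made up.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet pci=noats,nomsi amdgpu.runpm=0"
result = active_workarounds(example)
```

On a running Linux system the string could be read from /proc/cmdline and passed to the function, so that each attached dmesg can be labelled with the parameters that were in effect.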
--- Comment #5 from Alex Deucher alexdeucher@gmail.com ---
Please attach the output of lspci -vnn
--- Comment #6 from jian-hong@endlessm.com ---
Created attachment 143946
--> https://bugs.freedesktop.org/attachment.cgi?id=143946&action=edit
lspci -nnv
--- Comment #7 from jian-hong@endlessm.com ---
Is there anything else I can help with? Do you need more tests, information, or logs? :)
--- Comment #8 from Alex Deucher alexdeucher@gmail.com ---
https://patchwork.kernel.org/patch/10889269/
--- Comment #9 from Alex Deucher alexdeucher@gmail.com ---
(In reply to Alex Deucher from comment #8)
I think it's actually a problem with runtime pm and some pci state. I may ask you to help debug that when I get a chance.
--- Comment #10 from Daniel Drake dan@reactivated.net ---
Thanks Alex. We will have to return this unit to the vendor at some point, but we will try to hold onto it for another month so that we can run any tests you request.
Alternatively, we may be able to get an affected unit shipped to you on a 1-month loan. Would that be useful?
Martin Peres martin.peres@free.fr changed:
What      |Removed |Added
----------------------------------------------------------------------------
Resolution|---     |MOVED
Status    |NEW     |RESOLVED
--- Comment #11 from Martin Peres martin.peres@free.fr ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/740.