https://bugs.freedesktop.org/show_bug.cgi?id=106111
--- Comment #2 from Alex Williamson alex.williamson@redhat.com ---
The IOMMU looks to be unhappy first:

[ 40.201258] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x270
[ 40.201271] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1b@0x2d0
[ 40.201279] vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x1e@0x370
[ 159.958402] AMD-Vi: Completion-Wait loop timed out
[ 160.118777] AMD-Vi: Completion-Wait loop timed out
[ 160.799864] AMD-Vi: Event logged [
[ 160.799868] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8550]
[ 160.799872] AMD-Vi: Event logged [
[ 160.799874] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8570]
[ 160.799876] AMD-Vi: Event logged [
[ 160.799878] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8590]
[ 161.801729] AMD-Vi: Event logged [
[ 161.801732] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e85e0]
[ 180.096365] AMD-Vi: Completion-Wait loop timed out
[ 180.256758] AMD-Vi: Completion-Wait loop timed out
[ 180.417182] AMD-Vi: Completion-Wait loop timed out
[ 180.577636] AMD-Vi: Completion-Wait loop timed out
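
For longer logs it may help to tally these messages rather than read them by eye. The following is only a rough sketch: the function name `summarize_amd_vi` and the `SAMPLE` text are made up for illustration, and the patterns are based solely on the message format visible in the log above, so other kernel versions may print something different.

```python
import re
from collections import Counter

# Pattern taken from the event lines quoted above; other kernels may differ.
EVENT_RE = re.compile(
    r"IOTLB_INV_TIMEOUT device=([0-9a-f:.]+) address=(0x[0-9a-f]+)"
)
CW_TIMEOUT = "AMD-Vi: Completion-Wait loop timed out"


def summarize_amd_vi(log_text):
    """Return (IOTLB timeouts per device, number of Completion-Wait timeouts)."""
    per_device = Counter(m.group(1) for m in EVENT_RE.finditer(log_text))
    cw_timeouts = log_text.count(CW_TIMEOUT)
    return per_device, cw_timeouts


# Hypothetical sample: a few lines copied from the log above.
SAMPLE = """\
[ 159.958402] AMD-Vi: Completion-Wait loop timed out
[ 160.118777] AMD-Vi: Completion-Wait loop timed out
[ 160.799864] AMD-Vi: Event logged [
[ 160.799868] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8550]
[ 160.799872] AMD-Vi: Event logged [
[ 160.799874] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000043e8e8570]
"""

if __name__ == "__main__":
    devices, cw = summarize_amd_vi(SAMPLE)
    for dev, count in devices.items():
        print(f"{dev}: {count} IOTLB invalidation timeout(s)")
    print(f"Completion-Wait timeouts: {cw}")
```

Feeding it the saved output of `dmesg` instead of `SAMPLE` should show whether the timeouts are all tied to the passed-through device.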
Can you try a v4.17-rc1 kernel? Specifically, these two updates:
6bd06f5a486c vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
eb5ecd1a40e2 iommu/amd: Add support for fast IOTLB flushing
For some reason, AMD GPUs get unhappy if the IOMMU sends out too many invalidations, and the above two patches can reduce the number of those invalidations by up to a factor of 512.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
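
To put the "up to a factor of 512" figure in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the old path issues one invalidation per unmapped entry and the new path issues one per batch of up to 512 entries; that model, the `BATCH` constant, and the `flushes` helper are assumptions of this example, not a description of the actual kernel code.

```python
import math

# Assumed batch size, chosen to match the "factor of 512" mentioned above.
BATCH = 512


def flushes(num_unmaps, batch=1):
    """Invalidations needed if one flush covers up to `batch` unmaps."""
    return math.ceil(num_unmaps / batch)


if __name__ == "__main__":
    n = 4096  # hypothetical number of unmapped entries
    before = flushes(n)         # one invalidation per unmap
    after = flushes(n, BATCH)   # one invalidation per full batch
    print(f"unbatched: {before}, batched: {after}")
```

Under this model the full 512x reduction only shows up when batches are filled; a handful of isolated unmaps would still cost one invalidation each.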