https://bugs.freedesktop.org/show_bug.cgi?id=104481
Bug ID: 104481
Summary: GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7 platforms while playing video
Product: Mesa
Version: git
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: luis.p.mendes@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 136527 --> https://bugs.freedesktop.org/attachment.cgi?id=136527&action=edit dmesg and iomem data from lockup obtained with glretrace
I am getting GPU lockups while playing video on Kodi. It has also happened with other applications that play video, while OpenGL seems to be stable. The system seems to be more sensitive to VP9-encoded videos. The freeze happens on both amd64 and armv7l platforms. I am also able to reproduce GPU hangs on amd64 by replaying, with glretrace, an apitrace captured from kodi on the arm platform.
The arm dmesg and traces show a clear GPU lockup, while the amd64 dmesg is less conclusive. The user experience is the same on both, though: a complete freeze of the graphical system, while the machine itself keeps working over ssh or other remote connections.
Please find amd64 logs in attachments, including iomem, dmesg and gdb traces.
On both platforms I am using Ubuntu 17.10 with the Mate desktop and the lightdm display manager, with libdrm-2.4.89, mesa-17.4 at commit "radv: Implement binning on GFX9." - 6a36bfc64d2096aa338958c4605f5fc6372c07b8 and kernel https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.16 at commit "drm/amdgpu: Correct the IB size of bo update mapping." - 104bd2ca1124dfd9aa904d5f5a96253ef2b580f6.
Please note that the system was more stable a few weeks ago with drm-next-4.16 based on kernel 4.15-rc2 and an earlier mesa version (I don't remember the exact commits). Even so, both the arm and the amd64 systems still crashed in a similar way back then; the problem has just become more evident with these newer versions.
There are two distinct crash behaviours on amd64: crashes obtained while playing a video with kodi directly on amd64, and crashes obtained on amd64 by replaying an apitrace that was captured on the arm platform while kodi was playing a VP9 video.
The first kind of crash is detailed in the logs kodi-processes_and_backtraces.txt and kodi-amdgpu_lockup_dmesg_and_iomem.txt. The second kind is detailed in the logs glretrace-processes_and_backtraces.txt and glretrace-amdgpu_lockup_dmesg_and_iomem.txt.
For some strange reason, the amd64 platform complains about the polaris11 firmware files, even though they are in /lib/firmware; they were taken from a clone of https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git. I am using the same firmware files and the same graphics card on armv7l, and there it does not complain about the firmware.
I can also provide the apitrace trace file, but it is around 1 GB in size.
--- Comment #1 from Luis Mendes luis.p.mendes@gmail.com --- Created attachment 136528 --> https://bugs.freedesktop.org/attachment.cgi?id=136528&action=edit Processes listing and gdb backtraces for all threads - glretrace lockup
This is the process listing and the gdb backtraces for all glretrace threads after the GPU hang caused by replaying, with glretrace, the apitrace captured on the arm platform from kodi playing a VP9-encoded video.
--- Comment #2 from Luis Mendes luis.p.mendes@gmail.com --- Created attachment 136529 --> https://bugs.freedesktop.org/attachment.cgi?id=136529&action=edit dmesg and iomem data from lockup obtained with kodi on amd64
This attachment contains the dmesg and iomem information retrieved after the GPU lockup that occurred when playing a VP9-encoded video with kodi directly on the amd64 platform.
--- Comment #3 from Luis Mendes luis.p.mendes@gmail.com --- Created attachment 136530 --> https://bugs.freedesktop.org/attachment.cgi?id=136530&action=edit Processes listing and gdb backtraces for all threads - kodi amd64 lockup
This attachment contains the process listing and the gdb backtraces for all kodi threads, retrieved after the GPU lockup that occurred when playing a VP9-encoded video with kodi directly on the amd64 platform.
Luis Mendes luis.p.mendes@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Severity|normal  |major
--- Comment #4 from Luis Mendes luis.p.mendes@gmail.com --- Created attachment 136532 --> https://bugs.freedesktop.org/attachment.cgi?id=136532&action=edit Kernel hung task backtrace from GPU hang caused by glretrace replay
This attachment contains the first kernel hung task backtrace printed after the GPU hang that occurred when replaying the apitrace captured on armv7l while kodi was playing the VP9 video.
Luis Mendes luis.p.mendes@gmail.com changed:
What|Removed |Added
----------------------------------------------------------------------------
OS  |All     |Linux (All)
--- Comment #5 from Julien Isorce julien.isorce@gmail.com --- (In reply to Luis Mendes from comment #0)
> I can also provide the apitrace trace file, but it takes around 1GB of data.
Just provide it through Google Drive or a similar service, see https://bugs.freedesktop.org/show_bug.cgi?id=94900#c15
--- Comment #6 from Luis Mendes luis.p.mendes@gmail.com --- (In reply to Julien Isorce from comment #5)
> (In reply to Luis Mendes from comment #0)
>> I can also provide the apitrace trace file, but it takes around 1GB of data.
> Just provide it through google drive or other similar way, see https://bugs.freedesktop.org/show_bug.cgi?id=94900#c15
I haven't sent updates on this issue for a while, but the situation has now become more varied. On my amd64 platforms (TYAN S7002, TYAN S7025), I am having trouble getting the amdgpu driver to load, and when it does load, it runs into a GPU lockup as soon as it tries to enter the graphical X session. This has been the case with kernels linux-4.16.x, 4.17.x and 4.18-rcX. Please see https://lists.freedesktop.org/archives/amd-gfx/2018-July/023925.html
On armhf the story has been different. I was able to get a working configuration with Ubuntu 17.10, kernel 4.17.6 and kodi 17.3; however, the same kernel with Ubuntu 18.04 and kodi 17.6 made the problem reappear. I switched to kernel-4.18-rc8 and the problem went away again. I can provide an apitrace for 4.17.6 if desired, but the problem looks to be fixed in kernel 4.18.
From my side, I am now more concerned about my amd64 platforms, as I am simply unable to use the AMD GPUs on them.
Please advise.
GitLab Migration User gitlab-migration@fdo.invalid changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |MOVED
--- Comment #7 from GitLab Migration User gitlab-migration@fdo.invalid --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1297.