https://bugs.freedesktop.org/show_bug.cgi?id=99236
Bug ID: 99236
Summary: System (seems to) completely freezes when interacting with java swing applications.
Product: DRI
Version: DRI git
Hardware: x86 (IA32)
OS: Linux (All)
Status: NEW
Severity: major
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: tmp6154@yandex.ru
I'm not completely certain whether this is related to the AMD GPU driver, but it's a rather strange issue that I can reproduce reliably.
I'm running a Gentoo system with an AMD Radeon RX 480, using the git AMD GPU driver and the git version of mesa. When interacting with scrollable JTextAreas in a Java Swing application, I hit the issue every time (100% reproducible). Since this affects multiple Java Swing applications (e.g. eclipse), including the program I'm currently developing, I can put together a reproducer Java app if needed.
For the first 2-3 seconds the mouse cursor moves jerkily, then it stops moving altogether and the display keeps showing the same frozen image. Nothing appears to work, including Ctrl+Alt+F1, etc. But although the machine looks completely locked up, it isn't: I can ssh into it and issue a reboot command. During that time the display shows no signs of life until the machine actually reboots (but I can hear the KDE shutdown sound).
This occurs only under JRE8, with both the icedtea and oracle variants. Under JRE7 the issue doesn't trigger. If memory serves, JRE8 brought improvements to hardware-accelerated GUI rendering. This is especially strange, since the Swing GUI framework renders its own GUI widgets.
Considering that even Ctrl+Alt+F1 doesn't work, I suppose the problem happens at the kernel level, possibly in the AMD GPU driver.
Vitaly Ostrosablin tmp6154@yandex.ru changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |System (seems to)           |System (seems to)
        |completely freezes when     |completely freeze when
        |interacting with java swing |interacting with java swing
        |applications.               |applications.
Vitaly Ostrosablin tmp6154@yandex.ru changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Hardware|x86 (IA32)                  |x86-64 (AMD64)
--- Comment #1 from Alex Deucher alexdeucher@gmail.com ---
Please attach your xorg log and dmesg output.
--- Comment #2 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Created attachment 128721
  --> https://bugs.freedesktop.org/attachment.cgi?id=128721&action=edit
dmesg output from faulty session
Here's my dmesg. From these lines:
[ 675.891897] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
[ 675.891899] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 675.891900] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048002
[ 675.891902] VM fault (0x02, vmid 1) at page 0, read from 'TC4' (0x54433400) (72)
[ 675.892003] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
[ 675.892004] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 675.892006] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048002
It's obvious that something went terribly wrong inside the AMD GPU.
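To pull just these fault lines out of a long dmesg over ssh, a small filter can help. The following is a minimal sketch and not part of the original report: the function name `gpu_faults` is made up for illustration, and the patterns are taken only from the messages quoted above, so other amdgpu fault formats would not match.

```shell
# Hypothetical helper: keep only the amdgpu fault lines seen in this report.
# Reads kernel log text on stdin and prints the matching lines.
gpu_faults() {
    grep -E 'GPU fault detected|VM_CONTEXT1_PROTECTION_FAULT|VM fault'
}

# Typical use on the affected machine (over ssh, since the display is frozen):
#   dmesg | gpu_faults
```

Because the filter only reads stdin, it also works on a saved log, for example `gpu_faults < dmesg.txt`, where the file name is only an example.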
--- Comment #3 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Created attachment 128722
  --> https://bugs.freedesktop.org/attachment.cgi?id=128722&action=edit
Xorg.0.log from faulty session
This is the Xorg log. It doesn't seem that X noticed the fault at all. Moreover, with `ps -e` over ssh I can see the X process and all other GUI programs still running just fine. So it seems that only the GPU driver has failed; everything else appears unaffected.
--- Comment #4 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Created attachment 128723
  --> https://bugs.freedesktop.org/attachment.cgi?id=128723&action=edit
Java Swing reproducer application
Here are the Java sources of a mini-application that reproduces the issue. For me, running it and pressing the button results in an AMD GPU fault similar to the ones I've already attached logs for. It has worked in 100% of attempts so far (2/2) when run under JRE8.
Michel Dänzer michel@daenzer.net changed:
What      |Removed                   |Added
----------------------------------------------------------------------------
Component |DRM/AMDgpu                |Drivers/Gallium/radeonsi
Product   |DRI                       |Mesa
QA Contact|                          |dri-devel@lists.freedesktop.org
Version   |DRI git                   |unspecified
--- Comment #5 from Michel Dänzer michel@daenzer.net ---
It's most likely a Mesa driver issue.
Can you try running Xephyr something like this:
GALLIUM_DDEBUG="pipelined 2000" Xephyr :99 -glamor -screen 1024x768
and then run the reproducer app with DISPLAY=:99 . After the hang, a file should appear in ~/ddebug_dumps/. Please attach that file here.
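The steps above can be sketched as a few shell helpers. This is only an illustration under stated assumptions: the Xephyr command line and the `~/ddebug_dumps` directory come from the comment above, while the function names and the `reproducer.jar` file name are hypothetical (the attached reproducer is Java source, so how it is built and launched is up to the tester).

```shell
# Start a nested X server with radeonsi debug dumps enabled
# (command line taken from the comment above; runs in the background).
start_xephyr() {
    GALLIUM_DDEBUG="pipelined 2000" Xephyr :99 -glamor -screen 1024x768 &
}

# Run the reproducer inside the nested server.
# "$1" is a hypothetical jar path, e.g. reproducer.jar.
run_reproducer() {
    DISPLAY=:99 java -jar "$1"
}

# Print the name of the most recently modified file in the dump directory
# (defaults to ~/ddebug_dumps), i.e. the one to attach after a hang.
newest_dump() {
    dir=${1:-"$HOME/ddebug_dumps"}
    ls -t "$dir" 2>/dev/null | head -n 1
}
```

After a hang, `newest_dump` can be run from an ssh session to find the file to attach.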
--- Comment #6 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Yes, no problem. However, my Xorg was compiled without Xephyr, so I rebuilt it. Unfortunately, I decided to update mesa to the latest commit as well, and it seems that one of the recent commits breaks everything (I get an unusable desktop that is all white, where I can only see gray outlines of the KDE taskbar, the mouse cursor and the login password box cursor). So I had to temporarily revert to mesa 13.0.3. But there is some useful info. First, on 13.0.3 I cannot reproduce the fault with the reproducer app. Second, the reproducer app looks the same under 13.0.3 on both Java 7 and Java 8. I had found it strange that under Java 7 the Swing app looks like it should (Metal look & feel) while under Java 8 it looked different (white buttons instead of the default metallic ones). But it appears that this was just a rendering artefact.
I will now try to get back to a working mesa commit and reproduce the problem with Xephyr.
--- Comment #7 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Created attachment 128804
  --> https://bugs.freedesktop.org/attachment.cgi?id=128804&action=edit
Comparison of reproducer app under java 8 and java 7 with git mesa
Checked out a two-day-old revision of mesa. Attached a screenshot of what I meant about the reproducer app.
Will try running it in Xephyr.
--- Comment #8 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Attempted to run the reproducer app in Xephyr. It looks exactly as it does on the host with Java 8 in the attached screenshot, i.e. with a white button. However, clicking the button just adds the text to the textarea, as programmed, while doing the same directly on the host's Xorg hangs the system.
I think it's possible that the app alone is not enough to reproduce the issue; KDE and its window manager might be at play here, too. But that's strange, because the hang seems to occur on adding text to a JTextArea, which is rendered by Swing and should be least affected by the WM and DE (except for the window border, which is absent in Xephyr, since no WM runs there).
But if that's mesa-only bug, shouldn't Ctrl+Alt+F1 work? Here GPU appears to have stopped output completely (most likely, fullscreen tty opens, but GPU shows same picture as on moment of freeze).
--- Comment #9 from Michel Dänzer michel@daenzer.net ---
Any chance you can bisect Mesa?

(In reply to Vitaly Ostrosablin from comment #8)
> But if that's mesa-only bug, shouldn't Ctrl+Alt+F1 work? Here GPU appears to have stopped output completely (most likely, fullscreen tty opens, but GPU shows same picture as on moment of freeze).

A GPU hang tends to cause the Xorg process to hang as well, which prevents VT switching from working.
--- Comment #10 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Yes, I will try to bisect mesa. Unfortunately, it looks like I'll have to do that manually, since Gentoo doesn't seem to have bisect tools for portage. So far I can give the following initial info:
1) The bug had not been introduced as of November 30, 2016.
2) The white button artefact doesn't seem to be related to the hang. In the Nov 30 commit the button is white, but pressing it doesn't hang the system.
3) By Dec 20, 2016, the hang had already been introduced.
--- Comment #11 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Further narrowed date range: between Dec 6 and Dec 12.
--- Comment #12 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Looks like it broke on Dec 07. There were a lot of radeonsi-related commits, but I had difficulty compiling a working mesa out of them. On Dec 6 there was no bug. None of the Dec 7 commits seem to work; they segfault. Later, on Dec 8, mesa can be compiled and started, but the issue is already present.
--- Comment #13 from Ilia Mirkin imirkin@alum.mit.edu ---
Vitaly - commit id's please. Dates are largely meaningless - the default date shown by git has little to do with when the commit made it into a particular tree, even with mesa's rebase policy.
--- Comment #14 from Vitaly Ostrosablin tmp6154@yandex.ru ---
85a3057f651a1c56348f1af18343d9cc0a5c93f3 used to work fine.
After that, within at most 3 commits after this point, something broke and mesa didn't run (checked on 4c8c13b3568c82e503a10ddcb846b4c96261ec4c).
One of the later commits I tried was 132b69c4edb824c70c98f8937c63e49b04f3adff, which didn't work either.
After it, there was a huge batch of radeonsi commits.
c7dc1b010ae581f532240b661cb3d1c82e117e7e is not runnable either.
bd56de88dfb192310f3432a3c0e0ddc3469c6d55 is runnable (the breakage was probably fixed somewhere earlier) and the java reproducer app hangs the system there.
--- Comment #15 from Michel Dänzer michel@daenzer.net ---
For any commits that you can't test, run
git bisect skip
Eventually, git bisect will either show the commit which introduced the problem, or the minimal set of candidates.
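As a rough sketch of how a manual bisect session with skips might look: the helper below maps the outcome of each hand-run test to the matching `git bisect` subcommand. The function name and the result words (`works`, `hangs`, `broken`) are invented for illustration; the commit IDs in the comments are the known-good and known-bad ones from comment #14.

```shell
# Hypothetical helper: translate a manual test result into a git bisect subcommand.
#   works  -> good  (reproducer does not hang the system)
#   hangs  -> bad   (reproducer hangs the system)
#   broken -> skip  (mesa at this commit does not build or start)
bisect_subcommand() {
    case "$1" in
        works)  echo good ;;
        hangs)  echo bad ;;
        broken) echo skip ;;
        *)      echo "unknown result: $1" >&2; return 1 ;;
    esac
}

# Session sketch inside a mesa checkout (bad commit first, then good):
#   git bisect start bd56de88dfb192310f3432a3c0e0ddc3469c6d55 85a3057f651a1c56348f1af18343d9cc0a5c93f3
#   ...build and test the commit git checked out, then for example:
#   git bisect "$(bisect_subcommand broken)"
```

Since the failure here hangs the display, each step has to be judged by hand, which is why the sketch stays manual.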
--- Comment #16 from Vitaly Ostrosablin tmp6154@yandex.ru ---
Have successfully updated to latest mesa. Seems like issue was fixed recently.
Timothy Arceri t_arceri@yahoo.com.au changed:
What      |Removed                     |Added
----------------------------------------------------------------------------
Status    |NEW                         |RESOLVED
Resolution|---                         |FIXED
--- Comment #17 from Timothy Arceri t_arceri@yahoo.com.au ---
(In reply to Vitaly Ostrosablin from comment #16)
> Have successfully updated to latest mesa. Seems like issue was fixed recently.