https://bugs.freedesktop.org/show_bug.cgi?id=93015
Bug ID: 93015
Summary: Tonga Elemental segfault + VM faults since radeon: implement r600_query_hw_get_result via function pointers
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: adf.lists@gmail.com
Unreal 4.5 Elemental demo on an R9 285, using a powerplay kernel.
Since Mesa commit:
commit 50f0f938e3a577647fdfb6bdbb4ad3da252aa791
Author: Nicolai Hähnle nhaehnle@gmail.com
Date: Fri Nov 13 00:27:34 2015 +0100

    radeon: implement r600_query_hw_get_result via function pointers

    We will need the clear_result override for the batch query implementation.
About a minute into the demo (always at the same place), the demo catches a segfault and quits.
In dmesg I see a few VM faults.
While confirming the bisect, I found that the commit before the one above does not crash:
commit c207c55fc08a1bf3dd40e79b3aaec34afbee2e55
Author: Nicolai Hähnle nhaehnle@gmail.com
Date: Wed Nov 18 12:05:11 2015 +0100

    radeon: split hw query buffer handling from cs emit

    The idea here is that driver queries implemented outside of common code will use the same query buffer handling with different logic for starting and stopping the corresponding counters.
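The two commit messages above describe one pattern: common code owns the query buffer bookkeeping, while per-query hooks reached through function pointers decide how counters are started, stopped, cleared, and accumulated. A minimal sketch of that pattern follows. All names here (`query_ops`, `hw_query`, `fake_counter`, and the helper functions) are hypothetical stand-ins chosen for illustration and are not the Mesa definitions; the "hardware counter" is a plain variable so the sketch runs on a CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified types for illustration; not the Mesa structs. */
#define MAX_SLOTS 4

struct hw_query;

/* Per-query-type hooks. Common code calls these instead of hard-coding
 * how counters are started, stopped, cleared, and accumulated. */
struct query_ops {
    void (*emit_start)(struct hw_query *q, uint64_t *slot);
    void (*emit_stop)(struct hw_query *q, uint64_t *slot);
    void (*clear_result)(uint64_t *result);
    void (*add_result)(const uint64_t *slot, uint64_t *result);
};

struct hw_query {
    const struct query_ops *ops;
    uint64_t buffer[2 * MAX_SLOTS]; /* stand-in for the GPU-visible buffer */
    unsigned num_slots;             /* completed (begin, end) pairs */
};

/* Fake hardware counter (assumption) so the sketch needs no GPU. */
static uint64_t fake_counter;

/* Default hooks: snapshot the counter at begin/end and sum the deltas. */
static void default_emit_start(struct hw_query *q, uint64_t *slot)
{
    (void)q;
    slot[0] = fake_counter;
}

static void default_emit_stop(struct hw_query *q, uint64_t *slot)
{
    (void)q;
    slot[1] = fake_counter;
}

static void default_clear_result(uint64_t *result)
{
    *result = 0;
}

static void default_add_result(const uint64_t *slot, uint64_t *result)
{
    *result += slot[1] - slot[0];
}

static const struct query_ops default_ops = {
    .emit_start = default_emit_start,
    .emit_stop = default_emit_stop,
    .clear_result = default_clear_result,
    .add_result = default_add_result,
};

/* Common code: owns slot bookkeeping, knows nothing about the counters. */
static void query_begin(struct hw_query *q)
{
    /* A real driver would grow or chain buffers instead of asserting. */
    assert(q->num_slots < MAX_SLOTS);
    q->ops->emit_start(q, &q->buffer[2 * q->num_slots]);
}

static void query_end(struct hw_query *q)
{
    q->ops->emit_stop(q, &q->buffer[2 * q->num_slots]);
    q->num_slots++;
}

static uint64_t query_get_result(struct hw_query *q)
{
    uint64_t result;

    q->ops->clear_result(&result);
    for (unsigned i = 0; i < q->num_slots; i++)
        q->ops->add_result(&q->buffer[2 * i], &result);
    return result;
}
```

In this sketch, a different query type would supply its own `query_ops` table and reuse `query_begin`, `query_end`, and `query_get_result` unchanged, which is the kind of sharing the second commit message describes.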
On that commit, at the point where it would have crashed, I start getting flooded with VM faults:
[17771.298259] VM fault (0x14, vmid 5) at page 1204016, write from 'TC0' (0x54433000) (8)
[17771.330661] amdgpu 0000:01:00.0: GPU fault detected: 146 0x04c20814
[17771.330665] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00125E98
[17771.330666] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B008014
[17771.330668] VM fault (0x14, vmid 5) at page 1203864, write from 'TC0' (0x54433000) (8)
[17771.363320] amdgpu 0000:01:00.0: GPU fault detected: 146 0x05e20814
[17771.363323] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001264BC
[17771.363325] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B008014
[17771.363326] VM fault (0x14, vmid 5) at page 1205436, write from 'TC0' (0x54433000) (8)
[17771.395828] amdgpu 0000:01:00.0: GPU fault detected: 146 0x06620814
[17771.395832] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001260CC
[17771.395833] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B008014
[17771.395834] VM fault (0x14, vmid 5) at page 1204428, write from 'TC0' (0x54433000) (8)
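For readers comparing the register dumps with the summary lines in the log above: each "VM fault" line appears to be a decoded view of the two registers printed just before it. For example, 0x00125E98 is 1203864 in decimal, and the bytes of 0x54433000 spell "TC0" in ASCII. The sketch below reproduces that decoding. The bit positions are inferred from the values in this log and are an assumption, not taken from a register specification; `decode_vm_fault` and `struct vm_fault` are hypothetical names.

```c
#include <stdint.h>

/* Fields decoded from one VM fault report. Bit positions are inferred
 * from the values in the log above (assumption), not from a register spec. */
struct vm_fault {
    uint32_t page;        /* printed as "at page N" */
    unsigned protections; /* printed as the first value, e.g. 0x14 */
    unsigned client_id;   /* printed as the trailing "(8)" */
    unsigned is_write;    /* 1 for "write from", 0 otherwise */
    unsigned vmid;        /* printed as "vmid N" */
    char client[5];       /* ASCII tag such as "TC0" */
};

static struct vm_fault decode_vm_fault(uint32_t fault_addr,
                                       uint32_t fault_status,
                                       uint32_t client_tag)
{
    struct vm_fault f;

    /* The fault address register value is printed in decimal as the page. */
    f.page = fault_addr;

    /* Status fields, low bits to high bits. */
    f.protections = fault_status & 0xff;
    f.client_id = (fault_status >> 12) & 0xff;
    f.is_write = (fault_status >> 24) & 0x1;
    f.vmid = (fault_status >> 25) & 0xf;

    /* The client tag packs up to four ASCII characters, first one highest. */
    f.client[0] = (char)((client_tag >> 24) & 0xff);
    f.client[1] = (char)((client_tag >> 16) & 0xff);
    f.client[2] = (char)((client_tag >> 8) & 0xff);
    f.client[3] = (char)(client_tag & 0xff);
    f.client[4] = '\0';

    return f;
}
```

Calling `decode_vm_fault(0x00125E98, 0x0B008014, 0x54433000)` yields the fields shown in the log line stamped 17771.330668: page 1203864, protections 0x14, vmid 5, a write, client "TC0", client id 8.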
Michel Dänzer michel@daenzer.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nhaehnle@gmail.com
--- Comment #1 from Michel Dänzer michel@daenzer.net ---
Nicolai, any ideas?
--- Comment #2 from Nicolai Hähnle nhaehnle@gmail.com ---
Hi Andy, thanks for the report! I can reproduce the crash, and it does indeed seem to be related to buffer handling. I am investigating.
--- Comment #3 from Nicolai Hähnle nhaehnle@gmail.com ---
Created attachment 119980
  --> https://bugs.freedesktop.org/attachment.cgi?id=119980&action=edit
patch that should fix the bug
--- Comment #4 from Nicolai Hähnle nhaehnle@gmail.com ---
Created attachment 119981
  --> https://bugs.freedesktop.org/attachment.cgi?id=119981&action=edit
related patch
Okay, so I understand what failed and why it worked before.
Could you please test both patches? The first one should fix your problem; the second one is a related cleanup on top of it that hopefully contains no regressions.
[I have apparently unrelated weirdness going on right now which prevents me from testing this properly.]
--- Comment #5 from Mathias Tillman master.homer@gmail.com ---
Had this problem too, and the patch seems to have fixed it for me.
--- Comment #6 from Andy Furniss adf.lists@gmail.com ---
Patch one fixes it for me, and I can't find any regressions with patch one + patch two.
Nicolai Hähnle nhaehnle@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #7 from Nicolai Hähnle nhaehnle@gmail.com ---
Thanks for testing! The patches are in Mesa master, and the bug doesn't affect any of the stable releases, hence closing it.