https://bugs.freedesktop.org/show_bug.cgi?id=107229
Bug ID: 107229
Summary: Metro 2033 Redux hangs
Product: DRI
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: alexander@tsoy.me
Metro 2033 Redux hangs when a certain combination of mesa version, kernel version and kernel configuration is used. This always happens on the loading screen.
I have done some tests using the integrated benchmark (benchmark.sh):
linux-4.14.x + mesa-7.3.x = OK
linux-4.14.x + mesa-8.0.x / mesa-8.1.x = hang
linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=y = OK
linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=n + mesa-8.0.x / mesa-8.1.x = hang
When the hang occurs, it causes a massive slowdown of all other graphical applications. With 4.14 kernels the game process is unkillable so it hangs somewhere in the kernel space. With 4.17 kernels it can be killed but this takes some time.
My GPU:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] [1002:6939] (rev f1) (prog-if 00 [VGA controller])
	Subsystem: PC Partner Limited / Sapphire Technology Tonga PRO [Radeon R9 285/380] [174b:e305]
--- Comment #1 from Alexander Tsoy alexander@tsoy.me ---
Created attachment 140639
  --> https://bugs.freedesktop.org/attachment.cgi?id=140639&action=edit
linux_metro_bisect.log
# first bad commit: [6ed4e2e673d348df6623012a628a8ab8624e3222] drm/ttm: add transparent huge page support for wc or uc allocations v2
The bisect was done with CONFIG_TRANSPARENT_HUGEPAGE=y. This is how I got the idea to experiment with transparent huge pages. Yes, I forgot about the --term-old/--term-new bisect options :)
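The --term-old/--term-new options of git bisect make it possible to search for the commit that fixed a problem without swapping "good" and "bad" by hand. The following is only a minimal sketch: the helper name find_fixing_commit, the labels "broken" and "fixed", and the check command passed by the caller are all assumptions made for illustration and are not taken from the report.

```shell
# Find the first commit in which a bug is FIXED, using custom bisect terms.
# Arguments (hypothetical, for this sketch only):
#   $1     path to the git repository
#   $2     a revision known to still show the bug (the "old" state)
#   $3     a revision known to be free of the bug (the "new" state)
#   $4...  check command; must exit 0 while the bug is present and
#          non-zero (1..127, except 125) once it is fixed
find_fixing_commit() (
    repo=$1 broken_rev=$2 fixed_rev=$3
    shift 3
    cd "$repo" || exit 1
    # Rename the bisect states so that "old" means broken and "new" means fixed.
    git bisect start --term-old=broken --term-new=fixed >/dev/null || exit 1
    git bisect broken "$broken_rev" >/dev/null || exit 1
    git bisect fixed "$fixed_rev" >/dev/null || exit 1
    # "git bisect run" treats exit code 0 as the old term and 1..127 as the new term.
    git bisect run "$@" >/dev/null || exit 1
    # After a finished bisect, refs/bisect/<new term> is the first "new" commit.
    result=$(git rev-parse refs/bisect/fixed) || exit 1
    git bisect reset >/dev/null 2>&1
    echo "$result"
)
```

For a kernel bisect like this one, every step needs a rebuild and a reboot, so in practice the states would be marked by hand with "git bisect broken" and "git bisect fixed" instead of "git bisect run"; the custom terms work the same way in both cases.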
--- Comment #2 from Alexander Tsoy alexander@tsoy.me ---
(In reply to Alexander Tsoy from comment #0)
> With 4.14 kernels the game process is unkillable so it hangs somewhere in the kernel space. With 4.17 kernels it can be killed but this takes some time.
The process actually can be killed in a while loop.
Perf report:
$ sudo perf report | grep metro | head
    33.33%  metro  metro             [.] cbackend_OGL::delayed_upload
    31.56%  metro  [kernel.vmlinux]  [k] rb_prev
     2.07%  metro  [kernel.vmlinux]  [k] alloc_iova
     0.20%  metro  [kernel.vmlinux]  [k] __switch_to
     0.18%  metro  [kernel.vmlinux]  [k] native_load_gs_index
     0.13%  metro  [kernel.vmlinux]  [k] __x86_indirect_thunk_rax
     0.12%  metro  [kernel.vmlinux]  [k] entry_SYSCALL_64
     0.08%  metro  [kernel.vmlinux]  [k] __schedule
     0.08%  metro  [kernel.vmlinux]  [k] read_tsc
     0.07%  metro  libc-2.26.so      [.] __nanosleep
--- Comment #3 from Michel Dänzer michel@daenzer.net ---
(In reply to Alexander Tsoy from comment #0)
> linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=y = OK
> linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=n + mesa-8.0.x / mesa-8.1.x = hang
Did you swap CONFIG_TRANSPARENT_HUGEPAGE=y/n here? I.e. CONFIG_TRANSPARENT_HUGEPAGE=y is bad, CONFIG_TRANSPARENT_HUGEPAGE=n is good?
If not, how exactly did you bisect with CONFIG_TRANSPARENT_HUGEPAGE=y?
--- Comment #4 from Alexander Tsoy alexander@tsoy.me ---
(In reply to Michel Dänzer from comment #3)
> (In reply to Alexander Tsoy from comment #0)
> > linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=y = OK
> > linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=n + mesa-8.0.x / mesa-8.1.x = hang
>
> Did you swap CONFIG_TRANSPARENT_HUGEPAGE=y/n here? I.e. CONFIG_TRANSPARENT_HUGEPAGE=y is bad, CONFIG_TRANSPARENT_HUGEPAGE=n is good?
Yes, after getting a clue that this bug could be related to transparent huge pages, I tried disabling CONFIG_TRANSPARENT_HUGEPAGE in the 4.17.6 kernel. This results in the same hang I had with the 4.14.x kernels.
Note that transparent huge pages must be disabled at build time. The kernel command-line option "transparent_hugepage=never" doesn't change anything.
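Since the behaviour depends on whether transparent huge pages were compiled in rather than on the boot parameter, it can help to tell the two cases apart on a running system. The sketch below relies on the sysfs directory /sys/kernel/mm/transparent_hugepage, which is only present when the kernel was built with CONFIG_TRANSPARENT_HUGEPAGE=y; the function name thp_state and its optional directory argument are assumptions made for this example.

```shell
# Report the transparent huge page (THP) state of the running kernel.
# $1 is optional and exists only so the function can be pointed at a
# fake sysfs tree; it defaults to the real sysfs location.
thp_state() {
    thp_dir=${1:-/sys/kernel/mm/transparent_hugepage}
    if [ ! -r "$thp_dir/enabled" ]; then
        # No sysfs entry: THP support was not compiled into this kernel.
        echo "not-built"
        return 0
    fi
    # The file looks like "always [madvise] never"; the bracketed word is active.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$thp_dir/enabled"
}
```

Calling thp_state without arguments prints "not-built" on a kernel configured with CONFIG_TRANSPARENT_HUGEPAGE=n, and "never" on a THP-enabled kernel booted with transparent_hugepage=never, which are the two situations the comment above distinguishes.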
--- Comment #5 from Alexander Tsoy alexander@tsoy.me ---
To clarify a bit: first bad commit in bisect is actually the first good commit that fixed hangs in Metro.
--- Comment #6 from Alexander Tsoy alexander@tsoy.me ---
(In reply to Alexander Tsoy from comment #5)
> To clarify a bit: first bad commit in bisect is actually the first good commit that fixed hangs in Metro.

But only when transparent huge pages are enabled, of course.
--- Comment #7 from Alexander Tsoy alexander@tsoy.me ---
Created attachment 140964
  --> https://bugs.freedesktop.org/attachment.cgi?id=140964&action=edit
dmesg
Same problem with the latest amd-staging-drm-next (commit bf1fd52b0632cd17ac875432a36d3e92be96d8cb). Now the kernel gives me the following errors:
[ 324.552371] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
[ 324.561030] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
And with CONFIG_TRANSPARENT_HUGEPAGE=y the same kernel works fine.
Martin Peres martin.peres@free.fr changed:
What       |Removed |Added
----------------------------------------------------------------------------
Resolution |---     |MOVED
Status     |NEW     |RESOLVED
--- Comment #8 from Martin Peres martin.peres@free.fr ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/447.