https://bugs.freedesktop.org/show_bug.cgi?id=95474
Bug ID: 95474
Summary: Bioshock Infinite and DiRT Showdown perform very poor on any GPU with GCN >=1.1
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: 0xe2.0x9a.0x9b@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
See http://www.phoronix.com/scan.php?page=article&item=nv-amd-23ppw
Jan Ziak 0xe2.0x9a.0x9b@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bioshock Infinite and DiRT  |Bioshock Infinite and DiRT
                   |Showdown perform very poor  |Showdown perform very
                   |on any GPU with GCN >=1.1   |poorly on any GPU with GCN
                   |                            |>=1.1
--- Comment #1 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
Does anybody know what the primary cause of the poor performance is?
--- Comment #2 from Alex Deucher alexdeucher@gmail.com ---
Can you identify what component caused the regression? mesa? llvm? kernel?
--- Comment #3 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
(In reply to Alex Deucher from comment #2)
> Can you identify what component caused the regression? mesa? llvm? kernel?
That is a good question, but I do not know the answer.
Data:
Kernel: 4.6.0
Kernel module: radeon.ko
Resolution: 1920x1080
Game quality setting: Ultra
CPU: A10-7850K
CPU utilization: 120% (kernel-space is about 10% CPU)
GPU: R9 390
GPU utilization (radeontop): >>>> 15% <<<<
GPU performance level: forced max clocks
Based on this, it seems that the user-space component (mesa + llvm) is the bottleneck.
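The reasoning behind that conclusion can be written down as a rough sketch. The function name and the 90% "GPU is busy" threshold are assumptions made for illustration only; they are not taken from radeontop or from any Mesa tool.

```python
def likely_bottleneck(cpu_total_pct, cpu_kernel_pct, gpu_busy_pct,
                      gpu_busy_threshold=90.0):
    """Rough classification from utilization counters.

    gpu_busy_threshold is a hypothetical cut-off, not a measured value.
    """
    # A GPU that is nearly always busy is the limiting factor.
    if gpu_busy_pct >= gpu_busy_threshold:
        return "gpu"
    # Otherwise the CPU side limits the frame rate; compare the time
    # spent in user space (game, mesa, llvm) with the time spent in the kernel.
    cpu_user_pct = cpu_total_pct - cpu_kernel_pct
    return "cpu-user-space" if cpu_user_pct > cpu_kernel_pct else "cpu-kernel"


# Figures reported in this comment: 120% CPU, about 10% in the kernel, 15% GPU.
print(likely_bottleneck(cpu_total_pct=120.0, cpu_kernel_pct=10.0, gpu_busy_pct=15.0))
```

With the GPU at forced max clocks and only 15% busy, the GPU branch is not taken, and user-space time dominates kernel time.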
--- Comment #4 from Alex Deucher alexdeucher@gmail.com ---
Can you try a different kernel or mesa version?
--- Comment #5 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
(In reply to Alex Deucher from comment #2)
> Can you identify what component caused the regression? mesa? llvm? kernel?
Output from "perf report":
# Samples: 660K of event 'cycles'
# Event count (approx.): 568704142352
#
# Overhead  Command        Shared Object       Symbol
     9.94%  G.26           radeonsi_dri.so     [.] pb_cache_is_buffer_compat
     2.51%  G.26           libgcc_s.so.1       [.] __umoddi3
     1.66%  G.26           libc-2.22.so        [.] _int_malloc
     1.52%  G.26           libc-2.22.so        [.] _int_free
     1.42%  G.26           radeonsi_dri.so     [.] radeon_drm_cs_add_buffer
     1.22%  bioshock.i386  radeonsi_dri.so     [.] radeon_cs_context_cleanup
     1.14%  bioshock.i386  [kernel.vmlinux]    [k] reservation_object_add_shared_fence
     1.02%  G.26           libc-2.22.so        [.] __libc_calloc
     0.93%  G.26           libpthread-2.22.so  [.] pthread_mutex_lock
     0.81%  bioshock.i386  [kernel.vmlinux]    [k] __ww_mutex_lock_interruptible
     0.79%  G.26           bioshock.i386       [.] 0x00000000001a8853
     0.79%  G.26           radeonsi_dri.so     [.] pb_cache_reclaim_buffer
     0.78%  G.26           libpthread-2.22.so  [.] __pthread_mutex_unlock_usercnt
     0.74%  G.26           libc-2.22.so        [.] malloc
     0.72%  bioshock.i386  radeonsi_dri.so     [.] radeon_drm_cs_emit_ioctl_oneshot
     0.66%  bioshock.i386  [radeon]            [k] radeon_bo_list_validate
     0.60%  G.26           radeonsi_dri.so     [.] __x86.get_pc_thunk.bx
     0.56%  G.26           bioshock.i386       [.] 0x00000000001a8e53
     0.56%  G.26           radeonsi_dri.so     [.] set_add
     0.55%  G.26           radeonsi_dri.so     [.] ir_expression::accept
     0.54%  bioshock.i386  [ttm]               [k] ttm_bo_list_ref_sub
     0.54%  G.26           radeonsi_dri.so     [.] _mesa_glsl_parse
     0.53%  G.26           radeonsi_dri.so     [.] radeon_lookup_buffer
     0.52%  G.26           libc-2.22.so        [.] __memcmp_sse4_2
     0.48%  G.26           radeonsi_dri.so     [.] visit_list_elements
     0.47%  bioshock.i386  [kernel.vmlinux]    [k] reservation_object_reserve_shared
     0.46%  G.26           radeonsi_dri.so     [.] si_reset_buffer_resources
     0.45%  G.26           radeonsi_dri.so     [.] hash_table_search
     0.45%  bioshock.i386  [drm]               [k] drm_gem_object_lookup
     0.42%  G.26           libc-2.22.so        [.] malloc_consolidate
     0.41%  bioshock.i386  [radeon]            [k] radeon_sync_fence
     0.41%  G.26           radeonsi_dri.so     [.] set_search
     0.40%  G.26           radeonsi_dri.so     [.] u_default_transfer_inline_write
     0.40%  bioshock.i386  [ttm]               [k] ttm_bo_add_to_lru
     0.38%  G.26           radeonsi_dri.so     [.] st_validate_state
     0.36%  bioshock.i386  [ttm]               [k] ttm_bo_del_from_lru
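To see which component dominates a profile like the one above, the per-symbol rows can be summed per shared object. This is a minimal sketch, assuming each row has the form "overhead% command shared-object [.|k] symbol"; the helper name overhead_by_object is made up for illustration, and the sample holds only four of the rows.

```python
import re
from collections import defaultdict

# One row of `perf report` text output, in the layout shown above:
#   overhead%  command  shared-object  [.] or [k]  symbol
ROW = re.compile(r"(\d+\.\d+)%\s+(\S+)\s+(\S+)\s+\[([.k])\]\s+(\S+)")


def overhead_by_object(report_text):
    """Sum the overhead percentages per shared object, largest first."""
    totals = defaultdict(float)
    for overhead, _command, shared_object, _kind, _symbol in ROW.findall(report_text):
        totals[shared_object] += float(overhead)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few rows copied from the report; the full output can be pasted instead.
sample = """
 9.94%  G.26           radeonsi_dri.so   [.] pb_cache_is_buffer_compat
 2.51%  G.26           libgcc_s.so.1     [.] __umoddi3
 1.42%  G.26           radeonsi_dri.so   [.] radeon_drm_cs_add_buffer
 1.14%  bioshock.i386  [kernel.vmlinux]  [k] reservation_object_add_shared_fence
"""

for shared_object, percent in overhead_by_object(sample):
    print(f"{percent:6.2f}%  {shared_object}")
```

Header lines starting with "#" contain no percentage column and are skipped by the pattern.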
--- Comment #6 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?
I will try tomorrow.
--- Comment #7 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?
LLVM 3.8 + Mesa 11.2.2 -> same result
--- Comment #8 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
(In reply to Alex Deucher from comment #4)
> Can you try a different kernel or mesa version?
Kernel 4.4.6 radeon.ko + LLVM 3.8 + Mesa 11.2.2 -> same result
--- Comment #9 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
Created attachment 124063
  --> https://bugs.freedesktop.org/attachment.cgi?id=124063&action=edit
kcachegrind screenshot: _mesa_FenceSync
I ran callgrind on Bioshock with mesa-git.
Callgrind instrumentation was enabled only when the Bioshock benchmark was rendering frames from the game. The benchmark graphics quality was set to Medium.
The screenshot I am sending indicates that Bioshock expects a faster _mesa_FenceSync implementation.
--- Comment #10 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
Also, the first part of the Bioshock benchmark causes Mesa to compile some shaders almost every frame.
--- Comment #11 from Marek Olšák maraeo@gmail.com ---
CPU profiling is usable only if you've built Mesa and LLVM with -fno-omit-frame-pointer.
This article suggests that the regression happened between 11.2 and master. Do you disagree with that? http://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Padoka-May-U...
--- Comment #12 from Marek Olšák maraeo@gmail.com ---
(In reply to Jan Ziak from comment #9)
> Created attachment 124063 [details]
> kcachegrind screenshot: _mesa_FenceSync
>
> I ran callgrind on Bioshock with mesa-git.
>
> Callgrind instrumentation was enabled only when the Bioshock benchmark was
> rendering frames from the game. The benchmark graphics quality was set to
> Medium.
>
> The screenshot I am sending indicates that Bioshock expects a faster
> _mesa_FenceSync implementation.
I wouldn't trust callgrind, because it runs on a CPU emulator.
The proper way to profile this is to build Mesa with -fno-omit-frame-pointer and use sysprof, which is very easy to use. Sysprof can also save the results to disk.
--- Comment #13 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
(In reply to Marek Olšák from comment #11)
> This article suggests that the regression happened between 11.2 and master.
> Do you disagree with that?
> http://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Padoka-May-Ubuntu-16&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Phoronix+%28Phoronix%29
I am unable to reproduce the results from the article on my machine with Mesa 11.2.0. Mesa 11.2.0 and mesa-git have similar performance on my machine.
The 37.35 FPS (Ultra quality setting) in the Phoronix article for Mesa 11.2.0 is highly unlikely.
The Bioshock benchmark ignores the command-line option -ForceCompatLevel=N if the directory "$HOME/.local/share/irrationalgames/bioshockinfinite" exists.
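Anyone comparing numbers may want to check for that directory before a run. A minimal sketch follows; the helper name compat_level_will_be_honored is hypothetical, and the only behaviour assumed is the one described in this comment.

```python
import os

# Path quoted in this comment; if it exists, the benchmark reportedly
# ignores -ForceCompatLevel=N.
SAVED_SETTINGS_DIR = "~/.local/share/irrationalgames/bioshockinfinite"


def compat_level_will_be_honored(settings_dir=SAVED_SETTINGS_DIR):
    """Return True when no saved-settings directory exists at settings_dir."""
    return not os.path.exists(os.path.expanduser(settings_dir))


if __name__ == "__main__":
    if compat_level_will_be_honored():
        print("No saved settings found; -ForceCompatLevel=N should take effect.")
    else:
        print("Saved settings exist; -ForceCompatLevel=N is likely ignored.")
```

Moving the directory aside before a run, rather than deleting it, keeps the saved settings recoverable.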
--- Comment #14 from Marek Olšák maraeo@gmail.com ---
I've done some profiling.
Bioshock Infinite:
- the game is CPU-bound most of the time
- some small performance enhancements have landed already
- the FenceSync optimization is a work in progress, expect a 30% improvement
- most of the scratch buffer usage is for private memory, not VGPR spilling (this may be a defect in our indirect indexing)
- if I'm not taking private memory usage into account, it's still in top 2 of the worst VGPR spilling apps
DiRT Showdown:
- the game is GPU-bound
- there are a bunch of very slow pixel shaders using while loops, it's unclear how to make them faster
- most of the scratch buffer usage is for VGPR spilling
- it's in top 2 of the worst VGPR spilling apps
Jan Ziak 0xe2.0x9a.0x9b@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #15 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
I am closing this issue and marking it as fixed.
Bioshock Infinite benchmark @1080p runs at 33 FPS on Ultra settings on A10-7850K+R9-390. R9-390 is a GCN 1.1 GPU.
Phoronix claims to have reached about 80 FPS on Ultra settings with GCN 1.2+ GPUs and a fast 4-core/8-thread Skylake CPU:
http://phoronix.com/scan.php?page=news_item&px=RadeonSI-Mesa-Git-BioShoc...
----
DiRT Showdown: It is better to have a separate freedesktop.org bug tracking DiRT Showdown performance issues than to postpone closing this bug.