https://bugs.freedesktop.org/show_bug.cgi?id=96897
Bug ID: 96897
Summary: [opencl] clpeak hangs during compilation
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: 0xe2.0x9a.0x9b@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
Hello.
clpeak (http://github.com/krrishnarraj/clpeak) de facto enters an infinite loop during compilation.
GPU: R9 390
Kernel module: amdgpu.ko, linux 4.7.0-rc7
Mesa: 12.1.0-devel (git-ead7736)
LLVM: git 2016-jul-11

$ clinfo
Number of platforms: 1 (should be 2: intel.cpu + mesa.gpu)
Platform Version: OpenCL 1.1 Mesa 12.1.0-devel (git-ead7736)

$ ll /usr/lib64/libOpenCL.so.1
/usr/lib64/libOpenCL.so.1 -> OpenCL/vendors/mesa/libOpenCL.so.1.0.0

(Gentoo Linux)
$ eselect opencl list
Available OpenCL implementations:
  [1] amdgpu-pro
  [2] intel
  [3] mesa *
  [4] nvidia
--- Comment #1 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
Created attachment 125023
  --> https://bugs.freedesktop.org/attachment.cgi?id=125023&action=edit
gdb backtrace
Michel Dänzer michel@daenzer.net changed:
What                         |Removed     |Added
----------------------------------------------------------------------------
Attachment #125023 mime type |text/x-log  |text/plain
Michel Dänzer michel@daenzer.net changed:
What       |Removed                         |Added
----------------------------------------------------------------------------
Assignee   |dri-devel@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
QA Contact |dri-devel@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
Component  |Drivers/Gallium/radeonsi        |Mesa core

--- Comment #2 from Michel Dänzer michel@daenzer.net ---
Looks like deep recursion in clover / LLVM code.
Vedran Miletić vedran@miletic.net changed:
What       |Removed                                  |Added
----------------------------------------------------------------------------
Summary    |[opencl] clpeak hangs during compilation |clpeak OpenCL benchmark hangs during compilation on Clover RadeonSI
Component  |Mesa core                                |Drivers/Gallium/radeonsi
QA Contact |mesa-dev@lists.freedesktop.org           |dri-devel@lists.freedesktop.org
CC         |                                         |vedran@miletic.net
Assignee   |mesa-dev@lists.freedesktop.org           |dri-devel@lists.freedesktop.org
Blocks     |                                         |99553

--- Comment #3 from Vedran Miletić vedran@miletic.net ---
Interesting, I will look into this.
Referenced Bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=99553
[Bug 99553] Tracker bug for runnning OpenCL applications on Clover
--- Comment #4 from Vedran Miletić vedran@miletic.net ---
It no longer hangs with either LLVM 3.9.1 or LLVM git from today. Instead, compilation fails with:
input.cl:34:106: error: call to 'mad' is ambiguous
input.cl:30:22: note: expanded from macro 'MAD_64'
input.cl:29:22: note: expanded from macro 'MAD_16'
input.cl:28:25: note: expanded from macro 'MAD_4'
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
/usr/local/include/clc/math/mad.inc:1:39: note: candidate function
input.cl:34:106: error: call to 'mad' is ambiguous
Did clpeak change or did we change? If we changed, did we regress?
--- Comment #5 from Jan Ziak 0xe2.0x9a.0x9b@gmail.com ---
With LLVM 4.0.0 I am getting the following results:

$ clinfo
Platform ID: 0x7ff6aaf2ed60
Name: AMD HAWAII (DRM 3.10.0 / 4.11.0-rc2+, LLVM 4.0.0)
Vendor: AMD
Device OpenCL C version: OpenCL C 1.1
Driver version: 17.1.0-devel
Profile: FULL_PROFILE
Version: OpenCL 1.1 Mesa 17.1.0-devel (git-ad13bd2)

$ ./clpeak
Platform: Clover
  Device: AMD HAWAII (DRM 3.10.0 / 4.11.0-rc2+, LLVM 4.0.0)
    Driver version  : 17.1.0-devel (Linux x64)
    Compute units   : 40
    Clock frequency : 1000 MHz
clpeak: /var/tmp/portage/sys-devel/clang-4.0.0/work/x/y/cfe-4.0.0.src/lib/Sema/Sema.cpp:317: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted (core dumped)
--- Comment #6 from Andy Furniss adf.lists@gmail.com ---
Same for me on tonga + git llvm/libclc/mesa/clpeak

Platform: Clover
  Device: AMD TONGA (DRM 3.13.0 / 4.11.0-rc1-g00c1259, LLVM 5.0.0)
    Driver version  : 17.1.0-devel (Linux x64)
    Compute units   : 28
    Clock frequency : 973 MHz
clpeak: /mnt/sdb1/Gits/llvm/tools/clang/lib/Sema/Sema.cpp:316: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
Aborted
--- Comment #7 from Andy Furniss adf.lists@gmail.com ---
(In reply to Andy Furniss from comment #6)
> Same for me on tonga + git llvm/libclc/mesa/clpeak
>
> Platform: Clover
>   Device: AMD TONGA (DRM 3.13.0 / 4.11.0-rc1-g00c1259, LLVM 5.0.0)
>     Driver version  : 17.1.0-devel (Linux x64)
>     Compute units   : 28
>     Clock frequency : 973 MHz
> clpeak: /mnt/sdb1/Gits/llvm/tools/clang/lib/Sema/Sema.cpp:316: clang::Sema::~Sema(): Assertion `DelayedTypos.empty() && "Uncorrected typos!"' failed.
> Aborted
This starts with clpeak commit:

16e1b207a4d4e81a0c48c77c950437dca1364cb6 is the first bad commit
commit 16e1b207a4d4e81a0c48c77c950437dca1364cb6
Author: espes espes@pequalsnp.com
Date:   Mon Jul 18 17:06:15 2016 -0700

    Add support for halfs
Before this commit it completes OK, but there is a delay of ~40 seconds before results start appearing.
--- Comment #8 from ricardo.ribalda@gmail.com ---
With:

  Device: AMD CARRIZO (DRM 3.9.0 / 4.10.0-qtec-standard, LLVM 4.0.1)
    Driver version  : 17.0.3 (Linux x64)
    Compute units   : 8
    Clock frequency : 800 MHz
I am getting the same error as Vedran: error: call to 'mad' is ambiguous
After reverting:
16e1b207a4d4e81a0c48c77c950437dca1364cb6 is the first bad commit
commit 16e1b207a4d4e81a0c48c77c950437dca1364cb6
Author: espes espes@pequalsnp.com
Date:   Mon Jul 18 17:06:15 2016 -0700
I am experiencing an endless loop as reported by Jan.
I get the same endless loop with:
Platform: Clover
  Device: AMD PALM (DRM 2.49.0 / 4.10.0-qtec-standard, LLVM 4.0.1)
    Driver version  : 17.0.3 (Linux x64)
    Compute units   : 2
    Clock frequency : 0 MHz
M. Edward (Ed) Borasky znmeb@znmeb.net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |znmeb@znmeb.net

--- Comment #9 from M. Edward (Ed) Borasky znmeb@znmeb.net ---
I have something like this on Fedora - both 25 (stable) and 26 (alpha). I type "clpeak" and the CPU goes to 100% and nothing else happens.
I'll attach a 'clinfo' printout.
--- Comment #10 from M. Edward (Ed) Borasky znmeb@znmeb.net ---
Created attachment 131104
  --> https://bugs.freedesktop.org/attachment.cgi?id=131104&action=edit
clinfo for the system
Note: this bug is in Fedora's bugzilla as well - https://bugzilla.redhat.com/show_bug.cgi?id=1433632
--- Comment #11 from M. Edward (Ed) Borasky znmeb@znmeb.net ---
Linking to a clpeak GitHub issue: https://github.com/krrishnarraj/clpeak/issues/32
Note: I'm now on Arch Linux and I have the non-looping version of this.
--- Comment #12 from Jan Vesely jan.vesely@rutgers.edu ---
> input.cl:34:106: error: call to 'mad' is ambiguous
This looks to be caused by the lack of half precision builtins in libclc. GCN+ GPUs advertise support for cl_khr_fp16 in CLC but libclc is not ready yet.
You can try my experimental cl_khr_fp16 branch: https://github.com/jvesely/libclc/tree/cl_khr_fp16
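For anyone who wants to gate half-precision kernels on the application side, here is a minimal sketch in C of a whole-token check for cl_khr_fp16 in a device extension string. It assumes the string comes from clGetDeviceInfo with CL_DEVICE_EXTENSIONS, which returns a space-separated list. The helper name has_extension is hypothetical and is not part of clpeak, Clover or libclc. The extension can be advertised while the builtin library still lacks the half overloads, so a check like this would not necessarily have avoided the ambiguity error quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` appears as a complete,
 * space-separated token in `ext_list`.
 *
 * `ext_list` is assumed to be the string returned by
 * clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...).
 * A plain strstr() is not enough, because one extension name can be a
 * prefix of another, so both token boundaries are checked.
 */
bool has_extension(const char *ext_list, const char *name)
{
    if (ext_list == NULL || name == NULL || name[0] == '\0')
        return false;

    size_t name_len = strlen(name);
    const char *p = ext_list;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space. */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* The match must end at the end of the string or before a space. */
        bool ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');

        if (starts_token && ends_token)
            return true;

        /* Not a whole token: continue searching after this match. */
        p += name_len;
    }
    return false;
}
```

A caller would fetch the extension string once per device and skip the half tests when has_extension(extensions, "cl_khr_fp16") returns false.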
Jan Vesely jan.vesely@rutgers.edu changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #13 from Jan Vesely jan.vesely@rutgers.edu ---
Initial support for cl_khr_fp16 builtins has been added to libclc in r332677. It should be enough to run clpeak. clpeak still takes a few minutes to compile the kernels (~7 min on my Carrizo laptop).
--- Comment #14 from Dieter Nützel Dieter@nuetzel-hh.de ---
(In reply to Jan Vesely from comment #13)
> Initial support for cl_khr_fp16 builtins has been added to libclc in r332677. It should be enough to run clpeak. clpeak still takes few mins to compile the kernels (~7mins on my carrizo laptop)
GREAT work Jan!
After 3 min and ~12 sec the float tests start crunching on my X3470 Xeon (only one core is used for the kernel compile => 3.6 GHz turbo mode).
My desktop was frozen during float 'Global memory bandwidth (GBPS)' compute and partly frozen during 'Double-precision compute (GFLOPS)'.
Whole benchmark finished after 6 min and 17 secs.
/home/dieter> time clpeak
Platform: Clover
  Device: Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 4.16.9-1.g4f45b1e-default, LLVM 7.0.0)
    Driver version  : 18.2.0-devel (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 2.64
      float2  : 2.64
      float4  : 2.64
      float8  : 2.54
      float16 : 1.45

    Single-precision compute (GFLOPS)
      float   : 6341.87
      float2  : 6131.34
      float4  : 6105.61
      float8  : 5933.91
      float16 : 5939.44

    half-precision compute (GFLOPS)
      half   : 6307.47
      half2  : 6193.25
      half4  : 6114.34
      half8  : 5729.57
      half16 : 6047.90

    Double-precision compute (GFLOPS)
      double   : 404.52
      double2  : 404.41
      double4  : 404.06
      double8  : 403.08
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1222.75
      int2  : 1213.90
      int4  : 1210.72
      int8  : 1208.57
      int16 : 1213.99

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 8.78
      enqueueReadBuffer          : 4.86
      enqueueMapBuffer(for read) : 4871.79
        memcpy from mapped ptr   : 4.94
      enqueueUnmap(after write)  : 3528.56
        memcpy to mapped ptr     : 4.94

    Kernel launch latency : 293.57 us
206.285u 3.765s 6:17.14 55.6% 0+0k 0+0io 0pf+0w
For reference, AMD 17.40:

/home/dieter> time clpeak

Platform: AMD Accelerated Parallel Processing
  Device: Ellesmere
    Driver version  : 2482.3 (Linux x64)
    Compute units   : 36
    Clock frequency : 1411 MHz

    Global memory bandwidth (GBPS)
      float   : 202.59
      float2  : 209.30
      float4  : 209.63
      float8  : 162.15
      float16 : 138.41

    Single-precision compute (GFLOPS)
      float   : 6342.71
      float2  : 6374.96
      float4  : 6178.29
      float8  : 5973.53
      float16 : 6018.79

    half-precision compute (GFLOPS)
      half   : 6306.97
      half2  : 6366.06
      half4  : 6350.41
      half8  : 6154.31
      half16 : 6280.47

    Double-precision compute (GFLOPS)
      double   : 404.64
      double2  : 404.38
      double4  : 398.54
      double8  : 403.25
      double16 : 401.53

    Integer compute (GIOPS)
      int   : 1206.77
      int2  : 1221.26
      int4  : 1225.83
      int8  : 1225.88
      int16 : 1227.35

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 9.03
      enqueueReadBuffer          : 5.08
      enqueueMapBuffer(for read) : 149130.81
        memcpy from mapped ptr   : 5.09
      enqueueUnmap(after write)  : 75882.81
        memcpy to mapped ptr     : 5.08

    Kernel launch latency : 93.33 us
23.056u 1.592s 1:08.29 36.0% 0+0k 0+0io 0pf+0w