https://bugs.freedesktop.org/show_bug.cgi?id=96296
Bug ID: 96296
Summary: clpeak causes a GPU hang
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
Assignee: dri-devel@lists.freedesktop.org
Reporter: notasas@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
AMD JUNIPER (DRM 2.43.0 / 4.6.0)
Mesa 12.1.0-devel (git-3581812)
llvm-3.8 1:3.8-2ubuntu3
clpeak - https://github.com/krrishnarraj/clpeak.git
As soon as it starts its float8 test (earlier ones run fine), the machine locks up and does not recover. Perhaps it attempts to execute some fp64 instructions that are missing on Juniper?
--- Comment #1 from Jan Vesely jan.vesely@rutgers.edu ---
(In reply to Grazvydas Ignotas from comment #0)
> AMD JUNIPER (DRM 2.43.0 / 4.6.0)
> Mesa 12.1.0-devel (git-3581812)
> llvm-3.8 1:3.8-2ubuntu3
>
> clpeak - https://github.com/krrishnarraj/clpeak.git
>
> As soon as it starts its float8 test (earlier ones run fine), the machine
> locks up and does not recover. Perhaps it attempts to execute some fp64
> instructions that are missing on Juniper?
Any attempt to use doubles should fail to build the kernel (even with llvm 3.8).
Running with CLOVER_DEBUG=llvm,asm CLOVER_OUTPUT=out_file should give you an idea about what the compiled program looks like, though I'd recommend using llvm 3.9.
--- Comment #2 from Grazvydas Ignotas notasas@gmail.com ---
Created attachment 124221
  --> https://bugs.freedesktop.org/attachment.cgi?id=124221&action=edit
logs
--- Comment #3 from Grazvydas Ignotas notasas@gmail.com ---
OK, so it's the memory bandwidth test that causes the GPU hang; --compute-dp fails with "No double precision support! Skipped", as expected.
llvm 3.9 doesn't seem to be released yet, so I built trunk, but the hang is still there. I was able to capture the logs before the system died; they are attached.
BTW CLOVER_OUTPUT doesn't seem to be handled, did you mean CLOVER_DEBUG_FILE?
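For reference, the dump setup discussed above could be wrapped roughly like this. It is only a sketch: run_with_clover_dump is a hypothetical helper name, the debug flags are the ones mentioned in this thread (llvm,asm), and it assumes CLOVER_DEBUG_FILE is indeed the variable that Clover honours for the output path.

```shell
#!/bin/sh
# Run a command with Clover debug dumps enabled and written to a file.
# run_with_clover_dump is a hypothetical helper name, not part of Mesa.
run_with_clover_dump() {
    dump_file=$1
    shift
    # Flags taken from this thread; CLOVER_DEBUG_FILE is assumed to be
    # the variable that selects the dump destination.
    env CLOVER_DEBUG=llvm,asm CLOVER_DEBUG_FILE="$dump_file" "$@"
}
```

It would be invoked as `run_with_clover_dump clover.out clpeak`; the environment is set only for that one command, so the rest of the shell session is unaffected.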
--- Comment #4 from Jan Vesely jan.vesely@rutgers.edu ---
Created attachment 124375
  --> https://bugs.freedesktop.org/attachment.cgi?id=124375&action=edit
global_bandwidth_v16_local_offset asm dump
--- Comment #5 from Jan Vesely jan.vesely@rutgers.edu ---
One problem is that, starting with R700, ADD_INT is a VecALU-only instruction (it should not be placed in the Trans slot), but fixing that was not enough to fix the hang on my Turks.
Andreas Boll andreas.boll.dev@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |99553
Referenced Bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=99553 [Bug 99553] Tracker bug for runnning OpenCL applications on Clover
--- Comment #6 from ricardo.ribalda@gmail.com ---
Using llvm 4.0.1 and the latest git commit from libclc (17648cd846390e294feafef21c32c7106eac1e24), I am getting an endless CPU loop with clpeak, which can be interrupted with Ctrl+C.
Other samples, such as Matrix Multiply, work fine.
CLOVER_DEBUG=llvm,asm,clc CLOVER_OUTPUT=clover.out clpeak >dump 2>dump.err
--- Comment #7 from ricardo.ribalda@gmail.com ---
Created attachment 130914
  --> https://bugs.freedesktop.org/attachment.cgi?id=130914&action=edit
AMD PALM (DRM 2.49.0 / 4.10.0-qtec-standard, LLVM 4.0.1 + MESA 17.0.3
--- Comment #8 from Jan Vesely jan.vesely@rutgers.edu ---
got this today. No hang.

Platform: Clover
  Device: AMD TURKS (DRM 2.49.0 / 4.11.11-300.fc26.x86_64, LLVM 6.0.0)
    Driver version  : 17.3.0-devel (Linux x64)
    Compute units   : 6
    Clock frequency : 650 MHz

    Global memory bandwidth (GBPS)
      float   : 40.47
      float2  : 41.01
      float4  : 38.05
      float8  : 25.09
      float16 : 13.33

    Single-precision compute (GFLOPS)
      float   : 124.18
      float2  : 243.14
      float4  : 249.80
      float8  : 285.99
      float16 : 350.36

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 62.25
      int2  : 122.03
      int4  : 123.01
      int8  : 122.29
      int16 : 122.11

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 18.15
      enqueueReadBuffer          : 3.06
      enqueueMapBuffer(for read) : 6.53
        memcpy from mapped ptr   : 5.65
      enqueueUnmap(after write)  : 2108.68
        memcpy to mapped ptr     : 7.49

    Kernel launch latency : 67.10 us
Grazvydas Ignotas notasas@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from Grazvydas Ignotas notasas@gmail.com ---
I've changed hardware and can no longer test, so I'll just trust Jan and close this.
Jan Vesely jan.vesely@rutgers.edu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #10 from Jan Vesely jan.vesely@rutgers.edu ---
Turns out I spoke too soon. The GPU still hangs, but Linux is now better at recovering. There are still GPU hang ("ring 0 stalled for more than ...") messages in dmesg.
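Since the benchmark now completes even though the GPU hangs, one way to confirm a hang is to look for the stall message in the kernel log after a run. A minimal sketch, assuming the log lines contain the "ring N stalled for more than" text quoted above; count_ring_stalls is a hypothetical helper name.

```shell
#!/bin/sh
# Count kernel log lines that report a stalled GPU ring.
# Reads log text on stdin; count_ring_stalls is a hypothetical helper name.
count_ring_stalls() {
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" keeps the exit status zero in the no-match case.
    grep -c 'ring [0-9][0-9]* stalled for more than' || true
}
```

Typical use would be `dmesg | count_ring_stalls` right after running clpeak; a non-zero count means the hang still occurred even if the system recovered.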
Jan Vesely jan.vesely@rutgers.edu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|clpeak causes a GPU hang    |[clover r600g juniper]
                   |                            |clpeak causes a GPU hang
GitLab Migration User gitlab-migration@fdo.invalid changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|REOPENED                    |RESOLVED

--- Comment #11 from GitLab Migration User gitlab-migration@fdo.invalid ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/586.