https://bugs.freedesktop.org/show_bug.cgi?id=106631
Bug ID: 106631
Summary: PALM: clpeak: Bus error (core dumped) & lots of GPU lockup
Product: Mesa
Version: 18.0
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
Assignee: dri-devel@lists.freedesktop.org
Reporter: ricardo.ribalda@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
root@qt5022:~# time clpeak
Platform: Clover
  Device: AMD PALM (DRM 2.50.0 / 4.16.0-qtec-standard, LLVM 6.0.1)
    Driver version  : 18.0.3 (Linux x64)
    Compute units   : 2
    Clock frequency : 0 MHz

    Global memory bandwidth (GBPS)
      float   : 5.42
      float2  : 7.10
      float4  : 6.69
      float8  : 4.88
      float16 : 0.43

    Single-precision compute (GFLOPS)
      float   : 18.90
      float2  : 36.90
      float4  : 38.66
      float8  : 42.19
      float16 : 53.13

    No half precision support! Skipped

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 9.48
      int2  : Bus error (core dumped)

real    20m10.785s
user    15m58.717s
sys     1m5.031s
--- Comment #1 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
Created attachment 139710
  --> https://bugs.freedesktop.org/attachment.cgi?id=139710&action=edit
dmesg
--- Comment #2 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
libclc version: a2118d58fca567694edfabea78293e0dc9255500 (current HEAD)
Jan Vesely <jan.vesely@rutgers.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |99553
--- Comment #3 from Jan Vesely <jan.vesely@rutgers.edu> ---
Looks like the benchmark needs more than the allotted 10 s to complete. You can adjust this via the radeon.lockup_timeout kernel module parameter. You can check the value currently in use at /sys/module/radeon/parameters/lockup_timeout, but you'll need to set it at boot time.
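
For readers following along, the check described above can be wrapped in a small shell function. This is only a sketch: the function name show_lockup_timeout and its optional path argument are made up for illustration, and it assumes the parameter is expressed in milliseconds, which matches the 10 s default mentioned in this comment.

```shell
# Print the radeon lockup timeout in milliseconds and seconds.
# The default path is the sysfs file named in the comment above; the optional
# first argument (hypothetical, for illustration) lets the function be tried
# on a machine where the radeon module is not loaded.
show_lockup_timeout() {
    param_file="${1:-/sys/module/radeon/parameters/lockup_timeout}"
    if [ ! -r "$param_file" ]; then
        echo "cannot read $param_file (is the radeon module loaded?)" >&2
        return 1
    fi
    timeout_ms=$(cat "$param_file")
    if [ "$timeout_ms" -eq 0 ]; then
        # Assumption: 0 means the lockup check is disabled.
        echo "lockup_timeout=0 (lockup detection disabled)"
    else
        echo "lockup_timeout=${timeout_ms} ms ($((timeout_ms / 1000)) s)"
    fi
}
```

Calling show_lockup_timeout with no argument reads the live value; passing a file path reads that file instead.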
(In reply to Ricardo Ribalda from comment #0)
> real 20m10.785s
> user 15m58.717s
> sys 1m5.031s

oh, that's pretty slow...
Referenced Bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=99553 [Bug 99553] Tracker bug for runnning OpenCL applications on Clover
--- Comment #4 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
Hi Jan,

I have increased lockup_timeout to 100K and I am no longer getting the bus error, but I am still getting similar dmesg errors.
root@qt5022:~# cat /sys/module/radeon/parameters/lockup_timeout
100000
root@qt5022:~# time clpeak
Platform: Clover
  Device: AMD PALM (DRM 2.50.0 / 4.16.0-qtec-standard, LLVM 6.0.1)
    Driver version  : 18.0.3 (Linux x64)
    Compute units   : 2
    Clock frequency : 0 MHz

    Global memory bandwidth (GBPS)
      float   : 5.46
      float2  : 7.17
      float4  : 6.79
      float8  : 4.89
      float16 : 0.11

    Single-precision compute (GFLOPS)
      float   : 18.95
      float2  : 36.90
      float4  : 38.69
      float8  : 42.19
      float16 : 53.17

    No half precision support! Skipped

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 9.49
      int2  : 18.52
      int4  : 18.41
      int8  : 18.59
      int16 : 18.05

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.21
      enqueueReadBuffer          : 0.63
      enqueueMapBuffer(for read) : 2.30
        memcpy from mapped ptr   : 0.87
      enqueueUnmap(after write)  : 506.05
        memcpy to mapped ptr     : 0.88

    Kernel launch latency : 608.36 us

real    29m45.765s
user    18m55.669s
sys     1m7.651s
--- Comment #5 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
Created attachment 139734
  --> https://bugs.freedesktop.org/attachment.cgi?id=139734&action=edit
dmesg 100k
--- Comment #6 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
Even though it is not comparable, for reference, this is the result with fglrx:
root@qt5022:~# time clpeak
Platform: AMD Accelerated Parallel Processing
  Device: AMD G-T56N Processor
    Driver version  : 1800.8 (sse2) (Linux x64)
    Compute units   : 2
    Clock frequency : 530 MHz

    Global memory bandwidth (GBPS)
      float   : 0.80
      float2  : 1.12
      float4  : 1.09
      float8  : 1.31
      float16 : 1.34

    Single-precision compute (GFLOPS)
      float   : 0.59
      float2  : 1.16
      float4  : 2.32
      float8  : 4.42
      float16 : 0.85

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 0.43
      double2  : 0.84
      double4  : 1.46
      double8  : 1.41
      double16 : 0.28

    Integer compute (GIOPS)
      int   : 0.73
      int2  : 0.30
      int4  : 0.35
      int8  : 0.40
      int16 : 0.32

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.29
      enqueueReadBuffer          : 1.08
      enqueueMapBuffer(for read) : 3591.11
        memcpy from mapped ptr   : 0.98
      enqueueUnmap(after write)  : 15339.17
        memcpy to mapped ptr     : 0.99

    Kernel launch latency : 64.41 us

real    8m12.337s
user    12m51.022s
sys     0m29.840s
--- Comment #7 from Jan Vesely <jan.vesely@rutgers.edu> ---
It looks like even 100 s is not enough. Can you try running with no time limit (set lockup_timeout to 0)? Looking at the numbers, I think Mesa's results are inflated by the kernel getting killed before it finishes the computation; a complete run can take significantly longer.

Did you by any chance build LLVM in debug mode? That can inflate kernel compile times significantly.
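
As a rough illustration of the "no time limit" suggestion, the sketch below prints the two common ways a module parameter is fixed at boot: a kernel command-line token and a modprobe options line. The function name radeon_timeout_settings is hypothetical, and where these strings go (bootloader configuration, a file under /etc/modprobe.d/, possibly the initramfs) depends on the distribution, so treat it as a starting point rather than a recipe.

```shell
# Print boot-time settings for radeon.lockup_timeout.
# Argument: timeout in milliseconds; defaults to 0, the "no time limit"
# value suggested in the comment above.
radeon_timeout_settings() {
    timeout_ms="${1:-0}"
    case "$timeout_ms" in
        *[!0-9]*)
            echo "timeout must be a non-negative integer (ms)" >&2
            return 1
            ;;
    esac
    # Form 1: token to append to the kernel command line.
    echo "radeon.lockup_timeout=${timeout_ms}"
    # Form 2: line for a modprobe configuration file.
    echo "options radeon lockup_timeout=${timeout_ms}"
}
```

After rebooting with either setting, reading /sys/module/radeon/parameters/lockup_timeout should confirm the value took effect.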
--- Comment #8 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
I am using llvm/clang from https://github.com/kraj/meta-clang . Can you point me to something to check if the debug mode is enabled or not?

Thanks
--- Comment #9 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
(In reply to Ricardo Ribalda from comment #8)
> I am using llvm/clang from https://github.com/kraj/meta-clang . Can you
> point me to something to check if the debug mode is enabled or not?
>
> Thanks

Answer to myself. Seems to be a Release build: https://github.com/kraj/meta-clang/blob/master/recipes-devtools/clang/clang_...

But if you can tell me how to verify it in runtime I would love to try it
--- Comment #10 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Ricardo Ribalda from comment #9)
> (In reply to Ricardo Ribalda from comment #8)
> > I am using llvm/clang from https://github.com/kraj/meta-clang . Can you
> > point me to something to check if the debug mode is enabled or not?
> >
> > Thanks
>
> Answer to myself. Seems to be a Release build :
> https://github.com/kraj/meta-clang/blob/master/recipes-devtools/clang/clang_git.bb#L78
>
> But if you can tell me how to verify it in runtime I would love to try it

$ llvm-config --assertion-mode
and
$ llvm-config --build-mode

This won't change the GPU kernel running time, but it might speed up the kernel compilation time.
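
The two llvm-config queries above can be combined into one check. The following is a minimal sketch: the function name check_llvm_build and its optional argument (an alternative llvm-config command) are invented for illustration, and the warnings only restate the concern raised in this thread that debug or assertion-enabled builds can slow kernel compilation.

```shell
# Report LLVM's build mode and assertion mode on one line.
# Optional argument: the llvm-config command to run (hypothetical knob,
# useful when several LLVM versions are installed).
check_llvm_build() {
    llvm_config="${1:-llvm-config}"
    build_mode=$("$llvm_config" --build-mode) || return 1
    assertion_mode=$("$llvm_config" --assertion-mode) || return 1
    echo "build-mode=${build_mode} assertion-mode=${assertion_mode}"
    # Warnings go to stderr so the status line stays easy to parse.
    case "$build_mode" in
        Debug*) echo "note: debug build, kernel compilation may be slow" >&2 ;;
    esac
    if [ "$assertion_mode" = "ON" ]; then
        echo "note: assertions enabled, kernel compilation may be slow" >&2
    fi
}
```

On a release build without assertions this prints a single status line and no notes.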
--- Comment #11 from Ricardo Ribalda <ricardo.ribalda@gmail.com> ---
Seems that it is in release mode:

root@qt5022:~# llvm-config --assertion-mode
OFF
root@qt5022:~# llvm-config --build-mode
Release
GitLab Migration User <gitlab-migration@fdo.invalid> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED
--- Comment #12 from GitLab Migration User <gitlab-migration@fdo.invalid> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/638.