https://bugs.freedesktop.org/show_bug.cgi?id=100105
--- Comment #2 from Jan Vesely jan.vesely@rutgers.edu ---
Latest update:

diff --git a/src/cluda_opencl.h b/src/cluda_opencl.h
index 6e0095c..8ba2d14 100644
--- a/src/cluda_opencl.h
+++ b/src/cluda_opencl.h
@@ -48,7 +48,7 @@ typedef struct _ga_half {
 } ga_half;

 #define ga_half2float(p) vload_half(0, &((p).data))
-static inline ga_half ga_float2half(ga_float f) {
+inline ga_half ga_float2half(ga_float f) {
   ga_half r;
   vstore_half_rtn(f, 0, &r.data);
   return r;
diff --git a/src/gpuarray_buffer_opencl.c b/src/gpuarray_buffer_opencl.c
index 8f12811..2041ca2 100644
--- a/src/gpuarray_buffer_opencl.c
+++ b/src/gpuarray_buffer_opencl.c
@@ -146,7 +146,7 @@ cl_ctx *cl_make_ctx(cl_context ctx, gpucontext_props *p) {
   CL_CHECKN(global_err, clGetDeviceInfo(id, CL_DEVICE_VERSION,
                                         device_version_size,
                                         device_version, NULL));
-  if (device_version[7] == '1' && device_version[9] < '2') {
+  if (device_version[7] == '1' && device_version[9] < '1') {
     error_set(global_err, GA_UNSUPPORTED_ERROR,
               "We only support OpenCL 1.2 and up");
     return NULL
pygpu.test()
pygpu is installed in /home/jvesely/.local/lib/python3.6/site-packages/pygpu-0.7.5+12.g6f0132c.dirty-py3.6-linux-x86_64.egg/pygpu
NumPy version 1.13.3
NumPy relaxed strides checking option: True
NumPy is installed in /usr/lib64/python3.6/site-packages/numpy
Python version 3.6.4 (default, Mar 13 2018, 18:18:20) [GCC 7.3.1 20180303 (Red Hat 7.3.1-5)]
nose version 1.3.7
*** Testing for AMD Radeon R7 Graphics (CARRIZO / DRM 3.23.0 / 4.15.14-300.fc27.x86_64, LLVM 6.0.0)
----------------------------------------------------------------------
Ran 6670 tests in 995.728s
FAILED (SKIP=12, errors=580, failures=2)
All errors are:
TypeError: This is for CUDA arrays.

The two failures are:
FAIL: pygpu.tests.test_elemwise.test_elemwise_f16(<built-in function add>, 'float16', 'float16', (50,))
FAIL: pygpu.tests.test_elemwise.test_elemwise_f16(<built-in function iadd>, 'float16', 'float16', (50,))
Both fail on a half-precision rounding error. For example, 7.0390625 + 7.20703125 is expected to be 14.25, but the GPU returns 14.2421875; the fp32 result is 14.24609375.
The GPU result is rounded down (towards zero). The CPU result is rounded up (away from zero).
It looks like our vstore_half_rtn is not working as expected, which is weird because it passes CTS.
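A quick host-side check of the example above, as a sketch rather than a statement about what the driver does. `round_to_half` is a hypothetical helper written for this comment (it is not part of pygpu or libgpuarray) and only handles finite values in the normal binary16 range. The fp32 sum 14.24609375 sits exactly halfway between the two neighbouring half values, so round-to-nearest-even gives 14.25, while rounding toward zero or toward negative infinity gives 14.2421875. Note that in OpenCL C the `_rtn` suffix selects round toward negative infinity (round to nearest even is `_rte`), so the GPU output may be what `vstore_half_rtn` is specified to produce, which would also be consistent with it passing CTS.

```python
import math
import struct


def round_to_half(x, mode):
    """Round x onto the binary16 grid using the given OpenCL-style mode.

    Hypothetical helper for illustration only. Assumes x is finite and in
    the normal half range; subnormals and overflow are not handled.
    """
    _, e = math.frexp(x)        # x = m * 2**e with 0.5 <= |m| < 1
    ulp = 2.0 ** (e - 11)       # binary16 carries 11 significant bits
    q = x / ulp                 # exact, since ulp is a power of two
    if mode == "rte":
        n = round(q)            # Python's round() is round-half-to-even
    elif mode == "rtz":
        n = math.trunc(q)       # toward zero
    elif mode == "rtn":
        n = math.floor(q)       # toward negative infinity
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return n * ulp


a, b = 7.0390625, 7.20703125    # both exactly representable in half
total = a + b                   # 14.24609375, exact in fp32 and fp64

print(round_to_half(total, "rte"))   # 14.25      (CPU / expected result)
print(round_to_half(total, "rtz"))   # 14.2421875
print(round_to_half(total, "rtn"))   # 14.2421875 (matches the GPU result)

# Cross-check against the standard library's half conversion, which
# rounds to nearest even.
print(struct.unpack("<e", struct.pack("<e", total))[0])  # 14.25
```

If that reading of the suffix is right, the thing to compare against on the GPU side would be `vstore_half_rte` (or plain `vstore_half`, which uses the default rounding mode) rather than `vstore_half_rtn`.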