https://bugs.freedesktop.org/show_bug.cgi?id=75400
Priority: medium
Bug ID: 75400
Assignee: dri-devel@lists.freedesktop.org
Summary: regression in OpenCL since commit cc3aeac
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: brunojimen@gmail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: git
Component: Drivers/Gallium/r600
Product: Mesa
Hi,
This morning I recompiled Mesa and found that OpenCL support was broken. I have managed to bisect the regression back to commit cc3aeac ( http://cgit.freedesktop.org/mesa/mesa/commit/?id=cc3aeacab64a6928a903f1dbfea... ). Strangely, that commit has nothing to do with clover.
I am using Arch Linux with kernel 3.13.4 and an AMD HD5470. There is nothing interesting in dmesg or the Xorg logs.
If I can give you any more information, just ask.
--- Comment #1 from Emil Velikov emil.l.velikov@gmail.com ---
Hi Bruno,
What do you mean by "broken" here? If you're talking about a compilation problem, take a look at bug 75356, which has a patch to resolve it.
If you are having a different problem, let us know what it is :)
--- Comment #2 from Bruno Jiménez brunojimen@gmail.com ---
Hi Emil,
No, it's not a compilation error, neither in Mesa nor in the OpenCL code. It's just that OpenCL programs crash with segfaults.
Every test from http://cgit.freedesktop.org/~tstellar/opencl-example/ fails, and its 'hello_world' program crashes with a segfault.
As the code changed in that bug has nothing to do with clover, maybe the problem is with my configuration?
Here's what I pass to autogen.sh. Surely there's something in there I don't need, but I took the options from a PKGBUILD:
--prefix=/usr \
--sysconfdir=/etc \
--with-dri-driverdir=/usr/lib/xorg/modules/dri \
--with-gallium-drivers=r600,swrast \
--with-dri-drivers=swrast \
--enable-gallium-llvm \
--enable-egl \
--enable-gallium-egl \
--with-egl-platforms=x11,drm,wayland \
--enable-shared-glapi \
--enable-gbm \
--enable-gallium-gbm \
--enable-glx-tls \
--enable-dri \
--enable-glx \
--enable-osmesa \
--enable-texture-float \
--enable-xa \
--enable-vdpau \
--enable-omx \
--with-llvm-shared-libs \
--enable-opencl --enable-opencl-icd \
--with-clang-libdir=/usr/lib
If there's anything else I can do to help, just ask. Thanks!
--- Comment #3 from Emil Velikov emil.l.velikov@gmail.com ---
Strange, I do not see how that commit could cause anything other than compilation issues. FWIW, it might be worth double-checking that the bisect went fine and attaching a backtrace of the segfault.
--- Comment #4 from Bruno Jiménez brunojimen@gmail.com ---
I am also very surprised at which commit seems to have started this. I did the bisect by building Arch packages, installing them and then testing them. So, unless I have missed something, which is also possible, that's it.
I have recompiled at commit cc3aeac with debug information, but for some strange reason gdb doesn't want to step into the OpenCL functions.
Here's what I have worked out so far:
- The segfault itself comes from an fprintf with a "%s" and a null pointer. It can be solved by just adding a default case to 'clUtilErrorString'.
- The real problem happens with 'clGetPlatformIDs', which returns an error value of '-1001'.
I have also managed to trigger a return of 'CL_INVALID_VALUE' and tried various combinations of parameters to see if that changed anything, but the result always seems to be one error or the other.
I have checked the code at mesa/src/gallium/state_trackers/clover/api/platform.cpp (where clGetPlatformIDs is) and have no clue how this can be possible.
Sorry if this isn't enough information, but I am completely clueless about what could be happening.
I will check my packages again to see whether I compiled one version and labeled it as another.
If I can help with anything else, just ask.
--- Comment #5 from Francisco Jerez currojerez@riseup.net ---
(In reply to comment #4)
Most likely you're getting that segfault somewhere in the ICD loader because it's unable to load Mesa's ICD library. I guess that this hunk:
+if NEED_WINSYS_XLIB
+AM_CPPFLAGS += -DHAVE_WINSYS_XLIB
+endif
pulls in the XLIB pipe-loader back-end that was previously ifdef-ed out in Clover builds, leading to undefined symbols in the resulting library.
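One way to test this hypothesis outside of any OpenCL program is to try loading the library directly and look at the dynamic linker's error message. The following is only a minimal sketch: the helper name try_load_library is made up for illustration, the library path must be supplied by the reader, and whether the ICD loader itself uses the same dlopen flags is an assumption.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper, not part of Mesa or of any ICD loader.
 * RTLD_NOW asks the dynamic linker to resolve every symbol at load
 * time, so a library with undefined symbols fails here rather than
 * at first use.
 * Returns 0 on success, -1 on failure (reason copied into err). */
int try_load_library(const char *path, char *err, size_t errlen)
{
    void *handle;
    const char *msg;

    dlerror();  /* clear any stale error state */
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        msg = dlerror();
        snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen failure");
        return -1;
    }

    dlclose(handle);
    if (errlen > 0)
        err[0] = '\0';
    return 0;
}
```

Called with the path of the installed OpenCL library, a failure caused by a missing symbol should show up in err as an "undefined symbol" message naming the symbol. On older glibc versions the program may need to be linked with -ldl.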
--- Comment #6 from Emil Velikov emil.l.velikov@gmail.com ---
(In reply to comment #5)
> Most likely you're getting that segfault somewhere in the ICD loader because it's unable to load Mesa's ICD library. I guess that this hunk:
> +if NEED_WINSYS_XLIB
> +AM_CPPFLAGS += -DHAVE_WINSYS_XLIB
> +endif
> pulls in the XLIB pipe-loader back-end that was previously ifdef-ed out in Clover builds, leading to undefined symbols in the resulting library.
Would that not cause the build/link to fail? Hmm, I guess not, since the opencl target is missing -no-undefined.
Francisco, is there any particular reason why we do not use -no-undefined for opencl?
Bruno, feel free to grab the patch from bug 75356, which should handle the symbol problems, and continue from there.
--- Comment #7 from Bruno Jiménez brunojimen@gmail.com ---
Hi Francisco,
The segfaults happened because 'clGetPlatformIDs' returned a strange error (-1001), which 'clUtilErrorString' (from 'cl_util.c') does not handle. It therefore returned nothing (a null pointer), and fprintf segfaults when it tries to print it.
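The failure mode and the suggested fix can be sketched as follows. This is not the code from 'cl_util.c': the function names and the SKETCH_ constants are hypothetical stand-ins, defined locally so the example compiles without the OpenCL headers. As far as I know, -1001 is CL_PLATFORM_NOT_FOUND_KHR from the cl_khr_icd extension, which an ICD loader returns when it finds no usable platform.

```c
#include <stdio.h>

/* Values mirror the OpenCL headers; redefined here (with made-up
 * names) only so the sketch is self-contained. */
#define SKETCH_CL_SUCCESS                   0
#define SKETCH_CL_INVALID_VALUE           (-30)
#define SKETCH_CL_PLATFORM_NOT_FOUND_KHR  (-1001)

/* Hypothetical stand-in for an error-to-string helper. The default
 * case guarantees a non-null result, so passing the return value to
 * a "%s" conversion is always safe. */
const char *sketch_error_string(int err)
{
    switch (err) {
    case SKETCH_CL_SUCCESS:
        return "CL_SUCCESS";
    case SKETCH_CL_INVALID_VALUE:
        return "CL_INVALID_VALUE";
    case SKETCH_CL_PLATFORM_NOT_FOUND_KHR:
        return "CL_PLATFORM_NOT_FOUND_KHR";
    default:
        return "unknown OpenCL error";
    }
}

/* Print a diagnostic for a failed call; the numeric code is printed
 * as well so unknown errors can still be looked up. */
void sketch_report_error(FILE *out, const char *call, int err)
{
    fprintf(out, "%s failed: %s (%d)\n", call, sketch_error_string(err), err);
}
```

With a default case in place, an unexpected code produces a readable message that still includes the number, instead of crashing before the real error is shown.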
Emil,
I'll try that patch as soon as I can.
Thanks!
--- Comment #8 from Bruno Jiménez brunojimen@gmail.com ---
Hi,
I'm afraid that patch doesn't help. I have also tried the patch you sent to the mailing list ( http://lists.freedesktop.org/archives/mesa-dev/2014-February/054780.html ), but that didn't help either.
If there's anything else I can do, just ask. Thanks!
--- Comment #9 from Emil Velikov emil.l.velikov@gmail.com ---
(In reply to comment #8)
> Hi,
> I'm afraid that that patch doesn't help. I have also tried the patch you have sent to the Mailing List ( http://lists.freedesktop.org/archives/mesa-dev/2014-February/054780.html ) but also nothing.
Interesting, the patch you've linked should have caused build breakage, as there is yet another missing symbol/reference :\
I just pushed a few patches that should resolve the missing symbols within pipe-loader, which is used by opencl. Check out the latest master and give it a try.
Bruno Jiménez brunojimen@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #10 from Bruno Jiménez brunojimen@gmail.com ---
Hi,
The latest master branch works perfectly.
Thanks a lot!