https://bugs.freedesktop.org/show_bug.cgi?id=108272
--- Comment #12 from Jan Vesely <jan.vesely@rutgers.edu> ---
Hi,
sorry for the delay. Somehow I missed the notifications.
(In reply to jamespharvey20 from comment #11)
When I originally filed this, I assumed it was 1 bug since I tried 2 things with OpenCL, and both failed with opencl-mesa but worked with opencl-amd.
Jan Vesely was correct that there were two separate problems.
I'm hoping Jan Vesely can give guidance on whether to leave this bug open for any of the reasons below, or if I should close it and potentially open up 1-2 new bugs.
The original luxmark bug (segfault) is solved, but that exposes 2 new opencl-mesa bugs when running luxmark.
The original IndigoBenchmark bug (segfault) isn't solved, but as explained below, I understand if we have to consider that unsolvable for now.
I don't think this affects any of these bugs, but I'll mention a few weeks ago, I switched back to my Asus Radeon R9 390. The same behaviors discussed in this entire bug report occur. (i.e. 18.2.3 and before crash luxmark.) If someone really wants me to do so, I can switch back to the RX 580 to test 18.2.4, but I'm betting since it works properly with the R9 390 that the problem is fixed.
ORIGINAL LUXMARK BUG #1
Using mesa 18.2.4, the luxmark segfault is solved.
As this was the first bug, I'd close this one and open new bugs for both Indigo and the incorrect rendering in luxmark.
NEW - LUXMARK BUG #2
Jan Vesely's comment on 2018-10-09 mentions: "bumping MAX_GLOBAL_BUFFERS to 32 allows luxmark to run, albeit still with many incorrect pixels -- libclc rounding conversions are incorrect."
That's what I'm seeing out of 18.2.4. Using LuxBall HDR (Simple Benchmark):
MESA 18.2.4: 40626 (Image validation OK (65739 different pixels, 10.27%))
AMDGPU-PRO: 15739 (Image validation OK (5736 different pixels, 0.90%))
There are no typos there. opencl-mesa scores almost unbelievably higher than opencl-amd, but the percentage of different pixels increases by a factor of 11.4.
As Jan's other comment on 2018-10-09 mentions, the image looks garbled and the results are incorrect.
Not sure if this bug should be left open for this issue, or if I should create a new bug. (Or, if there is a bug already open for it.) Or, if mesa will say it's purely libclc's problem, and to go to them about it.
I'd say this is probably a purely libclc problem, but feel free to open the bug against clover on freedesktop. 10% is rather good; I usually saw ~30% wrong pixels on my machines.
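The figures quoted above can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the two LuxBall HDR runs in the quoted comment, and the dictionary keys and the `ratio` helper are names made up for this example.

```python
# Figures copied from the two LuxBall HDR runs quoted above.
# The key names are hypothetical labels chosen for this sketch.
runs = {
    "mesa-18.2.4": {"score": 40626, "bad_pixels": 65739, "bad_pct": 10.27},
    "amdgpu-pro": {"score": 15739, "bad_pixels": 5736, "bad_pct": 0.90},
}


def ratio(key):
    """Return the mesa value divided by the amdgpu-pro value for one field."""
    return runs["mesa-18.2.4"][key] / runs["amdgpu-pro"][key]


score_ratio = ratio("score")        # how much higher the mesa score is
pct_ratio = ratio("bad_pct")        # ratio of the reported percentages
pixel_ratio = ratio("bad_pixels")   # ratio of the raw pixel counts

print(f"score ratio:      {score_ratio:.2f}")
print(f"percentage ratio: {pct_ratio:.2f}")
print(f"pixel ratio:      {pixel_ratio:.2f}")
```

The percentage ratio reproduces the factor of 11.4 mentioned above, and the score ratio shows that the mesa score is roughly 2.6 times the amdgpu-pro score.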
NEW - LUXMARK BUG #3
Although luxmark can now benchmark, when doing so, all input becomes unusably awful. It reminds me of when Windows has too many things open, suddenly decides it can't cope, and you're waiting to see if it's going to recover or crash. Keystrokes take too long to be printed, and the mouse becomes slow and jumpy. My first thought was CPU or memory pressure, but top shows CPU and memory usage are fine. BTW, I'm running xf86-video-amdgpu 18.1.0, and when I upgraded mesa, I upgraded both mesa and opencl-mesa.
In comparison, if I use opencl-amd, input is not affected. I wouldn't even know the GPU is being slammed.
Using the program radeontop, I can see when using mesa, "Graphics pipe", "Texture Addresser", and "Shader Interpolator" are between 95-100%, usually 98-100%.
When using opencl-amd, radeontop shows the same. (Granted, Vertex Grouper + Tesselator / Shader Export/Scan Converter/Depth Block/Color Block bounce between 5-20% vs on opencl-mesa, they bounce between 1-5%.)
This sounds like a GPU priority/scheduling problem. I haven't looked into whether it can be solved by opening a lower-priority pipe for compute, or whether we need to enable advanced features like CWSR. Please open a separate bug. Hogging a large portion of the GPU might explain some of that high score.
INDIGO BUG
I edited 18.2.4's si_get.c to be very short:
snprintf(sscreen->renderer_string, sizeof(sscreen->renderer_string), "%s", chip_name);
And compiled/installed it, but it didn't affect the crash.
IndigoBenchmark said they're statically linking with LLVM 3.4, which is quite old. But, it runs fine with opencl-amd, and only crashes on opencl-mesa. I just posted a followup "where do we go from here"-ish comment there, which has to be moderator-approved, so it isn't showing yet. https://www.indigorenderer.com/forum/viewtopic.php?f=37&t=14986
Part of me thinks it needs to be given up on, being a closed-source precompiled binary statically linked against LLVM 3.4.
Part of me thinks since it only crashes with opencl-mesa, and runs perfectly fine with opencl-amd, there's probably (but not definitely) a bug in opencl-mesa.
But, I understand since they don't seem to be paying this any attention, we may have to give up on the Indigo Bug as being unable to be realistically investigated further.
Can you check if Indigo exports any LLVM symbols? It might be that we end up using those instead of the new ones from libLLVM.*. If that's the case, one solution would be to link mesa/clover with static LLVM. Enabling symbol versioning for LLVM should work as well.
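One way to do that check is to list the binary's dynamic symbol table with `nm -D --defined-only` (GNU binutils) and look for names that resemble LLVM's. The sketch below is only a rough filter under a few assumptions: the function names are invented for this example, the name heuristics (a `LLVM` prefix for the C API, `4llvm` for the mangled C++ `llvm` namespace) are not exhaustive, and the binary path passed in is whatever the Indigo executable is called on your system.

```python
import subprocess


def llvm_symbols(nm_output):
    """Pick LLVM-looking symbol names out of `nm -D --defined-only` output.

    Heuristic (an assumption, not exhaustive): C API symbols start with
    "LLVM", and Itanium-mangled C++ symbols in namespace llvm contain "4llvm".
    """
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[-1]  # nm prints: address, type letter, symbol name
        if name.startswith("LLVM") or "4llvm" in name:
            found.append(name)
    return found


def exported_llvm_symbols(path):
    """Run nm on the binary at `path` and return its exported LLVM-like symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return llvm_symbols(result.stdout)
```

Calling `exported_llvm_symbols()` with the path to the Indigo benchmark binary and getting a non-empty list would suggest that the binary exports LLVM symbols that could be picked up instead of the ones in libLLVM. An empty list would make that explanation less likely, though it would not rule it out.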