https://bugs.freedesktop.org/show_bug.cgi?id=97524
Bug ID: 97524
Summary: Invalid sampler settings cause full GPU reset
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: minor
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: dark_sylinc@yahoo.com.ar
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 126091 --> https://bugs.freedesktop.org/attachment.cgi?id=126091&action=edit Bug repro
Running the following invalid GL settings hangs Mesa pretty badly. Sometimes it says stuck on ring(0)/(3), other times it just hangs; the GPU tries to reset but does a really poor job (at least I can blindly switch to tty1 via Ctrl+Alt+F1 and then reboot via Ctrl+Alt+Del).
Attachment provided to repro the bug.
What is causing it: Vertex Shader must use samplerBuffer on binding point 0. Pixel Shader must use samplerCube on binding point 0. Bind a TBO to binding point 0.
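For readers without the attachment, the conflicting declarations can be sketched roughly as below. This is only an illustration: the shader bodies and the names `kVertexShaderSrc`, `kFragmentShaderSrc`, `instanceData`, `envMap` and `fragColour` are hypothetical, and the shaders in the attached repro may look different.

```c
/* Hypothetical GLSL sources embedded as C strings. These are NOT the shaders
 * from the attachment; they only show two sampler types on the same binding. */
static const char *const kVertexShaderSrc =
    "#version 430\n"
    "layout(binding = 0) uniform samplerBuffer instanceData;\n" /* TBO on unit 0 */
    "void main() {\n"
    "    gl_Position = texelFetch(instanceData, gl_VertexID);\n"
    "}\n";

static const char *const kFragmentShaderSrc =
    "#version 430\n"
    "layout(binding = 0) uniform samplerCube envMap;\n" /* cubemap on unit 0 too */
    "out vec4 fragColour;\n"
    "void main() {\n"
    "    fragColour = texture(envMap, vec3(0.0, 0.0, 1.0));\n"
    "}\n";
```

Each shader is fine on its own. The conflict only appears when both are used in the same draw, because two samplers of different types then refer to texture unit 0.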
What happens: Near full system hang, it becomes really unstable. System can be soft-rebooted but that's it.
What should happen: All other GPU drivers I tested with handle this gracefully by raising a GL_INVALID_OPERATION error and continuing rendering the rest normally.
Version tested: Latest git 22cec6dc5e5a3060bc87f4a92871b4f6eef04632
I'm assigning a low priority since this combination of GL settings is invalid to begin with, and I want to believe software isn't shipped like this (though considering other GPU drivers let applications ignore the problem, I wouldn't be fully surprised if there is faulty software out there...). Still, I believe the system shouldn't hang because of this.
GL_VERSION = 4.3.0.0
GL_VENDOR = X.Org
GL_RENDERER = Gallium 0.4 on AMD CAPE VERDE (DRM 2.45.0 / 4.7.0-040700-generic, LLVM 3.9.0)
uname -r 4.7.0-040700-generic
Cheers
--- Comment #1 from Nicolai Hähnle nhaehnle@gmail.com --- Hi Matias, thanks for the report. I cannot reproduce this on Bonaire or Polaris. Could you please send the dmesg of this when running your bug repro program?
--- Comment #2 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Created attachment 126190 --> https://bugs.freedesktop.org/attachment.cgi?id=126190&action=edit Kern.log when it managed to reset
--- Comment #3 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Created attachment 126191 --> https://bugs.freedesktop.org/attachment.cgi?id=126191&action=edit Kern.log when it failed to reset
--- Comment #4 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Ah, you're on bleeding edge HW. Stop showing off!
It's an AMD Radeon HD 7770 1GB. The audio is not in use but I'm including it just in case.
Vendor and Device ID 1682:3231
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] (prog-if 00 [VGA controller])
    Subsystem: XFX Pine Group Inc. Cape Verde XT [Radeon HD 7770/8760 / R7 250X]
    Flags: bus master, fast devsel, latency 0, IRQ 27
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f7d00000 (64-bit, non-prefetchable) [size=256K]
    I/O ports at e000 [size=256]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [270] #19
    Kernel driver in use: radeon
    Kernel modules: radeon
Vendor and Device ID 1682:aab0
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
    Subsystem: XFX Pine Group Inc. Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
    Flags: bus master, fast devsel, latency 0, IRQ 32
    Memory at f7d60000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
    Capabilities: [150] Advanced Error Reporting
    Kernel driver in use: snd_hda_intel
I'll be attaching two logs. One is from a few days ago, when the GPU managed to recover (if we can call that "recovering": it was barely functional and I had to reboot via Ctrl+Alt+Del). The other is from today, when I reproduced it again and the GPU was totally unable to recover; the screen flashed several times as if it tried to soft reset more than once (I'm not certain that's what happened), though I could still reboot via Ctrl+Alt+Del.
I've been thinking these past few days that this bug MIGHT be related to (or even be causing) this one: https://bugs.freedesktop.org/show_bug.cgi?id=93649 If the game randomly presents an invalid setup for one measly small object for just one frame, it would hang Radeon 7770s, but it would work fine for everybody else, including the same users on Windows, and nobody would notice. (Seriously, in my own program I could not visually tell what was missing; it was some small object in a cubemap render used for reflections.)
If you have a suspicion about where I should look (i.e. there is Mesa code that SHOULD be catching this incorrect setup but isn't, so I can analyze why it's not catching the error), I can insert a few printfs or hook up gdb. As a graphics programmer I have my pride to keep.
--- Comment #5 from Nicolai Hähnle nhaehnle@gmail.com --- I've now tried your sample with Verde, and cannot reproduce it there either.
The logs you have submitted are unfortunately not very useful, because all I see are messages about ring stalls and reset attempts. Are there any messages _before_ that?
To be honest, I don't see what the problem could be. Running it with a debug build (which shows all OpenGL errors) shows quite clearly that the bad state is caught at program link time, which then leads to the bad program not being used.
One thing that catches my eye is that your glxinfo says DRM 2.45.0, while mine says DRM 2.47.0.
--- Comment #6 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Of course you can't repro... nothing is ever easy, is it? :D
Regarding the log: Unfortunately that's it. If I go one line up, I can see a crash from nemo, the file browser which had crashed a few minutes earlier for something completely unrelated.
Regarding libdrm: it's hard to know you're on latest everything! I didn't know I wasn't on latest DRM. I will compile, update and try again.
Could you tell me the error messages you get (from successfully catching the error)? I could search the source for the error string to see where it's being caught and begin investigating why it is never hit on my machine.
Cheers Matias
--- Comment #7 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- I did some research yesterday; I had not known that DRM 2.47 is tied to the kernel version. I thought libdrm was what determined the version.
I would have to update to kernel 4.8, which I can't do immediately.
In the meantime it would help a lot if I knew what errors show up for you, so I can research why my machine just goes ahead and fires the draw commands.
--- Comment #8 from Nicolai Hähnle nhaehnle@gmail.com --- Here's the output:
GL Context creation suceeded.
Mesa: User error: GL_INVALID_OPERATION in glUseProgram(program 3 not linked)
^C
0:1(1): error: syntax error, unexpected $end
0:1(1): error: syntax error, unexpected $end
GL_INVALID_OPERATION in glUseProgram(program 3 not linked)
Shader Stats: SGPRS: 24 VGPRS: 16 Code Size: 64 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0
Shader Stats: SGPRS: 16 VGPRS: 16 Code Size: 40 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0
Shader Stats: SGPRS: 18 VGPRS: 8 Code Size: 252 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0
Shader Stats: SGPRS: 16 VGPRS: 16 Code Size: 64 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0
Shader Stats: SGPRS: 16 VGPRS: 16 Code Size: 40 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0
Again, this is with a debug build of Mesa, but there should be no difference in error checking -- only the output when an error is detected is different.
--- Comment #9 from Matias N. Goldberg dark_sylinc@yahoo.com.ar --- Oh I see what's happening.
That's my fault.
The paths to the shader files were hardcoded as absolute paths.
Look for: readFile( "/home/matias/Projects/MesaRing0Bug/src/VertexShader_vs.glsl", data, FILE_BUF );
and: readFile( "/home/matias/Projects/MesaRing0Bug/src/PixelShader_ps.glsl", data, FILE_BUF );
Change the hardcoded paths, and try again. Sorry about that.
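The "unexpected $end" errors in comment #8 are presumably what the GLSL compiler reports for an empty source string, which is what a failed file read would leave behind. A minimal sketch of a loader that reports the failure instead is shown below. The name `read_text_file` and its signature are hypothetical; this is not the `readFile` from the attachment, whose implementation is not shown in this thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (an assumption, not the attachment's readFile()).
 * Reads at most cap-1 bytes from path into buf and NUL-terminates it.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
long read_text_file(const char *path, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;

    FILE *f = fopen(path, "rb");
    if (!f) {
        buf[0] = '\0';  /* leave a valid empty string for the caller */
        return -1;      /* but make the failure visible */
    }

    size_t n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}
```

A caller could take the shader paths from the command line and abort with a message when the helper returns -1, so a missing file is noticed before anything is compiled.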
Nicolai Hähnle nhaehnle@gmail.com changed:
What       | Removed                         | Added
-----------+---------------------------------+--------------------------------
QA Contact | dri-devel@lists.freedesktop.org | mesa-dev@lists.freedesktop.org
Component  | Drivers/Gallium/radeonsi        | Mesa core
Assignee   | dri-devel@lists.freedesktop.org | mesa-dev@lists.freedesktop.org
--- Comment #10 from Nicolai Hähnle nhaehnle@gmail.com --- I can reproduce this now. It really seems like this should be fixed in Mesa main, though: there is already code that checks for this condition when it affects a single program stage (in _mesa_update_shader_textures_used) and when it affects an SSO (separate shader objects) pipeline (in _mesa_sampler_uniforms_pipeline_are_valid). This code needs to be hooked into the non-SSO case as well.
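The kind of cross-stage check described here can be sketched as follows. This is a simplified model under stated assumptions, not Mesa's code: the types `sampler_target`, `sampler_use` and `stage`, the function `samplers_are_valid`, and the `MAX_UNITS` limit are all hypothetical stand-ins and do not mirror the real data structures behind `_mesa_sampler_uniforms_pipeline_are_valid`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT Mesa's real types. */
enum sampler_target { TARGET_NONE = 0, TARGET_BUFFER, TARGET_CUBE, TARGET_2D };

struct sampler_use {
    unsigned unit;               /* texture image unit the sampler reads */
    enum sampler_target target;  /* sampler type declared in the shader */
};

struct stage {
    const struct sampler_use *samplers;
    size_t count;
};

#define MAX_UNITS 32  /* assumed limit, for illustration only */

/* Returns true when no texture unit is claimed by two different sampler
 * types, looking across ALL stages rather than one stage at a time. */
bool samplers_are_valid(const struct stage *stages, size_t num_stages)
{
    enum sampler_target seen[MAX_UNITS] = { TARGET_NONE };

    for (size_t s = 0; s < num_stages; s++) {
        for (size_t i = 0; i < stages[s].count; i++) {
            const struct sampler_use *u = &stages[s].samplers[i];

            if (u->unit >= MAX_UNITS)
                return false;  /* out-of-range unit: treat as invalid */
            if (seen[u->unit] != TARGET_NONE && seen[u->unit] != u->target)
                return false;  /* same unit, different sampler type */
            seen[u->unit] = u->target;
        }
    }
    return true;
}
```

With the repro's setup (a buffer sampler in the vertex stage and a cube sampler in the fragment stage, both on unit 0) this returns false, which is the point at which a driver would be expected to raise GL_INVALID_OPERATION and skip the draw. Checking each stage in isolation would miss it, since neither stage conflicts with itself.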