https://bugs.freedesktop.org/show_bug.cgi?id=79659
Priority: medium
Bug ID: 79659
Assignee: dri-devel@lists.freedesktop.org
Summary: R9 270X lockup with unigine valley since radeonsi: enable ARB_sample_shading
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: adf.lists@gmail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: git
Component: Drivers/Gallium/radeonsi
Product: Mesa
R9 270X since -
commit f98a7d89be5d307c7a80fbde028a610f4377c3b9
Author: Marek Olšák marek.olsak@amd.com
Date: Wed May 7 13:15:41 2014 +0200
radeonsi: enable ARB_sample_shading
unigine valley run like -
vblank_mode=0 MESA_GLSL_VERSION_OVERRIDE=330 MESA_GL_VERSION_OVERRIDE=3.3 ./valley
will gpu lock then hard lock if I don't sysrq sub quickly enough
Jun 4 21:59:31 ph4 kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10003msec
Jun 4 21:59:31 ph4 kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000d8a9 last fence id 0x000000000000d8a7 on ring 3)
Jun 4 21:59:31 ph4 kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: Saved 1677 dwords of commands on ring 0.
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GPU softreset: 0x0000004D
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0xF7D20028
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0xEC400000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0xEDC00000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x40000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008006
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x80228647
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44483106
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jun 4 21:59:33 ph4 kernel: [drm] probing gen 2 caps for device 1022:9603 = 300d02/0
Jun 4 21:59:33 ph4 kernel: [drm] PCIE gen 2 link speeds already enabled
Jun 4 21:59:33 ph4 kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: WB enabled
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800cc194c00
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8800cc194c04
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8800cc194c08
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8800cc194c0c
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8800cc194c10
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc900105b5a18
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 0 succeeded in 3 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 1 succeeded in 1 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 2 succeeded in 1 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 3 succeeded in 2 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 4 succeeded in 1 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 5 succeeded in 2 usecs
Jun 4 21:59:33 ph4 kernel: [drm] UVD initialized successfully.
Jun 4 21:59:43 ph4 kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10000msec
Jun 4 21:59:43 ph4 kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002dc5a last fence id 0x000000000002dc3f on ring 0)
Jun 4 21:59:43 ph4 kernel: [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jun 4 21:59:43 ph4 kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jun 4 21:59:43 ph4 kernel: radeon 0000:01:00.0: ib ring test failed (-35).
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GPU softreset: 0x00000048
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0xA0003028
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00400002
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x84010243
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Jun 4 21:59:44 ph4 kernel: SysRq : Emergency Sync
Jun 4 21:59:44 ph4 kernel: Emergency Sync complete
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
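For anyone triaging a log like the one above, the "GPU lockup" entries are the ones that say which ring stalled and which fence it was waiting on. The following is only a rough sketch of pulling those entries out of a log; the function name `find_lockups` is made up for illustration, and the pattern assumes the message wording stays exactly as it appears in this report.

```python
import re

# Matches the radeon "GPU lockup" message as it appears in the log above.
LOCKUP = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)

def find_lockups(log_text):
    """Return (ring, waiting_fence, last_fence) for each lockup message."""
    return [
        (int(m.group(3)), int(m.group(1), 16), int(m.group(2), 16))
        for m in LOCKUP.finditer(log_text)
    ]
```

Applied to the log in this report it should pick up two entries, one for ring 3 and a later one for ring 0 after the first reset.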
--- Comment #1 from Michel Dänzer michel@daenzer.net --- Looks like it's failing to compile some (all?) fragment shaders:
GLShader::loadFragment(): error in "core/shaders/default/sky/fragment_volume_ambient.shader" file
defines: UNKNOWN,QUALITY_LOW,QUALITY_MEDIUM,QUALITY_HIGH,MULTISAMPLE_0,USE_INSTANCING,USE_GEOMETRY_SHADER,USE_TEXTURE_3D,USE_TEXTURE_ARRAY,USE_ALPHA_FADE,USE_REFLECTION,USE_OCCLUSION,HAS_DEFERRED_COLOR,HAS_DEFERRED_NORMAL,USE_RGB10A2,USE_ENVIRONMENT,USE_NORMALIZATION,USE_DIRECTIONAL_LIGHTMAPS,USE_SHADOW_KERNEL,OPENGL,HAS_ARB_DRAW_INSTANCED,HAS_ARB_TEXTURE_SNORM,SHADING_LANGUAGE=330,USE_ARB_BLEND_FUNC_EXTENDED,USE_ARB_SHADER_BIT_ENCODING,USE_ARB_SAMPLE_SHADING,,TURBULENCE
0:170(1): error: syntax error, unexpected EXTENSION, expecting $end
... and so on.
--- Comment #2 from José Suárez j.suarez.agapito@gmail.com --- I have been experiencing the same problem both with Unigine Heaven and Unigine Valley since 2 June git version. I had not been able to identify the commit which was causing the problem, but given that mesa 10.3 git of 28 May works OK, while git of 2 June (and subsequent days git versions) does not, I presume the problem must be caused by the radeonsi related commits applied on 2 June. Particularly, the 'radeonsi: enable ARB_sample_shading' commit was applied on 2 June, so I presume Andy Furniss' guess is correct (not sure if he had bisected).
I'm also getting the "UNKNOWN,QUALITY_LOW,QUALITY_MEDIUM,QUALITY_HIGH,MULTISAMPLE_0,USE_INSTANCING,USE_GEOMETRY_SHADER,USE_TEXTURE_3D,USE_TEXTURE_ARRAY,USE_ALPHA_FADE,USE_REFLECTION,USE_OCCLUSION,HAS_DEFERRED_COLOR,HAS_DEFERRED_NORMAL,USE_RGB10A2,USE_ENVIRONMENT,USE_NORMALIZATION,USE_DIRECTIONAL_LIGHTMAPS,USE_SHADOW_KERNEL,OPENGL,HAS_ARB_DRAW_INSTANCED,HAS_ARB_TEXTURE_SNORM,SHADING_LANGUAGE=330,USE_ARB_BLEND_FUNC_EXTENDED,USE_ARB_SHADER_BIT_ENCODING,USE_ARB_SAMPLE_SHADING,,TURBULENCE 0:170(1): error: syntax error, unexpected EXTENSION, expecting $end" kind of log on the console from which I launch heaven or valley.
I'm using a Radeon HD 7870.
--- Comment #3 from Marek Olšák maraeo@gmail.com --- What happens if you set this environment variable?
force_glsl_extensions_warn=true
--- Comment #4 from Michel Dänzer michel@daenzer.net --- (In reply to comment #3)
> force_glsl_extensions_warn=true
That's enabled by default for Heaven in /etc/drirc, but I just tried setting it explicitly just in case. Doesn't help.
--- Comment #5 from Marek Olšák maraeo@gmail.com --- The problem is Unigine don't know how to use GLSL, again.
There is "#extension GL_ARB_sample_shading : enable" in the middle of (all?) shaders. This is not allowed by any GLSL specification. All #extension directives must occur before any non-preprocessor tokens, which pretty much means "at the beginning of shader code".
What I see: Valley is loading. Then there is hang and it recovers successfully. After that, Valley seems to have exited. That's all.
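The placement rule from comment #5 can be illustrated with a small checker. This is a simplified sketch, not how Mesa's GLSL compiler does it: it strips comments, treats any line starting with `#` as a preprocessor line, ignores line continuations and conditional compilation, and reports every `#extension` directive that follows the first line of ordinary code. The function name `misplaced_extension_directives` is made up for this example.

```python
import re

def misplaced_extension_directives(source):
    """Return (line_number, text) for #extension lines found after real code."""
    # Replace block comments with their newlines so line numbers stay valid.
    def keep_newlines(match):
        return "\n" * match.group(0).count("\n")

    stripped = re.sub(r"/\*.*?\*/", keep_newlines, source, flags=re.DOTALL)

    seen_code = False
    misplaced = []
    for number, raw in enumerate(stripped.split("\n"), start=1):
        line = raw.split("//", 1)[0].strip()  # drop line comments
        if not line:
            continue
        if line.startswith("#"):
            if seen_code and re.match(r"#\s*extension\b", line):
                misplaced.append((number, line))
        else:
            # First non-preprocessor token: later #extension lines are invalid.
            seen_code = True
    return misplaced
```

A shader that declares a uniform and only then enables `GL_ARB_sample_shading` would be flagged, while one that places the directive directly after `#version` would not.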
--- Comment #6 from Matt Turner mattst88@gmail.com --- If you only want to run the application and don't care about a fix, you can run with
MESA_EXTENSION_OVERRIDE=-GL_ARB_sample_shading
We should implement a driconf workaround for this.
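As a sketch of how the override from comment #6 combines with the environment the reporter used to launch Valley, the snippet below builds that environment and starts the binary. The helper names `valley_environment` and `run_valley` are hypothetical, and `./valley` is simply the path from the original report; any existing `MESA_EXTENSION_OVERRIDE` value is replaced rather than merged.

```python
import os
import subprocess

def valley_environment(base):
    """Copy `base` and add the variables mentioned in this bug report."""
    env = dict(base)
    env.update({
        "vblank_mode": "0",
        "MESA_GLSL_VERSION_OVERRIDE": "330",
        "MESA_GL_VERSION_OVERRIDE": "3.3",
        # Hide the extension so the application never emits the directive.
        "MESA_EXTENSION_OVERRIDE": "-GL_ARB_sample_shading",
    })
    return env

def run_valley(binary="./valley"):
    """Launch the benchmark with the workaround applied (path is assumed)."""
    return subprocess.run([binary], env=valley_environment(os.environ))
```

Calling `run_valley()` from the Valley install directory should be equivalent to prefixing the original command line with `MESA_EXTENSION_OVERRIDE=-GL_ARB_sample_shading`.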
--- Comment #7 from Andy Furniss adf.lists@gmail.com --- (In reply to comment #6)
> If you only want to run the application and don't care about a fix, you can run with
> MESA_EXTENSION_OVERRIDE=-GL_ARB_sample_shading
> We should implement a driconf workaround for this.
Thanks, that works and is also needed for heaven 4.0
--- Comment #8 from Andy Furniss adf.lists@gmail.com --- (In reply to comment #5)
> The problem is Unigine don't know how to use GLSL, again.
> There is "#extension GL_ARB_sample_shading : enable" in the middle of (all?) shaders. This is not allowed by any GLSL specification. All #extension directives must occur before any non-preprocessor tokens, which pretty much means "at the beginning of shader code".
> What I see: Valley is loading. Then there is hang and it recovers successfully. After that, Valley seems to have exited. That's all.
It's repeatedly more serious than that for me - maybe because I am fullscreen?
But anyway if I don't sysrq quickly enough when the monitor goes off I am in ext4 bitching about disk errors territory after I hard reset, so no waiting to see if the GPU reset works for me (which it never seems to do on SI - but then I haven't had this card for long).
Heaven 4.0 is also affected, but I don't lock with that - it renders junk but I can quit OK, after that there is a 90% chance my display is mostly trash. fbcon is OK when I quit X, but restarting X will still result in trashed display.
--- Comment #9 from Marek Olšák maraeo@gmail.com --- The hangs are gone if I apply my workaround which fixes the compile failures.
--- Comment #10 from Andy Furniss adf.lists@gmail.com --- (In reply to comment #9)
> The hangs are gone if I apply my workaround which fixes the compile failures.
If you mean -
st/mesa, gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0
I hadn't tried, I assumed they would go in, and now it looks like the stuff in common has moved up a level.
Checking patch src/gallium/state_trackers/dri/common/dri_context.c... error: src/gallium/state_trackers/dri/common/dri_context.c: No such file or directory
--- Comment #11 from Ed Tomlinson edt@aei.ca --- On an R7 260X this bug leads to a dead system and a reboot. From my POV it's fine if the demo fails, but it's NOT fine if it brings down my box...
--- Comment #12 from Andy Furniss adf.lists@gmail.com --- (In reply to comment #9)
> The hangs are gone if I apply my workaround which fixes the compile failures.
Working for me now the workaround is in.
One nit WRT drirc, I don't know what the expected behavior is, but Mesa doesn't use the configured/installed location.
So for me who configures --prefix=/usr and so gets the default --sysconfdir=PREFIX/etc drirc ends up in /usr/etc/ but doesn't get read from there by the same mesa - is that expected?
--- Comment #13 from Marek Olšák maraeo@gmail.com --- (In reply to comment #12)
> (In reply to comment #9)
> > The hangs are gone if I apply my workaround which fixes the compile failures.
> Working for me now the workaround is in.
> One nit WRT drirc, I don't know what the expected behavior is, but Mesa doesn't use the configured/installed location.
> So for me who configures --prefix=/usr and so gets the default --sysconfdir=PREFIX/etc drirc ends up in /usr/etc/ but doesn't get read from there by the same mesa - is that expected?
This is weird. It should have been installed in /etc.
--- Comment #14 from Alexander Tsoy alexander@tsoy.me --- Interesting.. My Bonaire XTX (R7 260X) is not affected by this bug. How is this possible? Cape Verde PRO (HD 7750) is affected and workaround from comment 6 fixes the problem. On both systems I have mesa-10.3 which contains the commit mentioned in comment 0.
--- Comment #15 from Alexander Tsoy alexander@tsoy.me --- Ah.. "st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0" is also included in mesa-10.3. So both Heaven 4.0 and Valley 1.0 should just work. The question is why ARB_sample_shading is causing GPU lockup on VERDE. Should I open a separate bug for this issue?
--- Comment #16 from Andy Furniss adf.lists@gmail.com --- (In reply to Alexander Tsoy from comment #15)
> Ah.. "st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0" is also included in mesa-10.3. So both Heaven 4.0 and Valley 1.0 should just work. The question is why ARB_sample_shading is causing GPU lockup on VERDE. Should I open a separate bug for this issue?
Maybe check that your /etc/drirc has the workaround, and, if you have a .drirc in your home dir, that it has it as well. I haven't tested whether a .drirc under $HOME without the workaround overrides one in /etc that has it.
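One way to do the check suggested in comment #16 is to parse both files and list which applications set a given option. The sketch below assumes the usual drirc layout of `<application>` elements containing `<option name=... value=...>` children; the helper names are invented for illustration, and the option queried in the usage function is `force_glsl_extensions_warn` from comment #3, since the thread does not spell out the name of the newer workaround option. It only reports what each file contains and does not model how Mesa merges the two files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def option_values(root, option):
    """Map each application's executable to its value for `option`."""
    found = {}
    for app in root.iter("application"):
        for opt in app.iter("option"):
            if opt.get("name") == option:
                found[app.get("executable", "")] = opt.get("value", "")
    return found

def report_drirc(option="force_glsl_extensions_warn"):
    """Print the option's settings in /etc/drirc and ~/.drirc, if present."""
    for path in (Path("/etc/drirc"), Path.home() / ".drirc"):
        if path.is_file():
            print(path, option_values(ET.parse(path).getroot(), option))
        else:
            print(path, "not found")
```

Running `report_drirc()` on an affected machine shows at a glance whether either file carries the option for the Heaven or Valley executables.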
--- Comment #17 from Alexander Tsoy alexander@tsoy.me --- (In reply to Andy Furniss from comment #16)
drirc was the first thing I checked. I filed a new bug 84836.
--- Comment #18 from Marek Olšák maraeo@gmail.com --- (In reply to Alexander Tsoy from comment #15)
> Ah.. "st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0" is also included in mesa-10.3. So both Heaven 4.0 and Valley 1.0 should just work. The question is why ARB_sample_shading is causing GPU lockup on VERDE. Should I open a separate bug for this issue?
I can re-test VERDE when I get home.
I haven't investigated why the hw hangs, because it only happens if shader compilation fails and so none of the sample_shading shader stuff makes it to the driver. I think the likely cause is that Unigine attempted to do rendering with a shader that hasn't actually been compiled and things went wrong after that.
Michel Dänzer michel@daenzer.net changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |FIXED
--- Comment #19 from Michel Dänzer michel@daenzer.net --- Resolving per comment #12.