https://bugs.freedesktop.org/show_bug.cgi?id=40423
Summary: Firefox GLX test crashes with Mesa 7.12-devel, Gallium 0.4 on AMD JUNIPER Product: Mesa Version: unspecified Platform: Other URL: https://crash-stats.mozilla.com/report/index/4a1da9e3- fef3-4ce6-be03-ce07d2110827 OS/Version: All Status: NEW Severity: normal Priority: medium Component: Drivers/DRI/Radeon AssignedTo: dri-devel@lists.freedesktop.org ReportedBy: bjacob@mozilla.com
Just got this Firefox crash report: https://crash-stats.mozilla.com/report/index/4a1da9e3-fef3-4ce6-be03-ce07d21...
The Firefox crash itself is not relevant to Mesa/ATI, but this report also says (look at App Notes) that the little GLX test process that Firefox runs at startup also crashed:
App Notes
GLXtest process failed (received signal 11): VENDOR X.Org RENDERER Gallium 0.4 on AMD JUNIPER VERSION 2.1 Mesa 7.12-devel (git-5379a70) TFP TRUE
You can find the source code of this test here: http://hg.mozilla.org/mozilla-central/file/d0700ba932b4/toolkit/xre/glxtest....
It's even simpler than 'glxinfo' but differs in that it uses a GLX pixmap instead of a XWindow to back the GL context.
As the 'App Notes' say, this process crashed on signal 11 (segmentation fault)
In the crash report, by clicking the Modules tab you can get version info for the libraries loaded in the Firefox process. For example: libX11.so.6.3.0 libGL.so.1.2
Unfortunately I don't have any more information about it: no stack, no more device information.
Note that Firefox itself is "fine", the present bug is only making the GLX test process crash, Firefox notices that and as a result disables GL-dependent features. But it would be nice to not have to disable these features ;-)
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #1 from Benoit Jacob bjacob@mozilla.com 2011-08-27 11:22:53 PDT --- Notice that besides looking at the source code (above link) you can also try this yourself just by running Firefox. Any version >= 6 will do, but the present crash report is on Nightly (9):
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #2 from Benoit Jacob bjacob@mozilla.com 2011-08-27 11:24:27 PDT --- ... to check this in Firefox, go to the about:support page and scroll down to Graphics. If the GLX test process crashed, this will be explicitly reported there, like "GLXtest process failed (received signal 11)"
https://bugs.freedesktop.org/show_bug.cgi?id=40423
Michel Dänzer michel@daenzer.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Component|Drivers/DRI/Radeon |Drivers/Gallium/r600
--- Comment #3 from Michel Dänzer michel@daenzer.net 2011-08-29 11:04:17 PDT --- (In reply to comment #3)
You can find the source code of this test here: http://hg.mozilla.org/mozilla-central/file/d0700ba932b4/toolkit/xre/glxtest....
It has at least one obvious flaw: dlsym() can only be used to retrieve glXGetProcAddress(ARB) and the symbols defined by the Linux OpenGL ABI (basically OpenGL 1.2 + GL_ARB_multitexture). All other GL(X) symbols need to be retrieved with glXGetProcAddress(ARB).
I didn't look further than that.
Unfortunately I don't have any more information about it: no stack, no more device information.
Even assuming the crash wasn't due to a test bug, it'll be hard to do anything about it without that information.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #4 from Ian Romanick idr@freedesktop.org 2011-08-29 11:31:07 PDT --- (In reply to comment #3)
(In reply to comment #3)
You can find the source code of this test here: http://hg.mozilla.org/mozilla-central/file/d0700ba932b4/toolkit/xre/glxtest....
It has at least one obvious flaw: dlsym() can only be used to retrieve glXGetProcAddress(ARB) and the symbols defined by the Linux OpenGL ABI (basically OpenGL 1.2 + GL_ARB_multitexture). All other GL(X) symbols need to be retrieved with glXGetProcAddress(ARB).
That is kind of odd, but I don't think it's the problem. The only GL symbol that they use dlsym on is glGetString. The library has to statically export that. :) glXQueryExtension, glXQueryVersion, glXCreatePixmap, glXMakeCurrent, glXDestroyPixmap, and glXDestroyContext are in the same boat.
Technically speaking, glXChooseFBConfig, glXGetVisualFromFBConfig, and glXCreateNewContext should be retrieved via glXGetProcAddress. If dlsym returns non-NULL, it seems unlikely to be a problem on our stack. Since is already done for glXBindTexImageEXT, it is a good idea to use glXGetProcAddress for these functions as well.
I didn't look further than that.
Unfortunately I don't have any more information about it: no stack, no more device information.
Even assuming the crash wasn't due to a test bug, it'll be hard to do anything about it without that information.
I hacked up a stand-alone version, but I wasn't able to reproduce the crash on the Intel drivers.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #5 from Ian Romanick idr@freedesktop.org 2011-08-29 11:31:41 PDT --- Created an attachment (id=50665) --> (https://bugs.freedesktop.org/attachment.cgi?id=50665) Stand-alone version of Mozilla's glxtext
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #6 from Jure Repinc jlp.bugs@gmail.com 2011-08-29 15:43:47 PDT --- Looks like this crash report came from me. With Nightly from today and Mesa 7.12-devel (git-110f846) I still get "GLXtest process failed (received signal 11)" in about:support. But if I download the glxtest attachment from Ian and compile it with "g++ -ldl -lX11 glxtest.cpp -o glxtest" I don't get any crash, it exits normally. Oh, and the hardware is AMD Radeon HD 5750
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #7 from Benoit Jacob bjacob@mozilla.com 2011-08-30 13:55:01 PDT --- Hi Jure :-)
Thanks Ian and Michel, will use glXGetProcAddress.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #8 from Benoit Jacob bjacob@mozilla.com 2011-08-30 13:56:59 PDT --- (In reply to comment #6)
Looks like this crash report came from me. With Nightly from today and Mesa 7.12-devel (git-110f846) I still get "GLXtest process failed (received signal 11)" in about:support. But if I download the glxtest attachment from Ian and compile it with "g++ -ldl -lX11 glxtest.cpp -o glxtest" I don't get any crash, it exits normally.
That's really weird isn't it. Can you run the standalone glxtest, compiled with -g3, in valgrind --tool=memcheck so we see if it's hitting uninitialized memory?
https://bugs.freedesktop.org/show_bug.cgi?id=40423
Benoit Jacob bjacob@mozilla.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #50665|0 |1 is obsolete| |
--- Comment #9 from Benoit Jacob bjacob@mozilla.com 2011-08-30 14:00:11 PDT --- Created an attachment (id=50707) --> (https://bugs.freedesktop.org/attachment.cgi?id=50707) Stand-alone version of Mozilla's glxtext, v2
The original version used an uninitialized value for the file descriptor it was writing to. This version now uses standard output, so one can see what information glxtest is sending to Firefox.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #10 from Benoit Jacob bjacob@mozilla.com 2011-08-30 14:12:23 PDT --- (In reply to comment #9)
Created an attachment (id=50707)
--> (https://bugs.freedesktop.org/attachment.cgi?id=50707)
Stand-alone version of Mozilla's glxtext, v2
The original version used an uninitialized value
er, sorry, it used -1, but i still guess that using stdout is better.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #11 from Julien Cristau jcristau@debian.org 2011-08-30 14:33:54 PDT ---
--- Comment #9 from Benoit Jacob bjacob@mozilla.com 2011-08-30 14:00:11 PDT --- The original version used an uninitialized value for the file descriptor it was writing to. This version now uses standard output, so one can see what information glxtest is sending to Firefox.
Not that it matters much, but stdout is 1, not 0 ;)
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #12 from Benoit Jacob bjacob@mozilla.com 2011-08-30 14:36:51 PDT --- (In reply to comment #11)
--- Comment #9 from Benoit Jacob bjacob@mozilla.com 2011-08-30 14:00:11 PDT --- The original version used an uninitialized value for the file descriptor it was writing to. This version now uses standard output, so one can see what information glxtest is sending to Firefox.
Not that it matters much, but stdout is 1, not 0 ;)
Oops! You're right. Now I'm totally confused as to why my change actually made it print stuff to stdout.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
Benoit Jacob bjacob@mozilla.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #50707|0 |1 is obsolete| |
--- Comment #13 from Benoit Jacob bjacob@mozilla.com 2011-08-30 14:55:44 PDT --- Created an attachment (id=50724) --> (https://bugs.freedesktop.org/attachment.cgi?id=50724) Stand-alone version of Mozilla's glxtext, v3
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #14 from Jure Repinc jlp.bugs@gmail.com 2011-09-01 05:10:06 PDT --- This is what valgrind gives on my laptop with ATI Mobility Radeon 5470:
$ valgrind --tool=memcheck ./glxtest ==20468== Memcheck, a memory error detector ==20468== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==20468== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info ==20468== Command: ./glxtest ==20468== ==20468== Invalid read of size 4 ==20468== at 0x52341B2: _mesa_make_extension_string (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5246397: _mesa_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5236B01: st_api_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x521F352: dri_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x51FF655: driBindContext (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x3C14056B0E: dri2_bind_context (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x3C1403032D: MakeContextCurrent (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x400F1C: glxtest() (glxtest.cpp:193) ==20468== by 0x4010B2: main (glxtest.cpp:239) ==20468== Address 0x4f157b0 is 0 bytes inside a block of size 1 alloc'd ==20468== at 0x4A063E4: calloc (vg_replace_malloc.c:467) ==20468== by 0x5234429: _mesa_make_extension_string (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5246397: _mesa_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5236B01: st_api_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x521F352: dri_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x51FF655: driBindContext (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x3C14056B0E: dri2_bind_context (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x3C1403032D: MakeContextCurrent (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x400F1C: glxtest() (glxtest.cpp:193) ==20468== by 0x4010B2: main (glxtest.cpp:239) ==20468== VENDOR X.Org RENDERER Gallium 0.4 on AMD CEDAR VERSION 2.1 Mesa 7.12-devel (git-82fff5f) TFP TRUE ==20468== ==20468== HEAP SUMMARY: ==20468== in use at exit: 212,008 bytes in 531 blocks ==20468== total heap usage: 3,105 allocs, 2,574 frees, 2,729,162 bytes allocated ==20468== ==20468== LEAK SUMMARY: ==20468== definitely lost: 1,948 bytes in 4 blocks ==20468== indirectly lost: 0 bytes in 0 blocks ==20468== possibly lost: 120 bytes in 3 blocks ==20468== still reachable: 209,940 bytes in 524 blocks ==20468== suppressed: 0 bytes in 0 blocks ==20468== Rerun with --leak-check=full to see details of leaked memory ==20468== ==20468== For counts of detected and suppressed errors, rerun with: -v ==20468== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 11 from 9)
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #15 from Jure Repinc jlp.bugs@gmail.com 2011-09-01 05:29:39 PDT --- Ups forgot the debugging symbols for mesa:
$ valgrind --tool=memcheck ./glxtest ==28048== Memcheck, a memory error detector ==28048== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==28048== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info ==28048== Command: ./glxtest ==28048== ==28048== Invalid read of size 4 ==28048== at 0x54B71B2: _mesa_make_extension_string (extensions.c:912) ==28048== by 0x54C9397: _mesa_make_current (context.c:1514) ==28048== by 0x54B9B01: st_api_make_current (st_manager.c:789) ==28048== by 0x54A2352: dri_make_current (dri_context.c:205) ==28048== by 0x5482655: driBindContext (dri_util.c:196) ==28048== by 0x509FB0E: dri2_bind_context (dri2_glx.c:151) ==28048== by 0x507932D: MakeContextCurrent (glxcurrent.c:269) ==28048== by 0x400F1C: glxtest() (glxtest.cpp:193) ==28048== by 0x4010B2: main (glxtest.cpp:239) ==28048== Address 0x4f157b0 is 0 bytes inside a block of size 1 alloc'd ==28048== at 0x4A063E4: calloc (vg_replace_malloc.c:467) ==28048== by 0x54B7429: _mesa_make_extension_string (extensions.c:776) ==28048== by 0x54C9397: _mesa_make_current (context.c:1514) ==28048== by 0x54B9B01: st_api_make_current (st_manager.c:789) ==28048== by 0x54A2352: dri_make_current (dri_context.c:205) ==28048== by 0x5482655: driBindContext (dri_util.c:196) ==28048== by 0x509FB0E: dri2_bind_context (dri2_glx.c:151) ==28048== by 0x507932D: MakeContextCurrent (glxcurrent.c:269) ==28048== by 0x400F1C: glxtest() (glxtest.cpp:193) ==28048== by 0x4010B2: main (glxtest.cpp:239) ==28048== VENDOR X.Org RENDERER Gallium 0.4 on AMD CEDAR VERSION 2.1 Mesa 7.12-devel (git-11f6466) TFP TRUE ==28048== ==28048== HEAP SUMMARY: ==28048== in use at exit: 212,008 bytes in 531 blocks ==28048== total heap usage: 3,105 allocs, 2,574 frees, 2,729,162 bytes allocated ==28048== ==28048== LEAK SUMMARY: ==28048== definitely lost: 1,948 bytes in 4 blocks ==28048== indirectly lost: 0 bytes in 0 blocks ==28048== possibly lost: 120 bytes in 3 blocks ==28048== still reachable: 209,940 bytes in 524 blocks ==28048== suppressed: 0 bytes in 0 blocks ==28048== Rerun with --leak-check=full to see details of leaked memory ==28048== ==28048== For counts of detected and suppressed errors, rerun with: -v ==28048== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 11 from 9)
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #16 from Jure Repinc jlp.bugs@gmail.com 2011-09-24 04:43:18 PDT --- Is there anything else I can test to help fix this?
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #17 from Benoit Jacob bjacob@mozilla.com 2011-09-25 09:18:10 PDT --- (In reply to comment #16)
Is there anything else I can test to help fix this?
Looking at the Valgrind error from comment 15, it suggests that _mesa_make_extension_string (extensions.c:912) makes the assumption that it's safe to read 4 bytes from a string and that assumption fails as the string is empty with just 1 byte allocated (and set to 0) by calloc.
This kind of error is very commonly found in string manipulation and is generally considered safe to ignore as it can hardly lead to a crash. Indeed, AFAIK it could only crash if reading those 4 bytes led to crossing a (4k) page boundary, which would mean that the pointer was not even 4-byte-aligned... It's not inconceivable to have a Firefox-specific crash there as Firefox uses a custom memory allocator (jemalloc).
What you could do to investigate this bug, if you can recompile Mesa, is add a printf in _mesa_make_extension_string (extensions.c:912) to print this pointer. And perhaps add a printf at the next line to see if it's there that we crash.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #18 from Benoit Jacob bjacob@mozilla.com 2011-09-25 09:20:37 PDT --- ... or just use gdb to do the same. Not sure why I was recommending printf-based debugging ;-) oh yes, it might be a little tricky with gdb as this is in a child process, you might have to break on fork to get the child process PID and then attach gdb to the child process.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #19 from Michel Dänzer michel@daenzer.net 2011-09-29 04:07:25 PDT --- (In reply to comment #17)
Looking at the Valgrind error from comment 15, it suggests that _mesa_make_extension_string (extensions.c:912) makes the assumption that it's safe to read 4 bytes from a string and that assumption fails as the string is empty with just 1 byte allocated (and set to 0) by calloc.
Indeed, looks like a bug in gcc/glibc. Note that current Mesa Git works around this by rounding up the allocation sizes to multiples of four bytes.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #20 from Benoit Jacob bjacob@mozilla.com 2011-09-29 06:25:14 PDT --- Note, again (see comment 17), that this is an extremely common bug in string functions, and that I'm not at all sure that it's the cause of the crash since it could only lead to a crash if the string were not 4-byte aligned, which would be surprising.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #21 from Nicholas Miell nmiell@gmail.com 2011-09-30 20:45:37 PDT --- It isn't a bug, it is a perfectly valid and entirely safe optimization.
Valgrind supplies its own overrides for the string functions that don't do this optimization specifically to prevent false positives from the C library, but in this case the false positive results from a call to strlen() that gcc has inlined.
Does jemalloc (or Firefox's fork of jemalloc) obey the platform ABI requirements for the alignment of the return values of the malloc functions?
Because gcc's value range propagation has carried the fact that the pointer was returned by calloc to the point where it has inlined strlen(), and knowing that the pointer has to be word aligned, it has omitted the usual byte-at-a-time strlen() preamble and skipped straight to word-at-a-time loads.
jemalloc's INSTALL lists the following option: --disable-tiny Disable tiny (sub-quantum-sized) object support. Technically it is not legal for a malloc implementation to allocate objects with less than quantum alignment (8 or 16 bytes, depending on architecture), but in practice it never causes any problems if, for example, 4-byte allocations are 4-byte-aligned.
If, for example, jemalloc decides to stick a 4 byte allocation in the last 4 bytes of a page on a 64-bit plaform (i.e. word size is 8 bytes), and gcc's value range propagation can follow a pointer from a malloc function to an inlined strlen, things will explode violently.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #22 from Nicholas Miell nmiell@gmail.com 2011-09-30 21:08:50 PDT --- Looking at Firefox's fork of jemalloc, TINY_MIN_2POW is 1, making the smallest allocation size 2 bytes, which is broken.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #23 from Benoit Jacob bjacob@mozilla.com 2011-09-30 21:22:05 PDT --- (In reply to comment #22)
Looking at Firefox's fork of jemalloc, TINY_MIN_2POW is 1, making the smallest allocation size 2 bytes, which is broken.
This sounds like something that would be very much worth discussing with the JEMalloc author, Jason Evans, by filing a bug at bugzilla.mozilla.org, product 'Core', component 'jemalloc', CC'ing Jason Evans ... and CC me too please, I'm interested. I'm not really competent to file this myself.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #24 from Nicholas Miell nmiell@gmail.com 2011-10-01 16:59:16 PDT --- Already filed https://bugzilla.mozilla.org/show_bug.cgi?id=691003 yesterday.
Upstream jemalloc is fine (and has diverged rather significantly from Firefox's copy).
https://bugs.freedesktop.org/show_bug.cgi?id=40423
--- Comment #25 from Benoit Jacob bjacob@mozilla.com 2011-10-06 04:54:06 PDT --- Thanks a lot.
https://bugzilla.mozilla.org/show_bug.cgi?id=691003 is fixed as of today, will be in tomorrow's Nightly.
It seems that this bug could have explained a lot of mysterious Firefox crashiness on linux. Having it fixed fills me with indescriptible glee.
https://bugs.freedesktop.org/show_bug.cgi?id=40423
Michel Dänzer michel@daenzer.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |NOTOURBUG
dri-devel@lists.freedesktop.org