https://bugs.freedesktop.org/show_bug.cgi?id=74539
Priority: medium
Bug ID: 74539
Assignee: dri-devel@lists.freedesktop.org
Summary: [r600g] Memory leak when playing WoW with RV790
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: rankincj@googlemail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: git
Component: Drivers/Gallium/r600
Product: Mesa
Created attachment 93417 --> https://bugs.freedesktop.org/attachment.cgi?id=93417&action=edit dmesg log showing memory allocation failure.
I am seeing a possible memory leak while playing WoW-64.exe with my RV790. The problem seems to happen after ~ 1 hour of play:
[ 117.896993] fuse init (API version 7.22)
[ 3326.401752] WoW-64.exe: page allocation failure: order:4, mode:0x10c0d0
[ 3326.407099] CPU: 7 PID: 31106 Comm: WoW-64.exe Not tainted 3.12.9 #1
[ 3326.412185] Hardware name: Gigabyte Technology Co., Ltd. EX58-UD3R/EX58-UD3R, BIOS FB 05/04/2009
[ 3326.419812] ffff8801afdcdbd0 ffffffff812d0071 0000000000000001 ffffffff810a0c50
[ 3326.426142] 0000000000000001 0000000000000000 ffffffff8164ff80 ffffffff8164f400
[ 3326.432429] 000000000010c0d0 ffffffff812cebb7 ffffffff8164f400 ffff880100000000
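For readers decoding the failure line above: in the kernel's page allocator, order:N is a request for 2^N physically contiguous pages. A minimal sketch of that arithmetic, assuming the 4 KiB base page size that is the x86-64 default (the function name is made up for illustration):

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, the x86-64 default


def allocation_request_bytes(dmesg_line, page_size=PAGE_SIZE):
    """Return the size in bytes of the failed request, or None if no order is found."""
    match = re.search(r"page allocation failure: order:(\d+)", dmesg_line)
    if match is None:
        return None
    order = int(match.group(1))
    # An order-N request asks for 2**N physically contiguous pages.
    return (2 ** order) * page_size


line = "[ 3326.401752] WoW-64.exe: page allocation failure: order:4, mode:0x10c0d0"
print(allocation_request_bytes(line))  # 16 pages * 4096 bytes = 65536
```

An order-4 failure therefore means the kernel could not find 64 KiB of contiguous memory, which can result from fragmentation as well as from outright exhaustion.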
I *think* I can bisect this, although it might take some time:
9bace99d77642f8fbd46b1f0be025ad758f83f5e BAD
f5bd5568abcc234c1c2b6a4bb67b880706f3caed GOOD
--- Comment #1 from Chris Rankin rankincj@googlemail.com --- It looks like the first bad commit is this one:
commit ed42e95404a51298ea878a0d1cdcbc473612706a
Author: Marek Olšák marek.olsak@amd.com
Date: Wed Jan 22 02:49:53 2014 +0100
r600g,radeonsi: consolidate remaining obviously duplicated pipe_screen code
Reviewed-by: Michel Dänzer michel.daenzer@amd.com
Reviewed-by: Tom Stellard thomas.stellard@amd.com
I have definitely reproduced the problem with this commit, but failed to reproduce it with the commit immediately before it:
commit 65dc588bfd3b8145131340ffe77f216be58378ac
Author: Marek Olšák marek.olsak@amd.com
Date: Wed Jan 22 02:42:20 2014 +0100
r600g,radeonsi: consolidate get_compute_param
--- Comment #2 from Marek Olšák maraeo@gmail.com --- (In reply to comment #1)
> It looks like the first bad commit is this one:
> commit ed42e95404a51298ea878a0d1cdcbc473612706a
> Author: Marek Olšák marek.olsak@amd.com
> Date: Wed Jan 22 02:49:53 2014 +0100
> r600g,radeonsi: consolidate remaining obviously duplicated pipe_screen code
I don't think this is it. These functions are called once per process.
--- Comment #3 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #2)
> I don't think this is it. These functions are called once per process.
Regardless, ed42e95404a51298ea878a0d1cdcbc473612706a is definitely "bad", whereas 65dc588bfd3b8145131340ffe77f216be58378ac did not fail after more than 2 hours of play. I really cannot say anything more at this time.
--- Comment #4 from Chris Rankin rankincj@googlemail.com --- I have finally managed to generate an "out-of-memory" condition while playing WoW with git HEAD at:
commit f5bd5568abcc234c1c2b6a4bb67b880706f3caed
Author: Mark Mueller MarkKMueller@gmail.com
Date: Tue Jan 21 22:37:20 2014 -0800
mesa: Fix Type A _INT formats to MESA_FORMAT naming standard
So the bottom line is that I cannot bisect this, because not only do I have no reliable means of identifying a "GOOD" commit, but I also have no idea where the first "GOOD" commit might be.
--- Comment #5 from Michel Dänzer michel@daenzer.net --- Please try getting more information about the leak(s) with valgrind --leak-check=full.
--- Comment #6 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #5)
> Please try getting more information about the leak(s) with valgrind --leak-check=full.
Do I need to do anything "special" to valgrind WoW.exe, seeing as it must be invoked using wine?
--- Comment #7 from Nick Tenney nick.tenney@gmail.com --- I found this helpful for setting up wine and valgrind together:
http://wiki.winehq.org/Wine_and_Valgrind
You may need to recompile wine after installing valgrind, as mentioned in the wiki. For a similar issue in Diablo III, I could not get the game to run with valgrind, so I used apitrace to record a session and ran valgrind on the trace (after much help from Michel and Ilia). Hope this helps. Oh, make sure to compile Mesa with debug symbols or you'll need to repeat the whole process. I forgot that the first time 'round.
--- Comment #8 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #7)
> You may need to recompile wine after installing valgrind, as mentioned in the wiki.
There is no "re"-compile of wine - it either works with Fedora's debuginfo package or it doesn't.
--- Comment #9 from Chris Rankin rankincj@googlemail.com --- Has anyone ever "valground" a 32 bit executable on a box which is natively 64 bit, please? This bug is currently making it impossible to run Wow-64.exe:
http://bugs.winehq.org/show_bug.cgi?id=35582
That in itself isn't an issue - the memory leak occurs with both 32 bit and 64 bit WoW. However, the following command is failing:
$ valgrind --trace-children=yes --leak-check=full /usr/bin/wine /opt/wine/World\ of\ Warcraft/Wow.exe -opengl -noautoload64bit
with this error:
valgrind: failed to start tool 'memcheck' for platform 'amd64-linux': No such file or directory
Which I assume means that Valgrind is trying to use 64 bit tools on a 32 bit executable.
--- Comment #10 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #9)
> valgrind: failed to start tool 'memcheck' for platform 'amd64-linux': No such file or directory
More specifically: Valgrind is falling back to the x86_64 platform when executing the "--trace-children=yes" option!
--- Comment #11 from Chris Rankin rankincj@googlemail.com --- Created attachment 95119 --> https://bugs.freedesktop.org/attachment.cgi?id=95119&action=edit Valgrind output from 32 bit WoW
I have an extremely underpowered dual P4 box with a HD4670 AGP card that is capable of running 32 bit WoW, so I've tried to run valgrind on that. Here is the output.
Unfortunately, I was only able to get as far as the login screen as Blizzard rejected my login. (Too slow, perhaps? Or maybe they detected valgrind and disallowed me?) However, there are some interesting entries.
--- Comment #12 from Chris Rankin rankincj@googlemail.com --- Created attachment 95121 --> https://bugs.freedesktop.org/attachment.cgi?id=95121&action=edit 2nd valgrind output from 32 bit WoW
Again with the HD4670 AGP and the latest git:
commit 079bff5a99fa19029fc0caba92fe57046ee29b23
Author: Anuj Phogat anuj.phogat@gmail.com
Date: Mon Mar 3 14:40:14 2014 -0800
mesa: Allow GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL combinations in glTexIma
Michel Dänzer michel@daenzer.net changed:
What                        |Removed                  |Added
----------------------------------------------------------------------------
Attachment #95119 mime type |application/octet-stream |text/plain
Michel Dänzer michel@daenzer.net changed:
What                        |Removed                  |Added
----------------------------------------------------------------------------
Attachment #95121 mime type |application/octet-stream |text/plain
Michel Dänzer michel@daenzer.net changed:
What                        |Removed    |Added
----------------------------------------------------------------------------
Attachment #95119 mime type |text/plain |application/octet-stream
Michel Dänzer michel@daenzer.net changed:
What                        |Removed    |Added
----------------------------------------------------------------------------
Attachment #95121 mime type |text/plain |application/octet-stream
--- Comment #13 from Nick Tenney nick.tenney@gmail.com --- Try:
$ apitrace trace /usr/bin/wine /opt/wine/World\ of\ Warcraft/Wow.exe -opengl -noautoload64bit
to record an apitrace of a little bit of play. It will generate a .trace file that you can run through valgrind (I believe with the same options as before, but I can't recall). This may help you get a little further.
--- Comment #14 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #13)
> This may help you get a little further.
Actually, I'm hoping that the valgrind output from WoW on my native 32 bit box will be sufficient. It does seem to show a suspiciously large number of allocations in the r600 code.
--- Comment #15 from Michel Dänzer michel@daenzer.net --- (In reply to comment #14)
> Actually, I'm hoping that the valgrind output from WoW on my native 32 bit box will be sufficient.
If you could produce an apitrace reproducing the leaks as reported by valgrind, that might make it easier for us to reproduce and investigate the problem.
This one looks interesting, but I'm not sure yet how the memory ends up being leaked:
==13334== 302,736 bytes in 84 blocks are possibly lost in loss record 6,231 of 6,282
==13334== at 0x400870E: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==13334== by 0x8444B00: r600_texture_create_object (r600_texture.c:571)
==13334== by 0x844587C: r600_texture_create (r600_texture.c:759)
==13334== by 0x843FF69: r600_resource_create_common (r600_pipe_common.c:589)
==13334== by 0x83DC5EF: r600_resource_create (r600_pipe.c:558)
==13334== by 0x81D7A98: st_texture_create (st_texture.c:96)
==13334== by 0x81AA32E: guess_and_alloc_texture (st_cb_texture.c:405)
==13334== by 0x81AA476: st_AllocTextureImageBuffer (st_cb_texture.c:459)
==13334== by 0x814C953: _mesa_store_compressed_teximage (texstore.c:4195)
==13334== by 0x81A99D4: st_CompressedTexImage (st_cb_texture.c:823)
==13334== by 0x8138540: teximage (teximage.c:3244)
==13334== by 0x813A8D3: _mesa_CompressedTexImage2D (teximage.c:3913)
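To rank records like this one across a whole valgrind log, the summary lines can be parsed and sorted by size. A rough sketch, assuming memcheck's usual "N bytes in M blocks are ... lost in loss record" wording; the helper name is hypothetical:

```python
import re

# Matches the header line of a memcheck loss record (assumed wording).
RECORD_RE = re.compile(
    r"==\d+==\s+([\d,]+) bytes in ([\d,]+) blocks are "
    r"(definitely|indirectly|possibly) lost in loss record"
)


def parse_loss_records(log_text):
    """Return (bytes, blocks, kind) tuples, largest first."""
    records = []
    for match in RECORD_RE.finditer(log_text):
        size = int(match.group(1).replace(",", ""))
        blocks = int(match.group(2).replace(",", ""))
        records.append((size, blocks, match.group(3)))
    return sorted(records, reverse=True)


sample = (
    "==13334== 302,736 bytes in 84 blocks are possibly lost "
    "in loss record 6,231 of 6,282\n"
)
for size, blocks, kind in parse_loss_records(sample):
    # Last column is the average number of bytes per leaked block.
    print(kind, size, blocks, size // blocks)
```

For the record quoted above this works out to 84 blocks averaging 3,604 bytes each.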
--- Comment #16 from Chris Rankin rankincj@googlemail.com --- The apitrace is 155,721 KB and so cannot be uploaded. Rather than break it into 50 fragments, does anyone have another location to upload it to, please?
--- Comment #17 from Chris Rankin rankincj@googlemail.com --- Created attachment 95249 --> https://bugs.freedesktop.org/attachment.cgi?id=95249&action=edit apitrace output for 32 bit WoW (1)
--- Comment #18 from Chris Rankin rankincj@googlemail.com --- Created attachment 95250 --> https://bugs.freedesktop.org/attachment.cgi?id=95250&action=edit apitrace output for 32 bit WoW (2)
--- Comment #19 from Chris Rankin rankincj@googlemail.com --- Created attachment 95252 --> https://bugs.freedesktop.org/attachment.cgi?id=95252&action=edit apitrace output for 32 bit WoW (3)
--- Comment #20 from Chris Rankin rankincj@googlemail.com --- Created attachment 95253 --> https://bugs.freedesktop.org/attachment.cgi?id=95253&action=edit apitrace output for 32 bit WoW (4)
--- Comment #21 from Chris Rankin rankincj@googlemail.com --- Created attachment 95254 --> https://bugs.freedesktop.org/attachment.cgi?id=95254&action=edit apitrace output for 32 bit WoW (5)
--- Comment #22 from Chris Rankin rankincj@googlemail.com --- Created attachment 95255 --> https://bugs.freedesktop.org/attachment.cgi?id=95255&action=edit apitrace output for 32 bit WoW (6)
--- Comment #23 from Chris Rankin rankincj@googlemail.com --- Created attachment 95256 --> https://bugs.freedesktop.org/attachment.cgi?id=95256&action=edit apitrace output for 32 bit WoW (7)
--- Comment #24 from Chris Rankin rankincj@googlemail.com --- Created attachment 95257 --> https://bugs.freedesktop.org/attachment.cgi?id=95257&action=edit apitrace output for 32 bit WoW (8)
This is a less ambitious apitrace from 32 bit WoW. You can reconstruct the original file by:
$ cat wine-preloader-2.trace.xz.* > wine-preloader-2.trace.xz
The SHA1SUM should be: c969eb3169db84e26e50f29a4e4674058e5ec897
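The same reassembly and checksum check can be scripted. A minimal sketch, assuming the fragments sit in one directory and that their suffixes sort into the right order (the shell glob above makes the same assumption); the function name is illustrative only:

```python
import glob
import hashlib

EXPECTED_SHA1 = "c969eb3169db84e26e50f29a4e4674058e5ec897"  # value quoted above


def join_fragments(pattern, output_path):
    """Concatenate files matching pattern in sorted order; return the SHA-1 hex digest."""
    digest = hashlib.sha1()
    # Exclude the output file in case it also matches the pattern.
    parts = sorted(p for p in glob.glob(pattern) if p != output_path)
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                while True:
                    chunk = src.read(1 << 20)  # copy in 1 MiB pieces
                    if not chunk:
                        break
                    out.write(chunk)
                    digest.update(chunk)
    return digest.hexdigest()
```

Calling join_fragments("wine-preloader-2.trace.xz.*", "wine-preloader-2.trace.xz") and comparing the result with EXPECTED_SHA1 confirms that all eight attachments were downloaded intact.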
--- Comment #25 from Chris Rankin rankincj@googlemail.com --- Created attachment 97051 --> https://bugs.freedesktop.org/attachment.cgi?id=97051&action=edit dmesg output from 3.13.9
This memory leak is still present with 3.13.9 and HEAD 4ccff1499c956b51f18710c7308cbce883f64cd9.
--- Comment #26 from Michel Dänzer michel@daenzer.net --- Does the patch from bug 74868 help?
--- Comment #27 from Chris Rankin rankincj@googlemail.com --- Thanks, I'll give it a try. Was Mesa leaking memory for *all* failed shaders, or just failed geometry shaders?
I tried booting up Diablo III a few days back to see the new geometry shaders in action on my 4890...
AFAIK, the 4890 needs a 3.14+ kernel to get geometry shader support.
--- Comment #28 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #26)
> Does the patch from bug 74868 help?
Hmm, it hasn't OOM-ed yet. But one of the symptoms that I'd come to associate with the memory problem was an increasing jerkiness in the game play over time. That symptom at least is still present.
--- Comment #29 from Marek Olšák maraeo@gmail.com --- With kernel 3.15, you can watch GPU memory usage by setting: GALLIUM_HUD=VRAM-usage,GTT-usage
You should be able to see whether we leak GPU memory or not.
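Since GALLIUM_HUD is an ordinary environment variable, it can be set for the game process alone. A small sketch, reusing the wine command line from comment #9 as a placeholder; the helper names are invented here, and the graphs still depend on the kernel 3.15 support mentioned above:

```python
import os
import subprocess


def hud_environment(graphs=("VRAM-usage", "GTT-usage"), base_env=None):
    """Return a copy of the environment with GALLIUM_HUD set to the given graphs."""
    env = dict(os.environ if base_env is None else base_env)
    env["GALLIUM_HUD"] = ",".join(graphs)
    return env


def run_with_hud(command):
    """Run command with the HUD enabled and return its exit status."""
    return subprocess.call(command, env=hud_environment())


if __name__ == "__main__":
    # Placeholder command line taken from comment #9; adjust to the local install.
    game = ["/usr/bin/wine", "/opt/wine/World of Warcraft/Wow.exe", "-opengl"]
    print(hud_environment()["GALLIUM_HUD"])
    # run_with_hud(game)  # uncomment to actually launch the game
```

Copying the environment rather than modifying os.environ keeps the HUD out of any other program started from the same script.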
--- Comment #30 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #29)
> With kernel 3.15, you can watch GPU memory usage by setting: GALLIUM_HUD=VRAM-usage,GTT-usage
Is this support sufficiently non-invasive to be backported to 3.14-stable?
--- Comment #31 from Marek Olšák maraeo@gmail.com --- It's not a bug fix, so I doubt it would be accepted.
--- Comment #32 from Chris Rankin rankincj@googlemail.com --- (In reply to comment #31)
> It's not a bug fix, so I doubt it would be accepted.
Perhaps not, but possibly worth asking Mr Greg KH if he'd be prepared to consider it anyway?
--- Comment #33 from Chris Rankin rankincj@googlemail.com --- Created attachment 97325 --> https://bugs.freedesktop.org/attachment.cgi?id=97325&action=edit Xorg.0.log showing errors when exiting WoW
One of the other errors that I've come to associate (rightly or wrongly) with the OOM problem is that it can take a long time to get keyboard/mouse control back after exiting WoW.
This is the Xorg.0.log file from an instance where I didn't get keyboard/mouse control back at all, and Xorg just chewed up 100% of one CPU instead.
--- Comment #34 from Michel Dänzer michel@daenzer.net --- (In reply to comment #33)
> This is the Xorg.0.log file from an instance where I didn't get keyboard/mouse control back at all, and Xorg just chewed up 100% of one CPU instead.
DRICloseScreen is a DRI1 function; I suspect the backtraces in the Xorg log file aren't reliable. It would be interesting to see where the Xorg process is spinning, e.g. by attaching gdb to it and getting a couple of backtraces from it.
(In reply to comment #32)
> Perhaps not, but possibly worth asking Mr Greg KH if he'd be prepared to consider it anyway?
Sure, feel free to ask him. :)
Anyway, any GPU resource leaks should be accompanied by 'normal' memory leaks, so valgrind should be the proper tool for the job.
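For the gdb backtraces suggested earlier in this comment, one option that needs no interactive session is gdb's batch mode. A hedged sketch: it assumes gdb is installed and that the caller is allowed to attach to the Xorg process (typically root); the sample count and delay are arbitrary, and the function names are made up for illustration:

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb command that attaches to pid, prints all thread backtraces, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_backtraces(pid, count=3, delay=1.0):
    """Collect several backtraces; a repeated top frame suggests a busy loop."""
    samples = []
    for _ in range(count):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        samples.append(result.stdout)
        time.sleep(delay)
    return samples


print(" ".join(gdb_backtrace_command(1234)))  # 1234 is a placeholder PID
```

Running this over ssh from another machine avoids depending on the keyboard and mouse of the hung X session.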
--- Comment #35 from Chris Rankin rankincj@googlemail.com --- Created attachment 97584 --> https://bugs.freedesktop.org/attachment.cgi?id=97584&action=edit Xorg backtrace when WoW fails to exit
The problem with Xorg happened again, so I logged in via another machine and extracted a backtrace. And it appears to be spinning uselessly here:
0x00000000005732b4 in DRI2DrawableGone (p=0x1977780, id=1117838198) at dri2.c:382
382 xorg_list_for_each_entry_safe(ref, next, &pPriv->reference_list, link) {
--- Comment #36 from Michel Dänzer michel@daenzer.net --- (In reply to comment #35)
> [...] it appears to be spinning uselessly here:
> 0x00000000005732b4 in DRI2DrawableGone (p=0x1977780, id=1117838198) at dri2.c:382
> 382 xorg_list_for_each_entry_safe(ref, next, &pPriv->reference_list, link) {
Weird. Anyway, that doesn't seem directly related to memory leaks in r600g and should be tracked separately.
--- Comment #37 from Chris Rankin rankincj@googlemail.com --- Created attachment 98185 --> https://bugs.freedesktop.org/attachment.cgi?id=98185&action=edit dmesg output with 3.14.2
Drat, I had hoped that this issue had been fixed.
GitLab Migration User gitlab-migration@fdo.invalid changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |MOVED
--- Comment #38 from GitLab Migration User gitlab-migration@fdo.invalid --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/491.