https://bugs.freedesktop.org/show_bug.cgi?id=45018
Bug #: 45018
Summary: [bisected] rendering regression since added support for virtual address space on cayman v11
Classification: Unclassified
Product: Mesa
Version: git
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: alexandre.f.demers@gmail.com
Created attachment 55888 --> https://bugs.freedesktop.org/attachment.cgi?id=55888 Good rendering
When testing RenderFeatTest.bin64, the shadows on test07 are not rendered correctly anymore. Bisecting identified the following commit as culprit:
bb1f0cf3508630a9a93512c79badf8c493c46743 is the first bad commit
commit bb1f0cf3508630a9a93512c79badf8c493c46743
Author: Jerome Glisse jglisse@redhat.com
Date: Fri Dec 2 10:20:29 2011 -0500

    r600g: add support for virtual address space on cayman v11
I'll be attaching pictures to show the regression.
--- Comment #1 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-20 22:03:32 UTC --- By the way, I'm using latest drm and kernel from git. I have pcie_gen2 enabled. I'm running Ubuntu Oneiric with a HD6950.
--- Comment #2 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-20 22:04:48 PST --- Created attachment 55889 --> https://bugs.freedesktop.org/attachment.cgi?id=55889 Bad rendering
Projected shadows are not rendered correctly anymore.
--- Comment #3 from Alex Deucher agd5f@yahoo.com 2012-01-21 06:48:47 PST --- Please attach your xorg log and dmesg output.
--- Comment #4 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-21 07:06:26 PST --- Created attachment 55912 --> https://bugs.freedesktop.org/attachment.cgi?id=55912 dmesg with bad rendering after running the app
--- Comment #5 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-21 07:06:59 PST --- Created attachment 55913 --> https://bugs.freedesktop.org/attachment.cgi?id=55913 xorg.log with bad rendering
--- Comment #6 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-21 07:07:31 PST --- Should I add the logs from the good rendering?
Michel Dänzer michel@daenzer.net changed:
What              |Removed     |Added
----------------------------------------------------------------------------
Attachment #55912 |text/x-log  |text/plain
mime type         |            |
Michel Dänzer michel@daenzer.net changed:
What              |Removed     |Added
----------------------------------------------------------------------------
Attachment #55913 |text/x-log  |text/plain
mime type         |            |
Alexandre Demers alexandre.f.demers@gmail.com changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |FIXED
--- Comment #7 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-23 22:20:21 PST --- One of the latest commits fixed the issue. Many commits were related to r600g, some were specific to cayman, one of them must have fixed it. Closing.
Alexandre Demers alexandre.f.demers@gmail.com changed:
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |RESOLVED |REOPENED
Resolution |FIXED    |
--- Comment #8 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-25 10:01:51 PST --- I must have mixed something when testing. It is still there with latest git.
--- Comment #9 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-27 21:59:44 PST --- Here is why I thought the bug was fixed: for another reason, I booted with a 3.2 kernel instead of 3.3-rc1. The bug is not visible under kernel 3.2, but it is under 3.3-rc1 since the bisected commit. I will try with a 3.3-rc2 kernel once it is available.
--- Comment #10 from Michel Dänzer michel@daenzer.net 2012-01-28 04:52:09 PST --- (In reply to comment #9)
The bugs is not visible under kernel 3.2, [...]
3.2 lacks Radeon virtual address space support.
I will try with a 3.3-rc2 kernel once it will be available.
That's unlikely to make a difference, this is likely a userspace bug.
--- Comment #11 from Alexandre Demers alexandre.f.demers@gmail.com 2012-01-31 16:35:53 PST --- (In reply to comment #10)
(In reply to comment #9)
The bugs is not visible under kernel 3.2, [...]
3.2 lacks Radeon virtual address space support.
I will try with a 3.3-rc2 kernel once it will be available.
That's unlikely to make a difference, this is likely a userspace bug.
Indeed and I can now confirm it. 3.3-rc2 doesn't change the problem.
Jerome Glisse glisse@freedesktop.org changed:
What    |Removed |Added
----------------------------------------------------------------------------
Version |git     |8.0
--- Comment #12 from Jerome Glisse glisse@freedesktop.org 2012-02-03 15:10:11 PST --- Can you please record an apitrace of the affected test. Thank you
--- Comment #13 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-03 19:26:16 PST --- (In reply to comment #12)
Can you please record an apitrace of the affected test. Thank you
I'll try it during the weekend. I should be able to give you an apitrace by Sunday night or Monday.
--- Comment #14 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-05 17:45:20 PST --- Here I uploaded the apitrace: http://www.mediafire.com/?mnlmwe6x4j305zm
It was too big to be posted here, so it's available on mediafire.
--- Comment #15 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-14 22:44:50 UTC --- I'm now running kernel 3.3-rc3 and some applications are completely freezing my system. I haven't seen anything special in the various logs. The only errors available are about a conflict in addresses for bo, but they are not related to the applications that are crashing my system. Renderfeattest.bin64 and sanctuary from unigine are examples of those apps. Gnome-shell is also resetting, freezing or crashing from time to time (the problem could come from X when crashing).
--- Comment #16 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-15 15:54:48 PST --- With yesterday's gits (mesa, drm, ddx), Gnome-shell is now freezing right after I log in. From time to time, I get a "GPU hanged for X msec" message and then it resets. I'll try to bisect.
--- Comment #18 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-16 16:31:44 UTC --- Created attachment 57178 --> https://bugs.freedesktop.org/attachment.cgi?id=57178 dmesg with bo conflict
Latest dmesg with the bo conflicts.
--- Comment #17 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-16 16:19:08 PST --- I was not able to find the root of the problem. However, I have many messages telling me the following: radeon 0000:01:00.0: bo ffff880214144000 va 0x02400000 conflicts with (bo ffff880213ffcc00 0x02340000 0x03340000)
And at some point: gnome-shell segfault at 14 ip 00007f20b053a82a sp 00007fff23165000 error 4 in r600_dri.so[7f20b0418000+405000]
I'll continue investigating the bug, but each time I go back with kernel 3.2 (no virtual address space), it works like a charm.
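For readers following the dmesg line above: a conflict of this kind means the address range requested for one buffer object intersects a range that is already mapped for another. Below is a minimal sketch of such a half-open interval check. The name `va_ranges_overlap` is hypothetical and is not the kernel's function; the existing range is the one quoted in the message, and the start address used in the usage note is only an illustrative value inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the radeon kernel code): returns true when the
 * half-open ranges [a_start, a_end) and [b_start, b_end) share any address.
 */
static bool va_ranges_overlap(uint64_t a_start, uint64_t a_end,
                              uint64_t b_start, uint64_t b_end)
{
    /* Two ranges intersect when each one starts before the other ends. */
    return a_start < b_end && b_start < a_end;
}
```

With the existing mapping at 0x02340000..0x03340000, a new mapping starting at 0x02400000 would be flagged, while one starting exactly at 0x03340000 would not, since the end address is treated as exclusive in this sketch.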
--- Comment #19 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-18 08:39:32 PST --- I've been able to get a log in dmesg when the GPU locked. Just after relaunching X and Gnome-Shell, I was able to reproduce a lock/crash I'm experiencing from time to time without any log to post. So I took a picture of it. I'm attaching all that just now.
Alexandre Demers alexandre.f.demers@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Version |8.0     |git
--- Comment #20 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-18 08:40:56 PST --- Created attachment 57234 --> https://bugs.freedesktop.org/attachment.cgi?id=57234 GPU lock
GPU lock and reset
--- Comment #21 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-18 08:42:31 PST --- Created attachment 57235 --> https://bugs.freedesktop.org/attachment.cgi?id=57235 Kernel crash
Picture of the kernel crash
--- Comment #22 from Harald Judt h.judt@gmx.at 2012-02-18 09:37:28 PST --- Created attachment 57237 --> https://bugs.freedesktop.org/attachment.cgi?id=57237 radeon_cs_update_pages.jpg
I experience a very similar lockup problem. It always happens when rotating the desktop cube (compiz).
It seems the rest of the system still works (playing music via mpd etc.), but I can't switch VT. However, I took a picture of the output (see attached jpg).
--- Comment #23 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-19 13:39:42 UTC --- Great to know I'm not the only one with this problem. By the way, it is still there with kernel 3.3-rc4 and latest gits.
--- Comment #24 from Jerome Glisse glisse@freedesktop.org 2012-02-21 09:44:50 PST --- I pushed a mesa fix for the bo allocation issue. If you enable 2d tiling properly you shouldn't have lockups anymore. There is also a kernel patch to fix a kernel issue after a gpu lockup.
http://lists.freedesktop.org/archives/dri-devel/2012-February/019293.html
To properly enable 2d tiling you need libdrm from git and ddx from git and add:
Option "ColorTiling2D" "true"
to the gpu device section of your xorg configuration.
--- Comment #25 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-21 20:25:39 PST --- Does this imply that when not using 2d tiling it shouldn't crash or lock anymore or is it specific to 2d tiling usage?
(In reply to comment #24)
I pushed a mesa fix for bo allocation issue. If you enable 2d tiling properly you shouldn't have lockup anymore. There is also a kernel patch to fix kernel issue after gpu lockup.
http://lists.freedesktop.org/archives/dri-devel/2012-February/019293.html
To properly enabled 2d tiling you need libdrm from git and ddx from git and add:
Option "ColorTiling2D" "true"
To your gpu device section of xorg configuration
--- Comment #26 from Alex Deucher agd5f@yahoo.com 2012-02-22 06:32:43 PST --- (In reply to comment #25)
Does this imply that when not using 2d tiling it shouldn't crash or lock anymore or is it specific to 2d tiling usage?
It shouldn't lock up, but if it does (for any reason, not necessarily VM related), the kernel patch should allow the kernel to recover more gracefully if the reset fails.
--- Comment #27 from Jerome Glisse glisse@freedesktop.org 2012-02-22 09:50:14 PST --- (In reply to comment #26)
(In reply to comment #25)
Does this imply that when not using 2d tiling it shouldn't crash or lock anymore or is it specific to 2d tiling usage?
It shouldn't lock up, but if it does (for any reason, not necessarily VM related), the kernel patch should allow the kernel recover more gracefully if the reset fails.
Well, actually the 2D tiling path fixes a bunch of issues that led to GPU lockups. So with 2D tiling enabled there is less chance of a lockup.
--- Comment #28 from Harald Judt h.judt@gmx.at 2012-02-25 16:03:32 PST --- Ok. Now with latest git, there is an improvement: No kernel crash anymore, but compiz still hangs when rotating the cube and the X screen freezes. Gladly, I can switch VT and pkill -9 and restart compiz, and X is usable again. At least no need to reboot!
The following line appeared in dmesg when the freeze happened: radeon 0000:01:00.0: offset 0x300000 is in reserved area 0x800000
--- Comment #29 from Alexandre Demers alexandre.f.demers@gmail.com 2012-02-25 17:31:17 PST --- Created attachment 57645 --> https://bugs.freedesktop.org/attachment.cgi?id=57645 dmesg after lock with latest patch
I'm not using 2D tiling yet; I'll try it soon. Meanwhile, I've installed kernel 3.3-rc5 with latest gits (mesa, drm and xorg driver) and it still locks up. I'm attaching the output found in dmesg.
--- Comment #30 from Harald Judt h.judt@gmx.at 2012-02-27 17:24:12 PST --- Created attachment 57740 --> https://bugs.freedesktop.org/attachment.cgi?id=57740 screenshot showing garbled fonts in blender-2.62
Besides the lockups and the rendering regressions already mentioned, the commit bb1f0cf3508630a9a93512c79badf8c493c46743 "r600g: add support for virtual address space on cayman v11" makes the font garbled in blender-2.62.
git bisect start
# bad: [bf4fedcef3e345f5117232d58bd9000c2441de74] r600g: use u_default_transfer_flush_region for all resource types
git bisect bad bf4fedcef3e345f5117232d58bd9000c2441de74
# good: [f9c9933f9c7f72f12be27ccda98c965c75f08a12] mesa: Bump version number to 8.0 (final)
git bisect good f9c9933f9c7f72f12be27ccda98c965c75f08a12
# good: [fe77fd3983ba3da16ec53c58a790c381b07387ce] docs: Add 8.0.1 release notes
git bisect good fe77fd3983ba3da16ec53c58a790c381b07387ce
# good: [6fe42b603d0ec9e13a8b7d6c46c6d89da3a6a614] mesa: Include glx tests Makefile.in in tarball
git bisect good 6fe42b603d0ec9e13a8b7d6c46c6d89da3a6a614
# skip: [ac3a765589a881c56f351514d6436760edd4a291] r300g: set minimum point size to 1.0 for non-sprite non-aa points
git bisect skip ac3a765589a881c56f351514d6436760edd4a291
# bad: [8bfadc802f6c3c85de4c429b2a87d0bdb1705028] st/vdpau: implement uploads to interlaced video buffers
git bisect bad 8bfadc802f6c3c85de4c429b2a87d0bdb1705028
# bad: [c45771905f237d9285465dfce955440582ee51e5] swrast: use stencil packing function in s_stencil.c
git bisect bad c45771905f237d9285465dfce955440582ee51e5
# bad: [5a0f395bcf70e524492e766a07cf0b816b42a20d] glsl: Fix leak of LinkedTransformFeedback.Varyings.
git bisect bad 5a0f395bcf70e524492e766a07cf0b816b42a20d
# bad: [39491d1d31d9f03437816fbb4f2872761ae1157c] r600g: vertex id support.
git bisect bad 39491d1d31d9f03437816fbb4f2872761ae1157c
# good: [6950a4faf650fe119ee97aa18b006eed099038be] mesa: Throw the required error for glCopyTex{Sub,}Image from multisample FBO.
git bisect good 6950a4faf650fe119ee97aa18b006eed099038be
# good: [27915708ed4519cc5606e81fb789e8427501f355] docs: new page describing how to build, install VMware SVGA3D guest driver
git bisect good 27915708ed4519cc5606e81fb789e8427501f355
# bad: [bfcffd4d721d87bb6287980a09e0296ceed0bba3] r600g: fix r600 f2i to be trans only emitted.
git bisect bad bfcffd4d721d87bb6287980a09e0296ceed0bba3
# good: [6c2c2c5a07c81a15a89519a8a84ef7c69698903b] scons: Fix libGL.so build.
git bisect good 6c2c2c5a07c81a15a89519a8a84ef7c69698903b
# bad: [5250bd00c00ac8470320f4fae1d74425132f2083] r600g: add missing r32 uint/sint fbo formats.
git bisect bad 5250bd00c00ac8470320f4fae1d74425132f2083
# bad: [bb1f0cf3508630a9a93512c79badf8c493c46743] r600g: add support for virtual address space on cayman v11
git bisect bad bb1f0cf3508630a9a93512c79badf8c493c46743
--- Comment #31 from Harald Judt h.judt@gmx.at 2012-03-02 07:10:27 PST --- Using current git of kernel, xorg-server, xf86-video-ati and mesa, the screen still freezes every once in a while and dmesg shows these messages:
radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
--- Comment #32 from Harald Judt h.judt@gmx.at 2012-03-04 04:07:31 PST --- And today an unrecoverable error with 3.3-rc6, forcing a reboot of the machine:
radeon 0000:01:00.0: GPU lockup CP stall for more than 10060msec
GPU lockup (waiting for 0x00026292 last fence id 0x0002628F)
radeon 0000:01:00.0: GPU softreset
radeon 0000:01:00.0: GRBM_STATUS=0xF5702028
radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000004
radeon 0000:01:00.0: GRBM_STATUS_SE1=0xFC000004
radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x09E3D64C
radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00071001
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
radeon 0000:01:00.0: GRBM_STATUS=0x00003828
radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007
radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007
radeon 0000:01:00.0: SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: GPU reset succeed
--- Comment #33 from Harald Judt h.judt@gmx.at 2012-03-04 05:38:29 PST --- Ok, the last gpu lockup has nothing to do with this; it is specific to an application and occurs on kernel-3.2 too.
--- Comment #36 from Alexandre Demers alexandre.f.demers@gmail.com 2012-03-04 16:06:56 UTC --- (In reply to comment #35)
(In reply to comment #34)
Is there a way to disable radeon virtual addressing when loading the kernel?
You can disable it in mesa. Just set ws->info.r600_virtual_address to FALSE in do_winsys_init() in radeon_drm_winsys.c.
Hi Alex.
I want to point out this is not an option for the average user nor is it an option to turn off virtual address "on the fly". The average user will not recompile code; only if we are lucky will he use a flag to disable or enable an option, as long as it is easily accessible. You are taking the point of view of a dev or, at best, a tester willing to go beyond testing the code as it is.
I know I can run a 3.2 kernel, I know I can compile a different version or bisect or submit patches, I know I can switch from Gnome Shell to another window manager without fancy effects or that I can disable options if I follow your advice. But this is not accessible to the average user.
Please consider another option for average users, who will soon be running the compiled code.
Meanwhile, I'm still completely dedicated to solving this issue if I can do anything else. I'm sure other people following this bug are also willing to go further to help you fix this issue. Can we provide you with something more? apitrace, register states?
--- Comment #34 from Alexandre Demers alexandre.f.demers@gmail.com 2012-03-04 13:46:48 PST --- Tested with latest kernel, mesa and drm gits and the problem is still there.
Is there a way to disable radeon virtual addressing when loading the kernel? I'm sorry to ask, but from where I stand, this regression is preventing me from having a reliable experience with my computer (freezes, crashes and locks) and I was not having this problem prior to this commit series (mesa/kernel). I think it should be ironed out and disabled until things are fixed (as Intel RC6 had been until recently). We are getting near a new kernel release (3.3) where this will be enabled by default, so we can expect this problem to be reported a lot more once a new stable kernel is available.
--- Comment #35 from Alex Deucher agd5f@yahoo.com 2012-03-04 14:30:32 PST --- (In reply to comment #34)
Is there a way to disable radeon virtual addressing when loading the kernel?
You can disable it in mesa. Just set ws->info.r600_virtual_address to FALSE in do_winsys_init() in radeon_drm_winsys.c.
--- Comment #37 from Alex Deucher agd5f@yahoo.com 2012-03-04 16:24:02 PST --- (In reply to comment #36)
I know I can run a 3.2 kernel, I know I can compile a different version or bisect or submit patches, I know I can switch from Gnome Shell to another window manager without fancy effects or that I can disable options if I follow your advise. But this is not accessible to the average user.
You can run an older mesa release as well. It's probably better as a mesa knob than a kernel knob.
Please, consider another option for the average users that will use compiled code available soon.
We can add a mesa option if we aren't able to get this fixed in time for the next mesa release, but for now I'd prefer to leave it enabled otherwise most users will just disable it and not test the current code which won't help in getting it fixed.
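To make the "mesa knob" idea in this exchange concrete, here is a rough sketch of how a runtime switch around `r600_virtual_address` might look. Everything in it is an assumption for illustration: the structs are simplified stand-ins rather than Mesa's real winsys definitions, `init_virtual_address` is a hypothetical helper, and the override value stands for a hypothetical environment variable that Mesa does not necessarily provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the winsys structures; not Mesa's real layout. */
struct winsys_info {
    bool r600_virtual_address;
};

struct winsys {
    struct winsys_info info;
};

/*
 * Hypothetical helper: decide whether to use virtual addressing.
 * kernel_has_va - whether the kernel reported VA support
 * disable_env   - value of a hypothetical override variable, NULL if unset
 */
static void init_virtual_address(struct winsys *ws, bool kernel_has_va,
                                 const char *disable_env)
{
    /* Only the exact string "1" counts as a request to disable VA. */
    bool user_disabled = disable_env != NULL && strcmp(disable_env, "1") == 0;

    ws->info.r600_virtual_address = kernel_has_va && !user_disabled;
}
```

A caller could pass the result of `getenv()` for whatever variable name is chosen. The default stays enabled, which matches the stated preference to keep the new code path tested unless a user explicitly opts out.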
--- Comment #38 from Alexandre Demers alexandre.f.demers@gmail.com 2012-03-05 20:40:48 UTC --- Some news: today, I updated xserver and it seems I'm now able to boot under Gnome-Shell correctly.
However, launching RenderFeatTest.bin64 still hangs exactly where it has been hanging for some time now and freezes my window manager. At least, it seems one of the problems was related to xserver.
I hope I'll be able to find something new in the logs.
--- Comment #39 from Alexandre Demers alexandre.f.demers@gmail.com 2012-03-06 22:25:26 PST --- (In reply to comment #38)
Some news: today, I updated xserver and it seems I'm now able to boot under Gnome-Shell correctly.
However, launching RenderFeatTest.bin64 still hangs exactly where it has been hanging for some time now and freeze my window manager. At least, it seems one of the problem was related to xserver.
I'll hope I'll be able to find something new in the logs.
After one testing day, it happened again. It's just not happening at start as it was doing, but more randomly. Too bad.
--- Comment #40 from Harald Judt h.judt@gmx.at 2012-03-20 11:50:26 PDT --- (In reply to comment #35)
(In reply to comment #34)
Is there a way to disable radeon virtual addressing when loading the kernel?
You can disable it in mesa. Just set ws->info.r600_virtual_address to FALSE in do_winsys_init() in radeon_drm_winsys.c.
Thanks, as expected this also cures the garbled fonts in blender.
(In reply to comment #37)
We can add a mesa option if we aren't able to get this fixed in time for the next mesa release, but for now I'd prefer to leave it enabled otherwise most users will just disable it and not test the current code which won't help in getting it fixed.
We're already 2 users affected and very willing to test and help ;-) What information could we provide to further improve the situation?
--- Comment #41 from Alex Deucher agd5f@yahoo.com 2012-03-20 12:11:26 PDT --- Are you still getting any messages like the following in your dmesg with the latest mesa from git?
radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
I pushed a patch yesterday that fixed up a missing va setup, although I don't think the driver should hit that path with cayman and vm support.
--- Comment #42 from Alexandre Demers alexandre.f.demers@gmail.com 2012-03-21 07:35:36 PDT --- (In reply to comment #41)
Are you still getting any messages like the following in your dmesg with the latest mesa from git?
radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000 radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
I pushed a patch yesterday that fixed up a missing va setup, although I don't think the driver should hit that path with cayman and vm support.
Upgraded yesterday, using latest 3.3.0 kernel with latest drm and mesa. For now, it seems I'm not seeing it. However, I'll be testing it more in the next few days, I'll be mostly doing more than just using the desktop (I'll run some demos and games that were triggering the error). I'll keep you updated.
--- Comment #43 from Alexandre Demers alexandre.f.demers@gmail.com 2012-03-21 21:11:41 PDT --- Created attachment 58846 --> https://bugs.freedesktop.org/attachment.cgi?id=58846 Different error message
Running RenderFeatTest.bin64 with yesterday's gits still crashes at the same spot, but doesn't produce the "radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000" error.
GPU locks up, as you can see in the dmesg output. Once hung, I have to reset my computer to be able to use the radeon driver again, otherwise I'm running under software rendering (softpipe).
--- Comment #44 from Alexandre Demers alexandre.f.demers@gmail.com 2012-04-03 19:24:36 PDT --- Just to let you know I've moved from Ubuntu to Arch. This week, kernel 3.0 came in and the problem is obviously appearing as expected. Still locks up, still hangs, still fails to resume:
[ 1454.142346] radeon 0000:01:00.0: offset 0x100000 is in reserved area 0x800000
[ 1454.142955] [drm:radeon_cs_parser_relocs] *ERROR* gem object lookup failed 0x10
[ 1454.142959] [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -2!
[ 1454.155602] [drm:radeon_cs_parser_relocs] *ERROR* gem object lookup failed 0x10
[ 1454.155606] [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -2!
[ 1465.203216] radeon 0000:01:00.0: GPU lockup CP stall for more than 10030msec
[ 1465.203220] GPU lockup (waiting for 0x0001E557 last fence id 0x0001E554)
[ 1465.418088] radeon 0000:01:00.0: GPU softreset
[ 1465.418092] radeon 0000:01:00.0: GRBM_STATUS=0xF5700828
[ 1465.418094] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
[ 1465.418096] radeon 0000:01:00.0: GRBM_STATUS_SE1=0xFC000001
[ 1465.418098] radeon 0000:01:00.0: SRBM_STATUS=0x20020FC0
[ 1465.418101] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x000779DD
[ 1465.418103] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00072001
[ 1465.418105] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000005B9
[ 1465.418108] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020A400C
[ 1465.579826] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 1465.579828] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
[ 1465.579936] radeon 0000:01:00.0: GRBM_STATUS=0x80103828
[ 1465.579938] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x04000007
[ 1465.579940] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x04000007
[ 1465.579941] radeon 0000:01:00.0: SRBM_STATUS=0x20020FC0
[ 1465.580943] radeon 0000:01:00.0: GPU reset succeed
[ 1465.771511] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 1465.942884] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 1465.944796] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 1465.944872] radeon 0000:01:00.0: WB enabled
[ 1465.944874] [drm] fence driver on ring 0 use gpu addr 0x40000c00 and cpu addr 0xffff88021ea01c00
[ 1465.944876] [drm] fence driver on ring 1 use gpu addr 0x40000c04 and cpu addr 0xffff88021ea01c04
[ 1465.944878] [drm] fence driver on ring 2 use gpu addr 0x40000c08 and cpu addr 0xffff88021ea01c08
[ 1466.140829] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8500)=0xCAFEDEAD)
[ 1466.140831] [drm:cayman_resume] *ERROR* cayman startup failed on resume
I'll be testing kernel 3.4-rc1 soon and I'll play with 2D tiling.
--- Comment #45 from Alexandre Demers alexandre.f.demers@gmail.com 2012-04-22 22:56:08 PDT --- I'm now working with a 3.4-rc4 kernel. I activated ColorTiling2D. However, it didn't change anything.
On the other hand, if you have the hardware at hand, I want to let you know that since the problem appeared (after kernel 3.2 and the culprit commit in mesa), you should be able to recreate it by running the piglit tests (r600.tests). I'm able to recreate it each time if I also enable GLSL130. Doing the same with a 3.2 kernel will not produce the crash, as expected.
--- Comment #47 from Michel Dänzer michel@daenzer.net 2012-04-25 07:11:51 UTC --- (In reply to comment #46)
Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-April/022037.html
I was wondering about that as well, but I'm afraid it can't, as Christian pointed out.
(In reply to comment #44)
[ 1454.142346] radeon 0000:01:00.0: offset 0x100000 is in reserved area 0x800000
The offset 0x100000 comes from userspace, so it could still be a pure userspace problem. Maybe it runs out of VM space and into the reserved area or something like that.
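The "is in reserved area" message reads like a simple lower-bound test on the address that userspace asks for. As a hedged sketch of that reading (the names `VA_RESERVED_SIZE` and `va_offset_allowed` are hypothetical, and the only value taken from the thread is the 0x800000 printed in the log):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of the reserved region at the bottom of the address space,
 * taken from the value printed in the dmesg line (assumed meaning). */
#define VA_RESERVED_SIZE 0x800000u

/* Hypothetical check: an offset is acceptable only if it lies at or
 * above the end of the reserved region. */
static bool va_offset_allowed(uint64_t offset)
{
    return offset >= VA_RESERVED_SIZE;
}
```

Under this assumption, the offsets 0x100000 and 0x200000 seen in the logs would both be rejected, which is consistent with the idea that the userspace allocator handed out an address it should never have produced.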
--- Comment #46 from Alex Deucher agd5f@yahoo.com 2012-04-25 06:41:45 PDT --- Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-April/022037.html
--- Comment #48 from Alexandre Demers alexandre.f.demers@gmail.com 2012-04-25 08:21:12 PDT --- (In reply to comment #47)
(In reply to comment #46)
Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-April/022037.html
I was wondering about that as well, but I'm afraid it can't, as Christian pointed out.
(In reply to comment #44)
[ 1454.142346] radeon 0000:01:00.0: offset 0x100000 is in reserved area 0x800000
The offset 0x100000 comes from userspace, so it could still be a pure userspace problem. Maybe it runs out of VM space and into the reserved area or something like that.
I'll try it ASAP, just in case. However, it probably won't be until Friday or the weekend.
--- Comment #49 from Alex Deucher agd5f@yahoo.com 2012-04-25 08:59:57 PDT --- Created attachment 60582 --> https://bugs.freedesktop.org/attachment.cgi?id=60582 possible fix
Does this patch help?
--- Comment #50 from Michel Dänzer michel@daenzer.net 2012-04-25 09:16:19 PDT --- Comment on attachment 60582 --> https://bugs.freedesktop.org/attachment.cgi?id=60582 possible fix
Review of attachment 60582: --> (https://bugs.freedesktop.org/page.cgi?id=splinter.html&bug=45018&att...) -----------------------------------------------------------------
::: src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ +221,4 @@
        pipe_mutex_unlock(mgr->bo_va_mutex);
        return offset;
    }
    if (waste < hole->size && (hole->size - waste) >= size) {
AFAICT the 'if (offset >= (hole->offset + hole->size))' test further up is a roundabout way of saying 'if (waste >= hole->size)', so I'm afraid this won't have any effect.
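The equivalence claimed here follows from how the aligned offset is formed: if `offset = hole->offset + waste`, then `offset >= hole->offset + hole->size` reduces to `waste >= hole->size`. The sketch below illustrates that with a simplified hole structure. The type and function names are hypothetical and the code is not the Mesa allocator, only a model of the two tests plus a fit check in the spirit of the proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a free range of GPU virtual address space. */
struct va_hole {
    uint64_t offset; /* start of the free range */
    uint64_t size;   /* length of the free range in bytes */
};

/* Padding needed to round hole->offset up to `alignment` (0 = unaligned). */
static uint64_t va_waste(const struct va_hole *hole, uint64_t alignment)
{
    uint64_t rem = alignment ? hole->offset % alignment : 0;
    return rem ? alignment - rem : 0;
}

/* Early-out written in terms of the aligned offset. */
static bool skip_by_offset(const struct va_hole *hole, uint64_t alignment)
{
    uint64_t offset = hole->offset + va_waste(hole, alignment);
    return offset >= hole->offset + hole->size;
}

/* The same condition written in terms of waste. */
static bool skip_by_waste(const struct va_hole *hole, uint64_t alignment)
{
    return va_waste(hole, alignment) >= hole->size;
}

/* True when `size` bytes fit after alignment; stores the start in *out. */
static bool va_hole_fits(const struct va_hole *hole, uint64_t size,
                         uint64_t alignment, uint64_t *out)
{
    uint64_t waste = va_waste(hole, alignment);

    if (waste >= hole->size || hole->size - waste < size)
        return false;
    *out = hole->offset + waste;
    return true;
}
```

Because both early-out forms agree for every hole, adding `waste < hole->size` to a later condition cannot change which holes are accepted once the early-out has already run, which is the point made in the review.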
--- Comment #51 from Michel Dänzer michel@daenzer.net 2012-04-26 11:50:28 PDT --- Does the Mesa patch series at http://lists.freedesktop.org/archives/mesa-dev/2012-April/021211.html help?
Beware that it's only lightly tested, and I'll be away now for a long weekend. If there's a problem with the patches, I'll look into it next week.
--- Comment #52 from Alexandre Demers alexandre.f.demers@gmail.com 2012-04-26 14:19:41 PDT --- (In reply to comment #51)
Does the Mesa patch series at http://lists.freedesktop.org/archives/mesa-dev/2012-April/021211.html help?
Beware that it's only lightly tested, and I'll be away now for a long weekend. If there's a problem with the patches, I'll look into it next week.
No, it doesn't. But it's not worse either.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
Alexandre Demers alexandre.f.demers@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Severity|normal  |major
--- Comment #53 from Alexandre Demers alexandre.f.demers@gmail.com 2012-04-26 19:41:37 PDT --- (In reply to comment #46)
Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-April/022037.html
No, it doesn't.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #54 from Alexandre Demers alexandre.f.demers@gmail.com 2012-05-04 21:21:18 PDT --- On latest git (3cd7bee48f7caf7850ea64d40f43875d4c975507), in src/gallium/drivers/r600/r600_hw_context.c, on line 194, shouldn't it be:
- int offset
+ unsigned offset
Also, at line 1259, I'm not quite sure why it is shifted by 2. Most of the time, offset is usually shifted by 8. Just looking through the code to see if something could have been missed...
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #55 from Michel Dänzer michel@daenzer.net 2012-05-07 03:07:07 PDT --- (In reply to comment #54)
On latest git (3cd7bee48f7caf7850ea64d40f43875d4c975507), in src/gallium/drivers/r600/r600_hw_context.c, on line 194, shouldn't it be:
- int offset
+ unsigned offset
That might be slightly better, but it doesn't really matter. It's the offset from the start of the MMIO aperture, so it would only matter if the register aperture grew beyond 2GB, which we're almost 5 orders of magnitude short of. Very unlikely.
Also, at line 1259, I'm not quite sure why it is shifted by 2. Most of the time, offset is usually shifted by 8.
It's just converting offset from units of 32 bits to bytes.
Just looking through the code to see if something could have been missed...
Right now it would be most useful to track down why radeon_bomgr_find_va / radeon_bomgr_force_va ends up returning the offset the kernel complains about.
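As a minimal illustration of the unit conversion described in this comment, the sketch below shows why a left shift by 2 turns an offset counted in 32-bit words into a byte offset. The function name is hypothetical and not from r600g.

```c
#include <stdint.h>

/* Convert an offset counted in 32-bit words (dwords) to bytes.
 * Each dword is 4 bytes, and shifting left by 2 multiplies by 4. */
static uint32_t dwords_to_bytes(uint32_t offset_dw)
{
    return offset_dw << 2;
}
```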
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #56 from Alexandre Demers alexandre.f.demers@gmail.com 2012-05-30 11:18:28 PDT --- (In reply to comment #55)
(In reply to comment #54)
On latest git (3cd7bee48f7caf7850ea64d40f43875d4c975507), in src/gallium/drivers/r600/r600_hw_context.c, on line 194, shouldn't it be:
- int offset
+ unsigned offset
That might be slightly better, but it doesn't really matter. It's the offset from the start of the MMIO aperture, so it would only matter if the register aperture grew beyond 2GB, which we're almost 5 orders of magnitude short of. Very unlikely.
Also, at line 1259, I'm not quite sure why it is shifted by 2. Most of the time, offset is usually shifted by 8.
It's just converting offset from units of 32 bits to bytes.
Just looking through the code to see if something could have been missed...
Right now it would be most useful to track down why radeon_bomgr_find_va / radeon_bomgr_force_va ends up returning the offset the kernel complains about.
What do you suggest? I'll be playing with kernel 3.5-rc1 when available, but I don't think that will fix it. Is there or could there be a way to track what's going on with a debug switch or something similar?
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #57 from Alexandre Demers alexandre.f.demers@gmail.com 2012-06-03 18:26:12 PDT --- Now running kernel 3.5-rc1 with latest mesa, drm, ddx and still locking the GPU. As always, easy to reproduce by running piglit r600 tests.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #58 from Alexandre Demers alexandre.f.demers@gmail.com 2012-06-05 19:18:10 UTC --- I noticed a different clue that could help track down the bug: when X doesn't completely freeze, there is a backtrace in .xsession-error. So I'm attaching both dmesg and the xsession snippet related to this crash.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #59 from Alexandre Demers alexandre.f.demers@gmail.com 2012-06-05 19:19:18 PDT --- Created attachment 62618 --> https://bugs.freedesktop.org/attachment.cgi?id=62618 dmesg related to the xsession-error file
This dmesg happened with the next attachment: xsession-error.txt
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #60 from Alexandre Demers alexandre.f.demers@gmail.com 2012-06-05 19:20:47 PDT --- Created attachment 62619 --> https://bugs.freedesktop.org/attachment.cgi?id=62619 snippet when gnome-shell is able to fall bak on its feet
Snippet from when gnome-shell is able to fall back on its feet. Should be used with the last dmesg submitted.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #61 from Christian König deathsimple@vodafone.de 2012-06-06 02:15:10 PDT --- Please also try this patch: http://lists.freedesktop.org/archives/dri-devel/2012-June/023735.html
It doesn't fix anything rendering related, but instead fixes a deadlock introduced with the vm patch. It isn't the complete solution to the problem, but it might still be an improvement.
Christian.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #62 from Alexandre Demers alexandre.f.demers@gmail.com 2012-06-06 15:37:07 PDT --- (In reply to comment #61)
Please also try this patch: http://lists.freedesktop.org/archives/dri-devel/2012-June/023735.html
It doesn't fix anything rendering related, but instead fixes a deadlock introduced with the vm patch. It isn't the complete solution to the problem, but it might still be an improvement.
Christian.
Thanks Christian. I just tested the patch and it still fails. Running piglit on r600.test hangs, kills Xorg, restarts without any 3D support and produces the following:
----- dmesg:
[ 44.640434] retire_capture_urb: 1 callbacks suppressed [ 64.193666] radeon 0000:01:00.0: bo ffff88021b1d2400 va 0x0180D000 conflict with (bo ffff880221d00400 0x0180D000 0x0180E000) [ 64.242569] radeon 0000:01:00.0: bo ffff880221d1dc00 va 0x0184E000 conflict with (bo ffff8802135ac800 0x0184E000 0x0184F000) [ 64.369362] radeon 0000:01:00.0: bo ffff880222126800 va 0x01841000 conflict with (bo ffff88021b3b4400 0x01841000 0x01842000) [ 64.832098] radeon 0000:01:00.0: bo ffff88021352dc00 va 0x01859000 conflict with (bo ffff880222c42800 0x01859000 0x0185B000) [ 65.486230] EXT4-fs (sdc2): warning: maximal mount count reached, running e2fsck is recommended [ 65.540929] EXT4-fs (sdc2): mounted filesystem with ordered data mode. Opts: (null) [ 69.016383] radeon 0000:01:00.0: bo ffff880221d1e000 va 0x0402D000 conflict with (bo ffff880221fc5000 0x0402D000 0x0402E000) [ 69.017579] radeon 0000:01:00.0: bo ffff880221d1b400 va 0x0404D000 conflict with (bo ffff880206061400 0x0404D000 0x0404E000) [ 471.209470] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [ 471.209482] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000001ee7 last fence id 0x0000000000001ee4) [ 471.708793] radeon 0000:01:00.0: GPU lockup CP stall for more than 10500msec [ 471.708803] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000001ee5) [ 471.708812] radeon 0000:01:00.0: failed to get a new IB (-35) [ 471.708818] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib ! 
[ 471.712988] radeon 0000:01:00.0: GPU softreset [ 471.712996] radeon 0000:01:00.0: GRBM_STATUS=0xB3703828 [ 471.713001] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x24000007 [ 471.713006] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x3D000007 [ 471.713012] radeon 0000:01:00.0: SRBM_STATUS=0x200206C0 [ 471.713017] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x00000000 [ 471.713023] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000 [ 471.713029] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 [ 471.713035] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x07070010 [ 471.862829] radeon 0000:01:00.0: Wait for MC idle timedout ! [ 471.862831] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B [ 471.862933] radeon 0000:01:00.0: GRBM_STATUS=0x00003828 [ 471.862934] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007 [ 471.862936] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007 [ 471.862937] radeon 0000:01:00.0: SRBM_STATUS=0x200206C0 [ 471.863938] radeon 0000:01:00.0: GPU reset succeed [ 472.044573] radeon 0000:01:00.0: Wait for MC idle timedout ! [ 472.202790] radeon 0000:01:00.0: Wait for MC idle timedout ! [ 472.204511] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). 
[ 472.204582] radeon 0000:01:00.0: WB enabled [ 472.204584] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880221964c00 [ 472.204586] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff880221964c04 [ 472.204587] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff880221964c08 [ 472.387014] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8500)=0xCAFEDEAD) [ 472.387015] [drm:cayman_resume] *ERROR* cayman startup failed on resume [ 472.406246] radeon 0000:01:00.0: ffff88021c7d3800 unpin not necessary [ 472.406260] radeon 0000:01:00.0: ffff88021c7d3c00 unpin not necessary [ 472.407518] radeon 0000:01:00.0: GPU softreset [ 472.407525] radeon 0000:01:00.0: GRBM_STATUS=0xA0003828 [ 472.407530] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007 [ 472.407536] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007 [ 472.407541] radeon 0000:01:00.0: SRBM_STATUS=0x200206C0 [ 472.407546] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x00000000 [ 472.407552] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000 [ 472.407557] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 [ 472.407562] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x07070010 [ 472.577076] radeon 0000:01:00.0: Wait for MC idle timedout ! [ 472.577080] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B [ 472.577183] radeon 0000:01:00.0: GRBM_STATUS=0x00003828 [ 472.577185] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x00000007 [ 472.577186] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x00000007 [ 472.577188] radeon 0000:01:00.0: SRBM_STATUS=0x200206C0 [ 472.578190] radeon 0000:01:00.0: GPU reset succeed [ 472.756629] radeon 0000:01:00.0: Wait for MC idle timedout ! [ 472.912577] radeon 0000:01:00.0: Wait for MC idle timedout ! [ 472.914304] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000). 
[ 472.914377] radeon 0000:01:00.0: WB enabled [ 472.914380] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880221964c00 [ 472.914382] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff880221964c04 [ 472.914383] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff880221964c08 [ 473.094478] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8500)=0xCAFEDEAD) [ 473.094480] [drm:cayman_resume] *ERROR* cayman startup failed on resume [ 539.092664] retire_capture_urb: 1 callbacks suppressed
----- xsession-errors
Tracker-Message: Setting up monitor for changes to config file:'/home/dema1701/.config/tracker/tracker-miner-fs.cfg' Tracker-Message: Setting up monitor for changes to config file:'/home/dema1701/.config/tracker/tracker-store.cfg' Starting log: File:'/home/dema1701/.local/share/tracker/tracker-miner-fs.log' ** Message: applet now removed from the notification area ** (process:1189): WARNING **: Trying to register gtype 'GMountMountFlags' as enum when in fact it is of type 'GFlags' ** (process:1189): WARNING **: Trying to register gtype 'GDriveStartFlags' as enum when in fact it is of type 'GFlags' ** (process:1189): WARNING **: Trying to register gtype 'GSocketMsgFlags' as enum when in fact it is of type 'GFlags' Tracker-Message: Setting up monitor for changes to config file:'/home/dema1701/.config/tracker/tracker-store.cfg' Starting log: File:'/home/dema1701/.local/share/tracker/tracker-store.log' radeon: Failed to allocate a buffer: radeon: size : 256 bytes radeon: alignment : 256 bytes radeon: domains : 2 EE r600_texture.c:865 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage2D radeon: Failed to allocate a buffer: radeon: size : 2560 bytes radeon: alignment : 256 bytes radeon: domains : 2 EE r600_texture.c:865 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy radeon: Failed to allocate a buffer: radeon: size : 2560 bytes radeon: alignment : 256 bytes radeon: domains : 2 EE r600_texture.c:865 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy radeon: Failed to allocate a buffer: radeon: size : 256 bytes radeon: alignment : 256 bytes radeon: domains : 2 EE r600_texture.c:865 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy Window manager warning: Failed to load theme "Ambiance": Failed to find a valid file for theme Ambiance
** Message: applet now embedded in the notification area ** Message: Stopping registered applet secret agent because GNOME Shell is running radeon: Failed to allocate a buffer: radeon: size : 256 bytes radeon: alignment : 256 bytes radeon: domains : 2 EE r600_texture.c:865 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage2D radeon: Failed to allocate a buffer: radeon: size : 2816 bytes radeon: alignment : 256 bytes radeon: domains : 2 EE r600_texture.c:865 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy Window manager warning: CurrentTime used to choose focus window; focus window may not be correct. Window manager warning: Got a request to focus the no_focus_window with a timestamp of 0. This shouldn't happen! Window manager warning: Log level 16: STACK_OP_ADD: window 0x2600002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x2600002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x2600002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x1e00002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x1e00002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x2600002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x2600002 already in stack Window manager warning: Log level 16: STACK_OP_ADD: window 0x2600002 already in stack ** Message: Active session changed ** Message: Active session changed
(gnome-settings-daemon:1120): color-plugin-WARNING **: Done switch to new account, reload devices ** Message: Active session changed ** Message: Active session changed ** Message: Active session changed gnome-session[1085]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (gnome-settings-daemon:1120): Gdk-WARNING **: gnome-settings-daemon: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (gnome-screensaver:1191): Gdk-WARNING **: gnome-screensaver: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (evolution-alarm-notify:1196): Gdk-WARNING **: evolution-alarm-notify: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (gnome-shell-calendar-server:1227): Gdk-WARNING **: gnome-shell-calendar-server: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (gnome-terminal:1406): Gdk-WARNING **: gnome-terminal: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (nautilus:1385): Gdk-WARNING **: nautilus: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. (nm-applet:1185): Gdk-WARNING **: nm-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. applet.py: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
(deja-dup-monitor:1184): GVFS-RemoteVolumeMonitor-WARNING **: Owner :1.17 of volume monitor org.gtk.Private.UDisks2VolumeMonitor disconnected from the bus; removing drives/volumes/mounts g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
Received signal:15->'Terminated' g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting. g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
Received signal:15->'Terminated' OK
(tracker-miner-fs:1183): GVFS-RemoteVolumeMonitor-WARNING **: Owner :1.17 of volume monitor org.gtk.Private.UDisks2VolumeMonitor disconnected from the bus; removing drives/volumes/mounts (tracker-miner-fs:1183): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed (tracker-miner-fs:1183): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed (tracker-miner-fs:1183): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
OK
radeon: The kernel rejected CS, see dmesg for more information. Window manager warning: Log level 16: gnome-shell: Fatal IO error 16 (Device or resource busy) on X server :0.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #63 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-10 00:21:56 PDT --- Now running latest drm-next just in case. Always the same error, but with a little something new: with regular kernel, once the GPU crashed, it stays this way. With the drm-next branch, it loops. Attaching some files in a moment.
I just started Gnome Shell, then opened a terminal window and launched piglit r600 tests.
I'm pretty sure (dmesg):
[ 66.238981] radeon 0000:01:00.0: bo ffff88020f46bc00 va 0x0183B000 conflict with (bo ffff88021b65d000 0x0183B000 0x0183C000)
[ 66.271373] radeon 0000:01:00.0: bo ffff880222cc9400 va 0x01814000 conflict with (bo ffff880221a50800 0x01814000 0x01815000)
[ 66.334540] radeon 0000:01:00.0: bo ffff880222b70000 va 0x01809000 conflict with (bo ffff8802230a9000 0x01809000 0x0180A000)
corresponds to (.xsession-error):
radeon: Failed to allocate a buffer:
radeon: size : 256 bytes
radeon: alignment : 256 bytes
radeon: domains : 2
EE r600_texture.c:869 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy
Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage
radeon: Failed to allocate a buffer:
radeon: size : 256 bytes
radeon: alignment : 256 bytes
radeon: domains : 2
EE r600_texture.c:869 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy
radeon: Failed to allocate a buffer:
radeon: size : 256 bytes
radeon: alignment : 256 bytes
radeon: domains : 2
EE r600_texture.c:869 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy
Then (dmesg):
[ 196.710933] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec [ 196.710946] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000675 last fence id 0x000000000000066c) [ 196.711129] radeon 0000:01:00.0: couldn't schedule ib [ 196.711239] radeon 0000:01:00.0: couldn't schedule ib [ 196.711805] radeon 0000:01:00.0: couldn't schedule ib [ 196.715732] radeon 0000:01:00.0: couldn't schedule ib [ 196.715975] radeon 0000:01:00.0: couldn't schedule ib [ 196.716362] radeon 0000:01:00.0: couldn't schedule ib [ 196.716627] radeon 0000:01:00.0: couldn't schedule ib [ 196.718012] radeon 0000:01:00.0: couldn't schedule ib [ 196.718262] radeon 0000:01:00.0: couldn't schedule ib [ 196.718480] radeon 0000:01:00.0: couldn't schedule ib [ 196.718985] radeon 0000:01:00.0: couldn't schedule ib [ 196.920396] radeon 0000:01:00.0: couldn't schedule ib [ 196.920703] radeon 0000:01:00.0: couldn't schedule ib [ 196.921084] radeon 0000:01:00.0: couldn't schedule ib [ 196.921318] radeon 0000:01:00.0: couldn't schedule ib [ 196.921558] radeon 0000:01:00.0: couldn't schedule ib [ 196.921898] radeon 0000:01:00.0: couldn't schedule ib [ 196.952350] radeon 0000:01:00.0: couldn't schedule ib [ 196.952386] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB ! 
[ 196.952439] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 196.952494] IP: [<ffffffffa050080d>] radeon_fence_ref+0xd/0x40 [radeon] [ 196.952531] PGD 221dc4067 PUD 2228ff067 PMD 0 [ 196.952556] Oops: 0000 [#1] PREEMPT SMP [ 196.952579] CPU 1 [ 196.952617] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi powernow_k8 snd_seq_device radeon ttm joydev snd_hda_codec_hdmi ppdev evdev pwc snd_hda_codec_realtek r8712u(C) r8169 mperf parport_pc parport sp5100_tco usb_storage uas drm_kms_helper drm videobuf2_vmalloc videobuf2_memops hid_logitech_dj pcspkr processor snd_hda_intel snd_hda_codec i2c_algo_bit mii hid_generic videobuf2_core videodev media wmi kvm_amd snd_hwdep snd_pcm snd_page_alloc snd_timer psmouse i2c_piix4 usbhid firewire_ohci hid serio_raw i2c_core firewire_core k10temp kvm microcode crc_itu_t snd edac_core button soundcore edac_mce_amd ext4 crc16 jbd2 mbcache pata_acpi sr_mod sd_mod cdrom pata_atiixp ata_generic ohci_hcd ahci libahci libata ehci_hcd usbcore scsi_mod usb_common [ 196.952957] [ 196.952969] Pid: 715, comm: Xorg Tainted: G C 3.5.0-rc4-VANILLA-46957-g74da01d #1 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H [ 196.953044] RIP: 0010:[<ffffffffa050080d>] [<ffffffffa050080d>] radeon_fence_ref+0xd/0x40 [radeon] [ 196.953092] RSP: 0018:ffff8802230e9b48 EFLAGS: 00010286 ...
and it loops.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #64 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-10 00:22:55 PDT --- Created attachment 64052 --> https://bugs.freedesktop.org/attachment.cgi?id=64052 dmesg drm-next
dmesg with latest drm-next branch
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #65 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-10 00:23:46 PDT --- Created attachment 64053 --> https://bugs.freedesktop.org/attachment.cgi?id=64053 xsession with drm-next
.xsession with drm-next branch
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #66 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-23 18:49:17 PDT --- (In reply to comment #37)
(In reply to comment #36)
I know I can run a 3.2 kernel, I know I can compile a different version or bisect or submit patches, I know I can switch from Gnome Shell to another window manager without fancy effects or that I can disable options if I follow your advice. But this is not accessible to the average user.
You can run an older mesa release as well. It's probably better as a mesa knob than a kernel knob.
Please, consider another option for the average users that will use compiled code available soon.
We can add a mesa option if we aren't able to get this fixed in time for the next mesa release, but for now I'd prefer to leave it enabled otherwise most users will just disable it and not test the current code which won't help in getting it fixed.
So it's been a while now and there has been no improvement (even with the proposed patches or by running a drm-next kernel). Could we add this flag now so it will be possible to disable VM for cayman if wanted? This way, people will still be able to use VM by default, but those encountering this problem will be able to use their card without this code locking it up. It will also be possible for them to enable VM to test for any improvement or regression. Nobody's losing anything. I'll be able to test other commits and new features by running programs and piglit tests, and once in a while I'll test the VM code (or test any patches or fixes the devs suggest to me).
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #67 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-24 06:53:33 PDT --- Created attachment 64585 --> https://bugs.freedesktop.org/attachment.cgi?id=64585 Adding an environment variable to disable VM if wanted
By setting R600_VM=0, we disable the virtual address space code path. By default, the path will still be enabled and used. However, if set to 0, it will prevent some cards (mostly CAYMAN it seems) from locking up or crashing because of the VM code. It is a workaround until we figure out why it is locking.
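A minimal sketch of how such a switch could be read is shown below. It is not the attached patch: the helper names are hypothetical, and it only assumes what the description states, namely that the path stays enabled unless R600_VM is set to 0.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decide from the raw variable value.
 * NULL means the variable is unset, which keeps the VM path enabled. */
static bool vm_enabled_from_value(const char *value)
{
    if (value == NULL)
        return true;
    return strcmp(value, "0") != 0;
}

/* Hypothetical wrapper that consults the process environment. */
static bool r600_vm_enabled(void)
{
    return vm_enabled_from_value(getenv("R600_VM"));
}
```

With something along these lines, a user hitting the lockups would launch the affected program as `R600_VM=0 ./app`, while everyone else keeps exercising the VM code by default.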
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #68 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-25 18:11:54 PDT --- I was thinking about it yesterday: is it possible that we are not tracking something in the virtual address spaces that we should be? That could explain why we are getting messages like "radeon 0000:01:00.0: bo ffff880212cb7000 va 0x00C26000 conflict with (bo ffff880222cc9400 0x00C26000 0x00C27000)" and so on.
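For readers unfamiliar with the message, the sketch below shows the kind of range overlap test that would produce such a conflict. It assumes the two numbers in parentheses are the start and end of an already mapped range and treats ranges as half-open; the function name is illustrative and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two half-open ranges [a_start, a_end) and [b_start, b_end) overlap
 * when each one starts before the other one ends. */
static bool va_ranges_overlap(uint64_t a_start, uint64_t a_end,
                              uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}
```

Using the values from the quoted message, a new mapping at 0x00C26000 (assumed here to be one 4 KiB page, since the log does not show its size) collides with the existing range 0x00C26000 to 0x00C27000, whereas a mapping starting at 0x00C27000 would not.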
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #69 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-27 18:26:19 PDT --- (In reply to comment #67)
Created attachment 64585 [details] [review] Adding an environment variable to disable VM if wanted
By setting R600_VM=0, we disable the virtual address space code path. By default, the path will still be enabled and used. However, if set to 0, it will prevent some cards (mostly CAYMAN it seems) from locking up or crashing because of the VM code. It is a workaround until we figure out why it is locking.
Please, if someone could review and commit if possible.
Thank you. Alexandre Demers
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #70 from Alex Deucher agd5f@yahoo.com 2012-07-31 15:10:38 UTC --- Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-July/025704.html
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #71 from awaters1@gmail.com awaters1@gmail.com 2012-08-01 01:18:00 UTC --- I have been having this same issue with the rendering regressions, and I have also seen the errors about va conflicts. I investigated a bit, and I think the rendering regression happens when a va is freed through radeon_bomgr_free_va and then handed out again by radeon_bomgr_find_va while the GPU is still using the memory, so the contents get overwritten before the GPU is done with them.
I experimented with this a bit: by not reusing any va_holes in radeon_bomgr_find_va, the rendering regression goes away, at the expense of continually eating up the address space. So I looked for a way to free the va only once it was no longer in use, and it turns out that worked as well.
In order to test this I placed a call to radeon_bo_wait before radeon_bomgr_free_va is called within radeon_bo_destroy. In radeon_drm_bo.c the code looks something like:
    if (mgr->va) {
        radeon_bo_wait(bo, RADEON_USAGE_READWRITE);
        radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
    }
It causes busy waiting currently and could be improved by tracking the destroyed bos that need to be freed from va when they are not busy, if this is ultimately the way to solve it.
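The improvement hinted at here, tracking destroyed bos and releasing their va only once they are idle, could look roughly like the sketch below. It is a toy model under stated assumptions: every type and function name is hypothetical, idleness is represented by a monotonically increasing fence number, and nothing here is taken from radeon_drm_bo.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

/* A VA range whose owner was destroyed while the GPU might still use it. */
struct pending_va {
    uint64_t offset;
    uint64_t size;
    uint32_t fence;   /* hypothetical id of the last GPU job touching it */
};

struct deferred_va_list {
    struct pending_va entries[MAX_PENDING];
    size_t count;
};

/* Queue a range instead of freeing it right away. Returns false when the
 * list is full, in which case the caller would have to wait instead. */
static bool va_defer_free(struct deferred_va_list *list, uint64_t offset,
                          uint64_t size, uint32_t fence)
{
    if (list->count == MAX_PENDING)
        return false;
    list->entries[list->count++] = (struct pending_va){ offset, size, fence };
    return true;
}

/* Drop every range whose fence has signalled and return how many were
 * released; a real allocator would hand them back to its hole list here.
 * Fence wraparound is ignored in this sketch. */
static size_t va_reclaim(struct deferred_va_list *list, uint32_t last_signalled)
{
    size_t kept = 0, released = 0;
    for (size_t i = 0; i < list->count; i++) {
        if (list->entries[i].fence <= last_signalled)
            released++;
        else
            list->entries[kept++] = list->entries[i];
    }
    list->count = kept;
    return released;
}
```

Compared with calling a blocking wait in the destroy path, this moves the cost to the next allocation, where the list can be swept cheaply before searching for a hole.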
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #72 from awaters1@gmail.com awaters1@gmail.com 2012-08-01 02:09:17 UTC --- Also, I believe the source of "radeon 0000:01:00.0: bo ffff8802ea5ec800 va 0x038EC000 conflict with (bo ffff8803eb464000 0x038EC000 0x038ED000)" is due to a race condition. It appears that after the call to radeon_bomgr_free_va the virtual address space is in a state where user space sees that freed address as available but the kernel hasn't been notified yet, until the drmIoctl call I assume.
I'm not sure if there are multiple threads allowed to interact with radeon_drm_bo.c, but if there are then the user space can request a virtual address that hasn't been freed yet by the kernel.
I moved the call to radeon_bomgr_free_va to be after the drmIoctl in radeon_bo_destroy. I'll run through the piglit tests to see if it fixes the errors.
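The ordering problem suspected in this comment can be modelled with a tiny state machine. The sketch assumes that closing the bo in the kernel drops its va mapping immediately, which a later comment in this thread questions; all names are hypothetical.

```c
#include <stdbool.h>

/* Toy view of one VA range as seen by the two sides. */
struct toy_state {
    bool kernel_mapped; /* kernel still has the range bound to the old bo */
    bool user_free;     /* userspace allocator considers the range free  */
};

enum destroy_order {
    FREE_VA_FIRST,  /* userspace frees the va, then tells the kernel */
    CLOSE_BO_FIRST  /* kernel is told first, then userspace frees     */
};

/* A reuse attempt conflicts if userspace may hand the range out while
 * the kernel still tracks the old mapping. */
static bool reuse_conflicts(const struct toy_state *s)
{
    return s->user_free && s->kernel_mapped;
}

/* Perform only the first of the two destroy steps, then ask whether
 * another thread reusing the range at that moment would conflict. */
static bool window_has_conflict(enum destroy_order order)
{
    struct toy_state s = { .kernel_mapped = true, .user_free = false };
    if (order == FREE_VA_FIRST)
        s.user_free = true;
    else
        s.kernel_mapped = false;
    return reuse_conflicts(&s);
}
```

In this model only the free-first ordering leaves a window in which a second thread can pick an address the kernel still considers taken, which matches the reasoning for moving the free after the ioctl.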
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #73 from Jerome Glisse glisse@freedesktop.org 2012-08-01 02:14:12 UTC --- Created attachment 65013 --> https://bugs.freedesktop.org/attachment.cgi?id=65013 Free va early in the kernel
Diagnosis was kind of obvious, but it just popped into my mind that ttm was sometimes delaying the deletion. So the attached kernel patch should fix the issue without any mesa patch.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #74 from Jerome Glisse glisse@freedesktop.org 2012-08-01 02:15:19 UTC --- The way i build my kernel must hide this latency i guess...
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #75 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-01 03:26:16 UTC --- These are all good news. So I'll test both patches and I'll see if it also fixes the thing for me. Awaters (I don't know your name, you'll have to tell me), if what you found fixes my 6 month old problem, I'll offer you a beer (or whatever you'd like to drink). I'll be back soon with some news (good I hope).
https://bugs.freedesktop.org/show_bug.cgi?id=45018
Jerome Glisse glisse@freedesktop.org changed:
What             |Removed |Added
----------------------------------------------------------------------------
Attachment #65013|0       |1
      is obsolete|        |
--- Comment #76 from Jerome Glisse glisse@freedesktop.org 2012-08-01 03:40:53 UTC --- Created attachment 65014 --> https://bugs.freedesktop.org/attachment.cgi?id=65014 Free va earyl
This one builds (minor typo).
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #77 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-01 16:09:21 UTC --- (In reply to comment #70)
Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-July/025704.html
No, it doesn't (well not about the present bug).
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #78 from Jerome Glisse glisse@freedesktop.org 2012-08-01 16:59:03 UTC --- (In reply to comment #77)
(In reply to comment #70)
Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-July/025704.html
No, it doesn't (well not about the present bug).
This patch is mostly for the lockup situation; it does not affect the va issue. My patch should definitely fix the va issue. Alex's patch might fix the lockup on top of that.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #79 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-01 18:06:40 UTC --- (In reply to comment #78)
(In reply to comment #77)
(In reply to comment #70)
Does this kernel patch help? http://lists.freedesktop.org/archives/dri-devel/2012-July/025704.html
No, it doesn't (well not about the present bug).
This patch is mostly for the lockup situation; it does not affect the va issue. My patch should definitely fix the va issue. Alex's patch might fix the lockup on top of that.
OK, so I should try them together then. I should be able to test it tonight. As of this morning with Alex's patch only, va issue was still reported but I had no time to test it further for lockups.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #80 from Anthony Waters awaters1@gmail.com 2012-08-02 00:41:23 UTC --- I tried both patches, the one from comment 76 and the one from comment 70, neither fixed the issue with the rendering regression or the va conflict.
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #81 from Anthony Waters awaters1@gmail.com 2012-08-02 00:47:06 UTC --- Created attachment 65051 --> https://bugs.freedesktop.org/attachment.cgi?id=65051 fixes to wait on the bo and to free the va after the kernel
These are the changes I made to make it work in mesa. The first change, inserting radeon_bo_wait, ensures that the va isn't immediately reallocated to a different bo while the GPU is still using it, which is what caused the rendering regression.
The second change moves the freeing of the va in mesa to after the kernel has freed it, so that the kernel's list is updated before mesa's list.
Hopefully this provides more insight to the issue/cause
https://bugs.freedesktop.org/show_bug.cgi?id=45018
--- Comment #82 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-02 01:02:42 UTC --- (In reply to comment #80)
I tried both patches, the one from comment 76 and the one from comment 70, neither fixed the issue with the rendering regression or the va conflict.
Same here, I was rebuilding my kernel from scratch just in case.
--- Comment #83 from Jerome Glisse glisse@freedesktop.org 2012-08-03 03:46:43 UTC --- How do you trigger the va issue? With piglit? I was not able to reproduce it. It's kind of painful to debug in the dark.
--- Comment #84 from Anthony Waters awaters1@gmail.com 2012-08-03 01:28:25 UTC --- I randomly saw it when I was playing a game of Warcraft 3, the terrain textures would blink. I'll check the piglit tests and mesa demos to see if I can reproduce the issue with them.
--- Comment #85 from Anthony Waters awaters1@gmail.com 2012-08-03 02:07:06 UTC --- I found a demo that has the issue: the program 'reflect', in the src/demo folder of the mesa demos repository. After I start it up and press 's' to see the stencil buffer, the white plane blinks continuously. Applying the patch 'fixes to wait on the bo and to free the va after the kernel' removes the blinking, as does disabling va through the variable ws->info.r600_virtual_address.
The other issue with the kernel reporting a va conflict is going to be a little harder to reproduce because it appears to be caused by a race condition.
I'll still look for other demos that have the issue.
--- Comment #86 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-03 03:00:00 UTC --- (In reply to comment #85)
I found a demo that has the issue: the program 'reflect', in the src/demo folder of the mesa demos repository. After I start it up and press 's' to see the stencil buffer, the white plane blinks continuously. Applying the patch 'fixes to wait on the bo and to free the va after the kernel' removes the blinking, as does disabling va through the variable ws->info.r600_virtual_address.
The other issue with the kernel reporting a va conflict is going to be a little harder to reproduce because it appears to be caused by a race condition.
I'll still look for other demos that have the issue.
Yes, I understand it can be hard to track for you, Jerome. Well, for the va issue, on my side, it is as simple as logging in to KDE or Gnome 3. Before logging in, there is no va error in dmesg. Once I'm in, there are usually 3 or sometimes 6 errors (they are written in blocks of 3, so I suspect it tries a first time, fails for some reason, and tries again a second time).
I also experience the issue when watching some movies. With Anthony's patch, va issues are gone and I watched a couple of shows yesterday without any problem. Before the patch, it would blink and get corrupted after about 16 minutes and then crash. So, Anthony has put a finger on something.
However, I also run piglit tests and some other applications like RendererFeatTest64 (which is an application released before Amnesia came out to test OpenGL performance, if I recall correctly). With Anthony's patch, I'm still able to lock the display every time (if I play music at the same time, it will continue to play, but I won't be able to change terminal even if sometimes my mouse pointer can still be moved). RendererFeatTest64 will always lock at the same test, but it is not the same for piglit tests (even if it often happens at or near the same test).
I'm installing a freshly compiled kernel 3.5.0 with both Alex's and your patches (by the way, they can't be applied on the latest drm-next branch) and I'll tell you if I'm still experiencing the lockups. I'll also try Anthony's test to see if I get the same results (blinking without his patch, OK with it).
--- Comment #87 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-03 06:03:39 UTC --- (In reply to comment #86)
Well it still locks up even with the patches. I also tested the reflect demo and I don't have any blink without Anthony's patch, but we may be experiencing different symptoms of the same problem.
--- Comment #88 from Michel Dänzer michel@daenzer.net 2012-08-03 07:47:17 UTC --- (In reply to comment #86)
So, Anthony has put a finger on something.
Yes, I think something like Anthony's patch is needed due to asynchronous GPU processing: when the userspace driver assigns virtual address space for a new BO, the GPU may not have finished processing command streams using previous BOs occupying that same virtual address space.
However, the userspace driver shouldn't wait synchronously for the BO to go idle when destroying it but should instead defer destruction (or at least the freeing of the virtual address space) until it notices the BO has become idle.
With Anthony's patch, I'm still able to lock the display every time
And these lockups do not happen when not using virtual address space? Can you provide the dmesg output of the GPU reset for such a lockup? Ideally from a single piglit test reproducing it.
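The deferred-free approach described in this comment (queue the virtual address range when the BO is destroyed, and only hand it back to the allocator once the GPU is done with it) can be sketched as follows. This is a minimal illustration in C, not the Mesa winsys code: struct va_manager, va_defer_free, va_reclaim_idle, the fixed-size queue, and the use of a monotonically increasing fence number to stand in for "the BO has become idle" are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16  /* hypothetical queue depth, for the sketch only */

struct va_range {
    uint64_t start;
    uint64_t size;
};

struct pending_free {
    struct va_range range;
    uint64_t fence;  /* last GPU submission that may still use the range */
};

struct va_manager {
    struct pending_free pending[MAX_PENDING];
    size_t num_pending;
    size_t num_released;  /* ranges that were actually given back */
};

/* Called at BO destruction: queue the range instead of freeing it.
 * Returns false if the queue is full (caller would have to wait). */
static bool va_defer_free(struct va_manager *mgr, struct va_range range,
                          uint64_t fence)
{
    assert(mgr != NULL);
    if (mgr->num_pending == MAX_PENDING)
        return false;
    mgr->pending[mgr->num_pending].range = range;
    mgr->pending[mgr->num_pending].fence = fence;
    mgr->num_pending++;
    return true;
}

/* Release every queued range whose fence has signalled; keep the rest.
 * Fence wrap-around is ignored here. Returns the number released. */
static size_t va_reclaim_idle(struct va_manager *mgr,
                              uint64_t last_signalled_fence)
{
    size_t kept = 0;
    size_t released = 0;

    assert(mgr != NULL);
    for (size_t i = 0; i < mgr->num_pending; i++) {
        if (mgr->pending[i].fence <= last_signalled_fence) {
            /* A real driver would return the range to its allocator here. */
            released++;
        } else {
            mgr->pending[kept++] = mgr->pending[i];
        }
    }
    mgr->num_pending = kept;
    mgr->num_released += released;
    return released;
}
```

A driver following this pattern would call something like va_reclaim_idle before each new allocation, so that destroying a BO never blocks on the GPU.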
--- Comment #89 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-03 08:05:07 UTC --- (In reply to comment #88)
With Anthony's patch, I'm still able to lock the display every time
And these lockups do not happen when not using virtual address space? Can you provide the dmesg output of the GPU reset for such a lockup? Ideally from a single piglit test reproducing it.
Nope, no lockup without va (I may only be lucky though if the bug is there but only shown when using va). I'll try to find a way to get dmesg... It has been a problem since the start for that part, but I may be able to use another computer to log in remotely. May take a couple of days to do though.
--- Comment #90 from Michel Dänzer michel@daenzer.net 2012-08-03 08:13:03 UTC --- (In reply to comment #89)
Nope, no lockup without va (I may only be lucky though if the bug is there but only shown when using va).
That's indeed possible: Using virtual address space will catch out of bounds memory access that may otherwise go unnoticed.
So, I think in this report we should focus on the rendering regression(s), and track the lockups in other reports.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
--- Comment #91 from Christian König deathsimple@vodafone.de 2012-08-03 12:58:04 UTC --- I just fixed a memory leak in radeonsi, and it looks like I'm hitting the same problem now.
Do I understand it correctly that the userspace VM manager is releasing allocations too early and not waiting for async buffer use to end?
That should be easy to fix.
--- Comment #92 from Michel Dänzer michel@daenzer.net 2012-08-03 13:21:22 UTC --- (In reply to comment #91)
I just fixed a memory leak in radeonsi, and it looks like I'm hitting the same problem now.
Ah cool, you found it already. :)
Do I understand it correctly that the userspace VM manager is releasing allocations too early and not waiting for async buffer use to end?
That's my working theory.
--- Comment #93 from Michel Dänzer michel@daenzer.net 2012-08-03 13:26:32 UTC --- (In reply to comment #92)
Do I understand it correctly that the userspace VM manager is releasing allocations too early and not waiting for async buffer use to end?
That's my working theory.
Also, if it wasn't the case, I don't see how Anthony's patch could make a difference.
--- Comment #94 from Jerome Glisse glisse@freedesktop.org 2012-08-03 14:39:59 UTC --- (In reply to comment #88)
(In reply to comment #86)
So, Anthony has put a finger on something.
Yes, I think something like Anthony's patch is needed due to asynchronous GPU processing: when the userspace driver assigns virtual address space for a new BO, the GPU may not have finished processing command streams using previous BOs occupying that same virtual address space.
However, the userspace driver shouldn't wait synchronously for the BO to go idle when destroying it but should instead defer destruction (or at least the freeing of the virtual address space) until it notices the BO has become idle.
With Anthony's patch, I'm still able to lock the display every time
And these lockups do not happen when not using virtual address space? Can you provide the dmesg output of the GPU reset for such a lockup? Ideally from a single piglit test reproducing it.
No, Anthony's patch should not be needed. Once userspace calls the kernel to destroy a bo, userspace should be able to reuse the va right away, even if the kernel is delaying the bo destruction. My patch should fix the va issue. Note that the patch attached here has a bug, but it should not affect the va issue.
--- Comment #95 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-03 14:51:56 UTC --- (In reply to comment #90)
(In reply to comment #89)
Nope, no lockup without va (I may only be lucky though if the bug is there but only shown when using va).
That's indeed possible: Using virtual address space will catch out of bounds memory access that may otherwise go unnoticed.
So, I think in this report we should focus on the rendering regression(s), and track the lockups in other reports.
OK, I'll open another bug for the lockups. This one will be renamed for va issues and rendering regression. I'll wait until tonight to make changes to see if someone objects.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #65051|0                           |1
        is obsolete|                            |
--- Comment #96 from Christian König deathsimple@vodafone.de 2012-08-03 15:03:52 UTC --- Created attachment 65093 --> https://bugs.freedesktop.org/attachment.cgi?id=65093 Possible fix.
It's hard and inefficient to solve this problem completely in the kernel.
Since we patch the VM table synchronously but use it asynchronously, we will always end up needing to wait for the GPU's use of a bo to end before patching in the new VA.
Please take a look at the attached patch; it should fix the issue nicely in userspace.
--- Comment #97 from Marek Olšák maraeo@gmail.com 2012-08-03 15:20:12 UTC --- (In reply to comment #96)
Created attachment 65093 Possible fix.
It's hard and inefficient to solve this problem completely in the kernel.
Since we patch the VM table synchronously but use it asynchronously, we will always end up needing to wait for the GPU's use of a bo to end before patching in the new VA.
Please take a look at the attached patch; it should fix the issue nicely in userspace.
Please use the radeon_bo_is_busy function. Calling DRM_RADEON_GEM_BUSY directly is not reliable because of the thread offloading of the CS ioctl. The same applies to any other kernel queries and commands which depend on the CS ioctl.
--- Comment #98 from Jerome Glisse glisse@freedesktop.org 2012-08-03 16:54:04 UTC --- Created attachment 65095 --> https://bugs.freedesktop.org/attachment.cgi?id=65095 Properly protect virtual address
Properly protect virtual address
Patch against Linus master, gonna attach patch against 3.5 next.
Jerome Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #65095|0                           |1
        is obsolete|                            |
--- Comment #99 from Jerome Glisse glisse@freedesktop.org 2012-08-03 16:56:00 UTC --- Created attachment 65096 --> https://bugs.freedesktop.org/attachment.cgi?id=65096 Properly protect virtual address
Properly protect virtual address
Patch against Linus master, gonna attach patch against 3.5 next.
Sorry previous one was wrong one.
Jerome Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #65096|0                           |1
        is obsolete|                            |
--- Comment #100 from Jerome Glisse glisse@freedesktop.org 2012-08-03 16:59:41 UTC --- Created attachment 65097 --> https://bugs.freedesktop.org/attachment.cgi?id=65097 Properly protect virtual address
Properly protect virtual address
Patch against Linus master, gonna attach patch against 3.5 next.
Again, sorry previous one was wrong one.
--- Comment #101 from Jerome Glisse glisse@freedesktop.org 2012-08-03 17:05:15 UTC --- Created attachment 65098 --> https://bugs.freedesktop.org/attachment.cgi?id=65098 Properly protect virtual address against kernel 3.5
Same patch against 3.5
Jerome Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #65098|0                           |1
        is obsolete|                            |
--- Comment #102 from Jerome Glisse glisse@freedesktop.org 2012-08-03 19:04:54 UTC --- Created attachment 65101 --> https://bugs.freedesktop.org/attachment.cgi?id=65101 Properly protect virtual address kernel 3.5 v2
Updated
Jerome Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #65097|0                           |1
        is obsolete|                            |
--- Comment #103 from Jerome Glisse glisse@freedesktop.org 2012-08-03 19:05:47 UTC --- Created attachment 65102 --> https://bugs.freedesktop.org/attachment.cgi?id=65102 Properly protect virtual address v2
Against Linus master
--- Comment #104 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-03 19:44:34 UTC --- (In reply to comment #103)
Created attachment 65102 Properly protect virtual address v2
Against Linus master
I will test them later today. They should take care of the va issues, right? Probably nothing to do with lockups?
--- Comment #105 from Jerome Glisse glisse@freedesktop.org 2012-08-03 19:46:50 UTC --- Well, for the va issue you also need the mesa patch from Christian. This patch mostly fixes the kernel; it might help with the lockup, though here piglit locks up hard with the latest mesa.
--- Comment #106 from Anthony Waters awaters1@gmail.com 2012-08-04 02:05:34 UTC --- I tried the patch from Christian in comment 96 on top of mesa git and the patch from Jerome in comment 102 on top of linux-3.5. I no longer experience the rendering regression and I have not seen the va conflict error, thanks.
--- Comment #107 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-04 03:55:56 UTC --- Tested with 3.6-rc1 and latest mesa with both respective patches. No va issue anymore.
However, lockups still happen with RendererFeatTest64: I tried to run some tests and my system locked completely and restarted. This seems to be a different problem, though, not related to the va conflict issue. So I'll open a different bug for the lockups revealed by the same commit (as previously said, without virtual address space, it doesn't lock).
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[bisected] rendering        |[bisected] rendering
                   |regression since added      |regression and va conflicts
                   |support for virtual address |since added support for
                   |space on cayman v11         |virtual address space on
                   |                            |cayman v11
--- Comment #108 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-05 04:29:39 UTC --- Oops, I've hit a va error again. I've been using my computer all day long, going from one window to another, using Flash on OpenStreetMap and Google Maps. The error could explain some lockups I've experienced. From what I understand of the error, I hit the card's maximum memory. Should I put the collected info here or under bug 53111?
--- Comment #109 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-05 04:34:02 UTC --- (In reply to comment #108)
Here is the error message without any log for now. I'll wait to see if it should be tracked here:
[54804.656571] radeon 0000:01:00.0: offset 0x400000 is in reserved area 0x800000
[54805.166815] radeon 0000:01:00.0: bo ffff8800c227d800 va 0x02B00000 conflict with (bo ffff880202702400 0x02440000 0x03440000)
[54805.177976] radeon 0000:01:00.0: bo ffff8800c227b000 va 0x02C38000 conflict with (bo ffff880202702400 0x02440000 0x03440000)
[54805.178980] radeon 0000:01:00.0: bo ffff880061241400 va 0x02C38000 conflict with (bo ffff880202702400 0x02440000 0x03440000)
[54805.253953] radeon 0000:01:00.0: bo ffff88021b183800 va 0x00900000 conflict with (bo ffff8802222fc000 0x00900000 0x00901000)
[54806.900210] radeon 0000:01:00.0: va above limit (0x00100200 > 0x00100000)
[54806.927121] radeon 0000:01:00.0: va above limit (0x001000B0 > 0x00100000)
[54811.663812] radeon 0000:01:00.0: bo ffff880223631c00 va 0x01278000 conflict with (bo ffff88020270b000 0x01200000 0x01700000)
[54813.069082] radeon 0000:01:00.0: bo ffff88021b183800 va 0x00900000 conflict with (bo ffff8802222fc000 0x00900000 0x00901000)
[54813.075691] radeon 0000:01:00.0: bo ffff88007f002c00 va 0x00900000 conflict with (bo ffff8802222fc000 0x00900000 0x00901000)
[54813.075886] radeon 0000:01:00.0: bo ffff88007f002000 va 0x00900000 conflict with (bo ffff8802222fc000 0x00900000 0x00901000)
[54813.075961] gnome-shell[1025]: segfault at 50 ip 00007f8af5ebe019 sp 00007fff80159650 error 4 in r600_dri.so[7f8af5e53000+4b1000]
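For readers trying to interpret the "conflict with" lines, here is a small sketch in C of the kind of range-overlap test that would produce them. It assumes the two trailing numbers in each message are the start and end of the mapping that already exists, and it treats ranges as half-open; the function name va_ranges_overlap and the 0x1000 size used for the new mapping are hypothetical, since the log does not state the size of the new bo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open ranges [start, end): two ranges overlap when each one
 * starts before the other one ends. Hypothetical helper for
 * illustration, not the kernel's own check. */
static bool va_ranges_overlap(uint64_t a_start, uint64_t a_end,
                              uint64_t b_start, uint64_t b_end)
{
    assert(a_start <= a_end);
    assert(b_start <= b_end);
    return a_start < b_end && b_start < a_end;
}
```

Under these assumptions, a new mapping at 0x02B00000 lands inside the existing 0x02440000 to 0x03440000 range, and a new mapping at 0x00900000 collides with the existing one that starts at the same address, which matches what the messages report.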
--- Comment #110 from Christian König deathsimple@vodafone.de 2012-08-08 10:48:35 UTC --- I just pushed a minor bugfix to mesa master that, in conjunction with Jerome's kernel patch, should eliminate the last VA issues.
Please retest.
Christian.
--- Comment #111 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-08 13:29:34 UTC --- (In reply to comment #110)
I just pushed a minor bugfix to mesa master that, in conjunction with Jerome's kernel patch, should eliminate the last VA issues.
Please retest.
Christian.
You must be referring to commit 8c44e5a144009a03c20befa6468d19d41c802795. Do I still need to apply your previous patch as well (attachment 65093)? I'll try it tonight, but it may be a bit more complicated to reproduce: I'll have to play for a while until it does or doesn't trigger the last reported vm error.
--- Comment #112 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-09 01:38:41 UTC --- (In reply to comment #111)
Well, I tested it with your previous patch on top of 68bccc40f55aee7f4af8eb64b15a95f0b49d6a17 and it was not working properly. First, I had to modify your patch to apply on top of latest git. After applying it, compiling and installing, I rebooted and I was unable to load the login screen. I removed the patch, rebuilt a clean mesa from 68bccc40f55aee7f4af8eb64b15a95f0b49d6a17, installed and relaunched Xorg and... I was able to log in. So I'm now testing latest mesa (68bccc40f55aee7f4af8eb64b15a95f0b49d6a17) with kernel 3.6-rc1 + Jerome's patch. I should be able to tell you soon if it works. Meanwhile, if I should have applied something different, let me know.
To Jerome: I could test your [PATCH] drm/radeon: delay virtual address destruction to bo destruction. But first, I want to make sure Christian's patch does what it should do.
--- Comment #113 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-09 05:20:04 UTC --- (In reply to comment #112)
Bug still there with latest mesa git (without your previous patch as explained previously).
Aug 9 01:03:29 Xander kernel: [13308.165749] radeon 0000:01:00.0: offset 0x400000 is in reserved area 0x800000
Aug 9 01:03:29 Xander kernel: [13308.232245] radeon 0000:01:00.0: bo ffff880223646400 va 0x02B00000 conflict with (bo ffff8801e3edc400
Locked and reset without any notice.
--- Comment #114 from Alex Deucher agd5f@yahoo.com 2012-08-09 14:14:04 UTC --- Please test mesa from git (no additional patches) and make sure your kernel has this patch: http://lists.freedesktop.org/archives/dri-devel/2012-August/026015.html (no other kernel patches).
--- Comment #115 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-09 14:56:34 UTC --- (In reply to comment #114)
Please test mesa from git (no additional patches) and make sure your kernel has this patch: http://lists.freedesktop.org/archives/dri-devel/2012-August/026015.html (no other kernel patches).
It looks pretty much like what I was testing with (latest mesa git without any patch, as explained in comment 112), where I had already applied Jerome's patch v2 (no other patch). v4 doesn't seem to have any major differences (according to the comments for v3 and v4). Nevertheless, I'll recompile kernel 3.6-rc1 with patch v4 just in case, though I would be surprised if that made a difference from the test/error reported in comment 113.
--- Comment #116 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-09 15:24:41 UTC --- (In reply to comment #113)
Two things I've noticed: 1- the error points directly at "offset 0x400000 is in reserved area 0x800000" since I applied Christian's and Jerome's patches, which is a different error from errors before patches. 2- the error only happens after a while, when switching between windows (under Gnome 3 in that case). I had to alt+tab and show my whole desktop (top left corner) many times before it happened. I played with my desktop all night long.
So, it's as if the pointer keeps increasing until it reaches its limit. Either we are not correctly releasing previous addresses (or we are forgetting some on the way), or we are unaware of some released addresses; in both cases this pushes us forward until we hit a wall.
And if someone could explain to me what this message and these addresses mean, I'd appreciate it. How is it possible that an offset of 0x400000 ends up in a reserved area allocated at 0x800000? We must not be offsetting from 0, obviously.
--- Comment #117 from Alex Deucher agd5f@yahoo.com 2012-08-09 18:50:34 UTC --- (In reply to comment #116)
And if someone could explain to me what this message and these addresses mean, I'd appreciate it. How is it possible that an offset of 0x400000 ends up in a reserved area allocated at 0x800000? We must not be offsetting from 0, obviously.
The first 8 MB of the client's VM space are reserved for kernel use and not available for the client to use. The client is not allowed to use an address below 0x800000. If an address ends up there, the kernel flags it. That's the message you are seeing.
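The rule described here can be written down as a one-line check. The sketch below is in C and is only an illustration of the explanation, not the kernel's code: the constant VA_RESERVED_SIZE and the function va_offset_is_reserved are names invented for the example, and the 8 MB value is taken from the 0x800000 printed in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for the sketch: the reserved area is the first 8 MB,
 * i.e. offsets 0x000000 up to (but not including) 0x800000. */
#define VA_RESERVED_SIZE UINT64_C(0x800000)

/* Hypothetical helper: true when a client-requested offset falls in
 * the area reserved for kernel use and should be rejected. */
static bool va_offset_is_reserved(uint64_t offset)
{
    return offset < VA_RESERVED_SIZE;
}
```

Read this way, "offset 0x400000 is in reserved area 0x800000" means the requested offset is below the 0x800000 boundary, not that it was added to it.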
--- Comment #118 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-11 04:49:31 UTC --- Reproduced again with exactly the setup Alex told me to use (kernel 3.6-rc1+Jerome's patch v4 and latest mesa containing Christian's fix). To reproduce, I clicked repeatedly on Activities on top left corner of Gnome shell until it locked:
Everything.log ---
Aug 11 00:23:08 Xander kernel: [92926.580673] radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
Aug 11 00:23:08 Xander kernel: [92926.587281] [drm:radeon_cs_parser_relocs] *ERROR* gem object lookup failed 0x11
Aug 11 00:23:08 Xander kernel: [92926.587291] [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -2!
Aug 11 00:23:08 Xander kernel: [92926.597151] radeon 0000:01:00.0: offset 0x200000 is in reserved area 0x800000
Aug 11 00:23:18 Xander kernel: [92937.073091] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Aug 11 00:23:18 Xander kernel: [92937.073105] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000009ea1d last fence id 0x000000000009ea1c)
Aug 11 00:23:18 Xander kernel: [92937.074236] radeon 0000:01:00.0: Saved 15 dwords of commands on ring 0.
Aug 11 00:23:18 Xander kernel: [92937.074243] radeon 0000:01:00.0: GPU softreset
Aug 11 00:23:18 Xander kernel: [92937.074248] radeon 0000:01:00.0: GRBM_STATUS=0xF5700828
Aug 11 00:23:18 Xander kernel: [92937.074253] radeon 0000:01:00.0: GRBM_STATUS_SE0=0xFC000001
Aug 11 00:23:18 Xander kernel: [92937.074258] radeon 0000:01:00.0: GRBM_STATUS_SE1=0xFC000001
Aug 11 00:23:18 Xander kernel: [92937.074263] radeon 0000:01:00.0: SRBM_STATUS=0x20020FC0
Aug 11 00:23:18 Xander kernel: [92937.074269] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Aug 11 00:23:18 Xander kernel: [92937.074274] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x40000000
Aug 11 00:23:18 Xander kernel: [92937.074279] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008004
Aug 11 00:23:18 Xander kernel: [92937.074284] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80228647
Aug 11 00:23:18 Xander kernel: [92937.074289] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x00074124
Aug 11 00:23:18 Xander kernel: [92937.074294] radeon 0000:01:00.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00071001
Aug 11 00:23:18 Xander kernel: [92937.074300] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000021F1
Aug 11 00:23:18 Xander kernel: [92937.074305] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020A4004
Aug 11 00:23:19 Xander kernel: [92937.223150] radeon 0000:01:00.0: Wait for MC idle timedout !
Aug 11 00:23:19 Xander kernel: [92937.223152] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
Aug 11 00:23:19 Xander kernel: [92937.223254] radeon 0000:01:00.0: GRBM_STATUS=0x80103828
Aug 11 00:23:19 Xander kernel: [92937.223256] radeon 0000:01:00.0: GRBM_STATUS_SE0=0x04000007
Aug 11 00:23:19 Xander kernel: [92937.223257] radeon 0000:01:00.0: GRBM_STATUS_SE1=0x04000007
Aug 11 00:23:19 Xander kernel: [92937.223258] radeon 0000:01:00.0: SRBM_STATUS=0x20020FC0
Aug 11 00:23:19 Xander kernel: [92937.223260] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Aug 11 00:23:19 Xander kernel: [92937.223262] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Aug 11 00:23:19 Xander kernel: [92937.223263] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Aug 11 00:23:19 Xander kernel: [92937.223264] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Aug 11 00:23:19 Xander kernel: [92937.224266] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Aug 11 00:23:19 Xander kernel: [92937.230003] [drm] probing gen 2 caps for device 1022:9603 = 2/0
Aug 11 00:23:19 Xander kernel: [92937.230004] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Aug 11 00:23:19 Xander kernel: [92937.388426] radeon 0000:01:00.0: Wait for MC idle timedout !
Aug 11 00:23:19 Xander kernel: [92937.546743] radeon 0000:01:00.0: Wait for MC idle timedout !
Aug 11 00:23:19 Xander kernel: [92937.548662] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Aug 11 00:23:19 Xander kernel: [92937.548751] radeon 0000:01:00.0: WB enabled
Aug 11 00:23:19 Xander kernel: [92937.548754] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff88022332dc00
Aug 11 00:23:19 Xander kernel: [92937.548755] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xffff88022332dc04
Aug 11 00:23:19 Xander kernel: [92937.548757] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xffff88022332dc08
Aug 11 00:23:19 Xander kernel: [92937.752374] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
Aug 11 00:23:19 Xander kernel: [92937.752377] [drm:cayman_resume] *ERROR* cayman startup failed on resume
Could it be a previously hidden bug that the patches from Jerome and Christian dug up?
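For anyone comparing the register dumps in the log above, here is a minimal sketch of a script that pulls out the `NAME=0x...` values printed by the radeon driver and reports which ones differ between two dumps. It is not part of any driver tooling. The function names and the rule "a new dump starts at each GRBM_STATUS line" are assumptions based only on this log, and the sample lines are copied from it.

```python
import re

# Matches entries such as "radeon 0000:01:00.0: GRBM_STATUS=0xF5700828" or
# "radeon 0000:01:00.0: R_008680_CP_STAT = 0x80228647".
# Lines without "=" (for example the VM_CONTEXT*_PROTECTION_FAULT_* lines) are skipped.
REG_RE = re.compile(r"radeon [0-9a-f:.]+:\s+([A-Z0-9_]+)\s*=\s*(0x[0-9A-Fa-f]+)")


def register_dumps(log_text):
    """Return one dict of register name -> value per dump, in log order.

    Assumption (hypothetical rule): each dump begins with a GRBM_STATUS line,
    as it does in the log from comment #118.
    """
    dumps = []
    for name, value in REG_RE.findall(log_text):
        if name == "GRBM_STATUS" or not dumps:
            dumps.append({})
        dumps[-1][name] = int(value, 16)
    return dumps


def changed_registers(before, after):
    """Return {name: (old, new)} for registers present in both dumps that differ."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# A few lines copied from the log above (timestamps dropped for brevity).
SAMPLE = """
radeon 0000:01:00.0: GRBM_STATUS=0xF5700828
radeon 0000:01:00.0: SRBM_STATUS=0x20020FC0
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80228647
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DF7B
radeon 0000:01:00.0: GRBM_STATUS=0x80103828
radeon 0000:01:00.0: SRBM_STATUS=0x20020FC0
radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
"""

if __name__ == "__main__":
    first, second = register_dumps(SAMPLE)[:2]
    for name, (old, new) in changed_registers(first, second).items():
        print(f"{name}: 0x{old:08X} -> 0x{new:08X}")
```

The regular expression works the same way on the one-entry-per-line log and on a flattened copy, since it does not depend on line breaks. It only extracts and compares values; it does not interpret the individual register bits.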
--- Comment #119 from Michel Dänzer michel@daenzer.net 2012-08-15 16:07:30 UTC --- (In reply to comment #118)
Try the Mesa patches from http://lists.freedesktop.org/archives/mesa-dev/2012-August/025715.html .
--- Comment #120 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-16 00:38:42 UTC --- (In reply to comment #119)
> (In reply to comment #118)
> Try the Mesa patches from http://lists.freedesktop.org/archives/mesa-dev/2012-August/025715.html .
Testing right now.
If this doesn't fix the problem, may I suggest adding some debug output, toggled by an environment variable or something similar, to track what the vm_mgr is doing, keeping and forgetting?
--- Comment #121 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-16 15:35:45 UTC --- (In reply to comment #120)
> (In reply to comment #119)
> > (In reply to comment #118)
> > Try the Mesa patches from http://lists.freedesktop.org/archives/mesa-dev/2012-August/025715.html .
> Testing right now.
> If this doesn't fix the problem, may I suggest adding some debug output, toggled by an environment variable or something similar, to track what the vm_mgr is doing, keeping and forgetting?
I spent all evening trying to reproduce the latest VA issue without success. So even if this doesn't fix the problem for good, your patches help a lot. I'll continue testing tonight. Good to know your patches were committed this morning.
However, keep in mind I haven't tested anything for the other lockups (piglit tests and some other OpenGL apps).
Alex Deucher agd5f@yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |v10lator@myway.de
--- Comment #122 from Alex Deucher agd5f@yahoo.com 2012-08-16 16:59:49 UTC --- *** Bug 53291 has been marked as a duplicate of this bug. ***
--- Comment #123 from Thomas Rohloff v10lator@myway.de 2012-08-16 20:10:32 UTC --- (In reply to comment #119)
> (In reply to comment #118)
> Try the Mesa patches from http://lists.freedesktop.org/archives/mesa-dev/2012-August/025715.html .
Not sure if this is related or if I should open a new report, but since these patches I get this when I try to start compiz with GLAMOR acceleration: http://pastebin.com/WbxMT0V9 . Before the patches I got the "conflicts with" messages. Without GLAMOR I get (and got) no messages at all, but compiz loads slowly and the screen flickers while it does.
P.S. The desktop is also corrupted with GLAMOR. This is better since these patches, but it is still there.
--- Comment #124 from Thomas Rohloff v10lator@myway.de 2012-08-16 21:03:39 UTC --- And there are some random rendering issues that weren't there before the patches, like using the wrong texture.
Good: http://img713.imageshack.us/img713/492/mcgood.png
Bad: http://img96.imageshack.us/img96/6417/mcbad.png
Also water in the game flashes white (seems to choose the wrong texture sometimes in the animation, too) and sometimes the whole game screen flashes blue.
--- Comment #125 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-17 03:00:53 UTC --- (In reply to comment #124)
> And there are some random rendering issues that weren't there before the patches, like using the wrong texture.
> Good: http://img713.imageshack.us/img713/492/mcgood.png
> Bad: http://img96.imageshack.us/img96/6417/mcbad.png
> Also water in the game flashes white (seems to choose the wrong texture sometimes in the animation, too) and sometimes the whole game screen flashes blue.
I won't officially answer your question, but I think it should be tracked under a different bug since you are using Glamor. If I were you, I would create a new bug entry with a reference to this one.
--- Comment #126 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-17 03:18:27 UTC --- Good news on my side: so far I have been unable to reproduce the bug. So I moved on to running the piglit tests. Sadly, that part still locks up (now tracked under bug 53111).
I won't say for sure the vm problem is fixed, but if it's still there, latest patches helped a lot since I was able to run more than twice as long as usual without any problem.
Michel Dänzer michel@daenzer.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
--- Comment #127 from Michel Dänzer michel@daenzer.net 2012-08-17 07:26:52 UTC --- (In reply to comment #126)
> I won't say for sure the vm problem is fixed, but if it's still there, latest patches helped a lot since I was able to run more than twice as long as usual without any problem.
Great! Resolving this bug as fixed.
Any other remaining issues, in particular Thomas' glamor issues, should be tracked in separate bug reports.