https://bugs.freedesktop.org/show_bug.cgi?id=29556
Summary: [rv620] GPU reset followed by black screen
Product: DRI
Version: XOrg CVS
Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: scary.moo@gmail.com
Created an attachment (id=37840) --> (https://bugs.freedesktop.org/attachment.cgi?id=37840) system dmesg
Using the latest git (as of 12/08/2010) of libdrm, mesa (classic), xf86-video-ati and drm-radeon-testing (commit "drm/radeon/kms: enable writeback on remaing asics"). The GPU is a hd3470 mobility (rv620), forced to the lowest power state (echo "low" > /sys/class/drm/card0/device/power_profile). During normal web browsing, possibly while playing a flash video, the mouse cursor suddenly stops and immediately afterwards I get a black screen from which I cannot recover unless I sysrq-reboot (I haven't tried ssh). After the reboot, the system log shows:
[ 2325.450063] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[ 2325.450066] ------------[ cut here ]------------
[ 2325.450083] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:239 radeon_fence_wait+0x35b/0x3c0 [radeon]()
[ 2325.450085] Hardware name: Satellite A300
[ 2325.450087] GPU lockup (waiting for 0x0000BDBC last fence id 0x0000BDBA)
[ 2325.450089] Modules linked in: radeon ttm ath5k drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect i2c_i801 ath
[ 2325.450099] Pid: 1954, comm: X Not tainted 2.6.35+ #19
[ 2325.450101] Call Trace:
[ 2325.450108] [<ffffffff8103a04a>] warn_slowpath_common+0x7a/0xb0
[ 2325.450111] [<ffffffff8103a121>] warn_slowpath_fmt+0x41/0x50
[ 2325.450120] [<ffffffffa009ba1b>] radeon_fence_wait+0x35b/0x3c0 [radeon]
[ 2325.450125] [<ffffffff810521f0>] ? autoremove_wake_function+0x0/0x40
[ 2325.450134] [<ffffffffa009c1fc>] radeon_sync_obj_wait+0xc/0x10 [radeon]
[ 2325.450139] [<ffffffffa005ad69>] ttm_bo_wait+0xf9/0x1b0 [ttm]
[ 2325.450144] [<ffffffffa005e11f>] ttm_bo_move_accel_cleanup+0x9f/0x2e0 [ttm]
[ 2325.450153] [<ffffffffa009c32f>] radeon_move_blit+0x11f/0x180 [radeon]
[ 2325.450162] [<ffffffffa009c786>] radeon_bo_move+0xb6/0x1e0 [radeon]
[ 2325.450166] [<ffffffffa005b1a5>] ttm_bo_handle_move_mem+0x135/0x410 [ttm]
[ 2325.450170] [<ffffffffa005d2c9>] ttm_bo_evict+0x1b9/0x3f0 [ttm]
[ 2325.450175] [<ffffffff81090001>] ? __isolate_lru_page+0x81/0xa0
[ 2325.450179] [<ffffffffa005c6f7>] ttm_mem_evict_first+0x147/0x1e0 [ttm]
[ 2325.450183] [<ffffffffa005d059>] ttm_bo_mem_space+0x3e9/0x4a0 [ttm]
[ 2325.450187] [<ffffffffa005d5e7>] ttm_bo_move_buffer+0xe7/0x160 [ttm]
[ 2325.450192] [<ffffffff81260028>] ? drm_mapbufs+0x318/0x340
[ 2325.450196] [<ffffffffa005d6f6>] ttm_bo_validate+0x96/0x120 [ttm]
[ 2325.450199] [<ffffffffa005db35>] ttm_bo_init+0x2e5/0x340 [ttm]
[ 2325.450209] [<ffffffffa009d198>] radeon_bo_create+0x128/0x220 [radeon]
[ 2325.450218] [<ffffffffa009cf10>] ? radeon_ttm_bo_destroy+0x0/0xc0 [radeon]
[ 2325.450228] [<ffffffffa00b1aa4>] radeon_gem_object_create+0x84/0x100 [radeon]
[ 2325.450232] [<ffffffff810c9030>] ? pollwake+0x0/0x60
[ 2325.450242] [<ffffffffa00b1f1f>] radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
[ 2325.450246] [<ffffffff81398e94>] ? sock_aio_read+0x134/0x150
[ 2325.450249] [<ffffffff8126138c>] drm_ioctl+0x33c/0x410
[ 2325.450259] [<ffffffffa00b1ed0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
[ 2325.450262] [<ffffffff810b8bf2>] ? do_sync_read+0xd2/0x110
[ 2325.450266] [<ffffffff810c7b2c>] vfs_ioctl+0x3c/0xd0
[ 2325.450268] [<ffffffff810c812c>] do_vfs_ioctl+0x7c/0x520
[ 2325.450271] [<ffffffff810b9345>] ? vfs_read+0x105/0x140
[ 2325.450274] [<ffffffff810c861a>] sys_ioctl+0x4a/0x80
[ 2325.450277] [<ffffffff81004669>] ? do_device_not_available+0x9/0x10
[ 2325.450280] [<ffffffff8100256b>] system_call_fastpath+0x16/0x1b
[ 2325.450283] ---[ end trace b2d00ea6bab57761 ]---
[ 2325.450295] [drm] Disabling audio support
[ 2325.451428] radeon 0000:01:00.0: GPU softreset
[ 2325.451431] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA0003030
[ 2325.451435] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
[ 2325.451438] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200010C0
[ 2325.451452] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 2325.468635] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 2325.484646] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003030
[ 2325.484651] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000003
[ 2325.484654] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
[ 2325.485660] radeon 0000:01:00.0: GPU reset succeed
[ 2325.506653] [drm] Clocks initialized !
[ 2382.495138] SysRq : Emergency Sync
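For anyone triaging a similar log, the numbers in the excerpt can be read off mechanically. The following is only a rough sketch in Python: the helper names fence_gap and changed_bits are made up for illustration, the input lines are copied from the dmesg output above with timestamps and the device prefix dropped, and fence id wrap-around is ignored. It reports how far the awaited fence is ahead of the last signalled one and which status register bits differ before and after the soft reset. It does not attempt to name the bits; that would need the register definitions from the driver headers.

```python
import re

# Lines copied from the dmesg excerpt above (timestamps and device prefix dropped).
LOG = """\
GPU lockup (waiting for 0x0000BDBC last fence id 0x0000BDBA)
R_008010_GRBM_STATUS=0xA0003030
R_008010_GRBM_STATUS=0x00003030
R_000E50_SRBM_STATUS=0x200010C0
R_000E50_SRBM_STATUS=0x200000C0
"""


def fence_gap(log):
    """Return awaited fence id minus last signalled fence id (no wrap handling)."""
    m = re.search(
        r"waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)", log
    )
    wanted, last = int(m.group(1), 16), int(m.group(2), 16)
    return wanted - last


def changed_bits(log, register):
    """Return bit positions that differ between the first and last dump of register."""
    values = [int(v, 16) for v in re.findall(register + r"=(0x[0-9A-Fa-f]+)", log)]
    diff = values[0] ^ values[-1]
    return [bit for bit in range(32) if (diff >> bit) & 1]


print("fences outstanding:", fence_gap(LOG))
print("GRBM_STATUS bits changed:", changed_bits(LOG, "R_008010_GRBM_STATUS"))
print("SRBM_STATUS bits changed:", changed_bits(LOG, "R_000E50_SRBM_STATUS"))
```

On this log the awaited fence is two ids ahead of the last signalled one, and the soft reset changes bits 29 and 31 of GRBM_STATUS and bit 12 of SRBM_STATUS; GRBM_STATUS2 is unchanged.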
--- Comment #1 from Stefano Carignano scary.moo@gmail.com 2010-08-13 04:03:30 PDT --- Created an attachment (id=37841) --> (https://bugs.freedesktop.org/attachment.cgi?id=37841) Xorg log
--- Comment #2 from Marc marvin24@gmx.de 2010-08-13 05:14:33 PDT --- Created an attachment (id=37845) View: https://bugs.freedesktop.org/attachment.cgi?id=37845 Review: https://bugs.freedesktop.org/review?bug=29556&attachment=37845
Rebased V2 blit patch from dri-devel.
Can you try this patch on top of d-r-t?
--- Comment #3 from Marc marvin24@gmx.de 2010-08-15 04:45:10 PDT --- With the above patch the problem is not really cured here; it just happens less often. So there is still something wrong with the blit code in d-r-t.
--- Comment #4 from Stefano Carignano scary.moo@gmail.com 2010-08-15 07:02:15 PDT --- (In reply to comment #3)
With the above patch the problem is not really cured here; it just happens less often. So there is still something wrong with the blit code in d-r-t.
Oh, that's too bad. I have been trying the patch for a couple of days now and it did seem to improve things: I haven't managed to crash the system since. (I'm not using it heavily though. Is this related to the GPU load or is it somewhat random?)
--- Comment #5 from Marc marvin24@gmx.de 2010-08-16 10:40:17 PDT --- In fact, the bug is a little different now. Instead of a GPU hang, Xorg just blocks in D-state, no dmesg output. I did a cat /proc/`pidof X`/stack and got:
[<ffffffffa01ac7b2>] radeon_fence_wait+0x1d1/0x2ea [radeon]
[<ffffffffa01acf41>] radeon_sync_obj_wait+0x11/0x13 [radeon]
[<ffffffffa009295c>] ttm_bo_wait+0xbe/0x153 [ttm]
[<ffffffffa0095b54>] ttm_bo_move_accel_cleanup+0x8b/0x29f [ttm]
[<ffffffffa01ad07d>] radeon_move_blit+0x12a/0x148 [radeon]
[<ffffffffa01ad420>] radeon_bo_move+0x114/0x13c [radeon]
[<ffffffffa0092da9>] ttm_bo_handle_move_mem+0x1b6/0x2b1 [ttm]
[<ffffffffa009449e>] ttm_bo_evict+0x2e1/0x34a [ttm]
[<ffffffffa009467d>] ttm_mem_evict_first+0x176/0x1a4 [ttm]
[<ffffffffa0094141>] ttm_bo_mem_space+0x3fd/0x479 [ttm]
[<ffffffffa0094b6e>] ttm_bo_move_buffer+0xb3/0x11b [ttm]
[<ffffffffa0094c83>] ttm_bo_validate+0xad/0xf6 [ttm]
[<ffffffffa0094ffe>] ttm_bo_init+0x332/0x36b [ttm]
[<ffffffffa01ae8e9>] radeon_bo_create+0x17f/0x246 [radeon]
[<ffffffffa01beac8>] radeon_gem_object_create+0x7d/0xda [radeon]
[<ffffffffa01beb72>] radeon_gem_create_ioctl+0x4d/0xab [radeon]
[<ffffffffa002543c>] drm_ioctl+0x255/0x34d [drm]
[<ffffffff810f719c>] vfs_ioctl+0x32/0xa6
[<ffffffff810f7aba>] do_vfs_ioctl+0x46a/0x4a3
[<ffffffff810f7b49>] sys_ioctl+0x56/0x79
[<ffffffff81002b9b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
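Although the symptom changed (no WARNING, X stuck in D-state), the blocked call chain can be compared with the one in the original report. A minimal sketch in Python, assuming the usual "[<addr>] symbol+0xoff/0xsize [module]" frame layout; the frames helper and the two trimmed excerpts are illustrative only, and frames marked "?" are skipped because the unwinder flags them as unreliable.

```python
import re

# One stack frame: "[<addr>] [? ]symbol+0xoff/0xsize [module]" (module is optional).
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

# First frames below warn_slowpath_* in the original WARNING trace (trimmed).
WARN_TRACE = """\
[<ffffffffa009ba1b>] radeon_fence_wait+0x35b/0x3c0 [radeon]
[<ffffffff810521f0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa009c1fc>] radeon_sync_obj_wait+0xc/0x10 [radeon]
[<ffffffffa005ad69>] ttm_bo_wait+0xf9/0x1b0 [ttm]
[<ffffffffa005e11f>] ttm_bo_move_accel_cleanup+0x9f/0x2e0 [ttm]
[<ffffffffa009c32f>] radeon_move_blit+0x11f/0x180 [radeon]
"""

# First frames of the /proc/<pid>/stack output from comment #5 (trimmed).
PROC_STACK = """\
[<ffffffffa01ac7b2>] radeon_fence_wait+0x1d1/0x2ea [radeon]
[<ffffffffa01acf41>] radeon_sync_obj_wait+0x11/0x13 [radeon]
[<ffffffffa009295c>] ttm_bo_wait+0xbe/0x153 [ttm]
[<ffffffffa0095b54>] ttm_bo_move_accel_cleanup+0x8b/0x29f [ttm]
[<ffffffffa01ad07d>] radeon_move_blit+0x12a/0x148 [radeon]
"""


def frames(trace):
    """Return (symbol, module) pairs in order, skipping unreliable '?' frames."""
    result = []
    for m in FRAME.finditer(trace):
        if m.group(1) is None:
            result.append((m.group(2), m.group(3) or "kernel"))
    return result


same_path = [s for s, _ in frames(WARN_TRACE)] == [s for s, _ in frames(PROC_STACK)]
print("innermost frame:", frames(PROC_STACK)[0])
print("same call path as the WARNING trace:", same_path)
```

For these excerpts the reliable frames are the same in both traces: X is waiting in radeon_fence_wait, reached from radeon_move_blit via the ttm buffer-move cleanup path.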
--- Comment #6 from Marc marvin24@gmx.de 2010-08-16 10:43:03 PDT --- BTW, this often happens when I display images.google.com (with some images) in firefox and try to scroll. My screen has a 1920x1200 resolution, but my computer at work also crashed today at 1280x1024. Maybe this is not so relevant, but just in case...
--- Comment #7 from Alex Deucher agd5f@yahoo.com 2010-08-16 11:02:01 PDT --- Are you getting these issues specific to d-r-t or are you seeing them on 2.6.36-rc1 or drm-core-next?
--- Comment #8 from Marc marvin24@gmx.de 2010-08-16 14:13:39 PDT --- It happens on d-r-t since the blit cleanup http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commit;h=36d... I also have V2 of the cleanup (see comment #2) applied (and 2.6.35.2 also).
--- Comment #9 from Alex Deucher agd5f@yahoo.com 2010-08-16 14:44:17 PDT --- (In reply to comment #8)
It happens on d-r-t since the blit cleanup http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commit;h=36d... I also have V2 of the cleanup (see comment #2) applied (and 2.6.35.2 also).
That patch is currently busted as is. You need to either revert it, or apply v2 that I posted on dri-devel.
--- Comment #10 from Marc marvin24@gmx.de 2010-08-17 01:39:36 PDT --- Well, that's what I did. Basically, the patch in comment #2 is an interdiff of blit_V1 and blit_V2, so it should produce the same result as unapplying V1 and applying V2, correct?
output of interdiff:
diff -u b/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
--- b/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -448,19 +448,8 @@
 	int num_packet2s = 0;
 
 	/* pin copy shader into vram if already initialized */
-	if (rdev->r600_blit.shader_obj) {
-		r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
-		if (unlikely(r != 0))
-			return r;
-		r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM,
-				  &rdev->r600_blit.shader_gpu_addr);
-		radeon_bo_unreserve(rdev->r600_blit.shader_obj);
-		if (r) {
-			dev_err(rdev->dev, "(%d) pin blit object failed\n", r);
-			return r;
-		}
-		return 0;
-	}
+	if (rdev->r600_blit.shader_obj)
+		goto done;
 
 	mutex_init(&rdev->r600_blit.mutex);
 	rdev->r600_blit.state_offset = 0;
@@ -519,6 +508,18 @@
 	memcpy(ptr + rdev->r600_blit.ps_offset, r6xx_ps, r6xx_ps_size * 4);
 	radeon_bo_kunmap(rdev->r600_blit.shader_obj);
 	radeon_bo_unreserve(rdev->r600_blit.shader_obj);
+
+done:
+	r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
+	if (unlikely(r != 0))
+		return r;
+	r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM,
+			  &rdev->r600_blit.shader_gpu_addr);
+	radeon_bo_unreserve(rdev->r600_blit.shader_obj);
+	if (r) {
+		dev_err(rdev->dev, "(%d) pin blit object failed\n", r);
+		return r;
+	}
 	return 0;
 }
--- Comment #11 from Alex Deucher agd5f@yahoo.com 2010-08-17 07:44:50 PDT --- is this still a problem with the current d-r-t?
Marc marvin24@gmx.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marvin24@gmx.de
--- Comment #12 from Marc marvin24@gmx.de 2010-08-17 12:12:50 PDT --- yes it does
--- Comment #13 from Marc marvin24@gmx.de 2010-08-17 12:14:24 PDT --- (In reply to comment #12)
yes it does
eh - is!
--- Comment #14 from Alex Deucher agd5f@yahoo.com 2010-08-17 12:38:46 PDT --- Can you bisect to see what commit is causing the problem?
--- Comment #15 from Marc marvin24@gmx.de 2010-08-17 14:05:34 PDT --- The bug is hard to trigger (10 min of scrolling with firefox), so bisecting will take a lot of time. I booted with no_wb=1 just for testing and now it seems to work fine. So maybe the blit and the writeback change have some unhealthy relationship. Also, somehow the git history got changed... I'm sure the writeback changes were there many days ago, before the blit change. It may also help to know that the original reporter (Stefano) has a rv620 chip, which is (AFAIK) similar to the rs780/785 chips.
--- Comment #16 from Alex Deucher agd5f@yahoo.com 2010-08-17 14:09:48 PDT --- (In reply to comment #15)
the bug is hard to trigger (10 min scrolling with firefox), so bisecting will take a lot of time. I booted with no_wb=1 just for testing and now it seems to work fine. So maybe the blit and the writeback change have some unhealthy
Writeback has nothing to do with the blit, but it might cause problems on its own. If no_wb=1 fixes the issue, then writeback might not work well on your system.
relationship. Also somehow the git history got changed... I'm sure the writeback changes were there many days ago and before the blit change. Maybe
The branch was rebased.
--- Comment #17 from Andy Furniss lists@andyfurniss.entadsl.com 2010-08-18 03:20:48 PDT --- (In reply to comment #0)
Created an attachment (id=37840)
--> (https://bugs.freedesktop.org/attachment.cgi?id=37840)
system dmesg
Using the latest git (as of 12/08/2010) of libdrm, mesa (classic), xf86-video-ati and drm-radeon-testing (commit "drm/radeon/kms: enable writeback on remaing asics"). The GPU is a hd3470 mobility (rv620), forced to the lowest power state (echo "low" > /sys/class/drm/card0/device/power_profile).
I managed to get the same with the last d-r-t, using low power + gits like you, but this was on a rv790.
It seems like seamonkey was involved, but just to confuse the issue I wasn't running a clean d-r-t or ddx - which may well be irrelevant but -
d-r-t had tiling fixed + 2 cs parser fixes from the list, ddx had wait for vline FALSE and dri2 sync was off in drirc.
Had tested without issue many games, mplayer and mesa demos over the day.
The lockup was triggered when I found a seamonkey bug that makes it spawn a new window every 1/2-1 sec. While this was happening X was unusable due to the constant new windows, so I was switching back and forth between vt2 and vt7. Then it locked up and I didn't get the screen back. After a sysrq reboot the card was still in a bad state: alsa failed to probe the hardware. I carried on into X and looked at the kern log in an xterm OK, but as soon as I started seamonkey it went again.
The reboot went OK this time and, try as I might (triggering the seamonkey bug and switching VTs), I couldn't reproduce it.
Now running current vanilla d-r-t and ddx I have so far failed to trigger it, but then I ran the other d-r-t for days OK.
Thierry Vignaud tvignaud@mandriva.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
--- Comment #18 from Marc marvin24@gmx.de 2010-09-05 10:30:34 PDT --- It seems the bug I was seeing (see comment #5) is fixed by the v3 fencing patch, so for me this is ready to be closed...
--- Comment #19 from Andy Furniss lists@andyfurniss.entadsl.com 2010-09-05 15:45:22 PDT --- (In reply to comment #18)
seems the bug I was seeing (see comment #5) is fixed by v3 fencing patch, so for me it is ready to be closed...
I have failed to reproduce the one lockup I had with various d-r-ts, and now am running d-r-t + v3 fence.
Alex Deucher agd5f@yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED