https://bugs.freedesktop.org/show_bug.cgi?id=29140
Summary: [rs690] Freeze at Xorg startup when using KMS
Product: DRI
Version: XOrg CVS
Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: steckdenis@yahoo.fr
Created an attachment (id=37174) --> (https://bugs.freedesktop.org/attachment.cgi?id=37174) Complete dmesg (radeon module loaded just after the last line speaking about a segfault of glxinfo)
Hello,
I have an ATI Radeon X1270 card (rs690m) with 128 MiB of sideport memory and 256 MiB of shared RAM. This memory configuration triggered bug #27529.
I use the 2.6.35-rc5 kernel (linus-tree), Mesa git (as of the 17th of July 2010, commit 184abe8e26f76a50ede43d503aa6bf129d8d6b76, "llvmpipe: Remove unused variable in lp_test_sincos."), libdrm git (as of the first of July 2010, commit b803918f3f77c62edf22e78cb2095be399753423, "drm mode: Return -errno on drmIoctl() failure") and xf86-video-ati git (as of the 15th of July, commit cdeb1949c820242f05a8897d3ddd0718f204dacf, "kms: don't call cursor helper if using software cursor").
Because of bug #27529, I used to boot my netbook with the "radeon.modeset=0" parameter to avoid screen corruption.
Then I went on vacation. When I got back home, I wanted to try the latest versions of all my favorite software, especially the shiny new kernel containing the fix for #27529.
I updated my whole software stack to the versions mentioned in the second paragraph, changed "radeon.modeset=0" to "radeon.modeset=1" and rebooted.
The reboot went fine. All my services started up nicely, except KDM. When the X server attempts to start, the screen goes black, with a small text cursor in its top-left corner. This cursor neither moves nor blinks. The computer goes no further in the boot sequence; it freezes.
I rebooted using the nomodeset option and killed KDM (which had started fine). I then launched a small shell loop that continuously writes the dmesg output to a file.
While it was running, I unloaded the radeon module and reloaded it with the "modeset=1" parameter. All went fine, my screen was ok, with my text console on it. Then, I restarted KDM, which attempted to start Xorg.
The same bug occurred again. During the "freeze", I could see activity on my hard disk drive (the dmesg loop contains a sync). This makes me think that the computer is not hard frozen, only the graphical part. I could not switch to any virtual console at this point, so I rebooted.
I attached my complete dmesg, as captured by my small script.
Thanks for resolving this bug. I hope it is the last one I will see before having a nice KMS-enabled netbook and being able to try Gallium3D, which is needed to have the very nice Blur effect of KWin on my hardware.
Alex Deucher agd5f@yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #37174|text/x-log                  |text/plain
          mime type|                            |
  Attachment #37174|0                           |1
           is patch|                            |
steckdenis@yahoo.fr changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Product|DRI                         |xorg
            Version|XOrg CVS                    |git
          Component|DRM/Radeon                  |Driver/Radeon
         AssignedTo|dri-devel@lists.freedesktop |xorg-driver-ati@lists.x.org
                   |.org                        |
          QAContact|                            |xorg-team@lists.x.org
--- Comment #1 from steckdenis@yahoo.fr 2010-07-23 02:05:34 PDT ---
Hello,
I tested the new 2.6.35-rc6 kernel, and the bug also happens with this one. I use Mesa Git as of this morning, and the same xf86-video-ati and libdrm versions as in my last post.
I also tested with Option "NoAccel" "true" in my xorg.conf, and the bug did not happen. src/radeon_kms.c in the xf86-video-ati driver loads the EXA Xorg module when NoAccel is "false", so EXA was not loaded when NoAccel was "true". The bug may be there.
steckdenis@yahoo.fr changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Product|xorg                        |DRI
            Version|git                         |DRI CVS
          Component|Driver/Radeon               |DRM/Radeon
         AssignedTo|xorg-driver-ati@lists.x.org |dri-devel@lists.freedesktop
                   |                            |.org
          QAContact|xorg-team@lists.x.org       |
steckdenis@yahoo.fr changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[rs690] Freeze at Xorg      |[rs690] Freeze at Xorg
                   |startup when using KMS      |startup when using KMS and
                   |                            |multiple screens
--- Comment #8 from steckdenis@yahoo.fr 2010-08-09 06:00:25 PDT ---
Created an attachment (id=37722) --> (https://bugs.freedesktop.org/attachment.cgi?id=37722)
Strace output when running Xorg
Hello,
I tried today to reproduce this bug using Xorg Git, Linux 2.6.35 and Mesa Git. The bug happened again, but this time I have some very interesting information for you.
GDB wasn't helpful because the bug is in the radeon kernel module. I discovered that Linux prints a complete kernel stack trace to dmesg when a task stays blocked (here, on a mutex) for more than 120 seconds. As luck would have it, that is exactly what is happening with Xorg, so I have a stack trace:
INFO: task Xorg:2948 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg D 00000000ffffc694 0 2948 1 0x00400005
 ffff8800683e7788 0000000000000082 ffff880001814f00 ffff8800683a1180
 0000000000014f00 0000000000014f00 ffff8800683e7fd8 ffff8800683e7fd8
 ffff8800683e7fd8 ffff88006c1df780 ffff8800683e7fd8 0000000000014f00
Call Trace:
 [<ffffffff81356bef>] __mutex_lock_slowpath+0x13f/0x310
 [<ffffffff81356dd1>] mutex_lock+0x11/0x30
 [<ffffffffa04feba5>] radeon_ring_lock+0x25/0x50 [radeon]
 [<ffffffffa0511f01>] r300_gpu_is_lockup+0x71/0x190 [radeon]
 [<ffffffffa04e753e>] radeon_fence_wait+0x33e/0x3d0 [radeon]
 [<ffffffff8106f610>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa04e6f85>] ? radeon_fence_emit+0xe5/0x130 [radeon]
 [<ffffffffa0545a58>] radeon_pm_set_clocks+0x3c8/0x5f0 [radeon]
 [<ffffffff81356ce3>] ? __mutex_lock_slowpath+0x233/0x310
 [<ffffffffa0546908>] radeon_pm_compute_clocks+0xd8/0x270 [radeon]
 [<ffffffffa04dadf3>] atombios_crtc_mode_fixup+0x23/0x40 [radeon]
 [<ffffffffa044a06b>] drm_crtc_helper_set_mode+0x15b/0x3f0 [drm_kms_helper]
 [<ffffffffa0506a7a>] ? r100_cs_packet_next_reloc+0x4a/0x1e0 [radeon]
 [<ffffffffa044abe7>] drm_crtc_helper_set_config+0x797/0x820 [drm_kms_helper]
 [<ffffffffa03e6ccf>] ? drm_mode_object_find+0x5f/0x80 [drm]
 [<ffffffffa03e7f9f>] drm_mode_setcrtc+0x2cf/0x3a0 [drm]
 [<ffffffffa03da99c>] drm_ioctl+0x37c/0x460 [drm]
 [<ffffffffa03e7cd0>] ? drm_mode_setcrtc+0x0/0x3a0 [drm]
 [<ffffffff8112d04c>] vfs_ioctl+0x3c/0xd0
 [<ffffffff8112d62c>] do_vfs_ioctl+0x7c/0x500
 [<ffffffff8112db29>] sys_ioctl+0x79/0x90
 [<ffffffff8100a017>] tracesys+0xd9/0xde
To be even more complete, I launched Xorg under strace to see when things happen. I attached the strace output to this bug.
The last line, which is incomplete, is where Xorg calls DRM_IOCTL_MODE_SETCRTC. The two preceding ioctls are DRM_IOCTL_MODE_ADDFB followed by DRM_IOCTL_MODE_SETGAMMA.
This bug doesn't happen when I use only one monitor (the internal LVDS); it happens only when I also use my external VGA monitor (without it, I think DRM_IOCTL_MODE_SETCRTC is never called).
I use an ATI Radeon X1270 (rs690m with 128 MiB of sideport memory) on a Packard Bell Dot/MA.FR netbook (it's the same as the Gateway LT3103u, but with a Packard Bell logo on it :) ).
I hope this information will help you.
--- Comment #9 from steckdenis@yahoo.fr 2010-08-12 09:57:45 PDT ---
Hello,
I think I found the problem, but I am unfortunately unable to fix it (I don't know the radeon module well enough).
A change between the 2.6.34 and 2.6.35 kernels added a bunch of functions to drivers/gpu/drm/radeon/radeon_pm.c. The function that causes me trouble is radeon_pm_set_clocks(struct radeon_device *rdev).
This function begins by locking three mutexes, including rdev->cp.mutex.
My card is an r300, so the code goes through the "else" branch of the if. This branch contains a call to radeon_fence_emit.
Now in radeon_fence.c: I don't know how, but this function ends up calling radeon_fence_wait. The problem is that radeon_fence_wait calls r300_gpu_is_lockup, by branching into "if (unlikely(!radeon_fence_signaled(fence))) {".
In r300.c: r300_gpu_is_lockup, called by radeon_fence_wait, calls radeon_ring_lock because it wants to write to the ring.
In radeon_ring.c: radeon_ring_lock begins by calling "mutex_lock(&rdev->cp.mutex);", the exact same mutex as the one already locked by radeon_pm_set_clocks. That seems to be the problem.
Cheers.
--- Comment #10 from Alex Deucher agd5f@yahoo.com 2010-08-12 16:17:12 PDT ---
Created an attachment (id=37828)
View: https://bugs.freedesktop.org/attachment.cgi?id=37828
Review: https://bugs.freedesktop.org/review?bug=29140&attachment=37828
possible fix
Does this patch help?
--- Comment #11 from steckdenis@yahoo.fr 2010-08-13 00:55:19 PDT ---
I applied the patch to a vanilla 2.6.35.1 kernel, and it works!
Thanks.
--- Comment #12 from Alex Deucher agd5f@yahoo.com 2010-08-13 07:54:14 PDT ---
I've sent the patch to Dave. Thanks for tracking this down.
Jerome Glisse glisse@freedesktop.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
--- Comment #13 from Jerome Glisse glisse@freedesktop.org 2010-08-16 10:15:28 PDT ---
Closing