Hi.
I've got Ubuntu Saucy on a desktop with an Intel G41 chipset (Asus P5G41M LE/CSM). It exhibits weird behaviour. Kernel: Linux kolya 3.11.0-18-generic.
When I boot the system, log into an X session and run xrandr, I get this:
Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 531mm x 299mm
   1920x1080      60.0*+
   1280x1024      75.0     60.0
   1152x864       75.0
   1024x768       75.1     60.0
   800x600        75.0     60.3
   640x480        75.0     60.0
   720x400        70.1
DP1 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
In this state I can switch to a text mode console and it works fine.
Then I suspend/resume the machine and run xrandr again, with the following result:
Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 disconnected 1920x1080+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
DP1 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
  1920x1080 (0x47)  148.5MHz
        h: width  1920 start 2008 end 2052 total 2200 skew    0 clock   67.5KHz
        v: height 1080 start 1084 end 1089 total 1125           clock   60.0Hz
Note that all outputs are now reported as disconnected. Fortunately I still see my desktop.
But unfortunately, when I switch to a text console my monitor goes into power saving mode and doesn't come back even if I type on the keyboard. Also, if I leave my X session (i.e. log out), my monitor goes to sleep on all virtual terminals and I have to reboot the box.
I've tried to 'redetect' screens in my X session but it doesn't help to fix the issue. Disconnecting/reconnecting screen doesn't help as well. Reboot helps.
Unfortunately I do not know the Linux graphics system well enough to classify the issue properly. I would really appreciate some guidance here.
1) Is this a known issue with a known solution?
2) Should I file a bug? If so, where, and what additional information would be useful in that bug?
I would really appreciate any help and I'll gladly provide any additional information that might be required (or test things).
Thanks!
On Sun, 16 Mar 2014, Nikolay Martynov mar.kolya@gmail.com wrote:
Not that I can recall, but that doesn't prove anything...
Please do, on DRM/Intel component at [1]. We'll tend to not forget stuff so easy if they're filed as bugs. Please attach dmesg from early boot with drm.debug=0xe module parameter enabled. Please see if running drm-intel-nightly branch of [2] helps.
Thanks, Jani.
[1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI [2] http://cgit.freedesktop.org/drm-intel/
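For readers wondering what drm.debug=0xe actually turns on: the value is a bitmask of DRM debug categories. The sketch below decodes such a mask, assuming the category bits defined in include/drm/drmP.h of 3.14-era kernels (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08). The helper drm_debug_describe() is hypothetical and exists only for this illustration; it is not a kernel or libdrm function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Category bits as assumed from 3.14-era include/drm/drmP.h. */
#define DRM_UT_CORE   0x01u
#define DRM_UT_DRIVER 0x02u
#define DRM_UT_KMS    0x04u
#define DRM_UT_PRIME  0x08u

static const struct {
	unsigned int bit;
	const char *name;
} drm_debug_categories[] = {
	{ DRM_UT_CORE,   "CORE"   },
	{ DRM_UT_DRIVER, "DRIVER" },
	{ DRM_UT_KMS,    "KMS"    },
	{ DRM_UT_PRIME,  "PRIME"  },
};

/*
 * Hypothetical helper: write the space-separated names of the categories
 * enabled in 'mask' into 'buf' and return the number of characters written.
 */
size_t drm_debug_describe(unsigned int mask, char *buf, size_t len)
{
	size_t used = 0;
	size_t count = sizeof(drm_debug_categories) /
		       sizeof(drm_debug_categories[0]);

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < count; i++) {
		if (!(mask & drm_debug_categories[i].bit))
			continue;

		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", drm_debug_categories[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return used;
}
```

Called with 0xe, the helper reports the DRIVER, KMS and PRIME categories and leaves out the noisier CORE messages.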
Hi.
Thank you for your reply. I'll post debug output shortly.
I would really appreciate it if you could point me to some sort of manual that describes how to properly run [2] on Ubuntu. Should I run the whole kernel or just some modules? Is there any way to build only the required modules? Sorry, my knowledge is somewhat limited in this regard.
Thanks!
On Mon, 17 Mar 2014, Nikolay Martynov mar.kolya@gmail.com wrote:
The whole kernel. See https://wiki.ubuntu.com/KernelTeam/GitKernelBuild for a starting point; please use the drm-intel-nightly branch in our tree instead of Linus' tree.
BR, Jani.
Hi.
Thank you so much for that link. I've created a bug report about original suspend/resume issue: https://bugs.freedesktop.org/show_bug.cgi?id=76301.
I was able to build a drm-intel-nightly kernel using the instructions from the above link, but unfortunately I ran into issues:
- It looks like I get unsigned kernel modules, and this prevents proper debug output. Hopefully I will be able to resolve this on my own, but I can't provide a useful dmesg just yet.
- The drm-intel-nightly '6e052fec0cc204f4d2a0f71f45c0363971ad10dc' kernel hangs during boot on my aforementioned hardware. I've tried several times and mostly I get a black screen. A couple of times, though, I got some error output that said something about softlock and lost interrupts; it printed those errors slowly, about once every 30 seconds. Unfortunately I do not have a hard copy of that. The interesting thing is that when I add 'nomodeset' to the boot parameters I'm able to boot successfully.
I take it this is not expected and is probably a regression. I would appreciate it if you could suggest a course of action to debug this (and probably create another bug report).
Thanks!
On Tue, 18 Mar 2014, Nikolay Martynov mar.kolya@gmail.com wrote:
Perhaps you failed to install the modules that go with the kernel?
BR, Jani.
Hi
I've built today's git version. The system boots, but shortly after I log into an X session the system freezes.
I can login via ssh and see dmesg:
[ 58.699131] general protection fault: 0000 [#1] SMP
[ 58.699173] Modules linked in: nls_utf8 udf crc_itu_t rfcomm bnep bluetooth 6lowpan_iphc binfmt_misc snd_usb_audio snd_usbmidi_lib gpio_ich ppdev gspca_zc3xx gspca_main videodev snd_hda_codec_realtek dm_multipath snd_hda_codec_generic snd_hda_intel scsi_dh snd_hda_codec snd_hwdep snd_pcm snd_seq_midi coretemp snd_seq_midi_event kvm_intel snd_rawmidi snd_seq kvm snd_seq_device snd_timer snd lpc_ich soundcore serio_raw parport_pc lp mac_hid asus_atk0110 parport raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear hid_generic usbhid hid raid1 i915 atl1e video i2c_algo_bit drm_kms_helper drm
[ 58.699510] CPU: 0 PID: 1638 Comm: Xorg Tainted: G W 3.14.0-rc7-custom #2
[ 58.699537] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0506 06/11/2010
[ 58.699578] task: ffff8800d0f498d0 ti: ffff8800cfcb8000 task.ti: ffff8800cfcb8000
[ 58.699603] RIP: 0010:[<ffffffffa00a4e3a>] [<ffffffffa00a4e3a>] i915_gem_object_set_cache_level+0x8a/0x310 [i915]
[ 58.699677] RSP: 0018:ffff8800cfcb9d60 EFLAGS: 00010246
[ 58.699701] RAX: ffff880036444000 RBX: dead000000100098 RCX: ffff8800cfcfd158
[ 58.699725] RDX: ffff8800cfcfcfd8 RSI: ffff8800d086d0d0 RDI: ffff88011b401800
[ 58.699749] RBP: ffff8800cfcb9d90 R08: 0000000000017340 R09: ffff88011fc17340
[ 58.699772] R10: ffffea0003421b40 R11: ffffffffa00a1e33 R12: ffff8800cfcfcf00
[ 58.699796] R13: 0000000000000001 R14: ffff8800cf36c800 R15: ffff8800cfcfcfc0
[ 58.699821] FS: 00007f3af20e1980(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 58.699856] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 58.699876] CR2: 00007f52d8659000 CR3: 00000000d0fa2000 CR4: 00000000000407f0
[ 58.699899] Stack:
[ 58.699910] 00000001cfcb9d88 ffff8800cf36c800 ffff8800cfcfcf00 00000000fffffffe
[ 58.699945] 0000000000000001 ffff8800cf36c800 ffff8800cfcb9dc0 ffffffffa00a5144
[ 58.699978] ffff8800d0ea3000 000000000000006f fffffffffffffff2 ffff8800cfcb9e20
[ 58.700010] Call Trace:
[ 58.700010] [<ffffffffa00a5144>] i915_gem_set_caching_ioctl+0x84/0xf0 [i915]
[ 58.700010] [<ffffffffa0003bc2>] drm_ioctl+0x4d2/0x600 [drm]
[ 58.700010] [<ffffffff8101ce55>] ? native_sched_clock+0x35/0x90
[ 58.700010] [<ffffffff8101ce55>] ? native_sched_clock+0x35/0x90
[ 58.700010] [<ffffffff8101ceb9>] ? sched_clock+0x9/0x10
[ 58.700010] [<ffffffff811d7880>] do_vfs_ioctl+0x2e0/0x4c0
[ 58.700010] [<ffffffff810a20f4>] ? vtime_account_user+0x54/0x60
[ 58.700010] [<ffffffff811d7ae1>] SyS_ioctl+0x81/0xa0
[ 58.700010] [<ffffffff817304ff>] tracesys+0xe1/0xe6
[ 58.700010] Code: 90 f6 40 40 0f 0f 85 46 01 00 00 48 8b 42 68 49 39 c7 48 8d 50 98 75 e9 44 8b 6d d4 0f 1f 44 00 00 49 8b 46 30 f6 40 1c 20 75 25 <f6> 43 20 20 74 1f 4c 89 ee 48 89 df e8 95 9e ff ff 84 c0 75 10
[ 58.700010] RIP [<ffffffffa00a4e3a>] i915_gem_object_set_cache_level+0x8a/0x310 [i915]
[ 58.700010] RSP <ffff8800cfcb9d60>
[ 58.725088] ---[ end trace c84dd3681cbb815b ]---
Is this something expected on current git (ec45c7550806d1373db6915a4031a7ae2542d61f)? Thanks!
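One detail in the register dump is worth a closer look: RBX holds dead000000100098, which is not a plausible kernel pointer. The following is an interpretation and is not stated anywhere in the thread. Assuming the x86_64 values from 3.14-era include/linux/poison.h (POISON_POINTER_DELTA of 0xdead000000000000 and LIST_POISON1 of 0x00100100 plus that delta), and assuming the list link sits 0x68 bytes into its containing structure (read off the "48 8b 42 68" and "48 8d 50 98" instruction bytes in the Code: line), the value looks like a container pointer computed from the next pointer of an entry that had already been removed from its list. The helper name container_of_addr() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values from 3.14-era include/linux/poison.h. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1         (0x00100100ULL + POISON_POINTER_DELTA)

/*
 * Assumed offset of the list link inside its containing structure, taken
 * from the "mov 0x68(%rdx),%rax" / "lea -0x68(%rax),%rdx" pair encoded in
 * the Code: bytes of the oops.
 */
#define VMA_LINK_OFFSET 0x68ULL

/*
 * Hypothetical helper doing the same arithmetic as container_of(): turn the
 * address of an embedded link into the address of the enclosing structure.
 */
uint64_t container_of_addr(uint64_t link_addr, uint64_t link_offset)
{
	assert(link_addr >= link_offset);
	return link_addr - link_offset;
}
```

Under those assumptions, container_of_addr(LIST_POISON1, VMA_LINK_OFFSET) yields 0xdead000000100098, the exact value in RBX, which would point at a list walk continuing through an entry that was deleted during the walk.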
On Wed, Mar 19, 2014 at 08:15:05PM -0400, Nikolay Martynov wrote:
QA just reported it as well, https://bugs.freedesktop.org/show_bug.cgi?id=76384
If you can find the corresponding line for i915_gem_object_set_cache_level+0x8a (gdb /path/to/i915.ko; list *i915_gem_object_set_cache_level+0x8a) that would help. -Chris
2014-03-20 3:46 GMT-04:00 Chris Wilson chris@chris-wilson.co.uk:
(gdb) list *i915_gem_object_set_cache_level+0x8a
0x24e3a is in i915_gem_object_set_cache_level (drivers/gpu/drm/i915/i915_gem.c:3147).
3142	 * crossing memory domains and dying.
3143	 */
3144	if (HAS_LLC(dev))
3145		return true;
3146
3147	if (!drm_mm_node_allocated(gtt_space))
3148		return true;
3149
3150	if (list_empty(&gtt_space->node_list))
3151		return true;
Please let me know if there's anything else I can do.
On Thu, Mar 20, 2014 at 09:38:17AM -0400, Nikolay Martynov wrote:
Can you please try:
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 13fc490d1f62..4f71125493fd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3676,7 +3676,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level)
 {
 	struct drm_device *dev = obj->base.dev;
-	struct i915_vma *vma;
+	struct i915_vma *vma, *next;
 	int ret;
 
 	if (obj->cache_level == cache_level)
@@ -3687,7 +3687,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return -EBUSY;
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
 		if (!i915_gem_valid_gtt_space(dev, &vma->node, cache_level)) {
 			ret = i915_vma_unbind(vma);
 			if (ret)
2014-03-20 9:43 GMT-04:00 Chris Wilson chris@chris-wilson.co.uk:
Yes, that seems to help. It didn't freeze anymore in the 15 minutes I used it. Thanks!
On Thu, Mar 20, 2014 at 07:17:00PM -0400, Nikolay Martynov wrote:
Thanks indeed,
commit 3f5e0f06a3355a77ace053b4ffc0ac1c413cf2d0
Author: Chris Wilson chris@chris-wilson.co.uk
Date: Fri Mar 21 07:40:56 2014 +0000

    drm/i915: Fix unsafe loop iteration over vma whilst unbinding them

    On non-LLC platforms, when changing the cache level of an object, we may need to unbind it so that prefetching across page boundaries does not cross into a different memory domain. This requires us to unbind conflicting vma, but we did so iterating over the objects vma in an unsafe manner (as the list was being modified as we iterated).

    The regression was introduced in
      commit 3089c6f239d7d2c4cb2dd5c353e8984cf79af1d7
      Author: Ben Widawsky ben@bwidawsk.net
      Date: Wed Jul 31 17:00:03 2013 -0700

          drm/i915: make caching operate on all address spaces

    apparently as far back as v3.12-rc1, but it has only just begun to trigger real world bug reports.

    Reported-and-tested-by: Nikolay Martynov mar.kolya@gmail.com
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384
    Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
    Cc: Ben Widawsky ben@bwidawsk.net
    Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
Now you can get back to your original bug :( -Chris
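The pattern behind that fix can be shown outside the kernel. The following is a minimal userspace sketch, not the i915 code: it models a circular doubly linked list in the style of struct list_head and removes entries while walking it. All names (struct node, struct item, unbind_conflicting and the small helpers) are made up for the illustration. The point is the loop header, which saves the next pointer before the body runs, as list_for_each_entry_safe() does, so the current entry can be unlinked and freed without the walk reading freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Circular doubly linked list with a sentinel head (illustrative names). */
struct node {
	struct node *prev, *next;
};

/* Stand-in for a vma: an entry that may have to be unbound and freed. */
struct item {
	int conflicts;
	struct node link;
};

void list_init(struct node *head)
{
	head->prev = head->next = head;
}

void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* a stale walk would now follow NULL */
}

/* Allocate an item and append it at the tail of the list. */
struct item *item_append(struct node *head, int conflicts)
{
	struct item *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->conflicts = conflicts;
	it->link.prev = head->prev;
	it->link.next = head;
	head->prev->next = &it->link;
	head->prev = &it->link;
	return it;
}

size_t list_length(const struct node *head)
{
	size_t n = 0;

	for (const struct node *pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/*
 * Remove and free every conflicting item; returns how many were removed.
 * 'next' is captured before the body runs, so deleting 'pos' is safe.
 */
int unbind_conflicting(struct node *head)
{
	struct node *pos, *next;
	int removed = 0;

	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		struct item *it =
			(struct item *)((char *)pos - offsetof(struct item, link));

		if (it->conflicts) {
			list_del(pos);
			free(it);
			removed++;
		}
	}
	return removed;
}
```

Had the loop advanced with pos = pos->next after the body instead, the first removal would have made it dereference an entry that had just been unlinked and freed, which is the kind of failure the commit message describes.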
dri-devel@lists.freedesktop.org