On Tue, Oct 28, 2014 at 05:06:01PM +0200, Jani Nikula wrote:
On Tue, 28 Oct 2014, Johan Hovold <johan@kernel.org> wrote:
Hi,
I have had some problems with crashes involving suspend-to-disk after updating to v3.16.
Below is a log with 3.16.6 from a failed suspend attempt after which I get a NULL deref in ext4 code.
A couple of weeks ago I got something similar, with backtraces from ext4 (ext4_alloc_inode) and NULL-derefs in vfs (vfs_getattr_nosec) when trying to do IO after resuming from suspend. That was with 3.16.3 and I was hoping that whatever it was would have been fixed in 3.16.6 (there were some ext4 error handling patches in there). I only got photos of those oopses but it involved kmem_cache_alloc (slub) and a NULL-deref in vfs_getattr_nosec. I can put the photos up somewhere. That time I also got back to X and could issue a dmesg in an xterm, but any process trying to do IO died.
Something similar happened with 3.16.1 but unfortunately I do not have any logs from that.
I also have experienced occasional hangs during suspend, but I believe I have seen this with older kernels as well so not sure if related. Seems to be more frequent with 3.16.
This is my main machine so not keen on trying to bisect this on it.
It's an i7-4770 on an Intel DH87MC using the integrated HD Graphics 4600.
I'm CCing the Intel graphics guys due to some drm errors in the logs, and reports of other people having problems involving suspend and this driver.
My first suggestion would be to try to reproduce the NULL deref without i915 loaded, and track the issues you have independently.
I actually don't think this is i915 related; the new drm errors after failed suspend could possibly just be a side effect of whatever is causing the apparent memory corruption. As I mentioned, the first log I have of this does not seem to point at i915 (even if backlight-restore happens when tasks are restarted).
Please file any i915 issues against DRM/Intel at [1].
I'll see if I can get around to that. There are bug reports in various distro trackers about the intel_ddi_pll_enable warning dating back to April.
It's there on every resume. For instance this morning:
[108109.324398] WARNING: CPU: 1 PID: 7298 at /home/johan/src/linux/linux-xi/drivers/gpu/drm/i915/intel_ddi.c:911 intel_ddi_pll_enable+0x233/0x240()
[108109.324398] WRPLL1 already enabled
[108109.324399] Modules linked in:
[108109.324400] CPU: 1 PID: 7298 Comm: kworker/u16:8 Tainted: G W 3.16.6 #1
[108109.324401] Hardware name: /DH87MC, BIOS MCH8710H.86A.0154.2014.0123.1542 01/23/2014
[108109.324403] Workqueue: events_unbound async_run_entry_fn
[108109.324405] 0000000000000000 0000000000000009 ffffffff81739c03 ffff88053e89baf8
[108109.324405] ffffffff810850f6 ffff8807fadf0000 00000000b035061f 0000000000000001
[108109.324406] 0000000000046040 ffffffff81a10a41 ffffffff810851d5 ffffffff81a10a83
[108109.324407] Call Trace:
[108109.324410] [<ffffffff81739c03>] ? dump_stack+0x49/0x6a
[108109.324412] [<ffffffff810850f6>] ? warn_slowpath_common+0x86/0xb0
[108109.324414] [<ffffffff810851d5>] ? warn_slowpath_fmt+0x45/0x50
[108109.324415] [<ffffffff814445c3>] ? intel_ddi_pll_enable+0x233/0x240
[108109.324417] [<ffffffff814208ea>] ? haswell_crtc_mode_set+0x1a/0x30
[108109.324419] [<ffffffff8142e168>] ? __intel_set_mode+0x6a8/0x1590
[108109.324420] [<ffffffff814335f7>] ? intel_modeset_setup_hw_state+0x817/0xd10
[108109.324422] [<ffffffff813d4ae9>] ? drm_modeset_lock_all_crtcs+0x39/0x50
[108109.324424] [<ffffffff81328570>] ? pci_pm_suspend_noirq+0x1b0/0x1b0
[108109.324426] [<ffffffff813d719e>] ? __i915_drm_thaw+0x11e/0x1a0
[108109.324426] [<ffffffff813d786f>] ? i915_resume+0x1f/0x40
[108109.324428] [<ffffffff814749ef>] ? dpm_run_callback+0x4f/0x150
[108109.324428] [<ffffffff814756b3>] ? device_resume+0x93/0x1d0
[108109.324429] [<ffffffff81475804>] ? async_resume+0x14/0x40
[108109.324430] [<ffffffff810aaabd>] ? async_run_entry_fn+0x2d/0x120
[108109.324433] [<ffffffff8109eb58>] ? process_one_work+0x158/0x410
[108109.324434] [<ffffffff8109f376>] ? worker_thread+0x116/0x510
[108109.324435] [<ffffffff810c11ec>] ? __wake_up_common+0x4c/0x80
[108109.324436] [<ffffffff8109f260>] ? init_pwq+0x160/0x160
[108109.324437] [<ffffffff810a538c>] ? kthread+0xbc/0xe0
[108109.324439] [<ffffffff810a0000>] ? workqueue_sysfs_register+0x110/0x150
[108109.324440] [<ffffffff810a52d0>] ? kthread_freezable_should_stop+0x60/0x60
[108109.324442] [<ffffffff81741aac>] ? ret_from_fork+0x7c/0xb0
[108109.324443] [<ffffffff810a52d0>] ? kthread_freezable_should_stop+0x60/0x60
Thanks, Johan