Unfortunately we've never really made use of creating/destroying connectors on the fly like we have with MST, so 4.6 ended up showing a lot of various bugs with hotplugging MST displays, booting with MST displays, etc. Most of these bugs are very likely to panic the kernel, and a couple of them end up even doing out of bounds memory accesses causing all sorts of other issues.
The proper fix for these issues is Dave's connector lifetime patch series in 4.7-rc1[1], however backporting those patches would be too big of a fix to submit for stable. This patch series is a much smaller set of changes to workaround this issue.
As another note: the first two patches in this series are already upstream for 4.7-rc1, however since we have the connector ref lifetime patches in 4.7-rc1 they don't fix any kernel panics there, only a few inconsistencies in i915/drm's code. They do however, fix kernel panics for 4.6 since connectors getting destroyed can make dev->mode_config.num_connector have a different value from fb_helper->connector_count.
[1]: 0552f7651bc233e5407ab06ba97a9d7c25e19580 in master
Lyude (4): drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config() drm/fb_helper: Fix references to dev->mode_config.num_connector drm/i915: Discard previous atomic state on resume if connectors change drm/atomic: Verify connector->funcs != NULL when clearing states
drivers/gpu/drm/drm_atomic.c | 2 +- drivers/gpu/drm/drm_fb_helper.c | 5 ++--- drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++ drivers/gpu/drm/i915/intel_fbdev.c | 6 +++--- 4 files changed, 18 insertions(+), 7 deletions(-)
commit 14a3842a1d5945067d1dd0788f314e14d5b18e5b upstream.
During boot time, MST devices usually send a ton of hotplug events irregardless of whether or not any physical hotplugs actually occurred. Hotplugs mean connectors being created/destroyed, and the number of DRM connectors changing under us. This isn't a problem if we use fb_helper->connector_count since we only set it once in the code, however if we use num_connector from struct drm_mode_config we risk it's value changing under us. On top of that, there's even a chance that dev->mode_config.num_connector != fb_helper->connector_count. If the number of connectors happens to increase under us, we'll end up using the wrong array size for memcpy and start writing beyond the actual length of the array, occasionally resulting in kernel panics.
This fix is only required for 4.6 and below. David Airlie's patchseries for 4.7 to add connector reference counting provides a more proper fix for this.
Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted connectors. (v3)") Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Lyude cpaul@redhat.com --- drivers/gpu/drm/i915/intel_fbdev.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c index 97a91e6..c607217 100644 --- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -366,12 +366,12 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper, uint64_t conn_configured = 0, mask; int pass = 0;
- save_enabled = kcalloc(dev->mode_config.num_connector, sizeof(bool), + save_enabled = kcalloc(fb_helper->connector_count, sizeof(bool), GFP_KERNEL); if (!save_enabled) return false;
- memcpy(save_enabled, enabled, dev->mode_config.num_connector); + memcpy(save_enabled, enabled, fb_helper->connector_count); mask = (1 << fb_helper->connector_count) - 1; retry: for (i = 0; i < fb_helper->connector_count; i++) { @@ -510,7 +510,7 @@ retry: if (fallback) { bail: DRM_DEBUG_KMS("Not using firmware configuration\n"); - memcpy(enabled, save_enabled, dev->mode_config.num_connector); + memcpy(enabled, save_enabled, fb_helper->connector_count); kfree(save_enabled); return false; }
commit 255f0e7c418ad95a4baeda017ae6182ba9b3c423 upstream.
During boot, MST hotplugs are generally expected (even if no physical hotplugging occurs) and result in DRM's connector topology changing. This means that using num_connector from the current mode configuration can lead to the number of connectors changing under us. This can lead to some nasty scenarios in fbcon:
- We allocate an array to the size of dev->mode_config.num_connectors. - MST hotplug occurs, dev->mode_config.num_connectors gets incremented. - We try to loop through each element in the array using the new value of dev->mode_config.num_connectors, and end up going out of bounds since dev->mode_config.num_connectors is now larger then the array we allocated.
fb_helper->connector_count however, will always remain consistent while we do a modeset in fb_helper.
This fix is only required for 4.6 and below. David Airlie's patchseries for 4.7 to add connector reference counting provides a more proper fix for this.
Changes since v1: - Remove leftover *dev variable from drm_pick_crtcs since it's not used now
Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted connectors. (v3)") Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Lyude cpaul@redhat.com --- drivers/gpu/drm/drm_fb_helper.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 855108e..fe4df97 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1895,7 +1895,6 @@ static int drm_pick_crtcs(struct drm_fb_helper *fb_helper, int n, int width, int height) { int c, o; - struct drm_device *dev = fb_helper->dev; struct drm_connector *connector; const struct drm_connector_helper_funcs *connector_funcs; struct drm_encoder *encoder; @@ -1914,7 +1913,7 @@ static int drm_pick_crtcs(struct drm_fb_helper *fb_helper, if (modes[n] == NULL) return best_score;
- crtcs = kzalloc(dev->mode_config.num_connector * + crtcs = kzalloc(fb_helper->connector_count * sizeof(struct drm_fb_helper_crtc *), GFP_KERNEL); if (!crtcs) return best_score; @@ -1960,7 +1959,7 @@ static int drm_pick_crtcs(struct drm_fb_helper *fb_helper, if (score > best_score) { best_score = score; memcpy(best_crtcs, crtcs, - dev->mode_config.num_connector * + fb_helper->connector_count * sizeof(struct drm_fb_helper_crtc *)); } }
If an MST device is disconnected while the machine is suspended, the number of connectors will change as well after we call intel_dp_mst_resume(). This means that any previous atomic state we had before suspending is no longer valid, since it'll still be pointing to missing connectors. We need to check for this before committing the state, otherwise we'll kernel panic on resume whenever if any MST display was disconnected before we started resuming:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper] Call Trace: [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915] [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0 [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm] [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90 [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm] [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915] [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90 [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915] [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915] [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0 [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190 [<ffffffff814da455>] device_resume+0xd5/0x1f0 [<ffffffff814da58d>] async_resume+0x1d/0x50 [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150 [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0 [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0 [<ffffffff810ad038>] worker_thread+0x48/0x4e0 [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0 [<ffffffff810b3794>] kthread+0xe4/0x100 [<ffffffff81742672>] ret_from_fork+0x22/0x50 [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200
Changes since v1: - Move drm_atomic_state_free() call down so we're holding the appropriate locks when destroying the atomic state Changes since v2: - Check that state != NULL before we start accessing it's members
This fix is only required for 4.6 and below. David Airlie's patchseries for 4.7 to add connector reference counting provides a more proper fix for this.
Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted connectors. (v3)") Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Lyude cpaul@redhat.com --- drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 0104a06..6fdb90e 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -15958,6 +15958,18 @@ void intel_display_resume(struct drm_device *dev) retry: ret = drm_modeset_lock_all_ctx(dev, &ctx);
+ /* + * With MST, the number of connectors can change between suspend and + * resume, which means that the state we want to restore might now be + * impossible to use since it'll be pointing to non-existant + * connectors. + */ + if (ret == 0 && state && + state->num_connector != dev->mode_config.num_connector) { + drm_atomic_state_free(state); + state = NULL; + } + if (ret == 0 && !setup) { setup = true;
Unfortunately since we don't have Dave's connector refcounting patch here yet, it's very possible that drm_atomic_state_default_clear() could get called by intel_display_resume() when intel_dp_mst_destroy_connector() isn't completely finished destroying an mst connector, but has already finished setting connector->funcs to NULL. As such, we need to treat the connector like it's already been destroyed and just skip it, otherwise we'll end up dereferencing a NULL pointer.
This fix is only required for 4.6 and below. David Airlie's patchseries for 4.7 to add connector reference counting provides a more proper fix for this.
Changes since v1: - Fix leftover whitespace
Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted connectors. (v3)") Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Lyude cpaul@redhat.com --- drivers/gpu/drm/drm_atomic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 8ee1db8..d307d96 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -139,7 +139,7 @@ void drm_atomic_state_default_clear(struct drm_atomic_state *state) for (i = 0; i < state->num_connector; i++) { struct drm_connector *connector = state->connectors[i];
- if (!connector) + if (!connector || !connector->funcs) continue;
/*
dri-devel@lists.freedesktop.org