When a main MST port is disconnected drivers should call drm_dp_mst_topology_mgr_set_mst() disabling the MST manager, this function will set manager mst_primary to NULL and it will cause the crash bellow on the next atomic check when trying to access mst_primary->port.
As there is no use in running checks over managers that are not active this patch will skip it.
[ 305.616450] [drm:drm_dp_mst_atomic_check] [MST PORT:00000000cc2049e9] releases all VCPI slots [ 305.625085] [drm:drm_dp_mst_atomic_check] [MST PORT:00000000020ab43e] releases all VCPI slots [ 305.633729] [drm:drm_dp_mst_atomic_check] [MST MGR:00000000cdd467d4] mst state 00000000b67672eb VCPI avail=63 used=0 [ 305.644405] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 305.651448] #PF: supervisor read access in kernel mode [ 305.656640] #PF: error_code(0x0000) - not-present page [ 305.661807] PGD 0 P4D 0 [ 305.664396] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 305.668789] CPU: 3 PID: 183 Comm: kworker/3:2 Not tainted 5.5.0-rc6+ #1404 [ 305.675703] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3201.A00.1905140358 05/14/2019 [ 305.689425] Workqueue: events drm_dp_delayed_destroy_work [ 305.694874] RIP: 0010:drm_dp_mst_atomic_check+0x138/0x2c0 [ 305.700306] Code: 00 00 00 41 29 d9 41 89 d8 4c 89 fa 4c 89 f1 48 c7 c6 b0 b1 34 82 bf 10 00 00 00 45 31 ed e8 3f 99 02 00 4d 8b bf 80 04 00 00 <49> 8b 47 30 49 8d 5f 30 4c 8d 60 e8 48 39 c3 74 35 49 8b 7c 24 28 [ 305.719169] RSP: 0018:ffffc90001687b58 EFLAGS: 00010246 [ 305.724434] RAX: 0000000000000000 RBX: 000000000000003f RCX: 0000000000000000 [ 305.731611] RDX: 0000000000000000 RSI: ffff88849fba8cb8 RDI: 00000000ffffffff [ 305.738785] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 [ 305.745962] R10: ffffc900016879a0 R11: ffffc900016879a5 R12: 0000000000000000 [ 305.753139] R13: 0000000000000000 R14: ffff8884905c9bc0 R15: 0000000000000000 [ 305.760315] FS: 0000000000000000(0000) GS:ffff88849fb80000(0000) knlGS:0000000000000000 [ 305.768452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 305.774263] CR2: 0000000000000030 CR3: 0000000005610006 CR4: 0000000000760ee0 [ 305.781441] PKRU: 55555554 [ 305.784228] Call Trace: [ 305.786739] intel_atomic_check+0xb2e/0x2560 [i915] [ 305.791678] ? printk+0x53/0x6a [ 305.794856] ? drm_atomic_check_only+0x3e/0x810 [ 305.799417] ? __drm_dbg+0x82/0x90 [ 305.802848] drm_atomic_check_only+0x56a/0x810 [ 305.807322] drm_atomic_commit+0xe/0x50 [ 305.811185] drm_client_modeset_commit_atomic+0x1e2/0x250 [ 305.816619] drm_client_modeset_commit_force+0x4d/0x180 [ 305.821921] drm_fb_helper_restore_fbdev_mode_unlocked+0x46/0xa0 [ 305.827963] drm_fb_helper_set_par+0x2b/0x40 [ 305.832265] drm_fb_helper_hotplug_event.part.0+0xb2/0xd0 [ 305.837755] drm_kms_helper_hotplug_event+0x21/0x30 [ 305.842694] process_one_work+0x25b/0x5b0 [ 305.846735] worker_thread+0x4b/0x3b0 [ 305.850439] kthread+0x100/0x140 [ 305.853690] ? process_one_work+0x5b0/0x5b0 [ 305.857901] ? kthread_park+0x80/0x80 [ 305.861588] ret_from_fork+0x24/0x50 [ 305.865202] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel bluetooth prime_numbers snd_hda_intel snd_intel_dspcfg snd_hda_codec e1000e snd_hwdep snd_hda_core thunderbolt mei_hdcp mei_me asix cdc_ether x86_pkg_temp_thermal r8152 mei coretemp usbnet snd_pcm mii crct10dif_pclmul ptp crc32_pclmul ecdh_generic ghash_clmulni_intel pps_core ecc i2c_i801 intel_lpss_pci [ 305.903096] CR2: 0000000000000030 [ 305.906431] ---[ end trace 70ee364eed801cb0 ]--- [ 305.940816] RIP: 0010:drm_dp_mst_atomic_check+0x138/0x2c0 [ 305.946261] Code: 00 00 00 41 29 d9 41 89 d8 4c 89 fa 4c 89 f1 48 c7 c6 b0 b1 34 82 bf 10 00 00 00 45 31 ed e8 3f 99 02 00 4d 8b bf 80 04 00 00 <49> 8b 47 30 49 8d 5f 30 4c 8d 60 e8 48 39 c3 74 35 49 8b 7c 24 28 [ 305.965125] RSP: 0018:ffffc90001687b58 EFLAGS: 00010246 [ 305.970382] RAX: 0000000000000000 RBX: 000000000000003f RCX: 0000000000000000 [ 305.977571] RDX: 0000000000000000 RSI: ffff88849fba8cb8 RDI: 00000000ffffffff [ 305.984747] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 [ 305.991921] R10: ffffc900016879a0 R11: ffffc900016879a5 R12: 0000000000000000 [ 305.999099] R13: 0000000000000000 R14: ffff8884905c9bc0 R15: 0000000000000000 [ 306.006271] FS: 0000000000000000(0000) GS:ffff88849fb80000(0000) knlGS:0000000000000000 [ 306.014407] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 306.020185] CR2: 0000000000000030 CR3: 000000048b3aa003 CR4: 0000000000760ee0 [ 306.027404] PKRU: 55555554 [ 306.030127] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38 [ 306.039049] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 183, name: kworker/3:2 [ 306.047272] INFO: lockdep is turned off. [ 306.051217] irq event stamp: 77505 [ 306.054647] hardirqs last enabled at (77505): [<ffffffff81a0c147>] _raw_spin_unlock_irqrestore+0x47/0x60 [ 306.064270] hardirqs last disabled at (77504): [<ffffffff81a0bedf>] _raw_spin_lock_irqsave+0xf/0x50 [ 306.073404] softirqs last enabled at (77402): [<ffffffff81e00389>] __do_softirq+0x389/0x47f [ 306.081885] softirqs last disabled at (77395): [<ffffffff810b83a9>] irq_exit+0xa9/0xc0 [ 306.089859] CPU: 3 PID: 183 Comm: kworker/3:2 Tainted: G D 5.5.0-rc6+ #1404 [ 306.098167] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3201.A00.1905140358 05/14/2019 [ 306.111882] Workqueue: events drm_dp_delayed_destroy_work [ 306.117314] Call Trace: [ 306.119780] dump_stack+0x71/0xa0 [ 306.123135] ___might_sleep.cold+0xf7/0x10b [ 306.127399] exit_signals+0x2b/0x360 [ 306.131014] do_exit+0xa7/0xc70 [ 306.134189] ? kthread+0x100/0x140 [ 306.137615] rewind_stack_do_exit+0x17/0x20
Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Mikita Lipski mikita.lipski@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: Lyude Paul lyude@redhat.com Signed-off-by: José Roberto de Souza jose.souza@intel.com --- drivers/gpu/drm/drm_dp_mst_topology.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 4b74193b89ce..38bf111e5f9b 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -5034,6 +5034,9 @@ int drm_dp_mst_atomic_check(struct drm_atomic_state *state) int i, ret = 0;
for_each_new_mst_mgr_in_state(state, mgr, mst_state, i) { + if (!mgr->mst_state) + continue; + ret = drm_dp_mst_atomic_check_vcpi_alloc_limit(mgr, mst_state); if (ret) break;
Removing this lose code block and removing unnecessary bracket.
Cc: Lyude Paul lyude@redhat.com Signed-off-by: José Roberto de Souza jose.souza@intel.com --- drivers/gpu/drm/drm_dp_mst_topology.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 38bf111e5f9b..e3a22362aaf2 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -3491,6 +3491,8 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms mgr->mst_state = mst_state; /* set the device into MST mode */ if (mst_state) { + struct drm_dp_payload reset_pay; + WARN_ON(mgr->mst_primary);
/* get dpcd info */ @@ -3521,16 +3523,12 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL, DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC); - if (ret < 0) { + if (ret < 0) goto out_unlock; - }
- { - struct drm_dp_payload reset_pay; - reset_pay.start_slot = 0; - reset_pay.num_slots = 0x3f; - drm_dp_dpcd_write_payload(mgr, 0, &reset_pay); - } + reset_pay.start_slot = 0; + reset_pay.num_slots = 0x3f; + drm_dp_dpcd_write_payload(mgr, 0, &reset_pay);
queue_work(system_long_wq, &mgr->work);
Reviewed-by: Lyude Paul lyude@redhat.com
On Thu, 2020-01-16 at 17:58 -0800, José Roberto de Souza wrote:
JFYI: I'm going to go ahead and push this patch by itself to drm-misc-next since it applies cleanly, the other patches in this series don't depend on this, and I'm about to send out a patch that modifies the code around these hunks anyway.
On Thu, 2020-01-16 at 17:58 -0800, José Roberto de Souza wrote:
This is a eDP function and it will always returns true for non-eDP ports.
Signed-off-by: José Roberto de Souza jose.souza@intel.com --- drivers/gpu/drm/i915/display/intel_dp.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c index 4074d83b1a5f..a50b5b6dd009 100644 --- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -7537,7 +7537,6 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
if (!intel_edp_init_connector(intel_dp, intel_connector)) { intel_dp_aux_fini(intel_dp); - intel_dp_mst_encoder_cleanup(intel_dig_port); goto fail; }
On Thu, Jan 16, 2020 at 05:58:36PM -0800, José Roberto de Souza wrote:
This makes the unwind look incomplete to the causual reader. The cleanup function does both anyway so cross checking is harder if they're not consistent. So not sure I like it. Hmm. The ordering of these two also looks off here.
Maybe nicer to just move the whole onion to the end of function (we alredy have one layer there)?
On Thu, 2020-01-30 at 19:16 +0200, Ville Syrjälä wrote:
If I need to rework the 4/4 patch I will do that, otherwise I will just ignore this patch.
Please check my answer to your comment.
On Thu, 2020-01-16 at 17:58 -0800, José Roberto de Souza wrote:
Change looks fine for me(anyway better than now).
But:
This whole thing looks kind of confusing to me. Why we are even calling intel_edp_init_connector for non-eDP ports, just to immediately get true returned? So returning success means either success or that this is non-eDP..
This confuses the caller, that we have actually successfully initialized eDP, while actually this also means here that it is not eDP.
Why we can't just do it like:
if (intel_dp_is_edp(intel_dp)) { if (!intel_edp_init_connector(intel_dp, intel_connector)) { intel_dp_aux_fini(intel_dp); goto fail; } }
it looks much more understandable and less confusing, i.e eDP functions are only called for eDP and no return value hacks are needed.
Reviewed-by: Stanislav Lisovskiy stanislav.lisovskiy@intel.com
Stan
TGL timeouts when disabling MST transcoder and fifo underruns over MST transcoders are fixed when setting TRANS_DDI_MODE_SELECT to 0(HDMI mode) during the disable sequence.
Although BSpec disable sequence don't require this step it is a harmless change and it is also done by Windows driver. Anyhow HW team was notified about that but it can take some time to documentation to be updated.
A case that always lead to those issues is: - do a modeset enabling pipe A and pipe B in the same MST stream leaving A as master - disable pipe A, promote B as master doing a full modeset in A - enable pipe A, changing the master transcoder back to A(doing a full modeset in B) - Pow: underruns and timeouts
The transcoders involved will only work again when complete disabled and their power wells turned off causing a reset in their registers.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Signed-off-by: José Roberto de Souza jose.souza@intel.com --- drivers/gpu/drm/i915/display/intel_ddi.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c index 32ea3c7e8b62..82e90f271974 100644 --- a/drivers/gpu/drm/i915/display/intel_ddi.c +++ b/drivers/gpu/drm/i915/display/intel_ddi.c @@ -1997,6 +1997,7 @@ void intel_ddi_disable_transcoder_func(const struct intel_crtc_state *crtc_state
val = I915_READ(TRANS_DDI_FUNC_CTL(cpu_transcoder)); val &= ~TRANS_DDI_FUNC_ENABLE; + val &= ~TRANS_DDI_MODE_SELECT_MASK;
if (INTEL_GEN(dev_priv) >= 12) { if (!intel_dp_mst_is_master_trans(crtc_state))
On Thu, Jan 16, 2020 at 05:58:37PM -0800, José Roberto de Souza wrote:
Feels a bit early since IIRC we still leave a bunch of other stuff enabled/selected here. In fact we don't seem to be clearing the DDI select anywhere at all? That one I would be more suspicious of than the mode. But maybe we should just clear both somewhere? I would suggest it should be when we clear the port select finally.
On Thu, 2020-01-30 at 19:25 +0200, Ville Syrjälä wrote:
We are clearing DDI select, in our code it is named as TGL_TRANS_DDI_PORT_MASK/TRANS_DDI_PORT_MASK.
For TGL in MST mode we clear DDI select in the block below for MST slaves and then in intel_ddi_post_disable_dp() for MST master as instructed by Display port sequences.
On Thu, Jan 30, 2020 at 08:07:07PM +0000, Souza, Jose wrote:
Ah. Hmm, so that can't be it then. Bummer. I guess I would still feel a bit safer if we clear the mode select alongside the the DDI select for the master. Since the spec says the DDI select must remain set for the master there must be something still going on, and so I worry that something might not work quite right if we change the mode prematurely.
On 31-Jan-20 4:50 PM, Ville Syrjälä wrote:
Hi Ville/Jose,
We are still observing =>*ERROR* Timed out waiting for ACT sent when disabling =>FIFO underruns Do we need a change disable sequence?
On Wed, May 13, 2020 at 01:58:47PM +0530, Sharma, Swati2 wrote:
My crystal ball is foggy today. If you want to know whether there is a mismatch between the spec and code you're just going to have to read both and compare.
On Wed, May 13, 2020 at 03:09:52PM +0300, Ville Syrjälä wrote:
If if there is no difference then it may become a fun task of trial and error to see what about the current order the hardware doesn't like.
On Thu, 2020-01-16 at 17:58 -0800, José Roberto de Souza wrote:
BAT/IGT are fine, so if this improves some MST behaviors don't see any point in holding this patch from being merged. For MST, we are anyway facing MST regressions almost on weekly basis. Until we have some meaningful tests in CI for MST, it anyway would be constantly in "bad" shape.
Reviewed-by: Stanislav Lisovskiy stanislav.lisovskiy@intel.com
Thanks for the catch! Makes sense to skip disabled managers there, but I wonder why it didn't cause a crash with amdgpu. Anyways looks good to me.
Acked-by: Mikita Lipski mikita.lipski@amd.com
On 1/16/20 8:58 PM, José Roberto de Souza wrote:
Reviewed-by: Lyude Paul lyude@redhat.com
On Thu, 2020-01-16 at 17:58 -0800, José Roberto de Souza wrote:
On Fri, Jan 17, 2020 at 3:21 PM Lyude Paul lyude@redhat.com wrote:
Reviewed-by: Lyude Paul lyude@redhat.com
Applied. I'll send a PR shortly.
Thanks!
Alex
dri-devel@lists.freedesktop.org