drm_do_probe_ddc_edid ENXIO check too aggressive?

List overview All Threads
Download

newer

older

[Bug 70934] New: [Regression]...

[Bug 73297] Xorg crashes with...

Daniel Drake

16 Dec 2013 16 Dec '13

9:47 p.m.

Hi,

I'm using exynos_drm on Exynos4412 to output to a Sony HDMI TV.

When I disconnect and then re-plug the TV, Exynos detects this event and tries to read the EDID from the DDC over I2C.

The DDC does not provide an ACK at this point, so the i2c-s3c2410 driver reports ENXIO, which seems to agree with the documentation:

ENXIO Returned by I2C adapters to indicate that the address phase of a transfer didn't get an ACK. While it might just mean an I2C device was temporarily not responding, usually it means there's nothing listening at that address.

As of commit 9292f37e1f5c79400254dca46f83313488093825, DRM treats ENXIO as "no device, bail":

Author: Eugeni Dodonov eugeni.dodonov@intel.com Date: Thu Jan 5 09:34:28 2012 -0200

drm: give up on edid retries when i2c bus is not responding

But in the case of my Sony TV it seems that we hit the "temporarily not responding" case, because if I insert a delay, the message gets acked and the EDID gets read successfully. Similarly, if I revert the patch so that we attempt the query a few times times, it succeeds on second retry.

Any suggested solutions to this issue?

Thanks, Daniel

Show replies by date

Daniel Drake

16 Dec 16 Dec

9:50 p.m.

Resend with correct addresses for Eugeni and Chris...

On Mon, Dec 16, 2013 at 3:47 PM, Daniel Drake drake@endlessm.com wrote:

...

Hi,

I'm using exynos_drm on Exynos4412 to output to a Sony HDMI TV.

When I disconnect and then re-plug the TV, Exynos detects this event and tries to read the EDID from the DDC over I2C.

The DDC does not provide an ACK at this point, so the i2c-s3c2410 driver reports ENXIO, which seems to agree with the documentation:

ENXIO Returned by I2C adapters to indicate that the address phase of a transfer didn't get an ACK. While it might just mean an I2C device was temporarily not responding, usually it means there's nothing listening at that address.

As of commit 9292f37e1f5c79400254dca46f83313488093825, DRM treats ENXIO as "no device, bail":

Author: Eugeni Dodonov eugeni.dodonov@intel.com Date: Thu Jan 5 09:34:28 2012 -0200
drm: give up on edid retries when i2c bus is not responding
But in the case of my Sony TV it seems that we hit the "temporarily not responding" case, because if I insert a delay, the message gets acked and the EDID gets read successfully. Similarly, if I revert the patch so that we attempt the query a few times times, it succeeds on second retry.

Any suggested solutions to this issue?

Thanks, Daniel

Daniel Vetter

10:19 p.m.

On Mon, Dec 16, 2013 at 10:47 PM, Daniel Drake drake@endlessm.com wrote:

...

I'm using exynos_drm on Exynos4412 to output to a Sony HDMI TV.

When I disconnect and then re-plug the TV, Exynos detects this event and tries to read the EDID from the DDC over I2C.

The DDC does not provide an ACK at this point, so the i2c-s3c2410 driver reports ENXIO, which seems to agree with the documentation:

ENXIO Returned by I2C adapters to indicate that the address phase of a transfer didn't get an ACK. While it might just mean an I2C device was temporarily not responding, usually it means there's nothing listening at that address.

As of commit 9292f37e1f5c79400254dca46f83313488093825, DRM treats ENXIO as "no device, bail":

Author: Eugeni Dodonov eugeni.dodonov@intel.com Date: Thu Jan 5 09:34:28 2012 -0200
drm: give up on edid retries when i2c bus is not responding
But in the case of my Sony TV it seems that we hit the "temporarily not responding" case, because if I insert a delay, the message gets acked and the EDID gets read successfully. Similarly, if I revert the patch so that we attempt the query a few times times, it succeeds on second retry.

Any suggested solutions to this issue?

This usually happens if the hpd isn't properly recessed and we start the i2c transaction while the physical connection hasn't been established properly yet. If you're _really_ slow and careful you can probably even break your current delay (presuming this theory is correct).

At least on intel I've not yet heard of this, at least since we've fixed up our code for this bug. We now only have the reverse problem, where when you unplug the connector really slowly we'll miss the unplug since the i2c slave still responded when the hpd interrupt fired. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Drake

10:55 p.m.

On Mon, Dec 16, 2013 at 4:19 PM, Daniel Vetter daniel@ffwll.ch wrote:

...

This usually happens if the hpd isn't properly recessed and we start the i2c transaction while the physical connection hasn't been established properly yet. If you're _really_ slow and careful you can probably even break your current delay (presuming this theory is correct).

Hmm yes, I think you are probably right. Without touching the HDMI cable I disconnect and reconnect the power cable of my TV. Presumably that plug is more "atomic" :)

When I do that, it detects the resolution fine.

Do you have any suggestions on how to fix this then? Anything nicer than e.g. a 1 second delay before processing hpd events? That would still fail in the "slow connect" case but might be the best option?

Daniel

Daniel Vetter

11:40 p.m.

On Mon, Dec 16, 2013 at 11:55 PM, Daniel Drake drake@endlessm.com wrote:

...

On Mon, Dec 16, 2013 at 4:19 PM, Daniel Vetter daniel@ffwll.ch wrote:

...
This usually happens if the hpd isn't properly recessed and we start the i2c transaction while the physical connection hasn't been established properly yet. If you're _really_ slow and careful you can probably even break your current delay (presuming this theory is correct).

Hmm yes, I think you are probably right. Without touching the HDMI cable I disconnect and reconnect the power cable of my TV. Presumably that plug is more "atomic" :)

When I do that, it detects the resolution fine.

Do you have any suggestions on how to fix this then? Anything nicer than e.g. a 1 second delay before processing hpd events? That would still fail in the "slow connect" case but might be the best option?

Have a bit of logic in the exynos ->detect function to re-try a 2nd round of edid probing after each hdp interrupt if the first one returns an -ENXIO. Only tricky part is to be careful with edge detection so that userspace gets all the hotplug events still. Presuming you don't have any funkiness with reprobing causing yet another hpd interrupt and stuff like that (seen it all), as long as you're using the helpers in drm_crtc_helper.c it should all be working correctly. So you want a work item which just grabs all modeset locks and then calls drm_helper_probe_single_connector_modes or something on the right connector. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Drake

17 Dec 17 Dec

3:12 p.m.

On Mon, Dec 16, 2013 at 5:40 PM, Daniel Vetter daniel@ffwll.ch wrote:

...

Have a bit of logic in the exynos ->detect function to re-try a 2nd round of edid probing after each hdp interrupt if the first one returns an -ENXIO. Only tricky part is to be careful with edge detection so that userspace gets all the hotplug events still. Presuming you don't have any funkiness with reprobing causing yet another hpd interrupt and stuff like that (seen it all), as long as you're using the helpers in drm_crtc_helper.c it should all be working correctly. So you want a work item which just grabs all modeset locks and then calls drm_helper_probe_single_connector_modes or something on the right connector.

Thanks for the tips. Having trouble sticking to those details though. exynos_drm_connector_detect() is actually a long way away from EDID probing so it is hard to detect the ENXIO case there.

What happens here is: exynos hdmi_irq_thread() calls drm_helper_hpd_irq_event() That then calls exynos_drm_connector_detect(), which returns a simple "yes, hpd detection says we are connected" Then it calls drm_kms_helper_hotplug_event() for which the call chain is:

drm_kms_helper_hotplug_event exynos_drm_output_poll_changed drm_fb_helper_hotplug_event drm_fb_helper_probe_connector_mode exynos_drm_connector_fill_modes drm_helper_probe_single_connector_modes exynos_drm_connector_get_modes drm_get_edid drm_do_get_edid drm_do_probe_ddc_edid <-- ENXIO in here

drm_do_probe_ddc_edid actually swallows the ENXIO code and just returns -1, so that particular error is indistinguishable from others.

Trying to follow your suggestions, the closest would seem to be something like:

1. Modify exynos_drm_connector_detect to read and cache the EDID right away. If EDID read fails for any reason, report "no connection" and schedule a work item to retry the EDID read. If the later EDID read succeeds, call drm_kms_helper_hotplug_event()

2. Modify exynos_drm_connector_get_modes to return cached EDID

Does that sound sensible?

Thanks Daniel

Daniel Vetter

18 Dec 18 Dec

8:43 a.m.

On Tue, Dec 17, 2013 at 09:12:42AM -0600, Daniel Drake wrote:

...

On Mon, Dec 16, 2013 at 5:40 PM, Daniel Vetter daniel@ffwll.ch wrote:

...
Have a bit of logic in the exynos ->detect function to re-try a 2nd round of edid probing after each hdp interrupt if the first one returns an -ENXIO. Only tricky part is to be careful with edge detection so that userspace gets all the hotplug events still. Presuming you don't have any funkiness with reprobing causing yet another hpd interrupt and stuff like that (seen it all), as long as you're using the helpers in drm_crtc_helper.c it should all be working correctly. So you want a work item which just grabs all modeset locks and then calls drm_helper_probe_single_connector_modes or something on the right connector.

Thanks for the tips. Having trouble sticking to those details though. exynos_drm_connector_detect() is actually a long way away from EDID probing so it is hard to detect the ENXIO case there.

Yeah, was a bit hand-wavy ;-)

...

What happens here is: exynos hdmi_irq_thread() calls drm_helper_hpd_irq_event() That then calls exynos_drm_connector_detect(), which returns a simple "yes, hpd detection says we are connected" Then it calls drm_kms_helper_hotplug_event() for which the call chain is:

drm_kms_helper_hotplug_event exynos_drm_output_poll_changed drm_fb_helper_hotplug_event drm_fb_helper_probe_connector_mode exynos_drm_connector_fill_modes drm_helper_probe_single_connector_modes exynos_drm_connector_get_modes drm_get_edid drm_do_get_edid drm_do_probe_ddc_edid <-- ENXIO in here

drm_do_probe_ddc_edid actually swallows the ENXIO code and just returns -1, so that particular error is indistinguishable from others.

Trying to follow your suggestions, the closest would seem to be something like:

Modify exynos_drm_connector_detect to read and cache the EDID right

away. If EDID read fails for any reason, report "no connection" and schedule a work item to retry the EDID read. If the later EDID read succeeds, call drm_kms_helper_hotplug_event()

Modify exynos_drm_connector_get_modes to return cached EDID

I think we can do it simpler. When you get a hpd interrupt you eventually call drm_helper_hpd_irq_event which will update all the state and poke connectors. I'd just create a delay_work which you launch from hdmi_irq_thread with a 1 second delay which again calls drm_helper_hpd_irq_event. That way you won't miss the EDID (presuming the user isn't _really_ careful with pluggin in the cable slowly).

It's a bit inefficient but also doesn't really matter since hpd don't happen often. And when they do userspace usually wants to do a modeset anyway, so an added 30ms to read the edid twice won't really hurt. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Drake

4:28 p.m.

On Wed, Dec 18, 2013 at 2:43 AM, Daniel Vetter daniel@ffwll.ch wrote:

...

I think we can do it simpler. When you get a hpd interrupt you eventually call drm_helper_hpd_irq_event which will update all the state and poke connectors. I'd just create a delay_work which you launch from hdmi_irq_thread with a 1 second delay which again calls drm_helper_hpd_irq_event. That way you won't miss the EDID (presuming the user isn't _really_ careful with pluggin in the cable slowly).

OK, I like this more simplistic option.

However, one more pesky detail, if I am understanding your suggestion correctly. You are hoping for:

1. I connect the display, causing HPD interrupt 2. hpd irq handler calls drm_helper_hpd_irq_event() directly. Fails to read EDID, no big deal. 3. hpd irq handles schedules a work item to call drm_helper_hpd_irq_event() a second time, 1 second later. Reads EDID sucessfully, image gets output to display.

That doesn't quite work. drm_helper_hpd_irq_event() in step 2 notices and records the state change (disconnected --> connected). And after drm_helper_probe_single_connector_modes() fails to read the EDID, it starts outputting an image at 1024x768. When we come to step 3, we call drm_helper_hpd_irq_event again, but that just returns without doing anything as it does not detect any new state change.

If we skip step 2, i.e. always delay the call to drm_helper_hpd_irq_event by 1 second, the problem is solved. As a user, the 1 second delay is not noticable. What do you think about just doing this?

Daniel

Daniel Vetter

5:15 p.m.

On Wed, Dec 18, 2013 at 10:28:37AM -0600, Daniel Drake wrote:

...

On Wed, Dec 18, 2013 at 2:43 AM, Daniel Vetter daniel@ffwll.ch wrote:

...
I think we can do it simpler. When you get a hpd interrupt you eventually call drm_helper_hpd_irq_event which will update all the state and poke connectors. I'd just create a delay_work which you launch from hdmi_irq_thread with a 1 second delay which again calls drm_helper_hpd_irq_event. That way you won't miss the EDID (presuming the user isn't _really_ careful with pluggin in the cable slowly).

OK, I like this more simplistic option.

However, one more pesky detail, if I am understanding your suggestion correctly. You are hoping for:

I connect the display, causing HPD interrupt

hpd irq handler calls drm_helper_hpd_irq_event() directly. Fails to

read EDID, no big deal. 3. hpd irq handles schedules a work item to call drm_helper_hpd_irq_event() a second time, 1 second later. Reads EDID sucessfully, image gets output to display.

That doesn't quite work. drm_helper_hpd_irq_event() in step 2 notices and records the state change (disconnected --> connected). And after drm_helper_probe_single_connector_modes() fails to read the EDID, it starts outputting an image at 1024x768. When we come to step 3, we call drm_helper_hpd_irq_event again, but that just returns without doing anything as it does not detect any new state change.

If we skip step 2, i.e. always delay the call to drm_helper_hpd_irq_event by 1 second, the problem is solved. As a user, the 1 second delay is not noticable. What do you think about just doing this?

Oh right, if you use the hpd pin for detection then this will happen. So I guess always delaying is the right option. Or you could make the connector state only conditional on an edid being present for HDMI, which would also work. -Daniel

-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Kurtz

7:58 p.m.

On Tue, Dec 17, 2013 at 11:12 PM, Daniel Drake drake@endlessm.com wrote:

...

On Mon, Dec 16, 2013 at 5:40 PM, Daniel Vetter daniel@ffwll.ch wrote:

...
Have a bit of logic in the exynos ->detect function to re-try a 2nd round of edid probing after each hdp interrupt if the first one returns an -ENXIO. Only tricky part is to be careful with edge detection so that userspace gets all the hotplug events still. Presuming you don't have any funkiness with reprobing causing yet another hpd interrupt and stuff like that (seen it all), as long as you're using the helpers in drm_crtc_helper.c it should all be working correctly. So you want a work item which just grabs all modeset locks and then calls drm_helper_probe_single_connector_modes or something on the right connector.

Thanks for the tips. Having trouble sticking to those details though. exynos_drm_connector_detect() is actually a long way away from EDID probing so it is hard to detect the ENXIO case there.

What happens here is: exynos hdmi_irq_thread() calls drm_helper_hpd_irq_event() That then calls exynos_drm_connector_detect(), which returns a simple "yes, hpd detection says we are connected" Then it calls drm_kms_helper_hotplug_event() for which the call chain is:

drm_kms_helper_hotplug_event exynos_drm_output_poll_changed drm_fb_helper_hotplug_event drm_fb_helper_probe_connector_mode exynos_drm_connector_fill_modes drm_helper_probe_single_connector_modes exynos_drm_connector_get_modes drm_get_edid drm_do_get_edid drm_do_probe_ddc_edid <-- ENXIO in here

drm_do_probe_ddc_edid actually swallows the ENXIO code and just returns -1, so that particular error is indistinguishable from others.

Trying to follow your suggestions, the closest would seem to be something like:

Modify exynos_drm_connector_detect to read and cache the EDID right

away. If EDID read fails for any reason, report "no connection" and schedule a work item to retry the EDID read. If the later EDID read succeeds, call drm_kms_helper_hotplug_event()

Modify exynos_drm_connector_get_modes to return cached EDID

Does that sound sensible?

+seanpaul

I think the problem is that the hdmi irq is really just an undebounced gpio interrupt, and thus, it is firing way too soon. The chromium kernel adds an excplicit 1.1 second timer to debounce hpd between the hdmi hpd-gpio irq and taking any action (ie reading the edid). I believe this will eventually make its way upstream, but is still pending some other patches:

http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel-next.git;a=b...

-Dan

...

Thanks Daniel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel

Daniel Drake

8:18 p.m.

On Wed, Dec 18, 2013 at 1:58 PM, Daniel Kurtz djkurtz@chromium.org wrote:

...

+seanpaul

I think the problem is that the hdmi irq is really just an undebounced gpio interrupt, and thus, it is firing way too soon. The chromium kernel adds an excplicit 1.1 second timer to debounce hpd between the hdmi hpd-gpio irq and taking any action (ie reading the edid). I believe this will eventually make its way upstream, but is still pending some other patches:

http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel-next.git;a=b...

Yes, this looks very similar to the approach I tried earlier. I guess the patch was written for the same reasons as well. Sean, any objections to me taking your patch and sending it upstream?

http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel-next.git;a=c...

Thanks Daniel

Daniel Drake

10:07 p.m.

On Wed, Dec 18, 2013 at 2:18 PM, Daniel Drake drake@endlessm.com wrote:

...

Yes, this looks very similar to the approach I tried earlier. I guess the patch was written for the same reasons as well. Sean, any objections to me taking your patch and sending it upstream?

http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel-next.git;a=c...

Seems easier said than done. The patch looks innocent but it touches on other stuff that has changed a lot.

For example, it adds cancel_delayed_work_sync() in the hdmi_poweroff() path. With my naive backport to vanilla-3.8, hdmi_poweroff() gets called within the context of that work item, and a work item trying to cancel_delayed_work_sync() on itself hangs. Reproduced just by unplugging the display once.

I see that things have changed substantially in the ChromiumOS tree, for example this patch probably has some effect:

commit 29ae0c6096395a96a597453271946fc7d8442b6e Author: Sean Paul seanpaul@chromium.org Date: Thu Jun 20 17:24:12 2013 -0400

drm/exynos: Consolidate suspend/resume in drm_drv

As this job seems bigger than anticipated and I don't have any Exynos hardware that can boot mainline, I think I might have to stop here, and hope that ChromiumOS guys upstream all this soon?

Daniel

Sean Paul

6 Jan 6 Jan

3:17 p.m.

On Wed, Dec 18, 2013 at 5:07 PM, Daniel Drake drake@endlessm.com wrote:

...

On Wed, Dec 18, 2013 at 2:18 PM, Daniel Drake drake@endlessm.com wrote:

...
Yes, this looks very similar to the approach I tried earlier. I guess the patch was written for the same reasons as well. Sean, any objections to me taking your patch and sending it upstream?

http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel-next.git;a=c...

Seems easier said than done. The patch looks innocent but it touches on other stuff that has changed a lot.

For example, it adds cancel_delayed_work_sync() in the hdmi_poweroff() path. With my naive backport to vanilla-3.8, hdmi_poweroff() gets called within the context of that work item, and a work item trying to cancel_delayed_work_sync() on itself hangs. Reproduced just by unplugging the display once.

I see that things have changed substantially in the ChromiumOS tree, for example this patch probably has some effect:

commit 29ae0c6096395a96a597453271946fc7d8442b6e Author: Sean Paul seanpaul@chromium.org Date: Thu Jun 20 17:24:12 2013 -0400
drm/exynos: Consolidate suspend/resume in drm_drv
As this job seems bigger than anticipated and I don't have any Exynos hardware that can boot mainline, I think I might have to stop here, and hope that ChromiumOS guys upstream all this soon?

Hey Daniel, The set is posted, but I still owe a new version to Tomasz. It's on my TODO list, I'm hoping to get to it soon.

Sean

...

Daniel

4130

Age (days ago)

4151

Last active (days ago)

dri-devel@lists.freedesktop.org

12 comments

4 participants

tags (0)

participants (4)

Daniel Drake
Daniel Kurtz
Daniel Vetter
Sean Paul