On 10/18/2011 11:14, Jean Delvare wrote:
Hi Dave, Eugeni, Alex,
On Tue, 18 Oct 2011 11:02:00 +0100, Dave Airlie wrote:
This allows to avoid talking to a non-existent bus repeatedly until we finally timeout. The non-existent bus is signalled by -ENXIO error, provided by i2c_algo_bit:bit_doAddress call.
As the advantage of such change, all the other routines which use drm_get_edid would benefit for this timeout.
This change should fix https://bugs.freedesktop.org/show_bug.cgi?id=41059 and improve overall edid detection timing by 10-30% in most cases, and by a much larger margin in case of phantom outputs.
Jean, Alex,
I'm thinking of thowing this into -next, any objections?
This seems to be the wrong place for the test. The code below is really testing for non-responsive (possibly not present) EDID, NOT "non-existent adapter". Non-existent adapter should be checked in the firmware if possible, or failing that, by testing the bus lines at bus creation time, as i2c_algo_bit.bit_test=1 is doing. While this setting has been known to cause trouble recently (not per se but because it was triggering a bug somewhere else in the radeon driver), it might still have value, and could be changed to a per-adapter setting by exporting the test function (I have a patch ready doing exactly this) and letting video drivers test their I2C buses and discard the non-working ones if they want.
I am worried that the patch below will cause regressions on other machines. There's a comment right before the loop which reads:
/* The core i2c driver will automatically retry the transfer if the * adapter reports EAGAIN. However, we find that bit-banging transfers * are susceptible to errors under a heavily loaded machine and * generate spurious NAKs and timeouts. Retrying the transfer * of the individual block a few times seems to overcome this. */
So the retries are there for a reason, and -ENXIO is exactly what you get on spurious NAKs.
Hi Jean,
You are right about the bit_test=1 testing, my initial attempt at the fix did exactly that (https://bugs.freedesktop.org/show_bug.cgi?id=41059, comments 14 and 15).
The problem is that for some buses, namely HDMI ones in my testing, bit_test-like tests always consider them as non-existent when the connectivity is not setup; and as working when it is. So in any case, we could not just blacklist them - when they do, they are gone for good, and won't work anymore, and we need to keep re-trying them at each attempt.
And in case of continuous pre-testing for the stuck bits and like when driver attempts to get the edid (for example, when xrandr is run), we still hit the same issue - the drm_do_probe_ddc_edid will continue to retry the attempts until it reaches the maximum number of retries. E.g., there is no way to tell drm_do_probe_ddc_edid to treat any return code as a permanent failure and make it give up faster.
It could be fixed either in per-driver fashion, like I did with the other patch (which also tests for -ENXIO); or in a generic way, in DRM. I couldn't reproduce any false positives coming from -ENXIO on i915 driver, but perhaps it is easier with radeon? Do you have any specific workload which trick the driver into generating spurious NAKs by a chance?
One thing which may make sense would be to make the retry count a module parameter, instead of a hard-coded value, so that users can lower the value if they care more about boot time than reliability. But again, ideally problematic buses would not have been created in the first place.
Or perhaps it would be possible to have any error code coming from i2c_transfer to signal a permanent error, which should not be retried.. what do you think?
-- Eugeni Dodonov