On Tue, 2017-07-18 at 22:21 +0100, Chris Wilson wrote:
Quoting Paul Kocialkowski (2017-07-18 16:16:26)
A hotplug uevent may be detected at resume even though no actual hotplug happened. This is the case when link training fails on any other connector.
There is currently no way to tell which connector caused a hotplug uevent, nor what the reason for that uevent really is. This makes it impossible to find out whether the test actually passed or not.
And you may get more than one, and then this skips even though the test passed. Looks like the patch is overcompensating. What you can do is repeat the test a few times, and then look at all the different errors you get. If the connector remains (no MST disappearance) once it goes bad, it should remain bad and so not generate any new uevent. Or you only repeat the test whilst link_status[old] != link_status[new].
I am not sure it is really desirable to repeat the test until we are fairly certain it succeeds. This involves suspend/resume, which is already long enough as it is.
Also, a uevent will be generated every time link training fails, regardless of whether it was already failing before (I just tested that to make sure). In my case, it's due to a DP-VGA bridge that consistently fails link training in the first seconds after resume.
So this is actually even worse than I thought, because there is no way to find out that this is why a uevent was generated if the link status was already bad before.
So I don't see how we can manage with the information currently at our disposal.
My main point here is that we need more information about what's going on than simply "HOTPLUG=1". These patches demonstrate that working around the lack of information is a pain for testing purposes and can only lead to semi-working, hackish workarounds.
Do you agree that this is what the problem really is?