Re: [Freedreno] [PATCH 0/9] drm/msm: Avoid possible infinite probe deferral and speed booting

15 Jul 2020

      On Tue, Jul 14, 2020 at 4:54 PM Bjorn Andersson
bjorn.andersson@linaro.org wrote:
...
On Tue 14 Jul 15:13 PDT 2020, Rob Herring wrote:
...
On Tue, Jul 14, 2020 at 10:33 AM Jeffrey Hugo jeffrey.l.hugo@gmail.com wrote:
...
On Mon, Jul 13, 2020 at 5:50 PM Doug Anderson dianders@chromium.org wrote:
...
Hi,
On Mon, Jul 13, 2020 at 1:25 PM Rob Herring robh+dt@kernel.org wrote:
...
On Mon, Jul 13, 2020 at 9:08 AM Doug Anderson dianders@chromium.org wrote:
...
Hi,
On Mon, Jul 13, 2020 at 7:11 AM Rob Herring robh+dt@kernel.org wrote:
>
> On Fri, Jul 10, 2020 at 5:02 PM Douglas Anderson dianders@chromium.org wrote:
> >
> > I found that if I ever had a little mistake in my kernel config,
> > or device tree, or graphics driver that my system would sit in a loop
> > at bootup trying again and again and again.  An example log was:
>
> Why do we care about optimizing the error case?
It actually results in a _fully_ infinite loop.  That is: if anything
small causes a component of DRM to fail to probe then the whole system
doesn't boot because it just loops trying to probe over and over
again.  The messages I put in the commit message are printed over and
over and over again.
Sounds like a bug as that's not what should happen.
If you defer during boot (initcalls), then you'll be on the deferred
list until late_initcall and everything is retried. After
late_initcall, only devices getting added should trigger probing. But
maybe the adding and then removing a device is causing a re-trigger.
Right, I'm nearly certain that the adding and then removing is causing
a re-trigger.  I believe the loop would happen for any case where we
have a probe function that:

Adds devices.
After adding devices it decides that it needs to defer.
Removes the devices it added.
Return -EPROBE_DEFER from its probe function.

Specifically from what I know about how -EPROBE_DEFER works I'm not
sure how it wouldn't cause an infinite loop in that case.
Perhaps the missing part of my explanation, though, is why it never
gets out of this infinite loop.  In my case I purposely made the
bridge chip "ti-sn65dsi86.c" return an error (-EINVAL) in its probe
every time.  Obviously I wasn't going to get a display up like this,
but I just wanted to not loop forever at bootup.  I tracked down
exactly why we get an - EPROBE_DEFER over and over in this case.
You can see it in msm_dsi_host_register().  If some components haven't
shown up when that function runs it will _always_ return
-EPROBE_DEFER.
In my case, since I caused the bridge to fail to probe, those
components will _never_ show up.  That means that
msm_dsi_host_register() will _always_ return -EPROBE_DEFER.
I haven't dug through all the DRM code enough, but it doesn't
necessarily seem like the wrong behavior.  If the bridge driver or a
panel was a module then (presumably) they could show up later and so
it should be OK for it to defer, right?
So with all that, it doesn't really feel like this is a bug so much as
it's an unsupported use case.  The current deferral logic simply can't
handle the case we're throwing at it.  You cannot return -EPROBE_DEFER
if your probe function adds devices each time through the probe
function.
Assuming all the above makes sense, that means we're stuck with:
a) This patch series, which makes us not add devices.
b) Some other patch series which rearchitects the MSM graphics stack
to not return -EPROBE_DEFER in this case.
This isn't a MSM specific issue.  This is an issue with how the DSI
interface works, and how software is structured in Linux.  I would
expect that pretty much any DSI host in the kernel would have some
version of this issue.
The problem is that DSI is not "hot pluggable", so to give the DRM
stack the info it needs, we need both the DSI controller (aka the MSM
graphics stack in your case), and the thing it connects to (in your
case, the TI bridge, normally the actual panel) because the DRM stack
expects that if init completes, it has certain information
(resolution, etc), and some of that information is in the DSI
controller, and some of it is on the DSI device.
Ah yes, DRM's lack of hot-plug and discrete component support... Is
that not improved with some of the bridge rework?
Anyways, given there is a child dependency on the parent, I don't
think we should work-around DRM deficiencies in DT.
BTW, There's also a deferred probe timeout you can use which stops
deferring probe some number of seconds after late_initcall.
I don't think we can rely on the deferred probe timeout, given that it
was reverted back to 0 seconds past late_initcall - which given that
most of the involved components are modules, means that without the
opt-in command line option we would likely fail to bring up the display.
I meant just as a way to make progress booting in this case where the
display is never coming up. We're talking only about a better
experience for an error case.
Maybe a simple solution is just having some delay inserted between
delayed probe triggers so progress is made.
Rob

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [Freedreno] [PATCH 0/9] drm/msm: Avoid possible infinite probe deferral and speed booting