On Fri, Jun 04, 2021 at 12:50:03PM +0800, Huacai Chen wrote:
On Thu, Jun 3, 2021 at 2:31 AM Bjorn Helgaas bhelgaas@google.com wrote:
[+cc linux-pci]
On Wed, Jun 2, 2021 at 11:22 AM Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Jun 02, 2021 at 06:36:03PM +0800, Huacai Chen wrote:
On Wed, Jun 2, 2021 at 2:03 AM Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Jun 01, 2021 at 07:12:27PM +0200, Greg KH wrote:
On Tue, Jun 01, 2021 at 05:56:40PM +0200, Daniel Vetter wrote: > On Fri, May 28, 2021 at 04:26:07PM +0800, Huacai Chen wrote: > > We should call vga_arb_device_init() after PCI enumeration, otherwise it > > may fail to select the default VGA device. Since vga_arb_device_init() > > and PCI enumeration function (i.e., pcibios_init() or acpi_init()) are > > both wrapped by subsys_initcall(), their sequence is not assured. So, we > > use subsys_initcall_sync() instead of subsys_initcall() to wrap vga_arb_ > > device_init().
Trying to juggle levels like this always fails if you build the code as a module.
Why not fix it properly and handle the out-of-order loading by returning a "deferred" error if you do not have your resources yet?
It's not a driver, it's kinda a bolted-on-the-side subsytem of pci. So not something you can -EPROBE_DEFER I think, without potentially upsetting the drivers that need this.
Which might mean we should move this into pci subsystem proper perhaps? Then adding the init call at the right time becomes trivial since we just plug it in at the end of pci init.
Also maybe that's how distros avoid this pain, pci is built-in, vgaarb is generally a module, problem solved.
Bjorn, would you take this entire vgaarb.c thing? From a quick look I don't think it has a drm-ism in it (unlike vga_switcheroo, but that works a bit differently and doesn't have this init order issue).
Emmm, this patch cannot handle the hotplug case and module case, it just handles the case that vgaarb, drm driver and pci all built-in. But I think this is enough, because the original problem only happens on very few BMC-based VGA cards (BMC doesn't set the VGA Enable bit on the bridge, which breaks vgaarb).
I'm not talking aout hotplug, just ordering the various pieces correctly. That vgaarb isn't really a driver and also can't really handle hotplug is my point. I guess that got lost a bit?
Anyway my proposal is essentially to do a
$ git move drivers/gpu/vga/vgaarb.c drivers/pci
But I just realized that vgaarb is a bool option, so module isn't possible anyway, and we could fix this by calling vgaarb from pcibios init (with an empty static inline in the header if vgaarb is disabled). That makes the dependency very explicit and guarantees it works correctly.
pcibios_init() is also an initcall and is implemented by every arch. I agree that calling vga_arb_device_init() directly from pcibios_init() would probably fix this problem, and it would be really nice to have it not be an initcall. But it's also kind of a pain to have to update all those copies of pcibios_init(), and I would be looking for a way to unify it since it's not really an arch-specific thing.
I think the simplest solution, which I suggested earlier [1], would be to explicitly call vga_arbiter_add_pci_device() directly from the PCI core when it enumerates a VGA device. Then there's no initcall and no need for the BUS_NOTIFY_ADD/DEL_DEVICE stuff. vga_arbiter_add_pci_device() could set the default VGA device when it is enumerated, and change the default device if we enumerate a "better" one. And hotplug VGA devices would work automatically.
Emm, It seems that your solution has some difficulties to remove the whole initcall(vga_arb_device_init): we call vga_arbiter_add_pci_device() in pci_bus_add_device(), the list_for_each_entry() loop can be moved to vga_arbiter_check_bridge_sharing(), vga_arb_select_default_device() can be renamed to vga_arb_update_default_device() and be called in vga_arbiter_add_pci_device(), but how to handle misc_register(&vga_arb_device)?
Might need to keep vga_arb_device_init() as an initcall, but remove everything from it except the misc_register().