Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
Thanks,
-ilia
Am 09.10.2017 um 16:41 schrieb Ilia Mirkin:
Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
Sounds similar to my work about resizing BARs.
I would just try to enable the device and then trigger a rescan of the BUS where it is attached (similar to echo 1 > /sys/bus/pci/devices/$pci_id_of_your_bus/rescan).
There is certainly a function in the PCI subsystem you just need to call for that.
Regards, Christian.
Thanks,
-ilia _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Oct 9, 2017 at 11:45 AM, Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 09.10.2017 um 16:41 schrieb Ilia Mirkin:
Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
Sounds similar to my work about resizing BARs.
I would just try to enable the device and then trigger a rescan of the BUS where it is attached (similar to echo 1 > /sys/bus/pci/devices/$pci_id_of_your_bus/rescan).
There is certainly a function in the PCI subsystem you just need to call for that.
I'm not exactly familiar with the pci subsystem. I know that hotplug is a thing, at least in theory. I figured it'd be something related to that, since this is in essence a random device appearing on the bus after the initial scan. Is a bus rescan safe on a fully initialized system?
Either way, some specific functions to call or look at would be great.
Cheers,
-ilia
Am 09.10.2017 um 17:57 schrieb Ilia Mirkin:
On Mon, Oct 9, 2017 at 11:45 AM, Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 09.10.2017 um 16:41 schrieb Ilia Mirkin:
Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
Sounds similar to my work about resizing BARs.
I would just try to enable the device and then trigger a rescan of the BUS where it is attached (similar to echo 1 > /sys/bus/pci/devices/$pci_id_of_your_bus/rescan).
There is certainly a function in the PCI subsystem you just need to call for that.
I'm not exactly familiar with the pci subsystem. I know that hotplug is a thing, at least in theory. I figured it'd be something related to that, since this is in essence a random device appearing on the bus after the initial scan. Is a bus rescan safe on a fully initialized system?
Either way, some specific functions to call or look at would be great.
Well I would just try to use pci_rescan_bus(), as far as I know that should be save even when devices are already configured.
Christian.
Cheers,
-ilia
On Mon, Oct 09, 2017 at 10:41:38AM -0400, Ilia Mirkin wrote:
Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
/* do magic */ nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
Wow, that *is* inscrutable magic. I gave up trying to convince myself that this turns into pci_write_config_word() eventually.
0x88000 is way too big for an offset into PCI config space (it's only 4096 bytes per device at most), so I don't know what sort of alias it could be. It doesn't look like a MMCONFIG address (which the driver shouldn't be using directly anyway).
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
If you make an early quirk for the GPU device, we'll run the quirk while enumerating it. If it can enable the audio device, and if the audio device is at a higher device number, the PCI core should try to enumerate it later. But there might be wrinkles if setting the bit turns a single-function device into a multi-function one, because the core might have already decided it didn't need to scan for more functions. DECLARE_PCI_FIXUP_CLASS_EARLY() is the place to start.
Bjorn
On Wed, Oct 11, 2017 at 7:47 AM, Bjorn Helgaas helgaas@kernel.org wrote:
On Mon, Oct 09, 2017 at 10:41:38AM -0400, Ilia Mirkin wrote:
Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
/* do magic */ nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
Wow, that *is* inscrutable magic. I gave up trying to convince myself that this turns into pci_write_config_word() eventually.
0x88000 is way too big for an offset into PCI config space (it's only 4096 bytes per device at most), so I don't know what sort of alias it could be. It doesn't look like a MMCONFIG address (which the driver shouldn't be using directly anyway).
0x88000 is an offset into one of the BARs. At this offset is a mirror of the PCI config space (or you could say that the PCI config space is a mirror of these registers... we'll never know). The "magic" comment probably refers to flipping the bit though, since it's the "make it work" bit.
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
If you make an early quirk for the GPU device, we'll run the quirk while enumerating it. If it can enable the audio device, and if the audio device is at a higher device number, the PCI core should try to enumerate it later. But there might be wrinkles if setting the bit turns a single-function device into a multi-function one, because the core might have already decided it didn't need to scan for more functions. DECLARE_PCI_FIXUP_CLASS_EARLY() is the place to start.
I thought about this type of approach but I have two concerns: - We can't just do this willy-nilly on *any* NVIDIA GPU, only on "new" ones. The definition of "new" here, of course, is like 2010, and there have been a lot of chips released since then. Each chip is allocated a group of 0x20 or 0x40 PCI device ids, but it'll still be a lot of code. Do you really want this in drivers/pci/quirks.c? - This fixup will be applied no matter what. I wonder if that's an OK thing to do -- feels like this sort of thing should be scoped to the driver. I was assuming there'd be some magic hotplug function that could be called. I suppose we could condition the quirk on CONFIG_NOUVEAU being built (into the kernel or otherwise), but that's still a coarse tool given that distros enable everything.
How would I find out about the single-function vs multi-function issue? Is there a separate bit in the pci config that indicates a multi-function device, or do you just mean it's an implementation issue in the pci core which might not handle a quirk that adds a second device?
FWIW the GPU always comes up on .0, while the audio function comes up on .1. I've never seen an NVIDIA GPU where that's different.
Thanks,
-ilia
On Wed, Oct 11, 2017 at 08:54:05AM -0400, Ilia Mirkin wrote:
On Wed, Oct 11, 2017 at 7:47 AM, Bjorn Helgaas helgaas@kernel.org wrote:
On Mon, Oct 09, 2017 at 10:41:38AM -0400, Ilia Mirkin wrote:
Hello,
As a bit of background, all NVIDIA GPUs since GT215 have an audio subfunction for HDMI(/DP) audio to be sent to the sink. This generally works.
However some, especially laptop, devices come up with that function disabled. We have a quirk to enable it when coming back from runpm, but that doesn't help the init case. Basically we have to write a bit to the PCI config space:
https://github.com/torvalds/linux/blob/v4.12/drivers/gpu/drm/nouveau/nouveau...
(MMIO 0x88000 is an alias for the PCI config space)
/* do magic */ nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
Wow, that *is* inscrutable magic. I gave up trying to convince myself that this turns into pci_write_config_word() eventually.
0x88000 is way too big for an offset into PCI config space (it's only 4096 bytes per device at most), so I don't know what sort of alias it could be. It doesn't look like a MMCONFIG address (which the driver shouldn't be using directly anyway).
0x88000 is an offset into one of the BARs. At this offset is a mirror of the PCI config space (or you could say that the PCI config space is a mirror of these registers... we'll never know). The "magic" comment probably refers to flipping the bit though, since it's the "make it work" bit.
OK, that makes sense. If you made a quirk, you might want to use a config access, since using MMIO means you depend on the BAR having been assigned space. Often the BIOS will have assigned the BAR, but there's no requirement for it to do so.
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
If you make an early quirk for the GPU device, we'll run the quirk while enumerating it. If it can enable the audio device, and if the audio device is at a higher device number, the PCI core should try to enumerate it later. But there might be wrinkles if setting the bit turns a single-function device into a multi-function one, because the core might have already decided it didn't need to scan for more functions. DECLARE_PCI_FIXUP_CLASS_EARLY() is the place to start.
I thought about this type of approach but I have two concerns:
- We can't just do this willy-nilly on *any* NVIDIA GPU, only on
"new" ones. The definition of "new" here, of course, is like 2010, and there have been a lot of chips released since then. Each chip is allocated a group of 0x20 or 0x40 PCI device ids, but it'll still be a lot of code. Do you really want this in drivers/pci/quirks.c?
It sounds like this is basically a workaround for a BIOS bug -- it forgot to enable the audio device. If you complain to the vendor, they might eventually fix it and you won't need to add new devices.
- This fixup will be applied no matter what. I wonder if that's an OK
thing to do -- feels like this sort of thing should be scoped to the driver. I was assuming there'd be some magic hotplug function that could be called. I suppose we could condition the quirk on CONFIG_NOUVEAU being built (into the kernel or otherwise), but that's still a coarse tool given that distros enable everything.
You can have conditions in your fixup -- only certain devices, certain platforms, etc.
How would I find out about the single-function vs multi-function issue? Is there a separate bit in the pci config that indicates a multi-function device, or do you just mean it's an implementation issue in the pci core which might not handle a quirk that adds a second device?
There's a bit in the header that tells us whether it's a multifunction device. See pci_setup_device(), where it sets dev->multifunction. That happens before we run early quirks, so if enabling the audio device in the quirk changed the bit, we'd have a problem. But you'll have to experiment. It could be that the GPU identifies as a multifunction device even if the audio device is disabled.
Bjorn
On Wed, Oct 11, 2017 at 9:46 AM, Bjorn Helgaas helgaas@kernel.org wrote:
On Wed, Oct 11, 2017 at 08:54:05AM -0400, Ilia Mirkin wrote:
On Wed, Oct 11, 2017 at 7:47 AM, Bjorn Helgaas helgaas@kernel.org wrote:
/* do magic */ nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
Wow, that *is* inscrutable magic. I gave up trying to convince myself that this turns into pci_write_config_word() eventually.
0x88000 is way too big for an offset into PCI config space (it's only 4096 bytes per device at most), so I don't know what sort of alias it could be. It doesn't look like a MMCONFIG address (which the driver shouldn't be using directly anyway).
0x88000 is an offset into one of the BARs. At this offset is a mirror of the PCI config space (or you could say that the PCI config space is a mirror of these registers... we'll never know). The "magic" comment probably refers to flipping the bit though, since it's the "make it work" bit.
OK, that makes sense. If you made a quirk, you might want to use a config access, since using MMIO means you depend on the BAR having been assigned space. Often the BIOS will have assigned the BAR, but there's no requirement for it to do so.
Oh yeah, we'd definitely do it via the PCI config space. Mapping the bar in early pci setup doesn't seem like the wise move. I was just mentioning the 88000 thing so that the code made sense. I guess that didn't go to plan...
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
If you make an early quirk for the GPU device, we'll run the quirk while enumerating it. If it can enable the audio device, and if the audio device is at a higher device number, the PCI core should try to enumerate it later. But there might be wrinkles if setting the bit turns a single-function device into a multi-function one, because the core might have already decided it didn't need to scan for more functions. DECLARE_PCI_FIXUP_CLASS_EARLY() is the place to start.
I thought about this type of approach but I have two concerns:
- We can't just do this willy-nilly on *any* NVIDIA GPU, only on
"new" ones. The definition of "new" here, of course, is like 2010, and there have been a lot of chips released since then. Each chip is allocated a group of 0x20 or 0x40 PCI device ids, but it'll still be a lot of code. Do you really want this in drivers/pci/quirks.c?
It sounds like this is basically a workaround for a BIOS bug -- it forgot to enable the audio device. If you complain to the vendor, they might eventually fix it and you won't need to add new devices.
I was thinking of just blanket applying it to all GT215+ chips after we do some spot checks for safety -- the chip can be determined based on PCI id (http://envytools.readthedocs.io/en/latest/hw/pciid.html). There are more GPU pci ids than there are users of nouveau... and the failure has nothing to do with the pci id but rather with the platform and/or vbios author. It's a somewhat widespread problem among laptops and still happens with new ones.
I suspect it was treated as a BIOS feature rather than a bug -- you get a second audio device and confuse the user about which one is the "real audio". There are even some laptops which make the audio function appear based on HDMI cable presence at boot (and I think make it appear/disappear dynamically in Windows).
- This fixup will be applied no matter what. I wonder if that's an OK
thing to do -- feels like this sort of thing should be scoped to the driver. I was assuming there'd be some magic hotplug function that could be called. I suppose we could condition the quirk on CONFIG_NOUVEAU being built (into the kernel or otherwise), but that's still a coarse tool given that distros enable everything.
You can have conditions in your fixup -- only certain devices, certain platforms, etc.
How would I find out about the single-function vs multi-function issue? Is there a separate bit in the pci config that indicates a multi-function device, or do you just mean it's an implementation issue in the pci core which might not handle a quirk that adds a second device?
There's a bit in the header that tells us whether it's a multifunction device. See pci_setup_device(), where it sets dev->multifunction. That happens before we run early quirks, so if enabling the audio device in the quirk changed the bit, we'd have a problem. But you'll have to experiment. It could be that the GPU identifies as a multifunction device even if the audio device is disabled.
OK, thanks for all the info! I'll try to work something up and get it tested by affected users.
Cheers,
-ilia
On Wed, Oct 11, 2017 at 4:18 PM, Ilia Mirkin imirkin@alum.mit.edu wrote:
On Wed, Oct 11, 2017 at 9:46 AM, Bjorn Helgaas helgaas@kernel.org wrote:
On Wed, Oct 11, 2017 at 08:54:05AM -0400, Ilia Mirkin wrote:
On Wed, Oct 11, 2017 at 7:47 AM, Bjorn Helgaas helgaas@kernel.org wrote:
/* do magic */ nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
Wow, that *is* inscrutable magic. I gave up trying to convince myself that this turns into pci_write_config_word() eventually.
0x88000 is way too big for an offset into PCI config space (it's only 4096 bytes per device at most), so I don't know what sort of alias it could be. It doesn't look like a MMCONFIG address (which the driver shouldn't be using directly anyway).
0x88000 is an offset into one of the BARs. At this offset is a mirror of the PCI config space (or you could say that the PCI config space is a mirror of these registers... we'll never know). The "magic" comment probably refers to flipping the bit though, since it's the "make it work" bit.
OK, that makes sense. If you made a quirk, you might want to use a config access, since using MMIO means you depend on the BAR having been assigned space. Often the BIOS will have assigned the BAR, but there's no requirement for it to do so.
Oh yeah, we'd definitely do it via the PCI config space. Mapping the bar in early pci setup doesn't seem like the wise move. I was just mentioning the 88000 thing so that the code made sense. I guess that didn't go to plan...
This works for runtime pm resume, since the device was originally there and we just have to make sure the underlying device agrees with it, but when it's missing on boot, we have to convince linux that it exists, bind a driver, etc.
What's the best way of going about doing that?
If you make an early quirk for the GPU device, we'll run the quirk while enumerating it. If it can enable the audio device, and if the audio device is at a higher device number, the PCI core should try to enumerate it later. But there might be wrinkles if setting the bit turns a single-function device into a multi-function one, because the core might have already decided it didn't need to scan for more functions. DECLARE_PCI_FIXUP_CLASS_EARLY() is the place to start.
I thought about this type of approach but I have two concerns:
- We can't just do this willy-nilly on *any* NVIDIA GPU, only on
"new" ones. The definition of "new" here, of course, is like 2010, and there have been a lot of chips released since then. Each chip is allocated a group of 0x20 or 0x40 PCI device ids, but it'll still be a lot of code. Do you really want this in drivers/pci/quirks.c?
It sounds like this is basically a workaround for a BIOS bug -- it forgot to enable the audio device. If you complain to the vendor, they might eventually fix it and you won't need to add new devices.
I was thinking of just blanket applying it to all GT215+ chips after we do some spot checks for safety -- the chip can be determined based on PCI id (http://envytools.readthedocs.io/en/latest/hw/pciid.html). There are more GPU pci ids than there are users of nouveau... and the failure has nothing to do with the pci id but rather with the platform and/or vbios author. It's a somewhat widespread problem among laptops and still happens with new ones.
I suspect it was treated as a BIOS feature rather than a bug -- you get a second audio device and confuse the user about which one is the "real audio". There are even some laptops which make the audio function appear based on HDMI cable presence at boot (and I think make it appear/disappear dynamically in Windows).
- This fixup will be applied no matter what. I wonder if that's an OK
thing to do -- feels like this sort of thing should be scoped to the driver. I was assuming there'd be some magic hotplug function that could be called. I suppose we could condition the quirk on CONFIG_NOUVEAU being built (into the kernel or otherwise), but that's still a coarse tool given that distros enable everything.
You can have conditions in your fixup -- only certain devices, certain platforms, etc.
How would I find out about the single-function vs multi-function issue? Is there a separate bit in the pci config that indicates a multi-function device, or do you just mean it's an implementation issue in the pci core which might not handle a quirk that adds a second device?
There's a bit in the header that tells us whether it's a multifunction device. See pci_setup_device(), where it sets dev->multifunction. That happens before we run early quirks, so if enabling the audio device in the quirk changed the bit, we'd have a problem. But you'll have to experiment. It could be that the GPU identifies as a multifunction device even if the audio device is disabled.
OK, thanks for all the info! I'll try to work something up and get it tested by affected users.
With the bit 25 in 0x488, we can disable/enable the Multifunction reporting bit 23 in 0xc, which means if we disable the multifunction bit in 0x488, the GPU itself doesn't tell the system it is one either. So the write to 0x488 is actually required to get the GPU into a state, where it actually reports to be a multifunction device.
Cheers,
-ilia _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
dri-devel@lists.freedesktop.org