On Tue, May 31, 2016 at 01:34:43PM +0200, Lukas Wunner wrote:
On Mon, May 30, 2016 at 07:03:46PM +0200, Peter Wu wrote:
On Sun, May 29, 2016 at 05:50:06PM +0200, Lukas Wunner wrote:
How exactly did you get into the situation where the root port didn't wake up when you tried to load nouveau again? (This came up in our IRC conversation this week.)
Ensure that the pci/pm patches are applied, then:
- Unload nouveau (I have blacklisted it for testing).
- Enable runtime PM (rpm) for the root port and its children (write "auto" to power/control).
- Verify in the kernel log that the devices are sleeping:
    pcieport 0000:00:01.0: power state changed by ACPI to D3cold
- (Optional, to rule out issues with delays:) Disable rpm for the Nvidia device (control = on).
- modprobe nouveau.
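The sysfs part of the reproduction steps above can be scripted. What follows is a minimal C sketch. It is not part of the patch series. It writes a value to power/control for the root port and for its direct child PCI devices, and it reads back power/runtime_status. It assumes the usual sysfs layout, in which child devices appear as subdirectories named after their PCI address (for example /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0). The helper names write_rpm_control(), set_rpm_control_tree() and read_runtime_status() are hypothetical. Writing the attributes requires root, and the D3cold transition itself still has to be confirmed in the kernel log.

```c
#define _POSIX_C_SOURCE 200809L /* opendir/readdir/stat in strict ISO C modes */

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Write VALUE ("auto" allows runtime suspend, "on" keeps the device powered)
 * to DEV_DIR/power/control. Returns 0 on success, -1 on error. */
static int write_rpm_control(const char *dev_dir, const char *value)
{
    char path[4096];
    struct stat st;
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/power/control", dev_dir);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    /* Only write to an attribute that already exists; never create files. */
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* stdio buffers the string, so a write error from sysfs surfaces here. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Heuristic (assumption of this sketch): child PCI devices are entries named
 * "DDDD:BB:DD.F". Port service entries like "0000:00:01.0:pcie002" are
 * rejected because the trailing %c matches the extra ':'. */
static int looks_like_pci_addr(const char *name)
{
    unsigned int dom, bus, dev, fn;
    char extra;

    return sscanf(name, "%4x:%2x:%2x.%1x%c", &dom, &bus, &dev, &fn, &extra) == 4;
}

/* Apply VALUE to PORT_DIR and every direct child PCI device.
 * Returns the number of failures (0 means everything was written). */
static int set_rpm_control_tree(const char *port_dir, const char *value)
{
    char child[4096];
    struct dirent *de;
    DIR *d;
    int failures = 0;

    if (write_rpm_control(port_dir, value) != 0)
        failures++;
    d = opendir(port_dir);
    if (!d)
        return failures + 1;
    while ((de = readdir(d)) != NULL) {
        int n;

        if (!looks_like_pci_addr(de->d_name))
            continue;
        n = snprintf(child, sizeof(child), "%s/%s", port_dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(child) || write_rpm_control(child, value) != 0)
            failures++;
    }
    closedir(d);
    return failures;
}

/* Read DEV_DIR/power/runtime_status ("active", "suspended", ...) into BUF
 * without the trailing newline. Returns 0 on success, -1 on error. */
static int read_runtime_status(const char *dev_dir, char *buf, size_t len)
{
    char path[4096];
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/power/runtime_status", dev_dir);

    if (n < 0 || (size_t)n >= sizeof(path) || len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, set_rpm_control_tree("/sys/bus/pci/devices/0000:00:01.0", "auto") covers the step that enables rpm for the port and its children. write_rpm_control("/sys/bus/pci/devices/0000:01:00.0", "on") covers the optional step that disables rpm for the Nvidia device.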
The above test with v4.6 + 4 pci/pm patches (8b71f565) gives:
    50.245795 MXM: GUID detected in BIOS
    50.245948 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246016 ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246044 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.GFX0._DSM] at AML address ffffc90000013b11 length 492
    50.246110 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246256 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246289 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246443 ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160108/nsarguments-95)
    50.246457 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PEGP._DSM] at AML address ffffc90000018297 length 1F
    50.246932 pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
    50.247005 VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
    50.247084 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0.PG00._ON] at AML address ffffc9000001086e length 11D
    50.390140 pcieport 0000:00:01.0: power state changed by ACPI to D0
    50.491893 nseval-0227 ns_evaluate : **** Execute method [\_SB.PCI0.PEG0._DSW] at AML address ffffc90000010a2d length 1D
    50.492285 pcieport 0000:00:01.0: PME# disabled
    50.492583 nouveau 0000:01:00.0: unknown chipset (ffffffff)
    50.492687 nouveau: probe of 0000:01:00.0 failed with error -12
I've tested this on a MacBook Pro, which does not have ACPI _PR3 methods for the root port to which the discrete GPU is attached. The port can thus only suspend to D3hot, not D3cold.
Even without patch [2/9], if I unload nouveau and let the root port go to D3hot, the port is correctly resumed to D0 when I reload nouveau.
So the issue you're seeing without patch [2/9] seems to be specific to Optimus/_PR3 machines. If possible, try to get it working without patch [2/9], because that patch is really optional (as I wrote in the commit message). I'm not familiar with Optimus, but maybe lspci could help debug this?
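This follows up on the lspci suggestion. The "unknown chipset (ffffffff)" line in the log above shows that nouveau read all ones when it tried to identify the GPU. That pattern typically means the device did not answer the read at all. A quick check after the probe failure is to see whether the GPU's PCI config space is readable. You can do this with lspci -xxx -s 01:00.0 or with a small program like the C sketch below. The sketch reads the Vendor ID (config offset 0x00, little-endian) from the sysfs config file and treats 0xffff as "device not responding". The helper names read_vendor_id() and device_responds() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Read the 16-bit Vendor ID (config space offset 0x00, little-endian) from a
 * sysfs config file, e.g. /sys/bus/pci/devices/0000:01:00.0/config.
 * Returns 0 on success, -1 if the file cannot be opened or is too short. */
static int read_vendor_id(const char *config_path, uint16_t *vendor)
{
    unsigned char b[2];
    size_t got;
    FILE *f = fopen(config_path, "rb");

    if (!f)
        return -1;
    got = fread(b, 1, sizeof(b), f);
    fclose(f);
    if (got != sizeof(b))
        return -1;
    *vendor = (uint16_t)(b[0] | (b[1] << 8));
    return 0;
}

/* An all-ones Vendor ID (0xffff) is what a config read typically returns
 * when no device answers; any other value means config space was readable. */
static int device_responds(uint16_t vendor)
{
    return vendor != 0xffff;
}
```

Unprivileged reads of the sysfs config file are limited to the first 64 bytes. That is enough for the Vendor ID, but lspci -xxx needs root to dump the full config space.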
Without 2/9, I can prevent the issue by writing "on" to /sys/bus/pci/devices/0000:00:01.0/power/control (the PCIe port), but that effectively gives the same result as applying 2/9.
The problem occurs when power is removed (by putting the PCIe port into D3cold). Maybe the PCI core has a bug where it does not re-initialize the devices below the port once power is restored. But since a workaround is available (2/9), I will focus on other issues first. Maybe it is worth mentioning this issue in the commit message for 2/9, though.