From: Thierry Reding treding@nvidia.com
This is a result of looking into a more formal way of doing what was proposed here:
http://patchwork.ozlabs.org/patch/1145363/
The Tegra DRM driver is written such that runtime PM controls all aspects of bringing up and shutting down the hardware associated with a display pipeline. This works very nicely with the DRM/KMS atomic mode- setting framework that has very rigorous call sequences. There are also suspend/resume helpers for system sleep that are built on top of these generic helpers and that cause the same code sequences to be run as if users had simply chosen to disable all display pipelines at normal runtime.
The current behaviour of the PM core to disallow runtime suspend/resume during system sleep gets in the way of this because the devices do not in fact runtime suspend/resume during that time. Most of the time this causes display outputs to malfunction upon resume.
Now, there are good reasons for preventing runtime suspend during system sleep, as given in commit eea3fc0357eb ("PCI / PM: Detect early wakeup in pci_pm_prepare()") that originally introduced this mechanism. There can, however, also be cases, like the one described above, where it is safe to allow this. Add a flag and a set of helpers to set or clear that new flag so that drivers that know it will be safe to runtime suspend a device at system sleep time can mark the device as such.
If a device has the flag set, the PM core will no longer take a runtime PM reference for it, thus allowing the device to runtime suspend at the expected time.
Thierry
Thierry Reding (2): PM / runtime: Allow drivers to override runtime PM behaviour on sleep drm/tegra: Allow runtime suspend on system sleep
drivers/base/power/main.c | 6 ++++-- drivers/base/power/runtime.c | 16 ++++++++++++++++ drivers/gpu/drm/tegra/dc.c | 1 + drivers/gpu/drm/tegra/dsi.c | 1 + drivers/gpu/drm/tegra/hdmi.c | 1 + drivers/gpu/drm/tegra/hub.c | 1 + drivers/gpu/drm/tegra/sor.c | 1 + include/linux/pm.h | 1 + include/linux/pm_runtime.h | 2 ++ 9 files changed, 28 insertions(+), 2 deletions(-)
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
Add helpers to flag devices as being safe to be runtime suspended at system sleep time.
Signed-off-by: Thierry Reding treding@nvidia.com --- drivers/base/power/main.c | 6 ++++-- drivers/base/power/runtime.c | 16 ++++++++++++++++ include/linux/pm.h | 1 + include/linux/pm_runtime.h | 2 ++ 4 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 134a8af51511..f8dbf00c703b 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1113,7 +1113,8 @@ static void device_complete(struct device *dev, pm_message_t state)
device_unlock(dev);
- pm_runtime_put(dev); + if (!dev->power.always_runtime) + pm_runtime_put(dev); }
/** @@ -1896,7 +1897,8 @@ static int device_prepare(struct device *dev, pm_message_t state) * block runtime suspend here, during the prepare phase, and allow * it again during the complete phase. */ - pm_runtime_get_noresume(dev); + if (!dev->power.always_runtime) + pm_runtime_get_noresume(dev);
device_lock(dev);
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 48616f358854..699803984426 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -1440,6 +1440,22 @@ void pm_runtime_allow(struct device *dev) } EXPORT_SYMBOL_GPL(pm_runtime_allow);
+void pm_runtime_always_allow(struct device *dev) +{ + spin_lock_irq(&dev->power.lock); + dev->power.always_runtime = 1; + spin_unlock_irq(&dev->power.lock); +} +EXPORT_SYMBOL_GPL(pm_runtime_always_allow); + +void pm_runtime_always_forbid(struct device *dev) +{ + spin_lock_irq(&dev->power.lock); + dev->power.always_runtime = 0; + spin_unlock_irq(&dev->power.lock); +} +EXPORT_SYMBOL_GPL(pm_runtime_always_forbid); + /** * pm_runtime_no_callbacks - Ignore runtime PM callbacks for a device. * @dev: Device to handle. diff --git a/include/linux/pm.h b/include/linux/pm.h index e057d1fa2469..6133cf496878 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -615,6 +615,7 @@ struct dev_pm_info { unsigned int use_autosuspend:1; unsigned int timer_autosuspends:1; unsigned int memalloc_noio:1; + unsigned int always_runtime:1; unsigned int links_count; enum rpm_request request; enum rpm_status runtime_status; diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 22af69d237a6..28204baf01cb 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -46,6 +46,8 @@ extern void pm_runtime_enable(struct device *dev); extern void __pm_runtime_disable(struct device *dev, bool check_resume); extern void pm_runtime_allow(struct device *dev); extern void pm_runtime_forbid(struct device *dev); +extern void pm_runtime_always_allow(struct device *dev); +extern void pm_runtime_always_forbid(struct device *dev); extern void pm_runtime_no_callbacks(struct device *dev); extern void pm_runtime_irq_safe(struct device *dev); extern void __pm_runtime_use_autosuspend(struct device *dev, bool use);
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
Not only for that.
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Thanks!
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
Not only for that.
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed. I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
Is there some other alternative that I can look into?
Thierry
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
Is there some other alternative that I can look into?
First of all, ensure that the dpm_list ordering is what it should be on the system/platform in question. That can be done with the help of device links.
In addition, make sure that the devices needed to suspend other devices are suspended in the noirq phase of system-wide suspend and resumed in the noirq phase of system-wide resume. Or at least all of the other devices need to be suspended before them and resumed after them.
These two things should allow you to cover the vast majority of cases if not all of them without messing up with the rules.
Thanks!
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
BTW, what if user space prevents PM-runtime from suspending devices by writing "on" to their "control" files?
System-wide suspend is (of course) still expected to work in that case, so how exactly would you overcome that?
On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
BTW, what if user space prevents PM-runtime from suspending devices by writing "on" to their "control" files?
System-wide suspend is (of course) still expected to work in that case, so how exactly would you overcome that?
I suppose one way to overcome that would be to make it an error to write "on" to the "control" files for these devices.
Currently doing this is likely going to break display support on Tegra, so this would be a good idea in this case anyway.
Again, I could avoid all of these issues by avoiding runtime PM in this driver, but I would end up reimplementing some of the same concepts. I'd rather use something that's supported by the PM core and that might be useful to other drivers than reinvent the wheel.
Thierry
On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding thierry.reding@gmail.com wrote:
On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
BTW, what if user space prevents PM-runtime from suspending devices by writing "on" to their "control" files?
System-wide suspend is (of course) still expected to work in that case, so how exactly would you overcome that?
I suppose one way to overcome that would be to make it an error to write "on" to the "control" files for these devices.
Seeing suggestions like this in messages from seasoned kernel developers is seriously disappointing. :-/
Currently doing this is likely going to break display support on Tegra, so this would be a good idea in this case anyway.
PM-runtime has always allowed user space to prevent devices from being suspended and it seems that this has not been taken into account by Tegra display support developers at all.
Again, I could avoid all of these issues by avoiding runtime PM in this driver,
I don't quite see the connection here.
Preventing a device from suspending should never be a functional problem. It may be an energy-efficiency problem, but that's something for user space to consider before writing "on" to a device's control file.
but I would end up reimplementing some of the same concepts. I'd rather use something that's supported by the PM core and that might be useful to other drivers than reinvent the wheel.
Which doesn't have to be by using PM-runtime suspend for the handling of system-wide suspend, at least in my view.
On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding thierry.reding@gmail.com wrote:
On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
> > From: Thierry Reding treding@nvidia.com > > Currently the driver PM core will automatically acquire a runtime PM > reference for devices before system sleep is entered. This is needed > to avoid potential issues related to devices' parents getting put to > runtime suspend at the wrong time and causing problems with their > children. =20 Not only for that. =20 > In some cases drivers are carefully written to avoid such issues and > the default behaviour can be changed to allow runtime PM to operate > regularly during system sleep. =20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
BTW, what if user space prevents PM-runtime from suspending devices by writing "on" to their "control" files?
System-wide suspend is (of course) still expected to work in that case, so how exactly would you overcome that?
I suppose one way to overcome that would be to make it an error to write "on" to the "control" files for these devices.
Seeing suggestions like this in messages from seasoned kernel developers is seriously disappointing. :-/
Currently doing this is likely going to break display support on Tegra, so this would be a good idea in this case anyway.
PM-runtime has always allowed user space to prevent devices from being suspended and it seems that this has not been taken into account by Tegra display support developers at all.
Again, I could avoid all of these issues by avoiding runtime PM in this driver,
I don't quite see the connection here.
Preventing a device from suspending should never be a functional problem. It may be an energy-efficiency problem, but that's something for user space to consider before writing "on" to a device's control file.
That's really a question of how you define suspension. In the case of display drivers we have the somewhat unfortunate situation that in most SoCs the display "device" is actually represented by a collection of different devices. On Tegra specifically, for example, you have a couple of display controllers, then some "encoders" that take pixel streams from the display controllers and encode them into some wire format like LVDS, HDMI, DSI or DP.
Prohibiting suspension of any of the individual devices causes problems because it effectively makes the whole composite display device not suspendable. Doing so in turn usually means that you can't change the display configuration anymore because devices need to be powered up and down in order to change the configuration.
I consider powering up and down the devices a form of suspension. Hence it seemed natural to implement using runtime PM.
It sounds to me like userspace preventing runtime PM is problematic in most scenarios that involve composite devices because it makes all of the interactions between the devices a bit complicated.
but I would end up reimplementing some of the same concepts. I'd rather use something that's supported by the PM core and that might be useful to other drivers than reinvent the wheel.
Which doesn't have to be by using PM-runtime suspend for the handling of system-wide suspend, at least in my view.
Well, runtime PM is very convenient for this, though. It would allow the same code paths to be used in all cases.
Thierry
On Fri, Nov 29, 2019 at 01:07:19PM +0100, Thierry Reding wrote:
On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding thierry.reding@gmail.com wrote:
On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote: > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com= wrote: > > > > From: Thierry Reding treding@nvidia.com > > > > Currently the driver PM core will automatically acquire a runtime PM > > reference for devices before system sleep is entered. This is needed > > to avoid potential issues related to devices' parents getting put to > > runtime suspend at the wrong time and causing problems with their > > children. >=20 > Not only for that. >=20 > > In some cases drivers are carefully written to avoid such issues and > > the default behaviour can be changed to allow runtime PM to operate > > regularly during system sleep. >=20 > But this change breaks quite a few assumptions in the core too, so no, > it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
BTW, what if user space prevents PM-runtime from suspending devices by writing "on" to their "control" files?
System-wide suspend is (of course) still expected to work in that case, so how exactly would you overcome that?
I suppose one way to overcome that would be to make it an error to write "on" to the "control" files for these devices.
Seeing suggestions like this in messages from seasoned kernel developers is seriously disappointing. :-/
Currently doing this is likely going to break display support on Tegra, so this would be a good idea in this case anyway.
PM-runtime has always allowed user space to prevent devices from being suspended and it seems that this has not been taken into account by Tegra display support developers at all.
Again, I could avoid all of these issues by avoiding runtime PM in this driver,
I don't quite see the connection here.
Preventing a device from suspending should never be a functional problem. It may be an energy-efficiency problem, but that's something for user space to consider before writing "on" to a device's control file.
That's really a question of how you define suspension. In the case of display drivers we have the somewhat unfortunate situation that in most SoCs the display "device" is actually represented by a collection of different devices. On Tegra specifically, for example, you have a couple of display controllers, then some "encoders" that take pixel streams from the display controllers and encode them into some wire format like LVDS, HDMI, DSI or DP.
Prohibiting suspension of any of the individual devices causes problems because it effectively makes the whole composite display device not suspendable. Doing so in turn usually means that you can't change the display configuration anymore because devices need to be powered up and down in order to change the configuration.
I consider powering up and down the devices a form of suspension. Hence it seemed natural to implement using runtime PM.
It sounds to me like userspace preventing runtime PM is problematic in most scenarios that involve composite devices because it makes all of the interactions between the devices a bit complicated.
Yeah with the DT model of how a SoC works, all these tiny little devices are essentially implementation details that userspace really shouldn't ever care about, much less change anything with them.
If userspace doesn't want the gpu to auto-suspend, then there's the overall gpu device that it can set that on, and I guess doing that should not break a decently written driver. For all the others insisting that userspace can be stupid essentially means we get to hand roll large chunks of runtime pm in drivers, which feels rather pointless. We have a lot of that home-grown runtime pm for subcomponents in i915, and I very much understand why the DT folks wanted to standardize all that with lots of little explicit devices.
Maybe the mistake was simply allowing these to be visible to userspace. -Daniel
but I would end up reimplementing some of the same concepts. I'd rather use something that's supported by the PM core and that might be useful to other drivers than reinvent the wheel.
Which doesn't have to be by using PM-runtime suspend for the handling of system-wide suspend, at least in my view.
Well, runtime PM is very convenient for this, though. It would allow the same code paths to be used in all cases.
Thierry
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Fri, Nov 29, 2019 at 1:07 PM Thierry Reding thierry.reding@gmail.com wrote:
On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
[cut]
Sorry for the delay.
First off, let me note that I have seen your most recent patches and thanks for taking the feedback into account, much appreciated!
Nevertheless, I feel that I need to address the below, because it is really important.
Preventing a device from suspending should never be a functional problem. It may be an energy-efficiency problem, but that's something for user space to consider before writing "on" to a device's control file.
That's really a question of how you define suspension.
In general, yes.
However, if you talk about PM-runtime, there are definitions of "suspended" and "active" in there already. Namely, in the PM-runtime context, "suspended" means "may not be accessible to software" whereas "active" means "software can access it".
In the case of display drivers we have the somewhat unfortunate situation that in most SoCs the display "device" is actually represented by a collection of different devices. On Tegra specifically, for example, you have a couple of display controllers, then some "encoders" that take pixel streams from the display controllers and encode them into some wire format like LVDS, HDMI, DSI or DP.
Prohibiting suspension of any of the individual devices causes problems because it effectively makes the whole composite display device not suspendable.
For PM-runtime, that shouldn't be a problem at all.
PM-runtime is all about (possibly) saving energy by powering down devices that are not in use. In particular, It is not about powering down any devices on demand for any reason other than idleness. Therefore in PM-runtime a situation in which a given device cannot be suspended at all is regarded as normal, even though that may not be desirable for energy-efficiency reasons. It just means that the device is in use by somebody all the time. Moreover, PM-runtime is designed to make it possible to resume devices at any time (as long as the hardware works as expected), as soon as they are needed, modulo some possible delays. Actually, that's the purpose of a significant part of the PM-runtime framework.
Accordingly, device drivers may refuse to suspend devices, but refusing to resume a device is not expected by PM-runtime.
If writing "on" to the "control" file of a device does not cause it to be resumed (if suspended) and to stay in the "active" meta-state until "auto" is written to that file, you cannot really claim that PM-runtime is working correctly on your system.
Doing so in turn usually means that you can't change the display configuration anymore because devices need to be powered up and down in order to change the configuration.
I consider powering up and down the devices a form of suspension. Hence it seemed natural to implement using runtime PM.
Unfortunately, that's not the case.
The purpose of PM-runtime is to allow idle devices to be put into power states in which it may not be safe to access them and to make them go back into the "accessible and responsive" state whenever software wants/needs to access them in a coordinated fashion. IOW, it kind of is a counterpart of CPU idle time management.
It sounds to me like userspace preventing runtime PM is problematic in most scenarios that involve composite devices because it makes all of the interactions between the devices a bit complicated.
Even so, that's how it works.
User space can expect to be able to block runtime suspend of devices at any level of device hierarchy, at least for diagnostics if nothing else, end the kernel is responsible for ensuring that.
but I would end up reimplementing some of the same concepts. I'd rather use something that's supported by the PM core and that might be useful to other drivers than reinvent the wheel.
Which doesn't have to be by using PM-runtime suspend for the handling of system-wide suspend, at least in my view.
Well, runtime PM is very convenient for this, though. It would allow the same code paths to be used in all cases.
The same low-level power-up and power-down code can be used in all cases, but PM-runtime is not low-level enough. It is also opportunistic, so if you need to power down a device for reasons other than "natural" idleness, PM-runtime is not the right tool for that task.
Of course, PM-runtime callbacks can invoke the low-level power-up and power-down code, but as you said there are reasons for powering down devices not just because they happen to be idle. System-wide suspend is one of them.
On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
You're right. I was finally able to find a toolchain that I could build an old version of the kernel with. I tested system suspend/resume on the v4.8 release, which is the first one that had the runtime PM changes as well as the subsystem suspend/resume support wired up, and I can't see the runtime PM callbacks invoked during system suspend/resume.
So I must be misremembering, or I'm confusing it with some other tests I was running at the time.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For this particular use-case the above should all be irrelevant. None of the drivers involved here do anything special at system suspend, because runtime suspend already puts the devices into the lowest possible power state. Basically when these devices are put into runtime suspend, they are completely turned off. The only exception is for things like HDMI where the +5V pin remains powered, so that hotplug detection will work.
The runtime PM state of the devices involved is managed by the subsystem system suspend/resume helpers in DRM/KMS. Basically those helpers turn off all the devices in the composite device, which ultimately results in their last runtime PM reference being released. So for system suspend and resume, these devices aren't touched, other than maybe for the PM core's internal book-keeping.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
That's true. But the same thing is true for pretty much all other APIs. People obviously have to make sure they know what they're doing, just like they have to with any other API.
I suppose the documentation for this new function is currently lacking a bit. Perhaps adding a big warning to this and listing the common pitfalls would help people make the right call about whether or not they can use this.
Is there some other alternative that I can look into?
First of all, ensure that the dpm_list ordering is what it should be on the system/platform in question. That can be done with the help of device links.
I don't think we have device links for everything, but the deferred probe code should take care of ordering the dpm_list correctly because we do handle deferred probe properly in all cases.
Also, the dpm_list ordering isn't very critical in this case. If the devices are allowed to runtime suspend during system sleep, the subsystem sleep helper will put them into runtime suspend at the correct time. This is propagated all the way through the display pipeline and that order is ensured by the subsystem helpers.
In addition, make sure that the devices needed to suspend other devices are suspended in the noirq phase of system-wide suspend and resumed in the noirq phase of system-wide resume. Or at least all of the other devices need to be suspended before them and resumed after them.
We're fine on this front as well. We have run into such issues in the past, but I don't think there are any such issue left at the moment. I do have one pending fix for I2C suspend/resume which fixes an issue where some pinmuxing changes needed to get the HDMI DDC channel to work were not getting applied during resume.
That I2C issue is related to this, I think. What I'm seeing is that when the system goes to sleep, the pinmux looses its programming at a hardware level, but the I2C driver doesn't know about it because it does not get runtime suspended. At runtime suspend it would switch the pinmux state to "idle" which would then match the system suspend state. Upon runtime resume it sets the "default" pinmux state, which will then restore the register programming.
In the current case where runtime suspend/resume is prohibited during system sleep, upon resume the I2C driver will assume that the pinmux state is still "default" and it won't reapply the state (it's actually the pinmux subsystem that makes this decision) and causes HDMI DDC transactions to time out.
One simple fix for that is to use pm_runtime_force_suspend() and pm_runtime_force_resume() as system suspend/resume callbacks to make sure the I2C controller is runtime suspended/resumed during system sleep.
Note that forcing runtime suspend/resume this way is suboptimal in the DRM/KMS case because the suspend/resume happens disconnected from the subsystem suspend/resume callbacks, which is not desired as that breaks some of the assumptions in those callbacks.
These two things should allow you to cover the vast majority of cases if not all of them without messing up with the rules.
One alternative that I had thought about was to just ditch the runtime PM callbacks for this. However, there's one corner case where this may break. On early Tegra generations, the two display controllers are "coupled" in that the second one doesn't work if the first one is disabled. We describe that using a device link from the second to the first controller. This causes the first controller to be automatically be runtime resumed when the second controller is used. This only works via runtime PM, so if I don't use runtime PM I'd have to add special handling for that case.
Actually, there's another problem as well. Most of these devices use generic PM domains to power on/off the SoC partitions that they're in. If I side-step runtime PM, then I'd have to somehow find a way to explicitly control the PM domains.
Another alternative would be to have a kind of hybrid approach where I leave runtime PM calls in the drivers, but disconnect the runtime PM callback implementations from that. That would at least fix the issue with the generic PM domains.
However, it would not fix the problem with coupled display controllers because empty runtime PM callbacks wouldn't actually power up the first display controller when it is needed by the second controller. I would have to add infrastructure that basically duplicates some of runtime PM to fix that.
So the bottom line is that runtime PM is still the best solution for this problem. It works really nice and is very consistent.
Do you think adding better documentation to this new flag and the accessors would help remove your concerns about this?
Thierry
On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding thierry.reding@gmail.com wrote:
On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
You're right. I was finally able to find a toolchain that I could build an old version of the kernel with. I tested system suspend/resume on the v4.8 release, which is the first one that had the runtime PM changes as well as the subsystem suspend/resume support wired up, and I can't see the runtime PM callbacks invoked during system suspend/resume.
So I must be misremembering, or I'm confusing it with some other tests I was running at the time.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For this particular use-case the above should all be irrelevant. None of the drivers involved here do anything special at system suspend, because runtime suspend already puts the devices into the lowest possible power state. Basically when these devices are put into runtime suspend, they are completely turned off. The only exception is for things like HDMI where the +5V pin remains powered, so that hotplug detection will work.
The runtime PM state of the devices involved is managed by the subsystem system suspend/resume helpers in DRM/KMS. Basically those helpers turn off all the devices in the composite device, which ultimately results in their last runtime PM reference being released. So for system suspend and resume, these devices aren't touched, other than maybe for the PM core's internal book-keeping.
OK, so you actually want system-wide PM to work like PM-runtime on the platform in question, but there are substantial differences.
First, PM-runtime suspend can be effectively disabled by user space and system-wide suspend is always expected to work.
Second, if system wakeup devices are involved, their handling during system-wide suspend depends on the return value of device_may_wakeup() which depends on what user space does, whereas PM-runtime assumes device wakeup to be always enabled.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
That's true. But the same thing is true for pretty much all other APIs. People obviously have to make sure they know what they're doing, just like they have to with any other API.
I suppose the documentation for this new function is currently lacking a bit. Perhaps adding a big warning to this and listing the common pitfalls would help people make the right call about whether or not they can use this.
And then *somebody* would have to chase a ton of subtle issues resulting from that. No, thanks, but no thanks.
Is there some other alternative that I can look into?
First of all, ensure that the dpm_list ordering is what it should be on the system/platform in question. That can be done with the help of device links.
I don't think we have device links for everything, but the deferred probe code should take care of ordering the dpm_list correctly because we do handle deferred probe properly in all cases.
Also, the dpm_list ordering isn't very critical in this case. If the devices are allowed to runtime suspend during system sleep, the subsystem sleep helper will put them into runtime suspend at the correct time. This is propagated all the way through the display pipeline and that order is ensured by the subsystem helpers.
You are still not saying what happens if user space doesn't allow PM-runtime to suspend the devices (by writing "on" to their "control" files).
In addition, make sure that the devices needed to suspend other devices are suspended in the noirq phase of system-wide suspend and resumed in the noirq phase of system-wide resume. Or at least all of the other devices need to be suspended before them and resumed after them.
We're fine on this front as well. We have run into such issues in the past, but I don't think there are any such issue left at the moment. I do have one pending fix for I2C suspend/resume which fixes an issue where some pinmuxing changes needed to get the HDMI DDC channel to work were not getting applied during resume.
That I2C issue is related to this, I think. What I'm seeing is that when the system goes to sleep, the pinmux looses its programming at a hardware level, but the I2C driver doesn't know about it because it does not get runtime suspended.
Well, no, that's not the reason. The real reason is that the handling of that device during system-wide suspend does not follow the rules followed by PM-runtime for it.
Switching system-wide PM over to PM-runtime to address that is not going to work, because PM-runtime is not mandatory and system-wide PM is.
At runtime suspend it would switch the pinmux state to "idle" which would then match the system suspend state. Upon runtime resume it sets the "default" pinmux state, which will then restore the register programming.
So this logic needs to be implemented in the system-wide suspend flow as well.
In the current case where runtime suspend/resume is prohibited during
Runtime suspend is, runtime resume isn't until the "late" suspend phase.
system sleep, upon resume the I2C driver will assume that the pinmux state is still "default" and it won't reapply the state (it's actually the pinmux subsystem that makes this decision) and causes HDMI DDC transactions to time out.
So this is a bug in the system-wide suspend/resume flow that needs to be addressed, but not by switching it over to PM-runtime.
One simple fix for that is to use pm_runtime_force_suspend() and pm_runtime_force_resume() as system suspend/resume callbacks to make sure the I2C controller is runtime suspended/resumed during system sleep.
Note that forcing runtime suspend/resume this way is suboptimal in the DRM/KMS case because the suspend/resume happens disconnected from the subsystem suspend/resume callbacks, which is not desired as that breaks some of the assumptions in those callbacks.
So there needs to be another way.
Have you looked at DPM_FLAG_SMART_SUSPEND?
These two things should allow you to cover the vast majority of cases if not all of them without messing up with the rules.
One alternative that I had thought about was to just ditch the runtime PM callbacks for this. However, there's one corner case where this may break. On early Tegra generations, the two display controllers are "coupled" in that the second one doesn't work if the first one is disabled. We describe that using a device link from the second to the first controller. This causes the first controller to be automatically be runtime resumed when the second controller is used. This only works via runtime PM, so if I don't use runtime PM I'd have to add special handling for that case.
Runtime resume during system-wide suspend and resume is basically fine unless you try to do it in the "late" suspend phase or later, but that limitation is kind of artificial. [I was talking about that at the LPC this year.] It basically cannot be carried out in the part of system-wide suspend after the core regards the device and its parent etc as "suspended", but the definition of that may be adjusted IMO.
And using PM-runtime resume during system-wide resume may be fine too, basically (as long as the ordering of that is not lead to any kind of loop dependencies).
On the other hand, there is *zero* need for runtime suspend during system-wide transitions and it is known problematic.
Actually, there's another problem as well. Most of these devices use generic PM domains to power on/off the SoC partitions that they're in. If I side-step runtime PM, then I'd have to somehow find a way to explicitly control the PM domains.
That's a problem with genpd, I'd say.
Another alternative would be to have a kind of hybrid approach where I leave runtime PM calls in the drivers, but disconnect the runtime PM callback implementations from that. That would at least fix the issue with the generic PM domains.
However, it would not fix the problem with coupled display controllers because empty runtime PM callbacks wouldn't actually power up the first display controller when it is needed by the second controller. I would have to add infrastructure that basically duplicates some of runtime PM to fix that.
So the bottom line is that runtime PM is still the best solution for this problem. It works really nice and is very consistent.
Do you think adding better documentation to this new flag and the accessors would help remove your concerns about this?
No, it wouldn't.
Also your arguments are mostly about PM-runtime resume, which is a different story.
On Fri, Nov 29, 2019 at 11:09:26AM +0100, Rafael J. Wysocki wrote:
On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding thierry.reding@gmail.com wrote:
On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
--0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com=
wrote:
From: Thierry Reding treding@nvidia.com
Currently the driver PM core will automatically acquire a runtime PM reference for devices before system sleep is entered. This is needed to avoid potential issues related to devices' parents getting put to runtime suspend at the wrong time and causing problems with their children.
=20 Not only for that. =20
In some cases drivers are carefully written to avoid such issues and the default behaviour can be changed to allow runtime PM to operate regularly during system sleep.
=20 But this change breaks quite a few assumptions in the core too, so no, it can't be made.
Anything in particular that I can look at? I'm not seeing any issues when I test this, which could of course mean that I'm just getting lucky.
There are races and such that you may never hit during casual testing.
One thing that irritated me is that I think this used to work. I do recall testing suspend/resume a few years ago and devices would get properly runtime suspended/resumed.
Not true at all.
The PM core has always taken PM-runtime references on all devices pretty much since when PM-runtime was introduced.
You're right. I was finally able to find a toolchain that I could build an old version of the kernel with. I tested system suspend/resume on the v4.8 release, which is the first one that had the runtime PM changes as well as the subsystem suspend/resume support wired up, and I can't see the runtime PM callbacks invoked during system suspend/resume.
So I must be misremembering, or I'm confusing it with some other tests I was running at the time.
I did some digging but couldn't find anything that would have had an impact on this.
Given that this is completely opt-in feature, why are you categorically NAK'ing this?
The general problem is that if any device has been touched by system-wide suspend code, it should not be subject to PM-runtime any more until the subsequent system-wide resume is able to undo whatever the suspend did.
Moreover, if a device is runtime-suspended, the system-wide suspend code may mishandle it, in general. That's why PM-runtime suspend is not allowed during system-wide transitions at all. And it has always been like that.
For this particular use-case the above should all be irrelevant. None of the drivers involved here do anything special at system suspend, because runtime suspend already puts the devices into the lowest possible power state. Basically when these devices are put into runtime suspend, they are completely turned off. The only exception is for things like HDMI where the +5V pin remains powered, so that hotplug detection will work.
The runtime PM state of the devices involved is managed by the subsystem system suspend/resume helpers in DRM/KMS. Basically those helpers turn off all the devices in the composite device, which ultimately results in their last runtime PM reference being released. So for system suspend and resume, these devices aren't touched, other than maybe for the PM core's internal book-keeping.
OK, so you actually want system-wide PM to work like PM-runtime on the platform in question, but there are substantial differences.
That's not exactly what I'm trying to do here. If this was all I wanted to do I could simply use UNIVERSAL_DEV_PM_OPS.
What I want to do is basically allow the system-wide PM of the subsystem to control the runtime PM of the devices involved.
First, PM-runtime suspend can be effectively disabled by user space and system-wide suspend is always expected to work.
Second, if system wakeup devices are involved, their handling during system-wide suspend depends on the return value of device_may_wakeup() which depends on what user space does, whereas PM-runtime assumes device wakeup to be always enabled.
For a specific platform you may be able to overcome these limitations if you are careful enough, but certainly they are there in general and surely you cannot prevent people from using your opt-in just because they think that they know what they are doing.
That's true. But the same thing is true for pretty much all other APIs. People obviously have to make sure they know what they're doing, just like they have to with any other API.
I suppose the documentation for this new function is currently lacking a bit. Perhaps adding a big warning to this and listing the common pitfalls would help people make the right call about whether or not they can use this.
And then *somebody* would have to chase a ton of subtle issues resulting from that. No, thanks, but no thanks.
If the kerneldoc makes it clear that they're only supposed to use this when they exactly know that it's safe to do, I don't think anybody is going to put the blame on you to fix their bugs. If using this breaks, it's clearly wrong to use it.
Is there some other alternative that I can look into?
First of all, ensure that the dpm_list ordering is what it should be on the system/platform in question. That can be done with the help of device links.
I don't think we have device links for everything, but the deferred probe code should take care of ordering the dpm_list correctly because we do handle deferred probe properly in all cases.
Also, the dpm_list ordering isn't very critical in this case. If the devices are allowed to runtime suspend during system sleep, the subsystem sleep helper will put them into runtime suspend at the correct time. This is propagated all the way through the display pipeline and that order is ensured by the subsystem helpers.
You are still not saying what happens if user space doesn't allow PM-runtime to suspend the devices (by writing "on" to their "control" files).
I was suggesting that we prohibit that, which you clearly didn't like. You didn't give any reasons for why you think this is a bad idea, but the alternative would be to implement some driver-specific equivalent of that.
At that point, does it really matter whether the user is prevented from prohibiting suspend via runtime PM or some non-standard mechanism with a different name but equivalent functionality?
The fact is that in order to properly use the device we need to be able to suspend it. We need to do this to switch video modes anyway. There's simply no way to make the display work reliably without it going into suspend and resuming. Whether we call this runtime suspend/resume or something driver-specific is really just an implementation detail. The consequences for userspace are exactly the same.
So if you think that allowing userspace to prohibit runtime suspend is imperative always, then I don't have much choice but to do it without runtime PM.
In addition, make sure that the devices needed to suspend other devices are suspended in the noirq phase of system-wide suspend and resumed in the noirq phase of system-wide resume. Or at least all of the other devices need to be suspended before them and resumed after them.
We're fine on this front as well. We have run into such issues in the past, but I don't think there are any such issue left at the moment. I do have one pending fix for I2C suspend/resume which fixes an issue where some pinmuxing changes needed to get the HDMI DDC channel to work were not getting applied during resume.
That I2C issue is related to this, I think. What I'm seeing is that when the system goes to sleep, the pinmux looses its programming at a hardware level, but the I2C driver doesn't know about it because it does not get runtime suspended.
Well, no, that's not the reason. The real reason is that the handling of that device during system-wide suspend does not follow the rules followed by PM-runtime for it.
Switching system-wide PM over to PM-runtime to address that is not going to work, because PM-runtime is not mandatory and system-wide PM is.
At runtime suspend it would switch the pinmux state to "idle" which would then match the system suspend state. Upon runtime resume it sets the "default" pinmux state, which will then restore the register programming.
So this logic needs to be implemented in the system-wide suspend flow as well.
I suppose one other alternative would be to use universal PM ops for this case. In this case we actually do want the same behaviour at system sleep than we do for runtime PM.
In the current case where runtime suspend/resume is prohibited during
Runtime suspend is, runtime resume isn't until the "late" suspend phase.
system sleep, upon resume the I2C driver will assume that the pinmux state is still "default" and it won't reapply the state (it's actually the pinmux subsystem that makes this decision) and causes HDMI DDC transactions to time out.
So this is a bug in the system-wide suspend/resume flow that needs to be addressed, but not by switching it over to PM-runtime.
One simple fix for that is to use pm_runtime_force_suspend() and pm_runtime_force_resume() as system suspend/resume callbacks to make sure the I2C controller is runtime suspended/resumed during system sleep.
Note that forcing runtime suspend/resume this way is suboptimal in the DRM/KMS case because the suspend/resume happens disconnected from the subsystem suspend/resume callbacks, which is not desired as that breaks some of the assumptions in those callbacks.
So there needs to be another way.
Have you looked at DPM_FLAG_SMART_SUSPEND?
I'll look at that. It seems like it could do the trick for the I2C problem I'm seeing.
Generally, though, what I keep noticing here is that for many devices there is some commonality between runtime PM and system sleep. Actually for some devices they are exactly the same, which I guess is one of the reasons why I had hoped we could somehow simplify things by having runtime PM on one hand and then if system sleep doesn't need anything other than what runtime PM already does, we could just do runtime PM all the time. That way we could avoid all the duplication. I guess that's mostly what universal PM ops are about. I'm not exactly sure how that would work during system resume, though. Would the PM core not invoke the same callback twice, once for system resume and then again (after allowing runtime PM again) for resume runtime?
These two things should allow you to cover the vast majority of cases if not all of them without messing up with the rules.
One alternative that I had thought about was to just ditch the runtime PM callbacks for this. However, there's one corner case where this may break. On early Tegra generations, the two display controllers are "coupled" in that the second one doesn't work if the first one is disabled. We describe that using a device link from the second to the first controller. This causes the first controller to be automatically be runtime resumed when the second controller is used. This only works via runtime PM, so if I don't use runtime PM I'd have to add special handling for that case.
Runtime resume during system-wide suspend and resume is basically fine unless you try to do it in the "late" suspend phase or later, but that limitation is kind of artificial. [I was talking about that at the LPC this year.] It basically cannot be carried out in the part of system-wide suspend after the core regards the device and its parent etc as "suspended", but the definition of that may be adjusted IMO.
And using PM-runtime resume during system-wide resume may be fine too, basically (as long as the ordering of that is not lead to any kind of loop dependencies).
On the other hand, there is *zero* need for runtime suspend during system-wide transitions and it is known problematic.
I don't quite understand this. I don't see a need to runtime resume during suspend, because you're actually trying to suspend devices. Most of the time at least. I get that in some cases you may need to resume devices in order to help put other devices (that depend on them) into suspend, but most of the time the goal is to set devices into some low power state so that when the system is asleep you consume less power than in the active state. This is more or less the same thing that you want with runtime PM as well, isn't it?
So why do you say that there's no need for runtime suspend during system suspend? I always figured that runtime suspend was sort of a soft system suspend in that system suspend may be more aggressive, and mostly a superset of runtime suspend. So in pseudocode it would be roughly:
runtime_suspend(device) { set_low_power_mode(device); }
system_suspend(device) { save_context(device); runtime_suspend(device); power_off(device); }
And for resume you could basically just call these in reverse order:
runtime_resume(device) { set_normal_mode(device); }
system_resume(device) { power_on(device); runtime_resume(device); restore_context(device); }
I understand that this may not be true for all devices. However, in some cases we may even want to go further and do at runtime_suspend() what we do in system_suspend() because the impact may be low enough and the power savings worth it.
Actually, there's another problem as well. Most of these devices use generic PM domains to power on/off the SoC partitions that they're in. If I side-step runtime PM, then I'd have to somehow find a way to explicitly control the PM domains.
That's a problem with genpd, I'd say.
Fair enough. So far we've tried to implement things such that they work within the existing infrastructure, but if runtime PM turns out not to be what we actually need, maybe we need to just move to something different.
Another alternative would be to have a kind of hybrid approach where I leave runtime PM calls in the drivers, but disconnect the runtime PM callback implementations from that. That would at least fix the issue with the generic PM domains.
However, it would not fix the problem with coupled display controllers because empty runtime PM callbacks wouldn't actually power up the first display controller when it is needed by the second controller. I would have to add infrastructure that basically duplicates some of runtime PM to fix that.
So the bottom line is that runtime PM is still the best solution for this problem. It works really nice and is very consistent.
Do you think adding better documentation to this new flag and the accessors would help remove your concerns about this?
No, it wouldn't.
Also your arguments are mostly about PM-runtime resume, which is a different story.
What makes you say that? I'm equally concerned about runtime suspend because runtime resume alone is not good enough to resume a device that was never suspended. Runtime suspend will typically assert a reset for these devices and runtime resume will then deassert the reset. This is necessary to get the devices into a proper working state.
Thierry
From: Thierry Reding treding@nvidia.com
By default the PM core prevents devices from being runtime suspended during system sleep. This is needed to avoid some common cases where devices cannot be properly resumed because their parents are runtime suspended at an unfortunate point in time.
However, there are cases where suspend/resume works in a way that it becomes possible for devices to be runtime suspended at system sleep time. In fact, for some devices runtime suspension can be equivalent to their state in system sleep.
Typically this would be solved by making the runtime suspend/resume callbacks the same as the system suspend/resume callbacks. For some subsystems it isn't quite that simple, unfortunately.
For example, the DRM subsystem has subsystem-level suspend/resume helpers that control how display pipelines are shut down on suspend and brought up again on resume. This procedure is the same as their operation under normal circumstances (when the user switches on/off a subset of the displays in their configuration). These helpers are carefully ordering the operations to make sure the right sequences between connectors, encoders and display controllers are respected.
In order for suspend/resume to not get in the way of the sequencing that's already happening at the subsystem level, allow the devices involved in the display pipelines to runtime suspend during system sleep.
Signed-off-by: Thierry Reding treding@nvidia.com --- drivers/gpu/drm/tegra/dc.c | 1 + drivers/gpu/drm/tegra/dsi.c | 1 + drivers/gpu/drm/tegra/hdmi.c | 1 + drivers/gpu/drm/tegra/hub.c | 1 + drivers/gpu/drm/tegra/sor.c | 1 + 5 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 2b9a25c977c0..386819b4662b 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -2506,6 +2506,7 @@ static int tegra_dc_probe(struct platform_device *pdev) }
platform_set_drvdata(pdev, dc); + pm_runtime_always_allow(&pdev->dev); pm_runtime_enable(&pdev->dev);
INIT_LIST_HEAD(&dc->client.list); diff --git a/drivers/gpu/drm/tegra/dsi.c b/drivers/gpu/drm/tegra/dsi.c index a5d47e301c5f..683a27f9ba52 100644 --- a/drivers/gpu/drm/tegra/dsi.c +++ b/drivers/gpu/drm/tegra/dsi.c @@ -1552,6 +1552,7 @@ static int tegra_dsi_probe(struct platform_device *pdev) }
platform_set_drvdata(pdev, dsi); + pm_runtime_always_allow(&pdev->dev); pm_runtime_enable(&pdev->dev);
INIT_LIST_HEAD(&dsi->client.list); diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c index 50269ffbcb6b..2562bf607be1 100644 --- a/drivers/gpu/drm/tegra/hdmi.c +++ b/drivers/gpu/drm/tegra/hdmi.c @@ -1664,6 +1664,7 @@ static int tegra_hdmi_probe(struct platform_device *pdev) }
platform_set_drvdata(pdev, hdmi); + pm_runtime_always_allow(&pdev->dev); pm_runtime_enable(&pdev->dev);
INIT_LIST_HEAD(&hdmi->client.list); diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c index e56c0f7d3a13..aced537ac990 100644 --- a/drivers/gpu/drm/tegra/hub.c +++ b/drivers/gpu/drm/tegra/hub.c @@ -844,6 +844,7 @@ static int tegra_display_hub_probe(struct platform_device *pdev) return err;
platform_set_drvdata(pdev, hub); + pm_runtime_always_allow(&pdev->dev); pm_runtime_enable(&pdev->dev);
INIT_LIST_HEAD(&hub->client.list); diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c index a68d3b36b972..20058f11bf81 100644 --- a/drivers/gpu/drm/tegra/sor.c +++ b/drivers/gpu/drm/tegra/sor.c @@ -3833,6 +3833,7 @@ static int tegra_sor_probe(struct platform_device *pdev) }
platform_set_drvdata(pdev, sor); + pm_runtime_always_allow(&pdev->dev); pm_runtime_enable(&pdev->dev);
/*
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
This is a result of looking into a more formal way of doing what was proposed here:
http://patchwork.ozlabs.org/patch/1145363/
The Tegra DRM driver is written such that runtime PM controls all aspects of bringing up and shutting down the hardware associated with a display pipeline. This works very nicely with the DRM/KMS atomic mode- setting framework that has very rigorous call sequences. There are also suspend/resume helpers for system sleep that are built on top of these generic helpers and that cause the same code sequences to be run as if users had simply chosen to disable all display pipelines at normal runtime.
The current behaviour of the PM core to disallow runtime suspend/resume during system sleep gets in the way of this because the devices do not in fact runtime suspend/resume during that time. Most of the time this causes display outputs to malfunction upon resume.
Now, there are good reasons for preventing runtime suspend during system sleep, as given in commit eea3fc0357eb ("PCI / PM: Detect early wakeup in pci_pm_prepare()") that originally introduced this mechanism. There can, however, also be cases, like the one described above, where it is safe to allow this. Add a flag and a set of helpers to set or clear that new flag so that drivers that know it will be safe to runtime suspend a device at system sleep time can mark the device as such.
If a device has the flag set, the PM core will no longer take a runtime PM reference for it, thus allowing the device to runtime suspend at the expected time.
What about sprinkling tons of device_links all over this to make sure system suspend/resume is done in the same order too? Slightly less neat from a driver pov, but I think that should get the job done. Maybe could even do a convenience function which converts a dt phandle (or whatever that was called again) into a device_link? -Daniel
Thierry
Thierry Reding (2): PM / runtime: Allow drivers to override runtime PM behaviour on sleep drm/tegra: Allow runtime suspend on system sleep
drivers/base/power/main.c | 6 ++++-- drivers/base/power/runtime.c | 16 ++++++++++++++++ drivers/gpu/drm/tegra/dc.c | 1 + drivers/gpu/drm/tegra/dsi.c | 1 + drivers/gpu/drm/tegra/hdmi.c | 1 + drivers/gpu/drm/tegra/hub.c | 1 + drivers/gpu/drm/tegra/sor.c | 1 + include/linux/pm.h | 1 + include/linux/pm_runtime.h | 2 ++ 9 files changed, 28 insertions(+), 2 deletions(-)
-- 2.23.0
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, Nov 28, 2019 at 05:47:05PM +0100, Daniel Vetter wrote:
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding thierry.reding@gmail.com wrote:
From: Thierry Reding treding@nvidia.com
This is a result of looking into a more formal way of doing what was proposed here:
http://patchwork.ozlabs.org/patch/1145363/
The Tegra DRM driver is written such that runtime PM controls all aspects of bringing up and shutting down the hardware associated with a display pipeline. This works very nicely with the DRM/KMS atomic mode- setting framework that has very rigorous call sequences. There are also suspend/resume helpers for system sleep that are built on top of these generic helpers and that cause the same code sequences to be run as if users had simply chosen to disable all display pipelines at normal runtime.
The current behaviour of the PM core to disallow runtime suspend/resume during system sleep gets in the way of this because the devices do not in fact runtime suspend/resume during that time. Most of the time this causes display outputs to malfunction upon resume.
Now, there are good reasons for preventing runtime suspend during system sleep, as given in commit eea3fc0357eb ("PCI / PM: Detect early wakeup in pci_pm_prepare()") that originally introduced this mechanism. There can, however, also be cases, like the one described above, where it is safe to allow this. Add a flag and a set of helpers to set or clear that new flag so that drivers that know it will be safe to runtime suspend a device at system sleep time can mark the device as such.
If a device has the flag set, the PM core will no longer take a runtime PM reference for it, thus allowing the device to runtime suspend at the expected time.
What about sprinkling tons of device_links all over this to make sure system suspend/resume is done in the same order too? Slightly less neat from a driver pov, but I think that should get the job done. Maybe could even do a convenience function which converts a dt phandle (or whatever that was called again) into a device_link?
That wouldn't actually fix the issue that I'm trying to solve. While it seems like forcefully runtime suspending devices on system sleep, as is done in Dmitry's patch that I linked to above, results in the system resuming normally, I do recall that the exact point in time where we reset and power-cycle the hardware can be crucial.
The strange thing is that I do recall this to have worked in the past. This must have been around the time when we worked on atomic modesetting (about 4 years ago) because all of the runtime PM code in Tegra DRM is from around that time. And I remember at the time thinking how neatly this all fit together, because system suspend/resume really wasn't at all a special case.
Unfortunately, looking at the git log I don't see how it ever could've worked given that extra runtime PM reference that the PM core acquires. I tried to go back and test on one of the old kernel trees, but I can't find a GCC that builds those versions. I may need to go and dig through some archives to find an old enough version.
Anyway, the situation being what it is, I think that quite a few of the DRM/KMS drivers may be running into the same problem, since most today use the drm_mode_config_helper_{suspend,resume}() helpers and quite a few do pm_runtime_get_sync() in ->atomic_enable() and pm_runtime_put() in ->atomic_disable(). Now, for some hardware it may not matter whether we actually get suspended, but I'm not sure that's an assumption that we can make.
One possible solution that I can think of is to side-step runtime PM entirely and just call the functions directly. At least on Tegra we don't rely on the reference counting. I'm not entirely sure that would work, though, because what we do rely on is interoperability with generic power domains via runtime PM. The way this was supposed to work was that the various hardware engines would get powered off and after the runtime suspend callbacks had successfully executed, the reference count on their power domains would also be decreased, which would eventually cause the power domains to be powered off at the right time. Moving away from runtime suspend callbacks in the drivers would require at least leaving in a few of the pm_runtime_{get,put}() calls to make sure the power domains are on/off at the right time.
Thierry
-Daniel
Thierry
Thierry Reding (2): PM / runtime: Allow drivers to override runtime PM behaviour on sleep drm/tegra: Allow runtime suspend on system sleep
drivers/base/power/main.c | 6 ++++-- drivers/base/power/runtime.c | 16 ++++++++++++++++ drivers/gpu/drm/tegra/dc.c | 1 + drivers/gpu/drm/tegra/dsi.c | 1 + drivers/gpu/drm/tegra/hdmi.c | 1 + drivers/gpu/drm/tegra/hub.c | 1 + drivers/gpu/drm/tegra/sor.c | 1 + include/linux/pm.h | 1 + include/linux/pm_runtime.h | 2 ++ 9 files changed, 28 insertions(+), 2 deletions(-)
-- 2.23.0
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
dri-devel@lists.freedesktop.org