[...]
We have three components comprising PM on Tegra:
- Power gate
- Clock state
- Voltage state
GENPD on/off represents the 'power gate'.
Clock and reset are controlled by device drivers using clk and rst APIs.
Voltage state is represented by GENPD's performance level.
GENPD core assumes that at a first rpm-resume of a consumer device, its genpd_performance=0. Not true for Tegra because h/w of the device is preconfigured to a non-zero perf level initially, h/w may not support zero level at all.
I think you may be misunderstanding genpd's behaviour around this, but let me elaborate.
In genpd_runtime_resume(), we try to restore the performance state for the device that genpd_runtime_suspend() *may* have dropped earlier. That means, if genpd_runtime_resume() is called prior genpd_runtime_suspend() for the first time, it means that genpd_runtime_resume() will *not* restore a performance state, but instead just leave the performance state as is for the device (see genpd_restore_performance_state()).
In other words, a consumer driver may use the following sequence to set an initial performance state for the device during ->probe():
... rate = clk_get_rate() dev_pm_opp_set_rate(rate)
pm_runtime_enable() pm_runtime_resume_and_get() ...
Note that, it's the consumer driver's responsibility to manage device specific resources, in its ->runtime_suspend|resume() callbacks. Typically that means dealing with clock gating/ungating, for example.
In the other scenario where a consumer driver prefers to *not* call pm_runtime_resume_and_get() in its ->probe(), because it doesn't need to power on the device to complete probing, then we don't want to vote for an OPP at all - and we also want the performance state for the device in genpd to be set to zero. Correct?
Yes
Is this the main problem you are trying to solve, because I think this doesn't work out of the box as of today?
The main problem is that the restored performance state is zero for the first genpd_runtime_resume(), while it's not zero from the h/w perspective.
This should not be a problem, but can be handled by the consumer driver.
genpd_runtime_resume() calls genpd_restore_performance_state() to restore a performance state for the device. However, in the scenario you describe, "gpd_data->rpm_pstate" is zero, which makes genpd_restore_performance_state() to just leave the device's performance state as is - it will *not* restore the performance state to zero.
To make the consumer driver deal with this, it would need to call dev_pm_opp_set_rate() from within its ->runtime_resume() callback.
There is another concern though, but perhaps it's not a problem after all. Viresh told us that dev_pm_opp_set_rate() may turn on resources like clock/regulators. That could certainly be problematic, in particular if the device and its genpd have OPP tables associated with it and the consumer driver wants to follow the above sequence in probe.
dev_pm_opp_set_rate() won't enable clocks and regulators, but it may change the clock rate and voltage. This is also platform/driver specific because it's up to OPP user how to configure OPP table. On Tegra we only assign clock to OPP table, regulators are unused.
Viresh, can you please chime in here and elaborate on some of the magic happening behind dev_pm_opp_set_rate() API - is there a problem here or not?
GENPD core assumes that consumer devices can work at any performance level. Not true for Tegra because voltage needs to be set in accordance to the clock rate before clock is enabled, otherwise h/w won't work properly, perhaps clock may be unstable or h/w won't be latching.
Correct. Genpd relies on the callers to use the OPP framework if there are constraints like you describe above.
That said, it's not forbidden for a consumer driver to call dev_pm_genpd_set_performance_state() directly, but then it better knows exactly what it's doing.
Performance level should be set to 0 while device is suspended.
Do you mean system suspend or runtime suspend? Or both?
Runtime suspend.
Alright. So that's already taken care of for us in genpd_runtime_suspend().
Or perhaps you have discovered some problem with this?
Performance level needs to be bumped on rpm-resume of a device in accordance to h/w state before hardware is enabled.
Assuming there was a performance state set for the device when genpd_runtime_suspend() was called, genpd_runtime_resume() will restore that state according to the sequence you described.
What do you think about adding API that will allow drivers to explicitly set the restored performance state of a power domain?
Another option could be to change the GENPD core, making it to set the rpm_pstate when dev_pm_genpd_set_performance_state(dev) is invoked and device is rpm-suspended, instead of calling the genpd->set_performance_state callback.
Then drivers will be able to sync the perf state at a probe time.
What do you think?
I don't think it's needed, see my reply earlier above. However your change touches another problem though, see below.
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index a934c679e6ce..cc15ab9eacc9 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -435,7 +435,7 @@ static void genpd_restore_performance_state(struct device *dev, int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state) { struct generic_pm_domain *genpd;
int ret;
int ret = 0; genpd = dev_to_genpd_safe(dev); if (!genpd)
@@ -446,7 +446,10 @@ int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state) return -EINVAL;
genpd_lock(genpd);
ret = genpd_set_performance_state(dev, state);
if (pm_runtime_suspended(dev))
dev_gpd_data(dev)->rpm_pstate = state;
else
ret = genpd_set_performance_state(dev, state); genpd_unlock(genpd);
This doesn't work for all cases. For example, when a consumer driver deploys runtime PM support in its ->probe() according to the below sequence:
... dev_pm_opp_set_rate(rate) pm_runtime_get_noresume() pm_runtime_set_active() pm_runtime_enable() ... pm_runtime_put() ...
We need to call genpd_set_performance_state() independently of whether the device is runtime suspended or not.
I don't see where is the problem in yours example.
pm_runtime_suspended() = false while RPM is disabled. When device is resumed, the rpm_pstate=0, so it won't change the pstate on resume.
Yes, you are certainly correct, my bad! I mixed it up with pm_runtime_status_suspended(), which only cares about the status.
So, after a second thought, your suggestion sounds very much reasonable to me! I have also tried to consider all different scenarios, including the system suspend/resume path, but I think it should be fine.
I also think that a patch like the above should be considered as a fix, because it actually fixes a problem, according to what I said in my earlier reply, below.
Fixes : 5937c3ce2122 ("PM: domains: Drop/restore performance state votes for devices at runtime PM").
Although, it actually seems like good idea to update dev_gpd_data(dev)->rpm_pstate = state here, as to make sure genpd_runtime_resume() doesn't restore an old/invalid value that was saved while dropping the performance state vote for the device in genpd_runtime_suspend() earlier.
Let me send a patch for this shortly, to close this window of a possible error.
It will also remove the need to resume device just to change the clock rate, like I needed to do it in the PWM patch of this series.
Do you want to send the patch formally? Or do you prefer it if I do it?
Kind regards Uffe