Hi,
This is a follow-up of the discussion here: https://lore.kernel.org/linux-clk/20210319150355.xzw7ikwdaga2dwhv@gilmour/
This implements a mechanism to raise and lower clock rates based on consumer workloads, with an example of such an implementation for the RaspberryPi4 HDMI controller.
There's a couple of things worth discussing:
- The name is in conflict with clk_request_rate, and even though it feels like the right name to me, we should probably avoid any confusion
- The code so far implements a policy of always going for the lowest rate possible. While we don't have an use-case for something else, this should maybe be made more flexible?
Let me know what you think Maxime
Maxime Ripard (2): clk: Introduce a clock request API drm/vc4: hdmi: Convert to the new clock request API
drivers/clk/clk.c | 121 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/vc4/vc4_hdmi.c | 19 ++++-- drivers/gpu/drm/vc4/vc4_hdmi.h | 3 + include/linux/clk.h | 4 ++ 4 files changed, 140 insertions(+), 7 deletions(-)
It's not unusual to find clocks being shared across multiple devices that need to change the rate depending on what the device is doing at a given time.
The SoC found on the RaspberryPi4 (BCM2711) is in such a situation between its two HDMI controllers that share a clock that needs to be raised depending on the output resolution of each controller.
The current clk_set_rate API doesn't really allow to support that case since there's really no synchronisation between multiple users, it's essentially a fire-and-forget solution.
clk_set_min_rate does allow for such a synchronisation, but has another drawback: it doesn't allow to reduce the clock rate once the work is over.
In our previous example, this means that if we were to raise the resolution of one HDMI controller to the largest resolution and then changing for a smaller one, we would still have the clock running at the largest resolution rate resulting in a poor power-efficiency.
In order to address both issues, let's create an API that allows user to create temporary requests to increase the rate to a minimum, before going back to the initial rate once the request is done.
This introduces mainly two side-effects:
* There's an interaction between clk_set_rate and requests. This has been addressed by having clk_set_rate increasing the rate if it's greater than what the requests asked for, and in any case changing the rate the clock will return to once all the requests are done.
* Similarly, clk_round_rate has been adjusted to take the requests into account and return a rate that will be greater or equal to the requested rates.
Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/clk/clk.c | 121 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/clk.h | 4 ++ 2 files changed, 125 insertions(+)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 5052541a0986..4fd91a57e6db 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -77,12 +77,14 @@ struct clk_core { unsigned int protect_count; unsigned long min_rate; unsigned long max_rate; + unsigned long default_request_rate; unsigned long accuracy; int phase; struct clk_duty duty; struct hlist_head children; struct hlist_node child_node; struct hlist_head clks; + struct list_head pending_requests; unsigned int notifier_count; #ifdef CONFIG_DEBUG_FS struct dentry *dentry; @@ -105,6 +107,12 @@ struct clk { struct hlist_node clks_node; };
+struct clk_request { + struct list_head list; + struct clk *clk; + unsigned long rate; +}; + /*** runtime pm ***/ static int clk_pm_runtime_get(struct clk_core *core) { @@ -1434,10 +1442,14 @@ unsigned long clk_hw_round_rate(struct clk_hw *hw, unsigned long rate) { int ret; struct clk_rate_request req; + struct clk_request *clk_req;
clk_core_get_boundaries(hw->core, &req.min_rate, &req.max_rate); req.rate = rate;
+ list_for_each_entry(clk_req, &hw->core->pending_requests, list) + req.min_rate = max(clk_req->rate, req.min_rate); + ret = clk_core_round_rate_nolock(hw->core, &req); if (ret) return 0; @@ -1458,6 +1470,7 @@ EXPORT_SYMBOL_GPL(clk_hw_round_rate); long clk_round_rate(struct clk *clk, unsigned long rate) { struct clk_rate_request req; + struct clk_request *clk_req; int ret;
if (!clk) @@ -1471,6 +1484,9 @@ long clk_round_rate(struct clk *clk, unsigned long rate) clk_core_get_boundaries(clk->core, &req.min_rate, &req.max_rate); req.rate = rate;
+ list_for_each_entry(clk_req, &clk->core->pending_requests, list) + req.min_rate = max(clk_req->rate, req.min_rate); + ret = clk_core_round_rate_nolock(clk->core, &req);
if (clk->exclusive_count) @@ -1938,6 +1954,7 @@ static struct clk_core *clk_calc_new_rates(struct clk_core *core, unsigned long new_rate; unsigned long min_rate; unsigned long max_rate; + struct clk_request *req; int p_index = 0; long ret;
@@ -1952,6 +1969,9 @@ static struct clk_core *clk_calc_new_rates(struct clk_core *core,
clk_core_get_boundaries(core, &min_rate, &max_rate);
+ list_for_each_entry(req, &core->pending_requests, list) + min_rate = max(req->rate, min_rate); + /* find the closest rate and parent clk/rate */ if (clk_core_can_round(core)) { struct clk_rate_request req; @@ -2156,6 +2176,7 @@ static unsigned long clk_core_req_round_rate_nolock(struct clk_core *core, { int ret, cnt; struct clk_rate_request req; + struct clk_request *clk_req;
lockdep_assert_held(&prepare_lock);
@@ -2170,6 +2191,9 @@ static unsigned long clk_core_req_round_rate_nolock(struct clk_core *core, clk_core_get_boundaries(core, &req.min_rate, &req.max_rate); req.rate = req_rate;
+ list_for_each_entry(clk_req, &core->pending_requests, list) + req.min_rate = max(clk_req->rate, req.min_rate); + ret = clk_core_round_rate_nolock(core, &req);
/* restore the protection */ @@ -2263,6 +2287,9 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
ret = clk_core_set_rate_nolock(clk->core, rate);
+ if (!list_empty(&clk->core->pending_requests)) + clk->core->default_request_rate = rate; + if (clk->exclusive_count) clk_core_rate_protect(clk->core);
@@ -2428,6 +2455,99 @@ int clk_set_max_rate(struct clk *clk, unsigned long rate) } EXPORT_SYMBOL_GPL(clk_set_max_rate);
+/** + * clk_request_start - Request a rate to be enforced temporarily + * @clk: the clk to act on + * @rate: the new rate asked for + * + * This function will create a request to temporarily increase the rate + * of the clock to a given rate to a certain minimum. + * + * This is meant as a best effort mechanism and while the rate of the + * clock will be guaranteed to be equal or higher than the requested + * rate, there's none on what the actual rate will be due to other + * factors (other requests previously set, clock boundaries, etc.). + * + * Once the request is marked as done through clk_request_done(), the + * rate will be reverted back to what the rate was before the request. + * + * The reported boundaries of the clock will also be adjusted so that + * clk_round_rate() take those requests into account. A call to + * clk_set_rate() during a request will affect the rate the clock will + * return to after the requests on that clock are done. + * + * Returns 0 on success, an ERR_PTR otherwise. + */ +struct clk_request *clk_request_start(struct clk *clk, unsigned long rate) +{ + struct clk_request *req; + int ret; + + if (!clk) + return ERR_PTR(-EINVAL); + + req = kzalloc(sizeof(*req), GFP_KERNEL); + if (!req) + return ERR_PTR(-ENOMEM); + + clk_prepare_lock(); + + req->clk = clk; + req->rate = rate; + + if (list_empty(&clk->core->pending_requests)) + clk->core->default_request_rate = clk_core_get_rate_recalc(clk->core); + + ret = clk_core_set_rate_nolock(clk->core, rate); + if (ret) { + clk_prepare_unlock(); + kfree(req); + return ERR_PTR(ret); + } + + list_add_tail(&req->list, &clk->core->pending_requests); + clk_prepare_unlock(); + + return req; +} +EXPORT_SYMBOL_GPL(clk_request_start); + +/** + * clk_request_done - Mark a clk_request as done + * @req: the request to mark done + * + * This function will remove the rate request from the clock and adjust + * the clock rate back to either to what it was before the request + * started, or if there's any other request on that clock to a proper + * rate for them. + */ +void clk_request_done(struct clk_request *req) +{ + struct clk_core *core = req->clk->core; + + clk_prepare_lock(); + + list_del(&req->list); + + if (list_empty(&core->pending_requests)) { + clk_core_set_rate_nolock(core, core->default_request_rate); + core->default_request_rate = 0; + } else { + struct clk_request *cur_req; + unsigned long new_rate = 0; + + list_for_each_entry(cur_req, &core->pending_requests, list) + new_rate = max(new_rate, cur_req->rate); + + clk_core_set_rate_nolock(core, new_rate); + } + + clk_prepare_unlock(); + + kfree(req); +} +EXPORT_SYMBOL_GPL(clk_request_done); + /** * clk_get_parent - return the parent of a clk * @clk: the clk whose parent gets returned @@ -3863,6 +3983,7 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw) goto fail_parents;
INIT_HLIST_HEAD(&core->clks); + INIT_LIST_HEAD(&core->pending_requests);
/* * Don't call clk_hw_create_clk() here because that would pin the diff --git a/include/linux/clk.h b/include/linux/clk.h index 266e8de3cb51..2aa52140d8a9 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -15,6 +15,7 @@
struct device; struct clk; +struct clk_request; struct device_node; struct of_phandle_args;
@@ -783,6 +784,9 @@ int clk_save_context(void); */ void clk_restore_context(void);
+struct clk_request *clk_request_start(struct clk *clk, unsigned long rate); +void clk_request_done(struct clk_request *req); + #else /* !CONFIG_HAVE_CLK */
static inline struct clk *clk_get(struct device *dev, const char *id)
The new clock request API allows us to increase the rate of the HSM clock to match our pixel rate requirements while decreasing it when we're done, resulting in a better power-efficiency.
Signed-off-by: Maxime Ripard maxime@cerno.tech --- drivers/gpu/drm/vc4/vc4_hdmi.c | 19 ++++++++++++------- drivers/gpu/drm/vc4/vc4_hdmi.h | 3 +++ 2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 1fda574579af..244053de6150 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -473,7 +473,9 @@ static void vc4_hdmi_encoder_post_crtc_powerdown(struct drm_encoder *encoder, HDMI_READ(HDMI_VID_CTL) & ~VC4_HD_VID_CTL_ENABLE);
clk_disable_unprepare(vc4_hdmi->pixel_bvb_clock); + clk_request_done(vc4_hdmi->bvb_req); clk_disable_unprepare(vc4_hdmi->hsm_clock); + clk_request_done(vc4_hdmi->hsm_req); clk_disable_unprepare(vc4_hdmi->pixel_clock);
ret = pm_runtime_put(&vc4_hdmi->pdev->dev); @@ -778,9 +780,9 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder, * pixel clock, but HSM ends up being the limiting factor. */ hsm_rate = max_t(unsigned long, 120000000, (pixel_rate / 100) * 101); - ret = clk_set_min_rate(vc4_hdmi->hsm_clock, hsm_rate); - if (ret) { - DRM_ERROR("Failed to set HSM clock rate: %d\n", ret); + vc4_hdmi->hsm_req = clk_request_start(vc4_hdmi->hsm_clock, hsm_rate); + if (IS_ERR(vc4_hdmi->hsm_req)) { + DRM_ERROR("Failed to set HSM clock rate: %ld\n", PTR_ERR(vc4_hdmi->hsm_req)); return; }
@@ -797,10 +799,11 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder, * FIXME: When the pixel freq is 594MHz (4k60), this needs to be setup * at 300MHz. */ - ret = clk_set_min_rate(vc4_hdmi->pixel_bvb_clock, - (hsm_rate > VC4_HSM_MID_CLOCK ? 150000000 : 75000000)); - if (ret) { - DRM_ERROR("Failed to set pixel bvb clock rate: %d\n", ret); + vc4_hdmi->bvb_req = clk_request_start(vc4_hdmi->pixel_bvb_clock, + (hsm_rate > VC4_HSM_MID_CLOCK ? 150000000 : 75000000)); + if (IS_ERR(vc4_hdmi->bvb_req)) { + DRM_ERROR("Failed to set pixel bvb clock rate: %ld\n", PTR_ERR(vc4_hdmi->bvb_req)); + clk_request_done(vc4_hdmi->hsm_req); clk_disable_unprepare(vc4_hdmi->hsm_clock); clk_disable_unprepare(vc4_hdmi->pixel_clock); return; @@ -809,6 +812,8 @@ static void vc4_hdmi_encoder_pre_crtc_configure(struct drm_encoder *encoder, ret = clk_prepare_enable(vc4_hdmi->pixel_bvb_clock); if (ret) { DRM_ERROR("Failed to turn on pixel bvb clock: %d\n", ret); + clk_request_done(vc4_hdmi->bvb_req); + clk_request_done(vc4_hdmi->hsm_req); clk_disable_unprepare(vc4_hdmi->hsm_clock); clk_disable_unprepare(vc4_hdmi->pixel_clock); return; diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h index 3cebd1fd00fc..9ac4a2c751df 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.h +++ b/drivers/gpu/drm/vc4/vc4_hdmi.h @@ -167,6 +167,9 @@ struct vc4_hdmi {
struct reset_control *reset;
+ struct clk_request *bvb_req; + struct clk_request *hsm_req; + struct debugfs_regset32 hdmi_regset; struct debugfs_regset32 hd_regset; };
Hi Mike, Stephen,
On Tue, Apr 13, 2021 at 12:13:18PM +0200, Maxime Ripard wrote:
Hi,
This is a follow-up of the discussion here: https://lore.kernel.org/linux-clk/20210319150355.xzw7ikwdaga2dwhv@gilmour/
This implements a mechanism to raise and lower clock rates based on consumer workloads, with an example of such an implementation for the RaspberryPi4 HDMI controller.
There's a couple of things worth discussing:
The name is in conflict with clk_request_rate, and even though it feels like the right name to me, we should probably avoid any confusion
The code so far implements a policy of always going for the lowest rate possible. While we don't have an use-case for something else, this should maybe be made more flexible?
Ping?
Maxime
Quoting Maxime Ripard (2021-04-13 03:13:18)
Hi,
This is a follow-up of the discussion here: https://lore.kernel.org/linux-clk/20210319150355.xzw7ikwdaga2dwhv@gilmour/
This implements a mechanism to raise and lower clock rates based on consumer workloads, with an example of such an implementation for the RaspberryPi4 HDMI controller.
There's a couple of things worth discussing:
The name is in conflict with clk_request_rate, and even though it feels like the right name to me, we should probably avoid any confusion
The code so far implements a policy of always going for the lowest rate possible. While we don't have an use-case for something else, this should maybe be made more flexible?
I'm definitely confused how it is different from the clk_set_rate_exclusive() API and associated clk_rate_exclusive_get()/clk_rate_exclusive_put(). Can you explain further the differences in the cover letter here?
Let me know what you think Maxime
Hi Stephen,
On Fri, Apr 30, 2021 at 01:59:39PM -0700, Stephen Boyd wrote:
Quoting Maxime Ripard (2021-04-13 03:13:18)
Hi,
This is a follow-up of the discussion here: https://lore.kernel.org/linux-clk/20210319150355.xzw7ikwdaga2dwhv@gilmour/
This implements a mechanism to raise and lower clock rates based on consumer workloads, with an example of such an implementation for the RaspberryPi4 HDMI controller.
There's a couple of things worth discussing:
The name is in conflict with clk_request_rate, and even though it feels like the right name to me, we should probably avoid any confusion
The code so far implements a policy of always going for the lowest rate possible. While we don't have an use-case for something else, this should maybe be made more flexible?
I'm definitely confused how it is different from the clk_set_rate_exclusive() API and associated clk_rate_exclusive_get()/clk_rate_exclusive_put(). Can you explain further the differences in the cover letter here?
The exclusive API is meant to prevent the clock rate from changing, allowing a single user to make sure that no other user will be able to change it.
What we want here is instead to allow multiple users to be able to express a set of minimum rates and then let the CCF figure out a rate for that clock that matches those constraints (so basically what clk_set_min_rate does), but then does allow for the clock to go back to its initial rate once that constraint is not needed anymore.
So I guess it's more akin to clk_set_min_rate with rollback than the exclusive API?
Maxime
Hi Stephen, Mike,
On Mon, May 03, 2021 at 10:32:21AM +0200, Maxime Ripard wrote:
Hi Stephen,
On Fri, Apr 30, 2021 at 01:59:39PM -0700, Stephen Boyd wrote:
Quoting Maxime Ripard (2021-04-13 03:13:18)
Hi,
This is a follow-up of the discussion here: https://lore.kernel.org/linux-clk/20210319150355.xzw7ikwdaga2dwhv@gilmour/
This implements a mechanism to raise and lower clock rates based on consumer workloads, with an example of such an implementation for the RaspberryPi4 HDMI controller.
There's a couple of things worth discussing:
The name is in conflict with clk_request_rate, and even though it feels like the right name to me, we should probably avoid any confusion
The code so far implements a policy of always going for the lowest rate possible. While we don't have an use-case for something else, this should maybe be made more flexible?
I'm definitely confused how it is different from the clk_set_rate_exclusive() API and associated clk_rate_exclusive_get()/clk_rate_exclusive_put(). Can you explain further the differences in the cover letter here?
The exclusive API is meant to prevent the clock rate from changing, allowing a single user to make sure that no other user will be able to change it.
What we want here is instead to allow multiple users to be able to express a set of minimum rates and then let the CCF figure out a rate for that clock that matches those constraints (so basically what clk_set_min_rate does), but then does allow for the clock to go back to its initial rate once that constraint is not needed anymore.
So I guess it's more akin to clk_set_min_rate with rollback than the exclusive API?
Is that rationale good enough, or did you expect something else?
Maxime
Hi Stephen, Mike,
On Mon, May 24, 2021 at 02:48:11PM +0200, Maxime Ripard wrote:
Hi Stephen, Mike,
On Mon, May 03, 2021 at 10:32:21AM +0200, Maxime Ripard wrote:
Hi Stephen,
On Fri, Apr 30, 2021 at 01:59:39PM -0700, Stephen Boyd wrote:
Quoting Maxime Ripard (2021-04-13 03:13:18)
Hi,
This is a follow-up of the discussion here: https://lore.kernel.org/linux-clk/20210319150355.xzw7ikwdaga2dwhv@gilmour/
This implements a mechanism to raise and lower clock rates based on consumer workloads, with an example of such an implementation for the RaspberryPi4 HDMI controller.
There's a couple of things worth discussing:
The name is in conflict with clk_request_rate, and even though it feels like the right name to me, we should probably avoid any confusion
The code so far implements a policy of always going for the lowest rate possible. While we don't have an use-case for something else, this should maybe be made more flexible?
I'm definitely confused how it is different from the clk_set_rate_exclusive() API and associated clk_rate_exclusive_get()/clk_rate_exclusive_put(). Can you explain further the differences in the cover letter here?
The exclusive API is meant to prevent the clock rate from changing, allowing a single user to make sure that no other user will be able to change it.
What we want here is instead to allow multiple users to be able to express a set of minimum rates and then let the CCF figure out a rate for that clock that matches those constraints (so basically what clk_set_min_rate does), but then does allow for the clock to go back to its initial rate once that constraint is not needed anymore.
So I guess it's more akin to clk_set_min_rate with rollback than the exclusive API?
Is that rationale good enough, or did you expect something else?
I'm not really sure what to do at this point. It's been over 2 months since I sent this series, and we really need that mechanism in some form or another.
I'm really fine with changing that series in any way, but I got no comment that I could address to turn this into something that would be acceptable to you. How can we move this forward?
Maxime
dri-devel@lists.freedesktop.org