This is the third iteration of the work previously posted here: (v1) https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html (v2) https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg208170.html
The high level goal of this work is to allow non-cgroup-controller parts of the kernel (e.g., device drivers) to register their own private policy data for specific cgroups. That mechanism is then made use of in the i915 graphics driver to allow GPU priority to be assigned according to the cgroup membership of the owning process. Please see the v1 cover letter linked above for a more in-depth explanation and justification.
Key changes in v3 of this series:
* The cgroup core interfaces have been redesigned as suggested by Tejun. The cgroup core now supports the following:
void cgroup_priv_install(struct cgroup *cgrp, struct cgroup_priv *priv); struct cgroup_priv *cgroup_priv_lookup(struct cgroup *cgrp, const void *key); void cgroup_priv_free(struct cgroup *cgrp, const void *key);
* Consumers of this interface may want to test the filesystem permissions of a cgroup to decide whether to allow custom syscalls or ioctls to operate on a cgroup's private data. A new cgroup_permission(cgrp_fd, mask) function has been added to make this easier.
* The i915 code looks up cgroup private data for 'current' rather than drm_file->pid to account for the device handle passing done by DRI3 (i.e., the process that opened the device filehandle isn't necessarily the same process issuing ioctl's on it).
* Other minor updates/changes based on feedback from Chris Wilson and Tejun Heo.
One open question on the i915 side of this work is whether we need to limit the "priority offset" that can be assigned via cgroup (and if so, what it should be limited to). At the moment we make the assumption that cgroup-based priority will be assigned by a privileged system administrator / system integrator, so we accept any integer value they choose to assign to the priority, even if it doesn't make sense.
As noted on previous iterations, the i915 userspace consumer for the ioctl here is a simple i-g-t tool that will be sent shortly as a followup on the intel-gfx list. The tool itself is pretty trivial, but that's intetional; the "real" userspace consumer of this work would usually be something like a sysv-init script or systemd recipe that would call that simple tool in an appropriate way to accomplish the desired policy for a specific system.
Matt Roper (6): cgroup: Allow registration and lookup of cgroup private data cgroup: Introduce task_get_dfl_cgroup() cgroup: Introduce cgroup_permission() drm/i915: cgroup integration drm/i915: Introduce 'priority offset' for GPU contexts drm/i915: Add context priority & priority offset to debugfs (v2)
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_cgroup.c | 214 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_debugfs.c | 3 + drivers/gpu/drm/i915/i915_drv.c | 5 + drivers/gpu/drm/i915/i915_drv.h | 25 ++++ drivers/gpu/drm/i915/i915_gem_context.c | 7 +- drivers/gpu/drm/i915/i915_gem_context.h | 9 ++ include/linux/cgroup-defs.h | 38 ++++++ include/linux/cgroup.h | 102 +++++++++++++++ include/uapi/drm/i915_drm.h | 13 ++ kernel/cgroup/cgroup.c | 56 +++++++++ 11 files changed, 470 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_cgroup.c
There are cases where other parts of the kernel may wish to store data associated with individual cgroups without building a full cgroup controller. Let's add interfaces to allow them to register and lookup this private data for individual cgroups.
A kernel system (e.g., a driver) that wishes to register private data for a cgroup will do so by subclassing the 'struct cgroup_priv' structure to describe the necessary data to store. Before registering a private data structure to a cgroup, the caller should fill in the 'key' and 'free' fields of the base cgroup_priv structure.
* 'key' should be a unique void* that will act as a key for future privdata lookups/removals. Note that this allows drivers to store per-device private data for a cgroup by using a device pointer as a key.
* 'free' should be a function pointer to a function that may be used to destroy the private data. This function will be called automatically if the underlying cgroup is destroyed.
Cc: Tejun Heo tj@kernel.org Cc: cgroups@vger.kernel.org Signed-off-by: Matt Roper matthew.d.roper@intel.com --- include/linux/cgroup-defs.h | 38 ++++++++++++++++++++++ include/linux/cgroup.h | 78 +++++++++++++++++++++++++++++++++++++++++++++ kernel/cgroup/cgroup.c | 14 ++++++++ 3 files changed, 130 insertions(+)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 9f242b876fde..17c679a7b5de 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -8,6 +8,7 @@ #ifndef _LINUX_CGROUP_DEFS_H #define _LINUX_CGROUP_DEFS_H
+#include <linux/hashtable.h> #include <linux/limits.h> #include <linux/list.h> #include <linux/idr.h> @@ -307,6 +308,36 @@ struct cgroup_stat { struct prev_cputime prev_cputime; };
+/* + * Private data associated with a cgroup by an indpendent (non-controller) part + * of the kernel. This is useful for things like drivers that may wish to track + * their own cgroup-specific data. + * + * If an individual cgroup is destroyed, the cgroups framework will + * automatically free all associated private data. If cgroup private data is + * registered by a kernel module, then it is the module's responsibility to + * manually free its own private data upon unload. + */ +struct cgroup_priv { + /* cgroup this private data is associated with */ + struct cgroup *cgroup; + + /* + * Lookup key that defines the in-kernel consumer of this private + * data. + */ + const void *key; + + /* + * Function to release private data. This will be automatically called + * if/when the cgroup is destroyed. + */ + void (*free)(struct cgroup_priv *priv); + + /* Hashlist node in cgroup's privdata hashtable */ + struct hlist_node hnode; +}; + struct cgroup { /* self css with NULL ->ss, points back to this cgroup */ struct cgroup_subsys_state self; @@ -427,6 +458,13 @@ struct cgroup { /* used to store eBPF programs */ struct cgroup_bpf bpf;
+ /* + * cgroup private data registered by other non-controller parts of the + * kernel + */ + DECLARE_HASHTABLE(privdata, 4); + struct mutex privdata_mutex; + /* ids of the ancestors at each level including self */ int ancestor_ids[]; }; diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 473e0c0abb86..a3604b005417 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -833,4 +833,82 @@ static inline void put_cgroup_ns(struct cgroup_namespace *ns) free_cgroup_ns(ns); }
+/** + * cgroup_priv_install - install new cgroup private data + * @key: Key uniquely identifying kernel owner of private data + * + * Allows non-controller kernel subsystems to register their own private data + * associated with a cgroup. This will often be used by drivers which wish to + * track their own per-cgroup data without building a full cgroup controller. + * + * Callers should ensure that no existing private data exists for the given key + * before adding new private data. If two sets of private data are registered + * with the same key, it is undefined which will be returned by future calls + * to cgroup_priv_lookup. + * + * Kernel modules that register private data with this function should take + * care to free their private data when unloaded to prevent leaks. + */ +static inline void +cgroup_priv_install(struct cgroup *cgrp, + struct cgroup_priv *priv) +{ + WARN_ON(!mutex_is_locked(&cgrp->privdata_mutex)); + WARN_ON(!priv->key); + WARN_ON(!priv->free); + WARN_ON(priv->cgroup); + + priv->cgroup = cgrp; + hash_add(cgrp->privdata, &priv->hnode, + (unsigned long)priv->key); +} + +/** + * cgroup_priv_lookup - looks up cgroup private data + * @key: Key uniquely identifying owner of private data to lookup + * + * Looks up the private data associated with a key. + * + * Returns: + * Previously registered cgroup private data associated with the given key, or + * NULL if no private data has been registered. + */ +static inline struct cgroup_priv * +cgroup_priv_lookup(struct cgroup *cgrp, + const void *key) +{ + struct cgroup_priv *priv; + + WARN_ON(!mutex_is_locked(&cgrp->privdata_mutex)); + + hash_for_each_possible(cgrp->privdata, priv, hnode, + (unsigned long)key) + if (priv->key == key) + return priv; + + return NULL; +} + +/** + * cgroup_priv_free - free cgroup private data + * @key: Key uniquely identifying owner of private data to free + */ +static inline void +cgroup_priv_free(struct cgroup *cgrp, const void *key) +{ + struct cgroup_priv *priv; + struct hlist_node *tmp; + + mutex_lock(&cgrp->privdata_mutex); + + hash_for_each_possible_safe(cgrp->privdata, priv, tmp, hnode, + (unsigned long)key) { + hash_del(&priv->hnode); + if (priv->key == key && !WARN_ON(priv->free == NULL)) + priv->free(priv); + } + + mutex_unlock(&cgrp->privdata_mutex); +} + #endif /* _LINUX_CGROUP_H */ diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 8cda3bc3ae22..9e576dc8b566 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1839,6 +1839,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) INIT_LIST_HEAD(&cgrp->cset_links); INIT_LIST_HEAD(&cgrp->pidlists); mutex_init(&cgrp->pidlist_mutex); + hash_init(cgrp->privdata); + mutex_init(&cgrp->privdata_mutex); cgrp->self.cgroup = cgrp; cgrp->self.flags |= CSS_ONLINE; cgrp->dom_cgrp = cgrp; @@ -4578,6 +4580,9 @@ static void css_release_work_fn(struct work_struct *work) container_of(work, struct cgroup_subsys_state, destroy_work); struct cgroup_subsys *ss = css->ss; struct cgroup *cgrp = css->cgroup; + struct cgroup_priv *priv; + struct hlist_node *tmp; + int i;
mutex_lock(&cgroup_mutex);
@@ -4617,6 +4622,15 @@ static void css_release_work_fn(struct work_struct *work) NULL);
cgroup_bpf_put(cgrp); + + /* Any private data must be released automatically */ + mutex_lock(&cgrp->privdata_mutex); + hash_for_each_safe(cgrp->privdata, i, tmp, priv, hnode) { + hash_del(&priv->hnode); + if (!WARN_ON(!priv->free)) + priv->free(priv); + } + mutex_unlock(&cgrp->privdata_mutex); }
mutex_unlock(&cgroup_mutex);
Hello, Matt.
cc'ing Roman and Alexei.
On Tue, Mar 06, 2018 at 03:46:55PM -0800, Matt Roper wrote:
There are cases where other parts of the kernel may wish to store data associated with individual cgroups without building a full cgroup controller. Let's add interfaces to allow them to register and lookup this private data for individual cgroups.
A kernel system (e.g., a driver) that wishes to register private data for a cgroup will do so by subclassing the 'struct cgroup_priv' structure to describe the necessary data to store. Before registering a private data structure to a cgroup, the caller should fill in the 'key' and 'free' fields of the base cgroup_priv structure.
'key' should be a unique void* that will act as a key for future privdata lookups/removals. Note that this allows drivers to store per-device private data for a cgroup by using a device pointer as a key.
'free' should be a function pointer to a function that may be used to destroy the private data. This function will be called automatically if the underlying cgroup is destroyed.
This feature turned out to have more users than I originally anticipated and bpf also wants something like this to track network states. The requirements are pretty similar but not quite the same. The extra requirements are...
* Lookup must be really cheap. Probably using pointer hash or walking list isn't great, so maybe idr based lookup + RCU protected index table per cgroup?
* It should support both regular memory and percpu pointers. Given that what cgroup does is pretty much cgroup:key -> pointer lookup, it's mostly about getting the interface right so that it's not too error-prone.
Sorry about moving the goalpost.
Thanks.
Wraps task_dfl_cgroup() to also take a reference to the cgroup.
Cc: Tejun Heo tj@kernel.org Cc: cgroups@vger.kernel.org Signed-off-by: Matt Roper matthew.d.roper@intel.com --- include/linux/cgroup.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index a3604b005417..b1ea2064f247 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -527,6 +527,29 @@ static inline struct cgroup *task_dfl_cgroup(struct task_struct *task) return task_css_set(task)->dfl_cgrp; }
+/** + * task_get_dfl_cgroup() - find and get the cgroup for a task + * @task: the target task + * + * Find the cgroup in the v2 hierarchy that a task belongs to, increment its + * reference count, and return it. + * + * Returns: + * The appropriate cgroup from the default hierarchy. + */ +static inline struct cgroup * +task_get_dfl_cgroup(struct task_struct *task) +{ + struct cgroup *cgrp; + + mutex_lock(&cgroup_mutex); + cgrp = task_dfl_cgroup(task); + cgroup_get(cgrp); + mutex_unlock(&cgroup_mutex); + + return cgrp; +} + static inline struct cgroup *cgroup_parent(struct cgroup *cgrp) { struct cgroup_subsys_state *parent_css = cgrp->self.parent;
(cc'ing Roman)
Hello,
On Tue, Mar 06, 2018 at 03:46:56PM -0800, Matt Roper wrote:
+static inline struct cgroup * +task_get_dfl_cgroup(struct task_struct *task) +{
- struct cgroup *cgrp;
- mutex_lock(&cgroup_mutex);
- cgrp = task_dfl_cgroup(task);
- cgroup_get(cgrp);
- mutex_unlock(&cgroup_mutex);
Heh, this is super heavy. Can't we do "rcu, try get, compare & retry"?
Thanks.
Non-controller kernel subsystems may base access restrictions for cgroup-related syscalls/ioctls on a process' access to the cgroup. Let's make it easy for other parts of the kernel to check these cgroup permissions.
Cc: Tejun Heo tj@kernel.org Cc: cgroups@vger.kernel.org Signed-off-by: Matt Roper matthew.d.roper@intel.com --- include/linux/cgroup.h | 1 + kernel/cgroup/cgroup.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b1ea2064f247..dd1d1d9813e8 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -100,6 +100,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
struct cgroup *cgroup_get_from_path(const char *path); struct cgroup *cgroup_get_from_fd(int fd); +int cgroup_permission(int fd, int mask);
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *); int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 9e576dc8b566..52d68b226867 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5781,6 +5781,48 @@ struct cgroup *cgroup_get_from_fd(int fd) } EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
+/** + * cgroup_permission - check cgroup fd permissions + * @fd: fd obtained by open(cgroup) + * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC) + * + * Check for read/write/execute permissions on a cgroup. + */ +int cgroup_permission(int fd, int mask) +{ + struct file *f; + struct inode *inode; + struct cgroup_subsys_state *css; + int ret; + + f = fget_raw(fd); + if (!f) + return -EBADF; + + css = css_tryget_online_from_dir(f->f_path.dentry, NULL); + if (IS_ERR(css)) { + ret = PTR_ERR(css); + goto out_file; + } + + inode = kernfs_get_inode(f->f_path.dentry->d_sb, css->cgroup->kn); + if (!inode) { + ret = -ENOMEM; + goto out_cgroup; + } + + ret = inode_permission(inode, mask); + iput(inode); + +out_cgroup: + cgroup_put(css->cgroup); +out_file: + fput(f); + + return ret; +} +EXPORT_SYMBOL_GPL(cgroup_permission); + /* * sock->sk_cgrp_data handling. For more info, see sock_cgroup_data * definition in cgroup-defs.h.
On Tue, Mar 06, 2018 at 03:46:57PM -0800, Matt Roper wrote:
Non-controller kernel subsystems may base access restrictions for cgroup-related syscalls/ioctls on a process' access to the cgroup. Let's make it easy for other parts of the kernel to check these cgroup permissions.
I'm not sure about pulling in cgroup perms this way because cgroup access perms have more to it than the file and directory permissions. Can't this be restricted to root or some CAP?
Thanks.
Introduce a new DRM_IOCTL_I915_CGROUP_SETPARAM ioctl that will allow userspace to set i915-specific parameters for individual cgroups. i915 cgroup data will be registered and later looked up via the new cgroup_priv infrastructure.
v2: - Large rebase/rewrite for new cgroup_priv interface
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_cgroup.c | 167 +++++++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_drv.c | 5 ++ drivers/gpu/drm/i915/i915_drv.h | 24 ++++++ include/uapi/drm/i915_drm.h | 12 +++ 5 files changed, 209 insertions(+) create mode 100644 drivers/gpu/drm/i915/i915_cgroup.c
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 1bd9bc5b8c5c..5974e32834bf 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -48,6 +48,7 @@ i915-y := i915_drv.o \ i915-$(CONFIG_COMPAT) += i915_ioc32.o i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o +i915-$(CONFIG_CGROUPS) += i915_cgroup.o
# GEM code i915-y += i915_cmd_parser.o \ diff --git a/drivers/gpu/drm/i915/i915_cgroup.c b/drivers/gpu/drm/i915/i915_cgroup.c new file mode 100644 index 000000000000..4a46cb167f53 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_cgroup.c @@ -0,0 +1,167 @@ +/* SPDX-License-Identifier: MIT */ +/* + * i915_cgroup.c - Linux cgroups integration for i915 + * + * Copyright (C) 2018 Intel Corporation + */ + +#include <linux/cgroup.h> + +#include "i915_drv.h" + +struct i915_cgroup_data { + struct cgroup_priv base; + + struct list_head node; +}; + +static inline struct i915_cgroup_data * +cgrp_to_i915(struct cgroup_priv *priv) +{ + return container_of(priv, struct i915_cgroup_data, base); +} + +static void +i915_cgroup_free(struct cgroup_priv *priv) +{ + struct cgroup *cgrp = priv->cgroup; + struct i915_cgroup_data *ipriv; + + WARN_ON(!mutex_is_locked(&cgrp->privdata_mutex)); + + ipriv = cgrp_to_i915(priv); + + /* + * Remove private data from both cgroup's hashtable and i915's list. + * If this function is being called as a result of cgroup removal + * (as opposed to an i915 unload), it will have already been removed from + * the hashtable, but the hash_del() call here is still safe. + */ + hash_del(&priv->hnode); + list_del(&ipriv->node); + + kfree(ipriv); +} + +void +i915_cgroup_init(struct drm_i915_private *dev_priv) +{ + INIT_LIST_HEAD(&dev_priv->cgroup_list); +} + +void +i915_cgroup_shutdown(struct drm_i915_private *dev_priv) +{ + struct i915_cgroup_data *priv, *tmp; + struct cgroup *cgrp; + + mutex_lock(&cgroup_mutex); + + list_for_each_entry_safe(priv, tmp, &dev_priv->cgroup_list, node) { + cgrp = priv->base.cgroup; + + mutex_lock(&cgrp->privdata_mutex); + i915_cgroup_free(&priv->base); + mutex_unlock(&cgrp->privdata_mutex); + } + + mutex_unlock(&cgroup_mutex); +} + +/* + * Return i915 cgroup private data, creating and registering it if one doesn't + * already exist for this cgroup. + */ +static struct i915_cgroup_data * +get_or_create_cgroup_data(struct drm_i915_private *dev_priv, + struct cgroup *cgrp) +{ + struct cgroup_priv *priv; + struct i915_cgroup_data *ipriv; + + mutex_lock(&cgrp->privdata_mutex); + + priv = cgroup_priv_lookup(cgrp, dev_priv); + if (priv) { + ipriv = cgrp_to_i915(priv); + } else { + ipriv = kzalloc(sizeof *ipriv, GFP_KERNEL); + if (!ipriv) { + ipriv = ERR_PTR(-ENOMEM); + goto out; + } + + ipriv->base.key = dev_priv; + ipriv->base.free = i915_cgroup_free; + list_add(&ipriv->node, &dev_priv->cgroup_list); + + cgroup_priv_install(cgrp, &ipriv->base); + } + +out: + mutex_unlock(&cgrp->privdata_mutex); + + return ipriv; +} + +/** + * i915_cgroup_setparam_ioctl - ioctl to alter i915 settings for a cgroup + * @dev: DRM device + * @data: data pointer for the ioctl + * @file_priv: DRM file handle for the ioctl call + * + * Allows i915-specific parameters to be set for a Linux cgroup. + */ +int +i915_cgroup_setparam_ioctl(struct drm_device *dev, + void *data, + struct drm_file *file) +{ + struct drm_i915_cgroup_param *req = data; + struct cgroup *cgrp; + int ret; + + /* We don't actually support any flags yet. */ + if (req->flags) { + DRM_DEBUG_DRIVER("Invalid flags\n"); + return -EINVAL; + } + + /* + * Make sure the file descriptor really is a cgroup fd and is on the + * v2 hierarchy. + */ + cgrp = cgroup_get_from_fd(req->cgroup_fd); + if (IS_ERR(cgrp)) { + DRM_DEBUG_DRIVER("Invalid cgroup file descriptor\n"); + return PTR_ERR(cgrp); + } + + /* + * Access control: The strategy for using cgroups in a given + * environment is generally determined by the system integrator + * and/or OS vendor, so the specific policy about who can/can't + * manipulate them tends to be domain-specific (and may vary + * depending on the location in the cgroup hierarchy). Rather than + * trying to tie permission on this ioctl to a DRM-specific concepts + * like DRM master, we'll allow cgroup parameters to be set by any + * process that has been granted write access on the cgroup's + * virtual file system (i.e., the same permissions that would + * generally be needed to update the virtual files provided by + * cgroup controllers). + */ + ret = cgroup_permission(req->cgroup_fd, MAY_WRITE); + if (ret) + goto out; + + switch (req->param) { + default: + DRM_DEBUG_DRIVER("Invalid cgroup parameter %lld\n", req->param); + ret = -EINVAL; + } + +out: + cgroup_put(cgrp); + + return ret; +} diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index d61b51c0bf0b..a1c7bc9cd173 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1405,6 +1405,8 @@ int i915_driver_load(struct pci_dev *pdev, const struct pci_device_id *ent)
intel_runtime_pm_put(dev_priv);
+ i915_cgroup_init(dev_priv); + i915_welcome_messages(dev_priv);
return 0; @@ -1431,6 +1433,8 @@ void i915_driver_unload(struct drm_device *dev) struct drm_i915_private *dev_priv = to_i915(dev); struct pci_dev *pdev = dev_priv->drm.pdev;
+ i915_cgroup_shutdown(dev_priv); + i915_driver_unregister(dev_priv);
if (i915_gem_suspend(dev_priv)) @@ -2832,6 +2836,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_CGROUP_SETPARAM, i915_cgroup_setparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), };
static struct drm_driver driver = { diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7eec99d7fad4..7c38142ee009 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2006,6 +2006,9 @@ struct drm_i915_private {
struct intel_ppat ppat;
+ /* cgroup private data */ + struct list_head cgroup_list; + /* Kernel Modesetting */
struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES]; @@ -2938,6 +2941,27 @@ intel_ggtt_update_needs_vtd_wa(struct drm_i915_private *dev_priv) int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv, int enable_ppgtt);
+/* i915_cgroup.c */ +#ifdef CONFIG_CGROUPS +void i915_cgroup_init(struct drm_i915_private *dev_priv); +int i915_cgroup_setparam_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +void i915_cgroup_shutdown(struct drm_i915_private *dev_priv); +#else +static inline int +i915_cgroup_init(struct drm_i915_private *dev_priv) +{ + return 0; +} +static inline void i915_cgroup_shutdown(struct drm_i915_private *dev_priv) {} +static inline int +i915_cgroup_setparam_ioctl(struct drm_device *dev, void *data, + struct drm_file *file) +{ + return -EINVAL; +} +#endif + /* i915_drv.c */ void __printf(3, 4) __i915_printk(struct drm_i915_private *dev_priv, const char *level, diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 29fa48e4755d..b6651b263838 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -318,6 +318,7 @@ typedef struct _drm_i915_sarea { #define DRM_I915_PERF_OPEN 0x36 #define DRM_I915_PERF_ADD_CONFIG 0x37 #define DRM_I915_PERF_REMOVE_CONFIG 0x38 +#define DRM_I915_CGROUP_SETPARAM 0x39
#define DRM_IOCTL_I915_INIT DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t) #define DRM_IOCTL_I915_FLUSH DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH) @@ -375,6 +376,7 @@ typedef struct _drm_i915_sarea { #define DRM_IOCTL_I915_PERF_OPEN DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param) #define DRM_IOCTL_I915_PERF_ADD_CONFIG DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config) #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64) +#define DRM_IOCTL_I915_CGROUP_SETPARAM DRM_IOW(DRM_COMMAND_BASE + DRM_I915_CGROUP_SETPARAM, struct drm_i915_cgroup_param)
/* Allow drivers to submit batchbuffers directly to hardware, relying * on the security mechanisms provided by hardware. @@ -1615,6 +1617,16 @@ struct drm_i915_perf_oa_config { __u64 flex_regs_ptr; };
+/** + * Structure to set i915 parameter on a cgroup. + */ +struct drm_i915_cgroup_param { + __s32 cgroup_fd; + __u32 flags; + __u64 param; + __s64 value; +}; + #if defined(__cplusplus) } #endif
There are cases where a system integrator may wish to raise/lower the priority of GPU workloads being submitted by specific OS process(es), independently of how the software self-classifies its own priority. Exposing "priority offset" as an i915-specific cgroup parameter will enable such system-level configuration.
Normally GPU contexts start with a priority value of 0 (I915_CONTEXT_DEFAULT_PRIORITY) and then may be adjusted up/down from there via other mechanisms. We'd like to provide a system-level input to the priority decision that will be taken into consideration, even when userspace later attempts to set an absolute priority value via I915_CONTEXT_PARAM_PRIORITY. The priority offset introduced here provides a base value that will always be added to (or subtracted from) the software's self-assigned priority value.
This patch makes priority offset a cgroup-specific value; contexts will be created with a priority offset based on the cgroup membership of the process creating the context at the time the context is created. Note that priority offset is assigned at context creation time; migrating a process to a different cgroup or changing the offset associated with a cgroup will only affect new context creation and will not alter the behavior of existing contexts previously created by the process.
v2: - Rebase onto new cgroup_priv API - Use current instead of drm_file->pid to determine which process to lookup priority for. (Chris) - Don't forget to subtract priority offset in context_getparam ioctl to make it match setparam behavior. (Chris)
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/i915_cgroup.c | 47 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_drv.h | 15 ++++++----- drivers/gpu/drm/i915/i915_gem_context.c | 7 ++--- drivers/gpu/drm/i915/i915_gem_context.h | 9 +++++++ include/uapi/drm/i915_drm.h | 1 + 5 files changed, 69 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_cgroup.c b/drivers/gpu/drm/i915/i915_cgroup.c index 4a46cb167f53..6a28b1a23971 100644 --- a/drivers/gpu/drm/i915/i915_cgroup.c +++ b/drivers/gpu/drm/i915/i915_cgroup.c @@ -13,6 +13,8 @@ struct i915_cgroup_data { struct cgroup_priv base;
struct list_head node; + + int priority_offset; };
static inline struct i915_cgroup_data * @@ -117,8 +119,10 @@ i915_cgroup_setparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { + struct drm_i915_private *dev_priv = to_i915(dev); struct drm_i915_cgroup_param *req = data; struct cgroup *cgrp; + struct i915_cgroup_data *cgrpdata; int ret;
/* We don't actually support any flags yet. */ @@ -154,7 +158,18 @@ i915_cgroup_setparam_ioctl(struct drm_device *dev, if (ret) goto out;
+ cgrpdata = get_or_create_cgroup_data(dev_priv, cgrp); + if (IS_ERR(cgrpdata)) { + ret = PTR_ERR(cgrpdata); + goto out; + } + switch (req->param) { + case I915_CGROUP_PARAM_PRIORITY_OFFSET: + DRM_DEBUG_DRIVER("Setting cgroup priority offset to %lld\n", + req->value); + cgrpdata->priority_offset = req->value; + break; default: DRM_DEBUG_DRIVER("Invalid cgroup parameter %lld\n", req->param); ret = -EINVAL; @@ -165,3 +180,35 @@ i915_cgroup_setparam_ioctl(struct drm_device *dev,
return ret; } + +/** + * i915_cgroup_get_prio_offset() - get prio offset for current proc's cgroup + * @dev_priv: drm device + * @file_priv: file handle for calling process + * + * RETURNS: + * Priority offset associated with the calling process' cgroup in the default + * (v2) hierarchy, otherwise 0 if no explicit priority has been assigned. + */ +int +i915_cgroup_get_current_prio_offset(struct drm_i915_private *dev_priv) +{ + struct cgroup *cgrp; + struct cgroup_priv *cgrpdata; + int offset = 0; + + cgrp = task_get_dfl_cgroup(current); + if (WARN_ON(!cgrp)) + return 0; + + mutex_lock(&cgrp->privdata_mutex); + + cgrpdata = cgroup_priv_lookup(cgrp, dev_priv); + if (cgrpdata) + offset = cgrp_to_i915(cgrpdata)->priority_offset; + + mutex_unlock(&cgrp->privdata_mutex); + cgroup_put(cgrp); + + return offset; +} diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7c38142ee009..51b80926d20c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2947,18 +2947,19 @@ void i915_cgroup_init(struct drm_i915_private *dev_priv); int i915_cgroup_setparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file); void i915_cgroup_shutdown(struct drm_i915_private *dev_priv); +int i915_cgroup_get_current_prio_offset(struct drm_i915_private *dev_priv); #else -static inline int -i915_cgroup_init(struct drm_i915_private *dev_priv) +static inline void i915_cgroup_init(struct drm_i915_private *dev_priv) {} +static inline void i915_cgroup_shutdown(struct drm_i915_private *dev_priv) {} +static inline int i915_cgroup_setparam_ioctl(struct drm_device *dev, void *data, + struct drm_file *file) { - return 0; + return -EINVAL; } -static inline void i915_cgroup_shutdown(struct drm_i915_private *dev_priv) {} static inline int -i915_cgroup_setparam_ioctl(struct drm_device *dev, void *data, - struct drm_file *file) +i915_cgroup_get_current_prio_offset(struct drm_i915_private *dev_priv) { - return -EINVAL; + return 0; } #endif
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index a73340ae9419..b29fbe0c776f 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -274,7 +274,8 @@ __create_hw_context(struct drm_i915_private *dev_priv, kref_init(&ctx->ref); list_add_tail(&ctx->link, &dev_priv->contexts.list); ctx->i915 = dev_priv; - ctx->priority = I915_PRIORITY_NORMAL; + ctx->priority_offset = i915_cgroup_get_current_prio_offset(dev_priv); + ctx->priority = I915_PRIORITY_NORMAL + ctx->priority_offset;
INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL); INIT_LIST_HEAD(&ctx->handles_list); @@ -739,7 +740,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, args->value = i915_gem_context_is_bannable(ctx); break; case I915_CONTEXT_PARAM_PRIORITY: - args->value = ctx->priority; + args->value = ctx->priority - ctx->priority_offset; break; default: ret = -EINVAL; @@ -812,7 +813,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, !capable(CAP_SYS_NICE)) ret = -EPERM; else - ctx->priority = priority; + ctx->priority = priority + ctx->priority_offset; } break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h index 7854262ddfd9..c3b4fb54fbb6 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.h +++ b/drivers/gpu/drm/i915/i915_gem_context.h @@ -150,6 +150,15 @@ struct i915_gem_context { */ int priority;
+ /** + * @priority_offset: priority offset + * + * A value, configured via cgroup, that sets the starting priority + * of the context. Any priority set explicitly via context parameter + * will be added to the priority offset. + */ + int priority_offset; + /** ggtt_offset_bias: placement restriction for context objects */ u32 ggtt_offset_bias;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index b6651b263838..5b67aaf8a1c0 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1624,6 +1624,7 @@ struct drm_i915_cgroup_param { __s32 cgroup_fd; __u32 flags; __u64 param; +#define I915_CGROUP_PARAM_PRIORITY_OFFSET 0x1 __s64 value; };
Quoting Matt Roper (2018-03-06 23:46:59)
There are cases where a system integrator may wish to raise/lower the priority of GPU workloads being submitted by specific OS process(es), independently of how the software self-classifies its own priority. Exposing "priority offset" as an i915-specific cgroup parameter will enable such system-level configuration.
Normally GPU contexts start with a priority value of 0 (I915_CONTEXT_DEFAULT_PRIORITY) and then may be adjusted up/down from there via other mechanisms. We'd like to provide a system-level input to the priority decision that will be taken into consideration, even when userspace later attempts to set an absolute priority value via I915_CONTEXT_PARAM_PRIORITY. The priority offset introduced here provides a base value that will always be added to (or subtracted from) the software's self-assigned priority value.
This patch makes priority offset a cgroup-specific value; contexts will be created with a priority offset based on the cgroup membership of the process creating the context at the time the context is created. Note that priority offset is assigned at context creation time; migrating a process to a different cgroup or changing the offset associated with a cgroup will only affect new context creation and will not alter the behavior of existing contexts previously created by the process.
v2:
- Rebase onto new cgroup_priv API
- Use current instead of drm_file->pid to determine which process to lookup priority for. (Chris)
- Don't forget to subtract priority offset in context_getparam ioctl to make it match setparam behavior. (Chris)
Signed-off-by: Matt Roper matthew.d.roper@intel.com
For ctx->priority/ctx->priority_offset Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
At the end of the day, everything that is modifiable by context is going to want cgroup constraint, but like priority_offset each will require some thought as to how to express the constraint.
Interesting conundrum, and still we want a consistent interface for all the gpus on a system. -Chris
Quoting Chris Wilson (2018-03-08 11:32:04)
Quoting Matt Roper (2018-03-06 23:46:59)
There are cases where a system integrator may wish to raise/lower the priority of GPU workloads being submitted by specific OS process(es), independently of how the software self-classifies its own priority. Exposing "priority offset" as an i915-specific cgroup parameter will enable such system-level configuration.
Normally GPU contexts start with a priority value of 0 (I915_CONTEXT_DEFAULT_PRIORITY) and then may be adjusted up/down from there via other mechanisms. We'd like to provide a system-level input to the priority decision that will be taken into consideration, even when userspace later attempts to set an absolute priority value via I915_CONTEXT_PARAM_PRIORITY. The priority offset introduced here provides a base value that will always be added to (or subtracted from) the software's self-assigned priority value.
This patch makes priority offset a cgroup-specific value; contexts will be created with a priority offset based on the cgroup membership of the process creating the context at the time the context is created. Note that priority offset is assigned at context creation time; migrating a process to a different cgroup or changing the offset associated with a cgroup will only affect new context creation and will not alter the behavior of existing contexts previously created by the process.
v2:
- Rebase onto new cgroup_priv API
- Use current instead of drm_file->pid to determine which process to lookup priority for. (Chris)
- Don't forget to subtract priority offset in context_getparam ioctl to make it match setparam behavior. (Chris)
Signed-off-by: Matt Roper matthew.d.roper@intel.com
For ctx->priority/ctx->priority_offset Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
As a reminder, we do have the question of how to bound ctx->priority + ctx->priority_offset. Currently, outside of the user range is privileged space reserved for the kernel (both above and below). Now we can move those even further to accommodate multiple non-overlapping cgroups. We also have the presumption that only privileged processes can raise a priority, but I presume the cgroup property itself is protected. -Chris
On Thu, Mar 08, 2018 at 01:11:33PM +0000, Chris Wilson wrote:
Quoting Chris Wilson (2018-03-08 11:32:04)
Quoting Matt Roper (2018-03-06 23:46:59)
There are cases where a system integrator may wish to raise/lower the priority of GPU workloads being submitted by specific OS process(es), independently of how the software self-classifies its own priority. Exposing "priority offset" as an i915-specific cgroup parameter will enable such system-level configuration.
Normally GPU contexts start with a priority value of 0 (I915_CONTEXT_DEFAULT_PRIORITY) and then may be adjusted up/down from there via other mechanisms. We'd like to provide a system-level input to the priority decision that will be taken into consideration, even when userspace later attempts to set an absolute priority value via I915_CONTEXT_PARAM_PRIORITY. The priority offset introduced here provides a base value that will always be added to (or subtracted from) the software's self-assigned priority value.
This patch makes priority offset a cgroup-specific value; contexts will be created with a priority offset based on the cgroup membership of the process creating the context at the time the context is created. Note that priority offset is assigned at context creation time; migrating a process to a different cgroup or changing the offset associated with a cgroup will only affect new context creation and will not alter the behavior of existing contexts previously created by the process.
v2:
- Rebase onto new cgroup_priv API
- Use current instead of drm_file->pid to determine which process to lookup priority for. (Chris)
- Don't forget to subtract priority offset in context_getparam ioctl to make it match setparam behavior. (Chris)
Signed-off-by: Matt Roper matthew.d.roper@intel.com
For ctx->priority/ctx->priority_offset Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
As a reminder, we do have the question of how to bound ctx->priority + ctx->priority_offset. Currently, outside of the user range is privileged space reserved for the kernel (both above and below). Now we can move those even further to accommodate multiple non-overlapping cgroups.
Yep, I mentioned this as an open question in the series coverletter.
My understanding is that today we limit userspace-assigned priorities to [1023,1023] and that the kernel can use the userspace range plus -1024 (for the kernel idle context), 1024 (for boosting contexts the display is waiting on), and INT_MAX (for the preempt context). Are there any other special values we use today that I'm overlooking?
I think we have the following options with how to proceed:
* Option 1: Silently limit (priority+priority offset) to the existing [-1023,1023] range. That means that if, for example, a user context had a priority offset of 600 and tried to manually boost its context priority to 600, it would simply be treated as a 1023 for scheduling purposes. This approach is simple, but doesn't allow us to force contexts in different cgroups into non-overlapping priority ranges (i.e., lowest possible priority in one cgroup is still higher than the highest possible priority in another cgroup). It also isn't possible to define a class of applications as "more important than display" via cgroup, which I think might be useful in some cases.
* Option 2: Allow priority offset to be set in a much larger range (e.g., [SHRT_MIN+1023,SHRT_MAX-1023]). This allows cgroups to have effective priority ranges that don't overlap. cgroup ranges can also be intentionally set above I915_PRIORITY_DISPLAY to allow us to define classes of applications whose work is more important than maintaining a stable framerate on the display. We'd also probably need to shift the kernel idle context down to something like INT_MIN instead of using -1024.
* Option 3: No limits, leave behavior as it stands now in this patch series (unrestricted range). If you're privileged enough to define the cgroup settings for a system and you decide to shoot yourself in the foot by setting them to nonsense values, that's your own fault.
Thoughts? Any other options to consider?
We also have the presumption that only privileged processes can raise a priority, but I presume the cgroup property itself is protected. -Chris
The way the code is written write now, anyone who has write access to the cgroup itself (i.e., can migrate processes into/out of the cgroup) is allowed to adjust the i915-specific cgroup settings. So this depends on how the cgroups-v2 filesystem was initially mounted and/or how the filesystem permissions were adjusted after the fact; the details may vary from system to system depending on the policy decisions of the system integrator / system administrator. But I think an off-the-shelf Linux distro will only give the system administrator sufficient permissions to adjust cgroups (if it even mounts the cgroups-v2 filesystem in the first place), so anyone else that winds up with those privileges has had them intentionally delegated.
Matt
Quoting Matt Roper (2018-03-08 18:22:06)
On Thu, Mar 08, 2018 at 01:11:33PM +0000, Chris Wilson wrote:
Quoting Chris Wilson (2018-03-08 11:32:04)
Quoting Matt Roper (2018-03-06 23:46:59)
There are cases where a system integrator may wish to raise/lower the priority of GPU workloads being submitted by specific OS process(es), independently of how the software self-classifies its own priority. Exposing "priority offset" as an i915-specific cgroup parameter will enable such system-level configuration.
Normally GPU contexts start with a priority value of 0 (I915_CONTEXT_DEFAULT_PRIORITY) and then may be adjusted up/down from there via other mechanisms. We'd like to provide a system-level input to the priority decision that will be taken into consideration, even when userspace later attempts to set an absolute priority value via I915_CONTEXT_PARAM_PRIORITY. The priority offset introduced here provides a base value that will always be added to (or subtracted from) the software's self-assigned priority value.
This patch makes priority offset a cgroup-specific value; contexts will be created with a priority offset based on the cgroup membership of the process creating the context at the time the context is created. Note that priority offset is assigned at context creation time; migrating a process to a different cgroup or changing the offset associated with a cgroup will only affect new context creation and will not alter the behavior of existing contexts previously created by the process.
v2:
- Rebase onto new cgroup_priv API
- Use current instead of drm_file->pid to determine which process to lookup priority for. (Chris)
- Don't forget to subtract priority offset in context_getparam ioctl to make it match setparam behavior. (Chris)
Signed-off-by: Matt Roper matthew.d.roper@intel.com
For ctx->priority/ctx->priority_offset Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
As a reminder, we do have the question of how to bound ctx->priority + ctx->priority_offset. Currently, outside of the user range is privileged space reserved for the kernel (both above and below). Now we can move those even further to accommodate multiple non-overlapping cgroups.
Yep, I mentioned this as an open question in the series coverletter.
My understanding is that today we limit userspace-assigned priorities to [1023,1023] and that the kernel can use the userspace range plus -1024 (for the kernel idle context), 1024 (for boosting contexts the display is waiting on), and INT_MAX (for the preempt context). Are there any other special values we use today that I'm overlooking?
I think we have the following options with how to proceed:
- Option 1: Silently limit (priority+priority offset) to the existing [-1023,1023] range. That means that if, for example, a user context had a priority offset of 600 and tried to manually boost its context priority to 600, it would simply be treated as a 1023 for scheduling purposes. This approach is simple, but doesn't allow us to force contexts in different cgroups into non-overlapping priority ranges (i.e., lowest possible priority in one cgroup is still higher than the highest possible priority in another cgroup). It also isn't possible to define a class of applications as "more important than display" via cgroup, which I think might be useful in some cases.
Agreed, non-overlapping seems to be a useful property (apply the usual disclaimer for the rudimentary scheduler).
- Option 2: Allow priority offset to be set in a much larger range (e.g., [SHRT_MIN+1023,SHRT_MAX-1023]). This allows cgroups to have effective priority ranges that don't overlap. cgroup ranges can also be intentionally set above I915_PRIORITY_DISPLAY to allow us to define classes of applications whose work is more important than maintaining a stable framerate on the display. We'd also probably need to shift the kernel idle context down to something like INT_MIN instead of using -1024.
INT_MIN+1 just to be awkward. (INT_MIN is I915_PRIORITY_INVALID.)
Something to bear in mind is that I want to reserve the low 8 bits of ctx->priority for internal use (heuristic adjustments by the scheduler). Can shrink that to say 3 bits in the current scheme.
3bits for internal adjustment 20bits for user priority. 8bits for cgroup offset. signbit.
Nothing prohibits us from moving to 64b if required. But 32b should be enough range for plenty of clients and cgroups, imo. Reducing cgroup offset to 6 bits would be nice :)
- Option 3: No limits, leave behavior as it stands now in this patch series (unrestricted range). If you're privileged enough to define the cgroup settings for a system and you decide to shoot yourself in the foot by setting them to nonsense values, that's your own fault.
Overflow have a habit of catching us out, so I'd like to rule that out. And once we define a suitable range, it's hard to reduce it without userspace noticing and screaming regression.
Thoughts? Any other options to consider?
Another option would be to say keep a plist for each cgroup and round-robin between cgroups (or somesuch, or just a plist of cgroups to keep things priority based). Down that road lies CFS with fairness inside cgroups and between cgroups.
We also have the presumption that only privileged processes can raise a priority, but I presume the cgroup property itself is protected.
The way the code is written write now, anyone who has write access to the cgroup itself (i.e., can migrate processes into/out of the cgroup) is allowed to adjust the i915-specific cgroup settings. So this depends on how the cgroups-v2 filesystem was initially mounted and/or how the filesystem permissions were adjusted after the fact; the details may vary from system to system depending on the policy decisions of the system integrator / system administrator. But I think an off-the-shelf Linux distro will only give the system administrator sufficient permissions to adjust cgroups (if it even mounts the cgroups-v2 filesystem in the first place), so anyone else that winds up with those privileges has had them intentionally delegated.
Ok. I just worry about how easy it is to starve the system and accidentally DoS oneself. :) -Chris
Quoting Chris Wilson (2018-03-08 18:48:51)
Quoting Matt Roper (2018-03-08 18:22:06)
- Option 2: Allow priority offset to be set in a much larger range (e.g., [SHRT_MIN+1023,SHRT_MAX-1023]). This allows cgroups to have effective priority ranges that don't overlap. cgroup ranges can also be intentionally set above I915_PRIORITY_DISPLAY to allow us to define classes of applications whose work is more important than maintaining a stable framerate on the display. We'd also probably need to shift the kernel idle context down to something like INT_MIN instead of using -1024.
INT_MIN+1 just to be awkward. (INT_MIN is I915_PRIORITY_INVALID.)
Something to bear in mind is that I want to reserve the low 8 bits of ctx->priority for internal use (heuristic adjustments by the scheduler). Can shrink that to say 3 bits in the current scheme.
3bits for internal adjustment 20bits for user priority. 8bits for cgroup offset. signbit.
Nothing prohibits us from moving to 64b if required. But 32b should be enough range for plenty of clients and cgroups, imo. Reducing cgroup offset to 6 bits would be nice :)
Forgot to comment on the policy decision of I915_PRIORITY_DISPLAY. It's naughty that it's a magic constant that isn't exposed to the sysadmin, so I would still set it to the maximum priority possible under the extended scheme (i.e. INT_MAX), but allow for it to be adjusted. I915_SETPARAM *shivers* Maybe that's a topic for cgroup as well if we can associate the pageflip with a process and find its interactivity settings.
Can I expose just about any random policy decision through cgroup? -Chris
On Thu, Mar 08, 2018 at 06:55:34PM +0000, Chris Wilson wrote:
Quoting Chris Wilson (2018-03-08 18:48:51)
Quoting Matt Roper (2018-03-08 18:22:06)
- Option 2: Allow priority offset to be set in a much larger range (e.g., [SHRT_MIN+1023,SHRT_MAX-1023]). This allows cgroups to have effective priority ranges that don't overlap. cgroup ranges can also be intentionally set above I915_PRIORITY_DISPLAY to allow us to define classes of applications whose work is more important than maintaining a stable framerate on the display. We'd also probably need to shift the kernel idle context down to something like INT_MIN instead of using -1024.
INT_MIN+1 just to be awkward. (INT_MIN is I915_PRIORITY_INVALID.)
Something to bear in mind is that I want to reserve the low 8 bits of ctx->priority for internal use (heuristic adjustments by the scheduler). Can shrink that to say 3 bits in the current scheme.
3bits for internal adjustment 20bits for user priority. 8bits for cgroup offset. signbit.
Nothing prohibits us from moving to 64b if required. But 32b should be enough range for plenty of clients and cgroups, imo. Reducing cgroup offset to 6 bits would be nice :)
Forgot to comment on the policy decision of I915_PRIORITY_DISPLAY. It's naughty that it's a magic constant that isn't exposed to the sysadmin, so I would still set it to the maximum priority possible under the extended scheme (i.e. INT_MAX), but allow for it to be adjusted. I915_SETPARAM *shivers* Maybe that's a topic for cgroup as well if we can associate the pageflip with a process and find its interactivity settings.
Can I expose just about any random policy decision through cgroup? -Chris
If the policy applies to a cgroup of processes, then we can deal with pretty much any kind of policy as long as we stick to the driver ioctl approach in this series. E.g., we could add a cgroup setting "eligible for display boost" so that only specific processes are eligible for a display boost.
If we want to control a single overall system value (e.g., "set the display boost fixed value") we could technically do that by having such parameters set on the v2 hierarchy's root cgroup, but that seems a bit counterintuitive and I'm not sure if there's a benefit to doing so. Using a more general interface like I915_SETPARAM or sysfs or something seems more appropriate in that case.
Matt
Update i915_context_status to include priority information.
v2: - Clarify that the offset is based on cgroup (Chris)
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/i915_debugfs.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index e838c765b251..eb1cb9a9f5b0 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -1960,6 +1960,9 @@ static int i915_context_status(struct seq_file *m, void *unused) seq_putc(m, ctx->remap_slice ? 'R' : 'r'); seq_putc(m, '\n');
+ seq_printf(m, "Priority %d (cgroup offset = %d)\n", + ctx->priority, ctx->priority_offset); + for_each_engine(engine, dev_priv, id) { struct intel_context *ce = &ctx->engine[engine->id];
dri-devel@lists.freedesktop.org