We'd like to revisit the proposal of a GPU cgroup controller for managing GPU devices, but with just a basic set of controls. This series is based on the prior patch series from Kenny Ho [1]. We take Kenny's base patches, which implement the basic framework for the controller, but propose an alternate set of control files: a subset of the controls proposed in the earlier mailing list discussion [2].
This series proposes a set of device memory controls (gpu.memory.current, gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage (gpu.sched.runtime). GPU time sharing controls are left as future work. These are implemented within the GPU controller along with integration/usage of the device memory controls by the i915 device driver.
As an accelerator or GPU device is similar in many respects to a CPU with (or without) attached system memory, the basic principle here is to copy the semantics of existing controls from other controllers when possible and where these controls serve the same underlying purpose. For example, the memory.max and memory.current controls are based on the same controls from the MEMCG controller.
Following the implementation used by the existing RDMA controller, here we introduce a general purpose drm_cgroup_try_charge and uncharge pair of exported functions. These functions are to be used for charging and uncharging all current and future DRM resource controls.
Patches 1 - 4 are partly original work and partly a refactoring of Kenny Ho's prior GPU / DRM controller v2 series [1].
Patches 5 - 7 introduce new controls to the GPU / DRM controller for device memory accounting and GPU time tracking.
Patch 8 introduces DRM support for associating GEM objects with a cgroup.
Patch 9 implements i915 changes to use cgroups for device memory charging and enforcing device memory allocation limit.
[1] https://lists.freedesktop.org/archives/dri-devel/2020-February/257052.html
[2] https://lists.freedesktop.org/archives/dri-devel/2019-November/242599.html
Brian Welty (6):
  drmcg: Add skeleton seq_show and write for drmcg files
  drmcg: Add support for device memory accounting via page counter
  drmcg: Add memory.total file
  drmcg: Add initial support for tracking gpu time usage
  drm/gem: Associate GEM objects with drm cgroup
  drm/i915: Use memory cgroup for enforcing device memory limit
Kenny Ho (3):
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Bind drm and cgroup subsystem
  drm, cgroup: Initialize drmcg properties
 Documentation/admin-guide/cgroup-v2.rst    |  58 ++-
 Documentation/cgroup-v1/drm.rst            |   1 +
 drivers/gpu/drm/drm_drv.c                  |  11 +
 drivers/gpu/drm/drm_gem.c                  |  89 ++++
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_region.c |  23 +-
 drivers/gpu/drm/i915/intel_memory_region.c |  13 +-
 drivers/gpu/drm/i915/intel_memory_region.h |   2 +-
 include/drm/drm_cgroup.h                   |  85 ++++
 include/drm/drm_device.h                   |   7 +
 include/drm/drm_gem.h                      |  17 +
 include/linux/cgroup_drm.h                 | 113 +++++
 include/linux/cgroup_subsys.h              |   4 +
 init/Kconfig                               |   5 +
 kernel/cgroup/Makefile                     |   1 +
 kernel/cgroup/drm.c                        | 533 +++++++++++++++++++++
 16 files changed, 954 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c
From: Kenny Ho <Kenny.Ho@amd.com>
With the increased importance of machine learning, data science and other cloud-based applications, GPUs are already in production use in data centers today. Existing GPU resource management is very coarse-grained, however, as sysadmins are only able to distribute workloads on a per-GPU basis. An alternative is to use GPU virtualization (with or without SR-IOV), but it generally acts on the entire GPU instead of specific resources within a GPU. With a drm cgroup controller, we can enable alternate, fine-grained, sub-GPU resource management (in addition to what may be available via GPU virtualization).
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++-
 Documentation/cgroup-v1/drm.rst         |  1 +
 include/linux/cgroup_drm.h              | 92 +++++++++++++++++++++++++
 include/linux/cgroup_subsys.h           |  4 ++
 init/Kconfig                            |  5 ++
 kernel/cgroup/Makefile                  |  1 +
 kernel/cgroup/drm.c                     | 42 +++++++++++
 7 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 63521cd36ce5..b099e1d71098 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,8 +63,10 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
        5-7-1. RDMA Interface Files
      5-8. HugeTLB
        5.8-1. HugeTLB Interface Files
-     5-8. Misc
-       5-8-1. perf_event
+     5-9. GPU
+       5-9-1. GPU Interface Files
+     5-10. Misc
+       5-10-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -2160,6 +2162,18 @@ HugeTLB Interface Files
         are local to the cgroup i.e. not hierarchical. The file modified event
         generated on this file reflects only the local events.
+GPU
+---
+
+The "gpu" controller regulates the distribution and accounting of
+GPU-related resources.
+
+GPU Interface Files
+~~~~~~~~~~~~~~~~~~~
+
+TODO
+
+
 Misc
 ----
diff --git a/Documentation/cgroup-v1/drm.rst b/Documentation/cgroup-v1/drm.rst
new file mode 100644
index 000000000000..5f5658e1f5ed
--- /dev/null
+++ b/Documentation/cgroup-v1/drm.rst
@@ -0,0 +1 @@
+Please see ../cgroup-v2.rst for details
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index 000000000000..345af54a5d41
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#include <linux/cgroup.h>
+
+#ifdef CONFIG_CGROUP_DRM
+
+/**
+ * The DRM cgroup controller data structure.
+ */
+struct drmcg {
+	struct cgroup_subsys_state	css;
+};
+
+/**
+ * css_to_drmcg - get the corresponding drmcg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Return: DRM cgroup that contains the @css
+ */
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct drmcg, css) : NULL;
+}
+
+/**
+ * drmcg_get - get the drmcg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Return: reference to the DRM cgroup the task belongs to
+ */
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+	return css_to_drmcg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * drmcg_put - put a drmcg reference
+ * @drmcg: the target drmcg
+ *
+ * Put a reference obtained via drmcg_get
+ */
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+	if (drmcg)
+		css_put(&drmcg->css);
+}
+
+/**
+ * drmcg_parent - find the parent of a drm cgroup
+ * @cg: the target drmcg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Return: parent DRM cgroup of @cg
+ */
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+	return css_to_drmcg(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcg {
+};
+
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+}
+
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+	return NULL;
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..f4e627942115 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..bee29f51e380 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1027,6 +1027,11 @@ config CGROUP_RDMA
 	  Attaching processes with active RDMA resources to the cgroup
 	  hierarchy is allowed even if can cross the hierarchy's limit.
 
+config CGROUP_DRM
+	bool "DRM controller (EXPERIMENTAL)"
+	help
+	  Provides accounting and enforcement of resources in the DRM subsystem.
+
 config CGROUP_FREEZER
 	bool "Freezer controller"
 	help
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 5d7a76bfbbb7..31f186f58121 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,5 +4,6 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o freezer.o
 obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
+obj-$(CONFIG_CGROUP_DRM) += drm.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
new file mode 100644
index 000000000000..5e38a8230922
--- /dev/null
+++ b/kernel/cgroup/drm.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/slab.h>
+#include <linux/cgroup.h>
+#include <linux/cgroup_drm.h>
+
+static struct drmcg *root_drmcg __read_mostly;
+
+static void drmcg_css_free(struct cgroup_subsys_state *css)
+{
+	struct drmcg *drmcg = css_to_drmcg(css);
+
+	kfree(drmcg);
+}
+
+static struct cgroup_subsys_state *
+drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct drmcg *parent = css_to_drmcg(parent_css);
+	struct drmcg *drmcg;
+
+	drmcg = kzalloc(sizeof(struct drmcg), GFP_KERNEL);
+	if (!drmcg)
+		return ERR_PTR(-ENOMEM);
+
+	if (!parent)
+		root_drmcg = drmcg;
+
+	return &drmcg->css;
+}
+
+struct cftype files[] = {
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc	= drmcg_css_alloc,
+	.css_free	= drmcg_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
From: Kenny Ho <Kenny.Ho@amd.com>
Since the drm subsystem can be compiled as a module and drm devices can be added and removed at run time, add several functions to bind the drm subsystem, as well as individual drm devices, to drmcg.
Two pairs of functions are added:

drmcg_bind/drmcg_unbind - used to bind/unbind the drm subsystem to the cgroup subsystem as the drm core initializes/exits.
drmcg_register_dev/drmcg_unregister_dev - used to register/unregister drm devices with the cgroup subsystem as the devices are made visible to/removed from userspace.
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/drm_drv.c  |   8 +++
 include/drm/drm_cgroup.h   |  39 +++++++++++
 include/linux/cgroup_drm.h |   4 ++
 kernel/cgroup/drm.c        | 131 +++++++++++++++++++++++++++++++++++++
 4 files changed, 182 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 20d22e41d7ce..3b940926d672 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -42,6 +42,7 @@ #include <drm/drm_managed.h> #include <drm/drm_mode_object.h> #include <drm/drm_print.h> +#include <drm/drm_cgroup.h>
#include "drm_crtc_internal.h" #include "drm_internal.h" @@ -893,6 +894,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags) if (drm_core_check_feature(dev, DRIVER_MODESET)) drm_modeset_register_all(dev);
+ drmcg_register_dev(dev); + DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n", driver->name, driver->major, driver->minor, driver->patchlevel, driver->date, @@ -928,6 +931,8 @@ EXPORT_SYMBOL(drm_dev_register); */ void drm_dev_unregister(struct drm_device *dev) { + drmcg_unregister_dev(dev); + if (drm_core_check_feature(dev, DRIVER_LEGACY)) drm_lastclose(dev);
@@ -1030,6 +1035,7 @@ static const struct file_operations drm_stub_fops = {
static void drm_core_exit(void) { + drmcg_unbind(); unregister_chrdev(DRM_MAJOR, "drm"); debugfs_remove(drm_debugfs_root); drm_sysfs_destroy(); @@ -1056,6 +1062,8 @@ static int __init drm_core_init(void) if (ret < 0) goto error;
+ drmcg_bind(&drm_minor_acquire, &drm_dev_put); + drm_core_init_complete = true;
DRM_DEBUG("Initialized\n"); diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h new file mode 100644 index 000000000000..530c9a0b3238 --- /dev/null +++ b/include/drm/drm_cgroup.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: MIT + * Copyright 2019 Advanced Micro Devices, Inc. + */ +#ifndef __DRM_CGROUP_H__ +#define __DRM_CGROUP_H__ + +#ifdef CONFIG_CGROUP_DRM + +void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)), + void (*put_ddev)(struct drm_device *dev)); + +void drmcg_unbind(void); + +void drmcg_register_dev(struct drm_device *dev); + +void drmcg_unregister_dev(struct drm_device *dev); + +#else + +static inline void drmcg_bind( + struct drm_minor (*(*acq_dm)(unsigned int minor_id)), + void (*put_ddev)(struct drm_device *dev)) +{ +} + +static inline void drmcg_unbind(void) +{ +} + +static inline void drmcg_register_dev(struct drm_device *dev) +{ +} + +static inline void drmcg_unregister_dev(struct drm_device *dev) +{ +} + +#endif /* CONFIG_CGROUP_DRM */ +#endif /* __DRM_CGROUP_H__ */ diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h index 345af54a5d41..307bb75db248 100644 --- a/include/linux/cgroup_drm.h +++ b/include/linux/cgroup_drm.h @@ -5,6 +5,10 @@ #define _CGROUP_DRM_H
#include <linux/cgroup.h> +#include <drm/drm_file.h> + +/* limit defined per the way drm_minor_alloc operates */ +#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
#ifdef CONFIG_CGROUP_DRM
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c index 5e38a8230922..061bb9c458e4 100644 --- a/kernel/cgroup/drm.c +++ b/kernel/cgroup/drm.c @@ -1,11 +1,142 @@ // SPDX-License-Identifier: MIT // Copyright 2019 Advanced Micro Devices, Inc. +#include <linux/bitmap.h> +#include <linux/mutex.h> #include <linux/slab.h> #include <linux/cgroup.h> #include <linux/cgroup_drm.h> +#include <drm/drm_file.h> +#include <drm/drm_device.h> +#include <drm/drm_cgroup.h>
static struct drmcg *root_drmcg __read_mostly;
+/* global mutex for drmcg across all devices */
+static DEFINE_MUTEX(drmcg_mutex);
+
+static DECLARE_BITMAP(known_devs, MAX_DRM_DEV);
+
+static struct drm_minor (*(*acquire_drm_minor)(unsigned int minor_id));
+
+static void (*put_drm_dev)(struct drm_device *dev);
+
+/**
+ * drmcg_bind - Bind DRM subsystem to cgroup subsystem
+ * @acq_dm: function pointer to the drm_minor_acquire function
+ * @put_ddev: function pointer to the drm_dev_put function
+ *
+ * This function binds some functions from the DRM subsystem and makes
+ * them available to the drmcg subsystem.
+ *
+ * drmcg_unbind does the opposite of this function
+ */
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+		void (*put_ddev)(struct drm_device *dev))
+{
+	mutex_lock(&drmcg_mutex);
+	acquire_drm_minor = acq_dm;
+	put_drm_dev = put_ddev;
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_bind);
+
+/**
+ * drmcg_unbind - Unbind DRM subsystem from cgroup subsystem
+ *
+ * drmcg_bind does the opposite of this function
+ */
+void drmcg_unbind(void)
+{
+	mutex_lock(&drmcg_mutex);
+	acquire_drm_minor = NULL;
+	put_drm_dev = NULL;
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unbind);
+
+/**
+ * drmcg_register_dev - register a DRM device for usage in drm cgroup
+ * @dev: DRM device
+ *
+ * This function makes a DRM device visible to the cgroup subsystem.
+ * Once the drmcg is aware of the device, drmcg can start tracking and
+ * controlling resource usage for said device.
+ *
+ * drmcg_unregister_dev reverses the operation of this function
+ */
+void drmcg_register_dev(struct drm_device *dev)
+{
+	if (WARN_ON(dev->primary->index >= MAX_DRM_DEV))
+		return;
+
+	mutex_lock(&drmcg_mutex);
+	set_bit(dev->primary->index, known_devs);
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_register_dev);
+
+/**
+ * drmcg_unregister_dev - unregister a DRM device from the drm cgroup
+ * @dev: DRM device
+ *
+ * Unregister @dev so that drmcg no longer controls resource usage
+ * of @dev. The @dev must have been registered to drmcg with
+ * drmcg_register_dev
+ */
+void drmcg_unregister_dev(struct drm_device *dev)
+{
+	if (WARN_ON(dev->primary->index >= MAX_DRM_DEV))
+		return;
+
+	mutex_lock(&drmcg_mutex);
+	clear_bit(dev->primary->index, known_devs);
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unregister_dev);
+
+/**
+ * drm_minor_for_each - Iterate through all stored DRM minors
+ * @fn: Function to be called for each pointer.
+ * @data: Data passed to callback function.
+ *
+ * The callback function will be called for each registered device, passing
+ * the minor, the @drm_minor entry and @data.
+ *
+ * If @fn returns anything other than %0, the iteration stops and that
+ * value is returned from this function.
+ */
+static int drm_minor_for_each(int (*fn)(int id, void *p, void *data),
+			      void *data)
+{
+	int rc = 0;
+
+	mutex_lock(&drmcg_mutex);
+	if (acquire_drm_minor) {
+		unsigned int minor;
+		struct drm_minor *dm;
+
+		minor = find_next_bit(known_devs, MAX_DRM_DEV, 0);
+		while (minor < MAX_DRM_DEV) {
+			dm = acquire_drm_minor(minor);
+
+			if (IS_ERR(dm)) {
+				/* advance to the next minor or we loop forever */
+				minor = find_next_bit(known_devs,
						      MAX_DRM_DEV, minor + 1);
+				continue;
+			}
+
+			rc = fn(minor, (void *)dm, data);
+
+			put_drm_dev(dm->dev); /* release from acquire_drm_minor */
+
+			if (rc)
+				break;
+
+			minor = find_next_bit(known_devs, MAX_DRM_DEV,
+					      minor + 1);
+		}
+	}
+	mutex_unlock(&drmcg_mutex);
+
+	return rc;
+}
+
 static void drmcg_css_free(struct cgroup_subsys_state *css)
 {
 	struct drmcg *drmcg = css_to_drmcg(css);
From: Kenny Ho <Kenny.Ho@amd.com>
drmcg initialization involves allocating a per-cgroup, per-device data structure and setting the defaults. There are two entry points for drmcg init:
1) When struct drmcg is created via css_alloc, initialization is done for each device
2) When DRM devices are created after drmcgs already exist, the per-device drmcg data structure is allocated at the beginning of DRM device creation so that drmcg can begin tracking usage statistics
Entry point #2 usually applies to the root cgroup, as it can be created before any DRM devices are available. The drmcg controller goes through all existing drm cgroups and initializes them with the new device accordingly.
Extending Kenny's original work, this has been simplified somewhat and the custom_init callback has been removed. (Brian)
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/drm_drv.c  |  3 ++
 include/drm/drm_cgroup.h   | 17 +++++++
 include/drm/drm_device.h   |  7 +++
 include/linux/cgroup_drm.h | 13 ++++++
 kernel/cgroup/drm.c        | 95 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 134 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index 3b940926d672..dac742445b38 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -570,6 +570,7 @@ static void drm_dev_init_release(struct drm_device *dev, void *res) /* Prevent use-after-free in drm_managed_release when debugging is * enabled. Slightly awkward, but can't really be helped. */ dev->dev = NULL; + mutex_destroy(&dev->drmcg_mutex); mutex_destroy(&dev->master_mutex); mutex_destroy(&dev->clientlist_mutex); mutex_destroy(&dev->filelist_mutex); @@ -612,6 +613,7 @@ static int drm_dev_init(struct drm_device *dev, mutex_init(&dev->filelist_mutex); mutex_init(&dev->clientlist_mutex); mutex_init(&dev->master_mutex); + mutex_init(&dev->drmcg_mutex);
ret = drmm_add_action(dev, drm_dev_init_release, NULL); if (ret) @@ -652,6 +654,7 @@ static int drm_dev_init(struct drm_device *dev, if (ret) goto err;
+ drmcg_device_early_init(dev); return 0;
err: diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h index 530c9a0b3238..43caf1b6a0de 100644 --- a/include/drm/drm_cgroup.h +++ b/include/drm/drm_cgroup.h @@ -4,6 +4,17 @@ #ifndef __DRM_CGROUP_H__ #define __DRM_CGROUP_H__
+#include <drm/drm_file.h> + +struct drm_device; + +/** + * Per DRM device properties for DRM cgroup controller for the purpose + * of storing per device defaults + */ +struct drmcg_props { +}; + #ifdef CONFIG_CGROUP_DRM
void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)), @@ -15,6 +26,8 @@ void drmcg_register_dev(struct drm_device *dev);
void drmcg_unregister_dev(struct drm_device *dev);
+void drmcg_device_early_init(struct drm_device *device); + #else
static inline void drmcg_bind( @@ -35,5 +48,9 @@ static inline void drmcg_unregister_dev(struct drm_device *dev) { }
+static inline void drmcg_device_early_init(struct drm_device *device) +{ +} + #endif /* CONFIG_CGROUP_DRM */ #endif /* __DRM_CGROUP_H__ */ diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h index d647223e8390..1cdccc9a653c 100644 --- a/include/drm/drm_device.h +++ b/include/drm/drm_device.h @@ -8,6 +8,7 @@
#include <drm/drm_hashtab.h> #include <drm/drm_mode_config.h> +#include <drm/drm_cgroup.h>
struct drm_driver; struct drm_minor; @@ -318,6 +319,12 @@ struct drm_device { */ struct drm_fb_helper *fb_helper;
+ /** \name DRM Cgroup */ + /*@{ */ + struct mutex drmcg_mutex; + struct drmcg_props drmcg_props; + /*@} */ + /* Everything below here is for legacy driver, never use! */ /* private: */ #if IS_ENABLED(CONFIG_DRM_LEGACY) diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h index 307bb75db248..50f055804400 100644 --- a/include/linux/cgroup_drm.h +++ b/include/linux/cgroup_drm.h @@ -12,11 +12,19 @@
#ifdef CONFIG_CGROUP_DRM
+/** + * Per DRM cgroup, per device resources (such as statistics and limits) + */ +struct drmcg_device_resource { + /* for per device stats */ +}; + /** * The DRM cgroup controller data structure. */ struct drmcg { struct cgroup_subsys_state css; + struct drmcg_device_resource *dev_resources[MAX_DRM_DEV]; };
/** @@ -40,6 +48,8 @@ static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css) */ static inline struct drmcg *drmcg_get(struct task_struct *task) { + if (!cgroup_subsys_enabled(gpu_cgrp_subsys)) + return NULL; return css_to_drmcg(task_get_css(task, gpu_cgrp_id)); }
@@ -70,6 +80,9 @@ static inline struct drmcg *drmcg_parent(struct drmcg *cg)
#else /* CONFIG_CGROUP_DRM */
+struct drmcg_device_resource { +}; + struct drmcg { };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c index 061bb9c458e4..836929c27de8 100644 --- a/kernel/cgroup/drm.c +++ b/kernel/cgroup/drm.c @@ -1,11 +1,15 @@ // SPDX-License-Identifier: MIT -// Copyright 2019 Advanced Micro Devices, Inc. +/* + * Copyright 2019 Advanced Micro Devices, Inc. + * Copyright © 2021 Intel Corporation + */ #include <linux/bitmap.h> #include <linux/mutex.h> #include <linux/slab.h> #include <linux/cgroup.h> #include <linux/cgroup_drm.h> #include <drm/drm_file.h> +#include <drm/drm_drv.h> #include <drm/drm_device.h> #include <drm/drm_cgroup.h>
@@ -54,6 +58,26 @@ void drmcg_unbind(void) } EXPORT_SYMBOL(drmcg_unbind);
+/* caller must hold dev->drmcg_mutex */ +static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev) +{ + int minor = dev->primary->index; + struct drmcg_device_resource *ddr = drmcg->dev_resources[minor]; + + if (ddr == NULL) { + ddr = kzalloc(sizeof(struct drmcg_device_resource), + GFP_KERNEL); + + if (!ddr) + return -ENOMEM; + } + + /* set defaults here */ + drmcg->dev_resources[minor] = ddr; + + return 0; +} + /** * drmcg_register_dev - register a DRM device for usage in drm cgroup * @dev: DRM device @@ -137,23 +161,61 @@ static int drm_minor_for_each(int (*fn)(int id, void *p, void *data), return rc; }
+static int drmcg_css_free_fn(int id, void *ptr, void *data) +{ + struct drm_minor *minor = ptr; + struct drmcg *drmcg = data; + + if (minor->type != DRM_MINOR_PRIMARY) + return 0; + + kfree(drmcg->dev_resources[minor->index]); + + return 0; +} + static void drmcg_css_free(struct cgroup_subsys_state *css) { struct drmcg *drmcg = css_to_drmcg(css);
+ drm_minor_for_each(&drmcg_css_free_fn, drmcg); + kfree(drmcg); }
+static int init_drmcg_fn(int id, void *ptr, void *data) +{ + struct drm_minor *minor = ptr; + struct drmcg *drmcg = data; + int rc; + + if (minor->type != DRM_MINOR_PRIMARY) + return 0; + + mutex_lock(&minor->dev->drmcg_mutex); + rc = init_drmcg_single(drmcg, minor->dev); + mutex_unlock(&minor->dev->drmcg_mutex); + + return rc; +} + static struct cgroup_subsys_state * drmcg_css_alloc(struct cgroup_subsys_state *parent_css) { struct drmcg *parent = css_to_drmcg(parent_css); struct drmcg *drmcg; + int rc;
drmcg = kzalloc(sizeof(struct drmcg), GFP_KERNEL); if (!drmcg) return ERR_PTR(-ENOMEM);
+ rc = drm_minor_for_each(&init_drmcg_fn, drmcg); + if (rc) { + drmcg_css_free(&drmcg->css); + return ERR_PTR(rc); + } + if (!parent) root_drmcg = drmcg;
@@ -171,3 +233,34 @@ struct cgroup_subsys gpu_cgrp_subsys = { .legacy_cftypes = files, .dfl_cftypes = files, }; + +/** + * drmcg_device_early_init - initialize device specific resources for DRM cgroups + * @dev: the target DRM device + * + * Allocate and initialize device specific resources for existing DRM cgroups. + * Typically only the root cgroup exists before the initialization of @dev. + */ +void drmcg_device_early_init(struct drm_device *dev) +{ + struct cgroup_subsys_state *pos; + + if (root_drmcg == NULL) + return; + + /* init cgroups created before registration (i.e. root cgroup) */ + rcu_read_lock(); + css_for_each_descendant_pre(pos, &root_drmcg->css) { + css_get(pos); + rcu_read_unlock(); + + mutex_lock(&dev->drmcg_mutex); + init_drmcg_single(css_to_drmcg(pos), dev); + mutex_unlock(&dev->drmcg_mutex); + + rcu_read_lock(); + css_put(pos); + } + rcu_read_unlock(); +} +EXPORT_SYMBOL(drmcg_device_early_init);
Add basic .seq_show and .write functions for use with DRM cgroup control files. This is based on original work from Kenny Ho, extracted from patches [1] and [2], and has been simplified to avoid separate file types and handler functions for each control.
[1] https://lists.freedesktop.org/archives/dri-devel/2020-February/254986.html [2] https://lists.freedesktop.org/archives/dri-devel/2020-February/254990.html
Co-developed-by: Kenny Ho <Kenny.Ho@amd.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
---
 include/drm/drm_cgroup.h |   5 ++
 kernel/cgroup/drm.c      | 102 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h index 43caf1b6a0de..12526def9fbf 100644 --- a/include/drm/drm_cgroup.h +++ b/include/drm/drm_cgroup.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: MIT * Copyright 2019 Advanced Micro Devices, Inc. + * Copyright © 2021 Intel Corporation */ #ifndef __DRM_CGROUP_H__ #define __DRM_CGROUP_H__ @@ -15,6 +16,10 @@ struct drm_device; struct drmcg_props { };
+enum drmcg_res_type { + __DRMCG_TYPE_LAST, +}; + #ifdef CONFIG_CGROUP_DRM
void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)), diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c index 836929c27de8..aece11faa0bc 100644 --- a/kernel/cgroup/drm.c +++ b/kernel/cgroup/drm.c @@ -5,6 +5,7 @@ */ #include <linux/bitmap.h> #include <linux/mutex.h> +#include <linux/seq_file.h> #include <linux/slab.h> #include <linux/cgroup.h> #include <linux/cgroup_drm.h> @@ -12,6 +13,7 @@ #include <drm/drm_drv.h> #include <drm/drm_device.h> #include <drm/drm_cgroup.h> +#include <drm/drm_ioctl.h>
static struct drmcg *root_drmcg __read_mostly;
@@ -222,6 +224,106 @@ drmcg_css_alloc(struct cgroup_subsys_state *parent_css) return &drmcg->css; }
+static int drmcg_apply_value(struct drmcg_device_resource *ddr,
+			     enum drmcg_res_type type, char *buf)
+{
+	int ret = 0;
+
+	switch (type) {
+	default:
+		break;
+	}
+
+	return ret;
+}
+
+static int drmcg_seq_show_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct seq_file *sf = data;
+	struct drmcg *drmcg = css_to_drmcg(seq_css(sf));
+	enum drmcg_res_type type = seq_cft(sf)->private;
+	struct drmcg_device_resource *ddr;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	ddr = drmcg->dev_resources[minor->index];
+	if (ddr == NULL)
+		return 0;
+
+	seq_printf(sf, "%d:%d ", DRM_MAJOR, minor->index);
+	switch (type) {
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+
+	return 0;
+}
+
+static int drmcg_seq_show(struct seq_file *sf, void *v)
+{
+	return drm_minor_for_each(&drmcg_seq_show_fn, sf);
+}
+
+static ssize_t drmcg_write(struct kernfs_open_file *of, char *buf,
+			   size_t nbytes, loff_t off)
+{
+	struct drmcg *drmcg = css_to_drmcg(of_css(of));
+	enum drmcg_res_type type = of_cft(of)->private;
+	char *cft_name = of_cft(of)->name;
+	char *limits = strstrip(buf);
+	struct drmcg_device_resource *ddr;
+	struct drm_minor *dm;
+	char *line;
+	char sattr[256];
+	int minor, ret = 0;
+
+	while (!ret && limits != NULL) {
+		line = strsep(&limits, "\n");
+
+		if (sscanf(line,
+			   __stringify(DRM_MAJOR)":%u %255[^\t\n]",
+			   &minor, sattr) != 2) {
+			pr_err("drmcg: error parsing %s ", cft_name);
+			pr_cont_cgroup_name(drmcg->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		mutex_lock(&drmcg_mutex);
+		if (acquire_drm_minor)
+			dm = acquire_drm_minor(minor);
+		else
+			dm = NULL;
+		mutex_unlock(&drmcg_mutex);
+
+		if (IS_ERR_OR_NULL(dm)) {
+			pr_err("drmcg: invalid minor %d for %s ",
+			       minor, cft_name);
+			pr_cont_cgroup_name(drmcg->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		mutex_lock(&dm->dev->drmcg_mutex);
+		ddr = drmcg->dev_resources[minor];
+		ret = drmcg_apply_value(ddr, type, sattr);
+		mutex_unlock(&dm->dev->drmcg_mutex);
+
+		mutex_lock(&drmcg_mutex);
+		if (put_drm_dev)
+			put_drm_dev(dm->dev); /* release from acquire_drm_minor */
+		mutex_unlock(&drmcg_mutex);
+	}
+
+	return ret ?: nbytes;
+}
+
 struct cftype files[] = {
 	{ }	/* terminate */
 };
Here we introduce a general purpose drm_cgroup_try_charge and uncharge pair of functions. This is modelled after the existing RDMA cgroup controller, and the idea is for these functions to be used for charging/uncharging all current and future DRM resource controls.
Two new controls are added in order to allow DRM device drivers to expose memories attached to the device for purposes of memory accounting and enforcing allocation limit.
The following controls are introduced and are intended to provide behavior and functionality equivalent, where possible, to the MEMCG controller:
  memory.current
	Read-only value, displays current device memory usage for all DRM
	device memory in terms of allocated physical backing store.
  memory.max
	Read-write value, displays the maximum memory usage limit of device
	memory. If memory usage reaches this limit, then subsequent memory
	allocations will fail.
The expectation is that the DRM device driver simply lets allocations fail when memory.max is reached. Future work might be to introduce device memory equivalents of the memory.high or memory.low controls, such that the DRM device might be able to invoke memory eviction when these lower thresholds are hit.
With the provided charging functions, support for memory accounting/charging is functional. The intent is for DRM device drivers to charge against memory.current as device memory is physically allocated. Implementation is simplified by making use of page counters underneath. Nested cgroups will correctly maintain the parent for the memory page counters, such that hierarchical charging to the parent's memory.current is supported.
Note, this is only for tracking of allocations from device-backed memory. Memory charging for objects that are backed by system memory is already handled by the mm subsystem and existing memory accounting controls.
Signed-off-by: Brian Welty <brian.welty@intel.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  29 ++++-
 include/drm/drm_cgroup.h                |  20 ++++
 include/linux/cgroup_drm.h              |   4 +-
 kernel/cgroup/drm.c                     | 145 +++++++++++++++++++++++-
 4 files changed, 191 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index b099e1d71098..405912710b3a 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2171,8 +2171,35 @@ of GPU-related resources. GPU Interface Files ~~~~~~~~~~~~~~~~~~~~
-TODO +GPU controller supports usage of multiple DRM devices within a cgroup. +As such, for all interface files, output is keyed by device major:minor +and is not ordered.
+The first set of control files are for regulating the accounting and +allocation of memories attached to DRM devices. The controls are intended +to provide equivalent behavior and functionality where possible as the +MEM controller. All memory amounts are in bytes. + + memory.current + Read-only value, displays current device memory usage for the DRM + device memory in terms of allocated physical backing store. + + memory.max + Read-write value, displays the maximum memory usage limit for the + DRM device memory. If memory usage reaches this limit, + subsequent device memory allocations will fail. + +While some DRM devices may be capable to present multiple memory segments +to the user, the intent with above controls is to aggregate all user +allocatable backing store. Any support for multiple memory segments is +left as future work. + +The expectation is that the DRM device driver simply lets allocations fail +when memory.max is reached. Future work might be to introduce the device +memory concept of memory.high or memory.low controls. When these lower +thresholds are hit, this would then allow the DRM device driver to invoke +some equivalent to OOM-killer or forced memory eviction for the device +backed memory in order to attempt to free additional space.
Misc ---- diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h index 12526def9fbf..8b4c4e798b11 100644 --- a/include/drm/drm_cgroup.h +++ b/include/drm/drm_cgroup.h @@ -8,6 +8,7 @@ #include <drm/drm_file.h>
struct drm_device; +struct drmcg;
/** * Per DRM device properties for DRM cgroup controller for the purpose @@ -17,6 +18,8 @@ struct drmcg_props { };
enum drmcg_res_type { + DRMCG_TYPE_MEM_CURRENT, + DRMCG_TYPE_MEM_MAX, __DRMCG_TYPE_LAST, };
@@ -33,6 +36,11 @@ void drmcg_unregister_dev(struct drm_device *dev);
void drmcg_device_early_init(struct drm_device *device);
+int drm_cgroup_try_charge(struct drmcg *drmcg, struct drm_device *dev, + enum drmcg_res_type type, u64 usage); +void drm_cgroup_uncharge(struct drmcg *drmcg, struct drm_device *dev, + enum drmcg_res_type type, u64 usage); + #else
static inline void drmcg_bind( @@ -57,5 +65,17 @@ static inline void drmcg_device_early_init(struct drm_device *device) { }
+static inline +int drm_cgroup_try_charge(struct drmcg *drmcg, struct drm_device *dev, + enum drmcg_res_type res, u64 usage) +{ + return 0; +} + +static inline +void drm_cgroup_uncharge(struct drmcg *drmcg,struct drm_device *dev, + enum drmcg_res_type type, u64 usage) +{ +} #endif /* CONFIG_CGROUP_DRM */ #endif /* __DRM_CGROUP_H__ */ diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h index 50f055804400..3570636473cf 100644 --- a/include/linux/cgroup_drm.h +++ b/include/linux/cgroup_drm.h @@ -1,10 +1,12 @@ /* SPDX-License-Identifier: MIT * Copyright 2019 Advanced Micro Devices, Inc. + * Copyright © 2021 Intel Corporation */ #ifndef _CGROUP_DRM_H #define _CGROUP_DRM_H
#include <linux/cgroup.h> +#include <linux/page_counter.h> #include <drm/drm_file.h>
/* limit defined per the way drm_minor_alloc operates */ @@ -16,7 +18,7 @@ * Per DRM cgroup, per device resources (such as statistics and limits) */ struct drmcg_device_resource { - /* for per device stats */ + struct page_counter memory; };
/** diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c index aece11faa0bc..bec41f343208 100644 --- a/kernel/cgroup/drm.c +++ b/kernel/cgroup/drm.c @@ -64,6 +64,7 @@ EXPORT_SYMBOL(drmcg_unbind); static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev) { int minor = dev->primary->index; + struct drmcg_device_resource *parent_ddr = NULL; struct drmcg_device_resource *ddr = drmcg->dev_resources[minor];
if (ddr == NULL) { @@ -74,7 +75,12 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev) return -ENOMEM; }
+ if (drmcg_parent(drmcg)) + parent_ddr = drmcg_parent(drmcg)->dev_resources[minor]; + /* set defaults here */ + page_counter_init(&ddr->memory, + parent_ddr ? &parent_ddr->memory : NULL); drmcg->dev_resources[minor] = ddr;
return 0; @@ -212,6 +218,13 @@ drmcg_css_alloc(struct cgroup_subsys_state *parent_css) if (!drmcg) return ERR_PTR(-ENOMEM);
+ /* + * parent is normally set upon return in caller, but set here as + * is needed by init_drmcg_single so resource initialization can + * associate with parent for hierarchial charging + */ + drmcg->css.parent = parent_css; + rc = drm_minor_for_each(&init_drmcg_fn, drmcg); if (rc) { drmcg_css_free(&drmcg->css); @@ -231,6 +244,10 @@ static int drmcg_apply_value(struct drmcg_device_resource *ddr, unsigned long val;
switch (type) { + case DRMCG_TYPE_MEM_MAX: + ret = page_counter_memparse(buf, "max", &val); + if (!ret) + ret = page_counter_set_max(&ddr->memory, val); default: break; } @@ -253,10 +270,21 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data) if (ddr == NULL) return 0;
- seq_printf(sf, "%d:%d ", DRM_MAJOR, minor->index); switch (type) { + case DRMCG_TYPE_MEM_CURRENT: + seq_printf(sf, "%d:%d %ld\n", + DRM_MAJOR, minor->index, + page_counter_read(&ddr->memory) * PAGE_SIZE); + break; + case DRMCG_TYPE_MEM_MAX: + seq_printf(sf, "%d:%d ", DRM_MAJOR, minor->index); + if (ddr->memory.max == PAGE_COUNTER_MAX) + seq_printf(sf, "max\n"); + else + seq_printf(sf, "%ld\n", ddr->memory.max * PAGE_SIZE); + break; default: - seq_puts(sf, "\n"); + seq_printf(sf, "%d:%d\n", DRM_MAJOR, minor->index); break; }
@@ -268,6 +296,17 @@ static int drmcg_seq_show(struct seq_file *sf, void *v) return drm_minor_for_each(&drmcg_seq_show_fn, sf); }
+static int drmcg_parse_input(const char *buf, enum drmcg_res_type type, + int *minor, char *sattr) +{ + if (sscanf(buf, + __stringify(DRM_MAJOR)":%u %255[^\t\n]", + minor, sattr) != 2) + return -EINVAL; + + return 0; +} + static ssize_t drmcg_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -284,9 +323,7 @@ static ssize_t drmcg_write(struct kernfs_open_file *of, char *buf, while (!ret && limits != NULL) { line = strsep(&limits, "\n");
- if (sscanf(line, - __stringify(DRM_MAJOR)":%u %255[^\t\n]", - &minor, sattr) != 2) { + if (drmcg_parse_input(line, type, &minor, sattr)) { pr_err("drmcg: error parsing %s ", cft_name); pr_cont_cgroup_name(drmcg->css.cgroup); pr_cont("\n"); @@ -325,6 +362,18 @@ static ssize_t drmcg_write(struct kernfs_open_file *of, char *buf, }
struct cftype files[] = { + { + .name = "memory.current", + .seq_show = drmcg_seq_show, + .private = DRMCG_TYPE_MEM_CURRENT, + }, + { + .name = "memory.max", + .seq_show = drmcg_seq_show, + .write = drmcg_write, + .private = DRMCG_TYPE_MEM_MAX, + .flags = CFTYPE_NOT_ON_ROOT + }, { } /* terminate */ };
@@ -366,3 +415,89 @@ void drmcg_device_early_init(struct drm_device *dev) rcu_read_unlock(); } EXPORT_SYMBOL(drmcg_device_early_init); + +/** + * drm_cgroup_try_charge - try charging against drmcg resource + * @drmcg: DRM cgroup to charge + * @dev: associated DRM device + * @type: which resource type to charge + * @usage: usage to be charged + * + * For @type of DRMCG_TYPE_MEM_CURRENT: + * Try to charge @usage bytes to specified DRM cgroup. This will fail if + * a successful charge would cause the current device memory usage to go + * above the maximum limit. On failure, the caller (DRM device driver) may + * choose to enact some form of memory reclaim, but the exact behavior is left + * to the DRM device driver to define. + * + * Returns 0 on success. Otherwise, an error code is returned. + */ +int drm_cgroup_try_charge(struct drmcg *drmcg, struct drm_device *dev, + enum drmcg_res_type type, u64 usage) +{ + struct drmcg_device_resource *res; + struct page_counter *counter; + int err = -ENOMEM; + u64 nr_pages; + + if (!drmcg) + return 0; + + res = drmcg->dev_resources[dev->primary->index]; + if (!res) + return -EINVAL; + + switch (type) { + case DRMCG_TYPE_MEM_CURRENT: + nr_pages = usage >> PAGE_SHIFT; + if (page_counter_try_charge(&res->memory, nr_pages, + &counter)) { + css_get_many(&drmcg->css, nr_pages); + err = 0; + } + break; + default: + err = -EINVAL; + break; + } + + return err; +} +EXPORT_SYMBOL(drm_cgroup_try_charge); + +/** + * drm_cgroup_uncharge - uncharge resource usage from drmcg + * @drmcg: DRM cgroup to uncharge + * @dev: associated DRM device + * @type: which resource type to uncharge + * @usage: usage to be uncharged + * + * Action for each @type: + * DRMCG_TYPE_MEM_CURRENT: Uncharge @usage bytes from specified DRM cgroup. + * + * Returns 0 on success. Otherwise, an error code is returned. 
+ */ +void drm_cgroup_uncharge(struct drmcg *drmcg, struct drm_device *dev, + enum drmcg_res_type type, u64 usage) +{ + struct drmcg_device_resource *res; + u64 nr_pages; + + if (!drmcg) + return; + + res = drmcg->dev_resources[dev->primary->index]; + if (!res) + return; + + switch (type) { + case DRMCG_TYPE_MEM_CURRENT: + nr_pages = usage >> PAGE_SHIFT; + page_counter_uncharge(&res->memory, nr_pages); + css_put_many(&drmcg->css, nr_pages); + break; + default: + break; + } +} +EXPORT_SYMBOL(drm_cgroup_uncharge);
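The `major:minor value` input format accepted by drmcg_write (via drmcg_parse_input) can be modeled in userspace. DRM_MAJOR is 226; `parse_limit_line` below is an illustrative re-implementation of the sscanf parsing in the patch, not kernel code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one "226:<minor> <attr>" line the way drmcg_parse_input does:
 * the major must literally be DRM_MAJOR (226), the minor is numeric,
 * and the remainder of the line (e.g. "max" or a byte count) is the
 * value string. Returns 0 on success, -1 on a malformed line. */
static int parse_limit_line(const char *line, unsigned int *minor,
                            char *sattr, size_t len)
{
    char buf[256];

    if (sscanf(line, "226:%u %255[^\t\n]", minor, buf) != 2)
        return -1;
    if (strlen(buf) >= len)
        return -1;
    strcpy(sattr, buf);
    return 0;
}
```

A line such as `226:0 max` (reset the limit) or `226:1 1073741824` (set a 1 GiB limit) parses successfully; a line with the wrong major is rejected.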
The following control is introduced in order to display the total memory that exists for the DRM device. DRM drivers can advertise this value by writing to drm_device.drmcg_props. This is needed in order to effectively use the other memory controls. For system memory, this is normally available to the user via procfs.
memory.total Read-only value, displays total memory for a device, shown only in the root cgroup.
Signed-off-by: Brian Welty brian.welty@intel.com --- Documentation/admin-guide/cgroup-v2.rst | 4 ++++ include/drm/drm_cgroup.h | 2 ++ kernel/cgroup/drm.c | 10 ++++++++++ 3 files changed, 16 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 405912710b3a..ccc25f03a898 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2189,6 +2189,10 @@ MEM controller. All memory amounts are in bytes. DRM device memory. If memory usage reaches this limit, subsequent device memory allocations will fail.
+ memory.total + Read-only value, displays total memory for a device, shown only in + root cgroup. + While some DRM devices may be capable to present multiple memory segments to the user, the intent with above controls is to aggregate all user allocatable backing store. Any support for multiple memory segments is diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h index 8b4c4e798b11..9ba0e372eeee 100644 --- a/include/drm/drm_cgroup.h +++ b/include/drm/drm_cgroup.h @@ -15,11 +15,13 @@ struct drmcg; * of storing per device defaults */ struct drmcg_props { + u64 memory_total; };
enum drmcg_res_type { DRMCG_TYPE_MEM_CURRENT, DRMCG_TYPE_MEM_MAX, + DRMCG_TYPE_MEM_TOTAL, __DRMCG_TYPE_LAST, };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c index bec41f343208..08e75eb67593 100644 --- a/kernel/cgroup/drm.c +++ b/kernel/cgroup/drm.c @@ -283,6 +283,10 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data) else seq_printf(sf, "%ld\n", ddr->memory.max * PAGE_SIZE); break; + case DRMCG_TYPE_MEM_TOTAL: + seq_printf(sf, "%d:%d %llu\n", DRM_MAJOR, minor->index, + minor->dev->drmcg_props.memory_total); + break; default: seq_printf(sf, "%d:%d\n", DRM_MAJOR, minor->index); break; @@ -374,6 +378,12 @@ struct cftype files[] = { .private = DRMCG_TYPE_MEM_MAX, .flags = CFTYPE_NOT_ON_ROOT }, + { + .name = "memory.total", + .seq_show = drmcg_seq_show, + .private = DRMCG_TYPE_MEM_TOTAL, + .flags = CFTYPE_ONLY_ON_ROOT, + }, { } /* terminate */ };
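The keyed output produced by drmcg_seq_show_fn — one `major:minor value` line per device, with an unset limit rendered as the literal string `max` — can be sketched with a userspace formatter. The `_MODEL` constants are illustrative stand-ins for the kernel's PAGE_SIZE and PAGE_COUNTER_MAX:

```c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define DRM_MAJOR_MODEL 226
#define PAGE_SIZE_MODEL 4096ULL
/* kernel's PAGE_COUNTER_MAX is LONG_MAX / PAGE_SIZE; modeled here */
#define PAGE_COUNTER_MAX_MODEL (INT64_MAX / PAGE_SIZE_MODEL)

/* Format one line of memory.max output: if no limit was ever set
 * (counter still at its initial PAGE_COUNTER_MAX) print "max",
 * otherwise print the limit converted from pages to bytes. */
static int format_mem_max_line(char *out, size_t len, int minor,
                               uint64_t max_pages)
{
    if (max_pages == PAGE_COUNTER_MAX_MODEL)
        return snprintf(out, len, "%d:%d max\n", DRM_MAJOR_MODEL, minor);
    return snprintf(out, len, "%d:%d %llu\n", DRM_MAJOR_MODEL, minor,
                    (unsigned long long)(max_pages * PAGE_SIZE_MODEL));
}
```

This mirrors the seq_printf logic in the DRMCG_TYPE_MEM_MAX case, including the pages-to-bytes conversion done for display.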
A single control, described below, is added to the DRM cgroup controller in order to track user execution time for GPU devices. It is up to device drivers to charge execution time to the cgroup via drm_cgroup_try_charge().
sched.runtime Read-only value, displays current user execution time for each DRM device. The expectation is that this is incremented by the DRM device driver's scheduler upon user context completion or context switch. Units of time are in microseconds for consistency with cpu.stat.
Signed-off-by: Brian Welty brian.welty@intel.com --- Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++ include/drm/drm_cgroup.h | 2 ++ include/linux/cgroup_drm.h | 2 ++ kernel/cgroup/drm.c | 20 ++++++++++++++++++++ 4 files changed, 33 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index ccc25f03a898..f1d0f333a49e 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2205,6 +2205,15 @@ thresholds are hit, this would then allow the DRM device driver to invoke some equivalent to OOM-killer or forced memory eviction for the device backed memory in order to attempt to free additional space.
+The below set of control files are for time accounting of DRM devices. Units +of time are in microseconds. + + sched.runtime + Read-only value, displays current user execution time for each DRM + device. The expectation is that this is incremented by DRM device + driver's scheduler upon user context completion or context switch. + + Misc ----
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h index 9ba0e372eeee..315dab8a93b8 100644 --- a/include/drm/drm_cgroup.h +++ b/include/drm/drm_cgroup.h @@ -22,6 +22,7 @@ enum drmcg_res_type { DRMCG_TYPE_MEM_CURRENT, DRMCG_TYPE_MEM_MAX, DRMCG_TYPE_MEM_TOTAL, + DRMCG_TYPE_SCHED_RUNTIME, __DRMCG_TYPE_LAST, };
@@ -79,5 +80,6 @@ void drm_cgroup_uncharge(struct drmcg *drmcg,struct drm_device *dev, enum drmcg_res_type type, u64 usage) { } + #endif /* CONFIG_CGROUP_DRM */ #endif /* __DRM_CGROUP_H__ */ diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h index 3570636473cf..0fafa663321e 100644 --- a/include/linux/cgroup_drm.h +++ b/include/linux/cgroup_drm.h @@ -19,6 +19,8 @@ */ struct drmcg_device_resource { struct page_counter memory; + seqlock_t sched_lock; + u64 exec_runtime; };
/** diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c index 08e75eb67593..64e9d0dbe8c8 100644 --- a/kernel/cgroup/drm.c +++ b/kernel/cgroup/drm.c @@ -81,6 +81,7 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev) /* set defaults here */ page_counter_init(&ddr->memory, parent_ddr ? &parent_ddr->memory : NULL); + seqlock_init(&ddr->sched_lock); drmcg->dev_resources[minor] = ddr;
return 0; @@ -287,6 +288,10 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data) seq_printf(sf, "%d:%d %llu\n", DRM_MAJOR, minor->index, minor->dev->drmcg_props.memory_total); break; + case DRMCG_TYPE_SCHED_RUNTIME: + seq_printf(sf, "%d:%d %llu\n", DRM_MAJOR, minor->index, + ktime_to_us(ddr->exec_runtime)); + break; default: seq_printf(sf, "%d:%d\n", DRM_MAJOR, minor->index); break; @@ -384,6 +389,12 @@ struct cftype files[] = { .private = DRMCG_TYPE_MEM_TOTAL, .flags = CFTYPE_ONLY_ON_ROOT, }, + { + .name = "sched.runtime", + .seq_show = drmcg_seq_show, + .private = DRMCG_TYPE_SCHED_RUNTIME, + .flags = CFTYPE_NOT_ON_ROOT, + }, { } /* terminate */ };
@@ -440,6 +451,10 @@ EXPORT_SYMBOL(drmcg_device_early_init); * choose to enact some form of memory reclaim, but the exact behavior is left * to the DRM device driver to define. * + * For @res type of DRMCG_TYPE_SCHED_RUNTIME: + * For GPU time accounting, add @usage amount of GPU time to @drmcg for + * the given device. + * * Returns 0 on success. Otherwise, an error code is returned. */ int drm_cgroup_try_charge(struct drmcg *drmcg, struct drm_device *dev, @@ -466,6 +481,11 @@ int drm_cgroup_try_charge(struct drmcg *drmcg, struct drm_device *dev, err = 0; } break; + case DRMCG_TYPE_SCHED_RUNTIME: + write_seqlock(&res->sched_lock); + res->exec_runtime = ktime_add(res->exec_runtime, usage); + write_sequnlock(&res->sched_lock); + break; default: err = -EINVAL; break;
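The sched.runtime accounting above boils down to: the driver's scheduler charges an execution-time delta (a ktime, i.e. nanoseconds) at context switch or completion, and the interface file reports the accumulated total in microseconds via ktime_to_us(). A minimal userspace model of that arithmetic, with invented names:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of per-cgroup, per-device GPU time accounting. The real
 * patch guards exec_runtime with a seqlock; that locking is omitted
 * here since this sketch is single-threaded. */
struct sched_acct {
    uint64_t exec_runtime_ns;
};

/* Called by the driver's scheduler with the GPU time just consumed. */
static void charge_runtime(struct sched_acct *a, uint64_t delta_ns)
{
    a->exec_runtime_ns += delta_ns;
}

/* What the sched.runtime file reports: ns -> us, like ktime_to_us(). */
static uint64_t runtime_us(const struct sched_acct *a)
{
    return a->exec_runtime_ns / 1000;
}
```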
Quoting Brian Welty (2021-01-26 23:46:24)
Single control below is added to DRM cgroup controller in order to track user execution time for GPU devices. It is up to device drivers to charge execution time to the cgroup via drm_cgroup_try_charge().
sched.runtime Read-only value, displays current user execution time for each DRM device. The expectation is that this is incremented by DRM device driver's scheduler upon user context completion or context switch. Units of time are in microseconds for consistency with cpu.stats.
Weren't we also planning for percentage-style budgeting?
Capping the maximum runtime is definitely useful, but in order to configure a system for peaceful co-existence of two or more workloads we must also impose a limit on how big portion of the instantaneous capacity can be used.
Regards, Joonas
On 2/3/2021 5:25 AM, Joonas Lahtinen wrote:
Quoting Brian Welty (2021-01-26 23:46:24)
Single control below is added to DRM cgroup controller in order to track user execution time for GPU devices. It is up to device drivers to charge execution time to the cgroup via drm_cgroup_try_charge().
sched.runtime Read-only value, displays current user execution time for each DRM device. The expectation is that this is incremented by DRM device driver's scheduler upon user context completion or context switch. Units of time are in microseconds for consistency with cpu.stats.
Weren't we also planning for percentage-style budgeting?
Yes, that's right. The above is to report accumulated time usage. I can include controls for time sharing in the next submission, but not using percentages. Relative time share can be implemented with weights, as described in the cgroups documentation for resource distribution models. This was also the prior feedback from Tejun [1], and so it will look very much like the existing cpu.weight or io.weight.
Capping the maximum runtime is definitely useful, but in order to configure a system for peaceful co-existence of two or more workloads we must also impose a limit on how big portion of the instantaneous capacity can be used.
Agreed. This is also included with CPU and IO controls (cpu.max and io.max), so we should also plan to have the same.
-Brian
[1] https://lists.freedesktop.org/archives/dri-devel/2020-April/262141.html
This patch adds tracking of which cgroup to make charges against for a given GEM object. We associate the current task's cgroup with GEM objects as they are created. The first user of this is charging the DRM cgroup for device memory allocations. The intended behavior is for device drivers to make the cgroup charging calls at the time that backing store is allocated or deallocated for the object.
Exported functions are provided for charging memory allocations for a GEM object to the DRM cgroup. To aid in debugging, we store how many bytes have been charged inside the GEM object. Helpers are added for setting and clearing the object's associated cgroup, which will check that charges are not being leaked.
For shared objects, the charge may be made against a cgroup that is not the same cgroup as the process using the memory. Based on the memory cgroup's discussion of "memory ownership", this seems acceptable [1].
[1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt, "Memory Ownership"
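The charge bookkeeping and leak check described above can be modeled in userspace. This is an illustrative sketch of the semantics (charges accumulate in the object's drmcg_bytes_charged and must reach zero before the cgroup association is dropped), not the kernel implementation; the cgroup is reduced to a bare usage/max pair:

```c
#include <assert.h>
#include <stdint.h>

struct gem_model {
    uint64_t bytes_charged; /* models obj->drmcg_bytes_charged */
    int has_cgroup;         /* models obj->drmcg being set */
};

/* Models drm_gem_object_charge_mem: fails if the charge would push
 * the cgroup's usage above its memory.max. */
static int gem_charge(struct gem_model *obj, uint64_t size,
                      uint64_t cg_max, uint64_t *cg_usage)
{
    if (*cg_usage + size > cg_max)
        return -1; /* would exceed memory.max */
    *cg_usage += size;
    obj->bytes_charged += size;
    return 0;
}

/* Models drm_gem_object_uncharge_mem, including the clamp the patch
 * applies (WARN_ON path) when asked to uncharge more than was charged. */
static void gem_uncharge(struct gem_model *obj, uint64_t size,
                         uint64_t *cg_usage)
{
    if (size > obj->bytes_charged)
        size = obj->bytes_charged;
    obj->bytes_charged -= size;
    *cg_usage -= size;
}

/* Models drm_gem_object_unset_cgroup's leak check: returns -1 if
 * charges were leaked (the WARN_ON in the patch), 0 otherwise. */
static int gem_unset_cgroup(struct gem_model *obj)
{
    int leaked = obj->bytes_charged != 0;

    obj->has_cgroup = 0;
    return leaked ? -1 : 0;
}
```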
Signed-off-by: Brian Welty brian.welty@intel.com --- drivers/gpu/drm/drm_gem.c | 89 +++++++++++++++++++++++++++++++++++++++ include/drm/drm_gem.h | 17 ++++++++ 2 files changed, 106 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index c2ce78c4edc3..a12da41eaafe 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -29,6 +29,7 @@ #include <linux/slab.h> #include <linux/mm.h> #include <linux/uaccess.h> +#include <linux/cgroup_drm.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/module.h> @@ -112,6 +113,89 @@ drm_gem_init(struct drm_device *dev) return drmm_add_action(dev, drm_gem_init_release, NULL); }
+/** + * drm_gem_object_set_cgroup - associate GEM object with a cgroup + * @obj: GEM object which is being associated with a cgroup + * @task: task associated with process control group to use + * + * This will acquire a reference on cgroup and use for charging GEM + * memory allocations. + * This helper could be extended in future to migrate charges to another + * cgroup, print warning if this usage occurs. + */ +void drm_gem_object_set_cgroup(struct drm_gem_object *obj, + struct task_struct *task) +{ + /* if object has existing cgroup, we migrate the charge... */ + if (obj->drmcg) { + pr_warn("DRM: need to migrate cgroup charge of %lld\n", + atomic64_read(&obj->drmcg_bytes_charged)); + } + obj->drmcg = drmcg_get(task); +} +EXPORT_SYMBOL(drm_gem_object_set_cgroup); + +/** + * drm_gem_object_unset_cgroup - clear GEM object's associated cgroup + * @obj: GEM object + * + * This will release a reference on cgroup. + */ +void drm_gem_object_unset_cgroup(struct drm_gem_object *obj) +{ + WARN_ON(atomic64_read(&obj->drmcg_bytes_charged)); + drmcg_put(obj->drmcg); +} +EXPORT_SYMBOL(drm_gem_object_unset_cgroup); + +/** + * drm_gem_object_charge_mem - try charging size bytes to DRM cgroup + * @obj: GEM object which is being charged + * @size: number of bytes to charge + * + * Try to charge @size bytes to GEM object's associated DRM cgroup. This + * will fail if a successful charge would cause the current device memory + * usage to go above the cgroup's GPU memory maximum limit. + * + * Returns 0 on success. Otherwise, an error code is returned. 
+ */ +int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size) +{ + int ret; + + ret = drm_cgroup_try_charge(obj->drmcg, obj->dev, + DRMCG_TYPE_MEM_CURRENT, size); + if (!ret) + atomic64_add(size, &obj->drmcg_bytes_charged); + return ret; +} +EXPORT_SYMBOL(drm_gem_object_charge_mem); + +/** + * drm_gem_object_uncharge_mem - uncharge size bytes from DRM cgroup + * @obj: GEM object which is being uncharged + * @size: number of bytes to uncharge + * + * Uncharge @size bytes from the DRM cgroup associated with specified + * GEM object. + * + * Returns 0 on success. Otherwise, an error code is returned. + */ +void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size) +{ + u64 charged = atomic64_read(&obj->drmcg_bytes_charged); + + if (WARN_ON(!charged)) + return; + if (WARN_ON(size > charged)) + size = charged; + + atomic64_sub(size, &obj->drmcg_bytes_charged); + drm_cgroup_uncharge(obj->drmcg, obj->dev, DRMCG_TYPE_MEM_CURRENT, + size); +} +EXPORT_SYMBOL(drm_gem_object_uncharge_mem); + /** * drm_gem_object_init - initialize an allocated shmem-backed GEM object * @dev: drm_device the object should be initialized for @@ -156,6 +240,8 @@ void drm_gem_private_object_init(struct drm_device *dev, obj->dev = dev; obj->filp = NULL;
+ drm_gem_object_set_cgroup(obj, current); + kref_init(&obj->refcount); obj->handle_count = 0; obj->size = size; @@ -950,6 +1036,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
dma_resv_fini(&obj->_resv); drm_gem_free_mmap_offset(obj); + + /* Release reference on cgroup used with GEM object charging */ + drm_gem_object_unset_cgroup(obj); } EXPORT_SYMBOL(drm_gem_object_release);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 240049566592..06ea10fc17bc 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -37,6 +37,7 @@ #include <linux/kref.h> #include <linux/dma-resv.h>
+#include <drm/drm_cgroup.h> #include <drm/drm_vma_manager.h>
struct dma_buf_map; @@ -222,6 +223,17 @@ struct drm_gem_object { */ struct file *filp;
+ /** + * @drmcg: + * + * cgroup used for charging GEM object page allocations against. This + * is set to the current cgroup during GEM object creation. + * Charging policy is up to the DRM driver to implement and should be + * charged when allocating backing store from device memory. + */ + struct drmcg *drmcg; + atomic64_t drmcg_bytes_charged; + /** * @vma_node: * @@ -417,4 +429,9 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array, int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev, u32 handle, u64 *offset);
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj, + struct task_struct *task); +void drm_gem_object_unset_cgroup(struct drm_gem_object *obj); +int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size); +void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size); #endif /* __DRM_GEM_H__ */
On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote:
This patch adds tracking of which cgroup to make charges against for a given GEM object. We associate the current task's cgroup with GEM objects as they are created. First user of this is for charging DRM cgroup for device memory allocations. The intended behavior is for device drivers to make the cgroup charging calls at the time that backing store is allocated or deallocated for the object.
Exported functions are provided for charging memory allocations for a GEM object to DRM cgroup. To aid in debugging, we store how many bytes have been charged inside the GEM object. Add helpers for setting and clearing the object's associated cgroup which will check that charges are not being leaked.
For shared objects, this may make the charge against a cgroup that is potentially not the same cgroup as the process using the memory. Based on the memory cgroup's discussion of "memory ownership", this seems acceptable [1].
[1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt, "Memory Ownership"
Signed-off-by: Brian Welty brian.welty@intel.com
Since for now we only have the generic gpu/xpu/bikeshed.memory bucket that counts everything, why don't we also charge in these gem functions?
Also, that would remove the need for all these functions exported to drivers. Plus the cgroups setup could also move fully into drm core code, since all drivers (*) support it. That way this would really be a fully generic cgroups controller, and we could land it.
The other things I'd do: - drop gpu scheduling controller from the initial patch series. Yes we'll need it, but we also need vram limits and all these things for full featured controller. Having the minimal viable cgroup controller in upstream would unblock all these other things, and we could discuss them in separate patch series, instead of one big bikeshed that never reaches full consensus.
- the xpu thing is probably real, I just chatted with Android people for their gpu memory accounting needs, and cgroups sounds like a solution for them too. But unlike on desktop/server linux, on Android all shared buffers are allocated from dma-buf heaps, so outside of drm, and hence a cgroup controller that's tightly tied to drm isn't that useful. So I think we should move the controller/charge functions up one level into drivers/gpu/cgroups.
On the naming bikeshed I think gpu is perfectly fine, just explain in the docs that the G stands for "general" :-) Otherwise we might need to rename drivers/gpu to drivers/xpu too, and that's maybe one bikeshed too far. Plus, right now it really is the controller for gpu related memory, even if we extend it to Android (where it would also include video/camera allocations). Extending this cgroup controller to accelerators in general is maybe a bit too much.
- The other disambiguation is how we account dma-buf (well, buffer based) gpu allocations vs HMM gpu memory allocations, that might be worth clarifying in the docs.
- Finally to accelerate this further, I think it'd be good to pull out the cgroup spec for this more minimized series into patch 1, as a draft. That way we could get all stakeholders to ack on that spec, so hopefully we're building something that will work for everyone. That way we can hopefully untangle the controller design discussions from the implementation bikeshedding as much as possible.
Cheers, Daniel
*: vmwgfx is the only non-gem driver, but there's plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.
 drivers/gpu/drm/drm_gem.c | 89 +++++++++++++++++++++++++++++++++++++++
 include/drm/drm_gem.h     | 17 ++++++++
 2 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index c2ce78c4edc3..a12da41eaafe 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -29,6 +29,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
+#include <linux/cgroup_drm.h>
 #include <linux/fs.h>
 #include <linux/file.h>
 #include <linux/module.h>
@@ -112,6 +113,89 @@ drm_gem_init(struct drm_device *dev)
 	return drmm_add_action(dev, drm_gem_init_release, NULL);
 }
 
+/**
+ * drm_gem_object_set_cgroup - associate GEM object with a cgroup
+ * @obj: GEM object which is being associated with a cgroup
+ * @task: task associated with process control group to use
+ *
+ * This will acquire a reference on cgroup and use for charging GEM
+ * memory allocations.
+ * This helper could be extended in future to migrate charges to another
+ * cgroup, print warning if this usage occurs.
+ */
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
+			       struct task_struct *task)
+{
+	/* if object has existing cgroup, we migrate the charge... */
+	if (obj->drmcg) {
+		pr_warn("DRM: need to migrate cgroup charge of %lld\n",
+			atomic64_read(&obj->drmcg_bytes_charged));
+	}
+	obj->drmcg = drmcg_get(task);
+}
+EXPORT_SYMBOL(drm_gem_object_set_cgroup);
+
+/**
+ * drm_gem_object_unset_cgroup - clear GEM object's associated cgroup
+ * @obj: GEM object
+ *
+ * This will release a reference on cgroup.
+ */
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj)
+{
+	WARN_ON(atomic64_read(&obj->drmcg_bytes_charged));
+	drmcg_put(obj->drmcg);
+}
+EXPORT_SYMBOL(drm_gem_object_unset_cgroup);
+
+/**
+ * drm_gem_object_charge_mem - try charging size bytes to DRM cgroup
+ * @obj: GEM object which is being charged
+ * @size: number of bytes to charge
+ *
+ * Try to charge @size bytes to GEM object's associated DRM cgroup. This
+ * will fail if a successful charge would cause the current device memory
+ * usage to go above the cgroup's GPU memory maximum limit.
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size)
+{
+	int ret;
+
+	ret = drm_cgroup_try_charge(obj->drmcg, obj->dev,
+				    DRMCG_TYPE_MEM_CURRENT, size);
+	if (!ret)
+		atomic64_add(size, &obj->drmcg_bytes_charged);
+	return ret;
+}
+EXPORT_SYMBOL(drm_gem_object_charge_mem);
+
+/**
+ * drm_gem_object_uncharge_mem - uncharge size bytes from DRM cgroup
+ * @obj: GEM object which is being uncharged
+ * @size: number of bytes to uncharge
+ *
+ * Uncharge @size bytes from the DRM cgroup associated with specified
+ * GEM object.
+ */
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size)
+{
+	u64 charged = atomic64_read(&obj->drmcg_bytes_charged);
+
+	if (WARN_ON(!charged))
+		return;
+	if (WARN_ON(size > charged))
+		size = charged;
+
+	atomic64_sub(size, &obj->drmcg_bytes_charged);
+	drm_cgroup_uncharge(obj->drmcg, obj->dev, DRMCG_TYPE_MEM_CURRENT,
+			    size);
+}
+EXPORT_SYMBOL(drm_gem_object_uncharge_mem);
+
 /**
  * drm_gem_object_init - initialize an allocated shmem-backed GEM object
  * @dev: drm_device the object should be initialized for
@@ -156,6 +240,8 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	obj->dev = dev;
 	obj->filp = NULL;
 
+	drm_gem_object_set_cgroup(obj, current);
+
 	kref_init(&obj->refcount);
 	obj->handle_count = 0;
 	obj->size = size;
@@ -950,6 +1036,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
 	dma_resv_fini(&obj->_resv);
 	drm_gem_free_mmap_offset(obj);
 
+	/* Release reference on cgroup used with GEM object charging */
+	drm_gem_object_unset_cgroup(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
 
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 240049566592..06ea10fc17bc 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -37,6 +37,7 @@
 #include <linux/kref.h>
 #include <linux/dma-resv.h>
 
+#include <drm/drm_cgroup.h>
 #include <drm/drm_vma_manager.h>
 
 struct dma_buf_map;
@@ -222,6 +223,17 @@ struct drm_gem_object {
 	 */
 	struct file *filp;
 
+	/**
+	 * @drmcg:
+	 *
+	 * cgroup used for charging GEM object page allocations against. This
+	 * is set to the current cgroup during GEM object creation.
+	 * Charging policy is up to the DRM driver to implement and should be
+	 * charged when allocating backing store from device memory.
+	 */
+	struct drmcg *drmcg;
+	atomic64_t drmcg_bytes_charged;
+
 	/**
 	 * @vma_node:
 	 *
@@ -417,4 +429,9 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);
 
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
+			       struct task_struct *task);
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj);
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size);
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size);
 #endif /* __DRM_GEM_H__ */
-- 
2.20.1
Hi
On 09.02.21 11:54, Daniel Vetter wrote:
*: vmwgfx is the only non-gem driver, but there's plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.
Do you have a URL to the discussion?
While I recently worked on GEM, I thought that vmwgfx could easily switch to the GEM internals without adopting the interface.
Personally, I think we should consider renaming struct drm_gem_object et al. It's not strictly GEM UAPI, but nowadays anything memory-related. Maybe drm_mem_object would fit.
Best regards Thomas
On Wed, Feb 10, 2021 at 08:52:29AM +0100, Thomas Zimmermann wrote:
Hi
On 09.02.21 11:54, Daniel Vetter wrote:
*: vmwgfx is the only non-gem driver, but there's plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.
Do you have a URL to the discussion?
While I recent worked on GEM, I thought that vmwgfx could easily switch to the GEM internals without adopting the interface.
Zack Rusin pinged me on irc I think, not sure there's anything on dri-devel. zackr on freenode. I think he's working on this already.
Personally, I think we should consider renaming struct drm_gem_object et al. It's not strictly GEM UAPI, but nowadays anything memory-related. Maybe drm_mem_object would fit.
I think abbreviations we've created that have been around for long enough can stay. Otherwise we'd need to rename drm into something less confusing too :-)
So gem just becomes the buffer based memory management thing in drm, which is the accelerator and display framework in upstream. And ttm is our memory manager for discrete gpus - ttm means translation table manager, and that part just got moved out as a helper that drivers can call as needed :-)
I mean we haven't even managed to rename the cma helpers to dma helpers. The one thing we did manage is rename struct fence to struct dma_fence, because having no prefix at all was just a really bad naming accident.
Cheers, Daniel
Best regards Thomas
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
On 2/9/2021 2:54 AM, Daniel Vetter wrote:
On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote:
This patch adds tracking of which cgroup to make charges against for a given GEM object. We associate the current task's cgroup with GEM objects as they are created. First user of this is for charging DRM cgroup for device memory allocations. The intended behavior is for device drivers to make the cgroup charging calls at the time that backing store is allocated or deallocated for the object.
Exported functions are provided for charging memory allocations for a GEM object to DRM cgroup. To aid in debugging, we store how many bytes have been charged inside the GEM object. Add helpers for setting and clearing the object's associated cgroup which will check that charges are not being leaked.
For shared objects, this may make the charge against a cgroup that is potentially not the same cgroup as the process using the memory. Based on the memory cgroup's discussion of "memory ownership", this seems acceptable [1].
[1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt, "Memory Ownership"
Signed-off-by: Brian Welty brian.welty@intel.com
Since for now we only have the generic gpu/xpu/bikeshed.memory bucket that counts everything, why don't we also charge in these gem functions?
I'm not sure what you mean exactly. You want to merge/move the charging logic proposed in patch #5 (drm_cgroup_try_charge in kernel/cgroup/drm.c) into drm_gem_object_charge_mem()?
Or reading below, I think you are okay keeping the logic separated as is, but you want much of the code in kernel/cgroup/drm.c moved to drivers/gpu/cgroup? Yes, I see that should allow us to reduce the number of exported functions.
Also, that would remove the need for all these functions exported to drivers. Plus the cgroups setup could also move fully into drm core code, since all drivers (*) support it. That way this would really be a fully generic cgroups controller, and we could land it.
Patch #2 proposed to have a few setup functions called during drm device registration. You are suggesting to have this more tightly integrated? Okay, can see what that looks like. It's true most of the exported functions from kernel/cgroup/drm.c were taking a struct drm_device pointer, so seems it can be restructured as you suggest. But I guess we will always need some logic in kernel/cgroup/drm.c to encapsulate the allocation of the 'struct cgroup_subsys_state' for this new controller. But I'm not sure I see how this makes the controller 'fully generic' as you describe.
The other things I'd do:
drop gpu scheduling controller from the initial patch series. Yes we'll need it, but we also need vram limits and all these things for full featured controller. Having the minimal viable cgroup controller in upstream would unblock all these other things, and we could discuss them in separate patch series, instead of one big bikeshed that never reaches full consensus.
the xpu thing is probably real, I just chatted with Android people for their gpu memory accounting needs, and cgroups sounds like a solution for them too. But unlike on desktop/server linux, on Android all shared buffers are allocated from dma-buf heaps, so outside of drm, and hence a cgroup controller that's tightly tied to drm isn't that useful. So I think we should move the controller/charge functions up one level into drivers/gpu/cgroups.
Hmm, so for this, you are asking for the cgroup logic to not directly use DRM data structures? Okay, that's why you suggest drivers/gpu/cgroups and not drivers/gpu/drm/cgroups. So this is your angle to make it 'fully generic' then.....
On the naming bikeshed I think gpu is perfectly fine, just explain in the docs that the G stands for "general" :-) Otherwise we might need to rename drivers/gpu to drivers/xpu too, and that's maybe one bikeshed too far. Plus, right now it really is the controller for gpu related memory, even if we extend it to Android (where it would also include video/camera allocations). Extending this cgroup controller to accelerators in general is maybe a bit too much.
The other disambiguation is how we account dma-buf (well, buffer based) gpu allocations vs HMM gpu memory allocations, that might be worth clarifying in the docs.
Finally to accelerate this further, I think it'd be good to pull out the cgroup spec for this more minimized series into patch 1, as a draft. That way we could get all stakeholders to ack on that spec, so hopefully we're building something that will work for everyone. That way we can hopefully untangle the controller design discussions from the implementation bikeshedding as much as possible.
Okay, thanks for all the inputs. I agree the 'cgroup spec' should be in the first patch. Can redo this way as well.
As much of the code here for the controller was Kenny's work... Kenny, any input on Daniel's suggestions? Otherwise, I can proceed to rework as suggested.
Thanks, -Brian
Cheers, Daniel
*: vmwgfx is the only non-gem driver, but there's plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.
On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote:
On 2/9/2021 2:54 AM, Daniel Vetter wrote:
On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote:
This patch adds tracking of which cgroup to make charges against for a given GEM object. We associate the current task's cgroup with GEM objects as they are created. First user of this is for charging DRM cgroup for device memory allocations. The intended behavior is for device drivers to make the cgroup charging calls at the time that backing store is allocated or deallocated for the object.
Exported functions are provided for charging memory allocations for a GEM object to DRM cgroup. To aid in debugging, we store how many bytes have been charged inside the GEM object. Add helpers for setting and clearing the object's associated cgroup which will check that charges are not being leaked.
For shared objects, this may make the charge against a cgroup that is potentially not the same cgroup as the process using the memory. Based on the memory cgroup's discussion of "memory ownership", this seems acceptable [1].
[1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt, "Memory Ownership"
Signed-off-by: Brian Welty brian.welty@intel.com
Since for now we only have the generic gpu/xpu/bikeshed.memory bucket that counts everything, why don't we also charge in these gem functions?
I'm not sure what you mean exactly. You want to merge/move the charging logic proposed in patch #5 (drm_cgroup_try_charge in kernel/cgroup/drm.c) into drm_gem_object_charge_mem()?
Or reading below, I think you are okay keeping the logic separated as is, but you want much of the code in kernel/cgroup/drm.c moved to drivers/gpu/cgroup? Yes, I see that should allow us to reduce the number of exported functions.
Both. I mean we'd need to look at the code again when it's shuffled, but I'd say:
- cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
- the charging for gem buffers moves into core gem code, so it happens for all gpu drivers and all gem buffer allocations.
- this might or might not mean a bunch less exported stuff from the cgroups files (since you don't need separate steps for linking a gem object to a cgroup from the actual charging), and probably no exports anymore for drivers (since they charge nothing). That will change when we add charging for specific memory pools I guess, but we add that when we add that functionality.
Also, that would remove the need for all these functions exported to drivers. Plus the cgroups setup could also move fully into drm core code, since all drivers (*) support it. That way this would really be a fully generic cgroups controller, and we could land it.
Patch #2 proposed to have a few setup functions called during drm device registration. You are suggesting to have this more tightly integrated?
Yeah essentially if DRIVER_GEM is set drm core would simply set this all up. Since with this we'd always account all gem buffers in cgroups, and it would make basic cgroup support a non-optional part of drm drivers.
Okay, can see what that looks like. It's true most of the exported functions from kernel/cgroup/drm.c were taking a struct drm_device pointer, so seems it can be restructured as you suggest. But I guess we will always need some logic in kernel/cgroup/drm.c to encapsulate the allocation of the 'struct cgroup_subsys_state' for this new controller. But I'm not sure I see how this makes the controller 'fully generic' as you describe.
All DRIVER_GEM drivers would automatically support it. And yes there'll still be some encapsulation ofc.
The other things I'd do:
- drop gpu scheduling controller from the initial patch series. Yes we'll need it, but we also need vram limits and all these things for a full featured controller. Having the minimal viable cgroup controller in upstream would unblock all these other things, and we could discuss them in separate patch series, instead of one big bikeshed that never reaches full consensus.
- the xpu thing is probably real, I just chatted with Android people for their gpu memory accounting needs, and cgroups sounds like a solution for them too. But unlike on desktop/server linux, on Android all shared buffers are allocated from dma-buf heaps, so outside of drm, and hence a cgroup controller that's tightly tied to drm isn't that useful. So I think we should move the controller/charge functions up one level into drivers/gpu/cgroups.
Hmm, so for this, you are asking for the cgroup logic to not directly use DRM data structures? Okay, that's why you suggest drivers/gpu/cgroups and not drivers/gpu/drm/cgroups. So this is your angle to make it 'fully generic' then.....
This is another flavour of "generic", maybe need to split them up: - make it more generic by rolling it out for all DRIVER_GEM - make it more generic by allowing non-drm code to charge gpu memory (android's dma-buf heaps will need that, maybe v4l eventually too)
On the naming bikeshed I think gpu is perfectly fine, just explain in the docs that the G stands for "general" :-) Otherwise we might need to rename drivers/gpu to drivers/xpu too, and that's maybe one bikeshed too far. Plus, right now it really is the controller for gpu related memory, even if we extend it to Android (where it would also include video/camera allocations). Extending this cgroup controller to accelerators in general is maybe a bit too much.
The other disambiguation is how we account dma-buf (well, buffer based) gpu allocations vs HMM gpu memory allocations, that might be worth clarifying in the docs.
Finally, to accelerate this further, I think it'd be good to pull out the cgroup spec for this more minimized series into patch 1, as a draft. That way we could get all stakeholders to ack on that spec, so hopefully we're building something that will work for everyone. That way we can hopefully untangle the controller design discussions from the implementation bikeshedding as much as possible.
Okay, thanks for all the inputs. I agree the 'cgroup spec' should be in first patch. Can redo this way as well.
As much of the code here for the controller was Kenny's work... Kenny, any input on Daniel's suggestions? Otherwise, I can proceed to rework as suggested.
If you're worried about acknowledgement if you end up fully rewriting code: Reference an old version from Kenny from archive and mention in the commit log it's based on that work. There's no requirement that you can only reuse patches from other people entirely unchanged, this kind of collaborative patch development mode happens all the time.
Cheers, Daniel
Thanks, -Brian
Cheers, Daniel
*: vmwgfx is the only non-gem driver, but there's plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.
 drivers/gpu/drm/drm_gem.c | 89 +++++++++++++++++++++++++++++++++++++++
 include/drm/drm_gem.h     | 17 ++++++++
 2 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index c2ce78c4edc3..a12da41eaafe 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -29,6 +29,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
+#include <linux/cgroup_drm.h>
 #include <linux/fs.h>
 #include <linux/file.h>
 #include <linux/module.h>
@@ -112,6 +113,89 @@ drm_gem_init(struct drm_device *dev)
 	return drmm_add_action(dev, drm_gem_init_release, NULL);
 }

+/**
+ * drm_gem_object_set_cgroup - associate GEM object with a cgroup
+ * @obj: GEM object which is being associated with a cgroup
+ * @task: task associated with process control group to use
+ *
+ * This will acquire a reference on cgroup and use for charging GEM
+ * memory allocations.
+ * This helper could be extended in future to migrate charges to another
+ * cgroup; print warning if this usage occurs.
+ */
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
+			       struct task_struct *task)
+{
+	/* if object has existing cgroup, we migrate the charge... */
+	if (obj->drmcg) {
+		pr_warn("DRM: need to migrate cgroup charge of %lld\n",
+			atomic64_read(&obj->drmcg_bytes_charged));
+	}
+	obj->drmcg = drmcg_get(task);
+}
+EXPORT_SYMBOL(drm_gem_object_set_cgroup);
+
+/**
+ * drm_gem_object_unset_cgroup - clear GEM object's associated cgroup
+ * @obj: GEM object
+ *
+ * This will release a reference on cgroup.
+ */
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj)
+{
+	WARN_ON(atomic64_read(&obj->drmcg_bytes_charged));
+	drmcg_put(obj->drmcg);
+}
+EXPORT_SYMBOL(drm_gem_object_unset_cgroup);
+
+/**
+ * drm_gem_object_charge_mem - try charging size bytes to DRM cgroup
+ * @obj: GEM object which is being charged
+ * @size: number of bytes to charge
+ *
+ * Try to charge @size bytes to GEM object's associated DRM cgroup. This
+ * will fail if a successful charge would cause the current device memory
+ * usage to go above the cgroup's GPU memory maximum limit.
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size)
+{
+	int ret;
+
+	ret = drm_cgroup_try_charge(obj->drmcg, obj->dev,
+				    DRMCG_TYPE_MEM_CURRENT, size);
+	if (!ret)
+		atomic64_add(size, &obj->drmcg_bytes_charged);
+	return ret;
+}
+EXPORT_SYMBOL(drm_gem_object_charge_mem);
+
+/**
+ * drm_gem_object_uncharge_mem - uncharge size bytes from DRM cgroup
+ * @obj: GEM object which is being uncharged
+ * @size: number of bytes to uncharge
+ *
+ * Uncharge @size bytes from the DRM cgroup associated with specified
+ * GEM object.
+ */
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size)
+{
+	u64 charged = atomic64_read(&obj->drmcg_bytes_charged);
+
+	if (WARN_ON(!charged))
+		return;
+	if (WARN_ON(size > charged))
+		size = charged;
+	atomic64_sub(size, &obj->drmcg_bytes_charged);
+	drm_cgroup_uncharge(obj->drmcg, obj->dev, DRMCG_TYPE_MEM_CURRENT,
+			    size);
+}
+EXPORT_SYMBOL(drm_gem_object_uncharge_mem);
+
 /**
  * drm_gem_object_init - initialize an allocated shmem-backed GEM object
  * @dev: drm_device the object should be initialized for
@@ -156,6 +240,8 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	obj->dev = dev;
 	obj->filp = NULL;

+	drm_gem_object_set_cgroup(obj, current);
+
 	kref_init(&obj->refcount);
 	obj->handle_count = 0;
 	obj->size = size;
@@ -950,6 +1036,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
 	dma_resv_fini(&obj->_resv);
 	drm_gem_free_mmap_offset(obj);

+	/* Release reference on cgroup used with GEM object charging */
+	drm_gem_object_unset_cgroup(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);

diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 240049566592..06ea10fc17bc 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -37,6 +37,7 @@
 #include <linux/kref.h>
 #include <linux/dma-resv.h>

+#include <drm/drm_cgroup.h>
 #include <drm/drm_vma_manager.h>

 struct dma_buf_map;
@@ -222,6 +223,17 @@ struct drm_gem_object {
 	 */
 	struct file *filp;

+	/**
+	 * @drmcg:
+	 *
+	 * cgroup used for charging GEM object page allocations against. This
+	 * is set to the current cgroup during GEM object creation.
+	 * Charging policy is up to the DRM driver to implement and should be
+	 * charged when allocating backing store from device memory.
+	 */
+	struct drmcg *drmcg;
+	atomic64_t drmcg_bytes_charged;
+
 	/**
 	 * @vma_node:
@@ -417,4 +429,9 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);

+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
+			       struct task_struct *task);
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj);
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size);
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size);
+
 #endif /* __DRM_GEM_H__ */
--
2.20.1
On 2/11/2021 7:34 AM, Daniel Vetter wrote:
On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote:
On 2/9/2021 2:54 AM, Daniel Vetter wrote:
On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote:
Since for now we only have the generic gpu/xpu/bikeshed.memory bucket that counts everything, why don't we also charge in these gem functions?
I'm not sure what you mean exactly. You want to merge/move the charging logic proposed in patch #5 (drm_cgroup_try_charge in kernel/cgroup/drm.c) into drm_gem_object_charge_mem() ?
Or reading below, I think you are okay keeping the logic separated as is, but you want much of the code in kernel/cgroup/drm.c moved to drivers/gpu/cgroup ? Yes, I see that should allow to reduce number of exported functions.
Both. I mean we'd need to look at the code again when it's shuffled, but I'd say:
cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
the charging for gem buffers moves into core gem code, so it happens for all gpu drivers and all gem buffer allocations.
Daniel, I'm not sure we're in sync on what 'charging for general gpu memory' means. Thus far, I have been proposing to charge/uncharge when backing store is allocated/freed. And thus, this would be done in the DRM driver (so then also in the dma-buf exporter). I can't see how we'd hoist this part into drm gem code. The memory limit in this series is for VRAM usage/limits, not GEM buffers...
Unless you are talking about charging for GEM buffer creation? But this is more of a 'soft resource', along the lines of Kenny's earlier GEM buffer limit control. I raised an issue with this then, and at the time, Tejun agreed we should keep to 'hard resource' controls; see [1] and [2].
[1] https://lists.freedesktop.org/archives/dri-devel/2019-May/218071.html [2] https://lists.freedesktop.org/archives/dri-devel/2020-April/262141.html
- this might or might not mean a bunch less exported stuff from the cgroups files (since you don't need separate steps for linking a gem object to a cgroup from the actual charging), and probably no exports anymore for drivers (since they charge nothing). That will change when we add charging for specific memory pools I guess, but we add that when we add tha functionality.
... so considering VRAM charging, then yes, we very much need to have exported functions for drivers to do the charging. But these can be exported from drm.ko (or a new .ko?) instead of the kernel. Is that still the preference? Also, if the number of exported functions is a concern, we can replace some of them with function pointers.
So then returning to this comment of yours:
- cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
If you agree that we are charging just at the backing-store level, then I think the logic belongs in drivers/gpu/drm/cgroup, since charging is done in the DRM driver (also the dma-buf exporter); in other words, it is part of drm. If I understand correctly, dma-buf heaps export system memory and don't need to charge against the gpu controller? I'll need some help to understand the dma-buf heap use case a bit more.
Thanks, -Brian
On Sat, Mar 6, 2021 at 1:44 AM Brian Welty brian.welty@intel.com wrote:
On 2/11/2021 7:34 AM, Daniel Vetter wrote:
On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote:
On 2/9/2021 2:54 AM, Daniel Vetter wrote:
On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote:
Since for now we only have the generic gpu/xpu/bikeshed.memory bucket that counts everything, why don't we also charge in these gem functions?
I'm not sure what you mean exactly. You want to merge/move the charging logic proposed in patch #5 (drm_cgroup_try_charge in kernel/cgroup/drm.c) into drm_gem_object_charge_mem() ?
Or reading below, I think you are okay keeping the logic separated as is, but you want much of the code in kernel/cgroup/drm.c moved to drivers/gpu/cgroup ? Yes, I see that should allow to reduce number of exported functions.
Both. I mean we'd need to look at the code again when it's shuffled, but I'd say:
cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
the charging for gem buffers moves into core gem code, so it happens for all gpu drivers and all gem buffer allocations.
Daniel, I'm not sure we're in sync on what 'charging for general gpu memory' means. Thus far, I have been proposing to charge/uncharge when backing store is allocated/freed. And thus, this would be done in DRM driver (so then also in the dma-buf exporter). I can't see how we'd hoist this part into drm gem code. The memory limit in this series is for VRAM usage/limit not GEM buffers...
Yes this would be at gem buffer creation time. And just to get cgroups for drm up&running.
Unless you are talking about charging for GEM buffer creation? But this is more of a 'soft resource' more along lines of Kenny's earlier GEM buffer limit control. I raised issue with this then, and at the time, Tejun agreed we should keep to 'hard resource' controls, see [1] and [2].
[1] https://lists.freedesktop.org/archives/dri-devel/2019-May/218071.html [2] https://lists.freedesktop.org/archives/dri-devel/2020-April/262141.html
- this might or might not mean a bunch less exported stuff from the cgroups files (since you don't need separate steps for linking a gem object to a cgroup from the actual charging), and probably no exports anymore for drivers (since they charge nothing). That will change when we add charging for specific memory pools I guess, but we add that when we add tha functionality.
... so considering VRAM charging, then yes, we very much need to have exported functions for drivers to do the charging. But these can be exported from drm.ko (or new .ko?) instead of kernel. Is that still preference? Also, if number of exported functions is concern, we can replace some of it with use of function pointers.
So the reason I suggested we drop all this is because we won't charge in drivers, we'll charge in ttm buffer management code. Which we'll adopt for dg1 in upstream. But it will take some time.
So then returning to this comment of yours:
- cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
If you agree that we are charging just at backing-store level, then I think logic belongs in drivers/gpu/drm/cgroup ?? As charging is done in DRM driver (also dma-buf exporter). In other words, part of drm. If I understand, dma-buf heaps is exporter of system memory and doesn't need to charge against gpu controller?? Will need some help to understand the dma-buf heap use case a bit more.
Well we also need to track system gpu memory somehow. Currently that flies under the radar, and nasty userspace can easily exhaust all of system memory with gpu buffers, even if there are otherwise cgroup limits in place. Which is not good. Hence also the overall limit for buffers. -Daniel
drivers/gpu/drm/drm_gem.c | 89 +++++++++++++++++++++++++++++++++++++++ include/drm/drm_gem.h | 17 ++++++++ 2 files changed, 106 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index c2ce78c4edc3..a12da41eaafe 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -29,6 +29,7 @@ #include <linux/slab.h> #include <linux/mm.h> #include <linux/uaccess.h> +#include <linux/cgroup_drm.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/module.h> @@ -112,6 +113,89 @@ drm_gem_init(struct drm_device *dev) return drmm_add_action(dev, drm_gem_init_release, NULL); }
+/**
- drm_gem_object_set_cgroup - associate GEM object with a cgroup
- @obj: GEM object which is being associated with a cgroup
- @task: task associated with process control group to use
- This will acquire a reference on cgroup and use for charging GEM
- memory allocations.
- This helper could be extended in future to migrate charges to another
- cgroup, print warning if this usage occurs.
- */
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
struct task_struct *task)
+{
- /* if object has existing cgroup, we migrate the charge... */
- if (obj->drmcg) {
pr_warn("DRM: need to migrate cgroup charge of %lld\n",
atomic64_read(&obj->drmcg_bytes_charged));
- }
- obj->drmcg = drmcg_get(task);
+} +EXPORT_SYMBOL(drm_gem_object_set_cgroup);
+/**
- drm_gem_object_unset_cgroup - clear GEM object's associated cgroup
- @obj: GEM object
- This will release a reference on cgroup.
- */
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj) +{
- WARN_ON(atomic64_read(&obj->drmcg_bytes_charged));
- drmcg_put(obj->drmcg);
+} +EXPORT_SYMBOL(drm_gem_object_unset_cgroup);
+/**
- drm_gem_object_charge_mem - try charging size bytes to DRM cgroup
- @obj: GEM object which is being charged
- @size: number of bytes to charge
- Try to charge @size bytes to GEM object's associated DRM cgroup. This
- will fail if a successful charge would cause the current device memory
- usage to go above the cgroup's GPU memory maximum limit.
- Returns 0 on success. Otherwise, an error code is returned.
- */
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size) +{
- int ret;
- ret = drm_cgroup_try_charge(obj->drmcg, obj->dev,
DRMCG_TYPE_MEM_CURRENT, size);
- if (!ret)
atomic64_add(size, &obj->drmcg_bytes_charged);
- return ret;
+} +EXPORT_SYMBOL(drm_gem_object_charge_mem);
+/**
- drm_gem_object_uncharge_mem - uncharge size bytes from DRM cgroup
- @obj: GEM object which is being uncharged
- @size: number of bytes to uncharge
- Uncharge @size bytes from the DRM cgroup associated with specified
- GEM object.
- Returns 0 on success. Otherwise, an error code is returned.
- */
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size) +{
- u64 charged = atomic64_read(&obj->drmcg_bytes_charged);
- if (WARN_ON(!charged))
return;
- if (WARN_ON(size > charged))
size = charged;
- atomic64_sub(size, &obj->drmcg_bytes_charged);
- drm_cgroup_uncharge(obj->drmcg, obj->dev, DRMCG_TYPE_MEM_CURRENT,
size);
+} +EXPORT_SYMBOL(drm_gem_object_uncharge_mem);
/**
- drm_gem_object_init - initialize an allocated shmem-backed GEM object
- @dev: drm_device the object should be initialized for
@@ -156,6 +240,8 @@ void drm_gem_private_object_init(struct drm_device *dev, obj->dev = dev; obj->filp = NULL;
- drm_gem_object_set_cgroup(obj, current);
- kref_init(&obj->refcount); obj->handle_count = 0; obj->size = size;
@@ -950,6 +1036,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
dma_resv_fini(&obj->_resv); drm_gem_free_mmap_offset(obj);
- /* Release reference on cgroup used with GEM object charging */
- drm_gem_object_unset_cgroup(obj);
} EXPORT_SYMBOL(drm_gem_object_release);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index 240049566592..06ea10fc17bc 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -37,6 +37,7 @@ #include <linux/kref.h> #include <linux/dma-resv.h>
+#include <drm/drm_cgroup.h> #include <drm/drm_vma_manager.h>
struct dma_buf_map; @@ -222,6 +223,17 @@ struct drm_gem_object { */ struct file *filp;
- /**
- @drmcg:
- cgroup used for charging GEM object page allocations against. This
- is set to the current cgroup during GEM object creation.
- Charging policy is up to the DRM driver to implement and should be
- charged when allocating backing store from device memory.
- */
- struct drmcg *drmcg;
- atomic64_t drmcg_bytes_charged;
- /**
- @vma_node:
@@ -417,4 +429,9 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array, int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev, u32 handle, u64 *offset);
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
struct task_struct *task);
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj); +int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size); +void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size);
#endif /* __DRM_GEM_H__ */
2.20.1
On 3/18/2021 3:16 AM, Daniel Vetter wrote:
On Sat, Mar 6, 2021 at 1:44 AM Brian Welty brian.welty@intel.com wrote:
On 2/11/2021 7:34 AM, Daniel Vetter wrote:
On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote:
On 2/9/2021 2:54 AM, Daniel Vetter wrote:
On Tue, Jan 26, 2021 at 01:46:25PM -0800, Brian Welty wrote:
This patch adds tracking of which cgroup to make charges against for a given GEM object. We associate the current task's cgroup with GEM objects as they are created. First user of this is for charging DRM cgroup for device memory allocations. The intended behavior is for device drivers to make the cgroup charging calls at the time that backing store is allocated or deallocated for the object.
Exported functions are provided for charging memory allocations for a GEM object to DRM cgroup. To aid in debugging, we store how many bytes have been charged inside the GEM object. Add helpers for setting and clearing the object's associated cgroup which will check that charges are not being leaked.
For shared objects, this may make the charge against a cgroup that is potentially not the same cgroup as the process using the memory. Based on the memory cgroup's discussion of "memory ownership", this seems acceptable [1].
[1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt, "Memory Ownership"
Signed-off-by: Brian Welty brian.welty@intel.com
Since for now we only have the generic gpu/xpu/bikeshed.memory bucket that counts everything, why don't we also charge in these gem functions?
I'm not sure what you mean exactly. You want to merge/move the charging logic proposed in patch #5 (drm_cgroup_try_charge in kernel/cgroup/drm.c) into drm_gem_object_charge_mem() ?
Or reading below, I think you are okay keeping the logic separated as is, but you want much of the code in kernel/cgroup/drm.c moved to drivers/gpu/cgroup? Yes, I see that should allow reducing the number of exported functions.
Both. I mean we'd need to look at the code again when it's shuffled, but I'd say:
cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
the charging for gem buffers moves into core gem code, so it happens for all gpu drivers and all gem buffer allocations.
Daniel, I'm not sure we're in sync on what 'charging for general gpu memory' means. Thus far, I have been proposing to charge/uncharge when backing store is allocated/freed. And thus, this would be done in DRM driver (so then also in the dma-buf exporter). I can't see how we'd hoist this part into drm gem code. The memory limit in this series is for VRAM usage/limit not GEM buffers...
Yes this would be at gem buffer creation time. And just to get cgroups for drm up&running.
Okay, but it's not of the ones in Tejun's list to start with: https://lists.freedesktop.org/archives/dri-devel/2020-April/262141.html I hoped we would start by pursuing those (gpu.weight and gpu.memory.high) as first step.
Limiting GEM buffers is essentially controlling virtual memory size, which tends to just always get set to unlimited. Would be nice to get consensus from maintainers before proceeding to implement this.
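For reference, the control files proposed in the cover letter (gpu.memory.max, gpu.memory.current, gpu.sched.runtime) would be exercised from user space along these lines, assuming cgroup v2 mounted at /sys/fs/cgroup. This is purely a sketch of the proposed interface, not an existing one:

```shell
# Create a group and move the current shell into it:
mkdir /sys/fs/cgroup/gputest
echo $$ > /sys/fs/cgroup/gputest/cgroup.procs

# Cap device memory for this group at 256 MiB:
echo 268435456 > /sys/fs/cgroup/gputest/gpu.memory.max

# Observe current device memory usage and accumulated GPU time:
cat /sys/fs/cgroup/gputest/gpu.memory.current
cat /sys/fs/cgroup/gputest/gpu.sched.runtime
```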
Unless you are talking about charging for GEM buffer creation? But this is more of a 'soft resource' more along lines of Kenny's earlier GEM buffer limit control. I raised issue with this then, and at the time, Tejun agreed we should keep to 'hard resource' controls, see [1] and [2].
[1] https://lists.freedesktop.org/archives/dri-devel/2019-May/218071.html [2] https://lists.freedesktop.org/archives/dri-devel/2020-April/262141.html
- this might or might not mean a bunch less exported stuff from the cgroups files (since you don't need separate steps for linking a gem object to a cgroup from the actual charging), and probably no exports anymore for drivers (since they charge nothing). That will change when we add charging for specific memory pools I guess, but we add that when we add that functionality.
... so considering VRAM charging, then yes, we very much need to have exported functions for drivers to do the charging. But these can be exported from drm.ko (or new .ko?) instead of kernel. Is that still preference? Also, if number of exported functions is concern, we can replace some of it with use of function pointers.
So the reason I suggested we drop all this is because we won't charge in drivers, we'll charge in ttm buffer management code. Which we'll adopt for dg1 in upstream. But it will take some time.
Okay, thanks for clarifying. I'm not familiar with where try_charge/uncharge would fit into the ttm model. Will need to look into it more....
So then returning to this comment of yours:
- cgroup code and the charging for general gpu memory moves to drivers/gpu/cgroup, so dma-buf heaps can use it too.
If you agree that we are charging just at backing-store level, then I think logic belongs in drivers/gpu/drm/cgroup ?? As charging is done in DRM driver (also dma-buf exporter). In other words, part of drm. If I understand, dma-buf heaps is exporter of system memory and doesn't need to charge against gpu controller?? Will need some help to understand the dma-buf heap use case a bit more.
Well we also need to track system gpu memory somehow. Currently that flies under the radar, and nasty userspace can just easily exhaust all of system memory with gpu buffers, even if there's otherwise cgroup limits in place. Which is not good. Hence also the overall limit for buffers.
If DRM allows user-space to exhaust all of system memory, this seems to be a gap in enforcement of MEMCG limits for system memory. I tried to look into it when this was discussed in the past.... My guess is that shmem_read_mapping_page_gfp() -> shmem_getpage_gfp() is not choosing the correct MM to charge against in the use case of drivers using shmemfs for backing gem buffers.
-Brian
-Daniel
Thanks, -Brian
Also, that would remove the need for all these functions exported to drivers. Plus the cgroups setup could also move fully into drm core code, since all drivers (*) support it. That way this would really be a fully generic cgroups controller, and we could land it.
Patch #2 proposed to have a few setup functions called during drm device registration. You are suggesting to have this more tightly integrated?
Yeah essentially if DRIVER_GEM is set drm core would simply set this all up. Since with this we'd always account all gem buffers in cgroups, and it would make basic cgroup support a non-optional part of drm drivers.
Okay, can see what that looks like. It's true most of the exported functions from kernel/cgroup/drm.c were taking a struct drm_device pointer, so seems it can be restructured as you suggest. But I guess we will always need some logic in kernel/cgroup/drm.c to encapsulate the allocation of the 'struct cgroup_subsys_state' for this new controller. But I'm not sure I see how this makes the controller 'fully generic' as you describe.
All DRIVER_GEM would automatically support it. And yes there'll still be some encapsulation ofc.
The other things I'd do:
drop gpu scheduling controller from the initial patch series. Yes we'll need it, but we also need vram limits and all these things for full featured controller. Having the minimal viable cgroup controller in upstream would unblock all these other things, and we could discuss them in separate patch series, instead of one big bikeshed that never reaches full consensus.
the xpu thing is probably real, I just chatted with Android people for their gpu memory accounting needs, and cgroups sounds like a solution for them too. But unlike on desktop/server linux, on Android all shared buffers are allocated from dma-buf heaps, so outside of drm, and hence a cgroup controller that's tightly tied to drm isn't that useful. So I think we should move the controller/charge functions up one level into drivers/gpu/cgroups.
Hmm, so for this, you are asking for the cgroup logic to not directly use DRM data structures? Okay, that's why you suggest drivers/gpu/cgroups and not drivers/gpu/drm/cgroups. So this is your angle to make it 'fully generic' then.....
This is another flavour of "generic", maybe need to split them up:
- make it more generic by rolling it out for all DRIVER_GEM
- make it more generic by allowing non-drm code to charge gpu memory (android's dma-buf heaps will need that, maybe v4l eventually too)
On the naming bikeshed I think gpu is perfectly fine, just explain in the docs that the G stands for "general" :-) Otherwise we might need to rename drivers/gpu to drivers/xpu too, and that's maybe one bikeshed too far. Plus, right now it really is the controller for gpu related memory, even if we extend it to Android (where it would also include video/camera allocations). Extending this cgroup controller to accelerators in general is maybe a bit too much.
The other disambiguation is how we account dma-buf (well, buffer based) gpu allocations vs HMM gpu memory allocations, that might be worth clarifying in the docs.
Finally, to accelerate this further, I think it'd be good to pull out the cgroup spec for this more minimized series into patch 1, as a draft. That way we could get all stakeholders to ack on that, so hopefully we're building something that will work for everyone. That way we can hopefully untangle the controller design discussions from the implementation bikeshedding as much as possible.
Okay, thanks for all the inputs. I agree the 'cgroup spec' should be in first patch. Can redo this way as well.
As much of the code here for the controller was Kenny's work... Kenny, any input on Daniel's suggestions? Otherwise, I can proceed to rework as suggested.
If you're worried about acknowledgement if you end up fully rewriting code: Reference an old version from Kenny from archive and mention in the commit log it's based on that work. There's no requirement that you can only reuse patches from other people entirely unchanged, this kind of collaborative patch development mode happens all the time.
Cheers, Daniel
Thanks, -Brian
Cheers, Daniel
*: vmwgfx is the only non-gem driver, but there's plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.
 drivers/gpu/drm/drm_gem.c | 89 +++++++++++++++++++++++++++++++++++++++
 include/drm/drm_gem.h     | 17 ++++++++
 2 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index c2ce78c4edc3..a12da41eaafe 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -29,6 +29,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
+#include <linux/cgroup_drm.h>
 #include <linux/fs.h>
 #include <linux/file.h>
 #include <linux/module.h>
@@ -112,6 +113,89 @@ drm_gem_init(struct drm_device *dev)
 	return drmm_add_action(dev, drm_gem_init_release, NULL);
 }
 
+/**
+ * drm_gem_object_set_cgroup - associate GEM object with a cgroup
+ * @obj: GEM object which is being associated with a cgroup
+ * @task: task associated with process control group to use
+ *
+ * This will acquire a reference on cgroup and use for charging GEM
+ * memory allocations.
+ * This helper could be extended in future to migrate charges to another
+ * cgroup, print warning if this usage occurs.
+ */
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
+			       struct task_struct *task)
+{
+	/* if object has existing cgroup, we migrate the charge... */
+	if (obj->drmcg) {
+		pr_warn("DRM: need to migrate cgroup charge of %lld\n",
+			atomic64_read(&obj->drmcg_bytes_charged));
+	}
+	obj->drmcg = drmcg_get(task);
+}
+EXPORT_SYMBOL(drm_gem_object_set_cgroup);
+
+/**
+ * drm_gem_object_unset_cgroup - clear GEM object's associated cgroup
+ * @obj: GEM object
+ *
+ * This will release a reference on cgroup.
+ */
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj)
+{
+	WARN_ON(atomic64_read(&obj->drmcg_bytes_charged));
+	drmcg_put(obj->drmcg);
+}
+EXPORT_SYMBOL(drm_gem_object_unset_cgroup);
+
+/**
+ * drm_gem_object_charge_mem - try charging size bytes to DRM cgroup
+ * @obj: GEM object which is being charged
+ * @size: number of bytes to charge
+ *
+ * Try to charge @size bytes to GEM object's associated DRM cgroup.  This
+ * will fail if a successful charge would cause the current device memory
+ * usage to go above the cgroup's GPU memory maximum limit.
+ *
+ * Returns 0 on success.  Otherwise, an error code is returned.
+ */
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size)
+{
+	int ret;
+
+	ret = drm_cgroup_try_charge(obj->drmcg, obj->dev,
+				    DRMCG_TYPE_MEM_CURRENT, size);
+	if (!ret)
+		atomic64_add(size, &obj->drmcg_bytes_charged);
+	return ret;
+}
+EXPORT_SYMBOL(drm_gem_object_charge_mem);
+
+/**
+ * drm_gem_object_uncharge_mem - uncharge size bytes from DRM cgroup
+ * @obj: GEM object which is being uncharged
+ * @size: number of bytes to uncharge
+ *
+ * Uncharge @size bytes from the DRM cgroup associated with specified
+ * GEM object.
+ */
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size)
+{
+	u64 charged = atomic64_read(&obj->drmcg_bytes_charged);
+
+	if (WARN_ON(!charged))
+		return;
+	if (WARN_ON(size > charged))
+		size = charged;
+
+	atomic64_sub(size, &obj->drmcg_bytes_charged);
+	drm_cgroup_uncharge(obj->drmcg, obj->dev, DRMCG_TYPE_MEM_CURRENT,
+			    size);
+}
+EXPORT_SYMBOL(drm_gem_object_uncharge_mem);
+
 /**
  * drm_gem_object_init - initialize an allocated shmem-backed GEM object
  * @dev: drm_device the object should be initialized for
@@ -156,6 +240,8 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	obj->dev = dev;
 	obj->filp = NULL;
 
+	drm_gem_object_set_cgroup(obj, current);
+
 	kref_init(&obj->refcount);
 	obj->handle_count = 0;
 	obj->size = size;
@@ -950,6 +1036,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
 	dma_resv_fini(&obj->_resv);
 	drm_gem_free_mmap_offset(obj);
+
+	/* Release reference on cgroup used with GEM object charging */
+	drm_gem_object_unset_cgroup(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);

diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 240049566592..06ea10fc17bc 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -37,6 +37,7 @@
 #include <linux/kref.h>
 #include <linux/dma-resv.h>
 
+#include <drm/drm_cgroup.h>
 #include <drm/drm_vma_manager.h>
 
 struct dma_buf_map;
@@ -222,6 +223,17 @@ struct drm_gem_object {
 	 */
 	struct file *filp;
 
+	/**
+	 * @drmcg:
+	 *
+	 * cgroup used for charging GEM object page allocations against. This
+	 * is set to the current cgroup during GEM object creation.
+	 * Charging policy is up to the DRM driver to implement and should be
+	 * charged when allocating backing store from device memory.
+	 */
+	struct drmcg *drmcg;
+	atomic64_t drmcg_bytes_charged;
+
 	/**
 	 * @vma_node:
@@ -417,4 +429,9 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);
 
+void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
+			       struct task_struct *task);
+void drm_gem_object_unset_cgroup(struct drm_gem_object *obj);
+int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size);
+void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size);
+
 #endif /* __DRM_GEM_H__ */
2.20.1
The other thread about how to manage gpu compute resources reminded me of this one about gpu memory splitting.
On Thu, Mar 18, 2021 at 8:20 PM Brian Welty brian.welty@intel.com wrote:
On 3/18/2021 3:16 AM, Daniel Vetter wrote:
On Sat, Mar 6, 2021 at 1:44 AM Brian Welty brian.welty@intel.com wrote:
On 2/11/2021 7:34 AM, Daniel Vetter wrote:
On Wed, Feb 10, 2021 at 02:00:57PM -0800, Brian Welty wrote:
On 2/9/2021 2:54 AM, Daniel Vetter wrote:
Okay, but it's not of the ones in Tejun's list to start with: https://lists.freedesktop.org/archives/dri-devel/2020-April/262141.html I hoped we would start by pursuing those (gpu.weight and gpu.memory.high) as first step.
Limiting GEM buffers is essentially controlling virtual memory size, which tend to just always get set to unlimited. Would be nice to get consensus from maintainers before proceeding to implement this.
Hm I missed this one from Tejun.
The problem with just assuming that dma-buf in system memory are included as part of the overall memcg is that when we throw a buffer out of vram, it needs to go somewhere. So from that pov keeping all the dma-buf in their separate hierarchy, outside of what memcg sees, simplifies a lot of things. Unlike e.g. numa migration gpu vram migration is often required for correctness (some buffers _have_ to be in in VRAM on some gpus, or the workload just doesn't work), so if we allow admins to misconfigure stuff then there could be a lot of bad surprises.
I'm also expecting more hierarchies like this, e.g. if you have a bunch of GPUs connected with a fast interconnect, then you might want to set an overall limit for VRAM across all gpus, and then maybe additional limits on each gpu node.
This is entirely different if we manage the VRAM as ZONE_DEVICE memory, but in that case I expect memcg will be able to take care of all the managing needs.
Another case to consider is CMA allocations, where it's a zone in normal memory we need to allocate from, so again strange issues with double counting are to be expected.
Or am I worrying about something that shouldn't be a problem? I.e. if workload fail because admins didn't reserve enough main memory in their memcg then we just shrug?
If DRM allows user-space to exhaust all of system memory, this seems to be a gap in enforcement of MEMCG limits for system memory. I tried to look into it when this was discussed in the past.... My guess is that shmem_read_mapping_page_gfp() -> shmem_getpage_gfp() is not choosing the correct MM to charge against in the use case of drivers using shmemfs for backing gem buffers.
Yeah we know about this one since forever. The bug report is roughly as old as the gem/ttm memory managers :-/ So another problem might be that if we now suddenly include gpu memory in the memcg accounting, we start breaking a bunch of workloads that worked just fine beforehand. -Daniel
-Brian
-Daniel
Thanks, -Brian
Also, that would remove the need for all these functions exported to drivers. Plus the cgroups setup could also move fully into drm core code, since all drivers (*) support it That way this would really be a fully generic cgroups controller, and we could land it.
Patch #2 proposed to have a few setup functions called during drm device registration. You are suggesting to have this more tightly integrated?
Yeah essentially if DRIVER_GEM is set drm core would simply set this all up. Since with this we'd always account all gem buffers in cgroups, and it would make basic cgroup support a non-optional part of drm drivers.
Okay, can see what that looks like. It's true most of the exported functions from kernel/cgroup/drm.c were taking a struct drm_device pointer, so seems it can be restructured as you suggest. But I guess we will always need some logic in kernel/cgroup/drm.c to encapsulate the allocation of the 'struct cgroup_subsys_state' for this new controller. But I'm not sure I see how this makes the controller 'fully generic' as you describe.
All DRIVER_GEM would automatacially support it. And yes there'll still be some encapsulation ofc.
The other things I'd do:
drop gpu scheduling controller from the initial patch series. Yes we'll need it, but we also need vram limits and all these things for full featured controller. Having the minimal viable cgroup controller in upstream would unblock all these other things, and we could discuss them in separate patch series, instead of one big bikeshed that never reaches full consensus.
the xpu thing is probably real, I just chatted with Android people for their gpu memory accounting needs, and cgroups sounds like a solution for them too. But unlike on desktop/server linux, on Android all shared buffers are allocated from dma-buf heaps, so outside of drm, and hence a cgroup controller that's tightly tied to drm isn't that useful. So I think we should move the controller/charge functions up one level into drivers/gpu/cgroups.
Hmm, so for this, you are asking for the cgroup logic to not directly use DRM data structures? Okay, that's why you suggest drivers/gpu/cgroups and not drivers/gpu/drm/cgroups. So this is your angle to make it 'fully generic' then.....
This is another flavour of "generic", maybe need to split them up:
- make it more generic by rolling it out for all DRIVER_GEM
- make it more generic by allowing non-drm code to charge gpu memory (android's dma-buf heaps will need that, maybe v4l eventually too)
On the naming bikeshed I think gpu is perfectly fine, just explain in the docs that the G stands for "general" :-) Otherwise we might need to rename drivers/gpu to drivers/xpu too, and that's maybe one bikeshed too far. Plus, right now it really is the controller for gpu related memory, even if we extend it to Android (where it would also include video/camera allocatioons). Extending this cgroup controller to accelerators in general is maybe a bit too much.
The other disambiguation is how we account dma-buf (well, buffer based) gpu allocations vs HMM gpu memory allocations, that might be worth clarifying in the docs.
Finally to accelerate this further, I think it'd be good to pull out the cgroup spec for this more minimized series into patch 1, as a draft. That way we could get all stakeholders to ack on that ack, so hopefully we're building something that will work for everyone. That way we can hopefully untangle the controller design discussions from the implementation bikeshedding as much as possible.
Okay, thanks for all the inputs. I agree the 'cgroup spec' should be in the first patch. I can redo it this way as well.
As much of the code here for the controller was Kenny's work... Kenny, any input on Daniel's suggestions? Otherwise, I can proceed to rework as suggested.
If you're worried about acknowledgement in case you end up fully rewriting the code: reference an old version from Kenny from the archive and mention in the commit log that it's based on that work. There's no requirement that you can only reuse patches from other people entirely unchanged; this kind of collaborative patch development happens all the time.
Cheers, Daniel
Thanks, -Brian
*: vmwgfx is the only non-gem driver, but there are plans to move at least vmwgfx internals (maybe not the uapi, we'll see) over to gem. Once that's done it's truly all gpu memory.

> ---
>  drivers/gpu/drm/drm_gem.c | 89 +++++++++++++++++++++++++++++++++++++++
>  include/drm/drm_gem.h     | 17 ++++++++
>  2 files changed, 106 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index c2ce78c4edc3..a12da41eaafe 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -29,6 +29,7 @@
>  #include <linux/slab.h>
>  #include <linux/mm.h>
>  #include <linux/uaccess.h>
> +#include <linux/cgroup_drm.h>
>  #include <linux/fs.h>
>  #include <linux/file.h>
>  #include <linux/module.h>
> @@ -112,6 +113,89 @@ drm_gem_init(struct drm_device *dev)
>  	return drmm_add_action(dev, drm_gem_init_release, NULL);
>  }
>
> +/**
> + * drm_gem_object_set_cgroup - associate GEM object with a cgroup
> + * @obj: GEM object which is being associated with a cgroup
> + * @task: task associated with process control group to use
> + *
> + * This will acquire a reference on cgroup and use for charging GEM
> + * memory allocations.
> + * This helper could be extended in future to migrate charges to another
> + * cgroup, print warning if this usage occurs.
> + */
> +void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
> +			       struct task_struct *task)
> +{
> +	/* if object has existing cgroup, we migrate the charge... */
> +	if (obj->drmcg) {
> +		pr_warn("DRM: need to migrate cgroup charge of %lld\n",
> +			atomic64_read(&obj->drmcg_bytes_charged));
> +	}
> +	obj->drmcg = drmcg_get(task);
> +}
> +EXPORT_SYMBOL(drm_gem_object_set_cgroup);
> +
> +/**
> + * drm_gem_object_unset_cgroup - clear GEM object's associated cgroup
> + * @obj: GEM object
> + *
> + * This will release a reference on cgroup.
> + */
> +void drm_gem_object_unset_cgroup(struct drm_gem_object *obj)
> +{
> +	WARN_ON(atomic64_read(&obj->drmcg_bytes_charged));
> +	drmcg_put(obj->drmcg);
> +}
> +EXPORT_SYMBOL(drm_gem_object_unset_cgroup);
> +
> +/**
> + * drm_gem_object_charge_mem - try charging size bytes to DRM cgroup
> + * @obj: GEM object which is being charged
> + * @size: number of bytes to charge
> + *
> + * Try to charge @size bytes to GEM object's associated DRM cgroup. This
> + * will fail if a successful charge would cause the current device memory
> + * usage to go above the cgroup's GPU memory maximum limit.
> + *
> + * Returns 0 on success. Otherwise, an error code is returned.
> + */
> +int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size)
> +{
> +	int ret;
> +
> +	ret = drm_cgroup_try_charge(obj->drmcg, obj->dev,
> +				    DRMCG_TYPE_MEM_CURRENT, size);
> +	if (!ret)
> +		atomic64_add(size, &obj->drmcg_bytes_charged);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_gem_object_charge_mem);
> +
> +/**
> + * drm_gem_object_uncharge_mem - uncharge size bytes from DRM cgroup
> + * @obj: GEM object which is being uncharged
> + * @size: number of bytes to uncharge
> + *
> + * Uncharge @size bytes from the DRM cgroup associated with specified
> + * GEM object.
> + */
> +void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size)
> +{
> +	u64 charged = atomic64_read(&obj->drmcg_bytes_charged);
> +
> +	if (WARN_ON(!charged))
> +		return;
> +	if (WARN_ON(size > charged))
> +		size = charged;
> +
> +	atomic64_sub(size, &obj->drmcg_bytes_charged);
> +	drm_cgroup_uncharge(obj->drmcg, obj->dev, DRMCG_TYPE_MEM_CURRENT,
> +			    size);
> +}
> +EXPORT_SYMBOL(drm_gem_object_uncharge_mem);
> +
>  /**
>   * drm_gem_object_init - initialize an allocated shmem-backed GEM object
>   * @dev: drm_device the object should be initialized for
> @@ -156,6 +240,8 @@ void drm_gem_private_object_init(struct drm_device *dev,
>  	obj->dev = dev;
>  	obj->filp = NULL;
>
> +	drm_gem_object_set_cgroup(obj, current);
> +
>  	kref_init(&obj->refcount);
>  	obj->handle_count = 0;
>  	obj->size = size;
> @@ -950,6 +1036,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
>
>  	dma_resv_fini(&obj->_resv);
>  	drm_gem_free_mmap_offset(obj);
> +
> +	/* Release reference on cgroup used with GEM object charging */
> +	drm_gem_object_unset_cgroup(obj);
>  }
>  EXPORT_SYMBOL(drm_gem_object_release);
>
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 240049566592..06ea10fc17bc 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -37,6 +37,7 @@
>  #include <linux/kref.h>
>  #include <linux/dma-resv.h>
>
> +#include <drm/drm_cgroup.h>
>  #include <drm/drm_vma_manager.h>
>
>  struct dma_buf_map;
> @@ -222,6 +223,17 @@ struct drm_gem_object {
>  	 */
>  	struct file *filp;
>
> +	/**
> +	 * @drmcg:
> +	 *
> +	 * cgroup used for charging GEM object page allocations against. This
> +	 * is set to the current cgroup during GEM object creation.
> +	 * Charging policy is up to the DRM driver to implement and should be
> +	 * charged when allocating backing store from device memory.
> +	 */
> +	struct drmcg *drmcg;
> +	atomic64_t drmcg_bytes_charged;
> +
>  	/**
>  	 * @vma_node:
>  	 *
> @@ -417,4 +429,9 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
>  int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
>  			    u32 handle, u64 *offset);
>
> +void drm_gem_object_set_cgroup(struct drm_gem_object *obj,
> +			       struct task_struct *task);
> +void drm_gem_object_unset_cgroup(struct drm_gem_object *obj);
> +int drm_gem_object_charge_mem(struct drm_gem_object *obj, u64 size);
> +void drm_gem_object_uncharge_mem(struct drm_gem_object *obj, u64 size);
>  #endif /* __DRM_GEM_H__ */
> --
> 2.20.1
Hi,
On Mon, 2021-05-10 at 17:36 +0200, Daniel Vetter wrote:
...
If DRM allows user-space to exhaust all of system memory, this seems to be a gap in enforcement of MEMCG limits for system memory. I tried to look into it when this was discussed in the past.... My guess is that shmem_read_mapping_page_gfp() -> shmem_getpage_gfp() is not choosing the correct MM to charge against in the use case of drivers using shmemfs for backing gem buffers.
Yeah we know about this one since forever. The bug report is roughly as old as the gem/ttm memory managers :-/ So another problem might be that if we now suddenly include gpu memory in the memcg accounting, we start breaking a bunch of workloads that worked just fine beforehand.
It's not the first time tightening security requires adapting settings for running workloads...
Workload GPU memory usage needs to be a significant portion of the application's real memory usage to cause the workload to hit limits that were set for it earlier. Therefore I think it is definitely something that a user setting such limits actually cares about.
=> I think the important thing is that the reason for the failures is clear from the OOM message. This works much better if GPU-related memory usage is specifically stated in that message, once that memory starts to be accounted as system memory.
- Eero
To charge device memory allocations, we need to (1) identify the appropriate cgroup to charge (currently decided at object creation time), and (2) make the charging call at the time that memory pages are being allocated.
For (1), see the prior DRM patch which associates the current task's cgroup with GEM objects as they are created. That cgroup will be charged/uncharged for all paging activity against the GEM object.
For (2), we call drm_gem_object_charge_mem() in the .get_pages callback for the GEM object type. Uncharging is done in .put_pages when the memory is marked such that it can be evicted. The try_charge() call will fail with -ENOMEM if the current memory allocation would exceed the cgroup's device memory maximum, allowing the driver to perform memory reclaim.
We also set the total device memory reported by the DRM cgroup by storing it in drm_device.drmcg_props after initializing the LMEM memory regions.
FIXME: releasing the drm cgroup reference requires this additional patch: https://patchwork.freedesktop.org/patch/349029
Signed-off-by: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_region.c | 23 ++++++++++++++++++----
 drivers/gpu/drm/i915/intel_memory_region.c | 13 ++++++++++--
 drivers/gpu/drm/i915/intel_memory_region.h |  2 +-
 4 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index ec28a6cde49b..9fbe91d4d2f1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -16,6 +16,7 @@
 #include "i915_gem_gtt.h"
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
+#include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index 3e3dad22a683..690b36b25984 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -12,11 +12,14 @@ void
 i915_gem_object_put_pages_buddy(struct drm_i915_gem_object *obj,
				 struct sg_table *pages)
 {
-	__intel_memory_region_put_pages_buddy(obj->mm.region, &obj->mm.blocks);
+	u64 freed;
+
+	freed = __intel_memory_region_put_pages_buddy(obj->mm.region,
+						      &obj->mm.blocks);
 	obj->mm.dirty = false;
 	sg_free_table(pages);
 	kfree(pages);
+	drm_gem_object_uncharge_mem(&obj->base, freed);
 }

 int
@@ -25,7 +28,7 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj)
 	const u64 max_segment = i915_sg_segment_size();
 	struct intel_memory_region *mem = obj->mm.region;
 	struct list_head *blocks = &obj->mm.blocks;
-	resource_size_t size = obj->base.size;
+	resource_size_t charged, size = obj->base.size;
 	resource_size_t prev_end;
 	struct i915_buddy_block *block;
 	unsigned int flags;
@@ -44,12 +47,22 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj)
 	}

 	flags = I915_ALLOC_MIN_PAGE_SIZE;
-	if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
+	if (obj->flags & I915_BO_ALLOC_CONTIGUOUS) {
 		flags |= I915_ALLOC_CONTIGUOUS;
+		charged = roundup_pow_of_two(size);
+	} else {
+		charged = size;
+	}
+
+	ret = drm_gem_object_charge_mem(&obj->base, charged);
+	if (ret) {
+		DRM_DEBUG("DRMCG: charge_mem failed for %lld\n", charged);
+		goto err_free_sg;
+	}

 	ret = __intel_memory_region_get_pages_buddy(mem, size, flags, blocks);
 	if (ret)
-		goto err_free_sg;
+		goto err_uncharge;

 	GEM_BUG_ON(list_empty(blocks));

@@ -99,6 +112,8 @@ i915_gem_object_get_pages_buddy(struct drm_i915_gem_object *obj)

 	return 0;

+err_uncharge:
+	drm_gem_object_uncharge_mem(&obj->base, charged);
 err_free_sg:
 	sg_free_table(st);
 	kfree(st);
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 1bfcdd89b241..9b1edbf4361c 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -46,13 +46,18 @@ intel_memory_region_free_pages(struct intel_memory_region *mem,
 	return size;
 }

-void
+u64
 __intel_memory_region_put_pages_buddy(struct intel_memory_region *mem,
				       struct list_head *blocks)
 {
+	u64 freed;
+
 	mutex_lock(&mem->mm_lock);
-	mem->avail += intel_memory_region_free_pages(mem, blocks);
+	freed = intel_memory_region_free_pages(mem, blocks);
+	mem->avail += freed;
 	mutex_unlock(&mem->mm_lock);
+
+	return freed;
 }

 void
@@ -241,6 +246,7 @@ void intel_memory_region_put(struct intel_memory_region *mem)

 int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
 {
+	u64 lmem_total = 0;
 	int err, i;

 	for (i = 0; i < ARRAY_SIZE(i915->mm.regions); i++) {
@@ -260,6 +266,7 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
			break;
		case INTEL_MEMORY_LOCAL:
			mem = intel_setup_fake_lmem(i915);
+			lmem_total += mem->total;
			break;
		}

@@ -278,6 +285,8 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
		i915->mm.regions[i] = mem;
	}

+	i915->drm.drmcg_props.memory_total = lmem_total;
+
	return 0;

 out_cleanup:
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 6ffc0673f005..c9fca951a372 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -109,7 +109,7 @@ struct i915_buddy_block *
 __intel_memory_region_get_block_buddy(struct intel_memory_region *mem,
				       resource_size_t size,
				       unsigned int flags);
-void __intel_memory_region_put_pages_buddy(struct intel_memory_region *mem,
+u64 __intel_memory_region_put_pages_buddy(struct intel_memory_region *mem,
					   struct list_head *blocks);
 void __intel_memory_region_put_block_buddy(struct i915_buddy_block *block);
On 2021/1/27 5:46 AM, Brian Welty wrote:
We'd like to revisit the proposal of a GPU cgroup controller for managing GPU devices but with just a basic set of controls. This series is based on the prior patch series from Kenny Ho [1]. We take Kenny's base patches which implement the basic framework for the controller, but we propose an alternate set of control files. Here we've taken a subset of the controls proposed in earlier discussion on ML here [2].
This series proposes a set of device memory controls (gpu.memory.current, gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage (gpu.sched.runtime). GPU time sharing controls are left as future work. These are implemented within the GPU controller along with integration/usage of the device memory controls by the i915 device driver.
As an accelerator or GPU device is similar in many respects to a CPU with (or without) attached system memory, the basic principle here is try to copy the semantics of existing controls from other controllers when possible and where these controls serve the same underlying purpose. For example, the memory.max and memory.current controls are based on same controls from MEMCG controller.
It seems not to be DRM specific, or even GPU specific. Would we have a universal control group for any accelerator, GPGPU device etc. that holds sharable resources like device memory, compute utility, bandwidth, with an extra control file to select between devices (or vendors)?
e.g. /cgname.device that stores a PCI BDF, or an enum (intel, amdgpu, nvidia, ...), defaulting to none, meaning not enabled.
On 1/28/2021 7:00 PM, Xingyou Chen wrote:
It seems not to be DRM specific, or even GPU specific. Would we have an universal control group for any accelerator, GPGPU device etc, that hold sharable resources like device memory, compute utility, bandwidth, with extra control file to select between devices(or vendors)?
e.g. /cgname.device that stores PCI BDF, or enum(intel, amdgpu, nvidia, ...), defaults to none, means not enabled.
Hi, thanks for the feedback. Yes, I tend to agree. I've asked about this in earlier work; my suggestion was to name the controller something like 'XPU' to make clear that these controls could apply to more than GPUs.
But at least for now, based on Tejun's reply [1], the feedback is to try to keep this controller as small and focused as possible on just GPU, at least until we get some consensus on a set of controls for GPU... but for this we need more active input from the community...
-Brian
[1] https://lists.freedesktop.org/archives/dri-devel/2019-November/243167.html
On Mon, Feb 01, 2021 at 03:21:35PM -0800, Brian Welty wrote:
On 1/28/2021 7:00 PM, Xingyou Chen wrote:
It seems not to be DRM specific, or even GPU specific. Would we have an universal control group for any accelerator, GPGPU device etc, that hold sharable resources like device memory, compute utility, bandwidth, with extra control file to select between devices(or vendors)?
e.g. /cgname.device that stores PCI BDF, or enum(intel, amdgpu, nvidia, ...), defaults to none, means not enabled.
Hi, thanks for the feedback. Yes, I tend to agree. I've asked about this in earlier work; my suggestion was to name the controller something like 'XPU' to make clear that these controls could apply to more than GPUs.
But at least for now, based on Tejun's reply [1], the feedback is to try to keep this controller as small and focused as possible on just GPU, at least until we get some consensus on a set of controls for GPU... but for this we need more active input from the community...
There's also nothing stopping anyone from exposing any kind of XPU as a drivers/gpu device, aside from the "full stack must be open" requirement we have in drm. And frankly, with drm being a very confusing acronym, we could also rename GPU to be the "general processing unit" subsystem :-) -Daniel
-Brian
[1] https://lists.freedesktop.org/archives/dri-devel/2019-November/243167.html