From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Same old work, but now rebased, and the series ends with some DRM documentation proposing the common specification which should enable nice common userspace tools to be written.
For the moment I only have intel_gpu_top converted to use this and that seems to work okay.
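As an illustration, and not part of the series: the core accounting such a tool needs on top of this output is a clamped delta between two samples of a drm-engine-<name> counter. A minimal sketch, with made-up names, following the monotonicity rules proposed in the spec below:

#include <stdint.h>

/*
 * Illustrative only. Turn two fdinfo samples of a drm-engine-<name>
 * counter into a busyness percentage, clamping for the transient
 * non-monotonicity the proposed spec permits.
 */
static double engine_busy_pct(uint64_t prev_busy_ns, uint64_t curr_busy_ns,
                              uint64_t wall_delta_ns)
{
        /* Spec allows a transiently lower value - clamp instead of going negative. */
        uint64_t busy_delta_ns =
                curr_busy_ns > prev_busy_ns ? curr_busy_ns - prev_busy_ns : 0;

        return wall_delta_ns ? 100.0 * (double)busy_delta_ns / wall_delta_ns : 0.0;
}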
v2: * Added prototype of possible amdgpu changes and spec updates to align with the common spec.
Tvrtko Ursulin (8):
  drm/i915: Explicitly track DRM clients
  drm/i915: Make GEM contexts track DRM clients
  drm/i915: Track runtime spent in closed and unreachable GEM contexts
  drm/i915: Track all user contexts per client
  drm/i915: Track context current active time
  drm: Document fdinfo format specification
  drm/i915: Expose client engine utilisation via fdinfo
  drm/amdgpu: Convert to common fdinfo format
 Documentation/gpu/amdgpu.rst                  |  26 ++++
 Documentation/gpu/drm-usage-stats.rst         | 108 +++++++++++++
 Documentation/gpu/i915.rst                    |  27 ++++
 Documentation/gpu/index.rst                   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c    |  18 ++-
 drivers/gpu/drm/i915/Makefile                 |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  42 ++++-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_context.c       |  27 +++-
 drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  24 ++-
 .../drm/i915/gt/intel_execlists_submission.c  |  23 ++-
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 ++--
 drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 +++
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
 drivers/gpu/drm/i915/i915_drm_client.c        | 143 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_drm_client.h        |  66 ++++++++
 drivers/gpu/drm/i915/i915_drv.c               |   9 ++
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem.c               |  21 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         |   9 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
 23 files changed, 581 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/gpu/drm-usage-stats.rst
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tracking DRM clients more explicitly will allow later patches to accumulate past and current GPU usage in a centralised place and also consolidate access to owning task pid/name.
A unique client id is also assigned for the purpose of distinguishing/consolidating between multiple file descriptors owned by the same process.
v2: (Chris Wilson)
 * Enclose new members into dedicated structs.
 * Protect against failed sysfs registration.
v3: * sysfs_attr_init.
v4: * Fix for internal clients.
v5:
 * Use cyclic ida for client id. (Chris)
 * Do not leak pid reference. (Chris)
 * Tidy code with some locals.
v6:
 * Use xa_alloc_cyclic to simplify locking. (Chris)
 * No need to unregister individual sysfs files. (Chris)
 * Rebase on top of fpriv kref.
 * Track client closed status and reflect in sysfs.
v7: * Make drm_client more standalone concept.
v8:
 * Simplify sysfs show. (Chris)
 * Always track name and pid.
v9: * Fix cyclic id assignment.
v10:
 * No need for a mutex around xa_alloc_cyclic.
 * Refactor sysfs into own function.
 * Unregister sysfs before freeing pid and name.
 * Move clients setup into own function.
v11: * Call clients init directly from driver init. (Chris)
v12: * Do not fail client add on id wrap. (Maciej)
v13 (Lucas): Rebase.
v14: * Dropped sysfs bits.
v15:
 * Dropped tracking of pid and name.
 * Dropped RCU freeing of the client object.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v11
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v11
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile          |  5 +-
 drivers/gpu/drm/i915/i915_drm_client.c | 68 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drm_client.h | 50 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.c        |  6 +++
 drivers/gpu/drm/i915/i915_drv.h        |  5 ++
 drivers/gpu/drm/i915/i915_gem.c        | 21 ++++++--
 6 files changed, 150 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 10b3bb6207ba..784f99ca11fc 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -33,8 +33,9 @@ subdir-ccflags-y += -I$(srctree)/$(src)
 # Please keep these build lists sorted!

 # core driver code
-i915-y += i915_drv.o \
-	  i915_config.o \
+i915-y += i915_config.o \
+	  i915_drm_client.o \
+	  i915_drv.o \
 	  i915_irq.o \
 	  i915_getparam.o \
 	  i915_mitigations.o \
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
new file mode 100644
index 000000000000..e61e9ba15256
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "i915_drm_client.h"
+#include "i915_gem.h"
+#include "i915_utils.h"
+
+void i915_drm_clients_init(struct i915_drm_clients *clients,
+			   struct drm_i915_private *i915)
+{
+	clients->i915 = i915;
+	clients->next_id = 0;
+
+	xa_init_flags(&clients->xarray, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
+}
+
+struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients)
+{
+	struct i915_drm_client *client;
+	struct xarray *xa = &clients->xarray;
+	int ret;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return ERR_PTR(-ENOMEM);
+
+	xa_lock_irq(xa);
+	ret = __xa_alloc_cyclic(xa, &client->id, client, xa_limit_32b,
+				&clients->next_id, GFP_KERNEL);
+	xa_unlock_irq(xa);
+	if (ret < 0)
+		goto err;
+
+	kref_init(&client->kref);
+	client->clients = clients;
+
+	return client;
+
+err:
+	kfree(client);
+
+	return ERR_PTR(ret);
+}
+
+void __i915_drm_client_free(struct kref *kref)
+{
+	struct i915_drm_client *client =
+		container_of(kref, typeof(*client), kref);
+	struct xarray *xa = &client->clients->xarray;
+	unsigned long flags;
+
+	xa_lock_irqsave(xa, flags);
+	__xa_erase(xa, client->id);
+	xa_unlock_irqrestore(xa, flags);
+	kfree(client);
+}
+
+void i915_drm_clients_fini(struct i915_drm_clients *clients)
+{
+	GEM_BUG_ON(!xa_empty(&clients->xarray));
+	xa_destroy(&clients->xarray);
+}
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
new file mode 100644
index 000000000000..e8986ad51176
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __I915_DRM_CLIENT_H__
+#define __I915_DRM_CLIENT_H__
+
+#include <linux/kref.h>
+#include <linux/xarray.h>
+
+struct drm_i915_private;
+
+struct i915_drm_clients {
+	struct drm_i915_private *i915;
+
+	struct xarray xarray;
+	u32 next_id;
+};
+
+struct i915_drm_client {
+	struct kref kref;
+
+	unsigned int id;
+
+	struct i915_drm_clients *clients;
+};
+
+void i915_drm_clients_init(struct i915_drm_clients *clients,
+			   struct drm_i915_private *i915);
+
+static inline struct i915_drm_client *
+i915_drm_client_get(struct i915_drm_client *client)
+{
+	kref_get(&client->kref);
+	return client;
+}
+
+void __i915_drm_client_free(struct kref *kref);
+
+static inline void i915_drm_client_put(struct i915_drm_client *client)
+{
+	kref_put(&client->kref, __i915_drm_client_free);
+}
+
+struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
+
+void i915_drm_clients_fini(struct i915_drm_clients *clients);
+
+#endif /* !__I915_DRM_CLIENT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 30d8cd8c69b1..bb628eade92a 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -68,6 +68,7 @@
 #include "gt/intel_rc6.h"

 #include "i915_debugfs.h"
+#include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_ioc32.h"
 #include "i915_irq.h"
@@ -343,6 +344,8 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)

 	intel_gt_init_early(&dev_priv->gt, dev_priv);

+	i915_drm_clients_init(&dev_priv->clients, dev_priv);
+
 	i915_gem_init_early(dev_priv);

 	/* This must be called before any calls to HAS_PCH_* */
@@ -362,6 +365,7 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)

 err_gem:
 	i915_gem_cleanup_early(dev_priv);
+	i915_drm_clients_fini(&dev_priv->clients);
 	intel_gt_driver_late_release(&dev_priv->gt);
 	intel_region_ttm_device_fini(dev_priv);
 err_ttm:
@@ -381,6 +385,7 @@ static void i915_driver_late_release(struct drm_i915_private *dev_priv)
 	intel_irq_fini(dev_priv);
 	intel_power_domains_cleanup(dev_priv);
 	i915_gem_cleanup_early(dev_priv);
+	i915_drm_clients_fini(&dev_priv->clients);
 	intel_gt_driver_late_release(&dev_priv->gt);
 	intel_region_ttm_device_fini(dev_priv);
 	vlv_suspend_cleanup(dev_priv);
@@ -996,6 +1001,7 @@ static void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;

 	i915_gem_context_close(file);
+	i915_drm_client_put(file_priv->client);

 	kfree_rcu(file_priv, rcu);

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c4747f4407ef..338d384c31eb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -96,6 +96,7 @@
 #include "intel_wakeref.h"
 #include "intel_wopcm.h"

+#include "i915_drm_client.h"
 #include "i915_gem.h"
 #include "i915_gem_gtt.h"
 #include "i915_gpu_error.h"
@@ -284,6 +285,8 @@ struct drm_i915_file_private {
 	/** ban_score: Accumulated score of all ctx bans and fast hangs. */
 	atomic_t ban_score;
 	unsigned long hang_timestamp;
+
+	struct i915_drm_client *client;
 };

 /* Interface history:
@@ -1218,6 +1221,8 @@ struct drm_i915_private {

 	struct i915_pmu pmu;

+	struct i915_drm_clients clients;
+
 	struct i915_hdcp_comp_master *hdcp_master;
 	bool hdcp_comp_added;

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 590efc8b0265..d6f829d49a28 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1179,25 +1179,40 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv;
-	int ret;
+	struct i915_drm_client *client;
+	int ret = -ENOMEM;

 	DRM_DEBUG("\n");

 	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
 	if (!file_priv)
-		return -ENOMEM;
+		goto err_alloc;
+
+	client = i915_drm_client_add(&i915->clients);
+	if (IS_ERR(client)) {
+		ret = PTR_ERR(client);
+		goto err_client;
+	}

 	file->driver_priv = file_priv;
 	file_priv->dev_priv = i915;
 	file_priv->file = file;
+	file_priv->client = client;

 	file_priv->bsd_engine = -1;
 	file_priv->hang_timestamp = jiffies;

 	ret = i915_gem_context_open(i915, file);
 	if (ret)
-		kfree(file_priv);
+		goto err_context;
+
+	return 0;

+err_context:
+	i915_drm_client_put(client);
+err_client:
+	kfree(file_priv);
+err_alloc:
 	return ret;
 }
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Make GEM contexts keep a reference to i915_drm_client for the whole of their lifetime, which will come in handy in the following patches.
v2: Don't bother supporting selftests contexts from debugfs. (Chris)
v3 (Lucas): Finish constructing ctx before adding it to the list.
v4 (Ram): Rebase.
v5: Trivial rebase for proto ctx changes.
v6: Rebase after clients no longer track name and pid.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v5
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v5
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 5 +++++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 3 +++
 2 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7d6f52d8a801..3bf409cf0214 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -988,6 +988,9 @@ void i915_gem_context_release(struct kref *ref)
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));

+	if (ctx->client)
+		i915_drm_client_put(ctx->client);
+
 	mutex_destroy(&ctx->engines_mutex);
 	mutex_destroy(&ctx->lut_mutex);

@@ -1436,6 +1439,8 @@ static void gem_context_register(struct i915_gem_context *ctx,
 	ctx->file_priv = fpriv;

 	ctx->pid = get_task_pid(current, PIDTYPE_PID);
+	ctx->client = i915_drm_client_get(fpriv->client);
+
 	snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
 		 current->comm, pid_nr(ctx->pid));

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 94c03a97cb77..e1bca913818e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -277,6 +277,9 @@ struct i915_gem_context {
 	/** @link: place with &drm_i915_private.context_list */
 	struct list_head link;

+	/** @client: struct i915_drm_client */
+	struct i915_drm_client *client;
+
 	/**
 	 * @ref: reference count
 	 *
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
As GEM contexts are closed or abandoned we want the owning DRM client to remember how much GPU time they used (per engine class) so we can later use it for smarter purposes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 25 +++++++++++++++++++--
 drivers/gpu/drm/i915/i915_drm_client.h      |  7 ++++++
 2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 3bf409cf0214..4b93fcb11914 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -841,23 +841,44 @@ static void free_engines_rcu(struct rcu_head *rcu)
 	free_engines(engines);
 }

+static void accumulate_runtime(struct i915_drm_client *client,
+			       struct i915_gem_engines *engines)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+
+	if (!client)
+		return;
+
+	/* Transfer accumulated runtime to the parent GEM context. */
+	for_each_gem_engine(ce, engines, it) {
+		unsigned int class = ce->engine->uabi_class;
+
+		GEM_BUG_ON(class >= ARRAY_SIZE(client->past_runtime));
+		atomic64_add(intel_context_get_total_runtime_ns(ce),
+			     &client->past_runtime[class]);
+	}
+}
+
 static int __i915_sw_fence_call
 engines_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 {
 	struct i915_gem_engines *engines =
 		container_of(fence, typeof(*engines), fence);
+	struct i915_gem_context *ctx = engines->ctx;

 	switch (state) {
 	case FENCE_COMPLETE:
 		if (!list_empty(&engines->link)) {
-			struct i915_gem_context *ctx = engines->ctx;
 			unsigned long flags;

 			spin_lock_irqsave(&ctx->stale.lock, flags);
 			list_del(&engines->link);
 			spin_unlock_irqrestore(&ctx->stale.lock, flags);
 		}
-		i915_gem_context_put(engines->ctx);
+		accumulate_runtime(ctx->client, engines);
+		i915_gem_context_put(ctx);
+
 		break;

 	case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index e8986ad51176..9d80d9f715ee 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -9,6 +9,8 @@
 #include <linux/kref.h>
 #include <linux/xarray.h>

+#include "gt/intel_engine_types.h"
+
 struct drm_i915_private;

 struct i915_drm_clients {
@@ -24,6 +26,11 @@ struct i915_drm_client {
 	unsigned int id;

 	struct i915_drm_clients *clients;
+
+	/**
+	 * @past_runtime: Accumulation of pphwsp runtimes from closed contexts.
+	 */
+	atomic64_t past_runtime[MAX_ENGINE_CLASS + 1];
 };

 void i915_drm_clients_init(struct i915_drm_clients *clients,
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We soon want to start answering questions like how much GPU time the contexts belonging to a client which has exited are still using.

To enable this we start tracking all contexts belonging to a client on a separate list.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 12 ++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h |  3 +++
 drivers/gpu/drm/i915/i915_drm_client.c            |  2 ++
 drivers/gpu/drm/i915/i915_drm_client.h            |  5 +++++
 4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 4b93fcb11914..9f32540f97bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1215,6 +1215,7 @@ static void set_closed_name(struct i915_gem_context *ctx)

 static void context_close(struct i915_gem_context *ctx)
 {
+	struct i915_drm_client *client;
 	struct i915_address_space *vm;

 	/* Flush any concurrent set_engines() */
@@ -1247,6 +1248,13 @@ static void context_close(struct i915_gem_context *ctx)
 	list_del(&ctx->link);
 	spin_unlock(&ctx->i915->gem.contexts.lock);

+	client = ctx->client;
+	if (client) {
+		spin_lock(&client->ctx_lock);
+		list_del_rcu(&ctx->client_link);
+		spin_unlock(&client->ctx_lock);
+	}
+
 	mutex_unlock(&ctx->mutex);

 	/*
@@ -1469,6 +1477,10 @@ static void gem_context_register(struct i915_gem_context *ctx,
 	old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
 	WARN_ON(old);

+	spin_lock(&ctx->client->ctx_lock);
+	list_add_tail_rcu(&ctx->client_link, &ctx->client->ctx_list);
+	spin_unlock(&ctx->client->ctx_lock);
+
 	spin_lock(&i915->gem.contexts.lock);
 	list_add_tail(&ctx->link, &i915->gem.contexts.list);
 	spin_unlock(&i915->gem.contexts.lock);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index e1bca913818e..4eab17591f3c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -280,6 +280,9 @@ struct i915_gem_context {
 	/** @client: struct i915_drm_client */
 	struct i915_drm_client *client;

+	/** link: &drm_client.context_list */
+	struct list_head client_link;
+
 	/**
 	 * @ref: reference count
 	 *
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index e61e9ba15256..91a8559bebf7 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -38,6 +38,8 @@ struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients)
 		goto err;

 	kref_init(&client->kref);
+	spin_lock_init(&client->ctx_lock);
+	INIT_LIST_HEAD(&client->ctx_list);
 	client->clients = clients;

 	return client;
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 9d80d9f715ee..7416e18aa33c 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -7,6 +7,8 @@
 #define __I915_DRM_CLIENT_H__

 #include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
 #include <linux/xarray.h>

 #include "gt/intel_engine_types.h"
@@ -25,6 +27,9 @@ struct i915_drm_client {

 	unsigned int id;

+	spinlock_t ctx_lock; /* For add/remove from ctx_list. */
+	struct list_head ctx_list; /* List of contexts belonging to client. */
+
 	struct i915_drm_clients *clients;

 	/**
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Track context active (on hardware) status together with the start timestamp.
This will be used to provide better granularity of context runtime reporting in conjunction with already tracked pphwsp accumulated runtime.
The latter is only updated on context save so does not give us visibility to any currently executing work.
As part of the patch the existing runtime tracking data is moved under the new ce->stats member and updated under the seqlock. This provides the ability to atomically read out accumulated plus active runtime.
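For readers unfamiliar with the pattern, the read side of such a seqlock-protected scheme looks roughly as below. A sketch only: the lock member is hypothetical (not part of the diff in this patch) and the cycles-to-ns conversion is elided for brevity.

/*
 * Sketch, not from the patch: retry the read until no writer raced us,
 * giving an atomic view of accumulated plus currently active runtime.
 */
static u64 read_total_runtime(struct intel_context_stats *stats)
{
	unsigned int seq;
	u64 total;

	do {
		seq = read_seqbegin(&stats->lock); /* hypothetical seqlock_t member */
		total = stats->runtime.total;
		if (stats->active)
			total += intel_context_clock() - stats->active;
	} while (read_seqretry(&stats->lock, seq));

	return total;
}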
v2: * Rename and make __intel_context_get_active_time unlocked.
v3: * Use GRAPHICS_VER.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 27 ++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_context.h       | 15 ++++------
 drivers/gpu/drm/i915/gt/intel_context_types.h | 24 +++++++++++-----
 .../drm/i915/gt/intel_execlists_submission.c  | 23 ++++++++++----
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |  4 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 27 ++++++++--------
 drivers/gpu/drm/i915/gt/intel_lrc.h           | 24 ++++++++++++++
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 10 +++---
 drivers/gpu/drm/i915/i915_gpu_error.c         |  9 +++---
 drivers/gpu/drm/i915/i915_gpu_error.h         |  2 +-
 10 files changed, 116 insertions(+), 49 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bd63813c8a80..06816690ffc7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -374,7 +374,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->ring = NULL;
 	ce->ring_size = SZ_4K;

-	ewma_runtime_init(&ce->runtime.avg);
+	ewma_runtime_init(&ce->stats.runtime.avg);

 	ce->vm = i915_vm_get(engine->gt->vm);

@@ -500,6 +500,31 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }

+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
+{
+	u64 total, active;
+
+	total = ce->stats.runtime.total;
+	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+		total *= ce->engine->gt->clock_period_ns;
+
+	active = READ_ONCE(ce->stats.active);
+	if (active)
+		active = intel_context_clock() - active;
+
+	return total + active;
+}
+
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+{
+	u64 avg = ewma_runtime_read(&ce->stats.runtime.avg);
+
+	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+		avg *= ce->engine->gt->clock_period_ns;
+
+	return avg;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index b10cbe8fee99..093e2423e92b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -245,18 +245,13 @@ intel_context_clear_nopreempt(struct intel_context *ce)
 	clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
 }

-static inline u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
-{
-	const u32 period = ce->engine->gt->clock_period_ns;
-
-	return READ_ONCE(ce->runtime.total) * period;
-}
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);

-static inline u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+static inline u64 intel_context_clock(void)
 {
-	const u32 period = ce->engine->gt->clock_period_ns;
-
-	return mul_u32_u32(ewma_runtime_read(&ce->runtime.avg), period);
+	/* As we mix CS cycles with CPU clocks, use the raw monotonic clock. */
+	return ktime_get_raw_fast_ns();
 }

 #endif /* __INTEL_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 90026c177105..9c68fda36c40 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -33,6 +33,9 @@ struct intel_context_ops {
 #define COPS_HAS_INFLIGHT_BIT 0
 #define COPS_HAS_INFLIGHT BIT(COPS_HAS_INFLIGHT_BIT)

+#define COPS_RUNTIME_CYCLES_BIT 1
+#define COPS_RUNTIME_CYCLES BIT(COPS_RUNTIME_CYCLES_BIT)
+
 	int (*alloc)(struct intel_context *ce);

 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
@@ -111,14 +114,19 @@ struct intel_context {
 	} lrc;
 	u32 tag; /* cookie passed to HW to track this context on submission */

-	/* Time on GPU as tracked by the hw. */
-	struct {
-		struct ewma_runtime avg;
-		u64 total;
-		u32 last;
-		I915_SELFTEST_DECLARE(u32 num_underflow);
-		I915_SELFTEST_DECLARE(u32 max_underflow);
-	} runtime;
+	/** stats: Context GPU engine busyness tracking. */
+	struct intel_context_stats {
+		u64 active;
+
+		/* Time on GPU as tracked by the hw. */
+		struct {
+			struct ewma_runtime avg;
+			u64 total;
+			u32 last;
+			I915_SELFTEST_DECLARE(u32 num_underflow);
+			I915_SELFTEST_DECLARE(u32 max_underflow);
+		} runtime;
+	} stats;

 	unsigned int active_count; /* protected by timeline->mutex */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 56e25090da67..31a426f3d984 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -595,8 +595,6 @@ static void __execlists_schedule_out(struct i915_request * const rq,
 		GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag));
 		__set_bit(ccid - 1, &engine->context_tag);
 	}
-
-	lrc_update_runtime(ce);
 	intel_engine_context_out(engine);
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 	if (engine->fw_domain && !--engine->fw_active)
@@ -1948,8 +1946,23 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 	 * and merits a fresh timeslice. We reinstall the timer after
 	 * inspecting the queue to see if we need to resubmit.
 	 */
-	if (*prev != *execlists->active) /* elide lite-restores */
+	if (*prev != *execlists->active) { /* elide lite-restores */
+		/*
+		 * Note the inherent discrepancy between the HW runtime,
+		 * recorded as part of the context switch, and the CPU
+		 * adjustment for active contexts. We have to hope that
+		 * the delay in processing the CS event is very small
+		 * and consistent. It works to our advantage to have
+		 * the CPU adjustment _undershoot_ (i.e. start later than)
+		 * the CS timestamp so we never overreport the runtime
+		 * and correct ourselves later when updating from HW.
+		 */
+		if (*prev)
+			lrc_runtime_stop((*prev)->context);
+		if (*execlists->active)
+			lrc_runtime_start((*execlists->active)->context);
 		new_timeslice(execlists);
+	}
 	return inactive;
 }
@@ -2534,7 +2547,7 @@ static int execlists_context_alloc(struct intel_context *ce)
 }

 static const struct intel_context_ops execlists_context_ops = {
-	.flags = COPS_HAS_INFLIGHT,
+	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,

 	.alloc = execlists_context_alloc,

@@ -3494,7 +3507,7 @@ static void virtual_context_exit(struct intel_context *ce)
 }

 static const struct intel_context_ops virtual_context_ops = {
-	.flags = COPS_HAS_INFLIGHT,
+	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,

 	.alloc = virtual_context_alloc,

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
index 9f0e729d2d15..aa1ecc302865 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
@@ -159,6 +159,10 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
 	if (gt->clock_frequency)
 		gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);

+	/* Icelake appears to use another fixed frequency for CTX_TIMESTAMP */
+	if (GRAPHICS_VER(gt->i915) == 11)
+		gt->clock_period_ns = NSEC_PER_SEC / 13750000;
+
 	GT_TRACE(gt,
 		 "Using clock frequency: %dkHz, period: %dns, wrap: %lldms\n",
 		 gt->clock_frequency / 1000,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 8ada1afe3d22..eaaf57bb44f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -642,7 +642,7 @@ static void init_common_regs(u32 * const regs,
 			CTX_CTRL_RS_CTX_ENABLE);
 	regs[CTX_CONTEXT_CONTROL] = ctl;

-	regs[CTX_TIMESTAMP] = ce->runtime.last;
+	regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
 }

 static void init_wa_bb_regs(u32 * const regs,
@@ -1565,35 +1565,36 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
 	}
 }

-static void st_update_runtime_underflow(struct intel_context *ce, s32 dt)
+static void st_runtime_underflow(struct intel_context_stats *stats, s32 dt)
 {
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-	ce->runtime.num_underflow++;
-	ce->runtime.max_underflow = max_t(u32, ce->runtime.max_underflow, -dt);
+	stats->runtime.num_underflow++;
+	stats->runtime.max_underflow =
+		max_t(u32, stats->runtime.max_underflow, -dt);
 #endif
 }

 void lrc_update_runtime(struct intel_context *ce)
 {
+	struct intel_context_stats *stats = &ce->stats;
 	u32 old;
 	s32 dt;

-	if (intel_context_is_barrier(ce))
+	old = stats->runtime.last;
+	stats->runtime.last = lrc_get_runtime(ce);
+	dt = stats->runtime.last - old;
+	if (!dt)
 		return;

-	old = ce->runtime.last;
-	ce->runtime.last = lrc_get_runtime(ce);
-	dt = ce->runtime.last - old;
-
 	if (unlikely(dt < 0)) {
 		CE_TRACE(ce, "runtime underflow: last=%u, new=%u, delta=%d\n",
-			 old, ce->runtime.last, dt);
-		st_update_runtime_underflow(ce, dt);
+			 old, stats->runtime.last, dt);
+		st_runtime_underflow(stats, dt);
 		return;
 	}

-	ewma_runtime_add(&ce->runtime.avg, dt);
-	ce->runtime.total += dt;
+	ewma_runtime_add(&stats->runtime.avg, dt);
+	stats->runtime.total += dt;
 }

 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 7f697845c4cf..8073674538d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -79,4 +79,28 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce)
 	return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]);
 }

+static inline void lrc_runtime_start(struct intel_context *ce)
+{
+	struct intel_context_stats *stats = &ce->stats;
+
+	if (intel_context_is_barrier(ce))
+		return;
+
+	if (stats->active)
+		return;
+
+	WRITE_ONCE(stats->active, intel_context_clock());
+}
+
+static inline void lrc_runtime_stop(struct intel_context *ce)
+{
+	struct intel_context_stats *stats = &ce->stats;
+
+	if (!stats->active)
+		return;
+
+	lrc_update_runtime(ce);
+	WRITE_ONCE(stats->active, 0);
+}
+
 #endif /* __INTEL_LRC_H__ */
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index b0977a3b699b..9b9ee0fe1512 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1751,8 +1751,8 @@ static int __live_pphwsp_runtime(struct intel_engine_cs *engine)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);

-	ce->runtime.num_underflow = 0;
-	ce->runtime.max_underflow = 0;
+	ce->stats.runtime.num_underflow = 0;
+	ce->stats.runtime.max_underflow = 0;

 	do {
 		unsigned int loop = 1024;
@@ -1790,11 +1790,11 @@ static int __live_pphwsp_runtime(struct intel_engine_cs *engine)
 			intel_context_get_avg_runtime_ns(ce));

 	err = 0;
-	if (ce->runtime.num_underflow) {
+	if (ce->stats.runtime.num_underflow) {
 		pr_err("%s: pphwsp underflow %u time(s), max %u cycles!\n",
 		       engine->name,
-		       ce->runtime.num_underflow,
-		       ce->runtime.max_underflow);
+		       ce->stats.runtime.num_underflow,
+		       ce->stats.runtime.max_underflow);
 		GEM_TRACE_DUMP();
 		err = -EOVERFLOW;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a2c58b54a592..d4410caf27ac 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -484,13 +484,10 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct i915_gem_context_coredump *ctx)
 {
-	const u32 period = m->i915->gt.clock_period_ns;
-
 	err_printf(m, "%s%s[%d] prio %d, guilty %d active %d, runtime total %lluns, avg %lluns\n",
 		   header, ctx->comm, ctx->pid,
 		   ctx->sched_attr.priority,
 		   ctx->guilty, ctx->active,
-		   ctx->total_runtime * period,
-		   mul_u32_u32(ctx->avg_runtime, period));
+		   ctx->total_runtime, ctx->avg_runtime);
 }

 static struct i915_vma_coredump *
@@ -1279,8 +1276,8 @@ static bool record_context(struct i915_gem_context_coredump *e,
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);

-	e->total_runtime = rq->context->runtime.total;
-	e->avg_runtime = ewma_runtime_read(&rq->context->runtime.avg);
+	e->total_runtime = intel_context_get_total_runtime_ns(rq->context);
+	e->avg_runtime = intel_context_get_avg_runtime_ns(rq->context);

 	simulated = i915_gem_context_no_error_capture(ctx);

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index b98d8cdbe4f2..b11deb547672 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -90,7 +90,7 @@ struct intel_engine_coredump {
 	char comm[TASK_COMM_LEN];

 	u64 total_runtime;
-	u32 avg_runtime;
+	u64 avg_runtime;

 	pid_t pid;
 	int active;
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Proposal to standardise the fdinfo text format as optionally output by DRM drivers.
The idea is that a simple but well-defined spec will enable generic userspace tools to be written while at the same time avoiding the more heavy-handed approach of adding a mid-layer to DRM.
i915 implements a subset of the spec, everything apart from the memory stats currently, and a matching intel_gpu_top tool exists.
The open question is whether AMD can migrate to using the proposed GPU utilisation key-value pairs; if they are not workable, whether to go vendor specific; or whether a standardised alternative can be found which works for both drivers.
Same for the memory utilisation key-value pairs proposal.
v2: * Update for removal of name and pid.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David M Nieto <David.Nieto@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 Documentation/gpu/drm-usage-stats.rst | 97 +++++++++++++++++++++++++++
 Documentation/gpu/index.rst           |  1 +
 2 files changed, 98 insertions(+)
 create mode 100644 Documentation/gpu/drm-usage-stats.rst
diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
new file mode 100644
index 000000000000..78dc01c30e22
--- /dev/null
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -0,0 +1,97 @@
+.. _drm-client-usage-stats:
+
+======================
+DRM client usage stats
+======================
+
+DRM drivers can choose to export partly standardised text output via the
+`fops->show_fdinfo()` as part of the driver specific file operations registered
+in the `struct drm_driver` object registered with the DRM core.
+
+One purpose of this output is to enable writing as generic as practically
+feasible `top(1)` like userspace monitoring tools.
+
+Given the differences between various DRM drivers the specification of the
+output is split between common and driver specific parts. Having said that,
+wherever possible effort should still be made to standardise as much as
+possible.
+
+File format specification
+=========================
+
+- File shall contain one key value pair per one line of text.
+- Colon character (`:`) must be used to delimit keys and values.
+- All keys shall be prefixed with `drm-`.
+- Whitespace between the delimiter and first non-whitespace character shall be
+  ignored when parsing.
+- Neither keys nor values are allowed to contain whitespace characters.
+- Numerical key value pairs can end with optional unit string.
+- Data type of the value is fixed as defined in the specification.
+
+Key types
+---------
+
+1. Mandatory, fully standardised.
+2. Optional, fully standardised.
+3. Driver specific.
+
+Data types
+----------
+
+- <uint> - Unsigned integer without defining the maximum value.
+- <str> - String excluding any above defined reserved characters or whitespace.
+
+Mandatory fully standardised keys
+---------------------------------
+
+- drm-driver: <str>
+
+String shall contain a fixed string uniquely identifying the driver handling
+the device in question. For example name of the respective kernel module.
+
+Optional fully standardised keys
+--------------------------------
+
+- drm-pdev: aaaa:bb.cc.d
+
+For PCI devices this should contain the PCI slot address of the device in
+question.
+
+- drm-client-id: <uint>
+
+Unique value relating to the open DRM file descriptor used to distinguish
+duplicated and shared file descriptors. Conceptually the value should map 1:1
+to the in kernel representation of `struct drm_file` instances.
+
+Uniqueness of the value shall be either globally unique, or unique within the
+scope of each device, in which case `drm-pdev` shall be present as well.
+
+Userspace should make sure to not double account any usage statistics by using
+the above described criteria in order to associate data to individual clients.
+
+- drm-engine-<str>: <uint> ns
+
+GPUs usually contain multiple execution engines. Each shall be given a stable
+and unique name (str), with possible values documented in the driver specific
+documentation.
+
+Value shall be in specified time units which the respective GPU engine spent
+busy executing workloads belonging to this client.
+
+Values are not required to be constantly monotonic if it makes the driver
+implementation easier, but are required to catch up with the previously
+reported larger value within a reasonable period. Upon observing a value lower
+than what was previously read, userspace is expected to stay with that larger
+previous value until a monotonic update is seen.
+
+- drm-memory-<str>: <uint> [KiB|MiB]
+
+Each possible memory type which can be used to store buffer objects by the
+GPU in question shall be given a stable and unique name to be returned as the
+string here.
+
+Value shall reflect the amount of storage currently consumed by the buffer
+objects belonging to this client, in the respective memory region.
+
+Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
+indicating kibi- or mebi-bytes.
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index b9c1214d8f23..b99dede9a5b1 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
    drm-kms
    drm-kms-helpers
    drm-uapi
+   drm-usage-stats
    driver-uapi
    drm-client
    drivers
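For illustration, output conforming to the above could look something like the following. The pos/flags/mnt_id entries are the standard fdinfo fields; the driver, engine names and values below are hypothetical:

pos:	0
flags:	02100002
mnt_id:	21
drm-driver:	i915
drm-pdev:	0000:00:02.0
drm-client-id:	7
drm-engine-render:	25245626025 ns
drm-engine-copy:	115420723 ns
drm-engine-video:	0 ns
drm-memory-system:	12288 KiB
drm-memory-local:	102400 KiB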
Hi Tvrtko,

Thanks for typing this up!
On Thu, 15 Jul 2021 at 10:18, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
+Mandatory fully standardised keys
+---------------------------------

+- drm-driver: <str>

+String shall contain a fixed string uniquely identifying the driver handling
+the device in question. For example name of the respective kernel module.
I think let's be more prescriptive and just say that it is the module name.
+Optional fully standardised keys
+--------------------------------

+- drm-pdev: aaaa:bb.cc.d

+For PCI devices this should contain the PCI slot address of the device in
+question.
How about just major:minor of the DRM render node device it's attached to?
+- drm-client-id: <uint>
+Unique value relating to the open DRM file descriptor used to distinguish
+duplicated and shared file descriptors. Conceptually the value should map 1:1
+to the in kernel representation of `struct drm_file` instances.

+Uniqueness of the value shall be either globally unique, or unique within the
+scope of each device, in which case `drm-pdev` shall be present as well.

+Userspace should make sure to not double account any usage statistics by using
+the above described criteria in order to associate data to individual clients.
+- drm-engine-<str>: <uint> ns
+GPUs usually contain multiple execution engines. Each shall be given a stable
+and unique name (str), with possible values documented in the driver specific
+documentation.

+Value shall be in specified time units which the respective GPU engine spent
+busy executing workloads belonging to this client.

+Values are not required to be constantly monotonic if it makes the driver
+implementation easier, but are required to catch up with the previously
+reported larger value within a reasonable period. Upon observing a value lower
+than what was previously read, userspace is expected to stay with that larger
+previous value until a monotonic update is seen.
Yeah, that would work well for Mali/Panfrost. We can queue multiple jobs in the hardware, which can either be striped across multiple cores with an affinity mask (e.g. 3 cores for your client and 1 for your compositor), or picked according to priority, or ...
The fine-grained performance counters (e.g. time spent waiting for sampler) are only GPU-global. So if you have two jobs running simultaneously, you have no idea who's responsible for what.
But it does give us coarse-grained counters which are accounted per-job-slot, including exactly this metric: amount of 'GPU time' (whatever that means) occupied by that job slot during the sampling period. So we could support that nicely if we fenced job-slot updates with register reads/writes.
Something I'm missing though is how we enable this information. Seems like it would be best to either only do it whilst fdinfo is open (and re-read it whenever you need an update), or on a per-driver sysfs toggle, or ... ?
+- drm-memory-<str>: <uint> [KiB|MiB]
+Each possible memory type which can be used to store buffer objects by the
+GPU in question shall be given a stable and unique name to be returned as the
+string here.

+Value shall reflect the amount of storage currently consumed by the buffer
+objects belonging to this client, in the respective memory region.

+Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
+indicating kibi- or mebi-bytes.
I'm a bit wary of the accounting here. Is it buffer allocations originating from the client, in which case it conceptually clashes with gralloc? Is it the client which last wrote to the buffer? The client with the oldest open handle to the buffer? Other?
Cheers, Daniel
On Fri, Jul 23, 2021 at 05:43:01PM +0100, Daniel Stone wrote:
Hi Tvrtko, Thanks for typing this up!
On Thu, 15 Jul 2021 at 10:18, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
+Mandatory fully standardised keys
+---------------------------------

+- drm-driver: <str>

+String shall contain a fixed string uniquely identifying the driver handling
+the device in question. For example name of the respective kernel module.
I think let's be more prescriptive and just say that it is the module name.
Just a quick comment on this one.
drm_driver.name is already uapi, so let's please not invent a new one. The shared code should probably make sure drivers don't get this wrong. Maybe good if we document the getversion ioctl, which also exposes this, and then link between the two.
-Daniel
I just want to make a comment that with this approach (the ns) calculating the percentage will take at least two reads of the fdinfo per pid over some time. Some engines may be able to provide a single shot percentage usage over an internal integration period. That is, for example, what we currently have implemented for that exact reason.
I'd like to propose that we add an optional set of fields for this. Also, I may have missed a message, but why did we remove the timestamp? It is needed for accurate measurements of engine usage.
David
On 23/07/2021 18:45, Nieto, David M wrote:
I just want to make a comment that with this approach (the ns) calculating the percentage will take at least two reads of the fdinfo per pid over some time. Some engines may be able to provide a single shot percentage usage over an internal integration period. That is, for example, what we currently have implemented for that exact reason.
I'd like to propose that we add an optional set of fields for this.
Yes it is already like that in the text I've sent out. Because I was unclear how the amdgpu accounting works I called out for you guys to fill in the blanks in the last patch:
""" Opens: * Does it work for AMD? * What are the semantics of AMD engine utilisation reported in percents? Can it align with what i915 does or needs to document the alternative in the specification document?
"""
""" -- drm-engine-<str>: <uint> ns +- drm-engine-<str>: <uint> [ns|%] ... +Where time unit is given as a percentage...[AMD folks to fill the semantics +and interpretation of that]... """
So if cumulative nanoseconds definitely do not work for you, could you please fill in those blanks?
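Purely as an illustration, and until that open is settled: a tolerant userspace parser could accept either unit along these lines (all names below are made up):

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch only: parse "drm-engine-<name>: <value> [ns|%]".
 * 'name' must point at a buffer of at least 32 bytes.
 */
static int parse_engine_line(const char *line, char *name,
                             uint64_t *value, int *is_percent)
{
        char unit[8] = "";

        if (sscanf(line, "drm-engine-%31[^:]: %" SCNu64 " %7s",
                   name, value, unit) < 2)
                return -1;

        /* Cumulative nanoseconds vs driver-integrated percentage. */
        *is_percent = !strcmp(unit, "%");
        return 0;
}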
Also, I may have missed a message, but why did we remove the timestamp? It is needed for accurate measurements of engine usage.
Hm I did not remove anything - I only renamed some of the fields output from amdgpu fdinfo.
Regards,
Tvrtko
On 23/07/2021 17:43, Daniel Stone wrote:
Hi Tvrtko, Thanks for typing this up!
On Thu, 15 Jul 2021 at 10:18, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
+Mandatory fully standardised keys
+---------------------------------
+
+- drm-driver: <str>
+
+String shall contain a fixed string uniquely identifying the driver handling
+the device in question. For example the name of the respective kernel module.
I think let's be more prescriptive and just say that it is the module name.
I liked the drm_driver.name suggestion the other Daniel made, so I'll go with that.
+Optional fully standardised keys
+--------------------------------
+
+- drm-pdev: aaaa:bb.cc.d
+
+For PCI devices this should contain the PCI slot address of the device in
+question.
How about just major:minor of the DRM render node device it's attached to?
I don't have a strong opinion on this one. I can add it, but might keep the drm-pdev tag under the optional list because it is handy for intel_gpu_top multi-device support. Or maybe the lookup to the PCI device is easier than I think, so okay, it is on my todo list to check.
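[For illustration, the fd to device lookup discussed above is an fstat() plus a sysfs path. A minimal sketch, assuming the standard Linux /sys/dev/char layout; drm_fd_to_sysfs_dev is a hypothetical helper name, not part of the series:]

	#include <stdio.h>
	#include <sys/stat.h>
	#include <sys/sysmacros.h>

	/*
	 * Map an open DRM file descriptor to its sysfs device directory via
	 * the character device major:minor. /sys/dev/char/<maj>:<min>/device
	 * is a symlink to the underlying (e.g. PCI) device.
	 */
	static int drm_fd_to_sysfs_dev(int fd, char *buf, size_t len)
	{
		struct stat st;

		if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode))
			return -1;

		snprintf(buf, len, "/sys/dev/char/%u:%u/device",
			 major(st.st_rdev), minor(st.st_rdev));
		return 0;
	}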
+- drm-client-id: <uint>
+Unique value relating to the open DRM file descriptor used to distinguish
+duplicated and shared file descriptors. Conceptually the value should map 1:1
+to the in-kernel representation of `struct drm_file` instances.
+
+The value shall be either globally unique, or unique within the scope of each
+device, in which case `drm-pdev` shall be present as well.
+
+Userspace should make sure not to double account any usage statistics by using
+the above described criteria to associate data with individual clients.
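[To make the double accounting guidance concrete, a hedged userspace sketch, keying clients on the (drm-pdev, drm-client-id) pair before accumulating anything; the struct and helper names are illustrative, not part of the spec:]

	#include <stdbool.h>
	#include <stdint.h>
	#include <string.h>

	/* Hypothetical client key used only for deduplication. */
	struct client_key {
		char pdev[32];	/* "drm-pdev" value, empty if globally unique */
		uint64_t id;	/* "drm-client-id" value */
	};

	/* Returns true if this client was already accounted in this sample. */
	static bool seen_before(const struct client_key *keys, int n,
				const struct client_key *k)
	{
		for (int i = 0; i < n; i++)
			if (keys[i].id == k->id &&
			    !strcmp(keys[i].pdev, k->pdev))
				return true;
		return false;
	}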
+- drm-engine-<str>: <uint> ns
+
+GPUs usually contain multiple execution engines. Each shall be given a stable
+and unique name (str), with possible values documented in the driver specific
+documentation.
+
+The value shall be the time, in the specified unit, which the respective GPU
+engine spent busy executing workloads belonging to this client.
+
+Values are not required to be constantly monotonic if it makes the driver
+implementation easier, but are required to catch up with the previously
+reported larger value within a reasonable period. Upon observing a value lower
+than what was previously read, userspace is expected to stay with that larger
+previous value until a monotonic update is seen.
Yeah, that would work well for Mali/Panfrost. We can queue multiple jobs in the hardware, which can either be striped across multiple cores with an affinity mask (e.g. 3 cores for your client and 1 for your compositor), or picked according to priority, or ...
The fine-grained performance counters (e.g. time spent waiting for sampler) are only GPU-global. So if you have two jobs running simultaneously, you have no idea who's responsible for what.
But it does give us coarse-grained counters which are accounted per-job-slot, including exactly this metric: amount of 'GPU time' (whatever that means) occupied by that job slot during the sampling period. So we could support that nicely if we fenced job-slot updates with register reads/writes.
Something I'm missing though is how we enable this information. Seems like it would be best to either only do it whilst fdinfo is open (and re-read it whenever you need an update), or on a per-driver sysfs toggle, or ... ?
Presumably there is a non-trivial cost for querying this data on your driver?
Would it be workable to enable tracking on first use and stop it some time after the last access? Just a thought, which may have significant downsides from driver to driver.
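[As a rough illustration of that idea only, a minimal kernel-side sketch; hw_counters_enable()/hw_counters_disable() are assumed driver hooks that do not exist in any driver here, and the enable/disable race a real implementation would need to close is ignored:]

	#include <linux/atomic.h>
	#include <linux/jiffies.h>
	#include <linux/workqueue.h>

	void hw_counters_enable(void);	/* hypothetical driver hook */
	void hw_counters_disable(void);	/* hypothetical driver hook */

	static atomic_t stats_on;
	static struct delayed_work stats_off_work;

	/* Assumes INIT_DELAYED_WORK(&stats_off_work, stats_off_fn) at init. */
	static void stats_off_fn(struct work_struct *work)
	{
		hw_counters_disable();
		atomic_set(&stats_on, 0);
	}

	/* Called from the fdinfo show path: enable on first use... */
	static void stats_touch(void)
	{
		if (!atomic_xchg(&stats_on, 1))
			hw_counters_enable();

		/* ...and push the auto-disable deadline out by ~5 seconds. */
		mod_delayed_work(system_wq, &stats_off_work, 5 * HZ);
	}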
+- drm-memory-<str>: <uint> [KiB|MiB]
+
+Each possible memory type which can be used to store buffer objects by the
+GPU in question shall be given a stable and unique name to be returned as the
+string here.
+
+The value shall reflect the amount of storage currently consumed by the buffer
+objects belonging to this client, in the respective memory region.
+
+Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
+indicating kibi- or mebi-bytes.
I'm a bit wary of the accounting here. Is it buffer allocations originating from the client, in which case it conceptually clashes with gralloc? Is it the client which last wrote to the buffer? The client with the oldest open handle to the buffer? Other?
I haven't looked into the AMD code here so I don't know what they export.
Does gralloc allocate buffers from its own DRM client and share them, or is it just a library which runs from a client context?
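[Whichever accounting semantics get picked, the unit side of drm-memory-<str> is mechanical for userspace. An illustrative parser, assuming a made-up helper name, normalising the optional KiB/MiB specifier to bytes:]

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	/* Parses the value part of "drm-memory-<str>: <uint> [KiB|MiB]". */
	static int parse_drm_memory(const char *value, uint64_t *bytes)
	{
		unsigned long long v;
		char unit[8] = "";

		if (sscanf(value, "%llu %7s", &v, unit) < 1)
			return -1;

		if (!strcmp(unit, "KiB"))
			v *= 1024;
		else if (!strcmp(unit, "MiB"))
			v *= 1024 * 1024;

		*bytes = v;	/* default unit is bytes */
		return 0;
	}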
Regards,
Tvrtko
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Similar to AMD commit 874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the infrastructure added in previous patches, we add basic client info and GPU engine utilisation for i915.
Example of the output:
pos:    0
flags:  0100002
mnt_id: 21
drm-driver:     i915
drm-pdev:       0000:00:02.0
drm-client-id:  7
drm-engine-render:        9288864723 ns
drm-engine-copy:          2035071108 ns
drm-engine-video:         0 ns
drm-engine-video-enhance: 0 ns
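[For readers wondering how a tool consumes this: a hedged sketch, not from the series, of turning two snapshots of one drm-engine key into a busyness percentage, including the spec's stay-with-the-larger-value rule for non-monotonic reads; engine_busy_pct is an illustrative name:]

	#include <stdint.h>

	/*
	 * Busyness between two fdinfo samples taken delta_wall_ns apart,
	 * clamping per the specification's monotonicity rule.
	 */
	static double engine_busy_pct(uint64_t prev_busy_ns,
				      uint64_t *cur_busy_ns,
				      uint64_t delta_wall_ns)
	{
		/* A value lower than previously read: keep the larger one. */
		if (*cur_busy_ns < prev_busy_ns)
			*cur_busy_ns = prev_busy_ns;

		if (!delta_wall_ns)
			return 0.0;

		return 100.0 * (double)(*cur_busy_ns - prev_busy_ns) /
		       (double)delta_wall_ns;
	}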
v2: * Update for removal of name and pid.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: David M Nieto David.Nieto@amd.com
Cc: Christian König christian.koenig@amd.com
Cc: Daniel Vetter daniel@ffwll.ch
---
 Documentation/gpu/drm-usage-stats.rst  |  6 +++
 Documentation/gpu/i915.rst             | 27 ++++++++++
 drivers/gpu/drm/i915/i915_drm_client.c | 73 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drm_client.h |  4 ++
 drivers/gpu/drm/i915/i915_drv.c        |  3 ++
 5 files changed, 113 insertions(+)
diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
index 78dc01c30e22..b87505438aaa 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -95,3 +95,9 @@ objects belonging to this client, in the respective memory region.

 Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
 indicating kibi- or mebi-bytes.
+
+===============================
+Driver specific implementations
+===============================
+
+:ref:`i915-usage-stats`
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 204ebdaadb45..b28cc316dbd9 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -701,3 +701,30 @@ The style guide for ``i915_reg.h``.

 .. kernel-doc:: drivers/gpu/drm/i915/i915_reg.h
    :doc: The i915 register macro definition style guide
+
+.. _i915-usage-stats:
+
+i915 DRM client usage stats implementation
+==========================================
+
+The drm/i915 driver implements the DRM client usage stats specification as
+documented in :ref:`drm-client-usage-stats`.
+
+Example of the output showing the implemented key value pairs and entirety of
+the currently possible format options:
+
+::
+
+	pos:    0
+	flags:  0100002
+	mnt_id: 21
+	drm-driver: i915
+	drm-pdev:   0000:00:02.0
+	drm-client-id:  7
+	drm-engine-render: 9288864723 ns
+	drm-engine-copy: 2035071108 ns
+	drm-engine-video: 0 ns
+	drm-engine-video-enhance: 0 ns
+
+Possible `drm-engine-` key names are: `render`, `copy`, `video` and
+`video-enhance`.
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index 91a8559bebf7..8a6706e06e31 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -7,6 +7,11 @@
 #include <linux/slab.h>
 #include <linux/types.h>

+#include <uapi/drm/i915_drm.h>
+
+#include <drm/drm_print.h>
+
+#include "gem/i915_gem_context.h"
 #include "i915_drm_client.h"
 #include "i915_gem.h"
 #include "i915_utils.h"
@@ -68,3 +73,71 @@ void i915_drm_clients_fini(struct i915_drm_clients *clients)
 	GEM_BUG_ON(!xa_empty(&clients->xarray));
 	xa_destroy(&clients->xarray);
 }
+
+#ifdef CONFIG_PROC_FS
+static const char * const uabi_class_names[] = {
+	[I915_ENGINE_CLASS_RENDER] = "render",
+	[I915_ENGINE_CLASS_COPY] = "copy",
+	[I915_ENGINE_CLASS_VIDEO] = "video",
+	[I915_ENGINE_CLASS_VIDEO_ENHANCE] = "video-enhance",
+};
+
+static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+	u64 total = 0;
+
+	for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
+		if (ce->engine->uabi_class != class)
+			continue;
+
+		total += intel_context_get_total_runtime_ns(ce);
+	}
+
+	return total;
+}
+
+static void
+show_client_class(struct seq_file *m,
+		  struct i915_drm_client *client,
+		  unsigned int class)
+{
+	const struct list_head *list = &client->ctx_list;
+	u64 total = atomic64_read(&client->past_runtime[class]);
+	struct i915_gem_context *ctx;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ctx, list, client_link)
+		total += busy_add(ctx, class);
+	rcu_read_unlock();
+
+	seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+		   uabi_class_names[class], total);
+}
+
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct drm_file *file = f->private_data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_private *i915 = file_priv->dev_priv;
+	struct i915_drm_client *client = file_priv->client;
+	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+	unsigned int i;
+
+	/*
+	 * ******************************************************************
+	 * For text output format description please see drm-usage-stats.rst!
+	 * ******************************************************************
+	 */
+
+	seq_puts(m, "drm-driver:\ti915\n");
+	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
+		   pci_domain_nr(pdev->bus), pdev->bus->number,
+		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+	seq_printf(m, "drm-client-id:\t%u\n", client->id);
+
+	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
+		show_client_class(m, client, i);
+}
+#endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 7416e18aa33c..d96d6a06302e 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -57,6 +57,10 @@ static inline void i915_drm_client_put(struct i915_drm_client *client)

 struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);

+#ifdef CONFIG_PROC_FS
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
+#endif
+
 void i915_drm_clients_fini(struct i915_drm_clients *clients);

 #endif /* !__I915_DRM_CLIENT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index bb628eade92a..b2736b1e5a06 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1706,6 +1706,9 @@ static const struct file_operations i915_driver_fops = {
 	.read = drm_read,
 	.compat_ioctl = i915_ioc32_compat_ioctl,
 	.llseek = noop_llseek,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo = i915_drm_client_fdinfo,
+#endif
 };
static int
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Convert fdinfo format to one documented in drm-usage-stats.rst.
Opens:
* Does it work for AMD?
* What are the semantics of AMD engine utilisation reported in percent? Can it
  align with what i915 does, or does it need to document the alternative in
  the specification document?
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: David M Nieto David.Nieto@amd.com
Cc: Christian König christian.koenig@amd.com
Cc: Daniel Vetter daniel@ffwll.ch
---
 Documentation/gpu/amdgpu.rst               | 26 ++++++++++++++++++++++
 Documentation/gpu/drm-usage-stats.rst      |  7 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 18 ++++++++++-----
 3 files changed, 45 insertions(+), 6 deletions(-)
diff --git a/Documentation/gpu/amdgpu.rst b/Documentation/gpu/amdgpu.rst
index 364680cdad2e..b9b79c810f28 100644
--- a/Documentation/gpu/amdgpu.rst
+++ b/Documentation/gpu/amdgpu.rst
@@ -322,3 +322,29 @@ smartshift_bias

 .. kernel-doc:: drivers/gpu/drm/amd/pm/amdgpu_pm.c
    :doc: smartshift_bias
+
+.. _amdgpu-usage-stats:
+
+amdgpu DRM client usage stats implementation
+============================================
+
+The amdgpu driver implements the DRM client usage stats specification as
+documented in :ref:`drm-client-usage-stats`.
+
+Example of the output showing the implemented key value pairs and entirety of
+the currently possible format options:
+
+::
+
+	pos:    0
+	flags:  0100002
+	mnt_id: 21
+	drm-driver: amdgpu
+	drm-pdev:   0000:00:02.0
+	drm-client-id:  7
+	drm-engine-... TODO
+	drm-memory-... TODO
+
+Possible `drm-engine-` key names are: ``,... TODO.
+
+Possible `drm-memory-` key names are: ``,... TODO.
diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
index b87505438aaa..eaaa361805c0 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -69,7 +69,7 @@ scope of each device, in which case `drm-pdev` shall be present as well.

 Userspace should make sure not to double account any usage statistics by using
 the above described criteria to associate data with individual clients.

-- drm-engine-<str>: <uint> ns
+- drm-engine-<str>: <uint> [ns|%]

 GPUs usually contain multiple execution engines. Each shall be given a stable
 and unique name (str), with possible values documented in the driver specific
@@ -84,6 +84,9 @@ larger value within a reasonable period. Upon observing a value lower than what
 was previously read, userspace is expected to stay with that larger previous
 value until a monotonic update is seen.

+Where time unit is given as a percentage...[AMD folks to fill the semantics
+and interpretation of that]...
+
 - drm-memory-<str>: <uint> [KiB|MiB]

 Each possible memory type which can be used to store buffer objects by the
@@ -101,3 +104,5 @@ Driver specific implementations
 ===============================

 :ref:`i915-usage-stats`
+
+:ref:`amdgpu-usage-stats`
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index d94c5419ec25..d6b011008fe9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -76,11 +76,19 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 	}
 	amdgpu_vm_get_memory(&fpriv->vm, &vram_mem, &gtt_mem, &cpu_mem);
 	amdgpu_bo_unreserve(fpriv->vm.root.bo);
-	seq_printf(m, "pdev:\t%04x:%02x:%02x.%d\npasid:\t%u\n", domain, bus,
+
+	/*
+	 * ******************************************************************
+	 * For text output format description please see drm-usage-stats.rst!
+	 * ******************************************************************
+	 */
+
+	seq_puts(m, "drm-driver: amdgpu\n");
+	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\npasid:\t%u\n", domain, bus,
 		   dev, fn, fpriv->vm.pasid);
-	seq_printf(m, "vram mem:\t%llu kB\n", vram_mem/1024UL);
-	seq_printf(m, "gtt mem:\t%llu kB\n", gtt_mem/1024UL);
-	seq_printf(m, "cpu mem:\t%llu kB\n", cpu_mem/1024UL);
+	seq_printf(m, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
+	seq_printf(m, "drm-memory-gtt:\t%llu KiB\n", gtt_mem/1024UL);
+	seq_printf(m, "drm-memory-cpu:\t%llu KiB\n", cpu_mem/1024UL);
 	for (i = 0; i < AMDGPU_HW_IP_NUM; i++) {
 		uint32_t count = amdgpu_ctx_num_entities[i];
 		int idx = 0;
@@ -96,7 +104,7 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 			perc = div64_u64(10000 * total, min);
 			frac = perc % 100;

-			seq_printf(m, "%s%d:\t%d.%d%%\n",
+			seq_printf(m, "drm-engine-%s%d:\t%d.%d %%\n",
 				   amdgpu_ip_name[i], idx, perc/100, frac);
 		}
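[With the spec now allowing both units, a userspace parser has to branch on the suffix. A minimal sketch; parse_drm_engine is a made-up helper, and it parses a double because the amdgpu conversion above prints a fractional percent even though the spec says <uint>:]

	#include <stdio.h>
	#include <string.h>

	/* Parses the value part of "drm-engine-<str>: <uint> [ns|%]". */
	static int parse_drm_engine(const char *value, double *v,
				    int *is_percent)
	{
		char suffix[4] = "";

		if (sscanf(value, "%lf %3s", v, suffix) < 1)
			return -1;

		*is_percent = !strcmp(suffix, "%");	/* otherwise "ns" */
		return 0;
	}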
+ David, Roy
On Thu, Jul 15, 2021 at 5:18 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Convert fdinfo format to one documented in drm-usage-stats.rst.
Opens:
- Does it work for AMD?
- What are the semantics of AMD engine utilisation reported in percent? Can it align with what i915 does, or does it need to document the alternative in the specification document?
On 15/07/2021 10:18, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Same old work but now rebased and series ending with some DRM docs proposing the common specification which should enable nice common userspace tools to be written.
For the moment I only have intel_gpu_top converted to use this and that seems to work okay.
v2:
- Added prototype of possible amdgpu changes and spec updates to align with the common spec.
Not much interest for the common specification?
For reference I've just posted the intel-gpu-top adaptation required to parse it here: https://patchwork.freedesktop.org/patch/446041/?series=90464&rev=2.
Note that this is not attempting to be a vendor agnostic tool but is adding per client data to existing i915 tool which uses PMU counters for global stats.
intel-gpu-top: Intel Skylake (Gen9) @ /dev/dri/card0 - 335/ 339 MHz; 10% RC6; 1.24/ 4.18 W; 527 irqs/s
IMC reads: 3297 MiB/s IMC writes: 2767 MiB/s
       ENGINES     BUSY                                                                                MI_SEMA MI_WAIT
     Render/3D   78.74% |██████████████████████████████████████████████████████████████████████████▏ |      0%      0%
       Blitter    0.00% |                                                                             |      0%      0%
         Video    0.00% |                                                                             |      0%      0%
  VideoEnhance    0.00% |                                                                             |      0%      0%
  PID            NAME   Render/3D       Blitter         Video      VideoEnhance
10202       neverball |███████████████▎ ||           ||           ||           |
 5665            Xorg |███████▍         ||           ||           ||           |
 5679   xfce4-session |                 ||           ||           ||           |
 5772    ibus-ui-gtk3 |                 ||           ||           ||           |
 5775 ibus-extension- |                 ||           ||           ||           |
 5777        ibus-x11 |                 ||           ||           ||           |
 5823           xfwm4 |                 ||           ||           ||           |
Regards,
Tvrtko
Am 23.07.21 um 13:21 schrieb Tvrtko Ursulin:
On 15/07/2021 10:18, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Same old work but now rebased and series ending with some DRM docs proposing the common specification which should enable nice common userspace tools to be written.
For the moment I only have intel_gpu_top converted to use this and that seems to work okay.
v2: * Added prototype of possible amdgpu changes and spec updates to align with the common spec.
Not much interest for the common specification?
Well I would rather say not much opposition :)
Offhand, everything you do in this patch set sounds absolutely sane to me, I just don't have any time to review it in detail.
Regards, Christian.
On 23/07/2021 12:23, Christian König wrote:
Am 23.07.21 um 13:21 schrieb Tvrtko Ursulin:
On 15/07/2021 10:18, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Same old work but now rebased and series ending with some DRM docs proposing the common specification which should enable nice common userspace tools to be written.
For the moment I only have intel_gpu_top converted to use this and that seems to work okay.
v2: * Added prototype of possible amdgpu changes and spec updates to align with the common spec.
Not much interest for the common specification?
Well I would rather say not much opposition :)
Hah, thanks, that's good to hear!
Offhand, everything you do in this patch set sounds absolutely sane to me, I just don't have any time to review it in detail.
That's fine - could you maybe suggest who on the AMD side could have a look at the relevant patches?
Regards,
Tvrtko
On Fri, Jul 23, 2021 at 9:51 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 23/07/2021 12:23, Christian König wrote:
Am 23.07.21 um 13:21 schrieb Tvrtko Ursulin:
On 15/07/2021 10:18, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Same old work but now rebased and series ending with some DRM docs proposing the common specification which should enable nice common userspace tools to be written.
For the moment I only have intel_gpu_top converted to use this and that seems to work okay.
v2:
- Added prototype of possible amdgpu changes and spec updates to
align with the common spec.
Not much interest for the common specification?
Well I would rather say not much opposition :)
Hah, thanks, that's good to hear!
Offhand, everything you do in this patch set sounds absolutely sane to me, I just don't have any time to review it in detail.
That's fine - could you maybe suggest who on the AMD side could have a look at the relevant patches?
Adding David and Roy who did the implementation for the AMD side. Can you take a look at these patches when you get a chance?
Thanks,
Alex