Re: [RESEND PATCH v2 2/2] drm/panfrost: Expose perf counters through debugfs

28 May 2019

      On Tue, 28 May 2019 15:53:48 +0200
Boris Brezillon boris.brezillon@collabora.com wrote:
...
Hi Steve,
On Thu, 16 May 2019 17:21:39 +0100
Steven Price steven.price@arm.com wrote:
...
...
...
+static void panfrost_perfcnt_setup(struct panfrost_device *pfdev)
+{

  u32 cfg;

  /*

   * Always use address space 0 for now.

   * FIXME: this needs to be updated when we start using different

   * address space.

   */

  cfg = GPU_PERFCNT_CFG_AS(0);

  if (panfrost_model_cmp(pfdev, 0x1000) >= 0)    

You've got a couple of these. Perhaps we should add either a
model_is_bifrost() helper or an is_bifrost variable to use.
...

          cfg |= GPU_PERFCNT_CFG_SETSEL(1);    

mali_kbase only sets this if CONFIG_MALI_PRFCNT_SET_SECONDARY is defined

i.e. this selects a different selection of counters. At at the very

least this should be an optional flag that can be set, but I suspect the
primary set are the ones you are interested in.
I'll add a param to pass the set of counters to monitor.
...
...
...

  gpu_write(pfdev, GPU_PERFCNT_CFG,

            cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));

  if (!pfdev->perfcnt->enabled)

          return;

  gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0xffffffff);

  gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0xffffffff);

  gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0xffffffff);

  /*

   * Due to PRLAM-8186 we need to disable the Tiler before we enable HW

   * counters.    

Seems like you could just always apply the work-around? It doesn't
look like it would be much different.
Technically the workaround causes the driver to perform the operation in
the wrong order below (write TILER_EN when the mode is MANUAL) - this is
a workaround for the hardware issue (and therefore works on that
hardware). But I wouldn't like to say it works on other hardware which
doesn't have the issue.
Okay, I'll keep the workaround only for HW that are impacted by the bug.
...
...
...

   */

  if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))

          gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);

  else

          gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);

  gpu_write(pfdev, GPU_PERFCNT_CFG,

            cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_MANUAL));

  if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))

          gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);

+}
[...]
...
...
...
+int panfrost_perfcnt_init(struct panfrost_device *pfdev)
+{

  struct panfrost_perfcnt *perfcnt;

  struct drm_gem_shmem_object *bo;

  size_t size;

  u32 status;

  int ret;

  if (panfrost_has_hw_feature(pfdev, HW_FEATURE_V4)) {

          unsigned int ncoregroups;

          ncoregroups = hweight64(pfdev->features.l2_present);

          size = ncoregroups * BLOCKS_PER_COREGROUP *

                 COUNTERS_PER_BLOCK * BYTES_PER_COUNTER;

  } else {

          unsigned int nl2c, ncores;

          /*

           * TODO: define a macro to extract the number of l2 caches from

           * mem_features.

           */

          nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1;

          /*

           * The ARM driver is grouping cores per core group and then

           * only using the number of cores in group 0 to calculate the

           * size. Not sure why this is done like that, but I guess

           * shader_present will only show cores in the first group

           * anyway.

           */    

I'm not sure I understand this comment. I'm not sure which version of
the driver you are looking at, but shader_present should contain all the
cores (in every core group).
The kbase driver (in panfrost's git tree[1]), contains:
base_gpu_props *props = &kbdev->gpu_props.props;
   u32 nr_l2 = props->l2_props.num_l2_slices;
   u64 core_mask = props->coherency_info.group[0].core_mask;
u32 nr_blocks = fls64(core_mask);
/* JM and tiler counter blocks are always present */
   dump_size = (2 + nr_l2 + nr_blocks) *
   		NR_CNT_PER_BLOCK *
   		NR_BYTES_PER_CNT;
[1]
https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/produc...
i.e. the number of cores + number of L2 caches (plus 2 for JM/tiler)
multiplied by the block size.
Actually, I was referring to what's done in
kbase_gpuprops_construct_coherent_groups() [2].
...
Also another difference with the below is that fls64 and hweight64 don't
return the same numbers. If the core mask is non-contiguous this will
return different values.
Right, I should use fls64() not hweight64().
Oops, forgot to ask the extra question I had. Looks like when the GPU
is busy and I enable/dump/disable perfcnt too quickly, I sometime gets
{JM,SHADER,MMU_L2,TILER}_COUNTER[2] = 0 while I'd expect it to be set
to the {JM,SHADER,MMU_L2,TILER}_EN value (0xff or 0xffff depending on
the block). I made sure I was receiving the SAMPLE_DONE+CLEAN_DONE
events. Any idea what could cause this?
Note that when the GPU is idle, {JM,SHADER,MMU_L2,TILER}_COUNTER[2] are
set to the value I expect (0xff or 0xffff depending on the block).

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [RESEND PATCH v2 2/2] drm/panfrost: Expose perf counters through debugfs