On Tue, 28 May 2019 15:53:48 +0200 Boris Brezillon boris.brezillon@collabora.com wrote:
Hi Steve,
On Thu, 16 May 2019 17:21:39 +0100 Steven Price steven.price@arm.com wrote:
+static void panfrost_perfcnt_setup(struct panfrost_device *pfdev) +{
u32 cfg;
/*
* Always use address space 0 for now.
* FIXME: this needs to be updated when we start using different
* address space.
*/
cfg = GPU_PERFCNT_CFG_AS(0);
if (panfrost_model_cmp(pfdev, 0x1000) >= 0)
You've got a couple of these. Perhaps we should add either a model_is_bifrost() helper or an is_bifrost variable to use.
cfg |= GPU_PERFCNT_CFG_SETSEL(1);
mali_kbase only sets this if CONFIG_MALI_PRFCNT_SET_SECONDARY is defined
- i.e. this selects a different selection of counters. At at the very
least this should be an optional flag that can be set, but I suspect the primary set are the ones you are interested in.
I'll add a param to pass the set of counters to monitor.
gpu_write(pfdev, GPU_PERFCNT_CFG,
cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
if (!pfdev->perfcnt->enabled)
return;
gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0xffffffff);
gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0xffffffff);
gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0xffffffff);
/*
* Due to PRLAM-8186 we need to disable the Tiler before we enable HW
* counters.
Seems like you could just always apply the work-around? It doesn't look like it would be much different.
Technically the workaround causes the driver to perform the operation in the wrong order below (write TILER_EN when the mode is MANUAL) - this is a workaround for the hardware issue (and therefore works on that hardware). But I wouldn't like to say it works on other hardware which doesn't have the issue.
Okay, I'll keep the workaround only for HW that are impacted by the bug.
*/
if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
else
gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
gpu_write(pfdev, GPU_PERFCNT_CFG,
cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_MANUAL));
if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
+}
[...]
+int panfrost_perfcnt_init(struct panfrost_device *pfdev) +{
struct panfrost_perfcnt *perfcnt;
struct drm_gem_shmem_object *bo;
size_t size;
u32 status;
int ret;
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_V4)) {
unsigned int ncoregroups;
ncoregroups = hweight64(pfdev->features.l2_present);
size = ncoregroups * BLOCKS_PER_COREGROUP *
COUNTERS_PER_BLOCK * BYTES_PER_COUNTER;
} else {
unsigned int nl2c, ncores;
/*
* TODO: define a macro to extract the number of l2 caches from
* mem_features.
*/
nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1;
/*
* The ARM driver is grouping cores per core group and then
* only using the number of cores in group 0 to calculate the
* size. Not sure why this is done like that, but I guess
* shader_present will only show cores in the first group
* anyway.
*/
I'm not sure I understand this comment. I'm not sure which version of the driver you are looking at, but shader_present should contain all the cores (in every core group).
The kbase driver (in panfrost's git tree[1]), contains:
base_gpu_props *props = &kbdev->gpu_props.props; u32 nr_l2 = props->l2_props.num_l2_slices; u64 core_mask = props->coherency_info.group[0].core_mask;
u32 nr_blocks = fls64(core_mask);
/* JM and tiler counter blocks are always present */ dump_size = (2 + nr_l2 + nr_blocks) * NR_CNT_PER_BLOCK * NR_BYTES_PER_CNT;
[1] https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/produc...
i.e. the number of cores + number of L2 caches (plus 2 for JM/tiler) multiplied by the block size.
Actually, I was referring to what's done in kbase_gpuprops_construct_coherent_groups() [2].
Also another difference with the below is that fls64 and hweight64 don't return the same numbers. If the core mask is non-contiguous this will return different values.
Right, I should use fls64() not hweight64().
Oops, forgot to ask the extra question I had. Looks like when the GPU is busy and I enable/dump/disable perfcnt too quickly, I sometime gets {JM,SHADER,MMU_L2,TILER}_COUNTER[2] = 0 while I'd expect it to be set to the {JM,SHADER,MMU_L2,TILER}_EN value (0xff or 0xffff depending on the block). I made sure I was receiving the SAMPLE_DONE+CLEAN_DONE events. Any idea what could cause this?
Note that when the GPU is idle, {JM,SHADER,MMU_L2,TILER}_COUNTER[2] are set to the value I expect (0xff or 0xffff depending on the block).