On Fri, May 15, 2015 at 02:07:29AM +0100, Robert Bragg wrote:
On Fri, May 8, 2015 at 5:24 PM, Peter Zijlstra peterz@infradead.org wrote:
On Thu, May 07, 2015 at 03:15:43PM +0100, Robert Bragg wrote:
I've changed the uapi for configuring the i915_oa specific attributes when calling perf_event_open(2) whereby instead of cramming lots of bitfields into the perf_event_attr config members, I'm now daisy-chaining a drm_i915_oa_event_attr_t structure off of a single config member that's extensible and validated in the same way as the perf_event_attr struct. I've found this much nicer to work with while being neatly extensible too.
This worries me a bit.. is there more background for this?
Would it maybe be helpful to see the before and after? I had kept this uapi change in a separate patch for a while locally but in the end decided to squash it before sending out my updated series.
Although I did find it a bit awkward with the bitfields, I was mainly concerned about the extensibility of packing logically separate attributes into the config members and had heard similar concerns from a few others who had been experimenting with my patches too.
A few simple attributes I can think of a.t.m that we might want to add in the future are:
- control of the OABUFFER size
- a way to ask the kernel to collect reports at the beginning and end
of batch buffers, in addition to periodic reports
- alternative ways to uniquely identify a context to support tools
profiling a single context not necessarily owned by the current process
It could also be interesting to expose some counter configuration through these attributes too. E.g. on Broadwell+ we have 14 'Flexible EU' counters included in the OA unit reports, each with a 16bit configuration.
In a more extreme case it might also be useful to allow userspace to specify a complete counter config, which (depending on the configuration) could be over 100 32bit values to select the counter signals + configure the corresponding combining logic.
Since this pmu is in a device driver it also seemed reasonably appropriate to de-couple it slightly from the core perf_event_attr structure by allowing driver extensible attributes.
I wonder if it might be less worrisome if the i915_oa_copy_attr() code were instead a re-usable utility perhaps maintained in events/core.c, so if other pmu drivers were to follow suite there would be less risk of a mistake being made here?
So I had a peek at:
https://01.org/sites/default/files/documentation/observability_performance_c...
In an attempt to inform myself of how the hardware worked. But the document is rather sparse (and does not include broadwell).
So from what I can gather there's two ways to observe the counters, through MMIO or trough the ring-buffer, which in turn seems triggered by a MI_REPORT_PERF_COUNT command.
[ Now the document talks about shortcomings of this scheme, where the MI_REPORT_PERF_COUNT command can only be placed every other command, but a single command can contain so much computation that this is not fine grained enough -- leading to the suggestion of a timer/cycle based reporting, but that is not actually mentioned afaict ]
Now the MI_REPORT_PERF_COUNT can select a vector (Counter Select) of which events it will write out.
This covers the regular 'A' counters. Is this correct?
Then there are the 'B' counters, which appear to be programmable through the CEC MMIO registers.
These B events can also be part of the MI_REPORT_PERF_COUNT vector.
Right?
So for me the 'natural' way to represent this in perf would be through event groups. Create a perf event for every single event -- yes this is 53 events.
Use the MMIO reads for the regular read() interface, and use a hrtimer placing MI_REPORT_PERF_COUNT commands, with a counter select mask covering the all events in the current group, for sampling.
Then obtain the vector values using PERF_SAMPLE_READ and PERF_FORMAT_READ -- and use the read txn support to consume the ring-buffer vectors instead of the MMIO registers.
You can use the perf_event_attr::config to select the counter (A0-A44, B0-B7) and use perf_event_attr::config1 (low and high dword) for the corresponding CEC registers.
This does not require random per driver ABI extentions for perf_event_attr, nor your custom output format.
Am I missing something obvious here?