On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
Hi folks,
This is the plain text version of the previous email in case that was considered as spam.
--- Background ---
On the downstream Android, vendors used to report GPU private memory allocations with debugfs nodes in their own formats. However, debugfs nodes are getting deprecated in the next Android release.
Maybe explain why it is useful first?
--- Proposal ---
We are taking the chance to unify all the vendors to migrate their existing debugfs nodes into a standardized sysfs node structure. Then the platform is able to do a bunch of useful things: memory profiling, system health coverage, field metrics, local shell dump, in-app api, etc. This proposal is better served upstream as all GPU vendors can standardize a gpu memory structure and reduce fragmentation across Android and Linux that clients can rely on.
--- Detailed design ---
The sysfs node structure looks like below:
/sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>
e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer" and the gl_buffer is a node having the comma separated size values: "4096,81920,...,4096".
How does the kernel know what API the allocation is used for? With the open source drivers you never specify what API is creating a gem object (opengl, vulkan, ...) nor for what purpose (transient, shader, ...).
For the top level root, vendors can choose their own names based on the value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple gpu driver cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the 2nd and 3rd KMDs. (2) It's also allowed to put some sub-dir for example "kgsl/gpu_mem" or "mali0/gpu_mem" in the ro.gfx.sysfs.<channel> property if the root name under /sys/devices/ is already created and used for other purposes.
On one side you want to standardize, on the other you want to give complete freedom on the top level naming scheme. I would rather see a consistent naming scheme (i.e. something more restrictive, with little room for interpretation by individual drivers).
For the 2nd level "pid", there are usually just a couple of them per snapshot, since we only take snapshots for the active ones.
I do not understand this. You can have any number of applications with GPU objects, and thus there is no bound on the number of PIDs. Please consider desktop too; I do not know what kind of limitation Android imposes.
For the 3rd level "type_name", the type name will be one of the GPU memory object types in lower case, and the value will be a comma separated sequence of size values for all the allocations under that specific type.
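As an illustration only, and not something taken from the proposal itself, a userspace reader for the three-level layout described above might look like the sketch below. It assumes the caller already knows the root directory (for example the path built from the ro.gfx.sysfs.0 value), that every numeric directory under it is a pid, and that each node holds a single comma separated line of byte sizes. The names parse_sizes and read_snapshot are hypothetical.

```python
import os


def parse_sizes(text):
    """Parse a node body such as "4096,81920,4096" into a list of ints."""
    return [int(tok) for tok in text.strip().split(",") if tok.strip()]


def read_snapshot(root):
    """Return {pid: {type_name: [sizes]}} for a tree laid out as
    <root>/<pid>/<type_name>. The layout is an assumption based on the
    structure described in this thread."""
    snapshot = {}
    for pid in sorted(os.listdir(root)):
        pid_dir = os.path.join(root, pid)
        # Skip anything that is not a per-process directory.
        if not (pid.isdigit() and os.path.isdir(pid_dir)):
            continue
        per_type = {}
        for type_name in sorted(os.listdir(pid_dir)):
            with open(os.path.join(pid_dir, type_name)) as node:
                per_type[type_name] = parse_sizes(node.read())
        snapshot[int(pid)] = per_type
    return snapshot
```

Summing the list for each type gives the per-process total for that category; an empty node simply yields an empty list.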
We especially would like some comments on this part. For the GPU memory object types, we defined 9 different types for Android:
(1) UNKNOWN // not accounted for in any other category
(2) SHADER // shader binaries
(3) COMMAND // allocations which have a lifetime similar to a VkCommandBuffer
(4) VULKAN // backing for VkDeviceMemory
(5) GL_TEXTURE // GL Texture and RenderBuffer
(6) GL_BUFFER // GL Buffer
(7) QUERY // backing for query
(8) DESCRIPTOR // allocations which have a lifetime similar to a VkDescriptorSet
(9) TRANSIENT // random transient things that the driver needs
We are wondering if those type enumerations make sense to the upstream side as well, or maybe we just deal with our own different type sets. Cuz on the Android side, we'll just read those nodes named after the types we defined in the sysfs node structure.
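For reference, the nine types listed above and the lowercase node names derived from them can be written down as a small sketch. The numeric values simply mirror the numbering in the list and are an assumption, as are the names GpuMemType and node_name; nothing here is an existing kernel or Android interface.

```python
from enum import Enum


class GpuMemType(Enum):
    """The nine proposed categories; values follow the list numbering."""
    UNKNOWN = 1      # not accounted for in any other category
    SHADER = 2       # shader binaries
    COMMAND = 3      # lifetime similar to a VkCommandBuffer
    VULKAN = 4       # backing for VkDeviceMemory
    GL_TEXTURE = 5   # GL Texture and RenderBuffer
    GL_BUFFER = 6    # GL Buffer
    QUERY = 7        # backing for query
    DESCRIPTOR = 8   # lifetime similar to a VkDescriptorSet
    TRANSIENT = 9    # transient things that the driver needs


def node_name(mem_type):
    """Return the sysfs node name for a type: the type name in lower case."""
    return mem_type.name.lower()


# All node names a reader would look for under a <pid> directory.
ALL_NODE_NAMES = [node_name(t) for t in GpuMemType]
```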
See my point above about open source drivers and the kernel being unaware of the allocation purpose and use.
Cheers, Jérôme