Hi folks,
This is the plain text version of the previous email, in case that one was flagged as spam.
--- Background --- On downstream Android, vendors used to report GPU private memory allocations via debugfs nodes, each in their own format. However, debugfs nodes are being deprecated in the next Android release.
--- Proposal --- We are taking this opportunity to have all vendors migrate their existing debugfs nodes into a standardized sysfs node structure. The platform can then do a bunch of useful things: memory profiling, system health coverage, field metrics, local shell dump, in-app API, etc. This proposal is better served upstream, as all GPU vendors can standardize on a GPU memory structure that clients across Android and Linux can rely on, reducing fragmentation.
--- Detailed design --- The sysfs node structure looks like below: /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>, e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer", where gl_buffer is a node containing comma-separated size values: "4096,81920,...,4096".
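To make the consumer side concrete, here is a minimal userspace sketch (purely illustrative, not an existing tool; the path is just the example above) that reads one per-type node and sums the comma-separated sizes:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Sum every allocation size listed in one per-type node, e.g.
     * /sys/devices/mali0/gpu_mem/606/gl_buffer ("4096,81920,...,4096"). */
    static unsigned long long sum_type_node(const char *path)
    {
        char buf[8192];
        unsigned long long total = 0;
        FILE *f = fopen(path, "r");

        if (!f)
            return 0;
        if (fgets(buf, sizeof(buf), f)) {
            char *tok = strtok(buf, ",\n");
            for (; tok; tok = strtok(NULL, ",\n"))
                total += strtoull(tok, NULL, 10);
        }
        fclose(f);
        return total;
    }

    int main(void)
    {
        printf("gl_buffer: %llu bytes\n",
               sum_type_node("/sys/devices/mali0/gpu_mem/606/gl_buffer"));
        return 0;
    }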
For the top-level root, vendors can choose their own names based on the value of the ro.gfx.sysfs.0 property they set. (1) For multi-GPU-driver cases, we can use ro.gfx.sysfs.1 and ro.gfx.sysfs.2 for the 2nd and 3rd KMDs. (2) It's also allowed to put a sub-directory, for example "kgsl/gpu_mem" or "mali0/gpu_mem", in the ro.gfx.sysfs.<channel> property if the root name under /sys/devices/ is already created and used for other purposes.
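Purely for illustration (hypothetical values, reusing only the names already mentioned above), a device might then set these properties, e.g. in the vendor's build.prop:

    ro.gfx.sysfs.0=mali0/gpu_mem
    ro.gfx.sysfs.1=kgsl/gpu_mem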
For the 2nd level "pid", there are usually just a couple of them per snapshot, since we only take snapshots of the active ones.
For the 3rd level "type_name", the type name will be one of the GPU memory object types in lower case, and the value will be a comma-separated sequence of size values for all the allocations under that specific type.
We especially would like some comments on this part. For the GPU memory object types, we defined 9 different types for Android:
(1) UNKNOWN // not accounted for in any other category
(2) SHADER // shader binaries
(3) COMMAND // allocations which have a lifetime similar to a VkCommandBuffer
(4) VULKAN // backing for VkDeviceMemory
(5) GL_TEXTURE // GL Texture and RenderBuffer
(6) GL_BUFFER // GL Buffer
(7) QUERY // backing for query
(8) DESCRIPTOR // allocations which have a lifetime similar to a VkDescriptorSet
(9) TRANSIENT // random transient things that the driver needs
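As a sketch of how a vendor might encode these (hypothetical C, not an existing UAPI), with the lower-case strings doubling as the sysfs node names described above:

    enum gpu_mem_type {
        GPU_MEM_UNKNOWN,    /* not accounted for in any other category */
        GPU_MEM_SHADER,     /* shader binaries */
        GPU_MEM_COMMAND,    /* lifetime similar to a VkCommandBuffer */
        GPU_MEM_VULKAN,     /* backing for VkDeviceMemory */
        GPU_MEM_GL_TEXTURE, /* GL Texture and RenderBuffer */
        GPU_MEM_GL_BUFFER,  /* GL Buffer */
        GPU_MEM_QUERY,      /* backing for query */
        GPU_MEM_DESCRIPTOR, /* lifetime similar to a VkDescriptorSet */
        GPU_MEM_TRANSIENT,  /* random transient things the driver needs */
        GPU_MEM_TYPE_COUNT,
    };

    static const char * const gpu_mem_type_names[GPU_MEM_TYPE_COUNT] = {
        "unknown", "shader", "command", "vulkan", "gl_texture",
        "gl_buffer", "query", "descriptor", "transient",
    };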
We are wondering whether those type enumerations make sense on the upstream side as well, or whether we should just deal with our own type set. On the Android side, we'll just read the nodes named after the types we defined in the sysfs node structure.
Looking forward to any concerns/comments/suggestions!
Best regards, Yiwei
On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
--- Background --- On downstream Android, vendors used to report GPU private memory allocations via debugfs nodes, each in their own format. However, debugfs nodes are being deprecated in the next Android release.
Maybe explain why it is useful first?
--- Detailed design --- The sysfs node structure looks like below: /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>, e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer", where gl_buffer is a node containing comma-separated size values: "4096,81920,...,4096".
How does the kernel know what API the allocation is used for? With the open source driver you never specify what API is creating a gem object (opengl, vulkan, ...) nor what purpose (transient, shader, ...).
For the top-level root, vendors can choose their own names based on the value of the ro.gfx.sysfs.0 property they set. (1) For multi-GPU-driver cases, we can use ro.gfx.sysfs.1 and ro.gfx.sysfs.2 for the 2nd and 3rd KMDs. (2) It's also allowed to put a sub-directory, for example "kgsl/gpu_mem" or "mali0/gpu_mem", in the ro.gfx.sysfs.<channel> property if the root name under /sys/devices/ is already created and used for other purposes.
On one side you want to standardize; on the other you want to give complete freedom on the top-level naming scheme. I would rather see a consistent naming scheme (i.e. something more constrained, with little room for interpretation by individual drivers).
For the 2nd level "pid", there are usually just a couple of them per snapshot, since we only take snapshots of the active ones.
I do not understand here: you can have any number of applications with GPU objects, and thus there is no bound on the number of PIDs. Please consider desktop too; I do not know what kind of limitation Android imposes.
We are wondering whether those type enumerations make sense on the upstream side as well, or whether we should just deal with our own type set. On the Android side, we'll just read the nodes named after the types we defined in the sysfs node structure.
See my above point about the open source driver and kernel being unaware of the allocation purpose and use.
Cheers, Jérôme
On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse jglisse@redhat.com wrote:
Maybe explain why it is useful first?
Memory is precious on Android mobile platforms. Apps that use a large amount of memory, such as games, tend to maintain a per-device memory table with different prediction models. Private GPU memory allocations are currently semi-blind to the apps, and to the platform as well.
By having the data, the platform can do:
(1) GPU memory profiling as part of the huge Android profiler in progress.
(2) The Android system health team can enrich the performance test coverage.
(3) We can collect field metrics to detect any regression in GPU private memory allocations across the production population.
(4) Shell users can easily dump the allocations in a uniform way across vendors.
(5) The platform can feed the data to apps so that apps can do memory allocations in a more predictable way.
How does the kernel know what API the allocation is used for? With the open source driver you never specify what API is creating a gem object (opengl, vulkan, ...) nor what purpose (transient, shader, ...).
Oh, is it a hard requirement that the open source drivers not bookkeep any data from userland? I think the API is just some additional metadata passed down.
On one side you want to standardize; on the other you want to give complete freedom on the top-level naming scheme. I would rather see a consistent naming scheme (i.e. something more constrained, with little room for interpretation by individual drivers).
Thanks for commenting on this. We definitely need some suggestions on the root directory. In the multi-GPU case on desktop, is there an existing consumer that queries "some data" from all the GPUs? How does the tool find all GPUs and differentiate between them? Is this already standardized?
I do not understand here: you can have any number of applications with GPU objects, and thus there is no bound on the number of PIDs. Please consider desktop too; I do not know what kind of limitation Android imposes.
We are only interested in tracking *active* GPU private allocations. So yes, any application currently holding an active GPU context will probably have a node here. Since we want to do profiling for specific apps, the data has to be per-application. I don't get your concerns here. If it's about the tracking overhead, it's rare to see tons of applications doing private GPU allocations at the same time. Could you help elaborate a bit?
See my above point about the open source driver and kernel being unaware of the allocation purpose and use.
Many thanks for the reply! Yiwei
Hi Jerome and all folks,
In addition to my last reply, I just want to get some more information regarding this from the upstream side.
1. Do you think this (standardizing a way to report GPU private allocations) is going to be useful upstream as well? It grants a lot of benefits for Android, but I'd like to get an idea of the non-Android world.
2. There might be some worry that the upstream kernel driver has no idea about the API. However, to achieve good fidelity in memory reporting, we'd have to pass down certain metadata that is known only by userland. Consider this use case on the upstream side: in freedreno, for example, a memory buffer object (BO) could represent totally different things over its own lifecycle, and the KMD is not aware of that. When we'd like to take memory snapshots at a certain granularity, we have to know what that buffer represents so that the snapshot can be meaningful and useful.
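To illustrate what "passing down metadata" could look like, here is a purely hypothetical UAPI sketch (not an existing freedreno/msm ioctl; all names are made up) where userspace re-tags a BO whenever its role changes, so the KMD can bucket it for reporting:

    #include <linux/types.h>

    /* Hypothetical ioctl payload: tag an existing GEM BO with the memory
     * type it currently represents (e.g. GL_BUFFER, TRANSIENT). */
    struct hypothetical_gem_set_mem_type {
        __u32 handle;   /* GEM handle of the BO */
        __u32 mem_type; /* one of the categories proposed above */
    };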
If we just keep this Android-specific, I'd worry that some day upstream standardizes its own way of reporting this and Android vendors have to take extra effort to migrate over. This is one of the main reasons we'd like to do this on the upstream side.
Timeline-wise, Android has explicit deadlines for the next release and we have to push hard towards those. Any prompt responses are very much appreciated!
Best regards, Yiwei
Hi Yiwei,
I am not sure if you are aware, but there is an ongoing RFC on adding drm support in cgroup for the purpose of resource tracking. One of the resources is GPU memory. It's not exactly the same as what you are proposing (it doesn't track API usage, but it tracks the type of GPU memory from the KMD perspective), but perhaps it would be of interest to you. There is no consensus on it at this point.
(Sorry for being late to the discussion. I only noticed this thread when one of the emails got lucky and escaped the spam folder.)
Regards, Kenny
Hi Kenny,
Thanks for the info. Do you mind forwarding the existing discussion to me or having me cc'ed on that thread?
Best, Yiwei
Hi Yiwei,
This is the latest series: https://patchwork.kernel.org/cover/11120371/
(I still need to reply to some of the feedback.)
Regards, Kenny
Hi Yiwei,
I'd like to point out an effort to have drivers label BOs for debugging purposes: https://lists.freedesktop.org/archives/dri-devel/2019-October/239727.html
I don't know if it would work, but an obvious idea might be to use those labels for tracking the kinds of buffers - a piece of UAPI which I believe you are still missing.
Thanks, pq
Hi folks,
(Daniel, I just moved you to this thread)
Below are the latest thoughts based on all the feedback and comments.
First, I need to clarify the GPU memory object type enumeration: we don't want to enforce those enumerations across upstream and Android, and we should just leave them configurable and flexible.
Second, we want to make this effort also useful to other memory accounting tools such as PSS. At least one additional node is needed for the part of the GPU private allocations that is not mapped to userspace (invisible to PSS). This is especially critical for downstream Android so that the low-memory killer (lmkd) can be aware of the actual total memory for a process and know how much would be freed up if it kills that process. This is an effort to de-mystify the "lost RAM".
Given the above, the new node structure would look like below:
Global nodes:
/sys/devices/<root>/gpu_mem/global/total  /* Total private allocation, kept for coherency; this should also include the anonymous memory allocated in the KMD */
/sys/devices/<root>/gpu_mem/global/total_unmapped  /* Accounts for the private allocations not mapped to userspace (not visible to PSS); does not need to be coherent with the "total" node. lmkd or an equivalent PSS-based service only needs to read this node in addition. */
/sys/devices/<root>/gpu_mem/global/<type1>  /* One total value per type; this should also include the anonymous memory allocated in the KMD (or maybe a separate anonymous type for global nodes) */
/sys/devices/<root>/gpu_mem/global/<type2>  /* One total value per type */
...
/sys/devices/<root>/gpu_mem/global/<typeN>  /* One total value per type */
Per-process nodes:
/sys/devices/<root>/gpu_mem/proc/<pid>/total  /* Total private allocation, kept for coherency */
/sys/devices/<root>/gpu_mem/proc/<pid>/total_unmapped  /* Accounts for the private allocations not mapped to userspace (not visible to PSS); does not need to be coherent with the "total" node. lmkd or an equivalent PSS-based service only needs to read this node in addition. */
/sys/devices/<root>/gpu_mem/proc/<pid>/<type1>  /* One total value per type */
/sys/devices/<root>/gpu_mem/proc/<pid>/<type2>  /* One total value per type */
...
/sys/devices/<root>/gpu_mem/proc/<pid>/<typeN>  /* One total value per type */
The type1 to typeN for downstream Android will be the enumerations I mentioned in the original email, which are: unknown, shader, ..., transient. For upstream, those can be the labeled BOs or any other customized types.
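As a rough sketch of the bookkeeping a KMD might keep behind these per-process nodes (hypothetical structure, not an existing driver interface; GPU_MEM_TYPE_COUNT refers to the illustrative enum earlier in the thread):

    /* Hypothetical per-process counters backing the nodes above. */
    struct gpu_mem_proc_stats {
        int pid;                           /* 2nd-level directory name      */
        unsigned long long total;          /* .../proc/<pid>/total          */
        unsigned long long total_unmapped; /* .../proc/<pid>/total_unmapped,
                                            * the part invisible to PSS     */
        unsigned long long per_type[GPU_MEM_TYPE_COUNT]; /* one node each  */
    };

    /* lmkd-style estimate of memory freed if the process is killed:
     * estimated_free = PSS(pid) + stats.total_unmapped                    */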
Look forward to the comments and feedback!
Best regards, Yiwei
I don't think this will work well, at least for upstream:
- The labels are currently free-form; baking them back into your structure would mean we'd need to do lots of hot add/remove of sysfs directory trees. Which sounds like a real bad idea :-/
- Buffer objects aren't attached to pids, but files. And files can be shared. If we want to list this somewhere outside of debugfs, we need to tie this into the files somehow (so proc), except the underlying files are all anon inodes, so I think this gets really tricky to make work well.
Cheers, Daniel
Hi Daniel,
- The labels are currently free-form; baking them back into your structure would mean we'd need to do lots of hot add/remove of sysfs directory trees. Which sounds like a real bad idea :-/
Given the free form of that ioctl, what's the plan for using it and reporting the labeled BOs? Do you think the upstream kernel needs to have certain resource-category-based tracking for GPU private allocations?
- Buffer objects aren't attached to pids, but files. And files can be shared. If we want to list this somewhere outside of debugfs, we need to tie this into the files somehow (so proc), except the underlying files are all anon inodes, so I think this gets really tricky to make work well.
So there aren't any GPU private allocations on the upstream side? How does upstream deal with duplicate accounting for shared memory?
Best, Yiwei
On Tue, Nov 05, 2019 at 11:45:28AM -0800, Yiwei Zhang wrote:
Hi Daniel,
- The labels are currently free-form; baking them back into your structure would mean we'd need to do lots of hot add/remove of sysfs directory trees. Which sounds like a real bad idea :-/
Given the free form of that ioctl, what's the plan for using it and reporting the labeled BOs? Do you think the upstream kernel needs to have certain resource-category-based tracking for GPU private allocations?
There's no plan; we simply didn't consider more standardized buckets when adding that label support. So yeah, not sure what to do now, except I don't want two different ways of labelling buffers.
- Buffer objects aren't attached to pids, but files. And files can be shared. If we want to list this somewhere outside of debugfs, we need to tie this into the files somehow (so proc), except the underlying files are all anon inodes, so I think this gets really tricky to make work well.
So there aren't any GPU private allocations on the upstream side? How does upstream deal with duplicate accounting for shared memory?
Atm we don't account GPU memory anywhere at all. There's a lot of discussion going on about how to remedy that in the context of a cgroup controller, and how to account allocated buffers against processes is a huge deal. Maybe cgroups are more the kind of control/reporting you're looking for? Of course that would mean that Android creates a cgroup for each app. -Daniel
On Tue, Nov 5, 2019 at 1:47 AM Daniel Vetter daniel@ffwll.ch wrote:
I don't think this will work well, at least for upstream:
- The labels are currently free-form; baking them back into your structure would mean we'd need to do lots of hot add/remove of sysfs directory trees. Which sounds like a real bad idea :-/
Also, a BO's label can change over time if it is re-used for a different purpose. Not sure what the overhead is for sysfs add/remove, but I don't think I want that overhead in the bo_reuse path.
(maybe that matters less for vk, where we aren't using a userspace bo cache)
BR, -R
- Buffer objects aren't attached to pids, but files. And files can be shared. If we want to list this somewhere outside of debugfs, we need to tie this into the files somehow (so proc), except the underlying files are all anon inodes, so this gets really tricky I think to make work well.
Cheers, Daniel
Best regards, Yiwei
On Fri, Nov 1, 2019 at 1:37 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Thu, 31 Oct 2019 13:57:00 -0400 Kenny Ho y2kenny@gmail.com wrote:
Hi Yiwei,
This is the latest series: https://patchwork.kernel.org/cover/11120371/
(I still need to reply some of the feedback.)
Regards, Kenny
On Thu, Oct 31, 2019 at 12:59 PM Yiwei Zhang zzyiwei@google.com wrote:
Hi Kenny,
Thanks for the info. Do you mind forwarding the existing discussion to me or have me cc'ed in that thread?
Best, Yiwei
On Wed, Oct 30, 2019 at 10:23 PM Kenny Ho y2kenny@gmail.com wrote:
Hi Yiwei,
I am not sure if you are aware, but there is an ongoing RFC on adding drm support in cgroup for the purpose of resource tracking. One of the resources is GPU memory. It's not exactly the same as what you are proposing (it doesn't track API usage, but it tracks the type of GPU memory from the kmd perspective), but perhaps it would be of interest to you. There is no consensus on it at this point.
Hi Yiwei,
I'd like to point out an effort to have drivers label BOs for debugging purposes: https://lists.freedesktop.org/archives/dri-devel/2019-October/239727.html
I don't know if it would work, but an obvious idea might be to use those labels for tracking the kinds of buffers - a piece of UAPI which I believe you are still missing.
Thanks, pq
For the sysfs approach, I'm assuming the upstream vendors still need to provide a UMD/KMD pair, and that this ioctl to label the BO is kept as a driver-private ioctl. Would each driver then just define its own set of "label"s, with the KMD only consuming the ones it recognizes, so that the sysfs nodes don't change at all? It would report zero if there's no allocation or re-use under a particular "label".
A separate thought: do GPU memory allocations deserve a node under /proc/<pid> for per-process tracking? If the structure can stay similar to "maps" or "smaps", then we can bookkeep all BOs with a label easily. For the multi-GPU scenario, maybe have something like "/proc/<pid>/gpu_mem/<gpu_id>/maps" along with a global table somewhere specifying the {gpu_id, device_name} pairs. The global GPU allocation summary info would still live under "/sys/devices/<device_name>/gpu_mem/". How difficult would it be to define such a procfs node structure? Just curious.
Thanks for all the comments and replies!
Best regards, Yiwei
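As a purely illustrative aside on the "driver private ioctl" wording above: a label ioctl for a GEM object could look roughly like the sketch below. The struct, field names and ioctl number are invented for this sketch and do not correspond to any existing driver's uapi; the BO-labeling series linked earlier defines its own interface.

/* Hypothetical driver-private "label a BO" ioctl, for illustration only. */
#include <linux/types.h>
#include <linux/ioctl.h>

#define FOO_GEM_LABEL_MAX 64

struct drm_foo_gem_set_label {
	__u32 handle;                   /* GEM handle returned at allocation time */
	__u32 pad;                      /* keep the struct 64-bit aligned */
	char  label[FOO_GEM_LABEL_MAX]; /* e.g. "shader", "gl_texture", "transient" */
};

/* Made-up ioctl type/number purely for the sketch. */
#define DRM_IOCTL_FOO_GEM_SET_LABEL \
	_IOW('d', 0x42, struct drm_foo_gem_set_label)

The KMD would then map the labels it recognizes onto its fixed set of sysfs type nodes and, as suggested above, report zero for any type with no live allocations, so the node layout itself never changes.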
Hi folks,
What do you think about:
For the sysfs approach, I'm assuming the upstream vendors still need to provide a UMD/KMD pair, and that this ioctl to label the BO is kept as a driver-private ioctl. Would each driver then just define its own set of "label"s, with the KMD only consuming the ones it recognizes, so that the sysfs nodes don't change at all? It would report zero if there's no allocation or re-use under a particular "label".
Best, Yiwei
To me this looks like a way to abuse the kernel into providing a specific message-passing API between processes only for GPU. It would be better to use an existing kernel/userspace API to pass messages between processes than to add a new one just for a special case.
Note that I believe listing GPU allocations for a process might be useful, but only if it is a generic thing across all GPUs (for upstream GPU drivers we do not care about non-upstream).
Cheers, Jérôme
Thanks for all the comments and feedback, and they are all so valuable to me.
Let me summarize the main concerns so far here:
(1) Open source drivers never specify which API created a gem object (OpenGL, Vulkan, ...) nor for what purpose (transient, shader, ...).
(2) There is an ioctl to attach an arbitrary label to a BO, and the label can change over the BO's lifetime: https://patchwork.kernel.org/patch/11185643/
(3) BOs are not attached to pids, but to files, and can be shared.
Besides the discussions here, there was a lot of internal discussion about this proposal as well. The general principle is that I'll align my proposal with what exists upstream, so as to help the Android common kernel stay close to the upstream kernel for the sake of future graphics driver architecture.
I think tracking BOs per process would be a good thing upstream as well. Some of the GPU-addressable memory may have been mapped to userspace, which is visible in RSS. However, tools consuming RSS data can benefit more from knowing the amount of GPU memory which is not mapped. It's a good thing for per-process memory accounting.
BOs upstream are not equivalent to what's on Android today. Android GPU memory objects are purely private and thus indexed by pid, and shared memory is allocated through the ion/dmabuf interface. ion/dmabuf is similar to the upstream BO, except that GEM BOs may just be an anon inode without an fd before sharing. For Android ion/dmabuf accounting, there was already an effort to improve dma-buf tracking (https://patchwork.kernel.org/cover/10831029/), and there's a userspace API built on top of the "proc/<pid>/fdinfo" node (https://android.googlesource.com/platform/system/core/+/refs/heads/master/li...).
Is it reasonable to add another ioctl, or something equivalent, to label a BO with the PID that makes the allocation? When the BO gets shared to other processes, this information also needs to be bookkept somewhere for tracking. Basically I wonder whether it's possible for upstream to track BOs in a similar way to how Android tracks dmabuf. Then there would be a node, implemented by cgroup in proc, listing all the BOs per process with information like label, refcount, etc. Android GPU vendors can then implement the same nodes, which will stay compatible even if they later adopt the drm subsystem.
So my sketch idea for the nodes is:
(1) /proc/gpu0_meminfo, /proc/gpu1_meminfo: a list of all BOs, the pids holding a reference to each, and each BO's current label.
(2) /proc/<pid>/gpu0_meminfo, /proc/<pid>/gpu1_meminfo: a list of all the BOs this process holds a reference to.
(3) Is it reasonable to implement additional nodes for the {total, total_unmapped} counters, or should those just be surfaced through /proc/meminfo?
Many thanks for the feedback! Yiwei
Hi folks,
Would we be able to track the below for each of the graphics kmds:
(1) Global total memory
(2) Per-process total memory
(3) Per-process total memory not mapped to userland -> when it's mapped it's shown in RSS, so this is to help complete the picture of RSS
Would this be better reported under each kmd's device node, or in /proc or /sys? Any draft ideas or concerns are very welcome!
As for the previous detailed tracking of the userland contexts, on downstream Android we'll add a HAL to report memory data for those detailed categories.
Thanks for all the info and comments so far! Look forward to better ideas as well!
Best regards, Yiwei
Hey
Is it reasonable to add another ioctl, or something equivalent, to label a BO with the PID that makes the allocation? When the BO gets shared to other processes, this information also needs to be bookkept somewhere for tracking. Basically I wonder whether it's possible for upstream to track BOs in a similar way to how Android tracks dmabuf. Then there would be a node, implemented by cgroup in proc, listing all the BOs per process with information like label, refcount, etc. Android GPU vendors can then implement the same nodes, which will stay compatible even if they later adopt the drm subsystem.
So my sketch idea for the nodes is: (1) /proc/gpu0_meminfo, /proc/gpu1_meminfo: a list of all BOs, the pids holding a reference to each, and each BO's current label. (2) /proc/<pid>/gpu0_meminfo, /proc/<pid>/gpu1_meminfo: a list of all the BOs this process holds a reference to. (3) Is it reasonable to implement additional nodes for the {total, total_unmapped} counters, or should those just be surfaced through /proc/meminfo?
This would be tricky to implement because:
(1) PIDs are not unique; PID namespaces allow Linux userspace processes to potentially share the same PID.
(2) Specifically in the case of mesa, there isn't a way to (AFAIK) associate a BO with a PID.
Cheers Rohan Garg
Hi Rohan,
Thanks for pointing out the pids issue! Then the index would be {namespace + pid (in that namespace)}. I'll grab a setup and play with the driver to see what I can do. I know how to find an Intel or Freedreno setup, but I'd still like to know: is there a development-friendly Mali setup?
Many many thanks for all the feedback! Yiwei
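To make the {namespace + pid} idea concrete, here is a small kernel-side sketch of recording a namespace-qualified tgid at allocation time instead of a raw global pid. The gpu_mem_owner struct is invented for illustration, and a real implementation would need to hold a reference on the namespace (or key on something stable like the drm file instead).

#include <linux/sched.h>
#include <linux/pid_namespace.h>

/* Invented bookkeeping record, for illustration only. */
struct gpu_mem_owner {
	struct pid_namespace *ns; /* namespace the tgid below is relative to */
	pid_t tgid;               /* thread-group id as seen in that namespace */
};

static void gpu_mem_record_owner(struct gpu_mem_owner *owner)
{
	struct pid_namespace *ns = task_active_pid_ns(current);

	owner->ns = ns;
	/* tgid relative to the task's own namespace, not the global init ns */
	owner->tgid = task_tgid_nr_ns(current, ns);
}

Note that a pid recorded this way is only unambiguous for readers inside the same namespace, which echoes Rohan's caveat above.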
Hi Yiwei
On Thursday, 19 December 2019 19:52:26 (CET) Yiwei Zhang wrote:
Hi Rohan,
Thanks for pointing out the pids issue! Then the index would be {namespace + pid (in that namespace)}. I'll grab a setup and play with the driver to see what I can do. I know how to find an Intel or Freedreno setup, but I'd still like to know: is there a development-friendly Mali setup?
You should be able to set up a Mali T860 compatible device with this guide [1].
Cheers Rohan Garg
[1] https://panfrost.freedesktop.org/building-panfrost-mesa.html
Thanks, I'll check it out.
Best, Yiwei
Hi Yiwei
After some deliberation on how to move forward with my BO labeling patches [1], we've come up with the following structure for debugfs entries:
/debugfs/dri/128/bo/<handle>/label
/debugfs/dri/128/bo/<handle>/size
My initial idea was to count the total memory allocated for a particular label in kernel space, but that turned out to be far too complicated to implement, which is why we decided to move towards something simpler and handle collating this information on the userspace side of things.
Would this satisfy most of the Android team's requirements? I understand that it would leave out the memory-tracking requirements tied to a specific PID, but, correct me if I'm wrong, would this not be possible with gralloc on Android?
Cheers Rohan Garg
[1] https://patchwork.freedesktop.org/patch/335508/?series=66752&rev=4
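A sketch of the userspace-side collation Rohan describes, assuming debugfs is mounted at /sys/kernel/debug and the per-BO label/size layout is exactly as listed above (the dri minor "128" is just the example number from the mail):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int read_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

int main(void)
{
	const char *base = "/sys/kernel/debug/dri/128/bo"; /* assumed mount point */
	char path[512], label[256], size_str[64];
	unsigned long long total = 0;
	struct dirent *de;
	DIR *d = opendir(base);

	if (!d) {
		perror("opendir");
		return 1;
	}
	while ((de = readdir(d))) {
		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s/label", base, de->d_name);
		if (read_line(path, label, sizeof(label)))
			continue;
		snprintf(path, sizeof(path), "%s/%s/size", base, de->d_name);
		if (read_line(path, size_str, sizeof(size_str)))
			continue;
		total += strtoull(size_str, NULL, 0);
		printf("bo %s: %s bytes (%s)\n", de->d_name, size_str, label);
	}
	closedir(d);
	printf("total: %llu bytes\n", total);
	return 0;
}

Summing per label instead of overall is a small extension; the point is that all grouping happens in userspace, as Rohan suggests.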
Hi Rohan,
Thanks for your reply! We'd like the standardization to happen in the drm layer so it can be carried along; however, debugfs has already been deprecated in the Android kernel, and tracking per pid is most likely only doable in the device driver layer. So, since we'd still like a low-cost run-time query system, we eventually ended up doing the tracepoint [1] + eBPF solution. We standardized a GPU memory total tracepoint in upstream Linux. GPU vendors then integrate it into their kernel drivers to track the global and per-process total counters. We then wrote a BPF C program that hooks this tracepoint and maintains a map for userspace to poke at.
Best regards, Yiwei
[1] https://lore.kernel.org/lkml/20200302235044.59163-1-zzyiwei@google.com
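For reference, a rough sketch of what the BPF side of that solution can look like. It assumes the gpu_mem_total tracepoint from the link above, with (gpu_id, pid, size) arguments, pid 0 carrying the per-GPU global total per the patch description, and the usual 8 bytes of common tracepoint fields in front; the context layout below is therefore an assumption, not a guarantee.

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Assumed layout of the gpu_mem_total tracepoint context. */
struct gpu_mem_total_args {
	__u64 common;  /* common tracepoint fields, assumed 8 bytes */
	__u32 gpu_id;
	__u32 pid;     /* 0 is assumed to mean the per-GPU global total */
	__u64 size;
};

struct gpu_mem_key {
	__u32 gpu_id;
	__u32 pid;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, struct gpu_mem_key);
	__type(value, __u64);
} gpu_mem_total_map SEC(".maps");

SEC("tracepoint/gpu_mem/gpu_mem_total")
int tp_gpu_mem_total(struct gpu_mem_total_args *ctx)
{
	struct gpu_mem_key key = {
		.gpu_id = ctx->gpu_id,
		.pid = ctx->pid,
	};
	__u64 size = ctx->size;

	/* Keep only the most recent total reported by the driver. */
	bpf_map_update_elem(&gpu_mem_total_map, &key, &size, BPF_ANY);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Userspace can then read the map keyed by (gpu_id, pid) whenever it needs a snapshot, instead of the kernel exposing per-process files.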
On Mon, 28 Oct 2019 11:33:57 -0700 Yiwei Zhang zzyiwei@google.com wrote:
On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse jglisse@redhat.com wrote:
On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
Hi folks,
This is the plain text version of the previous email in case that was considered as spam.
Hi,
you still had a HTML attachment. More comments below.
--- Background --- On the downstream Android, vendors used to report GPU private memory allocations with debugfs nodes in their own formats. However, debugfs nodes are getting deprecated in the next Android release.
...
For the 2nd level "pid", there are usually just a couple of them per snapshot, since we only takes snapshot for the active ones.
I do not understand here: you can have any number of applications with GPU objects, and thus there is no bound on the number of PIDs. Please consider desktop too; I do not know what kind of limitations Android imposes.
We are only interested in tracking *active* GPU private allocations. So yes, any application currently holding an active GPU context will probably have a node here. Since we want to do profiling for specific apps, the data has to be per-application. I don't get your concerns here. If it's about the tracking overhead, it's rare to see tons of applications doing private GPU allocations at the same time. Could you help elaborate a bit?
Toolkits for the Linux desktop, at least GTK 4, are moving to GPU-accelerated rendering by default AFAIK. This means that every application using such toolkit will have an active GPU context created and used at all times. So potentially every single end user application running in a system may have a GPU context, even a simple text editor.
In my opinion tracking per process is good, but you cannot sidestep the question of tracking performance by saying that there is only few processes using the GPU.
What is an "active" GPU private allocation? This implies that there are also inactive allocations, what are those?
Let's say you have a bunch of apps and the memory consumption is spread out into sysfs files like you propose. How would one get a coherent view of total GPU private memory usage in a system? Iterating through all sysfs files in userspace and summing up won't work, because each file will be sampled at a different time, which means the result is not coherent. Separate files for accumulated statistics perhaps?
What about getting a coherent view of the total GPU private memory consumption of a single process? I think the same caveat and solution would apply.
Thanks, pq
Hi folks,
Didn't realize gmail has a plain text mode ; )
In my opinion tracking per process is good, but you cannot sidestep the question of tracking performance by saying that there is only few processes using the GPU.
Agreed, I shouldn't make that statement. Thanks for the info as well!
What is an "active" GPU private allocation? This implies that there are also inactive allocations, what are those?
"active" is used to claim that we don't track the allocation history. We just want the currently allocated memory.
What about getting a coherent view of the total GPU private memory consumption of a single process? I think the same caveat and solution would apply.
Realistically I assume drivers won't change the values during a snapshot call? But adding one more node per process for the total GPU private memory allocated would also be good for enforcing coherency in tests. I'd suggest an additional "/sys/devices/<some TBD root>/<pid>/gpu_mem/total" node.
Best, Yiwei
What about getting a coherent view of the total GPU private memory consumption of a single process? I think the same caveat and solution would apply.
For the coherency issue, now I understand your concerns. Let me re-think and come back. A total value per process is an option if we'd like precise total GPU private memory per process. We'll check if there're other options as well. Thanks for pointing this out!
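One way a reader could use the extra per-process "total" node to get a reasonably coherent snapshot, sketched under the assumption of the node layout proposed earlier in the thread (paths, pid and type names are placeholders):

#include <stdio.h>
#include <stdlib.h>

static long long read_ull(const char *path)
{
	FILE *f = fopen(path, "r");
	long long v = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%lld", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}

int main(int argc, char **argv)
{
	const char *types[] = { "unknown", "shader", "command", "vulkan",
				"gl_texture", "gl_buffer", "query",
				"descriptor", "transient" };
	const char *base = argc > 1 ? argv[1]
				    : "/sys/devices/mali0/gpu_mem/proc/1234";
	char path[512];
	int attempt;

	for (attempt = 0; attempt < 5; attempt++) {
		long long before, after, sum = 0;
		size_t i;

		snprintf(path, sizeof(path), "%s/total", base);
		before = read_ull(path);

		for (i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
			long long v;

			snprintf(path, sizeof(path), "%s/%s", base, types[i]);
			v = read_ull(path);
			if (v > 0)
				sum += v;
		}

		snprintf(path, sizeof(path), "%s/total", base);
		after = read_ull(path);

		/* If the total did not move while the per-type nodes were
		 * sampled, treat the snapshot as coherent; otherwise retry. */
		if (before >= 0 && before == after) {
			printf("total=%lld sum_of_types=%lld\n", after, sum);
			return 0;
		}
	}
	fprintf(stderr, "no stable snapshot after 5 attempts\n");
	return 1;
}

This is only a heuristic: it bounds the skew between the per-type reads rather than eliminating it, which is roughly what the "total node for coherency" suggestion above is aiming for.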