Since this also involves the kernel, let's add dri-devel ...
On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
Hi all,
As many of you already know, I've been working with James Jones on the Generic Device Allocator project lately. He started a discussion thread some weeks ago seeking feedback on the current prototype of the library and advice on how to move all this forward, from a prototype stage to production. For further reference, see:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
From the thread above, we came up with some very interesting high-level design ideas for one of the parts still missing from the library: usage transitions. That's something I'll personally work on over the coming weeks.
In the meantime, I've been putting together an open source implementation of the allocator mechanisms on top of the Nouveau driver so that everyone can play with it.
Below I'm seeking feedback on a bunch of changes I had to make to different components of the graphics stack:
** Allocator **
An allocator driver implementation on top of Nouveau. The current implementation only handles pitch-linear layouts, but that's enough to get the kmscube port working with the allocator and Nouveau drivers.
You can pull these changes from
https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
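For reference, a pitch-linear layout stores each row of pixels contiguously, with consecutive rows starting `pitch` bytes apart, so locating a pixel is a single multiply-add. Tiled layouts rearrange bytes within tiles and need extra layout information. A minimal sketch of the linear case, assuming a single-plane format with `cpp` bytes per pixel (the helper name is illustrative, not taken from the allocator code):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of pixel (x, y) from the start of a pitch-linear plane.
 * pitch: bytes from the start of one row to the start of the next; it can
 *        exceed width * cpp when rows are padded for alignment.
 * cpp:   bytes per pixel, e.g. 4 for XRGB8888. */
static size_t pitch_linear_offset(uint32_t x, uint32_t y,
                                  uint32_t pitch, uint32_t cpp)
{
    return (size_t)y * pitch + (size_t)x * cpp;
}
```

For example, pixel (10, 2) of an XRGB8888 surface with a 7680-byte pitch sits at byte 2 * 7680 + 10 * 4 = 15400.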
** Mesa **
James's kmscube port to the allocator relies on the EXT_external_objects extension to import allocator allocations into OpenGL as texture objects. However, Mesa's Nouveau driver doesn't implement these mechanisms yet, so I went ahead and added them.
You can pull these changes from
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
Also, James's kmscube port uses the NVX_unix_allocator_import extension to attach allocator metadata to texture objects so the driver knows how to deal with the imported memory.
Note that there isn't a formal spec for this extension yet. For now, it just serves as an experimental mechanism to import allocator memory in OpenGL, and attach metadata to texture objects.
You can pull these changes (written on top of the above) from:
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
** kmscube **
Mostly minor fixes and improvements on top of James's allocator port. The main change is that the allocator initialization path falls back to EGL_MESA_platform_surfaceless if the underlying EGL implementation doesn't support the EGLDevice platform.
You can pull these changes from:
https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
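The fallback boils down to an extension-string check. The sketch below shows only that selection logic, assuming the client extension string has already been obtained (in EGL it comes from `eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS)` when `EGL_EXT_client_extensions` is supported). The helper names are hypothetical, and the actual kmscube changes may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` appears as a whole, space-separated token in `list`.
 * A bare strstr() would also match a prefix of a longer extension name. */
static bool has_extension(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    if (list == NULL || len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len; /* keep searching past this partial match */
    }
    return false;
}

/* Prefer the EGLDevice platform; fall back to the surfaceless platform. */
static const char *pick_platform(const char *client_extensions)
{
    if (has_extension(client_extensions, "EGL_EXT_platform_device"))
        return "EGL_EXT_platform_device";
    if (has_extension(client_extensions, "EGL_MESA_platform_surfaceless"))
        return "EGL_MESA_platform_surfaceless";
    return NULL; /* neither platform is available */
}
```

Matching whole tokens rather than substrings matters because extension names can be prefixes of one another.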
With all the above you should be able to get kmscube working using the allocator on top of the Nouveau driver.
Another piece still missing before we can move this to production is importing allocations as DRM FB objects. This is probably one of the most sensitive parts of the project, as it requires modifying or adding kernel driver interfaces.
At XDC2017, James had hallway conversations about this with several people, all of whom had different opinions. I'd also like to take this opportunity to start a discussion about the best way to get allocator allocations added as DRM FB objects.
These are the few options we've considered to start with:
A) Have vendor-private ioctls to set properties on GEM objects that are inherited by the FB objects. This is how our (NVIDIA) desktop DRM driver currently works. It would require every vendor to add their own ioctl to process allocator metadata, but the metadata is actually a vendor-agnostic object, more like DRM modifiers. We'd like to come up with a vendor-agnostic solution that can be integrated into core DRM.
B) Add a new drmModeAddFBWithMetadata() command that takes allocator metadata blobs for each plane of the FB. Some people in the community have mentioned this is their preferred design. This, however, means we'd have to go through the exercise of adding another metadata mechanism to the whole graphics stack.
C) Shove allocator metadata into DRM by defining it to be a separate plane in the image, and use the existing DRM modifiers mechanism to indicate that there is another plane for each "real" plane added. It isn't clear how this scales to surfaces that already need several planes, but some people see this as the only way forward. Also, we would have to create a separate GEM buffer for the metadata itself, which seems excessive.
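One concrete way to see the scaling concern: the existing AddFB2 interface carries at most four planes per framebuffer. Assuming option C adds exactly one metadata plane per real plane, a single-plane RGB surface or a two-plane format such as NV12 still fits, but a three-plane format such as YUV420 would need six slots. A small sketch of that count (the helper is illustrative only):

```c
#include <stdbool.h>

/* struct drm_mode_fb_cmd2 (used by drmModeAddFB2()) has room for at most
 * four planes: handles[4], pitches[4], offsets[4] and modifier[4]. */
#define FB_MAX_PLANES 4u

/* Option C pairs every "real" plane with one metadata plane, so the number
 * of plane slots needed doubles. */
static bool fits_with_metadata_planes(unsigned real_planes)
{
    return real_planes * 2u <= FB_MAX_PLANES;
}
```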
We personally like option (B) better, and have already started to prototype the new path (which is actually very similar to the drmModeAddFB2() one). You can take a look at the new interfaces here:
https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
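For comparison with the existing path: `struct drm_mode_fb_cmd2` already carries per-plane handles, pitches, offsets and a 64-bit modifier. A per-plane metadata blob could be added roughly as below. This is a hypothetical layout written for illustration (the struct, field and function names and the size cap are invented here), not the interface in the branch linked above. As is common in DRM uAPI, user pointers are passed as 64-bit integers.

```c
#include <stdbool.h>
#include <stdint.h>

#define FB_MAX_PLANES 4
#define HYPOTHETICAL_MAX_BLOB_SIZE 4096u /* arbitrary cap for this sketch */

/* Hypothetical ioctl argument: the same fields as struct drm_mode_fb_cmd2,
 * with the per-plane modifier replaced by a pointer/size pair that refers
 * to an opaque allocator metadata blob. */
struct hypothetical_fb_cmd_with_metadata {
    uint32_t fb_id;                        /* out: new framebuffer ID */
    uint32_t width, height;
    uint32_t pixel_format;                 /* fourcc, as in AddFB2 */
    uint32_t flags;
    uint32_t handles[FB_MAX_PLANES];       /* GEM handles, 0 = unused plane */
    uint32_t pitches[FB_MAX_PLANES];
    uint32_t offsets[FB_MAX_PLANES];
    uint64_t metadata_ptr[FB_MAX_PLANES];  /* user pointer to the blob */
    uint32_t metadata_size[FB_MAX_PLANES]; /* blob size in bytes */
};

/* Sanity checks a kernel might run before copying the blobs in: plane 0
 * must be used, every used plane needs a blob of reasonable size, and
 * unused planes must not carry one. */
static bool hypothetical_validate(const struct hypothetical_fb_cmd_with_metadata *cmd)
{
    if (cmd->handles[0] == 0)
        return false;

    for (int i = 0; i < FB_MAX_PLANES; i++) {
        bool used = cmd->handles[i] != 0;
        bool has_blob = cmd->metadata_ptr[i] != 0 && cmd->metadata_size[i] != 0;

        if (used && (!has_blob || cmd->metadata_size[i] > HYPOTHETICAL_MAX_BLOB_SIZE))
            return false;
        if (!used && (cmd->metadata_ptr[i] != 0 || cmd->metadata_size[i] != 0))
            return false;
    }
    return true;
}
```

The kernel would still have to interpret each blob, which is where the vendor-agnostic format discussed in this thread comes in.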
There may be other options that haven't been explored yet that could be a better choice than the above, so any suggestion will be greatly appreciated.
What kind of metadata are we talking about here? Addfb has tons of stuff already that's "metadata". The only thing I've spotted is PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell userspace, but definitely not something addfb ever needs. addfb only needs the resulting pitch that we actually allocated (and might decide it doesn't like that, but that's a different issue).
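Daniel's distinction, made concrete: pitch alignment is an input the allocator consumes, and the only thing addfb would see is the resulting pitch. The 256-byte alignment used below is an assumed example value, not a Nouveau requirement, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to a multiple of `align`; align must be a power of two. */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* The alignment constraint is applied at allocation time; only the result
 * would be handed to addfb (e.g. as pitches[0] of drm_mode_fb_cmd2). */
static uint32_t pitch_for(uint32_t width, uint32_t cpp, uint32_t pitch_alignment)
{
    return align_up(width * cpp, pitch_alignment);
}
```

With a 256-byte alignment, a 1366-pixel-wide XRGB8888 surface needs 5464 bytes per row, which rounds up to a 5632-byte pitch. That pitch, not the alignment, is what addfb receives.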
And since there are no patches for nouveau itself, I can't really say anything beyond that.
-Daniel
On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter daniel@ffwll.ch wrote:
What kind of metadata are we talking about here? Addfb has tons of stuff already that's "metadata". The only thing I've spotted is PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell userspace, but definitely not something addfb ever needs. addfb only needs the resulting pitch that we actually allocated (and might decide it doesn't like that, but that's a different issue).
And since there are no patches for nouveau itself, I can't really say anything beyond that.
I'd like to see concrete examples of actual display controllers that support more format layouts than can be specified with a 64-bit modifier.
Kristian
-Daniel
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Inline.
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg hoegsberg@gmail.com wrote:
On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter daniel@ffwll.ch wrote:
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
What kind of metadata are we talking about here? Addfb has tons of stuff already that's "metadata". The only thing I've spotted is PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell userspace, but definitely not something addfb ever needs. addfb only needs the resulting pitch that we actually allocated (and might decide it doesn't like that, but that's a different issue).
Sorry, I should have made that clearer. Metadata here refers to all the allocation parameters the generic allocator was given to allocate memory. That currently means the final capability set used for the allocation, including all constraints (such as memory alignment, pitch alignment, and others) and capabilities describing allocation properties like tiling formats, compression, and so on.
And since there are no patches for nouveau itself, I can't really say anything beyond that.
I can work on implementing these interfaces for nouveau, maybe partially, if that would help. I just thought it'd be better to first discuss the right way to pass allocator metadata to display drivers before seriously implementing any of the proposed options.
I'd like to see concrete examples of actual display controllers that support more format layouts than can be specified with a 64-bit modifier.
The main problem is that our tiling and other metadata parameters generally can't fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.
Thanks.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
I'd like to see concrete examples of actual display controllers that support more format layouts than can be specified with a 64-bit modifier.
The main problem is that our tiling and other metadata parameters generally can't fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.
I understand that you may have n knobs totaling more than 56 bits that configure your tiling/swizzling for color buffers. What I don't buy is that you need all those combinations when passing buffers around between codecs, cameras and display controllers. Even if you're sharing between the same 3D drivers in different processes, I expect that just locking down, say, 64 different combinations (you can add more over time) and assigning each one a modifier would be sufficient. I doubt you'd extract meaningful performance gains from going all the way to a blob.
If you want us to redesign KMS and the rest of the ecosystem around blobs instead of the modifiers that are now moderately pervasive, you have to justify it a little better than just "we didn't find it suitable".
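Kristian's suggestion amounts to a small lookup table: enumerate the layouts that genuinely need to be shared and give each one a modifier value. A minimal sketch, assuming the `fourcc_mod_code()` convention from `drm_fourcc.h` (vendor ID in the top 8 bits, 56 bits left for the vendor). The layout fields, table entries and vendor ID below are placeholders, not real NVIDIA values.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors fourcc_mod_code(): vendor in bits 63..56, 56 bits for the vendor. */
#define MOD_VENDOR_SHIFT 56
#define MOD_VALUE_MASK   0x00ffffffffffffffULL
#define MOD_INVALID      0x00ffffffffffffffULL /* same value as DRM_FORMAT_MOD_INVALID */
#define EXAMPLE_VENDOR   0x42ULL              /* placeholder vendor ID */

/* Placeholder description of one tiled layout. */
struct layout {
    uint8_t kind;          /* PTE-level tiling/compression kind */
    uint8_t block_h_log2;  /* macro block height, log2 */
    uint8_t block_d_log2;  /* macro block depth, log2 */
};

/* The curated set of shareable layouts; the table index is the modifier
 * value, so new entries can be appended over time without renumbering. */
static const struct layout shared_layouts[] = {
    { 0x01, 4, 0 },
    { 0x01, 3, 0 },
    { 0x02, 4, 0 },
};
#define NUM_SHARED (sizeof(shared_layouts) / sizeof(shared_layouts[0]))

static uint64_t layout_to_modifier(struct layout l)
{
    for (size_t i = 0; i < NUM_SHARED; i++) {
        const struct layout *s = &shared_layouts[i];
        if (s->kind == l.kind && s->block_h_log2 == l.block_h_log2 &&
            s->block_d_log2 == l.block_d_log2)
            return (EXAMPLE_VENDOR << MOD_VENDOR_SHIFT) | (i & MOD_VALUE_MASK);
    }
    return MOD_INVALID; /* not in the shared set: keep it driver-private */
}

/* Returns the layout for `mod`, or NULL if the modifier is not ours. */
static const struct layout *modifier_to_layout(uint64_t mod)
{
    uint64_t index = mod & MOD_VALUE_MASK;

    if ((mod >> MOD_VENDOR_SHIFT) != EXAMPLE_VENDOR || index >= NUM_SHARED)
        return NULL;
    return &shared_layouts[index];
}
```

Layouts outside the table can still be used inside one driver. They just never cross a process or device boundary.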
Kristian
On Wed, Dec 20, 2017 at 6:22 PM, Kristian Kristensen hoegsberg@google.com wrote:
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg hoegsberg@gmail.com wrote:
I'd like to see concrete examples of actual display controllers that support more format layouts than can be specified with a 64-bit modifier.
The main problem is that our tiling and other metadata parameters generally can't fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.
I understand that you may have n knobs totaling more than 56 bits that configure your tiling/swizzling for color buffers. What I don't buy is that you need all those combinations when passing buffers around between codecs, cameras and display controllers. Even if you're sharing between the same 3D drivers in different processes, I expect that just locking down, say, 64 different combinations (you can add more over time) and assigning each one a modifier would be sufficient. I doubt you'd extract meaningful performance gains from going all the way to a blob.
There's probably a world of stuff that we don't know about in nouveau, but I have a hard time coming up with more than 64 bits' worth of tiling info for dGPU surfaces...
There are 8 bits (sorta; not fully populated, but might as well use them) of "micro" tiling, which is done at the PTE level by the memory controller and includes compression settings, and then there are 4 bits of tiling per dimension for macro blocks (which set the tile size in each dimension); that's only 20 bits. The MSAA level (which is usually part of the micro tiling setting, but doesn't necessarily have to be) is another couple of bits, plus maybe something else weird for another few bits. Anyway, this is *nowhere* close to 64 bits.
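Ilia's estimate can be checked by packing the fields into the 56 vendor bits of a modifier. The sketch assumes three macro-block dimensions (which is how 8 bits plus 4 bits per dimension comes to 20) and two bits for MSAA; the field order and names are made up for illustration.

```c
#include <stdint.h>

/* Field widths from the estimate above. */
#define KIND_BITS  8  /* PTE-level "micro" tiling + compression */
#define BLOCK_BITS 4  /* macro block size, per dimension */
#define MSAA_BITS  2  /* "another couple of bits" */

/* Bit positions, lowest field first. */
#define KIND_SHIFT 0
#define BX_SHIFT   (KIND_SHIFT + KIND_BITS)
#define BY_SHIFT   (BX_SHIFT + BLOCK_BITS)
#define BZ_SHIFT   (BY_SHIFT + BLOCK_BITS)
#define MSAA_SHIFT (BZ_SHIFT + BLOCK_BITS)
#define LAYOUT_BITS_USED (MSAA_SHIFT + MSAA_BITS) /* 22 of the 56 vendor bits */

/* Pack one layout description; each argument is masked to its field width. */
static uint64_t pack_layout(unsigned kind, unsigned bx, unsigned by,
                            unsigned bz, unsigned msaa)
{
    return ((uint64_t)(kind & 0xffu) << KIND_SHIFT) |
           ((uint64_t)(bx & 0xfu) << BX_SHIFT) |
           ((uint64_t)(by & 0xfu) << BY_SHIFT) |
           ((uint64_t)(bz & 0xfu) << BZ_SHIFT) |
           ((uint64_t)(msaa & 0x3u) << MSAA_SHIFT);
}
```

Even with every field at its maximum, only bits 0 to 21 are set, leaving 34 of the 56 vendor bits free.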
What am I missing?
-ilia
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen hoegsberg@google.com wrote:
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
Inline.
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg hoegsberg@gmail.com wrote:
On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter daniel@ffwll.ch wrote:
Since this also involves the kernel let's add dri-devel ...
Yeah, I forgot. Thanks Daniel!
On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
Hi all,
As many of you already know, I've been working with James Jones on the Generic Device Allocator project lately. He started a discussion thread some weeks ago seeking feedback on the current prototype of the library and advice on how to move all this forward, from a prototype stage to production. For further reference, see:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
From the thread above, we came up with very interesting high level design ideas for one of the currently missing parts in the library: Usage transitions. That's something I'll personally work on during the following weeks.
In the meantime, I've been working on putting together an open source implementation of the allocator mechanisms using the Nouveau driver for all to be able to play with.
Below I'm seeking feedback on a bunch of changes I had to make to different components of the graphics stack:
** Allocator **
An allocator driver implementation on top of Nouveau. The current implementation only handles pitch linear layouts, but that's enough to have the kmscube port working using the allocator and Nouveau drivers.
You can pull these changes from
https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
** Mesa **
James's kmscube port to use the allocator relies on the EXT_external_objects extension to import allocator allocations to OpenGL as a texture object. However, the Nouveau implementation of these mechanisms is missing in Mesa, so I went ahead and added them.
You can pull these changes from
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-no...
Also, James's kmscube port uses the NVX_unix_allocator_import extension to attach allocator metadata to texture objects so the driver knows how to deal with the imported memory.
Note that there isn't a formal spec for this extension yet. For now, it just serves as an experimental mechanism to import allocator memory in OpenGL, and attach metadata to texture objects.
You can pull these changes (written on top of the above) from:
https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_impo...
** kmscube **
Mostly minor fixes and improvements on top of James's port to use the allocator. Main thing is the allocator initialization path will use EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported by the underlying EGL implementation.
You can pull these changes from:
https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
With all the above you should be able to get kmscube working using the allocator on top of the Nouveau driver.
Another of the missing pieces before we can move this to production is importing allocations to DRM FB objects. This is probably one of the most sensitive parts of the project as it requires modification/addition of kernel driver interfaces.
At XDC2017, James had several hallway conversations with several people about this, all having different opinions. I'd like to take this opportunity to also start a discussion about what's the best option to create a path to get allocator allocations added as DRM FB objects.
These are the few options we've considered to start with:
A) Have vendor-private ioctls to set properties on GEM objects that are inherited by the FB objects. This is how our (NVIDIA) desktop DRM driver currently works. This would require every vendor to add their own ioctl to process allocator metadata, but the metadata is actually a vendor-agnostic object more like DRM modifiers. We'd like to come up with a vendor-agnostic solutions that can be integrated to core DRM.
B) Add a new drmModeAddFBWithMetadata() command that takes allocator metadata blobs for each plane of the FB. Some people in the community have mentioned this is their preferred design. This, however, means we'd have to go through the exercise of adding another metadata mechanism to the whole graphics stack.
C) Shove allocator metadata into DRM by defining it to be a separate plane in the image, and using the existing DRM modifiers mechanism to indicate there is another plane for each "real" plane added. It isn't clear how this scales to surfaces that already need several planes, but there are some people who see this as the only way forward. Also, we would have to create a separate GEM buffer for the metadata itself, which seems excessive.
We personally like option (B) better, and have already started to prototype the new path (which is actually very similar to the drmModeAddFB2() one). You can take a look at the new interfaces here:
https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadat...
There may be other options that haven't been explored yet that could be a better choice than the above, so any suggestion will be greatly appreciated.
What kind of metadata are we talking about here? Addfb has tons of stuff already that's "metadata". The only thing I've spotted is PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell userspace, but definitely not something addfb ever needs. addfb only needs the resulting pitch that we actually allocated (and might decide it doesn't like that, but that's a different issue).
Sorry I failed to make it clearer. Metadata here refers to all allocation parameters the generic allocator was given to allocate memory. That currently means the final capability set used for the allocation, including all constraints (such as memory alignment, pitch alignment, and others) and capabilities, describing allocation properties like tiling formats, compression, and such.
Yeah, that part was all clear. I'd want more details of what exact kind of metadata. fast-clear colors? tiling layouts? aux data for the compressor? hiz (or whatever you folks call it) tree?
As you say, we've discussed massive amounts of different variants on this, and there's different answers for different questions. Consensus seems to be that bigger stuff (compression data, hiz, clear colors, ...) should be stored in aux planes, while the exact layout and what kind of aux planes you have are encoded in the modifier.
And since there are no patches for nouveau itself, I can't really say anything beyond that.
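To make the "layout and aux planes are encoded in the modifier" idea concrete, here is a minimal Python sketch in which the modifier alone tells userspace which auxiliary planes accompany each color plane. The modifier values, names and plane kinds are hypothetical and are not drm_fourcc.h definitions. The one real constraint used is that a KMS framebuffer carries at most four planes (struct drm_mode_fb_cmd2 has four-entry handles/pitches/offsets/modifier arrays). With that limit, a three-plane YUV format with one aux plane per color plane already overflows, which echoes the scaling doubt raised about option (C).

```python
# KMS framebuffers carry at most four planes: struct drm_mode_fb_cmd2 has
# four-entry handles/pitches/offsets/modifier arrays.
MAX_FB_PLANES = 4

# Hypothetical modifier table (NOT real drm_fourcc.h values): each entry names a
# layout and the kinds of aux planes that accompany every color plane.
AUX_PLANES_BY_MODIFIER = {
    0x0: ("LINEAR", ()),
    0x1: ("TILED", ()),
    0x2: ("TILED_COMPRESSED", ("compression",)),
    0x3: ("TILED_COMPRESSED_CLEAR", ("compression", "clear_color")),
}


def fb_planes(color_planes: int, modifier: int) -> list:
    """List the planes an FB needs: color planes first, then the aux planes
    implied by the modifier, one set per color plane."""
    if modifier not in AUX_PLANES_BY_MODIFIER:
        raise ValueError(f"unknown modifier {modifier:#x}")
    _, aux_kinds = AUX_PLANES_BY_MODIFIER[modifier]
    planes = [f"color{i}" for i in range(color_planes)]
    planes += [f"{kind}{i}" for i in range(color_planes) for kind in aux_kinds]
    if len(planes) > MAX_FB_PLANES:
        raise ValueError(f"{len(planes)} planes exceed the {MAX_FB_PLANES}-plane FB limit")
    return planes


print(fb_planes(1, 0x2))  # ['color0', 'compression0']
```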
I can work on implementing these interfaces for nouveau, maybe partially, if that's going to help. I just thought it'd be better to first start a discussion on what would be the right way to pass allocator metadata to display drivers before starting to seriously implement any of the proposed options.
It's not so much wiring down the interfaces, but actually implementing the features. "We need more than the 56 bits of modifier" is a lot more plausible when you have the full stack showing that you do actually need it. Or well, not a full stack, but at least a demo that shows what you want to pull off but can't do right now.
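The "56 bits" refers to how a DRM format modifier is laid out. The kernel's drm_fourcc.h reserves the top 8 bits of the 64-bit value for a vendor ID and leaves the low 56 bits to the vendor. Below is a minimal Python mirror of the fourcc_mod_code() macro. The helper names mod_vendor and mod_payload are illustrative and do not come from the header.

```python
# Vendor IDs as in the kernel's include/uapi/drm/drm_fourcc.h.
DRM_FORMAT_MOD_VENDOR_NONE = 0x00
DRM_FORMAT_MOD_VENDOR_INTEL = 0x01
DRM_FORMAT_MOD_VENDOR_AMD = 0x02
DRM_FORMAT_MOD_VENDOR_NVIDIA = 0x03

VENDOR_SHIFT = 56
PAYLOAD_MASK = (1 << VENDOR_SHIFT) - 1  # 0x00ffffffffffffff


def fourcc_mod_code(vendor: int, val: int) -> int:
    """Mirror of the fourcc_mod_code() macro: vendor ID in the top 8 bits,
    vendor-defined payload in the low 56 bits (excess payload bits dropped)."""
    return (vendor << VENDOR_SHIFT) | (val & PAYLOAD_MASK)


def mod_vendor(modifier: int) -> int:
    """Extract the vendor ID from a modifier."""
    return modifier >> VENDOR_SHIFT


def mod_payload(modifier: int) -> int:
    """Extract the 56-bit vendor-defined payload from a modifier."""
    return modifier & PAYLOAD_MASK


I915_FORMAT_MOD_X_TILED = fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_INTEL, 1)
print(hex(I915_FORMAT_MOD_X_TILED))  # 0x100000000000001
```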
I'd like to see concrete examples of actual display controllers supporting more format layouts than what can be specified with a 64 bit modifier.
The main problem is our tiling and other metadata parameters can't generally fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.
I understand that you may have n knobs with a total of more than 56 bits that configure your tiling/swizzling for color buffers. What I don't buy is that you need all those combinations when passing buffers around between codecs, cameras and display controllers. Even if you're sharing between the same 3D drivers in different processes, I expect just locking down, say, 64 different combinations (you can add more over time) and assigning each a modifier would be sufficient. I doubt you'd extract meaningful performance gains from going all the way to a blob.
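Locking down a few combinations amounts to keeping a small, explicit table. Only layouts worth sharing get a modifier, and everything else stays driver-internal. The minimal sketch below assumes a made-up vendor ID and made-up layout knobs. None of these are real drm_fourcc.h definitions.

```python
HYPOTHETICAL_VENDOR = 0x7F  # placeholder, not an assigned DRM vendor ID
PAYLOAD_MASK = (1 << 56) - 1

# Shareable layouts keyed by hypothetical driver knobs:
# (tile mode, number of banks, compressed). New entries can be appended later.
SHAREABLE_LAYOUTS = {
    ("linear", None, False): 0,
    ("tiled_2d", 4, False): 1,
    ("tiled_2d", 8, False): 2,
    ("tiled_2d", 8, True): 3,
}
_LAYOUT_BY_INDEX = {index: knobs for knobs, index in SHAREABLE_LAYOUTS.items()}


def export_modifier(tile_mode, num_banks, compressed):
    """Return the modifier for a shareable layout, or None if the layout is
    internal-only and would have to be converted before sharing."""
    index = SHAREABLE_LAYOUTS.get((tile_mode, num_banks, compressed))
    if index is None:
        return None
    return (HYPOTHETICAL_VENDOR << 56) | index


def import_layout(modifier):
    """Map a received modifier back to the driver's knobs."""
    if modifier >> 56 != HYPOTHETICAL_VENDOR:
        raise ValueError("not one of our modifiers")
    return _LAYOUT_BY_INDEX[modifier & PAYLOAD_MASK]
```

The table can grow over time without changing any interface, because each new entry just takes the next index.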
Tegra just redesigned its modifier space from an ungodly amount of bits to just a few layouts. Not even just the ones in use, but simply limiting to the ones that make sense (there are dependencies, apparently). Also note that the modifier alone doesn't need to describe the layout precisely; it only makes sense together with a specific pixel format and size. E.g. a bunch of the i915 layouts change layout depending upon bpp.
If you want us to redesign KMS and the rest of the ecosystem around blobs instead of the modifiers that are now moderately pervasive, you have to justify it a little better than just "we didn't find it suitable".
Given that this involves the kernel and hence the kernel's userspace requirements for merging stuff (assuming of course you want to establish this as an upstream interface), then I'd say a sufficient demonstration would be actually running out of bits in nouveau (kernel+mesa). -Daniel
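The legacy i915 X and Y tilings already show that a modifier only makes sense together with a pixel format and size. Both tilings are fixed in bytes: an X tile is 512 bytes by 8 rows, and a Y tile is 128 bytes by 32 rows. The tile width in pixels therefore depends on the bytes per pixel. This is a simpler illustration of the same point, not necessarily the specific bpp-dependent layouts Daniel had in mind. In the sketch, the pitch helper is a simplified lower bound that ignores other per-generation stride limits.

```python
# i915 modifiers from drm_fourcc.h: vendor INTEL (0x01) in the top 8 bits.
I915_FORMAT_MOD_X_TILED = (0x01 << 56) | 1
I915_FORMAT_MOD_Y_TILED = (0x01 << 56) | 2

# Tile footprint as (bytes per tile row, rows per tile).
TILE_SHAPE_BYTES = {
    I915_FORMAT_MOD_X_TILED: (512, 8),
    I915_FORMAT_MOD_Y_TILED: (128, 32),
}


def tile_shape_px(modifier: int, cpp: int) -> tuple:
    """Tile size in pixels (width, height) for a given bytes-per-pixel (cpp)."""
    row_bytes, rows = TILE_SHAPE_BYTES[modifier]
    if row_bytes % cpp:
        raise ValueError(f"cpp={cpp} does not divide a {row_bytes}-byte tile row")
    return row_bytes // cpp, rows


def min_pitch_bytes(modifier: int, width_px: int, cpp: int) -> int:
    """Smallest pitch covering width_px with whole tiles (simplified)."""
    tile_w_px, _ = tile_shape_px(modifier, cpp)
    tiles = -(-width_px // tile_w_px)  # ceiling division
    return tiles * TILE_SHAPE_BYTES[modifier][0]


# Same modifier, different formats, different tile widths in pixels:
print(tile_shape_px(I915_FORMAT_MOD_Y_TILED, 4))  # (32, 32)
print(tile_shape_px(I915_FORMAT_MOD_Y_TILED, 2))  # (64, 32)
```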
On Thu 21 Dec 2017, Daniel Vetter wrote:
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen hoegsberg@google.com wrote:
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg hoegsberg@gmail.com wrote:
I'd like to see concrete examples of actual display controllers supporting more format layouts than what can be specified with a 64 bit modifier.
The main problem is our tiling and other metadata parameters can't generally fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.
I understand that you may have n knobs with a total of more than 56 bits that configure your tiling/swizzling for color buffers. What I don't buy is that you need all those combinations when passing buffers around between codecs, cameras and display controllers. Even if you're sharing between the same 3D drivers in different processes, I expect just locking down, say, 64 different combinations (you can add more over time) and assigning each a modifier would be sufficient. I doubt you'd extract meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in modifiers a precise description of every possible tiling/compression layout is not technically incorrect, but I believe it misses the point. The intention behind modifiers is not to exhaustively describe all possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier, where I wrote an "introduction to modifiers" section. Here's an excerpt:
One goal of modifiers in the Linux ecosystem is to enumerate for each vendor a reasonably sized set of tiling formats that are appropriate for images shared across processes, APIs, and/or devices, where each participating component may possibly be from different vendors. A non-goal is to enumerate all tiling formats supported by all vendors. Some tiling formats used internally by vendors are inappropriate for sharing; no modifiers should be assigned to such tiling formats.
Tegra just redesigned its modifier space from an ungodly amount of bits to just a few layouts. Not even just the ones in use, but simply limiting to the ones that make sense (there are dependencies, apparently). Also note that the modifier alone doesn't need to describe the layout precisely; it only makes sense together with a specific pixel format and size. E.g. a bunch of the i915 layouts change layout depending upon bpp.
On Tue, Feb 20, 2018 at 10:14:47PM -0800, Chad Versace wrote:
I summarized this opinion in VK_EXT_image_drm_format_modifier, where I wrote an "introduction to modifiers" section. Here's an excerpt:
One goal of modifiers in the Linux ecosystem is to enumerate for each vendor a reasonably sized set of tiling formats that are appropriate for images shared across processes, APIs, and/or devices, where each participating component may possibly be from different vendors. A non-goal is to enumerate all tiling formats supported by all vendors. Some tiling formats used internally by vendors are inappropriate for sharing; no modifiers should be assigned to such tiling formats.
fwiw (since the source of truth wrt modifiers is the kernel's uapi header):
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
I'm happy to merge modifier #define additions for pretty much anything where there's a need for sharing across devices/drivers/apis, explicitly including stuff that's only relevant for userspace and which the kernel never sees (in e.g. a kms addfb2 call). Trying to preemptively enumerate everything that's possible doesn't seem like a wise idea. But even then we can probably spare the oddball vendor prefix if a driver team really insists that this is what they want, best using some code that makes the case for them. -Daniel
On Wed 21 Feb 2018, Daniel Vetter wrote:
fwiw (since the source of truth wrt modifiers is the kernel's uapi header):
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
Linux would eventually encounter big problems if the kernel and Vulkan disagreed on the fundamental, unspoken Theory of Modifiers. So your acked-by is definitely worth something here. Thanks for confirming.
I'm happy to merge modifier #define additions for pretty much anything where there's a need for sharing across devices/drivers/apis, explicitly including stuff that's only relevant for userspace and which the kernel never sees (in e.g. a kms addfb2 call). Trying to preemptively enumerate everything that's possible doesn't seem like a wise idea. But even then we can probably spare the oddball vendor prefix if a driver team really insists that this is what they want, best using some code that makes the case for them.
Yep. I believe Jason Ekstrand has tentative plans for such a modifier: one for fully compressed CCS_E images, which would improve interop performance between GL and Vulkan but which the kernel and Intel display hw wouldn't understand.
On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace chadversary@chromium.org wrote:
I summarized this opinion in VK_EXT_image_drm_format_modifier, where I wrote an "introduction to modifiers" section. Here's an excerpt:
One goal of modifiers in the Linux ecosystem is to enumerate for each vendor a reasonably sized set of tiling formats that are appropriate for images shared across processes, APIs, and/or devices, where each participating component may possibly be from different vendors. A non-goal is to enumerate all tiling formats supported by all vendors. Some tiling formats used internally by vendors are inappropriate for sharing; no modifiers should be assigned to such tiling formats.
Where it gets tricky is how to select that subset? Our tiling modes are defined more by the ASIC-specific constraints than by the tiling mode itself. At a high level we have basically 3 tiling modes (out of 16 possible) that would be the minimum we'd want to expose for gfx6-8. gfx9 uses a completely new scheme.
- Linear (per-ASIC stride requirements, not usable by many hw blocks)
- 1D Thin (5 layouts: displayable, depth, thin, rotated, thick)
- 2D Thin (1D tiling constraints, plus pipe config (18 possible), tile split (7 possible), sample split (4 possible), num banks (4 possible), bank width (4 possible), bank height (4 possible), macro tile aspect (4 possible), all of which are ASIC-config specific)
I guess we could do something like:
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.
We probably only need 40 bits to encode all of the tiling parameters, so we could do family plus tiling encoding, but that still seems unwieldy to deal with from an application perspective. All of the parameters affect the alignment requirements.
Alex
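This sketch takes the per-field counts listed above for 2D thin tiling at face value and packs just those fields at a fixed width. That needs 21 bits. The sketch only does this arithmetic, and the field names and bit order are assumptions. It does not cover whatever other parameters lie behind the 40-bit estimate, such as family, linear alignment or gfx9. It also does not address the usability concern, since a packed word is still not a short list of shareable layouts.

```python
# Per-field counts for 2D thin tiling (gfx6-8) as listed above.
FIELDS = [
    ("micro_layout", 5),  # displayable, depth, thin, rotated, thick
    ("pipe_config", 18),
    ("tile_split", 7),
    ("sample_split", 4),
    ("num_banks", 4),
    ("bank_width", 4),
    ("bank_height", 4),
    ("macro_tile_aspect", 4),
]


def field_bits(count: int) -> int:
    """Bits needed to store an index in range(count)."""
    return (count - 1).bit_length()


TOTAL_BITS = sum(field_bits(count) for _, count in FIELDS)


def pack(values: dict) -> int:
    """Pack per-field indices into one integer, first field in the low bits."""
    word, shift = 0, 0
    for name, count in FIELDS:
        v = values[name]
        if not 0 <= v < count:
            raise ValueError(f"{name}={v} out of range(0, {count})")
        word |= v << shift
        shift += field_bits(count)
    return word


def unpack(word: int) -> dict:
    """Inverse of pack()."""
    values, shift = {}, 0
    for name, count in FIELDS:
        bits = field_bits(count)
        values[name] = (word >> shift) & ((1 << bits) - 1)
        shift += bits
    return values


print(TOTAL_BITS)  # 21
```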
On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher alexdeucher@gmail.com wrote:
We probably only need 40 bits to encode all of the tiling parameters, so we could do family plus tiling encoding, but that still seems unwieldy to deal with from an application perspective. All of the parameters affect the alignment requirements.
We discussed this earlier in the thread, here's what I said:
Another point here is that the modifier doesn't need to encode all the things you have to communicate to the HW. For a given width, height, format, compression type and maybe a few other high-level parameters, I'm skeptical that the remaining tile parameters aren't just mechanically derivable using a fixed table or formula. So instead of thinking of the modifier as something you can just memcpy into a state packet, think of it as identifying a family of configurations: enough information to deterministically derive the full exact configuration. The formula may change, for example for different hardware or if it's determined to not be optimal, and in that case, we can use a new modifier to represent the new formula.
Kristian
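A minimal sketch of the "modifier identifies a family, the rest is derived" idea follows. Every participant that knows the family, width, height and format computes the same full configuration, so only the family has to be shared. The family names, tables and rules are all invented for illustration and are not amdgpu's actual derivation. A different formula would get a new family name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    tile_split_bytes: int
    num_banks: int
    pitch_bytes: int


# Hypothetical family modifiers, each tied to one fixed formula; the value is
# the maximum bank count that formula uses.
FAMILY_MAX_BANKS = {"HYPO_2D_DISPLAY_A": 8, "HYPO_2D_DISPLAY_B": 16}

# Hypothetical fixed table: tile split chosen from bytes per pixel.
TILE_SPLIT_BY_CPP = {1: 64, 2: 128, 4: 256, 8: 512}


def derive_config(family: str, width: int, height: int, cpp: int) -> TileConfig:
    """Deterministically expand a family modifier into a full configuration."""
    if family not in FAMILY_MAX_BANKS:
        raise ValueError(f"unknown family {family}")
    tile_split = TILE_SPLIT_BY_CPP[cpp]
    # Hypothetical rule: small surfaces use fewer banks to limit padding.
    num_banks = FAMILY_MAX_BANKS[family] if height >= 256 else 2
    # Hypothetical rule: width aligned to 8 pixels per bank.
    align_px = 8 * num_banks
    aligned_width = -(-width // align_px) * align_px
    return TileConfig(tile_split, num_banks, aligned_width * cpp)
```

Because the expansion is a pure function, an importer never needs the derived bits on the wire. It recomputes them from the family and the FB's existing width, height and format.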
On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg hoegsberg@gmail.com wrote:
We discussed this earlier in the thread, here's what I said:
Another point here is that the modifier doesn't need to encode all the things you have to communicate to the HW. For a given width, height, format, compression type and maybe a few other high-level parameters, I'm skeptical that the remaining tile parameters aren't just mechanically derivable using a fixed table or formula. So instead of thinking of the modifier as something you can just memcpy into a state packet, think of it as identifying a family of configurations: enough information to deterministically derive the full exact configuration. The formula may change, for example for different hardware or if it's determined to not be optimal, and in that case, we can use a new modifier to represent the new formula.
I think this is not so much about being able to dump it in a state packet, but about sharing between different AMD GPUs. We basically have only a few interesting tiling modes if you look at a single GPU, but checking whether those are equal depends on the other bits, which may or may not differ per chip for the same conceptual tiling mode. We could just put a chip identifier in, but that would preclude any sharing, while I think we can do some.
- Bas
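Bas's trade-off can be illustrated with two hypothetical encodings. The first embeds only the chip parameters that actually affect the memory layout. With it, two different chips that share those parameters produce identical modifiers and can share buffers. The second embeds a chip identifier, so its modifiers never match across chips. The Chip fields, bit positions and values are assumptions made for illustration, not amdgpu definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical chip description holding only layout-relevant parameters."""
    name: str
    pipe_config: int  # index into a table of pipe configurations (< 32)
    num_banks: int    # power of two


def modifier_by_params(tile_mode: int, chip: Chip) -> int:
    """Conceptual tile mode plus the chip parameters that change the layout.

    Hypothetical bit layout: mode in bits 0-3, pipe config in bits 4-8,
    log2(num_banks) in bits 9-11.
    """
    if chip.num_banks <= 0 or chip.num_banks & (chip.num_banks - 1):
        raise ValueError("num_banks must be a power of two")
    log2_banks = chip.num_banks.bit_length() - 1
    return tile_mode | (chip.pipe_config << 4) | (log2_banks << 9)


def modifier_by_chip_id(tile_mode: int, chip_id: int) -> int:
    """Alternative: tag the layout with a chip identifier (bits 4 and up)."""
    return tile_mode | (chip_id << 4)


a = Chip("chip_a", pipe_config=3, num_banks=8)
b = Chip("chip_b", pipe_config=3, num_banks=8)
print(modifier_by_params(2, a) == modifier_by_params(2, b))  # True
print(modifier_by_chip_id(2, 1) == modifier_by_chip_id(2, 2))  # False
```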
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen bas@basnieuwenhuizen.nl wrote:
On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg hoegsberg@gmail.com wrote:
On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher alexdeucher@gmail.com wrote:
On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace chadversary@chromium.org wrote:
On Thu 21 Dec 2017, Daniel Vetter wrote:
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen hoegsberg@google.com wrote:
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg hoegsberg@gmail.com wrote:
I'd like to see concrete examples of actual display controllers supporting more format layouts than what can be specified with a 64 bit modifier.
The main problem is our tiling and other metadata parameters can't generally fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.
I understand that you may have n knobs with a total of more than 56 bits that configure your tiling/swizzling for color buffers. What I don't buy is that you need all those combinations when passing buffers around between codecs, cameras and display controllers. Even if you're sharing between the same 3D drivers in different processes, I expect just locking down, say, 64 different combinations (you can add more over time) and assigning each a modifier would be sufficient. I doubt you'd extract meaningful performance gains from going all the way to a blob.
I agree with Kristian above. In my opinion, choosing to encode in modifiers a precise description of every possible tiling/compression layout is not technically incorrect, but I believe it misses the point. The intention behind modifiers is not to exhaustively describe all possibilities.
I summarized this opinion in VK_EXT_image_drm_format_modifier, where I wrote an "introduction to modifiers" section. Here's an excerpt:
One goal of modifiers in the Linux ecosystem is to enumerate for each vendor a reasonably sized set of tiling formats that are appropriate for images shared across processes, APIs, and/or devices, where each participating component may possibly be from different vendors. A non-goal is to enumerate all tiling formats supported by all vendors. Some tiling formats used internally by vendors are inappropriate for sharing; no modifiers should be assigned to such tiling formats.
Where it gets tricky is how to select that subset. Our tiling modes are defined more by the ASIC-specific constraints than by the tiling mode itself. At a high level we have basically 3 tiling modes (out of 16 possible) that would be the minimum we'd want to expose for gfx6-8. gfx9 uses a completely new scheme.
- Linear (per-ASIC stride requirements, not usable by many HW blocks)
- 1D Thin (5 layouts: displayable, depth, thin, rotated, thick)
- 2D Thin (1D tiling constraints, plus pipe config (18 possible), tile split (7 possible), sample split (4 possible), num banks (4 possible), bank width (4 possible), bank height (4 possible), macro tile aspect (4 possible), all of which are ASIC config specific)
I guess we could do something like:
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.
We probably only need 40 bits to encode all of the tiling parameters, so we could do family plus tiling encoding, but that still seems unwieldy to deal with from an application perspective. All of the parameters affect the alignment requirements.
We discussed this earlier in the thread, here's what I said:
Another point here is that the modifier doesn't need to encode all the things you have to communicate to the HW. For a given width, height, format, compression type and maybe a few other high-level parameters, I'm skeptical that the remaining tile parameters aren't just mechanically derivable using a fixed table or formula. So instead of thinking of the modifier as something you can just memcpy into a state packet, think of it as identifying a family of configurations: enough information to deterministically derive the full exact configuration. The formula may change, for example for different hardware or if it's determined to not be optimal, and in that case, we can use a new modifier to represent the new formula.
I think this is not so much about being able to dump it in a state packet, but about sharing between different AMD GPUs. We have basically only a few interesting tiling modes if you look at a single GPU, but checking whether those are equal depends on the other bits, which may or may not differ per chip for the same conceptual tiling mode. We could just put a chip identifier in, but that would preclude any sharing, while I think we can do some.
Right. And the 2D ones, while they are the most complicated, are also the most interesting from a performance perspective so ideally you'd find a match on one of those. If you don't expose the 2D modes, there's not much point in supporting modifiers at all.
Alex
On 02/22/2018 01:16 PM, Alex Deucher wrote:
Right. And the 2D ones, while they are the most complicated, are also the most interesting from a performance perspective so ideally you'd find a match on one of those. If you don't expose the 2D modes, there's not much point in supporting modifiers at all.
This is essentially the problem I keep running into when trying to work up something based on the suggestions here as well. Yes, for a given build of our driver on a single device, we can re-derive exactly the same tiling parameters given a few manageable constraints. That was the essence of the design of the Vulkan external objects framework, and it comes with all the limitations I'm trying to avoid by introducing the more complex allocator framework:
- We want to share across GPUs.
- We potentially want to share across non-version-locked driver components, even potentially between Nouveau/Tegra-DRM-driven GPUs and GPUs driven by the NVIDIA proprietary driver. There's no way we can ensure the drivers use the same algorithm there.
Taking it further than even I would like to, in a discussion over DRM format modifier usage in Vulkan, it was recently proposed that DRM format modifiers be used to serialize data in a pre-tiled format. I personally don't think DRM format modifiers should be used for this at all, but something like extended allocator meta-data might be appropriate.
At this point I've heard engineers from Intel, AMD, and of course myself at NVIDIA saying that while DRM format modifiers solve many more cases than assuming pitch-linear or doing magic to pass around metadata, they don't solve all the cases necessary to make optimal use of any of our HW in at least some interesting cases. Hence it seems reasonable to continue to improve the design of these mechanisms.
Responding to some earlier points that fell off my mail retention limit while I was on paternity leave:
I understand that it's an incomplete example, but even so I don't think this duplication is feasible. It's not a matter of how many use cases we have to duplicate at this point in time; it's that all these APIs are live, evolving APIs, and keeping the allocator up to date as various APIs grow new corner cases doesn't seem practical. Further, it's not orthogonal or composable: the allocator has to know about all producers and consumers, and if I add a new piece of hardware I have to extend the allocator to understand its new use cases. With the modifier model, I just ask the new driver which modifiers it supports for the use case I'm interested in and feed those modifiers to the allocator.
There are currently 3 complete modern low-level 3D graphics APIs along with some slightly longer in the tooth higher-level alternatives being actively maintained at more or less the same feature level, countless video decode/encode APIs with more or less equivalent functionality, and more mode setting APIs than anyone wants. If that much total duplicated effort is possible, it seems feasible to maintain a list of layouts and related properties, most of which will see some re-use between all these APIs.
Further, the central library doesn't need to be burdened by all of these use cases unless they become cross-vendor. The usage itself is vendor-extensible, so if AMD had wanted to add a bunch of Mantle-only usage bits, they could have done so without cluttering the shared library code or namespace.
Vulkan isn't expected to know about video encode usage. You ask the video codec about supported modifiers for encode and you ask Vulkan for supported modifiers for, say, optimal render usage. The allocator determines the optimal lowest common denominator and allocates the buffer. Maybe that's linear, or if you've designed both parts, maybe there's a simple shared tiled format that the encoder can source from.
It was determined early on in attempts to design this mechanism that such LCD intersection doesn't produce the optimal result. Only considering the usage holistically can produce optimal layouts.
For modifiers and liballocator alike, the metadata is copied by value (and passed through IPC) and as such can't model shared mutable information. That means fast clear colors, compression aux buffers and such have to be in a shared BO plane.
Again, this is making large design assumptions. Fast clear color data, for example would be a very reasonable thing to include in static metadata given our driver+HW architecture.
Thanks, -James
Alex _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, Feb 22, 2018 at 04:16:52PM -0500, Alex Deucher wrote:
Right. And the 2D ones, while they are the most complicated, are also the most interesting from a performance perspective so ideally you'd find a match on one of those. If you don't expose the 2D modes, there's not much point in supporting modifiers at all.
1. Make sure you have a test farm covering all your use cases and hw.
2. Create a struct that encodes everything. Make it a few kb big if it has to be, whatever it takes.
3. Do a little library that contains a huge table mapping modifiers to these structs, and one function that returns you the unique modifier for the given tiling layout description struct. We can have that in the kernel sources, or just delegate the entire AMD modifier block to some userspace library you're managing (with just the few modifiers the kernel needs in the uapi/drm_fourcc.h header). If the lib doesn't find the modifier, make it crash with a nice loud backtrace.
4. Add modifiers to that lib until you stop failing on the test farm.
5. Optional: Make the lib faster with hashing/compressing/whatever if it turns out to be a bottleneck somewhere. Since you'll only ever need it on import/export, add a small cache with the relevant few entries for the device instance at hand, and I don't expect this will be a problem, ever.
I'm pretty sure you'll finish step 4 before you run out of modifiers. If you don't, then we suck it up, admit sheepishly that modifiers turned out to be a stupid idea and rev the kernel's uapi. We know how to do that, but I also don't want to rev uapi just for fun.
Cheers, Daniel
Kristian Høgsberg hoegsberg@gmail.com writes:
Another point here is that the modifier doesn't need to encode all the things you have to communicate to the HW. For a given width, height, format, compression type and maybe a few other high-level parameters, I'm skeptical that the remaining tile parameters aren't just mechanically derivable using a fixed table or formula. So instead of thinking of the modifier as something you can just memcpy into a state packet, think of it as identifying a family of configurations: enough information to deterministically derive the full exact configuration. The formula may change, for example for different hardware or if it's determined to not be optimal, and in that case, we can use a new modifier to represent the new formula.
Agreed. For Broadcom's VC5+ stuff, our tiling layout depends on the number of SDRAM banks and bank size, but all users of buffers will know what those are, so I'm not planning on including those in the modifier.
dri-devel@lists.freedesktop.org