(Adding dri-devel back, and trying to respond to some comments from the different forks)
James Jones wrote:
(thanks James for most of the info below)
To elaborate a bit, if we want to share an allocation across GPUs for 3D rendering, it seems we would need 12 bits to express our swizzling/tiling memory layouts for Fermi+. In addition to that, Maxwell uses 3 more bits for this, and we need an extra bit to identify pre-Fermi representations.
We also need one bit to differentiate between Tegra and desktop, and another one to indicate whether the layout is otherwise linear.
Then things like whether compression is used (one more bit), and we can probably get by with 3 bits for the type of compression if we are creative. However, it'd be way easier to just track arch + page kind, which would be like 32 bits on its own.
Whether Z-culling and/or zero-bandwidth-clears are used may be another 3 bits.
If device-local properties are included, we might need a couple more bits for caching.
We may also need to express locality information, which may take at least another 2 or 3 bits.
If we want to share array textures too, we also need to pass the array pitch. Is that supposed to be encoded in a modifier as well? That's 64 bits on its own.
So yes, as James mentioned, with some effort, we could technically fit our current allocation parameters in a modifier, but I'm still not convinced this is as future proof as it could be as our hardware grows in capabilities.
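To make the arithmetic above concrete, here is a rough sketch of what packing those parameters into the 56 vendor-defined bits of a modifier might look like. The field names and widths are purely illustrative (this is not our actual encoding), but they show how quickly the budget runs out:

#include <stdint.h>

/* Illustrative only: field names and widths are hypothetical. */
#define NV_MOD_FIELD(val, shift, bits) \
        (((uint64_t)(val) & ((1ULL << (bits)) - 1)) << (shift))

static inline uint64_t nv_make_modifier_sketch(uint32_t swizzle,      /* 12 bits, Fermi+ */
                                               uint32_t maxwell_bits, /*  3 bits */
                                               uint32_t pre_fermi,    /*  1 bit  */
                                               uint32_t tegra,        /*  1 bit  */
                                               uint32_t linear,       /*  1 bit  */
                                               uint32_t compressed,   /*  1 bit  */
                                               uint32_t comp_type,    /*  3 bits */
                                               uint32_t zcull_zbc,    /*  3 bits */
                                               uint32_t cache,        /*  2 bits */
                                               uint32_t locality)     /*  3 bits */
{
    /* ~30 bits so far; tracking arch + page kind instead would add ~32 more,
     * and an array pitch (64 bits) cannot fit at all. */
    return NV_MOD_FIELD(swizzle,       0, 12) |
           NV_MOD_FIELD(maxwell_bits, 12,  3) |
           NV_MOD_FIELD(pre_fermi,    15,  1) |
           NV_MOD_FIELD(tegra,        16,  1) |
           NV_MOD_FIELD(linear,       17,  1) |
           NV_MOD_FIELD(compressed,   18,  1) |
           NV_MOD_FIELD(comp_type,    19,  3) |
           NV_MOD_FIELD(zcull_zbc,    22,  3) |
           NV_MOD_FIELD(cache,        25,  2) |
           NV_MOD_FIELD(locality,     27,  3);
}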
Daniel Stone wrote:
I'm a bit confused. Can't modifiers be specified by vendors and only interpreted by drivers? My understanding was that modifiers could actually be treated as opaque 64-bit data, in which case they would qualify as "magic blobs of data". Otherwise, it seems this wouldn't be scalable. What am I missing?
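For reference, this is how drm_fourcc.h carves up the 64 bits today: the top byte selects a vendor and the remaining 56 bits are defined by that vendor, so to everyone else a modifier is indeed an opaque token:

/* From include/uapi/drm/drm_fourcc.h */
#define fourcc_mod_code(vendor, val) \
        ((((__u64)DRM_FORMAT_MOD_VENDOR_## vendor) << 56) | ((val) & 0x00ffffffffffffffULL))

/* e.g. a vendor-specific tiled layout, meaningful only to NVIDIA drivers: */
#define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)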
Daniel Vetter wrote:
Not sure whether I might be misunderstanding your statement, but one of the allocator's main features is negotiation of nearly optimal allocation parameters, given a set of uses on different devices/engines, via the capability merge operation. A client is expected to query what every device/engine is capable of for the given uses, find the optimal set of capabilities, and use it to allocate a buffer. By the time these parameters are given to KMS, they are expected to be good. If they aren't, the client didn't do things right.
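A hypothetical sketch of that flow (the function and type names below are illustrative, not the actual liballocator API):

capability_set_t *gpu_caps, *display_caps, *merged;

/* 1. Ask each device what it can do for the intended uses. */
query_capabilities(gpu_dev, render_uses, &gpu_caps);
query_capabilities(display_dev, scanout_uses, &display_caps);

/* 2. Merge the sets to find parameters acceptable to both devices. */
merge_capabilities(gpu_caps, display_caps, &merged);

/* 3. Allocate with the merged set. By the time the buffer reaches KMS,
 *    the parameters are assumed to be good for every device involved. */
allocation_t *alloc = allocate_buffer(gpu_dev, &image_properties, merged);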
Rob Clark wrote:
I guess we can indeed start with modifiers for now, if that's what it takes to get the allocator mechanisms rolling. However, it seems to me that modifiers won't be able to encode the same kind of information capability sets can in all cases. For instance, if we end up encoding usage transition information in capability sets, how would that translate to modifiers?
I assume display doesn't really care about a lot of the data capability sets may encode, but is it correct to think of modifiers as things only display needs? If we are to treat modifiers as first-class citizens, I would expect to use them beyond that.
Kristian Kristensen wrote:
Why detached from the display driver? I don't see why there couldn't be an allocator driver with access to display capabilities that can be used in the negotiation step to find the optimal set of allocation parameters.
Kristian Kristensen wrote:
If someone has N knobs available, I don't understand why there shouldn't be a mechanism that allows making use of them all, regardless of performance numbers.
Daniel Vetter wrote:
My understanding is that capability sets may include all metadata you mentioned. Besides tiling/swizzling layout and compression parameters, things like zero-bandwidth-clears (I guess the same or similar to fast-clear colors?), hiz-like data, device-local properties such as caches, or locality information could/will be also included in a capability set. We are even considering encoding some sort of usage transition information in the capability set itself.
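Something like the following (hypothetical names, just to make the list above concrete) is the kind of thing a capability set could carry:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical contents of a capability set; names are illustrative,
 * not the prototype's actual types. */
typedef struct {
    uint32_t tiling_layout;   /* swizzling/tiling parameters */
    uint32_t compression;     /* format of the compression metadata */
    bool     zbc;             /* zero-bandwidth clears / fast-clear colors */
    bool     hiz;             /* hiz-like depth metadata */
    uint32_t cache_policy;    /* device-local caching properties */
    uint32_t locality;        /* memory locality hints */
    /* possibly: usage transition information between engines */
} capability_set_sketch_t;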
Thanks, Miguel.
On Thu, Dec 28, 2017 at 1:24 PM, Miguel Angel Vico mvicomoya@nvidia.com wrote:
btw, the places where modifiers are currently used are limited to 2D textures without mipmap levels: basically scanout buffers, winsys buffers, decoded video frames, and that sort of thing. I think we can keep it that way, which avoids needing to encode additional info (layer pitch, Z tiling info for 3D textures, or whatever else).
So we just need to have something in userspace that translates the relevant subset of capability set info to modifiers.
Maybe down the road, if capability sets are ubiquitous, we can "promote" that mechanism to kernel uABI... although tbh I am not entirely sure I can envision a use case where the kernel needs to know about a cubemap array texture.
BR, -R
On 12/28/2017 10:24 AM, Miguel Angel Vico wrote:
Not clear if this is an NV-only term, so for those not familiar: page kind is, very loosely, the equivalent of a format modifier that our HW uses internally in its memory management subsystem. The value mappings vary a bit for each HW generation.
Right, this becomes a lot more interesting when modifiers or capability sets start getting used to share things from Vulkan<->Vulkan, for example. Of course, we don't need to change kernel ABIs for that, but wayland protocols, Vulkan extensions, etc. might need modification. Regardless, I agree with Miguel's sentiment. Let's at least defer this debate a bit until we know more about what capability sets look like. If modifiers alone still seem sufficient, so be it.
In addition, speaking to some other portions of your response, most of the usage in the prototype is placeholder stuff for testing. USE_SCANOUT is partially expanded to include orientation as well, which helps in some cases on our hardware. If there's more complex stuff for other display hardware, it needs to be expanded further, or that HW is free to expose a vendor-specific usage, since usage is extensible. It's easy to mirror in all the relevant usage flags from other APIs or engines too. That's a rather small amount of duplication.
The important part is the logic that selects optimal usage. I don't think it's possible to select optimal usage with the queries spread around all the APIs. Vulkan isn't going to know about video encode usage. In many situations it won't know about display usage. It just knows optimal texture/render usage. Therefore it can't optimize parameters for usage it doesn't know about. A centralized allocator can, especially when all the usage ends up delegated to a single device/GPU. It will have all the same information available to it on the back end, because it can access DRM devices, V4L devices, etc. to query their capabilities via allocator backends, but it can have more information available on the front end from the app, and a more complete solution returned from a driver that is able to parse and consider that additional information.
Additionally, I again offer the goal of an optimal gralloc implementation built on top of the allocator mechanism. I find it difficult to imagine building gralloc on top of Vulkan or EGL and DRM. Does such a solution seem feasible to you? I've not researched this significantly myself, but Google Android engineers shared that concern when we had the initial discussions at XDC 2016.
I think there's some nuance here. The format of compression metadata would clearly be a capability set thing. The compression data itself would indeed be in some auxiliary surface on most/all hardware. Things like fast clears are harder to nail down because implementations seem more varied there. It might be very awkward on some hardware to put the necessary meta-data in a DRM FB plane, while that might be the only reasonable way to accomplish it on other hardware. I think we'll have to work through some corner cases across lots of hardware before this bottoms out.
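For hardware where the metadata does live in an auxiliary surface, attaching it as an extra FB plane is already expressible through the existing addfb2 path. A sketch (the two-plane layout and the vendor CCS modifier are illustrative; as said above, whether this is reasonable depends on the hardware):

#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Attach compression metadata as a second framebuffer plane. */
static int add_compressed_fb(int fd, uint32_t width, uint32_t height,
                             uint32_t color_bo, uint32_t color_pitch,
                             uint32_t aux_bo, uint32_t aux_pitch,
                             uint64_t vendor_ccs_modifier, uint32_t *fb_id)
{
    uint32_t handles[4]   = { color_bo, aux_bo };
    uint32_t pitches[4]   = { color_pitch, aux_pitch };
    uint32_t offsets[4]   = { 0, 0 };
    uint64_t modifiers[4] = { vendor_ccs_modifier, vendor_ccs_modifier };

    return drmModeAddFB2WithModifiers(fd, width, height, DRM_FORMAT_XRGB8888,
                                      handles, pitches, offsets, modifiers,
                                      fb_id, DRM_MODE_FB_MODIFIERS);
}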
Thanks, -James
On Wed, Jan 3, 2018 at 11:26 AM, James Jones jajones@nvidia.com wrote:
Modifiers aren't display only, but I suppose they are 2D color buffer only - no mipmaps, texture arrays, cube maps, etc. But within that scope, they should provide a mechanism for negotiating the optimal layout for a given use case.
Another point here is that the modifier doesn't need to encode all the things you have to communicate to the HW. For a given width, height, format, compression type, and maybe a few other high-level parameters, I'm skeptical that the remaining tile parameters aren't just mechanically derivable using a fixed table or formula. So instead of thinking of the modifier as something you can just memcpy into a state packet, it identifies a family of configurations - enough information to deterministically derive the full exact configuration. The formula may change, for example for different hardware or if it's determined not to be optimal, and in that case we can use a new modifier to represent the new formula.
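In other words, something along these lines (the formula and field layout below are made up purely to illustrate the "family of configurations" idea):

#include <stdint.h>

#define ALIGN(v, a) (((v) + (a) - 1) & ~((a) - 1))

struct tile_config { uint32_t tile_w, tile_h, pitch; };

/* The modifier only selects which fixed table/formula applies; the full
 * configuration is derived deterministically from it plus the high-level
 * surface parameters. */
static struct tile_config derive_tile_config(uint64_t modifier,
                                             uint32_t width, uint32_t bpp)
{
    struct tile_config cfg;
    uint32_t family = modifier & 0xff;          /* illustrative encoding */

    cfg.tile_w = 64u << (family & 0x3);
    cfg.tile_h = 16u << ((family >> 2) & 0x3);
    cfg.pitch  = ALIGN(width * bpp / 8, cfg.tile_w);
    return cfg;
}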
I understand that it's an incomplete example, but even so I don't think this duplication is feasible. It's not a matter of how many use cases we have to duplicate at this point in time; it's that all these APIs are live, evolving APIs, and keeping the allocator up to date as various APIs grow new corner cases doesn't seem practical. Further, it's not orthogonal or composable - the allocator has to know about all producers and consumers, and if I add a new piece of hardware I have to extend the allocator to understand its new use cases. With the modifier model, I just ask the new driver which modifiers it supports for the use case I'm interested in and feed those modifiers to the allocator.
Vulkan isn't expected to know about video encode usage. You ask the video codec about supported modifiers for encode, and you ask Vulkan for supported modifiers for, say, optimal render usage. The allocator determines the optimal lowest common denominator and allocates the buffer. Maybe that's linear, or, if you've designed both parts, maybe there's a simple shared tiled format that the encoder can source from.
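A sketch of that intersection step (the per-API queries - e.g. eglQueryDmaBufModifiersEXT on the EGL side, a codec-specific query on the other - are assumed to have filled in the two input lists):

#include <stdint.h>
#include <stddef.h>

/* render_mods/encode_mods come from the per-API modifier queries. */
static size_t intersect_modifiers(const uint64_t *render_mods, size_t n_render,
                                  const uint64_t *encode_mods, size_t n_encode,
                                  uint64_t *common)
{
    size_t n = 0;

    for (size_t i = 0; i < n_render; i++)
        for (size_t j = 0; j < n_encode; j++)
            if (render_mods[i] == encode_mods[j])
                common[n++] = render_mods[i];

    /* The allocator then picks the best of 'common'; if nothing but
     * DRM_FORMAT_MOD_LINEAR survives, that's the lowest common denominator. */
    return n;
}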
For modifiers, and liballocator as well, the metadata is copied by value (and passed through IPC), and as such can't model shared mutable information. That means fast clear colors, compression aux buffers, and the like have to be in a shared BO plane.
Kristian
Just wanted to clarify this one thing here, otherwise I think Rob/krh covered it all.
On Thu, Dec 28, 2017 at 10:24:38AM -0800, Miguel Angel Vico wrote:
Your example code has a new capability for PITCH_ALIGNMENT. That looks wrong for addfb (which should only receive the computed intersection of all requirements, not the requirements themselves). And since that was the only thing in your example code besides the bare boilerplate to wire it all up, it looks a bit confused.
Maybe we need to distinguish capabilities into constraints on properties (like pitch alignment, or power-of-two pitch) and properties (like pitch) themselves. -Daniel
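As a sketch of that split (illustrative code, not from the prototype): the constraint is what gets merged across devices, and the resulting property is what addfb should ultimately see:

#include <stdint.h>

static uint64_t gcd64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

static uint64_t lcm64(uint64_t a, uint64_t b)
{
    return a / gcd64(a, b) * b;
}

/* Merge two devices' pitch-alignment constraints, then compute the actual
 * pitch (the property) that satisfies the merged constraint. */
static uint64_t compute_pitch(uint32_t width, uint32_t bpp,
                              uint64_t gpu_align, uint64_t display_align)
{
    uint64_t merged_align = lcm64(gpu_align, display_align);        /* constraint */
    uint64_t pitch = (uint64_t)width * bpp / 8;

    return (pitch + merged_align - 1) / merged_align * merged_align; /* property */
}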