On Wed, Oct 5, 2011 at 7:54 AM, Thomas Hellstrom thomas@shipmail.org wrote:
In any case, I'm not saying fences is the best way to flush but since the bo code assumes that signaling a sync object means "make the buffer contents available for CPU read / write", it's usually a good way to do it; there's even a sync_obj_flush() method that gets called when a potential flush needs to happen.
I don't think we use it like that. To my knowledge, the purpose of the sync obj (to Radeon Gallium drivers at least) is to be able to wait for the last use of a buffer. Whether the contents can or cannot be available to the CPU is totally irrelevant.
Currently (and it's a very important performance optimization), buffers stay mapped and available for CPU read/write during their first map_buffer call. Unmap_buffer is a no-op. The unmapping happens on buffer destruction. We only call bo_wait when we want to wait for the GPU until it's done with the buffer (we don't always want that, sometimes we want to use the unsychronized flag). Otherwise the contents of buffers are available at *any time*.
We could probably implement bo_wait privately in the kernel driver and not use ttm_bo_wait. I preferred code sharing though.
Textures (especially the tiled ones) are never mapped directly and a temporary staging resource is used instead, so we don't actually pollute address space that much. (in case you would have such a remark) We will use staging resources for buffers too, but it's really the last resort to avoid waiting when direct access can't be used.
- Can't we say that a write_sync_obj is simply a sync_obj? What's the
difference between those two? I think we should remove the write_sync_obj bo member.
Okay, but I think we should remove sync_obj instead, and keep write and read sync objs. In the case of READWRITE usage, read_sync_obj would be equal to write_sync_obj.
Sure, I'm fine with that.
One other thing, though, that makes me a little puzzled:
Let's assume you don't allow readers and writers at the same time, which is my perception of how read- and write fences should work; you either have a list of read fences or a single write fence (in the same way a read-write lock works).
Now, if you only allow a single read fence, like in this patch. That implies that you can only have either a single read fence or a single write fence at any one time. We'd only need a single fence pointer on the bo, and sync_obj_arg would tell us whether to signal the fence for read or for write (assuming that sync_obj_arg was set up to indicate read / write at validation time), then the patch really isn't necessary at all, as it only allows a single read fence?
Or is it that you want to allow read- and write fences co-existing? In that case, what's the use case?
There are lots of read-write use cases which don't need any barriers or flushing. The obvious ones are color blending and depth-stencil buffering. The OpenGL application is also allowed to use a subrange of a buffer as a vertex buffer (read-only) and another disjoint subrange of the same buffer for transform feedback (write-only), which kinda makes me think about whether we should track subranges instead of treating a whole buffer as "busy". It gets even more funky with ARB_shader_image_load_store, which supports atomic read-modify-write operations on textures, not to mention atomic memory operations in compute shaders (wait, isn't that also exposed in GL as GL_ARB_shader_atomic_counters?).
I was thinking whether the two sync objs should be "read" and "readwrite", or "read" and "write". I chose the latter, because it's more fine-grained and we have to keep at least two of them around anyway.
So now that you know what we use sync objs for, what are your ideas on re-implementing that patch in a way that is okay with you? Besides removing the third sync objs of course.
Marek