Sorry, I forgot to CC correctly.
On Mon, Jul 22, 2013 at 12:53 PM, David Herrmann <dh.herrmann@gmail.com> wrote:
Hi
On Fri, Jul 19, 2013 at 11:13 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
On 07/18/2013 10:54 PM, David Herrmann wrote:
Hi
On Thu, Jul 18, 2013 at 1:24 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
...
I think that if there are good reasons to keep the locking internal, I'm fine with that (and also, of course, with Daniel's proposal). Currently the add / remove / lookup paths are mostly used by TTM during object creation and destruction.
However, if the lookup path is ever used by pread / pwrite, that situation might change and we would then like to minimize the locking.
I tried to keep the change as minimal as I could. Follow-up patches are welcome. I just thought pushing the lock into drm_vma_* would simplify things. If there are benchmarks that prove me wrong, I'll gladly spend some time optimizing that.
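To make the internal-locking variant concrete, here is a rough sketch of a manager whose lookup takes its own rwlock, so callers never touch a lock themselves. The vma_mgr_* names and layout are invented for illustration and are not the drm_vma_* API from the patch; only the lookup path is shown:

#include <linux/rbtree.h>
#include <linux/spinlock.h>

struct vma_mgr_node {
	struct rb_node rb;
	unsigned long start;	/* first page of the node's offset range */
	unsigned long pages;	/* length of the range in pages */
};

struct vma_mgr {
	rwlock_t lock;		/* taken internally by every entry point */
	struct rb_root offsets;
};

static void vma_mgr_init(struct vma_mgr *mgr)
{
	rwlock_init(&mgr->lock);
	mgr->offsets = RB_ROOT;
}

/* Lookup grabs the read side itself, so callers never see the lock. */
static struct vma_mgr_node *vma_mgr_lookup(struct vma_mgr *mgr,
					   unsigned long start)
{
	struct rb_node *iter;
	struct vma_mgr_node *found = NULL;

	read_lock(&mgr->lock);
	iter = mgr->offsets.rb_node;
	while (iter) {
		struct vma_mgr_node *node =
			rb_entry(iter, struct vma_mgr_node, rb);

		if (start < node->start) {
			iter = iter->rb_left;
		} else if (start >= node->start + node->pages) {
			iter = iter->rb_right;
		} else {
			found = node;
			break;
		}
	}
	read_unlock(&mgr->lock);

	return found;
}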
In the general case, one reason for designing the locking outside of utilities like this is that different callers may have different requirements. For example, the call path might be known not to be multithreaded at all, or the caller may prefer a mutex over a spinlock for various reasons. It might also be that some callers will want to use RCU locking in the future if the lookup path becomes busy, and that would require *all* users to adapt to RCU object destruction...
I haven't looked at the code closely enough to say that any of this applies in this particular case, though.
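To make the RCU remark concrete: if lookups ever ran under RCU, removal could no longer free a node right away, and every user would have to defer destruction past a grace period. A simplified sketch of that constraint, using a plain RCU-protected list and invented names rather than the real lookup structure:

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct offset_node {
	struct list_head link;
	unsigned long start;
	unsigned long pages;
	struct rcu_head rcu;	/* needed once readers run lock-free */
};

static LIST_HEAD(offset_list);
static DEFINE_SPINLOCK(offset_lock);	/* writers still serialize */

/* Reader: no lock at all, only an RCU read-side critical section.
 * Data is copied out while the node is guaranteed to stay alive. */
static unsigned long lookup_pages_rcu(unsigned long start)
{
	struct offset_node *node;
	unsigned long pages = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(node, &offset_list, link) {
		if (start >= node->start && start < node->start + node->pages) {
			pages = node->pages;
			break;
		}
	}
	rcu_read_unlock();

	return pages;
}

/* Writer: unlink under the lock, but the free must wait for a grace
 * period -- this deferral is what every user would have to adopt. */
static void remove_node(struct offset_node *node)
{
	spin_lock(&offset_lock);
	list_del_rcu(&node->link);
	spin_unlock(&offset_lock);

	kfree_rcu(node, rcu);
}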
Some notes on why I went with local locking:
- mmap offset creation is done once and is independent of any other operations you might do on the BO in parallel
- mmap lookup is also only done once in most cases. User-space rarely maps a buffer twice (setup/teardown of mmaps is expensive, but keeping them around is very cheap). And for shared buffers only the writer maps it, as the reader normally passes it to the kernel without mapping/touching it. Only for software rendering do we have two or more mappings of the same object.
- creating/mapping/destroying buffer objects is expensive in its current form, and buffers tend to stay around for a long time
I couldn't find a situation where the vma-manager was part of a performance-critical path. But on the other hand, the paths where it is used are quite heavy. That's why I don't want to lock the whole buffer or even the device. Instead, we need some "management lock" which is used for mmap setup or similar things that don't affect other operations on the buffer or device. We don't have such a lock, so I introduced local locking. If we end up with more use-cases, I volunteer to refactor this. But I am no big fan of overgeneralizing it before more uses are known.
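For reference, the usage pattern described above boils down to roughly the following; the vma_mgr_* and my_* names are placeholders standing in for the eventual drm_vma_* helpers and driver code, not actual interfaces:

#include <linux/errno.h>
#include <linux/mm.h>

struct vma_mgr;
struct vma_mgr_node;
struct my_bo;

/* Placeholder manager and driver helpers. */
int vma_mgr_add(struct vma_mgr *mgr, struct vma_mgr_node *node,
		unsigned long pages);
struct vma_mgr_node *vma_mgr_lookup(struct vma_mgr *mgr, unsigned long start);
struct my_bo *my_bo_from_vma_node(struct vma_mgr_node *node);
int my_bo_map_pages(struct my_bo *bo, struct vm_area_struct *vma);

/* Slow path, normally hit once per BO: reserve a fake mmap offset. */
int my_bo_setup_mmap_offset(struct vma_mgr *mgr, struct vma_mgr_node *node,
			    unsigned long num_pages)
{
	return vma_mgr_add(mgr, node, num_pages);
}

/* Also normally hit once per BO: the driver's mmap handler resolves
 * vma->vm_pgoff back to the buffer object and maps its pages. */
int my_drm_mmap(struct vma_mgr *mgr, struct vm_area_struct *vma)
{
	struct vma_mgr_node *node;

	node = vma_mgr_lookup(mgr, vma->vm_pgoff);
	if (!node)
		return -EINVAL;

	return my_bo_map_pages(my_bo_from_vma_node(node), vma);
}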
Anyway, I will resend with vm_lock removed (plus _unlocked() helpers) so we keep the status quo regarding locks. Thanks for the comments on the TTM buffer transfer.
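A rough sketch of what such _unlocked() helpers with a caller-provided lock could look like (invented names again, not the actual resend): the manager only walks its data structure, and the caller brackets the lookup and reference-taking with its own lock so the object cannot disappear in between:

#include <linux/mutex.h>

/* The manager itself holds no lock; callers must serialize access. */
struct vma_mgr;
struct vma_mgr_node;
struct vma_mgr_node *vma_mgr_lookup_unlocked(struct vma_mgr *mgr,
					     unsigned long start);

/* Driver-side placeholders. */
struct my_bo;
struct my_bo *my_bo_from_vma_node(struct vma_mgr_node *node);
void my_bo_get(struct my_bo *bo);

struct my_drm_device {
	struct mutex mmap_lock;	/* caller-owned lock protecting the manager */
	struct vma_mgr *mgr;
};

/* Look up a BO by fake offset and take a reference while the node is
 * still guaranteed to be valid, all under the caller's own lock. */
struct my_bo *my_lookup_and_ref(struct my_drm_device *dev, unsigned long start)
{
	struct vma_mgr_node *node;
	struct my_bo *bo = NULL;

	mutex_lock(&dev->mmap_lock);
	node = vma_mgr_lookup_unlocked(dev->mgr, start);
	if (node) {
		bo = my_bo_from_vma_node(node);
		my_bo_get(bo);
	}
	mutex_unlock(&dev->mmap_lock);

	return bo;
}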
Thanks
David
On 07/22/2013 12:55 PM, David Herrmann wrote:
[...]
I think we are discussing two different things here:
1) having a separate lock for vma management, vs.
2) building that lock into the vma manager utility.
You're arguing for 1), and I completely agree with you. Although I still think one should generally avoid building locks into utilities like this (that is, avoid 2), I'm fine with the current approach.
Thanks, Thomas