Hi Christian, David,
Sorry to report this so late in the process, but I think we found an issue that is not directly related to syncobj timelines themselves, but rather to a side effect of the threaded submissions.
Essentially we're failing a test in crucible: func.sync.semaphore-fd.opaque-fd. This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue pairs. Then, in a loop, it submits workloads alternating between the 2 VkQueues, each submit depending on the previous one: it waits on the VkSemaphore signaled in the previous iteration and re-signals it.
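The core of the test is roughly the following (a simplified sketch, not the exact crucible code; setup, command buffers and error handling are omitted, and semA/semB are assumed to have been shared via an opaque FD so they wrap one underlying syncobj):

    /* Simplified sketch of the test's core loop. semA lives on deviceA and
     * semB on deviceB, but both wrap the same underlying DRM syncobj. */
    for (unsigned i = 0; i < 100; i++) {
        int odd = i & 1;
        VkSemaphore sem = odd ? semB : semA;
        VkPipelineStageFlags stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
        VkSubmitInfo submit = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            /* Wait on the semaphore signaled by the previous iteration
             * (nothing to wait on for the very first submit). */
            .waitSemaphoreCount = i == 0 ? 0 : 1,
            .pWaitSemaphores = &sem,
            .pWaitDstStageMask = &stage,
            .commandBufferCount = 1,
            .pCommandBuffers = odd ? &cmdB : &cmdA,
            /* Re-signal the same semaphore for the next iteration. */
            .signalSemaphoreCount = 1,
            .pSignalSemaphores = &sem,
        };
        vkQueueSubmit(odd ? queueB : queueA, 1, &submit, VK_NULL_HANDLE);
    }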
The problem for us is that once things are dispatched to the submission thread, the ordering of the submissions is lost, because we have 2 devices and each has its own submission thread.
Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload. This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.
The scenario would look like this:

- vkQueueSubmit(queueA, signal on semA);
    - in the caller thread, this increments the syncobj's additional u64 payload and returns it to userspace.
    - at some point the submission thread of queueA submits the workload and signals the syncobj of semA with the value returned in the caller thread of vkQueueSubmit().
- vkQueueSubmit(queueB, wait on semA);
    - in the caller thread, this reads the syncobj's additional u64 payload.
    - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value read in the step above.
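A rough sketch of that handshake, in caller/submission-thread order (all the syncobj_* helper names here are hypothetical stand-ins; DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE is the wait-for-available flag from the timeline series):

    /* -- caller thread: vkQueueSubmit(queueA, signal on semA) --
     * bump the extra u64 payload and remember the new value. */
    uint64_t signal_point = syncobj_payload_incr(semA_syncobj); /* hypothetical */

    /* -- queueA's submission thread, some time later --
     * do the real submission, then attach the resulting fence at the
     * point the caller reserved. */
    hw_submit(queueA_submit);                                   /* hypothetical */
    syncobj_timeline_signal(semA_syncobj, signal_point);        /* hypothetical */

    /* -- caller thread: vkQueueSubmit(queueB, wait on semA) --
     * snapshot the current payload to know which submission to wait on. */
    uint64_t wait_point = syncobj_payload_read(semA_syncobj);   /* hypothetical */

    /* -- queueB's submission thread, some time later --
     * block until a fence actually exists at that point, then submit. */
    syncobj_timeline_wait(semA_syncobj, wait_point,
                          DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE);
    hw_submit(queueB_submit);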
Because we want the binary semaphores to be shareable across processes, and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj. It would need an additional ioctl to read & increment the value.
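Purely as an illustration, such an ioctl could look like this (entirely hypothetical; neither the struct, the flag, nor the ioctl number exist today):

    /* Hypothetical uAPI sketch for reading & incrementing the extra payload. */
    struct drm_syncobj_payload {
            __u32 handle;  /* syncobj handle */
            __u32 flags;   /* e.g. a DRM_SYNCOBJ_PAYLOAD_INC flag to increment */
            __u64 value;   /* out: the (post-increment, if requested) value */
    };

    /* Hypothetical ioctl number, for illustration only. */
    #define DRM_IOCTL_SYNCOBJ_PAYLOAD DRM_IOWR(0xCC, struct drm_syncobj_payload)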
What do you think?
-Lionel
Hi Lionel,
The queue thread is a heavy thread that stays resident in the driver for the entire application run, and our team doesn't like that. So we switched to a semaphore thread: only when a waitBeforeSignal on a timeline happens do we spawn a thread to handle that wait. That way we don't have this issue. By the way, I already pass all of your CTS cases now. I suggest you switch to a semaphore thread instead of a queue thread as well. It works very well.
-David
There aren't CTS tests covering the issue I was mentioning, but we could add some.
I don't have all the details regarding your implementation, but even with the "semaphore thread" I could see it running into the same issues. What if a mix of binary & timeline semaphores is handed to vkQueueSubmit()?
For example, with queueA & queueB from 2 different VkDevices:

- vkQueueSubmit(queueA, signal semA);
- vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with timelineSemB triggering a wait-before-signal.
- vkQueueSubmit(queueB, signal semA);
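Concretely, the second submit would look roughly like this with VK_KHR_timeline_semaphore (a sketch; the wait value on timelineSemB is just an example, and wait values for binary semaphores are ignored):

    /* Second submit on queueA: waits on binary semA *and* on timelineSemB,
     * whose wait value will only be signaled later (wait-before-signal). */
    uint64_t wait_values[2] = { 0 /* ignored for binary semA */,
                                1 /* example timeline value */ };
    VkTimelineSemaphoreSubmitInfoKHR timeline_info = {
        .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR,
        .waitSemaphoreValueCount = 2,
        .pWaitSemaphoreValues = wait_values,
    };
    VkSemaphore waits[2] = { semA, timelineSemB };
    VkPipelineStageFlags stages[2] = { VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                                       VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT };
    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .pNext = &timeline_info,
        .waitSemaphoreCount = 2,
        .pWaitSemaphores = waits,
        .pWaitDstStageMask = stages,
    };
    vkQueueSubmit(queueA, 1, &submit, VK_NULL_HANDLE);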
-Lionel
Hi Lionel,
Well, could you describe once more what the problem is?
Because I don't fully understand why a rather normal tandem submission with two semaphores should fail in any way.
Regards, Christian.
Hey Christian,
The problem boils down to the fact that we don't immediately create dma fences when calling vkQueueSubmit(); that is deferred to a submission thread.
From a single application thread, you can QueueSubmit() to 2 queues from 2 different devices. Each QueueSubmit to one queue has a dependency on the previous QueueSubmit on the other queue through an exported/imported semaphore.
From the API point of view, the state of the semaphore should be changed after each QueueSubmit(). The problem is that, because of the thread, it isn't; and those 2 submission threads might be tied to different VkDevices/VkInstances or even different applications (synchronizing themselves outside the Vulkan API).
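To illustrate with an assumed driver model (not actual code from either driver): each device's submission thread only creates the dma fence when it gets around to the submit, and nothing orders the two threads against each other:

    /* Assumed per-device submission thread model, illustrative only. */
    void *submission_thread(void *arg)
    {
        struct device *dev = arg;
        for (;;) {
            struct vk_submit *s = wait_for_next_submit(dev); /* hypothetical */
            /* The dma fence backing the semaphore only comes into existence
             * here. If deviceB's thread reaches this point before deviceA's
             * thread has signaled, the ordering the application expressed
             * through its vkQueueSubmit() calls is lost. */
            execute_and_signal(s);
        }
    }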
Hope that makes sense. It's not really easy to explain by mail; the best explanation is probably to read the test: https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync...
Like David mentioned, you're not running into that issue right now because you only dispatch to the thread under specific conditions. But I could build a case to force that and would likely run into the same issue.
-Lionel
Hi Lionel,
Well that looks more like your test case is buggy.
According to the code the ctx1 queue always waits for sem1 and ctx2 queue always waits for sem2.
This way there can't be any synchronisation between the two.
Regards, Christian.
On 02/08/2019 08:08, Koenig, Christian wrote:
According to the code the ctx1 queue always waits for sem1 and ctx2 queue always waits for sem2.
That's supposed to be the same underlying syncobj, because it's exported from one VkDevice as an opaque FD from sem1 and imported into sem2.
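i.e. the test does roughly this with the standard VK_KHR_external_semaphore_fd entry points (a sketch, error handling omitted):

    /* Export sem1 from the first VkDevice as an opaque FD... */
    int fd;
    VkSemaphoreGetFdInfoKHR get_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
        .semaphore = sem1,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR,
    };
    vkGetSemaphoreFdKHR(device1, &get_info, &fd);

    /* ...and import that FD into sem2 on the second VkDevice. Both
     * semaphores now wrap the same underlying DRM syncobj, so a wait on
     * sem2 completes when sem1's signal lands. */
    VkImportSemaphoreFdInfoKHR import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
        .semaphore = sem2,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR,
        .fd = fd,
    };
    vkImportSemaphoreFdKHR(device2, &import_info);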