Hi,
The goal of this feature is to solve an issue that was seen in our testing. After discussing on [1] I thought we could solve this problem with stalling the application thread after each vkQueueSubmit() call where a binary semaphore is signaled but I don't think it's a good option because of performance impacts on the application.
Unfortunately there isn't a good way to reproduce this problem 100% because it essentially exposes a race between the application thread and the submission thread.
I've uploaded tests in the Khronos repository to try to expose the issue.
Thanks,
[1] : https://lists.freedesktop.org/archives/dri-devel/2019-August/229188.html
Lionel Landwerlin (2): drm/syncobj: add sideband payload drm/syncobj: add submit point query capability
drivers/gpu/drm/drm_syncobj.c | 132 ++++++++++++++++++++++------------ include/drm/drm_syncobj.h | 9 +++ include/uapi/drm/drm.h | 5 +- 3 files changed, 100 insertions(+), 46 deletions(-)
-- 2.23.0.rc1
The Vulkan timeline semaphores allow signaling to happen on the point of the timeline without all of the its dependencies to be created.
The current 2 implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel are using a thread to wait on the dependencies of a given point to materialize and delay actual submission to the kernel driver until the wait completes.
If a binary semaphore is submitted for signaling along the side of a timeline semaphore waiting for completion that means that the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This and the fact that a binary semaphore can be signaled and unsignaled as before its DMA fences materialize mean that we cannot just rely on the fence within the syncobj but we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view.
This change adds a sideband payload that is incremented with signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on a the syncobj will read the sideband payload and wait for a fence chain element with a seqno superior or equal to the sideband payload value to be added into the fence chain and use that fence to trigger the submission on the kernel driver.
Signed-off-by: Lionel Landwerlin lionel.g.landwerlin@intel.com --- drivers/gpu/drm/drm_syncobj.c | 119 +++++++++++++++++++++------------- include/drm/drm_syncobj.h | 9 +++ include/uapi/drm/drm.h | 4 +- 3 files changed, 86 insertions(+), 46 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index b927e482e554..c437fb6aaf7c 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -1150,8 +1150,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret;
- for (i = 0; i < args->count_handles; i++) + for (i = 0; i < args->count_handles; i++) { drm_syncobj_replace_fence(syncobjs[i], NULL); + syncobjs[i]->sideband_payload = 0; + }
drm_syncobj_array_free(syncobjs, args->count_handles);
@@ -1197,7 +1199,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, { struct drm_syncobj_timeline_array *args = data; struct drm_syncobj **syncobjs; - struct dma_fence_chain **chains; + struct dma_fence_chain **chains = NULL; uint64_t *points; uint32_t i, j; int ret; @@ -1205,7 +1207,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE)) return -EOPNOTSUPP;
- if (args->pad != 0) + if (args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT && + args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD) return -EINVAL;
if (args->count_handles == 0) @@ -1232,30 +1235,41 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, goto err_points; }
- chains = kmalloc_array(args->count_handles, sizeof(void *), GFP_KERNEL); - if (!chains) { - ret = -ENOMEM; - goto err_points; - } - for (i = 0; i < args->count_handles; i++) { - chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); - if (!chains[i]) { - for (j = 0; j < i; j++) - kfree(chains[j]); + switch (args->selector) { + case DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT: + chains = kmalloc_array(args->count_handles, sizeof(void *), GFP_KERNEL); + if (!chains) { ret = -ENOMEM; - goto err_chains; + goto err_points; + } + for (i = 0; i < args->count_handles; i++) { + chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); + if (!chains[i]) { + for (j = 0; j < i; j++) + kfree(chains[j]); + ret = -ENOMEM; + goto err_chains; + } } - }
- for (i = 0; i < args->count_handles; i++) { - struct dma_fence *fence = dma_fence_get_stub(); + for (i = 0; i < args->count_handles; i++) { + struct dma_fence *fence = dma_fence_get_stub();
- drm_syncobj_add_point(syncobjs[i], chains[i], - fence, points[i]); - dma_fence_put(fence); + drm_syncobj_add_point(syncobjs[i], chains[i], + fence, points[i]); + dma_fence_put(fence); + } + break; + + case DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD: + for (i = 0; i < args->count_handles; i++) + syncobjs[i]->sideband_payload = points[i]; + break; } + err_chains: - kfree(chains); + if (chains) + kfree(chains); err_points: kfree(points); out: @@ -1276,7 +1290,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE)) return -EOPNOTSUPP;
- if (args->pad != 0) + if (args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT && + args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD) return -EINVAL;
if (args->count_handles == 0) @@ -1290,37 +1305,51 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, return ret;
for (i = 0; i < args->count_handles; i++) { - struct dma_fence_chain *chain; struct dma_fence *fence; + struct dma_fence_chain *chain; uint64_t point;
- fence = drm_syncobj_fence_get(syncobjs[i]); - chain = to_dma_fence_chain(fence); - if (chain) { - struct dma_fence *iter, *last_signaled = NULL; - - dma_fence_chain_for_each(iter, fence) { - if (iter->context != fence->context) { - dma_fence_put(iter); - /* It is most likely that timeline has - * unorder points. */ - break; + switch (args->selector) { + case DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT: + fence = drm_syncobj_fence_get(syncobjs[i]); + chain = to_dma_fence_chain(fence); + if (chain) { + struct dma_fence *iter, *last_signaled = NULL; + + dma_fence_chain_for_each(iter, fence) { + if (iter->context != fence->context) { + dma_fence_put(iter); + /* It is most likely that + * timeline has unorder + * points. */ + break; + } + dma_fence_put(last_signaled); + last_signaled = dma_fence_get(iter); } + point = dma_fence_is_signaled(last_signaled) ? + last_signaled->seqno : + to_dma_fence_chain(last_signaled)->prev_seqno; dma_fence_put(last_signaled); - last_signaled = dma_fence_get(iter); + } else { + point = 0; } - point = dma_fence_is_signaled(last_signaled) ? - last_signaled->seqno : - to_dma_fence_chain(last_signaled)->prev_seqno; - dma_fence_put(last_signaled); - } else { - point = 0; - } - ret = copy_to_user(&points[i], &point, sizeof(uint64_t)); - ret = ret ? -EFAULT : 0; - if (ret) + ret = copy_to_user(&points[i], &point, sizeof(uint64_t)); + ret = ret ? -EFAULT : 0; + if (ret) + goto error; + break; + + case DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD: + copy_to_user(&points[i], &syncobjs[i]->sideband_payload, sizeof(uint64_t)); + ret = ret ? -EFAULT : 0; + if (ret) + goto error; break; + } } + +error: drm_syncobj_array_free(syncobjs, args->count_handles);
return ret; diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 6cf7243a1dc5..b587b8e07547 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -61,6 +61,15 @@ struct drm_syncobj { * @file: A file backing for this syncobj. */ struct file *file; + /** + * @sideband_payload: A 64bit side band payload. + * + * We use the sideband payload value to wait on binary syncobj fences + * to materialize. It is a reservation mechanism for the signaler to + * express that at some point in the future a dma fence with the same + * seqno will be put into the syncobj. + */ + u64 sideband_payload; };
void drm_syncobj_free(struct kref *kref); diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 8a5b2f8f8eb9..dea759a36d37 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -782,7 +782,9 @@ struct drm_syncobj_timeline_array { __u64 handles; __u64 points; __u32 count_handles; - __u32 pad; + __u32 selector; +#define DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT (0) +#define DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD (1) };
This feature was talked about by David. It allows userspace to query the last submitted point on a timeline.
Following the previous commit it made sense to add it.
Signed-off-by: Lionel Landwerlin lionel.g.landwerlin@intel.com --- drivers/gpu/drm/drm_syncobj.c | 15 ++++++++++++++- include/uapi/drm/drm.h | 1 + 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index c437fb6aaf7c..ad2f5672d707 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -1291,7 +1291,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, return -EOPNOTSUPP;
if (args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT && - args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD) + args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD && + args->selector != DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SUBMIT_POINT) return -EINVAL;
if (args->count_handles == 0) @@ -1346,6 +1347,18 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, if (ret) goto error; break; + + case DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SUBMIT_POINT: + fence = drm_syncobj_fence_get(syncobjs[i]); + chain = to_dma_fence_chain(fence); + point = chain ? chain->base.seqno : 0; + dma_fence_put(fence); + + ret = copy_to_user(&points[i], &point, sizeof(uint64_t)); + ret = ret ? -EFAULT : 0; + if (ret) + goto error; + break; } }
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index dea759a36d37..3b8cdb3ffa94 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -785,6 +785,7 @@ struct drm_syncobj_timeline_array { __u32 selector; #define DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIGNALED_POINT (0) #define DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SIDEBAND_PAYLOAD (1) +#define DRM_SYNCOBJ_TIMELINE_ARRAY_SELECTOR_SUBMIT_POINT (2) };
Well first of all I strongly suggest to make the sideband information a separate IOCTL and not use the existing signal/query IOCTLs for it.
Then as far as I see this basically sets a sequence number to use for binary semaphores, is that correct? If yes then that would be a rather elegant idea.
Christian.
Am 07.08.19 um 15:37 schrieb Lionel Landwerlin:
Hi,
The goal of this feature is to solve an issue that was seen in our testing. After discussing on [1] I thought we could solve this problem with stalling the application thread after each vkQueueSubmit() call where a binary semaphore is signaled but I don't think it's a good option because of performance impacts on the application.
Unfortunately there isn't a good way to reproduce this problem 100% because it essentially exposes a race between the application thread and the submission thread.
I've uploaded tests in the Khronos repository to try to expose the issue.
Thanks,
[1] : https://lists.freedesktop.org/archives/dri-devel/2019-August/229188.html
Lionel Landwerlin (2): drm/syncobj: add sideband payload drm/syncobj: add submit point query capability
drivers/gpu/drm/drm_syncobj.c | 132 ++++++++++++++++++++++------------ include/drm/drm_syncobj.h | 9 +++ include/uapi/drm/drm.h | 5 +- 3 files changed, 100 insertions(+), 46 deletions(-)
-- 2.23.0.rc1
On 07/08/2019 16:49, Koenig, Christian wrote:
Well first of all I strongly suggest to make the sideband information a separate IOCTL and not use the existing signal/query IOCTLs for it.
It felt like at least the query ioctl was the right place to get the sideband value.
I can change.
Then as far as I see this basically sets a sequence number to use for binary semaphores, is that correct? If yes then that would be a rather elegant idea.
Yeah that's pretty much it. From the vulkan API point view, this doesn't even need to be atomic, it just needs to happen within vkQueueSubmit().
Again to explain the issue, it's that syncobj is container and we might race picking up the fence from one thread while the submission thread updates the fence.
Essentially reusing the fence chain mechanism solve the issue because we can wait on a particular replacement sequence number (matching a vkQueueSubmit() call).
-Lionel
Christian.
Am 07.08.19 um 15:37 schrieb Lionel Landwerlin:
Hi,
The goal of this feature is to solve an issue that was seen in our testing. After discussing on [1] I thought we could solve this problem with stalling the application thread after each vkQueueSubmit() call where a binary semaphore is signaled but I don't think it's a good option because of performance impacts on the application.
Unfortunately there isn't a good way to reproduce this problem 100% because it essentially exposes a race between the application thread and the submission thread.
I've uploaded tests in the Khronos repository to try to expose the issue.
Thanks,
[1] : https://lists.freedesktop.org/archives/dri-devel/2019-August/229188.html
Lionel Landwerlin (2): drm/syncobj: add sideband payload drm/syncobj: add submit point query capability
drivers/gpu/drm/drm_syncobj.c | 132 ++++++++++++++++++++++------------ include/drm/drm_syncobj.h | 9 +++ include/uapi/drm/drm.h | 5 +- 3 files changed, 100 insertions(+), 46 deletions(-)
-- 2.23.0.rc1
dri-devel@lists.freedesktop.org