Hi everyone,
this series adds support for using sync fences as prefences and postfences for host1x job submissions. The patches are available as a git repository at
https://github.com/cyndis/linux/tree/host1x-fence-1
and testing code is available at
https://github.com/cyndis/host1x_test
though you may want to edit the main function to disable the timeout tests for now as they cause a deadlock (not caused by this series; fix upcoming).
Verified on a Jetson TX1; should go on top of the earlier VIC series.
Some additional points: * I noticed that the waitchk_mask field in the submit UAPI is completely useless, and has never had any effect in the upstream kernel. It has also not existed in the downstream kernel for many years. We could replace it with the flags field if that is deemed acceptable, though of course it is possible there exists some application that fills it with some non-zero value. * Signaling is enabled for all host1x fences, not just those for which enable_signaling has been called. This is because enable_signaling is called from atomic context and we cannot set up an action waiter in atomic context.
Thanks, Mikko
Mikko Perttunen (3): gpu: host1x: Add support for DMA fences drm/tegra: Add sync file support to submit interface drm/tegra: Support for sync file-based fences in submit
drivers/gpu/drm/tegra/drm.c | 69 +++++++++++-- drivers/gpu/host1x/Kconfig | 1 + drivers/gpu/host1x/Makefile | 1 + drivers/gpu/host1x/dev.h | 12 ++- drivers/gpu/host1x/fence.c | 202 +++++++++++++++++++++++++++++++++++++ drivers/gpu/host1x/fence.h | 28 +++++ drivers/gpu/host1x/hw/channel_hw.c | 36 +++++-- drivers/gpu/host1x/intr.c | 11 +- drivers/gpu/host1x/intr.h | 8 +- drivers/gpu/host1x/syncpt.c | 2 + include/linux/host1x.h | 12 ++- include/uapi/drm/tegra_drm.h | 8 +- 12 files changed, 367 insertions(+), 23 deletions(-) create mode 100644 drivers/gpu/host1x/fence.c create mode 100644 drivers/gpu/host1x/fence.h
Add an implementation of DMA fences backed by Host1x syncpoints, an interface to specify a prefence for job submissions.
Before submission, prefences containing only Host1x syncpoints are waited by pushing wait commands to CDMA, whereas other fences are CPU-waited.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/host1x/Kconfig | 1 + drivers/gpu/host1x/Makefile | 1 + drivers/gpu/host1x/dev.h | 12 ++- drivers/gpu/host1x/fence.c | 202 +++++++++++++++++++++++++++++++++++++ drivers/gpu/host1x/fence.h | 28 +++++ drivers/gpu/host1x/hw/channel_hw.c | 36 +++++-- drivers/gpu/host1x/intr.c | 11 +- drivers/gpu/host1x/intr.h | 8 +- drivers/gpu/host1x/syncpt.c | 2 + include/linux/host1x.h | 12 ++- 10 files changed, 302 insertions(+), 11 deletions(-) create mode 100644 drivers/gpu/host1x/fence.c create mode 100644 drivers/gpu/host1x/fence.h
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig index b2fd029d67b3..3ede4aede6c3 100644 --- a/drivers/gpu/host1x/Kconfig +++ b/drivers/gpu/host1x/Kconfig @@ -1,6 +1,7 @@ config TEGRA_HOST1X tristate "NVIDIA Tegra host1x driver" depends on ARCH_TEGRA || (ARM && COMPILE_TEST) + select DMA_SHARED_BUFFER help Driver for the NVIDIA Tegra host1x hardware.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index a1d9974cfcb5..e5d61542b9b5 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -8,6 +8,7 @@ host1x-y = \ job.o \ debug.o \ mipi.o \ + fence.o \ hw/host1x01.o \ hw/host1x02.o \ hw/host1x04.o \ diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index e5113acecd7a..1ee79dbd1882 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2012-2015, NVIDIA Corporation. + * Copyright (C) 2012-2016 NVIDIA CORPORATION. All rights reserved. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -41,6 +41,7 @@ struct host1x_channel_ops { int (*init)(struct host1x_channel *channel, struct host1x *host, unsigned int id); int (*submit)(struct host1x_job *job); + void (*push_wait)(struct host1x_channel *ch, u32 id, u32 thresh); };
struct host1x_cdma_ops { @@ -110,6 +111,8 @@ struct host1x { struct device *dev; struct clk *clk;
+ u64 fence_ctx_base; + struct iommu_domain *domain; struct iova_domain iova; dma_addr_t iova_end; @@ -230,6 +233,13 @@ static inline int host1x_hw_channel_submit(struct host1x *host, return host->channel_op->submit(job); }
+static inline void host1x_hw_channel_push_wait(struct host1x *host, + struct host1x_channel *channel, + u32 id, u32 thresh) +{ + host->channel_op->push_wait(channel, id, thresh); +} + static inline void host1x_hw_cdma_start(struct host1x *host, struct host1x_cdma *cdma) { diff --git a/drivers/gpu/host1x/fence.c b/drivers/gpu/host1x/fence.c new file mode 100644 index 000000000000..3b056623ea64 --- /dev/null +++ b/drivers/gpu/host1x/fence.c @@ -0,0 +1,202 @@ +/* + * Copyright (C) 2016 NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/dma-fence.h> +#include <linux/dma-fence-array.h> +#include <linux/slab.h> + +#include "fence.h" +#include "intr.h" +#include "syncpt.h" +#include "cdma.h" +#include "channel.h" +#include "dev.h" + +struct host1x_fence { + struct dma_fence base; + spinlock_t lock; + + struct host1x_syncpt *syncpt; + u32 threshold; + + struct host1x *host; + void *waiter; + + char timeline_name[10]; +}; + +static inline struct host1x_fence *to_host1x_fence(struct dma_fence *fence) +{ + return (struct host1x_fence *)fence; +} + +static const char *host1x_fence_get_driver_name(struct dma_fence *fence) +{ + return "host1x"; +} + +static const char *host1x_fence_get_timeline_name(struct dma_fence *fence) +{ + struct host1x_fence *f = to_host1x_fence(fence); + + return f->timeline_name; +} + +static bool host1x_fence_enable_signaling(struct dma_fence *fence) +{ + struct host1x_fence *f = to_host1x_fence(fence); + + if (host1x_syncpt_is_expired(f->syncpt, f->threshold)) + return false; + + return true; +} + +static bool host1x_fence_signaled(struct dma_fence *fence) +{ + struct host1x_fence *f = to_host1x_fence(fence); + + return host1x_syncpt_is_expired(f->syncpt, f->threshold); +} + +static void host1x_fence_release(struct dma_fence *fence) +{ + struct host1x_fence *f = to_host1x_fence(fence); + + if (f->waiter) + host1x_intr_put_ref(f->host, f->syncpt->id, f->waiter); + + kfree(f); +} + +const struct dma_fence_ops host1x_fence_ops = { + .get_driver_name = host1x_fence_get_driver_name, + .get_timeline_name = host1x_fence_get_timeline_name, + .enable_signaling = host1x_fence_enable_signaling, + .signaled = host1x_fence_signaled, + .wait = dma_fence_default_wait, + .release = host1x_fence_release, +}; + +static void host1x_fence_wait_single(struct host1x_fence *f, + struct host1x *host, + struct host1x_channel *ch) +{ + if (host1x_syncpt_is_expired(f->syncpt, f->threshold)) + return; + + host1x_hw_channel_push_wait(host, ch, f->syncpt->id, f->threshold); +} + +/** + * host1x_fence_is_waitable() - Check if DMA fence can be waited by hardware + * @fence: DMA fence + * + * Check is @fence is only backed by Host1x syncpoints and can therefore be + * waited using only hardware. + */ +bool host1x_fence_is_waitable(struct dma_fence *fence) +{ + struct dma_fence_array *array; + int i; + + array = to_dma_fence_array(fence); + if (!array) + return fence->ops == &host1x_fence_ops; + + for (i = 0; i < array->num_fences; ++i) { + if (array->fences[i]->ops != &host1x_fence_ops) + return false; + } + + return true; +} + +/** + * host1x_fence_wait() - Insert waits for fence into channel + * @fence: DMA fence + * @host: Host1x + * @ch: Host1x channel + * + * Inserts wait commands into Host1x channel fences in @fence. + * in @fence. @fence must only consist of syncpoint-backed fences. + * + * Return: 0 on success, -errno otherwise. + */ +int host1x_fence_wait(struct dma_fence *fence, struct host1x *host, + struct host1x_channel *ch) +{ + struct dma_fence_array *array; + int i = 0; + + if (!host1x_fence_is_waitable(fence)) + return -EINVAL; + + array = to_dma_fence_array(fence); + if (!array) { + host1x_fence_wait_single(to_host1x_fence(fence), host, ch); + return 0; + } + + for (i = 0; i < array->num_fences; ++i) { + host1x_fence_wait_single(to_host1x_fence(array->fences[i]), + host, ch); + } + + return 0; +} + +struct dma_fence *host1x_fence_create(struct host1x *host, + struct host1x_syncpt *syncpt, + u32 threshold) +{ + struct host1x_waitlist *waiter; + struct host1x_fence *f; + int err; + + f = kzalloc(sizeof(*f), GFP_KERNEL); + if (!f) + return NULL; + + waiter = kzalloc(sizeof(*waiter), GFP_KERNEL); + if (!waiter) { + kfree(f); + return NULL; + } + + f->host = host; + f->syncpt = syncpt; + f->threshold = threshold; + f->waiter = NULL; + snprintf(f->timeline_name, ARRAY_SIZE(f->timeline_name), + "%d", syncpt->id); + + spin_lock_init(&f->lock); + dma_fence_init(&f->base, &host1x_fence_ops, &f->lock, + host->fence_ctx_base + syncpt->id, threshold); + + err = host1x_intr_add_action(f->host, f->syncpt->id, f->threshold, + HOST1X_INTR_ACTION_SIGNAL_FENCE, f, + waiter, &f->waiter); + if (err) { + kfree(waiter); + dma_fence_put((struct dma_fence *)f); + return NULL; + } + + return (struct dma_fence *)f; +} +EXPORT_SYMBOL(host1x_fence_create); diff --git a/drivers/gpu/host1x/fence.h b/drivers/gpu/host1x/fence.h new file mode 100644 index 000000000000..5725c95c0f1b --- /dev/null +++ b/drivers/gpu/host1x/fence.h @@ -0,0 +1,28 @@ +/* + * Copyright (C) 2016 NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_FENCE_H +#define __HOST1X_FENCE_H + +struct host1x; +struct host1x_channel; +struct dma_fence; + +bool host1x_fence_is_waitable(struct dma_fence *fence); +int host1x_fence_wait(struct dma_fence *fence, struct host1x *host, + struct host1x_channel *ch); + +#endif diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c index 5e8df78b7acd..27dc78f4c400 100644 --- a/drivers/gpu/host1x/hw/channel_hw.c +++ b/drivers/gpu/host1x/hw/channel_hw.c @@ -1,7 +1,7 @@ /* * Tegra host1x Channel * - * Copyright (c) 2010-2013, NVIDIA Corporation. + * Copyright (C) 2010-2016 NVIDIA CORPORATION. All rights reserved. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -16,6 +16,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/dma-fence.h> #include <linux/host1x.h> #include <linux/slab.h>
@@ -23,6 +24,7 @@
#include "../channel.h" #include "../dev.h" +#include "../fence.h" #include "../intr.h" #include "../job.h"
@@ -68,11 +70,26 @@ static void submit_gathers(struct host1x_job *job) u32 op1 = host1x_opcode_gather(g->words); u32 op2 = g->base + g->offset;
+ /* add a setclass for modules that require it */ + if (job->class) + host1x_cdma_push(cdma, + host1x_opcode_setclass(job->class, 0, 0), + HOST1X_OPCODE_NOP); + trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff); host1x_cdma_push(cdma, op1, op2); } }
+static void channel_push_wait(struct host1x_channel *channel, + u32 id, u32 thresh) +{ + host1x_cdma_push(&channel->cdma, + host1x_opcode_setclass(HOST1X_CLASS_HOST1X, + host1x_uclass_wait_syncpt_r(), 1), + host1x_class_host_wait_syncpt(id, thresh)); +} + static inline void synchronize_syncpt_base(struct host1x_job *job) { struct host1x *host = dev_get_drvdata(job->channel->dev->parent); @@ -110,6 +127,16 @@ static int channel_submit(struct host1x_job *job) /* before error checks, return current max */ prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
+ if (job->prefence) { + if (host1x_fence_is_waitable(job->prefence)) { + host1x_fence_wait(job->prefence, host, job->channel); + } else { + err = dma_fence_wait(job->prefence, true); + if (err) + goto error; + } + } + /* get submit lock */ err = mutex_lock_interruptible(&ch->submitlock); if (err) @@ -149,12 +176,6 @@ static int channel_submit(struct host1x_job *job)
job->syncpt_end = syncval;
- /* add a setclass for modules that require it */ - if (job->class) - host1x_cdma_push(&ch->cdma, - host1x_opcode_setclass(job->class, 0, 0), - HOST1X_OPCODE_NOP); - submit_gathers(job);
/* end CDMA submit & stash pinned hMems into sync queue */ @@ -192,4 +213,5 @@ static int host1x_channel_init(struct host1x_channel *ch, struct host1x *dev, static const struct host1x_channel_ops host1x_channel_ops = { .init = host1x_channel_init, .submit = channel_submit, + .push_wait = channel_push_wait }; diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c index 8b4fad0ab35d..b3d51288243d 100644 --- a/drivers/gpu/host1x/intr.c +++ b/drivers/gpu/host1x/intr.c @@ -1,7 +1,7 @@ /* * Tegra host1x Interrupt Management * - * Copyright (c) 2010-2013, NVIDIA Corporation. + * Copyright (C) 2010-2016 NVIDIA CORPORATION. All rights reserved. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -17,6 +17,7 @@ */
#include <linux/clk.h> +#include <linux/dma-fence.h> #include <linux/interrupt.h> #include <linux/slab.h> #include <linux/irq.h> @@ -133,12 +134,20 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter) wake_up_interruptible(wq); }
+static void action_signal_fence(struct host1x_waitlist *waiter) +{ + struct dma_fence *fence = waiter->data; + + dma_fence_signal(fence); +} + typedef void (*action_handler)(struct host1x_waitlist *waiter);
static const action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = { action_submit_complete, action_wakeup, action_wakeup_interruptible, + action_signal_fence };
static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT]) diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h index 1370c2bb75b8..6b2c090fa91c 100644 --- a/drivers/gpu/host1x/intr.h +++ b/drivers/gpu/host1x/intr.h @@ -1,7 +1,7 @@ /* * Tegra host1x Interrupt Management * - * Copyright (c) 2010-2013, NVIDIA Corporation. + * Copyright (C) 2010-2016 NVIDIA CORPORATION. All rights reserved. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -43,6 +43,12 @@ enum host1x_intr_action { */ HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
+ /* + * Signal a dma fence. + * 'data' points to a host1x_fence + */ + HOST1X_INTR_ACTION_SIGNAL_FENCE, + HOST1X_INTR_ACTION_COUNT };
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c index 3236c3d21a15..9c9c983d3a6c 100644 --- a/drivers/gpu/host1x/syncpt.c +++ b/drivers/gpu/host1x/syncpt.c @@ -18,6 +18,7 @@
#include <linux/module.h> #include <linux/device.h> +#include <linux/dma-fence.h> #include <linux/slab.h>
#include <trace/events/host1x.h> @@ -391,6 +392,7 @@ int host1x_syncpt_init(struct host1x *host) mutex_init(&host->syncpt_mutex); host->syncpt = syncpt; host->bases = bases; + host->fence_ctx_base = dma_fence_context_alloc(host->info->nb_pts);
host1x_syncpt_restore(host);
diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 3d04aa1dc83e..fd9b526486e0 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009-2013, NVIDIA Corporation. All rights reserved. + * Copyright (C) 2009-2016 NVIDIA CORPORATION. All rights reserved. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -55,6 +55,7 @@ struct host1x_client { * host1x buffer objects */
+struct dma_fence; struct host1x_bo; struct sg_table;
@@ -233,6 +234,9 @@ struct host1x_job {
/* Add a channel wait for previous ops to complete */ bool serialize; + + /* Wait for prefence to complete before submitting */ + struct dma_fence *prefence; };
struct host1x_job *host1x_job_alloc(struct host1x_channel *ch, @@ -309,4 +313,10 @@ int tegra_mipi_enable(struct tegra_mipi_device *device); int tegra_mipi_disable(struct tegra_mipi_device *device); int tegra_mipi_calibrate(struct tegra_mipi_device *device);
+struct host1x_fence; + +struct dma_fence *host1x_fence_create(struct host1x *host, + struct host1x_syncpt *syncpt, + u32 threshold); + #endif
Adds ability to pass sync file based prefences and get back sync file based postfences during job submission. Both fence fd's are passed in the `fence` field. A new `flags` field is used to specify if the prefence should be waited or a postfence created.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com --- include/uapi/drm/tegra_drm.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/include/uapi/drm/tegra_drm.h b/include/uapi/drm/tegra_drm.h index d954f8c33321..b85689c15d8f 100644 --- a/include/uapi/drm/tegra_drm.h +++ b/include/uapi/drm/tegra_drm.h @@ -117,6 +117,9 @@ struct drm_tegra_waitchk { __u32 thresh; };
+#define DRM_TEGRA_SUBMIT_WAIT_FENCE_FD (1 << 0) +#define DRM_TEGRA_SUBMIT_CREATE_FENCE_FD (1 << 1) + struct drm_tegra_submit { __u64 context; __u32 num_syncpts; @@ -129,9 +132,10 @@ struct drm_tegra_submit { __u64 cmdbufs; __u64 relocs; __u64 waitchks; - __u32 fence; /* Return value */ + __u32 fence; + __u32 flags;
- __u32 reserved[5]; /* future expansion */ + __u32 reserved[4]; /* future expansion */ };
#define DRM_TEGRA_GEM_TILING_MODE_PITCH 0
Add support for sync file-based prefences and postfences to job submission. Fences are passed to the Host1x implementation.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 69 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 64dff8530403..bf4a2a13c17d 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -10,6 +10,7 @@ #include <linux/bitops.h> #include <linux/host1x.h> #include <linux/iommu.h> +#include <linux/sync_file.h>
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> @@ -344,6 +345,7 @@ int tegra_drm_submit(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file) { + struct host1x *host1x = dev_get_drvdata(drm->dev->parent); unsigned int num_cmdbufs = args->num_cmdbufs; unsigned int num_relocs = args->num_relocs; unsigned int num_waitchks = args->num_waitchks; @@ -361,6 +363,11 @@ int tegra_drm_submit(struct tegra_drm_context *context, if (args->num_syncpts != 1) return -EINVAL;
+ /* Check for unrecognized flags */ + if (args->flags & ~(DRM_TEGRA_SUBMIT_WAIT_FENCE_FD | + DRM_TEGRA_SUBMIT_CREATE_FENCE_FD)) + return -EINVAL; + job = host1x_job_alloc(context->channel, args->num_cmdbufs, args->num_relocs, args->num_waitchks); if (!job) @@ -372,19 +379,27 @@ int tegra_drm_submit(struct tegra_drm_context *context, job->class = context->client->base.class; job->serialize = true;
+ if (args->flags & DRM_TEGRA_SUBMIT_WAIT_FENCE_FD) { + job->prefence = sync_file_get_fence(args->fence); + if (!job->prefence) { + err = -ENOENT; + goto put_job; + } + } + while (num_cmdbufs) { struct drm_tegra_cmdbuf cmdbuf; struct host1x_bo *bo;
if (copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf))) { err = -EFAULT; - goto fail; + goto put_fence; }
bo = host1x_bo_lookup(file, cmdbuf.handle); if (!bo) { err = -ENOENT; - goto fail; + goto put_fence; }
host1x_job_add_gather(job, bo, cmdbuf.words, cmdbuf.offset); @@ -398,19 +413,19 @@ int tegra_drm_submit(struct tegra_drm_context *context, &relocs[num_relocs], drm, file); if (err < 0) - goto fail; + goto put_fence; }
if (copy_from_user(job->waitchk, waitchks, sizeof(*waitchks) * num_waitchks)) { err = -EFAULT; - goto fail; + goto put_fence; }
if (copy_from_user(&syncpt, (void __user *)(uintptr_t)args->syncpts, sizeof(syncpt))) { err = -EFAULT; - goto fail; + goto put_fence; }
job->is_addr_reg = context->client->ops->is_addr_reg; @@ -423,20 +438,54 @@ int tegra_drm_submit(struct tegra_drm_context *context,
err = host1x_job_pin(job, context->client->base.dev); if (err) - goto fail; + goto put_fence;
err = host1x_job_submit(job); if (err) - goto fail_submit; + goto unpin_job;
- args->fence = job->syncpt_end; + if (args->flags & DRM_TEGRA_SUBMIT_CREATE_FENCE_FD) { + struct dma_fence *fence; + struct sync_file *file; + + fence = host1x_fence_create( + host1x, host1x_syncpt_get(host1x, job->syncpt_id), + job->syncpt_end); + if (!fence) { + err = -ENOMEM; + goto put_fence; + } + + file = sync_file_create(fence); + if (!file) { + dma_fence_put(fence); + err = -ENOMEM; + goto put_fence; + } + + err = get_unused_fd_flags(O_CLOEXEC); + if (err < 0) { + dma_fence_put(fence); + goto put_fence; + } + + fd_install(err, file->file); + args->fence = err; + } else { + args->fence = job->syncpt_end; + }
+ if (job->prefence) + dma_fence_put(job->prefence); host1x_job_put(job); return 0;
-fail_submit: +unpin_job: host1x_job_unpin(job); -fail: +put_fence: + if (job->prefence) + dma_fence_put(job->prefence); +put_job: host1x_job_put(job); return err; }
On Thu, Mar 09, 2017 at 07:57:18PM +0200, Mikko Perttunen wrote:
Add support for sync file-based prefences and postfences to job submission. Fences are passed to the Host1x implementation.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com
drivers/gpu/drm/tegra/drm.c | 69 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 64dff8530403..bf4a2a13c17d 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -10,6 +10,7 @@ #include <linux/bitops.h> #include <linux/host1x.h> #include <linux/iommu.h> +#include <linux/sync_file.h>
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> @@ -344,6 +345,7 @@ int tegra_drm_submit(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file) {
- struct host1x *host1x = dev_get_drvdata(drm->dev->parent); unsigned int num_cmdbufs = args->num_cmdbufs; unsigned int num_relocs = args->num_relocs; unsigned int num_waitchks = args->num_waitchks;
@@ -361,6 +363,11 @@ int tegra_drm_submit(struct tegra_drm_context *context, if (args->num_syncpts != 1) return -EINVAL;
- /* Check for unrecognized flags */
- if (args->flags & ~(DRM_TEGRA_SUBMIT_WAIT_FENCE_FD |
DRM_TEGRA_SUBMIT_CREATE_FENCE_FD))
return -EINVAL;
- job = host1x_job_alloc(context->channel, args->num_cmdbufs, args->num_relocs, args->num_waitchks); if (!job)
@@ -372,19 +379,27 @@ int tegra_drm_submit(struct tegra_drm_context *context, job->class = context->client->base.class; job->serialize = true;
if (args->flags & DRM_TEGRA_SUBMIT_WAIT_FENCE_FD) {
job->prefence = sync_file_get_fence(args->fence);
if (!job->prefence) {
err = -ENOENT;
goto put_job;
}
}
while (num_cmdbufs) { struct drm_tegra_cmdbuf cmdbuf; struct host1x_bo *bo;
if (copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf))) { err = -EFAULT;
goto fail;
goto put_fence;
}
bo = host1x_bo_lookup(file, cmdbuf.handle); if (!bo) { err = -ENOENT;
goto fail;
goto put_fence;
}
host1x_job_add_gather(job, bo, cmdbuf.words, cmdbuf.offset);
@@ -398,19 +413,19 @@ int tegra_drm_submit(struct tegra_drm_context *context, &relocs[num_relocs], drm, file); if (err < 0)
goto fail;
goto put_fence;
}
if (copy_from_user(job->waitchk, waitchks, sizeof(*waitchks) * num_waitchks)) { err = -EFAULT;
goto fail;
goto put_fence;
}
if (copy_from_user(&syncpt, (void __user *)(uintptr_t)args->syncpts, sizeof(syncpt))) { err = -EFAULT;
goto fail;
goto put_fence;
}
job->is_addr_reg = context->client->ops->is_addr_reg;
@@ -423,20 +438,54 @@ int tegra_drm_submit(struct tegra_drm_context *context,
err = host1x_job_pin(job, context->client->base.dev); if (err)
goto fail;
goto put_fence;
err = host1x_job_submit(job); if (err)
goto fail_submit;
goto unpin_job;
Shouldn't all error-unwinding gotos after this jump to the unpin_job label as well? Seems like they all jump to put_fence instead, which I think would leave the job pinned on failure.
- args->fence = job->syncpt_end;
if (args->flags & DRM_TEGRA_SUBMIT_CREATE_FENCE_FD) {
struct dma_fence *fence;
struct sync_file *file;
fence = host1x_fence_create(
host1x, host1x_syncpt_get(host1x, job->syncpt_id),
job->syncpt_end);
if (!fence) {
err = -ENOMEM;
goto put_fence;
}
file = sync_file_create(fence);
if (!file) {
dma_fence_put(fence);
err = -ENOMEM;
goto put_fence;
}
err = get_unused_fd_flags(O_CLOEXEC);
if (err < 0) {
dma_fence_put(fence);
goto put_fence;
}
fd_install(err, file->file);
args->fence = err;
} else {
args->fence = job->syncpt_end;
}
if (job->prefence)
dma_fence_put(job->prefence);
host1x_job_put(job); return 0;
-fail_submit: +unpin_job: host1x_job_unpin(job); -fail: +put_fence:
- if (job->prefence)
dma_fence_put(job->prefence);
Since we already have a conditional to check for usage of fence, I'm wondering if we can simplify this a little and leave out the put_fence label altogether, like so:
unpin_job: host1x_job_unpin(job); put_job: if (job->prefence) dma_fence_put(job->prefence);
host1x_job_put(job);
Thierry
On 13.03.2017 09:15, Thierry Reding wrote:
On Thu, Mar 09, 2017 at 07:57:18PM +0200, Mikko Perttunen wrote:
Add support for sync file-based prefences and postfences to job submission. Fences are passed to the Host1x implementation.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com
drivers/gpu/drm/tegra/drm.c | 69 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 64dff8530403..bf4a2a13c17d 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -10,6 +10,7 @@ #include <linux/bitops.h> #include <linux/host1x.h> #include <linux/iommu.h> +#include <linux/sync_file.h>
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> @@ -344,6 +345,7 @@ int tegra_drm_submit(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file) {
- struct host1x *host1x = dev_get_drvdata(drm->dev->parent); unsigned int num_cmdbufs = args->num_cmdbufs; unsigned int num_relocs = args->num_relocs; unsigned int num_waitchks = args->num_waitchks;
@@ -361,6 +363,11 @@ int tegra_drm_submit(struct tegra_drm_context *context, if (args->num_syncpts != 1) return -EINVAL;
- /* Check for unrecognized flags */
- if (args->flags & ~(DRM_TEGRA_SUBMIT_WAIT_FENCE_FD |
DRM_TEGRA_SUBMIT_CREATE_FENCE_FD))
return -EINVAL;
- job = host1x_job_alloc(context->channel, args->num_cmdbufs, args->num_relocs, args->num_waitchks); if (!job)
@@ -372,19 +379,27 @@ int tegra_drm_submit(struct tegra_drm_context *context, job->class = context->client->base.class; job->serialize = true;
if (args->flags & DRM_TEGRA_SUBMIT_WAIT_FENCE_FD) {
job->prefence = sync_file_get_fence(args->fence);
if (!job->prefence) {
err = -ENOENT;
goto put_job;
}
}
while (num_cmdbufs) { struct drm_tegra_cmdbuf cmdbuf; struct host1x_bo *bo;
if (copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf))) { err = -EFAULT;
goto fail;
goto put_fence;
}
bo = host1x_bo_lookup(file, cmdbuf.handle); if (!bo) { err = -ENOENT;
goto fail;
goto put_fence;
}
host1x_job_add_gather(job, bo, cmdbuf.words, cmdbuf.offset);
@@ -398,19 +413,19 @@ int tegra_drm_submit(struct tegra_drm_context *context, &relocs[num_relocs], drm, file); if (err < 0)
goto fail;
goto put_fence;
}
if (copy_from_user(job->waitchk, waitchks, sizeof(*waitchks) * num_waitchks)) { err = -EFAULT;
goto fail;
goto put_fence;
}
if (copy_from_user(&syncpt, (void __user *)(uintptr_t)args->syncpts, sizeof(syncpt))) { err = -EFAULT;
goto fail;
goto put_fence;
}
job->is_addr_reg = context->client->ops->is_addr_reg;
@@ -423,20 +438,54 @@ int tegra_drm_submit(struct tegra_drm_context *context,
err = host1x_job_pin(job, context->client->base.dev); if (err)
goto fail;
goto put_fence;
err = host1x_job_submit(job); if (err)
goto fail_submit;
goto unpin_job;
Shouldn't all error-unwinding gotos after this jump to the unpin_job label as well? Seems like they all jump to put_fence instead, which I think would leave the job pinned on failure.
After host1x_job_submit is succesfully called, host1x's job tracking owns the job and will call unpin on it once it finishes or times out, so we cannot unpin it from here.
- args->fence = job->syncpt_end;
if (args->flags & DRM_TEGRA_SUBMIT_CREATE_FENCE_FD) {
struct dma_fence *fence;
struct sync_file *file;
fence = host1x_fence_create(
host1x, host1x_syncpt_get(host1x, job->syncpt_id),
job->syncpt_end);
if (!fence) {
err = -ENOMEM;
goto put_fence;
}
file = sync_file_create(fence);
if (!file) {
dma_fence_put(fence);
err = -ENOMEM;
goto put_fence;
}
err = get_unused_fd_flags(O_CLOEXEC);
if (err < 0) {
dma_fence_put(fence);
goto put_fence;
}
fd_install(err, file->file);
args->fence = err;
} else {
args->fence = job->syncpt_end;
}
if (job->prefence)
dma_fence_put(job->prefence);
host1x_job_put(job); return 0;
-fail_submit: +unpin_job: host1x_job_unpin(job); -fail: +put_fence:
- if (job->prefence)
dma_fence_put(job->prefence);
Since we already have a conditional to check for usage of fence, I'm wondering if we can simplify this a little and leave out the put_fence label altogether, like so:
unpin_job: host1x_job_unpin(job); put_job: if (job->prefence) dma_fence_put(job->prefence);
host1x_job_put(job);
Yes, that seems like a good idea.
Thierry
Cheers, Mikko.
On Mon, Mar 13, 2017 at 11:07:28AM +0200, Mikko Perttunen wrote:
On 13.03.2017 09:15, Thierry Reding wrote:
On Thu, Mar 09, 2017 at 07:57:18PM +0200, Mikko Perttunen wrote:
Add support for sync file-based prefences and postfences to job submission. Fences are passed to the Host1x implementation.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com
drivers/gpu/drm/tegra/drm.c | 69 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 64dff8530403..bf4a2a13c17d 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -10,6 +10,7 @@ #include <linux/bitops.h> #include <linux/host1x.h> #include <linux/iommu.h> +#include <linux/sync_file.h>
#include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> @@ -344,6 +345,7 @@ int tegra_drm_submit(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file) {
- struct host1x *host1x = dev_get_drvdata(drm->dev->parent); unsigned int num_cmdbufs = args->num_cmdbufs; unsigned int num_relocs = args->num_relocs; unsigned int num_waitchks = args->num_waitchks;
@@ -361,6 +363,11 @@ int tegra_drm_submit(struct tegra_drm_context *context, if (args->num_syncpts != 1) return -EINVAL;
- /* Check for unrecognized flags */
- if (args->flags & ~(DRM_TEGRA_SUBMIT_WAIT_FENCE_FD |
DRM_TEGRA_SUBMIT_CREATE_FENCE_FD))
return -EINVAL;
- job = host1x_job_alloc(context->channel, args->num_cmdbufs, args->num_relocs, args->num_waitchks); if (!job)
@@ -372,19 +379,27 @@ int tegra_drm_submit(struct tegra_drm_context *context, job->class = context->client->base.class; job->serialize = true;
if (args->flags & DRM_TEGRA_SUBMIT_WAIT_FENCE_FD) {
job->prefence = sync_file_get_fence(args->fence);
if (!job->prefence) {
err = -ENOENT;
goto put_job;
}
}
while (num_cmdbufs) { struct drm_tegra_cmdbuf cmdbuf; struct host1x_bo *bo;
if (copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf))) { err = -EFAULT;
goto fail;
goto put_fence;
}
bo = host1x_bo_lookup(file, cmdbuf.handle); if (!bo) { err = -ENOENT;
goto fail;
goto put_fence;
}
host1x_job_add_gather(job, bo, cmdbuf.words, cmdbuf.offset);
@@ -398,19 +413,19 @@ int tegra_drm_submit(struct tegra_drm_context *context, &relocs[num_relocs], drm, file); if (err < 0)
goto fail;
goto put_fence;
}
if (copy_from_user(job->waitchk, waitchks, sizeof(*waitchks) * num_waitchks)) { err = -EFAULT;
goto fail;
goto put_fence;
}
if (copy_from_user(&syncpt, (void __user *)(uintptr_t)args->syncpts, sizeof(syncpt))) { err = -EFAULT;
goto fail;
goto put_fence;
}
job->is_addr_reg = context->client->ops->is_addr_reg;
@@ -423,20 +438,54 @@ int tegra_drm_submit(struct tegra_drm_context *context,
err = host1x_job_pin(job, context->client->base.dev); if (err)
goto fail;
goto put_fence;
err = host1x_job_submit(job); if (err)
goto fail_submit;
goto unpin_job;
Shouldn't all error-unwinding gotos after this jump to the unpin_job label as well? Seems like they all jump to put_fence instead, which I think would leave the job pinned on failure.
After host1x_job_submit is succesfully called, host1x's job tracking owns the job and will call unpin on it once it finishes or times out, so we cannot unpin it from here.
Okay, might be worth explaining that in a comment above the call to host1x_job_submit() because it's not obvious and I'm pretty sure people would send in patches to "fix" this.
Thierry
On Thu, Mar 9, 2017 at 6:57 PM, Mikko Perttunen mperttunen@nvidia.com wrote:
Hi everyone,
this series adds support for using sync fences as prefences and postfences for host1x job submissions. The patches are available as a git repository at
https://github.com/cyndis/linux/tree/host1x-fence-1
and testing code is available at
https://github.com/cyndis/host1x_test
though you may want to edit the main function to disable the timeout tests for now as they cause a deadlock (not caused by this series; fix upcoming).
Verified on a Jetson TX1; should go on top of the earlier VIC series.
Some additional points:
- I noticed that the waitchk_mask field in the submit UAPI is completely useless, and has never had any effect in the upstream kernel. It has also not existed in the downstream kernel for many years. We could replace it with the flags field if that is deemed acceptable, though of course it is possible there exists some application that fills it with some non-zero value.
If open source userspace (nouveau_dri.so) never used it, then you can freely change it. Backwards compat guarantee in drm is only for open source userspace (and by implication ofc anything that uses the ioctl the same way). See:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace...
On that topic, do we have the nouveau patches to enable the egl_android extension for this already published?
- Signaling is enabled for all host1x fences, not just those for which enable_signaling has been called. This is because enable_signaling is called from atomic context and we cannot set up an action waiter in atomic context.
Yeah, this was some good fun getting it all sorted in i915 too :-) -Daniel
Thanks, Mikko
Mikko Perttunen (3): gpu: host1x: Add support for DMA fences drm/tegra: Add sync file support to submit interface drm/tegra: Support for sync file-based fences in submit
drivers/gpu/drm/tegra/drm.c | 69 +++++++++++-- drivers/gpu/host1x/Kconfig | 1 + drivers/gpu/host1x/Makefile | 1 + drivers/gpu/host1x/dev.h | 12 ++- drivers/gpu/host1x/fence.c | 202 +++++++++++++++++++++++++++++++++++++ drivers/gpu/host1x/fence.h | 28 +++++ drivers/gpu/host1x/hw/channel_hw.c | 36 +++++-- drivers/gpu/host1x/intr.c | 11 +- drivers/gpu/host1x/intr.h | 8 +- drivers/gpu/host1x/syncpt.c | 2 + include/linux/host1x.h | 12 ++- include/uapi/drm/tegra_drm.h | 8 +- 12 files changed, 367 insertions(+), 23 deletions(-) create mode 100644 drivers/gpu/host1x/fence.c create mode 100644 drivers/gpu/host1x/fence.h
-- 2.11.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On 03/09/2017 08:58 PM, Daniel Vetter wrote:
On Thu, Mar 9, 2017 at 6:57 PM, Mikko Perttunen mperttunen@nvidia.com wrote:
Hi everyone,
this series adds support for using sync fences as prefences and postfences for host1x job submissions. The patches are available as a git repository at
https://github.com/cyndis/linux/tree/host1x-fence-1
and testing code is available at
https://github.com/cyndis/host1x_test
though you may want to edit the main function to disable the timeout tests for now as they cause a deadlock (not caused by this series; fix upcoming).
Verified on a Jetson TX1; should go on top of the earlier VIC series.
Some additional points:
- I noticed that the waitchk_mask field in the submit UAPI is completely useless, and has never had any effect in the upstream kernel. It has also not existed in the downstream kernel for many years. We could replace it with the flags field if that is deemed acceptable, though of course it is possible there exists some application that fills it with some non-zero value.
If open source userspace (nouveau_dri.so) never used it, then you can freely change it. Backwards compat guarantee in drm is only for open source userspace (and by implication ofc anything that uses the ioctl the same way). See:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace...
nouveau doesn't have any host1x related code - so no, there is no pre-existing open-source userspace that uses this :)
On that topic, do we have the nouveau patches to enable the egl_android extension for this already published?
I assume you are referring to EGL_ANDROID_native_fence_sync; I don't know what nouveau's status is regarding that. With this series, the host1x driver does not yet allow other drivers access to the raw syncpoint values behind host1x fences but that can be fixed pretty easily if/when nouveau wants to support native syncpoint waits on Tegra. Host1x jobs do use native waits already with this series, though.
- Signaling is enabled for all host1x fences, not just those for which enable_signaling has been called. This is because enable_signaling is called from atomic context and we cannot set up an action waiter in atomic context.
Yeah, this was some good fun getting it all sorted in i915 too :-) -Daniel
:)
Cheers, Mikko
Thanks, Mikko
Mikko Perttunen (3): gpu: host1x: Add support for DMA fences drm/tegra: Add sync file support to submit interface drm/tegra: Support for sync file-based fences in submit
drivers/gpu/drm/tegra/drm.c | 69 +++++++++++-- drivers/gpu/host1x/Kconfig | 1 + drivers/gpu/host1x/Makefile | 1 + drivers/gpu/host1x/dev.h | 12 ++- drivers/gpu/host1x/fence.c | 202 +++++++++++++++++++++++++++++++++++++ drivers/gpu/host1x/fence.h | 28 +++++ drivers/gpu/host1x/hw/channel_hw.c | 36 +++++-- drivers/gpu/host1x/intr.c | 11 +- drivers/gpu/host1x/intr.h | 8 +- drivers/gpu/host1x/syncpt.c | 2 + include/linux/host1x.h | 12 ++- include/uapi/drm/tegra_drm.h | 8 +- 12 files changed, 367 insertions(+), 23 deletions(-) create mode 100644 drivers/gpu/host1x/fence.c create mode 100644 drivers/gpu/host1x/fence.h
-- 2.11.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, Mar 09, 2017 at 09:09:52PM +0200, Mikko Perttunen wrote:
On 03/09/2017 08:58 PM, Daniel Vetter wrote:
On Thu, Mar 9, 2017 at 6:57 PM, Mikko Perttunen mperttunen@nvidia.com wrote:
Hi everyone,
this series adds support for using sync fences as prefences and postfences for host1x job submissions. The patches are available as a git repository at
https://github.com/cyndis/linux/tree/host1x-fence-1
and testing code is available at
https://github.com/cyndis/host1x_test
though you may want to edit the main function to disable the timeout tests for now as they cause a deadlock (not caused by this series; fix upcoming).
Verified on a Jetson TX1; should go on top of the earlier VIC series.
Some additional points:
- I noticed that the waitchk_mask field in the submit UAPI is completely useless, and has never had any effect in the upstream kernel. It has also not existed in the downstream kernel for many years. We could replace it with the flags field if that is deemed acceptable, though of course it is possible there exists some application that fills it with some non-zero value.
If open source userspace (nouveau_dri.so) never used it, then you can freely change it. Backwards compat guarantee in drm is only for open source userspace (and by implication ofc anything that uses the ioctl the same way). See:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace...
nouveau doesn't have any host1x related code - so no, there is no pre-existing open-source userspace that uses this :)
On that topic, do we have the nouveau patches to enable the egl_android extension for this already published?
I assume you are referring to EGL_ANDROID_native_fence_sync; I don't know what nouveau's status is regarding that. With this series, the host1x driver does not yet allow other drivers access to the raw syncpoint values behind host1x fences but that can be fixed pretty easily if/when nouveau wants to support native syncpoint waits on Tegra. Host1x jobs do use native waits already with this series, though.
Well you're adding new uapi to the tegra drm driver, and EGL_ANDROID_native_fence_sync for nouveau seems like the real use-case for this. Which means we need that, before we can merge these patches.
At least I assume that this was done for the nv blob tegra gl driver? Of course if there's some other reasonable use-case, we can use that as open source demonstration thing too. -Daniel
On 10.03.2017 14:43, Daniel Vetter wrote:
On Thu, Mar 09, 2017 at 09:09:52PM +0200, Mikko Perttunen wrote:
On 03/09/2017 08:58 PM, Daniel Vetter wrote:
On Thu, Mar 9, 2017 at 6:57 PM, Mikko Perttunen mperttunen@nvidia.com wrote:
Hi everyone,
this series adds support for using sync fences as prefences and postfences for host1x job submissions. The patches are available as a git repository at
https://github.com/cyndis/linux/tree/host1x-fence-1
and testing code is available at
https://github.com/cyndis/host1x_test
though you may want to edit the main function to disable the timeout tests for now as they cause a deadlock (not caused by this series; fix upcoming).
Verified on a Jetson TX1; should go on top of the earlier VIC series.
Some additional points:
- I noticed that the waitchk_mask field in the submit UAPI is completely useless, and has never had any effect in the upstream kernel. It has also not existed in the downstream kernel for many years. We could replace it with the flags field if that is deemed acceptable, though of course it is possible there exists some application that fills it with some non-zero value.
If open source userspace (nouveau_dri.so) never used it, then you can freely change it. Backwards compat guarantee in drm is only for open source userspace (and by implication ofc anything that uses the ioctl the same way). See:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace...
nouveau doesn't have any host1x related code - so no, there is no pre-existing open-source userspace that uses this :)
On that topic, do we have the nouveau patches to enable the egl_android extension for this already published?
I assume you are referring to EGL_ANDROID_native_fence_sync; I don't know what nouveau's status is regarding that. With this series, the host1x driver does not yet allow other drivers access to the raw syncpoint values behind host1x fences but that can be fixed pretty easily if/when nouveau wants to support native syncpoint waits on Tegra. Host1x jobs do use native waits already with this series, though.
Well you're adding new uapi to the tegra drm driver, and EGL_ANDROID_native_fence_sync for nouveau seems like the real use-case for this. Which means we need that, before we can merge these patches.
At least I assume that this was done for the nv blob tegra gl driver? Of course if there's some other reasonable use-case, we can use that as open source demonstration thing too. -Daniel
The downstream driver doesn't actually support this UAPI; the work was done against the testcases linked to above. My goal was to have userspace clients use fence fds instead of raw syncpoints, which would be cleaner and less error-prone; and allow future extensibility. I suppose this might not be enough of a use-case though.
Now, quickly grepping through mesa and the nouveau kernel driver, it looks like there is currently no implementation of EGL_ANDROID_native_fence_sync, and no support for SYNC_FILE in nouveau. Regrettably, I currently don't have the time to learn these codebases and implement these features, so we might have to drop at least patches 2 and 3 and revisit them later.
Thanks, Mikko.
On Mon, Mar 13, 2017 at 10:55:01AM +0200, Mikko Perttunen wrote:
On 10.03.2017 14:43, Daniel Vetter wrote:
On Thu, Mar 09, 2017 at 09:09:52PM +0200, Mikko Perttunen wrote:
On 03/09/2017 08:58 PM, Daniel Vetter wrote:
On Thu, Mar 9, 2017 at 6:57 PM, Mikko Perttunen mperttunen@nvidia.com wrote:
Hi everyone,
this series adds support for using sync fences as prefences and postfences for host1x job submissions. The patches are available as a git repository at
https://github.com/cyndis/linux/tree/host1x-fence-1
and testing code is available at
https://github.com/cyndis/host1x_test
though you may want to edit the main function to disable the timeout tests for now as they cause a deadlock (not caused by this series; fix upcoming).
Verified on a Jetson TX1; should go on top of the earlier VIC series.
Some additional points:
- I noticed that the waitchk_mask field in the submit UAPI is completely useless, and has never had any effect in the upstream kernel. It has also not existed in the downstream kernel for many years. We could replace it with the flags field if that is deemed acceptable, though of course it is possible there exists some application that fills it with some non-zero value.
If open source userspace (nouveau_dri.so) never used it, then you can freely change it. Backwards compat guarantee in drm is only for open source userspace (and by implication ofc anything that uses the ioctl the same way). See:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace...
nouveau doesn't have any host1x related code - so no, there is no pre-existing open-source userspace that uses this :)
On that topic, do we have the nouveau patches to enable the egl_android extension for this already published?
I assume you are referring to EGL_ANDROID_native_fence_sync; I don't know what nouveau's status is regarding that. With this series, the host1x driver does not yet allow other drivers access to the raw syncpoint values behind host1x fences but that can be fixed pretty easily if/when nouveau wants to support native syncpoint waits on Tegra. Host1x jobs do use native waits already with this series, though.
Well you're adding new uapi to the tegra drm driver, and EGL_ANDROID_native_fence_sync for nouveau seems like the real use-case for this. Which means we need that, before we can merge these patches.
At least I assume that this was done for the nv blob tegra gl driver? Of course if there's some other reasonable use-case, we can use that as open source demonstration thing too. -Daniel
The downstream driver doesn't actually support this UAPI; the work was done against the testcases linked to above. My goal was to have userspace clients use fence fds instead of raw syncpoints, which would be cleaner and less error-prone; and allow future extensibility. I suppose this might not be enough of a use-case though.
Now, quickly grepping through mesa and the nouveau kernel driver, it looks like there is currently no implementation of EGL_ANDROID_native_fence_sync, and no support for SYNC_FILE in nouveau. Regrettably, I currently don't have the time to learn these codebases and implement these features, so we might have to drop at least patches 2 and 3 and revisit them later.
Yeah demos and test-cases are cool, but for final merging we want the real thing (in some way or another). -Daniel
dri-devel@lists.freedesktop.org