[PATCH v2 00/22] Tegra DRM fixes

List overview All Threads
Download

newer

older

[Bug 101401]...

[PATCH] drm: Fix boot panic when...

Dmitry Osipenko

13 Jun 2017 13 Jun '17

11:15 p.m.

The major changes of the V2 are:

- Dropped "drm/tegra: Check whether page belongs to BO in tegra_bo_kmap()" patch as it is not needed with the checks performed by 'submit' IOCTL.

- Dropped "drm/tegra: Remove module ownership from the tegra_fb_ops", turned out it is not needed. Thanks to Thierry for the clarification.

- Added new patch "gpu: host1x: At first try a non-blocking allocation for the gather copy", which is a trivial fix for the Host1x firewall performance.

- Reworked "drm/tegra: dc: Drop the reset asserts to workaround a bug" patch as per Thierry's suggestion to avoid reset only on Tegra20.

- Fixed "drm/tegra: Don't use IOMMU on Tegra20" patch compilation in case of modular Tegra DRM. Thanks to Mikko and Nicolas for the suggestions.

- The "Forbid relocation address shifting in the firewall" patch has been reworked to not break newer Tegra's.

Thanks to Erik, Mikko, Nicolas and Thierry for the reviews and suggestions.

Dmitry Osipenko (21): drm/tegra: Fix lockup on a use of staging API drm/tegra: Correct idr_alloc() minimum id drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL drm/tegra: Check syncpoint ID in the 'submit' IOCTL drm/tegra: dc: Avoid reset asserts on Tegra20 drm/tegra: dc: Apply clipping to the plane drm/tegra: dc: Disable plane if it is invisible drm/tegra: Don't use IOMMU on Tegra20 Revert "iommu/tegra: gart: Do not register with bus" gpu: host1x: Initialize firewall class to the jobs one gpu: host1x: Correct host1x_job_pin() error handling gpu: host1x: Do not leak BO's phys address to userspace gpu: host1x: Forbid relocation address shifting in the firewall gpu: host1x: Forbid RESTART opcode in the firewall gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall gpu: host1x: Correct swapped arguments in the is_addr_reg() definition gpu: host1x: Check waits in the firewall gpu: host1x: Remove unused 'struct host1x_cmdbuf' gpu: host1x: Remove unused host1x_cdma_stop() definition gpu: host1x: At first try a non-blocking allocation for the gather copy

Mikko Perttunen (1): gpu: host1x: Refactor channel allocation code

-- 2.13.0

Show replies by date

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 01/22] drm/tegra: Fix lockup on a use of staging API

Commit bdd2f9cd ("Don't leak kernel pointer to userspace") added a mutex around staging IOCTL's, some of those mutexes are taken twice.

Fixes: bdd2f9cd10eb ("drm/tegra: Don't leak kernel pointer to userspace") Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 51c48a8e00ec..b49d32a89f40 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -451,18 +451,6 @@ int tegra_drm_submit(struct tegra_drm_context *context,

#ifdef CONFIG_DRM_TEGRA_STAGING -static struct tegra_drm_context * -tegra_drm_file_get_context(struct tegra_drm_file *file, u32 id) -{ - struct tegra_drm_context *context; - - mutex_lock(&file->lock); - context = idr_find(&file->contexts, id); - mutex_unlock(&file->lock); - - return context; -} - static int tegra_gem_create(struct drm_device *drm, void *data, struct drm_file *file) { @@ -606,7 +594,7 @@ static int tegra_close_channel(struct drm_device *drm, void *data,

mutex_lock(&fpriv->lock);

- context = tegra_drm_file_get_context(fpriv, args->context); + context = idr_find(&fpriv->contexts, args->context); if (!context) { err = -EINVAL; goto unlock; @@ -631,7 +619,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,

mutex_lock(&fpriv->lock);

- context = tegra_drm_file_get_context(fpriv, args->context); + context = idr_find(&fpriv->contexts, args->context); if (!context) { err = -ENODEV; goto unlock; @@ -660,7 +648,7 @@ static int tegra_submit(struct drm_device *drm, void *data,

mutex_lock(&fpriv->lock);

- context = tegra_drm_file_get_context(fpriv, args->context); + context = idr_find(&fpriv->contexts, args->context); if (!context) { err = -ENODEV; goto unlock; @@ -685,7 +673,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,

mutex_lock(&fpriv->lock);

- context = tegra_drm_file_get_context(fpriv, args->context); + context = idr_find(&fpriv->contexts, args->context); if (!context) { err = -ENODEV; goto unlock;

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:09 a.m.

New subject: [PATCH v2 01/22] drm/tegra: Fix lockup on a use of staging API

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

Commit bdd2f9cd ("Don't leak kernel pointer to userspace") added a mutex around staging IOCTL's, some of those mutexes are taken twice.

Fixes: bdd2f9cd10eb ("drm/tegra: Don't leak kernel pointer to userspace") Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 02/22] drm/tegra: Correct idr_alloc() minimum id

The client ID 0 is reserved by the host1x/cdma to mark the timeout timer work as already been scheduled and context ID is used as the clients one. This fixes spurious CDMA timeouts.

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index b49d32a89f40..f9282da94a1f 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -539,7 +539,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv, if (err < 0) return err;

- err = idr_alloc(&fpriv->contexts, context, 0, 0, GFP_KERNEL); + err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL); if (err < 0) { client->ops->close_channel(context); return err;

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 03/22] drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL

If commands buffer claims a number of words that is higher than its BO can fit, a kernel OOPS will be fired on the out-of-bounds BO access. This was triggered by an opentegra Xorg driver that erroneously pushed too many commands to the pushbuf.

The CDMA commands buffer address is 4 bytes aligned, so check its alignment.

The maximum number of the CDMA gather fetches is 16383, add a check for it.

Add a sanity check for the relocations in a same way.

[ 46.829393] Unable to handle kernel paging request at virtual address f09b2000 ... [<c04a3ba4>] (host1x_job_pin) from [<c04dfcd0>] (tegra_drm_submit+0x474/0x510) [<c04dfcd0>] (tegra_drm_submit) from [<c04deea0>] (tegra_submit+0x50/0x6c) [<c04deea0>] (tegra_submit) from [<c04c07c0>] (drm_ioctl+0x1e4/0x3ec) [<c04c07c0>] (drm_ioctl) from [<c02541a0>] (do_vfs_ioctl+0x9c/0x8e4) [<c02541a0>] (do_vfs_ioctl) from [<c0254a1c>] (SyS_ioctl+0x34/0x5c) [<c0254a1c>] (SyS_ioctl) from [<c0107640>] (ret_fast_syscall+0x0/0x3c)

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 49 +++++++++++++++++++++++++++++++++++++++++++++ drivers/gpu/drm/tegra/gem.c | 5 ----- drivers/gpu/drm/tegra/gem.h | 5 +++++ 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index f9282da94a1f..c9246405fc7b 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -26,6 +26,7 @@ #define DRIVER_PATCHLEVEL 0

#define CARVEOUT_SZ SZ_64M +#define CDMA_GATHER_FETCHES_MAX_NB 16383

struct tegra_drm_file { struct idr contexts; @@ -383,18 +384,42 @@ int tegra_drm_submit(struct tegra_drm_context *context, while (num_cmdbufs) { struct drm_tegra_cmdbuf cmdbuf; struct host1x_bo *bo; + struct tegra_bo *obj; + u64 offset;

if (copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf))) { err = -EFAULT; goto fail; }

+ /* + * The maximum number of CDMA gather fetches is 16383, a higher + * value means the words count is malformed. + */ + if (cmdbuf.words > CDMA_GATHER_FETCHES_MAX_NB) { + err = -EINVAL; + goto fail; + } + bo = host1x_bo_lookup(file, cmdbuf.handle); if (!bo) { err = -ENOENT; goto fail; }

+ offset = (u64)cmdbuf.offset + (u64)cmdbuf.words * sizeof(u32); + obj = host1x_to_tegra_bo(bo); + + /* + * Gather buffer base address must be 4-bytes aligned, + * unaligned offset is malformed and cause commands stream + * corruption on the buffer address relocation. + */ + if (offset & 3 || offset >= obj->gem.size) { + err = -EINVAL; + goto fail; + } + host1x_job_add_gather(job, bo, cmdbuf.words, cmdbuf.offset); num_cmdbufs--; cmdbufs++; @@ -402,11 +427,35 @@ int tegra_drm_submit(struct tegra_drm_context *context,

/* copy and resolve relocations from submit */ while (num_relocs--) { + struct host1x_reloc *reloc; + struct tegra_bo *obj; + err = host1x_reloc_copy_from_user(&job->relocarray[num_relocs], &relocs[num_relocs], drm, file); if (err < 0) goto fail; + + reloc = &job->relocarray[num_relocs]; + obj = host1x_to_tegra_bo(reloc->cmdbuf.bo); + + /* + * The unaligned cmdbuf offset will cause an unaligned write + * during of the relocations patching, corrupting the commands + * stream. + */ + if (reloc->cmdbuf.offset & 3 || + reloc->cmdbuf.offset >= obj->gem.size) { + err = -EINVAL; + goto fail; + } + + obj = host1x_to_tegra_bo(reloc->target.bo); + + if (reloc->target.offset >= obj->gem.size) { + err = -EINVAL; + goto fail; + } }

if (copy_from_user(job->waitchk, waitchks, diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c index 424569b53e57..7a39a355678a 100644 --- a/drivers/gpu/drm/tegra/gem.c +++ b/drivers/gpu/drm/tegra/gem.c @@ -20,11 +20,6 @@ #include "drm.h" #include "gem.h"

-static inline struct tegra_bo *host1x_to_tegra_bo(struct host1x_bo *bo) -{ - return container_of(bo, struct tegra_bo, base); -} - static void tegra_bo_put(struct host1x_bo *bo) { struct tegra_bo *obj = host1x_to_tegra_bo(bo); diff --git a/drivers/gpu/drm/tegra/gem.h b/drivers/gpu/drm/tegra/gem.h index 6c5f12ac0087..8b32a6fd586d 100644 --- a/drivers/gpu/drm/tegra/gem.h +++ b/drivers/gpu/drm/tegra/gem.h @@ -52,6 +52,11 @@ static inline struct tegra_bo *to_tegra_bo(struct drm_gem_object *gem) return container_of(gem, struct tegra_bo, gem); }

+static inline struct tegra_bo *host1x_to_tegra_bo(struct host1x_bo *bo) +{ + return container_of(bo, struct tegra_bo, base); +} + struct tegra_bo *tegra_bo_create(struct drm_device *drm, size_t size, unsigned long flags); struct tegra_bo *tegra_bo_create_with_handle(struct drm_file *file,

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 04/22] drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL

The waitchecks along with multiple syncpoints per submit are not ready for use yet, let's forbid them for now.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 60 ++++++++++++++++++++++++++++++++++++++++++--- drivers/gpu/host1x/job.h | 7 ------ include/linux/host1x.h | 7 ++++++ 3 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index c9246405fc7b..b90ed2cd32ce 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -349,6 +349,36 @@ static int host1x_reloc_copy_from_user(struct host1x_reloc *dest, return 0; }

+static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest, + struct drm_tegra_waitchk __user *src, + struct drm_file *file) +{ + u32 cmdbuf; + int err; + + err = get_user(cmdbuf, &src->handle); + if (err < 0) + return err; + + err = get_user(dest->offset, &src->offset); + if (err < 0) + return err; + + err = get_user(dest->syncpt_id, &src->syncpt); + if (err < 0) + return err; + + err = get_user(dest->thresh, &src->thresh); + if (err < 0) + return err; + + dest->bo = host1x_bo_lookup(file, cmdbuf); + if (!dest->bo) + return -ENOENT; + + return 0; +} + int tegra_drm_submit(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file) @@ -370,6 +400,10 @@ int tegra_drm_submit(struct tegra_drm_context *context, if (args->num_syncpts != 1) return -EINVAL;

+ /* We don't yet support waitchks */ + if (args->num_waitchks != 0) + return -EINVAL; + job = host1x_job_alloc(context->channel, args->num_cmdbufs, args->num_relocs, args->num_waitchks); if (!job) @@ -458,10 +492,28 @@ int tegra_drm_submit(struct tegra_drm_context *context, } }

- if (copy_from_user(job->waitchk, waitchks, - sizeof(*waitchks) * num_waitchks)) { - err = -EFAULT; - goto fail; + /* copy and resolve waitchks from submit */ + while (num_waitchks--) { + struct host1x_waitchk *wait = &job->waitchk[num_waitchks]; + struct tegra_bo *obj; + + err = host1x_waitchk_copy_from_user(wait, + &waitchks[num_waitchks], + file); + if (err < 0) + goto fail; + + obj = host1x_to_tegra_bo(wait->bo); + + /* + * The unaligned offset will cause an unaligned write during + * of the waitchks patching, corrupting the commands stream. + */ + if (wait->offset & 3 || + wait->offset >= obj->gem.size) { + err = -EINVAL; + goto fail; + } }

if (copy_from_user(&syncpt, (void __user *)(uintptr_t)args->syncpts, diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h index 878239c476d2..0debd93a1849 100644 --- a/drivers/gpu/host1x/job.h +++ b/drivers/gpu/host1x/job.h @@ -34,13 +34,6 @@ struct host1x_cmdbuf { u32 pad; };

-struct host1x_waitchk { - struct host1x_bo *bo; - u32 offset; - u32 syncpt_id; - u32 thresh; -}; - struct host1x_job_unpin_data { struct host1x_bo *bo; struct sg_table *sgt; diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 3d04aa1dc83e..aa323e43ae4e 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -177,6 +177,13 @@ struct host1x_reloc { unsigned long shift; };

+struct host1x_waitchk { + struct host1x_bo *bo; + u32 offset; + u32 syncpt_id; + u32 thresh; +}; + struct host1x_job { /* When refcount goes to zero, job can be freed */ struct kref ref;

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 05/22] drm/tegra: Check syncpoint ID in the 'submit' IOCTL

In case of invalid syncpoint ID, the host1x_syncpt_get() returns NULL and none of its users perform a check of the returned pointer later. Let's bail out until it's too late.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 9 +++++++++ 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index b90ed2cd32ce..e999391aedc9 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -393,6 +393,8 @@ int tegra_drm_submit(struct tegra_drm_context *context, struct drm_tegra_waitchk __user *waitchks = (void __user *)(uintptr_t)args->waitchks; struct drm_tegra_syncpt syncpt; + struct host1x *host1x = dev_get_drvdata(drm->dev->parent); + struct host1x_syncpt *sp; struct host1x_job *job; int err;

@@ -522,6 +524,13 @@ int tegra_drm_submit(struct tegra_drm_context *context, goto fail; }

+ /* check whether syncpoint ID is valid */ + sp = host1x_syncpt_get(host1x, syncpt.id); + if (!sp) { + err = -ENOENT; + goto fail; + } + job->is_addr_reg = context->client->ops->is_addr_reg; job->syncpt_incrs = syncpt.incrs; job->syncpt_id = syncpt.id;

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 06/22] drm/tegra: dc: Avoid reset asserts on Tegra20

Commit 33a8eb8 ("Implement runtime PM") introduced HW reset control. It causes a hang on Tegra20 if both display controllers are utilized (RGB panel and HDMI). The TRM suggests that each display controller has its own reset control, apparently it is not correct.

Fixes: 33a8eb8d40ee ("drm/tegra: dc: Implement runtime PM") Signed-off-by: Dmitry Osipenko digetx@gmail.com --- drivers/gpu/drm/tegra/dc.c | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 95b373f739f2..98ee6abb056c 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -30,6 +30,7 @@ struct tegra_dc_soc_info { bool supports_block_linear; unsigned int pitch_align; bool has_powergate; + bool broken_reset; };

struct tegra_plane { @@ -1856,6 +1857,7 @@ static const struct tegra_dc_soc_info tegra20_dc_soc_info = { .supports_block_linear = false, .pitch_align = 8, .has_powergate = false, + .broken_reset = true, };

static const struct tegra_dc_soc_info tegra30_dc_soc_info = { @@ -1865,6 +1867,7 @@ static const struct tegra_dc_soc_info tegra30_dc_soc_info = { .supports_block_linear = false, .pitch_align = 8, .has_powergate = false, + .broken_reset = false, };

static const struct tegra_dc_soc_info tegra114_dc_soc_info = { @@ -1874,6 +1877,7 @@ static const struct tegra_dc_soc_info tegra114_dc_soc_info = { .supports_block_linear = false, .pitch_align = 64, .has_powergate = true, + .broken_reset = false, };

static const struct tegra_dc_soc_info tegra124_dc_soc_info = { @@ -1883,6 +1887,7 @@ static const struct tegra_dc_soc_info tegra124_dc_soc_info = { .supports_block_linear = true, .pitch_align = 64, .has_powergate = true, + .broken_reset = false, };

static const struct tegra_dc_soc_info tegra210_dc_soc_info = { @@ -1892,6 +1897,7 @@ static const struct tegra_dc_soc_info tegra210_dc_soc_info = { .supports_block_linear = true, .pitch_align = 64, .has_powergate = true, + .broken_reset = false, };

static const struct of_device_id tegra_dc_of_match[] = { @@ -1989,7 +1995,8 @@ static int tegra_dc_probe(struct platform_device *pdev) return PTR_ERR(dc->rst); }

- reset_control_assert(dc->rst); + if (!dc->soc->broken_reset) + reset_control_assert(dc->rst);

if (dc->soc->has_powergate) { if (dc->pipe == 0) @@ -2063,10 +2070,12 @@ static int tegra_dc_suspend(struct device *dev) struct tegra_dc *dc = dev_get_drvdata(dev); int err;

- err = reset_control_assert(dc->rst); - if (err < 0) { - dev_err(dev, "failed to assert reset: %d\n", err); - return err; + if (!dc->soc->broken_reset) { + err = reset_control_assert(dc->rst); + if (err < 0) { + dev_err(dev, "failed to assert reset: %d\n", err); + return err; + } }

if (dc->soc->has_powergate) @@ -2096,10 +2105,13 @@ static int tegra_dc_resume(struct device *dev) return err; }

- err = reset_control_deassert(dc->rst); - if (err < 0) { - dev_err(dev, "failed to deassert reset: %d\n", err); - return err; + if (!dc->soc->broken_reset) { + err = reset_control_deassert(dc->rst); + if (err < 0) { + dev_err(dev, + "failed to deassert reset: %d\n", err); + return err; + } } }

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:13 a.m.

New subject: [PATCH v2 06/22] drm/tegra: dc: Avoid reset asserts on Tegra20

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

Commit 33a8eb8 ("Implement runtime PM") introduced HW reset control. It causes a hang on Tegra20 if both display controllers are utilized (RGB panel and HDMI). The TRM suggests that each display controller has its own reset control, apparently it is not correct.

Fixes: 33a8eb8d40ee ("drm/tegra: dc: Implement runtime PM") Signed-off-by: Dmitry Osipenko digetx@gmail.com

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 07/22] drm/tegra: dc: Apply clipping to the plane

On Tegra20 an overlay plane should be clipped, otherwise its output is distorted once plane crosses display boundary.

Signed-off-by: Dmitry Osipenko digetx@gmail.com --- drivers/gpu/drm/tegra/dc.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index 98ee6abb056c..a7a7cce1afd0 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -486,12 +486,25 @@ static int tegra_plane_state_add(struct tegra_plane *plane, { struct drm_crtc_state *crtc_state; struct tegra_dc_state *tegra; + struct drm_rect clip; + int err;

/* Propagate errors from allocation or locking failures. */ crtc_state = drm_atomic_get_crtc_state(state->state, state->crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state);

+ clip.x1 = 0; + clip.y1 = 0; + clip.x2 = crtc_state->mode.hdisplay; + clip.y2 = crtc_state->mode.vdisplay; + + /* Check plane state for visibility and calculate clipping bounds */ + err = drm_plane_helper_check_state(state, &clip, 0, INT_MAX, + true, true); + if (err < 0) + return err; + tegra = to_dc_state(crtc_state);

tegra->planes |= WIN_A_ACT_REQ << plane->index; @@ -561,14 +574,14 @@ static void tegra_plane_atomic_update(struct drm_plane *plane, return;

memset(&window, 0, sizeof(window)); - window.src.x = plane->state->src_x >> 16; - window.src.y = plane->state->src_y >> 16; - window.src.w = plane->state->src_w >> 16; - window.src.h = plane->state->src_h >> 16; - window.dst.x = plane->state->crtc_x; - window.dst.y = plane->state->crtc_y; - window.dst.w = plane->state->crtc_w; - window.dst.h = plane->state->crtc_h; + window.src.x = plane->state->src.x1 >> 16; + window.src.y = plane->state->src.y1 >> 16; + window.src.w = drm_rect_width(&plane->state->src) >> 16; + window.src.h = drm_rect_height(&plane->state->src) >> 16; + window.dst.x = plane->state->dst.x1; + window.dst.y = plane->state->dst.y1; + window.dst.w = drm_rect_width(&plane->state->dst); + window.dst.h = drm_rect_height(&plane->state->dst); window.bits_per_pixel = fb->format->cpp[0] * 8; window.bottom_up = tegra_fb_is_bottom_up(fb);

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:20 a.m.

New subject: [PATCH v2 07/22] drm/tegra: dc: Apply clipping to the plane

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

On Tegra20 an overlay plane should be clipped, otherwise its output is distorted once plane crosses display boundary.

Signed-off-by: Dmitry Osipenko digetx@gmail.com

drivers/gpu/drm/tegra/dc.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-)

```
  struct drm_rect clip;
```

  int err;

  /* Propagate errors from allocation or locking failures. */
  crtc_state = drm_atomic_get_crtc_state(state->state, state->crtc);
  if (IS_ERR(crtc_state))
          return PTR_ERR(crtc_state);

```
  clip.x1 = 0;
```
```
  clip.y1 = 0;
```
```
  clip.x2 = crtc_state->mode.hdisplay;
```
```
  clip.y2 = crtc_state->mode.vdisplay;
```

  /* Check plane state for visibility and calculate clipping bounds */

  err = drm_plane_helper_check_state(state, &clip, 0, INT_MAX,

                                     true, true);

```
  if (err < 0)
```
```
          return err;
```

  tegra = to_dc_state(crtc_state);

  tegra->planes |= WIN_A_ACT_REQ << plane->index;

@@ -561,14 +574,14 @@ static void tegra_plane_atomic_update(struct drm_plane *plane, return;

    memset(&window, 0, sizeof(window));

  window.src.x = plane->state->src_x >> 16;

  window.src.y = plane->state->src_y >> 16;

  window.src.w = plane->state->src_w >> 16;

  window.src.h = plane->state->src_h >> 16;

```
  window.dst.x = plane->state->crtc_x;
```
```
  window.dst.y = plane->state->crtc_y;
```
```
  window.dst.w = plane->state->crtc_w;
```
```
  window.dst.h = plane->state->crtc_h;
```

  window.src.x = plane->state->src.x1 >> 16;

  window.src.y = plane->state->src.y1 >> 16;

  window.src.w = drm_rect_width(&plane->state->src) >> 16;

  window.src.h = drm_rect_height(&plane->state->src) >> 16;

```
  window.dst.x = plane->state->dst.x1;
```
```
  window.dst.y = plane->state->dst.y1;
```

  window.dst.w = drm_rect_width(&plane->state->dst);

  window.dst.h = drm_rect_height(&plane->state->dst);
  window.bits_per_pixel = fb->format->cpp[0] * 8;
  window.bottom_up = tegra_fb_is_bottom_up(fb);

Looks good as far as I can tell.

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 08/22] drm/tegra: dc: Disable plane if it is invisible

On Tegra20 if plane has width or height equal to 0, it will be infinitely wide or tall. Let's disable the plane if it is invisible on atomic state committing to fix the issue. The Rockchip DRM driver does the same.

Signed-off-by: Dmitry Osipenko digetx@gmail.com --- drivers/gpu/drm/tegra/dc.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index a7a7cce1afd0..c875f11786b9 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -559,6 +559,23 @@ static int tegra_plane_atomic_check(struct drm_plane *plane, return 0; }

+static void tegra_dc_disable_window(struct tegra_dc *dc, int index) +{ + unsigned long flags; + u32 value; + + spin_lock_irqsave(&dc->lock, flags); + + value = WINDOW_A_SELECT << index; + tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER); + + value = tegra_dc_readl(dc, DC_WIN_WIN_OPTIONS); + value &= ~WIN_ENABLE; + tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS); + + spin_unlock_irqrestore(&dc->lock, flags); +} + static void tegra_plane_atomic_update(struct drm_plane *plane, struct drm_plane_state *old_state) { @@ -573,6 +590,9 @@ static void tegra_plane_atomic_update(struct drm_plane *plane, if (!plane->state->crtc || !plane->state->fb) return;

+ if (!plane->state->visible) + return tegra_dc_disable_window(dc, p->index); + memset(&window, 0, sizeof(window)); window.src.x = plane->state->src.x1 >> 16; window.src.y = plane->state->src.y1 >> 16; @@ -612,8 +632,6 @@ static void tegra_plane_atomic_disable(struct drm_plane *plane, { struct tegra_plane *p = to_tegra_plane(plane); struct tegra_dc *dc; - unsigned long flags; - u32 value;

/* rien ne va plus */ if (!old_state || !old_state->crtc) @@ -621,16 +639,7 @@ static void tegra_plane_atomic_disable(struct drm_plane *plane,

dc = to_tegra_dc(old_state->crtc);

- spin_lock_irqsave(&dc->lock, flags); - - value = WINDOW_A_SELECT << p->index; - tegra_dc_writel(dc, value, DC_CMD_DISPLAY_WINDOW_HEADER); - - value = tegra_dc_readl(dc, DC_WIN_WIN_OPTIONS); - value &= ~WIN_ENABLE; - tegra_dc_writel(dc, value, DC_WIN_WIN_OPTIONS); - - spin_unlock_irqrestore(&dc->lock, flags); + tegra_dc_disable_window(dc, p->index); }

static const struct drm_plane_helper_funcs tegra_primary_plane_helper_funcs = {

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:21 a.m.

New subject: [PATCH v2 08/22] drm/tegra: dc: Disable plane if it is invisible

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

On Tegra20 if plane has width or height equal to 0, it will be infinitely wide or tall. Let's disable the plane if it is invisible on atomic state committing to fix the issue. The Rockchip DRM driver does the same.

Signed-off-by: Dmitry Osipenko digetx@gmail.com

Looks good as far as I can tell

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 09/22] drm/tegra: Don't use IOMMU on Tegra20

There is no IOMMU on Tegra20, instead a GART would be picked as an IOMMU provider.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Acked-by: Joerg Roedel jroedel@suse.de --- drivers/gpu/drm/tegra/drm.c | 3 ++- drivers/gpu/host1x/dev.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index e999391aedc9..aa7988dcc28f 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -131,7 +131,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) if (!tegra) return -ENOMEM;

- if (iommu_present(&platform_bus_type)) { + if (iommu_present(&platform_bus_type) && + !of_machine_is_compatible("nvidia,tegra20")) { u64 carveout_start, carveout_end, gem_start, gem_end; struct iommu_domain_geometry *geometry; unsigned long order; diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index f05ebb14fa63..6a805ed908c3 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -177,7 +177,8 @@ static int host1x_probe(struct platform_device *pdev) return err; }

- if (iommu_present(&platform_bus_type)) { + if (iommu_present(&platform_bus_type) && + !of_machine_is_compatible("nvidia,tegra20")) { struct iommu_domain_geometry *geometry; unsigned long order;

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:39 a.m.

New subject: [PATCH v2 09/22] drm/tegra: Don't use IOMMU on Tegra20

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

There is no IOMMU on Tegra20, instead a GART would be picked as an IOMMU provider.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Acked-by: Joerg Roedel jroedel@suse.de

drivers/gpu/drm/tegra/drm.c | 3 ++- drivers/gpu/host1x/dev.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index e999391aedc9..aa7988dcc28f 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -131,7 +131,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) if (!tegra) return -ENOMEM;
  if (iommu_present(&platform_bus_type)) {
  if (iommu_present(&platform_bus_type) &&
      !of_machine_is_compatible("nvidia,tegra20")) {
          u64 carveout_start, carveout_end, gem_start, gem_end;
          struct iommu_domain_geometry *geometry;
          unsigned long order;
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index f05ebb14fa63..6a805ed908c3 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -177,7 +177,8 @@ static int host1x_probe(struct platform_device *pdev) return err; }
  if (iommu_present(&platform_bus_type)) {
  if (iommu_present(&platform_bus_type) &&
      !of_machine_is_compatible("nvidia,tegra20")) {
          struct iommu_domain_geometry *geometry;
          unsigned long order;

This doesn't feel great... The commit message says there's no IOMMU, but iommu_present says otherwise. I know there's some more subtleties here, and the commit message does hint at this. But...

If we don't want to treat the GART as an IOMMU, shouldn't we somehow make sure iommu_present() doesn't return true in these cases (or perhaps make something like tegra_use_iommu(), with a comment explaining why we don't want to allow the GART to be treated like a proper IOMMU)? These seems to be the only Tegra-specific calls to iommu_present()...

That being said, the patch seems to have the right effect...

Dmitry Osipenko

10:22 a.m.

New subject: [PATCH v2 09/22] drm/tegra: Don't use IOMMU on Tegra20

On 14.06.2017 10:39, Erik Faye-Lund wrote:

...

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...
There is no IOMMU on Tegra20, instead a GART would be picked as an IOMMU provider.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Acked-by: Joerg Roedel jroedel@suse.de

drivers/gpu/drm/tegra/drm.c | 3 ++- drivers/gpu/host1x/dev.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index e999391aedc9..aa7988dcc28f 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -131,7 +131,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) if (!tegra) return -ENOMEM;
  if (iommu_present(&platform_bus_type)) {
  if (iommu_present(&platform_bus_type) &&
      !of_machine_is_compatible("nvidia,tegra20")) {
          u64 carveout_start, carveout_end, gem_start, gem_end;
          struct iommu_domain_geometry *geometry;
          unsigned long order;
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index f05ebb14fa63..6a805ed908c3 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -177,7 +177,8 @@ static int host1x_probe(struct platform_device *pdev) return err; }
  if (iommu_present(&platform_bus_type)) {
  if (iommu_present(&platform_bus_type) &&
      !of_machine_is_compatible("nvidia,tegra20")) {
          struct iommu_domain_geometry *geometry;
          unsigned long order;
This doesn't feel great... The commit message says there's no IOMMU, but iommu_present says otherwise. I know there's some more subtleties here, and the commit message does hint at this. But...

If we don't want to treat the GART as an IOMMU, shouldn't we somehow make sure iommu_present() doesn't return true in these cases (or perhaps make something like tegra_use_iommu(), with a comment explaining why we don't want to allow the GART to be treated like a proper IOMMU)? These seems to be the only Tegra-specific calls to iommu_present()...

That being said, the patch seems to have the right effect...

We don't want to treat the GART as an IOMMU right now, but I'd want to change that in the (near?) future, so it's a kinda trivial preparation patch that restores the GART driver and these of_machine_is_compatible() are supposed to go away later. Probably I should mention this in the commit message(?).

I think we can add a Tegra20 specific IOCTL for an allocation of GART-able memory and use it for a stuff like opentegra's EXA, its fallback allocations migration is a pita and reserved CMA is never enough =)

-- Dmitry

Dmitry Osipenko

10:24 p.m.

New subject: [PATCH v2 09/22] drm/tegra: Don't use IOMMU on Tegra20

On 14.06.2017 13:22, Dmitry Osipenko wrote:

...

On 14.06.2017 10:39, Erik Faye-Lund wrote:

...
On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...
There is no IOMMU on Tegra20, instead a GART would be picked as an IOMMU provider.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Acked-by: Joerg Roedel jroedel@suse.de

drivers/gpu/drm/tegra/drm.c | 3 ++- drivers/gpu/host1x/dev.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index e999391aedc9..aa7988dcc28f 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -131,7 +131,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags) if (!tegra) return -ENOMEM;
  if (iommu_present(&platform_bus_type)) {
  if (iommu_present(&platform_bus_type) &&
      !of_machine_is_compatible("nvidia,tegra20")) {
          u64 carveout_start, carveout_end, gem_start, gem_end;
          struct iommu_domain_geometry *geometry;
          unsigned long order;
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index f05ebb14fa63..6a805ed908c3 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -177,7 +177,8 @@ static int host1x_probe(struct platform_device *pdev) return err; }
  if (iommu_present(&platform_bus_type)) {
  if (iommu_present(&platform_bus_type) &&
      !of_machine_is_compatible("nvidia,tegra20")) {
          struct iommu_domain_geometry *geometry;
          unsigned long order;
This doesn't feel great... The commit message says there's no IOMMU, but iommu_present says otherwise. I know there's some more subtleties here, and the commit message does hint at this. But...

If we don't want to treat the GART as an IOMMU, shouldn't we somehow make sure iommu_present() doesn't return true in these cases (or perhaps make something like tegra_use_iommu(), with a comment explaining why we don't want to allow the GART to be treated like a proper IOMMU)? These seems to be the only Tegra-specific calls to iommu_present()...

That being said, the patch seems to have the right effect...
We don't want to treat the GART as an IOMMU right now, but I'd want to change that in the (near?) future, so it's a kinda trivial preparation patch that restores the GART driver and these of_machine_is_compatible() are supposed to go away later. Probably I should mention this in the commit message(?).

I think we can add a Tegra20 specific IOCTL for an allocation of GART-able memory and use it for a stuff like opentegra's EXA, its fallback allocations migration is a pita and reserved CMA is never enough =)

I have hacked the DRM driver to work with GART correctly for the IOMMU allocations, the grate tests and opentegra are working fine utilizing the GART, CMA is barely utilized and kmsg shows GART's map/unmap debug messages that I've enabled in the driver. So I think GART should be really useful for us.

Actually, now I'm thinking that it should be better to postpone the restoring of the GART driver till we have a full solution, maybe next merge window. I'll drop these two patches from the series in v3 that I'm going to send shortly.

-- Dmitry

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 10/22] Revert "iommu/tegra: gart: Do not register with bus"

The GART driver was disabled because it was picked by as an IOMMU provider for the DRM driver on Tegra20, which is not the purpose of the GART. Now DRM driver avoids to use IOMMU on Tegra20, so the GART driver can be re-enabled. Potentially there are interesting use cases of the GART for Tegra20, like mmap'ing of a fallback memory allocation and using its GART aperture for an accelerated graphics operation.

This reverts commit c7e3ca515e784a82223f447b79eb2090bdb91378.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Acked-by: Joerg Roedel jroedel@suse.de --- drivers/iommu/tegra-gart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c index 37e708fdbb5a..430d13bf4f95 100644 --- a/drivers/iommu/tegra-gart.c +++ b/drivers/iommu/tegra-gart.c @@ -422,7 +422,7 @@ static int tegra_gart_probe(struct platform_device *pdev) do_gart_setup(gart, NULL);

gart_handle = gart; - + bus_set_iommu(&platform_bus_type, &gart_iommu_ops); return 0; }

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 11/22] gpu: host1x: Initialize firewall class to the jobs one

The commands stream is prepended by the jobs class on the CDMA submission, so that explicitly setting a module class in the commands stream isn't necessary. The firewall initializes its class to 0 and the command stream that doesn't explicitly specify the class effectively bypasses the firewall.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/host1x/job.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 5f5f8ee6143d..d9933828fe87 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -504,7 +504,7 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) fw.dev = dev; fw.reloc = job->relocarray; fw.num_relocs = job->num_relocs; - fw.class = 0; + fw.class = job->class;

for (i = 0; i < job->num_gathers; i++) { struct host1x_job_gather *g = &job->gathers[i];

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:41 a.m.

New subject: [PATCH v2 11/22] gpu: host1x: Initialize firewall class to the jobs one

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

The commands stream is prepended by the jobs class on the CDMA submission, so that explicitly setting a module class in the commands stream isn't necessary. The firewall initializes its class to 0 and the command stream that doesn't explicitly specify the class effectively bypasses the firewall.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 12/22] gpu: host1x: Correct host1x_job_pin() error handling

In case of relocations / waitchecks patching failure the jobs pins stay referenced till DRM file get closed, wasting memory. Add the missed unpinning.

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index d9933828fe87..14f3f957ffab 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -592,22 +592,20 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)

err = do_relocs(job, g->bo); if (err) - break; + goto out;

err = do_waitchks(job, host, g->bo); if (err) - break; + goto out; }

- if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && !err) { - err = copy_gathers(job, dev); - if (err) { - host1x_job_unpin(job); - return err; - } - } + if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) + goto out;

+ err = copy_gathers(job, dev); out: + if (err) + host1x_job_unpin(job); wmb();

return err;

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 13/22] gpu: host1x: Do not leak BO's phys address to userspace

Perform gathers coping before patching them, so that original gathers are left untouched. That's not as bad as leaking kernel addresses, but still doesn't feel right.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/host1x/job.c | 39 +++++++++++++++++++++++++++------------ 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 14f3f957ffab..4208329ca2af 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -137,8 +137,9 @@ static void host1x_syncpt_patch_offset(struct host1x_syncpt *sp, * avoid a wrap condition in the HW). */ static int do_waitchks(struct host1x_job *job, struct host1x *host, - struct host1x_bo *patch) + struct host1x_job_gather *g) { + struct host1x_bo *patch = g->bo; int i;

/* compare syncpt vs wait threshold */ @@ -165,7 +166,8 @@ static int do_waitchks(struct host1x_job *job, struct host1x *host, wait->syncpt_id, sp->name, wait->thresh, host1x_syncpt_read_min(sp));

- host1x_syncpt_patch_offset(sp, patch, wait->offset); + host1x_syncpt_patch_offset(sp, patch, + g->offset + wait->offset); }

wait->bo = NULL; @@ -269,11 +271,12 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job) return err; }

-static int do_relocs(struct host1x_job *job, struct host1x_bo *cmdbuf) +static int do_relocs(struct host1x_job *job, struct host1x_job_gather *g) { int i = 0; u32 last_page = ~0; void *cmdbuf_page_addr = NULL; + struct host1x_bo *cmdbuf = g->bo;

/* pin & patch the relocs for one gather */ for (i = 0; i < job->num_relocs; i++) { @@ -286,6 +289,13 @@ static int do_relocs(struct host1x_job *job, struct host1x_bo *cmdbuf) if (cmdbuf != reloc->cmdbuf.bo) continue;

+ if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) { + target = (u32 *)job->gather_copy_mapped + + reloc->cmdbuf.offset / sizeof(u32) + + g->offset / sizeof(u32); + goto patch_reloc; + } + if (last_page != reloc->cmdbuf.offset >> PAGE_SHIFT) { if (cmdbuf_page_addr) host1x_bo_kunmap(cmdbuf, last_page, @@ -302,6 +312,7 @@ static int do_relocs(struct host1x_job *job, struct host1x_bo *cmdbuf) }

target = cmdbuf_page_addr + (reloc->cmdbuf.offset & ~PAGE_MASK); +patch_reloc: *target = reloc_addr; }

@@ -573,6 +584,12 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev) if (err) goto out;

+ if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) { + err = copy_gathers(job, dev); + if (err) + goto out; + } + /* patch gathers */ for (i = 0; i < job->num_gathers; i++) { struct host1x_job_gather *g = &job->gathers[i]; @@ -581,7 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev) if (g->handled) continue;

- g->base = job->gather_addr_phys[i]; + /* copy_gathers() sets gathers base if firewall is enabled */ + if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) + g->base = job->gather_addr_phys[i];

for (j = i + 1; j < job->num_gathers; j++) { if (job->gathers[j].bo == g->bo) { @@ -590,19 +609,15 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev) } }

- err = do_relocs(job, g->bo); + err = do_relocs(job, g); if (err) - goto out; + break;

- err = do_waitchks(job, host, g->bo); + err = do_waitchks(job, host, g); if (err) - goto out; + break; }

- if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) - goto out; - - err = copy_gathers(job, dev); out: if (err) host1x_job_unpin(job);

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 14/22] gpu: host1x: Forbid relocation address shifting in the firewall

Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absent. As of now there is no use for the address shifting (at least on Tegra20) and adding a proper shifting / sizes validation is much more work. Let's forbid it in the firewall till a proper validation is implemented.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com --- drivers/gpu/host1x/dev.c | 4 ++++ drivers/gpu/host1x/dev.h | 1 + drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++++++ 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 6a805ed908c3..21da331ce092 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -72,6 +72,7 @@ static const struct host1x_info host1x01_info = { .init = host1x01_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32), + .version = 1, };

static const struct host1x_info host1x02_info = { @@ -82,6 +83,7 @@ static const struct host1x_info host1x02_info = { .init = host1x02_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32), + .version = 2, };

static const struct host1x_info host1x04_info = { @@ -92,6 +94,7 @@ static const struct host1x_info host1x04_info = { .init = host1x04_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34), + .version = 4, };

static const struct host1x_info host1x05_info = { @@ -102,6 +105,7 @@ static const struct host1x_info host1x05_info = { .init = host1x05_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34), + .version = 5, };

static const struct of_device_id host1x_of_match[] = { diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 229d08b6a45e..c6997651b336 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -100,6 +100,7 @@ struct host1x_info { int (*init)(struct host1x *host1x); /* initialize per SoC ops */ unsigned int sync_offset; /* offset of syncpoint registers */ u64 dma_mask; /* mask of addressable memory */ + unsigned int version; /* host1x's version */ };

struct host1x { diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 4208329ca2af..7825643da324 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -333,6 +333,27 @@ static bool check_reloc(struct host1x_reloc *reloc, struct host1x_bo *cmdbuf, return true; }

+static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) +{ + struct host1x *host = dev_get_drvdata(dev->parent); + + /* skip newer Tegra's since IOMMU is supposed to be used by + * them and not all address registers with their shifts are + * publicly documented */ + if (host->info->version > 1) + return true; + + /* skip Tegra30 with IOMMU enabled */ + if (host->domain) + return true; + + /* relocation shift value validation isn't implemented yet */ + if (reloc->shift) + return false; + + return true; +} + struct host1x_firewall { struct host1x_job *job; struct device *dev; @@ -359,6 +380,9 @@ static int check_register(struct host1x_firewall *fw, unsigned long offset) if (!check_reloc(fw->reloc, fw->cmdbuf, fw->offset)) return -EINVAL;

+ if (!check_reloc_shift(fw->dev, fw->reloc)) + return -EINVAL; + fw->num_relocs--; fw->reloc++; }

-- 2.13.0

Mikko Perttunen

14 Jun 14 Jun

6:50 a.m.

New subject: [PATCH v2 14/22] gpu: host1x: Forbid relocation address shifting in the firewall

On 14.06.2017 02:15, Dmitry Osipenko wrote:

...

Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absent. As of now there is no use for the address shifting (at least on Tegra20) and adding a proper shifting / sizes validation is much more work. Let's forbid it in the firewall till a proper validation is implemented.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

drivers/gpu/host1x/dev.c | 4 ++++ drivers/gpu/host1x/dev.h | 1 + drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++++++ 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 6a805ed908c3..21da331ce092 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -72,6 +72,7 @@ static const struct host1x_info host1x01_info = { .init = host1x01_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 1,

};

static const struct host1x_info host1x02_info = { @@ -82,6 +83,7 @@ static const struct host1x_info host1x02_info = { .init = host1x02_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 2,

};

static const struct host1x_info host1x04_info = { @@ -92,6 +94,7 @@ static const struct host1x_info host1x04_info = { .init = host1x04_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 4,

};

static const struct host1x_info host1x05_info = { @@ -102,6 +105,7 @@ static const struct host1x_info host1x05_info = { .init = host1x05_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 5,

};

static const struct of_device_id host1x_of_match[] = { diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 229d08b6a45e..c6997651b336 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -100,6 +100,7 @@ struct host1x_info { int (*init)(struct host1x *host1x); /* initialize per SoC ops */ unsigned int sync_offset; /* offset of syncpoint registers */ u64 dma_mask; /* mask of addressable memory */

unsigned int version; /* host1x's version */

};

struct host1x { diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 4208329ca2af..7825643da324 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -333,6 +333,27 @@ static bool check_reloc(struct host1x_reloc *reloc, struct host1x_bo *cmdbuf, return true; }

+static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) +{
struct host1x *host = dev_get_drvdata(dev->parent);

/* skip newer Tegra's since IOMMU is supposed to be used by
* them and not all address registers with their shifts are
* publicly documented */
if (host->info->version > 1)
return true;

I don't like the firewall working differently per SoC - IOMMU can still be disabled on later SoCs and in this case there would be a security hole even if the firewall is enabled.

...

/* skip Tegra30 with IOMMU enabled */

if (host->domain)
return true;

So if we want to be able to test the firewall on all SoCs just having this check here for all SoCs would be better.

Thanks, Mikko

...

/* relocation shift value validation isn't implemented yet */

if (reloc->shift)
return false;
return true;
+}

struct host1x_firewall { struct host1x_job *job; struct device *dev; @@ -359,6 +380,9 @@ static int check_register(struct host1x_firewall *fw, unsigned long offset) if (!check_reloc(fw->reloc, fw->cmdbuf, fw->offset)) return -EINVAL;
if (!check_reloc_shift(fw->dev, fw->reloc))
	return -EINVAL;
fw->num_relocs--; fw->reloc++; }

Dmitry Osipenko

9:06 a.m.

New subject: [PATCH v2 14/22] gpu: host1x: Forbid relocation address shifting in the firewall

On 14.06.2017 09:50, Mikko Perttunen wrote:

...

On 14.06.2017 02:15, Dmitry Osipenko wrote:

...
Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absent. As of now there is no use for the address shifting (at least on Tegra20) and adding a proper shifting / sizes validation is much more work. Let's forbid it in the firewall till a proper validation is implemented.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

drivers/gpu/host1x/dev.c | 4 ++++ drivers/gpu/host1x/dev.h | 1 + drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++++++ 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 6a805ed908c3..21da331ce092 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -72,6 +72,7 @@ static const struct host1x_info host1x01_info = { .init = host1x01_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 1,

};

static const struct host1x_info host1x02_info = { @@ -82,6 +83,7 @@ static const struct host1x_info host1x02_info = { .init = host1x02_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 2,

};

static const struct host1x_info host1x04_info = { @@ -92,6 +94,7 @@ static const struct host1x_info host1x04_info = { .init = host1x04_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 4,

};

static const struct host1x_info host1x05_info = { @@ -102,6 +105,7 @@ static const struct host1x_info host1x05_info = { .init = host1x05_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 5,

};

static const struct of_device_id host1x_of_match[] = { diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 229d08b6a45e..c6997651b336 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -100,6 +100,7 @@ struct host1x_info { int (*init)(struct host1x *host1x); /* initialize per SoC ops */ unsigned int sync_offset; /* offset of syncpoint registers */ u64 dma_mask; /* mask of addressable memory */

unsigned int version; /* host1x's version */

};

struct host1x { diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 4208329ca2af..7825643da324 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -333,6 +333,27 @@ static bool check_reloc(struct host1x_reloc *reloc, struct host1x_bo *cmdbuf, return true; }

+static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) +{
struct host1x *host = dev_get_drvdata(dev->parent);

/* skip newer Tegra's since IOMMU is supposed to be used by
* them and not all address registers with their shifts are
* publicly documented */
if (host->info->version > 1)
   return true;
I don't like the firewall working differently per SoC - IOMMU can still be disabled on later SoCs and in this case there would be a security hole even if the firewall is enabled.

Yes, it would be a security hole like it is is right now. Unfortunately we can't fix it on later SoC's without knowing all the address registers and their shifts.

...

...
/* skip Tegra30 with IOMMU enabled */

if (host->domain)
   return true;
So if we want to be able to test the firewall on all SoCs just having this check here for all SoCs would be better.

I'm not sure what you are meaning here. You'll get a wrong address resolved in HW if that address needs to be shifted.

As I wrote before, on later SoC's firewall could be used as a debug feature without the shifts being validated.

...

...
/* relocation shift value validation isn't implemented yet */

if (reloc->shift)
   return false;
return true;
+}

struct host1x_firewall { struct host1x_job *job; struct device *dev; @@ -359,6 +380,9 @@ static int check_register(struct host1x_firewall *fw, unsigned long offset) if (!check_reloc(fw->reloc, fw->cmdbuf, fw->offset)) return -EINVAL;
   if (!check_reloc_shift(fw->dev, fw->reloc))
       return -EINVAL;
   fw->num_relocs--;
   fw->reloc++;
}

-- Dmitry

Mikko Perttunen

11:47 a.m.

New subject: [PATCH v2 14/22] gpu: host1x: Forbid relocation address shifting in the firewall

On 14.06.2017 12:06, Dmitry Osipenko wrote:

...

On 14.06.2017 09:50, Mikko Perttunen wrote:

...
On 14.06.2017 02:15, Dmitry Osipenko wrote:

...
Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absent. As of now there is no use for the address shifting (at least on Tegra20) and adding a proper shifting / sizes validation is much more work. Let's forbid it in the firewall till a proper validation is implemented.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

drivers/gpu/host1x/dev.c | 4 ++++ drivers/gpu/host1x/dev.h | 1 + drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++++++ 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 6a805ed908c3..21da331ce092 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -72,6 +72,7 @@ static const struct host1x_info host1x01_info = { .init = host1x01_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 1,

};

static const struct host1x_info host1x02_info = { @@ -82,6 +83,7 @@ static const struct host1x_info host1x02_info = { .init = host1x02_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 2,

};

static const struct host1x_info host1x04_info = { @@ -92,6 +94,7 @@ static const struct host1x_info host1x04_info = { .init = host1x04_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 4,

};

static const struct host1x_info host1x05_info = { @@ -102,6 +105,7 @@ static const struct host1x_info host1x05_info = { .init = host1x05_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 5,

};

static const struct of_device_id host1x_of_match[] = { diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 229d08b6a45e..c6997651b336 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -100,6 +100,7 @@ struct host1x_info { int (*init)(struct host1x *host1x); /* initialize per SoC ops */ unsigned int sync_offset; /* offset of syncpoint registers */ u64 dma_mask; /* mask of addressable memory */

unsigned int version; /* host1x's version */

};

struct host1x { diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 4208329ca2af..7825643da324 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -333,6 +333,27 @@ static bool check_reloc(struct host1x_reloc *reloc, struct host1x_bo *cmdbuf, return true; }

+static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) +{
struct host1x *host = dev_get_drvdata(dev->parent);

/* skip newer Tegra's since IOMMU is supposed to be used by
* them and not all address registers with their shifts are
* publicly documented */
if (host->info->version > 1)
   return true;
I don't like the firewall working differently per SoC - IOMMU can still be disabled on later SoCs and in this case there would be a security hole even if the firewall is enabled.
Yes, it would be a security hole like it is is right now. Unfortunately we can't fix it on later SoC's without knowing all the address registers and their shifts.

Indeed, we can't fix it, but I was thinking that having the firewall enabled in the kernel should always mean that the system is safe, even if IOMMU is disabled, and even if it means that you cannot really do anything useful with it (at least running host1x_test should be possible without shifts, though). Thus no matter what the hardware revision, we should reject anything we cannot ensure as safe.

...

...
...
/* skip Tegra30 with IOMMU enabled */

if (host->domain)
   return true;
So if we want to be able to test the firewall on all SoCs just having this check here for all SoCs would be better.
I'm not sure what you are meaning here. You'll get a wrong address resolved in HW if that address needs to be shifted.

So we have two choices here: a) If IOMMU is enabled, skip the shift check and return true always. This means that we still ensure that the system is safe, since with IOMMU enabled it is safe even if the firewall cannot ensure that the job is proper. This is what I was trying to say above. b) Always return false if there is a shift, on any SoC and even if IOMMU is enabled. This would harmonize the behavior of the firewall across systems.

...

As I wrote before, on later SoC's firewall could be used as a debug feature without the shifts being validated.

...
...
/* relocation shift value validation isn't implemented yet */

if (reloc->shift)
   return false;
return true;
+}

struct host1x_firewall { struct host1x_job *job; struct device *dev; @@ -359,6 +380,9 @@ static int check_register(struct host1x_firewall *fw, unsigned long offset) if (!check_reloc(fw->reloc, fw->cmdbuf, fw->offset)) return -EINVAL;
   if (!check_reloc_shift(fw->dev, fw->reloc))
       return -EINVAL;
   fw->num_relocs--;
   fw->reloc++;
}

Dmitry Osipenko

2:49 p.m.

New subject: [PATCH v2 14/22] gpu: host1x: Forbid relocation address shifting in the firewall

On 14.06.2017 14:47, Mikko Perttunen wrote:

...

On 14.06.2017 12:06, Dmitry Osipenko wrote:

...
On 14.06.2017 09:50, Mikko Perttunen wrote:

...
On 14.06.2017 02:15, Dmitry Osipenko wrote:

...
Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absent. As of now there is no use for the address shifting (at least on Tegra20) and adding a proper shifting / sizes validation is much more work. Let's forbid it in the firewall till a proper validation is implemented.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

drivers/gpu/host1x/dev.c | 4 ++++ drivers/gpu/host1x/dev.h | 1 + drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++++++ 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 6a805ed908c3..21da331ce092 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -72,6 +72,7 @@ static const struct host1x_info host1x01_info = { .init = host1x01_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 1,

};

static const struct host1x_info host1x02_info = { @@ -82,6 +83,7 @@ static const struct host1x_info host1x02_info = { .init = host1x02_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 2,

};

static const struct host1x_info host1x04_info = { @@ -92,6 +94,7 @@ static const struct host1x_info host1x04_info = { .init = host1x04_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 4,

};

static const struct host1x_info host1x05_info = { @@ -102,6 +105,7 @@ static const struct host1x_info host1x05_info = { .init = host1x05_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 5,

};

static const struct of_device_id host1x_of_match[] = { diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 229d08b6a45e..c6997651b336 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -100,6 +100,7 @@ struct host1x_info { int (*init)(struct host1x *host1x); /* initialize per SoC ops */ unsigned int sync_offset; /* offset of syncpoint registers */ u64 dma_mask; /* mask of addressable memory */

unsigned int version; /* host1x's version */

};

struct host1x { diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 4208329ca2af..7825643da324 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -333,6 +333,27 @@ static bool check_reloc(struct host1x_reloc *reloc, struct host1x_bo *cmdbuf, return true; }

+static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) +{
struct host1x *host = dev_get_drvdata(dev->parent);

/* skip newer Tegra's since IOMMU is supposed to be used by
* them and not all address registers with their shifts are
* publicly documented */
if (host->info->version > 1)
   return true;
I don't like the firewall working differently per SoC - IOMMU can still be disabled on later SoCs and in this case there would be a security hole even if the firewall is enabled.
Yes, it would be a security hole like it is is right now. Unfortunately we can't fix it on later SoC's without knowing all the address registers and their shifts.
Indeed, we can't fix it, but I was thinking that having the firewall enabled in the kernel should always mean that the system is safe, even if IOMMU is disabled, and even if it means that you cannot really do anything useful with it (at least running host1x_test should be possible without shifts, though). Thus no matter what the hardware revision, we should reject anything we cannot ensure as safe.

...
...
...
/* skip Tegra30 with IOMMU enabled */

if (host->domain)
   return true;
So if we want to be able to test the firewall on all SoCs just having this check here for all SoCs would be better.
I'm not sure what you are meaning here. You'll get a wrong address resolved in HW if that address needs to be shifted.
So we have two choices here: a) If IOMMU is enabled, skip the shift check and return true always. This means that we still ensure that the system is safe, since with IOMMU enabled it is safe even if the firewall cannot ensure that the job is proper. This is what I was trying to say above. b) Always return false if there is a shift, on any SoC and even if IOMMU is enabled. This would harmonize the behavior of the firewall across systems.

Okay, I think b) is a better choice, actually that's what V1 of the patch did. Preserving behaviour regardless of the IOMMU state is reasonable and this is an opensource driver that is supposed to be used with the opensource userspace after all.

Later I'd want to stricter the firewall checks to disallow any unknown register accesses, that will make it a bit more secure on a later SoC's as well. After, we should add a memory access bounds checking for each of HW operations to make it really secure.

-- Dmitry

Mikko Perttunen

2:59 p.m.

New subject: [PATCH v2 14/22] gpu: host1x: Forbid relocation address shifting in the firewall

On 06/14/2017 05:49 PM, Dmitry Osipenko wrote:

...

On 14.06.2017 14:47, Mikko Perttunen wrote:

...
On 14.06.2017 12:06, Dmitry Osipenko wrote:

...
On 14.06.2017 09:50, Mikko Perttunen wrote:

...
On 14.06.2017 02:15, Dmitry Osipenko wrote:

...
Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absent. As of now there is no use for the address shifting (at least on Tegra20) and adding a proper shifting / sizes validation is much more work. Let's forbid it in the firewall till a proper validation is implemented.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

drivers/gpu/host1x/dev.c | 4 ++++ drivers/gpu/host1x/dev.h | 1 + drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++++++ 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 6a805ed908c3..21da331ce092 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -72,6 +72,7 @@ static const struct host1x_info host1x01_info = { .init = host1x01_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 1, };

static const struct host1x_info host1x02_info = {

@@ -82,6 +83,7 @@ static const struct host1x_info host1x02_info = { .init = host1x02_init, .sync_offset = 0x3000, .dma_mask = DMA_BIT_MASK(32),

.version = 2, };

static const struct host1x_info host1x04_info = {

@@ -92,6 +94,7 @@ static const struct host1x_info host1x04_info = { .init = host1x04_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 4, };

static const struct host1x_info host1x05_info = {

@@ -102,6 +105,7 @@ static const struct host1x_info host1x05_info = { .init = host1x05_init, .sync_offset = 0x2100, .dma_mask = DMA_BIT_MASK(34),

.version = 5, };

static const struct of_device_id host1x_of_match[] = {

diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 229d08b6a45e..c6997651b336 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -100,6 +100,7 @@ struct host1x_info { int (*init)(struct host1x *host1x); /* initialize per SoC ops */ unsigned int sync_offset; /* offset of syncpoint registers */ u64 dma_mask; /* mask of addressable memory */

unsigned int version; /* host1x's version */ };

struct host1x {

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 4208329ca2af..7825643da324 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -333,6 +333,27 @@ static bool check_reloc(struct host1x_reloc *reloc, struct host1x_bo *cmdbuf, return true; }

+static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) +{
struct host1x *host = dev_get_drvdata(dev->parent);

/* skip newer Tegra's since IOMMU is supposed to be used by
* them and not all address registers with their shifts are
* publicly documented */
if (host->info->version > 1)
   return true;
I don't like the firewall working differently per SoC - IOMMU can still be disabled on later SoCs and in this case there would be a security hole even if the firewall is enabled.
Yes, it would be a security hole like it is is right now. Unfortunately we can't fix it on later SoC's without knowing all the address registers and their shifts.
Indeed, we can't fix it, but I was thinking that having the firewall enabled in the kernel should always mean that the system is safe, even if IOMMU is disabled, and even if it means that you cannot really do anything useful with it (at least running host1x_test should be possible without shifts, though). Thus no matter what the hardware revision, we should reject anything we cannot ensure as safe.

...
...
...
/* skip Tegra30 with IOMMU enabled */

if (host->domain)
   return true;
So if we want to be able to test the firewall on all SoCs just having this check here for all SoCs would be better.
I'm not sure what you are meaning here. You'll get a wrong address resolved in HW if that address needs to be shifted.
So we have two choices here: a) If IOMMU is enabled, skip the shift check and return true always. This means that we still ensure that the system is safe, since with IOMMU enabled it is safe even if the firewall cannot ensure that the job is proper. This is what I was trying to say above. b) Always return false if there is a shift, on any SoC and even if IOMMU is enabled. This would harmonize the behavior of the firewall across systems.
Okay, I think b) is a better choice, actually that's what V1 of the patch did. Preserving behaviour regardless of the IOMMU state is reasonable and this is an opensource driver that is supposed to be used with the opensource userspace after all.

Later I'd want to stricter the firewall checks to disallow any unknown register accesses, that will make it a bit more secure on a later SoC's as well. After, we should add a memory access bounds checking for each of HW operations to make it really secure.

Yep, this sounds good.

Mikko

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 15/22] gpu: host1x: Forbid RESTART opcode in the firewall

The RESTART opcode terminates the gather and restarts the CDMA fetching from a specified word << 2 relative to the CDMA start address. That shouldn't be allowed to be done by userspace.

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 7825643da324..3680b4b7dce9 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -515,7 +515,6 @@ static int validate(struct host1x_firewall *fw, struct host1x_job_gather *g) goto out; break; case 4: - case 5: case 14: break; default:

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 16/22] gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall

Several channels could be made to write the same unit concurrently via the SETCLASS opcode, trusting userspace is a bad idea. It should be possible to drop the per-client channel reservation and add a per-unit locking by inserting MLOCK's to the command stream to re-allow the SETCLASS opcode, but it will be much more work. Let's forbid the unit-unrelated class changes for now.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 1 + drivers/gpu/drm/tegra/drm.h | 1 + drivers/gpu/drm/tegra/gr2d.c | 7 +++++++ drivers/gpu/host1x/job.c | 24 ++++++++++++++++++++---- include/linux/host1x.h | 3 +++ 5 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index aa7988dcc28f..971cdead6436 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -533,6 +533,7 @@ int tegra_drm_submit(struct tegra_drm_context *context, }

job->is_addr_reg = context->client->ops->is_addr_reg; + job->is_valid_class = context->client->ops->is_valid_class; job->syncpt_incrs = syncpt.incrs; job->syncpt_id = syncpt.id; job->timeout = 10000; diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h index 85aa2e3d9d4e..6d6da01282f3 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/drm/tegra/drm.h @@ -83,6 +83,7 @@ struct tegra_drm_client_ops { struct tegra_drm_context *context); void (*close_channel)(struct tegra_drm_context *context); int (*is_addr_reg)(struct device *dev, u32 class, u32 offset); + int (*is_valid_class)(u32 class); int (*submit)(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file); diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c index 02cd3e37a6ec..fbe0b8b25b42 100644 --- a/drivers/gpu/drm/tegra/gr2d.c +++ b/drivers/gpu/drm/tegra/gr2d.c @@ -109,10 +109,17 @@ static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 offset) return 0; }

+static int gr2d_is_valid_class(u32 class) +{ + return (class == HOST1X_CLASS_GR2D || + class == HOST1X_CLASS_GR2D_SB); +} + static const struct tegra_drm_client_ops gr2d_ops = { .open_channel = gr2d_open_channel, .close_channel = gr2d_close_channel, .is_addr_reg = gr2d_is_addr_reg, + .is_valid_class = gr2d_is_valid_class, .submit = tegra_drm_submit, };

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 3680b4b7dce9..32d3300bcf99 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -373,6 +373,9 @@ struct host1x_firewall {

static int check_register(struct host1x_firewall *fw, unsigned long offset) { + if (!fw->job->is_addr_reg) + return 0; + if (fw->job->is_addr_reg(fw->dev, fw->class, offset)) { if (!fw->num_relocs) return -EINVAL; @@ -390,6 +393,19 @@ static int check_register(struct host1x_firewall *fw, unsigned long offset) return 0; }

+static int check_class(struct host1x_firewall *fw, u32 class) +{ + if (!fw->job->is_valid_class) { + if (fw->class != class) + return -EINVAL; + } else { + if (!fw->job->is_valid_class(fw->class)) + return -EINVAL; + } + + return 0; +} + static int check_mask(struct host1x_firewall *fw) { u32 mask = fw->mask; @@ -463,11 +479,9 @@ static int validate(struct host1x_firewall *fw, struct host1x_job_gather *g) { u32 *cmdbuf_base = (u32 *)fw->job->gather_copy_mapped + (g->offset / sizeof(u32)); + u32 job_class = fw->class; int err = 0;

- if (!fw->job->is_addr_reg) - return 0; - fw->words = g->words; fw->cmdbuf = g->bo; fw->offset = 0; @@ -487,7 +501,9 @@ static int validate(struct host1x_firewall *fw, struct host1x_job_gather *g) fw->class = word >> 6 & 0x3ff; fw->mask = word & 0x3f; fw->reg = word >> 16 & 0xfff; - err = check_mask(fw); + err = check_class(fw, job_class); + if (!err) + err = check_mask(fw); if (err) goto out; break; diff --git a/include/linux/host1x.h b/include/linux/host1x.h index aa323e43ae4e..71a34774b56a 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -235,6 +235,9 @@ struct host1x_job { /* Check if register is marked as an address reg */ int (*is_addr_reg)(struct device *dev, u32 reg, u32 class);

+ /* Check if class belongs to the unit */ + int (*is_valid_class)(u32 class); + /* Request a SETCLASS to this class */ u32 class;

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 17/22] gpu: host1x: Correct swapped arguments in the is_addr_reg() definition

Arguments of the .is_addr_reg() are swapped in the definition of the function, that is quite confusing.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Erik Faye-Lund kusmabite@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- include/linux/host1x.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 71a34774b56a..561d6bb6580d 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -233,7 +233,7 @@ struct host1x_job { u8 *gather_copy_mapped;

/* Check if register is marked as an address reg */ - int (*is_addr_reg)(struct device *dev, u32 reg, u32 class); + int (*is_addr_reg)(struct device *dev, u32 class, u32 reg);

/* Check if class belongs to the unit */ int (*is_valid_class)(u32 class);

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 18/22] gpu: host1x: Check waits in the firewall

Check waits in the firewall in a way it is done for relocations.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com --- drivers/gpu/host1x/job.c | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index 32d3300bcf99..fc194c676d91 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -31,6 +31,8 @@ #include "job.h" #include "syncpt.h"

+#define HOST1X_WAIT_SYNCPT_OFFSET 0x8 + struct host1x_job *host1x_job_alloc(struct host1x_channel *ch, u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks) @@ -354,6 +356,17 @@ static bool check_reloc_shift(struct device *dev, struct host1x_reloc *reloc) return true; }

+static bool check_wait(struct host1x_waitchk *wait, struct host1x_bo *cmdbuf, + unsigned int offset) +{ + offset *= sizeof(u32); + + if (wait->bo != cmdbuf || wait->offset != offset) + return false; + + return true; +} + struct host1x_firewall { struct host1x_job *job; struct device *dev; @@ -361,6 +374,9 @@ struct host1x_firewall { unsigned int num_relocs; struct host1x_reloc *reloc;

+ unsigned int num_waitchks; + struct host1x_waitchk *waitchk; + struct host1x_bo *cmdbuf; unsigned int offset;

@@ -390,6 +406,20 @@ static int check_register(struct host1x_firewall *fw, unsigned long offset) fw->reloc++; }

+ if (offset == HOST1X_WAIT_SYNCPT_OFFSET) { + if (fw->class != HOST1X_CLASS_HOST1X) + return -EINVAL; + + if (!fw->num_waitchks) + return -EINVAL; + + if (!check_wait(fw->waitchk, fw->cmdbuf, fw->offset)) + return -EINVAL; + + fw->num_waitchks--; + fw->waitchk++; + } + return 0; }

@@ -554,6 +584,8 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) fw.dev = dev; fw.reloc = job->relocarray; fw.num_relocs = job->num_relocs; + fw.waitchk = job->waitchk; + fw.num_waitchks = job->num_waitchk; fw.class = job->class;

for (i = 0; i < job->num_gathers; i++) { @@ -592,8 +624,8 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) offset += g->words * sizeof(u32); }

- /* No relocs should remain at this point */ - if (fw.num_relocs) + /* No relocs and waitchks should remain at this point */ + if (fw.num_relocs || fw.num_waitchks) return -EINVAL;

return 0;

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:47 a.m.

New subject: [PATCH v2 18/22] gpu: host1x: Check waits in the firewall

On Wed, Jun 14, 2017 at 1:15 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

Check waits in the firewall in a way it is done for relocations.

Signed-off-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Mikko Perttunen mperttunen@nvidia.com

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

Dmitry Osipenko

13 Jun 13 Jun

11:15 p.m.

New subject: [PATCH v2 19/22] gpu: host1x: Remove unused 'struct host1x_cmdbuf'

The struct host1x_cmdbuf is unused, let's remove it.

diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h index 0debd93a1849..4bda51d503ec 100644 --- a/drivers/gpu/host1x/job.h +++ b/drivers/gpu/host1x/job.h @@ -27,13 +27,6 @@ struct host1x_job_gather { bool handled; };

-struct host1x_cmdbuf { - u32 handle; - u32 offset; - u32 words; - u32 pad; -}; - struct host1x_job_unpin_data { struct host1x_bo *bo; struct sg_table *sgt;

-- 2.13.0

Dmitry Osipenko

11:15 p.m.

New subject: [PATCH v2 20/22] gpu: host1x: Remove unused host1x_cdma_stop() definition

There is no host1x_cdma_stop() in the code, let's remove its definition from the header file.

diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h index ec170a78f4e1..286d49386be9 100644 --- a/drivers/gpu/host1x/cdma.h +++ b/drivers/gpu/host1x/cdma.h @@ -88,7 +88,6 @@ struct host1x_cdma {

int host1x_cdma_init(struct host1x_cdma *cdma); int host1x_cdma_deinit(struct host1x_cdma *cdma); -void host1x_cdma_stop(struct host1x_cdma *cdma); int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job); void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2); void host1x_cdma_end(struct host1x_cdma *cdma, struct host1x_job *job);

-- 2.13.0

Dmitry Osipenko

11:16 p.m.

New subject: [PATCH v2 21/22] gpu: host1x: Refactor channel allocation code

From: Mikko Perttunen mperttunen@nvidia.com

This is largely a rewrite of the Host1x channel allocation code, bringing several changes:

- The previous code could deadlock due to an interaction between the 'reflock' mutex and CDMA timeout handling. This gets rid of the mutex. - Support for more than 32 channels, required for Tegra186 - General refactoring, including better encapsulation of channel ownership handling into channel.c

Signed-off-by: Mikko Perttunen mperttunen@nvidia.com Reviewed-by: Dmitry Osipenko digetx@gmail.com Tested-by: Dmitry Osipenko digetx@gmail.com --- drivers/gpu/drm/tegra/gr2d.c | 4 +- drivers/gpu/drm/tegra/gr3d.c | 4 +- drivers/gpu/drm/tegra/vic.c | 4 +- drivers/gpu/host1x/channel.c | 147 +++++++++++++++++++++++-------------- drivers/gpu/host1x/channel.h | 21 ++++-- drivers/gpu/host1x/debug.c | 47 +++++------- drivers/gpu/host1x/dev.c | 7 +- drivers/gpu/host1x/dev.h | 6 +- drivers/gpu/host1x/hw/channel_hw.c | 4 - include/linux/host1x.h | 1 - 10 files changed, 135 insertions(+), 110 deletions(-)

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c index fbe0b8b25b42..6ea070da7718 100644 --- a/drivers/gpu/drm/tegra/gr2d.c +++ b/drivers/gpu/drm/tegra/gr2d.c @@ -38,7 +38,7 @@ static int gr2d_init(struct host1x_client *client)

client->syncpts[0] = host1x_syncpt_request(client->dev, flags); if (!client->syncpts[0]) { - host1x_channel_free(gr2d->channel); + host1x_channel_put(gr2d->channel); return -ENOMEM; }

@@ -57,7 +57,7 @@ static int gr2d_exit(struct host1x_client *client) return err;

host1x_syncpt_free(client->syncpts[0]); - host1x_channel_free(gr2d->channel); + host1x_channel_put(gr2d->channel);

return 0; } diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c index 13f0d1b7cd98..cee2ab645cde 100644 --- a/drivers/gpu/drm/tegra/gr3d.c +++ b/drivers/gpu/drm/tegra/gr3d.c @@ -48,7 +48,7 @@ static int gr3d_init(struct host1x_client *client)

client->syncpts[0] = host1x_syncpt_request(client->dev, flags); if (!client->syncpts[0]) { - host1x_channel_free(gr3d->channel); + host1x_channel_put(gr3d->channel); return -ENOMEM; }

@@ -67,7 +67,7 @@ static int gr3d_exit(struct host1x_client *client) return err;

host1x_syncpt_free(client->syncpts[0]); - host1x_channel_free(gr3d->channel); + host1x_channel_put(gr3d->channel);

return 0; } diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c index cd804e404a11..47cb1aaa58b1 100644 --- a/drivers/gpu/drm/tegra/vic.c +++ b/drivers/gpu/drm/tegra/vic.c @@ -182,7 +182,7 @@ static int vic_init(struct host1x_client *client) free_syncpt: host1x_syncpt_free(client->syncpts[0]); free_channel: - host1x_channel_free(vic->channel); + host1x_channel_put(vic->channel); detach_device: if (tegra->domain) iommu_detach_device(tegra->domain, vic->dev); @@ -203,7 +203,7 @@ static int vic_exit(struct host1x_client *client) return err;

host1x_syncpt_free(client->syncpts[0]); - host1x_channel_free(vic->channel); + host1x_channel_put(vic->channel);

if (vic->domain) { iommu_detach_device(vic->domain, vic->dev); diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c index 8f437d924c10..db9b91d1384c 100644 --- a/drivers/gpu/host1x/channel.c +++ b/drivers/gpu/host1x/channel.c @@ -24,19 +24,33 @@ #include "job.h"

/* Constructor for the host1x device list */ -int host1x_channel_list_init(struct host1x *host) +int host1x_channel_list_init(struct host1x_channel_list *chlist, + unsigned int num_channels) { - INIT_LIST_HEAD(&host->chlist.list); - mutex_init(&host->chlist_mutex); - - if (host->info->nb_channels > BITS_PER_LONG) { - WARN(1, "host1x hardware has more channels than supported by the driver\n"); - return -ENOSYS; + chlist->channels = kcalloc(num_channels, sizeof(struct host1x_channel), + GFP_KERNEL); + if (!chlist->channels) + return -ENOMEM; + + chlist->allocated_channels = + kcalloc(BITS_TO_LONGS(num_channels), sizeof(unsigned long), + GFP_KERNEL); + if (!chlist->allocated_channels) { + kfree(chlist->channels); + return -ENOMEM; }

+ bitmap_zero(chlist->allocated_channels, num_channels); + return 0; }

+void host1x_channel_list_free(struct host1x_channel_list *chlist) +{ + kfree(chlist->allocated_channels); + kfree(chlist->channels); +} + int host1x_job_submit(struct host1x_job *job) { struct host1x *host = dev_get_drvdata(job->channel->dev->parent); @@ -47,86 +61,107 @@ EXPORT_SYMBOL(host1x_job_submit);

struct host1x_channel *host1x_channel_get(struct host1x_channel *channel) { - int err = 0; + kref_get(&channel->refcount);

- mutex_lock(&channel->reflock); + return channel; +} +EXPORT_SYMBOL(host1x_channel_get);

- if (channel->refcount == 0) - err = host1x_cdma_init(&channel->cdma); +/** + * host1x_channel_get_index() - Attempt to get channel reference by index + * @host: Host1x device object + * @index: Index of channel + * + * If channel number @index is currently allocated, increase its refcount + * and return a pointer to it. Otherwise, return NULL. + */ +struct host1x_channel *host1x_channel_get_index(struct host1x *host, + unsigned int index) +{ + struct host1x_channel *ch = &host->channel_list.channels[index];

- if (!err) - channel->refcount++; + if (!kref_get_unless_zero(&ch->refcount)) + return NULL;

- mutex_unlock(&channel->reflock); + return ch; +} + +static void release_channel(struct kref *kref) +{ + struct host1x_channel *channel = + container_of(kref, struct host1x_channel, refcount); + struct host1x *host = dev_get_drvdata(channel->dev->parent); + struct host1x_channel_list *chlist = &host->channel_list; + + host1x_hw_cdma_stop(host, &channel->cdma); + host1x_cdma_deinit(&channel->cdma);

- return err ? NULL : channel; + clear_bit(channel->id, chlist->allocated_channels); } -EXPORT_SYMBOL(host1x_channel_get);

void host1x_channel_put(struct host1x_channel *channel) { - mutex_lock(&channel->reflock); + kref_put(&channel->refcount, release_channel); +} +EXPORT_SYMBOL(host1x_channel_put);

- if (channel->refcount == 1) { - struct host1x *host = dev_get_drvdata(channel->dev->parent); +static struct host1x_channel *acquire_unused_channel(struct host1x *host) +{ + struct host1x_channel_list *chlist = &host->channel_list; + unsigned int max_channels = host->info->nb_channels; + unsigned int index;

- host1x_hw_cdma_stop(host, &channel->cdma); - host1x_cdma_deinit(&channel->cdma); + index = find_first_zero_bit(chlist->allocated_channels, max_channels); + if (index >= max_channels) { + dev_err(host->dev, "failed to find free channel\n"); + return NULL; }

- channel->refcount--; + chlist->channels[index].id = index;

- mutex_unlock(&channel->reflock); + set_bit(index, chlist->allocated_channels); + + return &chlist->channels[index]; } -EXPORT_SYMBOL(host1x_channel_put);

+/** + * host1x_channel_request() - Allocate a channel + * @device: Host1x unit this channel will be used to send commands to + * + * Allocates a new host1x channel for @device. If there are no free channels, + * this will sleep until one becomes available. May return NULL if CDMA + * initialization fails. + */ struct host1x_channel *host1x_channel_request(struct device *dev) { struct host1x *host = dev_get_drvdata(dev->parent); - unsigned int max_channels = host->info->nb_channels; - struct host1x_channel *channel = NULL; - unsigned long index; + struct host1x_channel_list *chlist = &host->channel_list; + struct host1x_channel *channel; int err;

- mutex_lock(&host->chlist_mutex); + channel = acquire_unused_channel(host); + if (!channel) + return NULL;

- index = find_first_zero_bit(&host->allocated_channels, max_channels); - if (index >= max_channels) - goto fail; + kref_init(&channel->refcount); + mutex_init(&channel->submitlock); + channel->dev = dev;

- channel = kzalloc(sizeof(*channel), GFP_KERNEL); - if (!channel) + err = host1x_hw_channel_init(host, channel, channel->id); + if (err < 0) goto fail;

- err = host1x_hw_channel_init(host, channel, index); + err = host1x_cdma_init(&channel->cdma); if (err < 0) goto fail;

- /* Link device to host1x_channel */ - channel->dev = dev; - - /* Add to channel list */ - list_add_tail(&channel->list, &host->chlist.list); - - host->allocated_channels |= BIT(index); - - mutex_unlock(&host->chlist_mutex); return channel;

fail: - dev_err(dev, "failed to init channel\n"); - kfree(channel); - mutex_unlock(&host->chlist_mutex); - return NULL; -} -EXPORT_SYMBOL(host1x_channel_request); + clear_bit(channel->id, chlist->allocated_channels);

-void host1x_channel_free(struct host1x_channel *channel) -{ - struct host1x *host = dev_get_drvdata(channel->dev->parent); + dev_err(dev, "failed to initialize channel\n");

- host->allocated_channels &= ~BIT(channel->id); - list_del(&channel->list); - kfree(channel); + return NULL; } -EXPORT_SYMBOL(host1x_channel_free); +EXPORT_SYMBOL(host1x_channel_request); diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h index df767cf90d51..7068e42d42df 100644 --- a/drivers/gpu/host1x/channel.h +++ b/drivers/gpu/host1x/channel.h @@ -20,17 +20,21 @@ #define __HOST1X_CHANNEL_H

#include <linux/io.h> +#include <linux/kref.h>

#include "cdma.h"

struct host1x; +struct host1x_channel;

-struct host1x_channel { - struct list_head list; +struct host1x_channel_list { + struct host1x_channel *channels; + unsigned long *allocated_channels; +};

- unsigned int refcount; +struct host1x_channel { + struct kref refcount; unsigned int id; - struct mutex reflock; struct mutex submitlock; void __iomem *regs; struct device *dev; @@ -38,9 +42,10 @@ struct host1x_channel { };

/* channel list operations */ -int host1x_channel_list_init(struct host1x *host); - -#define host1x_for_each_channel(host, channel) \ - list_for_each_entry(channel, &host->chlist.list, list) +int host1x_channel_list_init(struct host1x_channel_list *chlist, + unsigned int num_channels); +void host1x_channel_list_free(struct host1x_channel_list *chlist); +struct host1x_channel *host1x_channel_get_index(struct host1x *host, + unsigned int index);

#endif diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c index d9330fcc62ad..2aae0e63214c 100644 --- a/drivers/gpu/host1x/debug.c +++ b/drivers/gpu/host1x/debug.c @@ -43,24 +43,19 @@ void host1x_debug_output(struct output *o, const char *fmt, ...) o->fn(o->ctx, o->buf, len); }

-static int show_channels(struct host1x_channel *ch, void *data, bool show_fifo) +static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo) { struct host1x *m = dev_get_drvdata(ch->dev->parent); struct output *o = data;

- mutex_lock(&ch->reflock); + mutex_lock(&ch->cdma.lock);

- if (ch->refcount) { - mutex_lock(&ch->cdma.lock); + if (show_fifo) + host1x_hw_show_channel_fifo(m, ch, o);

- if (show_fifo) - host1x_hw_show_channel_fifo(m, ch, o); + host1x_hw_show_channel_cdma(m, ch, o);

- host1x_hw_show_channel_cdma(m, ch, o); - mutex_unlock(&ch->cdma.lock); - } - - mutex_unlock(&ch->reflock); + mutex_unlock(&ch->cdma.lock);

return 0; } @@ -94,28 +89,22 @@ static void show_syncpts(struct host1x *m, struct output *o) host1x_debug_output(o, "\n"); }

-static void show_all(struct host1x *m, struct output *o) +static void show_all(struct host1x *m, struct output *o, bool show_fifo) { - struct host1x_channel *ch; + int i;

host1x_hw_show_mlocks(m, o); show_syncpts(m, o); host1x_debug_output(o, "---- channels ----\n");

- host1x_for_each_channel(m, ch) - show_channels(ch, o, true); -} - -static void show_all_no_fifo(struct host1x *host1x, struct output *o) -{ - struct host1x_channel *ch; - - host1x_hw_show_mlocks(host1x, o); - show_syncpts(host1x, o); - host1x_debug_output(o, "---- channels ----\n"); + for (i = 0; i < m->info->nb_channels; ++i) { + struct host1x_channel *ch = host1x_channel_get_index(m, i);

- host1x_for_each_channel(host1x, ch) - show_channels(ch, o, false); + if (ch) { + show_channel(ch, o, show_fifo); + host1x_channel_put(ch); + } + } }

static int host1x_debug_show_all(struct seq_file *s, void *unused) @@ -125,7 +114,7 @@ static int host1x_debug_show_all(struct seq_file *s, void *unused) .ctx = s };

- show_all(s->private, &o); + show_all(s->private, &o, true);

return 0; } @@ -137,7 +126,7 @@ static int host1x_debug_show(struct seq_file *s, void *unused) .ctx = s };

- show_all_no_fifo(s->private, &o); + show_all(s->private, &o, false);

return 0; } @@ -216,7 +205,7 @@ void host1x_debug_dump(struct host1x *host1x) .fn = write_to_printk };

- show_all(host1x, &o); + show_all(host1x, &o, true); }

void host1x_debug_dump_syncpts(struct host1x *host1x) diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 21da331ce092..18bc9dabb89d 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -203,7 +203,8 @@ static int host1x_probe(struct platform_device *pdev) host->iova_end = geometry->aperture_end; }

- err = host1x_channel_list_init(host); + err = host1x_channel_list_init(&host->channel_list, + host->info->nb_channels); if (err) { dev_err(&pdev->dev, "failed to initialize channel list\n"); goto fail_detach_device; @@ -212,7 +213,7 @@ static int host1x_probe(struct platform_device *pdev) err = clk_prepare_enable(host->clk); if (err < 0) { dev_err(&pdev->dev, "failed to enable clock\n"); - goto fail_detach_device; + goto fail_free_channels; }

err = reset_control_deassert(host->rst); @@ -249,6 +250,8 @@ static int host1x_probe(struct platform_device *pdev) reset_control_assert(host->rst); fail_unprepare_disable: clk_disable_unprepare(host->clk); +fail_free_channels: + host1x_channel_list_free(&host->channel_list); fail_detach_device: if (host->domain) { put_iova_domain(&host->iova); diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index c6997651b336..ec2eddec93cb 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -130,10 +130,8 @@ struct host1x { struct host1x_syncpt *nop_sp;

struct mutex syncpt_mutex; - struct mutex chlist_mutex; - struct host1x_channel chlist; - unsigned long allocated_channels; - unsigned int num_allocated_channels; + + struct host1x_channel_list channel_list;

struct dentry *debugfs;

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c index 5e8df78b7acd..8447a56c41ca 100644 --- a/drivers/gpu/host1x/hw/channel_hw.c +++ b/drivers/gpu/host1x/hw/channel_hw.c @@ -181,10 +181,6 @@ static int channel_submit(struct host1x_job *job) static int host1x_channel_init(struct host1x_channel *ch, struct host1x *dev, unsigned int index) { - ch->id = index; - mutex_init(&ch->reflock); - mutex_init(&ch->submitlock); - ch->regs = dev->regs + index * HOST1X_CHANNEL_SIZE; return 0; } diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 561d6bb6580d..5dd8df2f8810 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -156,7 +156,6 @@ struct host1x_channel; struct host1x_job;

struct host1x_channel *host1x_channel_request(struct device *dev); -void host1x_channel_free(struct host1x_channel *channel); struct host1x_channel *host1x_channel_get(struct host1x_channel *channel); void host1x_channel_put(struct host1x_channel *channel); int host1x_job_submit(struct host1x_job *job);

-- 2.13.0

Dmitry Osipenko

11:16 p.m.

New subject: [PATCH v2 22/22] gpu: host1x: At first try a non-blocking allocation for the gather copy

The blocking gather copy allocation is a major performance downside of the Host1x firewall, it may take hundreds milliseconds which is unacceptable for the real-time graphics operations. Let's try a non-blocking allocation first as a least invasive solution, it makes opentegra (Xorg driver) performance indistinguishable with/without the firewall.

Signed-off-by: Dmitry Osipenko digetx@gmail.com ---

The other much more invasive solution could be: to have a memory pool reserved for the gather copy and performing the firewall checks (and the gather copying) on the actual job submission to the CDMA, not on the job allocation like it is done now. This solution reduces the overhead of a gather copy allocations.

drivers/gpu/host1x/job.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index fc194c676d91..ed0575f14093 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) size += g->words * sizeof(u32); }

+ /* + * Try a non-blocking allocation from a higher priority pools first, + * as awaiting for the allocation here is a major performance hit. + */ job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy, - GFP_KERNEL); - if (!job->gather_copy_mapped) { - job->gather_copy_mapped = NULL; + GFP_NOWAIT); + + /* the higher priority allocation failed, try the generic-blocking */ + if (!job->gather_copy_mapped) + job->gather_copy_mapped = dma_alloc_wc(dev, size, + &job->gather_copy, + GFP_KERNEL); + if (!job->gather_copy_mapped) return -ENOMEM; - }

job->gather_copy_size = size;

-- 2.13.0

Erik Faye-Lund

14 Jun 14 Jun

7:56 a.m.

New subject: [PATCH v2 22/22] gpu: host1x: At first try a non-blocking allocation for the gather copy

On Wed, Jun 14, 2017 at 1:16 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

The blocking gather copy allocation is a major performance downside of the Host1x firewall, it may take hundreds milliseconds which is unacceptable for the real-time graphics operations. Let's try a non-blocking allocation first as a least invasive solution, it makes opentegra (Xorg driver) performance indistinguishable with/without the firewall.

Signed-off-by: Dmitry Osipenko digetx@gmail.com

The other much more invasive solution could be: to have a memory pool reserved for the gather copy and performing the firewall checks (and the gather copying) on the actual job submission to the CDMA, not on the job allocation like it is done now. This solution reduces the overhead of a gather copy allocations.

drivers/gpu/host1x/job.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index fc194c676d91..ed0575f14093 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) size += g->words * sizeof(u32); }
  /*
   * Try a non-blocking allocation from a higher priority pools first,
   * as awaiting for the allocation here is a major performance hit.
   */
  job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy,
                                         GFP_KERNEL);
  if (!job->gather_copy_mapped) {
          job->gather_copy_mapped = NULL;
                                         GFP_NOWAIT);
  /* the higher priority allocation failed, try the generic-blocking */
  if (!job->gather_copy_mapped)
          job->gather_copy_mapped = dma_alloc_wc(dev, size,
                                                 &job->gather_copy,
                                                 GFP_KERNEL);
  if (!job->gather_copy_mapped)
          return -ENOMEM;
  }

  job->gather_copy_size = size;

Can't we just have a static fixed-size buffer, and validate chunks at the time? Or why can't we validate directly on the mapped pointers?

It feels pretty silly to have to allocate memory in the first place here...

Dmitry Osipenko

8:32 a.m.

New subject: [PATCH v2 22/22] gpu: host1x: At first try a non-blocking allocation for the gather copy

On 14.06.2017 10:56, Erik Faye-Lund wrote:

...

On Wed, Jun 14, 2017 at 1:16 AM, Dmitry Osipenko digetx@gmail.com wrote:

...
The blocking gather copy allocation is a major performance downside of the Host1x firewall, it may take hundreds milliseconds which is unacceptable for the real-time graphics operations. Let's try a non-blocking allocation first as a least invasive solution, it makes opentegra (Xorg driver) performance indistinguishable with/without the firewall.

Signed-off-by: Dmitry Osipenko digetx@gmail.com

The other much more invasive solution could be: to have a memory pool reserved for the gather copy and performing the firewall checks (and the gather copying) on the actual job submission to the CDMA, not on the job allocation like it is done now. This solution reduces the overhead of a gather copy allocations.

drivers/gpu/host1x/job.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index fc194c676d91..ed0575f14093 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) size += g->words * sizeof(u32); }
  /*
   * Try a non-blocking allocation from a higher priority pools first,
   * as awaiting for the allocation here is a major performance hit.
   */
  job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy,
                                         GFP_KERNEL);
  if (!job->gather_copy_mapped) {
          job->gather_copy_mapped = NULL;
                                         GFP_NOWAIT);
  /* the higher priority allocation failed, try the generic-blocking */
  if (!job->gather_copy_mapped)
          job->gather_copy_mapped = dma_alloc_wc(dev, size,
                                                 &job->gather_copy,
                                                 GFP_KERNEL);
  if (!job->gather_copy_mapped)
          return -ENOMEM;
  }

  job->gather_copy_size = size;
Can't we just have a static fixed-size buffer, and validate chunks at the time? Or why can't we validate directly on the mapped pointers?

It feels pretty silly to have to allocate memory in the first place here...

We can't validate the mapped pointers because userspace may write to the buffers memory at any time. Maybe it should be possible to write-protect cmdbuffers memory for the time of its submission and execution, but I'm not sure whether it's worth the effort. The current-proposed solution is efficient and least invasive.

-- Dmitry

Erik Faye-Lund

8:44 a.m.

New subject: [PATCH v2 22/22] gpu: host1x: At first try a non-blocking allocation for the gather copy

On Wed, Jun 14, 2017 at 10:32 AM, Dmitry Osipenko digetx@gmail.com wrote:

...

On 14.06.2017 10:56, Erik Faye-Lund wrote:

...
On Wed, Jun 14, 2017 at 1:16 AM, Dmitry Osipenko digetx@gmail.com wrote:

...
The blocking gather copy allocation is a major performance downside of the Host1x firewall, it may take hundreds milliseconds which is unacceptable for the real-time graphics operations. Let's try a non-blocking allocation first as a least invasive solution, it makes opentegra (Xorg driver) performance indistinguishable with/without the firewall.

Signed-off-by: Dmitry Osipenko digetx@gmail.com

The other much more invasive solution could be: to have a memory pool reserved for the gather copy and performing the firewall checks (and the gather copying) on the actual job submission to the CDMA, not on the job allocation like it is done now. This solution reduces the overhead of a gather copy allocations.

drivers/gpu/host1x/job.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index fc194c676d91..ed0575f14093 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -594,12 +594,20 @@ static inline int copy_gathers(struct host1x_job *job, struct device *dev) size += g->words * sizeof(u32); }
  /*
   * Try a non-blocking allocation from a higher priority pools first,
   * as awaiting for the allocation here is a major performance hit.
   */
  job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy,
                                         GFP_KERNEL);
  if (!job->gather_copy_mapped) {
          job->gather_copy_mapped = NULL;
                                         GFP_NOWAIT);
  /* the higher priority allocation failed, try the generic-blocking */
  if (!job->gather_copy_mapped)
          job->gather_copy_mapped = dma_alloc_wc(dev, size,
                                                 &job->gather_copy,
                                                 GFP_KERNEL);
  if (!job->gather_copy_mapped)
          return -ENOMEM;
  }

  job->gather_copy_size = size;
Can't we just have a static fixed-size buffer, and validate chunks at the time? Or why can't we validate directly on the mapped pointers?

It feels pretty silly to have to allocate memory in the first place here...
We can't validate the mapped pointers because userspace may write to the buffers memory at any time. Maybe it should be possible to write-protect cmdbuffers memory for the time of its submission and execution, but I'm not sure whether it's worth the effort. The current-proposed solution is efficient and least invasive.

Right, good point.

Reviewed-by: Erik Faye-Lund kusmabite@gmail.com

2893

Age (days ago)

2894

Last active (days ago)

dri-devel@lists.freedesktop.org

39 comments

3 participants

tags (0)

participants (3)

Dmitry Osipenko
Erik Faye-Lund
Mikko Perttunen