Hello,
I've run out of time to work on this, as I have a fix (the unused BO trick) that's good enough for my needs. I've taken Christian's patches (1 and 2 of this series) and added the obvious ioctl interface on top, enabling userspace to wait for a kernel fence *if* the kernel gives it some opaque tokens (currently the kernel ring and sequence number).
I got stuck trying to work out how to pass those tokens back out of the CS ioctl. Return value isn't enough, as I have 96 bits of information in my token (32-bit ring number, 64-bit seqno), so I experimented with adding an extra chunk that the CS ioctl writes. However, I hit pain trying to work out what to do when writing to that extra chunk fails - the CS has already been submitted to hardware here, so returning an error is unfriendly.
As I've got the fix I need (Mesa-dev message "[PATCH] r600g: Use a fake reloc to sleep for fences"), I can't really justify continuing to work on this, so I'm putting out what I've got, complete with known problems, in case someone else gets interested.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/
From: Christian König <deathsimple@vodafone.de>
They are protected by a read/write lock anyway, so we actually don't need to use the atomic type.
Signed-off-by: Christian König <deathsimple@vodafone.de>
---
Unchanged from Christian's original work.
 drivers/gpu/drm/radeon/radeon.h       |    6 +++---
 drivers/gpu/drm/radeon/radeon_fence.c |   29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 2859406..b4cfb11 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -212,8 +212,8 @@ struct radeon_fence_driver {
 	uint32_t			scratch_reg;
 	uint64_t			gpu_addr;
 	volatile uint32_t		*cpu_addr;
-	atomic_t			seq;
-	uint32_t			last_seq;
+	uint64_t			seq;
+	uint64_t			last_seq;
 	unsigned long			last_jiffies;
 	unsigned long			last_timeout;
 	wait_queue_head_t		queue;
@@ -228,7 +228,7 @@ struct radeon_fence {
 	struct kref			kref;
 	struct list_head		list;
 	/* protected by radeon_fence.lock */
-	uint32_t			seq;
+	uint64_t			seq;
 	bool				emitted;
 	bool				signaled;
 	/* RB, DMA, etc. */
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 64ea3dd..ac177c5 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -29,7 +29,6 @@
  *    Dave Airlie
  */
 #include <linux/seq_file.h>
-#include <linux/atomic.h>
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/kref.h>
@@ -70,7 +69,7 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence)
 		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 		return 0;
 	}
-	fence->seq = atomic_add_return(1, &rdev->fence_drv[fence->ring].seq);
+	fence->seq = ++rdev->fence_drv[fence->ring].seq;
 	if (!rdev->ring[fence->ring].ready)
 		/* FIXME: cp is not running assume everythings is done right
 		 * away
@@ -90,12 +89,18 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
 {
 	struct radeon_fence *fence;
 	struct list_head *i, *n;
-	uint32_t seq;
+	uint64_t seq;
 	bool wake = false;
 	unsigned long cjiffies;
 	seq = radeon_fence_read(rdev, ring);
+	seq |= rdev->fence_drv[ring].last_seq & 0xFFFFFFFF00000000;
 	if (seq != rdev->fence_drv[ring].last_seq) {
+		if (seq < rdev->fence_drv[ring].last_seq) {
+			/* sequence wrapped around */
+			seq = (seq & 0xFFFFFFFF) |
+			      (rdev->fence_drv[ring].seq & 0xFFFFFFFF00000000);
+		}
 		rdev->fence_drv[ring].last_seq = seq;
 		rdev->fence_drv[ring].last_jiffies = jiffies;
 		rdev->fence_drv[ring].last_timeout = RADEON_FENCE_JIFFIES_TIMEOUT;
@@ -216,7 +221,7 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
 	struct radeon_device *rdev;
 	unsigned long irq_flags, timeout;
-	u32 seq;
+	uint64_t last_seq;
 	int r;
 	if (fence == NULL) {
@@ -230,8 +235,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	timeout = rdev->fence_drv[fence->ring].last_timeout;
 retry:
 	/* save current sequence used to check for GPU lockup */
-	seq = rdev->fence_drv[fence->ring].last_seq;
-	trace_radeon_fence_wait_begin(rdev->ddev, seq);
+	last_seq = rdev->fence_drv[fence->ring].last_seq;
+	trace_radeon_fence_wait_begin(rdev->ddev, last_seq);
 	if (intr) {
 		radeon_irq_kms_sw_irq_get(rdev, fence->ring);
 		r = wait_event_interruptible_timeout(rdev->fence_drv[fence->ring].queue,
@@ -246,7 +251,7 @@ retry:
 			radeon_fence_signaled(fence), timeout);
 		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
 	}
-	trace_radeon_fence_wait_end(rdev->ddev, seq);
+	trace_radeon_fence_wait_end(rdev->ddev, last_seq);
 	if (unlikely(!radeon_fence_signaled(fence))) {
 		/* we were interrupted for some reason and fence isn't
 		 * isn't signaled yet, resume wait
@@ -258,11 +263,11 @@ retry:
 		/* don't protect read access to rdev->fence_drv[t].last_seq
 		 * if we experiencing a lockup the value doesn't change
 		 */
-		if (seq == rdev->fence_drv[fence->ring].last_seq &&
+		if (last_seq == rdev->fence_drv[fence->ring].last_seq &&
 		    radeon_gpu_is_lockup(rdev, &rdev->ring[fence->ring])) {
 			/* good news we believe it's a lockup */
 			printk(KERN_WARNING "GPU lockup (waiting for 0x%08X last fence id 0x%08X)\n",
-			       fence->seq, seq);
+			       (uint32_t)fence->seq, (uint32_t)last_seq);
 			/* FIXME: what should we do ? marking everyone
 			 * as signaled for now
 			 */
@@ -403,7 +408,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	}
 	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
 	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
-	radeon_fence_write(rdev, atomic_read(&rdev->fence_drv[ring].seq), ring);
+	radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
 	rdev->fence_drv[ring].initialized = true;
 	DRM_INFO("fence driver on ring %d use gpu addr 0x%08Lx and cpu addr 0x%p\n",
 		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
@@ -416,7 +421,7 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].scratch_reg = -1;
 	rdev->fence_drv[ring].cpu_addr = NULL;
 	rdev->fence_drv[ring].gpu_addr = 0;
-	atomic_set(&rdev->fence_drv[ring].seq, 0);
+	rdev->fence_drv[ring].seq = 0;
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].created);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
@@ -481,7 +486,7 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			fence = list_entry(rdev->fence_drv[i].emitted.prev,
 					   struct radeon_fence, list);
 			seq_printf(m, "Last emitted fence %p with 0x%08X\n",
-				   fence, fence->seq);
+				   fence, (uint32_t)fence->seq);
 		}
 	}
 	return 0;
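As an aside, the wrap-around handling added to radeon_fence_poll_locked() above can be illustrated standalone. This is a sketch, not driver code: the hardware write-back slot only holds the low 32 bits of the sequence number, so the saved high bits are spliced back in, and if the result went backwards the low half must have wrapped, in which case the high bits come from the 64-bit software emit counter instead. All names here are illustrative.

```c
#include <stdint.h>

/* Rebuild a 64-bit sequence number from the 32-bit value the hardware
 * wrote back. last_seq is the last 64-bit value we reconstructed;
 * emit_seq is the driver's 64-bit software emit counter. */
static uint64_t extend_seq(uint32_t hw_seq, uint64_t last_seq,
			   uint64_t emit_seq)
{
	uint64_t seq = (uint64_t)hw_seq | (last_seq & 0xFFFFFFFF00000000ULL);

	if (seq < last_seq) {
		/* low 32 bits wrapped around: take the high bits from
		 * the emit counter, which is always ahead of hardware */
		seq = (seq & 0xFFFFFFFFULL) | (emit_seq & 0xFFFFFFFF00000000ULL);
	}
	return seq;
}
```

This works because the hardware can never be more than 2^32 fences behind the emit counter in practice.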
From: Christian König <deathsimple@vodafone.de>
To support waiting for fence values from usermode.
Signed-off-by: Christian König <deathsimple@vodafone.de>
---
Again, unchanged from Christian's original work.
 drivers/gpu/drm/radeon/radeon_fence.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index ac177c5..85893c3 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -334,6 +334,28 @@ int radeon_fence_wait_last(struct radeon_device *rdev, int ring)
 	return r;
 }
+static bool radeon_fence_value_reached(struct radeon_device *rdev,
+				       int ring, uint64_t value)
+{
+	unsigned long irq_flags;
+	bool result;
+	read_lock_irqsave(&rdev->fence_lock, irq_flags);
+	result = rdev->fence_drv[ring].last_seq <= value;
+	read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
+	return result;
+}
+
+int radeon_fence_wait_value(struct radeon_device *rdev, int ring,
+			    uint64_t value, unsigned long timeout)
+{
+	int r;
+	radeon_irq_kms_sw_irq_get(rdev, ring);
+	r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue,
+					     radeon_fence_value_reached(rdev, ring, value),
+					     timeout);
+	radeon_irq_kms_sw_irq_put(rdev, ring);
+	return r;
+}
+
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
 	kref_get(&fence->kref);
Provide a mechanism for userspace to wait for kernel fences, and teach the CS ioctl to return enough information to let userspace use this mechanism.
This mechanism isn't safe to include as-is, as there is no way for the CS ioctl to indicate that it hasn't given you the information needed to let you wait for a kernel fence. I'm thus sending it out as a starting point in case someone else wants to continue digging into this.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
Note the FIXME - how does the CS ioctl indicate a fence failure?
I'm also a little concerned that my general userspace API isn't flexible enough to match what we actually want - in particular, it would be nice for GL_ARB_sync to be able to plonk fences down scattered throughout the IB, and wait on an individual fence rather than the whole CS. I couldn't easily determine whether it was safe to add a kernel fence to a userspace IB during parsing, though, so I didn't attempt that.
Finally, this mechanism (as compared to the wait mechanism I proposed in "[PATCH] drm/radeon: Add support for userspace fence waits") doesn't do anything to protect us against malicious userspace causing interrupt storms - userspace can still fill an IB with EVENT_WRITE_EOP packets requesting an interrupt, and this mechanism allows userspace to ensure that the interrupt is enabled.
Someone needs to decide if that's a risk worth worrying about - if so, the CS checkers need to reject attempts by userspace to request interrupts on completion.
 drivers/gpu/drm/radeon/radeon.h       |    3 +++
 drivers/gpu/drm/radeon/radeon_cs.c    |   25 +++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_fence.c |   16 ++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c   |    1 +
 include/drm/radeon_drm.h              |   15 +++++++++++++++
 5 files changed, 60 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index b4cfb11..f26744a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -800,6 +800,7 @@ struct radeon_cs_parser {
 	int			chunk_ib_idx;
 	int			chunk_relocs_idx;
 	int			chunk_flags_idx;
+	int			chunk_fence_idx;
 	struct radeon_ib	*ib;
 	void			*track;
 	unsigned		family;
@@ -1348,6 +1349,8 @@ int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
 int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
+int radeon_fence_wait_value_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *filp);
 /* VRAM scratch page for HDP bug, default vram page */
 struct r600_vram_scratch {
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 435a3d9..e0e8165 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -163,6 +163,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	p->chunk_ib_idx = -1;
 	p->chunk_relocs_idx = -1;
 	p->chunk_flags_idx = -1;
+	p->chunk_fence_idx = -1;
 	p->chunks_array = kcalloc(cs->num_chunks, sizeof(uint64_t), GFP_KERNEL);
 	if (p->chunks_array == NULL) {
 		return -ENOMEM;
 	}
@@ -207,6 +208,12 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 			if (p->chunks[i].length_dw == 0)
 				return -EINVAL;
 		}
+		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FENCE) {
+			p->chunk_fence_idx = i;
+			/* A fence chunk must be able to store the end-of-stream fence */
+			if (p->chunks[i].length_dw != (sizeof(struct drm_radeon_fence_wait) / 4))
+				return -EINVAL;
+		}
 		p->chunks[i].length_dw = user_chunk.length_dw;
 		p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data;
@@ -279,6 +286,24 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	return 0;
 }
+static void radeon_cs_parser_output_fence(struct radeon_cs_parser *parser)
+{
+	struct drm_radeon_fence_wait output;
+	void __user *chunk;
+
+	memset(&output, 0, sizeof(struct drm_radeon_fence_wait));
+
+	if (parser->chunk_fence_idx != -1) {
+		chunk = (void __user *)(unsigned long)parser->chunks_array[parser->chunk_fence_idx];
+		output.ring_token = parser->ib->fence->ring;
+		output.fence_token = parser->ib->fence->seq;
+		/* FIXME: What do I do about an error here? */
+		DRM_COPY_TO_USER(chunk,
+				 &output,
+				 sizeof(struct drm_radeon_fence_wait));
+	}
+}
+
 /**
  * cs_parser_fini() - clean parser states
  * @parser:	parser structure holding parsing context.
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 85893c3..50bb054 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -356,6 +356,22 @@ int radeon_fence_wait_value(struct radeon_device *rdev, int ring,
 	return r;
 }
+int radeon_fence_wait_value_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *filp)
+{
+	struct radeon_device *rdev = dev->dev_private;
+	struct drm_radeon_fence_wait *args = data;
+	int r;
+
+	r = radeon_fence_wait_value(rdev,
+				    args->ring_token,
+				    args->fence_token,
+				    usecs_to_jiffies(args->timeout_usec));
+	if (r > 0)
+		r = 0;
+
+	return r;
+}
+
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
 	kref_get(&fence->kref);
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index d335288..a52db06 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -496,5 +496,6 @@ struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(RADEON_FENCE_WAIT, radeon_fence_wait_value_ioctl, DRM_AUTH|DRM_UNLOCKED),
 };
 int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h
index dd2e9cf..d81ce94 100644
--- a/include/drm/radeon_drm.h
+++ b/include/drm/radeon_drm.h
@@ -510,6 +510,7 @@ typedef struct {
 #define DRM_RADEON_GEM_GET_TILING	0x29
 #define DRM_RADEON_GEM_BUSY		0x2a
 #define DRM_RADEON_GEM_VA		0x2b
+#define DRM_RADEON_FENCE_WAIT		0x2c
 #define DRM_IOCTL_RADEON_CP_INIT	DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START	DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -552,6 +553,7 @@ typedef struct {
 #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
 #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
 #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_FENCE_WAIT	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_FENCE_WAIT, struct drm_radeon_fence_wait)
 typedef struct drm_radeon_init {
 	enum {
@@ -906,6 +908,12 @@ struct drm_radeon_gem_va {
 #define RADEON_CHUNK_ID_RELOCS	0x01
 #define RADEON_CHUNK_ID_IB	0x02
 #define RADEON_CHUNK_ID_FLAGS	0x03
+#define RADEON_CHUNK_ID_FENCE	0x04
+
+/* A RADEON_CHUNK_ID_FENCE is a struct drm_radeon_fence_wait. If the CS is
+ * accepted, the kernel will fill in the fence_token and ring_token elements
+ * such that userspace just needs to fill in timeout_usec and call the
+ * DRM_RADEON_FENCE_WAIT ioctl to wait for this CS to complete. */
 /* The first dword of RADEON_CHUNK_ID_FLAGS is a uint32 of these flags: */
 #define RADEON_CS_KEEP_TILING_FLAGS	0x01
@@ -967,4 +975,11 @@ struct drm_radeon_info {
 	uint64_t		value;
 };
+struct drm_radeon_fence_wait {
+	uint64_t	fence_token;
+	uint64_t	timeout_usec;
+	uint32_t	ring_token;
+	uint32_t	padding;
+};
+
 #endif
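For reference, here's a sketch of how userspace would build the fence chunk for a CS submission under this proposal. The structure definitions are local copies so the sketch is self-contained (the real ones live in radeon_drm.h), and the actual ioctl calls (CS submit, then DRM_RADEON_FENCE_WAIT with the tokens the kernel filled in) are elided.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of the uapi structures from this patch */
struct drm_radeon_fence_wait {
	uint64_t	fence_token;
	uint64_t	timeout_usec;
	uint32_t	ring_token;
	uint32_t	padding;
};

struct radeon_chunk {
	uint32_t	chunk_id;
	uint32_t	length_dw;
	uint64_t	chunk_data;
};

#define RADEON_CHUNK_ID_FENCE	0x04

/* Prepare the extra chunk that the CS ioctl writes back: the kernel
 * fills in fence_token and ring_token if the CS is accepted. */
static void fill_fence_chunk(struct radeon_chunk *chunk,
			     struct drm_radeon_fence_wait *out)
{
	memset(out, 0, sizeof(*out));
	chunk->chunk_id = RADEON_CHUNK_ID_FENCE;
	/* the kernel rejects any other size (see radeon_cs_parser_init) */
	chunk->length_dw = sizeof(struct drm_radeon_fence_wait) / 4;
	chunk->chunk_data = (uint64_t)(uintptr_t)out;
}
```

After the CS ioctl returns, userspace would set `timeout_usec` in the same structure and pass it straight to DRM_IOCTL_RADEON_FENCE_WAIT.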
On Fri, Feb 3, 2012 at 4:30 PM, Simon Farnsworth <simon.farnsworth@onelan.co.uk> wrote:
> Hello,
>
> I've run out of time to work on this, as I have a fix (the unused BO trick) that's good enough for my needs. I've taken Christian's patches (1 and 2 of this series) and added the obvious ioctl interface on top, enabling userspace to wait for a kernel fence *if* the kernel gives it some opaque tokens (currently the kernel ring and sequence number).
>
> I got stuck trying to work out how to pass those tokens back out of the CS ioctl. Return value isn't enough, as I have 96 bits of information in my token (32-bit ring number, 64-bit seqno), so I experimented with adding an extra chunk that the CS ioctl writes. However, I hit pain trying to work out what to do when writing to that extra chunk fails - the CS has already been submitted to hardware here, so returning an error is unfriendly.
>
> As I've got the fix I need (Mesa-dev message "[PATCH] r600g: Use a fake reloc to sleep for fences"), I can't really justify continuing to work on this, so I'm putting out what I've got, complete with known problems, in case someone else gets interested.
I just noticed today we should backport to 8.0. If you get a GPU reset in the old model, the app can spin forever, at least it appears that's what gnome-shell is doing here.
Dave.
On Friday 3 February 2012, Dave Airlie <airlied@gmail.com> wrote:
> On Fri, Feb 3, 2012 at 4:30 PM, Simon Farnsworth <simon.farnsworth@onelan.co.uk> wrote:
> > As I've got the fix I need (Mesa-dev message "[PATCH] r600g: Use a fake reloc to sleep for fences"), I can't really justify continuing to work on this, so I'm putting out what I've got, complete with known problems, in case someone else gets interested.
>
> I just noticed today we should backport to 8.0. If you get a GPU reset in the old model, the app can spin forever, at least it appears that's what gnome-shell is doing here.
I didn't fix that - I simply used the fake reloc as an alternative to busy-waiting, as I didn't want to churn the code too much.
r600g cannot cope with a GPU reset that occurs before the EVENT_WRITE_EOP changes a dword in the fence BO from 0 to 1. Without the fake reloc, it will busy-wait indefinitely if that happens, using sched_yield to relinquish CPU time - with the fake reloc, it will keep calling the DRM_RADEON_GEM_WAIT_IDLE ioctl in an infinite loop.
It would be fairly trivial to not loop if you went down the fake reloc path (and I can code that if requested, for a v3 Mesa patch); thinking about it, I can also code it so that we break from the loop if the fake reloc stops being busy at any time.
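The loop change I'm describing would look something like this sketch (userspace-side, with hypothetical callbacks standing in for the real DRM_RADEON_GEM_BUSY / DRM_RADEON_GEM_WAIT_IDLE ioctls): keep waiting while the fake-reloc BO is busy, but give up as soon as the BO goes idle without the fence dword ever flipping, which is the GPU-reset case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wait until the fence dword in the fence BO becomes non-zero, or the
 * fake-reloc BO goes idle without it ever flipping (GPU reset).
 * bo_busy stands in for the GEM_BUSY ioctl. Returns true if the fence
 * actually signalled. */
static bool wait_for_fence(const volatile uint32_t *fence_dword,
			   bool (*bo_busy)(void *ctx), void *ctx)
{
	while (*fence_dword == 0) {
		if (!bo_busy(ctx)) {
			/* BO idle but fence never signalled: GPU was reset */
			return false;
		}
		/* the real code would block in DRM_RADEON_GEM_WAIT_IDLE here */
	}
	return true;
}

/* Tiny simulation used for testing: the BO reports busy a fixed
 * number of times, then idle. */
struct sim { int busy_polls; };

static bool sim_busy(void *ctx)
{
	struct sim *s = ctx;
	return s->busy_polls-- > 0;
}
```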
Taking these patches further is the "obvious" correct thing, though.