This slightly increases the chance that recovery from a lockup can happen successfully.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/nouveau/nv84_fence.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c
index 2cf0ade..daf4b18 100644
--- a/drivers/gpu/drm/nouveau/nv84_fence.c
+++ b/drivers/gpu/drm/nouveau/nv84_fence.c
@@ -122,8 +122,11 @@ nv84_fence_context_del(struct nouveau_channel *chan)
 	struct drm_device *dev = chan->drm->dev;
 	struct nv84_fence_priv *priv = chan->drm->fence;
 	struct nv84_fence_chan *fctx = chan->fence;
+	struct nouveau_fifo_chan *fifo = (void *)chan->object;
 	int i;
 
+	nouveau_bo_wr32(priv->bo, fifo->chid * 16/4, fctx->base.sequence);
+
 	for (i = 0; i < dev->mode_config.num_crtc; i++) {
 		struct nouveau_bo *bo = nv50_display_crtc_sema(dev, i);
 		nouveau_bo_vma_del(bo, &fctx->dispc_vma[i]);
@@ -168,7 +171,7 @@ nv84_fence_context_new(struct nouveau_channel *chan)
 		ret = nouveau_bo_vma_add(bo, client->vm, &fctx->dispc_vma[i]);
 	}
 
-	nouveau_bo_wr32(priv->bo, fifo->chid * 16/4, 0x00000000);
+	fctx->base.sequence = nouveau_bo_rd32(priv->bo, fifo->chid * 16/4);
 
 	if (ret)
 		nv84_fence_context_del(chan);
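
For readers following the diff, the change can be modelled outside the kernel as a per-channel slot whose value is published on teardown and picked up again on creation. The sketch below is only an illustration, not driver code: `fence_bo`, `fence_ctx`, `slot_index`, `MAX_CHANNELS` and the `ctx_*` helpers are hypothetical names, and it assumes the `chid * 16/4` expression means a 16-byte stride per channel in a buffer addressed in 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHANNELS 128 /* assumption: arbitrary size for this model */
#define SLOT_BYTES   16  /* per-channel stride, taken from "chid * 16" in the patch */

/* Stand-in for priv->bo, addressed in 32-bit words as the "/4" suggests. */
static uint32_t fence_bo[MAX_CHANNELS * SLOT_BYTES / 4];

/* Hypothetical stand-in for the per-channel fence context. */
struct fence_ctx {
	unsigned chid;
	uint32_t sequence; /* last sequence number handed out */
};

/* Word index of a channel's slot; mirrors fifo->chid * 16/4. */
static unsigned slot_index(unsigned chid)
{
	assert(chid < MAX_CHANNELS);
	return chid * SLOT_BYTES / 4;
}

/* Pre-patch creation: the slot is zeroed and counting restarts at 0. */
static void ctx_new_old(struct fence_ctx *ctx, unsigned chid)
{
	ctx->chid = chid;
	ctx->sequence = 0;
	fence_bo[slot_index(chid)] = 0x00000000;
}

/* Patched creation: counting continues from whatever the slot holds. */
static void ctx_new_patched(struct fence_ctx *ctx, unsigned chid)
{
	ctx->chid = chid;
	ctx->sequence = fence_bo[slot_index(chid)];
}

/* Patched teardown: publish the last sequence number into the slot. */
static void ctx_del_patched(const struct fence_ctx *ctx)
{
	fence_bo[slot_index(ctx->chid)] = ctx->sequence;
}

/* Hand out the next sequence number; the GPU would write it to the slot later. */
static uint32_t ctx_emit(struct fence_ctx *ctx)
{
	return ++ctx->sequence;
}
```

In this model, a context created with `ctx_new_patched()` on a channel id that was torn down with `ctx_del_patched()` starts from the previous owner's last sequence number, whereas `ctx_new_old()` always restarts from zero.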
On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst maarten.lankhorst@canonical.com wrote:
> This slightly increases the chance that recovery from a lockup can happen successfully.
I'd *really* love to see proof of this. When channels die, all outstanding fences are marked as signalled. This should do absolutely nothing...
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On 04-09-13 05:21, Ben Skeggs wrote:
> On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst maarten.lankhorst@canonical.com wrote:
>> This slightly increases the chance that recovery from a lockup can happen successfully.
> I'd *really* love to see proof of this. When channels die, all outstanding fences are marked as signalled. This should do absolutely nothing...
nv84+ relies heavily on fences though, and a race like this is possible:
- channel 0 uses a bo from channel 1 and queues a wait for it somewhere in the command stream.
- channel 1 dies cleanly, but userspace creates a new channel in its place, and the fence counter is reset to 0.
- channel 0 reaches the NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL op and waits forever for the fence in channel 1 to signal.
Channel 0 could be the global drm channel used for buffer moves, which would result in a hang. This may seem unlikely, but I believe that parallel piglit runs could trigger it.
If not, the following sequence could trigger it: start an operation that takes a few seconds on channel 0, queue a command that uses a bo from channel 1 while channel 1 is still busy, then delete and recreate channel 1.
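
The sequence can also be replayed as a small userspace model, which is only a sketch under stated assumptions: `acquire_gequal` and `chan0_wait_passes` are hypothetical names, the semaphore compare is reduced to a plain unsigned `>=` with no wraparound handling, and each emitted fence is assumed to complete immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified ACQUIRE_GEQUAL: the waiter proceeds once the slot value has
 * reached the target. Assumption: plain unsigned compare, no wraparound.
 */
static bool acquire_gequal(uint32_t slot, uint32_t target)
{
	return slot >= target;
}

/*
 * Replay the race on channel 1's fence slot.
 *
 * done_before: fences channel 1 completes before it is deleted
 * done_after:  fences its replacement completes before channel 0
 *              reaches its wait
 * reset:       true  = slot is zeroed on creation (current behaviour)
 *              false = slot value is kept (behaviour with the patch)
 *
 * Returns whether channel 0's wait is satisfied.
 */
static bool chan0_wait_passes(uint32_t done_before, uint32_t done_after,
			      bool reset)
{
	uint32_t slot = 0, seq = 0, target, i;

	for (i = 0; i < done_before; i++)
		slot = ++seq;	/* channel 1 emits and completes a fence */

	target = seq;		/* channel 0 queues a wait on the last one */

	/* channel 1 is deleted and a new channel takes its place */
	if (reset) {
		slot = 0;
		seq = 0;
	} else {
		seq = slot;
	}

	for (i = 0; i < done_after; i++)
		slot = ++seq;	/* the replacement channel does some work */

	return acquire_gequal(slot, target);
}
```

With these assumptions, `chan0_wait_passes(7, 2, true)` is false, because the slot only reaches 2 while channel 0 waits for 7, and `chan0_wait_passes(7, 2, false)` is true. In the reset case the wait only passes once the replacement channel has completed at least as many fences as its predecessor, which may never happen.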
~Maarten
On Wed, Sep 4, 2013 at 10:37 PM, Maarten Lankhorst maarten.lankhorst@canonical.com wrote:
> On 04-09-13 05:21, Ben Skeggs wrote:
>> On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst maarten.lankhorst@canonical.com wrote:
>>> This slightly increases the chance that recovery from a lockup can happen successfully.
>> I'd *really* love to see proof of this. When channels die, all outstanding fences are marked as signalled. This should do absolutely nothing...
> nv84+ relies heavily on fences though, and a race like this is possible:
> - channel 0 uses a bo from channel 1 and queues a wait for it somewhere in the command stream.
> - channel 1 dies cleanly, but userspace creates a new channel in its place, and the fence counter is reset to 0.
> - channel 0 reaches the NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL op and waits forever for the fence in channel 1 to signal.
Ok, this isn't exactly the issue you implied in the commit message. But yes, this could possibly be an issue for sure. I don't think this is the right way to fix it however. I'll have a bit of a think on the problem and see what I can come up with.
Thanks, Ben.
> Channel 0 could be the global drm channel used for buffer moves, which would result in a hang. This may seem unlikely, but I believe that parallel piglit runs could trigger it.
> If not, the following sequence could trigger it: start an operation that takes a few seconds on channel 0, queue a command that uses a bo from channel 1 while channel 1 is still busy, then delete and recreate channel 1.
> ~Maarten