* Fernando Lopez-Lezcano | 2014-03-01 17:48:29 [-0800]:
On 02/23/2014 10:47 AM, Sebastian Andrzej Siewior wrote:
Dear RT folks!
I'm pleased to announce the v3.12.12-rt19 patch set.
Just hit this Oops in my desktop at home:
[22328.388996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[22328.389013] IP: [<ffffffffa011a912>] nouveau_fence_wait_uevent.isra.2+0x22/0x440 [nouveau]
This is
| static int
| nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)
|
| {
| 	struct nouveau_channel *chan = fence->channel;
| 	struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
and chan is NULL.
[22328.389046] RAX: 0000000000000000 RBX: ffff8807a68f8fa8 RCX: 0000000000000000
[22328.389046] RDX: 0000000000000001 RSI: ffff8807a68f8fb0 RDI: ffff8807a68f8fa8
[22328.389047] RBP: ffff8807c09bdca0 R08: 000000000000045e R09: 000000000000e200
[22328.389047] R10: ffffffffa0157d80 R11: ffff8807c09bdde0 R12: 0000000000000001
[22328.389047] R13: 0000000000000000 R14: ffff8807d8493a80 R15: ffff8807a68f8fb0
[22328.389053] Call Trace:
[22328.389069] [<ffffffffa011af56>] nouveau_fence_wait+0x86/0x1a0 [nouveau]
[22328.389081] [<ffffffffa011ca35>] nouveau_bo_fence_wait+0x15/0x20 [nouveau]
[22328.389084] [<ffffffffa00867c6>] ttm_bo_wait+0x96/0x1a0 [ttm]
[22328.389095] [<ffffffffa0121dac>] nouveau_gem_ioctl_cpu_prep+0x5c/0xf0 [nouveau]
[22328.389101] [<ffffffffa002cd42>] drm_ioctl+0x502/0x630 [drm]
[22328.389114] [<ffffffffa01180a1>] nouveau_drm_ioctl+0x51/0x90 [nouveau]
I can't find any kind of locking so my question is what ensures that chan is not set to NULL between nouveau_fence_done() and nouveau_fence_wait_uevent()? There are just a few opcodes in between but nothing that pauses nouveau_fence_signal().
Fernando, can you please check the patch below and test if the warning or the crash appears?
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -184,14 +184,20 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)

 {
 	struct nouveau_channel *chan = fence->channel;
-	struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
-	struct nouveau_fence_priv *priv = chan->drm->fence;
+	struct nouveau_fifo *pfifo;
+	struct nouveau_fence_priv *priv;
 	struct nouveau_fence_uevent uevent = {
 		.handler.func = nouveau_fence_wait_uevent_handler,
-		.priv = priv,
 	};
 	int ret = 0;

+	if (WARN_ON_ONCE(!chan))
+		return 0;
+
+	pfifo = nouveau_fifo(chan->drm->device);
+	priv = chan->drm->fence;
+	uevent.priv = priv;
+
 	nouveau_event_get(pfifo->uevent, 0, &uevent.handler);

 	if (fence->timeout) {
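
Read on its own, the guard in this patch boils down to: load fence->channel once into a local, test that copy, and only then dereference it. A minimal userspace sketch of that shape follows. The names struct channel, struct fence, device_id and fence_wait_guarded() are made-up stand-ins for illustration, not the nouveau types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real nouveau types. */
struct channel {
	int device_id;
};

struct fence {
	struct channel *channel;	/* assumed to become NULL once the fence is signalled */
};

/*
 * Read fence->channel once into a local and test that copy before any
 * dereference. Returns 0 when the pointer is already NULL, otherwise
 * the (made-up) device id reached through the channel.
 */
static int fence_wait_guarded(struct fence *fence)
{
	struct channel *chan = fence->channel;

	if (!chan)
		return 0;	/* nothing left to wait for */

	return chan->device_id;
}

/* Usage: a signalled fence is skipped, a pending one is dereferenced. */
static void demo_guard(void)
{
	struct channel ch = { .device_id = 7 };
	struct fence pending = { .channel = &ch };
	struct fence signalled = { .channel = NULL };

	assert(fence_wait_guarded(&pending) == 7);
	assert(fence_wait_guarded(&signalled) == 0);
}
```

This only covers the NULL case. It says nothing about whether the object behind a non-NULL pointer is still alive by the time it is used.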
-- Fernando
Sebastian
On 07-03-14 12:18, Sebastian Andrzej Siewior wrote:
I can't find any kind of locking so my question is what ensures that chan is not set to NULL between nouveau_fence_done() and nouveau_fence_wait_uevent()? There are just a few opcodes in between but nothing that pauses nouveau_fence_signal().
Absolutely nothing. :-) Worse still, there's no guarantee that channel isn't freed, but hopefully that is less likely to be an issue.
~Maarten
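
The window discussed above can be replayed deterministically on a single thread. The sketch below assumes, as the thread implies, that a fence counts as done once its channel pointer is NULL and that signalling clears that pointer. fence_done(), fence_signal() and replay_race() here are simplified hypothetical models, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real nouveau types. */
struct channel {
	int id;
};

struct fence {
	struct channel *channel;
};

/* Assumed model: a fence counts as done once its channel pointer is NULL. */
static bool fence_done(const struct fence *f)
{
	return f->channel == NULL;
}

/* Assumed model of the signalling side: completing the fence clears the pointer. */
static void fence_signal(struct fence *f)
{
	f->channel = NULL;
}

/*
 * Replays one unlucky ordering step by step on a single thread.
 * Returns true if the waiter ends up with a NULL channel even though
 * its earlier fence_done() check said the fence was still pending.
 */
static bool replay_race(void)
{
	struct channel ch = { .id = 1 };
	struct fence f = { .channel = &ch };

	bool done_before = fence_done(&f);	/* waiter: "still pending" */
	fence_signal(&f);			/* signaller runs in the gap */
	struct channel *chan = f.channel;	/* waiter: loads the pointer */

	return !done_before && chan == NULL;
}

/* Usage: the stale check and the NULL load coexist in this ordering. */
static void demo_race(void)
{
	assert(replay_race());
}
```

Nothing in this model orders the check against the signal, which is why the result of the check is already stale when the pointer is loaded.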
* Maarten Lankhorst | 2014-03-07 12:36:13 [+0100]:
I can't find any kind of locking so my question is what ensures that chan is not set to NULL between nouveau_fence_done() and nouveau_fence_wait_uevent()? There are just a few opcodes in between but nothing that pauses nouveau_fence_signal().
Absolutely nothing. :-) Worse still, there's no guarantee that channel isn't freed, but hopefully that is less likely to be an issue.
Okay, so I hit the correct spot. What do we do here? Do you want the patch I posted without the WARN_ON_ONCE(), or do you prefer to fix this in another way?
~Maarten
Sebastian
dri-devel@lists.freedesktop.org