On Tue, Feb 22, 2011 at 9:42 PM, Anca Emanuel anca.emanuel@gmail.com wrote:
General protection fault: http://i.imgur.com/TBJ6y.jpg
dmesg: http://pastebin.com/qD8pR8QH config: http://pastebin.com/XEurtHWi
That's drivers/video/fbmem.c: fb_release(), and the "Code:" disassembly shows that it is
  1b:  e8 f7 c0 29 00          callq  xyz
  20:  48 8b 93 b8 03 00 00    mov    0x3b8(%rbx),%rdx
  27:* 48 8b 42 10             mov    0x10(%rdx),%rax    <-- trapping instruction
which corresponds to
    mutex_lock(&info->lock);
    if (info->fbops->fb_release)
        info->fbops->fb_release(info,1);
so it looks like 'info->fbops' is invalid. It's in %rdx, and is 0x00d000ae00b500c2, which is definitely not a valid pointer. Looks like some bad corruption (looks like a sequence of 16-bit numbers, but it could be anything).
Looks like nouveaufb took over from vesafb. Did you do anything special to trigger this?
Also, you do seem to have some extra patches (yama at the least). Anything else?
Linus
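The reasoning about the bad pointer can be checked mechanically. The sketch below is an illustration only, not code from the thread: it assumes 48-bit virtual addressing on x86-64, and the helper names `is_canonical_48` and `word16` are hypothetical. It tests whether an address is canonical (bits 63..47 all equal) and splits a value into 16-bit words. On x86-64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, which fits the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses. An address is canonical only
 * if bits 63..47 are all zero or all one.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits that must agree */

    return top == 0 || top == 0x1ffff;
}

/* Return 16-bit word number idx of value (0 = least significant). */
static uint16_t word16(uint64_t value, unsigned int idx)
{
    assert(idx < 4);
    return (uint16_t)(value >> (16 * idx));
}
```

With the value from the report, `is_canonical_48(0x00d000ae00b500c2ULL)` is false, while a typical kernel pointer from the later oops such as `0xffff88007047d228` passes. Splitting the bad value with `word16` gives 0x00c2, 0x00b5, 0x00ae, 0x00d0, the "sequence of 16-bit numbers" mentioned above.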
On Wed, Feb 23, 2011 at 6:32 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
Looks like nouveaufb took over from vesafb. Did you do anything special to trigger this?
No. Just boot the system.
Also, you do seem to have some extra patches (yama at the least). Anything else?
I used git clone, nothing else. The first time, 2.6.38-rc6 worked. After an update from Ubuntu I get that error at boot.
The dmesg is from Ubuntu 11.04 running their kernel, which works fine.
On Wed, Feb 23, 2011 at 9:16 AM, Anca Emanuel anca.emanuel@gmail.com wrote:
Looks like nouveaufb took over from vesafb. Did you do anything special to trigger this?
No. Just boot the system.
Every boot?
And just out of interest, what happens if you don't have the vesafb driver at all?
Linus
On Thu, Feb 24, 2011 at 10:28 AM, Linus Torvalds torvalds@linux-foundation.org wrote:
Every boot?
And just out of interest, what happens if you don't have the vesafb driver at all?
I think this is a race condition somewhere, with plymouth getting access to vesafb before it gets kicked off the hw.
I'm assuming removing the vga= line from the command line will stop it.
Dave.
On Thu, Feb 24, 2011 at 2:28 AM, Linus Torvalds torvalds@linux-foundation.org wrote:
Every boot?
Yes.
And just out of interest, what happens if you don't have the vesafb driver at all?
I used 'e' option from grub, removed the 'set gfxpayload = $linux_gfx_mode' and it works.
dmesg: http://pastebin.com/JAZsk4vD
On Thu, Feb 24, 2011 at 5:20 AM, Anca Emanuel anca.emanuel@gmail.com wrote:
Every boot?
Yes.
And just out of interest, what happens if you don't have the vesafb driver at all?
I used 'e' option from grub, removed the 'set gfxpayload = $linux_gfx_mode' and it works.
dmesg: http://pastebin.com/JAZsk4vD
Hmm. So it definitely seems to be the hand-over.
Does this patch make any difference? When we unregister the old framebuffer, we still leave it in the registered_fb[] array, which looks wrong. But it would also be interesting to hear if setting CONFIG_SLUB_DEBUG_ON or CONFIG_DEBUG_PAGEALLOC makes any difference (they'd help detect accesses to free'd data structures).
Linus
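As background on why those debug options help: with slab poisoning enabled, the kernel fills freed objects with the byte 0x6b (`POISON_FREE`), so a pointer read from a freed structure shows up as 0x6b6b6b6b6b6b6b6b in an oops instead of as random-looking data. The userspace sketch below only imitates that effect. `struct fake_info`, `poison_object` and `ops_field_poisoned` are hypothetical names, and the code assumes pointers and `uintptr_t` have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b    /* byte value used by kernel slab poisoning */

struct fake_ops { int unused; };

/* Hypothetical stand-in for a structure that holds an ops pointer. */
struct fake_info {
    struct fake_ops *ops;
};

/* Imitate what a poisoning allocator does to an object when it is freed. */
static void poison_object(void *obj, size_t size)
{
    assert(obj != NULL);
    memset(obj, POISON_FREE, size);
}

/* Report whether the ops field now holds the all-0x6b poison pattern. */
static int ops_field_poisoned(const struct fake_info *info)
{
    uintptr_t raw, pattern;

    memcpy(&raw, &info->ops, sizeof(raw));
    memset(&pattern, POISON_FREE, sizeof(pattern));
    return raw == pattern;
}
```

A stale user that reads `info->ops` after `poison_object()` has run sees the poison pattern, which makes a use-after-free recognizable at a glance in a register dump.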
On Thu, Feb 24, 2011 at 6:37 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
 drivers/video/fbmem.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index e2bf953..e8f8925 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct apertures_struct *a,
 				"%s vs %s - removing generic driver\n",
 				name, registered_fb[i]->fix.id);
 			unregister_framebuffer(registered_fb[i]);
+			registered_fb[i] = NULL;
 		}
 	}
 }
Tested the patch, and now I get this: dmesg: http://pastebin.com/ieMNrA7C
[ 12.252328] BUG: unable to handle kernel NULL pointer dereference at 00000000000003b8
[ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
[ 12.252354] PGD 78e6c067 PUD 78e6d067 PMD 0
[ 12.252360] Oops: 0000 [#1] SMP
[ 12.252364] last sysfs file: /sys/module/snd/initstate
[ 12.252370] CPU 0
[ 12.252372] Modules linked in: nouveau(+) snd ttm drm_kms_helper psmouse serio_raw drm soundcore lp snd_page_alloc i2c_algo_bit video parport pata_marvell ahci r8169 libahci
[ 12.252393]
[ 12.252397] Pid: 244, comm: plymouthd Not tainted 2.6.38-rc6-git3-patch-linus+ #2 MICRO-STAR INTERNATIONAL CO.,LTD MS-7360/MS-7360
[ 12.252407] RIP: 0010:[<ffffffff81311178>] [<ffffffff81311178>] fb_mmap+0x58/0x1d0
[ 12.252414] RSP: 0018:ffff880078e8fd88 EFLAGS: 00010293
[ 12.252418] RAX: 00000000ffffffea RBX: ffff88007047d228 RCX: 0000000000000000
[ 12.252423] RDX: 000fffffffffffff RSI: ffff88007047d228 RDI: ffff880078f5d840
[ 12.252428] RBP: ffff880078e8fdc8 R08: 0000000000000000 R09: ffff88007047d228
[ 12.252432] R10: ffff88006f9d9cf0 R11: ffff88006f9d9d28 R12: ffff880037363800
[ 12.252437] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88007047d228
[ 12.252442] FS: 00007fb5fbaa4720(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 12.252448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.252453] CR2: 00000000000003b8 CR3: 0000000078e6b000 CR4: 00000000000006f0
[ 12.252458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 12.252463] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 12.252468] Process plymouthd (pid: 244, threadinfo ffff880078e8e000, task ffff88003737ad80)
[ 12.252473] Stack:
[ 12.252476] ffff880037363800 00000000000000b8 ffff880078e8fdd8 ffffffffffffffea
[ 12.252484] ffff880037363800 00000000000006bb 00000000006bb000 ffff88007047d228
[ 12.252491] ffff880078e8fe98 ffffffff81130543 ffff880078f5d840 0000000000000000
[ 12.252499] Call Trace:
[ 12.252507] [<ffffffff81130543>] mmap_region+0x3c3/0x500
[ 12.252514] [<ffffffff81010d7e>] ? arch_get_unmapped_area_topdown+0x1ce/0x2f0
[ 12.252521] [<ffffffff811309c4>] do_mmap_pgoff+0x344/0x380
[ 12.252528] [<ffffffff810524f1>] ? finish_task_switch+0x41/0xe0
[ 12.252535] [<ffffffff815ac0c3>] ? schedule+0x403/0xa00
[ 12.252541] [<ffffffff81130bfe>] sys_mmap_pgoff+0x1fe/0x230
[ 12.252546] [<ffffffff810108c9>] sys_mmap+0x29/0x30
[ 12.252551] [<ffffffff8100bf02>] system_call_fastpath+0x16/0x1b
[ 12.252556] Code: ba ff ff ff ff ff ff 0f 00 48 89 f3 48 8b 40 30 8b 80 b8 00 00 00 25 ff ff 0f 00 49 39 d6 4c 8b 2c c5 c0 cf aa 81 b8 ea ff ff ff <4d> 8b bd b8 03 00 00 76 1f 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8
[ 12.252603] RIP [<ffffffff81311178>] fb_mmap+0x58/0x1d0
[ 12.252608] RSP <ffff880078e8fd88>
[ 12.252611] CR2: 00000000000003b8
[ 12.252616] ---[ end trace 381165bafe65d748 ]---
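The faulting address 0x3b8 matches the displacement in the earlier disassembly, where `mov 0x3b8(%rbx),%rdx` loaded `info->fbops`. That is consistent with `fb_mmap()` reading `fbops` through a NULL `info` taken from the now-cleared array slot. The sketch below shows the arithmetic only. `struct fake_fb_info` is a hypothetical stand-in padded so that its `fbops` member sits at 0x3b8 as in this particular build; the real `struct fb_info` layout depends on kernel version and config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fake_fb_ops;

/*
 * Hypothetical stand-in for struct fb_info. Only the offset of 'fbops'
 * matters here; the padding is chosen to reproduce the 0x3b8 seen in
 * the disassembly and the oops.
 */
struct fake_fb_info {
    unsigned char other_fields[0x3b8];
    struct fake_fb_ops *fbops;
};

/* Address the CPU touches when it loads base->fbops. */
static uintptr_t fbops_load_address(uintptr_t base)
{
    assert(offsetof(struct fake_fb_info, fbops) == 0x3b8);
    return base + offsetof(struct fake_fb_info, fbops);
}
```

With a NULL base the load address is exactly 0x3b8, the value reported in CR2.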
On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel anca.emanuel@gmail.com wrote:
Tested the patch, and now I get this: dmesg: http://pastebin.com/ieMNrA7C
[ 12.252328] BUG: unable to handle kernel NULL pointer dereference at 00000000000003b8 [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
Ok, goodie.
Or not so goodie, but it does make it clear that yeah, the fb code seems to be using stale pointers from that registered_fb[] array, and the whole unregistration process is just racing with people using it.
Herton had that much bigger patch, can you test it?
Linus
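One common way to close this kind of race is to make every lookup in the array take a reference under a lock, so that unregistering clears the slot but the object is only released when the last user drops it. The userspace sketch below illustrates that pattern under stated assumptions. It is not the patch discussed in this thread, and all names (`fake_fb`, `fake_get`, `fake_put`, `fake_register`, `fake_unregister`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SLOTS 32

struct fake_fb {
    int refcount;   /* protected by registry_lock */
    int freed;      /* set when the last reference goes; stands in for kfree() */
};

static struct fake_fb *registry[MAX_SLOTS];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Publish fb in slot idx; the registry itself holds one reference. */
static void fake_register(int idx, struct fake_fb *fb)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    pthread_mutex_lock(&registry_lock);
    fb->refcount = 1;
    fb->freed = 0;
    registry[idx] = fb;
    pthread_mutex_unlock(&registry_lock);
}

/* Look up slot idx and pin the object; returns NULL if the slot is empty. */
static struct fake_fb *fake_get(int idx)
{
    struct fake_fb *fb;

    assert(idx >= 0 && idx < MAX_SLOTS);
    pthread_mutex_lock(&registry_lock);
    fb = registry[idx];
    if (fb)
        fb->refcount++;
    pthread_mutex_unlock(&registry_lock);
    return fb;
}

/* Drop one reference; the object is released when the count hits zero. */
static void fake_put(struct fake_fb *fb)
{
    pthread_mutex_lock(&registry_lock);
    if (--fb->refcount == 0)
        fb->freed = 1;
    pthread_mutex_unlock(&registry_lock);
}

/* Clear the slot so new lookups fail, then drop the registry's reference. */
static void fake_unregister(int idx)
{
    struct fake_fb *fb;

    assert(idx >= 0 && idx < MAX_SLOTS);
    pthread_mutex_lock(&registry_lock);
    fb = registry[idx];
    registry[idx] = NULL;
    pthread_mutex_unlock(&registry_lock);
    if (fb)
        fake_put(fb);
}
```

A user that called `fake_get()` before `fake_unregister()` keeps a valid object until its own `fake_put()`, while a user arriving afterwards gets NULL and has to handle it. That NULL check is what the one-line patch above lacked on the reader side.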
On Thu, 2011-02-24 at 16:54 -0800, Linus Torvalds wrote:
Herton had that much bigger patch, can you test it?
I think Andy's patch worked. Not sure why it fell between the cracks; either it didn't appear on lkml or it never reached my inbox.
If we can get Herton to repost it properly with a Tested-by, I'm happy for it to go in.
Dave.
On Fri, Feb 25, 2011 at 3:14 AM, Dave Airlie airlied@redhat.com wrote:
I think Andy's patch worked, not sure why it fell between the cracks, either didn't appear on lkml or in my inbox at all.
if we can get Herton to repost it properly + a tested by I'm happy for it to go in.
Dave.
Tested Andy's patch and it works ! http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f7...
Tested-by: Anca Emanuel anca.emanuel@gmail.com
On Fri, Feb 25, 2011 at 3:47 AM, Anca Emanuel anca.emanuel@gmail.com wrote:
Tested Andy's patch and it works ! http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f7...
Tested-by: Anca Emanuel anca.emanuel@gmail.com
link to patch: http://is.gd/otIfGc
On Fri, Feb 25, 2011 at 03:56:20AM +0200, Anca Emanuel wrote:
link to patch: http://is.gd/otIfGc
Adding Andy on CC (btw he is away today, so he may take some time to answer).
Andy, can you repost the patch?
-- []'s Herton
On Fri, Feb 25, 2011 at 11:49:21AM -0300, Herton Ronaldo Krzesinski wrote:
Adding Andy on CC (btw he is away for today, may get some time to answer).
Andy, can you repost the patch?
This is the first I've seen the patch as well, but fortunately patchwork caught it on the Cc.
There's also an outstanding patch for fixing an AB-BA deadlock between the fb_info lock and the console lock which this will clash with. I'm happy to rework that patch on top of Andy's patch for Anca and/or Herton to test, though.
I'll need to do some more testing locally as well..
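For readers unfamiliar with the term: an AB-BA deadlock arises when one path takes lock A then B while another takes B then A, so each can end up holding one lock and waiting forever for the other. The usual cure is a single agreed acquisition order. The sketch below illustrates that rule with two userspace mutexes. The names are hypothetical, and the order chosen (console lock first) is an assumption for the example, not a statement about what the outstanding patch does.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the console lock and a per-framebuffer lock. */
static pthread_mutex_t fake_console_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fake_info_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_state;    /* protected by both locks in this sketch */

/* The one permitted order: console lock first, then the info lock. */
static void lock_both(void)
{
    int rc = pthread_mutex_lock(&fake_console_lock);

    assert(rc == 0);
    rc = pthread_mutex_lock(&fake_info_lock);
    assert(rc == 0);
    (void)rc;
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
    pthread_mutex_unlock(&fake_info_lock);
    pthread_mutex_unlock(&fake_console_lock);
}

/*
 * Two code paths that each need both locks. Because both go through
 * lock_both(), neither can hold one lock while waiting for the other
 * in the opposite order.
 */
static void path_from_console(void)
{
    lock_both();
    shared_state += 1;
    unlock_both();
}

static void path_from_fb(void)
{
    lock_both();
    shared_state += 10;
    unlock_both();
}
```

Running both paths from different threads is then safe against this particular deadlock, whichever order was picked, as long as every path in the code base honours it.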
dri-devel@lists.freedesktop.org