On Tue, Mar 19, 2013 at 10:03 AM, Chris Wilson chris@chris-wilson.co.uk wrote:
How about just using: if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0; and the existing wait loop?
I explicitly wanted to avoid touching GMBUS4 register, as the real cause of the failure is not clear.
But, as Yinghai Lu points out, the problem is most likely caused by interrupt disabling not working properly (see his very good point regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out should work .... and it indeed does in my case, hence the (tested) patch below.
I think it's a 3.9-rc material, and I am all open to debug this further for 3.10 so that the race is closed and gmbus irqs can be used on Gen4 platform properly.
Agreed. Using the IRQ for GMBUS is just a performance feature that can be deferred until after we determine the root cause - and hope that the failure is somehow peculiar to GMBUS.
Ok, I've merged this patch. But some further investigation points at a much more severe dragon hiding here: The MSI interrupt for the intel gfx is commonly in the 40+ range, but the interrupt vector with the spurious interrupts is 16. Which is the irq of the intel gfx when MSI is disabled!
So it looks like gmbus on the intel gfx is capable of generating non-MSI interrupts in parallel to the MSI interrupts (since apparently gmbus still works, so we get the interrupts we expect). I have no idea how that could happen. Hence adding a bunch of people with more clue than me.
For reference below the updated commit message.
Cheers, Daniel
Author: Jiri Kosina jkosina@suse.cz Date: Tue Mar 19 09:56:57 2013 +0100
drm/i915: stop using GMBUS IRQs on Gen4 chips
Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above.
It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued.
Typically this results in the spurious interrupt source getting disabled:
[ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option) [ 9.637915] Pid: 4157, comm: ifup Tainted: GF 3.9.0-rc2-00341-g0863702 #422 [ 9.639484] Call Trace: [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7 [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8 [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214 [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c [ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0 [ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24 [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a [ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b [ 9.640731] handlers: [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore] [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket] [ 9.640731] Disabling IRQ #16
The really curious thing is now that irq 16 is _not_ the interrupt for the i915 driver when using MSI, but it _is_ the interrupt when not using MSI. So by all indications it seems like gmbus is able to generate a legacy (shared) interrupt in MSI mode on some configurations. I've tried to reproduce this and the differentiating thing seems to be that on unaffected systems no other device uses irq 16 (which seems to be the non-MSI intel gfx interrupt on all gm45).
I have no idea how that even can happen.
To avoid tempting this elephant into a rage, just disable gmbus interrupt support on gen 4.
v2: Improve the commit message with exact details of what's going on. Also add a comment in the code to warn against this particular elephant in the room.
Signed-off-by: Jiri Kosina jkosina@suse.cz (v1) Acked-by: Chris Wilson chris@chris-wilson.co.uk (v1) References: https://lkml.org/lkml/2013/3/8/325 Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
On Tue, 19 Mar 2013, Daniel Vetter wrote:
For reference below the updated commit message.
Cheers, Daniel
Author: Jiri Kosina jkosina@suse.cz Date: Tue Mar 19 09:56:57 2013 +0100
drm/i915: stop using GMBUS IRQs on Gen4 chips Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above. It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued. Typically this results in the spurious interrupt source getting disabled: [ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option) [ 9.637915] Pid: 4157, comm: ifup Tainted: GF
3.9.0-rc2-00341-g0863702 #422 [ 9.639484] Call Trace: [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7 [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8 [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214 [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c [ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0 [ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24 [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a [ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b [ 9.640731] handlers: [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore] [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket] [ 9.640731] Disabling IRQ #16
The really curious thing is now that irq 16 is _not_ the interrupt for the i915 driver when using MSI, but it _is_ the interrupt when not using MSI. So by all indications it seems like gmbus is able to generate a legacy (shared) interrupt in MSI mode on some configurations. I've tried to reproduce this and the differentiating thing seems to be that on unaffected systems no other device uses irq 16 (which seems to be the non-MSI intel gfx interrupt on all gm45).
That might be misleading. It's possible that the erroneous IRQs _are_ being issued but you're simply not aware of them. If the kernel thinks that no device is using IRQ 16 then it will leave that IRQ disabled.
Alan Stern
On Tue, Mar 19, 2013 at 4:38 PM, Alan Stern stern@rowland.harvard.edu wrote:
On Tue, 19 Mar 2013, Daniel Vetter wrote:
For reference below the updated commit message.
Cheers, Daniel
Author: Jiri Kosina jkosina@suse.cz Date: Tue Mar 19 09:56:57 2013 +0100
drm/i915: stop using GMBUS IRQs on Gen4 chips Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above. It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued. Typically this results in the spurious interrupt source getting disabled: [ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option) [ 9.637915] Pid: 4157, comm: ifup Tainted: GF
3.9.0-rc2-00341-g0863702 #422 [ 9.639484] Call Trace: [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7 [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8 [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214 [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c [ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0 [ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24 [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a [ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b [ 9.640731] handlers: [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore] [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket] [ 9.640731] Disabling IRQ #16
The really curious thing is now that irq 16 is _not_ the interrupt for the i915 driver when using MSI, but it _is_ the interrupt when not using MSI. So by all indications it seems like gmbus is able to generate a legacy (shared) interrupt in MSI mode on some configurations. I've tried to reproduce this and the differentiating thing seems to be that on unaffected systems no other device uses irq 16 (which seems to be the non-MSI intel gfx interrupt on all gm45).
That might be misleading. It's possible that the erroneous IRQs _are_ being issued but you're simply not aware of them. If the kernel thinks that no device is using IRQ 16 then it will leave that IRQ disabled.
I guess I should have phrased it more precisely, but that's exactly what I expect is happening on my machine: I don't have anything on irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence the irq is completely disabled. Which obviously makes it impossible for me to reproduce the issue. To test that theory, is there a quick way to force-enable a given interrupt, short of just hacking up a 2nd dummy irq handler in my driver? -Daniel
On Tue, 19 Mar 2013, Daniel Vetter wrote:
That might be misleading. It's possible that the erroneous IRQs _are_ being issued but you're simply not aware of them. If the kernel thinks that no device is using IRQ 16 then it will leave that IRQ disabled.
I guess I should have phrased it more precisely, but that's exactly what I expect is happening on my machine: I don't have anything on irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence the irq is completely disabled. Which obviously makes it impossible for me to reproduce the issue. To test that theory, is there a quick way to force-enable a given interrupt, short of just hacking up a 2nd dummy irq handler in my driver?
I don't know of any way. In fact, I have been thinking of writing a test driver module, with a module parameter telling it which IRQ number to register for. It seems like the sort of thing that would be useful to have, from time to time.
Alan Stern
On Tue, 19 Mar 2013, Alan Stern wrote:
That might be misleading. It's possible that the erroneous IRQs _are_ being issued but you're simply not aware of them. If the kernel thinks that no device is using IRQ 16 then it will leave that IRQ disabled.
I guess I should have phrased it more precisely, but that's exactly what I expect is happening on my machine: I don't have anything on irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence the irq is completely disabled. Which obviously makes it impossible for me to reproduce the issue. To test that theory, is there a quick way to force-enable a given interrupt, short of just hacking up a 2nd dummy irq handler in my driver?
I don't know of any way. In fact, I have been thinking of writing a test driver module, with a module parameter telling it which IRQ number to register for. It seems like the sort of thing that would be useful to have, from time to time.
Ok, so how about this? Daniel, is it enough to make the problem appear on your system (by building this into the kernel and booting with dummy-irq.irq=16)?
Thanks.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz --- drivers/misc/Kconfig | 8 ++++++ drivers/misc/Makefile | 1 + drivers/misc/dummy-irq.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 68 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/dummy-irq.c
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index e83fdfe..db24b79 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -93,6 +93,14 @@ config ATMEL_TCB_CLKSRC_BLOCK TC can be used for other purposes, such as PWM generation and interval timing.
+config DUMMY_IRQ + tristate "Dummy IRQ handler" + default n + ---help--- + This module accepts a single 'irq' parameter, which it should register for. + Its sole purpose is to help with debugging of IRQ sharing problems, by + force-enabling IRQ that would otherwise be disabled. + config IBM_ASM tristate "Device driver for IBM RSA service processor" depends on X86 && PCI && INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 35a1463..28ff261 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -13,6 +13,7 @@ obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o obj-$(CONFIG_BMP085) += bmp085.o obj-$(CONFIG_BMP085_I2C) += bmp085-i2c.o obj-$(CONFIG_BMP085_SPI) += bmp085-spi.o +obj-$(CONFIG_DUMMY_IRQ) += dummy-irq.o obj-$(CONFIG_ICS932S401) += ics932s401.o obj-$(CONFIG_LKDTM) += lkdtm.o obj-$(CONFIG_TIFM_CORE) += tifm_core.o diff --git a/drivers/misc/dummy-irq.c b/drivers/misc/dummy-irq.c new file mode 100644 index 0000000..4fc13e0 --- /dev/null +++ b/drivers/misc/dummy-irq.c @@ -0,0 +1,59 @@ +/* + * Dummy IRQ handler driver. + * + * This module only registers itself as a handler that is specified to it + * by the 'irq' parameter. + * + * The sole purpose of this module is to help with debugging of systems on + * which spurious IRQs cause the IRQ to be disabled. + * + * Copyright (C) 2013 Jiri Kosina + */ + +/* + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ +#include <linux/module.h> +#include <linux/irq.h> +#include <linux/interrupt.h> + +static int irq; + +static irqreturn_t dummy_interrupt(int irq, void *dev_id) +{ + static int count = 0; + + if (count == 0) { + printk("dummy-irq: interrupt occured on IRQ %d\n", irq); + count++; + } + + return IRQ_NONE; +} + +static int __init dummy_irq_init(void) +{ + if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) { + printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq); + return -EIO; + } + printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq); + return 0; +} + +static void __exit dummy_irq_exit(void) +{ + printk(KERN_INFO "dummy-irq unloaded\n"); + free_irq(irq, &irq); + return; +} + +module_init(dummy_irq_init); +module_exit(dummy_irq_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Jiri Kosina"); +module_param_named(irq, irq, uint, 0444); +MODULE_PARM_DESC(irq, "The IRQ to register for");
On Wed, 20 Mar 2013, Jiri Kosina wrote:
I don't know of any way. In fact, I have been thinking of writing a test driver module, with a module parameter telling it which IRQ number to register for. It seems like the sort of thing that would be useful to have, from time to time.
Ok, so how about this? Daniel, is it enough to make the problem appear on your system (by building this into the kernel and booting with dummy-irq.irq=16)?
Thanks.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz
This is just what I was thinking of. Three extremely minor suggestions...
+static irqreturn_t dummy_interrupt(int irq, void *dev_id) +{
- static int count = 0;
- if (count == 0) {
printk("dummy-irq: interrupt occured on IRQ %d\n", irq);
You probably should put a severity level here. KERN_INFO?
count++;
- }
- return IRQ_NONE;
+}
+static int __init dummy_irq_init(void) +{
- if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) {
printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq);
return -EIO;
- }
- printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq);
- return 0;
+}
+static void __exit dummy_irq_exit(void) +{
- printk(KERN_INFO "dummy-irq unloaded\n");
- free_irq(irq, &irq);
- return;
A return statement isn't needed here.
+}
+module_init(dummy_irq_init); +module_exit(dummy_irq_exit);
+MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Jiri Kosina"); +module_param_named(irq, irq, uint, 0444);
module_param is good enough when the parameter's name is the same as the variable's name.
+MODULE_PARM_DESC(irq, "The IRQ to register for");
Alan Stern
On Wed, 20 Mar 2013, Alan Stern wrote:
Ok, so how about this? Daniel, is it enough to make the problem appear on your system (by building this into the kernel and booting with dummy-irq.irq=16)?
Thanks.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz
This is just what I was thinking of. Three extremely minor suggestions...
Thanks Alan. Updated version below.
Greg, willing to merge this simple debugging facility?
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz --- drivers/misc/Kconfig | 8 ++++++ drivers/misc/Makefile | 1 + drivers/misc/dummy-irq.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 68 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/dummy-irq.c
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index e83fdfe..69bb79d 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -93,6 +93,14 @@ config ATMEL_TCB_CLKSRC_BLOCK TC can be used for other purposes, such as PWM generation and interval timing.
+config DUMMY_IRQ + tristate "Dummy IRQ handler" + default n + ---help--- + This module accepts a single 'irq' parameter, which it should register for. + The sole purpose of this module is to help with debugging of systems on + which spurious IRQs would happen on disabled IRQ vector. + config IBM_ASM tristate "Device driver for IBM RSA service processor" depends on X86 && PCI && INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 35a1463..28ff261 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -13,6 +13,7 @@ obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o obj-$(CONFIG_BMP085) += bmp085.o obj-$(CONFIG_BMP085_I2C) += bmp085-i2c.o obj-$(CONFIG_BMP085_SPI) += bmp085-spi.o +obj-$(CONFIG_DUMMY_IRQ) += dummy-irq.o obj-$(CONFIG_ICS932S401) += ics932s401.o obj-$(CONFIG_LKDTM) += lkdtm.o obj-$(CONFIG_TIFM_CORE) += tifm_core.o diff --git a/drivers/misc/dummy-irq.c b/drivers/misc/dummy-irq.c new file mode 100644 index 0000000..7014167 --- /dev/null +++ b/drivers/misc/dummy-irq.c @@ -0,0 +1,59 @@ +/* + * Dummy IRQ handler driver. + * + * This module only registers itself as a handler that is specified to it + * by the 'irq' parameter. + * + * The sole purpose of this module is to help with debugging of systems on + * which spurious IRQs would happen on disabled IRQ vector. + * + * Copyright (C) 2013 Jiri Kosina + */ + +/* + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ +#include <linux/module.h> +#include <linux/irq.h> +#include <linux/interrupt.h> + +static int irq; + +static irqreturn_t dummy_interrupt(int irq, void *dev_id) +{ + static int count = 0; + + if (count == 0) { + printk(KERN_INFO "dummy-irq: interrupt occured on IRQ %d\n", + irq); + count++; + } + + return IRQ_NONE; +} + +static int __init dummy_irq_init(void) +{ + if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) { + printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq); + return -EIO; + } + printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq); + return 0; +} + +static void __exit dummy_irq_exit(void) +{ + printk(KERN_INFO "dummy-irq unloaded\n"); + free_irq(irq, &irq); +} + +module_init(dummy_irq_init); +module_exit(dummy_irq_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Jiri Kosina"); +module_param(irq, uint, 0444); +MODULE_PARM_DESC(irq, "The IRQ to register for");
On Thu, Mar 21, 2013 at 12:21:21AM +0100, Jiri Kosina wrote:
On Wed, 20 Mar 2013, Alan Stern wrote:
Ok, so how about this? Daniel, is it enough to make the problem appear on your system (by building this into the kernel and booting with dummy-irq.irq=16)?
Thanks.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz
This is just what I was thinking of. Three extremely minor suggestions...
Thanks Alan. Updated version below.
Greg, willing to merge this simple debugging facility?
Yes, I'll take it, give me a few days to catch up with my pending queue, don't worry, it's not lost.
greg k-h
On Thu, Mar 21, 2013 at 12:21:21AM +0100, Jiri Kosina wrote:
On Wed, 20 Mar 2013, Alan Stern wrote:
Ok, so how about this? Daniel, is it enough to make the problem appear on your system (by building this into the kernel and booting with dummy-irq.irq=16)?
Thanks.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz
This is just what I was thinking of. Three extremely minor suggestions...
Thanks Alan. Updated version below.
Greg, willing to merge this simple debugging facility?
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
This module accepts a single 'irq' parameter, which it should register for.
Its sole purpose is to help with debugging of IRQ sharing problems, by force-enabling IRQ that would otherwise be disabled.
Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz
Indeed, this is pretty useful and allowed me to quickly reproduce that phantom irq on my gm45. Thanks to module reloading we can even reset the kernel's irq disabling logic and so test different tricks quickly without rebooting. Really useful.
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
Thanks, Daniel
drivers/misc/Kconfig | 8 ++++++ drivers/misc/Makefile | 1 + drivers/misc/dummy-irq.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 68 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/dummy-irq.c
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index e83fdfe..69bb79d 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -93,6 +93,14 @@ config ATMEL_TCB_CLKSRC_BLOCK TC can be used for other purposes, such as PWM generation and interval timing.
+config DUMMY_IRQ
- tristate "Dummy IRQ handler"
- default n
- ---help---
This module accepts a single 'irq' parameter, which it should register for.
The sole purpose of this module is to help with debugging of systems on
which spurious IRQs would happen on disabled IRQ vector.
config IBM_ASM tristate "Device driver for IBM RSA service processor" depends on X86 && PCI && INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 35a1463..28ff261 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -13,6 +13,7 @@ obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o obj-$(CONFIG_BMP085) += bmp085.o obj-$(CONFIG_BMP085_I2C) += bmp085-i2c.o obj-$(CONFIG_BMP085_SPI) += bmp085-spi.o +obj-$(CONFIG_DUMMY_IRQ) += dummy-irq.o obj-$(CONFIG_ICS932S401) += ics932s401.o obj-$(CONFIG_LKDTM) += lkdtm.o obj-$(CONFIG_TIFM_CORE) += tifm_core.o diff --git a/drivers/misc/dummy-irq.c b/drivers/misc/dummy-irq.c new file mode 100644 index 0000000..7014167 --- /dev/null +++ b/drivers/misc/dummy-irq.c @@ -0,0 +1,59 @@ +/*
- Dummy IRQ handler driver.
- This module only registers itself as a handler that is specified to it
- by the 'irq' parameter.
- The sole purpose of this module is to help with debugging of systems on
- which spurious IRQs would happen on disabled IRQ vector.
- Copyright (C) 2013 Jiri Kosina
- */
+/*
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as published by
- the Free Software Foundation.
- */
+#include <linux/module.h> +#include <linux/irq.h> +#include <linux/interrupt.h>
+static int irq;
+static irqreturn_t dummy_interrupt(int irq, void *dev_id) +{
- static int count = 0;
- if (count == 0) {
printk(KERN_INFO "dummy-irq: interrupt occured on IRQ %d\n",
irq);
count++;
- }
- return IRQ_NONE;
+}
+static int __init dummy_irq_init(void) +{
- if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) {
printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq);
return -EIO;
- }
- printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq);
- return 0;
+}
+static void __exit dummy_irq_exit(void) +{
- printk(KERN_INFO "dummy-irq unloaded\n");
- free_irq(irq, &irq);
+}
+module_init(dummy_irq_init); +module_exit(dummy_irq_exit);
+MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Jiri Kosina"); +module_param(irq, uint, 0444); +MODULE_PARM_DESC(irq, "The IRQ to register for");
-- Jiri Kosina SUSE Labs
On Thu, 21 Mar 2013, Daniel Vetter wrote:
Indeed, this is pretty useful and allowed me to quickly reproduce that phantom irq on my gm45. Thanks to module reloading we can even reset the kernel's irq disabling logic and so test different tricks quickly without rebooting. Really useful.
Daniel,
out of curiosity, have you been able to make some sense of the phantom legacy IRQs on GM45 systems, or are we just staying with my original bandaid (disabling GMBUS IRQs), declaring GM45 broken in this respect?
Thanks,
On Sun, Mar 31, 2013 at 9:55 PM, Jiri Kosina jkosina@suse.cz wrote:
On Thu, 21 Mar 2013, Daniel Vetter wrote:
Indeed, this is pretty useful and allowed me to quickly reproduce that phantom irq on my gm45. Thanks to module reloading we can even reset the kernel's irq disabling logic and so test different tricks quickly without rebooting. Really useful.
Daniel,
out of curiosity, have you been able to make some sense of the phantom legacy IRQs on GM45 systems, or are we just staying with my original bandaid (disabling GMBUS IRQs), declaring GM45 broken in this respect?
I've played around with dummy-irq on my own gm45 and afaict every gmbus interrupt generates both an msi and a legacy pci interrupt when msi is enabled. Everything else (= other interrupt sources) seems to work as expected, and disabling msi also papers over the issue. There's also scary comments in our gm45 irq handler that we need msi to paper over some other races.
So I've decided that I don't want to dwell any longer in this particular dungeon and that your bandaid (of just disabling this mess) is the real fix.
Thanks, Daniel
On Tue, Mar 19, 2013 at 9:54 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
I guess I should have phrased it more precisely, but that's exactly what I expect is happening on my machine: I don't have anything on irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence the irq is completely disabled. Which obviously makes it impossible for me to reproduce the issue. To test that theory, is there a quick way to force-enable a given interrupt, short of just hacking up a 2nd dummy irq handler in my driver?
You may try to add another request_irq() after i915_load_modeset_init==>drm_irq_install. That could install one dummy action for ioapic irq for i915.
Also you may need to add one quirk that does not disable intx during msi enabling like: DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2e22, quirk_msi_intx_disable_bug);
Thanks
Yinghai
On Tue, 19 Mar 2013, Yinghai Lu wrote:
I guess I should have phrased it more precisely, but that's exactly what I expect is happening on my machine: I don't have anything on irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence the irq is completely disabled. Which obviously makes it impossible for me to reproduce the issue. To test that theory, is there a quick way to force-enable a given interrupt, short of just hacking up a 2nd dummy irq handler in my driver?
You may try to add another request_irq() after i915_load_modeset_init==>drm_irq_install. That could install one dummy action for ioapic irq for i915.
Also you may need to add one quirk that does not disable intx during msi enabling like: DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2e22, quirk_msi_intx_disable_bug);
This seemed to be really promising idea to me, as the DisINTx+ vs INTx+ discrepancy is very good hint, but unfortunately, after applying this:
--- drivers/pci/quirks.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 0369fb6..8508e24 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -2643,6 +2643,9 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, 0x1073, quirk_msi_intx_disable_bug); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, 0x1083, quirk_msi_intx_disable_bug); + +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2a42, + quirk_msi_intx_disable_bug); #endif /* CONFIG_PCI_MSI */
/* Allow manual resource allocation for PCI hotplug bridges
The problem is still there ... so the inconsistency between DisINTx+ and INTx+ is unfortunately not the whole story.
On Tuesday, March 19, 2013 04:12:18 PM Daniel Vetter wrote:
On Tue, Mar 19, 2013 at 10:03 AM, Chris Wilson chris@chris-wilson.co.uk
wrote:
How about just using: if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0;
and the existing wait loop?
I explicitly wanted to avoid touching GMBUS4 register, as the real cause of the failure is not clear.
But, as Yinghai Lu points out, the problem is most likely caused by interrupt disabling not working properly (see his very good point regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out should work .... and it indeed does in my case, hence the (tested) patch below.
I think it's a 3.9-rc material, and I am all open to debug this further for 3.10 so that the race is closed and gmbus irqs can be used on Gen4 platform properly.
Agreed. Using the IRQ for GMBUS is just a performance feature that can be deferred until after we determine the root cause - and hope that the failure is somehow peculiar to GMBUS.
Ok, I've merged this patch. But some further investigation points at a much more severe dragon hiding here: The MSI interrupt for the intel gfx is commonly in the 40+ range, but the interrupt vector with the spurious interrupts is 16. Which is the irq of the intel gfx when MSI is disabled!
So it looks like gmbus on the intel gfx is capable of generating non-MSI interrupts in parallel to the MSI interrupts (since apparently gmbus still works, so we get the interrupts we expect). I have no idea how that could happen. Hence adding a bunch of people with more clue than me.
Hello folks,
I am using Linus git master and built an rpm for 3.9.0-rc4 which has Jiri's patch. I confirm this patch returns the GMA 4500 to working behavior as in 3.8.
Thanks everyone. Shawn
For reference below the updated commit message.
Cheers, Daniel
Author: Jiri Kosina jkosina@suse.cz Date: Tue Mar 19 09:56:57 2013 +0100
drm/i915: stop using GMBUS IRQs on Gen4 chips Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above. It turns out though that on many systems this leads to spurious
interrupts being generated, long after the register write to disable the IRQs has been issued.
Typically this results in the spurious interrupt source getting disabled: [ 9.636345] irq 16: nobody cared (try booting with the "irqpoll"
option) [ 9.637915] Pid: 4157, comm: ifup Tainted: GF 3.9.0-rc2-00341-g0863702 #422 [ 9.639484] Call Trace: [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7 [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8 [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214 [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c [ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0 [ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24 [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a [ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b [ 9.640731] handlers: [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore] [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket] [ 9.640731] Disabling IRQ #16
The really curious thing is now that irq 16 is _not_ the interrupt for the i915 driver when using MSI, but it _is_ the interrupt when not using MSI. So by all indications it seems like gmbus is able to generate a legacy (shared) interrupt in MSI mode on some configurations. I've tried to reproduce this and the differentiating thing seems to be that on unaffected systems no other device uses irq 16 (which seems to be the non-MSI intel gfx interrupt on all gm45). I have no idea how that even can happen. To avoid tempting this elephant into a rage, just disable gmbus interrupt support on gen 4. v2: Improve the commit message with exact details of what's going on. Also add a comment in the code to warn against this particular elephant in the room. Signed-off-by: Jiri Kosina <jkosina@suse.cz> (v1) Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) References: https://lkml.org/lkml/2013/3/8/325 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
dri-devel@lists.freedesktop.org