On Fri, 15 Mar 2013, Harald Arnesen wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Attaching dmesg.txt from the machine with 28c70f162a as head, with drm.debug=0xe.
On Fri, 15 Mar 2013, Jiri Kosina wrote:
Attaching dmesg.txt from the machine with 28c70f162a as head, with drm.debug=0xe.
OK, now actually attaching it, sorry.
On Fri, 15 Mar 2013, Jiri Kosina wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Attaching dmesg.txt from the machine with 28c70f162a as head, with drm.debug=0xe.
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
On Fri, Mar 15, 2013 at 8:14 AM, Jiri Kosina jkosina@suse.cz wrote:
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
that device is using i915 0000:00:02.0: irq 44 for MSI/MSI-X
so can you try to boot with pci=nomsi?
On Friday, March 15, 2013 12:14:28 PM Yinghai Lu wrote:
On Fri, Mar 15, 2013 at 8:14 AM, Jiri Kosina jkosina@suse.cz wrote:
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
that device is using i915 0000:00:02.0: irq 44 for MSI/MSI-X
so can you try to boot with pci=nomsi?
I can try disabling MSI with 3.9.0-0.rc2.git0.4.fc20. -rc3 is not yet available in rawhide.
thanks, Shawn
On Fri, 15 Mar 2013, Yinghai Lu wrote:
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
that device is using i915 0000:00:02.0: irq 44 for MSI/MSI-X
so can you try to boot with pci=nomsi?
Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away.
My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
On Mon, Mar 18, 2013 at 2:12 AM, Jiri Kosina jkosina@suse.cz wrote:
On Fri, 15 Mar 2013, Yinghai Lu wrote:
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
that device is using i915 0000:00:02.0: irq 44 for MSI/MSI-X
so can you try to boot with pci=nomsi?
Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away.
My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
What is /proc/interrupts difference between with and without pci=nomsi ?
drm is forced to share irq 16?
Thanks
Yinghai
On Mon, 18 Mar 2013, Yinghai Lu wrote:
Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away.
My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
What is /proc/interrupts difference between with and without pci=nomsi ?
drm is forced to share irq 16?
Yup, IRQ 16 is being shared, and one of the owners is i915.
On Mon, Mar 18, 2013 at 3:05 PM, Jiri Kosina jkosina@suse.cz wrote:
On Mon, 18 Mar 2013, Yinghai Lu wrote:
Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away.
My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
What is /proc/interrupts difference between with and without pci=nomsi ?
drm is forced to share irq 16?
Yup, IRQ 16 is being shared, and one of the owners is i915.
the vga report strange INTx status...
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller]) Subsystem: Lenovo Device 20e4 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+ Latency: 0 Interrupt: pin A routed to IRQ 44 Region 0: Memory at f2000000 (64-bit, non-prefetchable) [size=4M] Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at 1800 [size=8] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Address: fee0100c Data: 4142 Capabilities: [d0] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: i915 Kernel modules: i915
it should be INTx-, after we have set DisINTx+ in control.
So INTx can not be disabled after it get enabled before ?
the VGA on my T420 looks right.
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller]) Subsystem: Lenovo Device 21ce Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Thanks
Yinghai
On Mon, Mar 18, 2013 at 10:12:49AM +0100, Jiri Kosina wrote:
On Fri, 15 Mar 2013, Yinghai Lu wrote:
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
that device is using i915 0000:00:02.0: irq 44 for MSI/MSI-X
so can you try to boot with pci=nomsi?
Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away.
My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
Yep, there's a big comment in the irq handler for that chipset that we have a gaping race with when using MSI interrupts. Although the comment bodly claims that the race is small enough to avoid the dreaded "nobody cared" message. Looks like gmbus is good at hitting that race - on newer chips it already brought up a similar race in handling pch interrupts.
Can you please give the below patch a whirl? It removes the probably race msi race avoidance code and replaces it with the same trick Paulo used to fix pch irq handling races.
Thanks, Daniel --- diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 3c7bb04..13de12e 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2684,7 +2684,7 @@ static irqreturn_t i965_irq_handler(int irq, void *arg) { struct drm_device *dev = (struct drm_device *) arg; drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; - u32 iir, new_iir; + u32 iir, new_iir, ier; u32 pipe_stats[I915_MAX_PIPES]; unsigned long irqflags; int irq_received; @@ -2692,9 +2692,14 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
atomic_inc(&dev_priv->irq_received);
+ /* irq race avoidance, copy&pasta from Paulo's PCH irq fix */ + ier = I915_READ(IER); + I915_WRITE(IER, 0); + POSTING_READ(IER); + iir = I915_READ(IIR);
- for (;;) { + do { bool blc_event = false;
irq_received = iir != 0; @@ -2792,7 +2797,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg) * stray interrupts. */ iir = new_iir; - } + } while (0); + + I915_WRITE(IER, ier); + POSTING_READ(IER);
i915_update_dri1_breadcrumb(dev);
On Mon, Mar 18, 2013 at 08:19:03PM +0100, Daniel Vetter wrote:
On Mon, Mar 18, 2013 at 10:12:49AM +0100, Jiri Kosina wrote:
On Fri, 15 Mar 2013, Yinghai Lu wrote:
Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that "nobody cared" for irq 16 happens long after last
I915_WRITE(GMBUS4 + reg_offset, 0);
has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the "nobody cared" for irq 16 is gone.
So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk?
Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all.
that device is using i915 0000:00:02.0: irq 44 for MSI/MSI-X
so can you try to boot with pci=nomsi?
Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away.
My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
Yep, there's a big comment in the irq handler for that chipset that we have a gaping race with when using MSI interrupts. Although the comment bodly claims that the race is small enough to avoid the dreaded "nobody cared" message. Looks like gmbus is good at hitting that race - on newer chips it already brought up a similar race in handling pch interrupts.
Can you please give the below patch a whirl? It removes the probably race msi race avoidance code and replaces it with the same trick Paulo used to fix pch irq handling races.
Still nobody cares about irq16. -Chris
On Mon, 18 Mar 2013, Daniel Vetter wrote:
Yep, there's a big comment in the irq handler for that chipset that we have a gaping race with when using MSI interrupts. Although the comment bodly claims that the race is small enough to avoid the dreaded "nobody cared" message. Looks like gmbus is good at hitting that race - on newer chips it already brought up a similar race in handling pch interrupts.
I see ... will target my focus in that direction, thanks.
Can you please give the below patch a whirl? It removes the probably race msi race avoidance code and replaces it with the same trick Paulo used to fix pch irq handling races.
Unfortunately it didn't change anything, the spurious interrupt report is still there.
On Fri, Mar 15, 2013 at 02:33:13PM +0100, Jiri Kosina wrote:
On Fri, 15 Mar 2013, Harald Arnesen wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Wasn't this fixed by the merge from David (2cc79544bd0aabb4b3cf467ead5df526d9134c64)? I can't figure out the exact commit that the merge message referred to though...
greg k-h
On Fri, 15 Mar 2013, Greg KH wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Wasn't this fixed by the merge from David (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?
Why do you think it should, please?
(I am seeing this with a2362d247 still).
On Fri, Mar 15, 2013 at 04:37:56PM +0100, Jiri Kosina wrote:
On Fri, 15 Mar 2013, Greg KH wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Wasn't this fixed by the merge from David (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?
Why do you think it should, please?
The line: - Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs and subsequent fallout (Paulo)
(I am seeing this with a2362d247 still).
Ok, I guess it isn't still fixed properly, just was guessing :)
greg k-h
On Fri, 15 Mar 2013, Greg KH wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Wasn't this fixed by the merge from David (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?
Why do you think it should, please?
The line:
- Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs and subsequent fallout (Paulo)
Ah, that one. I believe that should be irrelevant for GM chipsets, as they don't have AUX line, right?
(I am seeing this with a2362d247 still).
Ok, I guess it isn't still fixed properly, just was guessing :)
Seems like this is a different issue.
Thanks,
On Fri, Mar 15, 2013 at 08:47:39AM -0700, Greg KH wrote:
On Fri, Mar 15, 2013 at 04:37:56PM +0100, Jiri Kosina wrote:
On Fri, 15 Mar 2013, Greg KH wrote:
I have the same problem on my Lenovo T500. I think the graphics card is involved.
This laptop has "hybrid graphics" - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get "irq 16: nobody cared" during boot, not when I boot with the ATI card.
Confirming this. After a lot of hassle, I have bisected this reliably to
commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100
drm/i915: use the gmbus irq for waits
Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel.
Wasn't this fixed by the merge from David (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?
Why do you think it should, please?
The line:
- Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs and subsequent fallout (Paulo)
(I am seeing this with a2362d247 still).
Ok, I guess it isn't still fixed properly, just was guessing :)
Yeah, the above fix is for pch split platforms, whereas these reports here are for gm45 (which doesn't have the pch display split). Acking of gmbus interrupts works differently on those, I'm testing right now whether I can reproduce this fail. -Daniel
Okay, so I think that for 3.9 we want the patch below, and if eventually hardware root cause / workaround is found for GM45, we can have it merged later.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips
Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above.
It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued.
Flushing of the register writes by POSTING_READ() directly after the register write doesn't work either.
Disable using of GMBUS IRQs on Gen4 systems before the root cause is found and revert back to old behavior.
Also be more careful about not issuing GMBUS4 register reads in gmbus_wait_hw_status() if we are not using GMBUS IRQs.
Signed-off-by: Jiri Kosina jkosina@suse.cz --- drivers/gpu/drm/i915/intel_i2c.c | 12 ++++++++++-- 1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index acf8aec..8638036 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -58,12 +58,14 @@ to_intel_gmbus(struct i2c_adapter *i2c) return container_of(i2c, struct intel_gmbus, adapter); }
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5) void intel_i2c_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0); - I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0); + if (HAS_GMBUS_IRQ(dev)) + I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0); }
static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable) @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo->data = bus; }
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait);
+ if (!HAS_GMBUS_IRQ(dev_priv->dev)) { + int ret; + ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) & + (GMBUS_SATOER | gmbus2_status), + 50); + return ret; + } /* Important: The hw handles only the first bit, so set only one! Since * we also need to check for NAKs besides the hw ready/idle signal, we * need to wake up periodically and check that ourselves. */
On Mon, Mar 18, 2013 at 04:56:02PM +0100, Jiri Kosina wrote:
Okay, so I think that for 3.9 we want the patch below, and if eventually hardware root cause / workaround is found for GM45, we can have it merged later.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips
Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above.
It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued.
Flushing of the register writes by POSTING_READ() directly after the register write doesn't work either.
Disable using of GMBUS IRQs on Gen4 systems before the root cause is found and revert back to old behavior.
Also be more careful about not issuing GMBUS4 register reads in gmbus_wait_hw_status() if we are not using GMBUS IRQs.
Signed-off-by: Jiri Kosina jkosina@suse.cz
drivers/gpu/drm/i915/intel_i2c.c | 12 ++++++++++-- 1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index acf8aec..8638036 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -58,12 +58,14 @@ to_intel_gmbus(struct i2c_adapter *i2c) return container_of(i2c, struct intel_gmbus, adapter); }
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5) void intel_i2c_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
- I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
- if (HAS_GMBUS_IRQ(dev))
I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
There should not be any harm in always clearing GMBUS4, it exists on all platforms.
}
static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable) @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo->data = bus; }
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait);
- if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
int ret;
ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
(GMBUS_SATOER | gmbus2_status),
50);
This should couple up to the normal return code that chooses the appropriate return value based on gmbus2.
How about just using: if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0; and the existing wait loop?
return ret;
- } /* Important: The hw handles only the first bit, so set only one! Since
- we also need to check for NAKs besides the hw ready/idle signal, we
- need to wake up periodically and check that ourselves. */
On Mon, 18 Mar 2013, Chris Wilson wrote:
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5) void intel_i2c_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
- I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
- if (HAS_GMBUS_IRQ(dev))
I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
There should not be any harm in always clearing GMBUS4, it exists on all platforms.
}
static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable) @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo->data = bus; }
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait);
- if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
int ret;
ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
(GMBUS_SATOER | gmbus2_status),
50);
This should couple up to the normal return code that chooses the appropriate return value based on gmbus2.
How about just using: if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0; and the existing wait loop?
I explicitly wanted to avoid touching GMBUS4 register, as the real cause of the failure is not clear.
But, as Yinghai Lu points out, the problem is most likely caused by interrupt disabling not working properly (see his very good point regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out should work .... and it indeed does in my case, hence the (tested) patch below.
I think it's a 3.9-rc material, and I am all open to debug this further for 3.10 so that the race is closed and gmbus irqs can be used on Gen4 platform properly.
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips
Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above.
It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued.
Flushing of the register writes by POSTING_READ() directly after the register write doesn't work either.
Disable using of GMBUS IRQs on Gen4 systems before the root cause is found and revert back to old behavior.
Signed-off-by: Jiri Kosina jkosina@suse.cz --- drivers/gpu/drm/i915/intel_i2c.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index acf8aec..9934724 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -203,7 +203,7 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo->data = bus; }
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4) +#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +214,8 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait);
+ if (!HAS_GMBUS_IRQ(dev_priv->dev)) + gmbus4_irq_en = 0; /* Important: The hw handles only the first bit, so set only one! Since * we also need to check for NAKs besides the hw ready/idle signal, we * need to wake up periodically and check that ourselves. */
On Tue, Mar 19, 2013 at 09:56:57AM +0100, Jiri Kosina wrote:
On Mon, 18 Mar 2013, Chris Wilson wrote:
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5) void intel_i2c_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
- I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
- if (HAS_GMBUS_IRQ(dev))
I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
There should not be any harm in always clearing GMBUS4, it exists on all platforms.
}
static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable) @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo->data = bus; }
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait);
- if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
int ret;
ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
(GMBUS_SATOER | gmbus2_status),
50);
This should couple up to the normal return code that chooses the appropriate return value based on gmbus2.
How about just using: if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0; and the existing wait loop?
I explicitly wanted to avoid touching GMBUS4 register, as the real cause of the failure is not clear.
But, as Yinghai Lu points out, the problem is most likely caused by interrupt disabling not working properly (see his very good point regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out should work .... and it indeed does in my case, hence the (tested) patch below.
I think it's a 3.9-rc material, and I am all open to debug this further for 3.10 so that the race is closed and gmbus irqs can be used on Gen4 platform properly.
Agreed. Using the IRQ for GMBUS is just a performance feature that can be deferred until after we determine the root cause - and hope that the failure is somehow peculiar to GMBUS.
Acked-by: Chris Wilson chris@chris-wilson.co.uk -Chris
On Mon, Mar 18, 2013 at 04:56:02PM +0100, Jiri Kosina wrote:
Okay, so I think that for 3.9 we want the patch below, and if eventually hardware root cause / workaround is found for GM45, we can have it merged later.
I'd prefer if we dig into this for a bit more. I've been traveling last week and only just now recovered from the mail backlog, hence my delays in handling this.
Also the "nobody cared" showing up only very late after the last gmbus transaction isn't too surprising: It only fires after 100k interrupts, and apparently the irq handler is already a bit racy, so any interrupt might push it over the edge. gmbus interrupts are apparently just help to expose the race (presuming it's indeed the msi race already docuemented in the code). -Daniel
From: Jiri Kosina jkosina@suse.cz Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips
Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to using GMBUS irqs instead of GPIO bit-banging for chipset generations 4 and above.
It turns out though that on many systems this leads to spurious interrupts being generated, long after the register write to disable the IRQs has been issued.
Flushing of the register writes by POSTING_READ() directly after the register write doesn't work either.
Disable using of GMBUS IRQs on Gen4 systems before the root cause is found and revert back to old behavior.
Also be more careful about not issuing GMBUS4 register reads in gmbus_wait_hw_status() if we are not using GMBUS IRQs.
Signed-off-by: Jiri Kosina jkosina@suse.cz
drivers/gpu/drm/i915/intel_i2c.c | 12 ++++++++++-- 1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index acf8aec..8638036 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -58,12 +58,14 @@ to_intel_gmbus(struct i2c_adapter *i2c) return container_of(i2c, struct intel_gmbus, adapter); }
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5) void intel_i2c_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
- I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
- if (HAS_GMBUS_IRQ(dev))
I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
}
static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable) @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo->data = bus; }
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait);
- if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
int ret;
ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
(GMBUS_SATOER | gmbus2_status),
50);
return ret;
- } /* Important: The hw handles only the first bit, so set only one! Since
- we also need to check for NAKs besides the hw ready/idle signal, we
- need to wake up periodically and check that ourselves. */
-- Jiri Kosina SUSE Labs
dri-devel@lists.freedesktop.org