So I decided to fire up -rc2 today to see what would happen...the results are best described by the attached images. Something is clearly scrambled between my hardware and the i915 driver. Display with X is hosed, but things go weird before X gets a chance to run (it is worth noting that the initial output from the kernel is legible).
FWIW, my hardware is:
00:02.1 Display controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
	Subsystem: Dell OptiPlex 755
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: Memory at fea80000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
What else can I provide to help track this one down?
Thanks,
jon
On Mon, 23 Aug 2010 11:01:45 -0600 Jonathan Corbet <corbet@lwn.net> wrote:
> So I decided to fire up -rc2 today to see what would happen...the results are best described by the attached images. Something is clearly scrambled between my hardware and the i915 driver. Display with X is hosed, but things go weird before X gets a chance to run (it is worth noting that the initial output from the kernel is legible).
I went ahead and bisected the problem, which was added between -rc1 and -rc2. The end result is this:
32aad86fe88e7323d4fc5e9e423abcee0d55a03d is the first bad commit
commit 32aad86fe88e7323d4fc5e9e423abcee0d55a03d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 4 13:50:25 2010 +0100

    drm/i915/sdvo: Propagate errors from reading/writing control bus.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Eric Anholt <eric@anholt.net>
I don't know the driver or the hardware and can't begin to guess what went wrong in that patch, but, hopefully, the information is useful to somebody. Please let me know if there's anything else I can do to help track this down.
Thanks,
jon
On Mon, 23 Aug 2010 15:17:08 -0600, Jonathan Corbet <corbet@lwn.net> wrote:
> I went ahead and bisected the problem, which was added between -rc1 and -rc2. The end result is this:
Taking the patch at face value, the cause should be a mistake in error handling. So the first step would be to identify which i2c_transfer() failed.
diff --git a/drivers/gpu/drm/i915/intel_sdvo.c b/drivers/gpu/drm/i915/intel_sdvo.c
index 093e914..6afc7cf 100644
--- a/drivers/gpu/drm/i915/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/intel_sdvo.c
@@ -269,7 +269,7 @@ static bool intel_sdvo_read_byte(struct intel_sdvo *intel_sdvo, u8 addr, u8 *ch)
 		return true;
 	}
 
-	DRM_DEBUG_KMS("i2c transfer returned %d\n", ret);
+	WARN(1, "i2c transfer failed, ret=%d\n", ret);
 	return false;
 }
 
@@ -284,8 +284,13 @@ static bool intel_sdvo_write_byte(struct intel_sdvo *intel_sdvo, int addr, u8 ch
 			.buf = out_buf,
 		}
 	};
+	int ret;
+
+	if ((ret = i2c_transfer(intel_sdvo->base.i2c_bus, msgs, 1)) == 1)
+		return true;
 
-	return i2c_transfer(intel_sdvo->base.i2c_bus, msgs, 1) == 1;
+	WARN(1, "i2c transfer failed, ret=%d\n", ret);
+	return false;
 }
 
 #define SDVO_CMD_NAME_ENTRY(cmd) {cmd, #cmd}
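For readers following the error-handling question: i2c_transfer() returns the number of messages it completed, or a negative errno on failure, so a check against 1 lumps a real bus error together with a short transfer. A minimal userspace sketch of folding that return value into the usual 0/-errno convention follows; i2c_ret_to_errno() is a hypothetical helper, not something in intel_sdvo.c, and the sketch makes no claim about where the bug in the bisected commit actually lies.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of the i915 driver): convert the value
 * returned by i2c_transfer(), which is either the number of messages
 * transferred or a negative errno, into 0 on success / -errno on failure.
 *
 * ret: value returned by i2c_transfer()
 * num: number of messages that were submitted
 */
static int i2c_ret_to_errno(int ret, int num)
{
	if (ret == num)
		return 0;	/* every message went through */
	if (ret < 0)
		return ret;	/* the adapter reported a real error code */
	return -EIO;		/* short transfer: it carries no errno of its own */
}
```

Something like this keeps the distinction between "the bus said no" and "fewer messages than expected" visible to callers, instead of collapsing both into false.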
On Mon, 23 Aug 2010 23:36:55 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Taking the patch at face value, the cause should be a mistake in error handling. So the first step would be to identify which i2c_transfer() failed.
OK, I tried it, but neither warning triggers.
Don't know if it helps or not, but I tried booting with drm.debug=0x05. The result was truly vast amounts of stuff like this:
Aug 23 17:20:59 bike kernel: m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x645 m:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458, m:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458, m:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, c
The above is one line from the system log; I took the liberty of wrapping it for readability.
jon
On Mon, 23 Aug 2010 17:32:25 -0600, Jonathan Corbet <corbet@lwn.net> wrote:
> On Mon, 23 Aug 2010 23:36:55 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> Taking the patch at face value, the cause should be a mistake in error handling. So the first step would be to identify which i2c_transfer() failed.
> OK, I tried it, but neither warning triggers.
Sigh, that sounds like I screwed the patch up instead. Thanks.
> Don't know if it helps or not, but I tried booting with drm.debug=0x05. The result was truly vast amounts of stuff like this:
> Aug 23 17:20:59 bike kernel: m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x645
[snip]
> The above is one line from the system log; I took the liberty of wrapping it for readability.
Hmm, probably bailing out of the ioctl before hitting the newline. drm.debug=0x4 should print the right information for this bug.
On Tue, 24 Aug 2010 00:37:52 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> drm.debug=0x4 should print the right information for this bug.
That doesn't seem to give me any output at all.
One thing I noticed, though, is that I occasionally get something like:
Aug 23 17:43:14 bike kernel: [ 142.920185] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane, expect flickering: entries required = 51, available = 28.
They seem to come in threes, for whatever that's worth.
Thanks,
jon
On Mon, 23 Aug 2010 17:46:41 -0600, Jonathan Corbet <corbet@lwn.net> wrote:
> On Tue, 24 Aug 2010 00:37:52 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> drm.debug=0x4 should print the right information for this bug.
> That doesn't seem to give me any output at all.
> One thing I noticed, though, is that I occasionally get something like:
> Aug 23 17:43:14 bike kernel: [ 142.920185] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane, expect flickering: entries required = 51, available = 28.
> They seem to come in threes, for whatever that's worth.
In threes. Hmm, one each for primary, cursor and self-refresh. drm.debug=0xe would be interesting, to see what the pixel clock is.
Can you grab one log from before the bad commit and one from after? If there is a change, that may help pinpoint the mistake, or indicate further problems...
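drm.debug is a bitmask of message categories, which is why the three values tried in this thread behave so differently. The sketch below decodes them, assuming the bit assignments in drmP.h of this era (0x01 core, 0x02 driver, 0x04 KMS; later kernels added more bits). Under that assumption 0x05 turns on the core category, which would account for the flood of drm_ioctl trace lines, while 0x4 leaves only KMS messages. drm_debug_describe() is a hypothetical helper for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Category bits as assumed for drmP.h around 2.6.36. */
#define DRM_UT_CORE   0x01	/* core DRM, e.g. per-ioctl tracing in drm_ioctl() */
#define DRM_UT_DRIVER 0x02	/* driver-specific debug messages */
#define DRM_UT_KMS    0x04	/* modesetting debug messages (DRM_DEBUG_KMS) */

/*
 * Hypothetical helper: write a '|'-separated list of the categories
 * enabled in mask into buf; any bits outside the three known categories
 * are appended as a hex value.
 */
static void drm_debug_describe(unsigned int mask, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} cats[] = {
		{ DRM_UT_CORE, "CORE" },
		{ DRM_UT_DRIVER, "DRIVER" },
		{ DRM_UT_KMS, "KMS" },
	};
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(cats) / sizeof(cats[0]); i++) {
		if (!(mask & cats[i].bit) || used >= len)
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", cats[i].name);
		if (n < 0)
			return;
		used += (size_t)n;
	}
	unsigned int unknown = mask & ~(DRM_UT_CORE | DRM_UT_DRIVER | DRM_UT_KMS);
	if (unknown && used < len)
		snprintf(buf + used, len - used, "%s0x%x", used ? "|" : "", unknown);
}
```

With these assumed bits, 0x05 decodes to CORE|KMS, 0x4 to KMS, and 0xe to DRIVER|KMS plus an extra 0x8 bit that had no category assigned at the time.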
On Tue, 24 Aug 2010 00:55:54 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
> In threes. Hmm, one each for primary, cursor and self-refresh. drm.debug=0xe would be interesting, to see what the pixel clock is.
> Can you grab one log from before the bad commit and one from after? If there is a change, that may help pinpoint the mistake, or indicate further problems...
OK, three files attached; drm.good is from 2.6.35, drm.bad is from 2.6.36-rc2. I also stripped the times and did a diff, in case that's useful.
If you'd like output from right around the bad commit, say the word; that will take a bit of building time (I didn't keep all those bisect kernels around) but I can do it.
Thanks,
jon
On Tue, 24 Aug 2010 07:16:26 -0600, Jonathan Corbet <corbet@lwn.net> wrote:
> On Tue, 24 Aug 2010 00:55:54 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> In threes. Hmm, one each for primary, cursor and self-refresh. drm.debug=0xe would be interesting, to see what the pixel clock is.
>> Can you grab one log from before the bad commit and one from after? If there is a change, that may help pinpoint the mistake, or indicate further problems...
> OK, three files attached; drm.good is from 2.6.35, drm.bad is from 2.6.36-rc2. I also stripped the times and did a diff, in case that's useful.
[snip]
> -[drm:intel_calculate_wm], FIFO entries required for mode: 48
> -[drm:intel_calculate_wm], FIFO watermark level: -22
> +[drm:intel_calculate_wm], FIFO entries required for mode: 49
> +[drm:intel_calculate_wm], FIFO watermark level: -23
> +*ERROR* Insufficient FIFO for plane, expect flickering: entries required = 51, available = 28.
>  [drm:intel_calculate_wm], FIFO entries required for mode: 0
>  [drm:intel_calculate_wm], FIFO watermark level: 29
>  [drm:i9xx_update_wm], FIFO watermarks - A: 1, B: 29
> -[drm:i9xx_update_wm], self-refresh entries: 60
> -[drm:i9xx_update_wm], Setting FIFO watermarks - A: 1, B: 29, C: 2, SR 35
> -[drm:i915_get_vblank_counter], trying to get vblank count for disabled pipe 1
> +[drm:i9xx_update_wm], self-refresh entries: 120
> +[drm:i9xx_update_wm], Setting FIFO watermarks - A: 1, B: 29, C: 2, SR 1
I'm going to focus on this, since it could account for the on-screen corruption. Here we suddenly double the computed minimum FIFO size for self-refresh (from 60 to 120 entries) and, due to a separate bug, program a minimal low watermark (SR 1 instead of SR 35).
That should be addressed with http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=drm-testing&id=30...; however, that patch isn't quite ready yet, since Jesse pointed out that some chipsets do indeed want a high watermark instead of the low watermark used, at least for gen3+.
The question, though, is why that bad commit would cause the SR requirement to double. Thanks for the diff; I now know that I need to look more closely at the mode fixup for SDVO.
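The numbers in the diff are consistent with plain subtraction. Purely as an illustration, assume a plane A FIFO of 28 entries with a guard of 2 (so 28 - (48 + 2) = -22, 28 - (49 + 2) = -23, and 49 + 2 = 51 matches the error message), a default of 1 for non-positive levels (matching "A: 1"), and a total FIFO of 95 entries for self-refresh (inferred from 60 + 35). The struct and function names are hypothetical; this is a sketch of the arithmetic the log implies, not the driver's intel_calculate_wm() or i9xx_update_wm().

```c
/* Hypothetical parameters; values used below are inferred from the log. */
struct fifo_params {
	long fifo_size;		/* FIFO entries available to the plane */
	long guard_size;	/* safety margin added to the requirement */
	long default_wm;	/* value programmed when the level is not positive */
};

/* Plane watermark level: FIFO entries left once the mode's needs are met. */
static long plane_wm_level(long entries_required, const struct fifo_params *p)
{
	return p->fifo_size - (entries_required + p->guard_size);
}

/* Value that would be programmed for a given computed level. */
static long plane_wm_programmed(long level, const struct fifo_params *p)
{
	return level > 0 ? level : p->default_wm;
}

/*
 * Self-refresh low watermark: total FIFO minus the self-refresh entries,
 * clamped to 1 when the requirement exceeds the FIFO (the "minimal low
 * watermark" case described above).
 */
static long sr_wm(long total_fifo, long sr_entries)
{
	long srwm = total_fifo - sr_entries;

	return srwm < 0 ? 1 : srwm;
}
```

Under these assumptions the plane A level was already negative in 2.6.35 and clamped to 1 in both kernels, so the visible change is the self-refresh figure: 95 - 60 = 35 before, while 95 - 120 = -25 is clamped to SR 1 after.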
On Monday, 23 August 2010 at 19:01:45 Jonathan Corbet wrote:
> So I decided to fire up -rc2 today to see what would happen...the results are best described by the attached images. Something is clearly scrambled between my hardware and the i915 driver. Display with X is hosed, but things go weird before X gets a chance to run (it is worth noting that the initial output from the kernel is legible).
> FWIW, my hardware is:
> 00:02.1 Display controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
> 	Subsystem: Dell OptiPlex 755
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0
> 	Region 0: Memory at fea80000 (32-bit, non-prefetchable) [size=512K]
> 	Capabilities: [d0] Power Management version 2
> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> What else can I provide to help track this one down?
I created a Bugzilla entry at https://bugzilla.kernel.org/show_bug.cgi?id=17151 for your bug report; please add your address to the CC list there. Thanks!
dri-devel@lists.freedesktop.org