On Wed, Nov 10, 2010 at 2:28 PM, Andrew Lutomirski <andy@luto.us> wrote:
Hi all-
Somewhere between 2.6.34-fedora-whatever and 2.6.36, Nouveau became extremely broken on my hardware. It appears to be triggered by a bug in my monitor (HP LP2475w), which causes the monitor to disappear from DVI when it goes to sleep. Every time the console blanks (in X or otherwise AFAICT) the system crashes oddly but unrecoverably. This is 100% reproducible by Ctrl-Alt-F2 followed by 'echo 1 > /sys/class/graphics/fb0/blank' *from SSH* and waiting a few seconds for the monitor to go to sleep, but it also happens if I just walk away from the computer long enough for it to blank itself. This is present on F14's kernel and on 2.6.36 from kernel.org. This may or may not be related to the unreproducible crashes that I used to get rarely on 2.6.34.
The best hint I have is from this patch (sorry for whitespace damage):
which spews "nv50 got hpd irq" once the display blanks.
I tracked it down. The interrupt code in 2.6.36 is totally broken: it acknowledges the interrupt *in the bottom half*. This might work by accident if the bottom half gets queued on a different CPU, but something probably changed (concurrency-managed workqueues?) that makes the BH end up on the same CPU. So the unacknowledged interrupt keeps firing, the CPU never gets around to running the BH, and there goes a CPU.
Then the clocksource watchdog hits and takes the whole system down when it calls stop_machine, which also gets starved on that CPU.
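To make the starvation argument concrete, here is a minimal user-space sketch of it. This is not the Nouveau code. It is a toy model of one CPU and one interrupt line, and it assumes the line behaves as level-triggered (it stays asserted until the source is acknowledged). The names simulate, ack_site and sim_result are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the (hypothetical) handler clears the interrupt source. */
enum ack_site { ACK_IN_TOP_HALF, ACK_IN_BOTTOM_HALF };

struct sim_result {
    unsigned long top_half_runs; /* times the hard IRQ handler fired */
    bool bottom_half_ran;        /* did deferred work ever get the CPU? */
};

/*
 * Toy model: one CPU, one interrupt line. ASSUMPTION: the line stays
 * asserted until it is acknowledged, and an asserted line always
 * preempts deferred work queued on the same CPU.
 */
struct sim_result simulate(enum ack_site site, unsigned long max_steps)
{
    struct sim_result res = { 0, false };
    bool line_asserted = true; /* the device has raised the interrupt */
    bool bh_pending = false;

    assert(max_steps > 0);

    for (unsigned long step = 0; step < max_steps; step++) {
        if (line_asserted) {
            /* Top half: runs ahead of everything else on this CPU. */
            res.top_half_runs++;
            bh_pending = true;
            if (site == ACK_IN_TOP_HALF)
                line_asserted = false;
            continue; /* the line is sampled again before any BH runs */
        }
        if (!bh_pending)
            break; /* idle: nothing left to do */

        /* Bottom half: only reached once the line is quiet. */
        res.bottom_half_ran = true;
        bh_pending = false;
        if (site == ACK_IN_BOTTOM_HALF)
            line_asserted = false; /* never reached here: the BH is starved */
    }
    return res;
}
```

With ACK_IN_TOP_HALF the top half runs once and the bottom half then gets the CPU. With ACK_IN_BOTTOM_HALF the top half consumes every step up to max_steps and the bottom half never runs. The "works by accident on a different CPU" case is deliberately not modeled.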
Patch coming.
--Andy