I started wondering why 'top' was showing an otherwise idle system as having a load average of 0.5+, and worker threads constantly using the CPU.
So I did a system-wide profile, and got the attached output (look at it in a really wide terminal).
There seems to be something _seriously_ wrong with i915 SDVO detect. This is on an Apple Mac Mini (hey, your favorite problem child!), and apparently it spends 20% of its non-idle CPU time just doing udelay's for the i2c SDVO connection detection.
That sounds a bit wrong, doesn't it?
I don't know how recent this is - it might have been going on for some time without me noticing. It's the wife's computer, and the same thing doesn't seem to happen on my Core i5 desktop
Any ideas? Any information I can give about the machine?
Linus
On Sun, Aug 15, 2010 at 07:32:15PM -0700, Linus Torvalds wrote:
I started wondering why 'top' was showing an otherwise idle system as having a load average of 0.5+, and worker threads constantly using the CPU.
So I did a system-wide profile, and got the attached output (look at it in a really wide terminal).
There seems to be something _seriously_ wrong with i915 SDVO detect. This is on an Apple Mac Mini (hey, your favorite problem child!), and apparently it spends 20% of its non-idle CPU time just doing udelay's for the i2c SDVO connection detection.
That sounds a bit wrong, doesn't it?
I don't know how recent this is - it might have been going on for some time without me noticing. It's the wife's computer, and the same thing doesn't seem to happen on my Core i5 desktop
Any ideas? Any information I can give about the machine?
You're not alone. I'm seeing similar things on my machine (AMD,Radeon). The monitor outputs are polled every 10 sec in drivers/gpu/drm/drm_crtc_helper.c.
There is a patch attached to the following bug report that disables polling and solves the problem for me: http://bugs.freedesktop.org/show_bug.cgi?id=29536
On Sun, 2010-08-15 at 19:32 -0700, Linus Torvalds wrote:
I started wondering why 'top' was showing an otherwise idle system as having a load average of 0.5+, and worker threads constantly using the CPU.
So I did a system-wide profile, and got the attached output (look at it in a really wide terminal).
There seems to be something _seriously_ wrong with i915 SDVO detect. This is on an Apple Mac Mini (hey, your favorite problem child!), and apparently it spends 20% of its non-idle CPU time just doing udelay's for the i2c SDVO connection detection.
That sounds a bit wrong, doesn't it?
At least we should replace mdelay with msleep in those functions.
Can you get a boot with drm.debug=4?
I wonder are we picking up a bad SDVO, the insane code retries 50 times with a hard loop delay.
Dave.
On Sun, Aug 15, 2010 at 8:30 PM, Dave Airlie airlied@redhat.com wrote:
At least we should replace mdelay with msleep in those functions.
How precise does the timing have to be? I think i2c is self-clocking, so it's ok to see big skews? Becuase msleep() can be off by quite a bit (mdelay can too, but it's _way_ more rare, and requires either a preemptible kernel or serious interrupt activity).
Can you get a boot with drm.debug=4?
Sure. That results in a truncated dmesg (with a 128k buffer). The thing seems to spew out something every ten seconds:
... [ 232.610044] [drm:intel_sdvo_debug_write], SDVOB: W: 0B (SDVO_CMD_GET_ATTACHED_DISPLAYS) [ 232.624504] [drm:intel_sdvo_debug_response], SDVOB: R: 01 00 (Success) [ 232.624517] [drm:intel_sdvo_detect], SDVO response 1 0 [ 232.624524] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 01 (SDVO_CMD_SET_CONTROL_BUS_SWITCH) [ 242.672044] [drm:intel_sdvo_debug_write], SDVOB: W: 0B (SDVO_CMD_GET_ATTACHED_DISPLAYS) [ 242.686503] [drm:intel_sdvo_debug_response], SDVOB: R: 01 00 (Success) [ 242.686516] [drm:intel_sdvo_detect], SDVO response 1 0 [ 242.686523] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 01 (SDVO_CMD_SET_CONTROL_BUS_SWITCH) [ 252.750044] [drm:intel_sdvo_debug_write], SDVOB: W: 0B (SDVO_CMD_GET_ATTACHED_DISPLAYS) [ 252.764526] [drm:intel_sdvo_debug_response], SDVOB: R: 01 00 (Success) [ 252.764539] [drm:intel_sdvo_detect], SDVO response 1 0 ...
It looks like it takes about 15 ms each time. But 15 ms each 10s doesn't seem to be enough to account for the load average. Maybe it gets synchronized with the timer tick or something, causing the load average to look artificially inflated (it also doesn't match up with kworker using 1%+ CPU time).
So maybe there is something else going on. Maybe the load average thing comes from some interaction with the new workqueue thing.
I'll send the whole dmesg to you in a private message, I don't think we want 128kB of crud on lkml.
I wonder are we picking up a bad SDVO, the insane code retries 50 times with a hard loop delay.
Well, I'm not seeing any failures, but maybe those don't get printed out?
Linus
On Sun, 2010-08-15 at 21:01 -0700, Linus Torvalds wrote:
On Sun, Aug 15, 2010 at 8:30 PM, Dave Airlie airlied@redhat.com wrote:
At least we should replace mdelay with msleep in those functions.
How precise does the timing have to be? I think i2c is self-clocking, so it's ok to see big skews? Becuase msleep() can be off by quite a bit (mdelay can too, but it's _way_ more rare, and requires either a preemptible kernel or serious interrupt activity).
In this case it definitely doesn't matter, I expect msleep to be a much nicer to the system in general idea esp in routines we can all at runtime from userspace.
Can you get a boot with drm.debug=4?
Sure. That results in a truncated dmesg (with a 128k buffer). The thing seems to spew out something every ten seconds:
... [ 232.610044] [drm:intel_sdvo_debug_write], SDVOB: W: 0B (SDVO_CMD_GET_ATTACHED_DISPLAYS) [ 232.624504] [drm:intel_sdvo_debug_response], SDVOB: R: 01 00 (Success) [ 232.624517] [drm:intel_sdvo_detect], SDVO response 1 0 [ 232.624524] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 01 (SDVO_CMD_SET_CONTROL_BUS_SWITCH) [ 242.672044] [drm:intel_sdvo_debug_write], SDVOB: W: 0B (SDVO_CMD_GET_ATTACHED_DISPLAYS) [ 242.686503] [drm:intel_sdvo_debug_response], SDVOB: R: 01 00 (Success) [ 242.686516] [drm:intel_sdvo_detect], SDVO response 1 0 [ 242.686523] [drm:intel_sdvo_debug_write], SDVOB: W: 7A 01 (SDVO_CMD_SET_CONTROL_BUS_SWITCH) [ 252.750044] [drm:intel_sdvo_debug_write], SDVOB: W: 0B (SDVO_CMD_GET_ATTACHED_DISPLAYS) [ 252.764526] [drm:intel_sdvo_debug_response], SDVOB: R: 01 00 (Success) [ 252.764539] [drm:intel_sdvo_detect], SDVO response 1 0 ...
It looks like it takes about 15 ms each time. But 15 ms each 10s doesn't seem to be enough to account for the load average. Maybe it gets synchronized with the timer tick or something, causing the load average to look artificially inflated (it also doesn't match up with kworker using 1%+ CPU time).
So maybe there is something else going on. Maybe the load average thing comes from some interaction with the new workqueue thing.
I'll send the whole dmesg to you in a private message, I don't think we want 128kB of crud on lkml.
I wonder are we picking up a bad SDVO, the insane code retries 50 times with a hard loop delay.
Well, I'm not seeing any failures, but maybe those don't get printed out?
Oh wierd, so not where I thought it was, I expect then the SDVO HDMI detection is completely insane,
intel_sdvo_hdmi_sink_detect looks to contain some really uninspiring code. Might be worth adding some debug in there to see if it sinks a lot of time.
Dave.
Linus Torvalds wrote:
I started wondering why 'top' was showing an otherwise idle system as having a load average of 0.5+, and worker threads constantly using the CPU.
So I did a system-wide profile, and got the attached output (look at it in a really wide terminal).
There seems to be something _seriously_ wrong with i915 SDVO detect. This is on an Apple Mac Mini (hey, your favorite problem child!), and apparently it spends 20% of its non-idle CPU time just doing udelay's for the i2c SDVO connection detection.
You might be hitting the infamous hotplug storm [1]. The symptoms vary by kernel version.
2.6.34 and before: udevadm --monitor shows craploads of events and, as long as X is running, X keeps reprobing the outputs which (depending on the particular bug) can suck cpu in the i2c code or cause more hotplug events. It also makes X oddly laggy.
2.6.35 and newer: The kernel is smart enough to probe outputs itself before telling X, so the events never hit userspace. But things still can get a bit laggy.
Anyone know why merely *reading* /sys/class/drm/whatever/status causes the output to get probed? (I see it in the code, but I have no idea why that code's still there after most of the rest of the hotplug code got cleaned up in 2.6.36).
Once I find some free time, I plan on trying to at least fix the issue that causes this bug for me. (It's apparently quite nontrivial due to silliness in the way dock/undock (!) works on some laptops.)
[1] for example: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg00921.html
That sounds a bit wrong, doesn't it?
I don't know how recent this is - it might have been going on for some time without me noticing. It's the wife's computer, and the same thing doesn't seem to happen on my Core i5 desktop
Any ideas? Any information I can give about the machine?
If I'm right, the outputs of intel_bios_dumper and intel_bios_reader could be instructive (both are in intel-gpu-tools).
You could also try intel_reg_write 0x61110 0x0 and see if the problem stops (at least until a suspend/resume cycle). That command turns off output hotplug on the card, which has the side effect that the kernel will stop acting on bogus interrupts.
--Andy
On Sun, Aug 15, 2010 at 9:06 PM, Andy Lutomirski luto@myrealbox.com wrote:
You might be hitting the infamous hotplug storm [1]. The symptoms vary by kernel version.
Hmm. I don't think it's a storm. The drm.debug=4 thing shows things just every 10 seconds. That seems pretty controlled.
Of course, it seems to be several milliseconds worth of work then, so it's not _too_ controlled. I must be missing some detail.
If I'm right, the outputs of intel_bios_dumper and intel_bios_reader could be instructive (both are in intel-gpu-tools).
My version of intel-gpu-tools must be old (fedora 12). It has intel_gpu_dump and intel_reg_write and some other apps, but not the bios-dumper/reader.
Not that I could read the output of them anyway, I bet ;)
You could also try intel_reg_write 0x61110 0x0 and see if the problem stops (at least until a suspend/resume cycle). That command turns off output hotplug on the card, which has the side effect that the kernel will stop acting on bogus interrupts.
# intel_reg_write 0x61110 0x0 Value before: 0x4000220 Value after: 0x0
but it doesn't seem to change any behavior. Still that sdvo probe every 10 s, and still 1% of CPU for kworker in top. But that kworker thing definitely has to be separate from the 10-second sdvo probe, because it shows up all the time (ie top updates every second).
But right now the wife is using that machine, so I can't check the "it should be totally idle, what the heck is going on" behavior at the moment.
Linus
Linus
[cc: intel-gfx]
Linus Torvalds wrote:
On Sun, Aug 15, 2010 at 9:06 PM, Andy Lutomirski luto@myrealbox.com wrote:
You might be hitting the infamous hotplug storm [1]. The symptoms vary by kernel version.
Hmm. I don't think it's a storm. The drm.debug=4 thing shows things just every 10 seconds. That seems pretty controlled.
OK, probably not the hotplug storm. That's always been at least once per second and sometimes several for me.
Of course, it seems to be several milliseconds worth of work then, so it's not _too_ controlled. I must be missing some detail.
If I'm right, the outputs of intel_bios_dumper and intel_bios_reader could be instructive (both are in intel-gpu-tools).
My version of intel-gpu-tools must be old (fedora 12). It has intel_gpu_dump and intel_reg_write and some other apps, but not the bios-dumper/reader.
Not that I could read the output of them anyway, I bet ;)
You could also try intel_reg_write 0x61110 0x0 and see if the problem stops (at least until a suspend/resume cycle). That command turns off output hotplug on the card, which has the side effect that the kernel will stop acting on bogus interrupts.
# intel_reg_write 0x61110 0x0 Value before: 0x4000220 Value after: 0x0
but it doesn't seem to change any behavior. Still that sdvo probe every 10 s, and still 1% of CPU for kworker in top. But that kworker thing definitely has to be separate from the 10-second sdvo probe, because it shows up all the time (ie top updates every second).
Almost certainly not the hotplug storm bug.
The 10 seconds is probably the timer in output_poll_execute (drm_crtc_helper.c):
#define DRM_OUTPUT_POLL_PERIOD (10*HZ)
The code in output_poll_execute (which is in your perf output) looks wrong:
list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
/* if this is HPD or polled don't check it - TV out for instance */ if (!connector->polled) continue;
else if (connector->polled & (DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT)) repoll = true;
[ so if we have DRM_CONNECTOR_POLL_CONNECT set we requeue the *entire* work item. or if we have DRM_CONNECTOR_POLL_DISCONNECT and it's not even connected, we still poll. ]
old_status = connector->status; /* if we are connected and don't want to poll for disconnect skip it */ if (old_status == connector_status_connected && !(connector->polled & DRM_CONNECTOR_POLL_DISCONNECT) && !(connector->polled & DRM_CONNECTOR_POLL_HPD)) continue;
[ but if the status isn't connector_status_connected, we poll it even if we're only in here due to a *different* connector ]
status = connector->funcs->detect(connector);
[ ->detect is the expensive part ]
if (old_status != status) changed = true; }
But that's really the problem, because intel_sdvo_dvi_init contains:
connector->polled = DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
I don't know if SDVO is supposed to send hotplug interrupts because that section of the docs is incomplete.
--Andy
On Mon, 2010-08-16 at 10:27 -0400, Andy Lutomirski wrote:
But that's really the problem, because intel_sdvo_dvi_init contains:
connector->polled = DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
I don't know if SDVO is supposed to send hotplug interrupts because that section of the docs is incomplete.
SDVO can generate interrupts, although a particular SDVO output type might not. (The G45 docs are missing this section, but the 965 docs are applicable.) We don't have that turned on, of course, because that would be entirely too sensible.
You'd probably need to uncomment the #if 0'd intel_sdvo_set_hotplug() in intel_sdvo.c, and call that from the appropriate bit of intel_sdvo_init(). For pre-gen4 you'd probably also need to set SDVO_INTERRUPT_ENABLE in the appropriate SDVOx register, but in gen4+ that bit is used for bpc selection.
Having done that you'd then need to wait the inevitable six months to find out which SDVO devices actually have functioning hotplug and which ones are just plain lying about it.
- ajax
dri-devel@lists.freedesktop.org