On Thu, Apr 19, 2012 at 05:30:03PM +0100, Dave Airlie wrote:
On Thu, Apr 19, 2012 at 5:22 PM, Andy Whitcroft <apw@canonical.com> wrote:
We have been carrying a (rather poor) patch for an issue we identified in the DRM driver. This issue is triggered when a DRM device is initialising and userspace attempts to open it, typically in response to the sysfs device added event. Basically, we allocate the minor numbers, which makes the device available, and only then call the DRM load callback. Until that callback completes, the device is not really ready, and these early opens typically lead to oopses.
We have been using the following patch to avoid this by marking the minors as in error until the load method has completed. This avoids the early open by simply failing such opens with EAGAIN. Obviously, we should instead be delaying the open until the load method completes.
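To illustrate the EAGAIN workaround described above, here is a minimal userspace model (a sketch, not the actual patch). The names `minor_slot`, `alloc_minor`, `finish_load` and `model_open` are hypothetical and do not correspond to real DRM structures or functions; the point is only the ordering: the minor becomes visible first, and opens are refused until the load step marks it ready.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a minor-number table; not the real DRM code. */
#define MAX_MINORS 4

struct minor_slot {
    bool allocated; /* minor number handed out, device node visible */
    bool ready;     /* driver load callback has completed */
};

static struct minor_slot minors[MAX_MINORS];

/* Step 1: allocate the minor; the device is visible but not ready yet. */
static void alloc_minor(int minor)
{
    assert(minor >= 0 && minor < MAX_MINORS);
    minors[minor].allocated = true;
    minors[minor].ready = false;
}

/* Step 2: the load callback has finished, so the minor becomes usable. */
static void finish_load(int minor)
{
    assert(minor >= 0 && minor < MAX_MINORS);
    minors[minor].ready = true;
}

/* Open path: refuse early opens with -EAGAIN instead of touching a
 * half-initialised device. */
static int model_open(int minor)
{
    if (minor < 0 || minor >= MAX_MINORS || !minors[minor].allocated)
        return -ENODEV;
    if (!minors[minor].ready)
        return -EAGAIN;
    return 0;
}
```

The weakness is visible in the model: a caller that opens between `alloc_minor()` and `finish_load()` just gets an error and has to retry, which is why waiting would be preferable.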
I include the existing patch for completeness (it is not really ready for merging) to illustrate the issue. I think it is logical that the open should simply be delayed until the load has completed. I am proposing to add a wait queue, associated with the idr cache for the DRM minors, on which open callers can wait_event_interruptible(). I'll be putting together a prototype shortly and will follow up with it.
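The proposed "wait until load completes" behaviour can be sketched in userspace, assuming a pthread condition variable as a stand-in for the kernel wait queue and a `pthread_cond_wait()` loop as a stand-in for `wait_event_interruptible()`. This is not the prototype mentioned above; `minor_dev`, `load_thread`, `model_open` and `open_during_load` are hypothetical names, and unlike the kernel primitive this wait cannot be interrupted by a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model; the condvar plays the role of the wait queue. */
struct minor_dev {
    pthread_mutex_t lock;
    pthread_cond_t load_done; /* waiters sleep here until load finishes */
    bool ready;               /* condition rechecked by every waiter */
};

static struct minor_dev dev = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .load_done = PTHREAD_COND_INITIALIZER,
    .ready = false,
};

/* Driver load: slow initialisation, then wake every waiting opener. */
static void *load_thread(void *arg)
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    (void)arg;
    nanosleep(&delay, NULL); /* pretend initialisation takes 100 ms */
    pthread_mutex_lock(&dev.lock);
    dev.ready = true;
    pthread_cond_broadcast(&dev.load_done);
    pthread_mutex_unlock(&dev.lock);
    return NULL;
}

/* Open path: block until the load has completed instead of failing.
 * The loop rechecks the condition, as wait_event_interruptible() does,
 * so spurious wakeups are harmless. */
static int model_open(void)
{
    pthread_mutex_lock(&dev.lock);
    while (!dev.ready)
        pthread_cond_wait(&dev.load_done, &dev.lock);
    assert(dev.ready); /* the condition must hold once the wait returns */
    pthread_mutex_unlock(&dev.lock);
    return 0;
}

/* Start the load in the background and open immediately, as a udev
 * rule reacting to the "device added" event might. */
static int open_during_load(void)
{
    pthread_t t;
    bool was_ready;
    int ret;

    if (pthread_create(&t, NULL, load_thread, NULL) != 0)
        return -1;
    ret = model_open();
    pthread_mutex_lock(&dev.lock);
    was_ready = dev.ready;
    pthread_mutex_unlock(&dev.lock);
    pthread_join(t, NULL);
    return (ret == 0 && was_ready) ? 0 : -1;
}
```

Whichever thread runs first, the opener only returns once `ready` is set; in the kernel the interruptible variant would additionally return -ERESTARTSYS if a signal arrives while waiting.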
Thoughts?
Couldn't we just delay registering things until the driver is ready to accept an open?
It's somewhere on my eternal&epic todo list.
Granted, the DRM midlayer doesn't make that easy,
... after fixing this one ;-)
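Dave's alternative, delaying registration until the driver is ready to accept an open, amounts to reordering the two steps: run the load first and publish the minor only on success. A minimal userspace sketch, assuming a simple pointer table as a hypothetical stand-in for the minor registry (`registry`, `driver_load`, `probe` and `model_open` are illustrative names, not DRM functions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a minor is published only after load succeeds. */
#define MAX_MINORS 4

struct device_state {
    bool initialised;
};

static struct device_state *registry[MAX_MINORS]; /* NULL = not registered */

/* Stand-in for the driver's load callback. */
static int driver_load(struct device_state *d)
{
    d->initialised = true;
    return 0;
}

/* Load first, register second: userspace cannot find the minor until
 * the device is fully set up. */
static int probe(int minor, struct device_state *d)
{
    int ret;

    assert(minor >= 0 && minor < MAX_MINORS);
    ret = driver_load(d);
    if (ret != 0)
        return ret; /* never published, so nothing to unwind */
    registry[minor] = d;
    return 0;
}

static int model_open(int minor)
{
    if (minor < 0 || minor >= MAX_MINORS || registry[minor] == NULL)
        return -ENODEV;
    assert(registry[minor]->initialised); /* never sees a half-loaded device */
    return 0;
}
```

With this ordering there is no window to wait out, but, as noted in the thread, the DRM midlayer allocates the minors before calling the load callback, so getting there is a larger restructuring.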
Thanks for sending this out; it keeps falling off my radar. I don't think I've ever seen this reported on RHEL/Fedora, which makes me wonder what we are doing that makes us lucky.
I think it's just a matter of races: if you load the drm module early enough (as Fedora already does in the initrd) and ensure that nothing pokes the drm devices for a few seconds, you'll be fine. IIRC, Ubuntu's powerd stuff is really good at bringing everything down. Also, loading the module from X rather than from udev resulted in nice fireworks the last time I tried it (radeon UMS was trying to set up the card while the KMS code was doing the same; hilarity ensued). -Daniel