Hi,
while debugging a few reported bugs, I noticed that drm_fb_helper_force_kernel_mode() that is called in the KMS panic notifier isn't really atomic-safe. It invokes crtc's set_config(), and all implementations seem to involve with page allocations (kmalloc with GFP_KERNEL, via some ttm ops, etc). I've actually seen the Oops with cirrus KMS during panic due to this.
Does anyone have an idea to fix this? I thought of re-using drm_fb_helper_debug_enter(), but this won't work with many drivers that don't have crtc->mode_set_base_atomic(), either (yeah, this is another bug).
thanks,
Takashi
On Mon, May 5, 2014 at 3:02 PM, Takashi Iwai tiwai@suse.de wrote:
Hi,
while debugging a few reported bugs, I noticed that drm_fb_helper_force_kernel_mode() that is called in the KMS panic notifier isn't really atomic-safe. It invokes crtc's set_config(), and all implementations seem to involve with page allocations (kmalloc with GFP_KERNEL, via some ttm ops, etc). I've actually seen the Oops with cirrus KMS during panic due to this.
Does anyone have an idea to fix this? I thought of re-using drm_fb_helper_debug_enter(), but this won't work with many drivers that don't have crtc->mode_set_base_atomic(), either (yeah, this is another bug).
David Herrmann has a long-term plan to address this, using a much more minimal panic console (so that we can avoid all the fbcon madness) and adding a new driver callback to deliver a pointer to whatever framebuffers are currently displayed. If it's possible to obtain such a pointer in atomic contexts. Then we'd rip out all the existing panic notifier and handler stuff in fbcon. We'd also need to disable the panic notifier fbcon registers itself (since that ends up calling down into our ->set_par which again ends up in ->set_config).
That should leave us with some very minimal panic code to audit.
Imo trying to fix the current mess and making ->set_config work in atomic contexts is pointless. drm_can_sleep is trying to make that possible in some ways, and it's horrible since using it means busy-loops in atomic contexts outside of panic handlers won't get reported any more. Also the interactions with the console_lock (which due to some bonghits is protecting almost everything in fbcon/fbdev nowadays) would also be almost completely removed.
Unfortunately doing all this is a lot of work. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
At Mon, 5 May 2014 16:29:37 +0200, Daniel Vetter wrote:
On Mon, May 5, 2014 at 3:02 PM, Takashi Iwai tiwai@suse.de wrote:
Hi,
while debugging a few reported bugs, I noticed that drm_fb_helper_force_kernel_mode() that is called in the KMS panic notifier isn't really atomic-safe. It invokes crtc's set_config(), and all implementations seem to involve with page allocations (kmalloc with GFP_KERNEL, via some ttm ops, etc). I've actually seen the Oops with cirrus KMS during panic due to this.
Does anyone have an idea to fix this? I thought of re-using drm_fb_helper_debug_enter(), but this won't work with many drivers that don't have crtc->mode_set_base_atomic(), either (yeah, this is another bug).
David Herrmann has a long-term plan to address this, using a much more minimal panic console (so that we can avoid all the fbcon madness) and adding a new driver callback to deliver a pointer to whatever framebuffers are currently displayed. If it's possible to obtain such a pointer in atomic contexts. Then we'd rip out all the existing panic notifier and handler stuff in fbcon. We'd also need to disable the panic notifier fbcon registers itself (since that ends up calling down into our ->set_par which again ends up in ->set_config).
That should leave us with some very minimal panic code to audit.
Imo trying to fix the current mess and making ->set_config work in atomic contexts is pointless. drm_can_sleep is trying to make that possible in some ways, and it's horrible since using it means busy-loops in atomic contexts outside of panic handlers won't get reported any more. Also the interactions with the console_lock (which due to some bonghits is protecting almost everything in fbcon/fbdev nowadays) would also be almost completely removed.
Unfortunately doing all this is a lot of work.
OK, thanks for clarification!
The current problem I see is that the rest of panic notifier chain won't be called once when we hit the problem in KMS notifier. So, this bug in KMS influences on the rest panic behavior.
Maybe a hackish solution would be to keep KMS notifier at the end of notifier chain so that it crashes at last. I don't like this either, but...
Takashi
On Mon, May 5, 2014 at 4:48 PM, Takashi Iwai tiwai@suse.de wrote:
The current problem I see is that the rest of panic notifier chain won't be called once when we hit the problem in KMS notifier. So, this bug in KMS influences on the rest panic behavior.
Maybe a hackish solution would be to keep KMS notifier at the end of notifier chain so that it crashes at last. I don't like this either, but...
You need to do that with both the kms panic notifier in drm_fb_helper.c and with the fbcon panic notifier. And iirc there's also a console->unblank call somewhere which _also_ can end up in ->set_par. But I'm not sure anymore when exactly that one is run, I've tried hard to forget this all ;-) -Daniel
At Mon, 5 May 2014 16:52:45 +0200, Daniel Vetter wrote:
On Mon, May 5, 2014 at 4:48 PM, Takashi Iwai tiwai@suse.de wrote:
The current problem I see is that the rest of panic notifier chain won't be called once when we hit the problem in KMS notifier. So, this bug in KMS influences on the rest panic behavior.
Maybe a hackish solution would be to keep KMS notifier at the end of notifier chain so that it crashes at last. I don't like this either, but...
You need to do that with both the kms panic notifier in drm_fb_helper.c and with the fbcon panic notifier. And iirc there's also a console->unblank call somewhere which _also_ can end up in ->set_par. But I'm not sure anymore when exactly that one is run, I've tried hard to forget this all ;-)
Oh well, you're suggesting the need for Propranolol overdose ;)
Takashi
At Mon, 5 May 2014 16:52:45 +0200, Daniel Vetter wrote:
On Mon, May 5, 2014 at 4:48 PM, Takashi Iwai tiwai@suse.de wrote:
The current problem I see is that the rest of panic notifier chain won't be called once when we hit the problem in KMS notifier. So, this bug in KMS influences on the rest panic behavior.
Maybe a hackish solution would be to keep KMS notifier at the end of notifier chain so that it crashes at last. I don't like this either, but...
You need to do that with both the kms panic notifier in drm_fb_helper.c and with the fbcon panic notifier. And iirc there's also a console->unblank call somewhere which _also_ can end up in ->set_par. But I'm not sure anymore when exactly that one is run, I've tried hard to forget this all ;-)
Looking back at the code again, it seems that fbcon has no panic notifier. It has own notifier chain, but it's a private chain that isn't called by the panic. So, we can forget about fbcon, at least (I hope).
Takashi
Hi
On Tue, May 6, 2014 at 3:27 PM, Takashi Iwai tiwai@suse.de wrote:
At Mon, 5 May 2014 16:52:45 +0200, Daniel Vetter wrote:
On Mon, May 5, 2014 at 4:48 PM, Takashi Iwai tiwai@suse.de wrote:
The current problem I see is that the rest of panic notifier chain won't be called once when we hit the problem in KMS notifier. So, this bug in KMS influences on the rest panic behavior.
Maybe a hackish solution would be to keep KMS notifier at the end of notifier chain so that it crashes at last. I don't like this either, but...
You need to do that with both the kms panic notifier in drm_fb_helper.c and with the fbcon panic notifier. And iirc there's also a console->unblank call somewhere which _also_ can end up in ->set_par. But I'm not sure anymore when exactly that one is run, I've tried hard to forget this all ;-)
Looking back at the code again, it seems that fbcon has no panic notifier. It has own notifier chain, but it's a private chain that isn't called by the panic. So, we can forget about fbcon, at least (I hope).
fbcon is called through the VT or fbdev layer, which is called by bust_spinlocks(1) via either unblank_screen() or console_unblank().
Cheers David
At Tue, 6 May 2014 15:32:21 +0200, David Herrmann wrote:
Hi
On Tue, May 6, 2014 at 3:27 PM, Takashi Iwai tiwai@suse.de wrote:
At Mon, 5 May 2014 16:52:45 +0200, Daniel Vetter wrote:
On Mon, May 5, 2014 at 4:48 PM, Takashi Iwai tiwai@suse.de wrote:
The current problem I see is that the rest of panic notifier chain won't be called once when we hit the problem in KMS notifier. So, this bug in KMS influences on the rest panic behavior.
Maybe a hackish solution would be to keep KMS notifier at the end of notifier chain so that it crashes at last. I don't like this either, but...
You need to do that with both the kms panic notifier in drm_fb_helper.c and with the fbcon panic notifier. And iirc there's also a console->unblank call somewhere which _also_ can end up in ->set_par. But I'm not sure anymore when exactly that one is run, I've tried hard to forget this all ;-)
Looking back at the code again, it seems that fbcon has no panic notifier. It has own notifier chain, but it's a private chain that isn't called by the panic. So, we can forget about fbcon, at least (I hope).
fbcon is called through the VT or fbdev layer, which is called by bust_spinlocks(1) via either unblank_screen() or console_unblank().
You mean bust_spinlocks(0), right?
void __attribute__((weak)) bust_spinlocks(int yes) { if (yes) { ++oops_in_progress; } else { #ifdef CONFIG_VT unblank_screen(); #endif console_unblank(); if (--oops_in_progress == 0) wake_up_klogd(); } }
bust_spinlocks(0) is called after the notifier chain, and it's almost at the end of panic().
Takashi
Hi
On Tue, May 6, 2014 at 3:38 PM, Takashi Iwai tiwai@suse.de wrote:
At Tue, 6 May 2014 15:32:21 +0200, David Herrmann wrote:
fbcon is called through the VT or fbdev layer, which is called by bust_spinlocks(1) via either unblank_screen() or console_unblank().
You mean bust_spinlocks(0), right?
void __attribute__((weak)) bust_spinlocks(int yes) { if (yes) { ++oops_in_progress; } else { #ifdef CONFIG_VT unblank_screen(); #endif console_unblank(); if (--oops_in_progress == 0) wake_up_klogd(); } }
bust_spinlocks(0) is called after the notifier chain, and it's almost at the end of panic().
Yes, it's called _after_ the panic-handlers but _before_ console_unlock() (see console_unblank() in printk.c). Therefore, we call into set_config() before the serial drivers get the panic-message (flushed via console_unlock()). If the serial drivers (or whatever you use for debugging) register their own panic-handlers, then they're fine of course.
Thanks David
At Tue, 6 May 2014 15:53:32 +0200, David Herrmann wrote:
Hi
On Tue, May 6, 2014 at 3:38 PM, Takashi Iwai tiwai@suse.de wrote:
At Tue, 6 May 2014 15:32:21 +0200, David Herrmann wrote:
fbcon is called through the VT or fbdev layer, which is called by bust_spinlocks(1) via either unblank_screen() or console_unblank().
You mean bust_spinlocks(0), right?
void __attribute__((weak)) bust_spinlocks(int yes) { if (yes) { ++oops_in_progress; } else { #ifdef CONFIG_VT unblank_screen(); #endif console_unblank(); if (--oops_in_progress == 0) wake_up_klogd(); } }
bust_spinlocks(0) is called after the notifier chain, and it's almost at the end of panic().
Yes, it's called _after_ the panic-handlers but _before_ console_unlock() (see console_unblank() in printk.c). Therefore, we call into set_config() before the serial drivers get the panic-message (flushed via console_unlock()). If the serial drivers (or whatever you use for debugging) register their own panic-handlers, then they're fine of course.
Thanks for clarification. I see it's at the sensible place.
FWIW, the problem I'm tackling now is the blockage of other panic notifiers due to drm_fb. For example, pvpanic isn't executed reliably because of this when a KMS (either cirrus or qxl) driver is loaded. So, for me, it's fine that the system stalls after that point :)
Takashi
Imo trying to fix the current mess and making ->set_config work in atomic contexts is pointless. drm_can_sleep is trying to make that possible in some ways, and it's horrible since using it means busy-loops in atomic contexts outside of panic handlers won't get reported any more. Also the interactions with the console_lock (which due to some bonghits is protecting almost everything in fbcon/fbdev nowadays) would also be almost completely removed.
Unfortunately years ago some Finnish student didn't design his console driver to be lock friendly and then fbcon/fbdev inherited it.
If you are writing a new console driver please don't use fbcon, use a text console that writes to a simple n x m text framebuffer with dirty bits and wakes a waitqueue or work queue of some sort when it changes. Then just redraw updated bits every vblank frame if its actually changed.
Not only will the lock problems go away it'll blow away the existing scrolling text performance on any DRM driver except gma500.
Alan
dri-devel@lists.freedesktop.org