Hi
On 16.10.19 at 15:05, Pekka Paalanen wrote:
On Wed, 16 Oct 2019 00:35:39 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
Yeah, I don't think tuning the spam level will ever work. What we need is some external input (most likely from the user clicking the "my external screen doesn't work" button, or maybe the compositor realizing something that should work didn't, or some other thing that indicates trouble), and then retroactively capture all debug/informational messages leading up to doom.
But without that external "Houston, we have a problem" input, all the debug spam is really just spam and unwanted. Btw, even if we don't spam dmesg, if we enable too much we might simply have trouble with all the printk formatting work we do for nothing. So maybe we need something like trace_printk, which iirc delays the formatting until the stuff actually gets read from the log buffer. Plus trace_printk might make it clear enough that it's not stable uapi ... so maybe we do want trace_printk in the end?
Just not really looking forward to reimplementing half the tracing infrastructure just for this ...
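To make the "delay the formatting until read" idea above concrete, here is a minimal userspace sketch in Python. It is not kernel code and not how trace_printk is implemented; the class `DeferredLog` and its methods are hypothetical names chosen for illustration. The only point is that the logging call stores the format string and its arguments, and the string is built only when somebody reads the log.

```python
class DeferredLog:
    """Hypothetical log that stores (fmt, args) and formats only on read."""

    def __init__(self):
        self._records = []
        self.format_calls = 0  # counts how often formatting actually ran

    def log(self, fmt, *args):
        # Cheap path: no string formatting here, only a tuple append.
        self._records.append((fmt, args))

    def read(self):
        # Expensive path: formatting happens only when the log is read.
        lines = []
        for fmt, args in self._records:
            self.format_calls += 1
            lines.append(fmt % args)
        return lines


log = DeferredLog()
log.log("connector %d: link training failed (rate %d)", 3, 270000)
log.log("crtc %d: modeset took %d us", 0, 1250)
```

If `read()` is never called, no formatting work is done at all. One caveat of this sketch: arguments are captured by reference, so a mutable argument changed after `log()` would show its later value when read.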
Hi,
a thought about the UAPI:
Debugfs is not good because it's not supposed to be touched or even present in production, right?
I'm running Tumbleweed, where debugfs is mounted by default for root. I could live with having the user mount debugfs to get the file's content.
specifically be available in production. So a new file in some fs somewhere it should be, and userspace in production can read it at will to attach to a bug report.
Those semantics, "only use this content for attaching into a bug report" should be made very clear in the UAPI.
Has this ever worked? As soon as a userspace program starts depending on the content of this file, it becomes kernel ABI. From the incidents I know of, Linus has always been quite strict about this, even for broken interfaces.
I believe it has to be a ring buffer that is being continuously written also during normal operations, so that we don't have to ask end users to reproduce the issue again just to get some logs. Maybe the issue happens once in a fortnight. The information must be extractable after the fact, without before-hand preparations.
Agreed.
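As an illustration of the always-on ring buffer discussed above, here is a minimal userspace sketch in Python, not a proposal for the kernel implementation. The class `DebugRing`, its `capacity` parameter, and the message strings are hypothetical. It is written to continuously during normal operation, keeps only the most recent entries, and can be dumped after the fact without any preparation.

```python
from collections import deque


class DebugRing:
    """Hypothetical fixed-size log buffer that keeps only the newest entries."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry when a new one is appended
        # to a full buffer.
        self._buf = deque(maxlen=capacity)

    def log(self, msg):
        # Called all the time during normal operation.
        self._buf.append(msg)

    def snapshot(self):
        # Called only once trouble is reported; returns oldest-to-newest.
        return list(self._buf)


ring = DebugRing(capacity=4)
for i in range(10):
    ring.log(f"event {i}")
```

After ten writes into a buffer of four, `ring.snapshot()` holds only the last four messages, which is the "messages leading up to doom" behaviour asked for earlier in the thread.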
Best regards
Thomas
Thanks, pq