Hi All,
Somewhat long mail, so I've divided it into "chapters", sorry.
1. Back ground info -------------------
First some quick background, some recent Lenovo models have a builtin privacy screen which can be turned on/off in software and the kernel recently has gotten support for this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
We have been looking into adding support for this to GNOME, but the userspace API added by the above commit is very Thinkpad specific, and we would rather not rely on an userspace API which is specific to 1 series of laptops.
When we started discussing this I had already seen some versions of Rajat's "drm/i915 Support for integrated privacy screen" series:
https://patchwork.freedesktop.org/series/74650/
Which adds support for integrated privacy screens as a drm-connector property. Anyone who has been involved in the backlight brightness control saga we have with the sysfs backlight class instantly knows/feels that this is the right way to handle this.
So now we want to export the Thinkpad lcdshadow attribute as a drm-connector property.
2. Problem + Possible solutions -------------------------------
The problem is that the code to get/set the lcdshadow setting and also the code to listen for firmware (hardcoded hotkeys) triggered state changes all lives inside the thinkpad_acpi driver; and to export the lcdshadow setting as a drm property we need to access that code (and it is too big to just copy over).
One thing which makes this trickier is that all properties must be attached to the connector before it is registered, we cannot add it at a later time.
I see 3 possible solutions here:
i. Export some symbols from thinkpad_acpi and directly call these from drm_connector_create_standard_properties and other drm_connector functions if the thinkpad_acpi module is enabled. Note this should be done in the core drm_connector functions since the GPU might be one of i915 / amdgpu / nouveau. I believe that it is clear that this solution is not very elegant.
A variant of this would be to have a separate helper module (probaly a few static inlines in a .h) which exports some hooks for i915 / amdgpu / nouveau to call this way we at least keep the ugliness out of the core and keep the module-dep on thinkpad_acpi limited to the i915 / amdgpu / nouveau modules. This might actually not be too bad, except that currently the thinkpad_acpi module refuses to load on non thinkpads...
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
iii. We can add a generic framework to allow drivers outside of the drm-subsys to register something drm_property(ish) specifying a dev_name() and connector-name to which to attach the property when that connector gets created on that device.
This requires the driver registering this property to be fully loaded before the connector gets registered.
3. Picking a solution ---------------------
i. is just ugly, full stop, but also very KISS which still makes it somewhat appealing. Also i. does not scale if we get other vendor specific interfaces for interacting with these privacy screens.
ii. is ugly from an userspace API pov, if there is no "privacy-screen" then we really should not have the property at all rather then setting it to "Not Available". Still it might be an acceptable compromise I guess
iii. is/was the one I liked, until I started looking at the drm_connector code and saw that all properties must be attached before registering the connector, bummer. Then this morning I remembered that the i915 driver has a similar issue with the gvt stuff / the kvmgt module. The kvmgt module must be loaded and fully initialized before the i915 driver loads. This has been solved with module softdeps.
I think that we can do the same here having either the i915+nouveau+amdgpu drivers; or the drm-core have a softdep on thinkpad_acpi so that it can register the lcdshadow property with the to-be-written framework for externally managed props before the internal panels drm-connector gets created.
This also allows the thinkpad_acpi module_init function to return -ENODEV on non Thinkpad devices, so that it will not take up memory, where as with i. we would always need to have it in memory.
I'm currently leaning towards iii. combined with using MODULE_SOFTDEP("pre: thinkpad_acpi") to make sure the thinkpad_acpi driver can do its thing before the internal display connector gets created.
Daniel (and others) does this (iii. + softdeps) sound like something which might be acceptable (I know it will also depend on the resulting code implementing this idea) ?
Any other ideas / suggestions are appreciated here.
Regards,
Hans
On Wed, Apr 15, 2020 at 11:42 AM Hans de Goede hdegoede@redhat.com wrote:
Hi All,
Somewhat long mail, so I've divided it into "chapters", sorry.
- Back ground info
First some quick background, some recent Lenovo models have a builtin privacy screen which can be turned on/off in software and the kernel recently has gotten support for this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
We have been looking into adding support for this to GNOME, but the userspace API added by the above commit is very Thinkpad specific, and we would rather not rely on an userspace API which is specific to 1 series of laptops.
When we started discussing this I had already seen some versions of Rajat's "drm/i915 Support for integrated privacy screen" series:
https://patchwork.freedesktop.org/series/74650/
Which adds support for integrated privacy screens as a drm-connector property. Anyone who has been involved in the backlight brightness control saga we have with the sysfs backlight class instantly knows/feels that this is the right way to handle this.
So now we want to export the Thinkpad lcdshadow attribute as a drm-connector property.
- Problem + Possible solutions
The problem is that the code to get/set the lcdshadow setting and also the code to listen for firmware (hardcoded hotkeys) triggered state changes all lives inside the thinkpad_acpi driver; and to export the lcdshadow setting as a drm property we need to access that code (and it is too big to just copy over).
One thing which makes this trickier is that all properties must be attached to the connector before it is registered, we cannot add it at a later time.
I see 3 possible solutions here:
i. Export some symbols from thinkpad_acpi and directly call these from drm_connector_create_standard_properties and other drm_connector functions if the thinkpad_acpi module is enabled. Note this should be done in the core drm_connector functions since the GPU might be one of i915 / amdgpu / nouveau. I believe that it is clear that this solution is not very elegant.
A variant of this would be to have a separate helper module (probaly a few static inlines in a .h) which exports some hooks for i915 / amdgpu / nouveau to call this way we at least keep the ugliness out of the core and keep the module-dep on thinkpad_acpi limited to the i915 / amdgpu / nouveau modules. This might actually not be too bad, except that currently the thinkpad_acpi module refuses to load on non thinkpads...
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
iii. We can add a generic framework to allow drivers outside of the drm-subsys to register something drm_property(ish) specifying a dev_name() and connector-name to which to attach the property when that connector gets created on that device.
This requires the driver registering this property to be fully loaded before the connector gets registered.
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers - drm drivers look them up - if stuff isn't there yet, we delay loading with EPROBE_DEFER until the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
Wrt the actual userspace interface, I think the drm core should handle this as much as possible. Same way we let drm core handle a lot of the atomic property stuff, to make sure things are standardized. So
- add an lcdshadow pointer to struct drm_connector - implement the property glue code in drm core completely, at least for the read side - for the write side we might want to have some drm wrappers drivers can call to at the appropriate times to e.g. restore privacy screen settings when the panel gets enabled. In case that's needed.
Also one thing that the EPROBE_DEFER stuff forces us to handle correctly is to track these depencies. That's the other nightmare in backlight land, essentially you have no idea of knowing (in userspace) whether the backlight driver you want is actually loaded, resulting in lots of fun. The kernel is the only thing that can know, and for hw that's built-in there's really no excuse to not know. So a model where stuff gets assembled after drm_dev_register() is imo just plain buggy.
This means that the lcdshadow subsystem needs to have some idea of whether it needs a driver, so that it can correctly differentiate between EPROBE_DEFER and ENOENT error cases. In the latter the driver should continue loading ofc.
Cheers, Daniel
- Picking a solution
i. is just ugly, full stop, but also very KISS which still makes it somewhat appealing. Also i. does not scale if we get other vendor specific interfaces for interacting with these privacy screens.
ii. is ugly from an userspace API pov, if there is no "privacy-screen" then we really should not have the property at all rather then setting it to "Not Available". Still it might be an acceptable compromise I guess
iii. is/was the one I liked, until I started looking at the drm_connector code and saw that all properties must be attached before registering the connector, bummer. Then this morning I remembered that the i915 driver has a similar issue with the gvt stuff / the kvmgt module. The kvmgt module must be loaded and fully initialized before the i915 driver loads. This has been solved with module softdeps.
I think that we can do the same here having either the i915+nouveau+amdgpu drivers; or the drm-core have a softdep on thinkpad_acpi so that it can register the lcdshadow property with the to-be-written framework for externally managed props before the internal panels drm-connector gets created.
This also allows the thinkpad_acpi module_init function to return -ENODEV on non Thinkpad devices, so that it will not take up memory, where as with i. we would always need to have it in memory.
I'm currently leaning towards iii. combined with using MODULE_SOFTDEP("pre: thinkpad_acpi") to make sure the thinkpad_acpi driver can do its thing before the internal display connector gets created.
Daniel (and others) does this (iii. + softdeps) sound like something which might be acceptable (I know it will also depend on the resulting code implementing this idea) ?
Any other ideas / suggestions are appreciated here.
Regards,
Hans
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 11:42 AM Hans de Goede hdegoede@redhat.com wrote:
Hi All,
Somewhat long mail, so I've divided it into "chapters", sorry.
- Back ground info
First some quick background, some recent Lenovo models have a builtin privacy screen which can be turned on/off in software and the kernel recently has gotten support for this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
We have been looking into adding support for this to GNOME, but the userspace API added by the above commit is very Thinkpad specific, and we would rather not rely on an userspace API which is specific to 1 series of laptops.
When we started discussing this I had already seen some versions of Rajat's "drm/i915 Support for integrated privacy screen" series:
https://patchwork.freedesktop.org/series/74650/
Which adds support for integrated privacy screens as a drm-connector property. Anyone who has been involved in the backlight brightness control saga we have with the sysfs backlight class instantly knows/feels that this is the right way to handle this.
So now we want to export the Thinkpad lcdshadow attribute as a drm-connector property.
- Problem + Possible solutions
The problem is that the code to get/set the lcdshadow setting and also the code to listen for firmware (hardcoded hotkeys) triggered state changes all lives inside the thinkpad_acpi driver; and to export the lcdshadow setting as a drm property we need to access that code (and it is too big to just copy over).
One thing which makes this trickier is that all properties must be attached to the connector before it is registered, we cannot add it at a later time.
I see 3 possible solutions here:
i. Export some symbols from thinkpad_acpi and directly call these from drm_connector_create_standard_properties and other drm_connector functions if the thinkpad_acpi module is enabled. Note this should be done in the core drm_connector functions since the GPU might be one of i915 / amdgpu / nouveau. I believe that it is clear that this solution is not very elegant.
A variant of this would be to have a separate helper module (probaly a few static inlines in a .h) which exports some hooks for i915 / amdgpu / nouveau to call this way we at least keep the ugliness out of the core and keep the module-dep on thinkpad_acpi limited to the i915 / amdgpu / nouveau modules. This might actually not be too bad, except that currently the thinkpad_acpi module refuses to load on non thinkpads...
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
iii. We can add a generic framework to allow drivers outside of the drm-subsys to register something drm_property(ish) specifying a dev_name() and connector-name to which to attach the property when that connector gets created on that device.
This requires the driver registering this property to be fully loaded before the connector gets registered.
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers
- drm drivers look them up
- if stuff isn't there yet, we delay loading with EPROBE_DEFER until
the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Wrt the actual userspace interface, I think the drm core should handle this as much as possible. Same way we let drm core handle a lot of the atomic property stuff, to make sure things are standardized.
Agreed.
So
- add an lcdshadow pointer to struct drm_connector
- implement the property glue code in drm core completely, at least
for the read side
- for the write side we might want to have some drm wrappers drivers
can call to at the appropriate times to e.g. restore privacy screen settings when the panel gets enabled. In case that's needed.
Also one thing that the EPROBE_DEFER stuff forces us to handle correctly is to track these depencies. That's the other nightmare in backlight land, essentially you have no idea of knowing (in userspace) whether the backlight driver you want is actually loaded, resulting in lots of fun. The kernel is the only thing that can know, and for hw that's built-in there's really no excuse to not know. So a model where stuff gets assembled after drm_dev_register() is imo just plain buggy.
This means that the lcdshadow subsystem needs to have some idea of whether it needs a driver, so that it can correctly differentiate between EPROBE_DEFER and ENOENT error cases. In the latter the driver should continue loading ofc.
Right, so how would the lcdshadow subsystem do this? The only way would be for it to say try and modprobe thinkpad_acpi from its core init function (if thinkpad_acpi is enabled). IOW it is the usual x86 mess. I guess we could have something like this in a theoretical to be added lcdshadow subsystem:
/* Add comment explaining why we need this messy stuff here */ const char * const shadow_providers[] = { #ifdef CONFIG_THINKPAD_ACPI_MODULE "thinkpad_acpi", #endif #ifdef CONFIG_OTHER_MODULE "other", #endif NULL };
int module_init(void) { /* do actual setup of the ?class? */
for (i = 0; shadow_providers[i]; i++) request_module(shadow_providers[i]);
return 0; }
And then simply have the function which gets a lcd_shadow provider provide -ENOENT if there are none.
One problem here is that this will work for modules since the drm-core will depend on modules from the lcdshadow-core, so the lcdshadow-core module will get loaded first and will then try to load thinkpad_acpi, which will return with -ENODEV from its module_init on non ThinkPads. We could even put the request_module loop in some other init_once function so that things will also work when the lcdshadow-core is builtin.
But how do we ensure that thinkpad_acpi will get to register its lcdshadow before say i915 gets probed if everything is builtin?
I guess my SOFTDEP solution has the same issue though. Do you know has this is dealt with for kvmgt ?
Regards,
Hans
- Picking a solution
i. is just ugly, full stop, but also very KISS which still makes it somewhat appealing. Also i. does not scale if we get other vendor specific interfaces for interacting with these privacy screens.
ii. is ugly from an userspace API pov, if there is no "privacy-screen" then we really should not have the property at all rather then setting it to "Not Available". Still it might be an acceptable compromise I guess
iii. is/was the one I liked, until I started looking at the drm_connector code and saw that all properties must be attached before registering the connector, bummer. Then this morning I remembered that the i915 driver has a similar issue with the gvt stuff / the kvmgt module. The kvmgt module must be loaded and fully initialized before the i915 driver loads. This has been solved with module softdeps.
I think that we can do the same here having either the i915+nouveau+amdgpu drivers; or the drm-core have a softdep on thinkpad_acpi so that it can register the lcdshadow property with the to-be-written framework for externally managed props before the internal panels drm-connector gets created.
This also allows the thinkpad_acpi module_init function to return -ENODEV on non Thinkpad devices, so that it will not take up memory, where as with i. we would always need to have it in memory.
I'm currently leaning towards iii. combined with using MODULE_SOFTDEP("pre: thinkpad_acpi") to make sure the thinkpad_acpi driver can do its thing before the internal display connector gets created.
Daniel (and others) does this (iii. + softdeps) sound like something which might be acceptable (I know it will also depend on the resulting code implementing this idea) ?
Any other ideas / suggestions are appreciated here.
Regards,
Hans
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 11:42 AM Hans de Goede hdegoede@redhat.com wrote:
Hi All,
Somewhat long mail, so I've divided it into "chapters", sorry.
- Back ground info
First some quick background, some recent Lenovo models have a builtin privacy screen which can be turned on/off in software and the kernel recently has gotten support for this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
We have been looking into adding support for this to GNOME, but the userspace API added by the above commit is very Thinkpad specific, and we would rather not rely on an userspace API which is specific to 1 series of laptops.
When we started discussing this I had already seen some versions of Rajat's "drm/i915 Support for integrated privacy screen" series:
https://patchwork.freedesktop.org/series/74650/
Which adds support for integrated privacy screens as a drm-connector property. Anyone who has been involved in the backlight brightness control saga we have with the sysfs backlight class instantly knows/feels that this is the right way to handle this.
So now we want to export the Thinkpad lcdshadow attribute as a drm-connector property.
- Problem + Possible solutions
The problem is that the code to get/set the lcdshadow setting and also the code to listen for firmware (hardcoded hotkeys) triggered state changes all lives inside the thinkpad_acpi driver; and to export the lcdshadow setting as a drm property we need to access that code (and it is too big to just copy over).
One thing which makes this trickier is that all properties must be attached to the connector before it is registered, we cannot add it at a later time.
I see 3 possible solutions here:
i. Export some symbols from thinkpad_acpi and directly call these from drm_connector_create_standard_properties and other drm_connector functions if the thinkpad_acpi module is enabled. Note this should be done in the core drm_connector functions since the GPU might be one of i915 / amdgpu / nouveau. I believe that it is clear that this solution is not very elegant.
A variant of this would be to have a separate helper module (probaly a few static inlines in a .h) which exports some hooks for i915 / amdgpu / nouveau to call this way we at least keep the ugliness out of the core and keep the module-dep on thinkpad_acpi limited to the i915 / amdgpu / nouveau modules. This might actually not be too bad, except that currently the thinkpad_acpi module refuses to load on non thinkpads...
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
iii. We can add a generic framework to allow drivers outside of the drm-subsys to register something drm_property(ish) specifying a dev_name() and connector-name to which to attach the property when that connector gets created on that device.
This requires the driver registering this property to be fully loaded before the connector gets registered.
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers
- drm drivers look them up
- if stuff isn't there yet, we delay loading with EPROBE_DEFER until
the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
This need for an in-kernel source of truth for "which backlight, if any, do I need" is why the backlight stuff never got fixed. It's a really hard problem, and doing the table flip and just letting userspace deal with the mess at least avoids having to face the fact that the kernel is totally failing here. It's made worse for backlight because of 20 years of hacked up systems that "work", and we can't regress any of them.
I really want to avoid that situation for anything new like lcdshadow.
Wrt the actual userspace interface, I think the drm core should handle this as much as possible. Same way we let drm core handle a lot of the atomic property stuff, to make sure things are standardized.
Agreed.
So
- add an lcdshadow pointer to struct drm_connector
- implement the property glue code in drm core completely, at least
for the read side
- for the write side we might want to have some drm wrappers drivers
can call to at the appropriate times to e.g. restore privacy screen settings when the panel gets enabled. In case that's needed.
Also one thing that the EPROBE_DEFER stuff forces us to handle correctly is to track these depencies. That's the other nightmare in backlight land, essentially you have no idea of knowing (in userspace) whether the backlight driver you want is actually loaded, resulting in lots of fun. The kernel is the only thing that can know, and for hw that's built-in there's really no excuse to not know. So a model where stuff gets assembled after drm_dev_register() is imo just plain buggy.
This means that the lcdshadow subsystem needs to have some idea of whether it needs a driver, so that it can correctly differentiate between EPROBE_DEFER and ENOENT error cases. In the latter the driver should continue loading ofc.
Right, so how would the lcdshadow subsystem do this? The only way would be for it to say try and modprobe thinkpad_acpi from its core init function (if thinkpad_acpi is enabled). IOW it is the usual x86 mess. I guess we could have something like this in a theoretical to be added lcdshadow subsystem:
/* Add comment explaining why we need this messy stuff here */ const char * const shadow_providers[] = { #ifdef CONFIG_THINKPAD_ACPI_MODULE "thinkpad_acpi", #endif #ifdef CONFIG_OTHER_MODULE "other", #endif NULL };
int module_init(void) { /* do actual setup of the ?class? */
for (i = 0; shadow_providers[i]; i++) request_module(shadow_providers[i]); return 0;
}
Hm I think explicitly loading drivers feels very much not device model like. Don't these drivers auto-load, matching on acpi functions? I guess if that doesn't exist, then we'd need to fix that one first :-/ In general no request_module please, that shouldn't be needed either.
The trouble with request_module is also that (afaiui) it doesn't really work well with parallel module load and all that, for EPROBE_DEFER to work we do need to be able to answer "should we have a driver?" independently of whether that driver has loaded already or not.
And then simply have the function which gets a lcd_shadow provider provide -ENOENT if there are none.
One problem here is that this will work for modules since the drm-core will depend on modules from the lcdshadow-core, so the lcdshadow-core module will get loaded first and will then try to load thinkpad_acpi, which will return with -ENODEV from its module_init on non ThinkPads. We could even put the request_module loop in some other init_once function so that things will also work when the lcdshadow-core is builtin.
But how do we ensure that thinkpad_acpi will get to register its lcdshadow before say i915 gets probed if everything is builtin?
EPROBE_DEFER. Everyone hates it, but it does work. It's really kinda neat magic. The only pain point is that you might need to wire EPROBE_DEFER through a few layers in i915, but we do have a lot of that in place already, plus we have solid error-injecting load error tests in igt. So that part shouldn't be a big deal.
I guess my SOFTDEP solution has the same issue though. Do you know has this is dealt with for kvmgt ?
kvmgt? What do you mean there? kvmgt is just a user of i915-gem, if you enable it in the config (and module option too iirc) then it works more like an additional uapi layer, similar to how we handle fbdev emulation. So totally different scenario, since the main driver is 100% in control of the situation and controls the register/unregister lifetime cycles completely. There's no indepedent part somewhere else for kvmgt or fbdev emulation.
Or I'm totally misunderstanding what you mean with this example here.
Cheers, Daniel
Regards,
Hans
- Picking a solution
i. is just ugly, full stop, but also very KISS which still makes it somewhat appealing. Also i. does not scale if we get other vendor specific interfaces for interacting with these privacy screens.
ii. is ugly from an userspace API pov, if there is no "privacy-screen" then we really should not have the property at all rather then setting it to "Not Available". Still it might be an acceptable compromise I guess
iii. is/was the one I liked, until I started looking at the drm_connector code and saw that all properties must be attached before registering the connector, bummer. Then this morning I remembered that the i915 driver has a similar issue with the gvt stuff / the kvmgt module. The kvmgt module must be loaded and fully initialized before the i915 driver loads. This has been solved with module softdeps.
I think that we can do the same here having either the i915+nouveau+amdgpu drivers; or the drm-core have a softdep on thinkpad_acpi so that it can register the lcdshadow property with the to-be-written framework for externally managed props before the internal panels drm-connector gets created.
This also allows the thinkpad_acpi module_init function to return -ENODEV on non Thinkpad devices, so that it will not take up memory, where as with i. we would always need to have it in memory.
I'm currently leaning towards iii. combined with using MODULE_SOFTDEP("pre: thinkpad_acpi") to make sure the thinkpad_acpi driver can do its thing before the internal display connector gets created.
Daniel (and others) does this (iii. + softdeps) sound like something which might be acceptable (I know it will also depend on the resulting code implementing this idea) ?
Any other ideas / suggestions are appreciated here.
Regards,
Hans
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers
- drm drivers look them up
- if stuff isn't there yet, we delay loading with EPROBE_DEFER until
the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
1. Where should that match-function live ?
2. An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
This need for an in-kernel source of truth for "which backlight, if any, do I need" is why the backlight stuff never got fixed. It's a really hard problem, and doing the table flip and just letting userspace deal with the mess at least avoids having to face the fact that the kernel is totally failing here. It's made worse for backlight because of 20 years of hacked up systems that "work", and we can't regress any of them.
Right, as discussed during last plumbers Christian Kellner and I have written a plan to slowly resolve this. Unfortunately Christian has not found the time to work on this yet.
I really want to avoid that situation for anything new like lcdshadow.
Ack.
Wrt the actual userspace interface, I think the drm core should handle this as much as possible. Same way we let drm core handle a lot of the atomic property stuff, to make sure things are standardized.
Agreed.
So
- add an lcdshadow pointer to struct drm_connector
- implement the property glue code in drm core completely, at least
for the read side
- for the write side we might want to have some drm wrappers drivers
can call to at the appropriate times to e.g. restore privacy screen settings when the panel gets enabled. In case that's needed.
Also one thing that the EPROBE_DEFER stuff forces us to handle correctly is to track these depencies. That's the other nightmare in backlight land, essentially you have no idea of knowing (in userspace) whether the backlight driver you want is actually loaded, resulting in lots of fun. The kernel is the only thing that can know, and for hw that's built-in there's really no excuse to not know. So a model where stuff gets assembled after drm_dev_register() is imo just plain buggy.
This means that the lcdshadow subsystem needs to have some idea of whether it needs a driver, so that it can correctly differentiate between EPROBE_DEFER and ENOENT error cases. In the latter the driver should continue loading ofc.
Right, so how would the lcdshadow subsystem do this? The only way would be for it to say try and modprobe thinkpad_acpi from its core init function (if thinkpad_acpi is enabled). IOW it is the usual x86 mess. I guess we could have something like this in a theoretical to be added lcdshadow subsystem:
/* Add comment explaining why we need this messy stuff here */ const char * const shadow_providers[] = { #ifdef CONFIG_THINKPAD_ACPI_MODULE "thinkpad_acpi", #endif #ifdef CONFIG_OTHER_MODULE "other", #endif NULL };
int module_init(void) { /* do actual setup of the ?class? */
for (i = 0; shadow_providers[i]; i++) request_module(shadow_providers[i]); return 0;
}
Hm I think explicitly loading drivers feels very much not device model like. Don't these drivers auto-load, matching on acpi functions?
thinkpad_acpi does autoload based on a number of ACPI device-ids, the idea behind the above request_module is to avoid the need to have the acpi-match function you mentioned above.
Basically what would happen is e.g. :
1. i915 loads, calls lcdshadow_get(dev, "eDP-1"); 2. if this is the first lcdshadow_get() call then the lcdshadow core will do the request_module calls, allowing any of these modules to get loaded + probed and call e.g. lcdshadow_register(&mylcdshadowdev, <gfx-adapter-dev-name>, "eDP-1"); 3. After the request modules the lcdshadow_get() will walk over the list of registered devices, including the ones just registered by the request_module calls and then hopefully find a match
So by doing the request-module calls before checking for a matching lcdshadow dev, we avoid the need of having some of the knowledge currently abstracted away in the thinkpad_acpi driver duplicated inside the drm code somewhere.
I guess if that doesn't exist, then we'd need to fix that one first :-/ In general no request_module please, that shouldn't be needed either.
The trouble with request_module is also that (afaiui) it doesn't really work well with parallel module load and all that, for EPROBE_DEFER to work we do need to be able to answer "should we have a driver?" independently of whether that driver has loaded already or not.
The idea here is to avoid using EPROBE_DEFER (on x86 at least) and either directly return the lcdshadow_dev or ENOENT. Also see below.
And then simply have the function which gets a lcd_shadow provider provide -ENOENT if there are none.
One problem here is that this will work for modules since the drm-core will depend on modules from the lcdshadow-core, so the lcdshadow-core module will get loaded first and will then try to load thinkpad_acpi, which will return with -ENODEV from its module_init on non ThinkPads. We could even put the request_module loop in some other init_once function so that things will also work when the lcdshadow-core is builtin.
But how do we ensure that thinkpad_acpi will get to register its lcdshadow before say i915 gets probed if everything is builtin?
EPROBE_DEFER. Everyone hates it, but it does work. It's really kinda neat magic. The only pain point is that you might need to wire EPROBE_DEFER through a few layers in i915, but we do have a lot of that in place already,
AFAIK we do not have any EPROBE_DEFER handling in i915 in place at all! There are _0_ checks for an EPROBE_DEFER return in all of the i915 code. We actually have a similar problem with backlight control when controlled by an external PWM controller such as the PWM controller of the Crystal Cove PMIC found on some BYT/CHT drivers or the PWM controller found in the LPSS bits of BYT/CHT SoCs.
Since again we lack a clear hardware model like device tree, we use lookup tables for the (external) PWM controllers on x86, see the pwm_add_table calls in drivers/acpi/acpi_lpss.c and drivers/mfd/intel_soc_pmic_core.c and the pwm_get call in the i915 code simply continues on its happy way without backlight control if the pwm_get fails, rather then dealing with -EPROBE_DEFER. I looked into untangling this back then but the i915 code really is not ready to unroll everything it has done already once we are probing connectors.
I actually "fixed" the PWM issue by:
1. Adding a module-name field to the PWM lookup table registered by the pwm_add_table calls.
2. Have the PWM core call request_module on that module_name if it finds a match in the registered lookup_table (which get registered early on), but the matching pwm_dev has not been registered yet.
So I agree with you that ideally i915 would handle EPROBE_DEFER but atm AFAIK it really does not handle that at all and we are pretty far away from getting to a point where it does handle that.
Assuming we are going to add some device/model specific lcdshadow knowledge inside the lcdshadow core as you suggested with your "small acpi match function" above, we could do something similar to what the vga_switcheroo code is doing for this and have a lcdshadow_defer_probe() helper and call that really early in i915_pci_probe(), which currently already has this for the vgaswitcheroo case:
if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER;
The reason why I suggested the request_module approach is because as said it will allow lcdshadow_get() to return either a device or -ENOENT and never -EPROBE_DEFER and currently none of the i915 code, nor the nouveau code nor the amdgpu code has any EPROBE_DEFER checks. They all assume that once there probe function is called they can continue on forward without unrolling and exiting from probe with EPROBE_DEFER ever. I agree that that is somewhat of a bad assumption now a days but fixing that is going to be massive undertakig and I'm afraid that trying to deal with it will lead to many many regressions.
So I would rather work around it by using request_module.
plus we have solid error-injecting load error tests in igt. So that part shouldn't be a big deal.
I guess my SOFTDEP solution has the same issue though. Do you know has this is dealt with for kvmgt ?
kvmgt? What do you mean there? kvmgt is just a user of i915-gem, if you enable it in the config (and module option too iirc) then it works more like an additional uapi layer, similar to how we handle fbdev emulation. So totally different scenario, since the main driver is 100% in control of the situation and controls the register/unregister lifetime cycles completely. There's no indepedent part somewhere else for kvmgt or fbdev emulation.
Or I'm totally misunderstanding what you mean with this example here.
The i915 driver used to have a:
MODULE_SOFTDEP("pre: kvmgt")
But it seems that that has been replaced with some mechanism or maybe the need for kvmgt to be loaded first has been removed/
Regards,
Hans
Hi,
On 4/15/20 1:39 PM, Hans de Goede wrote:
<snip>
/* Add comment explaining why we need this messy stuff here */ const char * const shadow_providers[] = { #ifdef CONFIG_THINKPAD_ACPI_MODULE "thinkpad_acpi", #endif #ifdef CONFIG_OTHER_MODULE "other", #endif NULL };
int module_init(void) { /* do actual setup of the ?class? */
for (i = 0; shadow_providers[i]; i++) request_module(shadow_providers[i]);
return 0; }
Hm I think explicitly loading drivers feels very much not device model like. Don't these drivers auto-load, matching on acpi functions?
thinkpad_acpi does autoload based on a number of ACPI device-ids, the idea behind the above request_module is to avoid the need to have the acpi-match function you mentioned above.
Basically what would happen is e.g. :
- i915 loads, calls lcdshadow_get(dev, "eDP-1");
- if this is the first lcdshadow_get() call then
the lcdshadow core will do the request_module calls, allowing any of these modules to get loaded + probed and call e.g. lcdshadow_register(&mylcdshadowdev, <gfx-adapter-dev-name>, "eDP-1"); 3. After the request modules the lcdshadow_get() will walk over the list of registered devices, including the ones just registered by the request_module calls and then hopefully find a match
So by doing the request-module calls before checking for a matching lcdshadow dev, we avoid the need of having some of the knowledge currently abstracted away in the thinkpad_acpi driver duplicated inside the drm code somewhere.
I guess if that doesn't exist, then we'd need to fix that one first :-/ In general no request_module please, that shouldn't be needed either.
The trouble with request_module is also that (afaiui) it doesn't really work well with parallel module load and all that, for EPROBE_DEFER to work we do need to be able to answer "should we have a driver?" independently of whether that driver has loaded already or not.
The idea here is to avoid using EPROBE_DEFER (on x86 at least) and either directly return the lcdshadow_dev or ENOENT. Also see below.
<snip>
Assuming we are going to add some device/model specific lcdshadow knowledge inside the lcdshadow core as you suggested with your "small acpi match function" above, we could do something similar to what the vga_switcheroo code is doing for this and have a lcdshadow_defer_probe() helper and call that really early in i915_pci_probe(), which currently already has this for the vgaswitcheroo case:
if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER;
So thinking more about this and given the total lack of EPROBE_DEFER handling in the 3 major X86 GPU/kms drivers I think that adding a lcdshadow_defer_probe() helper is the way to go. This will also avoid the need for duplicating the lcdshadow detect functionality in the small ACPI-match functions you mentioned (although that might still be interesting to speedup the boot).
When everything is builtin then each enabled "module"-s module_init function will get called, we can call a lcdshadow_probe_done("module-name") function from those and the lcdshadow core can then track if all potential lcdhadow providers have initialized before it stops returning non 0 from lcdshadow_defer_probe().
Or if we still do the small match functions it could be even smarter with this...
And for the modular case it can call request_module on all (enabled as module) potential lcdhadow providers (or again we could rely on the small match function instead).
Then (on x86 at least) we can have lcdshadow_get never return -EPROBE_DEFER and avoid the need to solve the lack of EPROBE_DEFER support in the 3 major x86 drivers.
And this is all kernel internal, so if that lack of EPROBE_DEFER support ever gets fixed then we can drop the lcdshadow_defer_probe() hack and make lcdshadow_get also return -EPROBE_DEFER on x86 in some cases.
Regards,
Hans
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers
- drm drivers look them up
- if stuff isn't there yet, we delay loading with EPROBE_DEFER until
the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
Where should that match-function live ?
An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
I'd also lean towards putting lcdshadow headers/interfaces into drivers/gpu, similar to how backlight (well that one is in drivers/video for historical reasons) with driver implementations all over.
This need for an in-kernel source of truth for "which backlight, if any, do I need" is why the backlight stuff never got fixed. It's a really hard problem, and doing the table flip and just letting userspace deal with the mess at least avoids having to face the fact that the kernel is totally failing here. It's made worse for backlight because of 20 years of hacked up systems that "work", and we can't regress any of them.
Right, as discussed during last plumbers Christian Kellner and I have written a plan to slowly resolve this. Unfortunately Christian has not found the time to work on this yet.
Yeah, this is hopefully going to be really nice for acpi systems. DT stuff is hopefully simpler to fix since at least the discovery part is already all there, just not formalized.
Also given that this has been hurting for years I wouldn't worry about a few months here and there. -Daniel
I really want to avoid that situation for anything new like lcdshadow.
Ack.
Wrt the actual userspace interface, I think the drm core should handle this as much as possible. Same way we let drm core handle a lot of the atomic property stuff, to make sure things are standardized.
Agreed.
So
- add an lcdshadow pointer to struct drm_connector
- implement the property glue code in drm core completely, at least
for the read side
- for the write side we might want to have some drm wrappers drivers
can call to at the appropriate times to e.g. restore privacy screen settings when the panel gets enabled. In case that's needed.
Also one thing that the EPROBE_DEFER stuff forces us to handle correctly is to track these depencies. That's the other nightmare in backlight land, essentially you have no idea of knowing (in userspace) whether the backlight driver you want is actually loaded, resulting in lots of fun. The kernel is the only thing that can know, and for hw that's built-in there's really no excuse to not know. So a model where stuff gets assembled after drm_dev_register() is imo just plain buggy.
This means that the lcdshadow subsystem needs to have some idea of whether it needs a driver, so that it can correctly differentiate between EPROBE_DEFER and ENOENT error cases. In the latter the driver should continue loading ofc.
Right, so how would the lcdshadow subsystem do this? The only way would be for it to say try and modprobe thinkpad_acpi from its core init function (if thinkpad_acpi is enabled). IOW it is the usual x86 mess. I guess we could have something like this in a theoretical to be added lcdshadow subsystem:
/* Add comment explaining why we need this messy stuff here */ const char * const shadow_providers[] = { #ifdef CONFIG_THINKPAD_ACPI_MODULE "thinkpad_acpi", #endif #ifdef CONFIG_OTHER_MODULE "other", #endif NULL };
int module_init(void) { /* do actual setup of the ?class? */
for (i = 0; shadow_providers[i]; i++) request_module(shadow_providers[i]); return 0;
}
Hm I think explicitly loading drivers feels very much not device model like. Don't these drivers auto-load, matching on acpi functions?
thinkpad_acpi does autoload based on a number of ACPI device-ids, the idea behind the above request_module is to avoid the need to have the acpi-match function you mentioned above.
Basically what would happen is e.g. :
- i915 loads, calls lcdshadow_get(dev, "eDP-1");
- if this is the first lcdshadow_get() call then the lcdshadow core will do the request_module calls, allowing any of these modules to get loaded + probed and call e.g. lcdshadow_register(&mylcdshadowdev, <gfx-adapter-dev-name>, "eDP-1");
- After the request modules the lcdshadow_get() will walk over the list of registered devices, including the ones just registered by the request_module calls and then hopefully find a match
So by doing the request-module calls before checking for a matching lcdshadow dev, we avoid the need of having some of the knowledge currently abstracted away in the thinkpad_acpi driver duplicated inside the drm code somewhere.
I guess if that doesn't exist, then we'd need to fix that one first :-/ In general no request_module please, that shouldn't be needed either.
The trouble with request_module is also that (afaiui) it doesn't really work well with parallel module load and all that, for EPROBE_DEFER to work we do need to be able to answer "should we have a driver?" independently of whether that driver has loaded already or not.
The idea here is to avoid using EPROBE_DEFER (on x86 at least) and either directly return the lcdshadow_dev or ENOENT. Also see below.
And then simply have the function which gets a lcd_shadow provider provide -ENOENT if there are none.
One problem here is that this will work for modules since the drm-core will depend on modules from the lcdshadow-core, so the lcdshadow-core module will get loaded first and will then try to load thinkpad_acpi, which will return with -ENODEV from its module_init on non ThinkPads. We could even put the request_module loop in some other init_once function so that things will also work when the lcdshadow-core is builtin.
But how do we ensure that thinkpad_acpi will get to register its lcdshadow before say i915 gets probed if everything is builtin?
EPROBE_DEFER. Everyone hates it, but it does work. It's really kinda neat magic. The only pain point is that you might need to wire EPROBE_DEFER through a few layers in i915, but we do have a lot of that in place already,
AFAIK we do not have any EPROBE_DEFER handling in i915 in place at all! There are _0_ checks for an EPROBE_DEFER return in all of the i915 code. We actually have a similar problem with backlight control when controlled by an external PWM controller such as the PWM controller of the Crystal Cove PMIC found on some BYT/CHT drivers or the PWM controller found in the LPSS bits of BYT/CHT SoCs.
Since again we lack a clear hardware model like device tree, we use lookup tables for the (external) PWM controllers on x86, see the pwm_add_table calls in drivers/acpi/acpi_lpss.c and drivers/mfd/intel_soc_pmic_core.c and the pwm_get call in the i915 code simply continues on its happy way without backlight control if the pwm_get fails, rather then dealing with -EPROBE_DEFER. I looked into untangling this back then but the i915 code really is not ready to unroll everything it has done already once we are probing connectors.
I actually "fixed" the PWM issue by:
- Adding a module-name field to the PWM lookup table
registered by the pwm_add_table calls.
- Have the PWM core call request_module on that module_name
if it finds a match in the registered lookup_table (which get registered early on), but the matching pwm_dev has not been registered yet.
So I agree with you that ideally i915 would handle EPROBE_DEFER but atm AFAIK it really does not handle that at all and we are pretty far away from getting to a point where it does handle that.
Assuming we are going to add some device/model specific lcdshadow knowledge inside the lcdshadow core as you suggested with your "small acpi match function" above, we could do something similar to what the vga_switcheroo code is doing for this and have a lcdshadow_defer_probe() helper and call that really early in i915_pci_probe(), which currently already has this for the vgaswitcheroo case:
if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER;
The reason why I suggested the request_module approach is because as said it will allow lcdshadow_get() to return either a device or -ENOENT and never -EPROBE_DEFER and currently none of the i915 code, nor the nouveau code nor the amdgpu code has any EPROBE_DEFER checks. They all assume that once there probe function is called they can continue on forward without unrolling and exiting from probe with EPROBE_DEFER ever. I agree that that is somewhat of a bad assumption now a days but fixing that is going to be massive undertakig and I'm afraid that trying to deal with it will lead to many many regressions.
So I would rather work around it by using request_module.
plus we have solid error-injecting load error tests in igt. So that part shouldn't be a big deal.
I guess my SOFTDEP solution has the same issue though. Do you know has this is dealt with for kvmgt ?
kvmgt? What do you mean there? kvmgt is just a user of i915-gem, if you enable it in the config (and module option too iirc) then it works more like an additional uapi layer, similar to how we handle fbdev emulation. So totally different scenario, since the main driver is 100% in control of the situation and controls the register/unregister lifetime cycles completely. There's no indepedent part somewhere else for kvmgt or fbdev emulation.
Or I'm totally misunderstanding what you mean with this example here.
The i915 driver used to have a:
MODULE_SOFTDEP("pre: kvmgt")
But it seems that that has been replaced with some mechanism or maybe the need for kvmgt to be loaded first has been removed/
Regards,
Hans
Hi,
On 4/15/20 2:01 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers
- drm drivers look them up
- if stuff isn't there yet, we delay loading with EPROBE_DEFER until
the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
Where should that match-function live ?
An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
Ok, note I plan to only implement the acpi version for now, I do expect some non ACPI/x86 devices to show up with his feature eventually but I believe it is best to implement this once those actually show up. Esp. since this will also involve adding some devicetree bindings for this.
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
Right, so my latest idea on that is indeed a high-level lcdshadow_get() which takes a struct device * and a connector-name and which never returns EPROBE_DEFER.
As for leaving things to the higher levels to answer, as explained in my other follow-up email I think that we should probably add a lcdshadow_probe_defer() helper for this and call that early on in the PCI-driver probe functions for the 3 major x86 GPU drivers. Does that sound ok to you?
I'd also lean towards putting lcdshadow headers/interfaces into drivers/gpu,
Ack, I think we should even make this drm specific and prefix it with drm_ so that we get drm_lcdshadow_foo as functions, just to make clear that this is maintained together with the other drm bits.
But my question about "where should this live" was mainly about the light weight match helpers you suggested to use to figure out if the device supports lcdshadow at all (and we thus should wait for a driver) or not. E.g. I can see us adding a:
drivers/gpu/drm/drm_lcdshadow.c
file for the core bits and then maybe a:
drivers/gpu/drm/drm_lcdshadow_detect.c
file with the light weight match helpers, with each helper wrapped in #if IS_ENABLED(CONFIG_THINKPAD_ACPI), etc. ?
with driver implementations all over.
Ack.
Regards,
Hans
On Wed, Apr 15, 2020 at 03:02:53PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 2:01 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
iv. What every SoC subsystem does:
- lcdshadow drivers register drivers
- drm drivers look them up
- if stuff isn't there yet, we delay loading with EPROBE_DEFER until
the entire thing is assembled.
That's what we're doing already for other standardized components like drm_bridge or drm_panel, and I think that's also the right approach for backlight and anything else like that. Hand-rolling our own EPROBE_DEFER handling, or some other duct-tape monsters imo just leads to real pain. Also, with EPROBE_DEFER we have one standard way of building a driver from component, which spans subsystems and is also the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
Where should that match-function live ?
An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
Ok, note I plan to only implement the acpi version for now, I do expect some non ACPI/x86 devices to show up with his feature eventually but I believe it is best to implement this once those actually show up. Esp. since this will also involve adding some devicetree bindings for this.
ofc, just wanted to lay out the entire thing. The DT version needs some good bikeshed on the dt schema first anyway (so that the helper can decode that directly).
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
Right, so my latest idea on that is indeed a high-level lcdshadow_get() which takes a struct device * and a connector-name and which never returns EPROBE_DEFER.
As for leaving things to the higher levels to answer, as explained in my other follow-up email I think that we should probably add a lcdshadow_probe_defer() helper for this and call that early on in the PCI-driver probe functions for the 3 major x86 GPU drivers. Does that sound ok to you?
Uh ... not pretty. There's still a lifetime problem that strictly speaking there's nothing stopping the other driver from getting unloaded between your _probe_defer and the subsequent _get. I think fixing this properly (and screaming a bit at the error code, oh well) is better.
I'd also lean towards putting lcdshadow headers/interfaces into drivers/gpu,
Ack, I think we should even make this drm specific and prefix it with drm_ so that we get drm_lcdshadow_foo as functions, just to make clear that this is maintained together with the other drm bits.
I want to avoid the impression that I'm on an evil plan to take over the entire world, but personally very happy with a drm_ prefix for this.
But my question about "where should this live" was mainly about the light weight match helpers you suggested to use to figure out if the device supports lcdshadow at all (and we thus should wait for a driver) or not. E.g. I can see us adding a:
drivers/gpu/drm/drm_lcdshadow.c
file for the core bits and then maybe a:
drivers/gpu/drm/drm_lcdshadow_detect.c
file with the light weight match helpers, with each helper wrapped in #if IS_ENABLED(CONFIG_THINKPAD_ACPI), etc. ?
I'd expect it's all going to be so tiny that separate file wont make much sense. Thus far we're simply adding the 1-2 of_ helpers to the corresponding file, with an #ifdef CONFIG_OF around them. It's ok enough.
with driver implementations all over.
Ack.
Regards,
Hans
Cheers, Daniel
Hi,
On 4/15/20 7:54 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 03:02:53PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 2:01 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
> iv. What every SoC subsystem does: > > - lcdshadow drivers register drivers > - drm drivers look them up > - if stuff isn't there yet, we delay loading with EPROBE_DEFER until > the entire thing is assembled. > > That's what we're doing already for other standardized components like > drm_bridge or drm_panel, and I think that's also the right approach > for backlight and anything else like that. Hand-rolling our own > EPROBE_DEFER handling, or some other duct-tape monsters imo just leads > to real pain. Also, with EPROBE_DEFER we have one standard way of > building a driver from component, which spans subsystems and is also > the underlying magic that makes stuff like component.c work.
On the SoCs we have devicetree telling us what components there are, so we can wait for them to show up. The only way to figure out if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, or duplicating a lot of code from thinkpad_acpi. Edit: also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
Where should that match-function live ?
An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
Ok, note I plan to only implement the acpi version for now, I do expect some non ACPI/x86 devices to show up with his feature eventually but I believe it is best to implement this once those actually show up. Esp. since this will also involve adding some devicetree bindings for this.
ofc, just wanted to lay out the entire thing. The DT version needs some good bikeshed on the dt schema first anyway (so that the helper can decode that directly).
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
Right, so my latest idea on that is indeed a high-level lcdshadow_get() which takes a struct device * and a connector-name and which never returns EPROBE_DEFER.
As for leaving things to the higher levels to answer, as explained in my other follow-up email I think that we should probably add a lcdshadow_probe_defer() helper for this and call that early on in the PCI-driver probe functions for the 3 major x86 GPU drivers. Does that sound ok to you?
Uh ... not pretty. There's still a lifetime problem that strictly speaking there's nothing stopping the other driver from getting unloaded between your _probe_defer and the subsequent _get. I think fixing this properly (and screaming a bit at the error code, oh well) is better.
I would really like to separate the discussion and the work on getting the 3 major x86 GPU drivers ready to deal with EPROBE_DEFER from the lcdshadow discussion and work. I expect getting these 3 drivers ready for EPROBE_DEFER is going to be a major undertaking and I would like avoid introducing this significant scope creep to the lcdshadow discussion, because it simply is a too big undertaking to undertake without us getting a significant amount of manpower specifically for this from somewhere.
Note I do agree with you that getting these 3 drivers ready for EPROBE_DEFER handling is a worthwhile undertaking, but it simply will take too much extra time and as such IMHO it really is out of scope for the lcdshadow stuff. I expect the amount of work (esp. also dealing with testing and regressions) for the EPROBE_DEFER project by itself to be a lot *more* work then the actual lcdshadow project.
So going with the assumption/decision that adding proper EPROBE_DEFER handling to these 3 drivers is out of scope, I believe that adding a lcdshadow_probe_defer() helper is an ok solution/workaround for now.
As for your case of the other driver getting unloaded in between the check and use of it, that can only happen by explicit user action and in that case the worst thing what will happen is the "privacy-screen" property missing from the connector, which in that case is more or less exactly what the user asked for.
I'd also lean towards putting lcdshadow headers/interfaces into drivers/gpu,
Ack, I think we should even make this drm specific and prefix it with drm_ so that we get drm_lcdshadow_foo as functions, just to make clear that this is maintained together with the other drm bits.
I want to avoid the impression that I'm on an evil plan to take over the entire world, but personally very happy with a drm_ prefix for this.
Hehe, ok.
But my question about "where should this live" was mainly about the light weight match helpers you suggested to use to figure out if the device supports lcdshadow at all (and we thus should wait for a driver) or not. E.g. I can see us adding a:
drivers/gpu/drm/drm_lcdshadow.c
file for the core bits and then maybe a:
drivers/gpu/drm/drm_lcdshadow_detect.c
file with the light weight match helpers, with each helper wrapped in #if IS_ENABLED(CONFIG_THINKPAD_ACPI), etc. ?
I'd expect it's all going to be so tiny that separate file wont make much sense. Thus far we're simply adding the 1-2 of_ helpers to the corresponding file, with an #ifdef CONFIG_OF around them. It's ok enough.
Ok, lets start with one file for now, we can always split it later.
Regards,
Hans
On Wed, Apr 15, 2020 at 8:19 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 7:54 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 03:02:53PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 2:01 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote: > > Hi, > > On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
>> iv. What every SoC subsystem does: >> >> - lcdshadow drivers register drivers >> - drm drivers look them up >> - if stuff isn't there yet, we delay loading with EPROBE_DEFER until >> the entire thing is assembled. >> >> That's what we're doing already for other standardized components like >> drm_bridge or drm_panel, and I think that's also the right approach >> for backlight and anything else like that. Hand-rolling our own >> EPROBE_DEFER handling, or some other duct-tape monsters imo just leads >> to real pain. Also, with EPROBE_DEFER we have one standard way of >> building a driver from component, which spans subsystems and is also >> the underlying magic that makes stuff like component.c work. > > On the SoCs we have devicetree telling us what components there > are, so we can wait for them to show up. The only way to figure out > if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, > or duplicating a lot of code from thinkpad_acpi. Edit: > also see below for a possible solution.
Yup it sucks. I think all we can do is have a small acpi match function (which yes will duplicate some of the thinkpad_acpi driver logic) to re-create that information and give us a "should we have a lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
Where should that match-function live ?
An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
Ok, note I plan to only implement the acpi version for now, I do expect some non ACPI/x86 devices to show up with his feature eventually but I believe it is best to implement this once those actually show up. Esp. since this will also involve adding some devicetree bindings for this.
ofc, just wanted to lay out the entire thing. The DT version needs some good bikeshed on the dt schema first anyway (so that the helper can decode that directly).
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
Right, so my latest idea on that is indeed a high-level lcdshadow_get() which takes a struct device * and a connector-name and which never returns EPROBE_DEFER.
As for leaving things to the higher levels to answer, as explained in my other follow-up email I think that we should probably add a lcdshadow_probe_defer() helper for this and call that early on in the PCI-driver probe functions for the 3 major x86 GPU drivers. Does that sound ok to you?
Uh ... not pretty. There's still a lifetime problem that strictly speaking there's nothing stopping the other driver from getting unloaded between your _probe_defer and the subsequent _get. I think fixing this properly (and screaming a bit at the error code, oh well) is better.
I would really like to separate the discussion and the work on getting the 3 major x86 GPU drivers ready to deal with EPROBE_DEFER from the lcdshadow discussion and work. I expect getting these 3 drivers ready for EPROBE_DEFER is going to be a major undertaking and I would like avoid introducing this significant scope creep to the lcdshadow discussion, because it simply is a too big undertaking to undertake without us getting a significant amount of manpower specifically for this from somewhere.
Note I do agree with you that getting these 3 drivers ready for EPROBE_DEFER handling is a worthwhile undertaking, but it simply will take too much extra time and as such IMHO it really is out of scope for the lcdshadow stuff. I expect the amount of work (esp. also dealing with testing and regressions) for the EPROBE_DEFER project by itself to be a lot *more* work then the actual lcdshadow project.
So going with the assumption/decision that adding proper EPROBE_DEFER handling to these 3 drivers is out of scope, I believe that adding a lcdshadow_probe_defer() helper is an ok solution/workaround for now.
As for your case of the other driver getting unloaded in between the check and use of it, that can only happen by explicit user action and in that case the worst thing what will happen is the "privacy-screen" property missing from the connector, which in that case is more or less exactly what the user asked for.
For i915 we've had patches, for mei-hdcp integration. Until it became clear that the mei subsystem is bonkers, and handles suspend/resume by unloading! drivers.
Which forced i915 to unload and broke the world :-/
So at least for i915 the work should be done already, and just amount to updating the patches Ram already had. No ideas about the other 2.
What I don't really want is an interface with races. So if fixing the other drivers is too much work, what we could do is roughly:
- in the top-level probe function to an acpi_lcdshadow_get. This might fail with EPROBE_DEFER. - this gives us a dangling reference, but we can drop that one when we exit the top-level probe function (both on success and on error cases) - the 2nd acpi_lcdshadow_get call deep down should always succeed at that point (since the top level holds a reference), so you could wrap that in a WARNING(IS_ERR_PTR()). Ok that's a lie, if we concurrently unload then the 2nd call still fails (the reference can never prevent a hotunbind in the linux kernel device model), so this is exactly as buggy as your version, so still needs a comment about that.
Adding a acpi_lcdshadow_probe_defer() function otoh just gives people the impression that that's actually a correct way of doing this.
Then it's up to the driver maintainers whether they're ok with the above hack of a fake reference, or not. As I said, I think for i915 it should be fairly ok to just roll it out, but maybe the patches don't apply at all anymore.
btw to make everything work you also need to set up a device_link between the lcdshadow device (and it's driver, that's the more important thing) and the gpu device. That device link will make sure that - suspend/resume is done in the right order - driver load/unload is done in the right order, i.e. unloading of the lcdshadow driver will automatically force an unbind of the gpu driver first.
With that all the get/put will do is just refcount the final kfree of the driver structure.
Cheers, Daniel
I'd also lean towards putting lcdshadow headers/interfaces into drivers/gpu,
Ack, I think we should even make this drm specific and prefix it with drm_ so that we get drm_lcdshadow_foo as functions, just to make clear that this is maintained together with the other drm bits.
I want to avoid the impression that I'm on an evil plan to take over the entire world, but personally very happy with a drm_ prefix for this.
Hehe, ok.
But my question about "where should this live" was mainly about the light weight match helpers you suggested to use to figure out if the device supports lcdshadow at all (and we thus should wait for a driver) or not. E.g. I can see us adding a:
drivers/gpu/drm/drm_lcdshadow.c
file for the core bits and then maybe a:
drivers/gpu/drm/drm_lcdshadow_detect.c
file with the light weight match helpers, with each helper wrapped in #if IS_ENABLED(CONFIG_THINKPAD_ACPI), etc. ?
I'd expect it's all going to be so tiny that separate file wont make much sense. Thus far we're simply adding the 1-2 of_ helpers to the corresponding file, with an #ifdef CONFIG_OF around them. It's ok enough.
Ok, lets start with one file for now, we can always split it later.
Regards,
Hans
Hi,
On 4/15/20 8:29 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 8:19 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 7:54 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 03:02:53PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 2:01 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 12:22 PM, Daniel Vetter wrote: > On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote: >> >> Hi, >> >> On 4/15/20 11:52 AM, Daniel Vetter wrote:
<snip>
>>> iv. What every SoC subsystem does: >>> >>> - lcdshadow drivers register drivers >>> - drm drivers look them up >>> - if stuff isn't there yet, we delay loading with EPROBE_DEFER until >>> the entire thing is assembled. >>> >>> That's what we're doing already for other standardized components like >>> drm_bridge or drm_panel, and I think that's also the right approach >>> for backlight and anything else like that. Hand-rolling our own >>> EPROBE_DEFER handling, or some other duct-tape monsters imo just leads >>> to real pain. Also, with EPROBE_DEFER we have one standard way of >>> building a driver from component, which spans subsystems and is also >>> the underlying magic that makes stuff like component.c work. >> >> On the SoCs we have devicetree telling us what components there >> are, so we can wait for them to show up. The only way to figure out >> if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, >> or duplicating a lot of code from thinkpad_acpi. Edit: >> also see below for a possible solution. > > Yup it sucks. I think all we can do is have a small acpi match > function (which yes will duplicate some of the thinkpad_acpi driver > logic) to re-create that information and give us a "should we have a > lcdshadow driver for this $pci_device" answer.
Ok, so questions about this solution:
Where should that match-function live ?
An acpi_thinkpad derived match-function will only be able to answer if there is an lcdshadow device/driver for the internal panel. It will not be able to tie this info to a certain PCI device. My plan is to pass NULL as dev_name when registering the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) skip device-name matching (consider everything a match) for lcdshadow-devices registered with NULL as dev_name.
So I guess in this case the mini match function should just ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
Ok, note I plan to only implement the acpi version for now, I do expect some non ACPI/x86 devices to show up with his feature eventually but I believe it is best to implement this once those actually show up. Esp. since this will also involve adding some devicetree bindings for this.
ofc, just wanted to lay out the entire thing. The DT version needs some good bikeshed on the dt schema first anyway (so that the helper can decode that directly).
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
Right, so my latest idea on that is indeed a high-level lcdshadow_get() which takes a struct device * and a connector-name and which never returns EPROBE_DEFER.
As for leaving things to the higher levels to answer, as explained in my other follow-up email I think that we should probably add a lcdshadow_probe_defer() helper for this and call that early on in the PCI-driver probe functions for the 3 major x86 GPU drivers. Does that sound ok to you?
Uh ... not pretty. There's still a lifetime problem that strictly speaking there's nothing stopping the other driver from getting unloaded between your _probe_defer and the subsequent _get. I think fixing this properly (and screaming a bit at the error code, oh well) is better.
I would really like to separate the discussion and the work on getting the 3 major x86 GPU drivers ready to deal with EPROBE_DEFER from the lcdshadow discussion and work. I expect getting these 3 drivers ready for EPROBE_DEFER is going to be a major undertaking and I would like avoid introducing this significant scope creep to the lcdshadow discussion, because it simply is a too big undertaking to undertake without us getting a significant amount of manpower specifically for this from somewhere.
Note I do agree with you that getting these 3 drivers ready for EPROBE_DEFER handling is a worthwhile undertaking, but it simply will take too much extra time and as such IMHO it really is out of scope for the lcdshadow stuff. I expect the amount of work (esp. also dealing with testing and regressions) for the EPROBE_DEFER project by itself to be a lot *more* work then the actual lcdshadow project.
So going with the assumption/decision that adding proper EPROBE_DEFER handling to these 3 drivers is out of scope, I believe that adding a lcdshadow_probe_defer() helper is an ok solution/workaround for now.
As for your case of the other driver getting unloaded in between the check and use of it, that can only happen by explicit user action and in that case the worst thing what will happen is the "privacy-screen" property missing from the connector, which in that case is more or less exactly what the user asked for.
For i915 we've had patches, for mei-hdcp integration. Until it became clear that the mei subsystem is bonkers, and handles suspend/resume by unloading! drivers.
Which forced i915 to unload and broke the world :-/
So at least for i915 the work should be done already, and just amount to updating the patches Ram already had. No ideas about the other 2.
Ok.
What I don't really want is an interface with races. So if fixing the other drivers is too much work, what we could do is roughly:
- in the top-level probe function to an acpi_lcdshadow_get. This might
fail with EPROBE_DEFER.
- this gives us a dangling reference, but we can drop that one when we
exit the top-level probe function (both on success and on error cases)
- the 2nd acpi_lcdshadow_get call deep down should always succeed at
that point (since the top level holds a reference), so you could wrap that in a WARNING(IS_ERR_PTR()). Ok that's a lie, if we concurrently unload then the 2nd call still fails (the reference can never prevent a hotunbind in the linux kernel device model), so this is exactly as buggy as your version, so still needs a comment about that.
Adding a acpi_lcdshadow_probe_defer() function otoh just gives people the impression that that's actually a correct way of doing this.
Then it's up to the driver maintainers whether they're ok with the above hack of a fake reference, or not. As I said, I think for i915 it should be fairly ok to just roll it out, but maybe the patches don't apply at all anymore.
Ok trying to taking a ref early on and handling EPROBE_DEFER at that first attempt to take a ref works for me. So lets go with that. I will try to whip up a v1 patch-set for this, ETA aprox. 1-2 weeks I guess.
btw to make everything work you also need to set up a device_link between the lcdshadow device (and it's driver, that's the more important thing) and the gpu device. That device link will make sure that
- suspend/resume is done in the right order
- driver load/unload is done in the right order, i.e. unloading of the
lcdshadow driver will automatically force an unbind of the gpu driver first.
That is an interesting idea, but that does assume that their is an actual struct device which the code handling the lcdshadow binds to, which in case of thinkpad_acpi is not really the case.
Anyways passing in a parent device when registering a lcdshadow_dev seems like a good idea and we can do the device_link thing if the parent is non NULL.
Regards,
Hans
On Wed, Apr 15, 2020 at 9:50 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 8:29 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 8:19 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 7:54 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 03:02:53PM +0200, Hans de Goede wrote:
Hi,
On 4/15/20 2:01 PM, Daniel Vetter wrote:
On Wed, Apr 15, 2020 at 01:39:23PM +0200, Hans de Goede wrote: > Hi, > > On 4/15/20 12:22 PM, Daniel Vetter wrote: >> On Wed, Apr 15, 2020 at 12:11 PM Hans de Goede hdegoede@redhat.com wrote: >>> >>> Hi, >>> >>> On 4/15/20 11:52 AM, Daniel Vetter wrote: > > <snip> > >>>> iv. What every SoC subsystem does: >>>> >>>> - lcdshadow drivers register drivers >>>> - drm drivers look them up >>>> - if stuff isn't there yet, we delay loading with EPROBE_DEFER until >>>> the entire thing is assembled. >>>> >>>> That's what we're doing already for other standardized components like >>>> drm_bridge or drm_panel, and I think that's also the right approach >>>> for backlight and anything else like that. Hand-rolling our own >>>> EPROBE_DEFER handling, or some other duct-tape monsters imo just leads >>>> to real pain. Also, with EPROBE_DEFER we have one standard way of >>>> building a driver from component, which spans subsystems and is also >>>> the underlying magic that makes stuff like component.c work. >>> >>> On the SoCs we have devicetree telling us what components there >>> are, so we can wait for them to show up. The only way to figure out >>> if the lcdshadow thing is there on a ThinkPad is asking thinkpad_acpi, >>> or duplicating a lot of code from thinkpad_acpi. Edit: >>> also see below for a possible solution. >> >> Yup it sucks. I think all we can do is have a small acpi match >> function (which yes will duplicate some of the thinkpad_acpi driver >> logic) to re-create that information and give us a "should we have a >> lcdshadow driver for this $pci_device" answer. > > Ok, so questions about this solution: > > 1. Where should that match-function live ? > > 2. An acpi_thinkpad derived match-function will only be able to > answer if there is an lcdshadow device/driver for the internal > panel. It will not be able to tie this info to a certain PCI > device. My plan is to pass NULL as dev_name when registering > the lcdshadow-device and have lcdshadow_get(dev, <connector-name>) > skip device-name matching (consider everything a match) for > lcdshadow-devices registered with NULL as dev_name. > > So I guess in this case the mini match function should just > ignore the passed in device?
Yeah I think we can't really avoid that. I also expect that we'll need ACPI and dt versions of this, and driver needs to know which one to call. Since at least in a dt world the driver knows exactly for which dt node it needs a lcdshadow driver for (with the phandle stuff), so we can be a lot more strict.
For the acpi version I'm not even sure we can do more than provide the struct device * pointer of the gpu. I think if we ever get more than 1 lcdshadow driver on acpi systems we can add more stuff later on, for now I'd just leave that out.
So maybe
acpi_lcdshadow_get(struct device *dev);
of_lcdshadow_get(struct device_node *np);
And with maybe a future plan to add some kind of enum or whatever to acpi_lcdshadow_get(). Both would return either the lcdshadow pointer, or an PTR_ERR() so that we could encode EPROBE_DEFER vs ENOENT.
Ok, note I plan to only implement the acpi version for now, I do expect some non ACPI/x86 devices to show up with his feature eventually but I believe it is best to implement this once those actually show up. Esp. since this will also involve adding some devicetree bindings for this.
ofc, just wanted to lay out the entire thing. The DT version needs some good bikeshed on the dt schema first anyway (so that the helper can decode that directly).
We might also want a low-level lcdshadow_get() which only returns ENOENT when the driver isn't there, and which leaves "do we really need one?" to higher levels to answer.
Right, so my latest idea on that is indeed a high-level lcdshadow_get() which takes a struct device * and a connector-name and which never returns EPROBE_DEFER.
As for leaving things to the higher levels to answer, as explained in my other follow-up email I think that we should probably add a lcdshadow_probe_defer() helper for this and call that early on in the PCI-driver probe functions for the 3 major x86 GPU drivers. Does that sound ok to you?
Uh ... not pretty. There's still a lifetime problem that strictly speaking there's nothing stopping the other driver from getting unloaded between your _probe_defer and the subsequent _get. I think fixing this properly (and screaming a bit at the error code, oh well) is better.
I would really like to separate the discussion and the work on getting the 3 major x86 GPU drivers ready to deal with EPROBE_DEFER from the lcdshadow discussion and work. I expect getting these 3 drivers ready for EPROBE_DEFER is going to be a major undertaking and I would like avoid introducing this significant scope creep to the lcdshadow discussion, because it simply is a too big undertaking to undertake without us getting a significant amount of manpower specifically for this from somewhere.
Note I do agree with you that getting these 3 drivers ready for EPROBE_DEFER handling is a worthwhile undertaking, but it simply will take too much extra time and as such IMHO it really is out of scope for the lcdshadow stuff. I expect the amount of work (esp. also dealing with testing and regressions) for the EPROBE_DEFER project by itself to be a lot *more* work then the actual lcdshadow project.
So going with the assumption/decision that adding proper EPROBE_DEFER handling to these 3 drivers is out of scope, I believe that adding a lcdshadow_probe_defer() helper is an ok solution/workaround for now.
As for your case of the other driver getting unloaded in between the check and use of it, that can only happen by explicit user action and in that case the worst thing what will happen is the "privacy-screen" property missing from the connector, which in that case is more or less exactly what the user asked for.
For i915 we've had patches, for mei-hdcp integration. Until it became clear that the mei subsystem is bonkers, and handles suspend/resume by unloading! drivers.
Which forced i915 to unload and broke the world :-/
So at least for i915 the work should be done already, and just amount to updating the patches Ram already had. No ideas about the other 2.
Ok.
What I don't really want is an interface with races. So if fixing the other drivers is too much work, what we could do is roughly:
- in the top-level probe function to an acpi_lcdshadow_get. This might
fail with EPROBE_DEFER.
- this gives us a dangling reference, but we can drop that one when we
exit the top-level probe function (both on success and on error cases)
- the 2nd acpi_lcdshadow_get call deep down should always succeed at
that point (since the top level holds a reference), so you could wrap that in a WARNING(IS_ERR_PTR()). Ok that's a lie, if we concurrently unload then the 2nd call still fails (the reference can never prevent a hotunbind in the linux kernel device model), so this is exactly as buggy as your version, so still needs a comment about that.
Adding a acpi_lcdshadow_probe_defer() function otoh just gives people the impression that that's actually a correct way of doing this.
Then it's up to the driver maintainers whether they're ok with the above hack of a fake reference, or not. As I said, I think for i915 it should be fairly ok to just roll it out, but maybe the patches don't apply at all anymore.
Ok trying to taking a ref early on and handling EPROBE_DEFER at that first attempt to take a ref works for me. So lets go with that. I will try to whip up a v1 patch-set for this, ETA aprox. 1-2 weeks I guess.
btw to make everything work you also need to set up a device_link between the lcdshadow device (and it's driver, that's the more important thing) and the gpu device. That device link will make sure that
- suspend/resume is done in the right order
- driver load/unload is done in the right order, i.e. unloading of the
lcdshadow driver will automatically force an unbind of the gpu driver first.
That is an interesting idea, but that does assume that their is an actual struct device which the code handling the lcdshadow binds to, which in case of thinkpad_acpi is not really the case.
Anyways passing in a parent device when registering a lcdshadow_dev seems like a good idea and we can do the device_link thing if the parent is non NULL.
For device_link to work, it needs to be the struct device the actual driver is bound to. Otherwise the suspend/resume functions dont do anything. No idea how this acpi stuff works, if it doesn't work like a driver then we're a bit screwed perhaps ...
Anyway I guess for lcdshadow it's not that important, since not much to suspend/resume. For backlight this will matter more I guess. -Daniel
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go. Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state? I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
BR, Jani.
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
Hi,
-----Original Message----- From: Hans de Goede hdegoede@redhat.com Sent: Wednesday, April 15, 2020 11:41 AM On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote: Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Afraid, the "hotkeys" control for ePrivacy (Fn+D I believe) is very unlikely to change - Windows uses it as well... We can do notification of any hotkey presses to update the DRM layer (and userspace) if that helps
Hi Mark,
On 4/15/20 7:14 PM, Mark Pearson wrote:
Hi,
-----Original Message----- From: Hans de Goede hdegoede@redhat.com Sent: Wednesday, April 15, 2020 11:41 AM On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote: Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Afraid, the "hotkeys" control for ePrivacy (Fn+D I believe) is very unlikely to change - Windows uses it as well... We can do notification of any hotkey presses to update the DRM layer (and userspace) if that helps
We are not asking for changing the hotkey, what we would like is for the hotkey to only send a notification that it was pressed and for it to not actually do anything with the ePrivacy screen state. This does not need to be it defaults behavior, but we would like to be able to ask the firmware to not act on it itself, just like we can already disable the firmware/embedded controller responding to e.g. brightness up/down key presses itself.
Regards,
Hans
Hello,
On Wed, Apr 15, 2020 at 8:40 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Thanks, it would be great!.
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
I agree with Hans here that I think it would be better if we could do it with one property.
* I can imagine demand for laptops that have a "hardware kill switch" for privacy screen (just like there are for camera etc today). So I think in future we may have to deal with this case anyway. In such devices it's the hardware (as opposite to firmware) that will change the state. The HW will likely provide an interrupt to the software to notify of the change. This is all imaginative at this point though.
* I think having 2 properties might be a confusing UAPI. Also, we have existing properties like link-status that can be changed by both the user and the hardware.
Thanks,
Rajat
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
On Wed, 15 Apr 2020, Rajat Jain rajatja@google.com wrote:
Hello,
On Wed, Apr 15, 2020 at 8:40 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Thanks, it would be great!.
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
I agree with Hans here that I think it would be better if we could do it with one property.
- I can imagine demand for laptops that have a "hardware kill switch"
for privacy screen (just like there are for camera etc today). So I think in future we may have to deal with this case anyway. In such devices it's the hardware (as opposite to firmware) that will change the state. The HW will likely provide an interrupt to the software to notify of the change. This is all imaginative at this point though.
- I think having 2 properties might be a confusing UAPI. Also, we have
existing properties like link-status that can be changed by both the user and the hardware.
I think the consensus is that all properties that get changed by both userspace and the kernel are mistakes, and the way to handle it is to have two properties.
BR, Jani.
Thanks,
Rajat
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
Hi,
On 4/15/20 11:10 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Rajat Jain rajatja@google.com wrote:
Hello,
On Wed, Apr 15, 2020 at 8:40 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Thanks, it would be great!.
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
I agree with Hans here that I think it would be better if we could do it with one property.
- I can imagine demand for laptops that have a "hardware kill switch"
for privacy screen (just like there are for camera etc today). So I think in future we may have to deal with this case anyway. In such devices it's the hardware (as opposite to firmware) that will change the state. The HW will likely provide an interrupt to the software to notify of the change. This is all imaginative at this point though.
- I think having 2 properties might be a confusing UAPI. Also, we have
existing properties like link-status that can be changed by both the user and the hardware.
I think the consensus is that all properties that get changed by both userspace and the kernel are mistakes, and the way to handle it is to have two properties.
But the actual privacy screen has only 1 state, having two properties for this will only be confusing. As I mentioned before we have a similar case with e.g. keyboard backlighting and there the userspace API also has a single sysfs attribute for the brightness, with change notifications to userspace if the firmware changes the brightness on its own and this works well.
What would the semantics of these 2 different properties be? And what sort of extra functionality would these semantics offer which the single property versions does not offer?
Regards,
Hans
Hi,
-----Original Message----- From: Hans de Goede hdegoede@redhat.com Sent: Wednesday, April 15, 2020 5:21 PM
Hi,
On 4/15/20 11:10 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Rajat Jain rajatja@google.com wrote:
Hello,
On Wed, Apr 15, 2020 at 8:40 AM Hans de Goede
hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com
wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Thanks, it would be great!.
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could
indicate
userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
I agree with Hans here that I think it would be better if we could do it with one property.
- I can imagine demand for laptops that have a "hardware kill switch"
for privacy screen (just like there are for camera etc today). So I think in future we may have to deal with this case anyway. In such devices it's the hardware (as opposite to firmware) that will change the state. The HW will likely provide an interrupt to the software to notify of the change. This is all imaginative at this point though.
- I think having 2 properties might be a confusing UAPI. Also, we have
existing properties like link-status that can be changed by both the user and the hardware.
I think the consensus is that all properties that get changed by both userspace and the kernel are mistakes, and the way to handle it is to have two properties.
But the actual privacy screen has only 1 state, having two properties for this will only be confusing. As I mentioned before we have a similar case with e.g. keyboard backlighting and there the userspace API also has a single sysfs attribute for the brightness, with change notifications to userspace if the firmware changes the brightness on its own and this works well.
What would the semantics of these 2 different properties be? And what sort of extra functionality would these semantics offer which the single property versions does not offer?
I'm worried I'm missing the point but logically this doesn't make sense to me either. We're just providing two mechanisms for controlling ePrivacy - neither is wrong. I'm not sure where the consensus was reached and why - I'm afraid I don't have that history. Not sure it's the best example but you can control the system volume using buttons or the SW slider. It's still just the volume. Isn't this similar (ish). Both have their place and benefits.
What if users want the hotkey functionality? It is after all quite helpful: doing Fn+d is a lot quicker than finding the control box and toggling something if someone is looking over your shoulder and you don't want them to see. I'm not sure why a user would want it disabled - as long as it is tied in properly with the user space reporting.
<Lenovo hat on> On that note - (I started replying in an earlier thread but got side tracked) there is little chance the BIOS team will make significant changes to existing BIOS for adding extra disable functionality. There are just too many repercussions on testing as it impacts Windows. I know it sucks but it's just the way it is and I don't think there is a strong enough story to really push for it. Let me know if I'm out to lunch or being unreasonable 😊I'm happy to have that conversation with them but I already know the results...maybe for future platforms but there are issues there too (when shipping Linux systems it's just another setting to tweak and we're trying to reduce them not increase them...but that's another story)
Mark
On Thu, 16 Apr 2020 00:10:06 +0300 Jani Nikula jani.nikula@linux.intel.com wrote:
On Wed, 15 Apr 2020, Rajat Jain rajatja@google.com wrote:
- I think having 2 properties might be a confusing UAPI. Also, we have
existing properties like link-status that can be changed by both the user and the hardware.
I think the consensus is that all properties that get changed by both userspace and the kernel are mistakes, and the way to handle it is to have two properties.
Yes, I very much agree with Jani.
I don't like KMS properties that are writable by both the driver and userspace. It's awkward to handle in userspace and fundamentally racy aside from tricks like "what you wrote is not what you read back". I have ranted against that when looking at HDCP properties, e.g.: https://lists.freedesktop.org/archives/dri-devel/2019-July/226424.html
See also my other email in this thread about how userspace uses atomic KMS UAPI: you must expect that userspace will always write the property for any KMS update, even if it does not change its value, so any change done by the kernel is randomly lost unless the property is read-only or otherwise weird.
Thanks, pq
On Wed, 15 Apr 2020 17:40:46 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it. The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
Yes, such mechanism has been discussed before, see the thread containing https://lists.freedesktop.org/archives/dri-devel/2019-May/217588.html
TL;DR: the mechanism exists and is called the hotplug event. But the problem with the hotplug event is that it carries no information about what changed, so userspace is forced re-discover everything about the DRM device. That thread discusses extending the hotplug event to be able to signal changes in individual properties rather than "something maybe changed, you figure out what".
Thanks, pq
On Fri, 17 Apr 2020, Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 15 Apr 2020 17:40:46 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it.
I'd like this too. However I fear this is out of our control, and OEMs have and will anyway fiddle with the privacy screen directly no matter what we say, and we can't prevent that. From their POV it's easier for them to do their value-add in components they have total control over. I emphatize with that view, even if it's counter-productive from the Linux ecosystem POV.
So we'll just have to deal with it.
The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
That, in turn, discourages anyone from doing the right thing, and blocks us from adding any nice additional features for privacy screens that only the userspace is capable of managing. For example, controlling privacy screen based on content, which seems like an obvious feature.
I suppose rf-kill is a bit similar.
You'd have a userspace controlled property to state what the userspace wants, and a kernel controlled property to show the hardware state. Userspace can ask for one or the other, and usually this happens, but the hardware hotkey bypasses the whole thing.
It's not perfect. But I think just one property changed nilly-willy by both the kernel and userspace (especially when it's really not in the kernel's full control either) is going to be a PITA.
BR, Jani.
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
Yes, such mechanism has been discussed before, see the thread containing https://lists.freedesktop.org/archives/dri-devel/2019-May/217588.html
TL;DR: the mechanism exists and is called the hotplug event. But the problem with the hotplug event is that it carries no information about what changed, so userspace is forced re-discover everything about the DRM device. That thread discusses extending the hotplug event to be able to signal changes in individual properties rather than "something maybe changed, you figure out what".
Thanks, pq
On Fri, Apr 17, 2020 at 1:55 PM Jani Nikula jani.nikula@linux.intel.com wrote:
On Fri, 17 Apr 2020, Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 15 Apr 2020 17:40:46 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it.
I'd like this too. However I fear this is out of our control, and OEMs have and will anyway fiddle with the privacy screen directly no matter what we say, and we can't prevent that. From their POV it's easier for them to do their value-add in components they have total control over. I emphatize with that view, even if it's counter-productive from the Linux ecosystem POV.
So we'll just have to deal with it.
The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
That, in turn, discourages anyone from doing the right thing, and blocks us from adding any nice additional features for privacy screens that only the userspace is capable of managing. For example, controlling privacy screen based on content, which seems like an obvious feature.
I suppose rf-kill is a bit similar.
You'd have a userspace controlled property to state what the userspace wants, and a kernel controlled property to show the hardware state. Userspace can ask for one or the other, and usually this happens, but the hardware hotkey bypasses the whole thing.
It's not perfect. But I think just one property changed nilly-willy by both the kernel and userspace (especially when it's really not in the kernel's full control either) is going to be a PITA.
Yeah that's what we've done in other cases where both kernel and userspace can change stuff. These where just tri-states, but this here sounds like we need all for, so best to just have 2 properties. -Daniel
BR, Jani.
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
Yes, such mechanism has been discussed before, see the thread containing https://lists.freedesktop.org/archives/dri-devel/2019-May/217588.html
TL;DR: the mechanism exists and is called the hotplug event. But the problem with the hotplug event is that it carries no information about what changed, so userspace is forced re-discover everything about the DRM device. That thread discusses extending the hotplug event to be able to signal changes in individual properties rather than "something maybe changed, you figure out what".
Thanks, pq
-- Jani Nikula, Intel Open Source Graphics Center
On Fri, 2020-04-17 at 16:18 +0200, Daniel Vetter wrote:
I suppose rf-kill is a bit similar.
RfKill is actually much more complicated, and I don't really see how it is related. We may be in the situation where we cannot control the hardware state, but RfKill has two entirely separate "block" states and the real value is the logical OR of both.
Also, RfKill key handling is a mess as userspace needs to tell the kernel it is handling the keys.
You'd have a userspace controlled property to state what the userspace wants, and a kernel controlled property to show the hardware state. Userspace can ask for one or the other, and usually this happens, but the hardware hotkey bypasses the whole thing.
It's not perfect. But I think just one property changed nilly-willy by both the kernel and userspace (especially when it's really not in the kernel's full control either) is going to be a PITA.
Yeah that's what we've done in other cases where both kernel and userspace can change stuff. These where just tri-states, but this here sounds like we need all for, so best to just have 2 properties.
As I see it, the requirements here are: * Userspace needs to be able to query the current state (possibly unavailable?) * Userspace needs to know whether it can set the property * Userspace needs to be notified about changes (by the firmware or possibly any other reason)
But, should never get into the situation that both the firmware and software are trying to toggle the state in response to the same event. In the case of the Fn-key, we'll either deliver a key-press to userspace or just update the attribute because the firmware has already handled it. Only one of these two possibilities may happen.
Benjamin
Hi,
On 4/17/20 1:55 PM, Jani Nikula wrote:
On Fri, 17 Apr 2020, Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 15 Apr 2020 17:40:46 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it.
I'd like this too. However I fear this is out of our control, and OEMs have and will anyway fiddle with the privacy screen directly no matter what we say, and we can't prevent that. From their POV it's easier for them to do their value-add in components they have total control over. I emphatize with that view, even if it's counter-productive from the Linux ecosystem POV.
So we'll just have to deal with it.
Ack, at least that is the case for the current generation Lenovo devices.
The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
That, in turn, discourages anyone from doing the right thing, and blocks us from adding any nice additional features for privacy screens that only the userspace is capable of managing. For example, controlling privacy screen based on content, which seems like an obvious feature.
Right.
So we have the case here were both the firmware and userspace may change the privacyscreen state (on/off) at any time.
This means that the atomic API use described by Pekka for this, where userspace keeps all properties in memory, updates the one which it wants to change and then commits simply cannot work here.
Userspace should only include this property in the transaction if it actually wants to change it. Or it should do a read of it and update its internal state before committing that state unless it wants to change it. But always blindly committing the last value used by userspace will simply not work, because then there is no way for the kernel to know if userspace is just repeating itself or actually wants to change the property.
As for the one vs two properties. For the current hw userspace can always change the state at any time. But I can envision devices where there is a hardware override forcing the privacy screen to be on. So I guess that a r/w "software state" + a ro "hardware state" property would make sense. Note that with the current gen. Lenovo devices the hw-state will simply be a mirror of the sw-state and the sw-state will be toggled by the firmware independently of the last value requested by userspace.
I guess we could add a third ro firmware-state property, which would reflect the last firmware requested value, but I cannot really come up with clear semantics for this; nor can I come up with how this actually helps.
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Regards,
Hans
On Tue, Apr 21, 2020 at 02:37:41PM +0200, Hans de Goede wrote:
Hi,
On 4/17/20 1:55 PM, Jani Nikula wrote:
On Fri, 17 Apr 2020, Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 15 Apr 2020 17:40:46 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it.
I'd like this too. However I fear this is out of our control, and OEMs have and will anyway fiddle with the privacy screen directly no matter what we say, and we can't prevent that. From their POV it's easier for them to do their value-add in components they have total control over. I emphatize with that view, even if it's counter-productive from the Linux ecosystem POV.
So we'll just have to deal with it.
Ack, at least that is the case for the current generation Lenovo devices.
The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
That, in turn, discourages anyone from doing the right thing, and blocks us from adding any nice additional features for privacy screens that only the userspace is capable of managing. For example, controlling privacy screen based on content, which seems like an obvious feature.
Right.
So we have the case here were both the firmware and userspace may change the privacyscreen state (on/off) at any time.
This means that the atomic API use described by Pekka for this, where userspace keeps all properties in memory, updates the one which it wants to change and then commits simply cannot work here.
Userspace should only include this property in the transaction if it actually wants to change it. Or it should do a read of it and update its internal state before committing that state unless it wants to change it. But always blindly committing the last value used by userspace will simply not work, because then there is no way for the kernel to know if userspace is just repeating itself or actually wants to change the property.
As for the one vs two properties. For the current hw userspace can always change the state at any time. But I can envision devices where there is a hardware override forcing the privacy screen to be on. So I guess that a r/w "software state" + a ro "hardware state" property would make sense. Note that with the current gen. Lenovo devices the hw-state will simply be a mirror of the sw-state and the sw-state will be toggled by the firmware independently of the last value requested by userspace.
I guess we could add a third ro firmware-state property, which would reflect the last firmware requested value, but I cannot really come up with clear semantics for this; nor can I come up with how this actually helps.
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Yup. I think internally we should have some kind of explicit update function perhaps, that's at least how immutable (by userspace) properties work. Or we'll change that to a callback, so that there's no inversion going on. -Daniel
Regards,
Hans
On Tue, 21 Apr 2020 14:37:41 +0200 Hans de Goede hdegoede@redhat.com wrote:
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Hi,
this sounds like a good plan to me, assuming the kernel writes only to the ro property and never to the r/w property.
I understand that as long as firmware hotkeys will toggle actual state, there is no design that could work reliably if userspace will always commit all KMS state even when it is not necessary. But not committing KMS state unless it is actually necessary is really a new requirement on userspace, so that needs to be documented before it's too late.
It's not enough to document "don't set it unless you want to overwrite/change it" for privacy screen properties. It needs to be documented as a general rule that userspace must follow with *unknown* properties as well. "Do not restore unrecognized properties unless the kernel KMS state might be incorrect compared to what you used to have."
This means that with a display server that does not understand privacy screen properties, the end user will lose the privacy screen state on every VT-switch back to the display server.
However, if we had a way to query the kernel for the default state to reset unknown properties to, the kernel implementation could return the current value of the privacy screen property instead of "off" to not lose the firmware state. Assuming firmware hotkeys exist, but if they don't then return just "off". The point is that the kernel who knows all the properties makes the decision what a sane reset value is. Userspace can always override the reset value for the properties it recognizes.
Thanks, pq
On Tue, Apr 21, 2020 at 7:46 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:37:41 +0200 Hans de Goede hdegoede@redhat.com wrote:
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Just to make sure I understand the semantics correctly:
- The "privacy-screen-hw-state" shall be read-only, and can be modified by: - Hardware (e.g. HW kill switch). - Firmware. - (Potentially) needs a notification/irq to the kernel when this changes (or may be kernel can read it only when userspace queries for it).
- The "privacy-screen-sw-state" shall be read-write, and can only be modified by user space. - If user space toggles it, the kernel will attempt to "request" the change to hardware. - Whether the request to hardware was successful or not, the "privacy-screen-sw-state" will always reflect the latest value userspace wrote. - If the request to hardware was successful, the "privacy-screen-hw-state" will also change (probably via a separate notification/irq from HW). - We expect the user space to write to "privacy-screen-sw-state" only if it really wants to toggle the value.
What is not clear to me is if any change to"privacy-screen-hw-state" shall be propagated to "privacy-screen-sw-state"? - If yes, then I think we are not solving any problems of single property. - If no, then why do we require userspace to write to sw state only if something has changed?
Also, it seems to me that in my current patchset, the property I have already behaves like "privacy-screen-sw-state". Do I just need to rename it?
Thanks,
Rajat
Hi,
this sounds like a good plan to me, assuming the kernel writes only to the ro property and never to the r/w property.
I understand that as long as firmware hotkeys will toggle actual state, there is no design that could work reliably if userspace will always commit all KMS state even when it is not necessary. But not committing KMS state unless it is actually necessary is really For implementing the "privacy-screen-sw-state".a new requirement on userspace, so that needs to be documented before it's too late.
It's not enough to document "don't set it unless you want to overwrite/change it" for privacy screen properties. It needs to be documented as a general rule that userspace must follow with *unknown* properties as well. "Do not restore unrecognized properties unless the kernel KMS state might be incorrect compared to what you used to have."
This means that with a display server that does not understand privacy screen properties, the end user will lose the privacy screen state on every VT-switch back to the display server.
However, if we had a way to query the kernel for the default state to reset unknown properties to, the kernel implementation could return the current value of the privacy screen property instead of "off" to not lose the firmware state. Assuming firmware hotkeys exist, but if they don't then return just "off". The point is that the kernel who knows all the properties makes the decision what a sane reset value is. Userspace can always override the reset value for the properties it recognizes.
Thanks,For implementing the "privacy-screen-sw-state". pq
On Thu, 23 Apr 2020 11:21:47 -0700 Rajat Jain rajatja@google.com wrote:
On Tue, Apr 21, 2020 at 7:46 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:37:41 +0200 Hans de Goede hdegoede@redhat.com wrote:
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Just to make sure I understand the semantics correctly:
- The "privacy-screen-hw-state" shall be read-only, and can be modified by: - Hardware (e.g. HW kill switch). - Firmware. - (Potentially) needs a notification/irq to the kernel when this
changes (or may be kernel can read it only when userspace queries for it).
- The "privacy-screen-sw-state" shall be read-write, and can only be
modified by user space. - If user space toggles it, the kernel will attempt to "request" the change to hardware. - Whether the request to hardware was successful or not, the "privacy-screen-sw-state" will always reflect the latest value userspace wrote. - If the request to hardware was successful, the "privacy-screen-hw-state" will also change (probably via a separate notification/irq from HW). - We expect the user space to write to "privacy-screen-sw-state" only if it really wants to toggle the value.
Hi,
yes, to my understanding, that seems to be the correct idea from this thread. The hw-state property must reflect the actual hardware state at all times.
However, when userspace sets "privacy-screen-sw-state", the driver should attempt to change hardware state regardless of whether the "privacy-screen-sw-state" value changes compared to its old value or not. Otherwise userspace cannot intentionally override a hardware hotkey setting if possible (or would need two atomic commits to do it).
Mind, the above paragraph is only what I interpreted from this email thread here. Previously I did not think that with atomic KMS, setting a property to a value it already has could trigger anything. But I guess it can?
This design is based on that it can.
What is not clear to me is if any change to"privacy-screen-hw-state" shall be propagated to "privacy-screen-sw-state"?
- If yes, then I think we are not solving any problems of single property.
- If no, then why do we require userspace to write to sw state only
if something has changed?
No. As already written, the kernel must not change the value of "privacy-screen-sw-state", only userspace can.
Let's assume that you have a firmware-implemented hardware hotkey for toggling the shield. The driver also successfully implements "privacy-screen-sw-state" meaning that writing to it will set the hardware shield state. If userspace was writing "privacy-screen-sw-state" even when it does not intend to change hardware state, it would almost immediately override any state set by the hardware hotkey, making the hardware hotkey (randomly) not work.
This assumes that the hardware hotkey is a momentary switch that does not stop software from controlling the shield too.
If the hardware hotkey can stop software from changing the shield state, then it might not be necessary for userspace to avoid unneeded setting of the property. But that depends on which way the hotkey works and which way users want to use it, so it's still best for userspace to not set the property unless it really intends to apply a change.
If possible, it would be good to make this case the prime example of how to correctly implement KMS properties for a hardware feature that can be controlled (and fought over) by both userspace and hardware/firmware. It seems like the same design can also work with hardware switches that force the hardware state to be one or the other, stopping userspace from changing it. Therefore I'd avoid incorporating any specific shield use cases in the design, e.g. "if hw switch is set to shield-on, userspace cannot turn shield off".
Thanks, pq
Also, it seems to me that in my current patchset, the property I have already behaves like "privacy-screen-sw-state". Do I just need to rename it?
Thanks,
Rajat
Hi all,
Pekka, Rajat,
Thank you for your input in this.
On 4/24/20 9:40 AM, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 11:21:47 -0700 Rajat Jain rajatja@google.com wrote:
On Tue, Apr 21, 2020 at 7:46 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:37:41 +0200 Hans de Goede hdegoede@redhat.com wrote:
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Just to make sure I understand the semantics correctly:
- The "privacy-screen-hw-state" shall be read-only, and can be modified by: - Hardware (e.g. HW kill switch). - Firmware. - (Potentially) needs a notification/irq to the kernel when this
changes (or may be kernel can read it only when userspace queries for it).
- The "privacy-screen-sw-state" shall be read-write, and can only be
modified by user space. - If user space toggles it, the kernel will attempt to "request" the change to hardware. - Whether the request to hardware was successful or not, the "privacy-screen-sw-state" will always reflect the latest value userspace wrote. - If the request to hardware was successful, the "privacy-screen-hw-state" will also change (probably via a separate notification/irq from HW). - We expect the user space to write to "privacy-screen-sw-state" only if it really wants to toggle the value.
Hi,
yes, to my understanding, that seems to be the correct idea from this thread. The hw-state property must reflect the actual hardware state at all times.
Agree on the hw-state prop reflecting the actual hardware state at all times, that one is easy.
However, when userspace sets "privacy-screen-sw-state", the driver should attempt to change hardware state regardless of whether the "privacy-screen-sw-state" value changes compared to its old value or not. Otherwise userspace cannot intentionally override a hardware hotkey setting if possible (or would need two atomic commits to do it).
Ack / agreed.
Mind, the above paragraph is only what I interpreted from this email thread here. Previously I did not think that with atomic KMS, setting a property to a value it already has could trigger anything. But I guess it can?
In a way. My idea for the "privacy-screen-sw-state" is for it to reflect the last requested value, where the request could come from either a firmware controlled hotkey; or from userspace (this seems to be where our ideas of how to handle this diverts).
So what can happen is (with both props being always in sync) -userspace reads privacy screen being off -user toggles privacy screen on through firmware controlled hotkey -kernel gets notified about state toggle, updates both property states to on -userspace commits its old knowledge of the property (off), thereby triggering the kernel to turn the privacy screen back off
So in this case from the kernel pov the property is actually set to a new value, not to "a value it already has".
Note there can be races here of course, but lets ignore those (for now). Both the hotkey event as well as userspace changing the setting will be end-user triggered events and will not happen at high frequency. Also I see no way to completely eliminate racing here. Luckily the side effects of the race or pretty harmless (annoying maybe, but not causing crashes, etc).
This design is based on that it can.
What is not clear to me is if any change to"privacy-screen-hw-state" shall be propagated to "privacy-screen-sw-state"?
- If yes, then I think we are not solving any problems of single property.
- If no, then why do we require userspace to write to sw state only
if something has changed?
No. As already written, the kernel must not change the value of "privacy-screen-sw-state", only userspace can.
So this is where out view of how to handle this differs, I do not see the hotkey changing the state as different from userspace changing it. The reason for me to have both a sw- and a hw-state is in case there is a physical switch (typically a slider style switch) which forces the state to on / off. In this case userspace could still set the "privacy-screen-sw-state" prop and then the 2 could differ.
Lets add one more complication to this, which I think helps. Currently the thinkpad_acpi driver exports the privacy screen as:
/proc/acpi/ibm/lcdshadow
Userspace can write this and then change the privacy-screen setting, this is in shipped kernels and cannot be dropped because it is part if the kernel's uABI now. This means that another userspace process can change the property underneath a kms client. I do not see how this is different from the firmware changing the setting based on a hotkey press. Yet if we stick with your "only userspace can" change the sw-state setting, then does this count as userspace, or do you mean only a kms client can ? And then how is another kms-client changing the setting different ?
So to me to avoid confusion the only valid case where the hw- and sw-state can differ is if userspace requests say "off" as state while the privacy screen is forced on by say a physical switch (or e.g. a BIOS option to lock it?).
Then we would remember the off in sw-state but hw-state would still be on.
I guess that maybe for the enum of the hw-state we need 4 values instead of 2:
Enabled Disabled Enabled, locked Disabled, locked
To indicate to userspace that atm the state cannot be changed.
If userspace then still changes sw-state we cache it and apply this if the privacy screen control gets unlocked.
On hardware where there is no "lock" the 2 properties will simply always be the same.
Let's assume that you have a firmware-implemented hardware hotkey for toggling the shield. The driver also successfully implements "privacy-screen-sw-state" meaning that writing to it will set the hardware shield state. If userspace was writing "privacy-screen-sw-state" even when it does not intend to change hardware state, it would almost immediately override any state set by the hardware hotkey, making the hardware hotkey (randomly) not work.
Right, this is why userspace should not set the property unless it really means to change it, even then things could still race, but as explained above that should normally never happen and luckily the side-effects of hitting the race somehow are not that bad.
This assumes that the hardware hotkey is a momentary switch that does not stop software from controlling the shield too.
This is correct for the Lenovo / thinkpad_acpi case.
If the hardware hotkey can stop software from changing the shield state, then it might not be necessary for userspace to avoid unneeded setting of the property. But that depends on which way the hotkey works and which way users want to use it, so it's still best for userspace to not set the property unless it really intends to apply a change.
Ack.
If possible, it would be good to make this case the prime example of how to correctly implement KMS properties for a hardware feature that can be controlled (and fought over) by both userspace and hardware/firmware. It seems like the same design can also work with hardware switches that force the hardware state to be one or the other, stopping userspace from changing it. Therefore I'd avoid incorporating any specific shield use cases in the design, e.g. "if hw switch is set to shield-on, userspace cannot turn shield off".
I agree that it would be good to make this the prime example of how to deal with similar cases.
Also, it seems to me that in my current patchset, the property I have already behaves like "privacy-screen-sw-state". Do I just need to rename it?
Maybe, it looks like we first need to figure out the exact semantics of all this.
Regards,
Hans
On Fri, 24 Apr 2020 10:24:31 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi all,
Pekka, Rajat,
Thank you for your input in this.
On 4/24/20 9:40 AM, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 11:21:47 -0700 Rajat Jain rajatja@google.com wrote:
On Tue, Apr 21, 2020 at 7:46 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:37:41 +0200 Hans de Goede hdegoede@redhat.com wrote:
TL;DR: Yes there will be races, because of both userspace + the firmware having; and potentially using r/w access to the privacy-screen state. But in practice I expect these to not really be an issue. Important here is that userspace only commits the property in a transaction to commit if it actually intends to change the property so as to not needlessly create a situation where we might hit the race.
As for 1 vs 2 properties for this I guess that in preparation for potential devices where the state is locked, having a r/w sw-state + a ro hw-state property makes sense.
So I suggest that we replace the current "privacy-screen" property from Rajat's patch-set with 2 props named:
"privacy-screen-sw-state" (r/w) "privacy-screen-hw-state" (ro)
Where for current gen hardware the privacy-screen-hw-state is just a mirror of the sw-state.
Just to make sure I understand the semantics correctly:
- The "privacy-screen-hw-state" shall be read-only, and can be modified by: - Hardware (e.g. HW kill switch). - Firmware. - (Potentially) needs a notification/irq to the kernel when this
changes (or may be kernel can read it only when userspace queries for it).
- The "privacy-screen-sw-state" shall be read-write, and can only be
modified by user space. - If user space toggles it, the kernel will attempt to "request" the change to hardware. - Whether the request to hardware was successful or not, the "privacy-screen-sw-state" will always reflect the latest value userspace wrote. - If the request to hardware was successful, the "privacy-screen-hw-state" will also change (probably via a separate notification/irq from HW). - We expect the user space to write to "privacy-screen-sw-state" only if it really wants to toggle the value.
Hi,
yes, to my understanding, that seems to be the correct idea from this thread. The hw-state property must reflect the actual hardware state at all times.
Agree on the hw-state prop reflecting the actual hardware state at all times, that one is easy.
However, when userspace sets "privacy-screen-sw-state", the driver should attempt to change hardware state regardless of whether the "privacy-screen-sw-state" value changes compared to its old value or not. Otherwise userspace cannot intentionally override a hardware hotkey setting if possible (or would need two atomic commits to do it).
Ack / agreed.
Mind, the above paragraph is only what I interpreted from this email thread here. Previously I did not think that with atomic KMS, setting a property to a value it already has could trigger anything. But I guess it can?
In a way. My idea for the "privacy-screen-sw-state" is for it to reflect the last requested value, where the request could come from either a firmware controlled hotkey; or from userspace (this seems to be where our ideas of how to handle this diverts).
So what can happen is (with both props being always in sync) -userspace reads privacy screen being off -user toggles privacy screen on through firmware controlled hotkey -kernel gets notified about state toggle, updates both property states to on -userspace commits its old knowledge of the property (off), thereby triggering the kernel to turn the privacy screen back off
So in this case from the kernel pov the property is actually set to a new value, not to "a value it already has".
Hi,
that is an interesting point of view.
You are keeping the separation between "wanted" and "actual" state, but counting firmware/hardware hotkeys as "want" instead of letting them silently change hardware state.
That seems ok.
Note there can be races here of course, but lets ignore those (for now). Both the hotkey event as well as userspace changing the setting will be end-user triggered events and will not happen at high frequency. Also I see no way to completely eliminate racing here. Luckily the side effects of the race or pretty harmless (annoying maybe, but not causing crashes, etc).
This design is based on that it can.
What is not clear to me is if any change to"privacy-screen-hw-state" shall be propagated to "privacy-screen-sw-state"?
- If yes, then I think we are not solving any problems of single property.
- If no, then why do we require userspace to write to sw state only
if something has changed?
No. As already written, the kernel must not change the value of "privacy-screen-sw-state", only userspace can.
So this is where out view of how to handle this differs, I do not see the hotkey changing the state as different from userspace changing it. The reason for me to have both a sw- and a hw-state is in case there is a physical switch (typically a slider style switch) which forces the state to on / off. In this case userspace could still set the "privacy-screen-sw-state" prop and then the 2 could differ.
Yes, the locked switch case definitely makes sense to me.
If userspace has to avoid setting the sw property unless it actually intends to change it, then the sw property being controlled from multiple sources (firmware, hotkey, the /proc file below) could work. It would even tell the KMS client when someone else changed the "wanted" state.
Lets add one more complication to this, which I think helps. Currently the thinkpad_acpi driver exports the privacy screen as:
/proc/acpi/ibm/lcdshadow
Userspace can write this and then change the privacy-screen setting, this is in shipped kernels and cannot be dropped because it is part if the kernel's uABI now. This means that another userspace process can change the property underneath a kms client. I do not see how this is different from the firmware changing the setting based on a hotkey press. Yet if we stick with your "only userspace can" change the sw-state setting, then does this count as userspace, or do you mean only a kms client can ? And then how is another kms-client changing the setting different ?
To me that would be similar to firmware changing hardware state: it's not the KMS client (the display server) doing it, but something else behind its back while it thinks it's in full control.
Doing things behind the display server's back is what creates all the mess here.
Another KMS client cannot set the property behind the display server's back, because the display server is holding DRM master, and the property cannot be written without DRM master status. If the display server drops DRM master, it knows it probably lost all state.
So to me to avoid confusion the only valid case where the hw- and sw-state can differ is if userspace requests say "off" as state while the privacy screen is forced on by say a physical switch (or e.g. a BIOS option to lock it?).
Then we would remember the off in sw-state but hw-state would still be on.
I guess that maybe for the enum of the hw-state we need 4 values instead of 2:
Enabled Disabled Enabled, locked Disabled, locked
To indicate to userspace that atm the state cannot be changed.
If userspace needs that information, sure.
This makes me think that a driver needs to handle different types of switches/hotkeys through different properties:
- For a hardware latching switch that forces the shield state to be one and not the other, it should change the hw-state but it does not seem to have a reason to change the sw-state property: it's not a "want", it's a hard setting. Changing sw-state would also lose the userspace preference of the setting.
- For momentary button or a hotkey that is supposed to just toggle the shield state, it would toggle sw-state property since it's a "want" that can be overridden. Setting the property leads to changing the hw-state as well (if not locked).
Does that make sense?
Maybe this is the best compromise given the display server cannot be in full control.
If userspace then still changes sw-state we cache it and apply this if the privacy screen control gets unlocked.
Sounds good.
On hardware where there is no "lock" the 2 properties will simply always be the same.
Ok.
Thanks, pq
Let's assume that you have a firmware-implemented hardware hotkey for toggling the shield. The driver also successfully implements "privacy-screen-sw-state" meaning that writing to it will set the hardware shield state. If userspace was writing "privacy-screen-sw-state" even when it does not intend to change hardware state, it would almost immediately override any state set by the hardware hotkey, making the hardware hotkey (randomly) not work.
Right, this is why userspace should not set the property unless it really means to change it, even then things could still race, but as explained above that should normally never happen and luckily the side-effects of hitting the race somehow are not that bad.
This assumes that the hardware hotkey is a momentary switch that does not stop software from controlling the shield too.
This is correct for the Lenovo / thinkpad_acpi case.
If the hardware hotkey can stop software from changing the shield state, then it might not be necessary for userspace to avoid unneeded setting of the property. But that depends on which way the hotkey works and which way users want to use it, so it's still best for userspace to not set the property unless it really intends to apply a change.
Ack.
If possible, it would be good to make this case the prime example of how to correctly implement KMS properties for a hardware feature that can be controlled (and fought over) by both userspace and hardware/firmware. It seems like the same design can also work with hardware switches that force the hardware state to be one or the other, stopping userspace from changing it. Therefore I'd avoid incorporating any specific shield use cases in the design, e.g. "if hw switch is set to shield-on, userspace cannot turn shield off".
I agree that it would be good to make this the prime example of how to deal with similar cases.
Also, it seems to me that in my current patchset, the property I have already behaves like "privacy-screen-sw-state". Do I just need to rename it?
Maybe, it looks like we first need to figure out the exact semantics of all this.
Regards,
Hans
Hi all,
On 4/24/20 11:08 AM, Pekka Paalanen wrote:
On Fri, 24 Apr 2020 10:24:31 +0200 Hans de Goede hdegoede@redhat.com wrote:
<snip>
Agree on the hw-state prop reflecting the actual hardware state at all times, that one is easy.
However, when userspace sets "privacy-screen-sw-state", the driver should attempt to change hardware state regardless of whether the "privacy-screen-sw-state" value changes compared to its old value or not. Otherwise userspace cannot intentionally override a hardware hotkey setting if possible (or would need two atomic commits to do it).
Ack / agreed.
Mind, the above paragraph is only what I interpreted from this email thread here. Previously I did not think that with atomic KMS, setting a property to a value it already has could trigger anything. But I guess it can?
In a way. My idea for the "privacy-screen-sw-state" is for it to reflect the last requested value, where the request could come from either a firmware controlled hotkey; or from userspace (this seems to be where our ideas of how to handle this diverts).
So what can happen is (with both props being always in sync) -userspace reads privacy screen being off -user toggles privacy screen on through firmware controlled hotkey -kernel gets notified about state toggle, updates both property states to on -userspace commits its old knowledge of the property (off), thereby triggering the kernel to turn the privacy screen back off
So in this case from the kernel pov the property is actually set to a new value, not to "a value it already has".
Hi,
that is an interesting point of view.
You are keeping the separation between "wanted" and "actual" state, but counting firmware/hardware hotkeys as "want" instead of letting them silently change hardware state.
Right, that seems more natural to me, as mentioned already, this way the wanted and hw state only get out of sync if the hw is locked to a certain state.
That seems ok.
Note there can be races here of course, but lets ignore those (for now). Both the hotkey event as well as userspace changing the setting will be end-user triggered events and will not happen at high frequency. Also I see no way to completely eliminate racing here. Luckily the side effects of the race or pretty harmless (annoying maybe, but not causing crashes, etc).
This design is based on that it can.
What is not clear to me is if any change to"privacy-screen-hw-state" shall be propagated to "privacy-screen-sw-state"?
- If yes, then I think we are not solving any problems of single property.
- If no, then why do we require userspace to write to sw state only
if something has changed?
No. As already written, the kernel must not change the value of "privacy-screen-sw-state", only userspace can.
So this is where out view of how to handle this differs, I do not see the hotkey changing the state as different from userspace changing it. The reason for me to have both a sw- and a hw-state is in case there is a physical switch (typically a slider style switch) which forces the state to on / off. In this case userspace could still set the "privacy-screen-sw-state" prop and then the 2 could differ.
Yes, the locked switch case definitely makes sense to me.
If userspace has to avoid setting the sw property unless it actually intends to change it, then the sw property being controlled from multiple sources (firmware, hotkey, the /proc file below) could work. It would even tell the KMS client when someone else changed the "wanted" state.
Right, that is the idea and telling the KMS client definitely is a feature we want, so that we can show an OSD notifcation on the firmware handled hotkey presses, like we already do for volume control/mute, (kbd) backlight changes, etc.
Lets add one more complication to this, which I think helps. Currently the thinkpad_acpi driver exports the privacy screen as:
/proc/acpi/ibm/lcdshadow
Userspace can write this and then change the privacy-screen setting, this is in shipped kernels and cannot be dropped because it is part if the kernel's uABI now. This means that another userspace process can change the property underneath a kms client. I do not see how this is different from the firmware changing the setting based on a hotkey press. Yet if we stick with your "only userspace can" change the sw-state setting, then does this count as userspace, or do you mean only a kms client can ? And then how is another kms-client changing the setting different ?
To me that would be similar to firmware changing hardware state: it's not the KMS client (the display server) doing it, but something else behind its back while it thinks it's in full control.
Doing things behind the display server's back is what creates all the mess here.
Right, unfortunately this is not something which we can change. I have asked Lenovo if it will be possible to just have the hotkey send keypresses and let userspace deal with it. This might happen on future hardware generations. But then again it might not and for the current hardware gen. we are stuck with the firmware handling it.
Another KMS client cannot set the property behind the display server's back, because the display server is holding DRM master, and the property cannot be written without DRM master status. If the display server drops DRM master, it knows it probably lost all state.
True.
So to me to avoid confusion the only valid case where the hw- and sw-state can differ is if userspace requests say "off" as state while the privacy screen is forced on by say a physical switch (or e.g. a BIOS option to lock it?).
Then we would remember the off in sw-state but hw-state would still be on.
I guess that maybe for the enum of the hw-state we need 4 values instead of 2:
Enabled Disabled Enabled, locked Disabled, locked
To indicate to userspace that atm the state cannot be changed.
If userspace needs that information, sure.
Yes I believe that userspace will want to know when the state is locked. Currently when wifi is disable through a hardware switch, like the sliders on the side of (usually somewhat older) laptops, then e.g. GNOME will show "Disabled in hardware" (the wording could do with some work) at the place where you would normally run wifi on/off and select the network to use.
This makes me think that a driver needs to handle different types of switches/hotkeys through different properties:
- For a hardware latching switch that forces the shield state to be one and not the other, it should change the hw-state but it does not seem to have a reason to change the sw-state property: it's not a "want", it's a hard setting. Changing sw-state would also lose the userspace preference of the setting.
Yes, I completely agree.
- For momentary button or a hotkey that is supposed to just toggle the shield state, it would toggle sw-state property since it's a "want" that can be overridden. Setting the property leads to changing the hw-state as well (if not locked).
Also agreed, although in practice the firmware may actually change the hw state itself (if not locked), but we would then update both the sw and hw props to reflect this.
Does that make sense?
Yes this makes sense to me. Now we just need to write it all down in such a way so that other people will also understand it :)
Maybe this is the best compromise given the display server cannot be in full control.
I agree that this seems to be the best compromise.
If userspace then still changes sw-state we cache it and apply this if the privacy screen control gets unlocked.
Sounds good.
On hardware where there is no "lock" the 2 properties will simply always be the same.
Ok.
Regards,
Hans
On Fri, Apr 17, 2020 at 11:02 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 15 Apr 2020 17:40:46 +0200 Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 4/15/20 5:28 PM, Jani Nikula wrote:
On Wed, 15 Apr 2020, Hans de Goede hdegoede@redhat.com wrote:
ii. Currently the "privacy-screen" property added by Rajat's patch-set is an enum with 2 possible values: "Enabled" "Disabled"
We could add a third value "Not Available", which would be the default and then for internal panels always add the property so that we avoid the problem that detecting if the laptop has an internal privacy screen needs to be done before the connector is registered. Then we can add some hooks which allow an lcdshadow-driver to register itself against a connector later (which is non trivial wrt probe order, but lets ignore that for now).
I regret dropping the ball on Rajat's series (sorry!).
I do think having the connector property for this is the way to go.
I 100% agree.
Even if we couldn't necessarily figure out all the details on the kernel internal connections, can we settle on the property though, so we could move forward with Rajat's series?
Yes please, this will also allow us to move forward with userspace support even if for testing that we do some hacks for the kernel's internal connections for now.
Moreover, do we actually need two properties, one which could indicate userspace's desire for the property, and another that tells the hardware state?
No I do not think so. I would expect there to just be one property, I guess that if the state is (partly) firmware controlled then there might be a race, but we will need a notification mechanism (*) for firmware triggered state changes anyways, so shortly after loosing the race userspace will process the notification and it will know about it.
One thing which might be useful is a way to signal that the property is read-only in case we ever hit hw where that is the case.
I'd so very much like to have no in-kernel/in-firmware shortcuts to enable/disable the privacy screen, and instead have any hardware buttons just be events that the userspace could react to. However I don't think that'll be the case unfortunately.
In my experience with keyboard-backlight support, we will (unfortunately) see a mix and in some case we will get a notification that the firmware has adjusted the state, rather then just getting a keypress and dealing with that ourselves. In some cases we may even be able to choose, so the fw will deal with it by default but we can ask it to just send a key-press. But I do believe that we can *not* expect that we will always just get a keypress for userspace to deal with.
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Uh if you do that you'll get random surprising failures if you don't also set ALLOW_MODESET, because that way you'll automatically repair link failures and stuff like that. I'm assuming your userspace only supplies all the properties for crtc and planes, and leaves connectors as-is? Otherwise you already have some fun bugs.
In general I'd say userspace shouldn't write stuff it doesn't understand. If you limit yourself to just the properties you do want to (re)set, that's safe. But if you just blindly write everything all the time, random modesets, and hence random failures if you don't set ALLOW_MODESET.
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it. The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
We have fancy new uevents which give you both the connector and the property, so you know what's going on.
Also, a property which userspace and the kernel can race like you describe above is broken. We don't have these, and we won't merge them.
The ones we do have the state transitions are a lot clearer, and userspace overwriting what the kernel has done is not actually going to cause a big pain. At least in the sense of the transition will be lost, since for e.g. both link_status and hdcp the value the kernel sets is not a value userspace can set. But it can result in problems if you just blindly write them again causing modesets you'd not expect. -Daniel
Regards,
Hans
*) Some udev event I guess, I sorta assume there already is a notification mechanism for property change notifications ?
Yes, such mechanism has been discussed before, see the thread containing https://lists.freedesktop.org/archives/dri-devel/2019-May/217588.html
TL;DR: the mechanism exists and is called the hotplug event. But the problem with the hotplug event is that it carries no information about what changed, so userspace is forced re-discover everything about the DRM device. That thread discusses extending the hotplug event to be able to signal changes in individual properties rather than "something maybe changed, you figure out what".
Thanks, pq
On Fri, 17 Apr 2020 16:17:18 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 17, 2020 at 11:02 AM Pekka Paalanen ppaalanen@gmail.com wrote:
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Uh if you do that you'll get random surprising failures if you don't also set ALLOW_MODESET, because that way you'll automatically repair link failures and stuff like that. I'm assuming your userspace only supplies all the properties for crtc and planes, and leaves connectors as-is? Otherwise you already have some fun bugs.
In general I'd say userspace shouldn't write stuff it doesn't understand. If you limit yourself to just the properties you do want to (re)set, that's safe. But if you just blindly write everything all the time, random modesets, and hence random failures if you don't set ALLOW_MODESET.
Hi,
how should userspace KMS program A recover from the situation when switching the VT back from KMS program B who changed properties that program A does not recognise? (I believe Weston does not recover at the moment.) This is very important for getting e.g. reliable color reproduction, since not all KMS programs are always up-to-date with everything the kernel exposes and people may switch between them. Not resetting everything may even encourage people to write hacks where you temporarily VT-switch away, run a KMS program to set one property, and then switch back assuming the property remains set. I have already seen someone mention they can enable VRR behind the display server's back like this.
I don't think Weston records and re-sets unknown properties yet, but I assumed it is what it needs to do to be able to reliably recover from VT-switches. In that case ALLOW_MODESET is of course set since all state is unknown and assumed bad.
I do believe Weston re-submits *everything* it knows about every update, except for CRTCs and connectors it has already disabled and knows are in disabled state (this could change though).
However, during steady-state operation when ALLOW_MODESET should not be necessary, is it still harmful to re-program *all* properties on every update?
After all, the kernel will just no-op all property setting where the value is already the right one, does it not?
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
Is there something more up-to-date than https://blog.ffwll.ch/2016/01/vt-switching-with-atomic-modeset.html ?
Assuming the property is userspace-writable: if kernel goes and changes the property value on its own, it will very likely be just overwritten by userspace right after if userspace does not manage to process the uevent first. If that happens and userspace later processes the uevent, userspace queries the kernel for the current proprerty state which is now what userspace wrote, not what firmware set.
Therefore you end up with the firmware hotkey working only randomly.
It would be much better to have the hotkey events delivered to userspace so that userspace can control the privacy screen and everything will be reliable, both the hotkeys and any GUI for it. The other reliable option is that userspace must never be able to change privacy screen state, only the hardware hotkeys can.
We have fancy new uevents which give you both the connector and the property, so you know what's going on.
Yeah, userspace can know what changed, but not the new value with the uevent.
Also, a property which userspace and the kernel can race like you describe above is broken. We don't have these, and we won't merge them.
That's what I would expect, yes, but I'm not that optimistic that the knowledgeable maintainers can always catch them all, which is why I try to comment on UAPI additions that look fishy to me.
The ones we do have the state transitions are a lot clearer, and userspace overwriting what the kernel has done is not actually going to cause a big pain. At least in the sense of the transition will be lost, since for e.g. both link_status and hdcp the value the kernel sets is not a value userspace can set. But it can result in problems if you just blindly write them again causing modesets you'd not expect.
Yeah, HDCP "Content Protection" is very carefully engineered to cope with the awkwardness that both userspace and kernel will write it. It's cumbersome.
Btw. I searched for all occurrences of link_status in https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html and it seems it only has two possible values, good and bad, and no mention whether it is writable. Looks like it's writable. There does not seem to be a) an explanation how exactly it needs to the handled (writing it does something? what can you write?) or b) any way discern between kernel and userspace set values like HDCP "Content Protection" has.
Weston does not support link_status yet, FWIW.
Unless you have some other idea on how to reset unknown KMS state to sensible defaults that are always the same, I think any KMS property where the query can return a value that cannot be written back to it with ALLOW_MODESET is broken.
Thanks, pq
On Mon, 20 Apr 2020 11:27:04 +0300 Pekka Paalanen ppaalanen@gmail.com wrote:
On Fri, 17 Apr 2020 16:17:18 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 17, 2020 at 11:02 AM Pekka Paalanen ppaalanen@gmail.com wrote:
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Uh if you do that you'll get random surprising failures if you don't also set ALLOW_MODESET, because that way you'll automatically repair link failures and stuff like that. I'm assuming your userspace only supplies all the properties for crtc and planes, and leaves connectors as-is? Otherwise you already have some fun bugs.
In general I'd say userspace shouldn't write stuff it doesn't understand. If you limit yourself to just the properties you do want to (re)set, that's safe. But if you just blindly write everything all the time, random modesets, and hence random failures if you don't set ALLOW_MODESET.
Hi,
how should userspace KMS program A recover from the situation when switching the VT back from KMS program B who changed properties that program A does not recognise? (I believe Weston does not recover at the moment.) This is very important for getting e.g. reliable color reproduction, since not all KMS programs are always up-to-date with everything the kernel exposes and people may switch between them. Not resetting everything may even encourage people to write hacks where you temporarily VT-switch away, run a KMS program to set one property, and then switch back assuming the property remains set. I have already seen someone mention they can enable VRR behind the display server's back like this.
I don't think Weston records and re-sets unknown properties yet, but I assumed it is what it needs to do to be able to reliably recover from VT-switches. In that case ALLOW_MODESET is of course set since all state is unknown and assumed bad.
I do believe Weston re-submits *everything* it knows about every update, except for CRTCs and connectors it has already disabled and knows are in disabled state (this could change though).
However, during steady-state operation when ALLOW_MODESET should not be necessary, is it still harmful to re-program *all* properties on every update?
After all, the kernel will just no-op all property setting where the value is already the right one, does it not?
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
Is there something more up-to-date than https://blog.ffwll.ch/2016/01/vt-switching-with-atomic-modeset.html ?
Thinking more, would the below work?
Actor: a KMS userspace program, e.g. a display server
- On start-up, read all KMS properties and their values. The properties that are not recognised are saved in a set called "reset unknowns" with their current values.
Optional: The program commits the "reset unknown" state to KMS with ALLOW_MODESET to ensure it all is writable as is; if that fails, there is no guarantee that the program could recover later, so it's best to abort in that case. This could be part of the initial modeset, too.
- When the program has lost and regained DRM master status, meaning that (unrecognised) KMS state is potentially incorrect, prepare an atomic commit with "reset unknowns" set and add all the recognised state the program knows of on top. This resets everything to like it was, with ALLOW_MODESET.
- At any other time, do not use the "reset unknowns" set.
The final point is the important one. I have assumed it would be safe to use always, but apparently not? Good thing I haven't yet written code to do that.
You have to recognise the property to know if it is safe to set needlessly (for convenience in both code simplicity and ease of debugging)?
Also, when using "reset unknowns" set, it actually has to be partitioned by KMS objects (CRTC, connector, plane...) so if e.g. a connector no longer exist, you don't attempt to set it.
However, this still leaves writable properties whose value read is not legal to write as broken. Let's pray that fbcon or a system compositor will never succeed in enabling HDCP...
Thanks, pq
I'd really like a drmModeAtomicAddDefaultProperty that resets a prop to its default value…
On Mon, Apr 20, 2020 at 01:04:20PM +0300, Pekka Paalanen wrote:
On Mon, 20 Apr 2020 11:27:04 +0300 Pekka Paalanen ppaalanen@gmail.com wrote:
On Fri, 17 Apr 2020 16:17:18 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 17, 2020 at 11:02 AM Pekka Paalanen ppaalanen@gmail.com wrote:
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Uh if you do that you'll get random surprising failures if you don't also set ALLOW_MODESET, because that way you'll automatically repair link failures and stuff like that. I'm assuming your userspace only supplies all the properties for crtc and planes, and leaves connectors as-is? Otherwise you already have some fun bugs.
In general I'd say userspace shouldn't write stuff it doesn't understand. If you limit yourself to just the properties you do want to (re)set, that's safe. But if you just blindly write everything all the time, random modesets, and hence random failures if you don't set ALLOW_MODESET.
Hi,
how should userspace KMS program A recover from the situation when switching the VT back from KMS program B who changed properties that program A does not recognise? (I believe Weston does not recover at the moment.) This is very important for getting e.g. reliable color reproduction, since not all KMS programs are always up-to-date with everything the kernel exposes and people may switch between them. Not resetting everything may even encourage people to write hacks where you temporarily VT-switch away, run a KMS program to set one property, and then switch back assuming the property remains set. I have already seen someone mention they can enable VRR behind the display server's back like this.
I don't think Weston records and re-sets unknown properties yet, but I assumed it is what it needs to do to be able to reliably recover from VT-switches. In that case ALLOW_MODESET is of course set since all state is unknown and assumed bad.
I do believe Weston re-submits *everything* it knows about every update, except for CRTCs and connectors it has already disabled and knows are in disabled state (this could change though).
However, during steady-state operation when ALLOW_MODESET should not be necessary, is it still harmful to re-program *all* properties on every update?
After all, the kernel will just no-op all property setting where the value is already the right one, does it not?
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
Is there something more up-to-date than https://blog.ffwll.ch/2016/01/vt-switching-with-atomic-modeset.html ?
Sadly, nothing changed since then.
Thinking more, would the below work?
Yup, this would somewhat work. Except not always, I've found one case where even this goes wrong:
- Content-protection property, if enabled, has the kernel automatically switch to enabled if hdcp is actually on and authenticated and all that. Writing back that enabled value will fail. But there's good chances that at boot-up content protection isn't enabled, so should work out nicely.
- We could fix this by silently downcasting enabled to requested, but might still lead to surprises since it makes hdcp rather more sticky than some users might like.
Actor: a KMS userspace program, e.g. a display server
On start-up, read all KMS properties and their values. The properties that are not recognised are saved in a set called "reset unknowns" with their current values.
Optional: The program commits the "reset unknown" state to KMS with ALLOW_MODESET to ensure it all is writable as is; if that fails, there is no guarantee that the program could recover later, so it's best to abort in that case. This could be part of the initial modeset, too.
When the program has lost and regained DRM master status, meaning that (unrecognised) KMS state is potentially incorrect, prepare an atomic commit with "reset unknowns" set and add all the recognised state the program knows of on top. This resets everything to like it was, with ALLOW_MODESET.
At any other time, do not use the "reset unknowns" set.
The final point is the important one. I have assumed it would be safe to use always, but apparently not? Good thing I haven't yet written code to do that.
You have to recognise the property to know if it is safe to set needlessly (for convenience in both code simplicity and ease of debugging)?
Also, when using "reset unknowns" set, it actually has to be partitioned by KMS objects (CRTC, connector, plane...) so if e.g. a connector no longer exist, you don't attempt to set it.
However, this still leaves writable properties whose value read is not legal to write as broken. Let's pray that fbcon or a system compositor will never succeed in enabling HDCP...
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
- self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
- link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
- I guess we could do stuff that only fires off when allow_modeset is set, but I haven't found some examples. I thought we've had some outside of self-refresh helpers already. The closest I've found is again link-status, which never allows userspace to set BAD, and silently upgrades to GOOD. So that userspace doing a blind safe/restore can't wreak stuff permanently.
It's all a bit nasty :-/
I think we should at least allow userspace to do a blind restore with allow_modeset and not expect bad side-effects. That would mean fixing at least the content protection stuff.
Plus documenting this in the kernel somewhere. As the official thing to do. But maybe we want some actual userspace doing this before we enshrine it as official policy. The content protection fix is a one-liner and can be cc'ed stable. -Daniel
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 20, 2020 at 01:04:20PM +0300, Pekka Paalanen wrote:
On Mon, 20 Apr 2020 11:27:04 +0300 Pekka Paalanen ppaalanen@gmail.com wrote:
On Fri, 17 Apr 2020 16:17:18 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 17, 2020 at 11:02 AM Pekka Paalanen ppaalanen@gmail.com wrote:
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Uh if you do that you'll get random surprising failures if you don't also set ALLOW_MODESET, because that way you'll automatically repair link failures and stuff like that. I'm assuming your userspace only supplies all the properties for crtc and planes, and leaves connectors as-is? Otherwise you already have some fun bugs.
In general I'd say userspace shouldn't write stuff it doesn't understand. If you limit yourself to just the properties you do want to (re)set, that's safe. But if you just blindly write everything all the time, random modesets, and hence random failures if you don't set ALLOW_MODESET.
Hi,
how should userspace KMS program A recover from the situation when switching the VT back from KMS program B who changed properties that program A does not recognise? (I believe Weston does not recover at the moment.) This is very important for getting e.g. reliable color reproduction, since not all KMS programs are always up-to-date with everything the kernel exposes and people may switch between them. Not resetting everything may even encourage people to write hacks where you temporarily VT-switch away, run a KMS program to set one property, and then switch back assuming the property remains set. I have already seen someone mention they can enable VRR behind the display server's back like this.
I don't think Weston records and re-sets unknown properties yet, but I assumed it is what it needs to do to be able to reliably recover from VT-switches. In that case ALLOW_MODESET is of course set since all state is unknown and assumed bad.
I do believe Weston re-submits *everything* it knows about every update, except for CRTCs and connectors it has already disabled and knows are in disabled state (this could change though).
However, during steady-state operation when ALLOW_MODESET should not be necessary, is it still harmful to re-program *all* properties on every update?
After all, the kernel will just no-op all property setting where the value is already the right one, does it not?
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
Is there something more up-to-date than https://blog.ffwll.ch/2016/01/vt-switching-with-atomic-modeset.html ?
Sadly, nothing changed since then.
Thinking more, would the below work?
Yup, this would somewhat work. Except not always, I've found one case where even this goes wrong:
Content-protection property, if enabled, has the kernel automatically switch to enabled if hdcp is actually on and authenticated and all that. Writing back that enabled value will fail. But there's good chances that at boot-up content protection isn't enabled, so should work out nicely.
We could fix this by silently downcasting enabled to requested, but might still lead to surprises since it makes hdcp rather more sticky than some users might like.
The fix doesn't make HDCP any more or less sticky, it just makes it possible to not fail a resetting atomic commit. Without a resetting commit, it will remain DESIRED/ENABLED.
If "Content Protection" reads back as DESIRED, userspace that relies on read-back for reset will reset it to DESIRED. Could as well be ENABLED that the kernel just takes as DESIRED when written.
Actor: a KMS userspace program, e.g. a display server
On start-up, read all KMS properties and their values. The properties that are not recognised are saved in a set called "reset unknowns" with their current values.
Optional: The program commits the "reset unknown" state to KMS with ALLOW_MODESET to ensure it all is writable as is; if that fails, there is no guarantee that the program could recover later, so it's best to abort in that case. This could be part of the initial modeset, too.
When the program has lost and regained DRM master status, meaning that (unrecognised) KMS state is potentially incorrect, prepare an atomic commit with "reset unknowns" set and add all the recognised state the program knows of on top. This resets everything to like it was, with ALLOW_MODESET.
At any other time, do not use the "reset unknowns" set.
The final point is the important one. I have assumed it would be safe to use always, but apparently not? Good thing I haven't yet written code to do that.
You have to recognise the property to know if it is safe to set needlessly (for convenience in both code simplicity and ease of debugging)?
Also, when using "reset unknowns" set, it actually has to be partitioned by KMS objects (CRTC, connector, plane...) so if e.g. a connector no longer exist, you don't attempt to set it.
However, this still leaves writable properties whose value read is not legal to write as broken. Let's pray that fbcon or a system compositor will never succeed in enabling HDCP...
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
Does that mean that I'll never need to implement link-status handling in Weston, because the kernel will recover the link anyway? If the kernel does that, then what's the point of having a link-status property to begin with?
- I guess we could do stuff that only fires off when allow_modeset is set, but I haven't found some examples. I thought we've had some outside of self-refresh helpers already. The closest I've found is again link-status, which never allows userspace to set BAD, and silently upgrades to GOOD. So that userspace doing a blind safe/restore can't wreak stuff permanently.
Sounds like link-status was designed with a blind save/restore in mind.
It's all a bit nasty :-/
I think we should at least allow userspace to do a blind restore with allow_modeset and not expect bad side-effects. That would mean fixing at least the content protection stuff.
Plus documenting this in the kernel somewhere. As the official thing to do. But maybe we want some actual userspace doing this before we enshrine it as official policy. The content protection fix is a one-liner and can be cc'ed stable.
I'd probably not go there, a blind save does not guarantee a good state. The fix to "Content Protection" is not necessary (as long as userspace does not do a blind save/restore) if we can get the default state from the kernel. If we get the default state from the kernel, then userspace would be doing a blind restore but not save, meaning that the state actually is sane and writable.
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
Thanks, pq
On Tuesday, April 21, 2020 4:33 PM, Pekka Paalanen ppaalanen@gmail.com wrote:
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
I'm interested in this problem too. If someone writes the kernel side, I'll gladly write user-space for it.
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 20, 2020 at 01:04:20PM +0300, Pekka Paalanen wrote:
On Mon, 20 Apr 2020 11:27:04 +0300 Pekka Paalanen ppaalanen@gmail.com wrote:
On Fri, 17 Apr 2020 16:17:18 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 17, 2020 at 11:02 AM Pekka Paalanen ppaalanen@gmail.com wrote:
Hi,
let's think about how userspace uses atomic KMS UAPI. The simplest way to use atomic correctly is that userspace will for every update send the full, complete set of all properties that exist, both known and unknown to userspace (to recover from temporarily VT-switching to another KMS program that changes unknown properties). Attempting to track which properties already have their correct values in the kernel is extra work for just extra bugs.
Uh if you do that you'll get random surprising failures if you don't also set ALLOW_MODESET, because that way you'll automatically repair link failures and stuff like that. I'm assuming your userspace only supplies all the properties for crtc and planes, and leaves connectors as-is? Otherwise you already have some fun bugs.
In general I'd say userspace shouldn't write stuff it doesn't understand. If you limit yourself to just the properties you do want to (re)set, that's safe. But if you just blindly write everything all the time, random modesets, and hence random failures if you don't set ALLOW_MODESET.
Hi,
how should userspace KMS program A recover from the situation when switching the VT back from KMS program B who changed properties that program A does not recognise? (I believe Weston does not recover at the moment.) This is very important for getting e.g. reliable color reproduction, since not all KMS programs are always up-to-date with everything the kernel exposes and people may switch between them. Not resetting everything may even encourage people to write hacks where you temporarily VT-switch away, run a KMS program to set one property, and then switch back assuming the property remains set. I have already seen someone mention they can enable VRR behind the display server's back like this.
I don't think Weston records and re-sets unknown properties yet, but I assumed it is what it needs to do to be able to reliably recover from VT-switches. In that case ALLOW_MODESET is of course set since all state is unknown and assumed bad.
I do believe Weston re-submits *everything* it knows about every update, except for CRTCs and connectors it has already disabled and knows are in disabled state (this could change though).
However, during steady-state operation when ALLOW_MODESET should not be necessary, is it still harmful to re-program *all* properties on every update?
After all, the kernel will just no-op all property setting where the value is already the right one, does it not?
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
Is there something more up-to-date than https://blog.ffwll.ch/2016/01/vt-switching-with-atomic-modeset.html ?
Sadly, nothing changed since then.
Thinking more, would the below work?
Yup, this would somewhat work. Except not always, I've found one case where even this goes wrong:
Content-protection property, if enabled, has the kernel automatically switch to enabled if hdcp is actually on and authenticated and all that. Writing back that enabled value will fail. But there's good chances that at boot-up content protection isn't enabled, so should work out nicely.
We could fix this by silently downcasting enabled to requested, but might still lead to surprises since it makes hdcp rather more sticky than some users might like.
The fix doesn't make HDCP any more or less sticky, it just makes it possible to not fail a resetting atomic commit. Without a resetting commit, it will remain DESIRED/ENABLED.
If "Content Protection" reads back as DESIRED, userspace that relies on read-back for reset will reset it to DESIRED. Could as well be ENABLED that the kernel just takes as DESIRED when written.
Actor: a KMS userspace program, e.g. a display server
On start-up, read all KMS properties and their values. The properties that are not recognised are saved in a set called "reset unknowns" with their current values.
Optional: The program commits the "reset unknown" state to KMS with ALLOW_MODESET to ensure it all is writable as is; if that fails, there is no guarantee that the program could recover later, so it's best to abort in that case. This could be part of the initial modeset, too.
When the program has lost and regained DRM master status, meaning that (unrecognised) KMS state is potentially incorrect, prepare an atomic commit with "reset unknowns" set and add all the recognised state the program knows of on top. This resets everything to like it was, with ALLOW_MODESET.
At any other time, do not use the "reset unknowns" set.
The final point is the important one. I have assumed it would be safe to use always, but apparently not? Good thing I haven't yet written code to do that.
You have to recognise the property to know if it is safe to set needlessly (for convenience in both code simplicity and ease of debugging)?
Also, when using "reset unknowns" set, it actually has to be partitioned by KMS objects (CRTC, connector, plane...) so if e.g. a connector no longer exist, you don't attempt to set it.
However, this still leaves writable properties whose value read is not legal to write as broken. Let's pray that fbcon or a system compositor will never succeed in enabling HDCP...
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Does that mean that I'll never need to implement link-status handling in Weston, because the kernel will recover the link anyway? If the kernel does that, then what's the point of having a link-status property to begin with?
Well generally all your compositor does all day long is flip buffers. So you'll never get the kernel into restoring the link. Hence the uevent, so that the compositor can a) update the mode list, which might have changed b) do the modeset to restore stuff. The auto-fallback is so that stuff like users manually disabling/re-enabling an output actually helps with fixing stuff.
- I guess we could do stuff that only fires off when allow_modeset is set, but I haven't found some examples. I thought we've had some outside of self-refresh helpers already. The closest I've found is again link-status, which never allows userspace to set BAD, and silently upgrades to GOOD. So that userspace doing a blind safe/restore can't wreak stuff permanently.
Sounds like link-status was designed with a blind save/restore in mind.
Yeah that part we didn't screw up.
It's all a bit nasty :-/
I think we should at least allow userspace to do a blind restore with allow_modeset and not expect bad side-effects. That would mean fixing at least the content protection stuff.
Plus documenting this in the kernel somewhere. As the official thing to do. But maybe we want some actual userspace doing this before we enshrine it as official policy. The content protection fix is a one-liner and can be cc'ed stable.
I'd probably not go there, a blind save does not guarantee a good state. The fix to "Content Protection" is not necessary (as long as userspace does not do a blind save/restore) if we can get the default state from the kernel. If we get the default state from the kernel, then userspace would be doing a blind restore but not save, meaning that the state actually is sane and writable.
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
The problem is, what is your default state? Driver defaults (generally fairly random and mostly everything is off)? After fbcon has done it's, which might never happen when you've disabling fbdev/fbcon? Boot-up state from the firmware for drivers like i915 that support fastboot (and what if that's garbage)? These can all be different too.
The hard part isn't the uapi, it's the implementation and defining what you mean. Generally everyone wants their own version of default value semantics, so this a) encodes policy, which we try to avoid in the kernel and b) will be epic amounts of endless bikeshedding and fighting between use-cases. -Daniel
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
...
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Hi,
oh, that's ok, the fail case is there like it should.
It sounded like even if userspace does not set ALLOW_MODESET, the kernel would still do a modeset in come cases. I'm happy to have misunderstood.
Does that mean that I'll never need to implement link-status handling in Weston, because the kernel will recover the link anyway? If the kernel does that, then what's the point of having a link-status property to begin with?
Well generally all your compositor does all day long is flip buffers. So you'll never get the kernel into restoring the link. Hence the uevent, so that the compositor can a) update the mode list, which might have changed b) do the modeset to restore stuff. The auto-fallback is so that stuff like users manually disabling/re-enabling an output actually helps with fixing stuff.
Ok, that's fine.
If link goes bad while Weston does not implement link-status, I'm totally happy to degree any fallout from that to be a Weston bug. It has never worked (right?), so it cannot be a kernel regression. It is quite important to me to be able to say to that a failure is a Weston bug, not a kernel regression, to not piss on the kernel devs.
- I guess we could do stuff that only fires off when allow_modeset is set, but I haven't found some examples. I thought we've had some outside of self-refresh helpers already. The closest I've found is again link-status, which never allows userspace to set BAD, and silently upgrades to GOOD. So that userspace doing a blind safe/restore can't wreak stuff permanently.
Sounds like link-status was designed with a blind save/restore in mind.
Yeah that part we didn't screw up.
It's all a bit nasty :-/
I think we should at least allow userspace to do a blind restore with allow_modeset and not expect bad side-effects. That would mean fixing at least the content protection stuff.
Plus documenting this in the kernel somewhere. As the official thing to do. But maybe we want some actual userspace doing this before we enshrine it as official policy. The content protection fix is a one-liner and can be cc'ed stable.
I'd probably not go there, a blind save does not guarantee a good state. The fix to "Content Protection" is not necessary (as long as userspace does not do a blind save/restore) if we can get the default state from the kernel. If we get the default state from the kernel, then userspace would be doing a blind restore but not save, meaning that the state actually is sane and writable.
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
The problem is, what is your default state? Driver defaults (generally fairly random and mostly everything is off)? After fbcon has done it's, which might never happen when you've disabling fbdev/fbcon? Boot-up state from the firmware for drivers like i915 that support fastboot (and what if that's garbage)? These can all be different too.
I believe what I want is more or less the driver defaults, or DRM core defaults for common KMS properties. It does not matter if they are arbitrary, as long as they remain the same across kernel versions. They need to be constants, so that I can rely on always getting to the same state on the same piece of hardware when I use the "default state" regardless of what might be happening software wise.
But of course the defaults must be legal values and they should make some sense. I'd consider "neutral", "identity", and "off" to be sensible values, but what that means for each property depends on the exact property. So there may be work to be done to unify the default values across drivers for driver-specific properties.
Having stuff off by default is ok. If that makes things not work, then we'll fix userspace to understand the new properties that are necessary. At no point it can be said to be a kernel regression - at least I very much hope and intend so.
There is always the problem of someone using new userspace first on old kernel that does not expose "default state" so userspace doesn't use "default state". Then upgrading to new kernel that has "default state" that is not good for making things work, but userspace starts using it because it becomes available and fails. Is there any way we can stop that from being seen as a kernel regression?
Fbcon has exactly the problem that fbcon might not exist, so we cannot rely on that. Is fbcon even reliable enough to offer constant defaults that could be relied upon to stay the same across kernel versions and software configurations?
State from firmware I would not trust at all, who knows when a firmware upgrade changes things.
What I'm saying is, the "default state" would very much be UABI. Not just how to access it, but the exact values of it for all existing properties. The important invariant is: for the same piece of hardware, I always get the same defaults (if any), regardless of any software or firmware versions.
This invariant should guarantee that e.g. measured monitor color profiles stay valid across kernel upgrades. I know, people who actually care about color do not trust even reboots, but if we can let them get away with just verifying the profile instead of wasting hours in re-measuring the profile, that would be good.
When a driver starts supporting a new property, it must set the default value of the property to the value it has been using implicitly before. But if that implicit value was not hardcoded and depends on variable factors, then we probably cannot avoid a change in behaviour when introducing the property or "default values".
The hard part isn't the uapi, it's the implementation and defining what you mean. Generally everyone wants their own version of default value semantics, so this a) encodes policy, which we try to avoid in the kernel and b) will be epic amounts of endless bikeshedding and fighting between use-cases.
It's a policy, yes. But we need a single source of arbitrary default values, so that we have a strategy for achieving the same hardware state even when userspace does not understand all KMS properties:
- after losing and re-gaining DRM master status (e.g. temporary VT-switch to another KMS program that modifies state I don't understand), and
- across changes in the software stack.
The former is the most important point, the latter would be good to have.
We already have a policy in the kernel: fbcon KMS state. If everyone can agree to that state, it's fine by me, but it also needs to be available when fbcon support is disabled. Maybe this could be a path forward? Expose what state fbcon would use, regardless of whether fbcon exists. It would probably miss the latter point, though, but that could be acceptable.
If we could trust that the KMS state is "sane" when any KMS program starts, then this problem doesn't exist: just read out the KMS state on start-up and use that. But since we don't know where the existing KMS state comes from, it could be anything. And for smooth hand-off between display servers, we can't have per DRM file description KMS state either.
Thanks, pq
On Fri, Apr 24, 2020 at 11:32:16AM +0300, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
...
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Hi,
oh, that's ok, the fail case is there like it should.
It sounded like even if userspace does not set ALLOW_MODESET, the kernel would still do a modeset in come cases. I'm happy to have misunderstood.
Well currently that can go wrong, if you include a connector and a link-status is bad. But could be fixed fairly easily.
Does that mean that I'll never need to implement link-status handling in Weston, because the kernel will recover the link anyway? If the kernel does that, then what's the point of having a link-status property to begin with?
Well generally all your compositor does all day long is flip buffers. So you'll never get the kernel into restoring the link. Hence the uevent, so that the compositor can a) update the mode list, which might have changed b) do the modeset to restore stuff. The auto-fallback is so that stuff like users manually disabling/re-enabling an output actually helps with fixing stuff.
Ok, that's fine.
If link goes bad while Weston does not implement link-status, I'm totally happy to degree any fallout from that to be a Weston bug. It has never worked (right?), so it cannot be a kernel regression. It is quite important to me to be able to say to that a failure is a Weston bug, not a kernel regression, to not piss on the kernel devs.
- I guess we could do stuff that only fires off when allow_modeset is set, but I haven't found some examples. I thought we've had some outside of self-refresh helpers already. The closest I've found is again link-status, which never allows userspace to set BAD, and silently upgrades to GOOD. So that userspace doing a blind safe/restore can't wreak stuff permanently.
Sounds like link-status was designed with a blind save/restore in mind.
Yeah that part we didn't screw up.
It's all a bit nasty :-/
I think we should at least allow userspace to do a blind restore with allow_modeset and not expect bad side-effects. That would mean fixing at least the content protection stuff.
Plus documenting this in the kernel somewhere. As the official thing to do. But maybe we want some actual userspace doing this before we enshrine it as official policy. The content protection fix is a one-liner and can be cc'ed stable.
I'd probably not go there, a blind save does not guarantee a good state. The fix to "Content Protection" is not necessary (as long as userspace does not do a blind save/restore) if we can get the default state from the kernel. If we get the default state from the kernel, then userspace would be doing a blind restore but not save, meaning that the state actually is sane and writable.
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
The problem is, what is your default state? Driver defaults (generally fairly random and mostly everything is off)? After fbcon has done it's, which might never happen when you've disabling fbdev/fbcon? Boot-up state from the firmware for drivers like i915 that support fastboot (and what if that's garbage)? These can all be different too.
I believe what I want is more or less the driver defaults, or DRM core defaults for common KMS properties. It does not matter if they are arbitrary, as long as they remain the same across kernel versions. They need to be constants, so that I can rely on always getting to the same state on the same piece of hardware when I use the "default state" regardless of what might be happening software wise.
But of course the defaults must be legal values and they should make some sense. I'd consider "neutral", "identity", and "off" to be sensible values, but what that means for each property depends on the exact property. So there may be work to be done to unify the default values across drivers for driver-specific properties.
Having stuff off by default is ok. If that makes things not work, then we'll fix userspace to understand the new properties that are necessary. At no point it can be said to be a kernel regression - at least I very much hope and intend so.
There is always the problem of someone using new userspace first on old kernel that does not expose "default state" so userspace doesn't use "default state". Then upgrading to new kernel that has "default state" that is not good for making things work, but userspace starts using it because it becomes available and fails. Is there any way we can stop that from being seen as a kernel regression?
If you upgrade the kernel and existing userspace breaks, it's a regression.
There's no wiggle room in this, the only wiggle room is that sometimes people don't report issues, and eventually all affected hw is dead.
Fbcon has exactly the problem that fbcon might not exist, so we cannot rely on that. Is fbcon even reliable enough to offer constant defaults that could be relied upon to stay the same across kernel versions and software configurations?
It's getting hacked on quite a bit, in a fairly ad-hoc fashion. So "defaults as fbcon sets them" is something that's under fair bit of change.
State from firmware I would not trust at all, who knows when a firmware upgrade changes things.
What I'm saying is, the "default state" would very much be UABI. Not just how to access it, but the exact values of it for all existing properties. The important invariant is: for the same piece of hardware, I always get the same defaults (if any), regardless of any software or firmware versions.
Yup, that's why default state exposed by the kernel is such a pain: It's uapi. If we change it, we can break something somewhere. And the problem is complex enough that there's going to be lots of opinions about what the default should be.
fbcon is a lot easier here, since there there's at least some agreement that a working console should be visible and preferrably right side up and all that.
This invariant should guarantee that e.g. measured monitor color profiles stay valid across kernel upgrades. I know, people who actually care about color do not trust even reboots, but if we can let them get away with just verifying the profile instead of wasting hours in re-measuring the profile, that would be good.
When a driver starts supporting a new property, it must set the default value of the property to the value it has been using implicitly before. But if that implicit value was not hardcoded and depends on variable factors, then we probably cannot avoid a change in behaviour when introducing the property or "default values".
The problem is in figuring out that implicit value. In some cases I think it's even "whatever the hw booted with", but I've tried to handle these kinds of issues with atomic at least.
The hard part isn't the uapi, it's the implementation and defining what you mean. Generally everyone wants their own version of default value semantics, so this a) encodes policy, which we try to avoid in the kernel and b) will be epic amounts of endless bikeshedding and fighting between use-cases.
It's a policy, yes. But we need a single source of arbitrary default values, so that we have a strategy for achieving the same hardware state even when userspace does not understand all KMS properties:
after losing and re-gaining DRM master status (e.g. temporary VT-switch to another KMS program that modifies state I don't understand), and
across changes in the software stack.
The former is the most important point, the latter would be good to have.
But you can fix the above use-case also with save&restore. Which we're trying to support (but even that has bugs).
We already have a policy in the kernel: fbcon KMS state. If everyone can agree to that state, it's fine by me, but it also needs to be available when fbcon support is disabled. Maybe this could be a path forward? Expose what state fbcon would use, regardless of whether fbcon exists. It would probably miss the latter point, though, but that could be acceptable.
If we could trust that the KMS state is "sane" when any KMS program starts, then this problem doesn't exist: just read out the KMS state on start-up and use that. But since we don't know where the existing KMS state comes from, it could be anything. And for smooth hand-off between display servers, we can't have per DRM file description KMS state either.
You can assume that boot-up state is about as "sane" as it gets. So a system compositor (in logind or whatever, it already deals with this stuff) can reset the screen.
For everything else I'm not sure kernel defaults actually fixes anything. It just gives everyont the illusion that now the problem is fixed, except the fix is now kernel uapi, which means we can never change it.
The upshot of doing the exact same as a logind request along the lines of "pls reset display to sane state, thx" is that logind can be changed much easier than kernel uapi in case we get this wrong. Which we will. -Daniel
On Tue, 28 Apr 2020 16:51:57 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 24, 2020 at 11:32:16AM +0300, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
...
Hi,
please skip to the TL;DR at the bottom, the rest is just how I ended up there.
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Hi,
oh, that's ok, the fail case is there like it should.
It sounded like even if userspace does not set ALLOW_MODESET, the kernel would still do a modeset in come cases. I'm happy to have misunderstood.
Well currently that can go wrong, if you include a connector and a link-status is bad. But could be fixed fairly easily.
What do you mean by "include a connector"? Which property on which object?
Weston always submits more or less full KMS state (known properties on intended-active outputs) on all updates, on simple pageflips too, so I am worried that the connector is "included", leading to automatic papering over link-status failures, and Weston doesn't handle link-status yet.
Weston does this, because it is the easy thing to do and debug. You don't have to track back thousands of frames to figure out what the mode or the connectors are, when something doesn't work like it should. Or to figure out why some state wasn't emitted when it was supposed to.
I'd probably not go there, a blind save does not guarantee a good state. The fix to "Content Protection" is not necessary (as long as userspace does not do a blind save/restore) if we can get the default state from the kernel. If we get the default state from the kernel, then userspace would be doing a blind restore but not save, meaning that the state actually is sane and writable.
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
The problem is, what is your default state? Driver defaults (generally fairly random and mostly everything is off)? After fbcon has done it's, which might never happen when you've disabling fbdev/fbcon? Boot-up state from the firmware for drivers like i915 that support fastboot (and what if that's garbage)? These can all be different too.
I believe what I want is more or less the driver defaults, or DRM core defaults for common KMS properties. It does not matter if they are arbitrary, as long as they remain the same across kernel versions. They need to be constants, so that I can rely on always getting to the same state on the same piece of hardware when I use the "default state" regardless of what might be happening software wise.
But of course the defaults must be legal values and they should make some sense. I'd consider "neutral", "identity", and "off" to be sensible values, but what that means for each property depends on the exact property. So there may be work to be done to unify the default values across drivers for driver-specific properties.
Having stuff off by default is ok. If that makes things not work, then we'll fix userspace to understand the new properties that are necessary. At no point it can be said to be a kernel regression - at least I very much hope and intend so.
There is always the problem of someone using new userspace first on old kernel that does not expose "default state" so userspace doesn't use "default state". Then upgrading to new kernel that has "default state" that is not good for making things work, but userspace starts using it because it becomes available and fails. Is there any way we can stop that from being seen as a kernel regression?
If you upgrade the kernel and existing userspace breaks, it's a regression.
There's no wiggle room in this, the only wiggle room is that sometimes people don't report issues, and eventually all affected hw is dead.
Then the only possibility is that the transition from not using default state to using default state does not change anything. Which means that the default state must be the state the KMS program would normally inherit, so, fbcon state?
Fbcon has exactly the problem that fbcon might not exist, so we cannot rely on that. Is fbcon even reliable enough to offer constant defaults that could be relied upon to stay the same across kernel versions and software configurations?
It's getting hacked on quite a bit, in a fairly ad-hoc fashion. So "defaults as fbcon sets them" is something that's under fair bit of change.
Yet, it seems to be the only "sane state" defined so far. Maybe we have to live with it changing from boot to boot.
We also cannot ever have anything better, because of the above issue "new userspace, old kernel, then upgrade kernel", can we? It doesn't "break" userspace but it could be a regression for the end user, because upgrading the kernel changes something that is visible and no longer the same as before, e.g. breaking monitor calibration or profile because suddenly some color related property is set differently.
When new KMS properties are introduced, how do people pick their default values in the kernel when no userspace sets them yet? Are those defaults immutable for eternity afterwards?
State from firmware I would not trust at all, who knows when a firmware upgrade changes things.
What I'm saying is, the "default state" would very much be UABI. Not just how to access it, but the exact values of it for all existing properties. The important invariant is: for the same piece of hardware, I always get the same defaults (if any), regardless of any software or firmware versions.
Yup, that's why default state exposed by the kernel is such a pain: It's uapi. If we change it, we can break something somewhere. And the problem is complex enough that there's going to be lots of opinions about what the default should be.
I think the current situation is just *hoping* that kernel doesn't change its defaults when they are not even defined in any one place. The defaults userspace currently gets in most cases is whatever it inherits from fbcon. And you said fbcon changes ad-hoc.
fbcon is a lot easier here, since there there's at least some agreement that a working console should be visible and preferrably right side up and all that.
Yeah, but what about gamma, color and other details that the console usually doesn't care about as long as e.g. red is somewhat reddish?
I remember someone telling me, that fbcon does not reset gamma tables, which means any gamma using KMS app can mess up the console and all other KMS apps that don't use gamma.
This invariant should guarantee that e.g. measured monitor color profiles stay valid across kernel upgrades. I know, people who actually care about color do not trust even reboots, but if we can let them get away with just verifying the profile instead of wasting hours in re-measuring the profile, that would be good.
When a driver starts supporting a new property, it must set the default value of the property to the value it has been using implicitly before. But if that implicit value was not hardcoded and depends on variable factors, then we probably cannot avoid a change in behaviour when introducing the property or "default values".
The problem is in figuring out that implicit value. In some cases I think it's even "whatever the hw booted with", but I've tried to handle these kinds of issues with atomic at least.
The hard part isn't the uapi, it's the implementation and defining what you mean. Generally everyone wants their own version of default value semantics, so this a) encodes policy, which we try to avoid in the kernel and b) will be epic amounts of endless bikeshedding and fighting between use-cases.
It's a policy, yes. But we need a single source of arbitrary default values, so that we have a strategy for achieving the same hardware state even when userspace does not understand all KMS properties:
after losing and re-gaining DRM master status (e.g. temporary VT-switch to another KMS program that modifies state I don't understand), and
across changes in the software stack.
The former is the most important point, the latter would be good to have.
But you can fix the above use-case also with save&restore. Which we're trying to support (but even that has bugs).
Ok, so we're back to blind save & restore with undefined default state, that depends not only on the hardware, the firmware, the kernel and drivers, but also what was the previous KMS client doing before you took over.
The only way to avoid undefined state is for all KMS programs to learn to know all existing properties, and maybe warn the user if any unrecognised properties appear. If KMS programs refuse to work when unrecognised properties appear, that means you will never be able to add anything to the properties again. Then you probably add caps for new properties and we are back to square one except the UAPI is a lot more complicated and userspace can longer warn the end user about potentially bad KMS state.
Since the kernel refuses to provide a constant default state, maybe we need a userspace library for that - one that *must* be updated for every new property added to the kernel. Then we could circumvent the kernel stable UAPI rules to provide predictable behaviour for userspace that wants it.
Except, that too, would hit the "new userspace, old kernel, then upgrade kernel" scenario where a kernel upgrade breaks userspace because it allows userspace to start using new properties with legal and working but different values from what it had before.
Are we really in a dead end with this?
We already have a policy in the kernel: fbcon KMS state. If everyone can agree to that state, it's fine by me, but it also needs to be available when fbcon support is disabled. Maybe this could be a path forward? Expose what state fbcon would use, regardless of whether fbcon exists. It would probably miss the latter point, though, but that could be acceptable.
If we could trust that the KMS state is "sane" when any KMS program starts, then this problem doesn't exist: just read out the KMS state on start-up and use that. But since we don't know where the existing KMS state comes from, it could be anything. And for smooth hand-off between display servers, we can't have per DRM file description KMS state either.
You can assume that boot-up state is about as "sane" as it gets. So a system compositor (in logind or whatever, it already deals with this stuff) can reset the screen.
System compositors largely do not exist, the whole concept has been dismissed. If logind would reset the screen, we would no longer have seamless hand-off between any display servers and anything. Logind could hand off the state for others to use as default state though.
How would one get the boot-up state?
Why do we need a userspace daemon to record KMS boot-up state instead of the kernel giving it to userspace? How would that daemon itself get the boot-up state if it has already been modified by fbcon?
For everything else I'm not sure kernel defaults actually fixes anything. It just gives everyont the illusion that now the problem is fixed, except the fix is now kernel uapi, which means we can never change it.
If we want constant defaults, then by definition they cannot ever be changed. It doesn't matter where it comes from.
I suppose the conclusion is that we cannot have constant defaults at all. The only defaults we can have are undefined. If someone does not like it, he needs to implement unrecognised properties in whatever userspace he cares about.
The upshot of doing the exact same as a logind request along the lines of "pls reset display to sane state, thx" is that logind can be changed much easier than kernel uapi in case we get this wrong. Which we will.
Sorry, I don't see the difference if the goal is to have fixed defaults. The only difference with logind is that there is no angry Linus.
TL;DR:
If blind save & restore (but restricted only to cases where KMS state may have been lost!) works, it will ensure that a running userspace program has what it was started with, but it does not guarantee that the state is the same between restarts of the program.
It seems that the default state is always undefined, and cannot be defined because whatever one defines might not always result in a picture on screen. Bugs in defined default state cannot be fixed, by definition as the default state is immutable once defined.
Ensuring consistent state between restarts is the responsibility of userspace and each KMS program individually. They might attempt to blindly save KMS state in a file to be re-used on the next start.
Thanks, pq
On Wed, Apr 29, 2020 at 01:07:54PM +0300, Pekka Paalanen wrote:
On Tue, 28 Apr 2020 16:51:57 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 24, 2020 at 11:32:16AM +0300, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
...
Hi,
please skip to the TL;DR at the bottom, the rest is just how I ended up there.
Note that the kernel isn't entire consistent on this. I've looked a bit more closely at stuff. Ignoring content protection I've found following approaches things:
self refresh helpers, which are entirely transparent. Therefore we do a hack to set allow_modeset when the self-refresh helpers need to do a modeset, to avoid total surprise for userspace. I think this is only ok for these kind of behind-the-scenes helpers like self-refresh.
link-status is always reset to "good" when you include any connector, which might force a modeset. Even when allow_modeset isn't set by userspace. Maybe we should fix that, but we've discussed forever how to make sure a bad link isn't ever stuck at "bad" for old userspace, so we've gone with this. But maybe limiting to only allow_modeset cases would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Hi,
oh, that's ok, the fail case is there like it should.
It sounded like even if userspace does not set ALLOW_MODESET, the kernel would still do a modeset in come cases. I'm happy to have misunderstood.
Well currently that can go wrong, if you include a connector and a link-status is bad. But could be fixed fairly easily.
What do you mean by "include a connector"? Which property on which object?
Weston always submits more or less full KMS state (known properties on intended-active outputs) on all updates, on simple pageflips too, so I am worried that the connector is "included", leading to automatic papering over link-status failures, and Weston doesn't handle link-status yet.
Include a connector = you have a property for a connector included in your atomic update. Sounds like you do that, so if you have a real world link-status failure, then things go a bit boom already.
I guess we need a kernel patch to make sure link-status only gets fixed when userspace is ok with a modeset.
Weston does this, because it is the easy thing to do and debug. You don't have to track back thousands of frames to figure out what the mode or the connectors are, when something doesn't work like it should. Or to figure out why some state wasn't emitted when it was supposed to.
I'd probably not go there, a blind save does not guarantee a good state. The fix to "Content Protection" is not necessary (as long as userspace does not do a blind save/restore) if we can get the default state from the kernel. If we get the default state from the kernel, then userspace would be doing a blind restore but not save, meaning that the state actually is sane and writable.
I'd love to volunteer for writing the Weston code to make use of "get me sane default state" UAPI, but I'm afraid I'm not in that much control of my time.
The problem is, what is your default state? Driver defaults (generally fairly random and mostly everything is off)? After fbcon has done it's, which might never happen when you've disabling fbdev/fbcon? Boot-up state from the firmware for drivers like i915 that support fastboot (and what if that's garbage)? These can all be different too.
I believe what I want is more or less the driver defaults, or DRM core defaults for common KMS properties. It does not matter if they are arbitrary, as long as they remain the same across kernel versions. They need to be constants, so that I can rely on always getting to the same state on the same piece of hardware when I use the "default state" regardless of what might be happening software wise.
But of course the defaults must be legal values and they should make some sense. I'd consider "neutral", "identity", and "off" to be sensible values, but what that means for each property depends on the exact property. So there may be work to be done to unify the default values across drivers for driver-specific properties.
Having stuff off by default is ok. If that makes things not work, then we'll fix userspace to understand the new properties that are necessary. At no point it can be said to be a kernel regression - at least I very much hope and intend so.
There is always the problem of someone using new userspace first on old kernel that does not expose "default state" so userspace doesn't use "default state". Then upgrading to new kernel that has "default state" that is not good for making things work, but userspace starts using it because it becomes available and fails. Is there any way we can stop that from being seen as a kernel regression?
If you upgrade the kernel and existing userspace breaks, it's a regression.
There's no wiggle room in this, the only wiggle room is that sometimes people don't report issues, and eventually all affected hw is dead.
Then the only possibility is that the transition from not using default state to using default state does not change anything. Which means that the default state must be the state the KMS program would normally inherit, so, fbcon state?
Fbcon has exactly the problem that fbcon might not exist, so we cannot rely on that. Is fbcon even reliable enough to offer constant defaults that could be relied upon to stay the same across kernel versions and software configurations?
It's getting hacked on quite a bit, in a fairly ad-hoc fashion. So "defaults as fbcon sets them" is something that's under fair bit of change.
Yet, it seems to be the only "sane state" defined so far. Maybe we have to live with it changing from boot to boot.
We also cannot ever have anything better, because of the above issue "new userspace, old kernel, then upgrade kernel", can we? It doesn't "break" userspace but it could be a regression for the end user, because upgrading the kernel changes something that is visible and no longer the same as before, e.g. breaking monitor calibration or profile because suddenly some color related property is set differently.
When new KMS properties are introduced, how do people pick their default values in the kernel when no userspace sets them yet? Are those defaults immutable for eternity afterwards?
State from firmware I would not trust at all, who knows when a firmware upgrade changes things.
What I'm saying is, the "default state" would very much be UABI. Not just how to access it, but the exact values of it for all existing properties. The important invariant is: for the same piece of hardware, I always get the same defaults (if any), regardless of any software or firmware versions.
Yup, that's why default state exposed by the kernel is such a pain: It's uapi. If we change it, we can break something somewhere. And the problem is complex enough that there's going to be lots of opinions about what the default should be.
I think the current situation is just *hoping* that kernel doesn't change its defaults when they are not even defined in any one place. The defaults userspace currently gets in most cases is whatever it inherits from fbcon. And you said fbcon changes ad-hoc.
fbcon is a lot easier here, since there there's at least some agreement that a working console should be visible and preferrably right side up and all that.
Yeah, but what about gamma, color and other details that the console usually doesn't care about as long as e.g. red is somewhat reddish?
I remember someone telling me, that fbcon does not reset gamma tables, which means any gamma using KMS app can mess up the console and all other KMS apps that don't use gamma.
This invariant should guarantee that e.g. measured monitor color profiles stay valid across kernel upgrades. I know, people who actually care about color do not trust even reboots, but if we can let them get away with just verifying the profile instead of wasting hours in re-measuring the profile, that would be good.
When a driver starts supporting a new property, it must set the default value of the property to the value it has been using implicitly before. But if that implicit value was not hardcoded and depends on variable factors, then we probably cannot avoid a change in behaviour when introducing the property or "default values".
The problem is in figuring out that implicit value. In some cases I think it's even "whatever the hw booted with", but I've tried to handle these kinds of issues with atomic at least.
The hard part isn't the uapi, it's the implementation and defining what you mean. Generally everyone wants their own version of default value semantics, so this a) encodes policy, which we try to avoid in the kernel and b) will be epic amounts of endless bikeshedding and fighting between use-cases.
It's a policy, yes. But we need a single source of arbitrary default values, so that we have a strategy for achieving the same hardware state even when userspace does not understand all KMS properties:
after losing and re-gaining DRM master status (e.g. temporary VT-switch to another KMS program that modifies state I don't understand), and
across changes in the software stack.
The former is the most important point, the latter would be good to have.
But you can fix the above use-case also with save&restore. Which we're trying to support (but even that has bugs).
Ok, so we're back to blind save & restore with undefined default state, that depends not only on the hardware, the firmware, the kernel and drivers, but also what was the previous KMS client doing before you took over.
The only way to avoid undefined state is for all KMS programs to learn to know all existing properties, and maybe warn the user if any unrecognised properties appear. If KMS programs refuse to work when unrecognised properties appear, that means you will never be able to add anything to the properties again. Then you probably add caps for new properties and we are back to square one except the UAPI is a lot more complicated and userspace can longer warn the end user about potentially bad KMS state.
Since the kernel refuses to provide a constant default state, maybe we need a userspace library for that - one that *must* be updated for every new property added to the kernel. Then we could circumvent the kernel stable UAPI rules to provide predictable behaviour for userspace that wants it.
Except, that too, would hit the "new userspace, old kernel, then upgrade kernel" scenario where a kernel upgrade breaks userspace because it allows userspace to start using new properties with legal and working but different values from what it had before.
Are we really in a dead end with this?
One idea I proposed on irc is that logind would capture the boot-up state, and you'd do a loginctl reset-screen cmd to reset it if something broke somewhere. logind already has the code for forced vt switching, that feels like a reasonable extension of that.
Then you could pick between "smooth transitions" (probably best for users only using 1 compositor) and "whack it back to boot-up state when switching to a new compositor" (maybe best default if you run multiple different compositors, logind could even do that automatically).
We already have a policy in the kernel: fbcon KMS state. If everyone can agree to that state, it's fine by me, but it also needs to be available when fbcon support is disabled. Maybe this could be a path forward? Expose what state fbcon would use, regardless of whether fbcon exists. It would probably miss the latter point, though, but that could be acceptable.
If we could trust that the KMS state is "sane" when any KMS program starts, then this problem doesn't exist: just read out the KMS state on start-up and use that. But since we don't know where the existing KMS state comes from, it could be anything. And for smooth hand-off between display servers, we can't have per DRM file description KMS state either.
You can assume that boot-up state is about as "sane" as it gets. So a system compositor (in logind or whatever, it already deals with this stuff) can reset the screen.
System compositors largely do not exist, the whole concept has been dismissed. If logind would reset the screen, we would no longer have seamless hand-off between any display servers and anything. Logind could hand off the state for others to use as default state though.
How would one get the boot-up state?
Why do we need a userspace daemon to record KMS boot-up state instead of the kernel giving it to userspace? How would that daemon itself get the boot-up state if it has already been modified by fbcon?
For everything else I'm not sure kernel defaults actually fixes anything. It just gives everyont the illusion that now the problem is fixed, except the fix is now kernel uapi, which means we can never change it.
If we want constant defaults, then by definition they cannot ever be changed. It doesn't matter where it comes from.
I suppose the conclusion is that we cannot have constant defaults at all. The only defaults we can have are undefined. If someone does not like it, he needs to implement unrecognised properties in whatever userspace he cares about.
Or config file of some sort to overwrite it. Like the kernel provides for configuring fbcon already (as cmdline options). We add a new one every time someone screams loud enough.
The upshot of doing the exact same as a logind request along the lines of "pls reset display to sane state, thx" is that logind can be changed much easier than kernel uapi in case we get this wrong. Which we will.
Sorry, I don't see the difference if the goal is to have fixed defaults. The only difference with logind is that there is no angry Linus.
Oh it's the exact same problem, but more focused. E.g. we don't have to worry about embedded stuff, because that's going to run one compositor only. So entire class of regression reports goes away.
If we put the defaults into the kernel, then more people will use it (it's there), so the wiggle room is substantially reduced once we do have conflict demands.
logind with multiple different compositors is going to be used on desktops/latops only, so some chances that the definition of what default should be all match up.
Also no regression pretty much applies across the board, e.g. we don't have modifiers enabled in many compositors by default yet because it breaks stuff.
TL;DR:
If blind save & restore (but restricted only to cases where KMS state may have been lost!) works, it will ensure that a running userspace program has what it was started with, but it does not guarantee that the state is the same between restarts of the program.
It seems that the default state is always undefined, and cannot be defined because whatever one defines might not always result in a picture on screen. Bugs in defined default state cannot be fixed, by definition as the default state is immutable once defined.
Ensuring consistent state between restarts is the responsibility of userspace and each KMS program individually. They might attempt to blindly save KMS state in a file to be re-used on the next start.
I disagree with "each KMS program individually". We already have defined protocols for drm master switching: Either the cooperative vt-switch dance, or the unpriviledge switch logind allows. We also have ad-hoc definitions of smooth transitions, it's "one primary plane, untiled". Some people broke that, and now we have getfb2 so you can smoothly transition even with tiled buffers. Except not everyone supports that.
So there's already a lot of stuff userspace needs to agree on, or your compositor switching is more or less broken. "reset to defaults if a compositor left a mess behind" is just one more aspect. Expecting the kernel to provide a magic solution in form of defaults isn't going to work well, same way it doesn't really work for drm master switching, or smooth transitions. Or anything else that comes up in the future between compositors.
I think on irc I mentioned that step 1 to fix this is come up with a userspace protocol between compositors (with or without logind, but really I'm not aware of anything else than logind). Once we have that, we can figure out where to store the defaults. Maybe the right answer then is "store them in the kernel, hardcoded as uapi", maybe not. But trying to find the answer to that before we have a clear protocol and responsibilities is just not going to work. -Daniel
On Thu, 30 Apr 2020 15:53:23 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Apr 29, 2020 at 01:07:54PM +0300, Pekka Paalanen wrote:
On Tue, 28 Apr 2020 16:51:57 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 24, 2020 at 11:32:16AM +0300, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Tue, 21 Apr 2020 14:15:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
...
Hi,
please skip to the TL;DR at the bottom, the rest is just how I ended up there.
> Note that the kernel isn't entire consistent on this. I've looked a bit > more closely at stuff. Ignoring content protection I've found following > approaches things: > > - self refresh helpers, which are entirely transparent. Therefore we do a > hack to set allow_modeset when the self-refresh helpers need to do a > modeset, to avoid total surprise for userspace. I think this is only ok > for these kind of behind-the-scenes helpers like self-refresh. > > - link-status is always reset to "good" when you include any connector, > which might force a modeset. Even when allow_modeset isn't set by > userspace. Maybe we should fix that, but we've discussed forever how to > make sure a bad link isn't ever stuck at "bad" for old userspace, so > we've gone with this. But maybe limiting to only allow_modeset cases > would also work.
Wait, what do you mean "include any connector"?
What exactly could cause a modeset instead of failure when ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Hi,
oh, that's ok, the fail case is there like it should.
It sounded like even if userspace does not set ALLOW_MODESET, the kernel would still do a modeset in come cases. I'm happy to have misunderstood.
Well currently that can go wrong, if you include a connector and a link-status is bad. But could be fixed fairly easily.
What do you mean by "include a connector"? Which property on which object?
Weston always submits more or less full KMS state (known properties on intended-active outputs) on all updates, on simple pageflips too, so I am worried that the connector is "included", leading to automatic papering over link-status failures, and Weston doesn't handle link-status yet.
Include a connector = you have a property for a connector included in your atomic update. Sounds like you do that, so if you have a real world link-status failure, then things go a bit boom already.
Are you talking about setting a connector's property, like "CRTC_ID", "Content Protection", "HDCP Content Type"? Weston sets all of those on every update if they exist. Or is it about any property on any connector, not just a specific property on the specific connector, or any property on the specific connector?
Also CRTC properties "MODE_ID" and "ACTIVE" are set on every update, modeset or not. Weston just trusts that no-op changes in state do not require a modeset.
Is it not the opposite? If a link fails, then Weston will produce a glitch when the kernel automatically re-trains the link, and then everything continues happily?
I guess we need a kernel patch to make sure link-status only gets fixed when userspace is ok with a modeset.
That would be nice, but would it not also mean the above Weston case ends up with a permanent black screen instead of a temporary glitch?
Or is there a hotplug uevent at play here too?
Weston does this, because it is the easy thing to do and debug. You don't have to track back thousands of frames to figure out what the mode or the connectors are, when something doesn't work like it should. Or to figure out why some state wasn't emitted when it was supposed to.
...
One idea I proposed on irc is that logind would capture the boot-up state, and you'd do a loginctl reset-screen cmd to reset it if something broke somewhere. logind already has the code for forced vt switching, that feels like a reasonable extension of that.
Needing to run a command manually to restore the screen instead of "simply" switching to the graphical login manager to re-gain user control seems a bit difficult.
Then you could pick between "smooth transitions" (probably best for users only using 1 compositor) and "whack it back to boot-up state when switching to a new compositor" (maybe best default if you run multiple different compositors, logind could even do that automatically).
I think even better would be for logind to serialize and hand out the default state it saved, instead of applying it. Then the compositor who takes over can integrate the default state into its own state and inherited state to apply it all in one commit.
Doing so, the compositor could even do a TEST_ONLY commit without ALLOW_MODESET to see if a smooth transition is possible. If not, it can skip some smooth transition animation for instance.
The upshot of doing the exact same as a logind request along the lines of "pls reset display to sane state, thx" is that logind can be changed much easier than kernel uapi in case we get this wrong. Which we will.
Sorry, I don't see the difference if the goal is to have fixed defaults. The only difference with logind is that there is no angry Linus.
Oh it's the exact same problem, but more focused. E.g. we don't have to worry about embedded stuff, because that's going to run one compositor only. So entire class of regression reports goes away.
If we put the defaults into the kernel, then more people will use it (it's there), so the wiggle room is substantially reduced once we do have conflict demands.
logind with multiple different compositors is going to be used on desktops/latops only, so some chances that the definition of what default should be all match up.
Also no regression pretty much applies across the board, e.g. we don't have modifiers enabled in many compositors by default yet because it breaks stuff.
I'm not sure I understand, but ok, I'll stop pushing for it.
TL;DR:
If blind save & restore (but restricted only to cases where KMS state may have been lost!) works, it will ensure that a running userspace program has what it was started with, but it does not guarantee that the state is the same between restarts of the program.
It seems that the default state is always undefined, and cannot be defined because whatever one defines might not always result in a picture on screen. Bugs in defined default state cannot be fixed, by definition as the default state is immutable once defined.
Ensuring consistent state between restarts is the responsibility of userspace and each KMS program individually. They might attempt to blindly save KMS state in a file to be re-used on the next start.
I disagree with "each KMS program individually". We already have defined protocols for drm master switching: Either the cooperative vt-switch dance, or the unpriviledge switch logind allows. We also have ad-hoc definitions of smooth transitions, it's "one primary plane, untiled". Some people broke that, and now we have getfb2 so you can smoothly transition even with tiled buffers. Except not everyone supports that.
Switching is not the point with saving unrecognized KMS state into a compositor-specific file. The point of the file is to re-create the exact same KMS state after the compositor has been restarted, regardless of what KMS program might have ran first.
Use case: when you have completed monitor color profiling, the profile may become invalid as soon as anything affecting colors changes, e.g. link bit depth. With all the new properties for color spaces, HDR etc. the likelihood of an unrecognized property affecting colors is quite high. Either the property must be reset to the value it had during profiling, or the profile needs to be rejected. This is more about rebooting the machine than it is about switching between display servers.
If the compositor has such a file saved, it might as well use it also for sanitation when switching from another display server.
This still does not handle the case where a kernel upgrade introduces a new property, the graphical login manager sets the property altering colors, and the session compositor you profiled does not recognize the new property and also does not have it saved in the file because it didn't exist last time. But maybe that's a corner-case enough, and people who care will notice the profile is invalid when physically measuring its validity before starting to work. But this could be seen as a kernel regression too: you only had to upgrade the kernel. Logind saving state before fbcon takes over could remedy this, if the session compositor integrated the logind saved state on every start.
But again we would rely on kernel not changing the hardware state between kernel versions before fbcon takes over.
Thanks, pq
On Mon, May 4, 2020 at 11:49 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Thu, 30 Apr 2020 15:53:23 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Apr 29, 2020 at 01:07:54PM +0300, Pekka Paalanen wrote:
On Tue, 28 Apr 2020 16:51:57 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 24, 2020 at 11:32:16AM +0300, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote: > > On Tue, 21 Apr 2020 14:15:52 +0200 > Daniel Vetter daniel@ffwll.ch wrote: >
...
Hi,
please skip to the TL;DR at the bottom, the rest is just how I ended up there.
> > Note that the kernel isn't entire consistent on this. I've looked a bit > > more closely at stuff. Ignoring content protection I've found following > > approaches things: > > > > - self refresh helpers, which are entirely transparent. Therefore we do a > > hack to set allow_modeset when the self-refresh helpers need to do a > > modeset, to avoid total surprise for userspace. I think this is only ok > > for these kind of behind-the-scenes helpers like self-refresh. > > > > - link-status is always reset to "good" when you include any connector, > > which might force a modeset. Even when allow_modeset isn't set by > > userspace. Maybe we should fix that, but we've discussed forever how to > > make sure a bad link isn't ever stuck at "bad" for old userspace, so > > we've gone with this. But maybe limiting to only allow_modeset cases > > would also work. > > Wait, what do you mean "include any connector"? > > What exactly could cause a modeset instead of failure when > ALLOW_MODESET is not set?
If you do an atomic commit with the connector included that has the bad link status, then it'll reset it to Good to try to get a full modeset to restore stuff. If you don't also have ALLOW_MODESET set, it'll fail and userspace might be sad and not understand what's going on. We can easily fix this by only forcing the link training to be fixed if userspace has set ALLOW_MODESET.
Hi,
oh, that's ok, the fail case is there like it should.
It sounded like even if userspace does not set ALLOW_MODESET, the kernel would still do a modeset in come cases. I'm happy to have misunderstood.
Well currently that can go wrong, if you include a connector and a link-status is bad. But could be fixed fairly easily.
What do you mean by "include a connector"? Which property on which object?
Weston always submits more or less full KMS state (known properties on intended-active outputs) on all updates, on simple pageflips too, so I am worried that the connector is "included", leading to automatic papering over link-status failures, and Weston doesn't handle link-status yet.
Include a connector = you have a property for a connector included in your atomic update. Sounds like you do that, so if you have a real world link-status failure, then things go a bit boom already.
Are you talking about setting a connector's property, like "CRTC_ID", "Content Protection", "HDCP Content Type"? Weston sets all of those on every update if they exist. Or is it about any property on any connector, not just a specific property on the specific connector, or any property on the specific connector?
Any property on the specific connector where link-status has gone "bad".
Also CRTC properties "MODE_ID" and "ACTIVE" are set on every update, modeset or not. Weston just trusts that no-op changes in state do not require a modeset
Yup that's generally valid assumption, but also that's enough to hit the link-status == "bad" forced modeset I think.
Is it not the opposite? If a link fails, then Weston will produce a glitch when the kernel automatically re-trains the link, and then everything continues happily?
That's the soft link failure, where the kernel can recover on its own. There's the harder one where the new link status is degraded, and the old mode doesn't fit. Or the kernel just threw a fit and can't make stuff work anymore, and waits for the next userspace commit. Iirc for MST links we can't easily do this because locking, so gets pushed to a full modeset commit. Or something like that.
I guess we need a kernel patch to make sure link-status only gets fixed when userspace is ok with a modeset.
That would be nice, but would it not also mean the above Weston case ends up with a permanent black screen instead of a temporary glitch?
Or is there a hotplug uevent at play here too?
Yeah you get a hotplug uevent, so as long as you eventually process that and do a full modeset should be ok.
Weston does this, because it is the easy thing to do and debug. You don't have to track back thousands of frames to figure out what the mode or the connectors are, when something doesn't work like it should. Or to figure out why some state wasn't emitted when it was supposed to.
...
One idea I proposed on irc is that logind would capture the boot-up state, and you'd do a loginctl reset-screen cmd to reset it if something broke somewhere. logind already has the code for forced vt switching, that feels like a reasonable extension of that.
Needing to run a command manually to restore the screen instead of "simply" switching to the graphical login manager to re-gain user control seems a bit difficult.
But that requires mind reading skills. Computer can't tell the "pls don't wreak stuff, I want smooth switching" from "pls wreak stuff, because it's glitching" wish the user has when vt-switching.
Then you could pick between "smooth transitions" (probably best for users only using 1 compositor) and "whack it back to boot-up state when switching to a new compositor" (maybe best default if you run multiple different compositors, logind could even do that automatically).
I think even better would be for logind to serialize and hand out the default state it saved, instead of applying it. Then the compositor who takes over can integrate the default state into its own state and inherited state to apply it all in one commit.
Yeah that'd be a flavour of this userspace protocol that we need first - I have no idea about compositor interactions in userspace and what protocols they'd want, my input isn't going to be much use here I think.
Doing so, the compositor could even do a TEST_ONLY commit without ALLOW_MODESET to see if a smooth transition is possible. If not, it can skip some smooth transition animation for instance.
Generally that's not enough, most people want some blending. And for blending, you need to be able to read&understand what's already there. Hence getfb2 so you can get at modifiers.
The upshot of doing the exact same as a logind request along the lines of "pls reset display to sane state, thx" is that logind can be changed much easier than kernel uapi in case we get this wrong. Which we will.
Sorry, I don't see the difference if the goal is to have fixed defaults. The only difference with logind is that there is no angry Linus.
Oh it's the exact same problem, but more focused. E.g. we don't have to worry about embedded stuff, because that's going to run one compositor only. So entire class of regression reports goes away.
If we put the defaults into the kernel, then more people will use it (it's there), so the wiggle room is substantially reduced once we do have conflict demands.
logind with multiple different compositors is going to be used on desktops/latops only, so some chances that the definition of what default should be all match up.
Also no regression pretty much applies across the board, e.g. we don't have modifiers enabled in many compositors by default yet because it breaks stuff.
I'm not sure I understand, but ok, I'll stop pushing for it.
Maybe to clarify: I'm not against adding something to solve this to the kernel, period. If logind (or whatever) implements a proper compositor switching protocol for kms state (instead of the current "whatever, maybe glitches" approach), and it turns out the kernels needs something to make that work, we can add it. But adding default state to the kernel just so compositors can continue to not coordinate how vt switching and all that is supposed to work without glitches and leftover planes/cursors or whatever, then no, I don't like that idea. Because I don't think it'll really work. If that means vt switching is only glitch-free and can be reset to a working state if a compositor crashed if you have logind working, *shrug*.
TL;DR:
If blind save & restore (but restricted only to cases where KMS state may have been lost!) works, it will ensure that a running userspace program has what it was started with, but it does not guarantee that the state is the same between restarts of the program.
It seems that the default state is always undefined, and cannot be defined because whatever one defines might not always result in a picture on screen. Bugs in defined default state cannot be fixed, by definition as the default state is immutable once defined.
Ensuring consistent state between restarts is the responsibility of userspace and each KMS program individually. They might attempt to blindly save KMS state in a file to be re-used on the next start.
I disagree with "each KMS program individually". We already have defined protocols for drm master switching: Either the cooperative vt-switch dance, or the unpriviledge switch logind allows. We also have ad-hoc definitions of smooth transitions, it's "one primary plane, untiled". Some people broke that, and now we have getfb2 so you can smoothly transition even with tiled buffers. Except not everyone supports that.
Switching is not the point with saving unrecognized KMS state into a compositor-specific file. The point of the file is to re-create the exact same KMS state after the compositor has been restarted, regardless of what KMS program might have ran first.
This example isn't about when you can capture the "default state", saving on every vt switch doesn't work. This was another example to illustrate that smooth, glitch-free vt switching without coordination between compositors (and enforced by logind) doesn't work. Some compositors leave a tiled buffer behind, if the new compositor doesn't implement getfb2 then you get ugly artifacts when trying to blend to the new screen.
Use case: when you have completed monitor color profiling, the profile may become invalid as soon as anything affecting colors changes, e.g. link bit depth. With all the new properties for color spaces, HDR etc. the likelihood of an unrecognized property affecting colors is quite high. Either the property must be reset to the value it had during profiling, or the profile needs to be rejected. This is more about rebooting the machine than it is about switching between display servers.
If the compositor has such a file saved, it might as well use it also for sanitation when switching from another display server.
This still does not handle the case where a kernel upgrade introduces a new property, the graphical login manager sets the property altering colors, and the session compositor you profiled does not recognize the new property and also does not have it saved in the file because it didn't exist last time. But maybe that's a corner-case enough, and people who care will notice the profile is invalid when physically measuring its validity before starting to work. But this could be seen as a kernel regression too: you only had to upgrade the kernel. Logind saving state before fbcon takes over could remedy this, if the session compositor integrated the logind saved state on every start.
But again we would rely on kernel not changing the hardware state between kernel versions before fbcon takes over.
I guess my point is, this is a lot bigger problem than just the default state. This = vt switching that doesn't look horrible and doesn't result in artifacts and issues on the new compositor. -Daniel
On Mon, 4 May 2020 13:00:02 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Mon, May 4, 2020 at 11:49 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Thu, 30 Apr 2020 15:53:23 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Apr 29, 2020 at 01:07:54PM +0300, Pekka Paalanen wrote:
On Tue, 28 Apr 2020 16:51:57 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Apr 24, 2020 at 11:32:16AM +0300, Pekka Paalanen wrote:
On Thu, 23 Apr 2020 17:01:49 +0200 Daniel Vetter daniel@ffwll.ch wrote:
> On Tue, Apr 21, 2020 at 4:33 PM Pekka Paalanen ppaalanen@gmail.com wrote: > > > > On Tue, 21 Apr 2020 14:15:52 +0200 > > Daniel Vetter daniel@ffwll.ch wrote: > >
Include a connector = you have a property for a connector included in your atomic update. Sounds like you do that, so if you have a real world link-status failure, then things go a bit boom already.
Are you talking about setting a connector's property, like "CRTC_ID", "Content Protection", "HDCP Content Type"? Weston sets all of those on every update if they exist. Or is it about any property on any connector, not just a specific property on the specific connector, or any property on the specific connector?
Any property on the specific connector where link-status has gone "bad".
Ok, now I got it clearly. Thanks.
Also CRTC properties "MODE_ID" and "ACTIVE" are set on every update, modeset or not. Weston just trusts that no-op changes in state do not require a modeset
Yup that's generally valid assumption, but also that's enough to hit the link-status == "bad" forced modeset I think.
Is it not the opposite? If a link fails, then Weston will produce a glitch when the kernel automatically re-trains the link, and then everything continues happily?
That's the soft link failure, where the kernel can recover on its own. There's the harder one where the new link status is degraded, and the old mode doesn't fit. Or the kernel just threw a fit and can't make stuff work anymore, and waits for the next userspace commit. Iirc for MST links we can't easily do this because locking, so gets pushed to a full modeset commit. Or something like that.
I guess we need a kernel patch to make sure link-status only gets fixed when userspace is ok with a modeset.
That would be nice, but would it not also mean the above Weston case ends up with a permanent black screen instead of a temporary glitch?
Or is there a hotplug uevent at play here too?
Yeah you get a hotplug uevent, so as long as you eventually process that and do a full modeset should be ok.
I suppose Weston falls short there. Getting a hotplug event does not force a modeset on all active CRTCs. Weston does re-read resources and properties, but it does not trigger anything like re-picking the video mode if the connection status and EDID remains the same.
I hope we can say Weston never worked with link failures to begin with, and fix things properly through the link status property in Weston.
Weston does this, because it is the easy thing to do and debug. You don't have to track back thousands of frames to figure out what the mode or the connectors are, when something doesn't work like it should. Or to figure out why some state wasn't emitted when it was supposed to.
...
One idea I proposed on irc is that logind would capture the boot-up state, and you'd do a loginctl reset-screen cmd to reset it if something broke somewhere. logind already has the code for forced vt switching, that feels like a reasonable extension of that.
Needing to run a command manually to restore the screen instead of "simply" switching to the graphical login manager to re-gain user control seems a bit difficult.
But that requires mind reading skills. Computer can't tell the "pls don't wreak stuff, I want smooth switching" from "pls wreak stuff, because it's glitching" wish the user has when vt-switching.
No, it doesn't. The graphical login manager already saved its default state and it will always reinstate its default state. The session compositor hopefully inherits the same or at least compatible default state from the login manager, so the session compositor has it too. So smooth hand-over is always possible unless the session compositor goes out of its way to set otherwise. If the session compositor screws something up like a gamma table, the login manager's default state will just overwrite the bad data anyway. Whether that becomes a smooth transition or not depends on whether the reset needs a modeset and if the old FB is understandable.
The session will be permanently broken, as it will re-break when switching back to it, but at least the login manager remains always accessible.
Then you could pick between "smooth transitions" (probably best for users only using 1 compositor) and "whack it back to boot-up state when switching to a new compositor" (maybe best default if you run multiple different compositors, logind could even do that automatically).
I think even better would be for logind to serialize and hand out the default state it saved, instead of applying it. Then the compositor who takes over can integrate the default state into its own state and inherited state to apply it all in one commit.
Yeah that'd be a flavour of this userspace protocol that we need first
- I have no idea about compositor interactions in userspace and what
protocols they'd want, my input isn't going to be much use here I think.
Doing so, the compositor could even do a TEST_ONLY commit without ALLOW_MODESET to see if a smooth transition is possible. If not, it can skip some smooth transition animation for instance.
Generally that's not enough, most people want some blending. And for blending, you need to be able to read&understand what's already there. Hence getfb2 so you can get at modifiers.
I assumed a smooth transition does not make sense if a modeset is required. TEST_ONLY is useful for checking that.
...
Maybe to clarify: I'm not against adding something to solve this to the kernel, period. If logind (or whatever) implements a proper compositor switching protocol for kms state (instead of the current "whatever, maybe glitches" approach), and it turns out the kernels needs something to make that work, we can add it. But adding default state to the kernel just so compositors can continue to not coordinate how vt switching and all that is supposed to work without glitches and leftover planes/cursors or whatever, then no, I don't like that idea. Because I don't think it'll really work. If that means vt switching is only glitch-free and can be reset to a working state if a compositor crashed if you have logind working, *shrug*.
We don't actually need new kernel UAPI for doing the non-cooperative ignorant design. We can just do a blind save & restore to get that. (By "blind", I have always meant "including properties you do not understand"; not that you would use the state set as a whole without merging it with other inherited or wanted state which you do understand.)
What I was after with the kernel-hardcoded defaults is keeping the same state across restarts and reboots, even when the kernel gains new KMS properties. After all, the kernel drivers are the only thing that could know what the value in the hardware was before a new property to control it is introduced. A side-effect of such defaults being available is that they could be used instead of the blind save, to do a blind restore of the properties a KMS program does not understand.
TL;DR:
If blind save & restore (but restricted only to cases where KMS state may have been lost!) works, it will ensure that a running userspace program has what it was started with, but it does not guarantee that the state is the same between restarts of the program.
This was about switching.
It seems that the default state is always undefined, and cannot be defined because whatever one defines might not always result in a picture on screen. Bugs in defined default state cannot be fixed, by definition as the default state is immutable once defined.
Ensuring consistent state between restarts is the responsibility of
Here I started talking about "restarts".
userspace and each KMS program individually. They might attempt to blindly save KMS state in a file to be re-used on the next start.
I disagree with "each KMS program individually". We already have defined protocols for drm master switching: Either the cooperative vt-switch dance, or the unpriviledge switch logind allows. We also have ad-hoc definitions of smooth transitions, it's "one primary plane, untiled". Some people broke that, and now we have getfb2 so you can smoothly transition even with tiled buffers. Except not everyone supports that.
Switching is not the point with saving unrecognized KMS state into a compositor-specific file. The point of the file is to re-create the exact same KMS state after the compositor has been restarted, regardless of what KMS program might have ran first.
This example isn't about when you can capture the "default state", saving on every vt switch doesn't work. This was another example to illustrate that smooth, glitch-free vt switching without coordination between compositors (and enforced by logind) doesn't work. Some compositors leave a tiled buffer behind, if the new compositor doesn't implement getfb2 then you get ugly artifacts when trying to blend to the new screen.
Right, about switching. The blind save & restore solves switching for the the unrecognized KMS properties. I was not seeking to ensure that switching results in smooth transitions, only that smooth transitions are not deliberately made impossible by e.g. logind always resetting KMS state on switch.
So I consider switching solved, the only open question is about restarts.
Use case: when you have completed monitor color profiling, the profile may become invalid as soon as anything affecting colors changes, e.g. link bit depth. With all the new properties for color spaces, HDR etc. the likelihood of an unrecognized property affecting colors is quite high. Either the property must be reset to the value it had during profiling, or the profile needs to be rejected. This is more about rebooting the machine than it is about switching between display servers.
If the compositor has such a file saved, it might as well use it also for sanitation when switching from another display server.
This still does not handle the case where a kernel upgrade introduces a new property, the graphical login manager sets the property altering colors, and the session compositor you profiled does not recognize the new property and also does not have it saved in the file because it didn't exist last time. But maybe that's a corner-case enough, and people who care will notice the profile is invalid when physically measuring its validity before starting to work. But this could be seen as a kernel regression too: you only had to upgrade the kernel. Logind saving state before fbcon takes over could remedy this, if the session compositor integrated the logind saved state on every start.
But again we would rely on kernel not changing the hardware state between kernel versions before fbcon takes over.
I guess my point is, this is a lot bigger problem than just the default state. This = vt switching that doesn't look horrible and doesn't result in artifacts and issues on the new compositor.
I am interested in getting reliably to the same hardware state as you used to have before, either after reboots or after vt-switches. The transition does not have to be guaranteed to be smooth, but at the same time the restore policy should not exclude the possibility of a smooth transition.
Thanks, pq
Refocusing on where I think we still have a bit a disconnnect.
On Mon, May 04, 2020 at 03:22:28PM +0300, Pekka Paalanen wrote:
On Mon, 4 May 2020 13:00:02 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Mon, May 4, 2020 at 11:49 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Thu, 30 Apr 2020 15:53:23 +0200 Daniel Vetter daniel@ffwll.ch wrote:
I guess my point is, this is a lot bigger problem than just the default state. This = vt switching that doesn't look horrible and doesn't result in artifacts and issues on the new compositor.
I am interested in getting reliably to the same hardware state as you used to have before, either after reboots or after vt-switches. The transition does not have to be guaranteed to be smooth, but at the same time the restore policy should not exclude the possibility of a smooth transition.
So my point is kinda that if you want both, then the only way to get there to have a very clear set of what's allowed to be left behind be the outgoing compositor. For example: - only primary plane - only untiled - no color transform magic - ...
Imo this is a related problem to the save/restore topic, since if one compositor does something the new one doesn't understand (e.g. tiled buffers with modifiers, and new compositor doesn't use getfb2), then it's going to break.
Similar, if the old compositor sets a new color transform property that the new compositor doesn't understand, then you get a mess.
Blind restore handles the permanent mess issue, but it doesn't handle the smooth transition issue. But both problems are of the "old compositor did something the new compositor doesn't understand", hence why I chuck them into the same bin. And the issue with a blind save/restore, or kernel defaults, or any of the solutions proposed here is that they pretty much guarantee non-smooth transitions, they only solve the permanent damange part of the problem.
I think to solve both, we need some kind of proper compositor switching protocol, e.g. using logind: - old compositor transitions to the minimal config as specified somewhere - logind forces all other properties to "defaults", whether that's restoring boot-up defaults or defaults obtained from the kernel or something else is tbd - logind maybe even does a transition if there's multiple version of the protocol, e.g. v2 allows modifiers, v1 only untiled, so it'd need to do a blit to untiled if the new compositor only supports v1 switching protocol - new compositor takes over, and can continue the smooth transition since it's a well-defined starting state with limited feature usage, _and_ everything else is reset to "defaults"
I fear that if we only solve the "resets to defaults" issue then we can draw ourselves into a corner for the smooth transition stuff, if e.g. the wrong entity in the above dance forces the reset to defaults. -Daniel
On Tue, 5 May 2020 10:48:52 +0200 Daniel Vetter daniel@ffwll.ch wrote:
Refocusing on where I think we still have a bit a disconnnect.
On Mon, May 04, 2020 at 03:22:28PM +0300, Pekka Paalanen wrote:
On Mon, 4 May 2020 13:00:02 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Mon, May 4, 2020 at 11:49 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Thu, 30 Apr 2020 15:53:23 +0200 Daniel Vetter daniel@ffwll.ch wrote:
I guess my point is, this is a lot bigger problem than just the default state. This = vt switching that doesn't look horrible and doesn't result in artifacts and issues on the new compositor.
I am interested in getting reliably to the same hardware state as you used to have before, either after reboots or after vt-switches. The transition does not have to be guaranteed to be smooth, but at the same time the restore policy should not exclude the possibility of a smooth transition.
So my point is kinda that if you want both, then the only way to get there to have a very clear set of what's allowed to be left behind be the outgoing compositor. For example:
- only primary plane
- only untiled
- no color transform magic
- ...
Hi,
agreed, and to this I was implicitly adding "for everything else the old compositor must leave behind the so called default state".
That means the new compositor restoring defaults to the properties it does not understand is a no-op, unless the old compositor violated the smooth transition rules, in which case it might not be a smooth transition.
If on some system, the default state does not result in a "usable" picture on screen, then a compositor that does not understand the property will always be broken and has always been broken.
If such compositor ever worked ok on such system, it was because it inherited a property value that made things work. So if using defaults breaks things, it means the defaults were not actually the defaults.
This definition probably means that defaults cannot be universal constants, but I think it would be enough if they were constants given the hardware and firmware at hand. IOW, not changed by simply rebooting or upgrading the kernel.
Imo this is a related problem to the save/restore topic, since if one compositor does something the new one doesn't understand (e.g. tiled buffers with modifiers, and new compositor doesn't use getfb2), then it's going to break.
Similar, if the old compositor sets a new color transform property that the new compositor doesn't understand, then you get a mess.
Blind restore handles the permanent mess issue, but it doesn't handle the smooth transition issue. But both problems are of the "old compositor did something the new compositor doesn't understand", hence why I chuck them into the same bin. And the issue with a blind save/restore, or kernel defaults, or any of the solutions proposed here is that they pretty much guarantee non-smooth transitions, they only solve the permanent damange part of the problem.
Right, except I disagree on the "guarantee non-smooth transition", as I explained above.
I think to solve both, we need some kind of proper compositor switching protocol, e.g. using logind:
- old compositor transitions to the minimal config as specified somewhere
- logind forces all other properties to "defaults", whether that's restoring boot-up defaults or defaults obtained from the kernel or something else is tbd
In my mind, the new compositor would do this step. If the old compositor did its job, it's a no-op.
- logind maybe even does a transition if there's multiple version of the protocol, e.g. v2 allows modifiers, v1 only untiled, so it'd need to do a blit to untiled if the new compositor only supports v1 switching protocol
That would be fancy, but I'm not sure it's necessary. Maybe it becomes useful one day, and then that day we can look into involving actual protocol or logind. I just don't see a problem implementing the de facto protocol v1 that already exists without logind or such.
- new compositor takes over, and can continue the smooth transition since it's a well-defined starting state with limited feature usage, _and_ everything else is reset to "defaults"
I fear that if we only solve the "resets to defaults" issue then we can draw ourselves into a corner for the smooth transition stuff, if e.g. the wrong entity in the above dance forces the reset to defaults.
I guess we fundamentally agree. It just doesn't matter who does the reset to defaults if the old compositor does not, the transition cannot be smooth.
To me it seems the key here is that the old compositor *must* use "a simple KMS setup" and reset everything else to defaults.
Actually, that's what we require in practise already. If the old compositor leaves something funny in KMS state, the new compositor will either glitch or be permanently broken when taking over.
The only thing that needs more work to fix is to prevent the permanent breakage.
I guess this line on thinking also shows the answer to what KMS programs are supposed to do on switching out: clean up after yourself, but don't shut CRTCs down or change video modes.
Thanks, pq
On Monday, April 20, 2020 10:27 AM, Pekka Paalanen ppaalanen@gmail.com wrote:
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
Note, this is not the case when using e.g. a display manager. In the past there have been cases of a display manager setting a hw cursor and launching a compositor not supporting hw cursors. This results in a stuck hw cursor.
Btw. I searched for all occurrences of link_status in https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html and it seems it only has two possible values, good and bad, and no mention whether it is writable. Looks like it's writable. There does not seem to be a) an explanation how exactly it needs to the handled (writing it does something? what can you write?) or b) any way discern between kernel and userspace set values like HDCP "Content Protection" has.
User-space needs to reset the value to GOOD when recovering from a BAD value.
On Mon, 20 Apr 2020 10:15:39 +0000 Simon Ser contact@emersion.fr wrote:
On Monday, April 20, 2020 10:27 AM, Pekka Paalanen ppaalanen@gmail.com wrote:
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
I meant fbcon, not fbdev above.
Note, this is not the case when using e.g. a display manager. In the past there have been cases of a display manager setting a hw cursor and launching a compositor not supporting hw cursors. This results in a stuck hw cursor.
Indeed. So the display manager might get sensible defaults, but the session compositor might not. Or maybe boot splash uses KMS already, so even display manager doesn't get all-defaults state.
It seems we really do need "sane defaults" from the kernel explicitly. Writing a userspace tool to save it at boot time before any KMS program runs would be awkward.
Btw. I searched for all occurrences of link_status in https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html and it seems it only has two possible values, good and bad, and no mention whether it is writable. Looks like it's writable. There does not seem to be a) an explanation how exactly it needs to the handled (writing it does something? what can you write?) or b) any way discern between kernel and userspace set values like HDCP "Content Protection" has.
User-space needs to reset the value to GOOD when recovering from a BAD value.
What if userspace writes BAD?
BAD cannot be default state, so getting default state from somewhere would solve this property's restoring as well. Reading back the true current value could accidentally return BAD.
Just to reiterate for everyone, the important thing here is to figure out how userspace is supposed to reset unknown properties to sensible defaults. Once we know how that should work, we can review whether new properties support or break that.
Thanks, pq
On Monday, April 20, 2020 2:22 PM, Pekka Paalanen ppaalanen@gmail.com wrote:
On Mon, 20 Apr 2020 10:15:39 +0000 Simon Ser contact@emersion.fr wrote:
On Monday, April 20, 2020 10:27 AM, Pekka Paalanen ppaalanen@gmail.com wrote:
The only "random" KMS state is the properties the userspace KMS program does not know that are set on start-up. I have been assuming that as long as you had fbdev active before the KMS program started, the unknown properties have "harmless" default values. And maybe even at driver device init if fbdev does not exist?
I meant fbcon, not fbdev above.
Note, this is not the case when using e.g. a display manager. In the past there have been cases of a display manager setting a hw cursor and launching a compositor not supporting hw cursors. This results in a stuck hw cursor.
Indeed. So the display manager might get sensible defaults, but the session compositor might not. Or maybe boot splash uses KMS already, so even display manager doesn't get all-defaults state.
It seems we really do need "sane defaults" from the kernel explicitly. Writing a userspace tool to save it at boot time before any KMS program runs would be awkward.
Agreed.
Btw. I searched for all occurrences of link_status in https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html and it seems it only has two possible values, good and bad, and no mention whether it is writable. Looks like it's writable. There does not seem to be a) an explanation how exactly it needs to the handled (writing it does something? what can you write?) or b) any way discern between kernel and userspace set values like HDCP "Content Protection" has.
User-space needs to reset the value to GOOD when recovering from a BAD value.
What if userspace writes BAD?
BAD cannot be default state, so getting default state from somewhere would solve this property's restoring as well. Reading back the true current value could accidentally return BAD.
Interestingly, setting it to BAD is a no-op (BAD is silently discarded):
/* Never downgrade from GOOD to BAD on userspace's request here, * only hw issues can do that. * * For an atomic property the userspace doesn't need to be able * to understand all the properties, but needs to be able to * restore the state it wants on VT switch. So if the userspace * tries to change the link_status from GOOD to BAD, driver * silently rejects it and returns a 0. This prevents userspace * from accidently breaking the display when it restores the * state. */ if (state->link_status != DRM_LINK_STATUS_GOOD) state->link_status = val;
So restoring the "sane default" would be work, even if the link happens to be BAD when saving said "sane defaults".
Just to reiterate for everyone, the important thing here is to figure out how userspace is supposed to reset unknown properties to sensible defaults. Once we know how that should work, we can review whether new properties support or break that.
Yes, that's a good description of the problem.
I see two main solutions here: either the kernel provides the default values in its property descriptions (e.g. drmModeGetProperty), either user-space can ask the kernel to reset properties to their default values.
dri-devel@lists.freedesktop.org