First off, apologies if the functionality described already exists and I just failed to find it, or if this isn't the correct venue for this discussion. If so, pointers to the correct location would be appreciated.
I'm currently looking into the feasibility of developing a remote access tool using kernel-level interfaces (e.g., drmModeGetFB and uinput) to operate regardless of whether the user is using Xorg, a Wayland compositor, or even a text console (assuming KMS is in use).
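For concreteness, the grabbing side I have in mind would be roughly along the lines of the libdrm sketch below. It is untested, has no error handling, only covers the primary framebuffer of each CRTC (not cursor or overlay planes), and drmModeGetFB only hands back a usable GEM handle to the DRM master or a CAP_SYS_ADMIN process:

    #include <fcntl.h>
    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Rough sketch: find the framebuffer currently scanned out on each CRTC
     * and export it as a dma-buf fd. Untested; error handling omitted. */
    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
        drmModeRes *res = drmModeGetResources(fd);

        for (int i = 0; i < res->count_crtcs; i++) {
            drmModeCrtc *crtc = drmModeGetCrtc(fd, res->crtcs[i]);
            if (!crtc || !crtc->buffer_id) {
                drmModeFreeCrtc(crtc);
                continue;
            }

            /* Needs DRM master or CAP_SYS_ADMIN to get a usable handle. */
            drmModeFB *fb = drmModeGetFB(fd, crtc->buffer_id);
            if (fb) {
                int dmabuf_fd = -1;
                drmPrimeHandleToFD(fd, fb->handle, DRM_CLOEXEC, &dmabuf_fd);
                printf("CRTC %u: FB %u %ux%u pitch %u -> dma-buf fd %d\n",
                       crtc->crtc_id, fb->fb_id, fb->width, fb->height,
                       fb->pitch, dmabuf_fd);
                drmModeFreeFB(fb);
            }
            drmModeFreeCrtc(crtc);
        }

        drmModeFreeResources(res);
        return 0;
    }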
One of the requirements, however, is that the remote user be able to "curtain" their session in order to prevent individuals near the physical machine from watching their session. Imagine a user working from home and connecting to their workstation in a shared office space.
One possible solution I came up with would be a new kernel API to allow a privileged process other than the DRM-Master to request that all displays of a card be blanked or left in power saving mode. This wouldn't affect the ability of the DRM-Master to change modes and layout configuration, but no content would be visible on the physical displays until the curtaining process ended the curtain or exited.
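Purely as an illustration of the shape of the API I'm imagining (nothing like this exists today; the ioctl name and struct below are invented for this sketch only):

    /* Hypothetical sketch only: no such UAPI exists today; the ioctl name and
     * struct are invented purely to illustrate the proposal. */
    struct drm_mode_curtain {
        uint32_t enable;  /* 1 = blank all outputs of this device, 0 = restore */
        uint32_t flags;
    };

    int set_curtain(int card_fd, int enable)
    {
        struct drm_mode_curtain args = { .enable = enable, .flags = 0 };
        /* Would be issued by a privileged process other than the DRM master;
         * the curtain would also drop automatically if card_fd were closed. */
        return ioctl(card_fd, DRM_IOCTL_MODE_CURTAIN /* hypothetical */, &args);
    }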
Is this (a) a good approach to solving this issue, (b) an API that, if implemented, would be likely to be accepted into the kernel, and (c) something that would be feasible to implement given the current architecture? E.g., would it require changes in individual drivers, or could it be managed solely in driver-independent kernel code?
I'm new to DRI development, so if it is something that folks would be open to having, pointers to a good part of the code to look at to start implementing such a feature would be appreciated.
Thanks!
On Fri, 3 Apr 2020 12:56:33 -0700 Erik Jensen rkjnsn@google.com wrote:
First off, apologies if the functionality described already exists and I just failed to find it, or if this isn't the correct venue for this discussion. If so, pointers to the correct location would be appreciated.
I'm currently looking into the feasibility of developing a remote access tool using kernel-level interfaces (e.g., drmModeGetFB and uinput) to operate regardless of whether the user is using Xorg, a Wayland compositor, or even a text console (assuming KMS is in use).
One of the requirements, however, is that the remote user be able to "curtain" their session in order to prevent individuals near the physical machine from watching their session. Imagine a user working from home and connecting to their workstation in a shared office space.
One possible solution I came up with would be a new kernel API to allow a privileged process other than the DRM-Master to request that all displays of a card be blanked or left in power saving mode. This wouldn't affect the ability of the DRM-Master to change modes and layout configuration, but no content would be visible on the physical displays until the curtaining process ended the curtain or exited.
Is this (a) a good approach to solving this issue, (b) an API that, if implemented, would be likely to be accepted into the kernel, and (c) something that would be feasible to implement given the current architecture? E.g., would it require changes in individual drivers, or could it be managed solely in driver-independent kernel code?
I'm new to DRI development, so if it is something that folks would be open to having, pointers to a good part of the code to look at to start implementing such a feature would be appreciated.
Hi,
I have heard of such a screen scraper already existing, maybe Simon remembers where one is?
Personally I am very much against the whole idea:
Screen scraping like that will have big problems trying to a) synchronize to the display updates correctly (was the screen updated, did you get the old or the new frame, and you have to poll rather than be notified), and b) synchronize framebuffer reads vs. writes (is the display server re-using the buffer while you are still reading it). You also get to handle each KMS plane individually.
You have to adapt to what the display server does and you have no way to negotiate better configurations. The framebuffers could be tiled and/or compressed, and quite likely are the kind of memory that is very slow to read by CPU, at least directly.
It obviously needs elevated privileges, because you are stealing data behind the display server's back. Then you are feeding it through the network.
The curtaining goes against the policy that the current DRM master is in full control of the display. It also means the kernel has to lie to the DRM master to make the display server unaware of the funny business, and I don't like that at all.
With uinput, you will be having fun issues trying to guess what keymaps the display server and apps might be using, since you need to know that to be able to manufacture the right evdev keycodes that will be translated into the keysyms you actually wanted. Keymaps can change dynamically, too.
I believe it would be much better to cooperate with display servers than to try to bypass and fool them. Maybe look towards PipeWire at least for the screen capturing API?
Thanks, pq
Hi,
I completely agree with Pekka here.
On Sunday, April 5, 2020 10:21 AM, Pekka Paalanen ppaalanen@gmail.com wrote:
I have heard of such a screen scraper already existing, maybe Simon remembers where one is?
Yes, ffmpeg's kmsgrab does the same thing. It doesn't support modifiers and only grabs the primary plane. It comes with all the other drawbacks Pekka mentioned.
Simon
Thanks for the reply! (And thanks Simon for the pointer to ffmpeg.)
Screen scraping like that will have big problems trying to a) synchronize to the display updates correctly (was the screen updated, did you get the old or the new frame, and you have to poll rather than be notified), and b) synchronize framebuffer reads vs. writes (is the display server re-using the buffer while you are still reading it). You also get to handle each KMS plane individually.
We're not too concerned with every frame being perfect, as long as there aren't frequent annoying artifacts and the user receives feedback to typing and mouse movement in a reasonable amount of time. (Think browsing the web, not playing a video game.) I'll play around with ffmpeg's kmsgrab and see what we might expect on that front. Obviously we'd have to handle the hardware cursor in addition to the primary plane at the very least. I'm not sure how common video overlays are these days; it seems most players render via GL now.
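As a starting point on the plane handling, enumerating them would look roughly like the sketch below (untested, no error handling); once universal planes are enabled, the "type" property distinguishes primary, cursor, and overlay planes:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Rough sketch: list every KMS plane (primary, cursor, overlay) and which
     * framebuffer it is currently scanning out. Untested, no error handling. */
    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
        /* Without this cap only overlay planes are exposed. */
        drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);

        drmModePlaneRes *planes = drmModeGetPlaneResources(fd);
        for (uint32_t i = 0; i < planes->count_planes; i++) {
            drmModePlane *p = drmModeGetPlane(fd, planes->planes[i]);
            drmModeObjectProperties *props =
                drmModeObjectGetProperties(fd, p->plane_id, DRM_MODE_OBJECT_PLANE);

            const char *type = "?";
            for (uint32_t j = 0; j < props->count_props; j++) {
                drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[j]);
                if (!strcmp(prop->name, "type")) {
                    uint64_t v = props->prop_values[j];
                    type = v == DRM_PLANE_TYPE_PRIMARY ? "primary" :
                           v == DRM_PLANE_TYPE_CURSOR  ? "cursor"  : "overlay";
                }
                drmModeFreeProperty(prop);
            }
            printf("plane %u (%s): crtc %u fb %u\n",
                   p->plane_id, type, p->crtc_id, p->fb_id);

            drmModeFreeObjectProperties(props);
            drmModeFreePlane(p);
        }
        drmModeFreePlaneResources(planes);
        return 0;
    }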
You have to adapt to what the display server does and you have no way to negotiate better configurations. The framebuffers could be tiled and/or compressed, and quite likely are the kind of memory that is very slow to read by CPU, at least directly.
Yeah, I see ffmpeg has some examples of feeding frames through VAAPI to handle situations where the buffer isn't CPU mapped. Maybe EGL_EXT_image_dma_buf_import could also be useful here?
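Something along these lines, I imagine; this sketch assumes a single-plane XRGB8888 buffer with no modifier for simplicity, and real code would need to query the actual format and use EGL_EXT_image_dma_buf_import_modifiers as well:

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>
    #include <drm_fourcc.h>

    /* Sketch only: wrap a scanout dma-buf in an EGLImage so the GPU can read
     * (and detile/convert) it. Assumes a single-plane XRGB8888 buffer with no
     * modifier; real code must also pass the format modifier. */
    static GLuint import_dmabuf(EGLDisplay dpy, int dmabuf_fd,
                                uint32_t width, uint32_t height, uint32_t pitch)
    {
        PFNEGLCREATEIMAGEKHRPROC eglCreateImageKHR =
            (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
        PFNGLEGLIMAGETARGETTEXTURE2DOESPROC glEGLImageTargetTexture2DOES =
            (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
                eglGetProcAddress("glEGLImageTargetTexture2DOES");

        const EGLint attrs[] = {
            EGL_WIDTH, (EGLint)width,
            EGL_HEIGHT, (EGLint)height,
            EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_XRGB8888,
            EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
            EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
            EGL_DMA_BUF_PLANE0_PITCH_EXT, (EGLint)pitch,
            EGL_NONE
        };
        EGLImageKHR img = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                            EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, img);
        /* tex can now be sampled and blitted into a linear buffer that is
         * cheap to read back or hand to a hardware encoder. */
        return tex;
    }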
It obviously needs elevated privileges, because you are stealing data behind the display server's back. Then you are feeding it through the network.
Yes. It is expected that elevation would be required at the very least to grab the dma_buf fds and activate the proposed curtaining mode.
The curtaining goes against the policy that the current DRM master is in full control of the display. It also means the kernel has to lie to the DRM master to make the display server unaware of the funny business, and I don't like that at all.
The hope was that this could be done without interfering with the DRM master at all. The DRM master could still control resolutions, displays, determine which CRTCs go to what outputs, et cetera. It's just that the content wouldn't actually be visible on the screen while curtaining was enabled, conceptually similar to what would happen if the physical displays themselves were configured not to display anything (e.g., switched to a different input, or brightness set to zero), which also wouldn't affect output and mode selection.
If this could be implemented in a relatively simple way (i.e., curtaining sets a flag that suppresses the actual scan out to the display, but everything else stays the same), it seems like it could be a worthwhile avenue to explore. On the other hand, if it requires adding a lot of complexity (e.g., maintaining a completely separate physical configuration for the graphics card and "shadow" configuration to report to the DRM master), I would certainly concur that it doesn't make sense to do. Which is closer to the truth is one of the things I was hoping to find out from this e-mail.
With uinput, you will be having fun issues trying to guess what keymaps the display server and apps might be using, since you need to know that to be able to manufacture the right evdev keycodes that will be translated into the keysyms you actually wanted. Keymaps can change dynamically, too.
This isn't a concern to us, as we plan to transmit keycodes and leave the keyboard mapping to the remote machine.
I believe it would be much better to cooperate with display servers than to try to bypass and fool them. Maybe look towards PipeWire at least for the screen capturing API?
I agree that this could create a better experience for some use cases if supported by all components. Unfortunately, the variety of graphical login managers, display servers, and desktop environments with different amounts of resources and priorities means that coming up with a solution that works for all of them seems untenable. It would also preclude being able to use the console remotely.
Chrome Remote Desktop currently spins up its own display server (Xvfb) that's not attached to the local displays at all, but that has its own issues: the user can't interact with programs running in their existing local session, a number of programs don't support running multiple instances simultaneously using the same profile (e.g. Chrome, Firefox, IntelliJ), and in general things seem to be moving in the direction of assuming there will only ever be at most one graphical session at a time for each user. (E.g., D-Bus using a single user bus in place of a per-session bus on many distributions. Also see https://gitlab.gnome.org/GNOME/gdm/-/issues/580, where GDM will fail to log a user in locally at all if a graphical PAM session already exists for the user, even if no programs are running in that session.)
Our hope is that interacting at the kernel level can avoid all of these issues, especially given that frame grabbing (albeit imperfect) and input injection are already supported by the kernel, with curtaining being the only thing that does not already have an existing interface.
Hi Erik,
On Mon, 6 Apr 2020 at 20:01, Erik Jensen rkjnsn@google.com wrote:
Screen scraping like that will have big problems trying to a) synchronize to the display updates correctly (was the screen updated, did you get the old or the new frame, and you have to poll rather than be notified), and b) synchronize framebuffer reads vs. writes (is the display server re-using the buffer while you are still reading it). You also get to handle each KMS plane individually.
We're not too concerned with every frame being perfect, as long as there aren't frequent annoying artifacts and the user receives feedback to typing and mouse movement in a reasonable amount of time. (Think browsing the web, not playing a video game.) I'll play around with ffmpeg's kmsgrab and see what we might expect on that front. Obviously we'd have to handle the hardware cursor in addition to the primary plane at the very least. I'm not sure how common video overlays are these days; it seems most players render via GL now.
A lot, but not all. X11 makes that the only reasonable choice thanks to its compositing design, but Wayland makes it possible to handle video externally, and that is what is encouraged.
You have to adapt to what the display server does and you have no way to negotiate better configurations. The framebuffers could be tiled and/or compressed, and quite likely are the kind of memory that is very slow to read by CPU, at least directly.
Yeah, I see ffmpeg has some examples of feeding frames through VAAPI to handle situations where the buffer isn't CPU mapped. Maybe EGL_EXT_image_dma_buf_import could also be useful here?
Don't forget modifiers!
The curtaining goes against the policy that the current DRM master is in full control of the display. It also means the kernel has to lie to the DRM master to make the display server unaware of the funny business, and I don't like that at all.
The hope was that this could be done without interfering with the DRM master at all. The DRM master could still control resolutions, displays, determine which CRTCs go to what outputs, et cetera. It's just that the content wouldn't actually be visible on the screen while curtaining was enabled, conceptually similar to what would happen if the physical displays themselves were configured not to display anything (e.g., switched to a different input, or brightness set to zero), which also wouldn't affect output and mode selection.
If this could be implemented in a relatively simple way (i.e., curtaining sets a flag that suppresses the actual scan out to the display, but everything else stays the same), it seems like it could be a worthwhile avenue to explore. On the other hand, if it requires adding a lot of complexity (e.g., maintaining a completely separate physical configuration for the graphics card and "shadow" configuration to report to the DRM master), I would certainly concur that it doesn't make sense to do. Which is closer to the truth is one of the things I was hoping to find out from this e-mail.
I think you just end up inventing too much fake hardware in the kernel. If you handle curtaining by requiring the screen to be on and showing a black buffer, you have to allocate and show that (not as trivial as you might hope), and then keep a whole set of shadow state. If you handle it by having the CRTC be off, you have to spin a fake vblank loop in a shadow CRTC. I don't think this is something we would really want to keep.
I believe it would be much better to cooperate with display servers than to try to bypass and fool them. Maybe look towards PipeWire at least for the screen capturing API?
I agree that this could create a better experience for some use cases if supported by all components. Unfortunately, the variety of graphical login managers, display servers, and desktop environments with different amounts of resources and priorities means that coming up with a solution that works for all of them seems untenable. It would also preclude being able to use the console remotely.
[... separate sessions aren't viable ...]
Our hope is that interacting at the kernel level can avoid all of these issues, especially given that frame grabbing (albeit imperfect) and input injection are already supported by the kernel, with curtaining being the only thing that does not already have an existing interface.
Well, it solves the issue of needing to fix userspace, but it definitely leaves you with a worse experience.
Userspace has largely standardised on PipeWire for remote streaming, which also handles things like hardware encoding for you, if desired. This is used in the xdg-desktop-portal (as used by GNOME, Flatpak, Chromium, Firefox, others) in particular, and implemented by many desktop environments. I think continuing to push the userspace side-channel is a far more viable long-term path. I would suggest starting with a single target desktop environment to design exemplary use and semantics, and then pushing that out into other environments as you come to rely on them.
Cheers, Daniel
On Mon, 6 Apr 2020 12:01:30 -0700 Erik Jensen rkjnsn@google.com wrote:
Thanks for the reply! (And thanks Simon for the pointer to ffmpeg.)
Screen scraping like that will have big problems trying to a) synchronize to the display updates correctly (was the screen updated, did you get the old or the new frame, and you have to poll rather than be notified), and b) synchronize framebuffer reads vs. writes (is the display server re-using the buffer while you are still reading it). You also get to handle each KMS plane individually.
We're not too concerned with every frame being perfect, as long as there aren't frequent annoying artifacts and the user receives feedback to typing and mouse movement in a reasonable amount of time. (Think browsing the web, not playing a video game.) I'll play around with ffmpeg's kmsgrab and see what we might expect on that front. Obviously we'd have to handle the hardware cursor in addition to the primary plane at the very least. I'm not sure how common video overlays are these days; it seems most players render via GL now.
Hi,
Any kind of animation, while running, is a potential source of continuous fullscreen glitching, depending on how the display server works. Every key press etc. updating the screen could result in a temporary black screen in the worst case. Or it could be just fine. Or anything in between. It all depends on what the display server does and what you do and how (un)lucky you are.
Wayland compositors may and will attempt to use all hardware (including overlay) planes to present windows, regardless of window content. Some have it implemented already, some are still working towards it. The use of hardware planes is completely automatic, applications do not need to specifically ask for it.
Applications do have shortcuts they can take if they do specific things with Wayland, and these shortcuts are recommended because they have the potential to make better use of the display hardware. These shortcuts are engineered to automatically take advantage of hardware overlay planes when available. With them, players can avoid using GL themselves.
You have to adapt to what the display server does and you have no way to negotiate better configurations. The framebuffers could be tiled and/or compressed, and quite likely are the kind of memory that is very slow to read by CPU, at least directly.
Yeah, I see ffmpeg has some examples of feeding frames through VAAPI to handle situations where the buffer isn't CPU mapped. Maybe EGL_EXT_image_dma_buf_import could also be useful here?
You can try, yes. Make sure to use the new GetFB2 KMS ioctl to get the modifiers.
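Roughly like the snippet below (untested; it needs a recent kernel and libdrm, and assumes an fd and CRTC obtained via the kind of loop sketched earlier in the thread):

    /* Sketch: query format, modifier, and per-plane handles of the scanout FB
     * via the newer GETFB2 ioctl. The modifier is only meaningful when
     * DRM_MODE_FB_MODIFIERS is set in fb2->flags. */
    drmModeFB2 *fb2 = drmModeGetFB2(fd, crtc->buffer_id);
    if (fb2) {
        printf("FB %u: %ux%u fourcc 0x%08x modifier 0x%016llx\n",
               fb2->fb_id, fb2->width, fb2->height, fb2->pixel_format,
               (unsigned long long)fb2->modifier);
        for (int i = 0; i < 4 && fb2->handles[i]; i++) {
            int dmabuf_fd = -1;
            drmPrimeHandleToFD(fd, fb2->handles[i], DRM_CLOEXEC, &dmabuf_fd);
            printf("  plane %d: handle %u pitch %u offset %u -> fd %d\n",
                   i, fb2->handles[i], fb2->pitches[i], fb2->offsets[i],
                   dmabuf_fd);
        }
        drmModeFreeFB2(fb2);
    }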
Maybe I should underline the read/write race:
You do not get notified when a display server updates the screen, so you have to poll. When your poll returns a new FB id, you don't know how long it has already been up; in other words, you don't know how many milliseconds of grace time you have before the display server can replace it with another FB. After the display server has replaced the FB again, it is free to render again into the FB you just got. When and what the display server renders makes all the difference in what you get if you read the FB too late.
E.g., the first thing a display server does could be to clear the FB to black. If you get your timings exactly wrong by accident, you will see nothing but black as long as anything is animating. Maybe even indefinitely, if you don't, in addition to polling KMS state, also poll the contents of the FB you got.
Timings are something you cannot test for in general. You can test on a specific machine with a specific display server and specific apps, but whether that generalises to any other machine, or to most other machines, will be hard to tell without testing them. Even ordinary system load could push you into the "wrong timings" region. A different video mode will invalidate all your testing.
This design cannot ever be reliable.
The curtaining goes against the policy that the current DRM master is in full control of the display. It also means the kernel has to lie to the DRM master to make the display server unaware of the funny business, and I don't like that at all.
The hope was that this could be done without interfering with the DRM master at all. The DRM master could still control resolutions, displays, determine which CRTCs go to what outputs, et cetera. It's just that the content wouldn't actually be visible on the screen while curtaining was enabled, conceptually similar to what would happen if the physical displays themselves were configured not to display anything (e.g., switched to a different input, or brightness set to zero), which also wouldn't affect output and mode selection.
It does affect things. Not displaying an image can free up, e.g., memory bandwidth, causing the DRM master to succeed with output configurations that could not succeed if it were really driving the hardware. So the DRM master ends up in a state that cannot work, but it looks like it works because it's not actually done. When your curtaining then stops, who knows if the display server can recover, since the configuration in the kernel is something that cannot work.
To get around that, the kernel itself could allocate a placeholder FB and show that instead of the FB from the display server, but then you need all that faking complexity in the kernel. Using a different FB from what the display server programs may also change hardware state enough that the consequences can leak: e.g., different pixel formats and modifiers consume different amounts of memory bandwidth.
If this could be implemented in a relatively simple way (i.e., curtaining sets a flag that suppresses the actual scan out to the display, but everything else stays the same), it seems like it could be a worthwhile avenue to explore. On the other hand, if it requires adding a lot of complexity (e.g., maintaining a completely separate physical configuration for the graphics card and "shadow" configuration to report to the DRM master), I would certainly concur that it doesn't make sense to do. Which is closer to the truth is one of the things I was hoping to find out from this e-mail.
I am very much trying to scare you with the worst case theoretical scenarios, because I think this is a really bad idea.
With uinput, you will be having fun issues trying to guess what keymaps the display server and apps might be using, since you need to know that to be able to manufacture the right evdev keycodes that will be translated into the keysyms you actually wanted. Keymaps can change dynamically, too.
This isn't a concern to us, as we plan to transmit keycodes and leave the keyboard mapping to the remote machine.
Why would that not have exactly the problem I mentioned?
Do you expect your users to figure out what keymap is in effect in the server, and manually configure the same in your remote client, then hope it doesn't change?
Keymaps can be user customised as well, so maybe your client does not even have a matching one.
Or maybe you require users to reconfigure their desktop keymap to match the remote viewer?
I believe it would be much better to cooperate with display servers than to try to bypass and fool them. Maybe look towards PipeWire at least for the screen capturing API?
I agree that this could create a better experience for some use cases if supported by all components. Unfortunately, the variety of graphical login managers, display servers, and desktop environments with different amounts of resources and priorities means that coming up with a solution that works for all of them seems untenable. It would also preclude being able to use the console remotely.
Unfortunate, but it's the reality today.
If you wanted to do the capture properly, there are alternatives:
- a virtual DRM driver like VKMS (upstream) or EVDI (not upstream), which the display servers need to use explicitly
- running the desktop in a VM, let the VM handle the grabbing
- create your own Wayland compositor designed for your remoting and hosting a session Wayland compositor (nested compositor architecture)
None meet your exact requirements though.
Chrome Remote Desktop currently spins up its own display server (Xvfb) that's not attached to the local displays at all, but that has its own issues: the user can't interact with programs running in their existing local session, a number of programs don't support running multiple instances simultaneously using the same profile (e.g. Chrome, Firefox, IntelliJ), and in general things seem to be moving in the direction of assuming there will only ever be at most one graphical session at a time for each user. (E.g., D-Bus using a single user bus in place of a per-session bus on many distributions. Also see https://gitlab.gnome.org/GNOME/gdm/-/issues/580, where GDM will fail to log a user in locally at all if a graphical PAM session already exists for the user, even if no programs are running in that session.)
Indeed.
Our hope is that interacting at the kernel level can avoid all of these issues, especially given that frame grabbing (albeit imperfect) and input injection are already supported by the kernel, with curtaining being the only thing that does not already have an existing interface.
I'm very pessimistic about that. I worry it creates issues for everyone else, while your scheme won't work properly anyway.
Thanks, pq
Hi,
On Tue, 7 Apr 2020 at 09:23, Pekka Paalanen ppaalanen@gmail.com wrote:
Maybe I should underline the read/write race:
You do not get notified when a display server updates the screen, so you poll. When your poll returns a new FB id,
And that's only useful for Wayland systems. On X11, the server can (and often does) render directly to a single static front buffer, so whilst the observed KMS FB ID never changes, the content does, which you only know about if a) the display server tells you about it (so the magic isn't all hidden in the kernel anyway), or b) you just blindly update the content on a timer.
Cheers, Daniel
Thanks, all for the thorough answers, even if they weren't necessarily the answers I was hoping for. It sounds like a "system compositor" would definitely be the right place for a remote desktop solution, with the unfortunate caveat of not currently existing. :) (At least, the most recent mention I can find of folks working on the concept is from 2013.)
I'll probably make a post to wayland-devel to gather thoughts about how to best proceed. With the current trend toward assuming at most one graphical session for each user, the current approach of spinning up a separate Xvfb instance is becoming more broken more often, so hopefully there's a solution to be found somewhere in the stack.
On Tue, Apr 7, 2020 at 1:23 AM Pekka Paalanen ppaalanen@gmail.com wrote:
With uinput, you will be having fun issues trying to guess what keymaps the display server and apps might be using, since you need to know that to be able to manufacture the right evdev keycodes that will be translated into the keysyms you actually wanted. Keymaps can change dynamically, too.
This isn't a concern to us, as we plan to transmit keycodes and leave the keyboard mapping to the remote machine.
Why would that not have exactly the problem I mentioned?
Do you expect your users to figure out what keymap is in effect in the server, and manually configure the same in your remote client, then hope it doesn't change?
Keymaps can be user customised as well, so maybe your client does not even have a matching one.
Or maybe you require users to reconfigure their desktop keymap to match the remote viewer?
The model used by Chrome Remote Desktop is to operate as if the local keyboard were connected to the remote machine directly. I.e., the user manages the keyboard layout on the remote machine while connected, and the local layout is ignored. Our most common use case on Linux is a user connecting remotely to their own workstation; this seems to work fairly well and runs into fewer cross-platform issues than trying to guess what the user means.
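For illustration, the injection side would look roughly like the sketch below. The device name and vendor/product IDs are placeholders, and a real implementation would enable every key it might ever send; the point raised earlier in the thread still applies, in that what gets injected is an evdev keycode and the session's active keymap decides which symbol it produces.

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/uinput.h>

    /* Rough sketch of injecting one key press/release via uinput. */
    static void emit(int fd, int type, int code, int value)
    {
        struct input_event ev = { .type = type, .code = code, .value = value };
        write(fd, &ev, sizeof(ev));
    }

    int main(void)
    {
        int fd = open("/dev/uinput", O_WRONLY | O_NONBLOCK);

        ioctl(fd, UI_SET_EVBIT, EV_KEY);
        ioctl(fd, UI_SET_KEYBIT, KEY_A);   /* real code enables every key it may send */

        struct uinput_setup setup = { 0 };
        setup.id.bustype = BUS_USB;
        setup.id.vendor  = 0x1234;         /* placeholder IDs */
        setup.id.product = 0x5678;
        strcpy(setup.name, "remote-desktop-injector");
        ioctl(fd, UI_DEV_SETUP, &setup);
        ioctl(fd, UI_DEV_CREATE);
        sleep(1);                          /* let userspace pick up the new device */

        emit(fd, EV_KEY, KEY_A, 1);        /* press */
        emit(fd, EV_SYN, SYN_REPORT, 0);
        emit(fd, EV_KEY, KEY_A, 0);        /* release */
        emit(fd, EV_SYN, SYN_REPORT, 0);

        ioctl(fd, UI_DEV_DESTROY);
        close(fd);
        return 0;
    }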
On Sun, Apr 05, 2020 at 11:21:31AM +0300, Pekka Paalanen wrote:
On Fri, 3 Apr 2020 12:56:33 -0700 Erik Jensen rkjnsn@google.com wrote:
First off, apologies if the functionality described already exists and I just failed to find it, or if this isn't the correct venue for this discussion. If so, pointers to the correct location would be appreciated.
I'm currently looking into the feasibility of developing a remote access tool using kernel-level interfaces (e.g., drmModeGetFB and uinput) to operate regardless of whether the user is using Xorg, a Wayland compositor, or even a text console (assuming KMS is in use).
One of the requirements, however, is that the remote user be able to "curtain" their session in order to prevent individuals near the physical machine from watching their session. Imagine a user working from home and connecting to their workstation in a shared office space.
One possible solution I came up with would be a new kernel API to allow a privileged process other than the DRM-Master to request that all displays of a card be blanked or left in power saving mode. This wouldn't affect the ability of the DRM-Master to change modes and layout configuration, but no content would be visible on the physical displays until the curtaining process ended the curtain or exited.
Is this (a) a good approach to solving this issue, (b) an API that, if implemented, would be likely to be accepted into the kernel, and (c) something that would be feasible to implement given the current architecture? E.g., would it require changes in individual drivers, or could it be managed solely in driver-independent kernel code?
I'm new to DRI development, so if it is something that folks would be open to having, pointers to a good part of the code to look at to start implementing such a feature would be appreciated.
Hi,
I have heard of such a screen scraper already existing, maybe Simon remembers where one is?
Personally I am very much against the whole idea:
Screen scraping like that will have big problems trying to a) synchronize to the display updates correctly (was the screen updated, did you get the old or the new frame, and you have to poll rather than be notified), and b) synchronize framebuffer reads vs. writes (is the display server re-using the buffer while you are still reading it). You also get to handle each KMS plane individually.
You have to adapt to what the display server does and you have no way to negotiate better configurations. The framebuffers could be tiled and/or compressed, and quite likely are the kind of memory that is very slow to read by CPU, at least directly.
It obviously needs elevated privileges, because you are stealing data behind the display server's back. Then you are feeding it through the network.
The curtaining goes against the policy that the current DRM master is in full control of the display. It also means the kernel has to lie to the DRM master to make the display server unaware of the funny business, and I don't like that at all.
With uinput, you will be having fun issues trying to guess what keymaps the display server and apps might be using, since you need to know that to be able to manufacture the right evdev keycodes that will be translated into the keysyms you actually wanted. Keymaps can change dynamically, too.
I believe it would be much better to cooperate with display servers than to try to bypass and fool them. Maybe look towards PipeWire at least for the screen capturing API?
Yup, this should be done in something like systemd-logind, whose job it already is to forcibly remove master rights and stuff like that. "I'm going to take your display to blank it" is a very natural extension of that.
Wrt the actual screen grabbing part: We have minimal support for that, for smooth/flicker-free booting. But that assumes the old compositor has already ceased to access the driver; there's zero support for correctly syncing access to buffers. So if you do that behind a compositor, there's a huge chance that you race and read out black when the user actually sees something ... Fixing that isn't possible without the cooperation of the compositor and some real protocol. Essentially you need a nested compositor, with maybe something like drm_lease thrown on top as a nice fast path for when the system compositor does not want to capture all the frames.
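For reference, the lease hand-off on the master side is roughly the sketch below; the object IDs are placeholders and error handling is omitted:

    /* Sketch: the DRM master (e.g. the system compositor) leases one
     * connector + CRTC pair to another process. */
    int lease_output(int master_fd, uint32_t connector_id, uint32_t crtc_id)
    {
        uint32_t objects[] = { connector_id, crtc_id };
        uint32_t lessee_id = 0;
        /* Returns a new DRM fd restricted to the leased objects; pass it to
         * the other process over a unix socket. The master can revoke it
         * later with drmModeRevokeLease(master_fd, lessee_id). */
        return drmModeCreateLease(master_fd, objects, 2, O_CLOEXEC, &lessee_id);
    }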
Again, the kernel isn't in that business; that's userspace's job. -Daniel