New subject: [PATCH 1/6] fs: fix i_writecount on shmem and friends

19 Mar 2014


      Hi
This series introduces the concept of "file sealing". Sealing a file restricts
the set of allowed operations on the file in question. Multiple seals are
defined and each seal will cause a different set of operations to return EPERM
if it is set. The following seals are introduced:
* SEAL_SHRINK: If set, the inode size cannot be reduced
 * SEAL_GROW: If set, the inode size cannot be increased
 * SEAL_WRITE: If set, the file content cannot be modified
Unlike existing techniques that provide similar protection, sealing allows
file-sharing without any trust-relationship. This is enforced by rejecting seal
modifications if you don't own an exclusive reference to the given file. So if
you own a file-descriptor, you can be sure that no-one besides you can modify
the seals on the given file. This allows mapping shared files from untrusted
parties without the fear of the file getting truncated or modified by an
attacker.
Several use-cases exist that could make great use of sealing:
1) Graphics Compositors
     If a graphics client creates a memory-backed render-buffer and passes a
     file-decsriptor to it to the graphics server for display, the server
     _has_ to setup SIGBUS handlers whenever mapping the given file. Otherwise,
     the client might run ftruncate() or O_TRUNC on the on file in parallel,
     thus crashing the server.
     With sealing, a compositor can reject any incoming file-descriptor that
     does _not_ have SEAL_SHRINK set. This way, any memory-mappings are
     guaranteed to stay accessible. Furthermore, we still allow clients to
     increase the buffer-size in case they want to resize the render-buffer for
     the next frame. We also allow parallel writes so the client can render new
     frames into the same buffer (client is responsible of never rendering into
     a front-buffer if you want to avoid artifacts).
Real use-case: Wayland wl_shm buffers can be transparently converted
2) Geneal-purpose IPC
     IPC mechanisms that do not require a mutual trust-relationship (like dbus)
     cannot do zero-copy so far. With sealing, zero-copy can be easily done by
     sharing a file-descriptor that has SEAL_SHRINK | SEAL_GROW | SEAL_WRITE
     set. This way, the source can store sensible data in the file, seal the
     file and then pass it to the destination. The destination verifies these
     seals are set and then can parse the message in-line.
     Note that these files are usually one-shot files. Without any
     trust-relationship, a destination can notify the source that it released a
     file again, but a source can never rely on it. So unless the destination
     releases the file, a source cannot clear the seals for modification again.
     However, this is inherent to situations without any trust-relationship.
Real use-case: kdbus messages already use a similar interface and can be
                    transparently converted to use these seals
Other similar use-cases exist (eg., audio), but these two I am personally
working on. Interest in this interface has been raised from several other camps
and I've put respective maintainers into CC. If more information on these
use-cases is needed, I think they can give some insights.
The API introduced by this patchset is:
* fcntl() extension:
   Two new fcntl() commands are added that allow retrieveing (SHMEM_GET_SEALS)
   and setting (SHMEM_SET_SEALS) seals on a file. Only shmfs implements them so
   far and there is no intention to implement them on other file-systems.
   All shmfs based files support sealing.
Patch 2/6
* memfd_create() syscall:
   The new memfd_create() syscall is a public frontend to the shmem_file_new()
   interface in the kernel. It avoids the need of a local shmfs mount-point (as
   requested by android people) and acts more like MAP_ANON than O_TMPFILE.
Patch 3/6
The other 4 patches are cleanups, self-tests and docs.
The commit-messages explain the API extensions in detail. Man-page proposals
are also provided. Last but not least, the extensive self-tests document the
intended behavior, in case it is still not clear.
Technically, sealing and memfd_create() are independent, but the described
use-cases would greatly benefit from the combination of both. Hence, I merged
them into the same series. Please also note that this series is based on earlier
works (ashmem, memfd, shmgetfd, ..) and unifies these attempts.
Comments welcome!
Thanks
David
David Herrmann (4):
  fs: fix i_writecount on shmem and friends
  shm: add sealing API
  shm: add memfd_create() syscall
  selftests: add memfd_create() + sealing tests
David Herrmann (2): (man-pages)
  fcntl.2: document SHMEM_SET/GET_SEALS commands
  memfd_create.2: add memfd_create() man-page
arch/x86/syscalls/syscall_32.tbl           |   1 +
 arch/x86/syscalls/syscall_64.tbl           |   1 +
 fs/fcntl.c                                 |  12 +-
 fs/file_table.c                            |  27 +-
 include/linux/shmem_fs.h                   |  17 +
 include/linux/syscalls.h                   |   1 +
 include/uapi/linux/fcntl.h                 |  13 +
 include/uapi/linux/memfd.h                 |   9 +
 kernel/sys_ni.c                            |   1 +
 mm/shmem.c                                 | 267 +++++++-
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/memfd/.gitignore   |   2 +
 tools/testing/selftests/memfd/Makefile     |  29 +
 tools/testing/selftests/memfd/memfd_test.c | 972 +++++++++++++++++++++++++++++
 14 files changed, 1338 insertions(+), 15 deletions(-)
 create mode 100644 include/uapi/linux/memfd.h
 create mode 100644 tools/testing/selftests/memfd/.gitignore
 create mode 100644 tools/testing/selftests/memfd/Makefile
 create mode 100644 tools/testing/selftests/memfd/memfd_test.c
-- 
1.9.0

[PATCH 0/6] File Sealing & memfd_create()