Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de --- init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN + select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore. @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP + bool "Enable kcmp() system call" if EXPERT + default y + help + Enable the file descriptor comparison system call. It provides + user-space with the ability to compare two fd to see if they + point to the same file, and check other attributes. + + If unsure, say Y. + config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS) - SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)"); + SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)"); }
TEST(mode_strict_support)
Am Freitag, dem 05.02.2021 um 16:37 +0000 schrieb Chris Wilson:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
- select KCMP
default n help Enables additional kernel features in a sake of checkpoint/restore. @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
- bool "Enable kcmp() system call" if EXPERT
- default y
- help
Enable the file descriptor comparison system call. It provides
user-space with the ability to compare two fd to see if they
point to the same file, and check other attributes.
This description undersells the abilities of kcmp, while fd compare is the only thing used by the graphics stack, kcmp can compare a handful of other system resources, see man 2 kcmp. I think the helptext should at least try to cover this fact somewhat.
Regards, Lucas
If unsure, say Y.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
- select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
- bool "Enable kcmp() system call" if EXPERT
- default y
I would expect this to be not default-y, especially if CHECKPOINT_RESTORE does a "select" on it.
This is a really powerful syscall, but it is bounded by ptrace access controls, and uses pointer address obfuscation, so it may be okay to expose this. As it is, at least Ubuntu already has CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much difference on exposure.
So, if you drop the "default y", I'm fine with this.
-Kees
- help
Enable the file descriptor comparison system call. It provides
user-space with the ability to compare two fd to see if they
point to the same file, and check other attributes.
If unsure, say Y.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
2.20.1
On Fri, Feb 5, 2021 at 7:37 PM Kees Cook keescook@chromium.org wrote:
On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
bool "Enable kcmp() system call" if EXPERT
default y
I would expect this to be not default-y, especially if CHECKPOINT_RESTORE does a "select" on it.
This is a really powerful syscall, but it is bounded by ptrace access controls, and uses pointer address obfuscation, so it may be okay to expose this. As it is, at least Ubuntu already has CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much difference on exposure.
So, if you drop the "default y", I'm fine with this.
It was maybe stupid, but our userspace started relying on fd comaprison through sys_kcomp. So for better or worse, if you want to run the mesa3d gl/vk stacks, you need this. Was maybe not the brighest ideas, but since enough distros had this enabled by defaults, it wasn't really discovered, and now we're shipping this everywhere.
Ofc we can leave the default n, but the select if CONFIG_DRM is unfortunately needed I think. For that part:
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
Also adding Dave Airlie for his take. -Daniel
-Kees
help
Enable the file descriptor comparison system call. It provides
user-space with the ability to compare two fd to see if they
point to the same file, and check other attributes.
If unsure, say Y.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
2.20.1
-- Kees Cook
On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
On Fri, Feb 5, 2021 at 7:37 PM Kees Cook keescook@chromium.org wrote:
On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
bool "Enable kcmp() system call" if EXPERT
default y
I would expect this to be not default-y, especially if CHECKPOINT_RESTORE does a "select" on it.
This is a really powerful syscall, but it is bounded by ptrace access controls, and uses pointer address obfuscation, so it may be okay to expose this. As it is, at least Ubuntu already has CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much difference on exposure.
So, if you drop the "default y", I'm fine with this.
It was maybe stupid, but our userspace started relying on fd comaprison through sys_kcomp. So for better or worse, if you want to run the mesa3d gl/vk stacks, you need this.
That's overstating things somewhat. The vast majority of applications will work fine regardless (as they did before Mesa started using this functionality). Only some special ones will run into issues, because the user-space drivers incorrectly assume two file descriptors reference different descriptions.
Was maybe not the brighest ideas, but since enough distros had this enabled by defaults,
Right, that (and the above) is why I considered it fair game to use. What should I have done instead? (TBH I was surprised that this functionality isn't generally available)
it wasn't really discovered, and now we're shipping this everywhere.
You're making it sound like this snuck in secretly somehow, which is not true of course.
Ofc we can leave the default n, but the select if CONFIG_DRM is unfortunately needed I think.
Per above, not sure this is really true.
On 2021-02-08 12:49 p.m., Michel Dänzer wrote:
On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
On Fri, Feb 5, 2021 at 7:37 PM Kees Cook keescook@chromium.org wrote:
On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN + select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore. @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP + bool "Enable kcmp() system call" if EXPERT + default y
I would expect this to be not default-y, especially if CHECKPOINT_RESTORE does a "select" on it.
This is a really powerful syscall, but it is bounded by ptrace access controls, and uses pointer address obfuscation, so it may be okay to expose this. As it is, at least Ubuntu already has CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much difference on exposure.
So, if you drop the "default y", I'm fine with this.
It was maybe stupid, but our userspace started relying on fd comaprison through sys_kcomp. So for better or worse, if you want to run the mesa3d gl/vk stacks, you need this.
That's overstating things somewhat. The vast majority of applications will work fine regardless (as they did before Mesa started using this functionality). Only some special ones will run into issues, because the user-space drivers incorrectly assume two file descriptors reference different descriptions.
Was maybe not the brighest ideas, but since enough distros had this enabled by defaults,
Right, that (and the above) is why I considered it fair game to use. What should I have done instead? (TBH I was surprised that this functionality isn't generally available)
In that spirit, an alternative might be to make KCMP_FILE available unconditionally, and the rest of SYS_kcmp only with CHECKPOINT_RESTORE as before. (Or maybe other parts of SYS_kcmp are generally useful as well?)
On Mon, Feb 8, 2021 at 12:49 PM Michel Dänzer michel@daenzer.net wrote:
On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
On Fri, Feb 5, 2021 at 7:37 PM Kees Cook keescook@chromium.org wrote:
On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
bool "Enable kcmp() system call" if EXPERT
default y
I would expect this to be not default-y, especially if CHECKPOINT_RESTORE does a "select" on it.
This is a really powerful syscall, but it is bounded by ptrace access controls, and uses pointer address obfuscation, so it may be okay to expose this. As it is, at least Ubuntu already has CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much difference on exposure.
So, if you drop the "default y", I'm fine with this.
It was maybe stupid, but our userspace started relying on fd comaprison through sys_kcomp. So for better or worse, if you want to run the mesa3d gl/vk stacks, you need this.
That's overstating things somewhat. The vast majority of applications will work fine regardless (as they did before Mesa started using this functionality). Only some special ones will run into issues, because the user-space drivers incorrectly assume two file descriptors reference different descriptions.
Was maybe not the brighest ideas, but since enough distros had this enabled by defaults,
Right, that (and the above) is why I considered it fair game to use. What should I have done instead? (TBH I was surprised that this functionality isn't generally available)
Yeah that one is fine, but I thought we've discussed (irc or something) more uses for de-duping dma-buf and stuff like that. But quick grep says that hasn't landed yet, so I got a bit confused (or just dreamt). Looking at this again I'm kinda surprised the drmfd de-duping blows up on normal linux distros, but I guess it can all happen.
it wasn't really discovered, and now we're shipping this everywhere.
You're making it sound like this snuck in secretly somehow, which is not true of course.
Ofc we can leave the default n, but the select if CONFIG_DRM is unfortunately needed I think.
Per above, not sure this is really true.
We seem to be going boom on linux distros now, maybe userspace got more creative in abusing stuff? The entire thing is small enough that imo we don't really have to care, e.g. we also unconditionally select dma-buf, despite that on most systems there's only 1 gpu, and you're never going to end up with a buffer sharing case that needs any of that code (aside from the "here's an fd" part).
But I guess we can limit to just KCMP_FILE like you suggest in another reply. Just feels a bit like overkill. -Daniel
On 2021-02-08 2:34 p.m., Daniel Vetter wrote:
On Mon, Feb 8, 2021 at 12:49 PM Michel Dänzer michel@daenzer.net wrote:
On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
On Fri, Feb 5, 2021 at 7:37 PM Kees Cook keescook@chromium.org wrote:
On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de
init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..f62fca13ac5b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
bool "Enable kcmp() system call" if EXPERT
default y
I would expect this to be not default-y, especially if CHECKPOINT_RESTORE does a "select" on it.
This is a really powerful syscall, but it is bounded by ptrace access controls, and uses pointer address obfuscation, so it may be okay to expose this. As it is, at least Ubuntu already has CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much difference on exposure.
So, if you drop the "default y", I'm fine with this.
It was maybe stupid, but our userspace started relying on fd comaprison through sys_kcomp. So for better or worse, if you want to run the mesa3d gl/vk stacks, you need this.
That's overstating things somewhat. The vast majority of applications will work fine regardless (as they did before Mesa started using this functionality). Only some special ones will run into issues, because the user-space drivers incorrectly assume two file descriptors reference different descriptions.
Was maybe not the brighest ideas, but since enough distros had this enabled by defaults,
Right, that (and the above) is why I considered it fair game to use. What should I have done instead? (TBH I was surprised that this functionality isn't generally available)
Yeah that one is fine, but I thought we've discussed (irc or something) more uses for de-duping dma-buf and stuff like that. But quick grep says that hasn't landed yet, so I got a bit confused (or just dreamt). Looking at this again I'm kinda surprised the drmfd de-duping blows up on normal linux distros, but I guess it can all happen.
One example: GEM handle name-spaces are per file description. If user-space incorrectly assumes two DRM fds are independent, when they actually reference the same file description, closing a GEM handle with one file descriptor will make it unusable with the other file descriptor as well.
Ofc we can leave the default n, but the select if CONFIG_DRM is unfortunately needed I think.
Per above, not sure this is really true.
We seem to be going boom on linux distros now, maybe userspace got more creative in abusing stuff?
I don't know what you're referring to. I've only seen maybe two or three reports from people who didn't enable CHECKPOINT_RESTORE in their self-built kernels.
The entire thing is small enough that imo we don't really have to care, e.g. we also unconditionally select dma-buf, despite that on most systems there's only 1 gpu, and you're never going to end up with a buffer sharing case that needs any of that code (aside from the "here's an fd" part).
But I guess we can limit to just KCMP_FILE like you suggest in another reply. Just feels a bit like overkill.
Making KCMP_FILE gated by DRM makes as little sense to me as by CHECKPOINT_RESTORE.
Hi Chris,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linux/master] [also build test ERROR on kees/for-next/seccomp kees/for-next/pstore linus/master v5.11-rc6 next-20210125] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chris-Wilson/kernel-Expose-SYS_kcmp... base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 config: powerpc-randconfig-r023-20210205 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project c9439ca36342fb6013187d0a69aef92736951476) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc cross compiling tool for clang build # apt-get install binutils-powerpc-linux-gnu # https://github.com/0day-ci/linux/commit/f7694e48ae81aac5a226e74421dbda1dcdc3... git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Chris-Wilson/kernel-Expose-SYS_kcmp-by-default/20210206-004006 git checkout f7694e48ae81aac5a226e74421dbda1dcdc3ca92 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot lkp@intel.com
All error/warnings (new ones prefixed by >>):
In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:190:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from kernel/kcmp.c:3: In file included from include/linux/syscalls.h:84: In file included from include/trace/syscall.h:7: In file included from include/linux/trace_events.h:9: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:192:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from kernel/kcmp.c:3: In file included from include/linux/syscalls.h:84: In file included from include/trace/syscall.h:7: In file included from include/linux/trace_events.h:9: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:194:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from kernel/kcmp.c:3: In file included from include/linux/syscalls.h:84: In file included from include/trace/syscall.h:7: In file included from include/linux/trace_events.h:9: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:196:1: note: expanded from here __do_outsw ^ arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from kernel/kcmp.c:3: In file included from include/linux/syscalls.h:84: In file included from include/trace/syscall.h:7: In file included from include/linux/trace_events.h:9: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:198:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^
kernel/kcmp.c:117:13: error: implicit declaration of function 'get_epoll_tfile_raw_ptr' [-Werror,-Wimplicit-function-declaration]
filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); ^ kernel/kcmp.c:117:13: note: did you mean 'get_file_raw_ptr'? kernel/kcmp.c:62:1: note: 'get_file_raw_ptr' declared here get_file_raw_ptr(struct task_struct *task, unsigned int idx) ^
kernel/kcmp.c:117:11: warning: incompatible integer to pointer conversion assigning to 'struct file *' from 'int' [-Wint-conversion]
filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 13 warnings and 1 error generated.
vim +/get_epoll_tfile_raw_ptr +117 kernel/kcmp.c
d97b46a64674a2 Cyrill Gorcunov 2012-05-31 96 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 97 #ifdef CONFIG_EPOLL 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 98 static int kcmp_epoll_target(struct task_struct *task1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 99 struct task_struct *task2, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 100 unsigned long idx1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 101 struct kcmp_epoll_slot __user *uslot) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 102 { 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 103 struct file *filp, *filp_epoll, *filp_tgt; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 104 struct kcmp_epoll_slot slot; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 105 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 106 if (copy_from_user(&slot, uslot, sizeof(slot))) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 107 return -EFAULT; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 108 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 109 filp = get_file_raw_ptr(task1, idx1); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 110 if (!filp) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 111 return -EBADF; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 112 f43c283a89a7dc Eric W. Biederman 2020-11-20 113 filp_epoll = fget_task(task2, slot.efd); f43c283a89a7dc Eric W. Biederman 2020-11-20 114 if (!filp_epoll) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 115 return -EBADF; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 116 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 @117 filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 118 fput(filp_epoll); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 119 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 120 if (IS_ERR(filp_tgt)) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 121 return PTR_ERR(filp_tgt); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 122 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 123 return kcmp_ptr(filp, filp_tgt, KCMP_FILE); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 124 } 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 125 #else 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 126 static int kcmp_epoll_target(struct task_struct *task1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 127 struct task_struct *task2, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 128 unsigned long idx1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 129 struct kcmp_epoll_slot __user *uslot) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 130 { 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 131 return -EOPNOTSUPP; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 132 } 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 133 #endif 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 134
--- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Chris,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linux/master] [also build test ERROR on kees/for-next/seccomp kees/for-next/pstore linus/master v5.11-rc6 next-20210125] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Chris-Wilson/kernel-Expose-SYS_kcmp... base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 config: i386-randconfig-s002-20210205 (attached as .config) compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.3-215-g0fb77bb6-dirty # https://github.com/0day-ci/linux/commit/f7694e48ae81aac5a226e74421dbda1dcdc3... git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Chris-Wilson/kernel-Expose-SYS_kcmp-by-default/20210206-004006 git checkout f7694e48ae81aac5a226e74421dbda1dcdc3ca92 # save the attached .config to linux build tree make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386
If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot lkp@intel.com
All error/warnings (new ones prefixed by >>):
kernel/kcmp.c: In function 'kcmp_epoll_target':
kernel/kcmp.c:117:13: error: implicit declaration of function 'get_epoll_tfile_raw_ptr'; did you mean 'get_file_raw_ptr'? [-Werror=implicit-function-declaration]
117 | filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); | ^~~~~~~~~~~~~~~~~~~~~~~ | get_file_raw_ptr
kernel/kcmp.c:117:11: warning: assignment to 'struct file *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
117 | filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); | ^ cc1: some warnings being treated as errors
vim +117 kernel/kcmp.c
d97b46a64674a2 Cyrill Gorcunov 2012-05-31 96 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 97 #ifdef CONFIG_EPOLL 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 98 static int kcmp_epoll_target(struct task_struct *task1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 99 struct task_struct *task2, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 100 unsigned long idx1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 101 struct kcmp_epoll_slot __user *uslot) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 102 { 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 103 struct file *filp, *filp_epoll, *filp_tgt; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 104 struct kcmp_epoll_slot slot; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 105 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 106 if (copy_from_user(&slot, uslot, sizeof(slot))) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 107 return -EFAULT; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 108 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 109 filp = get_file_raw_ptr(task1, idx1); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 110 if (!filp) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 111 return -EBADF; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 112 f43c283a89a7dc Eric W. Biederman 2020-11-20 113 filp_epoll = fget_task(task2, slot.efd); f43c283a89a7dc Eric W. Biederman 2020-11-20 114 if (!filp_epoll) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 115 return -EBADF; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 116 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 @117 filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 118 fput(filp_epoll); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 119 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 120 if (IS_ERR(filp_tgt)) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 121 return PTR_ERR(filp_tgt); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 122 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 123 return kcmp_ptr(filp, filp_tgt, KCMP_FILE); 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 124 } 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 125 #else 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 126 static int kcmp_epoll_target(struct task_struct *task1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 127 struct task_struct *task2, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 128 unsigned long idx1, 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 129 struct kcmp_epoll_slot __user *uslot) 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 130 { 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 131 return -EOPNOTSUPP; 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 132 } 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 133 #endif 0791e3644e5ef2 Cyrill Gorcunov 2017-07-12 134
--- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on SYS_kcmp
--- v2: - Default n. - Borrrow help message from man kcmp. - Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP --- fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 12 ++++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/** * Adds a new entry to the tail of the list in a lockless way, i.e. diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..1b75141bc18b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN + select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore. @@ -1737,6 +1738,17 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP + bool "Enable kcmp() system call" if EXPERT + default n + help + Enable the kernel resource comparison system call. It provides + user-space with the ability to compare two processes to see if they + share a common resource, such as a file descriptor or even virtual + memory space. + + If unsure, say N. + config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS) - SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)"); + SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)"); }
TEST(mode_strict_support)
The subject should of course be changed, as it is no longer being enabled by default.
Something like
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
Quoting Chris Wilson (2021-02-05 21:06:10)
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on SYS_kcmp
v2:
- Default n.
- Borrrow help message from man kcmp.
- Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 12 ++++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw;
} -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/**
- Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..1b75141bc18b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,17 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
bool "Enable kcmp() system call" if EXPERT
default n
help
Enable the kernel resource comparison system call. It provides
user-space with the ability to compare two processes to see if they
share a common resource, such as a file descriptor or even virtual
memory space.
If unsure, say N.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
2.20.1
On Fri, Feb 05, 2021 at 09:16:01PM +0000, Chris Wilson wrote:
The subject should of course be changed, as it is no longer being enabled by default.
"default n" is redundant. I thought Daniel said CONFIG_DRM needed to "select" it too, though? Otherwise, yeah, this looks good. Was the export due to the 0-day bot failure reports?
-Kees
Something like
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
Quoting Chris Wilson (2021-02-05 21:06:10)
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on SYS_kcmp
v2:
- Default n.
- Borrrow help message from man kcmp.
- Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 12 ++++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw;
} -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/**
- Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..1b75141bc18b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,17 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
bool "Enable kcmp() system call" if EXPERT
default n
help
Enable the kernel resource comparison system call. It provides
user-space with the ability to compare two processes to see if they
share a common resource, such as a file descriptor or even virtual
memory space.
If unsure, say N.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
2.20.1
Quoting Kees Cook (2021-02-05 21:20:33)
On Fri, Feb 05, 2021 at 09:16:01PM +0000, Chris Wilson wrote:
The subject should of course be changed, as it is no longer being enabled by default.
"default n" is redundant.
I thought being explicit would be preferred. There are a few other default n, so at least it's not the odd-one-out!
I thought Daniel said CONFIG_DRM needed to "select" it too, though?
Yes. We will need to select it for any DRM driver so that the Vulkan/GL stacks can rely on having SYS_kcmp. That deserves to be handled and explain within drm/Kconfig, and as they are already shipping with calls to SYS_kcmp we may have to ask for a stable backport.
Otherwise, yeah, this looks good. Was the export due to the 0-day bot failure reports?
Yes. -Chris
On Fri, Feb 5, 2021 at 10:28 PM Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Kees Cook (2021-02-05 21:20:33)
On Fri, Feb 05, 2021 at 09:16:01PM +0000, Chris Wilson wrote:
The subject should of course be changed, as it is no longer being enabled by default.
"default n" is redundant.
I thought being explicit would be preferred. There are a few other default n, so at least it's not the odd-one-out!
I thought Daniel said CONFIG_DRM needed to "select" it too, though?
Yes. We will need to select it for any DRM driver so that the Vulkan/GL stacks can rely on having SYS_kcmp. That deserves to be handled and explain within drm/Kconfig, and as they are already shipping with calls to SYS_kcmp we may have to ask for a stable backport.
Oh I dreamed and thought it's part of this patch already. So v3 with matching subject to enabled it for drm? -Daniel
Otherwise, yeah, this looks good. Was the export due to the 0-day bot failure reports?
Yes. -Chris
On 05/02/2021 22.06, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
Looks a lot like https://lore.kernel.org/lkml/20200710075632.14661-1-linux@rasmusvillemoes.dk.... So FWIW, ack from me.
cc += Cyrill.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on SYS_kcmp
v2:
- Default n.
- Borrrow help message from man kcmp.
- Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 12 ++++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/**
- Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..1b75141bc18b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
- select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,17 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
- bool "Enable kcmp() system call" if EXPERT
- default n
- help
Enable the kernel resource comparison system call. It provides
user-space with the ability to compare two processes to see if they
share a common resource, such as a file descriptor or even virtual
memory space.
If unsure, say N.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Cyrill Gorcunov gorcunov@gmail.com Cc: stable@vger.kernel.org Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on kcmp Acked-by: Rasmus Villemoes linux@rasmusvillemoes.dk # systemd uses kcmp
--- v2: - Default n. - Borrrow help message from man kcmp. - Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP v3: - Select KCMP for CONFIG_DRM --- drivers/gpu/drm/Kconfig | 3 +++ fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 6 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 0973f408d75f..af6c6d214d91 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -15,6 +15,9 @@ menuconfig DRM select I2C_ALGOBIT select DMA_SHARED_BUFFER select SYNC_FILE +# gallium uses SYS_kcmp for os_same_file_description() to de-duplicate +# device and dmabuf fd. Let's make sure that is available for our userspace. + select KCMP help Kernel-level support for the Direct Rendering Infrastructure (DRI) introduced in XFree86 4.0. If you say Y here, you need to select diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/** * Adds a new entry to the tail of the list in a lockless way, i.e. diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..9cc7436b2f73 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN + select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore. @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP + bool "Enable kcmp() system call" if EXPERT + help + Enable the kernel resource comparison system call. It provides + user-space with the ability to compare two processes to see if they + share a common resource, such as a file descriptor or even virtual + memory space. + + If unsure, say N. + config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS) - SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)"); + SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)"); }
TEST(mode_strict_support)
On Fri, Feb 05, 2021 at 10:00:12PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
... Reviewed-by: Cyrill Gorcunov gorcunov@gmail.com
On Fri, Feb 05, 2021 at 10:00:12PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Thanks!
Reviewed-by: Kees Cook keescook@chromium.org
-Kees
Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Cyrill Gorcunov gorcunov@gmail.com Cc: stable@vger.kernel.org Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on kcmp Acked-by: Rasmus Villemoes linux@rasmusvillemoes.dk # systemd uses kcmp
v2:
- Default n.
- Borrrow help message from man kcmp.
- Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
v3:
- Select KCMP for CONFIG_DRM
drivers/gpu/drm/Kconfig | 3 +++ fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 6 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 0973f408d75f..af6c6d214d91 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -15,6 +15,9 @@ menuconfig DRM select I2C_ALGOBIT select DMA_SHARED_BUFFER select SYNC_FILE +# gallium uses SYS_kcmp for os_same_file_description() to de-duplicate +# device and dmabuf fd. Let's make sure that is available for our userspace.
- select KCMP help Kernel-level support for the Direct Rendering Infrastructure (DRI) introduced in XFree86 4.0. If you say Y here, you need to select
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/**
- Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..9cc7436b2f73 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
- select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
- bool "Enable kcmp() system call" if EXPERT
- help
Enable the kernel resource comparison system call. It provides
user-space with the ability to compare two processes to see if they
share a common resource, such as a file descriptor or even virtual
memory space.
If unsure, say N.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
2.20.1
On Mon, Feb 08, 2021 at 02:12:00PM -0800, Kees Cook wrote:
On Fri, Feb 05, 2021 at 10:00:12PM +0000, Chris Wilson wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Thanks!
Reviewed-by: Kees Cook keescook@chromium.org
Thanks for reviews&patch, I stuffed it into a topic branch and plan to send it to Linus later this week.
Cheers, Daniel
-Kees
Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Cyrill Gorcunov gorcunov@gmail.com Cc: stable@vger.kernel.org Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on kcmp Acked-by: Rasmus Villemoes linux@rasmusvillemoes.dk # systemd uses kcmp
v2:
- Default n.
- Borrrow help message from man kcmp.
- Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
v3:
- Select KCMP for CONFIG_DRM
drivers/gpu/drm/Kconfig | 3 +++ fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 6 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 0973f408d75f..af6c6d214d91 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -15,6 +15,9 @@ menuconfig DRM select I2C_ALGOBIT select DMA_SHARED_BUFFER select SYNC_FILE +# gallium uses SYS_kcmp for os_same_file_description() to de-duplicate +# device and dmabuf fd. Let's make sure that is available for our userspace.
- select KCMP help Kernel-level support for the Direct Rendering Infrastructure (DRI) introduced in XFree86 4.0. If you say Y here, you need to select
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/**
- Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..9cc7436b2f73 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
- select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
- bool "Enable kcmp() system call" if EXPERT
- help
Enable the kernel resource comparison system call. It provides
user-space with the ability to compare two processes to see if they
share a common resource, such as a file descriptor or even virtual
memory space.
If unsure, say N.
config RSEQ bool "Enable rseq() system call" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
2.20.1
-- Kees Cook
On Fri, 5 Feb 2021 at 22:01, Chris Wilson chris@chris-wilson.co.uk wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf)
As you rightfully point out, SYS_kcmp is a bit of a two edged sword. While you mention the CONFIG issue, there is also a portability aspect (mesa runs on more than just linux) and as well as sandbox filtering of the extra syscall.
Last time I looked, the latter was still an issue and mesa was using SYS_kcmp to compare device node fds. A far shorter and more portable solution is possible, so let me prepare a Mesa patch.
-Emil
On Friday, February 12th, 2021 at 1:57 PM, Emil Velikov emil.l.velikov@gmail.com wrote:
On Fri, 5 Feb 2021 at 22:01, Chris Wilson chris@chris-wilson.co.uk wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf)
As you rightfully point out, SYS_kcmp is a bit of a two edged sword. While you mention the CONFIG issue, there is also a portability aspect (mesa runs on more than just linux) and as well as sandbox filtering of the extra syscall.
Last time I looked, the latter was still an issue and mesa was using SYS_kcmp to compare device node fds. A far shorter and more portable solution is possible, so let me prepare a Mesa patch.
Comparing two DMA-BUFs can be done with their inode number, I think.
Comparing two device FDs is more subtle, because of GEM handle ref'counting. You sometimes really want to check whether two FDs are backed by the same file *description*. See [1] for details.
[1]: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/110
On Fri, 12 Feb 2021 at 13:14, Simon Ser contact@emersion.fr wrote:
On Friday, February 12th, 2021 at 1:57 PM, Emil Velikov emil.l.velikov@gmail.com wrote:
On Fri, 5 Feb 2021 at 22:01, Chris Wilson chris@chris-wilson.co.uk wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf)
As you rightfully point out, SYS_kcmp is a bit of a two edged sword. While you mention the CONFIG issue, there is also a portability aspect (mesa runs on more than just linux) and as well as sandbox filtering of the extra syscall.
Last time I looked, the latter was still an issue and mesa was using SYS_kcmp to compare device node fds. A far shorter and more portable solution is possible, so let me prepare a Mesa patch.
Comparing two DMA-BUFs can be done with their inode number, I think.
Comparing two device FDs is more subtle, because of GEM handle ref'counting. You sometimes really want to check whether two FDs are backed by the same file *description*. See [1] for details.
Thanks for the correction and the reference. Seems like I've short circuited file description table vs file descriptor.
Emil
On 2021-02-12 1:57 p.m., Emil Velikov wrote:
On Fri, 5 Feb 2021 at 22:01, Chris Wilson chris@chris-wilson.co.uk wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf)
As you rightfully point out, SYS_kcmp is a bit of a two edged sword. While you mention the CONFIG issue, there is also a portability aspect (mesa runs on more than just linux) and as well as sandbox filtering of the extra syscall.
Last time I looked, the latter was still an issue and mesa was using SYS_kcmp to compare device node fds. A far shorter and more portable solution is possible, so let me prepare a Mesa patch.
Make sure to read my comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6881 first. :)
On Fri, 12 Feb 2021 at 14:01, Michel Dänzer michel@daenzer.net wrote:
On 2021-02-12 1:57 p.m., Emil Velikov wrote:
On Fri, 5 Feb 2021 at 22:01, Chris Wilson chris@chris-wilson.co.uk wrote:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf)
As you rightfully point out, SYS_kcmp is a bit of a two edged sword. While you mention the CONFIG issue, there is also a portability aspect (mesa runs on more than just linux) and as well as sandbox filtering of the extra syscall.
Last time I looked, the latter was still an issue and mesa was using SYS_kcmp to compare device node fds. A far shorter and more portable solution is possible, so let me prepare a Mesa patch.
Make sure to read my comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6881 first. :)
Much appreciated. I might have been "slightly" off - pardon for the noise o/
-Emil
Hi
Am 05.02.21 um 23:00 schrieb Chris Wilson:
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store.
This helps a lot with transactional programming in userspace system code. So FWIW
Acked-by: Thomas Zimmermann tzimmermann@suse.de
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Dave Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Lucas Stach l.stach@pengutronix.de Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Cyrill Gorcunov gorcunov@gmail.com Cc: stable@vger.kernel.org Acked-by: Daniel Vetter daniel.vetter@ffwll.ch # DRM depends on kcmp Acked-by: Rasmus Villemoes linux@rasmusvillemoes.dk # systemd uses kcmp
v2:
- Default n.
- Borrrow help message from man kcmp.
- Export get_epoll_tfile_raw_ptr() for CONFIG_KCMP
v3:
- Select KCMP for CONFIG_DRM
drivers/gpu/drm/Kconfig | 3 +++ fs/eventpoll.c | 4 ++-- include/linux/eventpoll.h | 2 +- init/Kconfig | 11 +++++++++++ kernel/Makefile | 2 +- tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 6 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 0973f408d75f..af6c6d214d91 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -15,6 +15,9 @@ menuconfig DRM select I2C_ALGOBIT select DMA_SHARED_BUFFER select SYNC_FILE +# gallium uses SYS_kcmp for os_same_file_description() to de-duplicate +# device and dmabuf fd. Let's make sure that is available for our userspace.
- select KCMP help Kernel-level support for the Direct Rendering Infrastructure (DRI) introduced in XFree86 4.0. If you say Y here, you need to select
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a829af074eb5..3196474cbe24 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,7 +979,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; }
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1021,7 +1021,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP */
/**
- Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 0350393465d4..593322c946e6 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,7 +18,7 @@ struct file;
#ifdef CONFIG_EPOLL
-#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); #endif
diff --git a/init/Kconfig b/init/Kconfig index b77c60f8b963..9cc7436b2f73 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1194,6 +1194,7 @@ endif # NAMESPACES config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN
- select KCMP default n help Enables additional kernel features in a sake of checkpoint/restore.
@@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool
+config KCMP
- bool "Enable kcmp() system call" if EXPERT
- help
Enable the kernel resource comparison system call. It provides
user-space with the ability to compare two processes to see if they
share a common resource, such as a file descriptor or even virtual
memory space.
If unsure, say N.
- config RSEQ bool "Enable rseq() system call" if EXPERT default y
diff --git a/kernel/Makefile b/kernel/Makefile index aa7368c7eabf..320f1f3941b7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -51,7 +51,7 @@ obj-y += livepatch/ obj-y += dma/ obj-y += entry/
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 26c72f2b61b1..1b6c7d33c4ff 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -315,7 +315,7 @@ TEST(kcmp) ret = __filecmp(getpid(), getpid(), 1, 1); EXPECT_EQ(ret, 0); if (ret != 0 && errno == ENOSYS)
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_CHECKPOINT_RESTORE?)");
SKIP(return, "Kernel does not support kcmp() (missing CONFIG_KCMP?)");
}
TEST(mode_strict_support)
Hi!
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Is it good idea to enable everything because Mesa uses it for file descriptors?
This is really interesting syscall...
Best regards, Pavel
Am Samstag, dem 13.02.2021 um 18:40 +0100 schrieb Pavel Machek:
Hi!
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Is it good idea to enable everything because Mesa uses it for file descriptors?
This is really interesting syscall...
As Debian/Ubuntu and Fedora are already shipping with CONFIG_CHECKPOINT_RESTORE=y in their kernel configs, I don't really see the need to add further restrictions here. Or this discussion should have happened a while ago...
Regards, Lucas
dri-devel@lists.freedesktop.org