Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

17 Jan 2022

      Am 2022-01-17 um 9:21 a.m. schrieb Christian König:
...
Am 17.01.22 um 15:17 schrieb Felix Kuehling:
...
Am 2022-01-17 um 6:44 a.m. schrieb Christian König:
...
Am 14.01.22 um 18:40 schrieb Felix Kuehling:
...
Am 2022-01-14 um 12:26 p.m. schrieb Christian König:
...
Am 14.01.22 um 17:44 schrieb Daniel Vetter:
...
Top post because I tried to catch up on the entire discussion here.
So fundamentally I'm not opposed to just close this fork() hole
once and
for all. The thing that worries me from a upstream/platform pov is
really
only if we don't do it consistently across all drivers.
So maybe as an idea:

Do the original patch, but not just for ttm but all gem rendernode

drivers at least (or maybe even all gem drivers, no idea), with
the
     below discussion cleaned up as justification.
I know of at least one use case which this will break.
A couple of years back we had a discussion on the Mesa mailing list
because (IIRC) Marek introduced a background thread to push command
submissions to the kernel.
That broke because some compositor used to initialize OpenGL and then
do a fork(). This indeed worked previously (no GPUVM at that time),
but with the addition of the backround thread obviously broke.
The conclusion back then was that the compositor is broken and needs
fixing, but it still essentially means that there could be people out
there with really old userspace where this setting would just break
the desktop.
I'm not really against that change either, but at least in theory we
could make fork() work perfectly fine even with VMs and background
threads.
You may regret this if you ever try to build a shared virtual address
space between GPU and CPU. Then you have two processes (parent and
child) sharing the same render context and GPU VM address space.
But the
CPU address spaces are different. You can't maintain consistent shared
virtual address spaces for both processes when the GPU address
space is
shared between them.
That's actually not much of a problem.
All you need to do is to use pthread_atfork() and do the appropriate
action in parent/child to clean up your context:
https://man7.org/linux/man-pages/man3/pthread_atfork.3.html
Thunk already does that. However, it's not foolproof. pthread_atfork
hanlders aren't called when the process is forked with a clone call.
Yeah, but that's perfectly intentional. clone() is usually used to
create threads.
Clone can be used to create new processes. Maybe not the common use today.
...
...
...
The rest is just to make sure that all shared and all private data are
kept separate all the time. Sharing virtual memory is already done for
decades this way, it's just that nobody ever did it with a statefull
device like GPUs.
My concern is not with sharing or not sharing data. It's with sharing
the address space itself. If you share the render node, you share GPU
virtual address space. However CPU address space is not shared between
parent and child. That's a fundamental mismatch between the CPU world
and current GPU driver implementation.
Correct, but even that is easily solvable. As I said before you can
hang this state on a VMA and let it be cloned together with the CPU
address space.
I'm not following. The address space I'm talking about is struct
amdgpu_vm. It's associated with the render node file descriptor.
Inheriting and using that file descriptor in the child inherits the
amdgpu_vm. I don't see how you can hang that state on any one VMA.
To be consistent with the CPU, you'd need to clone the GPU address space
(struct amdgpu_vm) in the child process. That means you need a new
render node file descriptor that imports all the BOs from the parent
address space. It's a bunch of extra work to fork a process, that you're
proposing to immediately undo with an atfork handler. So I really don't
see the point.
Regards,
  Felix
...
Since VMAs are informed about their cloning (in opposite to file
descriptors) it's trivial to even just clone kernel data on first access.
Regards,
Christian.
...
Regards,
   Felix
...
Regards,
Christian.
...
Regards,
    Felix

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process