Hello.
I tried many times to ask about the supposed behaviour of block_all_signals() in drm, but it seems nobody can answer.
So I am going to send the patch which simply removes block_all_signals() and friends. There are numeruous problems with this interace, I can't even enumerate them. But I think that it is enough to mention that block_all_signals() simply can not work. AT ALL. I am wondering, was it ever tested and how.
So. ioctl()->drm_lock() "blocks" the stop signals. Probably to ensure the task can't be stopped until it does DRM_IOCTL_UNLOCK. And what does this mean? Yes, the task won't stop if it receives, say, SIGTSTP. But! Instead it will loop forever in kernel mode until it receives another unblocked/non-ignored signal which should be numerically less than SIGSTOP.
Why do we need this? Once again. block_all_signals(SIGTSTP) only means that the caller will burn cpu instead of sleeping in TASK_STOPPED after ^Z. What is the point?
And once again, there are other problems. For example, even if block_all_signals() actually blocked SIGSTOP/etc, this can not help if the caller is multithreaded.
I strongly believe block_all_signals() should die. Given that it doesn't work, could somebody please explain me what will be broken?
Just in case... Please look at the debugging patch below. With this patch,
$ perl -le 'syscall 157,666 and die $!; sleep 1, print while ++$_' 1 2 3 ^Z
Hang. So it does react to ^Z anyway, just it is looping in the endless loop in the kernel. It can only look as if ^Z is ignored, because obviously bash doesn't see it stopped.
Now lets look at drm_notifier(). If it returns 0 it does:
/* Otherwise, set flag to force call to drmUnlock */
drmUnlock? grep shows nothing...
do { old = s->lock->lock; new = old | _DRM_LOCK_CONT; prev = cmpxchg(&s->lock->lock, old, new); } while (prev != old); return 0;
OK. So, if block_all_signals() makes any sense, it seems that this is only because we add _DRM_LOCK_CONT.
Who checks _DRM_LOCK_CONT? _DRM_LOCK_IS_CONT(), but it has no users. Hmm. Looks like via_release_futex() is the only user, but it doesn't look as "force call to drmUnlock" and it is CONFIG_DRM_VIA only.
I am totally confused. But block_all_signals() should die anyway.
We can probably implement something like 'i-am-going-to-stop' or even 'can-i-stop' per-thread notifiers, although this all looks like the user-space problem to me (yes, I know absolutely nothing about drm/etc).
If nothing else. We can change drm_lock/drm_unlock to literally block/unblock SIGSTOP/etc (or perhaps we only should worry about the signals from tty?). This is the awful hack and this can't work with the multithreaded tasks too, but still it is better than what we have now.
Oleg.
--- a/kernel/sys.c~ 2011-06-16 20:12:18.000000000 +0200 +++ b/kernel/sys.c 2011-07-12 16:24:50.000000000 +0200 @@ -1614,6 +1614,11 @@ SYSCALL_DEFINE1(umask, int, mask) return mask; }
+static int notifier(void *arg) +{ + return 0; +} + SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, unsigned long, arg4, unsigned long, arg5) { @@ -1627,6 +1632,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
error = 0; switch (option) { + case 666: { + sigset_t *pmask = kmalloc(sizeof(*pmask), GFP_KERNEL); + siginitset(pmask, sigmask(SIGTSTP)); + block_all_signals(notifier, NULL, pmask); + break; + } + case PR_SET_PDEATHSIG: if (!valid_signal(arg2)) { error = -EINVAL;
On Tue, Jul 12, 2011 at 11:15 AM, Oleg Nesterov oleg@redhat.com wrote:
I tried many times to ask about the supposed behaviour of block_all_signals() in drm, but it seems nobody can answer.
It's always been broken, I think. We've had threads about it before (years and years ago). I'd certainly be willing to just remove it entirely, and see if anything pops up that ends up actually working and being saner.
Linus
On Tue, Jul 12, 2011 at 7:15 PM, Oleg Nesterov oleg@redhat.com wrote:
Hello.
I tried many times to ask about the supposed behaviour of block_all_signals() in drm, but it seems nobody can answer.
Its not hard to explain, basically there exists hardware that you program acceleration engines using MMIO, having multiple processes writing to the MMIO area at once is a bad idea, so we use the DRM lock which is like a pre-futex futex. Now the problem is if you stop or kill a process in a critical section where it is writing to the hardware directly with MMIO you could crash the hardware, so the idea is that you block to the max until the critical section exits via the drm unlock.
some links http://lkml.indiana.edu/hypermail/linux/kernel/0202.2/0045.html
The only driver that might even use this and care in the tree is tdfx as I think it might be MMIO programmed, though I'm not 100% sure. I pretty much reckon we can deprecate block_all_signals as I forsee fixing all the problem you pointed out with it would be worse than just removing it.
So I am going to send the patch which simply removes block_all_signals() and friends. There are numeruous problems with this interace, I can't even enumerate them. But I think that it is enough to mention that block_all_signals() simply can not work. AT ALL. I am wondering, was it ever tested and how.
I wasn't around, but it did work back in linux-2.4.0-test7 when it was introduced,
http://www.kernel.org/pub/linux/kernel/v2.4/old-test-kernels/patch-2.4.0-tes...
is the initial patch. You should probably check that out and see if you can work out how it originally worked, and the work out if its really is broken now or if you haven't analysed it correctly.
I strongly believe block_all_signals() should die. Given that it doesn't work, could somebody please explain me what will be broken?
Just in case... Please look at the debugging patch below. With this patch,
$ perl -le 'syscall 157,666 and die $!; sleep 1, print while ++$_' 1 2 3 ^Z
Hang. So it does react to ^Z anyway, just it is looping in the endless loop in the kernel. It can only look as if ^Z is ignored, because obviously bash doesn't see it stopped.
taking the drm lock usually means you will drop it without doing anything stupid like sleeping for a long time, if the process dies, the lock gets dropped automatically also.
I've only second hand knowledge of how this works though, and I'm on holidays for another week or so, maybe once I get back I'll find some time to figure out how it works vs what happens, but really I suspect we can kill this with fire.
Dave.
On 07/12, Dave Airlie wrote:
On Tue, Jul 12, 2011 at 7:15 PM, Oleg Nesterov oleg@redhat.com wrote:
Hello.
I tried many times to ask about the supposed behaviour of block_all_signals() in drm, but it seems nobody can answer.
Its not hard to explain, basically there exists hardware that you program acceleration engines using MMIO, having multiple processes writing to the MMIO area at once is a bad idea, so we use the DRM lock which is like a pre-futex futex.
I guess LOCK_TEST_WITH_RETURN() does the check... This looks per-file, not per process/thread but I guess this doesn't matter in practice. And in any case, of course I do not understand this code.
Now the problem is if you stop or kill a process in a critical section where it is writing to the hardware directly with MMIO you could crash the hardware, so the idea is that you block to the max until the critical section exits via the drm unlock.
OK, thanks.
But who can send SIGTSOP to the drm_lock() holder? I mean, can this be a problem in practice?
As for SIGTSTP/SIGTTIN/SIGTTOU, the application can block them itself.
I pretty much reckon we can deprecate block_all_signals as I forsee fixing all the problem you pointed out with it would be worse than just removing it.
Yes, I think we should kill it. May be we should implement something else instead, I dunno.
So I am going to send the patch which simply removes block_all_signals() and friends. There are numeruous problems with this interace, I can't even enumerate them. But I think that it is enough to mention that block_all_signals() simply can not work. AT ALL. I am wondering, was it ever tested and how.
I wasn't around, but it did work back in linux-2.4.0-test7 when it was introduced,
http://www.kernel.org/pub/linux/kernel/v2.4/old-test-kernels/patch-2.4.0-tes...
is the initial patch. You should probably check that out and see if you can work out how it originally worked, and the work out if its really is broken now or if you haven't analysed it correctly.
Oh, 10 years ago ;) Of course I can't be sure if it ever worked or not. Probably it mostly worked, although I _feel_ it was always buggy. But it doesn't now, and I am pretty sure it was broken many years ago.
And no, I do not think we can fix it. It is not that the current implementation is buggy (although it is buggy), this interface is simply wrong, imho.
For example. block_all_signals(SIGINT) can not "block" SIGINT, it can be transofmed into SIGKILL. Of course, I am not saying it is not possible to do something which really works, but imho we should not do this.
Currently it is only used by drm, and drm doesn't need the "generic" interface anyway. If we decide that the kernel should help somehow to the process holding drm_lock (I hope it shouldn't ;), then we should implement something sane.
Just in case... Please look at the debugging patch below. With this patch,
$ perl -le 'syscall 157,666 and die $!; sleep 1, print while ++$_' 1 2 3 ^Z
Hang. So it does react to ^Z anyway, just it is looping in the endless loop in the kernel. It can only look as if ^Z is ignored, because obviously bash doesn't see it stopped.
taking the drm lock usually means you will drop it without doing anything stupid like sleeping for a long time,
Yes, I understand. I added "sleep" to lessen the output. The point is, it "stops" anyway even if block_all_signals() was called. It can't return to the user-mode and release the lock.
if the process dies, the lock gets dropped automatically also.
Just curious... do you mean drm_release() does this? Again, this is per file. A thread can exit without drm_unlock(). OK, at least this works if the process is killed and nobody else keeps this file opened.
I've only second hand knowledge of how this works though, and I'm on holidays for another week or so, maybe once I get back I'll find some time to figure out how it works vs what happens, but really I suspect we can kill this with fire.
OK, thanks Dave. This is not urgent. I do not want to send the patch which touches the code I do not understand and can't test. I'll wait for your return.
To summarize: block_all_signals() do not work. Afaics, the _only_ effect it has is that we set _DRM_LOCK_CONT. If this is not that important we can simply kill block_all_signals(), then we can think if we need do something else.
Oleg.
dri-devel@lists.freedesktop.org