A guide to Linux signals

thatcks · on Nov 24, 2015

A note: SIGHUP is far from obsolete in its original purpose. We may have stopped using modems but people still log in to Unix machines in ways that can get disconnected. If you SSH in to a machine and your SSH session is cut off by a network issue, your shell (and any command you have running) will get a SIGHUP.

dozzie · on Nov 24, 2015

There's more to SIGHUP than that. It's sent to the process group owning the terminal[1] (/dev/tty[1-X], /dev/pts/*) when said terminal is closed. Your shell, Midnight Commander, Vim, Emacs, and whatnot all terminate when you close your terminal emulator window.

[1] I don't quite remember what happens to other process groups, since you could have some background jobs stopped.

misterdata · on Nov 24, 2015

Yes, and using the 'nohup' utility you can start a process with SIGHUP ignored, to ensure it does not terminate when you close your SSH connection.
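A rough sketch of the mechanism, in Python for brevity: the launcher sets SIGHUP to "ignore" before exec, and an ignored disposition is inherited by the new program. The names `CHILD_CODE`, `ignore_sighup` and `child_starts_with_sighup_ignored` are made up for illustration; this imitates the idea and is not how the real nohup is implemented.

```python
import signal
import subprocess
import sys

# Illustrative child program: print whether SIGHUP arrived already ignored.
CHILD_CODE = (
    "import signal; "
    "print(signal.getsignal(signal.SIGHUP) == signal.SIG_IGN)"
)


def ignore_sighup():
    # Runs in the child between fork and exec. An ignored disposition
    # survives exec, so the new program starts with SIGHUP ignored.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def child_starts_with_sighup_ignored():
    """Launch a child roughly the way nohup would and ask what it inherited."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=ignore_sighup,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

Note that a program which installs its own SIGHUP handler after starting replaces the ignored disposition, so this only protects programs that leave SIGHUP alone.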

jon-wood · on Nov 24, 2015

Awesome, I knew nohup could be used for that, but hadn't worked out why it was called that.

danieltillett · on Nov 24, 2015

Catching signals is a great way of catching nasty bugs in production code. I use them to log information on the cause of an error before the code exits due to some sort of catastrophic bug (e.g. null pointer dereference, etc). This also allows you to clean up temp files, etc. They are much nicer to use than the Windows equivalent.
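A minimal sketch of the "clean up, then still die from the signal" idiom, in Python. It uses SIGTERM rather than a crash signal such as SIGSEGV, since handling a real crash from Python is not meaningful; the child script and the function name `run_and_terminate` are assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile

# Illustrative child: removes its temp file on SIGTERM, then re-raises
# the signal with the default action so the exit status stays honest.
CHILD_CODE = r"""
import os, signal, sys, time

path = sys.argv[1]

def cleanup_and_die(signum, frame):
    os.unlink(path)                        # the cleanup step
    signal.signal(signum, signal.SIG_DFL)  # restore the default action
    os.kill(os.getpid(), signum)           # die from the same signal

signal.signal(signal.SIGTERM, cleanup_and_die)
print("ready", flush=True)
time.sleep(30)
"""


def run_and_terminate():
    """Return (child return code, whether the temp file was left behind)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, path],
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        child.stdout.readline()   # wait until the handler is installed
        child.terminate()         # sends SIGTERM
        returncode = child.wait(timeout=10)
    leftover = os.path.exists(path)
    if leftover:
        os.unlink(path)
    return returncode, leftover
```

Re-raising with the default disposition means the parent still sees "killed by SIGTERM" rather than a normal exit, which matters to anything that inspects the exit status.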

useerup · on Nov 24, 2015

> They are much nicer to use than the Windows equivalent

Seriously? You find a system with global events where you must perform your own resource book-keeping to clean up the correct resources nicer than structured exception handling?

danieltillett · on Nov 24, 2015

Yes, as soon as you have to support both 32- and 64-bit builds in a multi-threaded program that is using the C runtime library. This of course is just my opinion :)

pjc50 · on Nov 24, 2015

Why is SEH a problem in that case? SEH is occasionally flaky but much nicer than trying to handle SEGV.

rewqfdsa · on Nov 24, 2015

> They are much nicer to use than the Windows equivalent.

Strongly disagree. It's hard to make different signals cooperate. Vectored exception handlers in Windows have built-in support for arbitrating between different users.

ratboy666 · on Nov 24, 2015

Linux (Unix) signals are per-process. What is hard about making signals cooperate? Are you referring to a signal generated while in a signal handler? raise() is supported on Windows, but kill() isn't, at least not directly: signals don't seem to be well supported as an IPC mechanism on Windows, although there appear to be kill()-style mechanisms.

rewqfdsa · on Nov 24, 2015

> What is hard about making signals cooperate?

Library A wants to handle SIGSEGV for memory mapping X. Library B wants to handle SIGSEGV for memory mapping Y. Which of them calls sigaction? What if both do? What if you then unload one of the libraries?

Windows handles this interaction sanely. Sigaction does not.

> raise() is supported on Windows, but kill()

Windows splits the jobs of signals into structured exceptions and APCs. You can definitely send an APC to another thread: https://msdn.microsoft.com/en-us/library/windows/desktop/ms6...

fulafel · on Nov 24, 2015

It's actually alarmingly common for Windows C/C++ apps to paper over crashes with SEH try { } catch (segfault) { /* seems to work */ } this way. Even Microsoft's own apps used to do it, maybe they still do. You run into this when running random apps under a debugger.

silon7 · on Nov 24, 2015

There's nothing structured about core dumps. Better to do nothing so the developer can see a clean core dump.

ratboy666 · on Nov 24, 2015

sigaction() returns the old action. This permits chaining of the actions. Specifically for SIGSEGV, if Library B does not want the signal, it can invoke Library A (without caring what it is). Similarly, A can invoke B. "Unloading" cannot be done, unless you are aware of the load order. This is a general issue -- dlclose() is reference counted, and you wouldn't know if it actually unloaded the library.
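The chaining idea can be sketched in Python, where `signal.signal()` also returns the previous handler. SIGUSR1 stands in for SIGSEGV, and `install_chained`, `demo` and the "owner" flag are hypothetical names for the example; a C version would save the old `struct sigaction` and call it in the same way.

```python
import signal

calls = []  # records which "library" handled each signal


def install_chained(signum, name, wants):
    """Install a handler that defers to the previously installed one."""
    previous = None

    def handler(sig, frame):
        if wants():
            calls.append(name)
        elif callable(previous):   # SIG_DFL and SIG_IGN are not callable
            previous(sig, frame)

    previous = signal.signal(signum, handler)
    return previous


def demo():
    calls.clear()
    original = signal.getsignal(signal.SIGUSR1)
    fault = {"owner": "A"}   # which library's mapping the "fault" hit
    install_chained(signal.SIGUSR1, "A", lambda: fault["owner"] == "A")
    install_chained(signal.SIGUSR1, "B", lambda: fault["owner"] == "B")
    try:
        signal.raise_signal(signal.SIGUSR1)   # B declines, A handles it
        fault["owner"] = "B"
        signal.raise_signal(signal.SIGUSR1)   # B handles it directly
    finally:
        signal.signal(
            signal.SIGUSR1,
            signal.SIG_DFL if original is None else original,
        )
    return list(calls)
```

The sketch also shows the weak spot raised in this thread: nothing lets A remove itself from the middle of the chain once B has captured A's handler.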

useerup · on Nov 24, 2015

> "Unloading" cannot be done, unless you are aware of the load order. This is a general issue -- dlclose() is reference counted, and you wouldn't know if it actually unloaded the library.

Yup. Which is exactly why Windows is more sane: No chaining, "unloading" (i.e. unsubscribing) can be done robustly. In the absence of a mechanism like SEH on Linux, signals are (mis)used. Signals are a crude, unstructured way of handling internal termination.

pjc50 · on Nov 24, 2015

Much of the IPC stuff Linux has to do with signals is done with window messages on Windows, which is usually nicer as you get to handle it in a defined state rather than, say, in the middle of a syscall.

zurn · on Nov 24, 2015

Signals are not generally used for IPC on Linux. Also, you can handle them in an orderly way along with other fd events using the standard and portable self-pipe pattern or the Linux-specific signalfd.
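A minimal sketch of the self-pipe pattern in Python, using `signal.set_wakeup_fd()`, which makes the interpreter write the signal number to a file descriptor. The function name `wait_for_signal` is made up, and the signal is raised by the process itself only to keep the example self-contained. signalfd is not shown.

```python
import os
import select
import signal


def wait_for_signal(signum, timeout=5.0):
    """Receive signum as a readable byte on a pipe instead of in a handler."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)   # set_wakeup_fd requires non-blocking

    # A Python-level handler must be installed for the wakeup fd to fire;
    # it does no work itself.
    previous_handler = signal.signal(signum, lambda sig, frame: None)
    previous_fd = signal.set_wakeup_fd(write_fd)
    try:
        signal.raise_signal(signum)    # stand-in for an external kill()
        # The signal is now just one more fd event for the loop.
        readable, _, _ = select.select([read_fd], [], [], timeout)
        if not readable:
            return None
        return signal.Signals(os.read(read_fd, 1)[0])
    finally:
        signal.set_wakeup_fd(previous_fd)
        signal.signal(
            signum,
            signal.SIG_DFL if previous_handler is None else previous_handler,
        )
        os.close(read_fd)
        os.close(write_fd)
```

In a real event loop `read_fd` would sit in the same `select` or `poll` set as sockets and other descriptors, so signal handling happens at a defined point in the loop.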

throw7 · on Nov 24, 2015

One thing that might have been nice to note is the rules regarding what you can and can't do in signal handler code.

JoeAltmaier · on Nov 24, 2015

One of my pet peeves: documentation that doesn't say anything. Methods (and callbacks) are listed by name with some thrifty description of argument. But none of the important stuff is ever there: is it reentrant? stateless? Who owns each non-scalar argument? Can it be invoked from user space? installable driver? kernel driver? Signal handler? Does it have latency constraints? Can I use it from a different thread than it was constructed under? Is it protected during thread death? Is it atomic vs what other methods? What other apis are allowed/forbidden when writing a callback?

As an embedded programmer, I end up reading library source most of the time trying to find these answers. And if it isn't open, then I have to black-box test or lard calls up with semaphores which might not be necessary. The state of API documentation is in the stone age.

marios · on Nov 24, 2015

The short answer is: not much. Basically change a global variable declared as volatile sig_atomic_t.
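The same shape, sketched in Python: the handler only records that the signal arrived and the main loop acts on it at a safe point. Python defers handlers to the interpreter loop, so the restrictions are looser than in C, where the flag would be a `volatile sig_atomic_t`. The names `request_stop`, `process` and the `signal_after` hook are assumptions for the example.

```python
import signal

stop_requested = False  # in C this would be a volatile sig_atomic_t


def request_stop(signum, frame):
    # Do nothing here except record that the signal arrived.
    global stop_requested
    stop_requested = True


def process(items, signal_after=None):
    """Handle items until finished or until SIGTERM has been flagged.

    signal_after is a hypothetical test hook: after that item the
    function sends itself SIGTERM to imitate an outside sender.
    """
    global stop_requested
    stop_requested = False
    previous = signal.signal(signal.SIGTERM, request_stop)
    done = []
    try:
        for item in items:
            if stop_requested:      # checked at a safe point in the loop
                break
            done.append(item)
            if item == signal_after:
                signal.raise_signal(signal.SIGTERM)
    finally:
        signal.signal(
            signal.SIGTERM,
            signal.SIG_DFL if previous is None else previous,
        )
    return done
```

For instance, `process(["a", "b", "c"], signal_after="b")` stops before handling `"c"`.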

You can find more information in this presentation [1] (from 2004!).

[1]: http://www.openbsd.org/papers/opencon04/index.html

sohkamyung · on Nov 24, 2015

For a more in-depth look at Signals, there's Michael Kerrisk's Linux Man Pages on Signals [1]. AFAIK, the best reference on Signals is in the book by Stevens and Rago [2].

[1] http://man7.org/linux/man-pages/man7/signal.7.html

[2] Advanced Programming in the Unix Environment, Second Edition

Symbiote · on Nov 24, 2015

    ps -ef | grep foobar

is more easily done with

    pgrep foobar

or perhaps

    pgrep -a foobar

SEJeff · on Nov 24, 2015

or ps -ef | grep fo[o]bar

So you don't have to | grep -v grep :)

I ask people to explain what that does and how it works as a sysadmin interview question.

notfoss · on Nov 25, 2015

That's nice. Apparently, we can place the brackets around any of the characters too! Can you explain how it works?

SEJeff · on Nov 29, 2015

Sure, the square brackets in POSIX regex (used by grep) are bracket expressions: they match any one of the characters listed inside. You can wrap any single character in the brackets and it works the same.

So foob[a]r matches the literal string foobar. If you grepped foob[ab]r it would match both literal strings foobar and foobbr. However, it doesn't match the literal string foob[a]r, which is what appears in grep's own command line, because [] is a bracket expression. To match that, you'd need to escape it, something like foob\[a\]r, which would not match the literal string foobar. This is why you don't need grep -v grep.

Understanding how and why this works will dramatically help you slice and dice text strings in a shell, so it makes a great SysAdmin interview question.
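The effect can be reproduced with a small Python sketch. Python's `re` stands in for grep's POSIX regex here (the two agree on a simple bracket expression like this one), and the two process lines are made up for illustration.

```python
import re


def ps_grep(pattern):
    """Imitate `ps -ef | grep PATTERN` over two made-up process lines."""
    ps_lines = [
        "root  101    1 /usr/bin/foobar --daemon",   # hypothetical target
        f"jeff  202  150 grep {pattern}",             # grep's own entry
    ]
    return [line for line in ps_lines if re.search(pattern, line)]
```

`ps_grep("foobar")` returns both lines, because grep's own command line contains the text foobar. `ps_grep("foob[a]r")` returns only the first, because grep's command line now contains the text foob[a]r, which the pattern does not match.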

nailer · on Nov 24, 2015

pgrep already excludes itself

SEJeff · on Nov 29, 2015

Sure but that isn't often nearly as useful as something like:

ps -efH | grep foob[a]r

Where you can see the arguments and whatnot. Both are good to know.

You can use awk's ability to do posix regex and emulate pgrep with:

ps -efH | awk '/foob[a]r/{print $2}'

mixblast · on Nov 24, 2015

What about SIGSTOP and SIGCONT? They're quite useful to pause and resume processes (Ctrl-Z).

noselasd · on Nov 24, 2015

And indeed also very useful for testing. "What happens if we simulate this process being a bit slow?" Send it a SIGSTOP. Doing so can simulate a number of other weird cases too, as observed from the outside, such as a process thrashing on swap or NFS hanging.

I've found that doing that to etcd causes the other etcd members in the cluster to randomly hang, while it handles a lot of other failures fine (instant poweroff, network partitioning, randomly crashing etcd). Sending SIGSTOP to one member of a cluster breaks everything. I don't know if that's improved since I did those tests, though.
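A small sketch of freezing and resuming a process from the outside, in Python. The child is just a sleeping interpreter and `pause_and_resume` is a hypothetical name; a real test would sleep between the two signals and watch how the rest of the system reacts.

```python
import os
import signal
import subprocess
import sys


def pause_and_resume():
    """Stop a child with SIGSTOP, resume it with SIGCONT, report both."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        os.kill(child.pid, signal.SIGCONT)
        # WCONTINUED makes waitpid report that the child was resumed.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        resumed = os.WIFCONTINUED(status)
        return stopped, resumed
    finally:
        child.kill()
        child.wait()
```

While stopped, the child keeps its sockets and file descriptors open but makes no progress, which is what makes it look like a hung rather than a dead peer.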

teddyh · on Nov 24, 2015

As I usually have to point out, Ctrl-Z sends SIGTSTP, which can be caught by a signal handler – unlike SIGSTOP, which is not sent by any key in the terminal driver, and which can not be caught by a signal handler.
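The difference is easy to check from Python: installing a handler for SIGTSTP succeeds, while the kernel refuses one for SIGSTOP (and SIGKILL). The helper name `can_be_caught` is made up for this sketch.

```python
import signal


def can_be_caught(signum):
    """Return True if the kernel lets this process install a handler."""
    try:
        previous = signal.signal(signum, lambda sig, frame: None)
    except OSError:
        # SIGSTOP and SIGKILL are rejected with EINVAL.
        return False
    # Put back whatever was installed before the probe.
    signal.signal(signum, signal.SIG_DFL if previous is None else previous)
    return True
```

So `can_be_caught(signal.SIGTSTP)` is true and `can_be_caught(signal.SIGSTOP)` is false.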

jackgavigan · on Nov 24, 2015

Aren't these Unix (or, more accurately, POSIX) signals, rather than Linux signals?

wnoise · on Nov 24, 2015

WINCH comes from the terminal driver, not from the "window manager".

positron4 · on Nov 24, 2015

Thank you for this excellent article. Much appreciated!

rewqfdsa · on Nov 24, 2015

The trouble with signals is that there are so few of them and we can't add more without breaking the ABI.

nitrogen · on Nov 24, 2015

If you need to send more types of messages, there are other IPC mechanisms.