When I was young and really didn't understand Unix, my friend and I were summer students at NBS (now NIST), and one fine afternoon we wondered what would happen if you ran fork() forever.
We didn't know, so we wrote the program and ran it.
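(For the curious, the program would have been something like this minimal sketch -- not the original source, and obviously not something to run on a machine you care about:)

    #include <unistd.h>

    int main(void)
    {
        for (;;)
            fork();   /* every child runs the same loop, so processes multiply */
    }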
This was on a PDP-11/45 running v6 or v7 Unix. The printing console (some DECWriter 133 something or other) started burping and spewing stuff about fork failing and other bad things, and a minute or two later one of the folks who had 'root' ran into the machine room with a panic-stricken look because the system had mostly just locked up.
"What were you DOING?" he asked / yelled.
"Uh, recursive forks, to see what would happen."
He grumbled. Only a late 70s hacker with a Unix-class beard can grumble like that, the classic Unix paternal geek attitude of "I'm happy you're using this and learning, but I wish you were smarter about things."
I think we had to hard-reset the system, and it came back with an inconsistent file system which he had to repair by hand with ncheck and icheck, because this was before the days of fsck and that's what real programmers did with slightly corrupted Unix file systems back then. Uphill both ways, in the snow, on a breakfast of gravel and no documentation.
Total downtime, maybe half an hour. We were told nicely not to do that again. I think I was handed one of the illicit copies of Lions Notes a few days later. "Read that," and that's how my introduction to the guts of operating systems began.
> ...a minute or two later one of the folks who had 'root' ran into the machine room with a panic-stricken look because the system had mostly just locked up.
It's kind of weird that, while root has always had e.g. 5% reserved disk space on the rootfs for emergencies, one thing no Unix has ever done is enforce a 5% CPU reservation for root so administrators can "talk over" a cascading failure. I think this is possible just recently in Linux with CPU namespacing, but it's still not something any OS does by default.
It's not specifically the lack of cpu timeslices that crowds out other programs, it's more like exhaustion of all the OS resources (process table fills up, file table fills up, memory runs out, swap death etc).
Sure, if you carefully made everything fork-bomb-resistant, then a CPU quota would be part of it. Container systems use fork bombs as basic test cases.
I'm surprised that this wasn't one of the primary goals of cgroups: the ability to group "all userspace processes" into one cgroup, and then say that that cgroup can in sum only use so much CPU, so many processes, so many inodes, etc. You know, a control plane/data plane separation, without requiring hypervision.
It is. Cgroups provide limits for memory and CPU time. We already have other accounting mechanisms for processes/threads (rlimits) and for inodes and disk space (disk quota systems). We've had those for ages. I imagine there will be more work to integrate these various accounting mechanisms with cgroups as the work continues.
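As a concrete example of the rlimits side, a per-user process cap can be set with setrlimit(2); a minimal sketch (RLIMIT_NPROC is the Linux/BSD name, and the cap of 100 is an arbitrary value picked for illustration):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Cap the number of processes this user may create, so a runaway
           fork loop starts getting EAGAIN instead of filling the process table. */
        struct rlimit rl = { .rlim_cur = 100, .rlim_max = 100 };  /* arbitrary cap */
        if (setrlimit(RLIMIT_NPROC, &rl) != 0)
            perror("setrlimit");
        /* ... then exec the untrusted workload ... */
        return 0;
    }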
If you care about such things, the normal method is to have a backup sshd running on a different port with realtime priority; it is not used at any other time, except when some process has gone runaway and you can't do anything else.
No write up that I know of. I used it in systems I've made in the past. Some of our services were running in a realtime priority and we needed a way to take care of such a system mostly in development.
Most linux distributions assign root processes a better scheduling priority than non-root processes, which should be good enough in most cases. Critical system processes also run at better priorities than other processes. It's not uncommon to see linux users consciously decide on the priority of a process by using nice or renice.
Totally limiting the CPU utilization of a group of processes requires more overhead than changing the scheduling priority since you must actively account for the CPU usage. CPU cgroups should do just that though and in most cases the overhead should be acceptable.
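For reference, the nice/renice step mentioned above is a single call from C as well; a sketch (the value -5 is arbitrary, and negative niceness requires root):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Renice the calling process; who = 0 means "this process". */
        if (setpriority(PRIO_PROCESS, 0, -5) != 0)
            perror("setpriority");
        return 0;
    }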
In your comment's parent, I don't think raw CPU utilization was the issue since kabdib mentioned fork and it was in response to a post about fork failures. The problems caused by a fork bomb are not limited to CPU utilization, see: https://en.wikipedia.org/wiki/Fork_bomb
In any case, there will likely always be some system call you can abuse to totally exhaust some resource of the kernel.
> In any case, there will likely always be some system call you can abuse to totally exhaust some resource of the kernel.
If this is true, I would expect there to exist one or more articles entitled "how I brought down my Heroku host-instance" or something along those lines. Anyone got some links? :)
It would only be possible if a limit were enforced on all non-primary namespaces.
However something that has been /possible/ for a while (but not in practice done) would be to elevate root process priority over other processes. Probably not done due to daemons needing to run as root (which is decreasing as they're able to drop privileges these days).
Root has had the ability to assign negative nice values since long, long ago. Non-root users can only assign positive niceness. The range is -20 to +19.
In theory this can give higher priority to a process, but if you cannot get into the run-queue at all (fork bomb), or the problem is in kernel space (e.g., I/O access, hang, or a kernel space loop), then it's not going to help you much.
And, sadly, most of the really hard hangs are kernel space. The general fix is to cut off all network requests/incoming jobs, powercycle, dig through logs, and try to shunt a future hang. (Sometimes just cutting incoming jobs will stop the hang, too)
On Linux, nice is not an absolute priority system.
In the old days, the Amiga operating system did use static absolute priorities for its multi-tasking. This meant that if a task with a priority of 1 wanted to use as much CPU as it wanted, then all tasks with a priority of 0 or below would be completely starved. This meant that you could boost a certain process (like, say, a CD writer) and get close to real-time behaviour. I was certainly writing coaster-free CDs on a much less powerful Amiga than a Linux box that constantly made coasters from buffer under-runs.
Linux, however, has virtual memory and "nice", which complicates matters. A process with a niceness of 19 will still take a small amount of CPU in the presence of another process with a niceness of -20. In the presence of a fork bomb, you may have a very large number of processes. If they all (by some miracle) have a niceness of 19, you still have very little CPU time left for a process with a normal or negative niceness. Infinity multiplied by a small number is still infinity. Real-time priorities are the only thing that will save you here.
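What "real-time priorities" means concretely is the SCHED_FIFO/SCHED_RR scheduling classes; a sketch (the priority of 10 is arbitrary, and this needs root or CAP_SYS_NICE):

    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        /* A SCHED_FIFO task preempts every normal (SCHED_OTHER) task,
           no matter how the latter are niced. */
        struct sched_param sp = { .sched_priority = 10 };   /* arbitrary */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)    /* 0 = this process */
            perror("sched_setscheduler");
        return 0;
    }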
You also have the problem of being able to actually change the processes' nicenesses. That requires CPU time, which you no longer have. You would be better off sending a kill signal. You also have a race condition - you obtain (from the OS) a list of processes that are running that you want to renice or kill. By the time you have iterated through each one renicing or killing them, new processes have appeared.
For several years I was a sysadmin for the University of Texas computer sciences department. (This was much later than your story, though.) If I remember correctly, the operating systems class was usually taught in the spring and they got to exploring processes sometime in late March or early April. And for about two weeks, none of our generally available systems would have an uptime of more than a couple of days.
Sure, you could get in and kill a fork-bomb before it did anything bad. But two or three on the same machine? And when you've got a couple hundred machines? It was easier to just reboot and let the victims who were inconvenienced handle explaining to the guilty how what they did was bad.
Then there were the guys who would log into one machine in a lab, fork-bomb it, move to the next machine over and make a change to their program, fork-bomb that machine, and expect to iterate that process until they passed the assignment. Leaving a wake of pitifully flailing workstations behind. Ahh, good times.
Must have been V6; I recall V7 had patches to prevent this, at least to the extent that it wouldn't crater the whole machine. I haven't thought about using ncheck & icheck since fsdb showed up about BSD4.2 or thereabouts. I remember using adb as well to fix buggered filesystems back in the ancient days.
I remember well the day one of the elder neckbeards handed me my own photocopy of the Lions books. It was enlightenment in pure form.
Running as root, this usually worked fine: It would create a directory, move into it, and clean out any garbage left behind by a previous run before doing anything new.
Running as non-root, the mkdir failed, the chdir failed, and it started eating my home directory.
If you don't need to specify a custom message, you could also use "set -e" to die automatically on command failure and use "trap ... EXIT" to display some kind of failure message.
This isn't really fork() failing per se -- but rather a failed program/script that did not understand the well defined and clearly documented behavior of fork().
True -- I guess fork() has failed at that point. I was more getting at the article author's scenarios of careless scripts treating -1 as a valid pid (a pid should always be > 0), which would be a failure of the script instead.
Well, it's a failure of the script and the POSIX API - it is unfortunate that a pid_t inhabited by -1 means "failure" in one case and "everything" in another. It is certainly not fork, narrowly, to blame.
I think the point that the author was making, outside of any criticism to the API itself, is that the hapless programmer may not know that fork() could even return -1. The article is pointing out that possibility as documented by the API, not criticizing a failure of the API. Whether or not the API is faulty in this--and I tend to agree with you--is outside of the scope of the article, though only just.
I wasn't making any particular comment on the content of the article. It was certainly relevant to the discussion tangent the thread had wandered down.
If taking the square root of a negative number removed all the files in your home directory, that could also be "well defined and clearly documented behavior". Would you blame the API author at that point, or would it still be strictly your fault?
At what point do API authors share the blame for a needlessly harsh punishment delivered upon a predictably common error?
I certainly prefer to work with systems produced by people tending to think it'd be their fault more often than not.
I could argue, however, that in this particular case, it's the user's fault for failing to understand the full and defined behavior of fork() in addition to failing to understand the full and defined behavior of other functions, ...say, kill().
It's just as wrong to feed kill() -1 as it would be to feed it -48585 or "babdkd" (unless that is explicitly your intention). A simple sanity check like if [ "${pid}" -gt 0 ]; is all that's necessary to protect against this behavior.
So, I would argue, the fault lies with the users, not the creator, for not understanding the API when all materials necessary to understand said API are freely available.
(with that said, I think it's safe to say, we've all been bitten by not fully understanding some function before)
There'd be far fewer bugs if everyone knew exactly how everything else works.
This kind of mistake is godawful and should not be defended. (but it's correctly fixed through stronger typing, not through choosing -48585 as the code for killing everything).
> not through choosing -48585 as the code for killing everything)
Actually, only -1 is the code that "kills everything"
> There'd be far fewer bugs if everyone knew exactly how everything else works.
Perhaps I misinterpreted your meaning, because you seem to be advocating using programming and scripting languages without actually bothering to learn them. Of course this can, will and does lead to very bad effects.
The bottom line is, if you are going to use a function in your program/script -- please, read the docs and understand what it will return, at the very least.
> Perhaps I misinterpreted your meaning, because you seem to be advocating using programming and scripting languages without actually bothering to learn them. Of course this can, will and does lead to very bad effects.
You're arguing that people should read the API before doing anything with it. Parent's point is that this class of error can be avoided by strong typing (eg, via algebraic datatypes), negating the chance that it would happen in the first place. Which, I think, is the right way to look at the problem. But certainly, if you do have to use a weakly-typed, unsafe language which does not provide this kind of guarantee, be sure to read the documentation twice.
Which doesn't mean you won't get bitten when it turns out that the person writing a library you rely on didn't RTFM.
Fault is not a rivalrous good. It's the user's fault, and it's the API creator's fault.
Is there a reason fork can't be changed to just crash the program on failure? Are situations where a program usefully does something other than crash on fork failure, more or less common than situations where a program fails in the way described in the article?
I can't think of any good cases for crashing a program when fork fails. If you're doing work in a pool of processes and the parent process tries to fork another process, but fails, there are many ways to handle this: wait a few seconds before trying again, wait for another process to finish, kill a few processes, etc.
If you were forking in a high-level language such as Python, failing to fork would raise an exception which would possibly crash the program if left unhandled.
C does not have exceptions so return codes are used to indicate success or failure. This is true for nearly every function, not just fork. If you're not checking for errors in a C program, it's going to break in unexpected ways, and will possibly be vulnerable to exploitation.
fork has 3 possible return values:
- 0 for the child process
- a positive number for the parent process
- a negative number if it failed.
"On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately."
"Is there a reason fork can't be changed to just crash the program on failure?"
YES. Situation: fork bomb, can't create new processes. How do you notice? Typically because you can't spawn new processes from your shell because fork is failing. With bash, there is a kill builtin - so you have a chance of cleaning things up (depending) if you have a shell open. If the failed fork kills the shell, then oops, you don't have a shell open.
That should be up to the programmer to decide. If fork() fails, it's almost always a transient situation that can be recovered from by spinning until fork() succeeds. Or one could have jobs doing useful things that can finish up, state saved, etc, for when the process is re-run. Just dropping everything onto the floor is usually the worst option.
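A sketch of that "spin until fork() succeeds" idea, with an arbitrary one-second back-off (fork_retry is an invented helper name):

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Retry fork() on transient failures instead of dropping the work. */
    static pid_t fork_retry(void)
    {
        for (;;) {
            pid_t pid = fork();
            if (pid >= 0)
                return pid;                /* 0 in the child, child pid in the parent */
            if (errno != EAGAIN && errno != ENOMEM)
                return -1;                 /* not transient; let the caller decide */
            sleep(1);                      /* arbitrary back-off before retrying */
        }
    }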
`with-current-directory` has to be a macro because if it is not, then the evaluation of (delete-all-files-recursively) happens before the definition of with-current-directory can change the current directory.
What the person meant when he wrote, "In those times I wish I could use the emacs lisp way," is, "In those times I wish I could use a lisp _macro_" -- particularly, one of those macros that makes a change, runs some code ("the body") then undoes the change.
Since all lisps have macros, the code above would work in any lisp -- not just Emacs Lisp. Among lisps, Emacs Lisp is famous for its dynamically scoped variables. Consequently, the specific reference to Emacs Lisp perpetuates the confusion that how variables are scoped has anything to do with what we have been talking about.
Nothing, of course. I just find the macros that give you a "modified environment" to run some code short and sweet.
with-temp-buffer is another example: a macro that bridges the functions for "string manipulation" and the ones for "buffer manipulation", since you start writing stuff this way:
    (defun replace-in-string (str from to)
      (with-temp-buffer
        (insert str)
        (goto-char (point-min))
        ;; Here you can use all your normal text editing commands
        (replace-regexp from to)
        (buffer-string)))
Lots of dirty manipulation, but from the outside it's a pure function, and it doesn't change the editor state in any way after it runs.
Good luck writing a macro in C to express language functionality for which you don't have the primitives. It's not exactly lisp. Think of C macros as a way to save you some typing and lisp macros as a way to extend the language.
When you see chdir, or any notion of the current working directory being used for anything: run as fast as you can (or refactor, if it's not too late). I've seen all sorts of things because of software relying on it: sometimes it's just directories/files it creates popping up all over the place, sometimes it's 'just' crashing, but yes, sometimes it starts erasing things and all hell really breaks loose.
If you can't rely on current working directories then you have to specify any file locations absolutely? That doesn't seem like a good idea because then your code quickly turns into a hot mess if you ever have to change where stuff lives.
This is such a stupid problem I run into a lot. Both alternatives (doing things with absolute paths vs doing things entirely with relative paths) seem to have a lot of downsides. Overall relative paths seems to be way better, but then you leave yourself open to problems which "rhyme" with the one OP was talking about.
As agwa and I mentioned in sibling comments, there are the ...at() functions, which let you specify actions on paths relative to a specific directory you have a file descriptor for. This not only avoids issues like the above (failing to open the directory and then failing to check for failure will mean you're passing -1 into unlinkat, which would simply fail) but will also keep you talking about the same place if links are moved around somewhere up the tree from where you are working.
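A sketch of that pattern (the path and file name here are made up for illustration):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hold a descriptor for the directory and work relative to it,
           instead of chdir()ing into it and trusting the CWD. */
        int dirfd = open("/tmp/scratch", O_RDONLY | O_DIRECTORY);  /* hypothetical dir */
        if (dirfd < 0) {
            perror("open");
            return 1;                      /* no silent fallback to whatever CWD is */
        }
        if (unlinkat(dirfd, "stale.tmp", 0) != 0 && errno != ENOENT)
            perror("unlinkat");
        close(dirfd);
        return 0;
    }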
Rearranging the tree above CWD isn't a problem, CWD will follow along just fine (as in, be the same directory). Also, you're assuming AT_FDCWD won't be -1, which could be reasonable but isn't guaranteed afaik.
"Rearranging the tree above CWD isn't a problem, CWD will follow along just fine (as in, be the same directory)."
I was citing rearranging the tree is a potential issue with absolute directories, not with relying on CWD - that certainly could have been clearer.
"Also, you're assuming AT_FDCWD won't be -1, which could be reasonable but isn't guaranteed afaik."
Interesting point regarding guarantees. It's not -1 on any existing OS that I can find (it seems to be -100 on Linux and FreeBSD, -3041965 on Solaris, -2 on AIX), and shouldn't be for precisely this reason, but something to bear in mind if you are working on something more obscure that nonetheless has these functions.
Of course, you shouldn't be relying on reasonable behavior from functions passed a bad FD in general. It's just nice to have the additional defense when that does get missed.
That doesn't seem like a good idea because then your code quickly turns into a hot mess if you ever have to change where stuff lives.
It doesn't turn into a mess if you handle it correctly from the start. The way we usually handle this in large applications is to have one single class like 'ApplicationPaths' which internally figures out all paths needed. No other code uses paths directly; instead it always uses paths relative to ApplicationPaths.AppConfigDir/ApplicationPaths.UserConfigDir/ApplicationPaths.ExecutableDir and so on.
There's a good reason you can't rely on CWD other than root - the directory can be on a different mountpoint and possibly even on a remote mountpoint. If the remote server goes down or the mountpoint is forcefully unmounted, what would your CWD point to then? Root is the only directory guaranteed to always exist, that's why you'd see a chdir('/') as one of the first steps in properly written unix daemons.
Almost - the main reason for the chdir('/') is so that if you do happen to be in a mounted file system (locally or remotely), your daemon doesn't prevent the system administrator from gracefully unmounting that file system for whatever reason.
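For context, those first steps of a hand-rolled daemon look roughly like this sketch (daemonize is an invented name; error handling abbreviated):

    #include <stdlib.h>
    #include <unistd.h>

    static void daemonize(void)
    {
        pid_t pid = fork();
        if (pid < 0)
            exit(EXIT_FAILURE);      /* yes, check fork() even here */
        if (pid > 0)
            exit(EXIT_SUCCESS);      /* parent exits; child carries on */
        if (setsid() < 0)
            exit(EXIT_FAILURE);      /* new session, no controlling terminal */
        if (chdir("/") != 0)         /* don't pin whatever filesystem we started in */
            exit(EXIT_FAILURE);
        /* ... redirect stdin/stdout/stderr to /dev/null, write pid file, etc. ... */
    }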
Sometimes there are legitimate reasons for relying on the CWD, such as avoiding time-of-check/time-of-use race conditions. The *at syscalls provide a better alternative, but (at least as of a few years ago) they weren't widely implemented outside of Linux and Solaris.
The exception is chdir("/"). This will always give you the desired behavior, only error on EACCES, and prevent many nasty filesystem problems. This also removes the idea of relative paths, so the user is forced to use full paths (or, every relative path is a full path). Not very practical but it does work.
The real problem with chdir is when that code ends up refactored into a library and ends up used in a multithreaded program. Then you've got an ugly bug.
CWD is a useful concept. If I run gimp in a particular directory, I'd like it to show that directory in the file dialogue when I try to load or save an image.
What is evil is a program changing its working directory. That's when it becomes an evil global variable, rather than a non-evil global constant.
I think that's probably the best way to look at it. You are given the parent node with CWD and you can attempt to modify child nodes by relatively addressing them.
I was wondering before if it would be interesting to have a filesystem with transactional locking of paths, though I'm sure the performance would take a hit. Would be kind of cool to be able to do filesystem operations without constantly opening yourself up to race conditions and requiring extremely defensive programming.
Why limit that to one directory? As I expanded on a bit in my response to ygra, I don't mean eliminating any notion of carrying a directory, just that I don't know that there is actually good reason to privilege one particular path universally.
I agree that treating cwd as a global constant solves most (at least) of the issues, I'm just poking assumptions to see what ideas arise.
You'd have to use absolute paths everywhere, but that probably doesn't hurt that much in programs or scripts. The CWD seems most useful in interactive shells, I guess. A fun thing is PowerShell on Windows, where you have two CWDs: one from the process and another from the shell, which has its own VFS handling (e.g. the registry is a place that has no representation in the normal file system). Cmdlets use one of them and external commands and .NET APIs use the other. So in the latter case you always need Resolve-Path foo.bar instead of just foo.bar for things to work properly.
You wouldn't have to use absolute paths everywhere - you could use paths relative to any file descriptor that pointed at a directory. The shell itself wouldn't seem to have any problem - it could maintain a logical CWD without assistance. Utilities would probably need some other convention - maybe fd 4 points at the directory they are to operate in at start?
In a sense, this is "still a CWD" - but the differences would be 1) you can maintain multiple at the same time, and 2) you could close it.
Could you not have just checked to see if you actually created the directory and/or checked to make sure you moved into the directory before proceeding with your destructive function?
They could have, but then they wouldn't have a great 'epic bug' story to share! Missing the basics is part of what leads to mistakes like this, and I'd wager that after getting bit by that they are much more careful today.
Also, don't rely on implicit state (the "current directory") for a destructive command; pass the target directory in explicitly.
Reminds me of a time when I was working on a package system for an in-house Linux OS build and carelessly had my fakeroot directory set wrong in my configurations, which ended up treating my local root / directory as the root of the fakeroot, which is as bad as it sounds. Running the script overwrote my entire /etc directory among other important and unrecoverable things...
Thankfully, it was a development system so nothing crucial was lost. Needless to say, I am way more careful today, in part due to this mishap (and the hours of setting up a new dev system!)
Not at all -- you make the deletion of the directory dependent on whether or not you tested true for the directory being present. I don't see why this would cause any race condition...
The race condition is when another program is interacting with that directory too. So you check that the directory exists, and then the other program gets CPU time and uses it to delete that directory, and then your program tries to delete that directory but fails.
To avoid this, you have to either get a lock on that directory somehow (opening a transaction), or you have to try deleting the directory and then check the return code or catch any exceptions to tell whether the directory existed at the time of the attempted deletion.
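In C that second option is just "call it and look at errno"; a sketch with a hypothetical path:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Don't check-then-delete; attempt the delete and interpret the result. */
        if (rmdir("/tmp/scratch") != 0) {          /* hypothetical directory */
            if (errno == ENOENT)
                puts("directory was already gone -- someone else won the race");
            else
                perror("rmdir");
        }
        return 0;
    }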
It does, because somebody might have done something to the filesystem between your test and the rest of your code.
This is why I found your "Defensive Programming 101 really" comment slightly arrogant (perhaps I misunderstood the intention). Writing correct programs is not easy and one should not mock others, because there is always something new to learn.
The variant of this I ran into was significantly less destructive.
recursively_find_everything_in_current_directory() crashed due to a stack overflow when SetCurrentDirectory failed due to a single corrupt NTFS directory.
If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest "it cannot happen to me", the gods shall surely punish thee for thy arrogance. [0]
Counterexample: pthread_mutex_unlock. That function returns an error code, but it cannot possibly fail in a well-formed program. Checking for an error for mutex unlock is pointless: what would you do in response?
In the late 90s I had an app ported to multiple Unixes. We were having intermittent problems with our HP-UX port, which were caused by a mutex being unlocked by a different thread than the one that had locked it. In that case (which only happened under heavy worker contention) unlock happily returned an error and the mutex was left locked.
I think that's your answer. A mutex error return probably indicates an application bug, such as double unlock. You should probably assert or abort on "can't happen" mutex errors.
Programmers are lazy. If they take the time to document an error return value, then you should probably heed their warnings. :)
Huh? You do realize that the standard assert() macro already is compiled out if NDEBUG is defined, right?
Your code could just as well be written as
assert(pthread_mutex_unlock(&lock) == 0);
which of course has the added benefit of not inventing anything new, i.e. being standard and immediately understood by anyone who knows the language and its libraries reasonably well.
Replying to self since I can't edit: d'oh. Yes, I totally mis-read the original code. I should have realized why the other comment questioning this practice had been down-voted, heh.
Of course not unlocking the mutex in non-debug builds would be a problem.
As per jacquesm's comment below[0], your example would, by default, be compiled out if NDEBUG is defined. So in production you'd never release the lock.
You're misreading the code. When NDEBUG is defined, we don't use assert, but instead always evaluate the expression. The overall effect is that we always evaluate VERIFY's argument, but only check it in debug builds.
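For readers without the parent comment in view, the macro being defended is presumably something along these lines (a sketch, not the original code):

    #include <assert.h>

    /* The expression is always evaluated; only the check vanishes under NDEBUG. */
    #ifdef NDEBUG
    #  define VERIFY(expr) ((void)(expr))
    #else
    #  define VERIFY(expr) assert(expr)
    #endif

    /* Usage: VERIFY(pthread_mutex_unlock(&lock) == 0);
       The unlock happens in every build; the == 0 check only in debug builds. */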
Be very careful with that. When compiled under NDEBUG and the assert gone, the mutex unlock will be gone too and you'll wonder why the application stops working.
Never ever put actual code in asserts, not even through clever macros. Stick the return value in a temp var, then check the contents of the temp var in your assertion.
Note that wasn't the real reason. At the bottom you can see an edit which reads:
"I was wrong about why malloc finally failed! @GodmarBack observes, in the comments, that x64 systems only have an address space of 48 bits, which comes out to about 131000 GB. So, on my machine at least, the malloc finally failed because of address space exhaustion."
The "One True Brace Style" is actually a specific style with that name. It is based on the K&R style, with the additional stipulation that all `if`, `else`, `for`, and `while` statements use braces.
In a similar family, note also that setuid() can fail! If you try to setuid() to a user that has reached their ulimit for number of processes, then setuid() will fail, just like fork() would for that user.
This is a classic way to get your application exploited. Google did it (at least) twice in Android: once in ADB [1], and once in Zygote [2]. Both resulted in escalation.
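The fix is a one-line check; a sketch (drop_privileges and target_uid are invented names):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void drop_privileges(uid_t target_uid)
    {
        /* If we cannot shed root, refusing to run beats running as root. */
        if (setuid(target_uid) != 0) {
            perror("setuid");
            exit(EXIT_FAILURE);
        }
    }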
I see a lot of comments blaming the programmer. This is completely the wrong attitude.
Why are you treating the programmer like a machine? They're not a machine -- they're human. Regardless of whether they fully understand the API or not, things should have sane defaults, for HUMAN FACTORS reasons.
Bugs will always exist. The Linux kernel, a code base with over a decade of work put into it by many highly skilled people, still has plenty of bugs; that alone shows bugs are inevitable.
The goal should be to assume people will do stupid things and make fatal behavior more explicit/difficult. Do we really need -1 for kill to do such behavior? How common is that anyway? It's a pretty destructive behavior, and probably should be removed from kill. The human factors approach would say that if you really want that behavior, then write a for loop over the list of pids, because it should never be within easy reach, especially for such an uncommon scenario.
Apple's iOS API is similar. Try to insert a nil object into an array? Crash. Try to reload an item in a list that's past the known objects index? Crash. So instead of doing something sane like reloading the entire list, the user has a shit experience because off by one errors happen easily especially in front-end/model work [1 re: fb's persistent unread chat].
Not recognizing the human part of things leads to issues everywhere.. reminding me of this article on human factors in health care previously posted on HN [2].
Conclusion: design for humans and default to non-fatal situations.
In general I agree with your "don't blame the programmer" point, but I would seriously hesitate to criticize fork(). Yes, in 2014, that behavior seems uncommon and it seems like very poor design to lump such destructive behavior into an otherwise meaningless "-1"...
but remember that fork was not written in 2014. It was written forty-five years ago. I'm not saying it was a great API design decision back then, but I'm willing to bet that it seemed a lot less "wrong" at the time.
You can criticise both, and more importantly criticise C for its inability to create sensible APIs: in a good design, fork() would have exclusive domains for a PID, an Error and a Child result and you couldn't confuse an error for a pid.
Conclusion: design for humans and default to non-fatal situations.
It's a lot safer to fail fast and fail safe than to hobble along with possibly undefined state doing who knows what to the system and to the user's data.
You're missing my point. It's not about undefined state -- it's about sane defaults and APIs that make crashes and other bad things harder to achieve. Kill's -1 is an example of this -- having a separate killall API is defaulting to a non-fatal situation. Apple's UITableView is another -- it's so easy for them to rebuild themselves, yet instead it crashes on an inconsistency.
If a developer fails to check the return value of a function that can fail, all bets are off. There is no sane default that can safely hide a developer mistake.
Granted, the specific kill() API could have been designed better, but since it's very well established by decades of history, the burden of understanding it lies with the developer.
Because nothing ever changes with computers over time? You're being myopic. It's not about fork->kill. It's about how computers work in general, and why this 'blame the programmer' mindset is just dumb.
As you admitted, the API could have been designed better. That's my point. Things that break things for users, especially those that take down entire systems, should be difficult or require more awareness to do, such as by naming it killall() like suggested in this thread. The human factors approach recognizes that operators aka programmers aka humans will not just make errors, but predictably so. We can incorporate those predictions into our domain and change design patterns to match.
There are many known patterns of errors that programmers make: edge case errors such as off-by-one, null dereferencing, etc.
Using the return value from a function is a common pattern generally. Having a return value that mixes an actual result (PID) along with an error code (-1) is problematic in that they're both integers so there's no obvious handler for the error case, and thus this cascade can happen. Then you add the fact that fork() rarely fails and that compounds the issue in hiding it into obscurity.
Generally when a system is maxed out of resources all kinds of things break and fail in weird ways; it just so happened that this particular cascade was super bad and needlessly so because of API design choices that have never been addressed.
Somewhat pedantically, I discovered a bug where a daemon (which had very similar code) started killing random processes. The issue turned out to be that it was test-run as root once and the pid_file was owned by root, group root. So the write_pid() function failed (silently) and the cleanup script always took the pid that was in pid_file, which was now stale, and sent it a kill -9. Sometimes kill -9 would return an invalid-pid error, sometimes it would kill some process. Fixed by removing the pid_file, letting the daemon create it, and checking that the write succeeded.
You should always put a break statement at the end of a default branch. The default case is put at the end as a convention, but no one (certainly not the C standard) prevents you from adding a new case after it, which will then be promptly executed even though that's probably not what you want. I think this is mentioned as a best practice in K&R.
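Concretely, a sketch along the lines of the fork() switch discussed elsewhere in the thread:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void spawn(void)
    {
        switch (fork()) {
        case -1:
            perror("fork");
            exit(EXIT_FAILURE);
        case 0:
            /* ... child work ... */
            break;
        default:
            /* ... parent work ... */
            break;   /* keep this even though default is last: a case added
                        below it later would otherwise be fallen into */
        }
    }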
smart-ass comment: you should put two breaks in for when some programmer removes one of them six months from now. ;)
edit: i know it's not totally analogous since removing the exit and not noticing there's no break is a lot more likely than just randomly removing a break, but the "let's prevent someone clumsy from screwing this code up in the future" argument always makes me laugh a little.
Sure, you could have pid = fork(); switch (pid) if you prefer. I find that the inline assignment-and-condition style is clearer since it reads to me as "switch on the result of fork, and cache that value somewhere" whereas the separate statements read to me as "call fork" and "switch on the process ID".
daemon(3) isn't part of POSIX, and the OS X man page says "the use of this API is discouraged in favor of using launchd(8)", so who knows what might happen here in the future.
If you don't care about portability, it's an easier call to make.
I'm kinda surprised that the assignment-during-test thing doesn't generate a warning in the case of a switch statement. It does in other cases (if and while do under gcc and clang, at least).
Assignment-during-test is flagged because it's common for a typo to conflate assignment and equality-testing. Equality-testing is very common in an if or while, so an assignment inside an if or while has a fairly high likelihood of actually being a mis-typed equality-test. Switch, on the other hand, is very rarely used with an equality-test (since equality-test only returns true or false, why would you use a switch when an if would suffice?). So an assignment inside a switch is much less likely to be a mis-typed equality-test.
Careful with that example. The parentheses here don't just remove the warning. They remove a bug. This code:
if (buf = malloc(buflen) == NULL)
goto outofmemory;
is actually equivalent to that code:
if (buf = (malloc(buflen) == NULL))
goto outofmemory;
So, malloc gives you a pointer, which is compared to the null pointer, giving you either 0 or 1. And that is assigned to buf. Hopefully your compiler will warn you about the type error that spawns from such dark magic.
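The corrected version, for completeness:

    if ((buf = malloc(buflen)) == NULL)
        goto outofmemory;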
I was originally going to comment saying exactly that — under gcc and clang, at least, just the extra parens will disable the warning; no comparison necessary — but thought to test that the warning is generated in case of a switch statement (and is then disabled by the extra set of parens), which led to realizing the behavior is different.
I was curious too about my own code, so I looked at the last publicly available C code I wrote with fork in it[1], and yep I checked for the error case too :)
I think this just came from being drilled in school on the importance of checking error return values. No matter how unlikely never assume that something can't happen. If it really can't you should at the very least assert on it.
Just noticed that in Perl the behavior is slightly different: http://perldoc.perl.org/functions/fork.html unsuccessful fork() returns undef, effectively stopping you from kill-ing what you don't want to kill.
Similarly, Python throws an exception, and I bet other languages have their own behaviors, but in the case of C this is the only way (or at least the only non-complicated way to do it).
When I read this article I thought it was preaching to the choir. I'm actually quite surprised people programming C don't check for errors. That's the only way the functions can provide feedback.
Nothing in C forces the API designer to use -1 as "bad PID" in one place and as "the set of all PIDs" in another, however. Perl's undef isn't that different from returning, gosh, -2 or any other bloody number except -1 in C.
fork() returns pid_t type which is usually mapped to int32_t. For this type there's no equivalent of Perl's "undef", the -1 is standardized as an error in all system calls that return an integer.
As for the argument why not send -2 instead, well guess what? Other negative values also have a meaning. Negative values in kill send signal to a process group instead of a process.
It's not libc's responsibility to predict all possible things the programmer can do. Also, unlike Perl, C doesn't have exceptions, so it can't exactly quickly terminate on error showing what went wrong.
Imagine C throwing SIGSEGV every single time a function failed.
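For reference, the full set of special cases for kill(2)'s pid argument, as a sketch (the process-group example at the end is illustrative):

    #include <signal.h>

    /* kill(pid, sig) dispatch, per POSIX:
     *   pid >  0  -> the single process with that pid
     *   pid == 0  -> every process in the caller's process group
     *   pid == -1 -> every process the caller is allowed to signal (the footgun)
     *   pid < -1  -> every process in the process group whose id is -pid
     */
    int hup_group(pid_t pgid)
    {
        return kill(-pgid, SIGHUP);   /* signal the whole group pgid */
    }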
> As for the argument why not send -2 instead, well guess what? Other negative values also have a meaning. Negative values in kill send signal to a process group instead of a process.
That's the problem there. Kill takes an argument that's either a process id or a magic number or a different magic number or.... Those should be different functions, and the special cases like "kill all processes", "kill all processes in this group",... should be some kind of enumeration type. But it's C, so...
Interestingly, the kill procedure in Perl explicitly checks for non-numbers, which is unusual. Perl as a language would just cast undef to zero in a numeric context[1], and 'kill $signal, 0' would go on killing every process in the process group.
$ perl -we 'kill 9, undef'
Can't kill a non-numeric process ID at -e line 1.
[1] unless you "use warnings FATAL => 'uninitialized';"
A null pointer does not need to have an all-zero representation, which mattered once on some architectures. I'm not aware of any current architecture that does it that way, but that's still what the standard says, which is to say that C does technically have a notion of null which is not necessarily identical with 0.
"For practical purposes, the null pointer and the integer `0` are one and same."
For most practical purposes, which is why I said that it wasn't a terrible approximation. However, they can be distinguished on some architectures:
intptr_t x = 0;
void *p = 0;
x == *(intptr_t*)(char*)&p;
I can easily construct fantasy scenarios (involving more than a bit of Doing It Wrong) where this would be relevant. I'm not convinced it couldn't ever be relevant without Doing It Wrong, if one in fact needed to work on that kind of a system.
I wish posix_spawn were ubiquitous; it's a much better process-launching interface than fork: it's naturally race-free and amenable to use in multi-threaded programs, and unlike fork(2), it plays well with turning VM overcommit off. (If overcommit is off and a large process forks, the system must assume that every COW page could be made process-private and reserve that much memory. Ouch.)
Unfortunately, posix_spawn is woefully underpowered. I can't make the child process a session leader (setsid) or process group leader (setpgrp). I can't set a working directory. Etcetera.
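For contrast, the basic shape of a posix_spawn call (a sketch; /bin/echo and its arguments are arbitrary):

    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void)
    {
        pid_t pid;
        char *argv[] = { "echo", "hello from the child", NULL };

        /* NULL file_actions and attributes: inherit everything from the parent. */
        int rc = posix_spawn(&pid, "/bin/echo", NULL, NULL, argv, environ);
        if (rc != 0) {                               /* returns an errno value, not -1 */
            fprintf(stderr, "posix_spawn: %s\n", strerror(rc));
            return 1;
        }
        waitpid(pid, NULL, 0);
        return 0;
    }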
The role of posix_spawn is for spawning "helper processes", not starting new process groups. This should overwhelmingly be your common launch case. posix_spawn is so much faster (on BSD/Mac anyway) that fork should be outright avoided.
Processes requiring other permissions can/should be spawned by asking systemd/cron/init/launchd to launch them for you.
And extensions to let us specify failure-case or other behavior of those extensions, and so on. But at that point, we're already heading down the road of implementing a tiny DSL for "the program that posix_spawn() should run after creating the new process but before exec()ing the new executable". Why not simply write that code in the host language? You could specify a thunk of code to be sent to the new process and executed there. Oh, you'll also need to pass any data structures that code relies on--- pass a closure, not just a thunk. And garbage-collected references to any system objects that those data structures rely on. Congratulations, now you have fork()! If you squint a little, that's exactly what fork() provides you --- a closure and continuation.
Yes, you end up with a tiny DSL for specifying transformations to make to a child process: the key difference is that the kernel can execute this DSL much more efficiently than it can code written in the host language: fork closes over the entire world, and posix_spawn doesn't have to do that.
Even if VM overcommit is enabled, you shouldn't be forking from large processes, because the necessary kernel VM manipulation will kill your performance.
It's portable enough: Linux, Darwin, and the BSDs all support it. Darwin also supports posix_spawn natively. Even Cygwin supports it, although Cygwin's vfork is currently just an alias for fork.
I seem to remember Cygwin having all sorts of weird problems with fork behaviour, but I'm no C programmer let alone doing stuff on Windows, so I might be remembering incorrectly
This to me is a good example of why exceptions in modern languages are a good way to handle errors. In this case the user has basically ignored the error return from fork() and then accidentally used it in kill().
If fork() had thrown an exception for an unexpected failure then the user could not have accidentally ignored it in the same way.
I realize that this is not appropriate for a system call but it seems like a good example of why handling errors using exceptions is helpful sometimes.
Standard file handles are another thing you should not assume are there (though I'm not sure how to test for it programmatically).
We once had a user that, for whatever reason, tweaked their Unix installations to not pass an open stderr to processes - they just got stdin and stdout (that is, file handles 0 and 1, but not 2). If you wrote to stderr anywhere in your program, it wrote to whatever was open on handle 2, which was not a stderr that the OS passed in.
Yeah, that's a pretty insane thing to do, but somebody was doing it...
Wow that's a really wild story, but I think it's also pretty different. fork() returning -1 is defined behavior, as is true for many functions. Whereas not having stderr open defies everything about the C standard I/O. (K&R B1, 7.5 & 7.6)
Note however, that strictly speaking stderr does not have to be 2. It can be any number, but it has to be whatever the include file says it is, so if you don't specify the stream as stderr but instead write to stream 2, that would potentially be a problem.
That said, if for some reason you have a program completely defying the C standard, you can test whether the streams are open (and they are explicitly defined as having to be open) using fcntl and testing for EBADF at the very beginning of the program.
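That check would look something like this sketch (stderr_is_open is an invented helper):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* True if file descriptor 2 is open at all. */
    int stderr_is_open(void)
    {
        return fcntl(STDERR_FILENO, F_GETFD) != -1 || errno != EBADF;
    }

    /* Defensive programs go one step further and open /dev/null onto any of
       fds 0/1/2 that are missing, so later open() calls can't land there. */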
> The following symbolic values in <unistd.h> define the file descriptors
> that shall be associated with the C-language stdin, stdout, and stderr
> when the application is started:
>
> STDIN_FILENO
> Standard input value, stdin. Its value is 0.
> STDOUT_FILENO
> Standard output value, stdout. Its value is 1.
> STDERR_FILENO
> Standard error value, stderr. Its value is 2.
You are not correct here: There are no fds in the C standard. The only thing defined is fopen/fread/fwrite (which are FILE*). The open/read/write API is only defined by POSIX, where 2 is most definitely stderr.
I cited the exact portions of K&R that specify stdin, stdout and stderr in my original post: K&R B1, 7.5 & 7.6
Section 7.6 explicitly says all three must be open when the program begins. There are equivalent sections in the formal standards, I cited K&R because it was on my shelf.
Please stop and bother to check your facts before continuing. You're very close to being accurate (POSIX does define open, but C defines stderr and some functions to print to it, the underlying mechanics are the choice of the implementing system.) But don't you think it would be nice to check that I actually am before writing yet another post simply asserting I'm wrong?
You are wrong because you keep asserting that file descriptor #2 is not defined to be standard error. The only place that defines the interface that uses file descriptors is POSIX and it defines standard error to be file descriptor #2, end of story.
The sources you cite say nothing of file descriptors; they are all references to the standard FILE* interface in C. Those are opaque pointers and have nothing to do with 0, 1, or 2.
You may be confused because I abbreviated "standard error" as "stderr", yet I was never talking about the C standard global "FILE *stderr". That was sloppy of me.
When you printf to stderr, nothing in that function call is dependent on POSIX semantics. It is dependent on the semantics of C, which requires a stream called stderr to:
1) exist
2) be open when the program starts
POSIX implements this using file descriptors and specifies that stderr is 2.
I said: "strictly speaking, stderr does not have to be 2" which is true. A system is welcome to implement file descriptors and make stderr's something other than 2. It will be a blatant violation of POSIX, but complying with POSIX is optional. Systems that don't just aren't POSIX systems. Complying with the C standard? Not really optional.
> If I was arguing that it was reasonable, I'd have said that. I merely stated C allows it. Which it does.
That's the thing. The C standard does not allow it. The C standard merely never mentions it, which is a different thing entirely. Hence my analogy to JPEGs (which the C standard never mentions either).
And I meant the argument itself is unreasonable. It makes no sense! I could just as well argue that the TCP/IP RFCs allow fd 2 to be something other than standard error. Or the HTTP 2.0 spec. Or the Ecmascript spec. Arguing that something is allowed by a spec that never mentions it and has absolutely nothing to do with it is not an argument.
Back in the day I had a Motorola Atrix (remember those? First dual core Android phone, best thing since sliced bread, abandoned by Motorola a few months after launch?). Well, one of the ways to root it was to keep forking a process until the phone ran out of memory. After fork failed, you were left with a process that for some reason was running with root privileges...
Just as a reminder:
"So, malloc on Linux only fails if there isn’t enough memory for its control structures. It does not fail if there isn’t enough memory to fulfill the request." - http://scvalex.net/posts/6/
Malloc can also fail if you're out of address space without necessarily being out of memory.
NT does a much better job of separating these concepts than Unix-family operating systems do. Conceptually, setting aside a region of your process's address space and guaranteeing that the OS will be able to serve you a given number of pages are completely different operations. I wish more programs would use MAP_NORESERVE when they want the former without the latter. (I'm looking at you, Java.)
One day, perhaps when I am old and frail, we will achieve sanity and turn overcommit off by default. But we're a long way from being able to do that now.
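A sketch of the "address space without a backing guarantee" case that MAP_NORESERVE covers (the 1 GiB size is arbitrary; MAP_NORESERVE and MAP_ANONYMOUS are Linux-isms):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)1 << 30;   /* 1 GiB of address space, arbitrary */

        /* Ask for address space only; pages are committed on first touch. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED)
            perror("mmap");
        return 0;
    }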
These days, I describe malloc() as "a function which allocates address space", to avoid confusion. Which means it makes sense that malloc() returns NULL if you are out of address space, even if you have lots of memory. (But so many people don't check malloc()'s return anyway...)
This is a mostly-untrue statement because it makes unreliable assumptions about the host system. It depends on the vm.overcommit_memory setting and the programmer should never make assumptions about why or when malloc might fail. Read more on Rich Felker's excellent blog post here: http://ewontfix.com/3/
Another is if you are allocating lots of memory with alternating mprotect() permissions. On some systems (AIX for example) this uses up all of the memory for control structures WAY before hitting the address space limit (I've seen it fail after just a couple of GB).
The problem is that C and POSIX don't (idiomatically) provide rich enough data types to force checking the error condition. The fact that the error is signaled by a random integer (-1) is horrible. In a language with stronger types and richer data structures, one can have a return type that is a disjunction of {failure, parent, child}, so you can never accidentally treat a failure as a PID.
In a functional language this would look like a datatype (essentially a generalized enum), while an OO language you would use different subclasses of a common superclass.
In C you can return a tagged union and check the tag, but nothing forces you to do the check. A user of this API can just go ahead and assume the success branch of the union. Furthermore, this isn't idiomatic POSIX, so it is never done.
You could return a pointer or null. That would successfully force the "did it succeed" check, but of course raises questions about memory management.
Tagged union is probably the best approach. It doesn't prevent skipping the check, but it at least makes "thing I am supposed to use" different than "thing I am supposed to check".
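A sketch of what that tagged union could look like (fork_result and fork_checked are invented names):

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    enum fork_tag { FORK_FAILED, FORK_IN_CHILD, FORK_IN_PARENT };

    struct fork_result {
        enum fork_tag tag;
        union {
            int   error;    /* valid when tag == FORK_FAILED    */
            pid_t child;    /* valid when tag == FORK_IN_PARENT */
        } u;
    };

    struct fork_result fork_checked(void)
    {
        struct fork_result r = { 0 };
        pid_t pid = fork();
        if (pid < 0) {
            r.tag = FORK_FAILED;
            r.u.error = errno;
        } else if (pid == 0) {
            r.tag = FORK_IN_CHILD;
        } else {
            r.tag = FORK_IN_PARENT;
            r.u.child = pid;
        }
        return r;
    }

Nothing stops a caller from reading r.u.child without looking at the tag, which is exactly the "nothing forces you to do the check" caveat above.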
"Unix: just enough potholes and bear traps to keep an entire valley going."
If you don't understand how to use sharp tools, you may hurt yourself and others. Documentation for fork() clearly explains why and when fork() returns -1. Those that find the man page lacking or elusive may get more out of an earnest study of W. Richard Stevens' book, Advanced Programming in the UNIX Environment. In any case, every system programmer should own a copy and understand its contents.
> If you don't understand how to use sharp tools, you may hurt yourself and others.
It's still bad API design when naively handling an error case kills everything. Is there an inherent reason that the error value for a pid has to be the same as the "all pids" value? Unless there's a very compelling reason, it seems like very poor design, well documented or not.
The inherent reason is that -1 is the most common error return code in C-based APIs. The problem is not naively handling an error case, it's not handling an error case. Using a different value might avoid the accidental kill(-1, ...), but the program would still be incorrect.
This is the same sort of argument as strlcat vs strncat, and people can't agree on that one.
I'd argue every programmer. It's such a fundamental part of computers & operating systems that key concepts will come up again and again. Just the other day I wanted to learn about Docker/CoreOS/etcd only to realize that I have an embarrassingly lacking understanding of how UNIX works. I immediately went to the library to pick up this book and begin fixing a flaw of mine (even as a web developer).
Meh. Not all programmers are on a UNIX system. Not all programmers are even on UNIX || Window.XX.
But even if there was only UNIX... the entire point of a well designed system is to allow users of the system to reason about it on a high level, not a domain-expert level or even domain-intermediate level. As programmers we reason about code without worrying too much about gate layout on silicon. As non-system programmers we should likewise not need to worry about shoddy OS design.
err_sys() is something you have to provide yourself, though, which takes you out of the flow of the code you're thinking about, which is half the reason people don't write error handling in the first place.
Once I discovered the existence of the BSD err()/errx()/warn() functions, though, the error handling in even my quick one-off programs became much better and more informative.
    pid_t child = fork();
    if (child < 0) {
        err(1, "fork");
    } else if (child == 0) {
        /* ... in child ... */
    } else {
        /* ... in parent ... */
    }
is idiomatic, quick to write, and produces useful error messages when that "throwaway" program starts failing years later.
See /etc/security/limits.conf and nproc and "fork bomb"
Aside from intentional fork bombs, I've seen this done deliberately in the spirit of an OOM killer, to keep a machine alive for debugging / detection of the problem: 100 "whatever" processes will kill this webserver, making it impossible to log in and diagnose, much less fix, so we'll limit it to 50 processes in the OS.
I've also seen it in systems where people are too lazy to test if a process is running before forking another and the system doesn't like multiple copies running (like a keep alive restarter pattern). If ops has no access to the source to fix that or no one cares, then just run it in jail where you only get two processes, the restarter-forker and the forkee. Then hilarity can result if the restarter thinks the PID of the failed fork means something, like sending an email alert or logging the restart attempt. "Why are my logs now gigabytes of ERROR: restarted process new pid is -1?"
The first time I learnt of fork (from an OS book), the example had three branches to the if statement after fork - and the first tested for a negative pid. I suspect that the reason this link has 400 odd upvotes is because more people aren't learning OS the correct way in the beginning. Or maybe my OS book was nice. IDK.
This. One of the most useful classes I took in undergrad was implementing a scheduler and an I-node disk. Nothing shows you all the ways fork() can fail like having to implement it.
Every system call can fail, even if it doesn't do something obvious like use disk resources. Ignoring this is how subtle bugs appear that seem unreproducible until you implement correct error handling.
If it's possible for the call to fail to return an object, you declare its type as optional (add a ? to it) and then the calling code has to explicitly "unwrap" the return value to get to the actual object - they can't simply go
var pid=fork()
pid.kill()
as that would be a compiler error. Instead, they need to go
pid!.kill()
The idea is that they should check for the pid object being nil before doing the unwrapping. Of course, it's still possible for the coder to ignore that (just as it's possible, and depressingly common, for coders to catch and then ignore exceptions), but that's going to be a conscious decision because the compiler is telling them that there's a possible error condition here.
Exceptions disrupt the program flow at any place, including constructors and destructors.
It's not easy to guarantee that destructors deallocate exactly the resources that were allocated in the constructor when both of them can stop executing at any point. Finally clauses are technically enough, but then every allocation needs the same level of manual attention that non-memory resources (e.g. connections, files) get in other languages.
I'm going to answer for C++, since as far as I know it's the only major language with exceptions and RAII. Correct me if I'm misunderstanding your post.
> It's not easy to guarantee that you deallocate on destructors exactly the resources that were allocated at the constructor when both of them can stop their execution at any time.
I disagree; let's take this one case at a time to keep it simple:
1. Destructors: within C++, if you're in a destructor, the object was fully constructed, and thus you know the exact set of resources requiring destruction. It is idiomatic C++ that a destructor should not throw; I'll discuss why below.
2. Constructors: these certainly can throw at any moment, as resource acquisition is often fraught with failures. That said, idiomatic C++ provides mechanisms (RAII, such as std::unique_ptr) to manage the partially constructed set of resources in a constructor, such that if something goes wrong, they will be automatically released by virtue of the variable going out of scope. Once you have the resource acquisition completed, you transfer ownership of the objects to the object you're constructing, which is practically guaranteed to be exception-free, since it's usually just moving a pointer under the hood.
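A small sketch of one shape of that idiom, with invented names (Connection, Socket, LogFile): each acquired resource is held in a std::unique_ptr during construction, so a throw partway through cleans up whatever was already acquired.

#include <memory>

struct Socket  { /* stand-in for a real resource */ };
struct LogFile { /* stand-in for a real resource */ };

class Connection {
    std::unique_ptr<Socket>  sock_;
    std::unique_ptr<LogFile> log_;
public:
    Connection()
        : sock_(new Socket()),   // if this throws, nothing has been acquired yet
          log_(new LogFile())    // if THIS throws, ~unique_ptr releases sock_ for us
    {
        // If anything in the body throws, both fully constructed members
        // are destroyed automatically during the unwind.
    }
    // ~Connection() releases both; assuming release can't fail, it never throws.
};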
> Finally clauses are technically enough
I don't really think you can both stand by the fact that destructors can throw at any moment and that finally clauses are enough, without making what amounts to an apples to oranges comparison. Take, for example, this function, where we assume releasing a resource can fail:
void Foo() {
    SomeResource resource;
    // Assume the destructor of SomeResource can fail/throw.
    // Other actions take place here, some of which may raise/throw.
}   // <-- resource is destructed here, even during a stack unwind
In this example, if the other actions throw an exception that causes Foo to itself abort, then SomeResource resource must be destructed. If we're assuming that destructor can also throw, we've now got two exceptions, and how do you handle two exceptions? (It's language dependent. Some discard an exception, some chain them, some, like C++, just terminate.)
If we translate this to using some sort of "finally" construct, say in a garbage collected language:
def foo():
    resource = acquire_some_resource()
    try:
        ...  # other actions that may raise/throw
    finally:
        resource.release()  # but we're assuming this can also raise/throw
You still have the same problem at the resource.release(): up to two exceptions can occur at a given point in the program, and you then need to know what your language does in that situation.
The general gist of this is that if the "release" of some generic resource can fail, then you have to make harder decisions about what happens during a stack unwind due to some other error because now you have two errors. Do you ignore it? Log it? (can you log it?)
If releasing a resource cannot fail, destructors (and finally clauses in languages lacking RAII-style resource management) cannot fail.
"It is idiomatic C++ that a destructor should not throw;...which is practically guaranteed to be exception-free..."
"Should not", "practically". Your confidence is overwhelming. :-)
Exception safety in C++ may not be quite as much of a black art as it once was (say, before std::unique_ptr), but it is still something the programmer has to do, actively.
"If releasing a resource cannot fail, destructors (and finally clauses in languages lacking RAII-style resource management) cannot fail."
Yupper.
I'm probably unqualified to have an opinion on this, but I believe that the entire hatred for checked exceptions in Java comes from that general piece of idiocy and specifically from JDBC's urge to possibly throw a SQLException from close(). (Just what the hell is anyone supposed to do with that?)
For a long time, Java simply lost information when a second exception was thrown during stack unwind (say, from a finally block): the second exception replaced the first and was the one propagated. With try-with-resources in modern Java, the original exception is the one propagated, and the exception from close() is attached as a suppressed exception and printed in the stack trace.
It's still not particularly pleasant, but it is at least survivable and no information is lost.
And what if $programmer forgets to check what's in err? What would pid contain in that case?
I mention this because the syntax you quoted looks like Go's.
So then I'm guessing that Go would simply ignore the error in this case.
However, having a proper exception mechanism, if you don't catch the problem, then it bubbles up, and the program doesn't continue with wrong data (which is a good thing => fail fast!).
Or a general "Choice" sum, perhaps using phantom types so int<err> isn't compatible with int<pid>. But then all of a sudden, instead of a single word being returned, a tag and possibly variably-sized result has to be returned, and that's quite a hassle which doesn't fit well with C.
A "Choice" (Either in Haskell, Result in Rust) wouldn't work for fork() as it can have 3 results, and you'd want the `Child` case cleanly and easily separated from `Pid`.
I think the parent meant a sum type in general, not a concrete example like Haskell's Either, which indeed wouldn't suffice here. You would define a new, dedicated sum type for this case, with separate constructors for failure, the child, and the parent - something like the sketch below.
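As a rough illustration, here is a hand-rolled tagged struct in C++ (nothing standard; ForkResult and checked_fork are invented names) that keeps the three outcomes cleanly separated:

#include <sys/types.h>
#include <unistd.h>
#include <cerrno>

struct ForkResult {
    enum class Kind { Failed, InChild, InParent } kind;
    pid_t child_pid;   // meaningful only when kind == InParent
    int   error;       // errno value, meaningful only when kind == Failed
};

ForkResult checked_fork() {
    pid_t pid = fork();
    if (pid < 0)  return {ForkResult::Kind::Failed,   -1,  errno};
    if (pid == 0) return {ForkResult::Kind::InChild,  -1,  0};
    return               {ForkResult::Kind::InParent, pid, 0};
}

// Callers then switch on checked_fork().kind and cannot confuse
// "I am the child" with "fork failed".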
Actually, if you want to not handle an error, you have to do either _ = err or data, _ = doStuff(), both of which are very visible. You can basically scan your codebase for _'s and find all the unhandled errors. And if you don't do something with a variable, e.g. I do myInt := 1 but never use myInt, Go simply refuses to compile.
Technically true, but most of the (admittedly modest amount of) Go code I've seen has that error punt all over the place. I really wish they'd learned from C and either banned assigning errors to _ or raised a compile error if the next line wasn't an error check.
On second thought, it'd probably avoid some nasty production failures if that behaviour was true for everything which can return an error.
If you have set a non-root user's process limits correctly, sending SIGKILL to all of that user's processes is likely a perfectly fine response to their fork() failing.
If you haven't limited the number of processes a given non-root user can start to some value the machine can handle, sending SIGKILL to all of the user's processes is probably not going to do any more damage.
If a program running as root doesn't correctly handle fork() failing, someone needs to be taken out back and beaten with a stick. Maybe the person who wrote the program, maybe the person who ran it as root. But somebody.
This reminds me of the time I was telnet'd (since SSH wasn't a thing at the time) into a remote SunOS/Solaris server. At the time my only Unix experience was with Linux.
"killall -9 httpd" gave an unhelpful error message. "killall httpd" also gave an unhelpful error message. "killall", which would give you usage instructions in Linux, killed all processes on the system. Reading this article makes me figure that killall was likely a frontend to kill(-1, ...).
That day I learned a valuable lesson about reading man pages and understanding that not all unixes are the same.
That's funny... I almost always try "command --help" first if I'm not sure. Of course some may point out "man command" but I always find man painful, and revert to google.
I just recently finished a multithreaded program where I found that the only effective way to obtain the pids [on Linux: getpid()] of the child processes I spawned was to have them report over a common non-blocking pipe [fcntl(pipefd[1], F_SETFL, O_NONBLOCK)].
In other, more humorous words: as a "parent," it's great to know what your "child" is doing, or in this case who your child is (the actual pid), instead of just sending them SIGTERM.
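The comment above is terse, so here is a minimal reconstruction of the pattern as I read it - hypothetical, not the poster's actual program: each child writes its own pid into a shared pipe whose write end is non-blocking, and the parent drains the pipe to learn who its children are.

#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>

int main() {
    int pipefd[2];
    if (pipe(pipefd) < 0) { std::perror("pipe"); return 1; }
    fcntl(pipefd[1], F_SETFL, O_NONBLOCK);     // the call quoted above

    for (int i = 0; i < 3; ++i) {              // spawn a few children
        pid_t pid = fork();
        if (pid < 0) { std::perror("fork"); break; }
        if (pid == 0) {                        // child: report own pid, then exit
            pid_t me = getpid();
            write(pipefd[1], &me, sizeof me);
            _exit(0);
        }
    }

    close(pipefd[1]);                          // parent: so read() eventually sees EOF
    pid_t reported;
    while (read(pipefd[0], &reported, sizeof reported) == (ssize_t)sizeof reported)
        std::printf("child reported pid %ld\n", (long)reported);
    while (wait(nullptr) > 0) { }              // reap the children
    return 0;
}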
Is there any clean way to use an Option/Maybe monad in C (or C++)? It would be a simple way to solve problems where error codes are also valid inputs to other functions.
The simplest way I can think of is:
struct maybe {
    bool isEmpty;
    void *value;
};
Although I wonder if using C++ templates, classes and operator overloading is possible to make a more practical implementation (using void* does seem like a bad idea).
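As a hedged sketch of the templated version being wondered about (Maybe and find_index are invented names; this is roughly what std::optional later standardizes, trimmed down for illustration):

#include <cassert>
#include <utility>

template <typename T>
class Maybe {
    bool has_;
    T value_;                      // only meaningful when has_ is true
public:
    Maybe() : has_(false), value_() {}
    explicit Maybe(T v) : has_(true), value_(std::move(v)) {}
    explicit operator bool() const { return has_; }
    const T& get() const { assert(has_); return value_; }
};

// Usage: return Maybe<T> instead of overloading -1/NULL as "error".
Maybe<int> find_index(const int *a, int n, int needle) {
    for (int i = 0; i < n; ++i)
        if (a[i] == needle) return Maybe<int>(i);   // 0 is a perfectly valid answer
    return Maybe<int>();                            // "not found", distinct from index 0
}

// if (Maybe<int> m = find_index(xs, n, 7)) { use(m.get()); }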
The problem is solved, but consider the overhead of all this. You now need an extra byte somewhere (register, stack space, whatever) to track the tag. I would imagine that this, and lack of handy syntax, is why lower-level APIs don't offer better return values.
Nitpick: std::optional<T> does not exist in C++11, nor in the recently-approved C++14. It is, nevertheless, in the process of getting into the standard, and is already present in GCC 4.9's libstdc++ as std::experimental::optional.
Unfortunately `std::optional` did not get accepted for C++14, because its implementation uses some hacks that are undefined behavior by the standard (but work on the major compilers).
I don't use fork() that often, but my own paranoia is why I always test for <= 0 instead of == 0. Some people think I'm weird for doing something like:
if len(some_list) <= 0:
# Test for empty list
But it's just my way of covering my ass in case the laws of physics change during execution, or just in case weird bugs exist like those found in this article.
Careful: when signed and unsigned types meet in a check like that, C will happily and silently convert one to the other. Some functions intentionally return very large values to indicate success, and your program might end up invoking its error-handling code on what was actually a success.
There's no good substitute for reading the RETURN VALUE(S) section of the manpage for every function and testing appropriately.
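A tiny made-up illustration of that pitfall (get_handle is a hypothetical API; the wrap-around to a negative value is what you see on typical two's-complement platforms):

#include <cstdio>

unsigned int get_handle() {   // hypothetical API: returns a token, never "fails"
    return 0xFFFFFFF0u;       // a perfectly valid, very large token
}

int main() {
    int h = get_handle();     // silently converted: h is now negative on mainstream platforms
    if (h <= 0) {
        std::puts("error path taken -- but the call actually succeeded");
    }
    return 0;
}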
If you're not sure whether len() can return -1, and in turn what that value means, then you can't know that your code is any more correct.
In fact, this is going to be worse than an equality comparison, because instead of code that clearly doesn't handle a corner case, you have code that lies about which values it can correctly handle. That makes it much harder to debug.
You only need 3 cases if you need to distinguish between the parent and the child. I can imagine designs where the parent and child are both going to exec the same program and so all you need to check on the fork is for success or failure.
"Thankfully that case will never happen because you can't be a child if there was an error. :)"
Uh, no. If there is an error, there will be no child process but the parent process will think it is the child.
From the fork man page, emphasis mine: "On success, the PID of the child process is returned in the parent, and 0 is returned in the child."
"Also the check as presumably written would never miss an error, it would just potentially assume valid return values were also errors."
That was already discussed as a possibility; I was addressing the other. In the case you describe the software would never work at all, even when fork successfully forks, because the child will always think there was an error and presumably fall over rather than getting things done. That's probably the better case, in terms of development progress, because it would be spotted and fixed right away. But hopefully fixed correctly, and not converted to the broken-but-working-when-fork-succeeds other variant that also uses "<= 0".
> just in case weird bugs exist like those found in this article
The article does not describe weird bugs - the behaviour it describes in fork() and kill() are by design, and well-documented. The real lesson here is to RTFM and understand what return values you get under what circumstances.
Yes, fork can fail, and we ran into this a few years ago at MemSQL. The problem was that MemSQL would allocate a lot of memory and Linux wouldn't allow such a large process to fork. A remedy is to create a separate process and talk to it via TCP; this small, low-memory process is then responsible for the fork/exec work.
Why doesn't it use vfork() for fork+exec? AFAIK vfork() doesn't clone the allocated memory of the parent process and is only useful for calling exec immediately after forking.
http://linux.die.net/man/2/vfork
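For reference, a minimal sketch of the vfork()+exec pattern that man page describes ("ls -l" is just a placeholder command; posix_spawn() packages the same idea more safely). After vfork() the child borrows the parent's address space and the parent is suspended, so the child may only exec or _exit.

#include <unistd.h>
#include <cstdio>

int main() {
    pid_t pid = vfork();
    if (pid < 0) {
        std::perror("vfork");
        return 1;
    }
    if (pid == 0) {                         // child: exec immediately
        execlp("ls", "ls", "-l", (char *)nullptr);
        _exit(127);                         // only reached if exec failed
    }
    // Parent resumes here once the child has exec'd or exited.
    return 0;
}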
Is there a test command that asks the operating system to run a program but cause the nth fork to fail? I would be more diligent about writing code that handles rare errors if I could create test cases. Writing code that I cannot test feels wrong.
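I'm not aware of a standard tool that does exactly this, but on Linux one way to build such a test harness yourself is an LD_PRELOAD shim that counts fork() calls. Everything below (the file name fail_fork.cpp, the FAIL_FORK_AT variable) is invented for illustration.

// fail_fork.cpp -- hypothetical shim: make the Nth call to fork() fail.
// Build: g++ -std=c++11 -shared -fPIC fail_fork.cpp -o fail_fork.so -ldl
// Use:   FAIL_FORK_AT=3 LD_PRELOAD=./fail_fork.so ./program_under_test
#include <dlfcn.h>        // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)
#include <sys/types.h>
#include <cerrno>
#include <cstdlib>

extern "C" pid_t fork(void) {
    static int fail_at = std::getenv("FAIL_FORK_AT")
                             ? std::atoi(std::getenv("FAIL_FORK_AT")) : -1;
    static int calls = 0;                     // not thread-safe; fine for a test
    if (++calls == fail_at) {
        errno = EAGAIN;                       // pretend the process table is full
        return -1;
    }
    using real_fork_t = pid_t (*)(void);
    static real_fork_t real_fork = (real_fork_t)dlsym(RTLD_NEXT, "fork");
    return real_fork();
}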
I'm a teaching assistant for an OS course, and I grade projects. I constantly remind students to check the return values of their system calls; it's the most common issue in their code.
If I set off something that proves to be a fork bomb as an unprivileged user, kill -1 might be a reasonable approach. And note that kill exists as a builtin in bash, so you can run it without forking a process.
Hopefully doesn't happen often, but potentially very useful in narrow circumstances. The problem is it being -1, not it being available. If it took an argument to kill that you'd never accidentally generate as a PID then it might as well be a different function. Of course, with negative numbers otherwise referring to process groups, there's not a lot of room remaining, so yeah...
Haha too cruel. I remember in one of my early college CS classes the professor told us about fork bombs and wasn't sure if they still worked on the Sun Fire servers we were using. About 5 minutes later someone piped up and confirmed that yep they still worked at bringing the server down. :)
The parent comment said "as soon as it is able to fork a new process", which still isn't quite right (it's the child that quits, as you say), but that's not the error you seem to be responding to.
And this is why I think Go and its multiple return values are the way of the future. In Go you are required to handle errors. If you want them to go away you have to explicitly use an _, and that's really easy to find in the code and shame the person who did it. Nothing fails silently. Nothing fails via the primary return value. It is such greatness it is hard to express.
But as a language construct and the agreed-upon way of returning errors, it's much more powerful. With a returned structure you can just ignore the error field. With Go's separate error return, you have to explicitly handle it or explicitly discard it; quietly dropping it while using the other return value is a compile error.
That's why you read the manpage on a function before you apply it rather than just cutting-and-pasting the first bit of code google returns when you search for 'fork example unix'.
(In this particular case that actually returns (for me) a bit of code that gets it right.)
Your username is offensive. I'd like to take what you say seriously, and perhaps even engage in further conversation about the topic at hand, but ... you know ... you look stupid. Grow up. You should be embarrassed.
That's understandable, but please don't feed the trolls.
When you see a comment that is truly bad for HN, it's best to flag it. You can do this by clicking on "link" to go to the item page for the comment, then "flag". We monitor those flags and take action based on them.