I think it's a shame (and painful) how the major Unices have all significantly diverged on things like event handling, I/O multiplexing and file system watching.
I find the BSD kqueue(2) interface to be much more elegant than any of Linux's *fd() functions or epoll(7). On the other hand, Linux's inotify(7) is, I find, cleaner to use for file system event notifications than kqueue(2). But then there's also fanotify(7) on Linux, which seems to have sort of been neglected since its initial hype. Then OS X has FSEvents.
I guess this is why one may need libraries like libevent and libuv.
I think the kqueue interface is a lot worse than the Linux way. With kqueue everything is forced through one interface. In Linux it's just file descriptors and epoll to deal with them. It is a nice extension of the "everything is a file" mechanism and it embodies the "do one thing and do it right".
This becomes apparent when you look at fs notifications. You've already mentioned how inotify is superior to kqueue. I mean seriously, the kqueue fs notifications are designed in such an incredibly bad way, it almost looks like a trolling attempt. E.g., when you try to watch a directory the kernel knows which files are manipulated. But the API has no way of telling you. The documentation usually suggests keeping a file list and updating it. But this is complicated and racy and will certainly result in buggy behaviour.
And that really shows that having one interface do everything is a bad approach. The kqueue designers had to cover every event type that could happen. And thus they added an API for something they apparently didn't understand properly. And since it's stuck now in their API they would have to deprecate parts of it, if they ever have the interest in fixing it.
The inotify API is not perfect. But it does its job pretty well and it is the best fs notification API I have seen. fanotify solves a different use case. And that's really the flexibility that the Linux interface has. They can easily come up with new event APIs for new use cases because all they have to do is provide a file descriptor.
I wish the BSD folks would simply implement inotify. But it seems they are unwilling to fix their API and if they did then they'd probably design their own interface simply out of spite. Right now their crappy fs notification interface is a real pain. And since the lowest common denominator is very influential when it comes to portability libraries like glib, Qt, etc. everybody is suffering because of BSD. And they only get away with it because OSX has the same API. I haven't looked at FSEvents. Is it a new API or simply based on kqueue?
EVFILT_VNODE is not complicated to use for the cases for which it was actually designed — monitoring a file descriptor. For more general file system monitoring, a new kqueue filter type would be ideal, but adding that does not require breaking or deprecating kqueue API; the whole point of kqueue is to provide a generic extensible event mechanism.
Adding a recursive path monitoring filter is as simple as:
- Defining a new `EVFILT_DIR` filter type.
- Accepting a file path or st_dev/ino_t pair via the generic `uintptr_t ident` kevent identifier.
- Providing extended data via the existing filter-controlled `intptr_t data` value.
Boom. Done. Using the nice, generic, well-designed kqueue() API that can also monitor file descriptors, processes, AIO events, signals, timers, and user-defined events.
I wish Linux — like Mac OS X, and all the BSDs — had adopted kqueue, or at least participated in the conversation. Instead, Linux went through 2-3 different mechanisms before finally settling on the odd-duck single-purpose inotify interface, despite kqueue's design having been published (http://people.freebsd.org/~jlemon/papers/kqueue.pdf) and released as part of FreeBSD 5 years earlier.
While Jon Lemon published a detailed paper covering kqueue's design, implementation, and performance benchmarking, the inotify developer published a 30 line README with erudite gems such as "Rumor is that the 'd' in 'dnotify' does not stand for 'directory' but for 'suck.'": https://www.kernel.org/pub/linux/kernel/people/rml/inotify/R...
If you begin your comment with "Bah. This is complete nonsense." then you should at least address all of my points...
Especially when you confirm one thing I've said in your first sentence. EVFILT_VNODE's design is too limited. It can watch directories (which after being opened are just a file descriptor). But it doesn't give you the information you need although it is available when the event is generated. The only explanation for this I can think of is that the API designers wanted to cover every event even if they didn't understand the particular use case. Which is exactly what's going to happen when you try to design an API in such a way.
Yes, you could change kqueue and add another event type. But then why keep EVFILT_VNODE, except for legacy reasons?
You complain about nonsense and then you compare a paper covering kqueue, which again is an API trying to do everything, to the documentation of inotify, which only covers a single class of events. And then you are disingenuous enough to only look at the README and not at the provided manpages and other material. inotify actually has pretty good documentation. But if you want to compare the 30 lines README to the kqueue paper ... well the kqueue paper covers EVFILT_VNODE with 19 lines, 7 of which simply list the possible actions.
inotify is single purpose because it follows the "do one thing and do it right"-principle. The Linux folks had the freedom and time to come up with a good enough API because inotify wasn't forced into epoll, which itself follows the "do one thing and do it right"-principle.
Meanwhile the kqueue designers came up with an fs notification API that is even inferior to what w32 offers...
Why do you keep ascribing EVFILT_VNODE's limitations to the fact that the kqueue API is well designed and extensible? The inotify designers had 5 years to figure out the constraints and implications of file system notifications -- of course that gave them the opportunity to explore a different approach. Imagine that, instead of ignoring the hard problem of generic event APIs and NIH'ing a bad one-off hack, they'd adopted kqueue and extended it with an EVFILT_DIR? The whole industry would have benefited.
Instead, they did the standard Linux thing of hacking together an incoherent bolt-on API because to do otherwise would require careful consideration and thought.
Blaming the BSDs for not adopting Linux's hack is nonsense.
I have explained it several times now: The Linux devs had 5 years to come up with the inotify design because their API is more flexible. If you design things along the lines of "do one thing and do it well" instead of "let's cram everything into it" then you gain that flexibility. That's where the kqueue API fails and that's why it is not as good as the Linux approach. You talk about "extensible". But extensible really means touching the kqueue API because that's the only way to extend it. On Linux epoll is far more extensible because it just means adding a new descriptor type.
I don't know why you claim that Linux ignored the hard problem of generic event APIs. Unlike kqueue, Linux has a truly generic event API. That's what I've been explaining since the beginning.
Even if Linux had added kqueue and added an EVFILT_DIR then there would have been no guarantee that the BSDs would adopt the same API. There is no committee in charge of kqueue. The licenses are not portable and even if they'd agree on being compatible for kqueue then this would touch parts beyond it. And then there is the huge question of why a directory and a file should be treated differently, when the only difference is the provided metadata.
Apple apparently has added a completely different approach to fs notifications. Based on a device and a daemon. Why didn't they just add an EVFILT_DIR, if that's as easy as you claim? Why didn't the BSDs do it?
Claiming that inotify is a hack or incoherent is just beyond ridiculous. Inotify is far better than anything kqueue offers. So if anything then kqueue is a hack and badly thought out...
If the BSD folks would simply adopt inotify then it would be one less painful thing to deal with when porting applications to BSD. All I hear nowadays is BSD folks complaining that the Linux folks are doing things differently and BSD has a hard time getting all the software to run they want. So adopting inotify would make it easier for the BSDs. In the end I don't care. But at least they should come up with a somewhat decent API. Doesn't matter if it's inotify or some NIH'ing.
You're absolutely 100% full of crap. I already explained how kqueue's flexibility means that an EVFILT_DIR can be added easily, and work with all other event mechanisms on the system.
> The Linux devs had 5 years to come up with the inotify design because their API is more flexible.
That doesn't even begin to make sense. There's nothing inherent in the kqueue API that mandates that all filter types be defined upfront and never be extended or modified.
> On Linux epoll is far more extensible because it just means adding a new descriptor type.
How is that more extensible? The whole point of kqueue is that there exists a class of event types for which a read-based FD API is not the most efficient mechanism for event dispatch.
> Apple apparently has added a completely different approach to fs notifications. Based on a device and a daemon. Why didn't they just add an EVFILT_DIR, if that's as easy as you claim?
Corporate developer politics. The people working on kqueue and the people working on Spotlight (the primary FSEvent consumer) were in two separate universes.
> Why didn't the BSDs do it?
It clearly hasn't been a priority for anyone; kqueue meets most needs just fine.
You are not helping your point by insulting me. I know that EVFILT_DIR can be added. I never disputed that. But it needs changes to the kqueue API. And apparently it is not so easy because nobody has done it and even operating systems using kqueue have opted to do it differently. You can talk all day about how things could be done theoretically. But that doesn't matter if reality looks so much different. Maybe you should ask yourself why EVFILT_VNODE was so poorly designed.
If by "changes to the kqueue API" you mean "adding a new filter type, as kqueue was designed to support", then how is the kqueue API poorly designed? Why should the BSDs adopt a completely separate inotify hack?
"100% full of crap" is an accurate assessment when you keep trotting out completely nonsense arguments as fact instead of actually engaging on the technical content; consider it my adopting of Linus' discussion style. Is there some specific problem that can't be solved within the confines of the existing kevent types and data structures?
This seems pretty simple:
- EVFILT_VNODE isn't poorly designed for the use-case of monitoring a file.
- EVFILT_VNODE is poorly designed for the use-case of monitoring a directory hierarchy, but that isn't the intended use case.
Nowhere have you demonstrated why the BSDs should drop kqueue in favor of Linux's hack; as for why it hasn't been implemented: because apparently no other kernel developers have needed it badly enough.
Mac OS X FSEvents as a counter-example doesn't hold technical water; the reasons for FSEvents being separate from kqueue are solely to do with Apple's internal political machinations, and you've provided no technical analysis that demonstrates otherwise.
There's been talk (at BSDCam) of implementing inotify which is needed for the Linuxulator and exposing it in the FreeBSD API too. So your wish isn't so far fetched but I'm not sure anyone is actively working on it. Patches welcome :)
"It is the 2013th year in the Common Era at the moment of this writing and you might think that people should have came up with something better in terms of signal handling at this time. The truth is that they did. It is just not that well known yet due to a huge momentum of outdated information still overflowing the Internet."
Then they talk about kqueue.
The above paragraph emphasizes how much Unix programming still relies on old ideas.
Likewise, sockets are an old idea, and newer libraries, such as ZeroMQ, do a lot to fix the old problems. ZeroMQ is often described as "Sockets on steroids". It implements a lot of patterns that Unix itself does not give us:
"It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply. "
But I am left wondering, what would the world of programming be like if we had a new operating system that incorporated some of the new ideas and patterns that have developed over the last 25 years? Instead of depending on libraries, would we not be better off if we had an OS built around these newer ideas?
It's been tried. Look at plan 9, or MS Singularity, or heck even BeOS was a clean break in many respects, with certain things implemented at low level and used throughout the OS.
Turns out compatibility and understandability are really useful, and it's good to define your complex systems in terms of simple systems. Worse is better.
We wouldn't be better off, no. The newer OS would likely be less secure, less reliable, and probably less performant. It would also require all of the tools we take for granted to be ported to the new OS, which would be difficult since the above proposal is to change the API of the new OS.
It doesn't seem like there's anything wrong with libraries.
I don't see why it would be less secure, reliable, or performant. It might be that the new ideas, and new abstractions, make it harder to do things in insecure or unreliable ways.
Reliability problems: You're going to be porting a lot of programs to this hypothetical OS. The porting process is going to be difficult and error-prone by definition, because the proposal was to make significant changes to the kernel API. This leads to...
Security problems: When you change the fundamental assumptions of a program, you open them up to security flaws. I am primarily referring to ported programs when I say "the new OS will have security problems."
Performance problems: The performance will be worse than existing OS's, because drivers that work for Linux probably won't work for this new OS. This means you're going to have to get by with slower video drivers, at least initially.
These three types of problems aren't impossible to overcome. I doubt 2,000 years from now that people will be using Linux, BSD, or Windows. But you will have to overcome them.
I don't understand your point; kqueue is a comparatively new idea.
Do you want signals written as messages across ZeroMQ? In that case, look at Mac OS X's use of Mach ports for exception handling, which is an even older idea than kqueue.
The idea behind signals is to implement a user-space, soft-IRQ-like communication channel between processes.
But we trigger them the wrong way.
It should not be syslog doing kill -HUP on daemons to make them release a file. Instead, the daemon should implement a callback like "on_file_destroy", and the event should propagate up to the root processes until something explicitly stops it from bubbling further.
The problem with that approach, simple as it is, is that it requires a careful definition of the available callbacks.
As far as "modern signal handling" goes: On OS X, you can also just create a dispatch source of type DISPATCH_SOURCE_TYPE_SIGNAL. If your code already uses dispatch, this is much easier than trying to wire up to kqueue directly.
Why does the signal handler have to interrupt a thread? Why doesn't the OS just create a new thread to run the signal handler in? This would avoid this whole reentrancy problem and make the concurrency explicit.
Only a subset of signals come from mapping CPU faults to signals, the so-called synchronous signals. This example was about SIGINT coming from the usual source (Ctrl-C from terminal).
It's actually possible to handle it in a thread using existing semantics without kqueue or signalfd: block SIGINT from all threads except your dedicated SIGINT-handling thread.
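A rough sketch of this in Python, assuming a POSIX system. Note that it uses a close variant of what's described: the signal is blocked in every thread, and the dedicated thread picks it up synchronously with sigwait() instead of running an asynchronous handler. SIGUSR1 stands in for SIGINT so the demo is harmless, and `start_signal_thread` is a made-up name.

```python
import os
import signal
import threading

WATCHED = {signal.SIGUSR1}  # stand-in for SIGINT; an assumption for this demo


def start_signal_thread(on_signal):
    """Block WATCHED in the calling thread, then start a thread that waits for it.

    Threads created afterwards inherit the blocked mask, so only the
    dedicated thread ever receives these signals.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, WATCHED)

    def wait_once():
        signo = signal.sigwait(WATCHED)  # sleeps until one of WATCHED is pending
        on_signal(signo)                 # runs in ordinary thread context

    thread = threading.Thread(target=wait_once, daemon=True)
    thread.start()
    return thread
```

This has to be called before any other thread is started, otherwise a thread with the signal still unblocked can receive it instead.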
That requires that you have control over all threads, which is just not something that can be guaranteed with modern operating systems and libraries that regularly spawn their own threads.
Thread-interrupting delivery of signals is necessary in a world where you're relying on that interruption of your single thread; a replacement that worked for the async signals is useful, but distinct from the needs of sync signal handling.
That doesn't actually work 100%: signals can be sent to the process as a whole or to a specific thread. The SIGINT-handling thread will be able to handle signals sent to the process, but not to any other thread of the program.
Threads are fairly expensive - you have to allocate and map a new stack, as well as a bunch of bookkeeping data structures in the kernel. While the kernel could of course start an auxiliary thread at program initialisation, and re-use it for all signals, this does complicate the runtime environment a bit. And what happens if the signal-handling-thread triggers a signal?
I suppose the Unix tradition encourages giving the user the ability to centralise signal handling in a thread if he wishes (using kpoll or signalfd or whatever you want), but not forcing any overhead upon the program.
I looked around but couldn't find a great answer on how to do this correctly in Python on Linux. I have a daemon that attempts to shut down cleanly when it gets a number of signals and it appears to be working correctly. But, it's written like the first example.
Python does the right thing under the hood for you. The signal handler you register in Python isn't run in the dangerous context. There's an internal signal handler which just tells the interpreter a signal happened and when control is returned back to the interpreter outside of the signal handler it knows what to do:
It is possible to write thread-safe (and portable) signal handlers doing exactly what you've done. Have the handler set a flag. That's it. Then, outside the handler, periodically check that flag and do the real work when it changes.
The main reason I prefer this approach is that sigaction is in the POSIX standard but not the C or C++ standards. sigaction isn't available on Windows, but signal is (then again, very few signals are available on Windows).
I still don't understand how queueing signals works.
For example, signalfd() is really cool, and you can indeed read() a signal from it, but only one of a kind.
If you get, say, two SIGINTs, you may well only be able to read() one. I guess it's an implementation detail that some signals aren't queued while others are. In practice it means that receiving a signal is an edge, and in the signal handler you must check how many times the signal occurred. To me this sounds inherently racy.
You don't check how many times the signal occurred - you write your code so that it doesn't matter how many times the signal occurred.
For example, when you're handling a SIGCHLD you don't assume exactly one child has exited - you loop around calling waitpid() with WNOHANG and handle every child that's exited.
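Something like the following, as a sketch in Python (`reap_children` is a hypothetical helper name; you'd call it from your main loop after noticing a SIGCHLD):

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) pairs; exit_code is None for a
    child that did not exit normally (e.g. it was killed by a signal).
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped
```

One SIGCHLD, or ten, or none at all: the loop handles whatever has exited by the time it runs.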
Note that the standard signals are not queued: a signal handler for a specific signal might be run just once if several signals are sent between the first signal arriving and the signal being handled.
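You can watch the coalescing happen with a small experiment. This is a sketch that assumes Linux (sigtimedwait() isn't available everywhere) and uses SIGUSR1 as an arbitrary standard signal; `count_delivered` is a made-up name.

```python
import signal
import threading


def count_delivered(sends, signo=signal.SIGUSR1):
    """Send `signo` to this thread `sends` times while it is blocked,
    then count how many instances can actually be dequeued."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signo})
    try:
        for _ in range(sends):
            signal.pthread_kill(threading.get_ident(), signo)
        delivered = 0
        # timeout=0 polls: sigtimedwait returns None once nothing is pending
        while signal.sigtimedwait({signo}, 0) is not None:
            delivered += 1
        return delivered
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

On Linux I'd expect `count_delivered(5)` to come back as 1: the five sends collapse into a single pending signal.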
In my experience it usually means that you need some form of time reversal e.g. unsend an email or put a BEGIN TRAN before the SQL you just executed.
Reasonably sure that can't be done in C even with all that undefined behaviour.
However, if your IO monad implementation is sufficiently slow, you could ^C between when you thought the code had executed and when it actually bothered to get around to it.
I can't believe that none of the functional ninjas thought to add time travel as a benefit. Most remiss...
Haskell perpetually astonishes me. Not only is it, more or less single-handedly, keeping the Holy Wars alive, it now appears to supply punchlines to jokes. Amazing.
Although I cannot recall any specific name right now, I have certainly seen programs handle a single Ctrl-C differently from, say, two Ctrl-C in quick succession.
Example: Pressing one Ctrl-C will display a message like this on some programs (which do not want to terminate themselves upon receiving a single SIGINT, because accidents happen): "Press Ctrl-C again to quit."
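A sketch of how such a handler could be written in Python. The class name, the two-second window and the injectable clock are all my own assumptions, not taken from any particular program.

```python
import time


class ConfirmQuit:
    """Treat a second interrupt within `window` seconds as a request to quit."""

    def __init__(self, window=2.0, clock=time.monotonic):
        self.window = window
        self.clock = clock        # injectable so the timing can be tested
        self.last_press = None
        self.should_quit = False

    def __call__(self, signum=None, frame=None):
        now = self.clock()
        if self.last_press is not None and now - self.last_press <= self.window:
            self.should_quit = True
        else:
            print("Press Ctrl-C again to quit.")
        self.last_press = now
```

You'd register it with `signal.signal(signal.SIGINT, ConfirmQuit())` and have the main loop check `should_quit`. Printing from the handler is tolerable here only because Python runs it outside the real signal context; in C you'd set a flag or use write() instead.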
I've always found it interesting that the list of safe functions includes read(), write(), and most of the filesystem operations; so while printf is not safe, you could use write() if you really wanted to output.
this seems like yet another case where a thread safe lockless queue would be good. setting a volatile atomically is fine for the simplest cases, but what if you want to handle every signal raised independently and not just do something if any signal is raised?
the real answer here i think is to think about code being run concurrently and how to do that safely... where a signal handler is a special case that, actually, requires no extra special treatment.
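One existing take on the queue idea is the self-pipe trick, which Python exposes through signal.set_wakeup_fd(). A minimal sketch, assuming Python 3.8+ on a POSIX system; `open_signal_pipe` and `drain` are names invented for this example.

```python
import os
import signal


def open_signal_pipe(signals):
    """Route `signals` into a pipe and return the read end.

    Each delivered signal shows up as one byte holding its number.
    Must be called from the main thread.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)  # set_wakeup_fd requires a non-blocking fd
    for signo in signals:
        # A Python-level handler must be installed for the byte to be written;
        # it does nothing here because the pipe carries the information.
        signal.signal(signo, lambda signum, frame: None)
    signal.set_wakeup_fd(write_fd)
    return read_fd


def drain(read_fd):
    """Return the signal numbers queued in the pipe so far."""
    try:
        return list(os.read(read_fd, 4096))
    except BlockingIOError:
        return []  # nothing has arrived since the last drain
```

The read end can sit in the same select/epoll/kqueue loop as everything else. The kernel can still merge repeated standard signals before the handler ever runs, so this keeps different signals apart but does not count identical ones.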