
How about some discussion of Fuchsia itself, instead of "why reinvent the wheel" or "Linux is bloated"?

From my reading so far, it looks like Fuchsia takes some of the better parts of the POSIX model, the way file descriptors can be used as capabilities, and extends that usage more consistently across the API so that it is used in a lot more places. In Fuchsia they are handles, which are arbitrary 32 bit integers, but they act a lot like somewhat richer, more consistent file descriptors. You can clone them, possibly with limited rights like read only, you can send them over IPC channels, etc.
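
To make that concrete, here's a rough sketch of what that looks like (hedged: this is my reading of the Zircon C API, vmo and channel are just stand-ins for handles you already hold, and exact signatures may differ by version):

    // Duplicate a handle, stripping it down to read + transfer rights.
    zx_handle_t readonly;
    zx_handle_duplicate(vmo, ZX_RIGHT_READ | ZX_RIGHT_TRANSFER, &readonly);

    // Hand the restricted handle to whoever holds the other end of the
    // channel; they can read the object but can never write to it.
    zx_channel_write(channel, 0, NULL, 0, &readonly, 1);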

There are some differences, in that there's no absolute path handling directly in the kernel; instead, you always open relative to some particular directory handle. A process may be given a root directory handle and a current working directory handle, which can be used to emulate POSIX-style path resolution, though there may not be a guarantee that one contains the other; it sounds like more commonly applications will just be given handles for the files or directories they are supposed to access, rather than having everything go through path resolution.
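
The closest POSIX analogue is openat(): open relative to a directory fd instead of walking an absolute path. A tiny illustrative sketch (the paths and the idea that a program gets handed a data directory at startup are my own assumptions, not from the Fuchsia docs):

    #include <fcntl.h>

    // dirfd stands in for a directory handle a program would be given at
    // startup; everything it opens is resolved relative to that.
    int dirfd = open("/data/my-app", O_RDONLY | O_DIRECTORY);
    int fd = openat(dirfd, "config/settings.json", O_RDONLY);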

Signal handling is done consistently with waiting for data on handles; handles just have various states they can be in, like ready for reading, ready for writing (for file handles), running or stopped (for process handles), etc, and you can wait for changes of state on any of a variety of different handles.
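
Roughly (again, my reading of the Zircon API, so names may be slightly off), waiting for a state change looks like a single-handle poll():

    // Block until the channel has data to read or the other end is closed.
    zx_signals_t observed;
    zx_object_wait_one(channel,
                       ZX_CHANNEL_READABLE | ZX_CHANNEL_PEER_CLOSED,
                       ZX_TIME_INFINITE, &observed);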

Memory mapping can be done by allocating a virtual memory object, which you could either not map (treat it as an anonymous temporary file), write to, and then pass to another process, or you could map it into your process, manipulate it, clone the handle, and pass that to another process. Basically seems like a cleaner design for shared memory handling than POSIX, though something a lot like it can be done in Linux these days with anonymous shared memory and sealing.
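
For reference, the Linux analogue mentioned above looks something like this (a minimal sketch; the function name and sizes are mine):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    // Create an anonymous, sealable shared-memory object (Linux >= 3.17).
    int make_sealed_buffer(size_t size) {
        int fd = memfd_create("shared-buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
        ftruncate(fd, size);
        // ...fill the buffer with write() here, then seal it so neither the
        // size nor the contents can change under the receiver's feet...
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_WRITE | F_SEAL_SEAL);
        return fd;  // pass to another process over a Unix socket
    }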

Jobs, processes, and threads are also all handles. Jobs contain processes and other jobs, and processes contain threads. Jobs group together resource limitations (things like limits on numbers of handles, limits on total memory used, bandwidth limits, etc), processes are separate address spaces, and threads are separate threads of execution in one address space. The fact that jobs and processes are all handles, instead of IDs, means that you don't have to worry about all of the weird race conditions of trying to track things by PID when that PID may no longer exist and could be reused in the future.

An interesting part is how program loading happens. In POSIX-like OSes, you fork your process, which creates a clone of the process, and then exec, which asks the kernel to replace the running program with one loaded from another file. You give the kernel the path to a file, and the kernel loads the dynamic linker, which links the shared libraries together and then executes the result. In Fuchsia, you just build the new address space in the parent process, and then ask the kernel to start a new process in that address space, with execution starting at a particular point in it and some parameters loaded into particular registers. This basically means that dynamic linking is now done by a library call in the parent process, which could be really advantageous for those processes that spawn the same executable as a subprocess many times, as they can link the executable once into some read only pages, and then very quickly spawn multiple processes from that same already linked program. I'm sure that ld.so and friends on Linux and other POSIX-like OSes have a lot of caching optimizations to make this faster, but it sounds to me like the Fuchsia model of just having the parent process do the linking as a library call could be a lot faster.
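
A very rough sketch of the shape of this, with the usual caveat that these are the Zircon syscall names as I understand them, that job, entry_point, stack_top, and bootstrap_channel are placeholders, and that the real work of mapping ELF segments into the new address space is what the launchpad library does for you:

    // Create an empty process (and its root VMAR) inside a job, plus its
    // first thread. Nothing runs yet; the address space is a blank canvas.
    zx_handle_t proc, vmar, thread;
    zx_process_create(job, "worker", 6, 0, &proc, &vmar);
    zx_thread_create(proc, "main", 4, 0, &thread);

    // The parent maps the already-linked executable and a stack into vmar,
    // then points the first thread at the entry point and hands the new
    // process a bootstrap channel as its first handle.
    zx_process_start(proc, thread, entry_point, stack_top,
                     bootstrap_channel, 0);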

(edit to add: hmm, upon further reading, it looks like they expect process creation to happen from a single central system process, rather than providing the dynamic linker API, "launchpad", as a supported API; but for now it looks like you can use the launchpad library)

It basically looks a lot like what you would wish the POSIX API looked like, designed with a lot of hindsight. A lot simpler and more consistent, and does a much better job of "everything is a file" than the POSIX API ever did (of course, it's "everything is a handle," but that's fine, the point is that there's one consistent way to work with everything).




>A lot simpler and more consistent, and does a much better job of "everything is a file" than the POSIX API ever did (of course, it's "everything is a handle," but that's fine, the point is that there's one consistent way to work with everything).

I am failing to see how this is more consistent. With UNIX, because everything is like a file, you operate on them in the same manner. A file, a socket, a pipe, shared memory, ... you open them, then you use the system calls for operating on files: read(), write(), poll(), dup(),... which then allow you to use operations built on these syscalls such as fprintf, fscanf,... but also all the tools like cat, head, grep,... This is what I would call consistency.

If I implement a new feature as a file in Linux, for example a virtual filesystem like /proc/, all the cited operations would already be available out of the box.


But this is how Fuchsia is as well; these handles are pretty much equivalent to file descriptors, except for how they get numbered/allocated (though for C library compatibility, there is a per-process file descriptor table to map between file descriptors and handles).

Even on UNIX-like systems, you can't read or write on every file; for instance, you can open a directory, but you can only readdir on it, not read from it. But directories are still file descriptors like everything else, so you can call dup(), fstat(), pass them between processes on Unix sockets, etc.
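
(For comparison, passing a descriptor to another process on UNIX is the slightly awkward SCM_RIGHTS dance; a minimal sketch, with sock assumed to be a connected Unix domain socket:)

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    // Send fd to the process at the other end of a Unix domain socket.
    static void send_fd(int sock, int fd) {
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
        struct msghdr msg = { 0 };
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
        sendmsg(sock, &msg, 0);
    }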

There are plenty of other operations which can only be done on certain types of files in UNIX-like systems; for instance, you can only recv() or recvmsg() on a socket.

The difference is that in Fuchsia, more things have handles, and so more things can be treated consistently. For instance, jobs, processes, and threads all have such handles; so instead of getting a signal that you have to handle in the extremely restrictive environment of a signal handler, or having to call wait4() to learn about the status of a child process, you can just wait for signals to be asserted on the child process handle using zx_object_wait(), which is the equivalent of select() or poll(). This means no more jumping through hoops to get signal handling to work with an event loop; it just works.
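
Concretely, something like this (names per the Zircon docs as I read them; child_proc and channel are handles you'd already hold):

    // A child process handle drops straight into the same wait call as any
    // other handle, so "SIGCHLD handling" is just another loop event.
    zx_wait_item_t items[] = {
        { .handle = child_proc, .waitfor = ZX_PROCESS_TERMINATED, .pending = 0 },
        { .handle = channel,    .waitfor = ZX_CHANNEL_READABLE,   .pending = 0 },
    };
    zx_object_wait_many(items, 2, ZX_TIME_INFINITE);
    if (items[0].pending & ZX_PROCESS_TERMINATED) {
        // query the child's return code here instead of calling wait4()
    }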

Of course, the other difference in Fuchsia is that there is not a single namespace. Every component in Fuchsia has its own namespace, with just the things it needs access to; there is no "root" namespace. This is good for isolation, both for security reasons and reducing accidental dependencies, though I do wonder how much of a pain it would make debugging and administering a system.


My point was that with UNIX, while you have specialized operations like recvmsg, you still have read() and write() acting as a universal interface. If you look at the Fuchsia system calls, you see

    vmo_read - read from a vmo
    vmo_write - write to a vmo
    fifo_read - read data from a fifo
    fifo_write - write data to a fifo
    socket_read - read data from a socket
    socket_write - write data to a socket
    channel_read - receive a message from a channel
    channel_write - write a message to a channel
    log_write - write log entry to log
    log_read - read log entries from log
It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.


Hmm. On UNIX read() and write() are not universal; you can't use them on directories, for instance, nor can you use them on various other things like unconnected UDP sockets.

Treating everything like an undifferentiated sequence of bytes can cause impedance mismatches; each of these types of handles has very different ways that you work with it. For instance, a VMO is just a region in memory. A FIFO is a very small queue for short equally sized messages between processes. A socket is an undifferentiated stream of bytes. A channel is a datagram oriented channel with the ability to pass handles. The log is for kernel and low level device driver logging.

In fact, it looks like the Zircon kernel has no actual knowledge of filesystem files or directories; they are actually channels that talk to the filesystem driver (another userspace process) over a particular protocol.

The thing about having one single universal interface like read() and write() to a lot of fairly different things is that they each actually support different operations; you can't actually cat or echo to a socket (not without piping into nc, which does that for you). Or you can't just echo data into most device files and expect it to work; some of them you can, like block devices, but others you need to manipulate with ioctls to configure properly.

What Fuchsia is doing here is acknowledging the different nature of the different types of IPC mechanisms, and so giving them each APIs that better match what they represent. A VMO can be randomly read and written to; none of the others can. A FIFO can only accept messages in an integral number of equal size pieces that are smaller than the FIFO size, which is limited to a maximum of 4096 bytes; it is used for very small signals in conjunction with other mechanisms like VMOs. A socket provides the traditional stream abstraction, like a pipe or SOCK_STREAM on UNIX, in which you can read or write new data but can't seek at all. A channel provides datagram based messages along with passing handles.
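
To illustrate the channel case (a sketch only, based on my reading of the syscall docs; chan and peer are the two ends of an existing channel, and vmo_handle is some handle being granted to the peer):

    // A single datagram that carries both bytes and a handle.
    char request[] = "here is a buffer for you";
    zx_channel_write(chan, 0, request, sizeof(request), &vmo_handle, 1);

    // The receiver gets the bytes and the handle back in one read,
    // with message boundaries preserved.
    char buf[64];
    zx_handle_t handles[4];
    uint32_t nbytes, nhandles;
    zx_channel_read(peer, 0, buf, handles, sizeof(buf), 4,
                    &nbytes, &nhandles);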

One of the big things that I think the Unix model makes hard is telling when something is going to block; because read and write assume that the file is one big undifferentiated blob of bytes, it can be hard to tell when it's safe to do so without blocking. On the other hand, each of these is able to have particular guarantees about what you can do when they report that there is space available.

I admit that the log ones seem redundant; I would think they would make more sense as just a particular protocol over channels. I don't see any reason for that one to exist separately.

I wonder why you would think it would be better to have one interface that isn't an exact match for a lot of different IPC types, than separate specific interfaces that match them? They are all tied together by being handles, so you can dup them, send them to other processes, and select on them just the same, but the read and write operations behave quite differently on each so having an API that reflects that seems reasonable.

If you like to think in object oriented terms, think of them as subclasses of handle. If you like to think in terms of traits or interfaces, think of there being one generic handle interface, plus specific operations for each type of handle.

The "every thing is a file, and a file is an undifferentiated bag of bytes" is in some ways a strength of UNIX, but in other ways a weakness. You then have to build protocols and formats on top of that, kernel buffer boundaries don't necessarily match up with the framing of the protocol on top, and so on.

And all it takes to give you the power to manipulate things in the shell is appropriate adapter tools. Just like nc on UNIX allows you to pipe in to something that will send the data out on a socket, you need some adapter programs that can translate from one of these to another (and from filesystem files, since those don't even exist at this abstraction level); of course, in many cases, you're probably going to need some serialization format for things like channel datagram boundaries, and there are some things that just can't be translated from a plain text bag of bytes (like handles).


But you can't actually treat everything like a file, so you have to turn to ioctl and whatever structure is actually related to what you're working with in reality.

In a sense, we only treat everything as a file the same way some languages have a toString method for everything. You can get something out of it no matter what it is, but there's no guarantee that something is going to be in any way useful and you still have to interact with it in a way that doesn't treat it like a generic string.


The issue that GP was pointing out is that in Linux, everything is not a file. And the inconsistencies (in Linux or the POSIX model; take your pick--I'm not interested in splitting those particular hairs) are a hassle to program around. For example:

- Non mmap'd memory: what do you do with memory retrieved by [s]brk(2)?

- Signals: are they files or not? Signalfd is provided in addition to lots of other complicated facilities for handling roughly the same things. And that's before getting into realtime signals and the like.

- Timers/events: are they files? vDSOs? Available via files? No? Depends on which syscalls/APIs you're using.

Additionally, many of the non-file-ish APIs in POSIX implementations are a massive hassle to use across multiple threads. Even the file-based APIs for eventing (select/poll/epoll) have substantially different behavior with regards to edge/level triggering, and different behavior when used across multiple processes (shared via fork or non-CLOEXEC handles) or threads. Can you master them and use them correctly? Sure. But there are so many ways to achieve very similar things, and, while each method has its niche, there's no unifying thought model that ties them together in the same way that "everything is a file" was promised to tie together UNIX operating systems.
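
The signalfd case is a good illustration: you can fold signals into the fd world, but only if you remember the extra step of blocking normal delivery first, which is exactly the kind of per-facility quirk that makes the model feel bolted on. A minimal sketch:

    #include <signal.h>
    #include <sys/signalfd.h>

    // Turn SIGCHLD into something poll()/epoll can wait on.
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);         // must block it first
    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);  // read siginfo from sfd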

Oh, and /proc is being moved away from/found to have limitations because it's a file descriptor exhaustion hot-spot, among other things. So the plan9-style of "expose the internals as a filesystem" appears not to be the route that the Linux community is pursuing.

TL;DR if you're writing simple, non-concurrent standalone utilities or working at a very high level then sure, you can pretend everything is a file. Below that, or in more complex applications, those abstractions break down in surmountable-but-annoying ways.


Yes, this is exactly what I was getting at.


Thank you for writing this, it was just what I wanted :)

What is your sense of the latency guarantees/scheduling to user space graphics and input? This is an area that everything else kind of fails at. "No, drawing the ground plane and every possible frame for this user you have in your VR grasp is not optional."

Is the security actually in use or just someday?

edit: security looks baked in but tbd


No idea about the latency guarantees or scheduling. I've mostly just read and summarized some of the overview docs, and there's been nothing written on scheduling or latency that I've found yet.

It looks like the isolation of different applications, and capability based security, is pretty baked in. There do seem to be some TBD parts; for instance, right now, just like on Android, apps can either have access to all of /data or none of it, and fixing that is something they list as wanting to do but haven't yet.


Almost none of what you discuss is novel. As others have pointed out, this was the sort of innovation that people were coming up with in the 1980s, and has long since been expressed in systems like GNU Hurd and Windows NT.

Multiple signallable states per handle, rather than a single bit's worth of "signalled", is slightly novel. It's a fairly obvious generalization of WaitForMultipleObjects once one realizes that there are multiple ways in which one can wait on a handle. It's one that I implemented in a hobby operating system about 10 years ago, and I certainly didn't consider it groundbreaking.

That program loading mechanism isn't novel, contrastingly. Again, I did much the same for my hobby operating system. It's almost an inevitable design given a desire to support an API with "spawn" (as opposed to "fork") semantics. And it has been in Windows NT all of these years, which from the start had to provide underlying mechanisms to support both models as required by its POSIX, OS/2, and Win32 subsystems. One can create a process object as a blank canvas and then something that has the handle can populate it with what are to be its program images; or one can create a process object that is "copy constructed" (as it were) complete with program images from an existing process. Your consequent forking-of-a-spawned-template idea for worker processes is interesting, but consider that an operating system capable of doing that has existed for a quarter of a century now, and people haven't really made use of it in the real world. Not even the Cygnus people. (-:

More interesting from the operating systems design perspective are things that you've overlooked. One of the things that also happened in the 1980s was a reinvention/replacement of the concept of a POSIX terminal. OS/2 got some VIO, MOU, and KBD subsystems, and the applications-software level concepts of directly-addressable video buffers and queues of mouse/keyboard input events that encompassed all of the keys on a keyboard including function/editing keys. Windows NT took that further with its "console" model, unifying mouse, keyboard, and other input events into a single structure and unifying input and output (albeit with some kludges under the covers) into the waitable handles model. GNU Hurd contrastingly retains the POSIX terminal model, albeit that it is all implemented outwith the kernel, without even a pseudo-terminal mechanism or a line discipline within the kernel, and the console dæmons do have cell arrays that could in principle be accessed by applications softwares.

* http://jdebp.eu./FGA/tui-console-and-terminal-paradigms.html

* http://jdebp.eu./FGA/hurd-daemons.html

* http://jdebp.eu./Softwares/nosh/user-vt-screenshots.html

It's worth considering what design choices Google et al. have made in this area.

Then there are other lessons to learn from Windows NT. WaitForSingle/MultipleObjects having more than 1 bit's worth of "signalled" is one improvement that hindsight yields, as I have mentioned. Another is the lesson of Get/SetStdHandle. The Win32 API only supported three handles. And so language implementations that wanted to provide POSIX semantics in their runtime libraries had to implement the same sort of bodges with "invisible" environment variables that they did to make it appear as though there was more than a single current directory. They implemented their own extensible Get/SetStdHandle mechanism in effect, which only worked for coöperating runtime libraries, when it would have been far better for this to be provided by Win32.

* http://jdebp.eu./FGA/redirecting-standard-io.html#SystemAPIW...

* https://github.com/open-watcom/open-watcom-v2/blob/e5a5cab04...

* https://unix.stackexchange.com/a/251215/5132

* https://unix.stackexchange.com/a/413225/5132

Again, it's worth looking to see whether Google et al. have learned from this and provided a common language-neutral descriptor-to-handle mapping mechanism, and a means for that table to be inherited by child processes.



