I know, but why doesn't the OS do this too? Most programmers don't care whether their "thread" is an actual thread on the machine or just a "fiber". If the benefits are this large, the OS should provide it, in my opinion.
OS scheduling requires switching between different privilege modes on the CPU. Essentially, the OS has the power to perform operations that regular code does not, and threads and processes are usually built on top of these additional permissions.
Context switching is relatively expensive. The CPU needs to push a lot of application state out of the way to clear the path for the OS code to run, then after the OS is done, push the OS out of the way and retrieve the application's state and code again.
Whereas a fiber remains entirely inside the application code. It never requires a context switch. But a fiber loses some of the powers of the OS: for example, it can't draw hard memory boundaries between fibers that will be enforced by the CPU.
For some things you want processes, for some threads, for some fibers.
Context switching doesn't need to be as expensive as it is in these cases, it's just that Linux doesn't provide the mechanisms needed to make it more efficient. See, e.g., https://blog.linuxplumbersconf.org/2013/ocw/system/presentat... which implemented a syscall for a process-directed context switch directly to another thread.
It’s the OS context switch that makes process threads so much more costly than user-scheduled fibers. You cannot involve the OS if you want efficiency.
The OS doesn't know the details of what's specific to a given "thread", so it has to take a "big dumb sledgehammer" approach to switching tasks. It has to switch out the whole stack (4k or 8k) even when only a few bytes belong to that particular task, because it has no way of knowing which bytes those are. And it has to interrupt tasks at essentially arbitrary times rather than waiting for them to yield, because, again, it has no way of knowing where the yield points are. That in turn means the tasks need extra synchronization overhead etc. to work around the fact that they might be preempted at any point.
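To illustrate the yield-point contrast: here's a minimal sketch of a cooperative userspace scheduler, where each task runs to an explicit yield point and simply returns control, so no preemption and no synchronization are needed. This is purely illustrative (the Task interface and counter tasks are made up, not any real runtime's API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CoopSched {
    // Each task runs until its next yield point, then returns
    // true if it has more work. No preemption, no OS involvement.
    interface Task { boolean step(); }

    static Task counter(String name, int limit) {
        int[] n = {0};  // the few bytes of state this "fiber" actually owns
        return () -> {
            System.out.println(name + ":" + n[0]);
            return ++n[0] < limit;  // yield back to the scheduler
        };
    }

    public static void main(String[] args) {
        Queue<Task> ready = new ArrayDeque<>();
        ready.add(counter("A", 2));
        ready.add(counter("B", 2));
        while (!ready.isEmpty()) {      // round-robin: run one step,
            Task t = ready.poll();      // re-enqueue if not finished
            if (t.step()) ready.add(t);
        }
    }
}
```

Because every switch happens at a known yield point, the interleaving is fully deterministic, which is exactly what the OS can't assume when it preempts at arbitrary times.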
In this age of VMs/containers for everything I'm not convinced a conventional OS offers a lot of value - OSes made sense when programs needed to access different kinds of hardware and we liked to have multiple processes/users sharing a single machine while broadly trusting each other, but neither of those things is really true any more. Look at unikernels for where I think the future is going - bootable VMs that act as a language runtime that controls things like threading directly, no need for an OS intermediary.
> it has to switch out the whole stack (4k or 8k) even when there are only a few bytes that belong to that particular task (because it has no way of knowing which bytes those are)
This reads to me like you believe the OS must memcpy/move the whole stack out of the way on a context switch. It doesn't; the other thread has its own dedicated memory for its stack, and on a context switch, the stack pointer is simply adjusted to point at the other stack.
Assuming Java's fibers work like other green-thread implementations, a green-thread/fiber switch is the same thing: just a pointer change, except the adjustment is done in userspace.
(And, to some degree, the OS does know which bytes are stack for any given task. It's whole pages, yes, but within those pages the program is left to manage things itself. I think most green-thread/userspace-thread implementations are similar: the stack is preallocated ahead of time and left to the thread to manage. Sure, you might not know the exact range in use, but you don't really need to. Go, I think, is an interesting outlier here; IIRC it dynamically grows and shrinks a goroutine's stack in response to the application's demands, though I believe they had some interesting issues with loops thrashing allocations when they fell along an allocation boundary. I think they've long since fixed that.)
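A toy model of the pointer-change point above (nothing like real scheduler code; the arrays and indices are made up for illustration): each task owns a preallocated stack, and a "switch" just changes which saved stack pointer is current; no bytes get copied anywhere.

```java
public class StackSwitch {
    public static void main(String[] args) {
        // Each task gets its own preallocated stack; a "context switch"
        // only changes which (stack, saved pointer) pair is current.
        long[][] stacks = new long[2][1024]; // two fiber stacks
        int[] sp = {10, 3};                  // saved stack pointers
        int current = 0;

        stacks[current][sp[current]] = 42;   // fiber 0 pushes a value

        current = 1;                         // "switch": O(1), no copying
        stacks[current][sp[current]] = 7;    // fiber 1 works on its own stack

        current = 0;                         // switch back; fiber 0's state intact
        System.out.println(stacks[current][sp[current]]);
    }
}
```

Whether the pointer swap happens in the kernel (threads) or in userspace (green threads), the stack memory itself stays put.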
Java's fibers actually work like the previous poster described, which may explain the confusion:
> The current prototype implements the mount/dismount operations by copying stack frames from the continuation stack – stored on the Java heap as two Java arrays, an Object array for the references on the stack and a primitive array for primitive values and metadata. Copying a frame from the thread stack (which we also call the vertical stack, or the v-stack) to the continuation stack (also, the horizontal stack, or the h-stack) is called freezing it, while copying a frame from the h-stack to the v-stack is called thawing. The prototype also optionally thaws just a small portion of the h-stack when mounting using an approach called lazy copy; see the JVMLS 2018 talk as well as the section on performance for more detail.
The video presentation describes it better. I think in their expected usage scenarios the stacks of fibers aren't particularly deep so the copying isn't that expensive. Also, IIRC, doing it this way was less intrusive to the existing JVM architecture; it's possible in time they'll rearchitect things to use a more traditional technique.
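A toy model of the freeze/thaw scheme the quote describes (this is just an illustration of the copying idea, not JVM code; the Frame shape and stack contents are made up): freezing copies frames off the thread's v-stack into heap-side structures, and thawing copies them back when the continuation is remounted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FreezeThaw {
    // Toy stand-in for a stack frame: references and primitives
    // stored separately, as in the Loom prototype's two-array layout.
    static class Frame {
        final Object[] refs; final long[] prims;
        Frame(Object[] refs, long[] prims) { this.refs = refs; this.prims = prims; }
    }

    public static void main(String[] args) {
        Deque<Frame> vStack = new ArrayDeque<>(); // the "vertical" thread stack
        vStack.push(new Frame(new Object[]{"caller"}, new long[]{1}));
        vStack.push(new Frame(new Object[]{"callee"}, new long[]{2}));

        // Freeze: copy frames off the thread stack into a heap-side h-stack.
        Deque<Frame> hStack = new ArrayDeque<>();
        while (!vStack.isEmpty()) hStack.addLast(vStack.pop());

        // Thaw: copy frames back onto the thread stack when remounting.
        while (!hStack.isEmpty()) vStack.push(hStack.pollLast());

        System.out.println(vStack.peek().refs[0]); // topmost frame restored
    }
}
```

The cost here is proportional to the number of frames moved, which is why shallow fiber stacks (and the lazy-copy optimization mentioned in the quote) matter so much for performance.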