Hacker News
Coroutines for Go (swtch.com)
330 points by trulyrandom on July 17, 2023 | 182 comments



It looks like a lot of people are missing the point here. Yes, a coroutine library would be a worse/more cumbersome way to do concurrency than the go keyword.

The use case motivating all the complexity is function iterators, where `range` can be used on functions of type `func() (T, bool)`. That has been discussed in the Go community for a long time, and the semantics would be intuitive/obvious to most Go programmers.

This post addresses the next thing: Assuming function iterators are added to the language, how do I write one of these iterators that I can use in a for loop?

It starts by noticing that it is often very easy to write push iterators, and builds up to a push-to-pull adapter. It also includes a general purpose mechanism for coroutines, which the adapter is built on.

If all of this goes in, I think it will be bad practice to use coroutines for things other than iteration, just like it's bad practice to use channels/goroutines in places where a mutex would do.


I think it's also worth mentioning that for certain specific use cases coroutines are much more efficient than a full goroutine, because switching to/from a coroutine doesn't require context switching or rescheduling anything. If you have two cooperating tasks that are logically synchronous anyway (e.g. an iterator), it's much more efficient to just run everything on the same CPU: the kernel doesn't have to reschedule anything or power down/wake up CPU cores, and the data is in the right CPU caches, so you get better cache latency and hit rates. With goroutines this may happen anyway, but it's not guaranteed, and at the minimum you have the overhead of going through the Go runtime's goroutine scheduler, which is fast, but not as fast as just executing a different code context in the same goroutine. Coroutines offer more predictable scheduling behavior because you know that task A is switching directly to task B, whereas otherwise the goroutine scheduler could switch to another available task that wants to run. The last section of the blog post goes into this, where Russ shows that an optimized runtime implementation of coroutines is 10x faster than emulating them with goroutines.

Google has internal kernel patches that implement cooperating multithreading this way (internally they're called fibers), and the patches exist for precisely this reason: better latency and more predictable scheduling behavior. Paul Turner gave a presentation about this at LPC ~10 years ago that explains more about the motivation for the patches and why they improve efficiency: https://www.youtube.com/watch?v=KXuZi9aeGTw


It's not just about performance, but also safety and ergonomics. Since true coroutines[1] offer predictable scheduling, their behavior with regards to data races and deadlocks is also more predictable.

If programmers try to manually implement iterators, generators and interleaved state machines with their own goroutines and channels, it's not just performance that suffers - there is too much room for error.

[1] I'm using the qualifier "true" here, since many modern languages (such as Python, Kotlin) use the term "coroutines" for something that is more like Go's Goroutines than Lua's coroutines. Unlike Go, they are not preemptible, but they are (at least by default) implicitly resumed when necessary by some scheduler, and they may execute on different kernel threads and switch contexts.


> I'm using the qualifier "true" here, since many modern languages (such as Python, Kotlin) use the term "coroutines" for something that is more like Go's Goroutines than Lua's coroutines.

Python has both true (ish) coroutines (or at least coroutines which are entirely user controllable), which it mostly uses for iteration, and the concurrency specialised “async”.

Initially the goal was to reuse "yield" for concurrency (a big reason why "yield from" was added), but the ergonomics of mixing the two uses of coroutines were found to be awful, at least for the language Python is, and providing relevant sugar proved difficult.


It's almost true. Python calls its "almost true" coroutines "generators", while the implicitly scheduled, async I/O-oriented coroutines are just called "coroutines".

I understand the historical reasons for that ("generators" originally only supported "yield", and a separate async/await design with implicit scheduling seemed more ergonomic). But that doesn't remove the confusion.

But Python's generators are still fully "true" coroutines. Russ draws the distinction between coroutines and generators as follows: "Generators provide less power than coroutines, because only the top-most frame in the coroutine is allowed to yield." I think this is a bad definition, as generators can clearly "pass" the yielding power to other generators with "yield from", and Russ discusses this feature in the same post. Generators require more ceremony, but I don't think they are "less powerful" in any meaningful way.

Roberto Ierusalimschy, the creator of Lua, and Ana Lucia de Moura proposed the distinction between stackful and stackless coroutines[1]. Their paper revived coroutines as a research subject and had a lasting impact on coroutine-related terminology. They distinguished "stackful" coroutines from "non-stackful" ones (the term "stackless coroutines" stuck later on, but they did not use it), where stackful coroutines like Lua's can "suspend their execution from within nested functions". They also made the same allusion to power that Russ does: non-stackful coroutines are "not powerful enough to constitute a general control abstraction; in particular, they cannot be used as a concurrent construct".

I think both Ierusalimschy and de Moura are onto something when they call generators "limited", but they are just incorrect when they say that generators "are not powerful enough to constitute a general control abstraction; in particular, they cannot be used as a concurrent construct". Early Node.js libraries and frameworks such as "co" and "koa"[2] should count as proof that you can very much use generators as a concurrency construct. Generators have all the power necessary for dealing with normal concurrency cases, although they don't have the best ergonomics.

But there is another distinction where "generators" are not the same as Lua coroutines, and clearly lack some expressive power (even though this power is NOT a necessary condition for implementing full-fledged concurrency). Lua allows coroutines to communicate bidirectionally: yield() can pass a value back to the coroutine caller, which is returned from resume() - but when the coroutine caller calls resume() again, they can pass an argument which will be returned from yield(). Generators are unidirectional: they only allow you to pass a value to yield(), but not to resume().

[1] http://www.inf.puc-rio.br/~roberto/docs/MCC15-04.pdf [2] https://news.ycombinator.com/item?id=6933358


> Lua allows coroutines to communicate bidirectionally: yield() can pass a value back to the coroutine caller, which is returned from resume() - but when the coroutine caller calls resume() again, they can pass an argument which will be returned from yield(). Generators are unidirectional: they only allow you to pass a value to yield(), but not to resume().

I might be missing something as I’m not familiar with Lua’s coroutines but that sounds like send (https://docs.python.org/3/reference/expressions.html#generat...). The parameter to `send` (from outside the coroutine) is the return value of `yield` (inside the coroutine).


What is wrong with:

    for {
        next := getNext()
        ...
    }
What is the advantage of writing this as:

    for next := range getNext {
        ...
    }


In practice the difference would be closer to:

    getNext := iterableThing.Iterator()
    for {
        next, ok := getNext()
        if !ok {
            break
        }
        ...
    }
vs.

    for next := range iterableThing.Iterator() {
       ...
    }
One advantage is that it's slightly shorter, which matters for very common patterns--people complain about `err != nil` after all. Another advantage is there isn't another variable for everyone to name differently. Another advantage is that everyone can't do the control flow slightly differently either.


Only newbies tend to complain about `err != nil` in my experience. After a certain point it just clicks and they get used to it. There's a cadence to Go code (do the thing, check the error, do the thing, check the error) that is easy to read once you're used to it, but looks horribly verbose when you're coming from an exceptions-based language.

Go has simplicity as a design goal. Part of that simplicity is not adding all the features we can think of to the language. The fact that something would provide slightly leaner syntax for a relatively small group of use cases isn't a good reason to add a new language feature (imho, from my 10+ years using Go and watching it evolve).

If we can implement this using current language features, but it's complex and messy to implement, then it's a great case for a new library, possibly even inclusion in the standard library. If we can't implement this at all using current language features, then maybe it's a case for a new language feature to enable this. If we can implement it relatively cleanly using current language features, then we're good and don't need to do anything. This seems like a "can implement, but not very cleanly" case, which would be a great justification for a library, but not a language feature. Again, imho.


I'm used to `err != nil`, but it doesn't mean I like it. It's a lot of what amounts to boilerplate in a language that is mercifully short of boilerplate elsewhere. This is doubly true when you want custom error types, and need to start unpacking values from your custom error struct.


>This seems like a "can implement, but not very cleanly" case, which would be a great justification for a library, but not a language feature.

First sentence of OP:

> This post is about why we need a coroutine package for Go

Then in the first section after the intro:

> Later, I will argue for an optimized implementation provided directly by the runtime, but that implementation should be indistinguishable from the pure Go definition.


Yeah, so I'm agreeing that this would be a great library, but disagreeing that it should be a language feature. Sorry if I didn't make that clear.


Personally, I don't really see anything wrong with explicit error values, and checking them against nil.

The point I was trying to make is that some amount of people see the benefit in removing much smaller boilerplate. And so presumably at least as many people should see the benefit of removing a larger amount of boilerplate.

I'm just talking about the benefits here. There is also a cost to expanding the language. The costs might outweigh the benefits, but it doesn't mean that the benefits don't exist.


> There is also a cost to expanding the language. The costs might outweigh the benefits, but it doesn't mean that the benefits don't exist.

Agreed. I think this is where the discussion will focus. I'm glad that historically the Go team have weighted the costs heavily, and hope they continue to do so.

I guess I'm in the minority but boilerplate really doesn't bother me. Going the other way gets too much "magic" and we lose the ability to fine-tune behaviour.


> complain about `err != nil` after all

Not to nitpick this specifically, but as a generic reminder: not all complaints are worthy of shifting the trajectory of a massively popular programming language.

Balancing "worthy" and "unworthy" changes is really hard both in the community and discussions like this one. I don't envy the teams that have to do it.


Not forcing error checks is akin to what you do in C, and even there compilers complain about it.


I think more idiomatic go for the first case would be:

    getNext := iterableThing.Iterator()
    for next, ok := getNext(); ok; next, ok = getNext() {
        ...
    }
Which, yeah, the range cleans it up a bit, but it's not doing quite as much work as you're implying.


Don't forget that the important bit in Go is readability of the code. To me the range form is much easier to reason about than this for loop.


If we're going for readability instead of trying to stick to the case presented above, I'd go with:

    for i.HasNext() {
        current := i.GetNext()
        ...
    }
And just mutate the internal state of the struct. Probably throw it behind some sort of interface like:

    type Iterator[T any] interface { // feel free to strip out the generic
        HasNext() bool               // if you really only have one type
        GetNext() T                  // you do this with
    }
I think this still loses to ranging over a function, but I still think it should be compared to what you can do with the current language as opposed to what was originally posted.


The `err != nil` point is exactly what you don't want to mention, and a perfect example of why you shouldn't listen to all the close-minded takes some people might have.


Your code is an example of a "pull iterator", and it's not as much of a concern.

The harder case to deal with is a "push iterator", which are often much simpler to implement, but less flexible to use. See https://github.com/golang/go/discussions/56413

The OP is about converting a push iterator into a pull iterator efficiently by using coroutines. This provides the best of both worlds, simple iterator implementation and flexible use by its caller.


Besides what everyone else said, the obvious advantage is the latter builds in the logic for exiting the loop once getNext has run out of elements in the slice/map. Your former example will need a final step that's like:

    ...
    next := getNext()
    if next == nil {
      break
    }
    ...
This isn't a huge boon, and is mostly a matter of style. But I prefer the latter because it's logic that gets handled with the range builtin, so my code can be more about application logic, rather than muddling in breaking out of loop logic.


None of the proposals require you to change how you write things. The article proposes an implementation that is fully doable and usable with the current spec and no breaking changes.

The point of coroutines is that they are little contexts that can be called

- many times

- sometimes with different parameters

- change state

- might be interrupted by callers

- might interrupt callers themselves

All of this can be done by other means. Just like any sorting can be done by copy pasting the same code, but generics make it less tedious. That's the same idea here. Some problems can be implemented as interleaving coroutines, and their definition is simple enough that you want to write it all in the body of some CreateCoroutine() function instead of taking out another struct with 5 almost empty functions. It will not solve all problems, but can more clearly separate business logic and orchestration.


The first doesn't do control flow


Depends how getNext is defined.


The only way to do control flow in getNext, consistent with how you partially defined it, is to panic. That means there is some important code wrapping the whole loop body that you left out.


That doesn't make sense. `getNext()` has no ability to break out of the loop.


Well, what was wrong with:

    for {
        next := <-channel
        ...
    }


Horrible overhead. If the loop does something simple, like summing integers, 99% of the time will be spent switching between goroutines.

From TFA:

"Because scheduling is explicit (without any preemption) and done entirely without the operating system, a coroutine switch takes at most around ten nanoseconds, usually even less. Startup and teardown is also much cheaper than threads."

"For this taxonomy, Go's goroutines are cheap threads: a goroutine switch is closer to a few hundred nanoseconds"

Also check out https://news.ycombinator.com/item?id=29510751

and https://ewencp.org/blog/golang-iterators/index.html:

"Finally, the most natural channel based implementation is… slow. By a factor of nearly 50x. Buffering does help though, reducing that to a factor of 25x."


And even if it's a toy program and you don't care about performance, it's not as simple as just:

    for {
        next := <-channel
        ...
    }

You have to set up the channel and the goroutine that feeds it, you need to safely close the channel when the iteration is done (but not before, unless you like panics), you need to deal with panics inside the goroutine and possibly support cancellation if the iteration breaks early (unless you love memory leaks).

If you try to implement this pattern by hand, you are all too likely to make a fatal mistake, and this is doubly true in the hands of an inexperienced programmer.

I appreciate the fact that Russ wrote this long post, gradually implementing `coro.New()` and improving its functionality and safety — and only in the very end, we get a short paragraph about performance. Good performance is important to make this feature attractive to use, but if the feature is clunky and error-prone, it wouldn't be worth much, even with great performance.


I'm responding to a comment asking for a justification for sugar for function call based iteration. My comment is an attempt to draw a parallel to the need for sugar for channel based iteration. I'm not trying to suggest channel based iteration as an alternative to coroutines.


The article mentions race conditions


I think coroutines in Go will make it possible to use Go as a host for clean definition of discrete element simulations. Without it, yielding to an actor is clunky.


Why not just range (or use a select) over a channel and have a goroutine running that pushes values into the channel?

I still don’t see the need for coroutines.


This is too slow to really be useful on its own


Still a kitchen sink move, though, isn't it?

Like, no careful thinking and good 80/20 solution this time. Just "huh, we'd need coroutines to do this `right` so let's just do that"

When they added generics, they really, really thought long and hard, and came up with a compromise that was brilliant and innovative in the balance it struck.

I would have hoped to see something like that here, like "we're adding this one feature to goroutines to have control in specific situations" feels like something that would be better than "we're going full Rust on this problem and just adding it."


This has been under discussion for a long time.

https://github.com/golang/go/issues/43557

https://github.com/golang/go/discussions/56413

I'm not sure Russ's personal blog is any kind of official statement "this is what we're doing" yet?


> I'm not sure Russ's personal blog is any kind of official statement "this is what we're doing" yet?

I've found it's generally a very strong indicator.

But to your point, this isn't a new issue, there's been a good amount of discussion around it over the years.


In the old discussion, I specifically complained about how it would add back door coroutines to Go via iterators, and now Russ has a post about formalizing the concept of a coroutine, while also keeping the optimization of coroutines as a runtime implementation detail. I feel vindicated. :-)

https://github.com/golang/go/discussions/56413#discussioncom...


I have written Go professionally for many years now and don't want to see it become something like the Python Twisted / Tornado / whatever frameworks.

The go keyword nicely prevents the annoying function coloring problem, which causes quite a bit of pain.

Sometimes in high performance contexts I'd like to be able to do something like e.g. per CPU core data sharding, but this proposal doesn't scratch those kinds of itches.


Coroutines and goroutines fill different niches. The latter already fill the niche the likes of Twisted fill. There's nothing here trying to pull anything akin to async/await into Go.

Coroutines will fill a different niche, more akin to Python's generators. There are a whole bunch of places where this could dramatically cut down on memory usage and code complexity, where you have a composable pipeline of components you need to glue together. The design appears to me to be entirely synchronous.


I have had some experience with wire protocols that might benefit from coroutines. If I use goroutines, I have to use sync or channels, and then be careful. If I use state machines, it can be cumbersome. I suppose we will see.


For sure it cuts down code complexity for the single clever code writer who wanted to be fancy. Not so sure about all the readers that will come afterwards.


Not at all. I often have to slurp in large amounts of data to be processed in some way. If you don't want to use huge amounts of memory for doing so, you either need to (a) write some horribly obtuse code or (b) use coroutines. Coroutines allow you do this in a more natural and composable way.

Anything involving a lexer and parser is a good example of this, and exactly one of the use cases given in the example. With coroutines, the lexer can take some input and emit a series of tokens which the parser can consume on the fly without any need to do any complex interleaving of the parser and lexer code. You also gain the ability to stop lexing early in case the parser encounters an issue.

I'm not sure what kinds of software you work on, but this is little more complicated or clever than Unix shell pipelines.


Isn't this literally the entire point of Golang?


> The go keyword nicely prevents the annoying function coloring problem, which causes quite a bit of pain.

The good thing is, nothing in the post even proposes anything that might fall in the function coloring problem.


> The go keyword nicely prevents the annoying function coloring problem

FYI when you write a goroutine and have to use Context, that's literally just function coloring


Ad absurdum, then anything other than full curried 1-ary-exclusive functions is coloring.

You can easily bridge context-using and non-context-using code, you can have nested contexts, you can have multiple unrelated contexts, you can ignore the context, probably most critically you can't pass values back up the context, I don't see any way these look like function coloring.


Not sure about in Python, but in Rust, you can easily bridge sync and async code. The function coloring problem is, at least in that case, grossly overstated.


Why was this downvoted? The person makes an interesting point.

For discussion, can you please provide a short sample in Rust to demonstrate your point? I would like to hear more.


I'm guessing it was downvoted for mentioning Rust ¯\_(ツ)_/¯

For sure - it's a little terse, just to demonstrate the various cases

  async fn do_something_async() {
    nonblocking_sync_call();
    tokio::task::spawn_blocking(|| blocking_sync_call()).await;
    async_std::task::spawn_blocking(|| blocking_sync_call()).await;
    async_call().await;
  }

  fn do_something_sync() {
    nonblocking_sync_call();
    blocking_sync_call();
    futures::executor::block_on(async_call());
  }
There is _a bit_ of ceremony required to bridge the two, but it's pretty minimal.

If you want to know more about any of that, happy to explain or provide links for further reading


Is it really? >90% of the time, I'm just passing a context so that I can cancel the rest of the goroutines if an early one fails and I don't need to do the rest of the computation.

That works just fine with throwing a context.Background in, which should be available anywhere, and not color the function.


Arguably the "problem" with other async/threading frameworks is that they build threading on top of iterators/coroutines. In this case they would be orthogonal, so it's probably not as bad as you think, though almost certainly some clever developer out there will do something stupid with it that will also get very popular (my bet would be abstracting goroutines and coroutines into a single thing)


Can you maybe share hints about better channel management patterns/frameworks?

Usually my goroutine-related code feels like a mess because of them, and I don't know how to make it "cleaner" (whatever that means). Any patterns proven to be maintainable would be highly appreciated.


I don't always use channels in the code I write.

For example, let's say I have []foo as input and I'm going to call a service on every element of the slice.

You could do it with channels, but oftentimes I'll reach for creating a slice of results and passing each goroutine an index into the slice. That way I can avoid any overhead that comes from having what is effectively a queue with a lock.

It depends on what I'm doing, how I want to treat failures of a single goroutine, etc. of course.

I default to never returning channels unless it's a requirement for the use case (e.g. context.Done() returns a channel so callers can listen for when the context is closed).

Callers can always create goroutines to make your code run concurrently. It's more annoying to erase channels when they aren't useful to the caller (e.g. if you made all functions that make HTTP calls return a channel because you assume callers want a new goroutine, but I already have my own goroutine per request)

Beyond that, some specific examples would be helpful so I can tailor my advice


Don't worry, it will get even worse, because its foundations are worse than Python's - and it also has less flexibility because it's not interpreted. Like it or not, it's going to happen. I also predicted that generics would be added to Go, and folks here laughed about it. :)


Multitasking systems gave us processes.

But those were too much.

So we got threads, which are processes that share an address space, file table, and some other things. The scheduler can switch from one to the other more easily than between processes, and data can be shared between threads without needing serialization.

But those were too much.

So we got user space threads, which are logical threads of execution that are driven by a runtime entirely in user space. The runtime adds scheduling hooks into all I/O functions in the standard library, or even uses a system API like Unix signals to preempt logical threads. No system-level context switching is needed. User space threads can be tiny.

But those were too much.

So we got coroutines, which allow a programmer to define logical "threads" of execution that cooperatively interact with each other. There is no assumption about the presence of a scheduler. The programmer either writes their own event loop or invokes one from a library in a "real" logical thread.

I wonder what comes next. As far as [communicating sequential processes][1] are concerned, maybe cooperative coroutines are as low as you can go.

[1]: https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf


Even coroutines can't save us programmers from the dread of having to explain every little possible detail to the computer, in a way that doesn't take forever to execute if you provide input that is even slightly different from what the programmer had in mind.


I'm a bit surprised by the lack of languages that parallelize code automatically as much as possible, but only as far as measured performance suggests. It requires a semantics that supports concurrency - for example, iteration over sequences needs to be in unspecified order by default - and also rests on whole-program flow analysis, but I see no obstacles in principle.

Hasn't Microsoft done some research on such a language years ago already? There is also ParaSail made by someone from the Ada community. What happened to these projects? Nobody uses them?


Languages built on the BEAM VM like Erlang and Elixir support concurrency at the runtime level, though it's up to you to specify when you want to run something in parallel. Or am I missing why they don't fulfill your requirements?

I'm not sure if it's desirable to parallelize code automatically, as in many cases you do need at least _some_ parts of the code to run synchronously. But it's an interesting thought experiment to have parallelism be the default instead of opt-in.


Guy Steele Jr. worked on a language called "Fortress." There are a few good talks that he gives:

- (toy problem) https://www.youtube.com/watch?v=ftcIcn8AmSY

- (parallelism) https://www.youtube.com/watch?v=dPK6t7echuA

- (languages) https://www.youtube.com/watch?v=dCuZkaaou0Q


Turns out that autopar is an extremely hard problem beyond simple use cases. And for simple use cases existing solutions work fine (pragma omp for goes a long way for example).


I don’t think there’s any further down to go, but there’s probably room to go up to distributed computing. IIRC, some early versions of Go when it was in alpha had channels work across machines.


I think there is. How do you describe data parallelism? Coroutines are too big when all your function calls are doing the same thing, but on different data (think ISPC, shaders, CUDA). There is still one more step in parallelism where coroutines cost too much :-)


Good point. SIMD routines!


Imagining the next step, it could be something like: "process this collection of task-items as you wish". If you squint, any concurrent execution can be viewed as a sequence of tasks (which can also be collections), even if there are only one or two of them.


Something along what Cuda, OpenCL or ISPC do, but integrated to the language.


I thought that the entire point of green threads was so that I didn't have to use something like Python's `yield` keyword to get nice, cooperative-style scheduling.

I thought go's `insert resumes at call points and other specific places` design decision was a very nice compromise.

This is allowing access to more and more of the metal. At what point are we just recreating Zig here? What's next? An optional garbage collector?


yield and resume here aren't keywords, they're regular variables (referencing regular closures) given those names purely for pedagogical purposes.

The only odd thing here is the creation and use of a cancel callback. I've not done much Go programming, so I don't know whether this is just a performance improvement to more quickly collect dropped iterator state, or it's a requirement because Go does not garbage collect goroutines waiting on channels for which the waiting goroutine has the only reference. In Lua you wouldn't need such a thing because coroutines/threads are GC'd like any other object--once all references are gone they're GC'd even if the last operation was a yield and not a return from the entry function.


Garbage collector in Go is optional. You can switch off garbage collection by setting the environment variable GOGC=off.

More info about GOGC: https://dave.cheney.net/tag/gogc


Only very theoretically - in Go, you don't control whether memory goes on your stack or the heap, and heap escape analysis is notoriously unpredictable. There is no explicit free. You would have to write Go in a completely crazy way to be able to turn off the GC and have the program not grow unbounded.

You might think "I'll just use static buffers everywhere", but allocations can occur in unexpected places. The compiler does some very basic lifetime analysis to eliminate some obvious cases (loops...), but it's really hard to avoid in general.


Ok, so garbage collection is optional - how about garbage generation? Is there any way to manually clean up resources if GOGC=off, or will memory usage continue to grow unbounded as new objects are created?


Grows unbounded. I wasn't recommending that one should set GOGC=off. Just making a remark that one could should they choose to do so.

EDIT: Sorry, I misunderstood part of your question. The memory grows unbounded unless you call runtime.GC(), which triggers garbage collection. But this is a blocking call and essentially blocks your whole program.
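For the curious, here's a minimal sketch of both knobs (nothing beyond the standard `runtime` and `runtime/debug` packages; the sizes and loop counts are arbitrary):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// allocThenCollect turns the collector off, generates garbage,
// then forces one manual (stop-the-world) collection.
func allocThenCollect() (before, after uint64) {
	debug.SetGCPercent(-1) // same effect as GOGC=off

	garbage := make([][]byte, 0, 100)
	for i := 0; i < 100; i++ {
		// ~100 MB of heap that nothing will reclaim automatically
		garbage = append(garbage, make([]byte, 1<<20))
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	before = m.HeapAlloc

	garbage = nil // drop the references
	_ = garbage
	runtime.GC() // still works with the collector off, but blocks

	runtime.ReadMemStats(&m)
	after = m.HeapAlloc
	return before, after
}

func main() {
	before, after := allocThenCollect()
	fmt.Printf("heap before manual GC: %d bytes, after: %d bytes\n", before, after)
}
```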


I think in theory you can write code (or do some tricks) to avoid all heap allocation.


"In theory" being the operative words there. Turn on heap escape analysis sometime, you'll be surprised how hard it is to avoid.
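For example (a toy case of my own; check the compiler's verdict with `go build -gcflags='-m'`):

```go
package main

import "fmt"

// Stays on the stack: the value is returned by copy.
func onStack() int {
	x := 42
	return x
}

// Escapes to the heap: &x outlives the frame, so
// `go build -gcflags='-m'` reports "moved to heap: x".
func escapes() *int {
	x := 42
	return &x
}

func main() {
	// A less obvious case: fmt.Println takes ...any, so even
	// these small arguments typically escape to the heap too.
	fmt.Println(onStack(), *escapes())
}
```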


You mean not growing the heap or literally no allocation?


Arguably, one can't truly say the GC is optional, unless the language and its libraries were designed to work without it. In languages like Vlang, that's the case, as the GC was added later. If turning the GC off cripples the functionality and usefulness of the language, then there is little point in using the option or claiming it as optional.

Probably a better argument for Go (and other languages like Java) is how "tweakable" the GC is or at least describe it as the GC can be turned off, but it's not designed or useful to do so.


Well, then so is Java's, with the Epsilon "GC". But not collecting garbage is very different from manual memory management.


Coroutines are one thing that I'd probably prefer language support for rather than a library.

    x := co func(){
        var z int
        for {
            z++
            yield z
        }
    }

    y := x()

    for y := range x {
        ...
    }

or something to that effect. It's cool that it can be done at all in pure go, and I can see the appeal of having a standard library package for it with an optimized runtime instead of complecting the language specification. After all, if it's possible to do in pure go, then other implementations can be quickly bootstrapped.

My $0.02, as someone that uses go at $work daily: I'd be happy to have either, but I'd prefer it baked into the language. Go's concurrency primitives have always been a strength, just lean into it.


You need language support for coroutines - what happens when the stack runs out in one of the coroutines? With a library solution you would have to kill the coroutine, or kill the program. If you want the stack to grow transparently, then that can only be done by the generated code: it has to watch stack usage and then grow the stack on demand (I think goroutines already have something like that).

Well, maybe a library solution could possibly have a guard page at the end of the stack. When that page is hit, you could try to grow the stack in an error handler (but that would probably not work if you have taken a pointer to some stack variable).


You don't need language support for coroutines if you have language support for relocatable/chunkable stack, which Go has.


I would prefer to add a `yield` param to functions so it can better interact with the current type system.

This means that for a function to yield, it needs to have the so-called `yield` param, and you can only yield inside yield functions with the same yield signature or with no signature at all.

So the example would be rewritten into something like:

  x := func(:z int) {
    for {
      z++
      :- z
    }
  }

  for y := range x() {
    ...
  }
The : adds the yield signature and :- yields the value.

Functions that only yield a value `X` would just have the `: X` or `: name X`. To accept a value of type `Y` in the resume the signature would change to `:[Y] X` or `:[Y] name X`.

Accepting is weak, so if something expects a yielding function that resumes with Y and yield X, yielding functions that only yield X without a resume should also be accepted.

Consider the following code example:

  func filter[T comparable](it func(:T), f func(T) bool, :T) {
    for v := range it {
      if f(v) { :- v }
    }
  }

  func map[T](arr []T, :T) {
    for _, v := range arr {
      :- v
    }
  }

  for v := range filter(map({1, 2, 3}), func(x) { return x < 3 }) {
    print(v)
  }
Special functions in a co package could give resume and New functionality to keep the go style.

A basic counter could be written like this:

  func counter(:[bool]c int) {
    for {
      if ! :- c { break }
      c += 1
    }
  }
A func that counts twice could be as simple as:

  func countTwice(:[bool]c int) {
    counter()
    counter()
  }
To interact with the resume value the range syntax could be expanded to something like:

  for c := range counter() {
    if c > 10 { -: false }
    -: true
    print(c)
  }
Range would resume with a default zero value if the range does not pass a value to -:.


Sure, as long as I can close over a yield object. There is no reason for filter and map to know that the last function parameter takes a continuation.


But then, how could you guarantee type safety?

This yield is not an object, it is handling the execution context to another stack.


It is a delimited one-shot continuation. Delimited continuations are first class objects like any other and it should be possible to close over them.


Wow, that is horrible. None of that is intuitive, which is one of Go's strengths. You would need to specifically learn the Go semantics of coroutines to have any chance of writing or even reading code like this.


Goroutines and channels aren't intuitive either. You learn them and become familiar with them over time, at which point they become intuitive __for you__.


Actually, I get it now. The issue is that OP was giving TWO different examples of use, pulling a single value versus multiple. They should have clarified.


Yes, on review, I should have made two examples instead of combining them. Thanks for persevering through my brevity and reaching understanding :)


They're totally intuitive if you were an Occam programmer.


So it's intuitive because you are familiar with those concepts. I don't see how that goes against what I said.


Yes, you need to learn language features in order to use the language, shocker!


I've written go for a year or so. That looks close to go code I look at every day. I wouldn't be surprised if it was in the language.


Not sure I'm a fan. Looking through the examples, I feel like this makes the language much harder to read and follow, but maybe that's just my own brain and biases.

Further, it doesn't seem to me to allow you to do anything you can't currently do with blocking channels and/or state.


You’re absolutely right. People advocating for it can’t seem to see beyond their nose.


Yes, the person who spent 15 years developing the Go language can't see beyond his nose.


More likely they're trying to solve real problems you just haven't hit yet.


The Go team has explicitly said this in the past: “You just don’t understand because you’re not at Google’s scale.”


Where have they said this specifically?


What language change are you talking about? This is just a proposed construct to regularise and make efficient something people already do (as you say, with “state”).

I’ve used iterators similar to what’s described in this article to avoid allocations in critical code paths, but this would make those much less awkward to use (particularly with the upcoming range iterator language change).


Perhaps "language change" was bad wording; I guess I meant it encourages a paradigm change? Just look at this func signature and first line...

> func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {

> copush := func(more bool, yield func(V) bool) V {

The main power of Go to me was always quickly being able to read and understand code. This type of coding has a lot of cognitive load to a reader, I feel.


The implementation of Pull may be a bit mysterious to the casual reader, but its usage seems clear enough to me from the function signature. Even without reading the docs I think it’s easy to guess what yield, pull, and stop should do (and how to implement yield). That’s what matters, imo.

Fwiw if you look at the source of some other idiomatic standard library functions you may find their implementation is of similar complexity. That’s the nature of good abstractions, though, they should make tricky things easy to use.
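To make that concrete, here's roughly what I'd guess usage looks like from the signature alone. Only the signature comes from the article; this `Pull` body is my own channel-based stand-in (the article's real version avoids channels entirely):

```go
package main

import "fmt"

// Pull is a channel-based stand-in for the proposed coro.Pull:
// it adapts a push iterator into pull() and stop() functions.
func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {
	ch := make(chan V)
	done := make(chan struct{})
	go func() {
		defer close(ch)
		push(func(v V) bool {
			select {
			case ch <- v:
				return true
			case <-done: // stop() was called; tell push to finish
				return false
			}
		})
	}()
	pull = func() (V, bool) {
		v, ok := <-ch
		return v, ok
	}
	var stopped bool
	stop = func() {
		if !stopped {
			stopped = true
			close(done)
			for range ch { // drain so the goroutine can exit
			}
		}
	}
	return pull, stop
}

func main() {
	// A push iterator over 1..3, converted to pull form.
	pull, stop := Pull(func(yield func(int) bool) {
		for i := 1; i <= 3; i++ {
			if !yield(i) {
				return
			}
		}
	})
	defer stop()
	for v, ok := pull(); ok; v, ok = pull() {
		fmt.Println(v)
	}
}
```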


A dozen different incompatible iterator implementations (the current stdlib) isn't easier.

Channels are slow for no good reason when they aren't wrapping blocking operations.


Reading the comments makes me feel bittersweet.

- Many people consider coroutines and green threads to be more or less the same thing, when they both have their pros and cons.

- The fact that the omission of iterators is even acceptable in the Go community saddens me. They seem to deliberately refuse any feature that might make the language even slightly more complex, in the name of simplicity. But hey, at least they retracted their opinion on generics.

I'm again reminded that Go is not my language.


I use Go daily. Honestly, generics didn't change my code much at all.

I use them now somewhat, but only because they offer slightly better sugar at the call site. At the definition site they are just as ugly as you expect; granted, Go's syntax is about the best I have seen among languages with generics. So in the end I think we really lost some simplicity in order to silence the whining about no generics. Not a good tradeoff.


I wouldn’t confuse HN with “the Go community”. The head of the Go team wrote this post. Will the coro package proposed in the post be added exactly as written? Maybe, but probably not exactly as written. Will something like it be added? I would be willing to bet money on it, yes. How long will it take? I’d say at least a year (release in Go 1.23 in August of 2024), maybe a little longer. I don’t think it could be much shorter though.


Given Go's history, there's a moderately high chance that this will be in a PR and immediately merged within the next few hours. They're a bit bipolar on "listen, wait, debate" vs "we just wrote about a new thing, it's already built and in stdlib but is largely undocumented"


I’ve never seen them do this. What are you talking about?


The whole Go module drama.


> But hey, at least they retracted their opinion on generics.

No, we really didn't.

Generics were acceptable to the community only because they are fully backwards compatible with existing code, and can be safely ignored if you don't need them.

Which, not at all surprising, is what most golang code still does. Because as it turns out, outside of "collections" of one sort or another, practical use cases for generics are not that easy to find. Most code written never gets to see more than one type to begin with, and more than 2 is already a stretch.

If anything, the addition of generics showcased to the larger community how little many of the vaunted features that people keep demanding and labeling as "essential" are actually needed for a widely accepted and excellent language.


Generics didn’t even change 20% of my codebase, and of that 20% much was library code. I'm not sure whether Go or C habits have taken hold in me, but I see generics as just another library that one programmer from somewhere used to solve something.


I would have expected that the Go community welcomes a uniform interface for iteration.


I guess it is great that they are finally paying attention to programming languages like CLU.

On the other hand, given my experience with .NET and C++ coroutines, and Active Objects (in Symbian C++ and Active Oberon), I'm not sure this is really something to add to Go.

Even the .NET team has acknowledged at this year's BUILD that, if they could go back in time, having the runtime handle them Go-style would probably have been a better decision, given how many developers keep having issues understanding async/await.


Not sure this is really required. Most cases in Go are served well by goroutines, and for yield/resume semantics two blocking channels are enough. This seems to add complexity for its own sake, and I'm not sure it adds any power to Go that didn't already exist.


Goroutines + channels add an enormous amount of overhead. Using them as iterators is basically insane.


Why? Channels are already iterable using the range keyword.

    ch := make(chan int)
    go func() {
        for i := 0; i < 100; i += 1 {
            ch <- i
        }
        close(ch)
    }()

    for i := range ch {
        fmt.Println(i)
    }
That is very simple.


"Enormous amount of overhead" is the operative phrase. In general, you want your concurrency operations to be significantly smaller than the payload of the operation. In the case of using channels as iterators with goroutines behind them, it works fine for something like a web page scraper, where the act of fetching a web page is enormously larger than a goroutine switch, but as a generalized iteration mechanism it's unusably expensive because a lot of iteration payloads are very small compared to a channel send.

I've encountered a lot of people who read that on /r/golang and then ask "well why are send operations so expensive", and it's not that. It's that a lot of iteration operations are on the order of single-digit cycles and often very easy to pipeline or predict. No concurrency primitive can keep up with that. A given send operation is generally fairly cheap, but there are enough other things that are still an order or two of magnitude cheaper than even the cheapest send operation that if you block all those super cheap operations on a send you're looking at slowdowns of multiple orders of magnitude. Such as you would experience with your code.
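To put a rough (machine-dependent, unscientific) number on it, here's the kind of comparison I mean - summing a million ints with a plain loop versus pulling each one through an unbuffered channel:

```go
package main

import (
	"fmt"
	"time"
)

const n = 1_000_000

// plainSum iterates with an ordinary loop: single-digit
// nanoseconds (or less) per element once pipelined.
func plainSum() (sum int, d time.Duration) {
	start := time.Now()
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum, time.Since(start)
}

// chanSum pays an unbuffered channel send, receive, and
// goroutine switch for every single element.
func chanSum() (sum int, d time.Duration) {
	start := time.Now()
	ch := make(chan int)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	for v := range ch {
		sum += v
	}
	return sum, time.Since(start)
}

func main() {
	s1, d1 := plainSum()
	s2, d2 := chanSum()
	fmt.Printf("plain loop: sum=%d in %v\n", s1, d1)
	fmt.Printf("channel:    sum=%d in %v\n", s2, d2)
}
```

On my machine the channel version is slower by a couple of orders of magnitude; exact numbers will vary.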


But if your iteration is so fast then you don’t need a channel at all. Just use a plain for loop.


Yes, that's exactly what the article is about: how we can model "plain for-loop" levels of performance in the presence of complex (intricate state at multiple levels and high potential for nesting iterators) code that is supplying the loop(s)


Then you lose the separation of iteration from the looping construct.

You can use a function or a method, but you lose the nice continuation/generator ability to write a function that can do complicated yields without having to write the state machine yourself, plus you run a risk that the call won't inline in which case you're incurring non-trivial function call overhead.

The problem with iteration in Go isn't that you can't solve any given individual problem, the problem is that you can't solve all of them simultaneously the way you can in Rust or Python. (Though one of these days I want to get around to benchmarking Python's iteration versus Go channel-based iteration, I'm not actually sure which would win. What Go considers dangerously slow can still be baseline performance for other languages.) So you can get a defeat-in-detail sort of thing where a person cites a problem, and someone posts as solution to that, and then they cite another problem, and there's a solution for that, and then there's another problem, and a solution is posted for that, and all the solutions do indeed more-or-less solve the given problem... but you can't combine them into one.
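For what it's worth, the "write the state machine yourself" alternative I mean looks like this in the `func() (T, bool)` shape (my own toy example): fine for flat iteration like Fibonacci, painful once the state is a recursion you have to flatten by hand.

```go
package main

import "fmt"

// fibs returns a pull iterator in the func() (T, bool) shape.
// All iteration state lives in captured variables, i.e. a
// hand-written state machine.
func fibs(limit int) func() (int, bool) {
	a, b, count := 0, 1, 0
	return func() (int, bool) {
		if count >= limit {
			return 0, false
		}
		count++
		v := a
		a, b = b, a+b
		return v, true
	}
}

func main() {
	next := fibs(7)
	for v, ok := next(); ok; v, ok = next() {
		fmt.Println(v)
	}
}
```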


How do you solve the tree fringe problem with a for loop?


Since all recursive programs can be converted into iterative programs then you can "simply" (not always simple) convert recursive solutions like McCarthy's Lisp solution to a loop: https://dl.acm.org/action/showFmPdf?doi=10.1145%2F1045283 (page 5)

  (DE SAMEFRINGE (X Y)
         (OR (EQ X Y)
             (AND (NOT (ATOM X))
                  (NOT (ATOM Y))
                  (SAME (GOPHER X) (GOPHER Y)))))

  (DE SAME (X Y)
         (AND (EQ (CAR X) (CAR Y))
              (SAMEFRINGE (CDR X) (CDR Y))))

  (DE GOPHER (U)
         (COND ((ATOM (CAR U)) U)
               (T (GOPHER (CONS (CAAR U)
                                (CONS (CDAR U) (CDR U)))))))
Coroutines are not necessary for solving that problem (though they do offer a neat solution to it).
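For contrast, here's a sketch of that neat coroutine-flavored solution in today's Go, using goroutines and channels (i.e. the expensive approach the rest of the thread is debating); the quit channel keeps the walkers from leaking on early exit:

```go
package main

import "fmt"

// Tree is a minimal binary tree; leaves carry the values.
type Tree struct {
	Val         int
	Left, Right *Tree
}

// leaves walks t and sends each leaf value on the returned
// channel; quit lets the consumer stop the walk early.
func leaves(t *Tree, quit <-chan struct{}) <-chan int {
	ch := make(chan int)
	var walk func(t *Tree) bool
	walk = func(t *Tree) bool {
		if t == nil {
			return true
		}
		if t.Left == nil && t.Right == nil {
			select {
			case ch <- t.Val:
				return true
			case <-quit:
				return false
			}
		}
		return walk(t.Left) && walk(t.Right)
	}
	go func() {
		defer close(ch)
		walk(t)
	}()
	return ch
}

// sameFringe reports whether two trees have the same leaves
// in the same order, comparing lazily pair by pair.
func sameFringe(a, b *Tree) bool {
	quit := make(chan struct{})
	defer close(quit) // unblock both walkers on early exit
	ca, cb := leaves(a, quit), leaves(b, quit)
	for {
		va, oka := <-ca
		vb, okb := <-cb
		if oka != okb || va != vb {
			return false
		}
		if !oka {
			return true
		}
	}
}

func main() {
	leaf := func(v int) *Tree { return &Tree{Val: v} }
	// ((1 2) 3) vs (1 (2 3)): different shape, same fringe.
	t1 := &Tree{Left: &Tree{Left: leaf(1), Right: leaf(2)}, Right: leaf(3)}
	t2 := &Tree{Left: leaf(1), Right: &Tree{Left: leaf(2), Right: leaf(3)}}
	fmt.Println(sameFringe(t1, t2)) // prints true
}
```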


The results of that conversion are not fun


Aside from the massive performance penalty, cache thrashing and context switching, this code will also leak a goroutine (and so, memory) if you don't finish receiving from `ch`. It's more brittle, longer to write, less local and in every other way worse than a for loop. Why would you ever do it?


Race conditions. With coroutines you're not supposed to have to deal with races, if I am understanding the motives correctly.


You can write to a channel concurrently.


And to make the concurrency safe you have to pay the price of synchronization.

https://go.dev/tour/concurrency/8

In A Tour of Go, concurrency section, "Equivalent Binary Trees" is an example of paying the price when you don't need it.


It's not that you can't, it's that it's very expensive.


It’s not insane at all. How did you come to that conclusion?

* Mutex lock+unlock: 10ns

* Chan send buffered: 21ns

* Try send (select with default): 3.5ns

Missing from here is context switches.

In either case, the overhead is proportional to how fast each iteration is. I have channels of byte slices of 64k and the channel ops don’t even make a dent compared to other ops, like IO.

You should absolutely use channels if it’s the right tool for the job.

Fwiw, I wouldn’t use channels for “generators” like in the article. I believe they are trying to proof-of-concept a language feature they want. I have no particular opinion about that.


> Missing from here is context switches.

Exactly. From rsc's previous post[1]:

> On my laptop, a C thread switch takes a few microseconds. A channel operation and goroutine switch is an order of magnitude cheaper: a couple hundred nanoseconds. An optimized coroutine system can reduce the cost to tens of nanoseconds or less.

[1]: <https://research.swtch.com/pcdata>


Yeah I 100% understand wanting to optimize this for something like generators if we imagine them as first-class constructs. But they’re not at all a replacement for channels – they would be an addition, or specialization. I’ve never seen real world Go code that has needed it but maybe this will change with generics. It’s worth keeping an eye on, at least.

Channels otoh are very versatile: everything from spsc to mpmc with buffering and starvation protections, fast cancelation and notifications, etc etc. They’re not perfect, but it’s a helluva bang-for-the-buck for a single primitive. Literally all you have to do for performance is add buffering and coalesce “units of work”, and you’re good to go.
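By "coalesce units of work" I mean something like this (my own sketch): send slices instead of single items, so one channel operation is amortized over a whole batch.

```go
package main

import "fmt"

// batches groups items into slices of at most n elements before
// sending, amortizing one channel operation over n items.
func batches(items []int, n int) <-chan []int {
	ch := make(chan []int)
	go func() {
		defer close(ch)
		for len(items) > 0 {
			end := n
			if end > len(items) {
				end = len(items)
			}
			ch <- items[:end]
			items = items[end:]
		}
	}()
	return ch
}

func main() {
	sum := 0
	for batch := range batches([]int{1, 2, 3, 4, 5, 6, 7}, 3) {
		for _, v := range batch { // cheap inner loop per send
			sum += v
		}
	}
	fmt.Println(sum) // prints 28
}
```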


Sure but that's what the implementation of the coro in this post uses under the hood. Not sure how this is any better wrt overheads.


Where did you get this '..it uses same under the hood'? The article clearly says:

..Next I added a direct coroutine switch to the runtime, avoiding channels entirely. That cuts the coroutine switch to three atomic compare-and-swaps (one in the coroutine data structure, one for the scheduler status of the blocking coroutine, and one for the scheduler status of the resuming coroutine), which I believe is optimal given the safety invariants that must be maintained. That implementation takes 20ns per switch, or 40ns per pulled value. This is about 10X faster than the original channel implementation.


The runtime switch was buried in the last paragraph of the article. All of the code was using goroutines and channels....


It does mention in an earlier section:

... That means the definition of coroutines should be possible to implement and understand in terms of ordinary Go code. Later, I will argue for an optimized implementation provided directly by the runtime,..


> Not sure how this is any better wrt overheads.

At the end he implements an experimental runtime mechanism that permits a goroutine to explicitly switch execution to another goroutine rather than using the generic channel scheduling plumbing.


It's in the last paragraph of the article... Very easy to miss, given the code uses goroutines and channels.



This is covered in the article.


As a point of comparison, here's my demo from a recent presentation of firing up 1 million (1,000,000) Elixir (BEAM VM) threads, sending them all a "Hello!" message, and then each thread waits a random amount of time between 0 and 2 seconds to send a message back of "Process <their number> received message <themessage>!"

At the same time, I am running the Erlang observer beside it to watch what happens to the CPU and memory consumption and how quickly it recovers/cleans up the garbage.

The biggest bottleneck here is the terminal's ability to keep up, but the observer seems to reflect what's happening accurately.

https://www.youtube.com/watch?v=yxyYKnashR0

The code I used: https://gist.github.com/pmarreck/4cc8f2f55a561ebce2012085a3a...

These features have been built into Erlang (and thus Elixir) since the 1980's. I'm sure many of you have heard of the Actor model and/or Erlang's "legendary" implementation of it, but I don't know how many have actually seen it in action with monitoring kit running.

I think it would be great for Go if it offered language-level support like this, but given the extremely resource-efficient implementation (both in spawning and runtime consumption) of threads on the BEAM VM, coupled with the ease of concurrency which comes directly from only permitting immutable values, I don't think it will ever be matched.


I don’t think coroutines would fit in with Go. There is a huge emphasis on simplicity. Coroutines add a massive amount of complexity. In addition, goroutines provide the best parts of coroutines - cheap, easy to use, non-blocking operations - without a lot of the pain points such as the "coloring" of functions and issues with using things like mutexes.

Just the question of whether one should use a goroutine or a coroutine adds complexity.


There are plenty of complicated things in Go, IMHO where it shines best is judiciously providing incredibly nice interfaces atop the complicated things.


Having coded in Go professionally, I disagree. Go abstractions leak in weird and unexpected ways that are surprisingly different from its C/C++/Java predecessors.

Goroutines were kind of the raison d'être for using Go. But using them wasn't simple, and instead goroutines often brought their own issues. See here:

https://songlh.github.io/paper/go-study.pdf

Often a programming language takes a first guess at the problems they want to solve, and often get them wrong. C++ is probably the most notable language in this category here.

That said, I do appreciate an attempt to improve programming languages even if it undermines the primary feature of the language itself.


There's no colouring here. These are synchronous coroutines.


I'm not 100% sure this is the case, but I believe the context of this goes something like this. As Go has added generics, there are proposals to add generic data structures like a Set. Generics solve almost every problem with that, but there is one conspicuous issue that remains for a data structure: You can iterate over a slice or a map with the "range" keyword, and that yields special iteration behavior, but there is no practical way to do that with a general data structure, if you consider constructing an intermediate map or slice to be an insufficient solution. Go is generally performance-sensitive enough that it is.

The natural solution to this is some sort of iterator, as in Python or other languages. (Contra frequent accusations to the contrary, the Go community is aware of other languages' efforts.)

So this has opened the can of worms of trying to create an iteration standard for Go.

Go has something that has almost all the semantics we want right now. You can also "range" over a channel. This consumes one value at a time from the channel and provides it to the iteration, exactly as you'd expect, and the iteration terminates when the channel is closed. It just has one problem, which is that it involves a full goroutine and a synchronized channel send operation for each loop of the iteration. As I said in another comment, if what is being iterated on is something huge like a full web page fetch, this is actually fine, but no concurrency primitive can keep up with the efficiency of incrementing an integer, a single instruction which may literally take an amortized fraction of a cycle on a modern processor. With generics you can even relatively easily implement filter, map, etc. on this iterator... but adding a goroutine and synchronized commit for each such element of a pipeline is just crazy.

I believe the underlying question in this post is, can we use standard Go mechanisms to implement the coroutines without creating a new language construct, then use the compiler under the hood to convert it to an efficient execution? Basically, can this problem be solved with compiler optimizations rather than a new language construct? From this point of view, the payload of this article is really only that very last paragraph; the entire rest of the article is just orientation. If so, then Go can have coroutine efficiency with the standard language constructs that already exist. Perhaps some code that is already using this goroutine pattern might speed up too, "for free".

The concerns people have about this complexifying Go, the entire point of this operation is to suck the entire problem into the compiler with 0 changes to the spec. Not complexifying Go with a formal iteration standard is the entire point of this operation. If one wishes to complain, the correct complaint is the exact opposite one, that Go is not "simply" "just" implementing iterators as a first class construct just like all the other languages.

Also, in the interests of not posting a full new post, note that in general I shy away from the term "coroutine" because a coroutine is what this article describes, exactly, and nothing less. To those posting "isn't a goroutine already a coroutine?", the answer is, no, and in fact almost nothing called a coroutine by programmers nowadays actually is. The term got simplified down to where it just means thread or generator as Python uses the term, depending on the programming community you're looking at, but in that context we don't need to use the term "coroutine" that way, because we already have the word "thread" or "generator". This is what "real" coroutines are, and while I won't grammatically proscribe to you what you can and can not say, I will reiterate that I personally tend to avoid the term because the conflation between the sloppy programmer use and the more precise academic/compiler use is just confusing in almost all cases.


A goroutine really is implemented as a coroutine though. It's just that without this runtime optimization, there hasn't been a way to interact with them without letting a multithreaded scheduler manage their execution.

I don't remember where I came across this, but many years ago I saw python-style generators termed "semi-coroutines" which I'm a fan of. Python (frustratingly) doesn't implement them this way, but the beauty of generators is that by limiting the scope where execution can be suspended to the highest-level stack frame, you can statically determine how much memory is needed for the (semi-)coroutine, so it doesn't require a single allocation.

Zig takes that a step farther by considering recursive functions second-class, which lets the compiler keep track of the maximum stack depth of any function, and thereby allow you to allocate a stack frame for any non-recursive function inside of another stack frame, enabling zero-allocation full coroutines, as long as the function isn't recursive.

That would... probably be overkill for Go, since marginal allocations are very cheap, and you're already paying the runtime cost for dynamic stack size, so the initial allocation can be tiny.

I would love to see full proper coroutine support make it to Go, freeing users of the overhead of the multithreaded scheduler on the occasions where coroutine patterns work best. I remember back in 2012 or so, looking at examples of Go code that showed off goroutine's utility as control flow primitives even when parallelism wasn't desired and being disappointed that those patterns would likely be rare on account of runtime overhead, and sure enough I hardly ever see them.


What? I'm a Go newb, but isn't this what goroutines and channels get you?


This is an interface that can be implemented in terms of goroutines and channels, but can also be implemented with a lower-overhead scheduler tweak. The article shows how it could be implemented using goroutines and channels, and then reports the result of that implementation versus an optimized version that avoids synchronization overhead and scheduler latency which is unnecessary with this pattern.

Currently, you could use goroutines and channels to implement a nice way to provide general iteration support for user-defined data structures, but because of the overhead, people most often opt for clunkier solutions. This change would give us the best of both worlds.


You can build this yourself using goroutines and channels, but adding "native" first-class support for this generator pattern would be easier to use and come with less overhead.


Aside:

Lua is an absolute work of art. Everything about the tiny language, how it works, and even all the little peculiarities, just makes sense.


One core Lua thing that I think is an ugly mistake: trying to represent maps (dictionaries) and arrays using a single logical data type.

Most languages use different data types but with some API overlap, e.g. maps and arrays are both "iterable". Lua goes too far, I think, and tries to make them the exact same, a data type they call "table".

One side-effect is that you have some operations that only really make sense for maps or lists, but since they work on all tables, they're defined awkwardly, e.g:

> The length of a table t is defined to be any integer index n such that t[n] is not nil and t[n+1] is nil; moreover, if t[1] is nil, n can be zero. For a regular array, with non-nil values from 1 to a given n, its length is exactly that n, the index of its last value. If the array has "holes" (that is, nil values between other non-nil values), then #t can be any of the indices that directly precedes a nil value (that is, it may consider any such nil value as the end of the array).


Hmm, why are Lua arrays 1-indexed?


Because arrays start at 1, offsets start at 0


Arrays are just a sequence of objects.

Indexes (traditionally) start at 1, offsets (measures from base) obviously start from zero.


good enough for Rome


[flagged]


As it stands, this comment contains no information. Can you explain what about Lua is problematic?


Wondering whether coroutines may be a step towards async event-based style APIs without allocating read buffers for the entire connection. I.e. a solution to problems discussed in https://github.com/golang/go/issues/15735. Goroutines provide a great way to have non-blocking IO with synchronous code – but when it comes to effective memory management with many connections Go community tend to invent raw epoll implementations: https://www.freecodecamp.org/news/million-websockets-and-go-.... So my question here – can coroutines somehow bring new possibilities in terms of working with network connections?


Somewhat on topic given that OP brought up coroutines in Python: what resources have folks used to understand Python's asyncio story in depth? I'm just now finally understanding how to use stuff, but it was through a combination of the official documentation, the books "Using Asyncio in Python" and "Expert Python Programming", none of which were particularly good. Normally I'd rely just on the official docs, but the docs have created much confusion, it seems, because there's a lot in them that are useful more so for library/framework developers than for users. So, I'm just wondering if anyone has great resources for really gaining a strong understanding of Python's asyncio or how else you might have gone about gaining proficiency to the point where you felt comfortable using asyncio in real projects.


I read the same books you did, and I was equally unsatisfied afterwards. The "Using Asyncio in Python 3" book was good enough to help me write some code that had to hit an API 400k times without blocking, but I never returned to asyncio after that.

Afterwards, I realized there was a package called aiohttp that I could've used, but too late.

I'll be interested to see what other HN people have done.


This blog helped me a lot with the motivation and underlying mechanism of Python's asyncio: https://tenthousandmeters.com/blog/python-behind-the-scenes-...


Thanks a lot, I'll check that out.


The most valuable quality of a programming language committee is resisting the temptation to add new features unless their absence is driving existing users away.


This is a thoroughly interesting topic. Thanks for the article.

I hadn't thought much about iterators' link to coroutines.

As a hobby, I am working to write about a dream programming language. I happen to be really interested in parallelism, asynchrony, coroutines, multithreading and concurrency.

I want:

* seamlessly switch between remote-thread and local-thread coroutines

* concurrency and parallelism and async to be easy to think about, reason about, read and program

* programs should be easy to parallelise and be async and concurrent

Go iterators seem to be local to a thread, but what if you want to distribute work across threads?

I've been thinking of scheduling recently.

Imagine you're a search engine company and you want to index links between URLs. How would you solve this with coroutines?

  task download-url
   for url in urls:
    download(url)

  task extract-links
   parsed = parse(document)
   return parsed

  task fetch-links
   for link in document.query("a")
    return link

  task save-data
   db.save(url, link)

How would you do control flow and scheduling and parallelism and async efficiently with this code?

* `db.save()`, `download()` are IO intensive whereas `document.query("a")` and `parse` is CPU intensive.

* I want to handle plurality or multiple items trivially such as multiple URLs and multiple links.

* I want to keep IO and CPU in flight at all times.

I think I want this schedule:

https://user-images.githubusercontent.com/1983701/254083968-...

I have a toy 1:M:N lightweight scheduler (1 scheduler thread : M kernel threads : N lightweight threads) in C, Rust and Java

https://github.com/samsquire/preemptible-thread

This lets me switch between tasks and preempt them from user space without assistance at descheduling time.

I have a simplistic async/await state machine thread pool in Java. My scheduling algorithm is very simple.

I want things like backpressure, circuit breakers, rate limiting, load shedding, rate adjustment, queuing.


i've been thinking about a closely related feature in a different context: adding block arguments, as in smalltalk or ruby or especially lobster, to a language more like c, with static types and stack allocation

i think this would be favorable for (among other things) clu-like iterators and imgui libraries, where you often want to do something like

    submenu("&Edit") {
        command("&Cut") { clip_cut(getSelection()); }
        ...
    }
this is especially useful in a context where you're heap-allocating sparingly or not at all, because the subroutine taking the block argument can stack-allocate some resource, pass it to the block, and deallocate it once the block returns; python context managers and win32 paint messages are two cases where people commonly do this sort of thing, but things like save-excursion, with-output-file, transactional memory, and gsave/grestore also provide motivation

the conventional way to do this is to package up the block into a closure, then use a full-fledged function invocation to invoke it, using a calling convention that supports closures. but i suspect a more relaxed and efficient approach is to use an asymmetric coroutine calling convention, in which the callee yields control back to its caller at the entry point to the block, and the block then resumes the callee when it finishes. so instead of merely dividing registers into callee-saved and call-clobbered, as subroutine calling conventions do, we would divide them three ways: registers that are callee-saved upon return but that upon yield contain callee values the block must restore before resuming; caller coroutine context registers, which are preserved across both return and yield; and call-clobbered registers. you also need in many cases a way for the block to safely force an early exit from the callee

this allows the caller's local variables to be in registers its blocks can use without further ado, or at least indexed off of such a register, while allowing the yield and resume operations to be, in many cases, just a single machine instruction. and it does not require heap allocation

as an example of taking this to the point of absurdity, here's an untested subroutine for iterating over a nul-terminated string passed in r0 with a block passed in r1, using a hypothetical coroutine convention which passes at least r4 through from its caller to its blocks

    itersz: push {r6, r7, r8, lr}
            mov  r7, r0
            mov  r6, r1
    1:      ldrb r0, [r7], #1
            cbz  r0, 1f
            blx  r1
            b    1b
    1:      pop  {r6, r7, r8, pc}
and here is another untested subroutine which uses it to calculate a string hash

    hashsz: push {r4, r5, r9, lr}
            movs r4, #53
            adr  r1, 1f
            blx  itersz
            mov  r0, r4
            pop  {r4, r5, r9, pc}
    1:      eor  r4, r0, r4, ror #27
            bx   lr
even in this case where both the iteration and the visitor block are utterly trivial, the runtime overhead per item (compared to putting them in the same subroutine) is evidently extremely modest; my estimate is 7 cycles per byte rather than 4 cycles per byte on in-order hardware with simple branch prediction, so, on the order of 1 ns on the hardware russ used as his reference. for anything more complex the overhead should be insignificant

it's less general than the mechanism russ proposes here (it doesn't solve the celebrated samefringe problem), but it's also an order of magnitude more efficient, because the yield and resume operations are less work than a subroutine call, though still more work than, say, decrementing a register and jumping if nonzero


corrected, tested, and commented version of the above example code

            @ hashsz s = {
            @   h := 53u
            @   itersz s, b => {
            @     h = (h >> 27 | h << 5) ^ b
            @   }
            @   return h
            @ }

            .syntax unified
            .thumb
    itersz: push {r6, r7, r8, lr}   @ r6 and r7 are preserved by blocks
            mov  r7, r0             @ so put the string pointer in r7
            mov  r6, r1             @ and the block code pointer in r6
    1:      ldrb r0, [r7], #1       @ loop over string bytes, post-indexing
            cbz  r0, 1f             @ bail out on NUL, but otherwise
            blx  r1                 @ invoke block argument with byte, then
            b    1b                 @ repeat loop
    1:      pop  {r6, r7, r8, pc}   @ restore registers and return

    hashsz: push {r4, lr}           @ r4 is callee-preserved and passed to blocks
            movs r4, #53            @ initial value of hash
            adr  r1, 1f+1  @ load Thumbified pointer for block using gas local label
            bl   itersz             @ invoke iterator subroutine
            mov  r0, r4             @ load completed hash into return value
            pop  {r4, pc}           @ restore and return
    1:      eor  r4, r0, r4, ror #27  @ block argument rotates & xors the byte in r0
            bx   lr                 @ then resumes the iterator


erratum: the above should read

            blx  r6                 @ invoke block argument with byte, then
those responsible for this error have been sacked

years ago, in fact


The examples given prompt me to say: if all you have is a Rube Goldberg hammer, everything looks like an Escheresque nail.

Sieving primes by turning functions into coroutines, parsing text by yielding characters, all with unnatural functions and state management... that's an improvement over what?


I'm not sure why the author is advocating for single-threaded patterns in a multithreaded environment, or why he's trying to limit himself like this. The magic of goroutines is that you can easily use all of your cores, not just one. Python and Lua have no choice.


Coroutines are a control-flow mechanism. They're a single-threaded pattern in as much as for loops are a single-threaded pattern. Ability to write multithreaded programs does not exclude the need for good single-threaded tools.


Looking at Python's asyncio coroutine library, it is just mocking multithreading with asyncio.gather. Since coroutines can be executed in any order, they are not really control-flow mechanisms. The selling point of coroutines over traditional threads is their light weight, but that's moot since goroutines have similar memory costs to coroutines. The only real benefit is that coroutines are non-blocking while goroutines may be blocked. There is no real benefit to having Python-style coroutines in Go since goroutines do the same but better.


The concept of "coroutines" is as much about control flow as "subroutines" (function calls and return statements).

But when you have a construct that has their own call stacks, it's a relatively small step to implement lightweight threads with it.

Since doing concurrency happens much more often than any other smart use of coroutines, many people conflate the two.

I sometimes see this confusion in discussions about Kotlin's coroutines. An example of using coroutines that is not about concurrency: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequence...


Kotlin's sequence pre-dates co-routines. Mostly what either of those do is just a bit of syntactic sugar on top of call back mechanisms. One of the nice things with Kotlin is the ability to extend existing APIs via extension functions. Which is something the co-routines library uses extensively to be able to provide co-routine implementations on top of existing frameworks on the JVM, in javascript, and in native environments. Same APIs. Same code. Different underlying implementations. I actively use this in some of my multi platform libraries that I use on JVM and in the browser.

The key concept with Kotlin's co-routines is not having to choose between reactive, green threads, or real threads but treating all of those in the same way with a robust set of abstractions. I use co-routines in the browser on top of existing javascript frameworks that return promises. I use them on the JVM with reactive frameworks like flux. And I also use them with some thread pools. Once Loom reaches LTS (next year, I think), I'll probably be configuring some Loom capable co-routine dispatchers as well.

The debate regarding co-routines vs. go-routines seems like it is similar. I think Go can learn a thing or two from Kotlin's co-routines. After all it is mostly built as a library on top of a single language feature: the suspend keyword. I get that not everybody likes colored functions. But then having a lot of leaky abstractions and related complexity on top of go routines is maybe also not ideal.


> Kotlin's sequence pre-dates co-routines.

You misunderstood. The `Sequence` type does predate coroutines. But I meant the `sequence` builder function, which takes a block of suspending code to create a `Sequence`.

The in-order traversal in the article can be translated to Kotlin:

    fun walk(t: Tree?): Sequence<Int> = sequence {
        if (t != null) {
            yieldAll(walk(t.left))
            yield(t.value)
            yieldAll(walk(t.right))
        }
    }
As I have noted above, this is a use of coroutines that has nothing to do with concurrency.


Sequences actually were part of Kotlin 1.0. Co-routines were added later.


> Sequences actually were part of Kotlin 1.0.

But the `sequence` builder function was not. In fact it depends on coroutines.

Could you read my reply before repeating?


In fact Python had general coroutines first, via yield, which is mostly used for iteration ("generators") and state machines.

Some frameworks (e.g. twisted) did use them for concurrency, and the core team originally planned something similar, however the ergonomics were not what they wanted (especially when mixing coroutines-for-concurrency and coroutines-for-iteration), so they went for a more specialised design.


> did use them for concurrency

> went for a more specialised design

My impression is that JS evolved similarly - (ab)using generator for concurrency, then specialized async-await as a language feature.


Kinda? But that was a very short cycle: both promises and generators were added to the language in ES6 (though the community had been coalescing around promises, "thenables", for a while), the first draft for async functions was actually created during the development cycle of ES6, and it shipped in ES2017.


I don't know about Python, but this description of coroutines, the general concept, doesn't seem accurate to me. They most definitely cannot be executed in any order. They also have nothing to do with multithreading whatsoever. Lua has the most honest-to-god implementation of coroutines I know of, so I'd suggest looking at that. I've seen the word "coroutines" used in some weird way suggesting multithreading, which probably means Python used them to fake multithreading, but the idea originally has nothing to do with it. The idea actually started out as a way to better structure an assembly program in 1958: <http://melconway.com/Home/pdf/compiler.pdf>.


Goroutines are non-blocking. Channels may not always be, however.


Reasoning about and following the control flow of the proposed code hurts me inside. If Go adds function coloring (via, e.g., Python's async and/or yield concepts), I'm out, because I don't want to use this, much less encounter it in the form of a bug in some library.

Java and C++ are largely inferior for my typical purposes, but at the end of the day they work fine and are stable in terms of direction, and don't tend to repeatedly bloat the language over pedantry. If you want top-notch performance, there's already C, C++, and Rust.

I am not a fan of the function coloring shit in Python and Javascript.

I don't want the kitchen sink!


Yield has nothing to do with colouring.



