Hacker News
Coroutines for Go (swtch.com)
330 points by trulyrandom on July 17, 2023 | 182 comments



It looks like a lot of people are missing the point here. Yes, a coroutine library would be a worse/more cumbersome way to do concurrency than the go keyword.

The use case motivating all the complexity is function iterators, where `range` can be used on functions of type `func() (T, bool)`. That has been discussed in the Go community for a long time, and the semantics would be intuitive/obvious to most Go programmers.

This post addresses the next thing: Assuming function iterators are added to the language, how do I write one of these iterators that I can use in a for loop?

It starts by noticing that it is often very easy to write push iterators, and builds up to a push-to-pull adapter. It also includes a general purpose mechanism for coroutines, which the adapter is built on.

If all of this goes in, I think it will be bad practice to use coroutines for things other than iteration, just like it's bad practice to use channels/goroutines in places where a mutex would do.


I think it's also worth mentioning that for certain specific use cases coroutines are much more efficient than a full goroutine, because switching to/from a coroutine doesn't require context switching or rescheduling anything. If you have two cooperating tasks that are logically synchronous anyway (e.g. an iterator), it's much more efficient to just run everything on the same CPU: the kernel doesn't have to reschedule anything or power down/wake up CPU cores, and the data is in the right CPU caches, so you get better cache latency and hit rates. With goroutines this may happen anyway, but it's not guaranteed, and at the minimum you have the overhead of going through the Go runtime's goroutine scheduler, which is fast, but not as fast as just executing a different code context in the same goroutine. Coroutines offer more predictable scheduling behavior because you know that task A is switching directly to task B, whereas otherwise the goroutine scheduler could switch to another available task that wants to run. The last section of the blog post goes into this, where Russ shows that an optimized runtime implementation of coroutines is 10x faster than emulating them with goroutines.

Google has internal kernel patches that implement cooperating multithreading this way (internally they're called fibers), and the patches exist for precisely this reason: better latency and more predictable scheduling behavior. Paul Turner gave a presentation about this at LPC ~10 years ago that explains more about the motivation for the patches and why they improve efficiency: https://www.youtube.com/watch?v=KXuZi9aeGTw


It's not just about performance, but also safety and ergonomics. Since true coroutines[1] offer predictable scheduling, their behavior with regards to data races and deadlocks is also more predictable.

If programmers try to manually implement iterators, generators and interleaved state machines with their own goroutines and channels, it's not just performance that suffers - there is too much room for error.

[1] I'm using the qualifier "true" here, since many modern languages (such as Python, Kotlin) use the term "coroutines" for something that is more like Go's Goroutines than Lua's coroutines. Unlike Go, they are not preemptible, but they are (at least by default) implicitly resumed when necessary by some scheduler, and they may execute on different kernel threads and switch contexts.


> I'm using the qualifier "true" here, since many modern languages (such as Python, Kotlin) use the term "coroutines" for something that is more like Go's Goroutines than Lua's coroutines.

Python has both true (ish) coroutines (or at least coroutines which are entirely user controllable), which it mostly uses for iteration, and the concurrency specialised “async”.

Initially the goal was to reuse "yield" for concurrency (a big reason why "yield from" was added), but the ergonomics of mixing the two uses of coroutines were found to be awful, at least for the language Python is, and providing relevant sugar proved difficult.


It's almost true. Python calls its "almost true" coroutines "generators", while the implicitly scheduled, async I/O-oriented coroutines are just called "coroutines".

I understand the historical reasons for that ("generators" originally only supported "yield", and a separate async/await design with implicit scheduling seemed more ergonomic). But that doesn't remove the confusion.

But Python's generators are still fully "true" coroutines. Russ draws the distinction between coroutines and generators as follows: "Generators provide less power than coroutines, because only the top-most frame in the coroutine is allowed to yield." I think this is a bad definition, as generators can clearly "pass" the yielding power to other generators with "yield from", and Russ discusses this feature in the same post. Generators require more ceremony, but I don't think they are "less powerful" in any meaningful way.

Roberto Ierusalimschy, the creator of Lua, and Ana Lucia de Moura proposed the distinction between stackful and stackless coroutines[1]. Their paper revived coroutines as a research subject and had a lasting impact on coroutine-related terminology. They distinguished "stackful" coroutines from "non-stackful" ones (the term "stackless coroutines" stuck later on, but they did not use it), where stackful coroutines like Lua's can "suspend their execution from within nested functions". They also made the same allusion to power that Russ does: non-stackful coroutines are "not powerful enough to constitute a general control abstraction; in particular, they cannot be used as a concurrent construct".

I think both Ierusalimschy and de Moura are onto something when they call generators "limited", but they are just incorrect when they say that generators "are not powerful enough to constitute a general control abstraction; in particular, they cannot be used as a concurrent construct". Early Node.js libraries and frameworks such as "co" and "koa"[2] should count as proof that you can very much use generators as a concurrency construct. Generators have all the power necessary for dealing with normal concurrency cases, although they don't have the best ergonomics.

But there is another distinction where "generators" are not the same as Lua coroutines, and clearly lack some expressive power (even though this power is NOT a necessary condition for implementing full-fledged concurrency). Lua allows coroutines to communicate bidirectionally: yield() can pass a value back to the coroutine caller, which is returned from resume() - but when the coroutine caller calls resume() again, they can pass an argument which will be returned from yield(). Generators are unidirectional: they only allow you to pass a value to yield(), but not to resume().

[1] http://www.inf.puc-rio.br/~roberto/docs/MCC15-04.pdf [2] https://news.ycombinator.com/item?id=6933358


> Lua allows coroutines to communicate bidirectionally: yield() can pass a value back to the coroutine caller, which is returned from resume() - but when the coroutine caller calls resume() again, they can pass an argument which will be returned from yield(). Generators are unidirectional: they only allow you to pass a value to yield(), but not to resume().

I might be missing something as I’m not familiar with Lua’s coroutines but that sounds like send (https://docs.python.org/3/reference/expressions.html#generat...). The parameter to `send` (from outside the coroutine) is the return value of `yield` (inside the coroutine).


What is wrong with:

    for {
        next := getNext()
        ...
    }
What is the advantage of writing this as:

    for next := range getNext {
        ...
    }


In practice the difference would be closer to:

    getNext := iterableThing.Iterator()
    for {
        next, ok := getNext()
        if !ok {
            break
        }
        ...
    }
vs.

    for next := range iterableThing.Iterator() {
       ...
    }
One advantage is that it's slightly shorter, which matters for very common patterns--people complain about `err != nil` after all. Another advantage is there isn't another variable for everyone to name differently. Another advantage is that everyone can't do the control flow slightly differently either.


Only newbies tend to complain about `err != nil` in my experience. After a certain point it just clicks and they get used to it. There's a cadence to Go code (do the thing, check the error, do the thing, check the error) that is easy to read once you're used to it, but looks horribly verbose when you're coming from an exceptions-based language.

Go has simplicity as a design goal. Part of that simplicity is not adding all the features we can think of to the language. The fact that something would provide slightly leaner syntax for a relatively small group of use cases isn't a good reason to add a new language feature (imho, from my 10+ years using Go and watching it evolve).

If we can implement this using current language features, but it's complex and messy to implement, then it's a great case for a new library, possibly even inclusion in the standard library. If we can't implement this at all using current language features, then maybe it's a case for a new language feature to enable this. If we can implement it relatively cleanly using current language features, then we're good and don't need to do anything. This seems like a "can implement, but not very cleanly" case, which would be a great justification for a library, but not a language feature. Again, imho.


I'm used to `err != nil`, but it doesn't mean I like it. It's a lot of what amounts to boilerplate in a language that is mercifully short of boilerplate elsewhere. This is doubly true when you want custom error types, and need to start unpacking values from your custom error struct.


>This seems like a "can implement, but not very cleanly" case, which would be a great justification for a library, but not a language feature.

First sentence of OP:

> This post is about why we need a coroutine package for Go

Then in the first section after the intro:

> Later, I will argue for an optimized implementation provided directly by the runtime, but that implementation should be indistinguishable from the pure Go definition.


Yeah, so I'm agreeing that this would be a great library, but disagreeing that it should be a language feature. Sorry if I didn't make that clear.


Personally, I don't really see anything wrong with explicit error values, and checking them against nil.

The point I was trying to make is that some amount of people see the benefit in removing much smaller boilerplate. And so presumably at least as many people should see the benefit of removing a larger amount of boilerplate.

I'm just talking about the benefits here. There is also a cost to expanding the language. The costs might outweigh the benefits, but it doesn't mean that the benefits don't exist.


> There is also a cost to expanding the language. The costs might outweigh the benefits, but it doesn't mean that the benefits don't exist.

Agreed. I think this is where the discussion will focus. I'm glad that historically the Go team have weighted the costs heavily, and hope they continue to do so.

I guess I'm in the minority but boilerplate really doesn't bother me. Going the other way gets too much "magic" and we lose the ability to fine-tune behaviour.


> complain about `err != nil` after all

Not to nitpick this specifically, but as a generic reminder: not all complaints are worthy of shifting the trajectory of a massively popular programming language.

Balancing "worthy" and "unworthy" changes is really hard both in the community and discussions like this one. I don't envy the teams that have to do it.


Not forcing error checks is akin to what you do in C, and even there compilers complain about it.


I think more idiomatic go for the first case would be:

    getNext := iterableThing.Iterator()
    for next, ok := getNext(); ok; next, ok = getNext() {
        ...
    }
Which, yeah, the range cleans it up a bit, but it's not doing quite as much work as you're implying.


Don't forget that the important bit in Go is readability of the code. To me the range form is much easier to reason about than this for loop.


If we're going for readability instead of trying to stick to the case presented above, I'd go with:

    for i.HasNext() {
        current := i.GetNext()
        ...
    }
And just mutate the internal state of the struct. Probably throw it behind some sort of interface like:

    type Iterator[T any] interface { // feel free to strip out the generic
        HasNext() bool               // if you really only have one type
        GetNext() T                  // you do this with
    }
I think this still loses to ranging over a function, but I still think it should be compared to what you can do with the current language as opposed to what was originally posted.


The `err != nil` point is exactly what you don't want to mention, and a perfect example of why you shouldn't listen to all the close-minded takes some people might have.


Your code is an example of a "pull iterator", and it's not as much of a concern.

The harder case to deal with is a "push iterator", which are often much simpler to implement, but less flexible to use. See https://github.com/golang/go/discussions/56413

The OP is about converting a push iterator into a pull iterator efficiently by using coroutines. This provides the best of both worlds, simple iterator implementation and flexible use by its caller.


Besides what everyone else said, the obvious advantage is the latter builds in the logic for exiting the loop once getNext has run out of elements in the slice/map. Your former example will need a final step that's like:

    ...
    next := getNext()
    if next == nil {
      break
    }
    ...
This isn't a huge boon, and is mostly a matter of style. But I prefer the latter because it's logic that gets handled with the range builtin, so my code can be more about application logic, rather than muddling in breaking out of loop logic.


None of the proposals require you to change how you write things. The article proposes an implementation that is fully doable and usable with the current spec and no breaking changes.

The point of coroutines is that they are little contexts that can be called

- many times

- sometimes with different parameters

- change state

- might be interrupted by callers

- might interrupt callers themselves

All of this can be done by other means. Just like any sorting can be done by copy pasting the same code, but generics make it less tedious. That's the same idea here. Some problems can be implemented as interleaving coroutines, and their definition is simple enough that you want to write it all in the body of some CreateCoroutine() function instead of taking out another struct with 5 almost empty functions. It will not solve all problems, but can more clearly separate business logic and orchestration.


The first doesn't do control flow


Depends how getNext is defined.


The only way to do control flow in getNext, consistent with how you partially defined it, is to panic. That means there is some important code wrapping the whole loop body that you left out.


That doesn't make sense. `getNext()` has no ability to break out of the loop.


Well, what was wrong with:

    for {
        next := <-channel
        ...
    }


Horrible overhead. If the loop does something simple, like summing integers, 99% of the time will be spent switching between goroutines.

From TFA:

"Because scheduling is explicit (without any preemption) and done entirely without the operating system, a coroutine switch takes at most around ten nanoseconds, usually even less. Startup and teardown is also much cheaper than threads."

"For this taxonomy, Go's goroutines are cheap threads: a goroutine switch is closer to a few hundred nanoseconds"

Also check out https://news.ycombinator.com/item?id=29510751

and https://ewencp.org/blog/golang-iterators/index.html:

"Finally, the most natural channel based implementation is… slow. By a factor of nearly 50x. Buffering does help though, reducing that to a factor of 25x."


And even if it's a toy program and you don't care about performance, it's not as simple as just:

    for {
        next := <-channel
        ...
    }

You have to set up the channel and the goroutine that feeds it, you need to safely close the channel when the iteration is done (but not before, unless you like panics), you need to deal with panics inside the goroutine and possibly support cancellation if the iteration breaks early (unless you love memory leaks).

If you try to implement this pattern by hand, you are all too likely to make a fatal mistake, and this is doubly true in the hands of an inexperienced programmer.

I appreciate the fact that Russ wrote this long post, gradually implementing `coro.New()` and improving its functionality and safety — and only in the very end, we get a short paragraph about performance. Good performance is important to make this feature attractive to use, but if the feature is clunky and error-prone, it wouldn't be worth much, even with great performance.


I'm responding to a comment asking for a justification for sugar for function call based iteration. My comment is an attempt to draw a parallel to the need for sugar for channel based iteration. I'm not trying to suggest channel based iteration as an alternative to coroutines.


The article mentions race conditions


I think coroutines in Go will make it possible to use Go as a host for clean definition of discrete element simulations. Without it, yielding to an actor is clunky.


Why not just range (or use a select) over a channel and have a goroutine running that pushes values into the channel?

I still don’t see the need for coroutines.


This is too slow to really be useful on its own


Still a kitchen sink move, though, isn't it?

Like, no careful thinking and good 80/20 solution this time. Just "huh, we'd need coroutines to do this `right` so let's just do that"

When they added generics, they really, really thought long and hard, and came up with a compromise that was brilliant and innovative in the balance it struck.

I would have hoped to see something like that here, like "we're adding this one feature to goroutines to have control in specific situations" feels like something that would be better than "we're going full Rust on this problem and just adding it."


This has been under discussion for a long time.

https://github.com/golang/go/issues/43557

https://github.com/golang/go/discussions/56413

I'm not sure Russ's personal blog is any kind of official statement "this is what we're doing" yet?


> I'm not sure Russ's personal blog is any kind of official statement "this is what we're doing" yet?

I've found it's generally a very strong indicator.

But to your point, this isn't a new issue, there's been a good amount of discussion around it over the years.


In the old discussion, I specifically complained about how it would add back door coroutines to Go via iterators, and now Russ has a post about formalizing the concept of a coroutine, while also keeping the optimization of coroutines as a runtime implementation detail. I feel vindicated. :-)

https://github.com/golang/go/discussions/56413#discussioncom...


I have written Go professionally for many years now and don't want to see it become something like the Python Twisted / Tornado / whatever frameworks.

The go keyword nicely prevents the annoying function coloring problem, which causes quite a bit of pain.

Sometimes in high performance contexts I'd like to be able to do something like e.g. per CPU core data sharding, but this proposal doesn't scratch those kinds of itches.


Coroutines and goroutines fill different niches. The latter already fill the niche the likes of Twisted fill. There's nothing here trying to pull anything akin to async/await into Go.

Coroutines will fill a different niche, more akin to Python's generators. There are a whole bunch of places where this could dramatically cut down on memory usage and code complexity, where you have a composable pipeline of components you need to glue together. The design appears to me to be entirely synchronous.


I have had some experience with wire protocols that might benefit from coroutines. If I use goroutines, I have to use sync or channels, and then be careful. If I use state machines, it can be cumbersome. I suppose we will see.


For sure it cuts down code complexity for the single clever code writer who wanted to be fancy. Not so sure about all the readers that will come afterwards.


Not at all. I often have to slurp in large amounts of data to be processed in some way. If you don't want to use huge amounts of memory for doing so, you either need to (a) write some horribly obtuse code or (b) use coroutines. Coroutines allow you do this in a more natural and composable way.

Anything involving a lexer and parser is a good example of this, and exactly one of the use cases given in the example. With coroutines, the lexer can take some input and emit a series of tokens which the parser can consume on the fly without any need to do any complex interleaving of the parser and lexer code. You also gain the ability to stop lexing early in case the parser encounters an issue.

I'm not sure what kinds of software you work on, but this is little more complicated or clever than Unix shell pipelines.


Isn't this literally the entire point of Golang?


> The go keyword nicely prevents the annoying function coloring problem, which causes quite a bit of pain.

The good thing is, nothing in the post even proposes anything that might fall in the function coloring problem.


> The go keyword nicely prevents the annoying function coloring problem

FYI when you write a goroutine and have to use Context, that's literally just function coloring


Ad absurdum, then anything other than full curried 1-ary-exclusive functions is coloring.

You can easily bridge context-using and non-context-using code, you can have nested contexts, you can have multiple unrelated contexts, you can ignore the context, probably most critically you can't pass values back up the context, I don't see any way these look like function coloring.


Not sure about in Python, but in Rust, you can easily bridge sync and async code. The function coloring problem is, at least in that case, grossly overstated.


Why was this downvoted? The person makes an interesting point.

For discussion, can you please provide a short sample in Rust to demonstrate your point? I would like to hear more.


I'm guessing it was downvoted for mentioning Rust ¯\_(ツ)_/¯

For sure - it's a little terse, just to demonstrate the various cases

  async fn do_something_async() {
    nonblocking_sync_call();
    tokio::task::spawn_blocking(|| blocking_sync_call()).await;
    async_std::task::spawn_blocking(|| blocking_sync_call()).await;
    async_call().await;
  }

  fn do_something_sync() {
    nonblocking_sync_call();
    blocking_sync_call();
    futures::executor::block_on(async_call());
  }
There is _a bit_ of ceremony required to bridge the two, but it's pretty minimal.

If you want to know more about any of that, happy to explain or provide links for further reading


Is it really? >90% of the time, I'm just passing a context so that I can cancel the rest of the goroutines if an early one fails and I don't need to do the rest of the computation.

That works just fine with throwing a context.Background in, which should be available anywhere, and not color the function.


Arguably the "problem" with other async/threading frameworks is that they build threading on top of iterators/coroutines. In this case they would be orthogonal, so it's probably not as bad as you think, though almost certainly some clever developer out there will do something stupid with it that will also get very popular (my bet would be abstracting goroutines and coroutines into a single thing)


Can you maybe share hints about better channel management patterns/frameworks?

Usually my goroutine-related code feels like a mess because of them, and I don't know how to make it "cleaner" (whatever that means). Any patterns proven to be maintainable would be highly appreciated.


I don't always use channels in the code I write.

For example, let's say I have []foo as input and I'm going to call a service on every element of the slice.

You could do it with channels, but oftentimes I'll reach for creating a slice of results and passing each goroutine an index into the slice. That way I can avoid any overhead that comes from having what is effectively a queue with a lock.

It depends on what I'm doing, how I want to treat failures of a single goroutine, etc. of course.

I default to never returning channels unless it's a requirement for the use case (e.g. context.Done() returns a channel so callers can listen for when the context is closed).

Callers can always create goroutines to make your code run concurrently. It's more annoying to erase channels when they aren't useful to the caller (e.g. if you made all functions that make HTTP calls return a channel because you assume callers want a new goroutine, but I already have my own goroutine per request)

Beyond that, some specific examples would be helpful so I can tailor my advice


Don't worry, it will get even worse, because its foundations are worse than Python's - and it also has less flexibility because it's not interpreted. Like it or not, it's going to happen. I also predicted that generics would be added to Go, and folks here laughed about it. :)


Multitasking systems gave us processes.

But those were too much.

So we got threads, which are processes that share an address space, file table, and some other things. The scheduler can switch from one to the other more easily than between processes, and data can be shared between threads without needing serialization.

But those were too much.

So we got user space threads, which are logical threads of execution that are driven by a runtime entirely in user space. The runtime adds scheduling hooks into all I/O functions in the standard library, or even uses a system API like Unix signals to preempt logical threads. No system-level context switching is needed. User space threads can be tiny.

But those were too much.

So we got coroutines, which allow a programmer to define logical "threads" of execution that cooperatively interact with each other. There is no assumption about the presence of a scheduler. The programmer either writes their own event loop or invokes one from a library in a "real" logical thread.

I wonder what comes next. As far as [communicating sequential processes][1] are concerned, maybe cooperative coroutines are as low as you can go.

[1]: https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf


Even coroutines can't save us programmers from the dread of having to explain every little possible detail to the computer, in a way that doesn't take forever to execute if you provide input that is even slightly different from what the programmer had in mind.


I'm a bit surprised by the lack of languages that parallelize code automatically as much as possible, but only as far as measured performance suggests. It requires a semantics that supports concurrency - for example, iteration over sequences needs to be in unspecified order by default - and also rests on whole-program flow analysis, but I see no obstacles in principle.

Hasn't Microsoft done some research on such a language years ago already? There is also ParaSail made by someone from the Ada community. What happened to these projects? Nobody uses them?


Languages built on the BEAM VM like Erlang and Elixir support concurrency at the runtime level, though it's up to you to specify when you want to run something in parallel. Or am I missing why they don't fulfill your requirements?

I'm not sure if it's desirable to parallelize code automatically, as in many cases you do need at least _some_ parts of the code to run synchronously. But it's an interesting thought experiment to have parallelism be the default instead of opt-in.


Guy Steele Jr. worked on a language called "Fortress." There are a few good talks that he gives:

- (toy problem) https://www.youtube.com/watch?v=ftcIcn8AmSY

- (parallelism) https://www.youtube.com/watch?v=dPK6t7echuA

- (languages) https://www.youtube.com/watch?v=dCuZkaaou0Q


Turns out that autopar is an extremely hard problem beyond simple use cases. And for simple use cases existing solutions work fine (pragma omp for goes a long way for example).


I don’t think there’s any further down to go, but there’s probably room to go up to distributed computing. IIRC, some early versions of Go when it was in alpha had channels work across machines.


I think there is. How do you describe data parallelism? Coroutines are too big when all your function calls are doing the same thing, but on different data (think ISPC, shaders, CUDA). There is still one more step in parallelism where coroutines cost too much :-)


Good point. SIMD routines!


Imagining the next step, it could be something like: "process this collection of task-items as you wish". If you squint, any concurrent execution can be viewed as a sequence of tasks (which can also be collections), even if there are only one or two of them.


Something along what Cuda, OpenCL or ISPC do, but integrated to the language.


I thought that the entire point of green threads was so that I didn't have to use something like Python's `yield` keyword to get nice, cooperative-style scheduling.

I thought go's `insert resumes at call points and other specific places` design decision was a very nice compromise.

This is allowing access to more and more of the metal. At what point are we just recreating Zig here? What's next? An optional garbage collector?


yield and resume here aren't keywords, they're regular variables (referencing regular closures) given those names purely for pedagogical purposes.

The only odd thing here is the creation and use of a cancel callback. I've not done much Go programming, so I don't know whether this is just a performance improvement to more quickly collect dropped iterator state, or it's a requirement because Go does not garbage collect goroutines waiting on channels for which the waiting goroutine has the only reference. In Lua you wouldn't need such a thing because coroutines/threads are GC'd like any other object--once all references are gone they're GC'd even if the last operation was a yield and not a return from the entry function.


Garbage collector in Go is optional. You can switch off garbage collection by setting the environment variable GOGC=off.

More info about GOGC: https://dave.cheney.net/tag/gogc


Only very theoretically - in Go, you don't control whether memory goes on your stack or the heap, and heap escape analysis is notoriously unpredictable. There is no explicit free. You would have to write Go in a completely crazy way to be able to turn off the GC and have the program not grow unbounded.

You might think "I'll just use static buffers everywhere", but allocations can occur in unexpected places. The compiler does some very basic lifetime analysis to eliminate some obvious cases (loops...), but it's really hard to avoid in general.


Ok, so garbage collection is optional - how about garbage generation? Is there any way to manually clean up resources if GOGC=off, or will memory usage continue to grow unbounded as new objects are created?


Grows unbounded. I wasn't recommending that one should set GOGC=off. Just making a remark that one could should they choose to do so.

EDIT: Sorry, I misunderstood part of your question. The memory grows unbounded unless you call runtime.GC(), which triggers garbage collection. But this is a blocking call and essentially blocks your whole program.
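For the curious, here's a minimal sketch of both knobs (nothing beyond the standard `runtime` and `runtime/debug` packages; the sizes and loop counts are arbitrary):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// allocThenCollect turns the collector off, generates garbage,
// then forces one manual (stop-the-world) collection.
func allocThenCollect() (before, after uint64) {
	debug.SetGCPercent(-1) // same effect as GOGC=off

	garbage := make([][]byte, 0, 100)
	for i := 0; i < 100; i++ {
		// ~100 MB of heap that nothing will reclaim automatically
		garbage = append(garbage, make([]byte, 1<<20))
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	before = m.HeapAlloc

	garbage = nil // drop the references
	_ = garbage
	runtime.GC() // still works with the collector off, but blocks

	runtime.ReadMemStats(&m)
	after = m.HeapAlloc
	return before, after
}

func main() {
	before, after := allocThenCollect()
	fmt.Printf("heap before manual GC: %d bytes, after: %d bytes\n", before, after)
}
```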


I think in theory you can write code (or do some tricks) to avoid all heap allocation.


"In theory" being the operative words there. Turn on heap escape analysis sometime, you'll be surprised how hard it is to avoid.
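For example (a toy case of my own; check the compiler's verdict with `go build -gcflags='-m'`):

```go
package main

import "fmt"

// Stays on the stack: the value is returned by copy.
func onStack() int {
	x := 42
	return x
}

// Escapes to the heap: &x outlives the frame, so
// `go build -gcflags='-m'` reports "moved to heap: x".
func escapes() *int {
	x := 42
	return &x
}

func main() {
	// A less obvious case: fmt.Println takes ...any, so even
	// these small arguments typically escape to the heap too.
	fmt.Println(onStack(), *escapes())
}
```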


You mean not growing the heap or literally no allocation?


Arguably, one can't truly say the GC is optional, unless the language and its libraries were designed to work without it. In languages like Vlang, that's the case, as the GC was added later. If turning the GC off cripples the functionality and usefulness of the language, then there is little point in using the option or claiming it as optional.

Probably a better argument for Go (and other languages like Java) is how "tweakable" the GC is or at least describe it as the GC can be turned off, but it's not designed or useful to do so.


Well, then so is Java's, with the Epsilon "GC". But not collecting garbage is very different from manual memory management.


Coroutines are one thing that I'd probably prefer language support for rather than a library.

    x := co func(){
        var z int
        for {
            z++
            yield z
        }
    }

    y := x()

    for y := range x {
        ...
    }

or something to that effect. It's cool that it can be done at all in pure go, and I can see the appeal of having a standard library package for it with an optimized runtime instead of complecting the language specification. After all, if it's possible to do in pure go, then other implementations can be quickly bootstrapped.

My $0.02, as someone that uses go at $work daily: I'd be happy to have either, but I'd prefer it baked into the language. Go's concurrency primitives have always been a strength, just lean into it.


You need language support for coroutines - what happens when the stack runs out in one of the coroutines? With a library solution you would have to kill the coroutine, or kill the program. If you want the stack to grow transparently, then that can only be done by the generated code: it has to watch stack usage and then grow the stack on demand (I think goroutines already have something like that).

Well, maybe a library solution could possibly have a guard page at the end of the stack. When that page is hit, you could try to grow the stack in an error handler (but that would probably not work if you have taken a pointer to some stack variable).


You don't need language support for coroutines if you have language support for relocatable/chunkable stack, which Go has.


I would prefer to add a `yield` param to functions so it can better interact with the current type system.

This means that for a function to yield, it needs to have the so-called `yield` param, and you can only yield inside yield functions with the same yield signature or with no signature at all.

So the example would be rewritten into something like:

  x := func(:z int) {
    for {
      z++
      :- z
    }
  }

  for y := range x() {
    ...
  }
The : adds the yield signature and :- yields the value.

Functions that only yield a value `X` would just have the `: X` or `: name X`. To accept a value of type `Y` in the resume the signature would change to `:[Y] X` or `:[Y] name X`.

Accepting is weak, so if something expects a yielding function that resumes with Y and yield X, yielding functions that only yield X without a resume should also be accepted.

Consider the following code example:

  func filter[T comparable](it func(:T), f func(T) bool, :T) {
    for v := range it {
      if f(v) { :- v }
    }
  }

  func map[T](arr []T, :T) {
    for _, v := range arr {
      :- v
    }
  }

  for v := range filter(map({1, 2, 3}), func(x) { return x < 3 }) {
    print(v)
  }
Special functions in a co package could give resume and New functionality to keep the go style.

A basic counter could be written like this:

  func counter(:[bool]c int) {
    for {
      if ! :- c { break }
      c += 1
    }
  }
A func that counts twice could be as simple as:

  func countTwice(:[bool]c int) {
    counter()
    counter()
  }
To interact with the resume value the range syntax could be expanded to something like:

  for c := range counter() {
    if c > 10 { -: false }
    -: true
    print(c)
  }
Range would resume with a default zero value if the range does not pass a value to -:.


Sure, as long as I can close over a yield object. There is no reason for filter and map to know that the last function parameter takes a continuation.


But then, how could you guarantee type safety?

This yield is not an object, it is handling the execution context to another stack.


It is a delimited one-shot continuation. Delimited continuations are first class objects like any other and it should be possible to close over them.


Wow, that is horrible. None of that is intuitive, which is one of Go's strengths. You would need to specifically learn the Go semantics of coroutines to have any chance of writing or even reading code like this.


Goroutines and channels aren't intuitive either. You learn them and become familiar with them over time, at which point they become intuitive __for you__.


Actually, I get it now. The issue is that OP was giving TWO different examples of use, pulling a single value versus multiple. They should have clarified.


Yes, on review, I should have made two examples instead of combining them. Thanks for persevering through my brevity and reaching understanding :)


They're totally intuitive if you were an Occam programmer.


So it's intuitive because you are familiar with those concepts. I don't see how that goes against what I said.


Yes, you need to learn language features in order to use the language, shocker!


I've written go for a year or so. That looks close to go code I look at every day. I wouldn't be surprised if it was in the language.


Not sure I'm a fan. Looking through the examples, I feel like this makes the language much harder to read and follow, but maybe that's just my own brain and biases.

Further, it doesn't seem to me to allow you to do anything you can't currently do with blocking channels and/or state.


You’re absolutely right. People advocating for it can’t seem to see beyond their nose.


Yes, the person who spent 15 years developing the Go language can't see beyond his nose.


More likely they're trying to solve real problems you just haven't hit yet.


The Go team has explicitly said this in the past: “You just don’t understand because you’re not at Google’s scale.”


Where have they said this specifically?


What language change are you talking about? This is just a proposed construct to regularise and make efficient something people already do (as you say, with “state”).

I’ve used iterators similar to what’s described in this article to avoid allocations in critical code paths, but this would make those much less awkward to use (particularly with the upcoming range iterator language change).


Perhaps "language change" was bad wording; I guess I meant it encourages a paradigm change? Just look at this func signature and first line...

> func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {

> copush := func(more bool, yield func(V) bool) V {

The main power of Go to me was always quickly being able to read and understand code. This type of coding has a lot of cognitive load to a reader, I feel.


The implementation of Pull may be a bit mysterious to the casual reader, but its usage seems clear enough to me from the function signature. Even without reading the docs I think it’s easy to guess what yield, pull, and stop should do (and how to implement yield). That’s what matters, imo.

Fwiw if you look at the source of some other idiomatic standard library functions you may find their implementation is of similar complexity. That’s the nature of good abstractions, though, they should make tricky things easy to use.
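To make that concrete, here's roughly what I'd guess usage looks like from the signature alone. Only the signature comes from the article; this `Pull` body is my own channel-based stand-in (the article's real version avoids channels entirely):

```go
package main

import "fmt"

// Pull is a channel-based stand-in for the proposed coro.Pull:
// it adapts a push iterator into pull() and stop() functions.
func Pull[V any](push func(yield func(V) bool)) (pull func() (V, bool), stop func()) {
	ch := make(chan V)
	done := make(chan struct{})
	go func() {
		defer close(ch)
		push(func(v V) bool {
			select {
			case ch <- v:
				return true
			case <-done: // stop() was called; tell push to finish
				return false
			}
		})
	}()
	pull = func() (V, bool) {
		v, ok := <-ch
		return v, ok
	}
	var stopped bool
	stop = func() {
		if !stopped {
			stopped = true
			close(done)
			for range ch { // drain so the goroutine can exit
			}
		}
	}
	return pull, stop
}

func main() {
	// A push iterator over 1..3, converted to pull form.
	pull, stop := Pull(func(yield func(int) bool) {
		for i := 1; i <= 3; i++ {
			if !yield(i) {
				return
			}
		}
	})
	defer stop()
	for v, ok := pull(); ok; v, ok = pull() {
		fmt.Println(v)
	}
}
```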


A dozen different incompatible iterator implementations (the current stdlib) isn't easier.

Channels are slow for no good reason when they aren't wrapping blocking operations.


Reading the comments makes me feel bittersweet.

- Many people consider coroutines and green threads to be more or less the same thing, when they both have their pros and cons.

- The fact that the omission of iterators is even acceptable in the Go community saddens me. They seem to deliberately refuse any feature that might make the language even slightly more complex, in the name of simplicity. But hey, at least they retracted their opinion on generics.

I'm again reminded that Go is not my language.


I use Go daily. Honestly, generics didn't change my code much at all.

I use them now somewhat, but only because they offer slightly better sugar at the call site. At the definition site they are just as ugly as you expect; granted, Go's syntax is about the best I have seen among languages with generics. So in the end I think we really lost some simplicity in order to silence the whining about no generics. Not a good tradeoff.


I wouldn’t confuse HN with “the Go community”. The head of the Go team wrote this post. Will the coro package proposed in the post be added exactly as written? Maybe, but probably not exactly as written. Will something like it be added? I would be willing to bet money on it, yes. How long will it take? I’d say at least a year (release in Go 1.23 in August of 2024), maybe a little longer. I don’t think it could be much shorter though.


Given Go's history, there's a moderately high chance that this will be in a PR and immediately merged within the next few hours. They're a bit bipolar on "listen, wait, debate" vs "we just wrote about a new thing, it's already built and in stdlib but is largely undocumented"


I’ve never seen them do this. What are you talking about?


The whole Go module drama.


> But hey, at least they retracted their opinion on generics.

No, we really didn't.

Generics were acceptable to the community only because they are fully backwards compatible with existing code, and can be safely ignored if you don't need them.

Which, not at all surprising, is what most golang code still does. Because as it turns out, outside of "collections" of one sort or another, practical use cases for generics are not that easy to find. Most code written never gets to see more than one type to begin with, and more than 2 is already a stretch.

If anything, the addition of generics showcased to the larger community how little many of the vaunted features that people keep demanding and labeling as "essential" are actually needed for a widely accepted and excellent language.


Generics didn’t even change 20% of my codebase, and of that 20% much was library code. I'm not sure whether Go or C habits have taken hold in me, but I see generics as just another library that one programmer from somewhere used to solve something.


I would have expected that the Go community welcomes a uniform interface for iteration.


I guess it is great that they are finally paying attention to programming languages like CLU.

On the other hand, given my experience with .NET and C++ coroutines, and Active Objects (in Symbian C++ and Active Oberon), I'm not sure this is really something to add to Go.

Even the .NET team has acknowledged at this year's BUILD that, if they could go back in time, having the runtime handle them Go-style would probably have been a better decision, given how many developers keep having issues understanding async/await.


Not sure this is really required. Most cases in Go are served well by goroutines, and for yield/resume semantics two blocking channels are enough. This seems to add complexity for its own sake, and I'm not sure it adds any power to Go that didn't already exist.


Goroutines + channels add an enormous amount of overhead. Using them as iterators is basically insane.


Why? Channels are already iterable using the range keyword.

    ch := make(chan int)
    go func() {
        for i := 0; i < 100; i += 1 {
            ch <- i
        }
        close(ch)
    }()

    for i := range ch {
        fmt.Println(i)
    }
That is very simple.


"Enormous amount of overhead" is the operative phrase. In general, you want your concurrency operations to be significantly smaller than the payload of the operation. In the case of using channels as iterators with goroutines behind them, it works fine for something like a web page scraper, where the act of fetching a web page is enormously larger than a goroutine switch, but as a generalized iteration mechanism it's unusably expensive because a lot of iteration payloads are very small compared to a channel send.

I've encountered a lot of people who read that on /r/golang and then ask "well why are send operations so expensive", and it's not that. It's that a lot of iteration operations are on the order of single-digit cycles and often very easy to pipeline or predict. No concurrency primitive can keep up with that. A given send operation is generally fairly cheap, but there are enough other things that are still an order or two of magnitude cheaper than even the cheapest send operation that if you block all those super cheap operations on a send you're looking at slowdowns of multiple orders of magnitude. Such as you would experience with your code.
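To put a rough (machine-dependent, unscientific) number on it, here's the kind of comparison I mean - summing a million ints with a plain loop versus pulling each one through an unbuffered channel:

```go
package main

import (
	"fmt"
	"time"
)

const n = 1_000_000

// plainSum iterates with an ordinary loop: single-digit
// nanoseconds (or less) per element once pipelined.
func plainSum() (sum int, d time.Duration) {
	start := time.Now()
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum, time.Since(start)
}

// chanSum pays an unbuffered channel send, receive, and
// goroutine switch for every single element.
func chanSum() (sum int, d time.Duration) {
	start := time.Now()
	ch := make(chan int)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	for v := range ch {
		sum += v
	}
	return sum, time.Since(start)
}

func main() {
	s1, d1 := plainSum()
	s2, d2 := chanSum()
	fmt.Printf("plain loop: sum=%d in %v\n", s1, d1)
	fmt.Printf("channel:    sum=%d in %v\n", s2, d2)
}
```

On my machine the channel version is slower by a couple of orders of magnitude; exact numbers will vary.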


But if your iteration is so fast then you don’t need a channel at all. Just use a plain for loop.


Yes, that's exactly what the article is about: how we can model "plain for-loop" levels of performance in the presence of complex (intricate state at multiple levels and high potential for nesting iterators) code that is supplying the loop(s)


Then you lose the separation of iteration from the looping construct.

You can use a function or a method, but you lose the nice continuation/generator ability to write a function that can do complicated yields without having to write the state machine yourself, plus you run a risk that the call won't inline in which case you're incurring non-trivial function call overhead.

The problem with iteration in Go isn't that you can't solve any given individual problem, the problem is that you can't solve all of them simultaneously the way you can in Rust or Python. (Though one of these days I want to get around to benchmarking Python's iteration versus Go channel-based iteration, I'm not actually sure which would win. What Go considers dangerously slow can still be baseline performance for other languages.) So you can get a defeat-in-detail sort of thing where a person cites a problem, and someone posts as solution to that, and then they cite another problem, and there's a solution for that, and then there's another problem, and a solution is posted for that, and all the solutions do indeed more-or-less solve the given problem... but you can't combine them into one.
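For what it's worth, the "write the state machine yourself" alternative I mean looks like this in the `func() (T, bool)` shape (my own toy example): fine for flat iteration like Fibonacci, painful once the state is a recursion you have to flatten by hand.

```go
package main

import "fmt"

// fibs returns a pull iterator in the func() (T, bool) shape.
// All iteration state lives in captured variables, i.e. a
// hand-written state machine.
func fibs(limit int) func() (int, bool) {
	a, b, count := 0, 1, 0
	return func() (int, bool) {
		if count >= limit {
			return 0, false
		}
		count++
		v := a
		a, b = b, a+b
		return v, true
	}
}

func main() {
	next := fibs(7)
	for v, ok := next(); ok; v, ok = next() {
		fmt.Println(v)
	}
}
```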


How do you solve the tree fringe problem with a for loop?


Since all recursive programs can be converted into iterative programs then you can "simply" (not always simple) convert recursive solutions like McCarthy's Lisp solution to a loop: https://dl.acm.org/action/showFmPdf?doi=10.1145%2F1045283 (page 5)

  (DE SAMEFRINGE (X Y)
         (OR (EQ X Y)
             (AND (NOT (ATOM X))
                  (NOT (ATOM Y))
                  (SAME (GOPHER X) (GOPHER Y)))))

  (DE SAME (X Y)
         (AND (EQ (CAR X) (CAR Y))
              (SAMEFRINGE (CDR X) (CDR Y))))

  (DE GOPHER (U)
         (COND ((ATOM (CAR U)) U)
               (T (GOPHER (CONS (CAAR U)
                                (CONS (CDAR U) (CDR U)))))))
Coroutines are not necessary for solving that problem (though they do offer a neat solution to it).
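For contrast, here's a sketch of that neat coroutine-flavored solution in today's Go, using goroutines and channels (i.e. the expensive approach the rest of the thread is debating); the quit channel keeps the walkers from leaking on early exit:

```go
package main

import "fmt"

// Tree is a minimal binary tree; leaves carry the values.
type Tree struct {
	Val         int
	Left, Right *Tree
}

// leaves walks t and sends each leaf value on the returned
// channel; quit lets the consumer stop the walk early.
func leaves(t *Tree, quit <-chan struct{}) <-chan int {
	ch := make(chan int)
	var walk func(t *Tree) bool
	walk = func(t *Tree) bool {
		if t == nil {
			return true
		}
		if t.Left == nil && t.Right == nil {
			select {
			case ch <- t.Val:
				return true
			case <-quit:
				return false
			}
		}
		return walk(t.Left) && walk(t.Right)
	}
	go func() {
		defer close(ch)
		walk(t)
	}()
	return ch
}

// sameFringe reports whether two trees have the same leaves
// in the same order, comparing lazily pair by pair.
func sameFringe(a, b *Tree) bool {
	quit := make(chan struct{})
	defer close(quit) // unblock both walkers on early exit
	ca, cb := leaves(a, quit), leaves(b, quit)
	for {
		va, oka := <-ca
		vb, okb := <-cb
		if oka != okb || va != vb {
			return false
		}
		if !oka {
			return true
		}
	}
}

func main() {
	leaf := func(v int) *Tree { return &Tree{Val: v} }
	// ((1 2) 3) vs (1 (2 3)): different shape, same fringe.
	t1 := &Tree{Left: &Tree{Left: leaf(1), Right: leaf(2)}, Right: leaf(3)}
	t2 := &Tree{Left: leaf(1), Right: &Tree{Left: leaf(2), Right: leaf(3)}}
	fmt.Println(sameFringe(t1, t2)) // prints true
}
```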


The results of that conversion are not fun


Aside from the massive performance penalty, cache thrashing and context switching, this code will also leak a goroutine (and so, memory) if you don't finish receiving from `ch`. It's more brittle, longer to write, less local and in every other way worse than a for loop. Why would you ever do it?


Race conditions. With coroutines you're not supposed to have to deal with races, if I am understanding the motives correctly.


You can write to a channel concurrently.


And to make the concurrency safe you have to pay the price of synchronization.

https://go.dev/tour/concurrency/8

In A Tour of Go, concurrency section, "Equivalent Binary Trees" is an example of paying the price when you don't need it.


It's not that you can't, it's that it's very expensive.


It’s not insane at all. How did you come to that conclusion?

* Mutex lock+unlock: 10ns

* Chan send buffered: 21ns

* Try send (select with default): 3.5ns

Missing from here is context switches.

In either case, the overhead is proportional to how fast each iteration is. I have channels of byte slices of 64k and the channel ops don’t even make a dent compared to other ops, like IO.

You should absolutely use channels if it’s the right tool for the job.

Fwiw, I wouldn’t use channels for “generators” like in the article. I believe they are trying to proof-of-concept a language feature they want. I have no particular opinion about that.


> Missing from here is context switches.

Exactly. From rsc's previous post[1]:

> On my laptop, a C thread switch takes a few microseconds. A channel operation and goroutine switch is an order of magnitude cheaper: a couple hundred nanoseconds. An optimized coroutine system can reduce the cost to tens of nanoseconds or less.

[1]: <https://research.swtch.com/pcdata>


Yeah I 100% understand wanting to optimize this for something like generators if we imagine them as first-class constructs. But they’re not at all a replacement for channels – they would be an addition, or specialization. I’ve never seen real world Go code that has needed it but maybe this will change with generics. It’s worth keeping an eye on, at least.

Channels otoh are very versatile: everything from spsc to mpmc with buffering and starvation protections, fast cancelation and notifications, etc etc. They’re not perfect, but it’s a helluva bang-for-the-buck for a single primitive. Literally all you have to do for performance is add buffering and coalesce “units of work”, and you’re good to go.
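By "coalesce units of work" I mean something like this (my own sketch): send slices instead of single items, so one channel operation is amortized over a whole batch.

```go
package main

import "fmt"

// batches groups items into slices of at most n elements before
// sending, amortizing one channel operation over n items.
func batches(items []int, n int) <-chan []int {
	ch := make(chan []int)
	go func() {
		defer close(ch)
		for len(items) > 0 {
			end := n
			if end > len(items) {
				end = len(items)
			}
			ch <- items[:end]
			items = items[end:]
		}
	}()
	return ch
}

func main() {
	sum := 0
	for batch := range batches([]int{1, 2, 3, 4, 5, 6, 7}, 3) {
		for _, v := range batch { // cheap inner loop per send
			sum += v
		}
	}
	fmt.Println(sum) // prints 28
}
```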


Sure but that's what the implementation of the coro in this post uses under the hood. Not sure how this is any better wrt overheads.


Where did you get this '..it uses same under the hood'? The article clearly says:

..Next I added a direct coroutine switch to the runtime, avoiding channels entirely. That cuts the coroutine switch to three atomic compare-and-swaps (one in the coroutine data structure, one for the scheduler status of the blocking coroutine, and one for the scheduler status of the resuming coroutine), which I believe is optimal given the safety invariants that must be maintained. That implementation takes 20ns per switch, or 40ns per pulled value. This is about 10X faster than the original channel implementation.


The runtime switch was buried in the last paragraph of the article. All of the code was using goroutines and channels....


It does mention in an earlier section:

... That means the definition of coroutines should be possible to implement and understand in terms of ordinary Go code. Later, I will argue for an optimized implementation provided directly by the runtime,..


> Not sure how this is any better wrt overheads.

At the end he implements an experimental runtime mechanism that permits a goroutine to explicitly switch execution to another goroutine rather than using the generic channel scheduling plumbing.


It's in the last paragraph of the article... Very easy to miss, given the code uses goroutines and channels.



This is covered in the article.


As a point of comparison, here's my demo from a recent presentation of firing up 1 million (1,000,000) Elixir (BEAM VM) threads, sending them all a "Hello!" message, and then each thread waits a random amount of time between 0 and 2 seconds to send a message back of "Process <their number> received message <themessage>!"

At the same time, I am running the Erlang observer beside it to watch what happens to the CPU and memory consumption and how quickly it recovers/cleans up the garbage.

The biggest bottleneck here is the terminal's ability to keep up, but the observer seems to reflect what's happening accurately.

https://www.youtube.com/watch?v=yxyYKnashR0

The code I used: https://gist.github.com/pmarreck/4cc8f2f55a561ebce2012085a3a...

These features have been built into Erlang (and thus Elixir) since the 1980's. I'm sure many of you have heard of the Actor model and/or Erlang's "legendary" implementation of it, but I don't know how many have actually seen it in action with monitoring kit running.

I think it would be great for Go if it offered language-level support like this, but given the extremely resource-efficient implementation (both in spawning and runtime consumption) of threads on the BEAM VM, coupled with the ease of concurrency which comes directly from only permitting immutable values, I don't think it will ever be matched.


I don’t think coroutines would fit in with Go. There is a huge emphasis on simplicity. Coroutines add a massive amount of complexity. In addition, goroutines provide the best parts of coroutines - cheap, easy to use, non-blocking operations - without a lot of the pain points such as the "coloring" of functions and issues with using things like mutexes.

Just the question of whether one should use a goroutine or a coroutine adds complexity.


There are plenty of complicated things in Go, IMHO where it shines best is judiciously providing incredibly nice interfaces atop the complicated things.


Having coded in Go professionally, I disagree. Go abstractions leak in weird and unexpected ways that are surprisingly different from its C/C++/Java predecessors.

Goroutines were kind of the raison d'être for using Go. But using them wasn't simple, and instead goroutines often brought their own issues. See here:

https://songlh.github.io/paper/go-study.pdf

Often a programming language takes a first guess at the problems they want to solve, and often get them wrong. C++ is probably the most notable language in this category here.

That said, I do appreciate an attempt to improve programming languages even if it undermines the primary feature of the language itself.


There's no colouring here. These are synchronous coroutines.


I'm not 100% sure this is the case, but I believe the context of this goes something like this. As Go has added generics, there are proposals to add generic data structures like a Set. Generics solve almost every problem with that, but there is one conspicuous issue that remains for a data structure: You can iterate over a slice or a map with the "range" keyword, and that yields special iteration behavior, but there is no practical way to do that with a general data structure, if you consider constructing an intermediate map or slice to be an insufficient solution. Go is generally performance-sensitive enough that it is.

The natural solution to this is some sort of iterator, as in Python or other languages. (Contra frequent accusations to the contrary, the Go community is aware of other languages' efforts.)

So this has opened the can of worms of trying to create an iteration standard for Go.

Go has something that has almost all the semantics we want right now. You can also "range" over a channel. This consumes one value at a time from the channel and provides it to the iteration, exactly as you'd expect, and the iteration terminates when the channel is closed. It just has one problem, which is that it involves a full goroutine and a synchronized channel send operation for each loop of the iteration. As I said in another comment, if what is being iterated on is something huge like a full web page fetch, this is actually fine, but no concurrency primitive can keep up with the efficiency of incrementing an integer, a single instruction which may literally take an amortized fraction of a cycle on a modern processor. With generics you can even relatively easily implement filter, map, etc. on this iterator... but adding a goroutine and synchronized commit for each such element of a pipeline is just crazy.

I believe the underlying question in this post is, can we use standard Go mechanisms to implement the coroutines without creating a new language construct, then use the compiler under the hood to convert it to an efficient execution? Basically, can this problem be solved with compiler optimizations rather than a new language construct? From this point of view, the payload of this article is really only that very last paragraph; the entire rest of the article is just orientation. If so, then Go can have coroutine efficiency with the standard language constructs that already exist. Perhaps some code that is already using this goroutine pattern might speed up too, "for free".

The concerns people have about this complexifying Go, the entire point of this operation is to suck the entire problem into the compiler with 0 changes to the spec. Not complexifying Go with a formal iteration standard is the entire point of this operation. If one wishes to complain, the correct complaint is the exact opposite one, that Go is not "simply" "just" implementing iterators as a first class construct just like all the other languages.

Also, in the interests of not posting a full new post, note that in general I shy away from the term "coroutine" because a coroutine is what this article describes, exactly, and nothing less. To those posting "isn't a goroutine already a coroutine?", the answer is, no, and in fact almost nothing called a coroutine by programmers nowadays actually is. The term got simplified down to where it just means thread or generator as Python uses the term, depending on the programming community you're looking at, but in that context we don't need to use the term "coroutine" that way, because we already have the word "thread" or "generator". This is what "real" coroutines are, and while I won't grammatically proscribe to you what you can and can not say, I will reiterate that I personally tend to avoid the term because the conflation between the sloppy programmer use and the more precise academic/compiler use is just confusing in almost all cases.


A goroutine really is implemented as a coroutine though. It's just that without this runtime optimization, there hasn't been a way to interact with them without letting a multithreaded scheduler manage their execution.

I don't remember where I came across this, but many years ago I saw python-style generators termed "semi-coroutines" which I'm a fan of. Python (frustratingly) doesn't implement them this way, but the beauty of generators is that by limiting the scope where execution can be suspended to the highest-level stack frame, you can statically determine how much memory is needed for the (semi-)coroutine, so it doesn't require a single allocation.

Zig takes that a step farther by considering recursive functions second-class, which lets the compiler keep track of the maximum stack depth of any function, and thereby allow you to allocate a stack frame for any non-recursive function inside of another stack frame, enabling zero-allocation full coroutines, as long as the function isn't recursive.

That would... probably be overkill for Go, since marginal allocations are very cheap, and you're already paying the runtime cost for dynamic stack size, so the initial allocation can be tiny.

I would love to see full proper coroutine support make it to Go, freeing users of the overhead of the multithreaded scheduler on the occasions where coroutine patterns work best. I remember back in 2012 or so, looking at examples of Go code that showed off goroutine's utility as control flow primitives even when parallelism wasn't desired and being disappointed that those patterns would likely be rare on account of runtime overhead, and sure enough I hardly ever see them.


What? I'm a Go newb, but isn't this what goroutines and channels get you?


This is an interface that can be implemented in terms of goroutines and channels, but can also be implemented with a lower-overhead scheduler tweak. The article shows how it could be implemented using goroutines and channels, and then reports the result of that implementation versus an optimized version that avoids synchronization overhead and scheduler latency which is unnecessary with this pattern.

Currently, you could use goroutines and channels to implement a nice way to provide general iteration support for user-defined data structures, but because of the overhead, people most often opt for clunkier solutions. This change would give us the best of both worlds.


You can build this yourself using goroutines and channels, but adding "native" first-class support for this generator pattern would be easier to use and come with less overhead.


Aside:

Lua is an absolute work of art. Everything about the tiny language, how it works, and even all the little peculiarities, just makes sense.


One core Lua thing that I think is an ugly mistake: trying to represent maps (dictionaries) and arrays using a single logical data type.

Most languages use different data types but with some API overlap, e.g. maps and arrays are both "iterable". Lua goes too far, I think, and tries to make them the exact same, a data type they call "table".

One side-effect is that you have some operations that only really make sense for maps or lists, but since they work on all tables, they're defined awkwardly, e.g:

> The length of a table t is defined to be any integer index n such that t[n] is not nil and t[n+1] is nil; moreover, if t[1] is nil, n can be zero. For a regular array, with non-nil values from 1 to a given n, its length is exactly that n, the index of its last value. If the array has "holes" (that is, nil values between other non-nil values), then #t can be any of the indices that directly precedes a nil value (that is, it may consider any such nil value as the end of the array).


Hmm, why are Lua arrays 1-indexed?


Because arrays start at 1, offsets start at 0


Arrays are just a sequence of objects.

Indexes (traditionally) start at 1, offsets (measures from base) obviously start from zero.


good enough for Rome


[flagged]


As it stands, this comment contains no information. Can you explain what about Lua is problematic?


Wondering whether coroutines may be a step towards async event-based style APIs without allocating read buffers for the entire connection. I.e. a solution to problems discussed in https://github.com/golang/go/issues/15735. Goroutines provide a great way to have non-blocking IO with synchronous code – but when it comes to effective memory management with many connections Go community tend to invent raw epoll implementations: https://www.freecodecamp.org/news/million-websockets-and-go-.... So my question here – can coroutines somehow bring new possibilities in terms of working with network connections?


Somewhat on topic given that OP brought up coroutines in Python: what resources have folks used to understand Python's asyncio story in depth? I'm just now finally understanding how to use stuff, but it was through a combination of the official documentation, the books "Using Asyncio in Python" and "Expert Python Programming", none of which were particularly good. Normally I'd rely just on the official docs, but the docs have created much confusion, it seems, because there's a lot in them that are useful more so for library/framework developers than for users. So, I'm just wondering if anyone has great resources for really gaining a strong understanding of Python's asyncio or how else you might have gone about gaining proficiency to the point where you felt comfortable using asyncio in real projects.


I read the same books you did, and I was equally unsatisfied afterwards. The "Using Asyncio in Python 3" book was good enough to help me write some code that had to hit an API 400k times without blocking, but I never returned to asyncio after that.

Afterwards, I realized there was a package called aiohttp that I could've used, but too late.

I'll be interested to see what other HN people have done.


This blog helped me a lot with the motivation and underlying mechanism of Python's asyncio: https://tenthousandmeters.com/blog/python-behind-the-scenes-...


Thanks a lot, I'll check that out.


The most valuable quality of a programming language committee is resisting the temptation to add new features unless their absence is driving existing users away.


This is a thoroughly interesting topic. Thanks for the article.

I hadn't thought much about iterators' link to coroutines.

As a hobby, I am working to write about a dream programming language. I happen to be really interested in parallelism, asynchrony, coroutines, multithreading and concurrency.

I want:

* seamlessly switch between remote-thread and local-thread coroutines

* concurrency and parallelism and async to be easy to think about, reason about, read and program

* programs should be easy to parallelise and be async and concurrent

Go iterators seem to be local to a thread, but what if you want to distribute work across threads?

I've been thinking of scheduling recently.

Imagine you're a search engine company and you want to index links between URLs. How would you solve this with coroutines?

  task download-url
   for url in urls:
    download(url)

  task extract-links
   parsed = parse(document)
   return parsed

  task fetch-links
   for link in document.query("a")
    return link

  task save-data
   db.save(url, link)

How would you do control flow and scheduling and parallelism and async efficiently with this code?

* `db.save()`, `download()` are IO intensive whereas `document.query("a")` and `parse` is CPU intensive.

* I want to handle plurality or multiple items trivially such as multiple URLs and multiple links.

* I want to keep IO and CPU in flight at all times.

I think I want this schedule:

https://user-images.githubusercontent.com/1983701/254083968-...

I have a toy 1:M:N lightweight scheduler (1 scheduler thread : M kernel threads : N lightweight threads) in C, Rust and Java

https://github.com/samsquire/preemptible-thread

This lets me switch between tasks and preempt them from user space without assistance at descheduling time.

I have a simplistic async/await state machine thread pool in Java. My scheduling algorithm is very simple.

I want things like backpressure, circuit breakers, rate limiting, load shedding, rate adjustment, queuing.


i've been thinking about a closely related feature in a different context: adding block arguments, as in smalltalk or ruby or especially lobster, to a language more like c, with static types and stack allocation

i think this would be favorable for (among other things) clu-like iterators and imgui libraries, where you often want to do something like

    submenu("&Edit") {
        command("&Cut") { clip_cut(getSelection()); }
        ...
    }
this is especially useful in a context where you're heap-allocating sparingly or not at all, because the subroutine taking the block argument can stack-allocate some resource, pass it to the block, and deallocate it once the block returns; python context managers and win32 paint messages are two cases where people commonly do this sort of thing, but things like save-excursion, with-output-file, transactional memory, and gsave/grestore also provide motivation

the conventional way to do this is to package up the block into a closure, then use a full-fledged function invocation to invoke it, using a calling convention that supports closures. but i suspect a more relaxed and efficient approach is to use an asymmetric coroutine calling convention, in which the callee yields control back to its caller at the entry point to the block, and the block then resumes the callee when it finishes. so instead of merely dividing registers into callee-saved and call-clobbered, as subroutine calling conventions do, we would divide them three ways: registers that are callee-saved upon return but that upon yield contain callee values the block must restore before resuming; caller coroutine context registers, which are preserved across both return and yield; and call-clobbered registers. you also need in many cases a way for the block to safely force an early exit from the callee

this allows the caller's local variables to be in registers its blocks can use without further ado, or at least indexed off of such a register, while allowing the yield and resume operations to be, in many cases, just a single machine instruction. and it does not require heap allocation

as an example of taking this to the point of absurdity, here's an untested subroutine for iterating over a nul-terminated string passed in r0 with a block passed in r1, using a hypothetical coroutine convention which passes at least r4 through from its caller to its blocks

    itersz: push {r6, r7, r8, lr}
            mov  r7, r0
            mov  r6, r1
    1:      ldrb r0, [r7], #1
            cbz  r0, 1f
            blx  r1
            b    1b
    1:      pop  {r6, r7, r8, pc}
and here is another untested subroutine which uses it to calculate a string hash

    hashsz: push {r4, r5, r9, lr}
            movs r4, #53
            adr  r1, 1f
            blx  itersz
            mov  r0, r4
            pop  {r4, r5, r9, pc}
    1:      eor  r4, r0, r4, ror #27
            bx   lr
even in this case where both the iteration and the visitor block are utterly trivial, the runtime overhead per item (compared to putting them in the same subroutine) is evidently extremely modest; my estimate is 7 cycles per byte rather than 4 cycles per byte on in-order hardware with simple branch prediction, so, on the order of 1 ns on the hardware russ used as his reference. for anything more complex the overhead should be insignificant

it's less general than the mechanism russ proposes here (it doesn't solve the celebrated samefringe problem), but it's also an order of magnitude more efficient, because the yield and resume operations are less work than a subroutine call, though still more work than, say, decrementing a register and jumping if nonzero


corrected, tested, and commented version of the above example code

            @ hashsz s = {
            @   h := 53u
            @   itersz s, b => {
            @     h = (h >> 27 | h << 5) ^ b
            @   }
            @   return h
            @ }

            .syntax unified
            .thumb
    itersz: push {r6, r7, r8, lr}   @ r6 and r7 are preserved by blocks
            mov  r7, r0             @ so put the string pointer in r7
            mov  r6, r1             @ and the block code pointer in r6
    1:      ldrb r0, [r7], #1       @ loop over string bytes, post-indexing
            cbz  r0, 1f             @ bail out on NUL, but otherwise
            blx  r1                 @ invoke block argument with byte, then
            b    1b                 @ repeat loop
    1:      pop  {r6, r7, r8, pc}   @ restore registers and return

    hashsz: push {r4, lr}           @ r4 is callee-preserved and passed to blocks
            movs r4, #53            @ initial value of hash
            adr  r1, 1f+1  @ load Thumbified pointer for block using gas local label
            bl   itersz             @ invoke iterator subroutine
            mov  r0, r4             @ load completed hash into return value
            pop  {r4, pc}           @ restore and return
    1:      eor  r4, r0, r4, ror #27  @ block argument rotates & xors the byte in r0
            bx   lr                 @ then resumes the iterator


erratum: the above should read

            blx  r6                 @ invoke block argument with byte, then
those responsible for this error have been sacked

years ago, in fact


The examples given prompt me to say: if all you have is a Rube Goldberg hammer, everything looks like an Escheresque nail.

Sieving primes by turning functions into coroutines, parsing text by yielding characters, all with unnatural functions and state management... that's an improvement over what?


I'm not sure why the author is advocating for single-threaded patterns in a multithreaded environment, or why he's trying to limit himself like this. The magic of goroutines is that you can easily use all of your cores, not just one. Python and Lua have no choice.


Coroutines are a control-flow mechanism. They're a single-threaded pattern in as much as for loops are a single-threaded pattern. Ability to write multithreaded programs does not exclude the need for good single-threaded tools.


Looking at Python's asyncio coroutine library, it is just mocking multithreading with asyncio.gather. Since coroutines can be executed in any order, they are not really control-flow mechanisms. The selling point of coroutines over traditional threads is their light weight, but that's moot since goroutines have similar memory costs to coroutines. The only real benefit is that coroutines are non-blocking while goroutines may be blocked. There is no real benefit to having Python-style coroutines in Go since goroutines do the same but better.


The concept of "coroutines" is as much about control flow as "subroutines" (function calls and return statements).

But when you have a construct that has their own call stacks, it's a relatively small step to implement lightweight threads with it.

Since doing concurrency happens much more often than any other smart use of coroutines, many people conflate the two.

I sometimes see this confusion in discussions about Kotlin's coroutines. An example of using coroutines that is not about concurrency: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequence...


Kotlin's sequence pre-dates co-routines. Mostly what either of those do is just a bit of syntactic sugar on top of call back mechanisms. One of the nice things with Kotlin is the ability to extend existing APIs via extension functions. Which is something the co-routines library uses extensively to be able to provide co-routine implementations on top of existing frameworks on the JVM, in javascript, and in native environments. Same APIs. Same code. Different underlying implementations. I actively use this in some of my multi platform libraries that I use on JVM and in the browser.

The key concept with Kotlin's co-routines is not having to choose between reactive, green threads, or real threads but treating all of those in the same way with a robust set of abstractions. I use co-routines in the browser on top of existing javascript frameworks that return promises. I use them on the JVM with reactive frameworks like flux. And I also use them with some thread pools. Once Loom reaches LTS (next year, I think), I'll probably be configuring some Loom capable co-routine dispatchers as well.

The debate regarding co-routines vs. go-routines seems like it is similar. I think Go can learn a thing or two from Kotlin's co-routines. After all it is mostly built as a library on top of a single language feature: the suspend keyword. I get that not everybody likes colored functions. But then having a lot of leaky abstractions and related complexity on top of go routines is maybe also not ideal.


> Kotlin's sequence pre-dates co-routines.

You misunderstood. The `Sequence` type does predate coroutines. But I meant the `sequence` builder function, which takes a block of suspending code to create a `Sequence`.

The in-order traversal in the article can be translated to Kotlin:

    fun walk(t: Tree?): Sequence<Int> = sequence {
        if (t != null) {
            yieldAll(walk(t.left))
            yield(t.value)
            yieldAll(walk(t.right))
        }
    }
As I have noted above, this is a use of coroutines that has nothing to do with concurrency.


Sequences actually were part of Kotlin 1.0. Co-routines were added later.


> Sequences actually were part of Kotlin 1.0.

But the `sequence` builder function was not. In fact it depends on coroutines.

Could you read my reply before repeating?


In fact Python had general coroutines first, via yield, which is mostly used for iteration ("generators") and state machines.

Some frameworks (e.g. twisted) did use them for concurrency, and the core team originally planned something similar, however the ergonomics were not what they wanted (especially when mixing coroutines-for-concurrency and coroutines-for-iteration), so they went for a more specialised design.


> did use them for concurrency

> went for a more specialised design

My impression is that JS evolved similarly - (ab)using generator for concurrency, then specialized async-await as a language feature.


Kinda? But that was a very short cycle: both promises and generators were added to the language in ES6 (though the community had been coalescing around promises, "thenables", for a while), the first draft for async functions was actually created during the development cycle of ES6, and it shipped in ES2017.


I don't know about Python, but this description of coroutines, the general concept, doesn't seem accurate to me. They most definitely cannot be executed in any order. They also have nothing to do with multithreading whatsoever. Lua has the most honest-to-god implementation of coroutines I know of, so I'd suggest looking at that. I've seen the word "coroutines" used in some weird way suggesting multithreading, which probably means Python used them to fake multithreading, but the idea originally has nothing to do with it. The idea actually started out as a way to better structure an assembly program in 1958: <http://melconway.com/Home/pdf/compiler.pdf>.


Goroutines are non-blocking. Channels may not always be, however.


Reasoning about and following the control flow of the proposed code hurts me inside. If Go adds function coloring (via, e.g., Python's async and/or yield concepts), I'm out, because I don't want to use this, much less encounter it in the form of a bug in some library.

Java and C++ are largely inferior for my typical purposes, but at the end of the day they work fine and are stable in terms of direction, and don't tend to repeatedly bloat the language over pedantry. If you want top-notch performance, there's already C, C++, and Rust.

I am not a fan of the function coloring shit in Python and Javascript.

I don't want the kitchen sink!


Yield has nothing to do with colouring.



