Not sure if this really is required. Most cases in Go are served well by GoRouti...

masklinn · on July 17, 2023

Goroutines + channels add an enormous amount of overhead. Using them as iterators is basically insane.

rcme · on July 17, 2023

Why? Channels are already iterable using the range keyword.

    ch := make(chan int)
    go func() {
        for i := 0; i < 100; i += 1 {
            ch <- i
        }
        close(ch)
    }()

    for i := range ch {
        fmt.Println(i)
    }

That is very simple.

jerf · on July 17, 2023

"Enormous amount of overhead" is the operative phrase. In general, you want your concurrency operations to be significantly smaller than the payload of the operation. In the case of using channels as iterators with goroutines behind them, it works fine for something like a web page scraper, where the act of fetching a web page is enormously larger than a goroutine switch, but as a generalized iteration mechanism it's unusably expensive because a lot of iteration payloads are very small compared to a channel send.

I've encountered a lot of people who read that on /r/golang and then ask "well why are send operations so expensive", and it's not that. It's that a lot of iteration operations are on the order of single-digit cycles and often very easy to pipeline or predict. No concurrency primitive can keep up with that. A given send operation is generally fairly cheap but there are enough other things that are still an order or two of magnitude cheaper than even the cheapest send operation that if you block all those super cheap operations on a send you're looking at multiple factor of magnitude slowdowns. Such as you would experience with your code.

rcme · on July 17, 2023

But if your iteration is so fast then you don’t need a channel at all. Just use a plain for loop.

zbentley · on July 17, 2023

Yes, that's exactly what the article is about: how we can model "plain for-loop" levels of performance in the presence of complex (intricate state at multiple levels and high potential for nesting iterators) code that is supplying the loop(s)

jerf · on July 18, 2023

Then you lose the separation of iteration from the looping construct.

You can use a function or a method, but you lose the nice continuation/generator ability to write a function that can do complicated yields without having to write the state machine yourself, plus you run a risk that the call won't inline in which case you're incurring non-trivial function call overhead.

The problem with iteration in Go isn't that you can't solve any given individual problem, the problem is that you can't solve all of them simultaneously the way you can in Rust or Python. (Though one of these days I want to get around to benchmarking Python's iteration versus Go channel-based iteration, I'm not actually sure which would win. What Go considers dangerously slow can still be baseline performance for other languages.) So you can get a defeat-in-detail sort of thing where a person cites a problem, and someone posts as solution to that, and then they cite another problem, and there's a solution for that, and then there's another problem, and a solution is posted for that, and all the solutions do indeed more-or-less solve the given problem... but you can't combine them into one.

wbl · on July 18, 2023

How do you solve the tree fringe problem with a for loop?

Jtsummers · on July 18, 2023

Since all recursive programs can be converted into iterative programs then you can "simply" (not always simple) convert recursive solutions like McCarthy's Lisp solution to a loop: https://dl.acm.org/action/showFmPdf?doi=10.1145%2F1045283 (page 5)

  (DE SAMEFRINGE (X Y)
         (OR (EQ X Y)
             (AND (NOT (ATOM X))
                  (NOT (ATOM Y))
                  (SAME (GOPHER X) (GOPHER Y)))))

  (DE SAME (X Y)
         (AND (EQ (CAR X) (CAR Y))
              (SAMEFRINGE (CDR X) (CDR Y))))

  (DE GOPHER (U)
         (COND ((ATOM (CAR U)) U)
               (T (GOPHER (CONS (CAAR U)
                                (CONS (CDAR U) (CDR U)))))))

Coroutines are not necessary for solving that problem (though they do offer a neat solution to it).

wbl · on July 20, 2023

The results of that conversion are not fun

t8sr · on July 17, 2023

Aside from the massive performance penalty, cache thrashing and context switching, this code will also leak a goroutine (and so, memory) if you don't finish receiving from `ch`. It's more brittle, longer to write, less local and in every other way worse than a for loop. Why would you ever do it?

freecodyx · on July 17, 2023

Race conditions. With coroutines you’re not supposed to deal with race. If i am understanding the motives

rcme · on July 17, 2023

You can write to a channel concurrently.

ackfoobar · on July 17, 2023

And to make the concurrency safe you have to pay the price of synchronization.

https://go.dev/tour/concurrency/8

In A Tour of Go, concurrency section, "Equivalent Binary Trees" is an example of paying the price when you don't need it.

ikiris · on July 17, 2023

its not that you cant, it's that its very expensive.

klabb3 · on July 17, 2023

It’s not insane at all. How did you come to that conclusion?

* Mutex lock+unlock: 10ns

* Chan send buffered: 21ns

* Try send (select with default): 3.5ns

Missing from here is context switches.

In either case, the overhead is proportional to how fast each iteration is. I have channels of byte slices of 64k and the channel ops don’t even make a dent compared to other ops, like IO.

You should absolutely use channels if it’s the right tool for the job.

Fwiw, I wouldn’t use channels for “generators” like in the article. I believe they are trying to proof-of-concept a language feature they want. I have no particular opinion about that.

GrumpySloth · on July 17, 2023

> Missing from here is context switches.

Exactly. From rsc's previous post[1]:

> On my laptop, a C thread switch takes a few microseconds. A channel operation and goroutine switch is an order of magnitude cheaper: a couple hundred nanoseconds. An optimized coroutine system can reduce the cost to tens of nanoseconds or less.

[1]: <https://research.swtch.com/pcdata>

klabb3 · on July 17, 2023

Yeah I 100% understand wanting to optimize this for something like generators if we imagine them as first-class constructs. But they’re not at all a replacement for channels – they would be an addition, or specialization. I’ve never seen real world Go code that has needed it but maybe this will change with generics. It’s worth keeping an eye on, at least.

Channels otoh are very versatile: everything from spsc to mpmc with buffering and starvation protections, fast cancelation and notifications, etc etc. They’re not perfect, but it’s a helluva bang-for-the-buck for a single primitive. Literally all you have to do for performance is add buffering and coalesce “units of work”, and you’re good to go.

vaastav · on July 17, 2023

Sure but that's what the implementation of the coro in this post uses under the hood. Not sure how this is any better wrt overheads.

geodel · on July 17, 2023

Where did you get this '..it uses same under the hood'? The article clearly says:

..Next I added a direct coroutine switch to the runtime, avoiding channels entirely. That cuts the coroutine switch to three atomic compare-and-swaps (one in the coroutine data structure, one for the scheduler status of the blocking coroutine, and one for the scheduler status of the resuming coroutine), which I believe is optimal given the safety invariants that must be maintained. That implementation takes 20ns per switch, or 40ns per pulled value. This is about 10X faster than the original channel implementation.

vaastav · on July 17, 2023

The runtime switch was buried in the last paragraph of the article. All of the code was using goroutines and channels....

geodel · on July 17, 2023

It does mention in an earlier section:

... That means the definition of coroutines should be possible to implement and understand in terms of ordinary Go code. Later, I will argue for an optimized implementation provided directly by the runtime,..

wahern · on July 17, 2023

> Not sure how this is any better wrt overheads.

At the end he implements an experimental runtime mechanism that permits a goroutine to explicitly switch execution to another goroutine rather than using the generic channel scheduling plumbing.

vaastav · on July 17, 2023

Its in the last paragraph of the article...Very easy to miss given the code uses goroutines and channels.

ikiris · on July 17, 2023

see also https://github.com/golang/go/discussions/54245

hgsgm · on July 18, 2023

This is covered in the article.