As the language is specified, it's not wrong to just ignore that add completely -- the variable isn't used inside the goroutine, and you're not guaranteed a goroutine will run at any particular time.
So, code executed right after that routine is spawned will be hitting the 'old' value anyway; hence Dvyukov's comments that this is arguably to spec, or within spec.
Think about it: if you actually wrote this code, you'd see some weird behavior if the compiler did what you 'expect' - a couple of outputs of low integers, then some high ones as the goroutine gets scheduled in. So it's almost certainly just 'wrong' to write something like this, under most (all?) circumstances.
Because nothing like a sync.WaitGroup's Wait() is used, there's no guarantee the goroutine will execute at all before the program halts.
That said, I bet there's some related more reasonable scenario where you could argue the compiler should at least do the adding. I'm not sure why atomic.Add* should get special treatment though, and it looks like the fix is to remove some escape-analysis type code, so I'm curious how this will get fixed.
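For context, the code at issue is roughly of this shape (my reconstruction from the discussion, not the exact snippet from the bug report):

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

var a uint64

func main() {
    go func() {
        for {
            atomic.AddUint64(&a, uint64(1)) // go1.8 optimizes this add away entirely
        }
    }()
    for {
        fmt.Println(atomic.LoadUint64(&a)) // so this prints 0 forever
        time.Sleep(time.Second)
    }
}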
Yeah, this particular instance seems ok to me. This one makes the example feel weirder:
package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
    "time"
)

var a uint64

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU())
    fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))
    started := make(chan bool)
    go func() {
        started <- true
        for {
            atomic.AddUint64(&a, uint64(1))
        }
    }()
    <-started
    for {
        fmt.Println(atomic.LoadUint64(&a))
        time.Sleep(time.Second)
    }
}
Here we explicitly wait until the goroutine is started, so we know it's been scheduled by the time our other loop runs. On my computer with go1.8 linux/amd64 it still optimizes out the for loop, which makes sense: nothing changed that would convince the optimizer the loop should remain, given the compiler's current logic.
If you add time.Sleep(time.Millisecond) to the goroutine's loop, or any other synchronization, it works fine. I'm having trouble thinking of a real world example where you'd want an atomic operation going ham in a loop without any sort of time or synchronization. At the very least, a channel indicating when the loop is done would cause the loop body to be compiled.
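That is, dropping a sleep into the goroutine from the example above keeps the add in the generated code:

go func() {
    started <- true
    for {
        atomic.AddUint64(&a, uint64(1))
        time.Sleep(time.Millisecond) // any scheduling point defeats the over-eager optimization
    }
}()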
Just because 'started' ensures the goroutine has been scheduled once, that makes no guarantee it will ever be scheduled again. It really does nothing to extend the original example code.
> I'm having trouble thinking of a real world example where you'd want an atomic operation going ham in a loop without any sort of time or synchronization.
By using another atomic op as synchronization? After all that's their stated purpose.
But I don't think atomic ops in Go are considered synchronization in the sense that they force two goroutines to synchronize at a particular point the way a chan does. I.e. a chan send in one goroutine must be matched with a chan receive in another (unless the chan is buffered). If you have an atomic operation between two synchronization points, I'd expect the only guarantee is that it occurs between the two points, and that when it does, it happens atomically.
>>Strictly saying this is conforming behavior as we don't give any guarantees about scheduler (just like any other language, e.g. C++). This can be explained as "the goroutine is just not scheduled".
Not a Go developer. On a multi-processor machine, how is this conforming behavior? Is "scheduler cannot give any guarantees" acceptable?
'Is "scheduler cannot give any guarantees" acceptable?'
Most schedulers give far fewer guarantees than you might think. For a guarantee to be a guarantee, it must hold no matter what you do within the boundaries of the language. Consider a goroutine fork bomb:
func fork_bomb() {
    for {
        go fork_bomb()
    }
}
Go doesn't, to the best of my knowledge, guarantee that any other goroutine will get any execution time, or guarantee much of anything will happen. Your OS is likely to have similarly weak guarantees for the equivalent process bomb, unless you do something to turn on more guarantees/protection.
You have to go into some relatively special-purpose stuff before you can get schedulers that will guarantee that some process will get scheduled for at least 10ms out of every 100ms or something. And then, once you get that guarantee, you'll pay some other way.
Given that most of our machines are incredibly powerful, and that a lot of them still get upgraded on a fairly routine schedule in many dimensions even if single-core clock speed has stalled out, most of us prefer to work with things that just promise to do their best as long as you don't overload them; the other prices you have to pay to get hard guarantees turn out not to be worth it on our monster machines. Of course, one should always keep an eye out for when that stops being the case, but in general we're headed away from rigorous guarantees and toward shared resources that are cheaper and more scalable, making up the difference in volume rather than trying to get better guarantees.
An unbuffered channel read is always matched with a corresponding write. Let's call the point at which `point1 <- true` and `<-point1` occur T1, and the point at which `point2 <- true` and `<-point2` occur T2. fmt.Println("hello") and time.Sleep(3*time.Second) are then both guaranteed to occur between T1 and T2. If we didn't have T2, there'd be no guarantee fmt.Println("hello") would run before the program exits.
Maybe I'm wrong, but this is my understanding of Hoare's and Go's concurrency model.
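Spelled out, the scenario looks something like this (my sketch of the description above, imports elided):

func main() {
    point1 := make(chan bool)
    point2 := make(chan bool)
    go func() {
        <-point1                    // T1: both goroutines rendezvous here
        fmt.Println("hello")        // guaranteed to happen between T1 and T2
        time.Sleep(3 * time.Second) // likewise
        <-point2                    // T2
    }()
    point1 <- true // T1
    point2 <- true // T2: main can't get past this until the goroutine arrives
}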
How about a long-running calculation that uses atomic variables to report progress or poll for cancellation? Might it move all of the atomic ops outside of the loop?
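Something shaped like this hypothetical worker, say (process() is made up and stands in for the real work); if the compiler hoisted or sank the atomic ops out of the loop, both progress reporting and cancellation would silently break:

var progress uint64
var cancelled uint32

func longCalculation(items []int) {
    for i := range items {
        if atomic.LoadUint32(&cancelled) != 0 {
            return // another goroutine set this flag to request cancellation
        }
        process(items[i])              // the actual work
        atomic.AddUint64(&progress, 1) // a monitoring goroutine reads this
    }
}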
This is unfair, I think. Go offers strong guarantees that C does not. It has fewer sharp edges than C; it also asserts a principle of not polishing away sharp edges with concurrency.
You don't have to do everything better than C to be a huge improvement over C.
This issue is not related to ordering, but to visibility (i.e., when a memory side effect becomes globally visible).
Many concurrent memory models (C++ at least, Java very likely) allow coalescing stores to atomic variables as long as there is no other interleaving side effect; this is normally a desirable optimization.
As described by dvyukov, the Go compiler seems to be aggressively coalescing all stores till the end of the loop (which never happens).
Store coalescing is desirable, but it is hard to specify completely; for example, the C++ standard has non-normative wording stating that stores should be visible in a reasonable amount of time, exactly to prevent this sort of issue.
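Schematically, the legitimate version of the optimization looks like this (a made-up illustration, not actual Go compiler output):

// What the programmer wrote:
for i := 0; i < n; i++ {
    atomic.AddUint64(&counter, 1)
}

// What a coalescing compiler may effectively emit, since no other side
// effect interleaves with the adds:
atomic.AddUint64(&counter, uint64(n))

// The trouble here: with "for { ... }" the loop never exits, so the
// coalesced store is deferred forever and the counter never appears to change.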
Java only requires that the writes can't be seen out of order - if a thread writes A then B, another thread can't read the new value of B and then the old value of A. If coalescing them makes them all happen atomically, that's fine.
In Go, if runtime.GOMAXPROCS() returns 1 or is set to 1 (meaning Go only has access to a single system thread), the 'for' loop monopolizes the scheduler and the goroutine is rarely or never scheduled.
That fact doesn't make the OP's statement true, or have anything to do with the optimization bug, but it is worth pointing out that there are situations where Go correctly cannot guarantee that goroutine is executed.
I don't understand. By your argument, and some of the replies, because "you're not guaranteed a goroutine will run at any particular time", the compiler is free to elide any suffix of a goroutine's code, even if that code has visible side effects.
I think there is something more specific to atomic going on here, where the same kinds of optimizations are already forbidden for other kinds of visible side effects like printing.
So you are telling me a valid optimization for Go is to elide the entire contents of all goroutines?
And by that argument, since main() itself is an implicitly-called goroutine (so far as I know), then the compiler can elide the entire contents of main() too, and then the rest of the contents of the program.
This is a valid Go program (eliding imports etc.), and it'll almost never print anything.
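Something like this, say (my guess at the sort of snippet meant):

func main() {
    go fmt.Println("hello") // main almost always exits before this goroutine runs
}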
This is actually a classic mistake when starting with goroutines - not adding synchronization to make sure your main goroutine doesn't exit before any spawned goroutines finish.
Yes, but then the program would never terminate, because that's what happens at the end of main. If main returns, it's saying that the rest of the code in main has executed.
It is strange: the atomic package[0] exposes various low-level atomic functions, but none of them seem to take ordering arguments, and the docs don't mention a default order either.
Atomicity and ordering constraints are orthogonal, so you can have an atomic add with relaxed ordering, which allows the compiler to do lots of optimizations on those adds, presumably to the point where they are optimized out completely.
If this is the cause, it seems like pretty bad API design not to expose the flags on low-level library functions. Even high-level languages like Java expose them[2].
There is no official ordering provided by the atomic package. An atomic package isn't very useful without ordering semantics. And ordering actually _is_ provided, it's just not part of the documentation.
The background is that the Go team were initially reluctant to even make the atomic package public. It's mostly there to support the runtime and the sync package which needs fiddly low level concurrency things. But, it exists and it is public. Some people on the Go team regret this (I believe).
Giving clear and precise semantics to the atomic package was delayed because it's a lot of difficult work and you only get to do it once. There was discussion around trying to find a clear and simple semantics.
In the meantime the atomic package actually had ordering, because you would have a hard time writing the runtime or the sync package without this. Also, the race detector assumes some kind of ordering for operations in the atomic package. It was confirmed by Dmitry Vyukov that the race detector assumed C/C++11 acquire/release semantics for atomic operations.
After some discussion it was decided that C/C++11 semantics should be adopted. But the documentation has not been updated to reflect this, and updating it doesn't appear to be planned right now. Which is a shame.
Above I have sort of put words into the mouths of the Go team (or some members of it). It's not my intention to bad-mouth anyone, and I think everyone was making decisions for good reasons. Whether I agree or not :)
With respect to adding ordering arguments, Go most likely won't get those for a very long time.
As many of the comments point out, it looks like this isn't a compiler bug. The example code makes assumptions about the goroutine execution order without using the facilities (e.g. channels) that guarantee the right ordering. The use of atomic doesn't seem to be relevant.
The example code is what I call "works by chance", which I'd contrast with code that "works by design". I think some people are trying to argue that a language should prevent developers from writing code that "works by chance", but I don't think that's actually possible. It's the equivalent of saying you want to be able to type anything you want into your editor and have it compile into doing what you think it should do... "Works by design" code here would add channel synchronization points between the two goroutines such that the desired execution order is guaranteed.
It's also IMO not reasonable to expect a new compiler to produce identical results to an old compiler. E.g. if the compiler does a better job your code may run faster (or slower) which may break something if you've assumed that something takes so long to run (e.g. some hard coded timeout in some other component).
At least the 1.8 compiler behaviour is deterministic. So code written assuming some particular value of a would either work or fail consistently. Previously you could construct some code that fails randomly.
Can somebody comment on what's happening there? I don't know Go and I can't put the pieces together right now. Also, aren't optimising transformations checked (verified) for correctness? If not, how can the authors be confident in the optimisations? Simple tests? Or is it a bug?
Variable writes that look like they have no side effects may actually be intended for another thread to read (though hopefully most reasonable programs don't do this without some good explanation). However, many concurrency models say that this method of inter-thread communication is okay to optimize out, because most threading libraries (including Go's) make no guarantees about the order in which threads are scheduled. Therefore, removing the "writing" thread entirely is equivalent to it never being scheduled.
However, this is not the case for atomic operations (which are generally specifically intended for communication via shared variables). This optimization of removing the writing thread entirely isn't allowed if atomic operations are used, because those actually do come with a guarantee that other threads will get to see the update in finite time.
Go was incorrectly "optimizing" the atomic operations out entirely. It's a fairly subtle bug, actually, but still likely a big deal for the few programs communicating via shared variables.
> However, this is not the case for atomic operations (which are generally specifically intended for communication via shared variables). This optimization of removing the writing thread entirely isn't allowed if atomic operations are used, because those actually do come with a guarantee that other threads will get to see the update in finite time.
That sounds more like volatile than atomic. At its core, atomic just means that observer may see either pre-update or post-update state but won't see intermediate states.
I agree that this sounds more like "volatile", but it still makes sense that programmers should be able to rely on atomic operations for inter-thread communication.
According to the comments on the bug report, this behavior is inspired by C++'s <atomic> library, which guarantees cross-thread visibility.
> I agree that this sounds more like "volatile", but it still makes sense that programmers should be able to rely on atomic operations for inter-thread communication.
Only in the sense that they won't see inconsistent (intermediate) states.
Atomicity shouldn't guarantee order per se - if you want both, you need a mutex.
Ordered atomic operations like C++ std::atomic or Java/C# volatile do guarantee some ordering. For example, if one thread modifies an object and then stores a pointer to it in an atomic variable, under default visibility rules another thread reading that pointer from the atomic variable is guaranteed to see the modifications to the object itself, i.e. the update, the store and the remote load are ordered.
It is possible to have a model where all atomic operations are relaxed (i.e. unordered) and use explicit barriers for ordering (which is closer to how some, but not all, CPUs work), but it is significantly harder to reason about and prove algorithms correct under it.
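In Go terms the publication pattern looks roughly like this, assuming the (undocumented, per the discussion elsewhere in this thread) acquire/release behavior of the atomic package; Config and use() are made up for the example:

var box atomic.Value // will hold a *Config

// Writer goroutine: fully initialize the object, then publish it.
cfg := &Config{Addr: "localhost", Port: 8080}
box.Store(cfg) // release: the writes above are published along with the pointer

// Reader goroutine: an acquiring load; if it observes the pointer,
// it also observes the fully-initialized fields.
if v := box.Load(); v != nil {
    use(v.(*Config))
}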
No. Read the spec. There is no deadline for atomics to be visible. What is true is that some operations have happens-before ordering, and if a store happens-before a load, the load will see the result of the store.
Did you reply to the wrong post? My post was only about ordering.
Still, the C++ spec has wording about making stores visible in a reasonable time. The wording is non-normative, as they ran out of time (ah!) while trying to come up with a more formal (and implementable) specification.
edit: oh, I see I wrote 'default visibility' instead of 'default ordering'.
One point of view is that the removal of the atomic increment is justified because the programmer cannot rely on the execution of the goroutine that contains the increment. If you add an explicit call to the scheduler, the problem is fixed. However, the justification is a bit of a stretch, and it looks like the Go developers will change it back to the old behavior.
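By "an explicit call to the scheduler" I'd assume something like runtime.Gosched() in the loop:

go func() {
    for {
        atomic.AddUint64(&a, uint64(1))
        runtime.Gosched() // yield; any scheduling point keeps the add from being elided
    }
}()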
I think how to verify real compiler optimisations is still an open research question. Does any other compiler do it formally? I'm not sure it's reasonable to expect Go to do it.
In the general case, it's definitely not solved. Java and C/C++ (gcc) are known to have had compiler optimizations introduce bugs. (I assume the same is true about clang and other compilers; I just happen to remember about these ones).
For a recent Java one with Hotspot JIT, there is this FOSDEM 2017 talk.
"....Escape Analysis and Intrinsics, two commonly used HotSpot optimization techniques. I'll show how a combination of these two features can optimize away IndexOutOfBoundsExceptions in some corner cases where they are required by the standard..."
It seems that they unified the assembler used in the Go compiler and made it use intrinsics for the atomic operations on amd64.
In doing so, they must have botched something, because now some atomic operations (i.e. operations that shouldn't be interruptible by another thread) are completely optimized away.
This is quite a big issue because any multi-threaded program will use these (they are the only way to do lock-free code).
This isn't correct, if it was this would be a much bigger issue. I'd be quite surprised if this affects any code in the wild.
This bug is related to the compiler deciding to optimize the loop. All modifications of variables in the loop are being batched until the loop exits, but it never exits, therefore none of the modifications ever happen. In most cases this would be fine, as the side effects of modifying the variables in the loop wouldn't need to be visible anywhere else until the loop exited.
If you remove the loop and just modify the atomic variable once, the modification eventually happens and is seen by the second goroutine.
What's funny to me is: if you don't do the add in a loop, go1.8 generates the add code:
go func() {
    atomic.AddUint64(&a, uint64(1)) // no loop
}()
To me, it's the unpredictability of how compilers will choose to apply these extreme optimizations that is most frustrating. Reasoning from the spec ("the goroutine might not be scheduled anyway") isn't very satisfying when the compiler clearly doesn't follow logic like this. The implementers reason from the spec and then glue together a mishmash of targeted optimizations--presumably the ones they think will be most helpful.
I suppose many modern modes of programming with composable abstractions result in lots of dead code, and it's a net win for the compiler to aggressively try to get rid of it. With simpler languages like C or Go, though, I might prefer a rule like: "I wouldn't have written the code if I didn't want any instructions generated from it."
As far as I can tell it's just as broken if you change the Add to a Store (at least, doing that and breaking in the main routine when it's > 1 still times out on play.golang.org). That just turns it into a (terrible) spinlock, with no overflow issues.