As the language is specified, it's not wrong to just ignore that add completely -- the variable isn't used inside the goroutine, and you're not guaranteed a goroutine will run at any particular time.
So, code executed right after that routine is spawned will be hitting the 'old' value anyway; hence Dvyukov's comments that this is arguably to spec, or within spec.
Think about it: if you actually wrote this code, you'd see some weird behavior if the compiler did what you 'expect' - a couple of outputs of low integers, then some high ones as the goroutine gets scheduled in. So it's almost certainly just 'wrong' to write something like this, under most (all?) circumstances.
Because nothing like a sync.WaitGroup's Wait() is used, there's no guarantee the goroutine will execute at all before the program halts.
That said, I bet there's some related more reasonable scenario where you could argue the compiler should at least do the adding. I'm not sure why atomic.Add* should get special treatment though, and it looks like the fix is to remove some escape-analysis type code, so I'm curious how this will get fixed.
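For context, the code at issue is roughly of this shape (my reconstruction from the discussion, not the exact snippet from the bug report):

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

var a uint64

func main() {
    go func() {
        for {
            atomic.AddUint64(&a, uint64(1)) // go1.8 optimizes this add away entirely
        }
    }()
    for {
        fmt.Println(atomic.LoadUint64(&a)) // so this prints 0 forever
        time.Sleep(time.Second)
    }
}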
Yeah, this particular instance seems ok to me. This one makes the example feel weirder:
package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
    "time"
)

var a uint64

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU())
    fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))
    started := make(chan bool)
    go func() {
        started <- true
        for {
            atomic.AddUint64(&a, uint64(1))
        }
    }()
    <-started
    for {
        fmt.Println(atomic.LoadUint64(&a))
        time.Sleep(time.Second)
    }
}
Here we explicitly wait until the goroutine is started, so we know it's been scheduled by the time our other loop runs. On my computer with go1.8 linux/amd64 it still optimizes out the for loop, which makes sense: nothing changed that would convince the optimizer the loop should remain, given the compiler's current logic.
If you add time.Sleep(time.Millisecond) to the goroutine's loop, or any other synchronization, it works fine. I'm having trouble thinking of a real world example where you'd want an atomic operation going ham in a loop without any sort of time or synchronization. At the very least, a channel indicating when the loop is done would cause the loop body to be compiled.
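That is, dropping a sleep into the goroutine from the example above keeps the add in the generated code:

go func() {
    started <- true
    for {
        atomic.AddUint64(&a, uint64(1))
        time.Sleep(time.Millisecond) // any scheduling point defeats the over-eager optimization
    }
}()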
Just because 'started' ensures the goroutine has been scheduled once, that makes no guarantee it will ever be scheduled again. It really does nothing to extend the original example code.
> I'm having trouble thinking of a real world example where you'd want an atomic operation going ham in a loop without any sort of time or synchronization.
By using another atomic op as synchronization? After all that's their stated purpose.
But I don't think atomic ops in Go are considered synchronization in the sense that they force two goroutines to synchronize at a particular point the way a chan does. I.e. a chan send in one goroutine must be matched with a chan receive in another (unless the chan is buffered). If you have an atomic operation between two synchronization points, I'd expect the only guarantee is that it occurs between the two points, and that when it does, it happens atomically.
>>Strictly saying this is conforming behavior as we don't give any guarantees about scheduler (just like any other language, e.g. C++). This can be explained as "the goroutine is just not scheduled".
Not a Go developer. On a multi-processor machine, how is this conforming behavior? Is "scheduler cannot give any guarantees" acceptable?
'Is "scheduler cannot give any guarantees" acceptable?'
Most schedulers give far fewer guarantees than you might think. For a guarantee to be a guarantee, it must hold no matter what you do within the boundaries of the language. Consider a goroutine fork bomb:
func fork_bomb() {
    for {
        go fork_bomb()
    }
}
Go doesn't, to the best of my knowledge, guarantee that any other goroutine will get any execution time, or guarantee much of anything will happen. Your OS is likely to have similarly weak guarantees for the equivalent process bomb, unless you do something to turn on more guarantees/protection.
You have to go into some relatively special-purpose stuff before you can get schedulers that will guarantee that some process will get scheduled for at least 10ms out of every 100ms or something. And then, once you get that guarantee, you'll pay some other way.
Given that most of our machines are incredibly powerful, and that a lot of them still get upgraded on a fairly routine schedule in many dimensions even if single-core clock speed has stalled out, most of us prefer to work with things that just promise to do their best as long as you don't overload them; the other prices you have to pay to get hard guarantees turn out not to be worth it on our monster machines. Of course, one should always keep an eye out for when that stops being the case, but in general we're headed away from rigorous guarantees and toward shared resources that are cheaper and more scalable, making up the difference in volume rather than trying to get better guarantees.
An unbuffered channel read is always matched with a corresponding write. Let's call the point at which `point1 <- true` and `<-point1` occur T1, and the point at which `point2 <- true` and `<-point2` occur T2. fmt.Println("hello") and time.Sleep(3*time.Second) are then both guaranteed to occur between T1 and T2. If we didn't have T2, there'd be no guarantee fmt.Println("hello") would run before the program exits.
Maybe I'm wrong, but this is my understanding of Hoare's and Go's concurrency model.
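Spelled out, the scenario looks something like this (my sketch of the description above, imports elided):

func main() {
    point1 := make(chan bool)
    point2 := make(chan bool)
    go func() {
        <-point1                    // T1: both goroutines rendezvous here
        fmt.Println("hello")        // guaranteed to happen between T1 and T2
        time.Sleep(3 * time.Second) // likewise
        <-point2                    // T2
    }()
    point1 <- true // T1
    point2 <- true // T2: main can't get past this until the goroutine arrives
}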
How about a long-running calculation that uses atomic variables to report progress or poll for cancellation? Might it move all of the atomic ops outside of the loop?
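Something shaped like this hypothetical worker, say (process() is made up and stands in for the real work); if the compiler hoisted or sank the atomic ops out of the loop, both progress reporting and cancellation would silently break:

var progress uint64
var cancelled uint32

func longCalculation(items []int) {
    for i := range items {
        if atomic.LoadUint32(&cancelled) != 0 {
            return // another goroutine set this flag to request cancellation
        }
        process(items[i])              // the actual work
        atomic.AddUint64(&progress, 1) // a monitoring goroutine reads this
    }
}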
This is unfair, I think. Go offers strong guarantees that C does not. It has fewer sharp edges than C; it also asserts a principle of not polishing away sharp edges with concurrency.
You don't have to do everything better than C to be a huge improvement over C.
This issue is not related to ordering, but to visibility (i.e., when a memory side effect becomes globally visible).
Many concurrent memory models (C++ at least, Java very likely) allow coalescing stores to atomic variables as long as there is no other interleaving side effect; this is normally a desirable optimization.
As described by dvyukov, the Go compiler seems to be aggressively coalescing all stores till the end of the loop (which never happens).
Store coalescing is desirable, but it is hard to specify completely; for example, the C++ standard has non-normative wording stating that stores should be visible in a reasonable amount of time, exactly to prevent this sort of issue.
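Schematically, the legitimate version of the optimization looks like this (a made-up illustration, not actual Go compiler output):

// What the programmer wrote:
for i := 0; i < n; i++ {
    atomic.AddUint64(&counter, 1)
}

// What a coalescing compiler may effectively emit, since no other side
// effect interleaves with the adds:
atomic.AddUint64(&counter, uint64(n))

// The trouble here: with "for { ... }" the loop never exits, so the
// coalesced store is deferred forever and the counter never appears to change.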
Java only requires that the writes can't be seen out of order - if a thread writes A then B, another thread can't read the new value of B and then the old value of A. If coalescing them makes them all happen atomically, that's fine.
In Go, if runtime.GOMAXPROCS() returns 1 or is set to 1 (meaning Go only has access to a single system thread), the 'for' loop monopolizes the scheduler and the goroutine is rarely or never scheduled.
That fact doesn't make the OP's statement true, or have anything to do with the optimization bug, but it is worth pointing out that there are situations where Go correctly cannot guarantee that goroutine is executed.
I don't understand. By your argument, and some of the replies, because "you're not guaranteed a goroutine will run at any particular time", the compiler is free to elide any suffix of a goroutine's code, even if that code has visible side effects.
I think there is something more specific to atomic going on here, where the same kinds of optimizations are already forbidden for other kinds of visible side effects like printing.
So you are telling me a valid optimization for Go is to elide the entire contents of all goroutines?
And by that argument, since main() itself is an implicitly-called goroutine (so far as I know), then the compiler can elide the entire contents of main() too, and then the rest of the contents of the program.
This is a valid Go program (eliding imports etc.), and it'll almost never print anything.
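Something like this, say (my guess at the sort of snippet meant):

func main() {
    go fmt.Println("hello") // main almost always exits before this goroutine runs
}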
This is actually a classic mistake when starting with goroutines - not adding synchronization to make sure your main goroutine doesn't exit before any spawned goroutines finish.
Yes, but then the program would never terminate, because that's what happens at the end of main. If main returns, it's saying that the rest of the code in main has executed.
It is strange: the atomic package[0] exposes various low-level atomic functions, but none of them seem to take ordering arguments, and the docs don't mention a default order either.
Atomicity and ordering constraints are orthogonal, so you can have an atomic add with relaxed ordering, which allows the compiler to do lots of optimizations on those adds, presumably to the point where they are optimized out completely.
If this is the cause, it seems like pretty bad API design not to expose the flags on low-level library functions. Even high-level languages like Java expose them[2].
There is no official ordering provided by the atomic package. An atomic package isn't very useful without ordering semantics. And ordering actually _is_ provided, it's just not part of the documentation.
The background is that the Go team were initially reluctant to even make the atomic package public. It's mostly there to support the runtime and the sync package which needs fiddly low level concurrency things. But, it exists and it is public. Some people on the Go team regret this (I believe).
Giving clear and precise semantics to the atomic package was delayed because it's a lot of difficult work and you only get to do it once. There was discussion around trying to find a clear and simple semantics.
In the meantime the atomic package actually had ordering, because you would have a hard time writing the runtime or the sync package without this. Also, the race detector assumes some kind of ordering for operations in the atomic package. It was confirmed by Dmitry Vyukov that the race detector assumed C/C++11 acquire/release semantics for atomic operations.
After some discussion it was decided that C/C++11 semantics should be adopted. But the documentation has not been updated to reflect this, and updating it doesn't appear to be planned right now. Which is a shame.
Above I have sort of put words into the mouths of the Go team (or some members of it). It's not my intention to bad-mouth anyone, and I think everyone was making decisions for good reasons. Whether I agree or not :)
With respect to adding ordering arguments, Go most likely won't get those for a very long time.
As many of the comments point out, it looks like this isn't a compiler bug. The example code makes assumptions about the goroutine execution order without using the facilities (e.g. channels) that guarantee the right ordering. The use of atomic doesn't seem to be relevant.
The example code is what I call "works by chance", which I'd contrast with code that "works by design". I think some people are trying to argue that a language should prevent developers from writing code that "works by chance", but I don't think that's actually possible. It's the equivalent of saying you want to be able to type anything you want into your editor and have it compile into doing what you think it should do... "Works by design" code here would add channel synchronization points between the two goroutines such that the desired execution order is guaranteed.
It's also IMO not reasonable to expect a new compiler to produce identical results to an old compiler. E.g. if the compiler does a better job your code may run faster (or slower) which may break something if you've assumed that something takes so long to run (e.g. some hard coded timeout in some other component).
At least the 1.8 compiler behaviour is deterministic. So code written assuming some particular value of a would either work or fail consistently. Previously you could construct some code that fails randomly.
Can somebody comment on what's happening there? I don't know Go and I can't put the pieces together right now. Also, aren't optimising transformations checked (verified) for correctness? If not, how can the authors be confident in the optimisations? Simple tests? Or is it a bug?
Variable writes that look like they have no side effects may actually be intended for another thread to read (though hopefully most reasonable programs don't do this without some good explanation). However, many concurrency models say that this method of inter-thread communication is okay to optimize out, because most threading libraries (including Go's) make no guarantees about the order in which threads are scheduled. Therefore, removing the "writing" thread entirely is equivalent to it never being scheduled.
However, this is not the case for atomic operations (which are generally specifically intended for communication via shared variables). This optimization of removing the writing thread entirely isn't allowed if atomic operations are used, because those actually do come with a guarantee that other threads will get to see the update in finite time.
Go was incorrectly "optimizing" the atomic operations out entirely. It's a fairly subtle bug, actually, but still likely a big deal for the few programs communicating via shared variables.
> However, this is not the case for atomic operations (which are generally specifically intended for communication via shared variables). This optimization of removing the writing thread entirely isn't allowed if atomic operations are used, because those actually do come with a guarantee that other threads will get to see the update in finite time.
That sounds more like volatile than atomic. At its core, atomic just means that observer may see either pre-update or post-update state but won't see intermediate states.
I agree that this sounds more like "volatile", but it still makes sense that programmers should be able to rely on atomic operations for inter-thread communication.
According to the comments on the bug report, this behavior is inspired by C++'s <atomic> library, which guarantees cross-thread visibility.
> I agree that this sounds more like "volatile", but it still makes sense that programmers should be able to rely on atomic operations for inter-thread communication.
Only in the sense that they won't see inconsistent (intermediate) states.
Atomicity shouldn't guarantee order per se - if you want both, you need a mutex.
Ordered atomic operations like C++ std::atomic or Java/C# volatile do guarantee some ordering. For example, if one thread modifies an object and then stores a pointer to it in an atomic variable, under default visibility rules another thread reading that pointer from the atomic variable is guaranteed to see the modifications to the object itself, i.e. the update, the store and the remote load are ordered.
It is possible to have a model where all atomic operations are relaxed (i.e. unordered) and use explicit barriers for ordering (which is closer to how some, but not all, CPUs work), but it is significantly harder to reason about and prove algorithms correct under it.
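In Go terms the publication pattern looks roughly like this, assuming the (undocumented, per the discussion elsewhere in this thread) acquire/release behavior of the atomic package; Config and use() are made up for the example:

var box atomic.Value // will hold a *Config

// Writer goroutine: fully initialize the object, then publish it.
cfg := &Config{Addr: "localhost", Port: 8080}
box.Store(cfg) // release: the writes above are published along with the pointer

// Reader goroutine: an acquiring load; if it observes the pointer,
// it also observes the fully-initialized fields.
if v := box.Load(); v != nil {
    use(v.(*Config))
}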
No. Read the spec. There is no deadline for atomics to be visible. What is true is that some operations have happens-before ordering, and if a store happens-before a load, the load will see the result of the store.
Did you reply to the wrong post? My post was only about ordering.
Still, the C++ spec has wording about making stores visible in a reasonable time. The wording is non-normative, as they ran out of time (ah!) while trying to come up with a more formal (and implementable) specification.
edit: oh, I see I wrote 'default visibility' instead of 'default ordering'.
One point of view is that the removal of the atomic increment is justified because the programmer cannot rely on the execution of the goroutine that contains the increment. If you add an explicit call to the scheduler, the problem is fixed. However, the justification is a bit of a stretch, and it looks like the Go developers will change it back to the old behavior.
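By "an explicit call to the scheduler" I'd assume something like runtime.Gosched() in the loop:

go func() {
    for {
        atomic.AddUint64(&a, uint64(1))
        runtime.Gosched() // yield; any scheduling point keeps the add from being elided
    }
}()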
I think how to verify real compiler optimisations is still an open research question. Does any other compiler do it formally? I'm not sure it's reasonable to expect Go to do it.
In the general case, it's definitely not solved. Java and C/C++ (gcc) are known to have had compiler optimizations introduce bugs. (I assume the same is true about clang and other compilers; I just happen to remember about these ones).
For a recent Java one with Hotspot JIT, there is this FOSDEM 2017 talk.
"....Escape Analysis and Intrinsics, two commonly used HotSpot optimization techniques. I'll show how a combination of these two features can optimize away IndexOutOfBoundsExceptions in some corner cases where they are required by the standard..."
It seems that they unified the assembler used in the Go compiler and made it use intrinsics for the atomic operations on amd64.
In doing so, they must have botched something, because now some atomic operations (i.e. operations that shouldn't be interruptible by another thread) are completely optimized away.
This is quite a big issue because any multi-threaded program will use these (they are the only way to do lock-free code).
This isn't correct, if it was this would be a much bigger issue. I'd be quite surprised if this affects any code in the wild.
This bug is related to the compiler deciding to optimize the loop. All modifications of variables in the loop are being batched until the loop exits, but it never exits, therefore none of the modifications ever happen. In most cases this would be fine, as the side effects of modifying the variables in the loop wouldn't need to be visible anywhere else until the loop exited.
If you remove the loop and just modify the atomic variable once, the modification eventually happens and is seen by the second goroutine.
What's funny to me is: if you don't do the add in a loop, go1.8 generates the add code:
go func() {
    atomic.AddUint64(&a, uint64(1)) // no loop
}()
To me, it's the unpredictability of how compilers will choose to apply these extreme optimizations that is most frustrating. Reasoning from the spec ("the goroutine might not be scheduled anyway") isn't very satisfying when the compiler clearly doesn't follow logic like this. The implementers reason from the spec and then glue together a mishmash of targeted optimizations--presumably the ones they think will be most helpful.
I suppose many modern modes of programming with composable abstractions result in lots of dead code, and it's a net win for the compiler to aggressively try to get rid of it. With simpler languages like C or Go, though, I might prefer a rule like: "I wouldn't have written the code if I didn't want any instructions generated from it."
As far as I can tell it's just as broken if you change the Add to a Store (at least, doing that and breaking in the main routine when it's > 1 still times out on play.golang.org). That just turns it into a (terrible) spinlock, with no overflow issues.