Blog author here with a regretful correction. This result is not accurate, mea culpa. The headline should read: errors.Is() is 500% slower.
Basically: the critical functions in the benchmark are small enough they were being inlined by the compiler, which means it's possible for the compiler to further optimize the loop to avoid all comparisons in some cases, producing an inaccurate result for some of the benchmarks. You can fix this by adding noinline directives to the methods.
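For anyone who wants to reproduce the fix, the directive looks like this - a minimal sketch with illustrative names and a stand-in method body, not the article's actual code:
package store

import "errors"

var notFoundErr = errors.New("not found")

type errStore struct{}

// Keep the method out of the inliner so the compiler can't see through the
// benchmark loop and optimize the comparisons away.
//go:noinline
func (s *errStore) GetValue(found bool) (any, error) {
    if !found {
        return nil, notFoundErr
    }
    return struct{}{}, nil
}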
I'll be publishing an update to the article and a post-mortem on how this slipped past review. The rank ordering of techniques in the article is unchanged, but the magnitude of the difference is not nearly so large.
Thanks for posting the update here. I was going to point out that 0.5ns for anything is almost always a sign of benchmarks being optimized away.
But even if those numbers had been accurate, it is important to think about how they compare to the operations being performed. Taking 4 nanoseconds to check for an error sentinel is not a huge deal when you've spent microseconds or even milliseconds waiting for a network service.
It's a relatively bigger deal for many other potential sources of errors, like parsing a string as a number. Also, even if the caller does not unwrap the error values, producing errors often requires at least one heap allocation.
These are all fair tradeoffs for the simplicity and ergonomics of Go's error handling. They rarely matter for performance because the happy critical path of most programs does not produce lots of errors. These are just nuances worth recognizing in a thread specifically about the isolated overheads of errors.
It should (hopefully!) be self-evident that "hot loops" in which single-digit nanosecond performance differences are meaningful are (a) exceptional and, more importantly, (b) not a place where you'd make any kind of function call in the first place!
Our database is within spitting distance of MySQL's raw performance and getting faster all the time. Certainly there are some applications where that will not be fast enough. That didn't prevent MySQL from becoming the world's most widely deployed free database.
And yes, postgres is faster and will eventually pass MySQL in deployments, but that success also has very little to do with its speed.
It may just be me, but I have found greater joy and satisfaction working in Go compared to any other language. I believe this is very important when considering the finite and decreasing time remaining in my life.
Sometimes "good enough" performance is perfectly fine when considering a bigger picture.
That's one of the most reasonable ways to look at it, among many voiced here.
I agree with it, it's the same reason I enjoy C# (and sometimes F#) - it gives the sense of control, gets out of the way when you want to get things done, and gives powerful tools when you want to push it to the limit.
The problem is - Go is not an underdog the way C# is if you look at GitHub statistics, and it has reached the escape velocity that gets it picked for all the fun projects, even when they would have been better served by C#, which either offers a purpose-built capability or good tools to implement one. And when Go falls short, it will be made to work despite its shortcomings, similar to Python and ML.
It's very painful to move languages when such move involves the sense of settling for less. I tried a lot and very few felt like an improvement - some would perform better in a particular area but would also have significant shortcomings I'm not comfortable with in areas C# doesn't.
Updated the numbers in the gist with more context.
This is .NET 9 preview 5 JIT or .NET 8 + DPGO casts and VTable profiling feature (it was tuned, stabilized and is now enabled by default in 9).
This would be representative of a long-running DB workload like the one the original blog post refers to, which wants to use JIT rather than AOT.
Practically speaking, the example is implemented in a way that makes it logically closer to the Golang one. However, splitting the interface check and the cast defeats the default behavior of .NET 8 JIT's guarded devirtualization and ILC's exact devirtualization. Ideally, you would not perform type tests and would just express it through a single interface abstraction that inherits from IEquatable<T> - this is what regular code looks like and what the compiler is optimized towards (e.g. if ILC can see that only a single type implements an interface, all callsites referring to it are unconditionally devirtualized).
Tl;Dr:
CPU Freq: 3.22GHz, single cycle: ~0.31ns
L1 reference roundtrip lat.: 4 cycles (iirc for Firestorm)
Cost of GetValue -> (object?, IError?) + Errors.Is call
-- JIT --
.NET 9: 3.24ns value ret, 3.58ns error ret
.NET 8: 5.56ns value ret, 5.86ns error ret
-- AOT --
.NET 9: 5.72ns value ret, 5.91ns error ret
.NET 8: 5.52ns value ret, 5.89ns error ret
Note: .NET 9 numbers are not finalized - it's still 5 months away from release.
Yeah, it's in the same area as guarded devirtualization done by HotSpot but there are some differences.
ILC is "IL AOT Compiler", it targets the same compiler back-end as JIT but each one has its own set of specific optimizations, on top of the ones that do not care about JIT or AOT conditions.
JIT relies on "tiered compilation" and DynamicPGO, which is very similar to HotSpot C1/C2, except it does not have interpreter mode - the code is always compiled and data shows that to have better startup latency than e.g. what Spring Boot applications have.
AOT on the other hand relies on "frozen world" optimizations[0] which can be both better and worse than JIT's depending on the exact scenario. JIT currently defaults to a single type devirtualization then fallback (configurable). AOT otoh can devirtualize and/or inline up to 3 variants and doesn't have to emit a fallback when not needed because the exact type hierarchy is statically known.
There are other differences that stem from architectural choices made long ago. Namely, all method calls in C# are direct except those explicitly marked as virtual or interface calls (there are also delegates, but eh); naturally, if the compiler sees the exact type in a local scope, no call will be virtual either. In Java, all calls are virtual by default unless stated or proven otherwise by the compiler, and interface calls there are also more expensive, which is why certain areas of OpenJDK have to be this advanced in terms of devirtualization and support de- and re-optimization - something .NET's JIT opted not to do. Data shows most callsites are predominantly mono- and bi-morphic, and the fallback cost is inexpensive in most situations (virtual calls are pretty much like in C++, interface calls use an inline-caching-style approach).
Today, JIT, on average, produces faster application code at the cost of memory and startup latency. It also isn't limited by "can only build for -march=x86-64-v2" unlike AOT. In the future, I hope AOT gets the ability to use higher internal compiler limits and spend more time on optimization passes in a way that isn't viable with JIT as it would mean sacrificing its precious throughput.
The thing with programming language evolution is that quite often everything except the error handling gets faster over time. If you have an app that makes aggressive use of errors, you’re not going to see the same speedups from version to version that others are seeing.
Particularly in JITed languages, if the multiplier is hyperbole today, there may be a day in the future where it’s accurate.
The minimum cost of an error is one cold branch. Which is effectively free on the happy path, and as cheap as it possibly can be on the error path.
A well-designed error handling system can feasibly reach this goal, at which point no more optimization is possible.
Edit @hinkley: I am unfortunately in hacker jail, and have used up my post quota, so I have to edit my post rather than reply to yours. Reply follows.
Throwing, the way Java and C++ do it, exceeds the minimum cost of an error, yes. Not that this is some sort of show-stopping problem, I'm discussing the minimum cost, not what level of cost makes errors too expensive.
A branch-not-taken, which is the opposite of processing an error, caught or otherwise, is what I described as practically free. Not always. If the only path forward is, for instance, to dereference a pointer, and the error would mean that the pointer is a null pointer, the chip has to wait for the branch to resolve, which might mean a minor pipeline stall.
But if the processor can do further work, such as checking for an error on malformed input, where it's legal to speculatively execute the next decode on the predicted branch, it really is practically free. It costs considerably less than one instruction. If the branch triggers, the processor spills all the speculative work and switches. Since errors are expected less frequently than success, more or less by definition, that's the correct tradeoff.
This is why I described the error branch as "cold". The processor should predict and speculate on the happy path.
Zig is an example of a language where errors can be as inexpensive as a superscalar check on a given register while the main path moves forward. This is also possible in C when checking errno(). I can't speak for Rust, but enormous amounts of work have gone into that compiler, and it should indeed be possible to compile a match-on-Result to the same efficient level. Perhaps trickier since Rust doesn't "know" what an error is, but I suspect that in practice it can get this right.
Not that it's even mostly up to the compiler: there are ASM hints for the branch predictor in some instruction sets, but modern branch predictors don't need those to handle the case where 99+% of the time execution proceeds on one branch. It's conventional for compilers to emit "branch on cold, proceed on hot", but it's been a long time since that has mattered for execution speed. You'll get a bit tighter on a chip with no predictor and a shallow pipeline, like AVR, but for anything which can run an operating system it's almost irrelevant now.
I see no a priori reason why Go shouldn't achieve the minimum level either. Languages which use exceptions and try/catch must do some bookkeeping, which imposes a cost on the code whether an exception is raised, or not. That can be minimized, but not to the limit I'm talking about.
Note that the costs described are for the ability to generate an error trace, not error handling itself. This cost is only paid in debug and ReleaseSafe modes.
Do you notice any difference if you update `BenchmarkFoundErrorsIs()` to the following?
func BenchmarkFoundErrorsIs(b *testing.B) {
    var es errStore
    for i := 0; i < b.N; i++ {
        val, err := es.GetValue(true)
        if err != nil {
            if errors.Is(err, notFoundErr) {
                b.Fatal("expected found")
            } else {
                b.Fatal(err)
            }
        }
        if val == nil {
            b.Fatal("expected not nil")
        }
    }
}
I think this doesn't mean you shouldn't ever use sentinel errors, just that you should be mindful of the performance cost, as it isn't free. If you're doing this for a loop where the iteration of the loop is vastly slower, e.g. because it is performing some sort of I/O in each iteration, or if you're doing this for a loop where you know the worst-case amount of iterations is very low, then you can probably safely continue to eat the cost of using sentinel errors. The only time where it is genuinely better to avoid a performance pitfall at all costs is when scaling comes into play, since then it's very likely that it will creep up fast and make the performance absolutely unusably bad seemingly out of nowhere.
In practice, if you are really writing performance critical code, you kind of just need to benchmark and determine which bottlenecks are holding you back. Knowing the performance hit of some patterns is still useful, but it's not a substitute for actually measuring the performance.
That said, the even better solution would probably be if future Go versions could just make the idiomatic code faster somehow, so that it is less likely to be a bottleneck. But this is the case for many patterns in Go, as Go and its stdlib don't really prioritize performance or zero (runtime) cost abstractions. (They justify this with data; in developer surveys, time and time again Go developers express a strong preference for robust, "safer" code over higher performance code.)
https://go-review.googlesource.com/c/go/+/576015 this change to optimise errors.Is in the err == nil case was merged recently. It’s not included in the latest Go release, hence the stats here, but should end up in a release pretty soon I’d imagine.
Sorry, that's still not a reason to use bools instead of "errors" and
An error should by default be considered non-recoverable, to be returned when something goes very wrong. Semantically, a table not existing isn't really an error, it's something you expect to happen all the time.
Agree to disagree here. And yes, this article is a follow-up to the linked one.
The last section of the blog has a discussion of aesthetic and philosophical concerns around using errors for control flow. I'm against it, for the reasons I provide. YMMV.
Your takeaway in the "ok is harmful" article is incorrect. There is nothing wrong with "ok" - your problem is with shadowing variables. Better organized code wouldn't use shadowing. Your example of moving the ok-checks to the end makes as much sense as moving all your err checks to the bottom. If you overwrite a shadowed variable before using the value, of course that's bad.
The real takeaway should be that one must always be careful with shadowed variables and generally use them as soon as possible after declaring them.
I'm not sure how useful it is to adjudicate what is ultimately just a property of the language. It's not something to celebrate, or mourn, or (importantly) try to work around, it's just the way things work.
It seems like you're just fighting the language. This is not the path to success :)
Yes, it is the same situation. No, we don't tend to use unique names for errors. The idiom is just too strong.
This is somewhat mitigated by consistent consecutive error handling, meaning you always immediately dispatch an error on the line after assigning it. Every time we have had bugs because of this, it's because someone tried to be clever and delay handling an error until slightly later (like at the end of an if else chain where each branch might have had an error).
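To make that concrete, here's a small sketch (hypothetical step functions, not from our codebase) of the shadowing trap that delayed handling invites:
package example

import (
    "errors"
    "fmt"
)

func stepOne() (int, error) { return 1, nil }

func stepTwo(int) (int, error) { return 0, errors.New("step two failed") }

// The inner := declares a brand-new err, so stepTwo's failure never reaches
// the check at the bottom. Dispatching each error on the very next line
// avoids this class of bug.
func process() error {
    a, err := stepOne()
    if a > 0 {
        b, err := stepTwo(a) // shadows the outer err
        fmt.Println(b)
        _ = err // discarded here, mimicking the "handle it later" mistake
    }
    if err != nil { // only ever sees stepOne's error
        return err
    }
    return nil
}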
That quote conflates errors and exceptions. An error is an unsuccessful result. An exception, however, precludes any result at all. Sometimes they are the same, but in many cases they are different.
Asking to retrieve a table that doesn't exist is an error, because the answer is that there is no table; the operation was unsuccessful. No table was retrieved, but since the table could always have either existed or not, this is not an exceptional case. Nothing went wrong: you simply asked to retrieve a table and no table was found, so you get that back as an error.
However, trying to perform a table operation on that unsuccessful result generates an exception, because you violated a contract: you can't operate on something that doesn't exist. Sure, you can operate on the representation of nonexistence (that is, null), but you attempted to operate on a table and you don't have one.
The computation stops prematurely with an exception because the operation expects the table to necessarily exist. There is no unsuccessful result to be returned because the operation didn't even make it to the point of failing; the performance itself of the operation failed. The operation is entirely invalid and returning any result would be incorrect.
At least, that is my personal understanding of the terms.
For example, most things in Rust are errors and not exceptions, because exceptions (panics) generally terminate the thread, or abort the process, depending on user preference or runtime environment. Anything that isn't fatally unrecoverable deserves to be an error - in other words, a `Result::Err` - which can be handled by the caller, as opposed to panics, which are not guaranteed to be catchable (unwinding is not a required part of the language, it is an optional convenience).
Most failures in JavaScript are exceptions, because you are not made to check anything, and everything is dynamically typed, so if you make an assumption that isn't true, an exception has to be raised to stop you. Luckily, there is try/catch. It can also be difficult to represent errors without exceptions, you instead have things called "error states" which are basically certain non-errors being treated as errors by user code, when either that result is undesired even if perfectly normal, or that result is the only/easiest way for the error to be reported by the source.
Anyway, my point is that the article probably means to use the term exception and say that a boolean is better than an exception when the error isn't an exceptional case. And in that case I would agree. In Rust, I don't want my program to crash when I ask for a nonexistent table - I only want that to happen later if I decide that the table's nonexistence is a fatal error in the context of the whole thread.
You are explaining the difference between error results and panics in Rust, and maybe this also applies to Go. But this is very much not how exceptions (a word these languages don't even use) are understood in languages which have them.
In Java, C#, C++, for example, exceptions are meant for any situation where the operation you asked for can't be completed successfully, for whatever reason. The most common example is IOException: if you asked to read 200 bytes from a socket, but the connection got interrupted after 100 bytes, that's an IOException. This would not be appropriate as a panic in Go or Rust, for a comparison of the difference in philosophy.
> You are explaining the difference between error results and panics in Rust
That is indeed an example I included, yes
> In Java, C#, C++, for example, exceptions are meant for any situation where the operation you asked for can't be completed successfully, for whatever reason. The most common example is IOException: if you asked to read 200 bytes from a socket, but the connection got interrupted after 100 bytes, that's an IOException.
- Java has checked exceptions, which... ugh, I'm not going to get too much into this, but this is basically their terrible way of representing expected errors while reusing the try/catch syntax. Yes, the thing is called exceptions at the language level. Yes, they do interrupt control flow. But fundamentally I consider them mere errors.
- C++ overuses exceptions in a similar way, except they're not checked because fuck you, nothing in C++ is checked.
The relevant quote's talking about semantics, so I'm talking about errors and exceptions as something that transcends the individual programming language. Different languages are different and don't always (even usually don't) map these concepts 1:1.
The behavior in Java and C++ is just a different philosophy of what exceptions mean, that is what I was trying to say. The semantics of an exception in these languages is that functions should either return a simple type or throw an exception if they are unable to return that type for whatever reason (be it a bug, an unmet precondition, a temporary error, an unmet postcondition, etc).
Exceptions are not extraordinary events, and it is certainly never appropriate to stop execution because some exception was raised: the program itself is supposed to, at the right level, specify what should happen when an exception is encountered.
The difference between different levels of error, such as fatal errors vs others, is captured by the type of the exception itself.
This all has little to do with Java's checked exceptions support. That is a feature that is supposed to help with verification that the program indeed specifies what to do when an exception happened, for certain classes of errors that are considered more likely to occur in production. C#, which learned from both C++ and Java, doesn't include these, but still has the same semantics for exceptions.
> Exceptions are not extraordinary events, and it is certainly never appropriate to stop execution because some exception was raised: the program itself is supposed to, at the right level, specify what should happen when an exception is encountered.
Exceptions are being used for errors, and that's fine, I just don't consider them actual exceptions because, for example, you expect certain operations to be able to fail and you should handle that failure gracefully. In Rust errors are a separate concept that can be part of the return value, but in Java and C++ (that "different philosophy") errors are also just exceptions and you're expected to know which ones happen in the course of normal operation in order to handle them.
I know in JavaScript I sometimes use try/finally (without catch) for things that throw exceptions that are indeed fatal for my function, and then something higher up can catch errors thrown by that function if it's not quite fatal anymore. But it feels annoying to have to "handle exceptions" when I should be handling expected errors instead, and writing code that contains no unexpected errors. (TypeScript at least makes that easier than it otherwise would be for vanilla JavaScript.)
Again, you're just shoe-horning in your preconceived ideas about how errors should be handled into a different language.
The philosophy of Java and C# especially* is that software should be resilient to errors in production. It doesn't matter if one function for one request with some obscure parameter failed because of a network failure or because of a bug that led to an NPE when that parameter was encountered. What matters is that all other requests that don't have that bug should continue running and not impact users unnecessarily.
So, at some points in my code, I need to handle any possible exception, and decide how to continue from that point. I also need to properly clean up resources and so on regardless of whether I encountered a bug in a subcomponent or an external failure.
Now sure, I probably don't want to be writing "catch (NullPointerException) { ... }", while I may occasionally want "catch (IOException e) {....}" . So there is some difference, but that is where the type system comes in (and the oh-so-hated checked exceptions feature).
* C++'s philosophy is the worst here, of course: programs that have bugs are not valid C++, so they don't even discuss them. Rust's "any bug should crash your program as soon as possible", while still very strange to me, is of course much better.
Java just needs better language constructs for checked exceptions to make them easier to deal with. It seems like the OpenJDK team is finally moving towards looking at making them better though.
I like your example. Although I do remember that there are also read operations where you ask for 200 bytes and you can get 100 back. And it is up to the caller to check for this. Otoh, if the socket is closed you indeed get a socket closed exception. But that is because the socket was closed while reading. What is the Rust behaviour in this case?
> I do remember that there are also read operations where you ask for 200 bytes and you can get 100 back.
Those are read operations where you say you have 200 bytes to store the result, rather than where you ask for exactly 200 bytes. But yes, they certainly exist, the ones that ask for exact numbers of bytes are typically built on repeated calls to the ones that can return less.
> Otoh, if the socket is closed you indeed get a socket closed exception. But that is because the socket was closed while reading.
I think this depends on how it closed, sometimes you just get EOF, which is an error state but not an exception (unless your language is clinically insane like Python). Obviously if it's interrupted in such a way that there would probably have been more data but you can no longer get it, that's a different error than EOF (and some languages use exceptional control flow to report mere errors, it is just that Rust does not do this).
> What is the rust behaviour in this case?
Read calls can return less data than you asked for. If they return no data, that is a successful EOF. They can also report I/O errors as distinct from a successful result.
There is an I/O error for "unexpected EOF", where you require more data than is present, and the fact that it's not present is an error. I believe `read_exact` uses this if anything happens before it has entirely filled the buffer, including completely normal connection closure.
Your understanding otherwise seems on point, but I'm not sure they can ever be the same. An exception is the representation of programmer fault. An error, on the other hand, is the representation of faults outside of the programmer's control.
Like you point out, operating on a missing table is a mistake made by the programmer. This fault could have been avoided when the code was written. The table not being there is not within the control of the developer (with some assumptions of what table is meant to mean here) and no amount of perfect coding could have avoided the situation.
> Most failures in JavaScript are exceptions
Idiomatic Javascript suggests handling errors using its exception handling mechanism, but the payload is still error in nature. In fact, the base object the idioms suggest you use to hold error data, passed using the exception handler, is literally called Error.
Javascript may be confused by having no official "Exception" type. Although arguably it does not need one as you can simply throw a string, which is all you need for an exception. The exception handler will tack on the rest of the information you need to know automatically. Java takes a different opinion, though, having two different, albeit poorly named, built-in classes for errors (Exception) and exceptions (RuntimeException).
The go standard library uses sentinel errors. ErrNotFound is actually good to use, because otherwise, a `Get` call can end up returning `nil, nil` (*Value, error). Which is just bad.
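To illustrate (hypothetical types, just a sketch of the shape):
package store

import "errors"

var ErrNotFound = errors.New("not found")

type Value struct{ Data string }

type Store struct{ m map[string]*Value }

// With a sentinel, a missing key always comes back as a non-nil error, so the
// caller never has to guess what a (nil, nil) return would mean.
func (s *Store) Get(key string) (*Value, error) {
    v, ok := s.m[key]
    if !ok {
        return nil, ErrNotFound
    }
    return v, nil
}
Callers then check errors.Is(err, ErrNotFound) rather than probing for a nil value paired with a nil error.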
Anyone who’s curious why `errors.Is()` is so slow (relatively speaking) should read the implementation [0]. It’s pretty obvious once you read the code—it relies on reflection followed by a tree-walking algorithm.
So the overhead is the function call and 2 comparisons. I wonder whether, if the `isComparable := reflectlite.TypeOf(target).Comparable()` line were moved into a separate function, the compiler would inline Is(), reducing the perf impact.
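Not the real implementation, but the shape of the idea - keep the exported function tiny so the cheap comparison can be inlined at the call site, and push the heavy work into a helper (sketch that just delegates to the stdlib to stay runnable):
package errsketch

import "errors"

// If this front function stays under the inlining budget, the direct
// comparison is inlined at the call site and the reflection-heavy work is
// only paid for when the fast path misses.
func Is(err, target error) bool {
    if err == target {
        return true
    }
    return isSlow(err, target)
}

func isSlow(err, target error) bool {
    // stand-in for the comparability check and the unwrap-tree walk
    return errors.Is(err, target)
}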
Optimized Go code will generally perform on par with optimized Rust code. Exceptions exist and Go is a much less expressive language but that affects developers more than users. Go's error handling with wrapping is very similar to the anyhow crate which also has runtime overhead. The happy path should occur much more often than the error path and so the overhead is usually worth it for developer convenience in debugging.
The quality of code relying on reflection varies wildly (even within the standard library) so I don't think anything about the reflection system itself is the problem. I've written reflection-based solutions to problems sparingly but with an eye to giving good output on errors so debugging is easy. You can write some awful and hard-to-debug macros in Rust too.
> The quality of code relying on reflection varies wildly (even within the standard library) so I don't think anything about the reflection system itself is the problem. I've written reflection-based solutions to problems sparingly but with an eye to giving good output on errors so debugging is easy. You can write some awful and hard-to-debug macros in Rust too.
Maybe? I have yet to see easy-to-read/easy-to-debug Go reflection. I have not encountered hard-to-debug macros in Rust (probably because I have almost never needed to use a debugger in Rust in the first place). Doesn't mean that these don't exist, of course.
I don't really use an interactive debugger with Go that often. I prefer error wrapping and logging. That having been said, I've found that it is often best to disentangle reflection and business logic by converting to an intermediate representation. For example, I wrote a library that takes a struct pointer and populates its target with values taken from environment variables, using field tags like `env:"FOO"`. The first version did at least have useful error messages, e.g.
reading environment into struct: struct field Foo: as int32: strconv.ParseInt: parsing "a": invalid syntax
However, it wasn't very extensible, because you had to read through a lot of boilerplate to figure out where to put new features. I also bet debugging this through an interactive debugger would have been quite a headache, though I never had to do that. So I rewrote it and split it into two parts: a reflection-based part which returned details about the extracted fields and their tags, then a (nearly) reflection-free part which took those details and parsed the corresponding environment variables then injected the values into the fields. The intermediate representation is usually hidden behind a simple interface, but can be examined when needed, and prints nicely for debugging.
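For the curious, that first all-in-one version looks roughly like this - a sketch in the same spirit, not the author's actual library:
package envload

import (
    "fmt"
    "os"
    "reflect"
    "strconv"
)

// Load populates dst's fields from environment variables named by `env:"..."`
// tags. Reflection and parsing are tangled together, which is exactly what
// made the real thing hard to extend.
func Load(dst any) error {
    v := reflect.ValueOf(dst)
    if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
        return fmt.Errorf("expected pointer to struct, got %T", dst)
    }
    v = v.Elem()
    t := v.Type()
    for i := 0; i < t.NumField(); i++ {
        name := t.Field(i).Tag.Get("env")
        if name == "" {
            continue
        }
        raw, ok := os.LookupEnv(name)
        if !ok {
            continue
        }
        f := v.Field(i)
        switch f.Kind() {
        case reflect.String:
            f.SetString(raw)
        case reflect.Int, reflect.Int32, reflect.Int64:
            n, err := strconv.ParseInt(raw, 10, 64)
            if err != nil {
                return fmt.Errorf("reading environment into struct: struct field %s: as %s: %w", t.Field(i).Name, f.Kind(), err)
            }
            if f.OverflowInt(n) {
                return fmt.Errorf("struct field %s: value %d overflows %s", t.Field(i).Name, n, f.Kind())
            }
            f.SetInt(n)
        default:
            return fmt.Errorf("struct field %s: unsupported kind %s", t.Field(i).Name, f.Kind())
        }
    }
    return nil
}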
> Optimized Go code will generally perform on par with optimized Rust code.
This is an incredibly dangerous assumption. Go and Rust are in a completely different weight class of compiler capability.
Before making an assumption like this, I strongly suggest building a sample application (that is more complex than hello world) in Go and Rust, then compiling both with optimizations and taking a peek at what they compile to with Ghidra/IDA Pro/any other disassembler.
C# + .NET 8 can sometimes trade blows with Rust + LLVM, particularly with struct generics, but Go definitely can't, is far behind in compiler features and doesn't leverage techniques that reduce cost of non-free abstractions that .NET and OpenJDK employ.
I should clarify that by "optimized" I meant hand optimized, not just compiler optimized. You can easily code yourself into surprisingly poor performance with Go, though usually for the same/similar reasons as C#/Java. Rust makes it a little harder to get into the same situations though it's by no means impossible.
As to the zero/low-cost abstractions, Go has very few abstractions at a language level and that is what I meant by "much less expressive". If you try too hard to write Go as though it were another language, you will shoot yourself in the foot. However the Go compiler, standard library, and broader ecosystem have come a long way and enable very similar performance profiles to most other compiled languages. It has not been my experience, at least in micro-benchmarks, to see any major performance differences in Go vs e.g. C on things like encryption, hashing, RNG, SIMD, etc. You do need to avoid allocations in hot code paths but that's true of any garbage-collected language.
The outstanding performance issues I'm aware of are around deferring, calling methods through an interface, and calling out to cgo. These are pretty minor in most situations but can eat up a lot of CPU cycles if done often in tight loops. They've also been getting better over time and can sometimes be optimized away with PGO.
This is fair, but performant data processing in Go simply has a much lower performance ceiling than in Rust, C#, Swift or C++. Which is why all standard library routines that do so (text search, hashing, encryption, etc.) are written in Go's custom """portable""" ASM dialect. It is unwieldy and has API omissions because it takes the half-assed approach many things in the Golang ecosystem do.
The closer, and better, example is hand-optimized C# where you still (mostly) write portable and generic code for SIMD routines and they get compiled to almost the same codegen as hand-intrinsified C++.
I understand where this belief comes from, but the industry at large does itself a huge disservice by tunnel visioning at a handful of (mostly subpar) technologies.
Unfortunately, you are right that SIMD support in Go isn't great and basically requires writing platform-specific code in their quirky assembly language or using a third-party library which does that for you.
I don't think the existing use of assembly under the hood to accelerate performance when possible is something to be frowned upon though. It's an implementation detail that almost never matters.
One issue with the performance of their assembly code though is that it passes arguments and return values on the stack. I saw something about allowing code to use registers instead but I don't think it's available yet.
I think you overhype C# and Swift. We benchmarked a gRPC API server in C# (.NET Core) and Go, with the same implementation in both languages, and Go, while faster, was also using way less memory.
Yes, the default implementation of hashing in Rust is designed to be cryptographically-safe at the expense of speed. You can get a large speedup by replacing it, e.g. https://nnethercote.github.io/perf-book/hashing.html .
I believe this is because the code `counts[word]++` basically works by:
1. Hash word, retrieve the value from the counts map
2. Increment the value
3. Hash word, store the value into the counts map
By changing counts to a map of strings to pointers to ints, you only need to perform #3 once per unique word. Once you've read the int pointer, you need only write through the pointer rather than re-hash the word and write the value back into the map, which can be expensive.
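In code, the pointer variant looks roughly like this (sketch with a hypothetical helper):
package wordcount

// countWords hashes each word once per occurrence for the lookup, and only
// writes into the map once per unique word; later occurrences just bump the
// int through the pointer.
func countWords(words []string) map[string]*int {
    counts := make(map[string]*int)
    for _, word := range words {
        p, ok := counts[word]
        if !ok {
            p = new(int)
            counts[word] = p // the only map write for this word
        }
        (*p)++
    }
    return counts
}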
I am surprised this doesn't get optimized by the compiler. I assume this is necessary in the general case (map could be modified concurrently, causing the bucket for the key to move) but it obviously isn't here.
Look at the antecedents of some of those comments and you'll see the bar for successfully introducing an unasked-for language comparison on a language thread is very high.
A reasonable rule of thumb specific to this thread: unless the article is about a comparison between Rust and Go, you probably cannot safely discuss Go in a Rust thread, or vice versa. It's a lit match thrown into dry brush.
This calls for the classic quote: "Premature optimization is the root of all evil"
I expect the vast majority of code spends so little time in errors.Is() that trying to optimize it saves nothing. Profile first, and change it later if needed; as Go is statically typed such a change is trivial.
> errors.Is() is expensive. If you use it, check the error is non-nil first to avoid a pretty big performance penalty on the happy path.
Am I missing something, or could this be partially mitigated by adding a check for `err == nil && target != nil` and vice versa, rather than `err == target`?
are a "pretty big performance penalty", there isn't much work going on. I mean, of course, these 2 or 3 comparisons + the function call of `Is` are at least double that of
err != nil
even if `Is` would have been inlined by the compiler.
Actually, the `err == nil` check doesn't exist in the latest release - it was only added recently - which would explain the significant performance cost: if err is nil, errors.Is doesn't return early.
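Until a release with that early return ships, the workaround from the advice quoted above is to guard the call yourself. A sketch, assuming the errStore / GetValue / notFoundErr definitions from the benchmark upthread:
// The cheap nil comparison runs on the happy path; the reflective errors.Is
// walk only runs when something actually went wrong.
func lookup(es *errStore) (any, bool, error) {
    val, err := es.GetValue(true)
    if err != nil {
        if errors.Is(err, notFoundErr) {
            return nil, false, nil
        }
        return nil, false, err
    }
    return val, true, nil
}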
There's a tangential issue in Rust where there's a concern around whether a library should be exposing the error types of its dependencies. If they are exposed, semver is broken anytime a dependency is updated in a semver-breaking fashion. I, in general, agree that this breaks encapsulation and is not good. However, and maybe I'm doing things wrong, but often enough there are edge cases where it is important to distinguish between different errors and adjust control flow. Often, library authors will not be able to foresee all possible use cases for their error types, so one can end up doing some horrible type casting or worse hacks to get at the actual underlying data. I'm still torn as to what is the better way forward.
That's what the #[non_exhaustive] attribute is for. If a struct or enum is given this attribute then everybody else is treated as though they cannot exhaustively match it, they must write the catch all case. You (as the owner of the type) can still exhaustively match it, after all you're the one who'll get broken by your own change if you screw up - but everybody else gets told they need to write the "catch all".
So, no, errors aren't special and should obey semver but you deliberately can (and should) signal that your users can't exhaustively match a thing you know will change.
I want to be exhaustive on my error checks. I want my match to fail if a new variant is added, so that I can handle it (even if it's just to ignore it). A big reason why refactors can be so fearless in Rust is thanks to exhaustive checks. While the 'refactor' part applies mostly to your own, 1st party code, it's also useful when upgrading 3rd party dependencies.
For this reason I personally don't like `#[non_exhaustive]`. I don't expect my code to keep compiling when I upgrade the minor version of my dependencies, and as long as the reason is simple, like a new enum variant was added, then I'm happy that it errored and will happily fix it.
This perfectionist attitude towards semver is why so many projects get stuck forever at v0. https://0ver.org/
The problem is not that a new variant of an enum could be introduced, it is that an error that a library returns has the option to return an *inner* error, which then could be a type defined by another library. Thus, if you develop library A and depend on library B, you cannot reasonably update B if you return its errors because, by doing so, you're breaking your public API. Emotionally, I'm very much for breaking backwards compatibility and just bumping semver in these cases, but people more reasonable than myself have put forward the argument that such a policy would cause too much churn in the ecosystem.
It's only fair to call out that #[non_exhaustive] only helps you for the very specific case of adding new enum variants or struct fields. It does nothing for the burden of keeping all existing variants and fields compatible with any way they could have been matched.
java has the "rootCause" constructor which wraps another Exception (actually Throwable) of any type passed in. so the idiomatic way would be to create your own error types, but to pass the lower-level exception in as the rootCause so you can trace the exceptions down the stack if desired. Since the type is Throwable rather than a dependency-specific exception, this isn't exposed in the interface unless you go out of your way to cast things in an obviously dangerous way.
When you say "obviously dangerous" do you mean "obviously breaking encapsulation" or is there a bigger issue? If there's a bigger issue then I'm not sure how it's useful for solving this problem of wanting the details.
"obviously breaking encapsulation", like if you pull some interface from the dependency through yours, then yeah, you're broken when they break things. That's a generalized interface problem and java can't help you anymore than rust can. But I guess it's true that it may not be obvious that library interfaces may churn unexpectedly etc, you are dependent on the good behavior of others unless you specifically take steps to abstract it.
(although you do absolutely have to be careful about what you are returning in a client-facing error message etc. don't return the parts of the stack trace that leak information about your application to an actual client, they should be caught by your framework and turned into generic "4xx git rekt"/"5xx we made a fucky wucky" messages that don't reveal too much about your environment.)
Not a Java problem specifically for either of those; it just seems odd, coming from that world, that there isn't a concept of a "recursive exception stack" like that in Rust?
> The problem is that sentinel errors, as typically and idiomatically used, in fact are special, and are more expensive to deal with than other values. My suggestion to use boolean values outperforms them by a lot, 30x in fairly common idiomatic usage.
while I agree to some degree, when performance comes into the picture, what really matters is the normal path vs the surprise path rather than the happy path vs the error path
it's hard to argue about what counts as a happy path in the code, but it's not wrong to say that an io.EOF check is a normal path of the code, i.e. not a surprise in production. the bad performance of errors.Is is something to be improved upon, but it's not a surprise in production when there are a large number of `errors.Is` checks on the normal path of the code
now coming to the surprise path of the code, here's where performance gets really important; no one wants their code to suddenly hog 100% CPU because of some special error case - https://blog.cloudflare.com/cloudflare-outage . but such surprise paths often contain a large amount of business logic that weighs much more than how slow the errors.Is function is compared to a boolean check
it would be interesting to see where this line of reasoning is valid but IMO performance isn't a good argument against why errors are not normal outcomes of operations in production
but thumbs up for the article, now I know what to reference for backing the pattern below that I often use. when I first saw errors.Is it was pretty obvious that it's going to be slow, but I just didn't have time to prove it and went with the pattern below
The function call itself isn’t really the culprit (the cost of the function call itself is negligible in this case). It’s the implementation of the function [0].
The `err == nil` check was added in April to avoid expensive and unnecessary reflection. Perhaps 1.23, or whichever release that ends up in, would be faster?
I have to say that nearly all the production code I write in Go is typically stuff interacting with external services like databases and SaaS APIs, so the difference in nanoseconds here is not going to move me away from using the safe idiomatic approach, but it is something to bear in mind for whenever I might use Go for very computationally intense tasks.
It's a shame that errors.Is is slow for general use, and at least some of that seems attributable to the Comparable change requiring reflection. Multi-errors seem to have bloated the switch. And of course there's the lack of a happy-path early return, which was fixed in [1].
Since Go already has two ways of handling exceptional state: return or panic, it does feel like a stretch to also introduce a "not found" path too. All bets are off in tight inner loops, but I think as a general coding practice, it'll make the language (de facto) more complicated/ambiguous.
But my take away is that the question has been kicked off: can wrapped errors be made more efficient?
I would argue that "not found" is already present in the form of boolean "ok" return values, in cases where it really isn't an exceptional error state. On the other hand, the standard library implements things like errors.NewNotFound
The absolute values here are in terms of single-digit nanoseconds.
The Venn diagrams of "situations where single-digit nanosecond performance differences are important" and "situations where one would use errors.Is" have no overlap.
This isn't really relevant unless you're error checking inside a hot function - in which case, you need to rewrite your hot function. The majority of code out there is I/O bound (network/disk) anyway.
Comparing to the speed of a direct boolean check is a great way to sensationalize really small numbers.
Nobody's real-world code is being slowed down by 500% because all real world code is doing much more than just checking errors. All I see from these results is a 15-16ns cost to using errors.Is versus an additional boolean.
Even the examples ("GetValue") hint at an extremely common use case: reads from a data store. A case where single-digit milliseconds is considered "all good performance-wise" for the most common SQL databases, clocking in at 100000x the time scale of errors.Is.
I thought this as well, but I also think it depends on how you’ve structured your code. If errors aren’t used by the algorithm (they’re just used for early exit) then yes it won’t make a big difference, but if your algorithm is handling errors nearly as frequently as the happy path then you’ll see a difference using the boolean.
Yeah, of course if you're just running through an in-memory slice and doing a little arithmetic on each item, sentinel errors and errors.Is might dominate your runtime. But the dominant use-cases (e.g. in the standard library) are syscalls, filesystem interactions, network APIs - you know, stuff that takes real time.
This reminds me of "latency numbers every programmer should know". Work with the standard library and other well-conceived go projects and you gain the intuition that error handling and bit-twiddling arithmetic don't belong together. That's the real story here, and OP's article is way wide of the mark.
> Here's one of them, which follows a common recommendation in the Go community to use a sentinel error to represent the "value not found" condition.
Is this really a common recommendation in the Go community? Seems like returning `bool` to indicate if the value was found is somewhat of a no-brainer, since it follows the familiar approach used for map lookups and type assertions.
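For reference, the comma-ok shapes being referred to, plus the analogous lookup signature (sketch):
package example

import "fmt"

// Map lookups and type assertions already use the two-value form, so a store
// API like Get(key string) (*Value, bool) reads the same way to callers.
func commaOk(cache map[string]int, x any) {
    if n, ok := cache["answer"]; ok { // map lookup
        fmt.Println("found:", n)
    }
    if s, ok := x.(string); ok { // type assertion
        fmt.Println("string:", s)
    }
}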
I've certainly seen and emulated this pattern in a lot of Go code. I don't know if I am representative of the larger community. I often use bool unless there are other error conditions (and there often are), but this article is definitely making me wonder if this is ideal.
In error handling, or even in reporting status, people often seem unaware that writing to stdout (or whatever other strategy they use) is either designed incorrectly, which leads to slowdowns, or simply used in the wrong situations. Sure, languages also have this slowdown, or at least have some documentation about it, but many people just use whatever seems correct.
If this is a big deal perhaps you should be choosing a different language :)
Also BTW the nested `errors.Is` bench is wrong. Each function wraps the errors instead of constructing them preemptively, so this is benching the time to create nested errors. Don't do this, in the test or in real code. It makes the errors tough to read as a client.
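Roughly the distinction (hypothetical code, with the query/sentinel names borrowed from the article's error text):
package store

import (
    "errors"
    "fmt"
)

var ErrNotFound = errors.New("not found")

// Wrapping inside the function being benchmarked means every iteration pays
// for fmt.Errorf: parsing the format string plus a couple of allocations.
func queryDisk() (any, error) {
    return nil, fmt.Errorf("queryDisk couldn't get a value: %w", ErrNotFound)
}

// Constructing the wrapped error once up front isolates the cost of the
// errors.Is check itself.
var errDiskNotFound = fmt.Errorf("queryDisk couldn't get a value: %w", ErrNotFound)

func queryDiskPrebuilt() (any, error) {
    return nil, errDiskNotFound
}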
Towards the end of the article, what does the author mean by "sentinel errors are expensive to construct"? Aren't they called sentinel errors because they are constructed only once?
Is the performance lost in the panic recover() not the actual panic?
Just a guess, I'm not a Go person.
If there's no attempt to recover in the code, then panic() wouldn't need to save any state and the compiler might just optimise out anything needed for recovery.
Of course were it rewritten to a TryGet or (object, bool), it would bring that to 1-3 CPU cycles, as everything would get inlined and lowered to a few branches.
For gc. There was no attempt to benchmark under gccgo, tinygo, etc. so we don't really know how Go fares. Go is, by explicit design, multi-implementation.
Do you have numbers for any of these on hand? I'm not holding my breath (after all, there's still the issue of this being Go itself, which is inadequate at systems programming) but it's interesting to see how alternative implementations fare nonetheless.
(this does not make the point any less valid, as the solution discussed in the blog post uses the "vanilla" runtime flavour)
If 19ns is critical to my existence, I'm certainly not going to spend hours of it finding out if a handful of ns might be shaved off by using another implementation.
There are plenty of good languages which offer better expressiveness and performance in systems programming domain: C#, Swift, Rust, Zig, D, Nim.
The biggest lie is that Go looks like a low-level-ish language whilst it's anything but, so you end up with the worst of both worlds. Its tooling starts fast, which gives you a false belief that it will continue to perform fast, and time and time again people learn the hard way that that's just not the case.
It's ironic the story with C# is the opposite, that people expect it to underperform, and avoid it on the premise of criticism based on their imagination, only to be repeatedly proven wrong that it's much faster and extremely capable at systems programming tasks.
> There are plenty of good languages which offer better expressiveness and performance: C#, Swift, Rust, Zig, D, Nim.
If we're sharing random tangents, I hear a lot of people like ice cream.
> The biggest lie is Go looks like a low-level-ish language
In what way does it look low level? I don't see it. Its closest language analog is Python, which I don't think anyone would see as being low level. Also, I assume this is meant to be in response to the "systems" bit. Systems are not defined by how "low level" they are. You most definitely could create a "low level" scripting language. You probably wouldn't, for various reasons, but you could.
You're responding to something that isn't in my comments. I neither said it's a scripting language, nor are you being honest in pretending that the industry does not associate whether a language is perceived as low-level with its applicability to systems programming tasks.
You said systems, which implies a converse of scripts. There is nothing else in the category. If something is not a systems language, it is necessarily a scripting language.
The vast majority of low level tasks are systems in nature (obviously; hence why a low level scripting language would be mostly pointless), so you're not wrong, but you're not telling the whole story.
Defining systems languages and scripting languages as mutually exclusive duals is a pretty bizarre way to look at the space. I mean, you do you, but I doubt you'll find many people who will agree with this framing.
What is low level about Go's pointers? You can't assign an arbitrary memory location. You can't do arithmetic. You can't do much of anything with them other than reference a value. I can't think of any language considered high level that doesn't have the ability to reference a value.
Like I said, I think it tries to masquerade as low level by including them when they're not really pointers. I see no reason why they didn't go the route of other GC'd languages which don't include faux-pointers.
Yes, you've repeated yourself twice now, but haven't explained yourself. What is low level about Go's pointers that might suggest it is a low level language? There is nothing low level about them.
I’ve repeated myself because you don’t seem to be reading the words: masquerades, faux-pointers. I never claimed they were low level. I said they were pretending to be because pointers are completely unnecessary in a garbage collected language.
You've repeated yourself for a third time, but have yet to explain yourself. Pointers are unnecessary in every language, but there is nothing to suggest that a language with them is low level. Clearly there is nothing about pointers that make them low level.
I'd ask again, but it is apparent you don't even know what you meant by it.
What tools or primitives are missing that you wouldn't also need in the "happy path"? It is not like error handling is any different than any other type of handling from a programming point of view. It's the "business" challenges that makes it interesting. Most everyone has already figured out how to deal with success, but few want to think about failure, leaving all kinds of interesting problems to solve.
For other fun surprises that are rather obvious from the implementation but still surprise people quite regularly: wait until you see how much slower context keys are when compared to thread locals, particularly since it's normal for contexts to contain MANY layers (as opposed to errors, which are frequently* returned without wrapping at almost any level). It's a glorified linked-list lookup, and it re-walks the whole path every time.
But for both contexts and errors: imo the tradeoff is overwhelmingly in favor of wrapping more, not less, in the vast majority of code. Context is useful and standard for many things, and wrapped errors carry a lot more useful information, and both of those are absolutely crucial for healthy interop with other code. Performance sensitive stuff can do whatever it needs, these are obviously terrible choices there so just don't even consider it.
* but not always! and it may change with time. Still, my contexts are regularly 10+ layers deep while my errors are rarely more than one or two.
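To picture why depth matters, a tiny sketch (hypothetical keys) - every WithValue adds one node, and Value walks the chain from the newest layer back toward the root on every lookup:
package main

import (
    "context"
    "fmt"
)

type ctxKey string

func main() {
    ctx := context.Background()
    // Each layer is one more parent to traverse on lookup.
    for i := 0; i < 10; i++ {
        ctx = context.WithValue(ctx, ctxKey(fmt.Sprintf("k%d", i)), i)
    }
    // Looking up the oldest key walks all ten layers, every single time.
    fmt.Println(ctx.Value(ctxKey("k0")))
}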
There are some interesting results in here, but the slower cases are a bit misleading. The majority of time in the slow cases is spent constructing errors, not in errors.Is.
Some background for anyone not familiar with Go errors:
A Go error is an interface value with an Error method that returns the error's text. A simple error can be constructed with the errors.New function:
var ErrNotFound = errors.New("not found")
A nil error indicates success, and a non-nil error indicates some other condition. This is the infamous "if err != nil {}" check. Comparing an error to nil is pretty fast, since it's just a single pointer comparison. On my laptop, it's about 0.5ns. Comparing a bool is about 0.3ns, so "err != nil" is quite a bit slower than "!found", but it's really unlikely the 0.2ns is going to be relevant outside of extremely hot loops.
We can also compare an error to some value: "if err == ErrNotFound {}". In this case, we say that ErrNotFound is a "sentinel" (some error value that you compare against). This is about 2.3ns on my laptop; there are two pointer comparisons in this case and a bit more overhead in comparing interface values. (You can actually make this check almost arbitrarily expensive; you could have an error value that's a gigabyte-large array, for example.)
It's common to annotate an error, adding some more useful information to it. For example, we might want our "not found" error to say what was not found:
return fmt.Errorf("%q: not found", name) // "foo": not found
This is quite a bit more expensive than "return ErrNotFound". The fmt.Errorf function will parse a format string, produce the error text, and make two allocations (one for the error string, one for a small struct that holds it). This is about 84ns on my laptop--168 times slower than the fast path! But 84ns is still pretty fast, and you can't get away from the need for at least one allocation if you want to return an error that varies based on the inputs of the function that produced it. (You can get faster than fmt.Errorf if it matters, but this comment is already getting large.)
A problem with using fmt.Errorf in this way is that you can't test the error against a sentinel any more. This was addressed a while back in Go 1.13 with the addition of error wrapping. You can return an error that wraps the sentinel (note the %w format verb):
return fmt.Errorf("%q: %w", name, ErrNotFound) // "foo": not found
And you can then use the errors.Is function to ask whether an error is equal to ErrNotFound, or if it wraps ErrNotFound:
if errors.Is(err, ErrNotFound) { ... }
On my laptop, producing a wrapping error like this and testing it with "err != nil" is about 91ns, and testing it with "errors.Is(err, ErrNotFound)" is about 98ns. So using Is is adding 7ns of overhead, which is not nothing, but is also pretty much lost in the noise compared to creating the error in the first place.
The example in this blog post went a step further, though, and created an error with not just a single layer of wrapping but one with four. The error text in the wrapped error cases is:
GetValue couldn't get a value: queryValueStore couldn't get a value: queryDisk couldn't get a value: not found
(That is, by the way, a very difficult error to read. Don't hand users errors that look like that.)
Creating a stack of four wrapped errors like this on my laptop is 396ns, and inspecting it with errors.Is is another 21ns. 21ns is waaaaay more than the 0.5ns for a simple "err != nil" check, but again the runtime here is massively dominated by the expense of creating the error--which in this case involves repeatedly creating formatted strings and throwing them away, and two allocations for each layer in the stack.
In general, when doing low level optimization of Go code, avoiding allocations is the biggest bang for your buck. If microseconds matter, you absolutely should pay attention to the cost of constructing error values. But the cost of inspecting those values doesn't usually become an issue unless nanoseconds count, and will generally be dominated by the cost of construction.
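If you want to see the construction-vs-inspection split for yourself, a minimal benchmark pair along these lines (names are illustrative) should reproduce the shape of these numbers:
package errcost

import (
    "errors"
    "fmt"
    "testing"
)

var ErrNotFound = errors.New("not found")

// Builds a fresh wrapped error every iteration: dominated by fmt.Errorf and
// its allocations.
func BenchmarkConstructWrapped(b *testing.B) {
    for i := 0; i < b.N; i++ {
        err := fmt.Errorf("%q: %w", "foo", ErrNotFound)
        if err == nil {
            b.Fatal("expected an error")
        }
    }
}

// Inspects a prebuilt wrapped error: isolates the errors.Is walk.
func BenchmarkInspectWrapped(b *testing.B) {
    err := fmt.Errorf("%q: %w", "foo", ErrNotFound)
    for i := 0; i < b.N; i++ {
        if !errors.Is(err, ErrNotFound) {
            b.Fatal("expected ErrNotFound")
        }
    }
}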
Also, even the slowest cases here are running about 0.5-1.5μs, which absolutely matters in some cases, but is irrelevant in many others.
> In general, when doing low level optimization of Go code, avoiding allocations is the biggest bang for your buck.
I have found this to be the truth. I'm making a game in Go, and have learned that a smooth framerate depends on programming with allocation awareness. Always know where and when your memory is coming from.
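A typical example of that awareness (purely illustrative, not from the commenter's game): reuse one scratch slice across frames instead of allocating a fresh one every frame.

    package game

    type entity struct {
        onScreen bool
        // ... position, sprite, etc.
    }

    type world struct {
        visible []entity // scratch buffer reused every frame
    }

    func (w *world) collectVisible(all []entity) {
        w.visible = w.visible[:0] // keep the capacity, drop the old contents
        for _, e := range all {
            if e.onScreen {
                w.visible = append(w.visible, e)
            }
        }
        // render w.visible; no per-frame allocation once capacity stabilizes
    }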
We have in the past eliminated sentinel errors and gotten measurable gains from doing so, but this is mostly because our errors were both common (occurring several times on every request) and expensive to construct. The direct cost of errors.Is() was minor compared to the cost of building the error itself.
And you might be surprised how common it is for people to insist on wrapping an error at every layer of the stack. I've gotten into arguments with these people online; they're out there.
I insist on wrapping all errors each time and only removing that when performance testing shows it to be a bottleneck. A top concern of my systems is debuggability, which includes descriptive, wrapped errors with structured logging (this is a superpower for system development, and I am surprised when folks don't give love to structured logs and detailed, reproducible errors).
I want organizational velocity in the general case. If wrapping an error is in a hot path and shows up in metrics, yeah, remove the wrapping. Otherwise, wrap the error.
What is your argument against that? It would seem you find the compute savings of non-wrapped errors outweighs developer time and customer impact. If that is not what you are saying, please correct me.
My creds: I've been using Go since 1.2, writing massively scaled systems processing billions of events daily for hundreds of thousands of users, with 4 to 5 9s of uptime, across dozens of services maintained by hundreds of developers, earning the company hundreds of millions of dollars.
My argument against wrapping for backend services is this:
1. I think it is preferable to handle the error where it happened instead of at the top of the stack. For a backend service, there are really only three things you want to do with an error: log it, maybe bump some metrics, and return an error code and ID to the client. You have a lot more information available (including a stack trace if desired) if you handle it at this point.
2. By wrapping the error up the call stack, you're building an ad hoc stack trace. Performance-wise, this is (probably; I haven't measured) a lot better than an actual stack trace, but as you said yourself, the top concern is debuggability and developer velocity.
3. Wrapping an error doesn't provide just a stack, though; you can add values to the error! Except... what does that really buy you vs. just adding the values to your structured logging system on the way down the stack, instead of doing it on the way back up in an ad hoc way? Those wrapped error values are a lot more difficult to work with in Grafana vs. searching based on fields.
4. If I have a stack trace, structured log fields, and a correlation ID, I personally don't get any value out of messages like "could not open file", as I can just use the stack trace to go look at exactly what the line of code is doing. You could argue that with good enough wrapping, looking at the code wouldn't even be necessary, but I think that's pretty rare in practice. It also seems like a lot of extra work just to save the minute it takes to load the code in an IDE.
5. As mentioned in 1), what the client gets is just an error code and trace ID anyways. In fact, we actively don't want the wrapped context to be sent back to the client since it can be a security concern. If that's the case, we need to remove it and log it anyways. Why not just log the information in the first place?
Anyways, curious to hear your thoughts. I used to advocate for wrapping errors, FWIW.
My main argument against this practice isn't performance, it is that it makes error handling more difficult to write, review, and maintain. Treating errors as opaque and passing them up the stack in the general case is automatic and trivial to get right. Wrapping them is not.
I agree with your point about debugging, but I have a different idea how to best achieve it. Rather than wrapping an error at every stack layer, just take a stack trace when the error is created. This works great as long as... you don't design the system to require sentinel errors. Treating errors as rare, exceptional events rather than normal values used for control flow changes how you approach them.
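A minimal sketch of that approach (type and function names are mine): capture the program counters once when the error is created, and only render them when something actually logs it.

    package apperr

    import (
        "fmt"
        "runtime"
        "strings"
    )

    type tracedError struct {
        msg string
        pc  []uintptr
    }

    // New records the call stack at the point the error is created.
    func New(msg string) error {
        pc := make([]uintptr, 32)
        n := runtime.Callers(2, pc) // skip runtime.Callers and New itself
        return &tracedError{msg: msg, pc: pc[:n]}
    }

    func (e *tracedError) Error() string { return e.msg }

    // Stack renders the captured frames; call it from your logging layer.
    func (e *tracedError) Stack() string {
        var b strings.Builder
        frames := runtime.CallersFrames(e.pc)
        for {
            f, more := frames.Next()
            fmt.Fprintf(&b, "%s\n\t%s:%d\n", f.Function, f.File, f.Line)
            if !more {
                break
            }
        }
        return b.String()
    }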
> Treating errors as opaque and passing them up the stack in the general case...
...is directly contradictory to one of the most fundamental assertions of the language, which is that errors are values -- https://go.dev/blog/errors-are-values -- and therefore "can [and should] be programmed".
The reality is that the vast majority of error handling in Go does one of two things:
1) pass it up the stack
2) wrap it and pass it up the stack
The fact that you must do this explicitly in all cases is a failure of the language. Many people have pointed this out, but the Go team and elite members in the community are very dedicated to the myth that every error is precious and special and must be handled in a one-off manner.
"This article" is an explanation of a property of the language, written by one of its authors. It's not a position piece, it's just an additional bit of documentation.
I mean, your position is totally valid, no argument. But it's definitely not some kind of objective fact (I certainly don't agree). And it's essentially an objection to fundamental properties of the language as it exists. Whether or not those properties represent a failure of the language is a question for the philosophers, but regardless, your code needs to respond to things as they are, not as you wish they were :)
Interesting enough, but this misses most of the point. In any case, it's not really that interesting whether the value you looked for is missing or not; it's way more important that I can see _which_ value was not found. Any solution that does not capture which value was missing is kind of beside the point for most practical applications.
You can get that by implementing an error type, or otherwise constructing a new error every time one is returned. But doing this adds to the cost of returning the error.
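For example, a hedged sketch of such a type (names are mine): it records which key was missing, and its Is method keeps errors.Is(err, ErrNotFound) working for callers that still want the sentinel check, at the cost of one allocation per error.

    package store

    import (
        "errors"
        "strconv"
    )

    var ErrNotFound = errors.New("not found")

    type notFoundError struct {
        key string
    }

    func (e *notFoundError) Error() string {
        return strconv.Quote(e.key) + ": not found" // e.g. "foo": not found
    }

    // Is lets errors.Is(err, ErrNotFound) match this type.
    func (e *notFoundError) Is(target error) bool {
        return target == ErrNotFound
    }

    func Lookup(key string) (string, error) {
        // ... lookup fails ...
        return "", &notFoundError{key: key}
    }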
This is how we use most errors in our code base. But we tend to not treat them as sentinels, we are only interested in the descriptive error messages.
You understood what they meant, though. But yeah, it's confusing. I argued with my PhD advisor over this several times. "50% slower" or "10% slower" is easy to understand, but "96% slower" is harder to figure out, and it's easy to dismiss the difference between 96% and 98% because the numbers are similar.
As far as I can tell, this is an aspect of language which people merely pretend to have problems with, out of pedantry, rather than a barrier to understanding in any possible case.
Everyone understands that if A is 2x faster than B, then B is 2x slower than A. If you say either of these things to someone with a basic grasp of sums, and then tell them A takes one second, they'll know that B takes two seconds. It doesn't matter which one you supply. No one would get this wrong.