Hacker News
Should small Rust structs be passed by-copy or by-borrow? (forrestthewoods.com)
229 points by forrestthewoods on Aug 26, 2019 | 107 comments



This purports to discuss pass-by-copy or reference but ends up not discussing that particular issue at all since every single compiler naturally inlined and vectorized his trivial micro-benchmark.

If you were to discuss the headline issue, I guess you could delve deep into calling conventions, register pressure, and the various instruction set extensions that compilers can abuse to carry the data... or you take the compiler optimizations as a hint that concerning yourself with this is mostly pointless, as long as you don't start passing huge structures by-copy.


> or you take the compiler optimizations as a hint that concerning yourself with this is mostly pointless, as long as you don't start passing huge structures by-copy.

That's exactly what the author did.


Yeah, writing microbenchmarks (coming from Java) is an art in its own right. It requires knowledge of what can and could be optimized given the context.

Trying to guide compilers with hints is a futile effort, and most of the time compilers are already expending their best effort to begin with.


I've mostly only dabbled in Rust, but I recall it being really difficult to pass data by copy in Rust. It seemed to want you to pass things by reference, especially if your type is meant to implement any prominent trait. It didn't matter if my type was an int, Rust seemed to want me to pass it by reference.


Do you have an example? Copying is only hard to the extent that you need to #[derive(Copy)] for your type, and ints are already Copy. I also don't see how other traits factor here.
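For reference, a minimal sketch of what "copying" takes in Rust (the `Vector3` type here is just an illustrative stand-in):

```rust
// `Copy` requires `Clone`; both can be derived for plain-data structs.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector3 {
    x: f32,
    y: f32,
    z: f32,
}

fn length_squared(v: Vector3) -> f32 {
    // `v` is a bitwise copy; the caller keeps its original.
    v.x * v.x + v.y * v.y + v.z * v.z
}

fn main() {
    let v = Vector3 { x: 1.0, y: 2.0, z: 2.0 };
    let len_sq = length_squared(v);
    // `v` is still usable here, because Vector3 is Copy.
    assert_eq!(len_sq, 9.0);
    assert_eq!(v.x, 1.0);
}
```

Without the `Copy` derive, the call would move `v` and the later uses of it would be compile errors.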


It's been a while, but I tried implementing a trait that took `&self` with several different types. One such type was an integer type, so I had to pass by reference. It's probably not a big deal, but there are many things like this that seem to prefer you move or reference rather than copy.

I think it would be better for all traits to implement their methods on `self` and then the caller can choose whether the type that implements the trait is `i64` or `&i64` (instead of the trait being prescriptive about the type, which largely defeats the purpose of traits to begin with).

That said, I'm not very experienced in Rust, so I expect someone can correct me about why this is a Very Bad Idea.


Rust basically has this functionality. Whether a type is borrowed or not dictates what I can do with it. For example, I cannot remove items from a `Vec` with a shared &reference. Thus, without knowing which reference type is used in a trait method, I cannot write generic code; I don't know whether my function must take &T or T.

There might be a way around this limitation, but it makes figuring out who is supposed to free a resource hard to do.
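The `Vec` point above, sketched out: a shared borrow only permits reads, while removal needs `&mut`.

```rust
// A shared borrow can only read; uncommenting the pop() is a compile error.
fn peek_last(v: &Vec<i32>) -> Option<i32> {
    // v.pop() // error[E0596]: cannot borrow `*v` as mutable
    v.last().copied()
}

// An exclusive borrow is required to remove items.
fn pop_last(v: &mut Vec<i32>) -> Option<i32> {
    v.pop()
}

fn main() {
    let mut v = vec![1, 2, 3];
    assert_eq!(peek_last(&v), Some(3));
    assert_eq!(pop_last(&mut v), Some(3));
    assert_eq!(v.len(), 2);
}
```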

The Rust solution is to have different traits for different levels of ownership. For example, if I want to iterate over a vector of strings, I can get an Iterator<&str> without destroying the vector. If I'm willing to destroy it, I can get an Iterator<String>.
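That distinction in code: `iter()` borrows, `into_iter()` consumes.

```rust
fn main() {
    let names = vec![String::from("a"), String::from("b")];

    // Borrowing iterator: yields &String, the vector survives.
    let total: usize = names.iter().map(|s| s.len()).sum();
    assert_eq!(total, 2);
    assert_eq!(names.len(), 2); // still usable

    // Consuming iterator: yields String, the vector is gone afterwards.
    let owned: Vec<String> = names.into_iter().collect();
    assert_eq!(owned.len(), 2);
    // names.len(); // error[E0382]: value moved
}
```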

Or, you can actually implement a trait on a reference type[0] to convert a `self` argument to effectively `&self`. This is the recommended way to implement some conversions, using AsRef and/or Into[1].

[0] https://play.rust-lang.org/?version=stable&mode=debug&editio... [1] https://doc.rust-lang.org/std/convert/trait.AsRef.html


You should be careful trying to apply these micro-benchmarks to modern optimising compilers; behaviour may differ depending on the exact code being compiled. Rust especially has immutability and known generic types at compile time, so the compiler can do a lot of pointer and inlining magic that C and other languages can't.

In the below example Rust switches to using a pointer when you might think it's doing pass by copy/move. This works because in Rust the moved value can never be referenced after calling the function, so the compiler can just pass a pointer and clean up the value after the function has returned.

https://www.reddit.com/r/rust/comments/3g30fw/how_efficient_...


> so the compiler can do a lot of pointer and inlining magic that C and other languages can't.

Isn't this a negative for Rust for when you actually do care about such micro-optimizations? It feels as if you not only need to know the language itself but also how the particular version of the compiler you are using has decided to interpret the language and apply optimizations - essentially having to know how the magic trick is performed.

AFAIK this is why Free Pascal added constref in addition to const (which has existed since the 90s in Delphi). The latter doesn't guarantee pass-by-reference (even if in most cases that's what it does), so if you care about that, you need to know how the compiler you are using will treat it (thus needing to know more than just the language), whereas constref guarantees exactly that. Granted, this was mainly done for interfacing with external code written in other languages rather than for micro-optimizations, but the point stands: if you cared about what the compiler would produce, you had to rely on "magic knowledge" before constref was introduced.


In general Rust emphasises semantics over implementation side-effects. For example you use a pass-by-value because you're semantically moving ownership of the value to the function, not because the implementation may be faster/slower at the assembly level. Not nailing down the implementation is what allows the compiler to make so many optimisations.

If you need precise pointer control, use the unsafe trapdoor built into Rust (like Rust's Vec does). If you need very specialised code, use assembly, not abuses of language side-effects like Duff's Device. If you need fast number crunching, use a specialised library for it, not a magic for loop with just the right number of statements in it.

I admit that sometimes you need to micro-optimise, but I haven't come across this case much in my Rust code. When I do, I usually hinder more than help it.


>If you need fast number crunching use the specialised library for it, not by writing a magic for loop with just the right amount of statements in it.

There is always that group of people who are trying to do something for which no specialized library exists.


As a member of that group, Hacker News is not where I come for answers to my questions. I think it's perfectly reasonable for comments here to assume that the readers of the comment aren't in that group. Either you know what you're doing enough that 99.9% of the internet is not helpful to you, or you're probably up shit creek without a paddle anyway until you figure out enough to move into the first group.


Then they'll use ASM, manual SIMD or pointer arithmetic like they would in C, and that works. All of these are part of unsafe Rust (inline ASM is nightly-only, though you can still link to an ASM file even on stable).


> It feels as if you not only need to know the language itself but also how the particular version of the compiler you are using has decided to interpret the language and apply optimizations - essentially having to know how the magic trick is performed.

That's also true with C/C++, isn't it? Optimizations will differ if you compile a C program with GCC or Clang, or even with GCC version x or y (but I'm not a compiled-language programmer, I could be wrong).


You're absolutely right. It's not even just optimization. If you exercise your C++ compiler hard enough, you will find bugs. You have to know what the compiler is doing.

I have the misfortune of working somewhere where I generally have to use older versions of compilers for production. I've found two compiler bugs just in the last two years since I started my current project. I'm usually the last person to blame the compiler, but in this case I can be pretty confident because the bugs I found were fixed in later versions of the compiler.


Just like Ada has in, out and in out.

The actual generated code depends on the optimizer, while preserving semantics.


C++ also has some copy elision: https://en.cppreference.com/w/cpp/language/copy_elision

This is a very obscure topic in general, and almost requires reading the C++ standard to understand it.


0. Generously use black_box

1. a large sample size

2. a single thread

3. high-precision monotonic timestamps on an unloaded system

4. mean/std dev from more than several runs
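Item 0 in practice, a minimal sketch using `std::hint::black_box` (stable since Rust 1.66); the `dot` function here is just a stand-in workload:

```rust
use std::hint::black_box;
use std::time::Instant;

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];

    let start = Instant::now();
    let mut acc = 0.0f32;
    for _ in 0..1_000_000 {
        // black_box on the inputs stops constant folding;
        // black_box on the result stops dead-code elimination.
        acc += black_box(dot(black_box(a), black_box(b)));
    }
    black_box(acc);
    println!("elapsed: {:?}", start.elapsed());
}
```

Without the `black_box` calls, an optimiser is free to hoist the whole computation out of the loop, which is how benchmarks end up measuring nothing.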


5. run them long enough to stabilize thermally so you don't have bursts of performance (potentially minutes), and/or use something like perflock to lock down your cpu speed: https://github.com/aclements/perflock (general tactic, this is just the last tool I heard of)

6. for similar reasons, run them in a random order each time


Or use frameworks that help you do so, like "benchmark" for C++ (though I wonder if there is a cross-language one that does the same amount of pre-warming, same counters, etc.)

Probably not... and too much to ask.


> Blech! Having to explicitly borrow temporary values is super gross.

You can cut down on a lot of that by implementing basic operations (Add, Mul, etc.) for both value and reference types.

    impl Add<Vector3> for Vector3 { … }
    impl Add<&Vector3> for Vector3 { … }
    impl Add<Vector3> for &Vector3 { … }
    impl Add<&Vector3> for &Vector3 { … }
(The first three can delegate to the last one, unless there are value-specific optimizations you want to apply.)
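Fleshed out, the four impls and the delegation might look like this (a sketch; the field layout of `Vector3` is assumed):

```rust
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector3 { x: f32, y: f32, z: f32 }

// The reference-reference impl does the actual work...
impl Add<&Vector3> for &Vector3 {
    type Output = Vector3;
    fn add(self, rhs: &Vector3) -> Vector3 {
        Vector3 { x: self.x + rhs.x, y: self.y + rhs.y, z: self.z + rhs.z }
    }
}

// ...and the other three delegate to it.
impl Add<Vector3> for Vector3 {
    type Output = Vector3;
    fn add(self, rhs: Vector3) -> Vector3 { &self + &rhs }
}
impl Add<&Vector3> for Vector3 {
    type Output = Vector3;
    fn add(self, rhs: &Vector3) -> Vector3 { &self + rhs }
}
impl Add<Vector3> for &Vector3 {
    type Output = Vector3;
    fn add(self, rhs: Vector3) -> Vector3 { self + &rhs }
}

fn main() {
    let a = Vector3 { x: 1.0, y: 2.0, z: 3.0 };
    let b = Vector3 { x: 4.0, y: 5.0, z: 6.0 };
    let expected = Vector3 { x: 5.0, y: 7.0, z: 9.0 };
    // All four spellings now compile and agree.
    assert_eq!(a + b, expected);
    assert_eq!(a + &b, expected);
    assert_eq!(&a + b, expected);
    assert_eq!(&a + &b, expected);
}
```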


This has irked me as well. Is there an acceptable way to make this less verbose?

A crate with a derive macro for example?



As danieldk mentioned elsewhere, you could get partway there with std::borrow::Borrow

    impl<T> Add<T> for &Vector3 where T: Borrow<Vector3>
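Expanded into a compilable sketch (it's only "partway" because it just covers the cases where the left-hand side is a reference):

```rust
use std::borrow::Borrow;
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector3 { x: f32, y: f32, z: f32 }

// One impl covers both `&v + w` and `&v + &w`, since Borrow<Vector3>
// is implemented for both Vector3 and &Vector3.
impl<T: Borrow<Vector3>> Add<T> for &Vector3 {
    type Output = Vector3;
    fn add(self, rhs: T) -> Vector3 {
        let r = rhs.borrow();
        Vector3 { x: self.x + r.x, y: self.y + r.y, z: self.z + r.z }
    }
}

fn main() {
    let a = Vector3 { x: 1.0, y: 2.0, z: 3.0 };
    let b = Vector3 { x: 4.0, y: 5.0, z: 6.0 };
    assert_eq!(&a + b, Vector3 { x: 5.0, y: 7.0, z: 9.0 });
    assert_eq!(&a + &b, Vector3 { x: 5.0, y: 7.0, z: 9.0 });
    // `a + b` (value + value) still needs its own impl.
}
```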


Does that essentially overload + for it?


Yes; Rust uses traits to implement behavior for operators on custom types (see the full list here: https://doc.rust-lang.org/std/ops/#traits)


The C++ Core Guidelines suggest that the cutoff between copy and const reference should be "up to two or three words" (item F.16). Following this advice, the f32 vector would be passed by value, and the f64 vector would end up in the grey zone.

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppC...


That's what the Nim compiler does.

Bigger than 3 words, pass-by-ref, otherwise pass-by-value.

So the only important modifier is whether the parameter is mutable or not. Obviously if mutable it's always passed-by-reference.


That explains why I usually see std::span passed by value.


Spans with extent known to the compiler are not even two words long. Like unique_ptr with default deleter they are the size of a pointer.


The original question aside, it's awesome that Rust has the potential to be faster than C++ because its language semantics allow for more compiler optimizations. I'd thought it was slightly slower, but maybe that's just a temporary state because C++ has had so much time to get optimized.


I don't think that's the case. The article is comparing wildly different compilers (Visual Studio and LLVM) which are inevitably going to give very different results. Comparing Rust with clang would be a more interesting comparison since they use the same back end.


The idea that Rust has the potential to produce faster binaries than C/C++ comes, I guess, mainly from the fact that (once the bugs in LLVM have been fixed) Rust will be able to apply restrict/noalias very liberally to the generated IR, facilitating optimizations by the compiler backend which are not possible in your usual C/C++ codebase.

But yes, in this case it probably came down to some more obvious cause like a wildly different compiler.
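The aliasing point can be sketched with a toy function (hypothetical, for illustration only):

```rust
// In C, `*a += *b; *a += *b;` forces a reload of *b after the first store,
// because `a` and `b` might point to the same int. Rust's borrow rules
// guarantee that two &mut references are disjoint, so rustc can tag both
// parameters `noalias` in the LLVM IR and keep *b in a register.
fn add_twice(a: &mut i32, b: &mut i32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let (mut x, mut y) = (1, 10);
    add_twice(&mut x, &mut y);
    assert_eq!(x, 21);
}
```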


Right, but I'm not basing that purely on these benchmarks. The fact that C++ allows so much, which Rust doesn't, prevents you from transforming code in certain ways that might achieve better performance with identical behavior. I'd just never really thought about the implications of that before. Because of this, in theory, equivalent Rust and C++ code should asymptotically approach a place where Rust is faster, even though at the moment there are many other factors like comparative maturity of the ecosystems.


The author posted a clang benchmark at the end of the article. It has pretty much the same performance as the MSVC one.


The difference between Rust and C++ here, I'd say, is caused by the difference between the rustc+LLVM and MSVC++ compilers. Compiling the C++ under Clang should give more comparable results.


I did find it frustrating that Clang wasn't the first thing he tried. But he did eventually try Clang past the end of the post (when it should've been at the top). Interestingly though, Clang doesn't bridge the gap.


> Interestingly though, Clang doesn't bridge the gap.

That might be due to differences in the code, but it might just as well be due to differences in the flags that rustc and Clang pass to the backend by default. It's not even clear what optimization level they asked for.


Also might be worth benching #[inline(never)], as the heuristic difference between a small benchmark and a larger program can lead to different behaviour.


It's mentioned at the end of the article that clang doesn't make much of a difference.


Note that Rust's "clippy" linter has a `trivially_copy_pass_by_ref` lint which recommends changing by-reference to by-copy for sufficiently small structs, but presumably it's aiming to be conservative.

AIUI it recommends by-copy for structs of at most 8 bytes because it thinks that's the appropriate limit for 32-bit targets, and they don't like the set of warnings issued to change too much for different targets.
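For illustration, a minimal sketch of the kind of signature the lint flags (the function is made up):

```rust
// clippy's `trivially_copy_pass_by_ref` would suggest `s: f32` here:
// an f32 is no bigger than a pointer, so passing &f32 buys nothing.
fn scale(s: &f32, x: f32) -> f32 {
    s * x
}

fn main() {
    assert_eq!(scale(&2.0, 3.0), 6.0);
}
```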


Curiously, that linter rule's source has a very in-depth discussion of "setting the limit", but entirely disregards that what matters is the sum of the parameter sizes, not each individual parameter (which is all the linter looks at).


This reminds me of the years I wasted writing C++, always looking for the idiomatic approach, rather than learning about security, networking, architecture and so on. In exactly the same way, Rust is just too complicated.


Rust is less complicated than C++, but it is more complicated than Go. However I’m a big proponent of minimizing complexity wherever possible - language overhead should not become an obstacle to programmer time.

The nice thing about Rust is that it allows you to compartmentalize complexity very nicely by creating rules the compiler will enforce, whereas with C++ the programmer is also burdened with knowing them.

I’m not sure if I’d say that it’s more complicated than Python, since Python’s flexibility can in practice lead to creative but very hard-to-follow code.


I recently realized that trying to order C++ and Rust precisely on a complexity scale is rather pointless and misleading, since it's obvious that they are in the same class of trickiness, but for different reasons.

If a significant number of the developers that come into contact with a language think it's hard, then... it's very likely hard.


Rust is harder than many languages, but just because something is hard doesn't mean that it's complicated. I find that the strictness of the compiler ends up all-but-forcing code to end up rather simple.


This article demonstrates that it is overthinking the issue — both approaches are correct and about as fast. For non-copyable structs move vs borrow are semantically different, so the choice is even clearer.

With built-in style lints and Clippy, Rust is doing quite well in keeping codebases simple and idiomatic.


Rust is too complicated because you can choose to take parameters by-value or by-reference?


It's because:

a) the language designers decided to cram several paradigms into it, including a heavier than usual dose of functional.

b) most people don't want to get pedantic about resource management

c) the syntax is weird for people coming from the C language family


> a) the language designers decided to cram several paradigms into it, including a heavier than usual dose of functional.

Rust is actually really close to JavaScript in that regard, with its mix of OOP-without-inheritance and the whole collection of functional methods on Arrays and iterators (map, reduce, etc.). And while people have a ton of complaints about JavaScript, none of them is «the JavaScript language is too complex».

> b) most people don't want to get pedantic about resource management

But Rust is explicitly aimed at people who do! You can't use Python or JavaScript to write a web browser, a kernel or anything where performance is paramount.

> c) the syntax is weird for people coming from the C language family

What? Do you mean “not coming from the C language family”? Because its syntax is clearly in the C family (semicolons, braces).


> What? Do you mean “not coming from the C language family”? Because its syntax is clearly in the C family (semicolons, braces).

My guess is that they're talking about the syntax inspired from its ML roots, like `let/let mut`, `->` for return types, and `:` for type annotations. I personally prefer these to the C-style equivalents, but I've talked with enough people who haven't used an ML-like language that I'm not quite as surprised as I used to be about people expressing misgivings about them.


Thanks, I hadn't thought about that part, but nowadays it has become kind of mainstream: JavaScript uses let/const, and TypeScript uses the colon notation for types. And I've never heard anyone complaining about the complexity of their syntax.


I'm sure a lot of people don't care about explicitly managing resources! But there are a lot of people who do, and they need a language to use.


You're welcome to write Rust as non-idiomatically as you like! It might take a few lines of #[allow(…)] to silence the compiler, though...


I was pleased to see that there is a fuckit.rs: https://docs.rs/fuckit/0.2.1/fuckit/ in the vein of fuckit.js, fuckit.py and libfuckit


As someone said on the thread, there is a clippy suggestion to tell you that you could pass by copy given the size of your struct (without it I would never have passed a struct by copy).


In general, in a multi-core/massively-concurrent-and-parallel world, copying small, immutable data structures is preferred. The big reason is cache hits and minimizing contention for cache lines. If you have small data structures at different addresses (mapping to different cache lines), then the memory subsystem isn't stepping all over itself with RAR/WAR/RAW hazards, because the copies of the data structure become essentially independent of each other once the copy finishes and is written back to L2/L3.


> Rust tuples are delightful to use and C++ tuples are a monstrosity

They're not great, but they're a lot better than using out parameters!


Give Rust twenty years, and there will be plenty that's crap compared to the new hotness. C++ tuples will be no worse than they are, and possibly a little better.


Interesting article. I enjoyed it.

I have two thoughts.

====

1. Regarding the benchmark itself, I wonder out loud if CPU caching could meaningfully alter the results. The article says:

> I randomly generate 4000 spheres, capsules, segments, and triangles.

It is not clear to me if this is enough to fill CPU L1 cache or not. My guess is that it is not. If all the benchmark indeed happens within the L1 cache, I wonder how a bigger workload would affect the results. Maybe it does not change anything. But maybe it does.

====

2. The author mentioned two criteria for choosing between by-copy and by-borrow: performance and ergonomics. I would add a third, somewhat related to ergonomics: let's call it semantics. When some piece of data naturally belongs somewhere and other entities are natural "readers" of it, it might make sense to use the ownership/borrowing mechanism of Rust to reflect this relationship, regardless of performance or code ergonomics.

In particular, a borrow guarantees you always work on a read-only, up to date value of the data. Meanwhile, after a pass by-copy the life cycles of the data seen by the caller and the data seen by the callee become independent of each other. This can have consequences, particularly in a multi-threaded environment.
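A toy illustration of that divergence (the `Config` type is made up for the example):

```rust
#[derive(Clone, Copy)]
struct Config {
    retries: u32,
}

fn main() {
    let mut cfg = Config { retries: 3 };

    // By-copy: the receiver's view is frozen at the moment of the copy.
    let snapshot = cfg;

    // The original keeps evolving independently of the copy.
    cfg.retries = 5;

    assert_eq!(snapshot.retries, 3); // the copy no longer reflects the update
    assert_eq!(cfg.retries, 5);
}
```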


The author doesn't make the data representation explicit, but if doubles are used for positions, 4000 triangles alone would be 288,000 bytes (8 bytes * 3 components * 3 vertices * 4000). 4,000 of each shape would spill the largest L1 cache I know of on a mainstream processor (512 KB), and they would still be too big to fit in the L1 cache of most modern CPUs even if the vectors used floats. It wouldn't be by a huge margin, though.

I agree with your point 2) in general: first try for the closest mapping between language semantics and program semantics, then come back and do weird stuff later, after profiling.


I don't see that making a difference in the by-copy/by-reference use on the leaf functions? Since the copies are stack allocated all the data used there is going to be shuffling in and out of the same few cache lines and shouldn't make a perceptible difference in overall cache usage.


My intuition would be: up to 128 bit/16 bytes always pass by-copy. For larger structs up to about 1024 bytes start with pass by-copy and switch to by-borrow only if you profiled and found a performance bottleneck.


I don't really understand the complaints in the 'ergonomics' section; this code won't even compile:

  fn dot_product(a: &Vector3, b: &Vector3) -> float {
      a.x*b.x + a.y*b.y + a.z*b.z
  }

  fn do_math(p1: &Vector3, p2: &Vector3, d1: &Vector3, d2: &Vector3, s: f32, t: f32) -> f32 {
      let a = p1 + &(&d1*s);
      let b = p2 + &(&d2*t);
      let result = dot_product(&(&b - &a), &(&b - &a));
  }

Namely, how are you going to multiply a struct like d1 by an f32? Rust has deref coercions also, so there's plenty of times you don't even need to put a &.

Also, I think the idiomatic thing to do would be to implement 'Add' for Vector3.


> Namely, how are you going to multiply a struct like d1 by an f32?

https://en.wikipedia.org/wiki/Scalar_multiplication

Here's a Rust Playground with the vector math impls he's probably assuming (dummied): https://play.rust-lang.org/?version=stable&mode=debug&editio...


Oh yeah. It wasn't clear from the blog post that he had implemented some of the std::ops operators for his Vector3 struct.

You can get rid of the superfluous & symbols by implementing for &Vector3 also, so again, the ergonomics section doesn't really resonate with me.


A graph of struct size vs. speed would have been useful. Also, definitely learn to read the assembly: you can then get an actual answer for what is happening rather than stabbing around in the dark with profiling tools.

Anyway good article.


That's the compiler's decision. Whether to pass something as a reference or as a copy is up to the compiler, provided that it can tell whether this affects program semantics. If an argument is not modified (or cannot be), and lifetime analysis can exclude aliasing and concurrency conflicts, then it doesn't need to be copied. It can be if it's tiny; that's the compiler's call.

That feature was in an early Modula compiler.


Will the Rust compiler convert one to the other as a performance optimisation where possible?


Yes - even if you ask for pass by move, it might pass a reference for a large struct.


That's interesting, and a bit surprising. I know Ada does this, but it's explicit about only specifying semantics ("in, "out" or "in out") and letting the compiler decide.

Surely C or C++ optimizers won't do this?


It's not as surprising when you consider that moving the value into the function also makes it permanently unavailable to the caller. So what difference does it really make? The semantics are the same either way. The lifetime of the object ends within the function.
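The "permanently unavailable" part, concretely:

```rust
fn consume(s: String) -> usize {
    s.len()
    // `s` is dropped here; its lifetime ends inside the callee.
}

fn main() {
    let s = String::from("hello");
    let n = consume(s);
    assert_eq!(n, 5);
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`
}
```

Since the caller can never observe `s` again, passing a pointer and letting the callee clean up is observably identical to a physical copy.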


> Surely C or C++ optimizers won't do this?

When they can prove it doesn't change the semantics in ways that violate the standard, I think you will find that they do. Rust doesn't really have its own optimizer, after all; it reuses a C/C++ one.


For objects with trivial constructors and destructors, that's certainly a possibility. But anything with non-trivial move constructor or destructor, e.g. anything that allocates memory, is going to be much harder to optimise in C++.

The problem is, "moving" a C++ object means executing an arbitrary function, which at the very least needs to put the object to a "do nothing on destruction" state. The moved-from object still exists and has to have its destructor run (which is also an arbitrary function). In Rust, even an unoptimised move is just a bitwise copy, and the moved-from object doesn't need a destructor run because its lifetime ends completely when it is moved from.


Let's disregard inlining etc, so that we have actual function calls in both languages.

C and Rust compilers do the same thing - they follow a calling convention. It will tell you how to pass a struct, if it should be passed as a pointer or in registers etc.


Rust doesn't have a defined, standardized, binary calling convention, to the extent that C does.


I agree, but that doesn't mean that it picks different calling conventions for each function call it compiles or something like that. Each version & platform of Rustc will have a calling convention that it sticks to when compiling function calls.


It can pick a calling style on a per function basis.


It's wild how much longer this article was than it needed to be. The first benchmark showed that for the test case tested, the copying and borrowing performed nearly identically. The answer to the stated question is that one is free to use whichever syntax one prefers. I would pick the one with the less-confusing semantics, as less complex code is more likely to be correct.


> It's wild how much longer this article was than it needed to be. The first benchmark showed that for the test case tested, the copying and borrowing performed nearly identically. The answer to the stated question is that one is free to use whichever syntax one prefers. I would pick the one with the less-confusing semantics, as less complex code is more likely to be correct.

Please don't blindly apply this line of reasoning everywhere.

For this particular micro-benchmark the results were almost identical. So for this particular piece of code one should apply whatever makes sense.

But if you are trying to generalize to other code (or worse yet, trying to come up with a rule), then you need to understand why the code is behaving the way it is. If you blindly trust your benchmarks you become blinded by their implicit assumptions.


Maybe I'm missing something, but it looks like about a third of the functions in by_ref.cpp take their parameters by value:

https://www.forrestthewoods.com/blog/should-small-rust-struc...


I've had a smaller but similar dive recently while working on embedded machine (32 bit) with Rust. Clippy was pushing me to use move/copy instead of reference for passing a [u8; 8].

I thought this strange on a 32-bit machine, so I dived in, and it turned out the "rule" is that variables "up to two register sizes" are supposed to be copied and anything bigger referenced.

The trick here is that, due to compatibility and no knowledge of the target platform and setup, clippy assumes a 32-bit register size. So on 64-bit platforms you'll only get this warning up to 8 bytes as well, for example.

I never went in to actually see if it makes any sense on that particular 32bit platform, so good to see someone taking a dive on the actual compiled code side.


Did you file a ticket?


No, I asked on the IRC channel and forums, as well as a friend who does embedded. The answer was basically "because clippy doesn't know the target arch, it has to make a conservative guess", which I agree with.

My friend also explained how it's a REALLY tricky question to answer properly especially given specific embedded architectures and setups where the answer is very hard.


The article is interesting, but it is frustrating that it mainly cares about speed, but 'speed' or 'fast' is not mentioned in the title.

To answer that general title question, I would not even have considered that 'speed' might be an issue. I'd say: pass small things by value because it probably results in the most readable code, which should always be the first dimension of optimisation, I think. Also, the compiler is most probably good enough anyway, so care about your source code readability first.

So, at least the title should say 'if you care about speed a lot'.


Everyone cares about speed first, or they would be using a more convenient slow language.

The challenge of Rust is, how nice can we make a language without noticeably giving up any performance to C++? Pretty nice, it turns out.

C++ still has an edge on both performance and ability to code powerful libraries that are not a PITA to use. The performance margin definitely will close up. The library margin could, too, but it will require unpleasant choices.

Anyway both are faster than almost anything else, and will stay that way, for von Neumann machines. Those might get less important, soon.


> Everyone cares about speed first, or they would be using a more convenient slow language.

I use Rust and I actually care about readability first.

Sincerely I find Rust more readable than most popular languages (Python, Javascript, C++, Java, even Swift...), unless I'm looking into some code that abuses generics and lifetimes (which is actually rare for me).

I think that's because it enforces some pretty clear patterns with its lack of struct inheritance, sum types with pattern matching, the way things are imported ("use" statements), the lack of parentheses around conditionals, and being expression-oriented.


"At least three things are super weird here"

What was the third thing??


> My answer to by-copy versus by-borrow for small Rust structs is by-copy

The author is making the case for special treatment of structs for a special case. For a performance gain that's probably below the margin of error. This is negating the elegance and consistency of rust for a special case. I think the author thinks too much in C++ terms and is sort of missing the point of Rust (or at least one point of rust).


> The author is making the case for special treatment of structs for a special case.

The struct by-copy is mostly a good API thing in a lot of scenarios, particularly when the objects are <1 cache line.

Even more so, when you're only writing half the code and want to separate the actions within a function from any side-effects outside.

I would say that the performance difference for small structures is irrelevant to the extra hop, particularly if the copied location is the stack on the callee.


> Even more so, when you're only writing half the code and want to separate the actions within a function from any side-effects outside.

Due to the default immutability of references and the fact that the compiler won't allow you to share data between threads (without jumping through a lot of hoops), the risk of side-effects is pretty well neutralized. Another selling point of Rust.


Thanks for sharing. As a Rust learner, I still don't know how to place the cursor between the performance of borrowing and the ergonomics of copying. Most of the time, I just copy/clone and tell myself that the performance doesn't matter that much, because it will be fast enough for my purpose anyway.


> it will be fast enough for my purpose anyway

It probably will be.

I've been using Rust in production for a while now, and our bottlenecks are never Rust's performance. Something else is the bottleneck way before we approach anything close to Rust's peak throughput. This is especially true if you're doing anything that touches networking or interacts with other services.


We've used this to our advantage when performance testing things on the JVM. We might not write the final solution in Rust, but we'll use it to figure out a theoretical upper limit with various things like Kafka and misc. databases.


Using references to temporaries is at least slightly less painful with a ref pattern:

    let ref a = p1 + &(d1*s);
    let ref b = p2 + &(d2*t);
    dot_product_by_borrow(&(b - a), &(b - a))


How to design your API in Rust (or any modern language):

1) What makes the most sense semantically? (So that using your API correctly is the most obvious path)

2) Doesn't matter? Okay, what's the most ergonomic?

3) That code path needs to be performant? Try variations and benchmark within your overall application.

I'm not refuting/dismissing the article. Dives into how the compiler handles things is always interesting! But the programming ecosystem is going through some growing pains right now.

At the beginning of time we had the Era of Assembly; back when our machines were measured in MHz. Code had to be performant above all else; that was the only rule.

During the Era of Moore, performance exploded. As a counter-reaction to the Era of Assembly, programmers began chanting the mantra "No Premature Optimization!" This led to the creation of easy/lazy languages like Python, where code was an art and performance wasn't even an afterthought.

Now we're in the Era of Types. While Python and Javascript were running rampant without types (quack quack), languages like Haskell were evolving in the shadows with intricate and expressive type systems. The fruits of those labors are sprouting in the form of Rust. The mantra of "No Premature Optimization!" isn't enough. We no longer need to choose between code that is easy to write and code that is performant. With a good type system we can be more explicit with the semantics of our code and APIs. This makes our code easier to use, more ergonomic, and gives the compiler more information that it can leverage to optimize our creations into assembly machines that would make C64 developers nod approvingly (though knowing secretly that they could always do better).

The growing pain is this transition from a mindset of just "No Premature Optimization" where the focus is simply on writing "easy code" in stark reaction to the hyperoptimization of the Era of Assembly, to a modern mindset where we should write code with intentionally designed types and semantics. Hence my more complicated list of three rules instead of one.

Side Note: And of course, as some comments have pointed out, step number 3 of my guideline is fraught with peril because the compiler's behavior, and the behavior of the CPU, change like the direction of the wind. Thus the thrust of my comment is emphasizing the use of our new typing systems. If your code is typed correctly and your intentions clear (per step #1), the compiler will do the right thing on average at least.

Side Note 2: None of those guidelines apply to languages of the older eras; they aren't expressive enough to communicate with the compiler and hence you're left in the old miasma of untyped languages where optimization is near hopeless, or languages with type systems so narrow that you're battling day-to-day to build APIs with a semblance of usability.


Copying data is among the fastest operations possible on a modern computer bus. Not a thing to worry much about.


Sure, until your typical workload involves billions of them.


Nope. If it's a fraction of the processing you do (including loops, calculations and tests) then it's negligible. Because you do billions of those things too, and they will dominate the time-cost.

In the case in point, subroutine calls, the call itself is far costlier than the argument copying. So here, data movement is almost completely irrelevant.


What I find weird is that there is a difference in performance.

If the compiler is smart enough to inline, it should be able to notice that the way arguments are passed makes no difference and use the most efficient way regardless of the signature. Do I overestimate the abilities of compilers, or is there some side effect I'm not aware of (concurrency?)?

Obviously, it only applies when inlining.


Include cost of indirection


A good rule is to let the caller decide. If input parameters are by-reference and the caller wants to pass them by-reference, congrats, no problem. If the caller wants the callee to work on a copy, they can create a clone and pass a reference to it. But if you make the input parameters by-copy, there's no way back.


Although your point is sort-of true, I don't really see how it's valid here. The author is talking about the performance implications of the two approaches in situations where the function parameters are not mutated. Suggesting that the user "makes the choice" here by taking defensive copies for themselves will probably (but who knows what the optimisers will do - they might just elide it) make performance worse. It's not something end users will think to do, anyway, at least not in the name of performance.

Regardless, I want a library author to help me fall into the pit of performance success, not expect me to do it all for myself.


In the case of Rust it would typically be an immutable reference (or else passing by value wouldn't be an alternative). So the caller would have no reason to make a copy.


In Rust (and even C++ afaik) it usually makes more sense to let the callee decide. Does it need to own the value? -> Call by value. Does it only need to inspect or (visibly) mutate the value during the function call? -> Call by reference (modulo optimization concerns).


Functions can also be generic over references vs. move/copy semantics, e.g.:

    fn foo<T>(frob: T) where T: Borrow<Frob> {
      // Do something with frob.borrow()
    }
Due to monomorphization, foo is specialized for moves/clones when called as foo(frob) where frob is a value type, or for borrows when called as foo(&frob). Similarly, you can use Clone, ToOwned or even Into for going in the opposite direction.

Whether such functions are good taste in general is debatable ;).


You want AsRef, not Borrow, unless you need the additional guarantees that Borrow provides (identical comparisons and hashing).



