Rust Ownership Explained with Python (paulkernfeld.com)
149 points by adamnemecek on Sept 29, 2018 | 45 comments



Not that it really matters, but since he lamented that the second Python solution was "longer and less elegant", I'll point out that the algorithm can be written in a functional style that is both short and (IMO) somewhat elegant. However, with reduce no longer being a built-in as of Python 3, this is arguably not Pythonic.

    from functools import reduce
    
    squares = (x * x for x in range(10))
    
    min_and_max = lambda current, x: [min(current[0], x), max(current[1], x)]
    minimum, maximum = reduce(min_and_max, squares, [next(squares)]*2)
    
    print(minimum)
    print(maximum)


    squares = [x * x for x in range(10)]
    print(min(squares))
    print(max(squares))
this one works and looks good.


range(10) doesn't properly demonstrate the material differences between generators and lists.

Like posted elsewhere, tee is an appropriate solution that preserves the generator's benefits over a list. It's too bad that the example is so synthetic.


One downside of both solutions is that they traverse the data twice, which could be expensive when we're talking about real data and not toy problems.


The Python problem can be solved with `itertools.tee`[0]

    from itertools import tee
    squares_min, squares_max = tee(x * x for x in range(10))
    print(min(squares_min))
    print(max(squares_max))
[0] https://docs.python.org/3/library/itertools.html#itertools.t...


Unfortunately, the iterators returned by tee are backed by a shared queue: each time a "fresh" value is read from one branch of the tee, it gets appended to that queue, and an element can't be dropped from the queue until every branch has consumed it. So this ends up using O(n) memory, since squares_min is completely consumed before squares_max is ever touched. (Calling next alternately on the branches of the tee would avoid the issue.) See the "Roughly equivalent to" code and the text immediately before it in your link.


    squares = [x * x for x in range(10)]
    print(min(squares))
    print(max(squares))
this one works and looks good.


It will have a memory impact with a larger range, though.

A better approach might be:

    squares = lambda: (x * x for x in range(10))
    print(min(squares()))
    print(max(squares()))


Yes, your solution is in the original post, under the heading 'Python solutions':

  def squares():
      return (x * x for x in range(10))


It also uses O(n) memory.


We need a language that can schedule and fuse those folds over an infinite sequence concurrently. Joining a set of futures over an infinite sequence fuses the folds.


I'm not sure adding more magic to languages is necessary when the alternative of using explicit generators is not too difficult.


Do generators compose?


Sure.


As a Rust beginner I am not quite sure why .min() is supposed to take ownership and change the sequence. I would assume it does nothing with the sequence except read it.


Sequences can be generated at runtime, and it may even be impossible to "regenerate" them (think for instance reading bytes from the network).

Therefore, just reading that sequence changes (consumes) it.


And of course, for things that can be iterated over multiple times, you can likely (re)generate the iterator. E.g. this compiles fine:

    fn main() {
        let squares : Vec<u32> = (0..10).map(|x| x * x).collect();
        println!("{:?}", squares.iter().min()); // Some(0)
        println!("{:?}", squares.iter().max()); // Some(81)
    }
Edit: Or, as a smaller change to the original, you can add a single .clone():

    fn main() {
        let squares = (0..10).map(|x| x * x);
        println!("{:?}", squares.clone().min());
        println!("{:?}", squares.max());
    }
This clones the std::iter::Map<std::ops::Range<i32>>. Unlike the Vec version, this will re-evaluate the |x| x * x sequence when .max() is called.

There's no technical reason Map<Range<i32>> couldn't be made to implement the Copy trait, which would make the extra .clone() here unnecessary - but Rust's standard library has chosen to force you to make an explicit choice instead.
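
One way to see that re-evaluation, as a rough sketch (the printing closure is only there for illustration): put a side effect in the closure and it fires once per traversal.

    fn main() {
        let squares = (0..3).map(|x| {
            println!("evaluating {}", x); // runs once per traversal of the sequence
            x * x
        });
        println!("{:?}", squares.clone().min()); // evaluates 0, 1, 2, then prints Some(0)
        println!("{:?}", squares.max());         // evaluates 0, 1, 2 again, then prints Some(4)
    }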


Curiously, C# provides the behavior one would expect.

It does so by distinguishing IEnumerable<T> [1] from IEnumerator<T> [2]. The first is the result of the map call (Select in C#'s LINQ), representing a sequence that can be generated; the latter is the thing that actually generates it one element at a time and has to be consumed.

I wonder why Rust did not take C#'s approach.

[1] https://docs.microsoft.com/en-us/dotnet/api/system.collectio...

[2] https://docs.microsoft.com/en-us/dotnet/api/system.collectio...


Rust does have these abstractions. The corresponding Rust traits are `IntoIterator<Item=T>` and `Iterator<Item=T>`. For example:

  - Vec<T> implements IntoIterator<Item=T>
  - &Vec<T> implements IntoIterator<Item=&T>
  - &mut Vec<T> implements IntoIterator<Item=&mut T>
and consequently,

  - vec.into_iter() implements Iterator<Item=T>
  - (&vec).into_iter() (or vec.iter()) implements Iterator<Item=&T>
  - (&mut vec).into_iter() (or vec.iter_mut()) implements Iterator<Item=&mut T>
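
As a minimal sketch of those three forms in a for loop (which calls into_iter() implicitly):

    fn main() {
        let mut v = vec![1, 2, 3];

        // &Vec<T>: iterates Item = &T, so `v` stays usable afterwards
        for x in &v {
            println!("by ref: {}", x);
        }

        // &mut Vec<T>: iterates Item = &mut T, allowing in-place mutation
        for x in &mut v {
            *x *= 10;
        }

        // Vec<T>: iterates Item = T, moving `v`; it can't be used after this loop
        for x in v {
            println!("by value: {}", x);
        }
    }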


Good, but how do you create something that is IntoIterator and holds an immutable description of how to generate the members to iterate over? I mean some potentially infinite sequence.



Can you .map it?



Can you be more specific about what behavior from the Rust code you find unexpected, in comparison with C#?


Exactly the consumption part. Perhaps I don't understand the Rust library, and there's other machinery that does this right.

In C#, Enumerable.Range and .Select (i.e. .map) internally create an immutable instance implementing IEnumerable<T> that describes what will have to be returned.

Then .Min internally takes the IEnumerable<T>, calls its .GetEnumerator, which returns something that can be consumed once and has mutable state, and finds the minimum by actually consuming that enumerator (all internally).


I'd say the distinction is that Rust's .min() is implemented on the equivalent of IEnumerator<T>, Iterator - instead of the equivalent of IEnumerable<T>, IntoIterator. This makes it slightly more flexible, since you can always get the former from the latter by calling GetEnumerator(), but not vice versa (without filling a container with it first or something).

If C# also implemented Min on IEnumerator<T>:

    var enumerator = collection.GetEnumerator();
    var min = enumerator.Min();
    var max = enumerator.Max(); // Runtime bug: Min() already consumed the entire enumerator! InvalidOperationException?
There's basically nothing useful you can do with 'enumerator' after the first call to this hypothetical Min. Rust's stdlib tries to catch this bug by having min take ownership of/consume 'enumerator', instead of merely borrowing it, so you can't use it again:

    let iter = collection.iter();
    let min = iter.min();
    let max = iter.max(); // Compile time error: iter used after move
In C#, basically everything that can be is an IEnumerable<T>, and the stdlib is built around that fact. Does System.IO.File.ReadLines hit disk every time you re-enumerate it? Or does it store to a container? Many of the stream-based APIs have a .ReadLine() method but no .ReadLines() method, presumably because they'd have to allocate a container to store the result in case the enumerable is re-evaluated multiple times, and that seems bad if you don't need it?

Rust seems to lean a lot more heavily on single-shot iterators.

    let lines = stdin.lock().lines();
    for line in lines { ... }
    for line in lines { ... } // Compile time error: lines moved.

    let lines = BufReader::new(File::open("input.tsv")?).lines();
    for line in lines { ... }
    for line in lines { ... } // Compile time error: lines moved.
"Does lines() hit disk/stdin every time you re-enumerate it?" is a trick question: You can't re-enumerate it.

You can resolve these errors by being explicit about storing the input in a container and iterating that multiple times, or by re-opening the input file, or by resetting the File position and constructing a new BufReader, depending on which behavior you wanted for re-enumeration.


I think what's unexpected is that (0..10).map(|x| x * x) doesn't implement Copy. I think that's something that could be implemented, since Copy closures are a thing now (or maybe soon). Although wait, now that I think about it, the problem is more that (0..10) itself isn't Copy.


That seems strange to me. If calling max() changed the sequence, I would expect it would have to be declared as mutable.

I struggled with this when writing my first rust program. I was very surprised by the behaviour.


> If calling max() changed the sequence,

max does not change a sequence, it consumes the iterator.

> I would expect it would have to be declared as mutable

max takes self (and not &self), so the ownership of the iterator is moved to the max method. The new owner can decide to bind the value mutably. This is always the case when a value is moved and changes ownership, e.g.:

https://play.rust-lang.org/?gist=50e6fe86e530baeabda1a7cca3c...
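
A small sketch of that idea with a hand-rolled iterator (not the code behind the playground link): the caller's binding isn't mut, but the function that receives ownership binds its parameter mutably.

    struct Counter { n: u32 }

    impl Iterator for Counter {
        type Item = u32;
        fn next(&mut self) -> Option<u32> {
            self.n += 1;
            if self.n <= 3 { Some(self.n) } else { None }
        }
    }

    // Takes `counter` by value, like Iterator::max takes `self`;
    // the new owner chooses to bind it mutably.
    fn first(mut counter: Counter) -> Option<u32> {
        counter.next()
    }

    fn main() {
        let c = Counter { n: 0 }; // `c` is not declared mut
        println!("{:?}", first(c)); // Some(1)
        // println!("{}", c.n); // compile-time error: `c` was moved into `first`
    }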


I suppose designing it that way still prevents you from doing anything wrong, but you need quite a lot of knowledge of the system to understand what it's doing.

I'll get there one day, but it's hard to go back to having no idea what my compiler is trying to do.


Because it runs on an iterator, and anything that runs on an iterator consumes it. If you, for example, run

    // Make an iterator
    let test_iterator = vec![1, 2, 3].into_iter();

    // count() iterates over the elements and consumes them
    let count = test_iterator.count();

    // because the values have been consumed, this is not possible
    for element in test_iterator {
        println!("{}", element);
    }
See: https://play.rust-lang.org/?gist=750dffd641ac237690defbf86da...

The count method will consume every element and return the length. This is because iterators are meant to be consumed when iterated. This is maybe inconvenient, but necessary because of the generic nature of the Iterator trait: for some things that implement it, it is just not possible to get the max(), min(), or count() without consuming the iterator. So this would work:

    // Make a vector
    let test = vec![1, 2, 3];

    // Create an iterator and consume it with count()
    let count = test.iter().count();
    println!("Count: {}\n", count);

    // Create another iterator and loop over it (consuming it as we go)
    for element in test.iter() {
        println!("{}", element);
    }

    // Create another iterator and consume it with min()
    let min = test.iter().min();
    println!("\nMin: {}", min.unwrap());
Try here: https://play.rust-lang.org/?gist=498ffd9afa163909745d8537e97...

That means there was probably a choice: keep it generic and consistent for all things that implement Iterator, or make these methods something that works fine with basic types but not at all with anything more complex.

So: iterators don't know any values until they are consumed. That means iterators will be consumed most of the time (there are exceptions).

Also see: https://doc.rust-lang.org/std/iter/index.html#implementing-i... And: https://doc.rust-lang.org/std/iter/trait.Iterator.html


It's definitely harder in the general case to run into these bugs, but the real trick, in my opinion, is building APIs where it's impossible.

The 'escape hatch' here is .by_ref(), but you could, if it's critical, write wrappers that don't provide such escape hatches.
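
For reference, a rough sketch of what that escape hatch looks like, and of the footgun it leaves open:

    fn main() {
        let mut squares = (0..10).map(|x| x * x);

        // .by_ref() borrows the iterator mutably instead of moving it,
        // so `squares` is still usable afterwards...
        let min = squares.by_ref().min();

        // ...but min() already ran the iterator to the end,
        // so max() sees an empty iterator.
        let max = squares.max();

        println!("{:?} {:?}", min, max); // Some(0) None
    }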

If it were critical that the reuse never happen, you could make it impossible, which is really cool. But, by default, for most APIs, it's just generally more difficult to mess up.

There are a bunch of articles on session types out there, where you represent each state in a state machine as a separate type, with transition methods that consume the previous state. If the correctness touted in this article sounds appealing, I highly recommend reading about session types.
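
A toy sketch of the session-type idea (the connection states here are made up for illustration): each state is its own type, and each transition consumes the previous state.

    struct Disconnected;
    struct Connected;

    impl Disconnected {
        // Consumes the Disconnected state; the old value can't be reused.
        fn connect(self) -> Connected {
            Connected
        }
    }

    impl Connected {
        fn send(&self, msg: &str) {
            println!("sending: {}", msg);
        }

        // Consumes the Connected state, so no more sends afterwards.
        fn disconnect(self) -> Disconnected {
            Disconnected
        }
    }

    fn main() {
        let conn = Disconnected.connect();
        conn.send("hello");
        let _closed = conn.disconnect();
        // conn.send("again"); // compile-time error: `conn` was moved
    }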


Isn't the property of the generator expression not that it is an iterator (like a range or list also are) but specifically that it is a generator and thus inherently has state that is permanently/mutably exhaustible across different scopes? I like the example, and this is the first time I've totally wrapped my head around Rust ownership, so thanks! I just think precise Python terminology might keep people from getting confused about different types of Python iterator objects. Iterators are anything that has a __next__() method: https://www.python.org/dev/peps/pep-0234/


"it is an iterator (like a range or list also are)"

No, range is not an iterator. Try this in either python 2 or 3:

  a = range(10)
  a.next()
It will fail. Now try this:

  a = range(10)
  my_iterator = a.__iter__()
  next(my_iterator)
"Iterators are anything that has a __next__() method"

Right, and range(10) does not have a __next__() method, so it's not an iterator. It's an iterable, which is anything that has an __iter__() method that returns an iterator.

In python 3, range(10) returns a range object (not a list). Because it's an iterable and not an iterator, it doesn't get 'used up'. For example, try this (in Python 3 only):

  a = range(1000000000) # returns instantly, as it's not building a list
  a[3] # returns 3
  a[3] # returns 3 again


> Isn't the property of the generator expression not that it is an iterator (like a range or list also are) but specifically that it is a generator and thus inherently has state that is permanently/mutably exhaustible across different scopes?

> just think precise Python terminology might keep people from getting confused about different types of Python iterator objects. Iterators are anything that has a __next__() method

An iterator will function exactly as poorly as a generator does in his first example:

  In [5]: l = [1, 2, 3]

  In [6]: iter_ = iter(l)

  In [7]: min(iter_)
  Out[7]: 1

  In [8]: max(iter_)
  ---------------------------------------------------------------------------
  ValueError                                Traceback (most recent call last)
  <ipython-input-8-0dab0672ef83> in <module>()
  ----> 1 max(iter_)

  ValueError: max() arg is an empty sequence
(I chose iter() on a list here, as that is not a generator.)

And for the same reasons, really. An Iterator in Python (something w/ a __next__ method) also keeps mutable state, and the first call to min() exhausts it (by mutating it). (In Rust, it would likely "consume" the iterator, that is, it would move it, and the article demonstrates an example of this.) The mutable state being kept and used for iteration is the material part here; the generator is just an easy way of obtaining an iterator.

Hopefully I've got this right, but in Python, all generators are iterators¹ (they have a __next__); generators also additionally have send(), throw(), and close() methods, making them substantially more powerful. They make it really easy to make just a basic iterator, so they're often used for that.

¹The glossary actually calls the functions (those with "yield") themselves generators, and the objects returned by them "generator iterators".


I remember seeing a visualization of the variables and values in a Python program as it ran and how they get linked to one another, and that really helped me understand Python variables. Does such a visualization exist for Rust?

I'm gonna try to hunt the link for the Python one.


You're probably thinking of http://pythontutor.com/ for python. It's one of my absolute favorite instructional resources for any language.

Birdseye[1] looks like it is trying to be a similar kind of thing for debugging Python, which is amazing.

I don't know of anything similar for Rust, or for languages with similar characteristics (C/C++).

[1] https://github.com/alexmojaki/birdseye



Yes! Thank you!

This helped a lot for understanding the behavior of lists vs. other values, for example:

http://www.pythontutor.com/visualize.html#code=a%20%3D%204%0...

I think it would do wonders for ownership.


I really enjoyed reading this; I thought the author made these concepts very accessible.

Perhaps the author might consider writing posts on both traits and lifetimes?

Understanding ownership, traits and lifetimes seem to be the keys to understanding and using Rust effectively. Unfortunately these are also the things I've had the hardest time getting my head around. If anyone could suggest similarly accessible articles on traits and lifetimes I would greatly appreciate it. Cheers.


I can’t comment on the general case, but I disagree very strongly with the author’s perspective about generator expressions here.

First, a generator is a precise type of data structure that can be exhausted. If you don’t want that, don’t use that structure. It’s foolish to say that just because a data structure superficially satisfies some API (like being passable to min), that it “ought to have” some behavior that the author wants. That’s backwards. If you want certain behavior, choose a data structure that supports it.

It would be no different from saying: if I have a heap and I call sort(), it is faster than if I have the same data in a list and call sort(). They both support sort(), so why aren’t they equally fast? It’s obviously the wrong way to think about it. Instead, use the appropriate data structure.

And if you’re really worried about the operation being safe, use the provided default argument, or the pattern for versions before 3.4, as in [0], or write a wrapper function that catches whatever exceptions and handles them.

This isn’t even the biggest issue, though. Generators are generally meant to function as coroutines, and for complex generators you can use send() to set the “returned value” of a yield statement inside the generator for when it resumes. So a single generator could go through stages of looking exhausted and empty, then having a value fed in again so that on resumption it’s no longer empty, and could build back up a whole iterable of data that way.

So you absolutely do want min() or max() to throw errors when trying to consume from an empty generator, and you absolutely do want a generator’s state to be modified by the way values are consumed or added, because it meaningfully represents the state of the generator. Suppose, for a more complex generator, someone uses send() to feed it a value between your call to min() and your call to max().

Generator expressions that superficially look like list comprehensions are just one use case of the general data structure, and in fact one of the “generators 101” pieces of advice you always see is that if you need to iterate over the values multiple times, you should populate a persistent data structure from the generator, e.g. by wrapping it in list().

I bring this up because it frustrates me to see people acting like a type safety or borrow checking idea is somehow adding something new or solving some type of problem that the dynamic typing approach doesn’t or can’t solve, but this is emphatically incorrect and a severely myopic way to look at it.

The dynamic typing approach solves all the same problems, it just facilitates the solution differently: use the default argument or other patterns for safety on functions that can fail on empty containers, and use custom wrappers with exception handling when you need to. Choose a persistent data structure if you need persistence.

Again, I’m not saying this has anything to do with people’s big-picture reasons for liking static types or borrow checking... just that this particular example strikes a chord with me, highlighting the super annoying myopia about it.

[0]: < https://stackoverflow.com/questions/36157995/a-safe-max-func... >


> It’s foolish to say that just because a data structure superficially satisfies some API (like being passable to min), that it “ought to have” some behavior that the author wants.

I disagree. There is also the concept of the “principle of least surprise”. To take an extreme example, let’s say you wrote a standard library where push!(array, item) actually takes the array and deletes it. Would it be foolish to say this is bad design, because nothing “ought” to have any particular behavior and it’s up to the user to know what’s going on under the hood?


That’s a silly abuse of the idea of least surprise. Documented, long-lived, built-in data structures like generators, which support behavior like exhaustion and mutation in a time-tested way, are _clearly_ not some kind of surprising, useless, or pedantic quirk.

I mean, consider implicits in Scala. It’s hard to think of a worse violation of least surprise, but because it’s documented and time-tested and becomes a de facto standard in that language, the argument of least surprise becomes nothing but a preference debate.

The behavior where max raises an exception on an exhausted generator is not at all surprising, from the point of view of language documentation, ubiquitous and popular advice on that data structure, etc. etc.

And besides all that: if you did happen to find a serious design bug that created a surprise problem, there are tons of ways to solve that problem in a dynamically typed language that wouldn’t require static typing or borrow checking or any explicit state management. It still would not be evidence that those things are a superior way to solve it.


There is no surprise in passing an iterator and having it be exhausted.


For me, Rust ownership clicked when I was reading a C++ book, specifically the sections about references, moves, and copies.



