You can do linear lists and trees in Rust. The problem is backlinks. If you refc...

Manishearth · on Feb 17, 2017

> The checker needs a few simple theorems, such as "valid_array(A,0,n) and valid_element(A[n+1)) implies valid_array(A,0,n+1)" to check this.

This is basically dependent types. Rust doesn't really want to go that far from a language level.

> Handling these two classes of unsafe code takes care of a sizable fraction of the unsafe code really needed in Rust.

Not really. All the other datastructures in the stdlib would still need unsafe. Not really a good metric.

The Rustbelt project at MIP-SWS is doing formal verification of Rust including unsafe code, and that approach will be far better IMO. It won't let you eliminate unsafe from your code, but might make it easier to prove that your unsafe code is performing safe operations.

dbaupp · on Feb 17, 2017

> FWIW you can use Weak for this. There's a performance cost though.

The sentence before your quote mentions ref-counting. This is referring to Weak.

Manishearth · on Feb 17, 2017

Somehow missed that, thanks.

Animats · on Feb 17, 2017

All the other datastructures in the stdlib would still need unsafe.

Why?

Manishearth · on Feb 17, 2017

How would you implement the robin hood Hash{Set,Map} with this? Or BTreeMap?

Maybe I misunderstood your proposal, at first it seemed like dependent types (which makes vectors awesome but really doesn't address other things), but your Pascal thing makes it sound more like complete formal verification, which is what Rustbelt is doing. I don't expect that to become part of the language (maybe? would be cool), but it would help reason about unsafe code.

Animats · on Feb 17, 2017

Rustbelt is a research project, and they're using Coq, which is interactive and requires a lot of user effort. What I'm talking about is doing it within the language and automatically.

Looking at BTreeMap, the extensive use of unwrap_unchecked seems to be unnecessary. That looks like premature optimization - omitting the check saves maybe two instruction on the main path.

The other unsafe code there seems to mostly involve backpointers, which was covered above. "Drop" apparently is unsafe only for backpointer reasons, although the comments don't make this clear.

comex · on Feb 18, 2017

To me, Rust is very much about doing "premature optimization", in the sense that small overheads add up and should be avoided where possible.

BTW, I do think the language ought to have an answer to backpointers. In fact, there is an interesting crate named 'rental' that tries to make a subset of cases safe; but to do it properly requires more concepts in the type system, especially immovable types.

It's tricky, though. The problem with backpointers is that they destroy uniqueness. You can't have 'node: &mut Node' if Node is the target of backpointers, because you could do 'let node2 = &node.left.parent'; you could then take a reference to some field of node2 and mutate it via the original, leading to use-after-free, etc. But without uniqueness, how do you mutate? One option is to use `Cell`, a.k.a. allowing mutation via shared references - but then if you have a reference to a node, you can't take a reference to its child without worrying it could be freed from under you. So you need reference counting. Another option is to use `RefCell`, dynamically tracking whether a value is borrowed, but that has its own overhead (especially if the code is multithreaded, then you need a mutex!) and generally makes it pretty easy to write code that crashes (safely, but still a crash from a user's perspective).

I'm sure there are constructs that can solve this problem in many instances, but as far as I know, it's hard to make general without a very complex proof system.

Manishearth · on Feb 18, 2017

I think you can solve the backpointer problem in a relatively straightforward way if Rust had true linear types. Possibly not as simple as it seems, though.

Manishearth · on Feb 17, 2017

> which was covered above

I don't really see how the backpointer thing could work, I find that it too will be limited, but I'm probably just wrong here.

> What I'm talking about is doing it within the language and automatically.

Once the language is verified, more user-friendly abstractions over the verification can be built.

Animats · on Feb 17, 2017

I'm talking about simple type constraints. Think Eiffel, not F* - entry and exit conditions for sections where there is temporary unsafety, and constraints on types for persistent unsafety. Universal quantifiers only, and decidable theories only. Dafny's level of prover.

comex · on Feb 17, 2017

You can implement a hash map safely on top of an array; indeed you could do it with `Box<[T]>` today. You would just miss the optimization where missing buckets can be represented by zero in the "hash value" field rather than an explicit Option.

Animats · on Feb 17, 2017

Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?

Much of the unsafe Rust code I come across looks like either premature optimization or trying to write C in Rust.

Manishearth · on Feb 17, 2017

> Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?

There aren't pointers here though.

But yes, you could use NonZero<T> to mark the hash value as never-zero so that the option gets optimized. You'd still need unsafe code to construct the nonzero, but that's okay.

At least for the Hashmap I don't think the code is premature optimization; it seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.

Animats · on Feb 17, 2017

t seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.

The number of excuses associated with unsafe code in Rust is excessive.

Manishearth · on Feb 17, 2017

It's not an excuse, I just don't think that your statement that the hashmap code uses unsafe superfluously is correct.

comex · on Feb 18, 2017

It's a bit more complicated than marking the hash value as NonZero because there are currently two parallel arrays: one for hash values, one for (key, value) pairs. So there's nothing in the second array to indicate that an element is missing. (It used to be three arrays, with the keys and values separate, but that "optimization" was actually a terrible layout for the cache. I'd like to see a further improvement where key types that can be hashed trivially, e.g. integers, ditch the hash array altogether.)

I'm fine with the use of `unsafe` here. A hash table can be implemented using safe Rust, but you can squeeze a few percent more performance out of it by using a small amount of unsafe code, in a way that would be very difficult to make safe, without a super fancy dependent type system like ATS plus a ton of annotation overhead. The unsafe code can be vetted relatively thoroughly and is safe from the outside. That kind of practical compromise seems very much in the spirit of Rust...