Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You can do linear lists and trees in Rust. The problem is backlinks. If you refcount everything, you can have backlinks, but otherwise there's a safety problem. Backlinks require an invariant which covers two variables, and you can't express that in Rust.

Back pointers are a special sort of pointer from an ownership perspective. They don't carry ownership, but are locked in an invariant relationship with the pointer that does. If you could express that in Rust, it would be compile-time checkable. Rust needs a way to say "field Q of struct T2 is a backpointer to field P of struct T1".

During pointer manipulation, that rule will momentarily be broken. But it's still checkable. It just needs some analysis. First, the checker would have to identify a block of code in which a backpointer was being manipulated, and locate the corresponding manipulation of the forward pointer. This is the code block of interest. It's an atomic transaction, in the database sense. Maybe the user would have to identify the code block with something like "atomic", with syntax like "unsafe".

Then, the checker would have to establish that if the invariant held at entry to the code block, it will hold at exit from the code block. This is a simple application of program verification technology. If the atomic section is small, this is not difficult.

Typical uses would be updating doubly-linked lists, rebalancing trees, and pulling part of a DOM-like tree out and moving it somewhere else.

This is one class of unsafe code - transient. It's the easiest to check, because it's a local problem. At the end of the code block, a safe state has been reestablished.

The second class of unsafe code involves partially valid data structures. This problem lies underneath "Vec", where the underlying block of storage has preallocated, uninitialized space for later growth. This is harder, because unsafe state persists outside unsafe code sections. To deal with this, it's necessary to somehow attach invariants to the data structure. If you have an array A which is valid from elements 0 to n, you need something like "valid_array(A,0,n)". Then, when you initialize an element n+1, you change the validity limits. The checker needs a few simple theorems, such as "valid_array(A,0,n) and valid_element(A[n+1)) implies valid_array(A,0,n+1)" to check this. This is old and completely automatable technology; we did it in the Pascal-F verifier 35 years ago, and Dafny does it today.

Handling these two classes of unsafe code takes care of a sizable fraction of the unsafe code really needed in Rust. With the support described above, most of the unsafe code in pure Rust could be eliminated. Outside of those two classes, most unsafe code in Rust involves external interfaces to other languages.

Unsafe code which doesn't fit into any of those classes needs to be looked at very hard.



> The checker needs a few simple theorems, such as "valid_array(A,0,n) and valid_element(A[n+1)) implies valid_array(A,0,n+1)" to check this.

This is basically dependent types. Rust doesn't really want to go that far from a language level.

> Handling these two classes of unsafe code takes care of a sizable fraction of the unsafe code really needed in Rust.

Not really. All the other datastructures in the stdlib would still need unsafe. Not really a good metric.

The Rustbelt project at MIP-SWS is doing formal verification of Rust including unsafe code, and that approach will be far better IMO. It won't let you eliminate unsafe from your code, but might make it easier to prove that your unsafe code is performing safe operations.


> FWIW you can use Weak for this. There's a performance cost though.

The sentence before your quote mentions ref-counting. This is referring to Weak.


Somehow missed that, thanks.


All the other datastructures in the stdlib would still need unsafe.

Why?


How would you implement the robin hood Hash{Set,Map} with this? Or BTreeMap?

Maybe I misunderstood your proposal, at first it seemed like dependent types (which makes vectors awesome but really doesn't address other things), but your Pascal thing makes it sound more like complete formal verification, which is what Rustbelt is doing. I don't expect that to become part of the language (maybe? would be cool), but it would help reason about unsafe code.


Rustbelt is a research project, and they're using Coq, which is interactive and requires a lot of user effort. What I'm talking about is doing it within the language and automatically.

Looking at BTreeMap, the extensive use of unwrap_unchecked seems to be unnecessary. That looks like premature optimization - omitting the check saves maybe two instruction on the main path.

The other unsafe code there seems to mostly involve backpointers, which was covered above. "Drop" apparently is unsafe only for backpointer reasons, although the comments don't make this clear.


To me, Rust is very much about doing "premature optimization", in the sense that small overheads add up and should be avoided where possible.

BTW, I do think the language ought to have an answer to backpointers. In fact, there is an interesting crate named 'rental' that tries to make a subset of cases safe; but to do it properly requires more concepts in the type system, especially immovable types.

It's tricky, though. The problem with backpointers is that they destroy uniqueness. You can't have 'node: &mut Node' if Node is the target of backpointers, because you could do 'let node2 = &node.left.parent'; you could then take a reference to some field of node2 and mutate it via the original, leading to use-after-free, etc. But without uniqueness, how do you mutate? One option is to use `Cell`, a.k.a. allowing mutation via shared references - but then if you have a reference to a node, you can't take a reference to its child without worrying it could be freed from under you. So you need reference counting. Another option is to use `RefCell`, dynamically tracking whether a value is borrowed, but that has its own overhead (especially if the code is multithreaded, then you need a mutex!) and generally makes it pretty easy to write code that crashes (safely, but still a crash from a user's perspective).

I'm sure there are constructs that can solve this problem in many instances, but as far as I know, it's hard to make general without a very complex proof system.


I think you can solve the backpointer problem in a relatively straightforward way if Rust had true linear types. Possibly not as simple as it seems, though.


> which was covered above

I don't really see how the backpointer thing could work, I find that it too will be limited, but I'm probably just wrong here.

> What I'm talking about is doing it within the language and automatically.

Once the language is verified, more user-friendly abstractions over the verification can be built.


I'm talking about simple type constraints. Think Eiffel, not F* - entry and exit conditions for sections where there is temporary unsafety, and constraints on types for persistent unsafety. Universal quantifiers only, and decidable theories only. Dafny's level of prover.


You can implement a hash map safely on top of an array; indeed you could do it with `Box<[T]>` today. You would just miss the optimization where missing buckets can be represented by zero in the "hash value" field rather than an explicit Option.


Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?

Much of the unsafe Rust code I come across looks like either premature optimization or trying to write C in Rust.


> Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?

There aren't pointers here though.

But yes, you could use NonZero<T> to mark the hash value as never-zero so that the option gets optimized. You'd still need unsafe code to construct the nonzero, but that's okay.

At least for the Hashmap I don't think the code is premature optimization; it seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.


t seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.

The number of excuses associated with unsafe code in Rust is excessive.


It's not an excuse, I just don't think that your statement that the hashmap code uses unsafe superfluously is correct.


It's a bit more complicated than marking the hash value as NonZero because there are currently two parallel arrays: one for hash values, one for (key, value) pairs. So there's nothing in the second array to indicate that an element is missing. (It used to be three arrays, with the keys and values separate, but that "optimization" was actually a terrible layout for the cache. I'd like to see a further improvement where key types that can be hashed trivially, e.g. integers, ditch the hash array altogether.)

I'm fine with the use of `unsafe` here. A hash table can be implemented using safe Rust, but you can squeeze a few percent more performance out of it by using a small amount of unsafe code, in a way that would be very difficult to make safe, without a super fancy dependent type system like ATS plus a ton of annotation overhead. The unsafe code can be vetted relatively thoroughly and is safe from the outside. That kind of practical compromise seems very much in the spirit of Rust...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: