Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

How would you implement the robin hood Hash{Set,Map} with this? Or BTreeMap?

Maybe I misunderstood your proposal, at first it seemed like dependent types (which makes vectors awesome but really doesn't address other things), but your Pascal thing makes it sound more like complete formal verification, which is what Rustbelt is doing. I don't expect that to become part of the language (maybe? would be cool), but it would help reason about unsafe code.



Rustbelt is a research project, and they're using Coq, which is interactive and requires a lot of user effort. What I'm talking about is doing it within the language and automatically.

Looking at BTreeMap, the extensive use of unwrap_unchecked seems to be unnecessary. That looks like premature optimization - omitting the check saves maybe two instruction on the main path.

The other unsafe code there seems to mostly involve backpointers, which was covered above. "Drop" apparently is unsafe only for backpointer reasons, although the comments don't make this clear.


To me, Rust is very much about doing "premature optimization", in the sense that small overheads add up and should be avoided where possible.

BTW, I do think the language ought to have an answer to backpointers. In fact, there is an interesting crate named 'rental' that tries to make a subset of cases safe; but to do it properly requires more concepts in the type system, especially immovable types.

It's tricky, though. The problem with backpointers is that they destroy uniqueness. You can't have 'node: &mut Node' if Node is the target of backpointers, because you could do 'let node2 = &node.left.parent'; you could then take a reference to some field of node2 and mutate it via the original, leading to use-after-free, etc. But without uniqueness, how do you mutate? One option is to use `Cell`, a.k.a. allowing mutation via shared references - but then if you have a reference to a node, you can't take a reference to its child without worrying it could be freed from under you. So you need reference counting. Another option is to use `RefCell`, dynamically tracking whether a value is borrowed, but that has its own overhead (especially if the code is multithreaded, then you need a mutex!) and generally makes it pretty easy to write code that crashes (safely, but still a crash from a user's perspective).

I'm sure there are constructs that can solve this problem in many instances, but as far as I know, it's hard to make general without a very complex proof system.


I think you can solve the backpointer problem in a relatively straightforward way if Rust had true linear types. Possibly not as simple as it seems, though.


> which was covered above

I don't really see how the backpointer thing could work, I find that it too will be limited, but I'm probably just wrong here.

> What I'm talking about is doing it within the language and automatically.

Once the language is verified, more user-friendly abstractions over the verification can be built.


I'm talking about simple type constraints. Think Eiffel, not F* - entry and exit conditions for sections where there is temporary unsafety, and constraints on types for persistent unsafety. Universal quantifiers only, and decidable theories only. Dafny's level of prover.


You can implement a hash map safely on top of an array; indeed you could do it with `Box<[T]>` today. You would just miss the optimization where missing buckets can be represented by zero in the "hash value" field rather than an explicit Option.


Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?

Much of the unsafe Rust code I come across looks like either premature optimization or trying to write C in Rust.


> Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?

There aren't pointers here though.

But yes, you could use NonZero<T> to mark the hash value as never-zero so that the option gets optimized. You'd still need unsafe code to construct the nonzero, but that's okay.

At least for the Hashmap I don't think the code is premature optimization; it seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.


t seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.

The number of excuses associated with unsafe code in Rust is excessive.


It's not an excuse, I just don't think that your statement that the hashmap code uses unsafe superfluously is correct.


It's a bit more complicated than marking the hash value as NonZero because there are currently two parallel arrays: one for hash values, one for (key, value) pairs. So there's nothing in the second array to indicate that an element is missing. (It used to be three arrays, with the keys and values separate, but that "optimization" was actually a terrible layout for the cache. I'd like to see a further improvement where key types that can be hashed trivially, e.g. integers, ditch the hash array altogether.)

I'm fine with the use of `unsafe` here. A hash table can be implemented using safe Rust, but you can squeeze a few percent more performance out of it by using a small amount of unsafe code, in a way that would be very difficult to make safe, without a super fancy dependent type system like ATS plus a ton of annotation overhead. The unsafe code can be vetted relatively thoroughly and is safe from the outside. That kind of practical compromise seems very much in the spirit of Rust...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: