
Lately I've been playing around with static dimensional analysis in Rust. The overall idea is similar: Use PhantomData to add a type parameter and define an empty struct for each unit. So you might end up with, say, Scalar<Newtons> or Vec3<Meters>.
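
For concreteness, here's a minimal sketch of that encoding (all names here are illustrative, not from an actual library):

    use std::marker::PhantomData;

    // Empty marker structs, one per unit.
    struct Newtons;
    struct Meters;

    // The unit parameter exists only at compile time;
    // a Scalar is just an f64 at runtime.
    struct Scalar<U> {
        value: f64,
        _unit: PhantomData<U>,
    }

    impl<U> Scalar<U> {
        fn new(value: f64) -> Scalar<U> {
            Scalar { value, _unit: PhantomData }
        }
    }

    // Adding like units compiles; adding Newtons to Meters is a type error.
    impl<U> std::ops::Add for Scalar<U> {
        type Output = Scalar<U>;
        fn add(self, rhs: Scalar<U>) -> Scalar<U> {
            Scalar::new(self.value + rhs.value)
        }
    }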

Dividing and multiplying units statically is where I've had trouble so far. I think I've found a way, but it would depend on negative trait bounds, as discussed here:

https://github.com/rust-lang/rfcs/issues/1053

Ideally, I'd like to be able to do something like this:

    let a: Scalar<Joules> = Scalar::new(2.0);
    let b: Scalar<Seconds> = Scalar::new(3.0);
    let c: Scalar<Watts> = a / b;
    // Watts is a type synonym for Over<Joules, Seconds>.
    // Other derived units would use Times<U, V>. E.g.:
    type Pascals = Over<Newton, Times<Meter, Meter>>;
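
Division itself is easy to sketch. Building on the Scalar sketch above, a Div impl can always produce an Over<U, V> (Over being another hypothetical marker type):

    use std::marker::PhantomData;
    use std::ops::Div;

    struct Over<U, V>(PhantomData<(U, V)>);

    impl<U, V> Div<Scalar<V>> for Scalar<U> {
        type Output = Scalar<Over<U, V>>;
        fn div(self, rhs: Scalar<V>) -> Scalar<Over<U, V>> {
            Scalar::new(self.value / rhs.value)
        }
    }

The hard part is canonicalization: getting differently nested products to unify, and canceling units that appear in both numerator and denominator. That's where the negative trait bounds mentioned above would come in.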


If you aren't aware of it, the dimensional library for Haskell does this. https://hackage.haskell.org/package/dimensional


Yep! That was my inspiration. I'd love to be able to do this in Rust. Soon, hopefully!


You should take a look at its source code. You'll see that the path you're taking is not the path that library takes.


But what makes Times<Times<Newton, Meter>, Meter> the same as Times<Newton, Times<Meter, Meter>>?


Canonicalization is tough, and requires you to define some common ordering on your units.

Systems will use an array of unit powers, so that if the array were defined as <Joules, Seconds, Newtons, Meters>, then acceleration would be <0,-2,0,1> and watts would be <1,-1,0,0>. Addition and subtraction require that your arrays are equal, and multiplication and division are pairwise additive/subtractive.
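
A value-level sketch of that scheme (doing it at the type level would need integer type parameters, which Rust doesn't have yet):

    // Exponents over the basis <Joules, Seconds, Newtons, Meters>.
    #[derive(Clone, Copy, PartialEq, Debug)]
    struct Dim([i8; 4]);

    const ACCEL: Dim = Dim([0, -2, 0, 1]); // meters per second squared
    const WATTS: Dim = Dim([1, -1, 0, 0]); // joules per second

    // Multiplying two quantities adds their exponents pairwise;
    // dividing subtracts them.
    fn mul(a: Dim, b: Dim) -> Dim {
        let mut out = [0i8; 4];
        for i in 0..4 {
            out[i] = a.0[i] + b.0[i];
        }
        Dim(out)
    }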


Is it tough? There are only seven fundamental units. https://en.wikipedia.org/wiki/SI_base_unit

Just represent every unit in terms of them, and then it's good.

One problem with the array you came up with is that the units are not orthogonal, since Joules = Newtons * Meters.


Actually, there are more "kinds" of units: radians (versus degrees) and steradians come to mind.


These are dimensionless derived units in SI (m/m and m^2/m^2, respectively).


Sorry! I never really spent much time in the physical sciences, so I didn't know that. You would obviously not want to pick Joules as a fundamental unit.


Will the Rust compiler ever understand numbers in types? I.e. will the numbers ever be more than just part of the string that is the type name? If not, then I don't know how powers would be possible. Maybe the solution would be to define some types for commonly-used powers, e.g. PowN4, PowN3, PowN2, PowN1, Pow2, Pow3, Pow4. Users who needed higher powers could define those types themselves, I guess.
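
Something like this, say (hypothetical names, same PhantomData trick as before):

    use std::marker::PhantomData;

    struct Meters;
    struct Seconds;

    // One marker type per commonly-used power.
    struct Pow2<U>(PhantomData<U>);  // U^2
    struct PowN1<U>(PhantomData<U>); // U^-1

    type SquareMeters = Pow2<Meters>;
    type Hertz = PowN1<Seconds>;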


We do desire type-level integers, yes. There hasn't been an RFC yet, though.


Great question. I believe each unit struct would need to implement PartialEq or something similar. That would define the canonical nesting order. Alphabetical would make sense.

We would need to resolve the nesting during multiplication and division. For example:

    let a: Scalar<Times<A, B>> = Scalar::new(10.0);
    let b: Scalar<Times<A, Times<B, C>>> = Scalar::new(5.0);
    let c: Scalar<Times<A, Times<A, Times<B, Times<B, C>>>>> = a * b;
    
Another challenge is that the type of c is ugly. But this could be mitigated by generous use of type synonyms.


> Hasn't every template language in the world has come to a similar conclusion?

No, not even every mainstream template language. For example, I still use ERB and EJS extensively, and so do many other engineers. ERB and EJS use the control flow constructs of the underlying languages (Ruby and JS, respectively).

There is of course debate about whether templates should expose the full power of a programming language. I have tried templating languages that do and ones that don't. For now, I'm sticking with templates that let me mix in arbitrary code as I see fit. Could I abuse that power and make a mess? Absolutely. But I try not to, and my code stays pretty maintainable.


I'm skeptical that 3d printed objects could replace very much of what we now obtain from factories. In industrial design, the materials and fabrication process are (supposed to be) selected very carefully with an eye towards cost, physical properties, chemical properties, etc. Even my phone case, which might be the simplest product I own, was made not just from any old plastic, but from a very particular plastic that's optimized for phone cases.

Whereas with 3d printing, we're currently stuck with a kind of plastic that's optimized for compatibility with a 3d printer. It's unclear whether we'll ever be able to print from other, better plastics. And that's saying nothing of other materials like steel, glass, and ceramic, which are often necessary in useful consumer products.


> Or does it truly rely on blackbox obfuscation?

The client ultimately has to decrypt the data somehow. So the key is there on the client. I take it obfuscation is the only thing standing between the user and that key. Am I correct about that?

Which makes me wonder: How much security does HTML5 DRM really provide? Security through obscurity is a very weak defense, and one that is almost invariably defeated sooner or later. Will this really prove a hindrance to piracy in the long run?


This isn't security through obscurity, unless the DRM implementation being a secret actually does provide security. I doubt it does, beyond the fact that an audit of the source could probably find a load of security issues.

Of course an audit of OpenSSL would do the same.


*unless the DRM implementation being a secret actually does provide security.*

It does, because it is illegal (at least in the US) to reverse engineer it.


> This isn't security through obscurity

I don't necessarily disagree. But how, other than through obscurity, does HTML5 DRM inhibit copying, given that the client possesses the decryption key? (Let's assume the would-be attackers aren't dissuaded by any laws that might apply.)


Thanks so much for working on the grammar docs! This is much needed and very helpful, considering how quickly Rust's grammar has been evolving.


You're welcome! <3


Rust has algebraic data types. They're one of Rust's most important features.

But algebraic data types do not in themselves provide a way to express the notion of a type whose variants are themselves types, i.e. sub-types.


I'm hoping the next post will talk about this use case: Suppose I have an enum called "Canine" and I want each of its variants to implement a different "bark" method. Currently, as far as I know, I have to write a match statement dispatching "bark" to each "Canine" variant. If I have "bark" and "growl," I have to write two match statements, and so on for each method that needs to be dispatched to different variants.

So it's a lot of boilerplate. I think it can be slimmed down with a macro, but still.
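
Concretely, the boilerplate looks something like this (Husky and Beagle are made-up variants):

    enum Canine {
        Husky,
        Beagle,
    }

    impl Canine {
        // Every new method repeats the same match over every variant.
        fn bark(&self) -> &'static str {
            match *self {
                Canine::Husky => "Woo!",
                Canine::Beagle => "Arooo!",
            }
        }

        fn growl(&self) -> &'static str {
            match *self {
                Canine::Husky => "Grrr.",
                Canine::Beagle => "Grrrrrr.",
            }
        }
    }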

You might think traits rather than enums are the way to go here. Sometimes that may be true. But often, an enum is far preferable because in Rust, a trait is not a type, but an enum is. That means you cannot, for example, have a "Vec<CanineTrait>," but you can have a "Vec<CanineEnum>."


A trait is a type, but the type does not have a size, which is the reason it is not possible to store a trait instance directly in a Vec (or any other place needing a sized type). It is possible to store a pointer to a trait object in a Vec, which would look like Vec<Box<CanineTrait>>.
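
For example (a sketch in current Rust, where the trait-object type is written dyn CanineTrait):

    trait CanineTrait {
        fn bark(&self) -> &'static str;
    }

    struct Husky;
    impl CanineTrait for Husky {
        fn bark(&self) -> &'static str { "Woo!" }
    }

    struct Beagle;
    impl CanineTrait for Beagle {
        fn bark(&self) -> &'static str { "Arooo!" }
    }

    fn main() {
        // The Vec is sized because it stores pointers,
        // not the unsized trait type itself.
        let pack: Vec<Box<dyn CanineTrait>> = vec![Box::new(Husky), Box::new(Beagle)];
        for dog in &pack {
            println!("{}", dog.bark()); // dispatched through the vtable
        }
    }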


Ah, my mistake. So would I be correct in saying that a trait is a type, but it does not necessarily implement the Sized trait, and if it does not, then it cannot be allocated directly on the stack?

Also, I've been wondering something about dereferencing a Box where the inner type is just a trait. If I dereference a Box<CanineTrait> and call "bark," how is the correct implementation found at runtime? Is there a vtable or something analogous?


Trait object pointers are "fat": they consist of both a pointer to the object and a vtable pointer (much like Go interfaces, if you've used those). Other DSTs are similar: pointers to slices consist of both a pointer to the elements and a count.
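
You can observe the fatness directly (pointer sizes assume a 64-bit target):

    use std::mem::size_of;

    fn main() {
        println!("{}", size_of::<&u64>());                 // 8: thin pointer
        println!("{}", size_of::<&dyn std::fmt::Debug>()); // 16: data + vtable
        println!("{}", size_of::<&[u64]>());               // 16: data + length
    }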


> If I dereference a Box<CanineTrait> and call "bark," how is the correct implementation found at runtime? Is there a vtable or something analogous?

I think a vtable. It's my understanding that "trait objects" (boxed traits) are how you implement dynamic dispatch in Rust. If instead you were talking about a `fn foo<T: CanineTrait>(x: T) { ... }` trait bound (the more common case), there will be monomorphization and static dispatch.

Source: http://doc.rust-lang.org/1.0.0-beta/book/static-and-dynamic-...
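
A side-by-side sketch of the two forms (Speak is a made-up trait):

    trait Speak {
        fn speak(&self) -> &'static str;
    }

    struct Dog;
    impl Speak for Dog {
        fn speak(&self) -> &'static str { "woof" }
    }

    // Static dispatch: a monomorphized copy is compiled per concrete T.
    fn shout_static<T: Speak>(x: &T) -> String {
        x.speak().to_uppercase()
    }

    // Dynamic dispatch: one compiled body; the method is found via the vtable.
    fn shout_dynamic(x: &dyn Speak) -> String {
        x.speak().to_uppercase()
    }

    fn main() {
        let d = Dog;
        println!("{}", shout_static(&d));
        println!("{}", shout_dynamic(&d));
    }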


After the beta landed, I re-did the TOC for the book, and so this information will be at http://doc.rust-lang.org/nightly/book/trait-objects.html in the future. (where nightly = 1.0.0, of course)


Strong typing on the web has been an intractable problem for me so far. Sure, I can have strong typing in my server-side code. But so many errors result from the interaction between the server, CSS, HTML, and JS. For example, you define a route at the path `/apples` but send an AJAX request to `/oranges` instead. Or you write `<div class="apples">` but query it with `div.oranges` instead. These are very much like type errors or name errors, except they occur at the boundaries of languages and processes.

Have you worked out a way to catch these sorts of things at compile time? If not, do you think it's possible in the framework of the future?


The examples you give don't seem to be typing problems, they seem to be wrong-value problems. They might incidentally also involve typing issues (e.g., "/oranges" might not exist or might be an endpoint with a different signature than "/apples"), but that doesn't seem to be the central problem in any of the examples.

> Have you worked out a way to catch these sorts of things at compile time? If not, do you think it's possible in the framework of the future?

To the extent that they are typing problems, it would seem conceptually possible to catch them through a strongly typed language and framework that abstracts all the underlying technologies, compiles to a combination of backend executable(s) and front-end HTML, JS, and CSS, and includes all the routing for both ends.

Actually building such a beast would seem to be a non-trivial engineering challenge.


> The examples you give don't seem to be typing problems, they seem to be wrong-value problems.

They're like type or name errors because the "apple" and "orange" here are like identifiers, not data. Sure, to the browser, they're data. But in terms of the structure of the web application, they're identifiers like variables, function names, or types.

For example, the HTTP endpoint "/apples?count=5" is like a function "apples(int count)."
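
To make that concrete in Rust (a purely hypothetical sketch, not any real framework): if the endpoint were a typed function rather than a string scattered through the codebase, the "/oranges" typo would fail to compile:

    struct ApplesQuery { count: u32 }

    // The one place the path string lives; all callers go through this function.
    fn apples_url(q: &ApplesQuery) -> String {
        format!("/apples?count={}", q.count)
    }

    fn main() {
        let url = apples_url(&ApplesQuery { count: 5 });
        // oranges_url(...) would be a compile error: no such function.
        println!("{}", url);
    }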

> Actually building such a beast would seem to be a non-trivial engineering challenge.

It certainly would. That's why I'm wondering if you consider it possible.


"[I]t would seem conceptually possible to catch them through a strongly typed language and framework that abstracts all the underlying technologies and compiles to a combination of backend executable(s), and front-end HTML, JS, and CSS, and includes all the routing for both ends"

That would certainly do it, but I think all you need is some definition of interface that you can check your code against on both sides. This could be generated by one side and consumed by the other, or produced directly by the programmer(s) and consumed by both. You would need some means of actually checking your code against the specification on the consuming side(s), but they needn't be part of some broader framework (beyond the trivial sense in which they already are).


Sure, you can do that; but the problem is that you then have to worry about type system mismatches between the interface definition language and the back- and front-end application languages.

There have been lots of things that do something like this: SOAP and the associated WS-* standards are probably the best known.


Very true, and certainly still a big undertaking, depending a little on how well the type systems at either end line up.


Haskell has some typesafe template languages. I'm not a huge fan of them, tbh, as they're kind of rough at the moment.

More promising in my opinion is the fact that Javascript is becoming an increasingly popular backend for Haskell via GHCjs which will give a great space for building type-checked front ends which have all the guarantees you like. For instance, type checked routes already exist which prevent you from writing the wrong endpoints or sending invalid typed data to them... these can be transparently extended to the frontend without much more difficulty.


While a bit rough around the edges, the full type safe server-client stack can be done in Scala with Play[0] + Scala.js[1] + ScalaCSS[2]

I say rough because despite Scala.js' fantastic performance characteristics, you're looking at 100kb file size off the bat; from there generated code size is reasonable, but that's a pretty big hit, particularly for cache challenged mobile clients.

Otherwise, being able to tie Play's type safe reverse routing into the client is a big win. Previously, with a GruntJS + CoffeeScript approach, I'd get a small file size but a complete lack of safety; just winging it with `routes.user.maybeNotExist(id)`.

[0] https://github.com/playframework/playframework/ [1] https://github.com/scala-js/scala-js [2] https://github.com/japgolly/scalacss


> errors result from the interaction between the server, CSS, HTML, and JS

This may be true, but I'm not sure that spending resources trying to solve those problems is the best use of them.

I would rather be happy with a strict separation between the front-end and the server than try and deal with such an impedance mismatch and the framework cruft that generates.

I guess it just seems overly ambitious to me... finding the right abstraction for the server is difficult enough without polluting it with the front-end.

It seems to me people are very productive in other languages that don't tightly bind the front-end code to the server; why spend time solving problems that are more incidental than essential?


> I guess it just seems overly ambitious to me... finding the right abstraction for the server is difficult enough without polluting it with the front-end.

Certainly. That's why I'm skeptical that this will ever happen.

> why spend time solving problems that are more incidental than essential?

I wouldn't characterize these kinds of errors as incidental, inasmuch as they account for a very high percentage of the web app bugs I've encountered.

Designers of languages like Rust and Haskell noted that null pointer dereferences were the single largest class of errors in other languages. Thus, the designers chose to make null pointer dereferences impossible at the language level. With that choice, they turned a huge number of run-time errors, which developers often miss, into compile-time errors, which developers cannot ignore. This has proven itself beneficial to productivity and software quality.
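
Rust's version of that choice, for comparison: absence is an Option<T> value, and the compiler forces you to handle the None case (find_user is a made-up function):

    fn find_user(id: u32) -> Option<&'static str> {
        if id == 1 { Some("alice") } else { None }
    }

    fn main() {
        // There is no null to dereference; both cases must be written out.
        match find_user(2) {
            Some(name) => println!("found {}", name),
            None => println!("no such user"),
        }
    }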

So too here: If I'm correct that client-side type and name errors constitute a large fraction of all web app errors, then catching them at compile time will be a big win.

But again, I don't know how feasible this is. Nor do I know whether it would involve compiling from a type-safe language to HTML/CSS/JS or just static analysis of raw HTML/CSS/JS.


the logic of your analysis is sound. I suppose I'm just not sure that client-side type and name errors constitute a large fraction of all web app errors.

Anecdotally, the team I am on doesn't have these issues (we certainly have other issues), but I could see them being important to prevent on certain projects.


When using Haskell, it feels like I have to deal with those not-strongly-typed-environment issues more explicitly. I have to write some interface code (which surely takes some "extra" time) to pull the untyped data into Haskell; but then the weak typing is represented strongly in Haskell's types and has to be dealt with accordingly. This reduces the funny bugs that can otherwise arise from overlooked corner cases.

Main takeaway: it's like in Haskell I have to do more work up front, to enjoy much better productivity down the road.


True, but the parent commenter is getting at something important. The article suggests that researchers have found a new, much more concise way to express the solutions to difficult problems. That's different from a library, which merely packages pre-built solutions to a finite set of problems.

It's like the difference between a complete kitchen that fits in your pocket and an iPhone app that lets you order a burrito. The article suggests something like the former. A library which encapsulates 1000 lines of code into a single function call is like the latter.


Even a "kitchen that fits in your pocket" isn't general enough. A library that parses a DSL, such that you can then code in that DSL, is still a lame duck if that library, plus the encodings of all the useful solutions in the domain, add up to more code than just the solutions would be when expressed in a general language. The ROI of (learning the DSL + maintaining the library) would be negative.

On the other hand, there are things like Prolog. You can think of Prolog as a backtracking constraint-solving library, and then another library that parses a DSL for expressing facts and procedural constraints and feeds it to the first library. But Prolog's language isn't really a DSL, because it isn't particular to any domain: there's no closed solution-space where Prolog applies. The efficiency gains you get from Prolog's elision of proceduralized constraint-solution code can apply to any program you write. And so its value is unbounded; its ROI is certainly positive, whatever the cost was to implement it.

That's the comparison that's useful here, I think. Is this something that only solves problems in one domain? Or is this something that could be applied to (at least some little bits of) any problem you encounter?


Some languages are almost UNUSABLE without libraries. Looking at R, with the dplyr, ggplot2, and reshape2 packages being mandatory for my workflow.

Python also has some mandatory libraries if you want to do anything specific: NumPy and pandas are required for statistical analysis.

My code is concise and clean, but that is because of these libraries.

I am also certain that there are exceptions.


The author mentions that Rust's "Box type is an owning type" and compares Box to C++'s std::unique_ptr. Worth noting: Rust's most basic variable syntax provides unique_ptr-like behavior, so you usually don't need the Box type for this behavior. For example:

https://gist.github.com/jarrett/fe3f24c301a586efd7c1
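
In outline, the gist shows something like this (MyStruct is a stand-in):

    struct MyStruct { x: u32 }

    // Takes the struct by value: ownership moves into the function,
    // and the value is dropped when the function returns.
    fn take_ownership(obj: MyStruct) {
        println!("{}", obj.x);
    }

    fn main() {
        let s = MyStruct { x: 1 };
        take_ownership(s);
        // println!("{}", s.x); // compile error: use of moved value `s`
    }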


The default mode in Rust is value semantics, similar to C++'s default. Rust does have the advantage of move-by-default on uses and checking the validity of moves, but if you want unique_ptr behavior then you need Box. C++ has the advantage (if you would call it that) of allowing arbitrary code in move constructors.


I think he was more talking about the unique ownership part, especially that values themselves follow the same rules.

Which I find fitting, since in Rust, the Box is (at least to me) more about indirection. The single-owner semantics just come in because Box follows the same rules as anything else, while for unique_ptr they are part of the pointer concept.


Are you sure your comments are right? It seems to me like the value would be memcpy'd when you pass by value. There is not one unit of memory that is "taken over," or if there were, then Rust would have a serious problem.


It might be a terminology thing. Assuming MyStruct only contains values, it will fully live on my_func's stack and nothing needs deallocating on drop. When take_ownership is called, a copy of its value is passed in, and the original location is marked unusable (assuming MyStruct doesn't implement Copy). So yeah, there isn't any specific memory location being taken over, since there are two locations involved.


> It seems to me like the value would be memcpyed, when you pass by value.

As far as I know, the compiler should not copy in that instance. Rather, it should move.

> There is not one unit of memory that is "taken over," or if there was, then Rust would have a serious problem.

Could you elaborate on that? Rust does have an ownership model, and ownership can be transferred as in the example. What sort of problems would you expect that to cause? If you're worried that it will invalidate existing pointers, the compiler checks that for you. Unless you deliberately circumvent the check, the compiler guarantees that your pointers are valid.


Semantically speaking, the only difference between a move and a copy is that you're allowed to use a copy type afterwards, and you're not allowed to use a move type afterwards. It's still a memcpy. Of course, these may be elided by optimization passes.
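
A sketch of the distinction:

    #[derive(Clone, Copy)]
    struct Meters(f64);     // a Copy type

    struct Owned(Box<u32>); // a move type: memcpying it wouldn't duplicate the heap data

    fn main() {
        let a = Meters(1.0);
        let b = a;                   // bitwise copy
        println!("{} {}", a.0, b.0); // a is still usable

        let x = Owned(Box::new(5));
        let y = x; // the same bitwise copy, but x is invalidated
        // println!("{}", *x.0); // compile error: use of moved value
        println!("{}", *y.0);
    }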


> Semantically speaking, the only difference between a move and a copy is that you're allowed to use a copy type afterwards

Are you speaking about Rust specifically, or move in general? I had always understood that move was no more expensive than passing by reference. That is, I had thought the memory was on the heap and didn't need to be copied each time someone new took ownership of that heap space.


I mean in Rust.

> That is, I had thought the memory was on the heap

An example:

    let x = Box::new(5);
    let y = x;

While the 5 is allocated on the heap, when we move x to y, _the pointer itself_ is memcpy'd. That's why Box<T> isn't Copy; as you say, a simple memcpy won't actually duplicate the structure. Make sense?

(and in this case, I'd assume LLVM's optimizations would realize the copy is superfluous and just elide it, but semantically, that's what's up)


Oh, agreed. But how about this:

    let x = BigExpensiveStruct::new();
    some_function(x);

That won't trigger a big, expensive memcpy of the BigExpensiveStruct, will it? I'd thought that its memory was on the heap.


If that's just a struct, it's stack allocated. So it's not on the heap in the first place. IIRC, LLVM may optimize passing it to the function by reference, though.


Good to know. Thanks! So I guess the moral is, if you have a big, expensive struct, make sure the expensive part is in a sub-structure you know is heap allocated, such as a Vec. E.g.:

    struct Expensive {
        cheap_value: u32,
        expensive_value: Vec<u32>,
    }

Does that seem like a good maxim?


That is not a big expensive struct (to memcpy). (edit: I'm just bickering about what to call things, what you say is right.)

Your intuition about the performance of this sort of thing might be served by reading about how calling conventions work. You might not need to copy the struct from a local variable to where it belongs on the stack or in registers when calling a function, either because the calling convention says you put a pointer to the struct somewhere (depending on its size) or because you're (you being a compiler) clever enough to put it in the place that'll end up in the argument list later. The callee, however, has less flexibility, and if it needs to pass the value somewhere else, it'll probably have to copy the data. This is way better than allocating it on the heap -- stuff on the stack is in the L1 cache, and you compute its location statically instead of having to traverse data structures -- but yeah, if you found yourself copying around a 1000-byte struct you might want to box it or pass it by reference. I only know about C and C++ calling conventions though, so don't infer from my comment that Rust isn't doing anything "special" for big non-copyable structs -- I wouldn't know.


Rust (well, LLVM really) will automatically pass any "large" struct by reference, which in practice is going to be anything more than a few words (I think in the current implementation, it might actually be two). Unless the function call is inlined, of course, in which case LLVM can do pretty much whatever it wants.


Well, it depends. I mean, generally, you want to give flexibility to your users. Maybe I _do_ want to stack allocate an ExpensiveStruct for some reason. You can always Box the struct to have it on the heap.
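
For the heap-allocated version, a self-contained sketch using the Expensive struct from upthread:

    struct Expensive {
        cheap_value: u32,
        expensive_value: Vec<u32>,
    }

    fn main() {
        // The struct itself now lives on the heap; moving `on_heap`
        // around copies only the one-word Box pointer.
        let on_heap: Box<Expensive> = Box::new(Expensive {
            cheap_value: 1,
            expensive_value: vec![0; 1_000_000],
        });
        println!("{}", on_heap.cheap_value);
    }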


> Could you elaborate on that?

If the value isn't passed by (possibly elided) copying, that would mean it's getting allocated on the heap and deallocated.


Yes, I had understood the struct's memory to be on the heap. My thinking was that keeping the memory on the heap allows for inexpensive moves. Whereas the pointer to the heap space may indeed be on the stack, as far as I know.

But I could be mistaken. This is all based on hearsay--just stuff I've read about the Rust compiler. I don't actually work on the compiler myself.


His comment is right in that all of Rust's values do have move semantics by default (checked by the compiler), that's not specific to Box<T>.


The Rust documentation on ownership shows examples of functions with reference parameters and Box parameters, but nothing like the OP's `fn take_ownership(obj: MyStruct)` example. Is that an oversight in this page of the Rust docs?

http://doc.rust-lang.org/nightly/book/ownership.html


I think it is an oversight: Box isn't particularly special with how it handles ownership, but people do get the wrong impression that Box is the way to have uniquely owned data, whereas it is actually the default. This happens less now that Box doesn't have special syntax, but I agree with you that we may be able to improve those docs in this respect.

Unfortunately, it can be hard to have an example that's simple enough to understand, while still emphasising the right concepts. Hopefully we can find something.


> nothing like the OP's `fn take_ownership(obj: MyStruct)` example.

It uses Box to demonstrate ownership in the first section, but that works the same way with any Rust type unless said type has opted into copy semantics. There's nothing special or magical about Box (at least in that respect); it's just a wrapper for a raw pointer.


I said comments.

