Such a Little Thing: The Semicolon in Rust (pocoo.org)
172 points by joeyespo on Oct 18, 2012 | 74 comments



Is that really some "special behavior"? As far as I understand, the semicolon is just an expression separator, like in Erlang and Pascal. In contrast, the semicolon is a statement terminator in C. C uses the comma for expression separation. Practically everything is an expression in Rust, even blocks, so

  foo()       => evaluates to the return value of foo
  { foo() }   => evaluates to the return value of foo
  { foo(); }  => evaluates to nil
The last case is not a special case. The last expression in the block determines the returned value. There is an expression separator in there, so there must be two expressions which are separated by it. The first one is foo() and second one is ... wait for it ... the empty expression. Which value should EmptyExpression have? Of course: nil, which would be called void in C-land.
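
In present-day Rust syntax, where nil is spelled `()`, the three cases can be sketched roughly as follows. `foo` and the two wrapper functions are placeholders invented for illustration, not anything from the article.

```rust
// Hypothetical stand-in for any function that returns a value.
fn foo() -> i32 {
    42
}

// A block whose last expression has no trailing semicolon
// evaluates to that expression.
fn block_without_semicolon() -> i32 {
    let value = { foo() };
    value
}

// With a trailing semicolon the result of foo() is discarded
// and the block evaluates to the unit value `()`.
fn block_with_semicolon() {
    let value: () = { foo(); };
    value
}

fn main() {
    println!("{:?}", block_without_semicolon());
    println!("{:?}", block_with_semicolon());
}
```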


There was some discussion about doing this in D, and there was some in C++11, too. I argued that the presence or absence of the ; did not visually stand out very well, and so would be a source of confusion and errors.

Hence, ; remained as a statement terminator, and the return keyword served to indicate returning an expression.

However, the D lambda syntax does not require a ; when the lambda body consists of only a single expression:

http://dlang.org/expression.html#Lambda

and in practice this has turned out to be well liked.


Ignoring a potentially wanted return value with ";" is a hazard. I'd like to institute some set of "warn-unused-result" warnings in Rust to combat this.
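
Rust did later gain something along these lines: the `#[must_use]` attribute makes the compiler warn when a result is dropped, including by a stray `;`. A minimal sketch, with `parse_port` as a made-up example function:

```rust
// Hypothetical function; the attribute asks the compiler to warn
// whenever a caller ignores the returned value.
#[must_use]
fn parse_port(text: &str) -> Option<u16> {
    text.trim().parse().ok()
}

fn main() {
    // Writing `parse_port("8080");` on its own line would trigger an
    // `unused_must_use` warning, because the `;` throws the value away.
    // Binding the result counts as using it.
    let port = parse_port("8080");
    println!("{:?}", port);

    // Explicitly discarding the value is still possible and silences the lint.
    let _ = parse_port("not a port");
}
```

This is a warning rather than an error, so it nudges instead of forbidding.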


Wouldn't the type system complain that you're returning nil in a function that is not declared/inferred as returning nil?


This is correct for any "normal" (a.k.a. "item level") function, since their signatures must always be explicit. But you're allowed to infer the signatures of closures, which is the only place where this problem could arise.

  // This function lives at the top level of the module, so its types must be annotated
  fn foo(bar: int) -> int {
      // Here's a closure without type annotations:
      let baz = |qux| { qux + 1 };
      // And a closure with type annotations:
      let baz = |qux: int| -> int { qux + 1 };
      // Which means that this would be a compile-time error:
      let baz = |qux: int| -> int { qux + 1; };  // returning nil, but expected int
      // But if you expect this to return int, then there are very few cases where
      // this might not be a compile-time error:
      let baz = |qux| { qux + 1; };
      log(error, baz(bar));  // will print "()", but did you want bar + 1?
      io::println(fmt!("%?", baz(bar)));  // same as previous
      io::println(fmt!("%d", baz(bar)));  // this one *is* a compile-time error
                                          // macros ftw
      baz(bar);  // this will also be a compile-time error
  }
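
For readers more familiar with current Rust, here is a rough translation of the closure cases above. The syntax has changed a great deal since this comment was written (`int` is now spelled `i32`, for example), so treat it as a sketch under that assumption; the function names are invented.

```rust
// Closure with an inferred return type and no trailing semicolon:
// it evaluates to `qux + 1`.
fn call_without_semicolon(bar: i32) -> i32 {
    let baz = |qux: i32| { qux + 1 };
    baz(bar)
}

// Same closure with a trailing semicolon: the sum is computed and
// discarded, and the return type is inferred as `()`.
// The compiler accepts this, though it warns that the sum is unused.
fn call_with_semicolon(bar: i32) {
    let baz = |qux: i32| { qux + 1; };
    baz(bar)
}

fn main() {
    println!("{:?}", call_without_semicolon(1));
    println!("{:?}", call_with_semicolon(1));

    // Annotating the return type turns the mistake into a compile-time
    // error (mismatched types), so this line is left commented out:
    // let baz = |qux: i32| -> i32 { qux + 1; };
}
```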


What if foo() could return nil in some cases?


Then you need an algebraic data type that has nil as one of its constructors, and the function must return that type.

AFAIK, Rust doesn't have nil. So, the closest thing is using the option type.


In Rust, nil is the name for the unit type (written as empty parentheses, like "()"), used whenever a function doesn't return anything meaningful. You're correct in that instead of null pointers it has the None variant of the Option enum.
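
A small sketch of both ideas in current Rust syntax: a function returning the unit type, and a function returning `Option` where another language might return a null pointer. `log_message` and `safe_divide` are hypothetical names chosen for the example.

```rust
// A function with no meaningful result returns the unit type `()`.
// Writing `-> ()` is optional; it is spelled out here for emphasis.
fn log_message(message: &str) -> () {
    println!("{}", message);
}

// A function that may have no answer returns `Option` instead of
// a null pointer.
fn safe_divide(numerator: i32, denominator: i32) -> Option<i32> {
    if denominator == 0 {
        None
    } else {
        Some(numerator / denominator)
    }
}

fn main() {
    let unit: () = log_message("hello");
    println!("{:?}", unit);

    // The caller has to handle both variants explicitly.
    match safe_divide(6, 3) {
        Some(quotient) => println!("quotient: {}", quotient),
        None => println!("cannot divide by zero"),
    }
}
```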


An understandable point of view.

Though I can think of an even more crazy variation. Declare the semicolon a binary operator and allow overloading it.

As funny as that sounds, Haskell provides something like this, as do-notation can behave differently depending on the monad it is in.


Your post made me check it, and you CAN overload "operator," in C++ apparently :)

http://en.wikibooks.org/wiki/C%2B%2B_Programming/Operators/O...

My mind is blown. You can write bottom-up code in C++ :)

EDIT: no, you still can't: the expressions are evaluated first, before calling "operator,"


Although everything you say is technically true, readers will tend to get the wrong idea unless we add that (1) most users of the do notation choose Haskell's significant white space rather than semicolons and (2) the (rare) Haskell code that does contain semicolons that behave the way you describe probably also contains semicolons (e.g., in Haskell's let statement and case statement) that have nothing to do with monads or the "sequencing" of side effects.

In other words, the semicolons of Haskell are only tenuously related to semicolons in languages like C.


It's odd because people think of semicolons as the end of a statement, not a delimiter between statements.


Which is usually wrong.


That's not the point. Most programmers are not language lawyers. It is very bug-prone, which could in itself be enough reason to remove the confusion from a language that advertises itself as "safe".


Bug prone? The compiler will inevitably warn you about your error at compile time. No such bug will have any consequence beyond that.

This is also not such a big problem when a human reads the function. We can always see at a glance the return type of the function. Either it's explicitly declared, or it's a lambda whose usage is readily visible. Semicolon or not, you can easily guess if the function is supposed to return its last expression, or not.

By the way, the compiler could do the same. Knowing that, there probably will be helpful error messages such as "did you forget the last semicolon?", or "should you remove the last semicolon?".

The semicolon is really just a small confirmation. That's why they didn't choose a heavier syntax.


Ok, I misunderstood then. I thought it would subtly return no value instead of a value, and you'd only discover this at run time. That would be bug prone. If it simply generates a compilation error, no problem.


The thing is that in a safe language, an expression always evaluates to a value of the type of the expression[1]. E.g. in Haskell, if a function returns a list:

  foo :: Int -> [a]
It should always evaluate to a list. There is no such thing as a null pointer. The only thing that comes close to a nullable type is an option type such as Maybe:

  data Maybe a = Just a | Nothing
The function

  bar :: Int -> Maybe [a]
can either evaluate to a Just [a] or Nothing.

[1] Actually, I lied a bit, since there is bottom: http://www.haskell.org/haskellwiki/Bottom But bottom does not fulfill the same role as, say, null pointers.


That's what makes it elegant: it lets you omit the return statement (safely) but otherwise looks exactly the same as C, even though it's semantically different.


parsimony =/= elegance


Uh, it depends on taste. There are a few reasons to want it, such as lambdas as WalterBright mentioned, but I like it because when this is done in a language where statements and expressions are unified, ";" can neatly replace "," from C, and comma can be reused for something else. (Unifying statements and expressions has its own benefits, but we can start with avoiding GNU C's hideous "({ foo; bar; })", which evaluates to the value of bar...)


I memorably referred to Rust's significant semicolons as "the worst thing in the language" after my very first read through the tutorial, last November.

Almost a year later, I'm just as in love with Rust's semicolon rules as Armin is. However, I bet new users will still be just as instinctively revolted as I initially was.


I guess it depends on what languages you're experienced with. I just read through the Rust tutorial for the first time 2 days ago, and I fell in love with the semicolon rule at first sight.


    The downside is that you would have to put () (Rust's version of “nil”) in a bunch of functions to fulfil the requirements of the callback's signature since otherwise the type inferred from the function would be the value of the last expression
Wouldn't co/contra-variance solve this entirely? It works just fine in Scala for example

    scala> def runTwice(f: () => Unit) = {
         |  f()
         |  f()
         | }
    runTwice: (f: () => Unit)Unit

    scala> runTwice{ () =>
         |  println("moo")
         |  1
         | }
    moo
    moo
Note how it expects a function that returns Unit. I'm passing in a function that returns an Int (1), but the compiler is perfectly happy.


I implemented basically this in Rust, and it was overwhelmingly rejected by the community at the time. People seem to like the strict typechecking.
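
For comparison, a sketch of the same experiment in current Rust syntax, where the closure passed in really must return `()`. The name `run_twice` is borrowed from the Scala example; `collect_sounds` and the set are only there to have something observable.

```rust
use std::collections::HashSet;

// Accepts only closures that take no arguments and return `()`.
fn run_twice(mut f: impl FnMut()) {
    f();
    f();
}

fn collect_sounds() -> HashSet<&'static str> {
    let mut sounds = HashSet::new();
    run_twice(|| {
        // `insert` returns a bool. The trailing semicolon discards it,
        // so the closure returns `()` as `run_twice` requires. Dropping
        // the semicolon would make the closure return bool, and the
        // call would fail to type-check.
        sounds.insert("moo");
    });
    sounds
}

fn main() {
    println!("{:?}", collect_sounds());
}
```

Here the semicolon does the job that value discarding does in the Scala version, but the programmer has to ask for it.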


Are semicolons annoying to type? Probably, I got used to them

The semicolon is the least-annoying non-letter character to type. It's right there on your home row.


Hello American. I'm from one of those other pesky countries that make up most of the human population of earth, and which usually have other keyboard layouts. I have to type shift+, to get the semicolon.


A potential workaround is to always use an American keyboard. It's what I do.


Indeed. On a Swedish keyboard the semicolon is not the biggest problem though. {}[] are, since you need to press right-alt+number keys to get them. Really awful. It takes a very short time using an American keyboard for coding to realize its benefits.

Still, even on an American keyboard () are not terribly well placed.

I've been thinking about adopting the NEO2 layout (with some modification so that I get all of åäö) but haven't gotten around to it yet. Has anyone tried that for programming?


Hej!

I have all my symbol characters easily accessible from the normal A-Z keys using a third shift state. It's awesome, and works with any keyboard layout (QWERTY, or even Swedish QWERTY are all fine, as is Dvorak, Colemak etc). Typing a previously uncomfortable sequence like "for (int i = 0; i < count; ++i)" is even pleasant now.

It takes about two weeks to get used to. If you try it, make sure the third shift state can be accessed from either hand (i.e. you have to have left and right keys, just like with shift or control).

I wasn't aware of NEO2 so I developed my own symbol layout using a genetic algorithm running over my code corpus, then adjusting to taste.


Another solution is xmodmap in Unix or some similar tool in Windows to change the keyboard layout. With that solution you don't have to miss the default national keyboard layout.


I feel your pain, I have to jump through all kinds of høøps to get some characters on my keyboård.


I switch from my native layout to the US layout for coding.


Which keyboard layout is that?


Czech; German; Polish; French; Spanish; Italian; Portuguese; etc etc etc.


That being said, there are also programmer's layouts specific to single languages which among others have a semi-colon on the home row.


I've never seen a QWERTZ keyboard in Poland, and I live there. Typewriters did indeed use this layout, but not many people use them to program.

"practically all computers (except custom-made, e.g., in public sector and some Apple computers) use standard US layout (commonly called Polish programmers layout, in Polish: polski programisty) with Polish letters accessed through AltGr (AltGr-Z giving “Ż” and AltGr-X giving “Ź”)." http://keyboard-layout.info/#Polish


There's no need to use shift to get a semicolon on a french keyboard.


Norwegian. Many European countries use the same placement for semicolon. All the Scandinavian and Baltic countries, Germany and the Netherlands, to name a few.

Russia uses shift+4. Can't imagine how annoying that is.


The Russian keyboard layout does not have any Roman letters, so apart from Soviet-developed languages like http://en.wikipedia.org/wiki/Rapira, which were obscure even in their time, reaching for shift+4 for ';' is going to be the least of your concerns if you try to write code using the Russian layout.


Nitpick: Almost. There is a Dutch keyboard layout, but nobody uses it. Nobody, as in, there are more people in the Netherlands who use Dvorak than people who use the Dutch keyboard layout.

The Dutch commonly use the US keyboard layout, which makes sense because Dutch and English have the same alphabet.


here's a bunch (scroll down)

http://keyboard-layout.info/


Swedish is one such keyboard.


I am going to pay more attention to http://en.wikipedia.org/wiki/Keyboard_layout after this thread!


That sounds really tough. An extra key press. I bet that wastes like, 10 seconds a day.


Is it just me or is this kind of ugly in the first place?

We're trying to mix logic that executes on each element in a sequence with logic that controls how to iterate over that sequence. That is, a "return 42" statement will tell .each to stop iterating, but it's contained within a block that is supposed to do things to individual elements.

The mathematical concept just doesn't sit right with me.. I guess if all you have as an iterator is 'each' then that would necessitate finding an additional way to modify iteration in some way, but still.. I don't like the fact that a return statement can break out of something outside of its own scope

edit: even the low-level alternative seems nicer

    for(blah;blah;blah) { 
        stuff;
    }


I'd just like to express my appreciation for the depth of this article. I always learn something from Armin Ronacher.


I still don't understand why they couldn't just introduce a new keyword that has the same meaning as a missing semicolon. Call it 'ret' or something as a 'lesser return'. Relying on ppl to be semicolon hunters is poor for readability.

It also means you can't put more expressions on the same line because doing so requires a semi-colon which then eliminates the special semicolon behavior. Having an explicit 'ret' keyword means you could accomplish that and have more expressions for a separate block on the same line if desired.


Take a look at Haskell for a very clean and expressive syntax without (required) semicolons.


I'd rather have significant semi-colons than significant whitespace[1]. Don't mix presentation and semantics.

1: http://c2.com/cgi/wiki?SyntacticallySignificantWhitespaceCon...


I'd rather have the ability to choose between and freely mix significant whitespace and significant semi-colons within a single source file.


No you wouldn't. See Javascript for all the pain that causes. Language syntax needs to be dictatorial so that the language is consistent for reading (which is more important than writing).


JavaScript's problem isn't that you can choose whether or not to use semicolons, it's that they're automatically put in for you if you omit them.

http://lucumr.pocoo.org/2011/2/6/automatic-semicolon-inserti...

Technically, Python is a much better example of optional semicolons since you can use them and they're not required. Of course, the real reason they exist is for compound statements.

http://stackoverflow.com/questions/8236380/why-is-semicolon-...


Just like Python, Haskell only ever has significant indentation.


Untrue. While it's true that most people use the 'layout' rule, which is the significant whitespace, there is a non-whitespace-significant variant that uses braces and semicolons. You can see an example of how layout is expanded to the alternative syntax at http://www.haskell.org/onlinereport/lexemes.html#lexemes-lay...


No, I meant that Haskell doesn't consider any whitespace that isn't indentation as significant.


Very few languages consider non-indentation whitespace as significant (not counting using whitespace as a token delimiter, which Haskell definitely does, e.g. "foo bar" =/= "foobar").


Ruby and CoffeeScript do and their syntax relies heavily on it.


For Ruby at least "heavily" is a big exaggeration. It is significant as a token separator and in cases where inserting a line break makes the expression up to the line break a valid expression in itself, in which case it is treated as one.

You pretty much only need to remember to separate tokens where lack of whitespace might create a different valid token, and ensure that any expression you want to let span more than one line ends with something that expects following tokens to make a complete expression.

Personally I can live with that easily. I can not live with significant indentation, on the other hand...


`foo()` and `foo ()` are different things in ruby, so are `foo[x]` and `foo [x]`. That is significant whitespace.


Yes, but many people assume whitespace everywhere will matter when they hear Python's offside rule described. It's a misconception I've found is worth dispelling, since almost no one minds meaningful indentation.


Actually, that's a lot of words to say that ; is a binary operator, if I'm correct.

a ; b is the operator which returns the value of b. And, there is some syntactic sugar to make a ; equivalent to a ; nil which returns nil.
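
To make that reading concrete, here is a hedged sketch that models `a ; b` as an ordinary function in current Rust. `seq` is a hypothetical helper written for this illustration; Rust itself does not define the semicolon this way.

```rust
// Hypothetical function form of `a ; b`. Both arguments are evaluated
// by the caller, left to right; the first result is dropped and the
// second is returned.
fn seq<A, B>(_first: A, second: B) -> B {
    second
}

fn main() {
    // `a ; b` evaluates to b.
    let value = seq(println!("side effect"), 2 + 3);
    println!("{}", value);

    // `a ;` is treated as `a ; ()`.
    let unit: () = seq(1 + 1, ());
    println!("{:?}", unit);
}
```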


I'm all with the author with the Ruby love and it looks like Rust managed to implement them in a cleaner syntactic way compared to Ruby.

However, that template code was utterly hideous.


Normally in Rust you'd never use a trait like that without bringing it into the local namespace, so you wouldn't need to qualify it with its module like Armin does. And in future versions of Rust, Num will inherit from both the Eq and Copy traits, so the signature would instead look like this:

  fn find_even<T: Num>(vec: &[T]) -> Option<T> {
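
The `Num` trait from this era is not in today's standard library, so a sketch that compiles now is easier to write against a concrete integer type. This is an approximation in current syntax under that assumption, not the article's original code:

```rust
// Returns the first even element, or None if there is none.
// The early `return` leaves the whole function from inside the loop.
fn find_even(values: &[i32]) -> Option<i32> {
    for &value in values {
        if value % 2 == 0 {
            return Some(value);
        }
    }
    // No semicolon: this is the value of the function body.
    None
}

fn main() {
    println!("{:?}", find_even(&[1, 3, 8, 5]));
    println!("{:?}", find_even(&[1, 3, 5]));
}
```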


I haven't looked at Rust, but the semicolon looks like a decision to make some common verbose or "ugly" syntax less noisy. Admittedly most syntax has quirks, and this seems more like a quirk than an instance of "clever design". Ignoring the explanation of the differences between statements and expressions, the rest of the discussion is about the presence of a semicolon.


"But the alternative to semicolons is making line endings significant."

Only sorta - consider Haskell's indent rule: code which is part of an expression should be indented further than the start of that expression. Generally, this is something you should be doing to make your code readable regardless of the statement termination.


And with Rust's strong static typing, it's even easier to spot. While I love a preference for expressions, in CoffeeScript I've had to put a single null at the end of functions several times.

The presence/absence of the ; is both explicit and subtle enough to not annoy.


As an aside, the "power_it" function in Python can also be written as:

    map(lambda t: t ** 2, [1,2,3,4])
or even:

    [(lambda t: t**2)(x) for x in [1,2,3,4]]


> [(lambda t: t**2)(x) for x in [1,2,3,4]]

Why would you have that lambda?

    [x**2 for x in range(1, 5)]


My kingdom for a TL;DR!

(As a prerequisite to making its point, the article teaches many intricate details regarding two dynamic languages which even most of their practitioners probably never think about, then dives into more intricate details regarding a language which most of us have probably never even used, let alone grokked all the details of. I sense that there's something important here for me to learn, but from where I sit this article is a lot to bite off all at once.)

A tight summary by someone who understands all this would be appreciated by many readers, not just myself.


> A tight summary by someone who understands all this would be appreciated by many readers, not just myself.

A tight summary of all of it would be as long as the article. However, the point about semicolons in Rust is:

1. ; is a separator, not a terminator.

2. a;b separates a and b.

3. a; is a special case which means a;nil (or whatever is the equivalent in Rust)

4. The last expression in a function will be the return value of the function.

5. If the last line in a function is "a", it returns a. If it's "a;", it returns nil (from 3)


Perfect, thank you.


I don't think parent deserves to be modded down. The article did take a pretty long time to get to the point.

Why all this talk about Python and Ruby? Rust is not really competing with those languages-- it is quite specifically designed as a C++ replacement. They should be comparing themselves against C++, Golang, or D.

The Golang solution would just be a function which returns the next element or nil if there are no more elements. That function might be part of an interface, if you wanted to generalize it across several types. To me, this is a lot simpler than the other stuff that was discussed.
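
The same shape can be expressed in current Rust, where the standard `Iterator` trait is built around a `next` method that returns `Some(item)` or `None`. A minimal sketch, with `Countdown` as an invented example type:

```rust
// Hypothetical iterator that counts down from a starting number to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // Returns the next element, or None once the sequence is exhausted.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    // A `for` loop keeps calling `next` until it sees None.
    let countdown = Countdown { remaining: 3 };
    for n in countdown {
        println!("{}", n);
    }
}
```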

To go back to Rust specifically, rather than having magic semicolons, why not make the "break" statement take a value, which becomes the return value of the block? Despite all the tl;dr there was not much discussion of design alternatives.
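
Current Rust does allow something close to this, alongside the semicolon rule: `break` can carry a value out of a `loop`. A hedged sketch (the function name and the search itself are invented for illustration):

```rust
// Finds the smallest power of two that is >= `limit`.
// A u64 accumulator is used so the doubling cannot overflow for any u32 limit.
fn smallest_power_of_two_at_least(limit: u32) -> u64 {
    let mut candidate: u64 = 1;
    // `break value` makes `value` the result of the whole `loop` expression.
    let found = loop {
        if candidate >= u64::from(limit) {
            break candidate;
        }
        candidate *= 2;
    };
    found
}

fn main() {
    println!("{}", smallest_power_of_two_at_least(5));
}
```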


Neat. I came up with this rule some time ago for my as-yet-unimplemented pet language, but I didn't realize Rust did the same thing.


Some of this seems cool and some of it is soaring over my head. I'm kind of embarrassed and newly motivated to learn Rust.



