So You Want to Write Your Own Language (2014)

tikhonj · on Dec 23, 2015

I've always found it unfortunate how he, and others, implicitly define the success of a programming language as its popularity. A programming language can be incredibly interesting and useful to you and others without ever becoming fashionable—and that doesn't mean it has failed!

I think there should be other metrics for success: a programming language can be useful to a dedicated community of experts, it can be useful to a single company, it can be great for a single use-case or even just great for you, it can push novel ideas or showcase a novel design philosophy, all without massive popularity or market success. (And it's not like most languages directly make lots of money anyhow.)

Popularity is a poor proxy for quality, and quality matters all on its own.

Keeping this in mind changes a lot of his suggestions, I think.

Completely unrelated note: what the article calls "lowering" I've more commonly seen called "desugaring" and is, indeed, a powerful technique both in terms of implementation but also in terms of understanding a language—if a piece of syntax is pure sugar, you don't have to understand anything new, just how the syntax it desugars to works.

saurik · on Dec 24, 2015

FWIW, I have more often seen the term "lowering" used by people who do research in this field and "desugaring" mostly among industry developers.

tikhonj · on Dec 26, 2015

Most of the time I see desugaring is in a research context, at least within areas of PL adjacent to functional programming.

joncrocks · on Dec 23, 2015

It's because we're all in it for the fame and glory. Right?

;-)

kazinator · on Dec 23, 2015

> "lowering" ... "desugaring"

"expansion"?

".... syntax it expands to"

kibwen · on Dec 23, 2015

In a language with macros that might be ambiguous, since there will usually be a phase called "macro expansion".

kazinator · on Dec 23, 2015

That is correct. Expansion isn't necessarily macro expansion. But it's the same kind of thing and may even be be done in the same code walking pass (and possibly have to). (Since the constructs being expanded could be generated by macros, then if the expansions also use macros in their syntax, the two have to be interleaved.)

The difference is minor: it's basically "does the code walker specially recognize this construct and invoke some code to expand it" versus "does the code walker recognize a macro call and invoke it to expand it".

qwertyuiop924 · on Dec 23, 2015

I actually think that easy parsing is something you should try for. the easier your language is to parse, the less edge cases there are, and simpler it is for programmers to understand. In short, I'd rather have Scheme than Ruby. But I seem to be in the minority...

epidemian · on Dec 23, 2015

I agree in that easy parsing is a good quality of languages. But i also think a language can have a rather rich (i.e. big) syntax that's also straightforward to parse. That is, that the dichotomy of either having a very lean syntax like Scheme or a bloated and complicated syntax like Ruby is not so clear-cut.

The difficulty of parsing Ruby's syntax (both for machines and for humans) comes from some unfortunate design decisions, not so much from its vastness. For example, using the same symbols, {}, to denote hash literals and blocks makes parsing/understanding the syntax of method calls more difficult:

  def foo(*args)
    puts "args: #{args}, block_given?: #{block_given?}"
  end
  # Parentheses can be omitted when calling methods:
  foo           # args: [], block_given?: false
  foo 42        # args: [42], block_given?: false
  foo 42, "bar" # args: [42, "bar"], block_given?: false
  # *Except* when the first argument is a hash. This:
  foo({})       # args: [{}], block_given?: false
  # Means something completely different than this:
  foo {}        # args: [], block_given?: true
  # But if the hash is not the first argument then it's OK, you can omit parens
  # again :3
  foo 42, {}    # args: [42, {}], block_given?: false

This is not a problem of having dedicated syntax for blocks and hash literals. It's a problem of having chosen conflicting syntactic forms for them. If blocks could only be denoted with do/end, or with a different set of symbols than hash literals, this would not be a problem.

And this is just one example of conflicting syntactic forms. In Ruby, sadly, there many cases of this. E.g., using * both for multiplication and for argument "splatting" means that

  a * b

is not the same as

  a *b

when `a` is a method, but actually is the same when `a` is a local variable; or `foo "bar"` and `foo"bar"` being two perfectly valid ways of calling foo with "bar" as an argument, but `foo [42]` and `foo[42]` not being two perfectly valid ways of calling foo with [42] as an argument; and probably more.

So, as long as syntactic forms are carefully chosen so that they don't have weird and conflicting edge cases between each other, i think a language can have a rather big syntax without it being difficult to parse/understand.

Disclaimer: i love Ruby; it's my bread-and-butter language. But i wish it didn't have as many syntax quirks as it does :)

qwertyuiop924 · on Dec 23, 2015

This is what I mean. Syntactic ambiguity is evil, and pretty common. Even C has it (google "lexer hack"). And I definitely think that Scheme could do with a bit more syntax. Some of the whitespace indent extensions (WISP, for example) are nice, and I could do with hash table literals. So I'm off to go write those...:-D

saurik · on Dec 24, 2015

FWIW, as someone who does a lot of work with parsers, that block case is a bad example, as it is not at all ambiguous and does not make the language more difficult to parse. C and C++ have tons of things that make them frustrating or even nearly impossible to parse: if you just write down the naive description of this feature from Ruby you will see there are no ambiguities and it never even requires more than a single lookahead token to decide if you are using a hash or a block argument. A very similar example from JavaScript between and object literal and a block statement is actually a legitimate issue, but this feature of Ruby is so benign you can probably parse it correctly dropped into the middle of the file at the { with nothing but a single token of look-behind. The "splat" example, though, is spot on: there you have a context sensitivity. (Not so much in the [/" one, though: this does make the language more frustrating to parse, but only because the language is clearly defined over characters instead of tokens; but it is still a better example than the block argument.)

epidemian · on Dec 27, 2015

Thanks for the clarification :)

(And sorry for not having replied earlier, i seems HN doesn't notify about replies any more.)

The point i was trying to make was mostly about the difficulty of understanding the syntax (i.e. "parsing" for humans), not so much about syntax ambiguity or having to use a more sophisticated parsing strategy. The comments on the code represent the "rules" that a human might follow to try and make sense of that syntax. They are not as simple as they could be had Ruby only allows blocks to be denoted with do/end.

However, about the complexity on the parser itself, doesn't this case of block syntax need at least some dedicated logic to take care of ´foo {}´ not parsing the {} part as a hash literal expression but rather as a block? (e.g., maybe adding some explicit priories to the parser rules, or making the lexer add hidden parentheses so that the parser sees ´foo() {}´ and doesn't have to make that distinction).

brianberns · on Dec 23, 2015

Easy (for a computer) to parse does not necessarily mean simpler for programmers to understand. I think Lisp's deeply nested parens are a good example of this problem.

SpaceCadetJones · on Dec 23, 2015

I've been diving into Lisp over the past few months and I already find it just as easy to read and even easier to edit/navigate with a proper editor. Sometimes it gets hairy when reading mathematical operations, but that's it. Sure it's definitely possible to write hard to read code, but I don't think s-exps are necessarily more prone to this than your typical C style syntax.

pvitz · on Dec 23, 2015

J is also quite easy to parse, but could be difficult to understand. (See e.g. http://sblom.github.io/openj-core/iojSent.htm#Parsing)

stepvhen · on Dec 23, 2015

J is one of my favorites. The syntax is cryptic at first, but the lack of ambiguity makes programs easier to understand and easier to design in the long run.

Avshalom · on Dec 23, 2015

I dunno, I think proper anonymous functions would have been a lot better than hooks and forks.

Though I'm not sure if that falls under syntax or semantics for purposes of this discussion.

kevin_thibedeau · on Dec 23, 2015

Lisp has never been fully implemented. If M-expressions were in active use (and modernized over time) there wouldn't be so much parentheses hell.

SpaceCadetJones · on Dec 23, 2015

The only people who refer to it as parens hell seem to be non-lispers. As I understand it m-exps were never implemented because people don't really want them. I definitely prefer s-exps over them.

mokus · on Dec 23, 2015

It seems plausible that the implied causality is backwards - perhaps it's not that people don't mind the parens because they use LISP, but rather that people use LISP because they don't mind the parens. Personally, I don't think I ever did mind them, but I'd be really surprised if that's not at least a moderately important filter on the set of people who become lispers.

munificent · on Dec 23, 2015

> The only people who refer to it as parens hell seem to be non-lispers.

Confirmation bias.

Avshalom · on Dec 23, 2015

"the only people" implies that non-lispers is a category that is not equal to "almost everyone"

qwertyuiop924 · on Dec 23, 2015

...Which is why we have WISP. And SRFI-49. And editors that can indent LISP code. All of these solve the problem rather nicely.

nbardy · on Dec 23, 2015

When I first started writing Lisp I was put off by how simple the syntax was. Everything looks the same and it was so hard to tells things apart. After working with Lisp for a long enough time I began to love the simplicity. Everything is so simple, nothing gets in the way of reading code.

moron4hire · on Dec 23, 2015

To poorly paraphrase Bruce Lee: "When I first started to learn Lisp, I thought it was all just parens. After I truly learned Lisp, I realized it is all just parens."

martincmartin · on Dec 23, 2015

Also tools become much easier. There are few good refactoring tools for C++ because the language is almost impossible to parse. In fact, even generating tags for vim/emacs is nearly impossible.

pldrnt · on Dec 23, 2015

My thoughts exactly, as someone who has spent quite some time creating a (usable and readable) LALR syntax that can do everything modern C++ does (plus named arguments, a null-coalescing operator and extension methods) and is currently writing a transpiler based on it, I hope more people share your concern :)

nickpsecurity · on Dec 23, 2015

Anyone who used Modula-3...

https://en.wikipedia.org/wiki/Modula-3

...knows that one can have key features of C++ for systems programming with an easy-to-parse, safer, and [for a time] more efficient language. The SPIN OS team even modified it with type-safe linking so one could hotload code into the OS. What I saw there was "hotload code into running system" w/ associated benefits during updates. Or safer, better JIT. ;)

Anyway, combined with LISP-style macros & irritating parts removed (eg uppercase, verbosity), you get a system language that blows C++ away in productivity (esp w/ DSL's), compiler efficiency, maintenance, and possibly run-time efficiency. The world might have went in a different direction but they better be honest about parsing pain they created. And all the productivity opportunities lost due to that.

msie · on Dec 23, 2015

Takes me back to the past when Metrowerks created a Modula-2 compiler for the Mac:

https://en.wikipedia.org/wiki/Metrowerks

nickpsecurity · on Dec 24, 2015

Didn't know they started out with Wirth's compiler. Picked the best one. Then market rejected it. Then pivoted to a more successful approach supporting whatever crud was popular. The recurring pattern. Also didn't know they supported that many platforms. Impressive.

lultimouomo · on Dec 23, 2015

While syntax surely doesn't help, I think the real blocker is actually the preprocessor.

qwertyuiop924 · on Dec 25, 2015

No, the real problem is the both C and C++ are context sensitive. (In C compilers, though not C++ ones, 'the lexer hack' is the most common solution) That's fairly hard to parse, although not as bad as some languages. For preprocessing, it's fairly easy to write a C preprocessor, and then look at how your code will look if various options are defined... Actually that would be a really good debugging tool. I should go write that. And then make an emacs mode for it, so that you can type one command at any point, and see how your function/file/whatever will be transformed by the preprocessor.

umanwizard · on Dec 23, 2015

C has basically the same preprocessor, and ctags works mostly fine.

pjmlp · on Dec 23, 2015

ctags is a pretty bare bones compared with what an IDE with semantic analysis is capable of.

chipsy · on Dec 23, 2015

The simplest parser style is assembly code: one line, at most one statement. But there's always some desire to get a little more code density than assembly, it's just a matter of how one wants to go about the problem.

The syntax of older-style BASIC, for example, hews close to the assembly parsing model, just adding a few special cases for different keywords. Algol syntax has a tradition of design based around algorithm description first, with generalized power second. The Forth or Lisp approaches, because they are very generic and powerful, naturally pull those languages in the direction of extensive metaprogramming.

In that sense one can see that the more sugary syntaxes treat the programmer more like an end-user, because they direct the workflow. Generalized ones encourage a DIY approach to everything. There are tradeoffs to each strategy.

bryanlarsen · on Dec 23, 2015

And I don't think that Walter Bright would necessarily disagree with you. My interpretation of his "false gods" list is that those are probably OK as secondary goals, but that the "true gods" list should be primary goals.

In this case, "easy parsing" is OK, but not when it conflicts with "tried and true".

Choosing Lisp as your "tried and true" is acceptable, though...

Avshalom · on Dec 23, 2015

Particularly he has CFG as a true god, for ease of parsing and tooling. Bright hangs out here so if he wants he can come in and expand but I think it's a case of look at the moon(CFG) not the finger(parsing ease) pointing at the moon.

mindslight · on Dec 23, 2015

Humans like parsing - witness the love for mathematical notation. It takes time to learn the grammar, but after that it's a background process.

Simpler general parsing (eg Lisp) is more powerful, but that's a negative for learnability of the language.

nickpsecurity · on Dec 23, 2015

It's a decent list but not enough. The real trick is being general purpose enough to handle low-level up to programming in the large while being comprehensible and easy for compiler to work with. In my field, highly-assured INFOSEC, you also want correctness and traceability for security purposes.

I wrote an essay on a layered, bottom-up approach to constructing such a language here if anyone is interested:

https://www.schneier.com/blog/archives/2014/05/friday_squid_...

I'm also interested in feedback on my approach. It tries to solve probably a dozen problems at once with some basic principles and three, compatible languages.

restalis · on Dec 24, 2015

I'm not sure I understand the value of having tools to "downgrade" something to a lower level. I compare this "downgrading" with "compiling to C" - a feature that enables some languages to become useful on more than one platform just by riding on the back of the C language. Was it about this idea of having different layers so that one may care to really port only the lowest level when it comes to it? But then the scope you get on the lower levels is simply inferior compared to the one you get on higher level/layer in terms of optimization possibilities, partly because you are now forced to get from point A (source code) to point B (binary code) not directly but through a 3rd point - C (a lower level source code), which may hurt performance and other things. Did I misunderstood something?

nickpsecurity · on Dec 24, 2015

It's an exploration so I was throwing "downgrading" out there. You almost guessed it all with portability and bringing on new programmers. Both arguments for that technique. Another, esp if compiler is written in an ML, is that code used to analyze, optimize, or translate lower layers of language can be re-used in compiler for high-level.

Matter of fact, I think my default was to go to ASM with plugins for other stuff. That should prevent missed opportunities in optimization.

toolslive · on Dec 23, 2015

Minimizing key strokes is seen as 'false', but it has actually been proven that one can write (debug, maintain,...) about 10 lines of code per hour. Those 10 lines can be 10 lines of assembly, 10 lines of c++ or 10 lines of haskell. Of course, you can do much more in 10 lines of haskell than you can do in 10 lines of assembly. It basically means you want to be work on the highest level of abstraction that you can afford. It also means that economy of expression is important. source: https://vimeo.com/9270320

josteink · on Dec 23, 2015

> Those 10 lines can be 10 lines of assembly, 10 lines of c++ or 10 lines of haskell.

I seriously doubt the validity of this. 10 lines of assembly is usually reasonably clear-cut what is about.

10 lines of Haskell can take hours of mindfuck trying to peer through the functor and monad operators, trying to work out which operation does what and which operator takes precedence where. And then you need to start worrying about which data-type are you actually working on. And how does that type implement lift and bind. Etc etc.

Basically 10 seems like a BS number taken out of thin air, because it looks good in a base 0x0a number-system.

Jtsummers · on Dec 23, 2015

The origin of this idea, for my first introduction I guess, was in The Mythical Man Month. I don't have a copy at hand, so I can't read what Fred Brooks wrote specifically. 10 is very precise, but really the writing was about the order of magnitude that people can manage in a day. It's about the level of complexity involved.

10 lines of assembly is very clear about what exactly is happening. It's a sequence of adds, loads, branches, stores, moves, etc. Each step is incredibly clear, but what are they operating on? Now it's not so clear. What is R2 at this point in the program? Am I calling a subroutine? On this architecture is there some way to automatically handle this with storing the PC and everything, or am I going to use a regular branch instruction? Regular branch? Ok, now I'm storing the PC and other information away per our calling convention.

C: Oh, I'm calling a function. I'm calculating a formula. I have named variables, so it's clear what I'm operating on (hopefully, this is at least possible here, even if not well done in practice). With typedefs, good naming conventions, structs and enums, you can maintain an order of magnitude greater complexity in the same time as an assembly programmer, in an order of magnitude lower amount of code.

Haskell: I have functions, I have types, I operate over these types transforming them from one to another. Collecting objects of a type here, transforming them there, zipping them with those, and back again. And I've written a function that would take dozens of lines of C or grosses of lines of assembly, and it took me an hour. Again, an order of magnitude more complexity can be entertained in another order of magnitude less code.

Don't get stuck on the specific number, that's incredibly pedantic and unhelpful. It's about the degree and magnitude.

codygman · on Dec 24, 2015

> 10 lines of Haskell can take hours of mindfuck trying to peer through the functor and monad operators,

Please stop. Can you even come up with one example that supports this? Alright, now how about 10 lines from a "real world"* Haskell project.

*real world meaning has at least 1 user besides it's developer

saurik · on Dec 24, 2015

I think the more important counter-argument is "that ten lines of assembly just barely did one thing; the ten lines of Haskell was most of the program: clearly it will be easier for you to understand a single wire or switch than to understand an entire computer, but that isn't a valuable insight".

codygman · on Dec 25, 2015

Hyperbole (if that is what the commenter was using and didn't seriously think it) should be used carefully when it reinforces stereotypes that are so damaging.

jerf · on Dec 23, 2015

It may help to know what he's talking about. He's probably [1] not referring to whether you use C-style || vs. Python's "or". He's probably referring to things like this:

    NB. Initialize a global, but not if it's already been initialized
    NB. Example: 'name' initifundef 5
    initifundef =: (, ('_'&([,],[))@(>@(18!:5)@(0&$)) ) ux_vy ((4 : '(x) =: y')^:(0:>(4!:0)@<@[))

From the bottom of http://code.jsoftware.com/wiki/Phrases/Language_Utilities . NB. appears to be the comment marker, so the last line is the payload.

Yeah, I picked the worst thing I could find, and my guess is that that code is doing something really nasty hacky, because it is too large to be doing only what it is described. On the other hand, the fact that I pretty much have to guess with almost no information is sorta my point.

There's creating a computer language to be concise and readable, and then there's just concise. You can also have a look at Perl, which can (and generally should) be programmed as a fairly normal dynamic object-oriented language, but has a subset of functions, operators, and implicit variables that allows you to spit out terribly small, dense lines of few characters that do horrible things, usually not just doing whatever they are putatively doing but also having other mysterious side effects in the process.

[1]: I'm reasonably confident I'm on the right track here, but don't want to put words in his mouth, so I'm hedging. Especially since there's a non-zero chance the original author will show up here. As I type this, that has not happened yet.

bcbrown · on Dec 23, 2015

> NB. appears to be the comment marker

Likely in reference to https://en.wikipedia.org/wiki/Nota_bene

Avshalom · on Dec 23, 2015

It's a 'false god' as in not something you should worship at the feet of. Don't prefer fr to for; feel free to require empty () when calling a function without arguments; lambda instead of \ is fine.

It also doesn't mean you can't, just that you shouldn't use it as a/the primary concern.

seanmcdirmid · on Dec 23, 2015

Then of course APL is king since a 10 line APL program is huge. But that doesn't really bear it given APL's reputation as a write only language.

Abstraction also has a cost. You can cram a lot into 10 lines, but if you have to think a long time about the most elegant way to do that, you might not be winning in productivity.

Jtsummers · on Dec 23, 2015

Probably more a warning against pre-mature optimization.

If you can eliminate all ; in most, or even all, circumstances in otherwise C-like syntax, what do you gain?

And most of the savings you see there are from having greater levels of abstraction available.

Assembly = arithmetic. You have to explicitly state each step, and on its own there's no understanding of what you're operating on, they're all bits (numbers).

C++ (ok, more C) = algebra. You have a bit more understanding of your structure, better names and concepts behind things (types = units, for instance). You're still operating on bits, but now most of those bits have a specifically, language encoded, meaning.

Haskell = calculus. Not only are your bits typed, so are your functions (ok, technically also typed in C and kin syntaxes, but not usually as useful) and you can pass them around and modify them.

EDIT:

I'm more picking on the class of syntax available.

When you include semantics OO languages offer a great deal more expressive power than most procedural languages. Probably should've just had C and not C++, but it is what I wrote.

More generally what I should have written was towards language classes: assembly (as most primitive procedural language), structured procedural languages (C, Go, Algol, Ada, etc.), OO languages (Java, C++, Smalltalk, etc.) and functional languages (lisps, Haskell, ML-family, etc.).

But then you have to break it down further. OO is not inherently more or less expressive than functional languages, it depends on the feature set of the OO and functional languages. Smalltalk is very expressive, easily comparable to the more expressive functional languages. Java, as it used to be, not familiar with developments post adding generics so I'm really rusty on it, was not so expressive. So a comparison between Smalltalk and Common Lisp would put them on a similar level of abstraction. But Common Lisp versus Java (again, with my outdated knowledge), Common Lisp is at a higher level of abstraction (both syntactically and semantically).

yongjik · on Dec 23, 2015

For my project, I found sticking to LALR(1) quite beneficial. Not that there's anything special about LALR(1): you could use LL(k) instead if you prefer. However, by sticking to a small, deterministic syntax (and making the parser generator complain loudly in case of ambiguity), I can easily find edge cases I've never thought about before it comes around to bite me. And, in most cases, it's trivial to slightly modify the syntax so that the undesirable ambiguity does not arise.

It's the equivalent of static typing for syntax: it takes effort to make bison run successfully, but when it does, I know that my grammar is free of ambiguity.

musesum · on Dec 23, 2015

> Regex is just the wrong tool for lexing and parsing

Funny; I recently wrote an Island parser that extends Regex to create a token stream: https://github.com/musesum/Par

Quoting the link to Rob Pike's post:

> Consider finding alphanumeric identifiers. It's not too hard to write the regexp (something like "[a-ZA-Z_][a-ZA-Z_0-9]*"), but really not much harder to write as a simple loop

Maybe I'm missing something, but does this suggest replacing a declaration with a procedure? Oh the horror!

krig · on Dec 23, 2015

I disagree with several of these. But I guess that's why 1) I am working on my own language, and 2) Walter Bright is a lot more successful at that than me. So yeah. He's probably right, and I'm probably wrong. However: I'm doing this entirely for my own pleasure, and I don't want success as he defines it (popularity means having to cater to the whims of others).

Disagreements:

I think minimizing keystrokes is important, and I think the notion that code is primarily read, not written, is idealistic but completely wrong. The purpose of programming languages is to be writable by humans and readable by machines. There is value in making them readable by humans and writable by machines, but only secondary value.

I think there is value in easy parsing, if nothing else because every single editor out there can highlight your language even if it isn't popular. I think the focus on popularity as the measure of success is wrong.

Tried and true is overrated. There are plenty of examples of programming languages that break the tried and true rules and are successful either because of this or despite this. Either way, it doesn't seem to matter as much as people seem to think, but staring too hard at the success of Java might make you think it does.

Anyway, feel free to disagree with me, go ahead and design your own language. I sincerely hope you are successful, regardless of what your definition of success is.

kazagistar · on Dec 23, 2015

In practice, writing code consists mostly of reading; either to (re)understand how some other part of a large system works, or to figure out why the code is incorrect.

PeCaN · on Dec 23, 2015

I would argue that it's easier to understand short, even excessively terse, code than it is to understand long code. I can look at a few lines of Haskell and even though deciphering it may take me some time, the cognitive overhead of anything other than working out what the code is doing is very low (the most I'd have to do is look up some operators). I can see a significant chunk of logic all at once and piece together the system as a whole. Meanwhile I could look at 3 files full of Java and have no idea what the system is actually doing.

StephenFalken · on Dec 23, 2015

  LinuxWorld.com: What is your advice to designers of new programming languages?
  
  Dennis Ritchie: At least for the people who send me mail about a new language 
  that they're designing, the general advice is: do it to learn about how to 
  write a compiler. Don't have any expectations that anyone will use it, unless 
  you hook up with some sort of organization in a position to push it hard. 
  It's a lottery, and some can buy a lot of the tickets. There are plenty of 
  beautiful languages (more beautiful than C) that didn't catch on. But someone 
  does win the lottery, and doing a language at least teaches you something.
  
  Oh, by the way, if your new language does begin to grow in usage, it can 
  become really hard to fix early mistakes.[0]

[0] http://www.itworld.com/article/2826125/development/the-futur...

talles · on Dec 23, 2015

> My career has been all about designing programming languages and writing compilers for them.

Anyone knows which languages/compilers the author has worked on?

(I'm not trying to validate its arguments, it's just out of curiousity)

noblethrasher · on Dec 23, 2015

The author is also an HN denizen: https://news.ycombinator.com/user?id=WalterBright

anewhnaccount · on Dec 23, 2015

Digital Mars C, C++ and D compilers

stevoo · on Dec 23, 2015

To add more context to that https://en.wikipedia.org/wiki/Walter_Bright

Although i have never though of actually write a language i can agree on those. Especially Minimizing keystrokes. I want the language to be verbose and easy to understand. Dealing with thousands of line trying to figure out what is going on there it is essential to easily pick up what is going on !

dozzie · on Dec 23, 2015

> I want the language to be verbose and easy to understand.

So, you basically want to write in assembler, which printed is very verbose, and it's very easy to understand each separate instruction.

What I want is the language to minimize language entities used to do a task (from parser's perspective those would be AST nodes), and I want those tasks to be general and high-level. This is what makes source code comprehensible, not verbosity.

Jtsummers · on Dec 23, 2015

Each instruction may be easy to understand, but the programs are not easy to understand at scale. So that's probably not what stevoo wants.

dozzie · on Dec 23, 2015

Of course it's not (or rather, I would be surprised if it was), but he phrased it in very wrong way.

ternaryoperator · on Dec 24, 2015

>> I want the language to be verbose and easy to understand.

> So, you basically want to write in assembler, which printed is very verbose, and it's very easy to understand each separate instruction.

Please don't do that: You took what he said, jumped to the most extreme case, and then attributed it back to him. He never implied that he wanted to write in assembler. In fact, based on the part you didn't quote, I think he would not want to work in assembler.

dozzie · on Dec 24, 2015

You may dislike this form, but it shows exactly why stevoo's opinion is formed on wrong basis.

> In fact, based on the part you didn't quote, I think he would not want to work in assembler.

I have hard time to understand what he meant in this sentence:

> Dealing with thousands of line trying to figure out what is going on there it is essential to easily pick up what is going on !

Is it a praise of having plenty of code for a simple thing? Or is it a rant?

dwc · on Dec 23, 2015

For an analogy, before there was anything like modern math notation people would write out the problem and solution in plain language. On one hand, that made it very accessible as long as you knew a few terms like "sum". On the other hand, the reader has to build a relatively small mental model out of a lot of words.

Is it better to concentrate hard on a system of equations taking 1/4 of a page (with lots of whitespace) until you grok it, or read five pages of prose at fairly normal speed and try to build a mental model of it? Which better confers the ability to quickly come to the realization of something like "that's a parabola" or "the minimum of that function is obviously seven" or whatever?

Analogies are only so useful, but I think there are enough parallels there to shed some light.

Jtsummers · on Dec 23, 2015

Right, but your verbose example reduces ease of use.

It's two axes. J is not verbose, but is impenetrable (certainly to a novice, others more familiar see a lot of benefits to it). Ada is fairly verbose, but is pretty easy to understand. A page of Ada code is probably a handful of lines, mostly to aid in clarity, of J code. Though the Ada code has type safety that the J code lacks. Assembly, as someone else referenced, is very verbose, but hard to understand once you get beyond a certain scale.

_Codemonkeyism · on Dec 23, 2015

Write once, read often.

melling · on Dec 23, 2015

Did you try using Google before asking? I didn't get to hit return on my iPad before I had the answer since he had a Wikipedia entry:

https://en.m.wikipedia.org/wiki/Walter_Bright

pjc50 · on Dec 23, 2015

My own opinion about PL research is that programming is concept serialisation: the programmer has a model of the system (or part thereof) in their head, and the role of the language is to turn that into a byte sequence that is amenable to both being turned into machine instructions and being de-serialised into the heads of other programmers.

But different programmers have different abstractions in their heads, and find different things intuitive. That's why things like ColorForth and APL exist; to a tiny fraction of people and problem sets they're intuitive, to the rest obtuse. I suspect this is why people keep thinking Lisp is going to take over the world despite 30+ years of this not happening.

david927 · on Dec 23, 2015

One thing abundantly clear is that syntax matters.

I feel this is like reading 100 years ago about how to make your own new type of horse & buggy, with tips like "make sure the reigns are well-fitted," when what you want to invent is a car.

The future doesn't belong to syntax because the future isn't text. The same people that would blanch at sending a string argument of "ONE" or "TWO" are indifferent about the fact that they send in "IF" to start a condition.

The reason you wouldn't send in "ONE" or "TWO" is because it makes it stiffer, more brittle and harder to manage. What can we say about the software we write? That it's stiff, brittle and hard to manage.

It's not that the future has no textual syntax at all, of course. An expression of 3+4*5 will always be easiest to express just that way. But we need to pull back from our gregarious, wide-mouthed babbling into something more meta, or I should say, more data, than text.

pjc50 · on Dec 23, 2015

Writing is approximately five thousand years old. Symbolic algebra is about a thousand. Text will be with us for our lifetimes.

commentzorro · on Dec 23, 2015

This is the ??? (I don't know what to describe it as) post here. You took the one sentence I just loved from the article and pooped on it. Then tried to tie it in with details of a non syntax related item and process. Next, you said a two character syntax for describing something as complex as an if statement was somehow too much burden?!

Not sure what the future of programming languages is. But if humans are involved I suspect there will be a visual representation. And that means syntax. And that syntax will matter then just as it does now.

david927 · on Dec 23, 2015

I didn't mean to hurt your feelings. This is my opinion; I didn't expect it to receive a warm reception.

If you went to a horse and buggy convention 150 years ago and told them that it would all be gone soon, they would have virulent reaction, right? They would say that metal contraptions have been tried before and failed, that humans have been using horses for travel for thousands of years and if so if humans are involved, it will mean horses. This isn't to prove my point but to draw caution to yours.

Note that I understand your position quite well but that you clearly don't remotely understand anything that I've written. That should raise a flag. Instead of getting mad, you should be getting curious.

knz42 · on Dec 24, 2015

I am a PL researcher and I too believe you may not fully understand what's going on.

To keep your analogy, the idea that a PL needs syntax to be used is comparable to the idea that a vehicle needs acceleration to get moving. Different flavors of syntax correspond to different means of propulsion.

Perhaps we will find good way to get rid of context free grammars (the horses in your example) and replace them with automatic pruning of parse forests (the current Tesla car of compilers) or even something more fancy like semantic editors (a fusion engine in your example) but unless you change the laws of physics, you can't get a vehicle without an accelerating engine nor a PL without an input syntax.

commentzorro · on Dec 23, 2015

Not mad ... not sad ... not misunderstanding. Just weirded out by the use of non sequiturs to crap on a thought you didn't like.

Jtsummers · on Dec 23, 2015

Structure is syntax. Even if it's non-textual, say visual languages, there's still a structure and syntax to the language.

What's your vision of a future, syntax-free, language?

david927 · on Dec 23, 2015

I'm referring to the article, so I'm referring to what it means by syntax, which is "your decision to use curly braces is of utmost importance."

I specifically added this statement to help clarify: "It's not that the future has no textual syntax at all, of course." The difference between a text file and a database is not that the database contains no text, but rather the relations between items is discretely managed.

What I'm saying is this: storing a program as a text file was understandable in the 1970's but now is what a brain-damaged, drooling Bonobo might do.

Would you store your payroll as text? Really? "Bob is too be paid eighty thousand a year;" (And I sure hope the compiler catches the misspelled 'too' or he might not get paid.)

Structure is syntax

No, structure is semantics represented by syntax. And that structure is all that counts, so we want to get there as discretely as possible. We want to get there as data. And when we do that, the syntax can be a decorator pattern; it can be what you want it to be and something else to someone else, all without modifying its core meaning. And what happens when it's just a decorator pattern? It loses its importance.

knz42 · on Dec 24, 2015

You're making a categorical error here. The entire discussion is not about storing programs but about entering them. If you want to use a database for storage then the UI you present to populate your database will become syntax.

The idea that you can have multiple syntaxes for the same language is not new (eg multilingual pascal or basic) heck even different semantic models are possible (eg LLVM) but any UI that enables a programmer to tell a computer what to do is ultimately syntax.

bmm6o · on Dec 23, 2015

To be fair, Dr Dobbs' audience is working engineers, not PL researchers, and even researchers haven't had much success in non-text languages.

david927 · on Dec 23, 2015

To be fair, if Dr. Dobbs doesn't have the audience for PLR, their article on "Write Your Own Language" should consist of two words:

Please don't

andrewchambers · on Dec 23, 2015

The level of dedication is required is really high. I stumbled upon https://github.com/oridb/mc and found my new favourite language written entirely by one person over the course of many years.

auggierose · on Dec 23, 2015

He got the thing about using context-free grammars right. The final comment about not using generators is currently right, but will hopefully be wrong by the end of this year. :-)

scriptproof · on Dec 23, 2015

Do not know why this old page is revived again, but a better title would be "So you want to write you C or C++ like own language?".

Among other points is the trailing semicolon. If this is so good, then the authors of Go and Swift must be wrong.

Randgalt · on Dec 23, 2015

I don't think you can say one is objectively right or wrong. However, as someone who's programmed for nearly 40 years, the entire issue seems like a fetish. Would anyone choose a language solely based on whether separators were required or not? Is your productivity dramatically increased by the choice of required separators? Of course not.

In functional languages there are some reasonable needs that make not requiring a separator useful. But, it does make the code harder to parse and harder to read. Go, for instance, is not a functional language. Making separators optional (sometimes) seems capricious. But, then, Go is rife with capricious choices.

vdaniuk · on Dec 23, 2015

>Is your productivity dramatically increased by the choice of required separators? Of course not.

Yes, for me it does. A large amount of symbols on screen makes my mind drift away and it gets harder to concentrate on the flow and meaning of code. Semicolons or lisp parens are good examples of such visual noise. While productivity decrease is not drastic, it's noticeable. The mind adjusts and this problem diminishes with time and practice of the particular syntax but it's still present.

k__ · on Dec 23, 2015

I often simply change the colors for those.

{} [] () , ; get the same colors as whitespaces (I enable their visibility but make them just a bit brighter as the background)

astrobe_ · on Dec 23, 2015

> Would anyone choose a language solely based on whether separators were required or not? Is your productivity dramatically increased by the choice of required separators? Of course not.

It may if you have to switch between languages. Getting syntax errors because of dummy separators can be quite frustrating at times. So the choice is mandatory separator or optional separators. I designed a language once in which most punctuation characters where treated as blanks.

Randgalt · on Dec 23, 2015

Every day I write in bash, Javascript, XML, JSON, Java, Scala and probably a few others. Trivial differences don't matter.

jerf · on Dec 23, 2015

Go uses semi-colon insertion like Javascript does, albeit with the benefit of Javascript's experience to fix up the edges. Calling "adopting some of the syntax of arguably the most popular language" capricious is probably stretching a bit.

imtringued · on Dec 23, 2015

I don't see why anyone would even think that a band-aid solution like "automatic semi-colon insertion" is a good idea. If you want optional semi-colons you have to change your language's syntax.

restalis · on Dec 24, 2015

Javascript's popularity have nothing to do with its value as a language, considering that it didn't have any competition in what it does. Ignoring the (almost already) decades of critique addressed to its design choices and quirks to declare it a cherished wonder is _no doubt_ a stretch.

Jtsummers · on Dec 23, 2015

He isn't saying you need a semicolon in particular, but rather something should be used for redundancy so when a user makes an error it is detectable. The semicolon is just an example.

https://golang.org/ref/spec#Semicolons

Go uses semicolons, they can be dropped in place of a newline (NB: So instead of a semicolon, a newline becomes the redundancy he speaks of) under specific conditions.

Chabs · on Dec 23, 2015

The need for semicolons is easy to understand. It's required if you have operators that can be both binary and unary. Look at the following in non-semicoloned C:

a + b *c - d

is it one or two expressions?

ufo · on Dec 23, 2015

Lua avoids this issue because it does not allow an statement to be just an expression.

theseoafs · on Dec 23, 2015

ufo · on Dec 23, 2015

Function-call-statements are a special case. Not every expression can be turned into a statement.

http://www.lua.org/manual/5.3/manual.html#3.3.6

This gets rids of almost all the syntactic ambiguities. The only big one that is left is statements that start with parenthesis.

http://www.lua.org/manual/5.3/manual.html#3.3.1