It's an LR(1) parser generator for OCaml and Coq with a lot of extremely interesting features, such as genuinely good debugging support for grammars and the ability of generating error messages by example.
What this means is that after you write down your grammar, Menhir will give you examples of all the possible syntax errors that could occur. You can then write error messages for each case and get a parser with built-in error reporting for syntax errors.
This works a lot better than you'd think and I really wonder why nobody else implements this feature. Or for that matter, why they're not advertising it on the webpage! If you want to know more, look in the manual, section 11.
I've never used Menhir so I can't compare how similar they are in practice, but I've enjoyed the times I played with LALRPOP much more than the many times I've battled various yacc derivatives.
+100 this! Menhir is really awesome and super usable parser generators. We used it to for JSJS [0], and you can see for yourself that the rules are quite easy to understand for a non-trivial grammar.
Having written a parser generator myself (https://github.com/YorickPeterse/ruby-ll), plenty of handwritten parsers, and having mess with parsing combinators I concluded I really don't like parsing combinators. Perhaps this was due to the APIs of the libraries I worked with (e.g. Nom in Rust), but it just lead to incredibly verbose and hard to work with code.
I think parser generators are generally easier to use with dynamic languages. Most of the code-generation tooling can be hidden by interpreters or JIT compilers, and dynamic typing and generics can make the action code much more concise.
I learned and used bison+flex back in the 90s. More recently, I've enjoyed the PLY package for Python. It's essentially the same algorithms and abstractions, but it turns the usual bison grammar inside-out by embedding grammar fragments in your Python action code instead of embedding action code in a grammar file.
Your grammar is a Python module. You write Python function definitions which feel almost like you are writing the interesting subset of a recursive-descent parser, i.e. the important reduction steps, while hand-waving about all the input-testing and buffering you would have to do in a bespoke recursive descent parser. The code comment on each function attaches the grammatical rule which fires that action. The PLY parser-generator function you call at the end of your grammar module uses Python introspection to find all the production rules and stitch the parser together.
My only coding style complaint is how PLY abuses the function __doc__ strings to embed the grammar rules. I think using a decorator interface would have been a more "pythonic" interface to attach the grammatical rules to each action function...
I used PLY to write a parser for a moderately complex internal language a while ago at work and it was a great experience. I ended up with very readable code. (And with only a little hackery I was able to write a tool to perform source code transformations while preserving comments and formatting around the changes.) I'd definitely use it again. The only potential downside is speed: it's Python, so there's only so fast it can go compared to other languages.
I also slightly dislike parser combinators.
Mainly because usually you cannot easily debug them
by placing a breakpoint.
In the best case scenario the specific parser combinator
has good tracing support to enable debugging.
However, imho this is still inferior to simple breakpoints and debugging "directly" in one's favorite IDE.
The only parser combinator library I used (smug, for common lisp) had little trouble with breakpoints; (any parse with significant backtracking was annoying but breakpoints worked fine) is it common that one cannot use breakpoints for debugging in other parser-combinator libraries?
I'm not familiar with Smug, but I would guess that
the combinator is implemented using LISP macros which are than expanded into "real" source code (lisp lists)
which is evaluated directly and can halt on break points.
In programing languages which do not have true macros
The combinator API normally creates a data structure representing the grammar which is than interpreted
Which usually makes debugging harder
and the parsing slower.
A design I like combines a combinator library with an optional facade that gives you a convenient grammar syntax a la a parser generator -- for example, Lua's LPEG works this way. LPEG is a little awkward imo in how it integrates semantic actions, and I tried to do better in my own library (which unfortunately isn't really ready to share and it's been on the back burner).
I tried to use nom for one of my projects, but I found that it had two major problems. First, the macro syntax was hard to use, hard to learn, and hard to debug. Sometimes you had to put an expression in parentheses, even if it seems that it should be unnecessary, so that the macros would be able to parse your code. I also found the macros hard to compose, which is ironic for a parser combinator library.
My other problem was with error handling. I wanted to have my own `Error` enum to represent errors in my program. Nom supports a custom error type, but I now had to write out type annotations in many places with the fish syntax (e.g. foo::<&[u8], Ast, Error>()) and they are quite long annotations. In addition, some of the macros would not work—or at least, I couldn't get them to work—with my custom error type, and I had to revert to using functions in a way that was absolutely not compositional.
In the end, since the format I was parsing was simple and regular (Erlang's External Term Format), I wrote a parser by hand and removed the dependency on nom. The result is as just as fast and produces meaningful error messages. It's actually less tedious, because I am using simple functions that have the syntax and semantics that I already know; maybe if I had a more complex format to parse I would sing a different tune, but for the moment I prefer to avoid using nom.
I've used nom to implement a Turtle parser. The result was fast (it runs the test suite in 50ms) and the code is pretty readable. Certainly much more readable then if I'd used a homegrown set of functions. I've documented the specification grammar rules next to the nom rules. They match up well.
Nom can be tricky to debug, but Rust can host tests in the same file as the code, so debugging by writing new tests is convenient.
The syntax takes some getting used to. The hardest part for me was learning to order the grammar alternatives correctly. The order matters for performance and correctness.
> Many low-level parsers ‘in the wild’ are written by hand, often because they need to deal with binary formats, or in a quest for better performance.
I think the main reason for hand-rolled parsers, at least in my experience, has been error reporting.
Most parsing frameworks make it difficult for you to tell the user simple things like misspelled variables, something does exist, but it isn't in-scope when its being called, and basically helpful messages like that.
Error management is only briefly mentioned, and sort of glossed over in Nom's [0] paper. Mostly because Nom was designed for efficiency... But efficiency is not why GCC rolled their own parsers.
It's possible to provide rich information and flexible error reporting using error combinators, which merge results from failed parser combinators. The error combinators are in some sense dual to the parser combinators.
How could an error combinator possibly approach something like a misnamed variable? It would need access to the current scope's environment, or a version of it, to suggest what the spelling error might be. Not-yet-in-scope errors are similar, but need both past and future scopes as well.
I don't know of any parser combinator that can handle those very human mistakes, because they require partial evaluation to understand.
I might be mistaken, but if I'm not, parser combinators are great for saving time, but if you have the time to do it right, then a hand-rolled parser will give your users a much better experience.
Misnamed variables are actually one of the simplest problems to deal with ehile parsing. The solution is simple - you don't! The purpose of the parser is to parse the syntax, so you can simply use it to build a syntax tree and postpone the resolution of names until the next step in compiler's pipeline (also useful if you language supports macros).
In some areas of the world (VHDL/Ada), you have to maintain symbol table for correct lexing.
These languages have constructions like OBJECT_NAME'ATTRIBUTE (object name can be anything with visible name in the scope and attributes depend on its kind) and RECORD_TYPE_NAME'SLICE where we create value of RECORD_TYPE_NAME using slice, which is collection of values of record fields.
There are also character literals, which have form like 'a', '1', '#', ''''. Just as you can expect here.
The combination of record value construction and lexer forms for character literals makes it hard to parse something like MY_RECORD'('1','0',false, 10).
Symbol table and/or dynamic parse construction allows you to parse these texts easily and specify the syntax straightforwardly. The other approach with construction of separate AST complicates things a lot.
C is a well-known exception to this. The grammar depends on typedef name resolution. `foo * bar` and `(foo)-1` parse as different things depending on whether `foo` is a typedef or a variable.
>Most parsing frameworks make it difficult for you to tell the user simple things like misspelled variables, something does exist, but it isn't in-scope when its being called, and basically helpful messages like that.
I agree that generated parsers make error reporting difficult, but do not think these examples are relevent. If the problem is a misspelled or out of scope identifier, the parser should should still be able to parse the program, which would be syntactically valid.
> If the problem is a misspelled or out of scope identifier, the parser should should still be able to parse the program
That's not necessarily true for all languages. For example, in C, "(e1)&e2" is parsed differently depending on how e1 is declared. If e1 is declared as
typedef int* e1;
then the "&" in "(e1)&e2" is parsed as the address-of operator, but if e1 is declared as
This is a problem for a lot of languages, unfortunately, and it breaks the clean separation between syntax and semantics that we're hoping for when we write a formal grammar.
I've had some success in the past with using GLR or another algorithm that can handle ambiguity, and then choosing among the possible parse trees in another pass that takes semantic information into account. How applicable that is really depends on the language, though; if things are so ambiguous that you're getting an exponential growth in possible parse trees, you may not want to use this approach.
But there are statement based languages where instead you have:
print "Something"
Still simple enough to parse.
However, what about user function calls?
something 1 2
3 4
If a newline doesn't break a call, and you don't depend on indenting, but actually the arity of the original function definition, accidentally writing:
someting 1, 2
3 4
Can be impossible to parse correctly, especially if said language also supports first-class functions.
func apply with func:caller array:values
do
...code...
done
aply + list
do
10 10 10 10
done
The syntax in the above example is still correct, but because the arity of 'aply' can't be determined, the parser can't actually continue.
The grammar might be a little insane, but I've had to work with similar grammars in several financial languages. And the parser does poop out, and when it does, you really want a nice error to result.
It's not just about the language or tools you use. You can produce a memory-safe, beautifully structured parser and use whatever framework you like, but if it doesn't do full recognition of your input before you do any processing then you're likely boned.
Not only is there a high risk that you'll still be passing bad input through to your database, another program, or an unsafe library deep down in your application (YAML in Ruby anyone?), but you get what Meredith Patterson calls a "weird machine" that is programmable by an attacker.
> ... if it doesn't do full recognition of your input before you do any processing then you're likely boned.
There are languages where you need to process some of what you've done before a later part of the source can be dis-ambiguated. How does that sit with what you've said here?
The argument that langsec.org makes is that any such languages are missing security properties. In fact they regard most ad-hoc languages as insecure due to the lack of verification, which they believe to be the root cause of many security problems.
The article mentioned using Parser Combinators as the sweet spot between Hand Written Parsers and Parser Generators.
Another possible sweet spot would using something I call a "Parsing DSL" which is a sort of a cross between a parser combinator and a parser generator.
Details:
A Parsing DSL means using API similar to hand building
a parser but without a-lot of the cruft associated with hand building while enjoying higher level abstractions
from the Parsing DSL library such as:
1. automatic ambiguity detection.
2. lookahead calculation.
3. Grammar diagrams
4. auto-complete.
5. automatic error recovery
6. and more...
Under V8 (Chrome/Node) this is much faster than any other library
tested and even substantially faster than a naive hand built parser.
http://sap.github.io/chevrotain/performance/
Chevrotain shares two main ideas/concepts with Ohm.
1. Separation of grammar and semantics, but in a less opinionated manner as it does not enforce the separation as Ohm does (it is still possible to embed actions directly in the grammar).
2. Grammar Inheritance.
Although while those ideas are not common they are also
not that rare (For example the same concepts exist in Antlr). I think there are three big conceptual differences.
1. In Chevrotain Performance is considered as a major feature.
Which results in it being two orders of magnitude faster (in the benchmark linked above)
2. Chevrotain attempts to provides capabilities relevant for writing IDEs, for example automatic error recovery/tolerance and syntactic content assist.
3. Internal vs External DSL -
From an implementation perspective there is a vast difference
as Ohm is an external DSL while Chevrotain is an internal DSL.
In practical(user) terms this means that you can place a breakpoint directly in a Chevrotain grammar, but you cannot do so in Ohm.
Or that you will need a separate editor to edit an Ohm grammar while
you can use any JavaScript editor to create a Chevrotain grammar.
It also means that Ohm could be ported to different target runtimes (Like Antlr actually is) while Chevrotain can only
run in an ECMAScript engine.
Parser combinators are awesome! I've played with them just a bit. They seem to match my intuition of how a parser's implementation ought to resemble an EBNF grammar.
Just for fun, using the Bennu library[0] I wrote a JSON parser[1]. (Not intended for production use of course; well-optimized JSON parsers exist and browsers kinda ship with them now. If you're looking for a Bennu example or are thinking of experimenting with extending JSON in wacky ways for fun, then it's neat to mess with.) With that specific library, I seemed to create some messy parts when shuffling values through the library's stream abstraction, but I got the hang of it and the parts I thought were messy at least were straight-forward and didn't have issues like hidden edge cases. Something cool is that parsers made with the library and its stream abstraction automatically work incrementally too.
Is there a tool out there that generates grammar by analyzing source code? For example, I'd like to feed the linux kernel source code and get the C grammar as the output?
That gives me a kind of funny idea: train a sequence-to-sequence LSTM network on code written in two (or more) programming languages that implement the same functionality. You'd need a big corpus, and the could would probably would have to be a little more complicated than "hello world, but I don't see why it wouldn't work, in principle.
For me writing parsers like it is 2017, means actually using parser generators like JetBrains MPS or ANTLR, instead of using bison and yacc as some kind of quality measure.
I don't see how a better understanding of the generated parser helps improving those things. Either you can't improve these, or you improve these by modifying the grammar.
Having used ANTLR recently I would tend to not only agree, but say most cases of disagreement are circle jerking instead of getting to a quick solution and shipping. I just updated our grammar 2 weeks ago. Effort? Update the *.g4 grammar file and run a build target in ANT. Done. Next problem. Performance wise it's parsing deeply nested SQL queries in sub millisecond time which meets our need. If someone requires better performance than that I'd be curious what they're doing.
This is true ideally, but there are several factors which make it not always true in practice.
(1) Often your grammar admits inputs which are not in your language. Less often, it forbids some inputs which are actually valid. This is usually because an accurate grammar would be extremely complex to write and inefficient to parse using a generic algorithm. Languages which combine a context-free grammar with parsing decisions based on semantic information fall into this category (think C++); the "true" grammar would usually be context sensitive and quite hard to work with.
(2) You are sometimes compiling a layered language. Again, consider C++. The program you're compiling is generally actually written in the C preprocessor language! Conceptually, it's translated to C++, and then the C++ code is compiled in a separate phase. In practice, that layering will make it difficult to produce good error messages, and a formal grammar for the combined language would be a mess to say the least.
(3) The structure of the grammar may just be awkward for producing good error messages, and refactoring it to eliminate the problem may not be possible, or it may only be possible by making problem (1) even worse.
In general, the decision of what goes in the lexer, what goes in the grammar, what is handled in semantic actions, and what is dealt with at a later semantic layer is always a trade off. Each layer has different characteristics in terms of computational power, performance, static analyzability, programming flexibility, composability, modularity, and maintainability. It doesn't make sense to insist that you solve all problems at one particular layer. You look at the problem you need to solve and decide how it'll be partitioned between these different layers to maximally benefit from their strengths and minimize their weaknesses.
In ANTLR4, they are most definitely not part of the grammar which they were in previous ANTLRs. That makes the grammar easier to read/change/reuse/... Everything else is listener and visitor methods which are easy to write, debug, error check, ....
You could theoretically put all kinds of meta-information into the grammar, even arbitrary code snippets that are pasted into the generated parser. I'm not saying that that's a good idea - it undermines much of the value of a parser generator - but generally speaking it can be done.
It was more a kind of heads up, because I think pure coding is too low level as parsing tool and most still think the alternative to manually parsing is only bison/yacc.
I've used yacc/lex and bison/flex but ANTLR4 is the bees knees. It is a considerable improvement on previous ANTLRs. Separating out grammars (ANTLR grammar file) from actions (Java files) makes everything better.
Error handling involves quite a bit of manual work, memory management if you're allocating nodes during the parser and also trying to do error recovery is even worse. Best dealt with by a slab or pool allocator. It's not very OO-frinedly and you can only have one instance running at once. That was my experience of trying to manage several parsers for Verilog, VHDL, DEF and similar ecad formats.
ANTLR combines lexical analysis and parsing, flex+bison, into a single tool. To me, ANTLR4 sets the bar pretty high. I'm not opposed to learning a new tool like parser combinators but I'd need an article that showed the differences between ANTLR4 and PCs, showing how it was much easier in PCs rather than it was just 2017.
I don't think so? That doesn't appear to support escaping quotes by doubling them (instead, it only supports the \" variety), but doubling quotes is the standard escaping mechanism for CSV data.
> The problem with CSV is that nothing is standard, not even the eponymous comma as separator.
So? The parent claimed it understood everything. I was pointing out that it doesn't. There are plenty of CSV parsers out there that can pretty much handle anything you throw at them, and they work pretty well in practice.
> Seriously, stop using CSV.
I'm a consumer, not a producer. So I'll continue right on using it, thank you very much.
It's a bit of a special case, but I am huge fan of Kaitai [1] for generating parsers for reading binary data formats. You declaratively describe the format in a YAML-based DSL, and "compile" it into a parser that can target a variety of languages (C++, Java, Python, etc).
Ragel generates very fast code, is modular by design and has good error handling. It supports both compiled and scripted languages (e.g. can generate both Ruby and C) which is useful if you need a fallback.
I've really enjoyed using it to implement the following:
I'll admit, it does take a while to understand how to correctly handle ambiguity when dealing with callbacks/events during parsing, but generally speaking, once you get a bit of experience with how things work, it becomes a very powerful tool in your toolbox.
I thought that Ragel was no longer open-source, and supported C/C++ only (admittedly, the second point isn't that big a deal, given the article and its audience).
This is always going to lead to the same place - whether it's the developer's fault or the tool's.
The author of this post wants uncompromising safety, ragel does not allow for that - programmer error can introduce memory unsafety. They even explicitly call out Cloudbleed.
To provide a little bit more context, here is detailed description of this bug [0].
Curiously, they suggest that problem could be avoided by changing the check for EOF to ">=".
This is not true at all in case of C, at least as far as language semantics is concerned.
When pointer goes two past the end of array you are already in undefined behaviour land.
> The Ragel code we wrote contained a bug that caused the pointer to jump over the end of the buffer and past the ability of an equality check to spot the buffer overrun.
A simple `assert(p<pe);` in a debug build would have caught the issue. I think you could argue that ragel should include something like this by default.
> An unreliable programming language generating unreliable programs constitutes a far greater risk to our environment and to our society than unsafe cars, toxic pesticides, or accidents at nuclear power stations. Be vigilant to reduce that risk, not to increase it. – C.A.R Hoare
No. There is no programming language that prevents the programmer from writing bogus code. Blaming software instability on the programming language would be like blaming unsafe building designs on the architect's drafting tools. Yes, shitty drafting tools can make certain kinds of mistakes easier, but the task of having to design something with careful thought and good engineering principles does not ever go away, no matter what kind of compass and ruler you're using.
Also, unsafe software IS dangerous, but putting it next to those actually life-threatening things actually undermines the message by the contrast.
Is it still relevant if it has not been updated for several years?
Perhaps Ometa's younger's brother (Ohm) should have been referenced instead: https://github.com/harc/ohm
Unfortunately while it seems to have some very nice features, particularly the separation of grammar and semantics.
Its performance is underwhelming.
See a benchmark I've created of JSON parsers implemented using many
parsing libraries.
Meaningless statement - 'fault' in these discussions is impossible to objectively attribute. Is every vulnerability the fault of the developer writing the code, or the tool that facilitated the code? Some would argue the former, some the latter.
The end result is the same - using these tools you can produce vulnerabilities, and your users get exploited all the same. So is fault really important?
> Writing a parser in a memory-safe language does nothing for it's correctness, as memory safety is not the only security problem. One should use a domain specific language, as simplicity is very important for safety.
You have two statements here - one is that memory safety is no the end of safety, one is that simplicity is important to safety beyond memory safety.
The first feels... weak. No kidding - security encompasses more than memory safety. But eliminating memory safety issues limits what you have to worry about.
As for simplicity leading to security, this isn't a supported argument. But as for encoding semantic security issues into your program (security issues outside of memory safety), I would argue a strong type system is the right way to go here, something Rust certainly provides.
> edit: Rust is the new systemd, full of fanboys pushing it forcefully on others (or, as they say, "gently nudging"). I have not much to say against rust itself (except that it is not for "systems programming"), but the fanboys are idiots.
blah blah blah, literally just vapor at this point, not worth addressing.
>Ragel is still better then a generic programming language for writing parsers, by a long shot.
>That said, THIS ARTICLE DOES NOT SAY ANYTHING ABOUT HOW TO ACTUALLY WRITE A PARSER !
And, if i can dare to comment on your statement..
>As for simplicity leading to security, this isn't a supported argument. But as for encoding semantic security issues into your program (security issues outside of memory safety), I would argue a strong type system is the right way to go here, something Rust certainly provides.
This is, for one, a bastardization of what i stated, that was that a DOMAIN SPECIFIC LANGUAGE IS (usually) BETTER FOR A DOMAIN THAT IT IS SPECIFIC TOO THEN A LANGUAGE THAT IT IS NOT SPECIFIC TO THAT DOMAIN. This is particularly evident in parsing, and to say otherwise is dishonest.
As for simplicity being related to security, that should also be self evident. If not from common sense, then from examples of more complex (needlessly complex or otherwise) programs having more faults (on average).
>blah blah blah, literally just vapor at this point, not worth addressing.
Your post is exactly what a fanboy would write. A post that goes point by point, "bashing" them for being "wrong", "imprecise", or "not strong enough of an argument", all the while ignoring the actual "meat" of the argument (the meat being that A LANGUAGE FOR WRITING PARSERS IS BETTER FOR WRITING PARSERS THEN A GENERIC LANGUAGE). It is the equivalent of "link plz" (meaning "scientific paper of GTFO", as if actually using your own brain is a bad thing). In short, you sir are a fanboy.
Rust still means too much ceremony; there is a line where specifying more details makes the whole thing more difficult to understand and maintain; and that's still besides taking more time to write in the first place. Not everyone is into checking all the boxes, all the time; for good reasons; otherwise we'd all be coding Ada by now. Trying to shame others into adopting that mindset can only cause more division.
No, I don't think so. I maintain a popular command line tool that needs to be extremely fast, and Rust was an excellent choice for that. It's made it so much easier to maintain than equivalent C tools, and has made it easier for me to reach feature parity by nearly effortlessly depending on the work that so many others have done in the ecosystem.
It's not bug free mind you, but that class of bugs doesn't include things like buffer overruns, double frees or memory leaks (note: Rust does not guarantee absence of memory leaks, but RAII pretty much makes sure it doesn't happen in practice).
I would argue that writing it in C++ would have gotten you there faster, assuming a decent level of experience with the language. There's no ecosystem in the world that beats seamless integration with C and no language that even comes close to being as pragmatic.
> no language that even comes close to being as pragmatic.
That's what I'm disagreeing with. I find Rust to be extremely pragmatic. There are other competing tools written in C++, and from where I'm standing, their maintenance story is quite a bit harder. Conversely, my tool works seamlessly on Windows, Mac and Linux.
If multi-platform is your biggest problem, I can see how that helps. I don't really care that much myself as long as my code compiles on the unixes. I sadly agree that static linking is a pain in the behind in C/C++ land. Still, the language I'm using interacts with the way I'm thinking; if that part isn't working out, all the features in the world isn't going to help.
Indeed. Windows support was critical, and I have a ton of users there as a result of it being so easy to distribute a Rust application to them. Microsoft even picked it up and is now shipping it in VS Code. :-)
> Still, the language I'm using interacts with the way I'm thinking; if that part isn't working out, all the features in the world isn't going to help.
Sure, and there was a lot of friction between me and Rust in the beginning. But my experience---and a lot others' experience as far as I'm aware---is that the friction settles down quite a bit. But yeah, experiences can vary there!
What's your alternative when writing parsers which need to handle untrusted input? This isn't a general "use Rust everywhere" article, it's focusing on these kinds of parsers and how we can do better against very real issues.
But that has nothing to do with Rust; this post and others is more or less claiming between the lines that the only thing that can save us from buggy software is the Rust way, which is bullshit. It didn't work for Java, or Erlang or Haskell; and it won't work for Rust; simply because the solution is experience, not more rules.
I disagree with your "too much ceremony" premise in general, but in particular I don't think it holds true for nom parsers. Here are two examples I've been working on:
But I can, and I do; that's still too much noise for me. I'm sure the code is fine by Rust standards, but to me it looks like Perl with control issues. I don't need any help from my programming language with micromanaging the code I write or the way I write it. I'll ask when I want assistance, and until then I prefer if they stay the hell out of my way. I won't claim to write bug-free software; the only way to write bug-free software is to not write it at all; spending half my energy on sweet talking a psychotic compiler around every single detail is a perfect recipe for missing the bigger picture and ending up solving the wrong problems.
Because it takes longer, is more complicated and more difficult to design. You can't tell me that writing Rust is as easy as writing Go for example.
I'm not saying Rust's heavy typing and borrow checker don't have advantages. Of course they do. But I do think too many people pretend it is always an obvious choice to take them. In many situations a simpler, less 'perfect' language like Go, Python, or even C++ is better.
For example I don't think anyone is going to be writing AAA games in Rust any time soon, and not just because of language momentum.
> Because it takes longer, is more complicated and more difficult to design.
So you mean it requires properly solving the hard problems implied by your application and your desired solution, instead of ignoring them and permitting latent safety and security violations.
> You can't tell me that writing Rust is as easy as writing Go for example.
GC will always be easier than lifetime checking. But that's not what you claimed: you said people have good reasons for not wanting to check all the (safety) boxes. Go also requires you to check all of its safety boxes too, so it isn't an example of what you claimed.
Only languages like C and C++ which permit violating type safety are easy are examples of being justified in not wanting to check the safety boxes. And you have unsafe Rust for when you really need it.
> In many situations a simpler, less 'perfect' language like Go, Python, or even C++ is better.
C++ is not simpler. It's hilarious that some people think "familiar" somehow means "simpler".
you present a good argument for ticking all the safety boxes, but the original comment you're citing doesn't specify safety; you're refuting an argument they didn't make. To count Go as an example, one could count "zero-cost abstractions" or "no runtime" as a box that Go doesn't check.
Rust's "boxes" are there for safety reasons. I don't see what else could have possibly been meant.
And Go has a runtime. The fact that it has green threads and runs a GC makes for a heavier runtime than C or Rust.
I'm not sure I get your argument about Go' boxes. The original post was clearly talking about the "extra" work a developer has to do to "satisfy" the compiler and run their code. It's a common complaint from those coming from less safe languages. I don't see how I could interpret the meaning as you suggested and still make sense of the post.
> You can't tell me that writing Rust is as easy as writing Go for example.
Sure I can! I've been writing Go and Rust daily for years. Both come pretty easy to me. But that's because I've been using both for quite a long time now. Do I sometimes stumble and wonder, "How do I represent my data in the cleanest way in Rust?" Sure! But the same happens in Go too.
Rust was definitely harder to learn, and it took me longer to reach proficiency. With Go, I can't recall much of a learning period. It was pretty much off to the races on day 1. With Rust, there were some growing pains for a few weeks before things clicked.
However, I don't actually disagree with your larger point! I am only one person, and what I find easy or hard might be completely different for another person. For example, I had written quite a bit of Haskell and C before coming to Rust, which turned out to be pretty good preparation. Not everyone has that background, which might make the learning experience worse (or better) than mine.
Ada compilers targeting enterprise and military customers with deep pockets, alongside the industry's adoption of UNIX, where Ada compilers where something extra to buy instead of available for free C compiler[0] required for any UNIX related systems programming, did not help making the language more widespread.
[0] Until they started selling the SDK as developer's edition addon
It's an LR(1) parser generator for OCaml and Coq with a lot of extremely interesting features, such as genuinely good debugging support for grammars and the ability of generating error messages by example.
What this means is that after you write down your grammar, Menhir will give you examples of all the possible syntax errors that could occur. You can then write error messages for each case and get a parser with built-in error reporting for syntax errors.
This works a lot better than you'd think and I really wonder why nobody else implements this feature. Or for that matter, why they're not advertising it on the webpage! If you want to know more, look in the manual, section 11.