Just curious, how do they handle broken code? (Like when you start writing a line in the middle of the file, not yet done with it, but already need all the goodness like highlighting and code completion to work.)
A common approach with libraries I've encountered is that the parser just stops with an error - but that's almost unacceptable for use in a proper code editor, which should really try its best to recover and continue processing, even if some chunk in the middle is failing.
That part of Visual Studio impresses me a lot, though it’s not obvious to a user. I haven’t (say) closed the brace yet, so the code is invalid and therefore a correct AST can’t be parsed.
Does it have to be heuristic? Or does a given (say) line fall back to the last known good state?
I suspect the latter, helped by an ability to recover by inserting missing characters such as braces and quotes. (Xcode actually offers to fix trivial errors such as those, I'm sure VS does the same.)
It's an interesting problem. I suppose that as it knows the point of breakage, it can annotate the AST to indicate breakage, but preserve the subsequent nodes; breakage itself becomes a kind of AST node. It's possible that, in such a situation, any subsequent AST nodes have to point to their pre-breakage nodes as parents in order to stay sane. Thus the AST becomes a kind of Git-like revision history that stays fragmented until the next time the file fully parses. It could easily be something even simpler, however.
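To make the "breakage becomes a kind of AST node" idea concrete, here is a minimal sketch in Python. It is not how Visual Studio or Nitra actually do it; the toy statement language (NAME = NUMBER ;), the Assign and Error node types and the "skip to the next semicolon" rule are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Assign:
    name: str
    value: int

@dataclass
class Error:
    skipped: list  # tokens swallowed while resynchronizing

# Identifiers, integers, or any single non-space character.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")

def tokenize(src):
    return TOKEN.findall(src)

def parse(src):
    """Parse statements of the form NAME = NUMBER ; and never give up."""
    toks = tokenize(src)
    nodes, i = [], 0
    while i < len(toks):
        well_formed = (
            i + 3 < len(toks)
            and toks[i].isidentifier()
            and toks[i + 1] == "="
            and toks[i + 2].isdigit()
            and toks[i + 3] == ";"
        )
        if well_formed:
            nodes.append(Assign(toks[i], int(toks[i + 2])))
            i += 4
            continue
        # Recovery: swallow tokens up to and including the next ';',
        # record them as an Error node, then carry on parsing.
        j = i
        while j < len(toks) and toks[j] != ";":
            j += 1
        j = min(j + 1, len(toks))
        nodes.append(Error(toks[i:j]))
        i = j
    return nodes

print(parse("a = 1; b = ; c = 3;"))
```

The half-typed statement in the middle turns into an Error node, while the statements before and after it still produce ordinary nodes that highlighting or completion could use.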
In my case, the latter: I use a very tolerant CSS and HTML tokenizer/parser to provide basic HTML code completion in LIVEditor - the real-time CSS and HTML tweaker/editor (http://liveditor.com)
Most IDEs use pretty smart heuristics for this sort of thing out of necessity. You notice when it is missing: for example, MonoDevelop starts compulsively indenting wrong if you mismatch your brackets anywhere in the file, but other IDEs more gracefully match the local indentation.
In other words, JetBrains has lots of experience with correctly handling partially valid syntax trees in a friendly way.
Hah! I've been looking for something like this lately, to have some platform to integrate Hack and TypeScript into some IDE. Textadept with LPeg has been nice, Komodo Edit/IDE would be better but their docs are out of date, Sublime Text is great but you're too limited in the UIs you can build.
I had an idea of an IDE that, instead of being built around a language, was built around frameworks and workflows in that language, which would require a deep understanding of the target language. This seems like a great step in that direction, but it's a shame I can't run it on Linux :(
Mono support isn't at the top of our priority list right now. However, the Nemerle binaries work on Mono, so it's possible you could build the command line compiler and sample applications under Linux. If you get it working, a pull request would be very cool!
We do have a lot of future plans for Nitra, but for now, it's worth pointing out that the products that are cross-platform are Java based, while Nitra is based on .NET. The project is currently Windows only.
Can't wait to have a play with this (once I dig up my windows laptop...).
I have enormous respect for the amazing products from JetBrains; having toys to play with like this is just fantastic.
I'm particularly interested in the component based grammars; I don't quite understand how you can get away with not breaking the 'parent' grammar when you drop an arbitrary child grammar inside of it, but I'm quite looking forward to finding out~
This doesn't solve the problem of combining arbitrary grammars - there are obviously restrictions on what you can add where, or a requirement to add special delimiters around child grammars so that they can be parsed correctly, but Nemerle takes a practical approach to the problem. You still cannot nest arbitrary grammars inside others several layers deep - as each nested language requires consideration of its parents to get the parse you intended.
Perhaps an interesting project would be to combine the two approaches, by having a language-box aware editor which could automatically insert the correct delimiters around language-boxes (inferred by usage), and produce plain-text representations which could still be understood by Nemerle/Nitra, which is language-box unaware.
You can consider an extensible grammar as a "nested" grammar. You have certain extension points in the "parent" and you can attach a new grammar here.
For example, it would be easy to take a C# grammar and add a new operator, such as the null propagator "?.", since the list of operators is extensible. This wouldn't break anything, as it's just a new token for a binary expression.
Or, you could add something bigger, such as LINQ, by extending C# 2.0's Expression syntax rule with a LINQ query expression.
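As a rough illustration of the "extension point" idea (this is not Nitra's syntax or API, just a sketch in Python), here is a tiny precedence-climbing expression parser whose operator list is the extension point. The class name ExprGrammar, the method add_binary_op and the binding powers are all invented for the example.

```python
import re

class ExprGrammar:
    """Toy expression grammar; the operator table is the extension point."""

    def __init__(self):
        # operator symbol -> binding power (higher binds tighter)
        self.binary_ops = {"+": 10, "*": 20}

    def add_binary_op(self, symbol, power):
        self.binary_ops[symbol] = power

    def tokenize(self, src):
        # Longest operators first, so a multi-character operator is not
        # split into shorter ones.
        ops = sorted(self.binary_ops, key=len, reverse=True)
        pattern = "|".join(
            [r"[A-Za-z_]\w*", r"\d+"] + [re.escape(op) for op in ops]
        )
        return re.findall(pattern, src)

    def parse(self, src):
        tree, _rest = self._expr(self.tokenize(src), 0)
        return tree

    def _expr(self, toks, min_power):
        left, toks = toks[0], toks[1:]
        while (toks and toks[0] in self.binary_ops
               and self.binary_ops[toks[0]] >= min_power):
            op = toks[0]
            # +1 makes every operator left-associative
            right, toks = self._expr(toks[1:], self.binary_ops[op] + 1)
            left = (op, left, right)
        return left, toks

g = ExprGrammar()
print(g.parse("a + b * c"))      # ('+', 'a', ('*', 'b', 'c'))

g.add_binary_op("?.", 30)        # plug the new operator into the parent grammar
print(g.parse("a?.b + c"))       # ('+', ('?.', 'a', 'b'), 'c')
print(g.parse("a + b * c"))      # unchanged: ('+', 'a', ('*', 'b', 'c'))
```

Existing expressions parse exactly as before, which is the sense in which adding an operator "wouldn't break anything". Something as large as LINQ would need a new rule rather than a new table entry, which this sketch doesn't attempt.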
> I don't quite understand how you can get away with not breaking the 'parent' grammar when you drop an arbitrary child grammar inside of it, but I'm quite looking forward to finding out~
Rakudo, which is built on NQP, is one of the largest of modern langs, but it's technically built out of a series of sub-languages ("slangs") that recursively reference each other. Thus there's a language for grammars, another for strings, another for closures, and all of these work seamlessly together.
No, it's always been the plan to open source Nitra. It's come from the team who built Nemerle, which is open source, and the team obviously wanted to continue in this manner. And JetBrains has a pretty good track record with open source - e.g. the IDEA platform that is IntelliJ's Community Edition is fully open source.
The parsing part is similar conceptually and syntactically, but the implementations are very different. ANTLR parses LL grammars - an unambiguous subset of context-free grammars which are quite restrictive in the production rules they allow. This tool, on the other hand, uses PEGs, which parse a different (but overlapping) set of grammars, which aren't necessarily limited to CFGs, but are always guaranteed to be unambiguous. The main feature of PEGs that allows this is the ordered choice operator (|): the correct parse depends on the order in which you specify the alternatives, unlike with ANTLR, where all alternatives have equal precedence.
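For anyone who hasn't run into ordered choice before, here is a minimal sketch using hand-rolled combinators in Python. It has nothing to do with Nitra's or ANTLR's actual implementation; lit, choice, seq and matches_all are names made up for the example.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parser(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parser

def choice(*alternatives):
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parser(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parser

def seq(*parts):
    """Match each part in turn, failing as soon as one fails."""
    def parser(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return parser

def matches_all(parser, text):
    return parser(text, 0) == len(text)

# The same two alternatives, in different orders, followed by "c".
short_first = seq(choice(lit("a"), lit("ab")), lit("c"))
long_first = seq(choice(lit("ab"), lit("a")), lit("c"))

print(matches_all(short_first, "abc"))  # False: "a" wins, then "c" fails on "b"
print(matches_all(long_first, "abc"))   # True
print(matches_all(long_first, "ac"))    # True: "ab" fails, falls through to "a"
```

With unordered CFG choice, both orderings would describe the same language and accept "abc". In the PEG version the order is part of the meaning, which is why there is never more than one parse.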
It should be noted though that this tool is much more than just a parser-generator - it's a framework for developing tools for interacting with languages, which just happens to use PEG as part of that implementation.
Note that PEGs are not context-free grammars. They're both more and less powerful than traditional CFGs, and they're tricky to use: because PEG choice is ordered and traditional CFG choice is unordered, it's hard to translate standard language grammars to a PEG recognizer system. That's why, for my forever-project, I've opted to use scannerless GLR instead of PEGs. Both PEGs and GLR recognize languages that are closed under composition (the property that gives you extensibility), but the formalisms for GLR parsers are much better.
As others have mentioned, for an IDE, you also want strong error recovery. Doing that in a general way when using tools based on declarative grammars is, well, very hard, especially when you want to recover from brace mismatch problems. The best approach is "island and reef parsing", where you actually parse your buffer twice: you first build a map of all the "reefs" (parentheses) using a simple recursive descent parser, pair up mismatched parentheses using an ad-hoc algorithm, insert corrections for mismatches, then apply your fully general parser to the result. (The word "parentheses" here refers to any balanced construct, even "begin" and "end". You can actually infer what the "parentheses" for a given language are by examining the grammar!)
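A minimal sketch of that first pass, in Python, under heavy assumptions: the bracket set is hard-coded rather than inferred from a grammar, strings and comments are ignored, and the repair rule (close inner constructs early, drop stray closers, close everything at end of buffer) is just one simple heuristic, not any particular IDE's algorithm. The name repair_brackets is made up.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def repair_brackets(src):
    """Return src with brackets balanced, ready for the 'real' parser."""
    out, stack = [], []  # stack holds the closers we still expect
    for ch in src:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            if ch not in stack:
                continue  # stray closer with no opener: drop it
            # Close any inner constructs that were left open.
            while stack[-1] != ch:
                out.append(stack.pop())
            stack.pop()
        out.append(ch)
    # Close whatever is still open at the end of the buffer.
    while stack:
        out.append(stack.pop())
    return "".join(out)

print(repair_brackets("f(a[1)"))         # f(a[1])
print(repair_brackets("{ x = (1 + 2 "))  # { x = (1 + 2 )}
print(repair_brackets("a) + b"))         # a + b
```

The general parser then only ever sees balanced input, so a missing brace costs you one local correction instead of derailing the rest of the file. A real implementation would also keep a map from corrected positions back to the original buffer so that diagnostics point at the right place.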
I don't see how that addresses my point. Emacs Lisp is the scripting language of Emacs, and you can use it to build an AST to deal with syntax highlighting and autocompletion, among other things. And that's what Nitra is intended for, and it doesn't matter that it is written in Nemerle.
Does anyone know if that project is still alive?
[1] http://bsumm.net/2012/08/11/steve-yegge-and-grok.html
[2] https://www.youtube.com/watch?v=KTJs-0EInW8