Nitra, JetBrains’ research project for language tooling, goes open-source

bodski · on May 27, 2014

This looks very much in the spirit of Yegge's 'Grok' project at Google [1][2].

Does anyone know if that project is still alive?

[1] http://bsumm.net/2012/08/11/steve-yegge-and-grok.html

[2] https://www.youtube.com/watch?v=KTJs-0EInW8

sitkack · on May 27, 2014

You might be interested in https://github.com/yinwang0/pysonar2

drdaeman · on May 27, 2014

Just curious, how do they handle broken code? (Like when you start writing a line in the middle of the file, not yet done with it, but already need all the goodness like highlighting and code completion to work.)

A common approach with libraries I've encountered is that the parser just stops with an error - but that's almost unacceptable for use in a proper code editor, which should really try its best to recover and continue processing, even if some chunk in the middle is failing.

mwsherman · on May 27, 2014

That part of Visual Studio impresses me a lot, though it’s not obvious to a user. I haven’t (say) closed the brace yet, so the code is invalid and therefore a correct AST can’t be parsed.

It has to be heuristic? Or a given (say) line falls back to last known good state?

atombender · on May 28, 2014

I suspect the latter, helped by an ability to recover by inserting missing characters such as braces and quotes. (Xcode actually offers to fix trivial errors such as those, I'm sure VS does the same.)

It's an interesting problem. I suppose that as it knows the point of breakage, it can annotate the AST to indicate breakage, but preserve the subsequent node; breakage itself becomes a kind of AST node. It's possible that in such a situation, any subsequent AST nodes probably have to point to their pre-breakage nodes as parents in order to stay sane. Thus the AST tree becomes a kind of Git-like revision history that stays fragmented until the next time the AST fully parses. It could easily be something even simpler, however.
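The "breakage becomes a kind of AST node" idea can be sketched in a few lines. This is a minimal illustration using panic-mode recovery on a toy `name = number;` language; the `Assign` and `Error` node types and the skip-to-next-semicolon rule are assumptions made for the example, not how Visual Studio or Nitra actually recover.

```python
import re
from dataclasses import dataclass

@dataclass
class Assign:
    name: str
    value: int

@dataclass
class Error:
    skipped: list  # tokens the parser could not make sense of

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")

def tokenize(text):
    return TOKEN.findall(text)

def parse(tokens):
    """Parse `name = number ;` statements, keeping broken spans as Error nodes."""
    nodes, i = [], 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            nodes.append(Assign(stmt[0], int(stmt[2])))
            i += 4
            continue
        # Recovery: swallow tokens up to and including the next ';',
        # record them as an Error node, then resume normal parsing.
        j = i
        while j < len(tokens) and tokens[j] != ";":
            j += 1
        j = min(j + 1, len(tokens))
        nodes.append(Error(tokens[i:j]))
        i = j
    return nodes
```

Parsing `x = 1; y = ; z = 3;` with this sketch yields an `Assign` for `x`, an `Error` node holding the broken middle statement, and an `Assign` for `z`, so the statements after the breakage are still available for highlighting and completion.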

edwinyzh · on May 28, 2014

In my case, the latter: I use a very tolerant CSS and HTML tokenizer/parser to provide basic HTML code completion in LIVEditor, the real-time CSS and HTML tweaker/editor (http://liveditor.com)

kazagistar · on May 31, 2014

Most IDEs use pretty smart heuristics for this sort of thing out of necessity. You notice when it is missing: for example, MonoDevelop starts compulsively indenting wrong if you mismatch your brackets anywhere in the file, but other IDEs more gracefully match the local indentation.

In other words, JetBrains has lots of experience with correctly handling partially valid syntax trees in a friendly way.

girvo · on May 27, 2014

Hah! I've been looking for something like this lately, to have some platform to integrate Hack and TypeScript into some IDE. TextAdept with LPEG has been nice, Komodo Edit/IDE would be better but their docs are out of date, Sublime Text is great but you're too limited in UIs you can build.

I had an idea of an IDE that instead of being built around a language was built around frameworks and workflows in that language, which would require a deep understanding of the target language. This seems like a great step in that direction, but it's a shame I can't run it on Linux :(

citizenmatt · on May 27, 2014

Mono support isn't at the top of our priority list right now. However, the Nemerle binaries work on mono, so it's possible you could build the command line compiler and sample applications under Linux. If you get it working, a pull request would be very cool!

phpnode · on May 27, 2014

The rest of JetBrains' products are cross-platform, so there's a good chance that this will be too in the future.

citizenmatt · on May 27, 2014

We do have a lot of future plans for Nitra, but for now, it's worth pointing out that the products that are cross platform are Java based, while Nitra is based on .net. The project is currently Windows only.

mythz · on May 27, 2014

Interesting, what made you decide on .NET (i.e. given you're predominantly JVM based)? Was it just based on the Nemerle team's preference?

citizenmatt · on May 27, 2014

I wouldn't say we're predominantly JVM based. We've also got ReSharper, dotTrace, dotMemory, dotCover and dotPeek, which are all .net based.

But basically, it's because Nitra is an extension and evolution of work done on Nemerle, and Nemerle is a .net language.

mythz · on May 27, 2014

Is ReSharper written in C#? I thought I heard it was written in C++.

citizenmatt · on May 27, 2014

It's written in C#, with some of the Visual Basic support written in VB. Parts of ReSharper for C++ are written in C++ and C++/CLI.

jimmcslim · on May 27, 2014

Given the renewed spirit of openness at MS, and more support from Xamarin for Mono, there's a chance that this might be sooner rather than later?

shadowmint · on May 27, 2014

Can't wait to have a play with this (once I dig up my windows laptop...).

I have such amazing respect for the amazing products from jetbrains; having toys to play with like this is just fantastic.

I'm particularly interested in the component based grammars; I don't quite understand how you can get away with not breaking the 'parent' grammar when you drop an arbitrary child grammar inside of it, but I'm quite looking forward to finding out~

sparkie · on May 27, 2014

This doesn't solve the problem of combining arbitrary grammars - there are obviously restrictions on what you can add where, or a requirement to add special delimiters around child grammars so that they can be parsed correctly, but Nemerle takes a practical approach to the problem. You still cannot nest arbitrary grammars inside others several layers deep, as each nested language requires consideration of its parents to get the parse you intended.

If you're interested in the problem of combining grammars, I'd encourage you to check out Diekmann & Tratt's Language Boxes (http://soft-dev.org/pubs/pdf/diekmann_tratt__parsing_compose...) [demo: http://www.youtube.com/watch?v=LMzrTb22Ot8], which provide an elegant solution to the problem, although with the obvious caveat that it diverges from plain-text file representation of code, and requires an intelligent editor like their example implementation, eco (https://bitbucket.org/softdevteam/eco).

Perhaps an interesting project would be to combine the two approaches, by having a language-box aware editor which could automatically insert the correct delimiters around language-boxes (inferred by usage), and produce plain-text representations which could still be understood by Nemerle/Nitra, which is language-box unaware.

citizenmatt · on May 27, 2014

You can consider an extensible grammar as a "nested" grammar. You have certain extension points in the "parent" and you can attach a new grammar here.

For example, it would be easy to take a C# grammar and add a new operator, such as the null propagator "?.", since the list of operators is extensible. This wouldn't break anything, as it's just a new token for a binary expression.

Or, you could add something bigger, such as LINQ, by extending C# 2.0's Expression syntax rule with a LINQ query expression.
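To make the "extension point" idea concrete, here is a small sketch: an expression parser whose table of binary operators can be extended after the fact. The `ExprGrammar` class, its `extend` method, and the precedence numbers are hypothetical names invented for this example; this is not Nitra's grammar syntax or API.

```python
import re

# "?." must be tried before the single-character fallback.
TOKEN = re.compile(r"\s*(\?\.|[A-Za-z_]\w*|\d+|\S)")

class ExprGrammar:
    def __init__(self):
        # Extension point: operator -> precedence (higher binds tighter).
        self.binary_ops = {"+": 10, "*": 20}

    def extend(self, op, precedence):
        """Attach a new binary operator without touching the existing rules."""
        self.binary_ops[op] = precedence

    def parse(self, text):
        tree, rest = self._expr(TOKEN.findall(text), 0)
        if rest:
            raise SyntaxError(f"unexpected token {rest[0]!r}")
        return tree

    def _expr(self, tokens, min_prec):
        # Precedence climbing; any single token is treated as an operand.
        if not tokens:
            raise SyntaxError("unexpected end of input")
        left, tokens = tokens[0], tokens[1:]
        while tokens and self.binary_ops.get(tokens[0], -1) >= min_prec:
            op = tokens[0]
            right, tokens = self._expr(tokens[1:], self.binary_ops[op] + 1)
            left = (op, left, right)
        return left, tokens
```

Before `extend("?.", 30)` is called, `a ?. b` is rejected; afterwards it parses, and inputs that were already valid, such as `a + b * c`, produce the same tree as before.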

raiph · on May 29, 2014

> I don't quite understand how you can get away with not breaking the 'parent' grammar when you drop an arbitrary child grammar inside of it, but quite looking forward to finding out~

Rakudo, which is built on NQP, is one of the largest of modern langs, but it's technically built out of a series of sub-languages ("slangs") that recursively reference each other. Thus there's a language for grammars, another for strings, another for closures, and all of these work seamlessly together.

jimmcslim · on May 27, 2014

I wonder if the open-sourcing of Microsoft's Roslyn, C# compiler project, had an impact on this decision?

At any rate, I might have a look at this and see if a grammar for Delphi can be built... the state of tooling on that platform is quite frankly dire.

citizenmatt · on May 27, 2014

No, it's always been the plan to open source Nitra. It's come from the team who built Nemerle, which is open source, and the team obviously wanted to continue in this manner. And JetBrains has a pretty good track record with open source - e.g. the IDEA platform that is IntelliJ's Community Edition is fully open source.

Igglyboo · on May 27, 2014

Having a minimal idea of how these work, how similar is this to antlr?

"It is also a build tool to compile the grammars into parsers" this line specifically caught my attention.

sparkie · on May 27, 2014

The parsing part is similar conceptually and syntactically, but their implementation is very different. Antlr parses LL grammars - an unambiguous subset of context-free grammars which are quite restrictive in the production rules they allow. This tool on the other hand uses PEGs, which parse a different (but overlapping) set of grammars, which aren't necessarily limited to CFGs, but are always guaranteed to be unambiguous. The main feature of PEGs that allows this is the ordered choice operator (|): the correct parse depends on the order in which you specify the alternatives, unlike with Antlr, where all alternatives have equal precedence.

It should be noted though that this tool is much more than just a parser-generator - it's a framework for developing tools for interacting with languages, which just happens to use PEG as part of that implementation.
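A tiny illustration of ordered choice, written as plain Python closures purely for brevity (an assumption of this sketch, not how Nitra or any particular PEG tool is implemented): the first alternative that matches wins, and later alternatives are never tried at that position.

```python
def literal(s):
    """Match the exact string s at pos; return the end position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def choice(*alternatives):
    """PEG-style ordered choice: commit to the first alternative that succeeds."""
    def parse(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse

# Same alternatives, different order, different result on the same input.
short_first = choice(literal("if"), literal("ifdef"))
long_first = choice(literal("ifdef"), literal("if"))
```

On the input `ifdef`, `short_first` stops after `if` (position 2) while `long_first` consumes the whole word (position 5). There is never more than one parse, which is why a PEG cannot be ambiguous, but it also means the order of alternatives is part of the grammar's meaning.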

quotemstr · on May 27, 2014

Note that PEGs are not context-free grammars. They're both more and less powerful than traditional CFGs, and they're tricky to use: because PEG choice is ordered and traditional CFG choice is unordered, it's hard to translate standard language grammars to a PEG recognizer system. That's why, for my forever-project, I've opted to use scannerless GLR instead of PEGs. Both PEGs and GLR recognize languages that are closed under composition (the property that gives you extensibility), but the formalisms for GLR parsers are much better.

The Harmonia project is the best whack at the problem I've seen. See http://harmonia.cs.berkeley.edu/papers/twagner-parsing.pdf.

As others have mentioned, for an IDE, you also want strong error recovery. Doing that in a general way when using tools based on declarative grammars is, well, very hard, especially when you want to recover from brace mismatch problems. The best approach is "island and reef parsing", where you actually parse your buffer twice: you first build a map of all the "reefs" (parentheses) using a simple recursive descent parser, pair up mismatched parentheses using an ad-hoc algorithm, insert corrections for mismatches, then apply your fully general parser to the result. (The word "parenthesis" here refers to any balanced construct, even "begin" and "end". You can actually infer what the "parentheses" for a given language are by examining the grammar!)

See also http://fileadmin.cs.lth.se/cs/Personal/Emma_Soderberg/docs/S...
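The bracket-repair pre-pass described above might look roughly like the following. This is a deliberately simplified sketch with an ad-hoc repair rule chosen for the example (close whatever is left open, drop stray closers); it is not the algorithm from the linked papers, and it ignores strings and comments, which a real implementation would have to skip.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def repair_brackets(text):
    """Return text with brackets balanced, ready for the full parser."""
    out, stack = [], []  # stack holds the closers we still expect
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
            out.append(ch)
        elif ch in CLOSERS:
            if ch in stack:
                # Insert closers for inner constructs that were left open.
                while stack[-1] != ch:
                    out.append(stack.pop())
                stack.pop()
                out.append(ch)
            # Otherwise this closer has no opener: drop it.
        else:
            out.append(ch)
    # Close anything still open at end of buffer.
    while stack:
        out.append(stack.pop())
    return "".join(out)
```

For example, `f(a[1)` becomes `f(a[1])` and `a) + (b` becomes `a + (b)`, while already balanced input passes through unchanged. The general parser then only ever sees balanced input, so a single missing brace no longer derails everything after it.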

moondowner · on May 27, 2014

They have a Confluence wiki space for Nitra: http://confluence.jetbrains.com/display/Nitra/Home

Here's the developer installation: http://confluence.jetbrains.com/display/Nitra/Developer+Inst...

moogly · on May 27, 2014

I would love to see some ReSharper-quality IntelliSense for Rust code in Visual Studio :)

caniszczyk · on May 27, 2014

The Eclipse equivalent of this is Xtext: https://www.eclipse.org/Xtext/

S4M · on May 27, 2014

So, this will be the Emacs Lisp of IntelliJ?

lapusta · on May 27, 2014

It's written in Nemerle, which runs on top of the CLR. I believe it's targeted at the .NET, Visual Studio & ReSharper audience.

S4M · on May 27, 2014

I don't see how that addresses my point. Emacs Lisp is the scripting language of Emacs, and you can use it to build an AST to deal with syntax highlighting and autocompletion, among other things. That's what Nitra is intended for, and it doesn't matter that it is written in Nemerle.

lapusta · on May 27, 2014

IntelliJ platform runs on top of JVM, Nitra runs on top of CLR.

th3iedkid · on May 27, 2014

They also have MPS on the grammar-less side of IDE/language tooling...

cbsmith · on May 27, 2014

The article was written as though Emacs didn't exist.

juggty_dev · on May 27, 2014

What about security?

citizenmatt · on May 27, 2014

What do you mean? Security of what?

JackFr · on May 27, 2014

Sounds a lot like parser combinators, without ever mentioning parser combinators.

sparkie · on May 27, 2014

Because they use PEGs, not parser-combinators.

ThinkBeat · on May 27, 2014

Kinda like

Lex and Yacc and Bison, Antlr, Lemon, LPEG, Ragel, re2c

or any other tool on this list

http://en.wikipedia.org/wiki/Comparison_of_parser_generators

acdha · on May 27, 2014

Except, of course, for the areas where it's different as explained in the original post and their post last November:

http://blog.jetbrains.com/blog/2013/11/12/an-introduction-to...

Before dismissing someone's work you could at least skim a blog post.