Higher-order abstract syntax

fizixer · on Aug 16, 2016

Based on my limited knowledge, SSA form is at a higher level than AST (right?). If yes, would SSA be considered an example of HOAS?

Regardless, besides the question posed by OP, I wonder what the relation between SSA and HOAS might be?

mafribe · on Aug 16, 2016

ASTs and SSA are orthogonal concepts. ASTs are a representation of syntax (including SSA) where immediate subterms are easily accessible (unlike with strings as representation). Easy access to subterms is important for the efficiency of type-checkers and code-generators. In contrast, SSA is a simple programming language that is often used as an intermediate form in code generators.

HOAS is a form of embedding a language L with binders into a language L-Base also with binders and functions, but such that L-binders are handled by functions in L-Base and the problem of defining capture-avoiding substitution for L-terms is reduced to reusing L-Base's capture-avoiding substitution. (NB L = L-Base is possible but not required.)

HOAS is neat, but (simplifying a bit) prevents you from making inductive definitions on L terms. There are various ways of getting around this shortcoming, the most principled one is probably the nominal approach of A. Pitts et al. Another problem with HOAS is that the representation may contain 'junk'. That means there are L-Base terms that do not represent L-programs.

trurl · on Aug 17, 2016

HOAS representations can only contain junk if you have not taken appropriate precautions. See the paper "Boxes Go Bananas" for one technique.

mafribe · on Aug 17, 2016

Yes, that's right. But you need parametric polymorphism for "Boxes Go Bananas" which may not be available in some languages.

adamnemecek · on Aug 16, 2016

I'd be very interested to hear opinions of people who've actually used this and if so for what?

danharaj · on Aug 16, 2016

It's quite ingenious but it's not as convenient as locally nameless representation for working underneath binders. In Haskell if you want to do variable binding you ought to use the bound library[1]. It's been seriously used in at least one compiler project[2].

[1] https://hackage.haskell.org/package/bound [2] https://github.com/ermine-language/ermine

doomrobo · on Aug 16, 2016

The Rust compiler uses DeBrujin indices to refer to variables at some point in the compilation process.

tel · on Aug 18, 2016

But that's pretty much exactly what HOAS is avoiding ;)

grumpyprole · on Aug 16, 2016

HOAS is particularly nice as a surface representation for deeply embedded DSLs, especially when combined with GADTs or tagless-final style. But you'll likely want to transform to e.g. DeBruijn indices for optimisations, low-level codegen etc

Athas · on Aug 16, 2016

It is nice for languages where it is a first-class concept (e.g. Twelf, which is a logic/theorem language), but it gets complicated very quickly if you want to do AST transformations.

I cannot recommend HOAS outside of very narrow areas.

gelisam · on Aug 16, 2016

The [Beluga language](http://complogic.cs.mcgill.ca/beluga/) is a variant of Twelf which aims to make such transformations much easier.

Full disclosure: Brigitte Pientka was my supervisor. I did not work on the compiler, but I did make the language's logo.

fritzo · on Aug 17, 2016

I've found HOAS to be a convenient notation for higher order functions in DSLs in Coq [1] and Python [2], but both use cases were to reason about lambda-calculus.

[1] https://github.com/fritzo/hstar/blob/6df3347/src/DeBruijn.v#... [2] https://github.com/fritzo/pomagma/blob/ada575c/src/reducer/s...

naasking · on Aug 16, 2016

It's been used quite a bit: http://okmij.org/ftp/tagless-final/JFP.pdf

kxyvr · on Aug 16, 2016

If anyone want another concrete example of a tagless interpreter in OCaml, I wrote one on SO awhile back:

https://stackoverflow.com/questions/22676975/simple-lambda-c...

Candidly, I find them a pain to write, but it is faster and integrates with the host language's type checker.

tel · on Aug 17, 2016

I've used it a few times. It's a cute way to make shallow DSLs which need to identify binders. It gets very annoying to do any work under the binders however making it scale poorly to many tasks important in more complex DSL implementations.

grumpyprole · on Aug 17, 2016

I think you mean deeply embedded DSLs (as per my earlier comment)? Shallow DSLs implement semantics directly, having only one interpretation and don't have abstract syntax (the AS in HOAS).

tel · on Aug 17, 2016

I suppose I don't really define deep and shallow in any precise way. I'd say what you describe is a "shallower" DSL than any one using HOAS, but using HOAS will be shallower than one that has an explicit binding structure.

Maybe that's pretty non-standard. I like thinking of shallow-deep as a continuum, though.

grumpyprole · on Aug 18, 2016

These terms do have fairly precise meanings in the FP community; there are many papers on the subject that define shallow versus deep as I described. For example: http://www.cs.ox.ac.uk/publications/publication7584-abstract...

tel · on Aug 18, 2016

Yes, I'm aware of those papers. I've read many including the one that you linked. As far as I'm aware, while you can use the strict definition (is there, in the metalanguage, the creation of an AST?) they're often discussed as more of a continuum.

HOAS is a great example. There is the creation of an AST, but perhaps the most important and tricky part of any AST, the binding system, is left to the metalanguage. For that reason exactly I'm happy to say that HOAS is "shallower" than, say, a de Bruijn indexed binding system.