Introduction To Program Synthesis (2018) (csail.mit.edu)
132 points by optimalsolver on Aug 7, 2021 | hide | past | favorite | 20 comments



Is there a set of program synthesis problems (or a source of problems) accepted by academics as a reasonable benchmark for measuring progress these days? Or does each new approach find its own example problems to demonstrate its particular differences on?


SyGuS competition (SyGuS-Comp)[1] is a widely-used collection of such problems, but of course there will always be problem domains that are not well-represented by standard benchmarks.

1. https://sygus.org/



I was the TA for this course at MIT last time it was taught in 2020. Ask me anything.


In what ways is program synthesis different from code generation as taught in compiler and/or PLT courses? Is the goal here to get as high-level as possible, or to be able to create as much code as possible?


One word: search. Most of the traditional compilation pipeline can be expressed using deterministic rewrite rules. Synthesizers of all families generate and consider a large number of possibilities.

Flipping it around, there's an old joke that "a synthesizer is just a compiler that sometimes doesn't work."

But the first thing to learn is that, the deeper you go, the blurrier the lines get. There certainly exists a compiler out there that does more search than a good chunk of things done under the name of synthesis. (Especially because "synthesis" has become a bit of a buzzword in PL circles, and often gets slapped where it doesn't really belong.) I've discussed this briefly before in my blog: http://www.pathsensitive.com/2021/03/why-programmers-shouldn...
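To make the "one word: search" point concrete, here is a toy bottom-up enumerative synthesizer over a tiny arithmetic DSL. This is purely an illustrative sketch of the generate-and-test idea, not how Sketch or any real tool works; real synthesizers prune the space aggressively rather than enumerating blindly.

```python
# Toy enumerative synthesis: grow expressions from a tiny grammar
# (x, 1, +, *) and return the first one consistent with the examples.
from itertools import product

def synthesize(examples, max_depth=2):
    # Each term is a (source-text, evaluator) pair.
    terms = [("x", lambda x: x), ("1", lambda x: 1)]
    for _ in range(max_depth):
        new_terms = []
        for (da, fa), (db, fb) in product(terms, repeat=2):
            # Default args freeze fa/fb at closure-creation time.
            new_terms.append((f"({da} + {db})", lambda x, fa=fa, fb=fb: fa(x) + fb(x)))
            new_terms.append((f"({da} * {db})", lambda x, fa=fa, fb=fb: fa(x) * fb(x)))
        terms += new_terms
        for desc, f in terms:
            if all(f(x) == y for x, y in examples):
                return desc  # first program consistent with the spec
    return None

# Spec by input-output examples: y = 2x + 1.
print(synthesize([(0, 1), (1, 3), (4, 9)]))
```

The search is exhaustive and exponential, which is exactly why the interesting work in the field is about taming that blowup.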


Not so much a question as a complaint: I can't find anything about Inductive Logic Programming in the course material. ILP is the field that studies the inductive synthesis of logic programs, so it should fit right in with the rest of the material, particularly in the parts about constraint-based synthesis and verification.

If ILP is discussed and I missed it, apologies -- I've only read through very quickly to get a general idea of the curriculum and bookmarked the top level for reference.


I'm barely familiar with Inductive Logic Programming and never hear anyone talk about it. I came in prepared to dismiss the field's relevance to synthesis, but decided to do a bunch of reading first so that I could do so from an informed perspective. I knew ILP to be "constructing a classifier from a conjunction of literals," which is very much not synthesis. It turns out there's a lot more done under the name of ILP.

I think the real answer is largely that logic programming as a whole fell out of favor in the US sometime in the 90's, whereas the program synthesis renaissance started in the 2000's -- and is still US-dominated. Not to say there still aren't tons of people doing logic programming, but, personal anecdote: I encountered a ton of logic programming papers during the lit review for one of my other projects, which has some overlap in the symbolic techniques used, and... they're all from the period 1988-1995.

Mukund's work is the most relevant from a synthesis perspective that I know of, as one of the only people doing synthesis of a logic language (unless you count SQL). (Note: This is distinct from people like William Byrd doing synthesis using a logic programming language.) Have a look at the bottom of page 23 of "Provenance-Guided Synthesis of Datalog Programs" https://dl.acm.org/doi/pdf/10.1145/3371130 . It does a good job explaining the distinction between the goals of ILP and things done under the umbrella of synthesis. But note that this work is still considered niche in the synthesis world.

There's also the question of which techniques belong to which field; I expect that, were I to dig into the synthesis-relevant parts of ILP further, I'd find a lot of techniques that are familiar and that I learned with no connection to ILP. Paul Graham has said that philosophy is not useful because everything useful it discovers was spun out into a different field. ( http://www.paulgraham.com/philosophy.html ) The philosophy of computer science in that perspective is AI. 40 years ago, garbage collection, term rewriting, logic programming, and programming assistants were all AI. Even just 20 years ago, Tessa Lau's work on inductive synthesis was AI. Today, all of those are things likely to be rejected from a top AI conference, but welcome at a PL conference. Thus, you might find in the synthesis world techniques that would be familiar to an old-school ILP researcher, but without shared context.

(Two recent exceptions are neural synthesis and probabilistic programming, where you do see a lot of crossover between PL and ML. This is very recent. [Insert harsh judgments here about the attempts from 2016 and earlier to learn programs.])


Thank you for your thoughtful and knowledgeable comment, and many apologies for my belated reply. I was crossing an ocean on an unseaworthy vessel ("with slight abuse of terminology").

My intuition is the same as yours: US institutions don't really take notice of what's going on in Europe. It's a little strange in this day and age, where we don't have to wait for correspondence to be delivered by wind clipper, say, but oh well, it's the way it is.

ILP is a small field with hubs in Europe and in Japan, and those also tend to be the hubs for logic programming (for obvious reasons). The overlap with a slightly arcane programming paradigm that, like you say, is more well-known in Programming Language circles is not really helping ILP's popularity, I guess.

>> I came in prepared to dismiss the field's relevance to synthesis, but decided to do a bunch of reading first so that I could do so from an informed perspective

Well, respect!

Regarding Mukund Raghothaman's work that you linked: first of all, thanks for making me aware of yet another interesting approach to learning Datalog programs that I had missed because I myself don't keep as watchful an eye on the wider program synthesis literature as I wish others kept on ILP. Guilty! Second, I so disagree with the characterisation of the work in Muggleton et al. 2015 :) I guess that's typical; Stephen (Muggleton) is my thesis advisor and my work is a few steps down the path from the 2015 work, so obviously I have a different perspective. Still, by January 2020 there were more recent results to cite and more advanced systems to compare against. Exciting things are happening in ILP right now, long-term thorny problems (predicate invention, learning recursion, learning in languages other than Prolog, efficiency) are falling one by one, and it's a bit of a shame that it's all being missed by everyone outside our tiny little field.

Regarding what techniques belong to what field, I think you're probably right. ILP is traditionally based on search, specifically searching the space of logic programs that are sets of clauses constructible from examples and background knowledge (both also logic programs) and some kind of typically extra-logical language bias. The big problem has always been the size of the resulting hypothesis spaces. Of course I would think so, because my work has been mainly about this big problem. Anyway, as far as I know, wider program synthesis approaches also work essentially the same way, by searching a program space, and so have the same problems with searching large spaces, so definitely there will be much overlap. The difference is in the perspective: like you say, ILP sometimes has a bit of a strongly theoretical flavour because many of the experts in the field have a deep background in logic programming and expect the same from published papers. Btw, while there are few of us, the main venues for ILP work still include IJCAI and AAAI and special issues of the MLJ, and you can even find work in the machine learning conferences (although usually it's the Statistical Relational Learning branch that publishes there).

>> I knew ILP to be "constructing a classifier from a conjunction of literals," which is very much not synthesis.

Yes, I know this interpretation of ILP. The idea is that examples in ILP are typically labelled positive/negative so you can see a correct program that accepts positives and rejects negatives as a discriminator. Another way to see it is that examples are logical atoms assigned true/false values by a logical interpretation (literally an assignment of truth values to atoms) so again correctly recognising true from false can be seen as a classification.

This notion of "classification" has come from inside the field, I think as part of an effort to make ILP more approachable to the mainstream of machine learning. I know the people it came from and the goal was of course noble, but I think the result is that it sells short what is a much more interesting discipline. I mean, sure, the task of recognising _and generating_ sorted/unsorted lists can be shoehorned into a classification problem but that's missing the point that the task is to learn a program that has some kind of expected behaviour. Which is the essence of program synthesis of course.
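The "accept positives, reject negatives" view described above can be sketched in a few lines. This is an illustrative toy, not a real ILP system: the hypothesis space here is a hand-written list of candidate predicates (all names are made up for the example), whereas a real ILP learner searches clauses built from examples and background knowledge.

```python
# Learning-as-classification toy: find a predicate that accepts every
# positive example and rejects every negative one.

positives = [[], [1], [1, 2, 3], [0, 5, 9]]   # sorted lists
negatives = [[2, 1], [3, 1, 2], [0, 2, 1]]    # unsorted lists

# A tiny, hand-written hypothesis space (hypothetical candidates).
hypotheses = {
    "is_empty": lambda xs: xs == [],
    "starts_small": lambda xs: xs == [] or xs[0] <= 1,
    "is_sorted": lambda xs: all(a <= b for a, b in zip(xs, xs[1:])),
}

def consistent(h):
    # Accept all positives, reject all negatives.
    return all(h(e) for e in positives) and not any(h(e) for e in negatives)

learned = [name for name, h in hypotheses.items() if consistent(h)]
print(learned)  # only "is_sorted" is consistent with these examples
```

The point of the comment above survives the toy: the learned predicate is a program with expected behaviour, not merely a classifier.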

For me the magical thing about ILP considered as program synthesis from incomplete specifications is that the specifications are logic programs, so learning a program that explains the examples with respect to the background knowledge is essentially completing a specification. And while the specification is a program, a program-learning approach can learn its own specifications: what is called "predicate invention" in the literature.


Well, I learned a lot from this comment. Cool!

You should E-mail Mukund and start a conversation! (He should remember me, so feel free to name-drop me.)


You think? I would guess he'd just be annoyed!

But it might be worth connecting with the other side of the pond and all that. Like we just did :)

Edit: btw, I thought I had met Armando so I just checked and we were both at the Dagstuhl conference on Inductive Programming (yet another byword for program synthesis) in 2018. I don't think we talked much but we were introduced and he gave a short talk. Small world!


It's possible, but if you have a lot of knowledge, I think he'd welcome the connection. It's also incredibly flattering if you can ask intelligent questions or give intelligent appreciation about their papers. How else do you know your stuff is being read?

Have a look at Exercise 3 of https://web.eecs.umich.edu/~weimerw/2008-615/homework/hw5.pd... . Internalizing this advice has been life-changing for me.

> Edit: btw, I thought I had met Armando so I just checked and we were both at the Dagstuhl conference on Inductive Programming (yet another byword for program synthesis) in 2018. I don't think we talked much but we were introduced and he gave a short talk. Small world!

Cool!


I'll need to read the paper carefully first though. Again, thanks for pointing it out.

>> It's also incredibly flattering if you can ask intelligent questions or give intelligent appreciation about their papers.

It's true. I usually only get those from reviewers, often the ones who reject my papers...

Re: Exercise 3, I wish we had such insightful teaching when I was studying for my degree!


Would knowledge in automated theorem proving (e.g. knowing how coq works) be somehow transferable to research in this field?

What are some existing open-source program synthesis libraries/frameworks where we can tinker around and maybe make something with it?

Thanks! I'm super curious about this field and it's somewhat related to what I'm working on so I'm pretty interested to learn more!


> Would knowledge in automated theorem proving (e.g. knowing how coq works) be somehow transferable to research in this field?

Possibly.

First, Coq is not an automated theorem proving tool; it's an interactive theorem prover.

If by automated theorem proving, you're thinking of a tool like Z3 or CVC4 which find models of statements in a decidable theory (which are pretty much always mathematically uninteresting), then...absolutely! SAT/SMT are at the core of constraint-based synthesis. Heck, CVC4 has a SyGuS mode that won the synthesis competition one year.

If you're referring to actually searching for interesting theorems, as done by tools like Vampire and Otter, then probably not.
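The constraint-based flavour mentioned above is often organized as a CEGIS (counterexample-guided inductive synthesis) loop. Here is a miniature sketch of that loop; in a real system both queries go to an SMT solver such as Z3 or CVC4, and here a brute-force search over a tiny space stands in for the solver purely for illustration (the spec and the candidate template `x + c` are made up for the example).

```python
# Miniature CEGIS loop: alternate between finding a candidate consistent
# with the counterexamples seen so far, and finding an input on which the
# candidate disagrees with the spec.

spec = lambda x: 2 * x - (x - 3)   # semantically x + 3
INPUTS = range(-50, 50)            # finite "verification" domain

def synthesize():
    counterexamples = [0]
    while True:
        # "Solver" query 1: a constant c making x + c match the spec on
        # all counterexamples collected so far.
        c = next((c for c in range(-10, 11)
                  if all(x + c == spec(x) for x in counterexamples)), None)
        if c is None:
            return None  # no candidate in the template space
        # "Solver" query 2: an input where the candidate and spec disagree.
        cex = next((x for x in INPUTS if x + c != spec(x)), None)
        if cex is None:
            return c  # verified on the whole domain
        counterexamples.append(cex)

print(synthesize())  # finds c = 3, i.e. the program x + 3
```

Swapping the brute-force searches for SMT queries over symbolic constraints is essentially what tools in the Sketch/SyGuS family do.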

Last month, I was discussing the matching algorithm used in the paper "egg: Fast and Flexible Equality Saturation" with a collaborator. It uses code trees, which I read about a few months earlier in the Handbook of Automated Reasoning. It stands out because I had pretty much never encountered a mention of any of that stuff in the preceding several years.

I touch on this slightly in this blog post, about the benefits of learning Coq towards synthesis: http://www.pathsensitive.com/2021/03/why-programmers-shouldn... . (And the benefits I mention are in understanding the type theory, not the innards.)

> What are some existing open-source program synthesis libraries/frameworks where we can tinker around and maybe make something with it?

https://github.com/microsoft/prose

Sketch: download and manual from https://people.csail.mit.edu/asolar/

https://egraphs-good.github.io/

I searched for a few others, but they're either tools rather than frameworks, not directly synthesis, or I couldn't find a download.

Though I may as well plug my own Cubix framework: http://cubix-framework.com/

> Thanks! I'm super curious about this field and it's somewhat related to what I'm working on so I'm pretty interested to learn more!

Awesome! I hope to see great things coming from you.


Thanks for the detailed break-down and the links!!

> which are pretty much always mathematically uninteresting), then...absolutely! SAT/SMT are at the core of constraint-based synthesis.

Wow this is super cool to know!

> http://www.pathsensitive.com/2021/03/why-programmers-shouldn...

This would definitely help to kickstart my journey in program synthesis:)

> "egg: Fast and Flexible Equality Saturation"... It uses code trees, which I read about a few months earlier in the Handbook of Automated Reasoning

> http://cubix-framework.com/

> https://egraphs-good.github.io/

> https://people.csail.mit.edu/asolar/

> https://github.com/microsoft/prose

This stuff is super cool! Really appreciate it!


How do you feel about the topic, is it lively ? dormant but potentially great ?

Any other source to read about ?

thanks :)


It's a very lively field. Lots of groups working on it. Last year, the big 3 PL conferences (PLDI, POPL, OOPSLA) all had tracks dedicated to it. A few years ago, Armando (my advisor and the professor of this course) taught a weekend seminar on the nitty-gritty internals of how his tool, Sketch, works -- and had commercial users flying in.

Most recent summaries off the top of my head:

https://alexpolozov.com/blog/program-synthesis-2018/

https://www.microsoft.com/en-us/research/wp-content/uploads/...


Fun fact: My elder brother was Armando's TA at TAMU. Armando was probably the brightest and most fun guy to work with. There was extra credit for building a fully functional software CPU and its vector pipeline supporting VLIW (mid 90s, duh). Your advisor guy did it in a breeze. The instructor privately confided to my brother that he was a legend in the making.

Ask him about Sid in Lawrence Rawschberger's class for good old anecdotes! :)


I also tried to read about the mathematics of program construction (similar spirit, I assume) with Gibbons et al.

https://duckduckgo.com/?q=mathematics+program+construction&t...



