Is there a set of program synthesis problems (or a source of problems) accepted by academics as a reasonable benchmark for measuring progress these days? Or does each new approach find its own example problems to demonstrate their particular differences on?
SyGuS competition (SyGuS-Comp)[1] is a widely-used collection of such problems, but of course there will always be problem domains that are not well-represented by standard benchmarks.
In what ways is program synthesis different from code generation as taught in compiler and/or PLT courses? Is the goal here to get as high-level as possible, or to be able to generate as much code as possible?
One word: search. Most of the traditional compilation pipeline can be expressed using deterministic rewrite rules. Synthesizers of all families generate and consider a large number of possibilities.
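To make the distinction concrete, here is a toy bottom-up enumerative synthesizer in Python. This is not any particular tool, just a sketch of what "search" means here: enumerate expressions from a small grammar and keep the first one consistent with the input/output examples (a compiler, by contrast, would apply deterministic rewrites to a single program).

```python
# Toy sketch: synthesis as search over a space of candidate programs.
import itertools

# Tiny grammar: expressions over a variable x built from +, *, and constants.
LEAVES = ["x", "1", "2"]
OPS = ["+", "*"]

def grow(exprs):
    """One round of bottom-up enumeration: combine existing expressions."""
    new = []
    for a, b in itertools.product(exprs, repeat=2):
        for op in OPS:
            new.append(f"({a} {op} {b})")
    return exprs + new

def synthesize(examples, rounds=2):
    """Return the first enumerated expression consistent with all
    input/output examples, or None if none is found."""
    exprs = list(LEAVES)
    for _ in range(rounds):
        exprs = grow(exprs)
    for e in exprs:
        if all(eval(e, {"x": x}) == y for x, y in examples):
            return e
    return None

# Specification by examples: f(x) = 2*x + 1
print(synthesize([(0, 1), (1, 3), (5, 11)]))  # → (x + (x + 1))
```

Real synthesizers differ in how they prune and prioritize this space (types, observational equivalence, constraint solving), but "generate and consider many candidates" is the common core.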
Flipping it around, there's an old joke that "a synthesizer is just a compiler that sometimes doesn't work."
But the first thing to learn is that, the deeper you go, the blurrier the lines get. There certainly exists a compiler out there that does more search than a good chunk of things done under the name of synthesis. (Especially because "synthesis" has become a bit of a buzzword in PL circles, and often gets slapped where it doesn't really belong.) I've discussed this briefly before in my blog: http://www.pathsensitive.com/2021/03/why-programmers-shouldn...
Not so much a question as a complaint: I can't find anything about Inductive Logic Programming in the course material. ILP is the field that studies the inductive synthesis of logic programs; it should fit right in with the rest of the material, particularly in the parts about constraint-based synthesis and verification.
If ILP is discussed and I missed it, apologies- I've only read through very quickly to get a general idea of the curriculum and bookmarked the top level for reference.
I'm barely familiar with Inductive Logic Programming and never hear anyone talk about it. I came in prepared to dismiss the field's relevance to synthesis, but decided to do a bunch of reading first so that I could do so from an informed perspective. I knew ILP to be "constructing a classifier from a conjunction of literals," which is very much not synthesis. It turns out there's a lot more done under the name of ILP.
I think the real answer is largely that logic programming as a whole fell out of favor in the US sometime in the 90's, whereas the program synthesis renaissance started in the 2000's -- and is still US-dominated. Not to say there still aren't tons of people doing logic programming, but, personal anecdote: I encountered a ton of logic programming papers during the lit review for one of my other projects, which has some overlap in the symbolic techniques used, and... they're all from the period 1988-1995.
Mukund's work is the most relevant from a synthesis perspective that I know of, as one of the only people doing synthesis of a logic language (unless you count SQL). (Note: This is distinct from people like William Byrd doing synthesis using a logic programming language.) Have a look at the bottom of page 23 of "Provenance-Guided Synthesis of Datalog Programs" https://dl.acm.org/doi/pdf/10.1145/3371130 . It does a good job explaining the distinction between the goals of ILP and things done under the umbrella of synthesis. But note that this work is still considered niche in the synthesis world.
There's also the question about what techniques belong to what field; I expect that, were I to dig into the synthesis-relevant parts of ILP further, I'd find a lot of techniques that are familiar and that I learned with no connection to ILP. Paul Graham has said that philosophy is not useful because everything useful it discovers was spun out into a different field. ( http://www.paulgraham.com/philosophy.html ) The philosophy of computer science in that perspective is AI. 40 years ago, garbage collection, term rewriting, logic programming, and programming assistants were all AI. Even just 20 years ago, Tessa Lau's work on inductive synthesis was AI. Today, all of those are things likely to be rejected from a top AI conference, but welcome at a PL conference. Thus, you might find in the synthesis world techniques that would be familiar to an old-school ILP researcher, but without shared context.
(Two recent exceptions are neural synthesis and probabilistic programming, where you do see a lot of crossover between PL and ML. This is very recent. [Insert harsh judgments here about the attempts from 2016 and earlier to learn programs.])
Thank you for your thoughtful and knowledgeable comment, and many apologies for my belated reply. I was crossing an ocean on an unseaworthy vessel ("with slight abuse of terminology").
My intuition is the same as yours: US institutions don't really take notice of what's going on in Europe. It's a little strange in this day and age, where we don't have to wait for correspondence to be delivered by wind clipper, say, but oh well, it's the way it is.
ILP is a small field with hubs in Europe and in Japan, and those also tend to be
the hubs for logic programming (for obvious reasons). The overlap with a
slightly arcane programming paradigm that, like you say, is more well-known in
Programming Language circles is not really helping ILP's popularity, I guess.
>> I came in prepared to dismiss the field's relevance to synthesis, but decided
to do a bunch of reading first so that I could do so from an informed
perspective
Well, respect!
Regarding Mukund Raghothaman's work that you linked. First of all, thanks for
making me aware of yet another interesting approach to learning datalog programs
that I had missed because I myself don't keep as watchful an eye on the wider
program synthesis literature as I wish others kept on ILP. Guilty! Second, I so
disagree with the characterisation of the work in Muggleton et al. 2015 :) I
guess that's typical, Stephen (Muggleton) is my thesis advisor and my work is a
few steps down the path from the 2015 work so obviously I have a different
perspective. Still, by January 2020 there were more recent results to cite and
more advanced systems to compare against. Exciting things are happening in ILP
right now, long-term thorny problems (predicate invention, learning recursion,
learning in languages other than Prolog, efficiency) are falling one by one and
it's a bit of a shame that it's all being missed by everyone outside our tiny
little field.
Regarding what techniques belong to what field, I think you're probably right,
ILP is traditionally based on search, specifically searching the space of logic
programs that are sets of clauses constructible from examples and background
knowledge (both also logic programs) and some kind of typically extra-logical
language bias. The big problem has always been the size of the resulting
hypothesis spaces. Of course I would think so because my work has been mainly
about this big problem. Anyway, as far as I know, wider program synthesis
approaches also work in essentially the same way, by searching a program space,
and so have the same problems with searching large spaces, so there will
definitely be much overlap. The difference is in the perspective: like you say,
ILP sometimes has a strongly theoretical flavour, because many of the experts in
the field have a deep background in logic programming and expect the same from
published papers. Btw, while there are few of us, the main venues for ILP work
still include IJCAI and AAAI and special issues of the MLJ, and you can even
find work in the machine learning conferences (although usually it's the
Statistical Relational Learning branch that publishes there).
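For readers who haven't seen the ILP setting before, here is a deliberately tiny Python sketch of what "searching the space of clauses constructible from examples and background knowledge" means. No real ILP system works by this literal brute force, and the restriction to two body literals over three variables is my own toy language bias, but the shape of the problem is the same: find a clause that, together with the background knowledge, covers the positive examples and none of the negatives.

```python
# Toy ILP-as-search: learn grandparent/2 from parent/2 background facts.
import itertools

# Background knowledge: parent/2 facts.
parent = {("ann", "bob"), ("bob", "cal"), ("ann", "dee"), ("dee", "eve")}

# Positive and negative examples for the target predicate grandparent/2.
positives = {("ann", "cal"), ("ann", "eve")}
negatives = {("bob", "ann"), ("cal", "eve")}

people = {p for fact in parent for p in fact}
VARS = ["X", "Y", "Z"]

def covers(clause, x, y):
    """Does grandparent(X,Y) :- parent(lit1), parent(lit2) entail
    grandparent(x, y), for some binding of the existential variable Z?"""
    lit1, lit2 = clause
    for z in people:
        env = {"X": x, "Y": y, "Z": z}
        if (env[lit1[0]], env[lit1[1]]) in parent and \
           (env[lit2[0]], env[lit2[1]]) in parent:
            return True
    return False

def learn():
    """Enumerate the clause space; return the first consistent hypothesis."""
    for clause in itertools.product(itertools.product(VARS, repeat=2),
                                    repeat=2):
        if all(covers(clause, x, y) for x, y in positives) and \
           not any(covers(clause, x, y) for x, y in negatives):
            return clause
    return None

(a, b), (c, d) = learn()
print(f"grandparent(X,Y) :- parent({a},{b}), parent({c},{d}).")
# → grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
```

Even in this toy, the hypothesis space is 81 clauses for a two-literal body over three variables; with realistic predicate vocabularies, arities, and clause lengths it explodes, which is exactly the "size of the resulting hypothesis spaces" problem described above.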
>> I knew ILP to be "constructing a classifier from a conjunction of literals,"
which is very much not synthesis.
Yes, I know this interpretation of ILP. The idea is that examples in ILP are
typically labelled positive/negative so you can see a correct program that
accepts positives and rejects negatives as a discriminator. Another way to see
it is that examples are logical atoms assigned true/false values by a logical
interpretation (literally, an assignment of truth values to atoms), so again
correctly distinguishing true from false can be seen as classification.
This notion of "classification" has come from inside the field, I think as part
of an effort to make ILP more approachable to the mainstream of machine
learning. I know the people it came from and the goal was of course noble, but I
think the result is that it sells short what is a much more interesting
discipline. I mean, sure, the task of recognising _and generating_
sorted/unsorted lists can be shoehorned into a classification problem but that's
missing the point that the task is to learn a program that has some kind of
expected behaviour. Which is the essence of program synthesis of course.
For me the magical thing about ILP considered as program synthesis from
incomplete specifications is that the specifications are logic programs, so
learning a program that explains the examples with respect to the background
knowledge is essentially completing a specification. And while the
specification is a program, a program-learning approach can learn its own
specifications: what is called "predicate invention" in the literature.
But it might be worth connecting with the other side of the pond, and all that. Like we just did :)
Edit: btw, I thought I had met Armando so I just checked and we were both at the Dagstuhl conference on Inductive Programming (yet another byword for program synthesis) in 2018. I don't think we talked much but we were introduced and he gave a short talk. Small world!
It's possible, but if you have a lot of knowledge, I think he'd welcome the connection. It's also incredibly flattering if you can ask intelligent questions or give intelligent appreciation about their papers. How else do you know your stuff is being read?
> Would knowledge in automated theorem proving (e.g. knowing how coq works) be somehow transferable to research in this field?
Possibly.
First, Coq is not an automated theorem proving tool; it's an interactive theorem prover.
If by automated theorem proving, you're thinking of a tool like Z3 or CVC4 which find models of statements in a decidable theory (which are pretty much always mathematically uninteresting), then...absolutely! SAT/SMT are at the core of constraint-based synthesis. Heck, CVC4 has a SyGuS mode that won the synthesis competition one year.
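To make "SAT/SMT at the core" concrete, here is a minimal sketch of the CEGIS loop (counterexample-guided inductive synthesis) that constraint-based tools like Sketch are built around. In a real tool, both the synthesis step and the verification step are queries to a SAT/SMT solver; in this toy, brute-force search over a small finite hole space and a bounded input domain stands in for the solver calls.

```python
# Toy CEGIS: synthesize f(x) = a*x + b with unknown "holes" a, b in 0..9.
HOLES = [(a, b) for a in range(10) for b in range(10)]
DOMAIN = range(-50, 50)  # bounded verification domain (an SMT solver
                         # would check all inputs symbolically instead)

def spec(x, y):
    """The specification: f should compute 2*x + 3."""
    return y == 2 * x + 3

def cegis():
    counterexamples = []
    for a, b in HOLES:
        # "Synthesis" step: the candidate must satisfy the spec on every
        # counterexample collected so far (cheap finite check).
        if not all(spec(x, a * x + b) for x in counterexamples):
            continue
        # "Verification" step: search for an input where the candidate fails.
        cex = next((x for x in DOMAIN if not spec(x, a * x + b)), None)
        if cex is None:
            return a, b               # verified on the whole domain
        counterexamples.append(cex)   # refine the constraint, keep searching
    return None

print(cegis())  # → (2, 3)
```

The counterexample set is what keeps this tractable: each failed verification rules out a whole swath of candidates at once, which is the same trick the solver-backed versions exploit.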
If you're referring to actually searching for interesting theorems, as done by tools like Vampire and Otter, then probably not.
Last month, I was discussing the matching algorithm used in the paper "egg: Fast and Flexible Equality Saturation" with a collaborator. It uses code trees, which I read about a few months earlier in the Handbook of Automated Reasoning. It stands out because I had pretty much never encountered a mention of any of that stuff in the preceding several years.
It's a very lively field. Lots of groups working on it. Last year, the big 3 PL conferences (PLDI, POPL, OOPSLA) all had tracks dedicated to it. A few years ago, Armando (my advisor and the professor of this course) taught a weekend seminar on the nitty-gritty internals of how his tool, Sketch, works -- and had commercial users flying in.
Fun fact:
My elder brother was Armando's TA at TAMU. Armando was probably the brightest and most fun guy to work with. There was extra credit for building a fully functional software CPU and its vector pipeline supporting VLIW (mid-90s, duh). Your advisor did it in a breeze. The instructor privately confided to my brother that he was a legend in the making.
Ask him about Sid in Lawrence Rawschberger's class for good old anecdotes! :)