Submitted by evanthebouncy t3_y378kk in MachineLearning

hiyo /ml,

I've been doing program synthesis professionally for over 10 years, and I've recently finished writing a blog series explaining how to get started in program synthesis. Here's an excerpt from the about page:

>Program synthesis is useful – Who wouldn’t want to make a computer that automatically writes programs? As humans and computers continue to work in collaboration, the distinctions between programming, program synthesis, and naturalistic communication will continue to blur. However, there is a knowledge gap between how state-of-the-art program synthesis algorithms are built and what is generally known about them. This gap is much bigger than it needs to be. This blog aims to shrink that knowledge gap, so that you can start applying program synthesis to your own work. We will cover both the concepts of program synthesis – so you have a framework to think and talk about it – and the bare-minimum tooling required to implement these algorithms – so you can start iterating on solutions. Ultimately, I hope researchers and system-builders can view “programming” as more than typing obscure green characters onto an uncompromising black terminal, and build systems that are as empathetic as they are efficient.

specifically, it covers topics ranging from how to formulate a synthesis problem to how to fine-tune an LLM on Hugging Face to write programs that match a specification.
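as a taste of what "formulating a synthesis problem" looks like, here's a toy sketch (mine, not from the blog; every name in it is made up): the specification is a handful of input-output examples, the program space is a tiny DSL, and synthesis is search over that space.

```python
# toy formulation: the spec is input-output examples, the program space is a
# tiny DSL of (op, constant) pairs, and synthesis is search over that space.
# everything here is made up for illustration.
from itertools import product

OPS = {
    "add": lambda x, c: x + c,
    "mul": lambda x, c: x * c,
}
CONSTANTS = range(-5, 6)

def synthesize(examples):
    """Return the first (op_name, constant) consistent with all (x, y) pairs."""
    for op_name, c in product(OPS, CONSTANTS):
        if all(OPS[op_name](x, c) == y for x, y in examples):
            return op_name, c
    return None  # spec unsatisfiable within this DSL

# spec: "double the input", given only by examples
print(synthesize([(1, 2), (3, 6), (10, 20)]))  # ('mul', 2)
```

real systems use the same shape with much richer DSLs and much smarter search; the blog builds up from there.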

blog : https://evanthebouncy.github.io/program-synthesis-minimal/

twitter thread: https://twitter.com/evanthebouncy/status/1580634593685753856

I can take some questions in this thread, so please feel free to ask me anything, from technical questions to hot takes.

--evan

23

Comments


neuralbeans t1_is71aaq wrote

Is what you do a translation from specification to code? What is your profession exactly? How were you doing this task before deep learning was a thing?

2

evanthebouncy OP t1_is76k7z wrote

more than translation per se. in real life, when you're given a specification, it's rare that you can directly translate it into a solution with a 1:1 mapping. typically, you have to _search_ for a solution.

before deep learning, the search was performed by enumeration and back-tracking solvers. various SAT/SMT engines (MiniSat, Z3) were used to effectively (though not by today's standards) find a solution within the search space.
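to make the solver-backed search concrete, here's a toy sketch (illustrative only; real systems compile the spec into CNF/SMT and hand it to MiniSat or Z3, while this brute-force loop just shows the shape of the idea):

```python
# toy illustration of solver-style search: encode the spec as boolean
# constraints, then exhaustively search assignments. real systems compile
# this to CNF/SMT and call MiniSat or Z3; this sketch just shows the shape.
from itertools import product

def solve(constraints, n_vars):
    """Return an assignment (tuple of bools) satisfying every constraint, or None."""
    for assignment in product([False, True], repeat=n_vars):
        if all(c(assignment) for c in constraints):
            return assignment
    return None

# spec: x0 XOR x1 must hold, and x1 must be true
constraints = [
    lambda a: a[0] != a[1],  # x0 XOR x1
    lambda a: a[1],          # x1
]
print(solve(constraints, 2))  # (False, True)
```

the exponential blow-up of this loop is exactly why the solver engineering (and later, learned guidance) matters.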

5

r4and0muser9482 t1_is7bnn8 wrote

Just as a caveat, translation is not a 1-1 mapping either. But I get your point.

1

YouAgainShmidhoobuh t1_is9mo5a wrote

Prolog as program synthesis is one I've not heard of yet, but it does make sense

1

yldedly t1_isaznkg wrote

Have you tried synthesizing probabilistic programs and inference programs? Any general thoughts on the topic?

2

evanthebouncy OP t1_isb2n4f wrote

Not much personally no.

But it's widely applicable, because in many instances you'll have a stochastic system that generates data. You see the data, and you want to infer the system.

Example 1: modeling behavior. You could have a game, and the way a person plays the game is random, doing something different at different times. By observing a person playing the game, you collect some observation data that's generated from random behavior. To model the strategy the person is using, you'd have to use a probabilistic program. It'll have some logical components and some random components.

Example 2: modeling a natural phenomenon. You have a toilet (I'm sitting on one now lmaoo) that you're building, and you want to know, given the weight and consistency of the poo inside (X), how much water it needs (Y) to flush cleanly. The relationship between X and Y can be described by an equation plus some noise, making it really intuitive to model as a probabilistic program.
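here's a tiny sketch of that "equation plus noise" idea as a probabilistic program (my illustration, with made-up numbers): a generative model y = a*x + noise, and inference over the unknown slope by simple likelihood weighting.

```python
# sketch: a generative model "y = a*x + noise" and inference over the unknown
# slope `a` by likelihood weighting. all numbers are made up for illustration.
import math
import random

def model(a, x):
    """Generative direction: the deterministic equation plus gaussian noise."""
    return a * x + random.gauss(0, 0.5)

def infer_a(data, n_samples=20000):
    """Posterior mean of the slope given (x, y) observations (importance sampling)."""
    random.seed(0)  # deterministic for the sake of the example
    total_w = total_wa = 0.0
    for _ in range(n_samples):
        a = random.uniform(0, 5)  # prior over the slope
        # likelihood of the data under this slope (noise sd = 0.5)
        w = math.exp(-sum((y - a * x) ** 2 for x, y in data) / (2 * 0.5 ** 2))
        total_w += w
        total_wa += w * a
    return total_wa / total_w

data = [(1, 2.1), (2, 3.9), (3, 6.2)]  # observations of roughly y = 2x
print(infer_a(data))  # close to 2.0
```

the logical component is the equation, the random component is the noise; languages like the ones on probmods.org let you write this directly and get inference for free.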

I'd learn about it here

https://probmods.org/

2

yldedly t1_isb5nsi wrote

What evocative examples :P
I know probmods.org well, it's excellent. I wrote a blog post about program synthesis. I stumbled on the area during my PhD, where I did structure learning for probabilistic programs, and realized (a bit late) that I was actually trying to do program synthesis. So I'm very interested in it and wish I had the chance to work with it more professionally. Looking forward to reading your blog!

3

evanthebouncy OP t1_isbui1j wrote

I read all of your blog.

I loved this reference

"""The physicist David Deutsch proposes a single criterion to judge the quality of explanations. He says good explanations are those that are hard to vary, while still accounting for observations. """

You write really well! I followed you on Twitter. I think you have thought about the relationship between explaining data and probabilistic programming deeper and longer than I have, so I can't say many surprisingly cool things to you.

I think my work "communicating natural programs to humans and machines" will entertain you for hours. Give it a go.

It's my belief that we should program computers using natural utterances such as language, demonstrations, doodles, etc. These "programs" are fundamentally probabilistic and admit multiple interpretations/executions.

2

yldedly t1_isecsiy wrote

>I think my work "communicating natural programs to humans and machines" will entertain you for hours. Give it a go.

I will, looks super interesting. I'm so jealous of you guys at MIT working on all this fascinating stuff :D

>It's my belief that we should program computers using natural utterances such as language, demonstrations, doodles, etc. These "programs" are fundamentally probabilistic and admit multiple interpretations/executions.

That's an ambitious vision. I can totally see how that's the way to go if we want "human compatible" AI, in Stuart Russell's sense where AI is learning what the human wants to achieve, by observing their behavior (including language, demonstrations, etc).

1

evanthebouncy OP t1_iseg25n wrote

Yaya thanks! My belief is that for the most part, people know exactly what they want from computers and can articulate it well enough that a developer (with knowledge of computers) can implement it successfully. In this process the first person need not code at all, in the traditional sense.

All we need is the technology to replace the dev with AI haha

1

evanthebouncy OP t1_isq1rru wrote

Yo.

Foundation Posteriors for Approximate Probabilistic Inference

Read this on arXiv

1

JNmbrs t1_isfuq2n wrote

Hi Evan - I think you and your collaborators put out some of the most interesting research in the field (at least the parts of it that I understand). Unfortunately, I'm only a lightly technical hobbyist (not an engineer or even in STEM), so before wasting your time with my noob questions, I just wanted to confirm this AMA is indeed open even to idiots.

1

evanthebouncy OP t1_isfytt9 wrote

Yaya hit me with the question and I'll see what I can do!

1

JNmbrs t1_isgqdyr wrote

Thanks! A few questions below:

  1. What do you see as the bottleneck to be overcome to make library-learning program synthesis systems (e.g., Dreamcoder) scalable? The recent work I've seen on these systems seems to focus on improvements in (a) search algorithms (e.g., https://arxiv.org/pdf/2110.12485.pdf); (b) program abstraction/library compression (e.g., https://mlb2251.github.io/stitch_jul11.pdf and http://andrewcropper.com/pubs/aaai20-forgetgol.pdf); (c) optimizing neural guidance (e.g., https://openreview.net/pdf?id=rCzfIruU5x5 and https://arxiv.org/pdf/2206.05922.pdf); and (d) specification (e.g., https://arxiv.org/pdf/2007.05060.pdf and https://arxiv.org/pdf/2204.02495.pdf). While work obviously proceeds in these (and other related) areas, I'd love to hear your thoughts on which one(s) are the bottlenecks where breakthroughs are most needed.

  2. In the immediate term (3-5 years), in what fields (e.g., theory generators to aid scientists, or as modules in robotics) do you think library-learning program synthesis systems will have the greatest impact?

  3. (Sorry if this is especially stupid, but) Do you think humans have explicit representations of rules (e.g., programs) in our brain "hardware" that we could in theory point to?

  4. I was intrigued but also left a little confused by the LARC paper. In the conclusion, do you advocate that we need advances to help map from natural programs to machine programs, or, instead, that machine programs should have the properties of natural language (like being ambiguous)? Or did I miss the point entirely lol?

Huge thanks again for your time.

1

evanthebouncy OP t1_isgwjhn wrote

Q: What do you see as the bottleneck to be overcome to make library-learning program synthesis systems (e.g., Dreamcoder) scalable?

A: one can take a simulation-based approach to understanding/viewing dreamcoder -- an agent starts out knowing only primitives and is asked to solve a set of tasks, from easy to hard. the agent solves a few, compresses the library, then tries to solve the ones slightly harder than those, and repeats. the inputs to the simulation are the set of tasks and the learning algorithm; you just hit "run", off it goes in a self-contained way, and we observe what it comes up with in a few days -- kind of like opening a jar of a closed evolutionary system to see if dinosaurs are in there, or something like that lol.

so obviously we can improve the simulation by picking different components of dreamcoder and making them run faster or more efficiently. my idea (kind of silly tbh) is to allow additional input while the simulation runs. what if you let users tweak the simulation as it is running? what if you let the user guide some of the search, or pick a different curriculum of tasks? how do we make it easy for end-users to inject knowledge into the system while it runs?

ultimately we're designers as much as simulation creators. we can let the system run half on its own through self-play, and half with some hand-picked intervention, because humans are good at solving problems.
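here's a very schematic toy of that loop (my illustration, not dreamcoder's actual algorithm): enumerate programs over the current library, solve what you can, then compress the most common pair of ops into a new abstraction so harder tasks come into reach.

```python
# toy "library learning" loop: solve easy tasks with the current library,
# compress solutions into new abstractions, retry the harder tasks.
# schematic illustration only -- not dreamcoder's actual algorithm.
from itertools import product
from collections import Counter

def run(seq, x):
    """Execute a sequence of ops; an op is a string of primitives."""
    for op in seq:
        for prim in op:
            x = x + 1 if prim == "i" else x * 2  # 'i' = increment, 'd' = double
    return x

def solve(target, library, max_len):
    """Enumerate op sequences up to max_len; return the first hitting target from 0."""
    for n in range(1, max_len + 1):
        for seq in product(library, repeat=n):
            if run(seq, 0) == target:
                return seq
    return None

library = ["i", "d"]    # primitives: increment, double
tasks = [3, 6, 12, 24]  # easy to hard; the hard ones need abstractions
for _ in range(3):      # a few wake/sleep rounds
    solutions = [s for t in tasks if (s := solve(t, library, 4)) is not None]
    # "compression": the most frequent adjacent pair of ops becomes a new abstraction
    pairs = Counter(s[k] + s[k + 1] for s in solutions for k in range(len(s) - 1))
    if pairs:
        library.append(pairs.most_common(1)[0][0])
print(library)  # the library grows a new abstraction each round
```

with only the primitives, 12 and 24 are out of reach within the length budget; after a couple of compression rounds they become short programs, which is the whole sell of the approach.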


Q: In the immediate term (3-5 years), in what fields (e.g., theory generators to aid scientists, or as modules in robotics) do you think library-learning program synthesis systems will have the greatest impact?

A: well the main selling point is the library, right? so I'd say it'll do well in a field where there _should_ be some library, yet one that's somewhat unintuitive for humans to design. I'm unsure haha, maybe robotics is a good domain, or planning problems, if we can view library learning as a kind of hierarchical planning setup that can come up with its own abstractions.

Q: (Sorry if this is especially stupid, but) Do you think humans have explicit representations of rules (e.g., programs) in our brain "hardware" that we could in theory point to?

A: I... don't know, and I don't think about these problems too much tbh. I'm more practical; I want to build systems that end-users use, so by profession I don't ponder those questions. philosophically I'm more into music and reading old chinese stories hahah, so I don't ponder those questions philosophically either. I will tell you a funny story though; hopefully it makes up for my lack of an answer. There was a Lex Fridman lecture at MIT at one point, and he invited a really awesome neurobiologist. a student asked her, "how do we know worms have no consciousness? what if they do?" and she simply said, "it's unlikely, because the sensory neuron (eye) of the worm wires directly into the motor (feet) of the worm, with nothing in between. it sees bright light, it retracts backwards reflexively. so what hardware is there for the worm to even process and make a decision?" and I thought that was a brutally hilarious answer.

Although, irrespective of what our brain's program is like, we _did_ invent rules and logic, right? we _did_ invent tools that are highly programmatic and reliable in their executions. So maybe the question should be "can we make AI systems that can _invent_ logic itself?" because clearly humans have done it.

Q: I was intrigued but also left a little confused by the LARC paper. In the conclusion, do you advocate that we need advances to help map from natural programs to machine programs, or, instead, that machine programs should have the properties of natural language (like being ambiguous)? Or did I miss the point entirely lol?

A: the latter. machine programs need to have the properties of language, namely being expressive and universal (i.e. able to express a lot of ideas and be understood by a range of interpreters), yet still precise (i.e. able to be used to carry out specific tasks). how to do it? honestly iono, but I'm working on it, so subscribe for the next episode (that's a sn00pdawg quote isn't it ahahaha)

2