
This is one of the most exciting things I've seen come out of AI research for my own interests. The structure they use to represent the mathematical equations is called an abstract syntax tree (maybe they mentioned this, but I was excited-skimming), and it's also how computer code is represented. It also happens that there is _a lot_ more computer code lying around in computer-readable format than high school and college math problems. With this, you get: metamorphic code, better optimizers, more abstract programming languages (maybe), and probably lots of other steps forward.
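
To make that concrete, here's what an AST looks like for a math expression - a toy sketch in Python (the Node class is just for illustration):

    # The expression 3*x + 2 as a tiny abstract syntax tree.
    class Node:
        def __init__(self, op, *children):
            self.op = op
            self.children = children

        def __repr__(self):
            if not self.children:
                return str(self.op)
            return f"({self.op} {' '.join(map(repr, self.children))})"

    tree = Node("+", Node("*", Node(3), Node("x")), Node(2))
    print(tree)  # (+ (* 3 x) 2)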

If anyone is working on AI+programming languages, please send me a message. I'd love to work with others on this problem.



We have been working on this recently on the Google Brain team.

We are working both on synthesizing programs from scratch (see https://arxiv.org/abs/2002.09030 for example) and on understanding computer programs using machine learning (see e.g. https://arxiv.org/abs/1911.01205).

I'm always happy to correspond with people about these topics.


Have you considered attempting to develop a "differentiable programming language" so that it's possible to do gradient descent directly on the space of programs? There was some work on building a "differentiable lambda calculus", https://core.ac.uk/download/pdf/82396223.pdf, but I haven't heard of anyone trying to use something like that in modern program synthesis.
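
To make the idea concrete: one toy relaxation (my own illustration, not the approach in that paper) is to make the discrete choice of operation continuous - a softmax mixture over candidate ops - so gradient descent can move through a (tiny) program space:

    import torch

    # Toy relaxation: a "program" is a softmax mixture over candidate ops,
    # so gradient descent can move through program space continuously.
    ops = [torch.add, torch.mul, torch.sub]
    logits = torch.zeros(len(ops), requires_grad=True)

    def soft_program(x, y):
        w = torch.softmax(logits, dim=0)
        return sum(wi * op(x, y) for wi, op in zip(w, ops))

    opt = torch.optim.SGD([logits], lr=0.5)
    x, y, target = torch.tensor(3.0), torch.tensor(4.0), torch.tensor(12.0)
    for _ in range(200):
        loss = (soft_program(x, y) - target) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(torch.softmax(logits, dim=0))  # mass should concentrate on mul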


Late on the thread but you might be interested in this: https://www.tensorflow.org/swift


That's something different: it allows doing gradient descent on the space of parameters of programs, not on the space of programs themselves.


Love to chat. Sent you an email (former HFT and kernel guy).

One topic which I feel is neglected is a good GCN (or any GNN) that operates on existing code trees. Most approaches seem to prefer sequence or at most tree inputs.

Is this simply a matter of not yet having found a good network architecture, or is it a performance issue?
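
To be concrete about what I mean by operating on code trees, the basic GCN propagation step over an AST's adjacency is easy to sketch (toy numpy version, random weights, nothing learned):

    import numpy as np

    # One GCN-style layer over a 5-node AST for (+ (* x 2) y).
    # Nodes: 0:+, 1:*, 2:x, 3:2, 4:y; edges are parent/child links.
    A = np.zeros((5, 5))
    for i, j in [(0, 1), (0, 4), (1, 2), (1, 3)]:
        A[i, j] = A[j, i] = 1.0

    A_hat = A + np.eye(5)                      # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # random-walk normalization

    H = np.random.randn(5, 8)                  # 8-dim embedding per AST node
    W = np.random.randn(8, 8)                  # layer weights (untrained here)
    H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W)  # ReLU(D^-1 Â H W)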


I don't see any email listed on your profile. I'd like to have a chat to work on/learn about this stuff.


I'm planning on applying for PhD programs this fall to work in this area. There are only a few places in the world right now that I know of working on these types of problems. They are:

* Martin Vechev, ETH Zurich

* Dawn Song, University of California Berkeley

* Eran Yahav, Technion

* Miltiadis Allamanis, Microsoft Research Cambridge

If anyone knows other advisors looking for graduate students in this area, please let me know. Due to personal circumstances I most likely cannot apply to ETH Zurich or the Technion (I don't speak Hebrew anyway), which leaves me with only one potential advisor at a program I really want.

There is also the Python-writing model that OpenAI showed recently at the Microsoft Build conference, so maybe interest is growing at other places as well.

I was also recently working on a deep learning decompiler but was unable to get my transformer model to learn well enough to actually decompile x64 assembly. I have the source code for the entire Linux kernel as training data, so it's not an issue with quantity. If anyone is interested in helping out with this project, please let me know in a comment.


> I have the source code for the entire Linux kernel as training data, so it's not an issue with quantity

The Linux kernel is only ~30M LOC. That's a really small dataset. For comparison, the Reddit-based dataset for GPT-2 is 100 times larger. Try using all the C code posted on GitHub.

> decompile x64 assembly

You can't "decompile" assembly. Either you decompile machine code, or you disassemble machine code into assembly; the latter is easier than the former. So if you're trying to decompile executables, then perhaps you should train two models: one to convert machine code to assembly, and the other to convert assembly to C. Keep in mind that assembly code produced by an optimizing compiler might differ significantly from assembly code which closely corresponds to C code.
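
As a concrete starting point, building the aligned training triples is simple if you pin the optimization level - a sketch, assuming gcc is on the path (-O0 keeps the assembly closest to the source, per the caveat above):

    import subprocess

    # Sketch: build an aligned (machine code, assembly, C) triple by
    # compiling the same file twice at a fixed optimization level.
    def make_training_triple(c_file, opt="-O0"):
        asm_file = c_file.replace(".c", ".s")
        obj_file = c_file.replace(".c", ".o")
        subprocess.run(["gcc", opt, "-S", c_file, "-o", asm_file], check=True)
        subprocess.run(["gcc", opt, "-c", c_file, "-o", obj_file], check=True)
        with open(c_file) as f:
            c_src = f.read()
        with open(asm_file) as f:
            asm = f.read()
        with open(obj_file, "rb") as f:
            machine = f.read()  # strictly an ELF object, not raw code
        return machine, asm, c_src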


> perhaps you should train two models: one to convert machine code to assembly, and the other to convert assembly to C.

Is the step of going from machine code to gcc-produced assembly not trivial? Is gcc actually producing assembly code that an assembler needs to do more with than convert to the corresponding opcodes?


There are two kinds of assembly: 1. assembly that corresponds to optimized machine code, and 2. assembly that closely corresponds to the original C code. As I said, these two assembly versions might look very different depending on optimizations performed by the compiler. You can reduce the difficulty of learning the conversion from machine code to assembly at the expense of increasing the difficulty of learning the conversion from assembly to C code (and vice versa).


As part of the short list:

* Swarat Chaudhuri: https://blog.sigplan.org/2020/04/15/synthesizing-neurosymbol... <- probably heaviest on the math side wrt PL people

* Ras Bodik (Berkeley -> UW): esp. w/ Pedro Domingos and all the MSR collaborators (Sumit Gulwani, ...) <- a bit biased b/c I was in the group while at Berkeley; Ras + Dawn are crazy creative

* Percy Liang (Stanford): Coming from the ML side and w/ a long-running interest here


UCL is another good choice for this area. I did some research in this area back in undergrad. There was a workshop on this subfield called NAMPI (Neural Abstract Machines & Program Induction), so I'd recommend going through its accepted papers and seeing where all the faculty come from.

Also, general advice: after finding a few papers you like, go through the papers they cite that are relevant to the subfield, and also the papers that cite them. That's one of the best ways of finding other related research that interests you.


I've listened to some talks by Song, and my impression is that she does not have a strong technical grasp of what's going on, for what it's worth. There is a lack of clarity in her communication style. I'd try to work with Vechev, Liang, or the UW PLSE group if I were you...


If the language barrier is the only thing keeping you from applying to the Technion, or any other university in Israel: graduate courses in Israel are generally given in English if a non-Hebrew speaker is present. It's standard practice; just email the instructor in advance. Besides, Haifa is a very nice city.

You should also check out professors at Ben Gurion University, Tel Aviv University and the Hebrew University who might have similar interests, IIRC. Feel free to hit me up if there's some page in Hebrew that doesn't translate well.


Another person to talk to would be Brooks Paige, who recently joined UCL: https://tbrx.github.io/ This is totally not my area of research (I'm a PL person), but I know Brooks from the Alan Turing Institute and I think he'd be a great advisor for this kind of work.


Premkumar Devanbu at UC Davis works in this area. I've taken his graduate course on the subject.


Maybe don't discount the Technion so easily. All of the professors are fluent in English, as is most of the student body. There are many graduate students who start studying there without knowing Hebrew.


There are a lot of groups working in this area. I'm going to pitch ours first, and then also point you to some others!

My colleagues and I run the SEFCOM lab at Arizona State University (https://sefcom.asu.edu/). Most relevant to your interests, Fish Wang (fellow SEFCOM professor) and I (Yan Shoshitaishvili) founded the angr program analysis framework (https://angr.io/) back in our grad school days and have continued to evolve it together with our students in the course of our research at ASU. We're actually currently undertaking a concerted push into decompilation research, using both PL and ML techniques. This research direction is a passion of ours, and there's plenty of room for you here if you're interested!

Of course, we also do quite a bit of work in other areas of program analysis (including less overtly "mathy" techniques, like fuzzing) as well as other areas of cybersecurity. We are also quite active in the Capture the Flag community, if that is something that interests you!

Other places that do research in program analysis off the top of my head:

- Chris Kruegel (https://sites.cs.ucsb.edu/~chris/) and Giovanni Vigna (https://sites.cs.ucsb.edu/~vigna/) at UCSB (disclaimer: I got my PhD from them!)

- Davide Balzarotti (http://s3.eurecom.fr/~balzarot/) and Yanick Fratantonio (https://reyammer.io/) at EURECOM

- Antonio Bianchi (https://antoniobianchi.me/) (and, soon, Aravind Machiry) at Purdue

- Alexandros Kapravelos (https://kapravelos.com/) at NCSU

- Taesoo Kim (https://taesoo.kim/) at Georgia Tech

- Yeongjin Jang (https://www.unexploitable.systems/) at O(regon)SU

- Zhiqiang Lin (http://web.cse.ohio-state.edu/~lin.3021/) at O(hio)SU

- Brendan Dolan-Gavitt (https://engineering.nyu.edu/faculty/brendan-dolan-gavitt) at NYU

- Wil Robertson (https://wkr.io/) and Engin Kirda (https://www.ccs.neu.edu/home/ek/) at Northeastern

If you have questions about the PhD process or this research area, feel free to reach out: yans@asu.edu or @Zardus on twitter!


They don't actually use trees in the paper though. https://arxiv.org/abs/1912.01412

They express everything in prefix (Polish) notation, precisely so the transformer network can work on flat token streams and not have to care about nesting levels.

It's a bit odd that this article doesn't talk about that.
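
The flattening itself is trivial to replicate - a quick sketch (the tuple encoding of the tree is my own, not the paper's exact tokenizer):

    # Flatten an expression tree to a prefix token sequence; with known
    # operator arities this is unambiguous, so no parentheses are needed.
    def to_prefix(node):
        op, children = node
        return [op] + [tok for c in children for tok in to_prefix(c)]

    # 3*x + 2
    tree = ("+", [("*", [("3", []), ("x", [])]), ("2", [])])
    print(to_prefix(tree))  # ['+', '*', '3', 'x', '2']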


Forgive me for being entirely ignorant about the field, but I've thought about this frequently and would be grateful for a decent answer:

How theoretically feasible would it be to create a meta-AI?

My naive (incredibly over-simplified) notion goes something like this:

- You convert open-source code into a generic/universal AST/IR format, and figure out a means to represent function inputs and outputs (behavior)

- You use all of the OSS code in the world as data for this, to train a neural network to start making predictions about what kind of AST inputs in a program lead to what kinds of outputs (function synthesis)

- You tell the program to look at other deep learning/ML code, and to attempt to optimize its own code while running, inside isolated processes/VMs.

Something like Erlang/Elixir or Lisp should be capable of this meta-programming and self-code modification while still running. And the BEAM would theoretically be able to isolate actors in the event that the epoch doesn't pan out very well and just kill that branch/lineage off.
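
Python can already do a crude version of this parse-transform-recompile loop with its ast module - a toy sketch:

    import ast, textwrap

    src = textwrap.dedent("""
        def f(x):
            return x + 1
    """)
    tree = ast.parse(src)

    # "Mutate" the program: swap + for * wherever it appears.
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            node.op = ast.Mult()

    ns = {}
    exec(compile(tree, "<mutated>", "exec"), ns)
    print(ns["f"](10))  # 10, from code that originally computed x + 1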

Is this insane? I've wondered about this for years.


No it isn't; in fact this line of work already has a tradition:

https://en.wikipedia.org/wiki/Genetic_programming#Meta-genet...

Although it uses genetic programming instead of neural networks. Those approaches are equivalent in power; the difference is just that neural networks are more amenable to parallelization and come with a very good baseline optimization algorithm (gradient descent and friends).

The problem is that coming up with an optimizer/architecture is hard, so the benefit of running a meta-optimizer to solve a problem is difficult to realize versus a handmade neural network. The problem needs to be so large that the network would need to re-engineer itself to solve it -- but think how many resources have already been spent engineering neural networks (by humans no less, who are quite powerful optimizers!). Unless the problem is truly titanic (w.r.t. other current problems) the yields might be small.

What you can do in a similar vein is gather a large set of problems and train a "neural architect" or something like that, which can then be applied many times to new problems. This allows sharing the cost over many new networks. I think it could make sense for governments and large organizations to get together and create this sort of thing. If you know the cost of training a large neural network, imagine the cost of training hundreds of millions of them to train some kind of neural architect (NA).

There are milder versions of this, where the architecture search itself isn't trained:

https://en.wikipedia.org/wiki/Neural_architecture_search

https://en.wikipedia.org/wiki/Automated_machine_learning

Of course even this approach has limitations (even if you theoretically allow it to create essentially arbitrary architectures), because it will still have difficulty "thinking outside the box" the way we do, using huge amounts of mathematical expertise and intuition (see EfficientNet -- the insight in that paper would be unlikely to arise from an NA network). It's not the thing that will solve all of our problems forever, but it would be pretty significant (perhaps toward making large AI-based systems with multiple components, say the self-driving cars and robots of the future).


Wow, thank you for the answer and the links. I had tried to search for something like this but I didn't have the right terminology.

I really hope to see something happen in this area before I die, just for the sake of seeing it happen.

I often wonder about whether neural networks might need to meet at a crossroads with other techniques.

Inductive Logic/Answer Set Programming or Constraint Programming seems like it could be a good match for this field. Because, from my ignorant understanding, you have a more "concrete" representation of a model/problem in the form of symbolic logic or constraints, and an entirely abstract "black box" solver in the form of neural networks. I have no real clue, but it seems like they could be synergistic?

There's a really oddball repo I found that took this approach:

https://github.com/921kiyo/symbolic-rl

"Symbolic Reinforcement Learning using Inductive Logic Programming"


> You convert open-source code into a generic/universal AST/IR format, and figure out a means to represent function inputs and outputs (behavior)

I think this alone is a herculean task.


Definitely, it's _practically_ impossible, but I am just curious whether the idea/theory of it isn't.


> Is this insane?

No, in fact I believe some variant of this approach is what will eventually lead to AGI. (You can probably do a similar type of solution approaching from the direction of neural networks, where you can use backprop.)


This vid from the MS Build conf is pretty amazing (skip to around the 30m mark for the mindblowing stuff): https://twitter.com/i/broadcasts/1OyKAYWPRrWKb


"With this, you get: ... more abstract programming languages (maybe)."

That would be great. If you could guess, how would you go about using symbolic math and neural nets to create a more abstract programming language?


If you'll entertain my clueless guesswork:

If it could be used to greatly accelerate SMT solving/constraint solving, perhaps it would be possible to run a formal model as if it were code, i.e. fast and scalable constraint-based programming. That's about as high-level as you can get, short of a requirements document. I'm not sure if that's the kind of symbolic mathematics that's being studied here, as they seem to be looking at mathematically interesting expressions, rather than boring-but-enormous constraints.

Or, somewhat related, perhaps it could be used to help scale formal verification with languages like SPARK and ZZ.
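
For a small-scale taste of constraints-as-the-program as it exists today, Z3's Python bindings already work like this:

    from z3 import Int, Solver, sat  # pip install z3-solver

    # No imperative steps: state what must hold, ask the solver for values.
    x, y = Int("x"), Int("y")
    s = Solver()
    s.add(x + y == 10, x > y, y > 0)

    if s.check() == sat:
        print(s.model())  # one satisfying assignment, e.g. [x = 9, y = 1]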


In general, I think neural networks really struggle with NP-hard problems. Current state of the art is bottlenecked with respect to even subtle generalization. The strong baselines have been variations on the pointer-network architecture for some time now. Even though they can do reasonably well on restricted problems (e.g. instances of SAT/traveling salesman with a fixed # of nodes), they struggle to generalize to instances of the problem with a variable # of nodes - not to mention varying the problem itself. Some popular approaches also include applications of graph CNNs & reinforcement learning. I don't really have much more than a cursory idea of the current directions, though.

To me, what's more exciting is some kind of joint collaboration between modern ML systems & formal solvers.


You might be interested in the line of work around differentiable relaxations of formal solvers that was kicked off by Zico Kolter et al. [1]. They add a MaxSAT solver as a differentiable layer, which allows the neat trick of solving handwritten Sudoku puzzles by composing a ConvNet with the differentiable MaxSAT layer.

[1] https://arxiv.org/abs/1905.12149
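
Roughly, the composition looks like this (a PyTorch-shaped sketch with placeholder modules - not the actual SATNet API):

    import torch.nn as nn

    # Perception and logic live in one network because the solver
    # layer is differentiable end to end.
    class VisualSudoku(nn.Module):
        def __init__(self, digit_net: nn.Module, solver_layer: nn.Module):
            super().__init__()
            self.digit_net = digit_net        # cell image -> digit probabilities
            self.solver_layer = solver_layer  # differentiable MaxSAT relaxation

        def forward(self, cell_images):
            probs = self.digit_net(cell_images)
            return self.solver_layer(probs)   # loss on the solution trains both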


Thanks - that's great! I'm familiar with Kolter's work on adversarial robustness.


I found this video interesting. It takes comments and generates code.

https://www.youtube.com/watch?v=fZSFNUT6iY8


I believe Jacob Andreas at MIT is currently working on compositional neural net architectures with applications in programming language theory.



