Submitted by currentscurrents t3_1007w5u in MachineLearning

Correct me if any of these priors are wrong:

  • Every problem solvable by a neural network is provably solvable in code, although not necessarily in a useful way - at worst you could generate the PyTorch source code and the model weights.

  • Neural networks can discover algorithms during training, and use them internally to accomplish the task. This happens emergently in today's large transformer models; it's part of learning how to solve the problem.

  • While neural networks can do a lot of things that classical algorithms can't, there's also a lot of things that both can do - pathfinding for example. Maybe there's more yet-unknown overlap between them.

Stripping away the neural network and running the underlying algorithm could be useful, since classical algorithms tend to run much faster and with less memory.

Has there been any research into converting neural networks into code that accomplishes the same thing? My first thought would be to train a network to take another neural network as input and output the corresponding code. You could create a dataset for this by taking various chunks of code and training neural networks to imitate them.
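
A toy version of the dataset step might look like this (a sketch only; the snippet, the architecture, and the training budget are all arbitrary choices):

```python
import torch
import torch.nn as nn

# One (weights, code) training pair for the hypothetical network-to-code
# model: take a chunk of code and train a small network to imitate it.
snippet = "lambda x: 2 * x + 1"   # arbitrary toy "chunk of code"
fn = eval(snippet)

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = fn(x)

net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    nn.functional.mse_loss(net(x), y).backward()
    opt.step()

pair = (net.state_dict(), snippet)   # input and target for the translator model
```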

68

Comments

yaosio t1_j2g4wqx wrote

DeepMind put out a paper on discovering faster matrix multiplication. I only know enough machine learning to ask where the bathroom is, so I don't know the methods they used.

https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor

https://www.nature.com/articles/s41586-022-05172-4

38

Dylan_TMB t1_j2g6nlc wrote

Good suggestion but not really applicable to this question. TL;DR: matrix multiplication done the naive way is slow. There is a way to decompose the matrices and get an algebraic expression that is equivalent but has fewer multiplication steps, making it faster. Finding that decomposition with a classical search is too time-consuming. They trained a reinforcement-learning model that treats decomposition like a game, producing a model that can quickly find decompositions for matrices of a given size. (I am not 100% sure "decomposition" is the right word.)

I think what OP is really describing is almost a subfield of explainable AI. Making an AI that learns to write an algorithm (say, a Python function) that solves a problem correctly could be attempted. You would be juggling multiple loss functions here: 1) valid code, 2) solves the problem, 3) is fast, and 4) isn't just hard-coding a dictionary of the training data as a lookup table lol. However, just because a model has learned a function, I don't think it follows that an algorithm can be extracted from it.

You have to remember a model is learning a FUNCTION that approximates the mapping between two spaces. So it isn't exact or always correct. So even if an algorithm could be extracted, it would only be approximately "correct", and I would assume that even if it could be extracted, it would not be comprehensible.

7

currentscurrents OP t1_j2g9mvy wrote

Thanks, that is the question I'm trying to ask! I know explainability is a bit of a dead-end field right now so it's a hard problem.

An approximate or incomprehensible algorithm could still be useful if it's faster or uses less memory. But I think to accomplish that you would need to convert it into higher-level ideas; otherwise you're just emulating the network.

Luckily neural networks are capable of converting things into higher-level ideas? It doesn't seem fundamentally impossible.

3

Dylan_TMB t1_j2gc9va wrote

I actually think you are looking for this:

https://arxiv.org/abs/2210.05189

Proof that all neural networks can be represented by a decision tree. Navigating a decision tree is an algorithm, so this would be a representation of the "algorithm".

So a question to ask would be whether it is the minimal decision tree.
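
To make that concrete: in a ReLU network, each pattern of active/inactive hidden units is one branch of the equivalent tree (the branch conditions test the signs of the hidden pre-activations), and within a branch the network is exactly linear. A numpy sketch for a tiny 2-2-1 net with made-up weights:

```python
import itertools
import numpy as np

# Toy 2-2-1 ReLU net; weights are arbitrary.
W1 = np.array([[1.0, -1.0], [0.5, 2.0]]); b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, 1.0]]);              b2 = np.array([0.5])

for pattern in itertools.product([0, 1], repeat=2):
    D = np.diag(pattern)          # which ReLUs fire on this branch
    W_eff = W2 @ D @ W1           # effective linear map at this leaf
    b_eff = W2 @ D @ b1 + b2
    print(f"branch {pattern}: f(x) = {W_eff} @ x + {b_eff}")
```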

2

currentscurrents OP t1_j2gctk4 wrote

Interesting!

This feels like it falls under emulating a neural network, since you've done equivalent computations - just in a different form.

I wonder if you could train a neural network with the objective of creating the minimal decision tree.

1

Dylan_TMB t1_j2gcz2v wrote

Or just learn to minimize a tree that's given as input.

4

arg_max t1_j2hw6f8 wrote

That is an interesting paper BUT their method relies heavily on the structure of the task. In general, if you want to create a method that outputs algorithms, choosing the output format is already non-trivial. For humans, pseudo-code is probably the most natural way to present algorithms, but then you will require some kind of language model, or at least a recurrent architecture, that can output solutions of different lengths (as not every program has a fixed length). And even once you get your output from the model, you have to first make sure that it is a valid program and, more importantly, that it solves the task. This means you have to verify the correctness of every method your model creates before you can even measure runtime.

But matrix multiplication is different. If you read the paper, you will see that every matrix multiplication algorithm can be written as a higher-order tensor, and given a tensor decomposition it's trivial to check the correctness of the corresponding matrix multiplication algorithm. This is not even a super novel insight; people knew you could formulate the task of finding better matrix multiplication algorithms as a tensor decomposition optimization problem, BUT that problem is super hard to solve.
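
To illustrate the "trivial to check" part, here's a sketch for the 2x2 case: build the matmul tensor from its definition and verify that Strassen's classic rank-7 factors reconstruct it exactly (the factor matrices are hard-coded below; checking correctness is just a sum of outer products):

```python
import numpy as np

# The 2x2 matmul tensor: T[a, b, c] = 1 iff A-entry a times B-entry b
# contributes to C-entry c (entries ordered 11, 12, 21, 22).
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[2*i + k, 2*k + j, 2*i + j] = 1

# Strassen's rank-7 decomposition: coefficients of M1..M7 on A, on B,
# and their contributions to C.
U = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
V = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
W = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

# Sum of 7 rank-1 terms must equal the matmul tensor.
T_hat = np.einsum('ra,rb,rc->abc', U, V, W)
assert np.array_equal(T_hat, T)   # 7 multiplications suffice for 2x2 matmul
```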

But not many real-world tasks are like this. For most problems you don't have such a nice output space, and at that point it becomes much, much harder to learn algorithms. I guess once people figure out a way to make models that can output verifiably correct pseudo-code, we will start to see tons of papers with new AI-generated heuristics for NP-hard and other problems that cannot yet be solved in optimal time.

6

On_Mt_Vesuvius t1_j2g597u wrote

Here's a prominent example. Matrix-matrix multiplication can be done through algorithms with complexity less than O(n^3). Here's a paper by DeepMind that uses ML to learn faster algorithms for matrix multiplications of specific sizes (the search is guided by a neural network, AlphaZero-style, even though the output is a classical algorithm).
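
The canonical sub-cubic example is Strassen's algorithm, roughly O(n^2.81), which recurses on 2x2 blocks using 7 multiplications instead of 8. A minimal sketch, assuming square matrices with power-of-two size:

```python
import numpy as np

def strassen(A, B):
    """Strassen's algorithm; assumes square matrices with power-of-two size."""
    n = A.shape[0]
    if n <= 64:                       # cutoff: plain multiply wins on small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)
```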

22

currentscurrents OP t1_j2g5x92 wrote

Thanks for the link, that's good to know about!

But maybe I should have titled this differently. I'm interested in taking a network that solves a problem networks are good at, and converting it into a code representation as a way to speed it up. Like translating between the two different forms of computation.

1

enzlbtyn t1_j2gfihn wrote

8

currentscurrents OP t1_j2gfvev wrote

Nice, this is the kind of thing I'm looking for! Their approach is different (training a neural network to write programs to solve a task, instead of converting a network that's been trained for the task) but I think it may be equivalent.

2

RandomIsAMyth t1_j2gcm0b wrote

>Stripping away the neural network and running the underlying algorithm could be useful, since classical algorithms tend to run much faster and with less memory.

It's not clear what you mean by "classical algorithm" here, and I wonder how you would find such an algorithm inside a neural network.

The entire neural network is the algorithm. Deleting or changing any parameter could damage the network's accuracy. Also, the most costly operations are matrix multiplications, and you can hardly speed up raw multiplications and additions on today's computers. Making the matrix multiplication simpler using quantization and/or sparsity is probably the way to go.
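
As a sketch of the quantization route (symmetric int8 weights; the sizes here are arbitrary):

```python
import numpy as np

# Quantize: store weights as int8 plus one float scale factor.
W = np.random.randn(256, 256).astype(np.float32)
scale = np.abs(W).max() / 127.0
W_q = np.round(W / scale).astype(np.int8)

# Matmul with dequantization folded in; in practice the int8 product
# runs on faster integer hardware paths.
x = np.random.randn(256).astype(np.float32)
y_approx = (W_q.astype(np.float32) @ x) * scale
y_exact = W @ x
print("max error:", np.abs(y_approx - y_exact).max())
```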

6

currentscurrents OP t1_j2gdumb wrote

By classical algorithm, I mean something that doesn't use a neural network. Traditional programming and neural networks are two very different ways to solve problems, but they can solve many of the same problems.

That sounds like a translation problem, which neural networks are good at. Just like in translation, it would have to understand the higher-level idea behind the implementation.

It's like text-to-code, but network-to-code instead.

3

MrAcurite t1_j2h1fbs wrote

I think what you're talking about is, essentially, having the neural network learn an algorithm, and then pulling the learned algorithm out of the network, and standing it up in code.

Sadly, that's not how the underlying dynamics of a neural network operate. We're doing statistical function approximation; there's not much that's fundamentally "algorithmic" about the network. For example, a sort function is necessarily general over the entire domain of entries for which it is valid, whereas a neural network only approximates a function over the subset of the domain on which it was trained; all bets are off elsewhere. It doesn't generalize.

Maybe you could pull something out of a Neural Turing Machine or a Spiking Neural Network, but even then, you're running into tremendously difficult problems in interpretability.

6

currentscurrents OP t1_j2h21zn wrote

>For example, a sort function is necessarily general over the entire domain of the entries for which it is valid, whereas a neural network will only approximate a function over the subfield of the domain for which it was trained, all bets are off elsewhere; it doesn't generalize.

But you can teach neural networks to do things like solve arbitrary mazes. Isn't that pretty algorithmic?

1

MrAcurite t1_j2h2ei1 wrote

You can teach a neural network to solve, say, mazes in a 10x10 grid, but then you'd need to train it again to solve them in a 20x20 grid, and there would be a size at which the same model would simply cease to work. Whereas Dijkstra's, even if it slows down, would never fail to find the exit if the exit exists.
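
For reference, the classical solver is a few lines and size-independent. A BFS sketch, which is what Dijkstra's reduces to when every step costs the same:

```python
from collections import deque

def solve_maze(grid, start, goal):
    """BFS on a grid of 0 (open) / 1 (wall) cells; works at any size and
    always finds a path if one exists."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None  # no path exists

maze = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(solve_maze(maze, (0, 0), (0, 2)))
```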

You might be able to train a model to find new strategies in a specific case, analyze it, and then code your understanding of it yourself, kinda like using a Monte Carlo approach to find a numerical answer to a problem before trying an analytic one. But you're not going to be able to pull an algorithm out of the parameters directly.

6

currentscurrents OP t1_j2hdsvv wrote

Someone else posted this example, which is kind of what I was interested in. They trained a neural network to do a toy problem, addition mod 113, and then were able to determine the algorithm it used to compute it.

>The algorithm learned to do modular addition can be fully reverse engineered. The algorithm is roughly:

>Map inputs x,y→ cos(wx),cos(wy),sin(wx),sin(wy) with a Discrete Fourier Transform, for some frequency w.

>Multiply and rearrange to get cos(w(x+y))=cos(wx)cos(wy)−sin(wx)sin(wy) and sin(w(x+y))=cos(wx)sin(wy)+sin(wx)cos(wy)

>By choosing a frequency w = 2πk/n we get period dividing n, so this is a function of x + y (mod n)

>Map to the output logits z with cos(w(x+y))cos(wz)+sin(w(x+y))sin(wz)=cos(w(x+y−z)) - this has the highest logit at z≡x+y(mod n), so softmax gives the right answer.

>To emphasise, this algorithm was purely learned by gradient descent! I did not predict or understand this algorithm in advance and did nothing to encourage the model to learn this way of doing modular addition. I only discovered it by reverse engineering the weights.

This is a very different way to do modular addition, but it makes sense for the network. Sine/cosine functions are waves that repeat with a period set by the frequency, so if you choose the right frequency you can implement the non-differentiable modular addition function using only differentiable operations.

Extracting this algorithm is useful for generalization; while the original network only worked mod 113, with the algorithm we can plug in any modulus by changing the frequency. Of course this is a toy example and there are much faster ways to do modular addition, but maybe it could work for more complex problems too.
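
Here's roughly what the extracted algorithm looks like as plain numpy (a sketch: k is an arbitrary frequency index coprime to n, and a single frequency suffices here even though the real network spreads the computation over several):

```python
import numpy as np

n, k = 113, 5                # modulus from the post; k is an arbitrary
w = 2 * np.pi * k / n        # choice coprime to n

def mod_add(x, y):
    # Step 1: represent each input as a point on the unit circle.
    cx, sx = np.cos(w * x), np.sin(w * x)
    cy, sy = np.cos(w * y), np.sin(w * y)
    # Step 2: angle-addition identities give cos(w(x+y)) and sin(w(x+y)).
    cxy = cx * cy - sx * sy
    sxy = cx * sy + sx * cy
    # Step 3: logits cos(w(x+y-z)) peak exactly at z = (x + y) mod n.
    z = np.arange(n)
    logits = cxy * np.cos(w * z) + sxy * np.sin(w * z)
    return int(np.argmax(logits))

assert all(mod_add(x, y) == (x + y) % n for x in range(n) for y in range(n))
```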

6

fujiitora t1_j2h0j15 wrote

Far from an expert, but this almost seems like a more powerful symbolic regression in disguise. A main limitation of SR is that you need to have some educated guess/prior of what the potential function components look like for the search space. If this converter/extractor works it would be pretty damn powerful right?

3

digmux t1_j2i9pz2 wrote

Yeah, it seems so.

I'm generally interested in learning about SR. What would be a good starting point? Are there some books, papers or other materials, I could read?

1

evanthebouncy t1_j2h4ir2 wrote

DeepMind has a paper on sorting.

They asked an NN to sort using only pointer movements and swap operations. They ended up with a cool sort that generalizes to longer arrays.

Forgot the name of the paper though

3

velcher t1_j2jxium wrote

> Stripping away the neural network and running the underlying algorithm could be useful, since classical algorithms tend to run much faster and with less memory.

I would disagree with this. A lot of research is focused on distilling classical algorithms (e.g. search) into neural networks because a forward pass on a GPU is much faster than running the algorithm itself.

3

Phoneaccount25732 t1_j2l2hcq wrote

What is there in this genre other than learned indices and surrogate modeling for physics problems?

1

Thistleknot t1_j2gik12 wrote

Solving differential equations has been done using neural networks.

2

jms4607 t1_j2gwke0 wrote

A lot of classical algorithms are not differentiable, so you couldn't expect a differentiable model to do any better than approximate them. Reinforcement learning does let you learn a non-differentiable algorithm, though.

2

Krappatoa t1_j2irxx3 wrote

ChatGPT will write code for you to solve programming problems. Is that what you mean?

2

keepthepace t1_j2k0pvh wrote

I think you may be interested in neural Turing machines and their successor, differentiable neural computers. They basically force a network to accomplish a task through a Turing machine.

https://en.m.wikipedia.org/wiki/Differentiable_neural_computer

2

currentscurrents OP t1_j2k1t74 wrote

Interesting! I have heard about these, but it doesn't look like there's been much work on them in the last few years - it's mostly 2014-2018 papers.

1

keepthepace t1_j2m2xrp wrote

Yes, they never reached the level of very complex algorithms, but also no one ever tried to throw a lot of compute at them before we created gigantic language models that can mostly do these tasks with orders of magnitude more parameters.

I do suspect that if we threw a bit more compute at a DNC we would get very interesting results, but that's little more than a hunch.

1

abc220022 t1_j2hbqqi wrote

This Twitter thread and the attached post talk in part about training a neural network to solve modular addition and then backing out the algorithm the network learned, which was imo an unexpected algorithm: https://twitter.com/neelnanda5/status/1559060507524403200?lang=en

1

currentscurrents OP t1_j2hdtor wrote

Interesting! That's a very different way to implement modular addition, but it makes sense for the network to do it that way.

1

Stone_d_ t1_j2hlq81 wrote

Smartypants mcgee over here

1

vallyscode t1_j2hughm wrote

Interesting topic. One thing is to train a model to solve a given problem; another is to figure out or prove whether that model was the best one and whether it landed on the best approach to the problem; and last is whether it's possible to generalize the approach the network used to solve the problem. Solving all that smells like a Nobel prize.

1

SilentRetriever t1_j2hy7fh wrote

Look at the work by Petar Velickovic and colleagues. They have worked a lot on simulating graph algorithms (BFS, Prim's, ...) using deep learning.

1

Just_CurioussSss t1_j2hzhee wrote

Here is an example of a research paper that investigates using neural networks to discover algorithms:
"Neural Combinatorial Optimization with Reinforcement Learning" (Bello et al., 2016)
In this paper, the authors use reinforcement learning to train a neural network to construct solutions to combinatorial optimization problems. The network solves a given instance by generating a sequence of decisions (e.g., the order in which to visit cities), with the quality of the resulting solution as the reward signal. The authors show that this approach can learn effective heuristics for classical problems such as the knapsack problem and the traveling salesman problem, achieving performance competitive with hand-crafted baselines.

1

PunsbyMann t1_j2i06hl wrote

I think looking at the resulting graph of a Neural Algorithmic Learner (Petar Velickovic and the group at DeepMind) may get you an algorithm, though it won't be able to generate code directly.

Bonus: there is the CLRS benchmark, which might be useful for evaluating your task.

1

AllowFreeSpeech t1_j2ilhjb wrote

Fwiw, it may be easier to learn an algorithm represented in a Prolog-like or Lisp-like language than in various modern C-like programming languages. I am not sure.

1

science-raven t1_j2jnc5l wrote

Lols for brevity! There's a maths NN project or five. There isn't a thing called "classical algos", like music. Classical Greek? That isn't a science term? No results for that type of algo on Google either.

1

BrisklyBrusque t1_j2wwoym wrote

Some of the earliest perceptron research involved emulating logic gates with single- and multilayer perceptrons. A single-layer perceptron famously could not learn the XOR gate, but a multilayer perceptron could.
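
For illustration, a multilayer perceptron can represent XOR with just two hidden units; a sketch with hand-set weights (an OR unit and a NAND unit feeding an AND):

```python
import numpy as np

def step(v):
    return (v > 0).astype(float)

# Hidden layer: one OR-like unit and one NAND-like unit.
W1 = np.array([[1.0, 1.0],     # OR:   x1 + x2 - 0.5 > 0
               [-1.0, -1.0]])  # NAND: -x1 - x2 + 1.5 > 0
b1 = np.array([-0.5, 1.5])
# Output unit: AND of the two hidden units, i.e. OR(x) AND NAND(x) = XOR(x).
W2 = np.array([1.0, 1.0])
b2 = -1.5

def xor(x):
    h = step(W1 @ x + b1)
    return step(W2 @ h + b2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor(np.array([a, b], dtype=float))))
```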

1