
arg_max t1_j2hw6f8 wrote

That is an interesting paper, BUT their method relies heavily on the structure of the task. In general, if you want to create a method that outputs algorithms, choosing the output format is already non-trivial. For humans, pseudo-code is probably the most natural way to present algorithms, but then you need some kind of language model, or at least a recurrent architecture, that can output solutions of varying length (since not every program has a fixed length). And even once you get an output from the model, you first have to make sure it is a valid program and, more importantly, that it actually solves the task. This means you have to verify the correctness of every program your model creates before you can even measure its runtime.
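To make the verification problem concrete, here's a toy Python sketch (my own illustration; `candidate_src`, `solve`, and `is_correct` are all hypothetical names, and the "task" is just sorting). Note that even this only gives evidence of correctness via random testing, not a proof:

```python
import random

# Stand-in for model output: a candidate program as a string.
candidate_src = """
def solve(xs):
    return sorted(xs)
"""

def is_correct(src, reference, trials=1000):
    namespace = {}
    try:
        exec(src, namespace)           # step 1: is it even a valid program?
        candidate = namespace["solve"]
    except Exception:
        return False
    for _ in range(trials):            # step 2: does it solve the task?
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        try:
            if candidate(list(xs)) != reference(xs):
                return False
        except Exception:
            return False
    return True                        # random testing: evidence, not proof

print(is_correct(candidate_src, sorted))  # True
```

Only after a candidate passes this kind of check does it even make sense to benchmark its runtime.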

But matrix multiplication is different. If you read the paper, you will see that every matrix multiplication algorithm can be written as a decomposition of a higher-order tensor, and given a tensor decomposition it's trivial to check the correctness of the corresponding matrix multiplication algorithm (see the sketch below). This is not even a super novel insight; people knew you can formulate the search for better matrix multiplication algorithms as a tensor decomposition optimization problem, BUT that problem is super hard to solve.
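Here's a small NumPy sketch (my own illustration, not code from the paper) showing why the check is trivial: Strassen's classic algorithm is a rank-7 decomposition of the 2x2 matrix multiplication tensor, and verifying it just means summing 7 rank-1 terms and comparing against the tensor:

```python
import numpy as np
from itertools import product

# Build the 2x2 matrix multiplication tensor T: T[i, j, k] = 1 exactly when
# entry i of A times entry j of B contributes to entry k of C = A @ B
# (matrices flattened row-major, so index (row, col) -> 2*row + col).
T = np.zeros((4, 4, 4), dtype=int)
for m, p, n in product(range(2), repeat=3):
    T[2 * m + p, 2 * p + n, 2 * m + n] = 1

# Strassen's algorithm as a rank-7 decomposition: row r of U, V, W gives the
# coefficients of the r-th product M_r and where it is added into C.
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# The whole correctness check: do the 7 rank-1 terms sum back to T?
reconstruction = np.einsum("ri,rj,rk->ijk", U, V, W)
print(np.array_equal(reconstruction, T))  # True -> Strassen is valid
```

Compare this one-line check against the open-ended program verification above: any decomposition with fewer rank-1 terms that passes it is automatically a faster (valid) algorithm, which is what makes the problem such a clean optimization target.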

But not many real-world tasks are like this. For most problems you don't have such a nice output space, and at that point it becomes much, much harder to learn algorithms. I guess once people figure out a way to make models that can output verifiably correct pseudo-code, we will start to see tons of papers with new AI-generated heuristics for NP-hard problems and other problems that cannot yet be solved in optimal time.
