neanderthal_math t1_ir7k9jl wrote

I’m a little confused by the purpose of this paper too. If the point is to show that an RL algorithm found better bounds than Strassen, that’s cool. But are they claiming this is something a compiler would use in practice? How does this work with fixed SIMD sizes?

18

Lairv t1_ir7p0xt wrote

In the article they try 2 types of reward: minimizing the rank of the tensor decomposition (i.e. minimizing the total number of multiplications), and minimizing the runtime of the algorithm on given hardware (they tried Nvidia V100 GPUs and TPUv2).
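To make the rank ↔ multiplication-count correspondence concrete, here's a minimal NumPy sketch (mine, not from the paper's code) that evaluates a 2x2 matmul through Strassen's classic rank-7 decomposition — the same {U, V, W} factor representation AlphaTensor searches over:

```python
import numpy as np

# Strassen's rank-7 decomposition of the 2x2 matmul tensor.
# Each row of U/V is the linear combination of vec(A)/vec(B) feeding
# one of the 7 scalar multiplications; W maps the products to vec(C).
U = np.array([
    [1, 0, 0, 1],   # A11 + A22
    [0, 0, 1, 1],   # A21 + A22
    [1, 0, 0, 0],   # A11
    [0, 0, 0, 1],   # A22
    [1, 1, 0, 0],   # A11 + A12
    [-1, 0, 1, 0],  # A21 - A11
    [0, 1, 0, -1],  # A12 - A22
])
V = np.array([
    [1, 0, 0, 1],   # B11 + B22
    [1, 0, 0, 0],   # B11
    [0, 1, 0, -1],  # B12 - B22
    [-1, 0, 1, 0],  # B21 - B11
    [0, 0, 0, 1],   # B22
    [1, 1, 0, 0],   # B11 + B12
    [0, 0, 1, 1],   # B21 + B22
])
W = np.array([
    [1, 0, 0, 1, -1, 0, 1],  # C11 = M1 + M4 - M5 + M7
    [0, 0, 1, 0, 1, 0, 0],   # C12 = M3 + M5
    [0, 1, 0, 1, 0, 0, 0],   # C21 = M2 + M4
    [1, -1, 1, 0, 0, 1, 0],  # C22 = M1 - M2 + M3 + M6
])

def matmul_from_decomposition(A, B):
    m = (U @ A.reshape(4)) * (V @ B.reshape(4))  # 7 multiplications
    return (W @ m).reshape(2, 2)

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
assert np.allclose(matmul_from_decomposition(A, B), A @ B)
```

A rank-R decomposition found by the agent drops into the same template: R rows in U and V, R columns in W, hence R multiplications.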

The latter could actually be useful, since their graphs show that the discovered algorithms outperform cuBLAS (Fig. 5).
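If you want to reproduce that kind of baseline yourself, here's a rough timing sketch (assuming PyTorch and a CUDA GPU; `torch.matmul` on dense float tensors dispatches to cuBLAS, which is the baseline Fig. 5 compares against):

```python
import torch

A = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
B = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)

# CUDA events give GPU-side timings; warm up first so lazy init and
# autotuning don't pollute the measurement.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
for _ in range(10):
    torch.matmul(A, B)
start.record()
for _ in range(100):
    torch.matmul(A, B)
end.record()
torch.cuda.synchronize()
print(f"{start.elapsed_time(end) / 100:.3f} ms per matmul")
```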

37

AssumptionIcy5363 t1_irbuwfh wrote

What I would do is train the model on many small and many useful combinations of matrix sizes.

Then I would use the normal algorithm for every other combination of sizes.
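A minimal sketch of that dispatch idea (all names here are hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical registry: use a learned kernel when one was trained for
# this exact shape, otherwise fall back to the standard BLAS path.
LEARNED_KERNELS = {}  # (m, k, n) -> specialized matmul function

def matmul(A, B):
    key = (A.shape[0], A.shape[1], B.shape[1])
    kernel = LEARNED_KERNELS.get(key)
    return kernel(A, B) if kernel is not None else A @ B
```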

1