Viewing a single comment thread. View all comments

Lairv t1_ir7p0xt wrote

In the article they try 2 types of reward: minimizing the rank of the tensor decomposition (i.e. minimizing total number of multiplication), and minimizing the runtime of the algorithm on a given hardware (they tried with nvidia V100 and TPUv2)

The latter could be actually useful since their graphs shows that the algorithms discovered reach better performances than cuBLAS (Fig.5)

37