ReasonablyBadass t1_ir6zjcx wrote

And since ML is a lot of matrix multiplication, we get faster ML, which leads to better matrix multiplication techniques...
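
(Toy illustration of the "ML is mostly matmul" part: a dense layer's forward pass is literally one matrix multiplication. Rough numpy sketch, shapes made up:)

```python
import numpy as np

# A dense layer forward pass is one matrix multiplication plus a bias add,
# so any speedup in matmul translates almost directly into faster ML.
x = np.random.rand(32, 784)   # batch of 32 flattened inputs
W = np.random.rand(784, 256)  # layer weights
b = np.zeros(256)             # layer bias

y = x @ W + b                 # the matmul that dominates the cost
```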

112

ActiveLlama t1_ir8fczf wrote

It feels like we are getting closer and closer to the singularity.

13

ThatInternetGuy t1_ir7zmqm wrote

And a GPU is mainly matrix multiplication hardware. 3D graphics rendering is parallel matrix multiplication over the 3D model's vertices and the buffer pixels, so it's not really an unsolved problem; all graphics cards are designed to do extremely fast matrix multiplication.
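
(Roughly what that looks like, with numpy standing in for what the GPU does in hardware; the transform matrix here is just a placeholder:)

```python
import numpy as np

# 3D rendering: apply one 4x4 transform (model-view-projection) to a whole
# batch of vertices at once. The GPU does this for every vertex in parallel.
mvp = np.eye(4)                       # placeholder transform matrix
vertices = np.random.rand(10_000, 4)  # homogeneous coordinates (x, y, z, w)

transformed = vertices @ mvp.T        # a single matmul transforms them all
```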

−6

_matterny_ t1_ir8ht0g wrote

But even a GPU has a maximum matrix size it can process. More efficient algorithms could improve GPU performance if they really are new.
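
(The standard workaround for any fixed size limit is blocking: split the big product into tiles that do fit. A rough numpy sketch, tile size chosen arbitrarily:)

```python
import numpy as np

def blocked_matmul(A, B, tile=256):
    """Compute A @ B by accumulating products of tile x tile sub-blocks."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C
```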

14

Thorusss t1_ir9o8lk wrote

Especially since the algorithms are specifically faster on the most modern hardware we have right now.

1

master3243 t1_ir94he8 wrote

It is an unsolved problem; there's no known optimal algorithm yet.

Unless you have a proof you're hiding from the rest of the world?

> The optimal number of field operations needed to multiply two square n × n matrices up to constant factors is still unknown. This is a major open question in theoretical computer science.
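
(Concretely, the open part is the exponent. Schoolbook multiplication uses n³ scalar multiplications; Strassen's 1969 trick already does a 2×2 product with 7 multiplications instead of 8, which applied recursively gives roughly n^2.81, and nobody knows how low you can go. A quick sketch of the 7-multiplication identity, scalar entries just to show the count:)

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 multiplications instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4,           p1 + p5 - p3 - p7]]

# e.g. strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```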

5

ThatInternetGuy t1_ir96weg wrote

https://developer.nvidia.com/blog/implementing-high-performance-matrix-multiplication-using-cutlass-v2-8/

Nvidia Tensor Cores implement GEMM for extremely fast matrix-matrix multiplication. This has never been figured out for ages; however, it's up for debate whether AI could improve the GEMM design to allow even faster matrix-matrix multiplication.
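
(For reference, GEMM is just the generalized product C = alpha·A·B + beta·C; a naive schoolbook version, nothing like the tiled kernels CUTLASS actually generates, looks like this:)

```python
import numpy as np

def gemm_reference(alpha, A, B, beta, C):
    """Schoolbook GEMM: returns alpha * A @ B + beta * C (n*m*k multiply-adds)."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2 and C.shape == (n, m)
    out = beta * C  # creates a new array; C itself is untouched
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]
            out[i, j] += alpha * acc
    return out
```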

Matrix-matrix multiplication has never been slow. If it were slow, we wouldn't have the extremely fast neural network computation we have today.

If you follow the latest machine learning news, you've probably heard about the recent release of Meta's AITemplate, which speeds up inference by 3x to 10x. That's possible thanks to the Nvidia CUTLASS team, who have made matrix-matrix multiplication even faster.

−6

master3243 t1_ir9a1st wrote

Absolutely nothing you said contradicts my point that the optimal algorithm is an unsolved problem, and thus you can't claim that it's impossible for an RL agent to optimize over current methods.

9

ReginaldIII t1_ir9uakx wrote

> however, it's up for debate whether AI could improve the GEMM design to allow even faster matrix-matrix multiplication.

Nvidia have been applying RL for chip design and optimization: https://developer.nvidia.com/blog/designing-arithmetic-circuits-with-deep-reinforcement-learning/

So I think it's pretty clear that they think it's possible.

1

ThatInternetGuy t1_ir9v9aj wrote

Yes, 25% improvement.

My point is, Nvidia CUTLASS has practically improved matrix multiplication by 200% to 900%. Why do you guys think matrix multiplication is currently slow on GPUs? I don't get that. The other guy said it's an unsolved problem. There is nothing unsolved when it comes to matrix multiplication. It has been vastly optimized over the years since RTX first came out.

It's apparent that RTX Tensor Cores and CUTLASS have really solved it. It's no coincidence that the recent explosion of ML progress came when Nvidia put in more Tensor Cores, and now with CUTLASS templates, all models benefit from a 200% to 900% performance boost.

This RL-designed GEMM is the icing on the cake, giving that extra 25%.

0

ReginaldIII t1_ir9w5x1 wrote

> It's apparent that RTX Tensor Cores and CUTLASS have really solved it.

You mean more efficiency was achieved using a novel type of hardware implementing a state of the art algorithm?

So if we develop methods for searching for algorithms with even better op requirements, we can work on developing hardware that directly leverages those algorithms.

> Why do you guys think matrix multiplication is currently slow on GPUs? I don't get that.

I don't think that. I think that developing new hardware and implementing new algorithms that leverage that hardware is how it gets even faster.

And it's an absurd statement for you to make because it's entirely relative. Go back literally 4 years and you could say the same thing despite how much has happened since.

> This has never been figured out for ages; however, it's up for debate whether AI could improve the

> The other guy said it's an unsolved problem. There is nothing unsolved when it comes to matrix multiplication. It has been vastly optimized over the years since RTX first came out.

The "other guy" is YOU!

3

ThatInternetGuy t1_ir9zlmq wrote

This is not the first time RL has been used to do efficient routing on silicon wafers and circuit boards. This announcement is good, but not that good: a 25% reduction in silicon area.

I thought they discovered a new Tensor Core design that gives at least 100% improvement.

0