ThatInternetGuy t1_is46ghv wrote

Transformer-based models have been gaining traction since 2021 for generative modeling because they scale practically to tens of billions of parameters, whereas GAN-based models have saturated. Not that GANs are any less powerful; GANs are generally much more efficient in terms of performance and memory.

7

ThatInternetGuy t1_ir9zlmq wrote

This is not the first time RL has been used to find efficient routings on silicon wafers and circuit boards. This announcement is good, but not that good: a 25% reduction in silicon area.

I had thought they discovered a new Tensor Core design that gives at least a 100% improvement.

0

ThatInternetGuy t1_ir9v9aj wrote

Yes, 25% improvement.

My point is, Nvidia CUTLASS has already improved matrix multiplication performance by 200% to 900%. I don't get why you guys think matrix multiplication is currently slow on the GPU. The other guy said it's an unsolved problem, but there is nothing unsolved about matrix multiplication; it has been vastly optimized over the years since RTX first came out.

It's apparent that RTX Tensor Cores and CUTLASS have really solved it. It's no coincidence that the recent explosion of ML progress came after Nvidia put in more Tensor Cores; now, with the CUTLASS templates, all models can benefit from a 200% to 900% performance boost.

This RL-designed GEMM is the icing on the cake, giving that extra 25%.

0

ThatInternetGuy t1_ir96weg wrote

https://developer.nvidia.com/blog/implementing-high-performance-matrix-multiplication-using-cutlass-v2-8/

Nvidia Tensor Cores implement GEMM for extremely fast matrix-matrix multiplication. This was figured out ages ago; however, it's up for debate whether AI could improve the GEMM design to allow even faster matrix-matrix multiplication.
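For reference, GEMM (general matrix multiply) is the operation D = alpha·(A·B) + beta·C that Tensor Cores accelerate in hardware. A minimal NumPy sketch of the math (the naming follows the BLAS convention; this is an illustration, not the CUTLASS API):

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """General matrix multiply: D = alpha * (A @ B) + beta * C."""
    return alpha * (A @ B) + beta * C

A = np.arange(4.0).reshape(2, 2)   # [[0, 1], [2, 3]]
B = np.eye(2)                      # identity, so A @ B == A
C = np.ones((2, 2))
D = gemm(2.0, A, B, 0.5, C)        # 2*A + 0.5
```

Libraries like CUTLASS tile this same computation across thousands of GPU threads and feed the fused multiply-accumulates to Tensor Cores.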

Matrix-matrix multiplication has never been slow. If it were slow, we wouldn't have today's extremely fast neural-network computation.

If you were following the latest machine learning news, you would have heard of the recent release of Meta's AITemplate, which speeds up inference by 3x to 10x. That is possible thanks to the Nvidia CUTLASS team, who have made matrix-matrix multiplication even faster.

−6

ThatInternetGuy t1_ir7zmqm wrote

And a GPU is mainly matrix-multiplication hardware. 3D graphics rendering is parallel matrix multiplication over the 3D model's vertices and the buffer pixels, so it's not really an unsolved problem; all graphics cards are designed to do extremely fast matrix multiplication.
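A minimal sketch of that idea: transforming a batch of vertices with one 4x4 matrix (NumPy here for illustration; a GPU runs the same math in parallel across all vertices, and the matrix and vertex values are made up for the example):

```python
import numpy as np

# A 4x4 translation matrix: moves every vertex by (1, 2, 3).
M = np.eye(4)
M[:3, 3] = [1.0, 2.0, 3.0]

# Three vertices in homogeneous coordinates (x, y, z, 1), one per row.
verts = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
])

# One matrix multiplication transforms all vertices at once.
out = verts @ M.T
```

Rotations, scaling, and perspective projection are all expressed as the same kind of 4x4 matrix, which is why GPUs were built around fast matrix multiplication long before deep learning.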

−6

ThatInternetGuy t1_iqmjuvb wrote

Why do you think Elon is pushing Neuralink?

Elon gives a really simple reason. He believes a person using a smartphone is already a cyborg, since the smartphone is an extension of the human form; it just isn't apparent because, as he explains, the communication bandwidth between the human form and the machine is limited by how fast you can type on a keyboard. Having identified that bottleneck, Elon explains why Neuralink is the answer: basically, a fast, high-bandwidth link between our human form and the machine, allowing us to stream our thoughts, visuals, ideas and commands to it. For safety reasons, the machine may only respond to us via AR glasses and headsets, so as not to interfere directly with our brain signals.

So what he wants, exactly, is for each person to carry a fast AI computer linked, wired or wirelessly, to the Neuralink brain implant, so that basically everyone has the ability to do everything, from being fluent in all the world's languages to generating art, designing 3D models, fixing cars, and so on.

1