currentscurrents t1_jczods2 wrote

I'm hoping that non-von-Neumann chips will scale up in the next few years. There are some you can buy today, but they're small:

>NDP200 is designed to natively run deep neural networks (DNN) on a variety of architectures, such as CNN, RNN, and fully connected networks, and it performs vision processing with highly accurate inference at under 1mW.

>Up to 896k neural parameters in 8-bit mode, 1.6M parameters in 4-bit mode, and 7M+ in 1-bit mode

An Arduino idles at about 10 mW, for comparison.
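A quick sanity check on those quoted parameter counts: all three modes work out to roughly the same on-chip weight memory, just traded off against precision. (The ~0.9 MB budget is my own inference from the numbers, not a datasheet figure.)

```python
# Back-of-envelope: the three quoted parameter counts all imply
# roughly the same on-chip weight storage (inferred, not from the
# datasheet).
for params, bits in [(896_000, 8), (1_600_000, 4), (7_000_000, 1)]:
    megabytes = params * bits / 8 / 1e6  # params -> bits -> bytes -> MB
    print(f"{params / 1e6:.1f}M params @ {bits}-bit ~= {megabytes:.2f} MB")

# 0.9M params @ 8-bit ~= 0.90 MB
# 1.6M params @ 4-bit ~= 0.80 MB
# 7.0M params @ 1-bit ~= 0.88 MB
```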

The idea is that if you're not shuffling the entire network's weights across the memory bus on every inference cycle, you save ludicrous amounts of time and energy. Someday we'll use this kind of tech to run LLMs on our phones.
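To put rough numbers on why the weight shuffling dominates, here's a sketch under assumed figures: a hypothetical 7B-parameter model generating 20 tokens/s, with order-of-magnitude per-byte energy costs in the spirit of Horowitz's well-known ISSCC 2014 estimates. None of these values describe a specific chip.

```python
# Order-of-magnitude sketch of why weight movement dominates inference
# energy. Energy costs are rough (Horowitz ISSCC 2014 ballpark); the
# model size and token rate are assumptions for illustration.
PJ_PER_BYTE_DRAM = 160.0   # off-chip DRAM read, order of magnitude
PJ_PER_BYTE_SRAM = 1.0     # on-chip SRAM read, order of magnitude

params = 7e9               # hypothetical 7B-parameter LLM
bytes_per_param = 1        # 8-bit quantized weights
tokens_per_sec = 20        # assumed generation rate

# Autoregressive decoding touches every weight once per token.
bytes_moved = params * bytes_per_param * tokens_per_sec

watts_dram = bytes_moved * PJ_PER_BYTE_DRAM * 1e-12
watts_sram = bytes_moved * PJ_PER_BYTE_SRAM * 1e-12
print(f"streaming weights from DRAM: ~{watts_dram:.0f} W")  # ~22 W
print(f"weights resident on-chip:    ~{watts_sram:.1f} W")  # ~0.1 W
```

Even ignoring the compute itself, just streaming the weights off-chip lands in the tens of watts, far beyond a phone's power budget, while keeping them next to the compute cuts that by a couple of orders of magnitude.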

14

VodkaHaze t1_jd11vhm wrote

There are also the Tenstorrent chips coming out to the public, which are vastly more efficient than Nvidia's hardware.

2

currentscurrents t1_jd1c52o wrote

Doesn't look like they sell in individual quantities right now, but I welcome any competition in the space!

1

mycall t1_jd0yi8i wrote

> if you're not shuffling the entire network weights across the memory bus every inference cycle

Isn't this common though?

1