Submitted by imgonnarelph t3_11wqmga in MachineLearning
currentscurrents t1_jczods2 wrote
Reply to comment by UnusualClimberBear in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
I'm hoping that non-von-Neumann chips will scale up in the next few years. There are some you can buy today, but they're small:
>NDP200 is designed to natively run deep neural networks (DNN) on a variety of architectures, such as CNN, RNN, and fully connected networks, and it performs vision processing with highly accurate inference at under 1 mW.
>Up to 896k neural parameters in 8-bit mode, 1.6M parameters in 4-bit mode, and 7M+ in 1-bit mode
An Arduino idles at about 10 mW, for comparison.
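To put "small" in numbers: a rough sketch converting the quoted NDP200 parameter counts into weight storage, assuming "k" means 1,000 and "M" means 1,000,000 (the datasheet might use binary prefixes instead). `weight_bytes` is a hypothetical helper, and the 30B comparison just reuses the model size from the thread title at an assumed 4-bit precision.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Hypothetical helper: bytes needed to store the weights at a given precision."""
    return num_params * bits_per_param / 8

# Parameter counts quoted above, keyed by bits per parameter.
# Assumes decimal prefixes (k = 1e3, M = 1e6).
ndp200_modes = {8: 896_000, 4: 1_600_000, 1: 7_000_000}

for bits, params in ndp200_modes.items():
    kb = weight_bytes(params, bits) / 1e3
    print(f"{bits}-bit: {params:,} params -> {kb:.0f} kB of weights")

# For contrast: a 30B-parameter model (as in the Alpaca-30B post), assumed 4-bit.
gb = weight_bytes(30_000_000_000, 4) / 1e9
print(f"30B params @ 4-bit -> {gb:.0f} GB of weights")
```

Every mode works out to under 1 MB of weights, while even an aggressively quantized 30B model needs about 15 GB, roughly four orders of magnitude more.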
The idea is that if you're not shuffling the entire network weights across the memory bus every inference cycle, you save ludicrous amounts of time and energy. Someday, we'll use this kind of tech to run LLMs on our phones.
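A back-of-the-envelope sketch of why the memory bus matters: on a conventional (von Neumann) system generating text one token at a time with batch size 1, every weight is read from memory once per token, so memory bandwidth alone caps decoding speed. The 100 GB/s figure below is a hypothetical round number, not a measurement of any particular device, and the model ignores KV-cache traffic, on-chip caching, and compute limits.

```python
def max_tokens_per_second(num_params: int, bits_per_param: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on batch-1 decoding speed when all weights must be
    streamed across the memory bus once per generated token."""
    bytes_per_token = num_params * bits_per_param / 8
    return bandwidth_bytes_per_s / bytes_per_token

# Illustrative assumptions only: a 30B-parameter model and a 100 GB/s bus.
PARAMS = 30_000_000_000
BANDWIDTH = 100e9  # bytes per second (hypothetical)

for bits in (16, 8, 4):
    rate = max_tokens_per_second(PARAMS, bits, BANDWIDTH)
    print(f"{bits}-bit weights: at most {rate:.2f} tokens/s")
```

Under these assumptions the cap is about 1.67, 3.33, and 6.67 tokens/s for 16-, 8-, and 4-bit weights. Hardware that keeps weights next to the compute sidesteps this bound because the per-token weight transfer no longer crosses a shared bus.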
VodkaHaze t1_jd11vhm wrote
There are also the Tenstorrent chips coming out to the public, which are vastly more efficient than Nvidia's stuff.
currentscurrents t1_jd1c52o wrote
It doesn't look like they sell in individual quantities right now, but I welcome any competition in the space!
mycall t1_jd0yi8i wrote
> if you're not shuffling the entire network weights across the memory bus every inference cycle
Isn't this common, though?