Submitted by imgonnarelph t3_11wqmga in MachineLearning
pier4r t1_jd0pf1x wrote
Reply to comment by wojtek15 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
> 128Gb of Uniform RAM which can be used by CPU, GPU or Neural Engine.
But it doesn't have the same bandwidth as the VRAM on a discrete GPU card, IIRC.
Otherwise every integrated GPGPU would be better, thanks to the larger amount of available RAM.
The Neural Engine on the M1 and M2 is usable, IIRC, only through Apple's libraries, which notable models may not use yet.
currentscurrents t1_jd10ab5 wrote
llama.cpp uses the Neural Engine, and so does Stable Diffusion. And the memory bandwidth is actually not that far off from VRAM.
> Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.
By comparison, the Nvidia RTX 4090 clocks in at ~1000 GB/s.
Apple is clearly positioning its devices for AI.
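To put those bandwidth figures in perspective, here is a rough back-of-envelope sketch. It assumes that single-stream token generation is memory-bandwidth bound (each new token reads every weight once) and that the 30B model is quantized to 4 bits per parameter; both are simplifying assumptions, and the helper names are hypothetical. Real throughput will be lower because of compute limits, cache effects, and KV-cache traffic.

```python
# Hypothetical back-of-envelope estimate, assuming decoding is memory-bandwidth
# bound: every generated token streams all model weights from memory once.

def bandwidth_ratio(a_gbps: float, b_gbps: float) -> float:
    """Return bandwidth a as a fraction of bandwidth b."""
    return a_gbps / b_gbps

def weights_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

def max_tokens_per_s(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token reads every weight exactly once."""
    return bandwidth_gbps / model_gb

M1_ULTRA_GBPS = 800.0   # Apple's quoted figure
RTX_4090_GBPS = 1000.0  # approximate figure from the comment

# Assumption: 30B parameters quantized to 4 bits each -> 15 GB of weights.
model = weights_gb(30, 4)

if __name__ == "__main__":
    ratio = bandwidth_ratio(M1_ULTRA_GBPS, RTX_4090_GBPS)
    print(f"M1 Ultra / RTX 4090 bandwidth: {ratio:.0%}")
    for name, bw in [("M1 Ultra", M1_ULTRA_GBPS), ("RTX 4090", RTX_4090_GBPS)]:
        print(f"{name}: <= {max_tokens_per_s(bw, model):.1f} tokens/s")
```

Under these assumptions the ceilings differ by the same 80% ratio as the raw bandwidth, which is the sense in which the M1 Ultra is "not that far off".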
Straight-Comb-6956 t1_jd2iwp6 wrote
> llama.cpp uses the Neural Engine,
Does it?
mmyjona t1_jdceex2 wrote
No, llama-mps uses the ANE (Apple Neural Engine).
pier4r t1_jd39md4 wrote
> llama.cpp uses the Neural Engine
I tried to find confirmation for this but couldn't. I saw some ports, but they weren't from the LLaMA team. Do you have a source?