currentscurrents t1_jczuqo8 wrote on March 20, 2023 at 8:29 PM

Just price. They have the same amount of VRAM. The 4090 is faster, of course.

satireplusplus t1_jczz8e6 wrote on March 20, 2023 at 8:58 PM

VRAM is the limiting factor for running these things, though, not tensor cores.

currentscurrents t1_jd007aa wrote on March 20, 2023 at 9:05 PM

Right. And even once you have enough VRAM, memory bandwidth limits the speed more than tensor core throughput does.

They could pack more tensor cores in there if they wanted to; they just wouldn't be able to feed them data fast enough.
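A rough roofline sketch of this point. It assumes batch-1 decoding, where each weight is read once per token and used in one multiply-add. The ~1000 GB/s bandwidth is the 4090 figure quoted later in the thread; `PEAK_TFLOPS` is a made-up placeholder, not a spec-sheet value.

```python
# Roofline model: attainable throughput is the lower of peak compute and
# (arithmetic intensity x memory bandwidth).


def attainable_tflops(peak_tflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Return attainable TFLOP/s under the simple roofline model."""
    memory_bound_tflops = bandwidth_gb_s * 1e9 * flops_per_byte / 1e12
    return min(peak_tflops, memory_bound_tflops)


def matvec_intensity(bytes_per_weight: float) -> float:
    """FLOPs per byte for a matrix-vector product: each weight is read once
    and used in one multiply-add (2 FLOPs)."""
    return 2.0 / bytes_per_weight


if __name__ == "__main__":
    BANDWIDTH_GB_S = 1000.0  # ~4090 figure quoted in the thread
    PEAK_TFLOPS = 150.0      # hypothetical placeholder, not a real spec
    ai = matvec_intensity(2.0)  # fp16 weights -> 1 FLOP per byte
    base = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_GB_S, ai)
    doubled = attainable_tflops(2 * PEAK_TFLOPS, BANDWIDTH_GB_S, ai)
    print(base, doubled)  # doubling compute does not change the result
```

With fp16 weights the intensity is about 1 FLOP per byte, so bandwidth caps throughput at roughly 1 TFLOP/s no matter how many tensor cores there are. Only higher arithmetic intensity (for example, larger batches that reuse each weight) moves you toward the compute roof.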

pointer_to_null t1_jd0bv74 wrote on March 20, 2023 at 10:23 PM

This is definitely true. In theory, you can page stuff in and out of VRAM to run larger models, but with all that thrashing you won't get much benefit over CPU compute.

Enturbulated t1_jd1x9uu wrote on March 21, 2023 at 6:30 AM

You are absolutely correct. text-gen-webui offers "streaming" by paging models in and out of VRAM. With it, your CPU no longer gets bogged down running the model, but you don't see much improvement in generation speed, because the GPU is constantly busy loading and unloading model data from main RAM. It can still be worth some effort, but the improvement is far less dramatic than when the entire model fits in VRAM.
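A back-of-the-envelope sketch of why streaming helps so little. It assumes every weight is touched once per token, the VRAM-resident part is read at VRAM bandwidth, the rest is re-copied over PCIe every token, and transfers don't overlap with compute. The bandwidth figures are illustrative assumptions (~25 GB/s is roughly what PCIe 4.0 x16 delivers in practice), not measurements of text-gen-webui.

```python
def seconds_per_token(model_gb: float, vram_gb: float,
                      vram_bw_gb_s: float, pcie_bw_gb_s: float) -> float:
    """Estimate time per generated token when part of the model is streamed."""
    resident_gb = min(model_gb, vram_gb)   # weights that stay in VRAM
    streamed_gb = model_gb - resident_gb   # weights re-copied from RAM each token
    return resident_gb / vram_bw_gb_s + streamed_gb / pcie_bw_gb_s


if __name__ == "__main__":
    VRAM_GB, VRAM_BW, PCIE_BW = 24.0, 1000.0, 25.0  # assumed figures
    for model_gb in (15.0, 60.0):  # e.g. 30B at 4-bit vs. 30B at fp16
        t = seconds_per_token(model_gb, VRAM_GB, VRAM_BW, PCIE_BW)
        print(f"{model_gb:5.1f} GB model: {1 / t:6.1f} tokens/s upper bound")
```

In this toy model the 60 GB case is dominated by the 36 GB that has to cross PCIe on every token, which is the thrashing described above.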

shafall t1_jd2380o wrote on March 21, 2023 at 7:56 AM

To give some more specifics: on modern systems, it's usually not the CPU that copies the data but the PCI DMA engine (which may be on the same die, though). The CPU just sends address ranges to the DMA engine.

wojtek15 t1_jd0p206 wrote on March 20, 2023 at 11:57 PM

Hey, I was recently wondering whether Apple Silicon Macs might be the best thing for AI in the future. The most powerful Mac Studio has 128 GB of unified RAM, which can be used by the CPU, GPU, or Neural Engine. If only memory size is considered, even an A100 can't match it, let alone any consumer-oriented card. With this much memory you could run a GPT-3 Davinci-size model in 4-bit mode.
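The 4-bit claim checks out on weights alone. A minimal sketch, assuming decimal gigabytes, ~175B parameters for a Davinci-size model, and ignoring activations, the KV cache, and the share of unified memory macOS keeps for itself:

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, capacity_gb: float) -> bool:
    """True if the weights fit in the given memory (no headroom reserved)."""
    return weights_gb(n_params, bits_per_weight) <= capacity_gb


if __name__ == "__main__":
    DAVINCI_PARAMS = 175e9
    for bits in (16, 4):
        size = weights_gb(DAVINCI_PARAMS, bits)
        print(f"{bits:2d}-bit: {size:6.1f} GB | "
              f"80 GB A100: {fits(DAVINCI_PARAMS, bits, 80)} | "
              f"128 GB Mac Studio: {fits(DAVINCI_PARAMS, bits, 128)}")
```

The same arithmetic puts a 30B model at 4-bit at about 15 GB, which fits on a 24 GB 3090 or 4090.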

pier4r t1_jd0pf1x wrote on March 20, 2023 at 11:59 PM

> 128Gb of Uniform RAM which can be used by CPU, GPU or Neural Engine.

But it doesn't have the same bandwidth as the VRAM on a GPU card, IIRC.

Otherwise every integrated GPU would be better, given how much RAM it has access to.

IIRC, the Neural Engine on the M1 and M2 is usable only through Apple's libraries, which notable models may not use yet.

currentscurrents t1_jd10ab5 wrote on March 21, 2023 at 1:18 AM

llama.cpp uses the Neural Engine, and so does Stable Diffusion. And the speed is actually not that far off from VRAM.

>Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.

By comparison, the Nvidia RTX 4090 clocks in at ~1000 GB/s.
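Since batch-1 generation has to stream roughly every weight once per token, bandwidth divided by model size gives a ceiling on tokens per second. A sketch using the two bandwidth figures above; it assumes the whole 128 GB is usable by the GPU (macOS actually reserves part of it), ignores compute and other overheads, and the model sizes are 4-bit weight estimates, not benchmarks.

```python
from typing import Optional


def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float,
                     capacity_gb: float) -> Optional[float]:
    """Bandwidth-bound ceiling on tokens/s, or None if the weights don't fit."""
    if model_gb > capacity_gb:
        return None
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    devices = {"M1 Ultra": (800.0, 128.0), "RTX 4090": (1000.0, 24.0)}
    models = {"30B @ 4-bit": 15.0, "65B @ 4-bit": 32.5}
    for dev, (bw, cap) in devices.items():
        for name, size in models.items():
            rate = max_tokens_per_s(bw, size, cap)
            shown = "does not fit" if rate is None else f"<= {rate:.1f} tokens/s"
            print(f"{dev:8s} {name}: {shown}")
```

On these assumptions the 4090 is about 25% faster whenever the model fits, while the Mac's advantage is capacity.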

Apple is clearly positioning their devices for AI.

Straight-Comb-6956 t1_jd2iwp6 wrote on March 21, 2023 at 11:30 AM

> Llamma.cpp uses the neural engine,

Does it?

mmyjona t1_jdceex2 wrote on March 23, 2023 at 12:08 PM

No, llama-mps uses the ANE.

pier4r t1_jd39md4 wrote on March 21, 2023 at 3:05 PM

> Llamma.cpp uses the neural engine

I tried to find confirmation for this but couldn't. I saw some ports, but they weren't from the LLaMA team. Do you have a source?

remghoost7 t1_jd1k0l6 wrote on March 21, 2023 at 3:55 AM

>...Uniform RAM which can be used by CPU, GPU or Neural Engine.

Interesting...

So that's why I've seen so many M1 implementations of machine learning models. It really does seem like the M1 chips were made with AI in mind...

SWESWESWEh t1_jd2s9ml wrote on March 21, 2023 at 12:58 PM

Unfortunately, most code out there calls CUDA explicitly rather than checking which GPU you have and using that. You can fix this yourself (I use an M1 MacBook Pro for ML and it is quite powerful), but you need to know what you're doing, and it's just more work. You might also run into situations where things are not fully implemented in Metal Performance Shaders (the Mac equivalent of CUDA), but Apple does put a lot of resources into making this better.
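A minimal sketch of the device check described above, assuming PyTorch (the comment doesn't name a framework). `choose_device` and `detect_device` are hypothetical helper names; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are PyTorch's own checks, the latter present since PyTorch 1.12.

```python
import importlib.util


def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch if it is installed; otherwise report 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    mps_backend = getattr(torch.backends, "mps", None)  # absent on old PyTorch
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    device = detect_device()
    print(f"Using device: {device}")
    # Real code would then call model.to(device) instead of model.cuda().
```

Code written this way moves models and tensors with `.to(device)` instead of hard-coding `.cuda()`. For ops MPS doesn't implement yet, recent PyTorch versions let you set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` so those ops run on the CPU instead of raising an error.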

LetMeGuessYourAlts t1_jd02jkq wrote on March 20, 2023 at 9:20 PM

Used availability is better on the 3090 as well. I got one for $740 on eBay. There was a little dust on the heatsinks, but at half price it was a steal.

[Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset

gybemeister t1_jczucbf wrote on March 20, 2023 at 8:27 PM