
sayoonarachu t1_j4n2w5j wrote

Quite a bit, and even more if you use optimized frameworks and packages like VoltaML, PyTorch Lightning, ColossalAI, bitsandbytes, xformers, etc. Those are just the ones I'm familiar with.
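
For example, bitsandbytes can be used through Hugging Face transformers to load a model with 8-bit weights; a minimal sketch (assumes transformers, accelerate, and bitsandbytes are installed, and the model id is just a placeholder):

```python
# Sketch: load a model with int8 weights via bitsandbytes, roughly halving VRAM vs fp16.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # placeholder, swap in whatever you actually use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # bitsandbytes int8 quantization of the weights
    device_map="auto",   # let accelerate place layers on GPU/CPU automatically
)
```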

Some libraries also let you balance the load between CPU, GPU, and system memory, though obviously that comes at a cost in speed.
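
For instance, with transformers + accelerate you can cap how much of the model lives on the GPU and offload the rest to CPU RAM (a sketch; the model id and memory limits are made-up numbers, tune them to your hardware):

```python
# Sketch: split a model between GPU 0 and CPU RAM via accelerate's device_map.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",                      # placeholder model id
    device_map="auto",                        # let accelerate decide layer placement
    max_memory={0: "6GiB", "cpu": "24GiB"},   # illustrative caps: 6 GiB VRAM, 24 GiB RAM
)
```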

General rule: the more parameters a model has, the higher the memory cost. So unless you're planning to train from scratch or fine-tune something in the billions of parameters, you'll be fine.
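
Back-of-envelope version of that rule (weights only; activations, gradients, and optimizer states during training add several times more on top):

```python
# Crude estimate: weight memory ≈ parameter count * bytes per parameter.
def approx_weight_gib(n_params: float, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 1024**3

print(approx_weight_gib(7e9, 4))  # ~26 GiB for 7B params in fp32
print(approx_weight_gib(7e9, 2))  # ~13 GiB in fp16/bf16
print(approx_weight_gib(7e9, 1))  # ~6.5 GiB with 8-bit quantization
```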

It's gonna take playing around with hyperparameters, switching between 32-, 16-, and 8-bit precision/quantization with PyTorch or other Python packages, testing offloading weights between GPU and CPU, etc., to get a feel for what you can and can't do.
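
The 16-bit part usually means mixed precision; a minimal PyTorch sketch with a toy model and random data, just to show the autocast/GradScaler pattern:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(256, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling for fp16

for _ in range(10):
    x = torch.randn(32, 256, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    # Run the forward pass in fp16 where it's safe, keep the rest in fp32.
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()   # scaling avoids fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```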

Also, if I remember correctly, PyTorch 2.0 will benefit the consumer Nvidia 40 series to some extent once it's more mature.
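
The headline PyTorch 2.0 feature there is torch.compile, which is basically a one-line wrap (sketch; needs a 2.0 build, and the toy model is just for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 10))
compiled = torch.compile(model)        # JIT-compiles the model into fused kernels (PyTorch 2.0+)
out = compiled(torch.randn(8, 512))    # first call is slow (compilation), later calls are faster
```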

Edit: P.S. supposedly the new Forward-Forward algorithm can be "helpful" for large models since there's no backpropagation.
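
Very rough idea of Forward-Forward (my paraphrase of Hinton's paper, a toy sketch rather than anything production-ready): each layer is trained locally to give high "goodness" (squared activations) on positive data and low goodness on negative data, so gradients never flow through the whole network:

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One layer trained with its own local objective; no end-to-end backprop."""
    def __init__(self, d_in, d_out, threshold=2.0):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.opt = torch.optim.Adam(self.fc.parameters(), lr=1e-3)
        self.threshold = threshold

    def forward(self, x):
        # Normalize so goodness from the previous layer doesn't leak through.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.fc(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)   # goodness on positive data
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)   # goodness on negative data
        # Push positive goodness above the threshold and negative goodness below it.
        loss = torch.nn.functional.softplus(torch.cat([
            self.threshold - g_pos,
            g_neg - self.threshold,
        ])).mean()
        self.opt.zero_grad()
        loss.backward()   # gradients stay inside this single layer
        self.opt.step()
        # Detach outputs so the next layer trains without backprop through this one.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos, x_neg = torch.rand(32, 784), torch.rand(32, 784)  # stand-ins for real pos/neg examples
for layer in layers:
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```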
