Submitted by faker10101891 t3_10cxuo2 in MachineLearning
I'm aware transformers are pretty VRAM-hungry and a 4080 only has 16 GB. So I'm guessing a lot of transformer-based models will be out of the question. At least anything that's interesting.
Not sure about other models, though. Is there anything I can do with a 4080 that's beyond just some toy experiment?
junetwentyfirst2020 t1_j4jgu4t wrote
I’m not sure why you think that’s such a crummy graphics card. I’ve trained a lot of interesting things for grad school, and even in the workplace, on 4 GB less. If you’re fine-tuning, then it’s not really going to take that long to get decent results, and 16 GB is not bad.