Comments

junetwentyfirst2020 t1_j4jgu4t wrote

I’m not sure why you think that’s such a crummy graphics card. I’ve trained a lot of interesting things for grad school, and even in the workplace, on 4 GB less. If you’re fine-tuning, it’s not really going to take that long to get decent results, and 16 GB is not bad.

7

currentscurrents t1_j4jj1l6 wrote

It's a little discouraging when every interesting paper has a cluster of 64 A100s in its methods section.

6

junetwentyfirst2020 t1_j4jkejb wrote

The first image transformer paper is pretty clear that it works better at scale. You might not need a transformer for interesting work, though.

You can do so much with that GPU. Transformers are heavier models, but my background is in CNNs, and those work fine on your GPU.

2

currentscurrents t1_j4ijlqv wrote

You can fine-tune image generator models and some smaller language models.

You can also do tasks that don't require super large models, like image recognition.
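For image recognition, something like this torchvision sketch comfortably fits in 16 GB (the class count and batch here are made up, just to show the shape of the loop):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-18 and swap in a new head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                          # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 10)       # 10 classes, hypothetical

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch, standing in for a real dataloader.
images = torch.randn(32, 3, 224, 224, device=device)
labels = torch.randint(0, 10, (32,), device=device)
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```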

>that's beyond just some toy experiment?

Don't knock toy experiments too much! I'm having a lot of fun trying to build a differentiable neural computer or memory-augmented network in pytorch.
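If you want to poke at the same idea, here's a toy sketch of the content-based read/write that most memory-augmented nets build on (not a full DNC, and the sizes are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentAddressedMemory(nn.Module):
    """Toy external memory: an LSTM controller emits a key each step, and we
    read a softmax-weighted blend of memory rows by cosine similarity."""
    def __init__(self, input_size, mem_slots=32, mem_width=20, hidden=64):
        super().__init__()
        self.controller = nn.LSTMCell(input_size + mem_width, hidden)
        self.key_proj = nn.Linear(hidden, mem_width)
        self.mem_slots, self.mem_width, self.hidden = mem_slots, mem_width, hidden

    def forward(self, x_seq):                        # x_seq: [batch, time, input_size]
        b, device = x_seq.size(0), x_seq.device
        memory = torch.zeros(b, self.mem_slots, self.mem_width, device=device)
        read = torch.zeros(b, self.mem_width, device=device)
        h = torch.zeros(b, self.hidden, device=device)
        c = torch.zeros(b, self.hidden, device=device)
        outputs = []
        for t in range(x_seq.size(1)):
            h, c = self.controller(torch.cat([x_seq[:, t], read], dim=-1), (h, c))
            key = self.key_proj(h)
            # Content-based addressing: cosine similarity -> softmax weights.
            weights = F.softmax(F.cosine_similarity(memory, key.unsqueeze(1), dim=-1), dim=-1)
            read = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)    # weighted read
            memory = memory + weights.unsqueeze(-1) * key.unsqueeze(1)   # naive additive write
            outputs.append(h)
        return torch.stack(outputs, dim=1)

# out = ContentAddressedMemory(input_size=8)(torch.randn(4, 10, 8))  # -> [4, 10, 64]
```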

3

KBM_KBM t1_j4j10y6 wrote

You can pretrain and fine-tune energy-efficient language models such as ELECTRA or ConvBERT on this GPU. The batch size can't be too big, so the descent will be a bit noisy; also keep the corpus size as small as possible.

Look into the BioELECTRA paper, which also links a notebook showing how the author trained it.
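A rough sketch of what that looks like with Hugging Face transformers, using gradient accumulation to make up for the small batches (the model name and dataset here are just placeholders):

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "google/electra-small-discriminator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")  # placeholder corpus; any labeled text works
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="electra-out",
    per_device_train_batch_size=8,     # small batch to fit in 16 GB
    gradient_accumulation_steps=4,     # effective batch of 32, smoother descent
    fp16=True,                         # mixed precision to cut memory
    num_train_epochs=1,
)
Trainer(model=model, args=args, train_dataset=dataset["train"]).train()
```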

2

serge_cell t1_j4kv3aw wrote

> beyond just some toy experiment?

Compress models. See if you can fit an 8 GB model into 1 GB, capable of running on mobile, and at what cost.
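Post-training dynamic quantization in pytorch is the easiest place to start (roughly 4x smaller Linear weights; getting 8 GB down to 1 GB would need pruning or distillation on top). A minimal sketch with a stand-in model:

```python
import os
import torch
import torch.nn as nn

# A stand-in model; imagine a much larger network here.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m, path="tmp.pt"):
    torch.save(m.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

print(f"fp32: {size_mb(model):.1f} MB, int8: {size_mb(quantized):.1f} MB")
```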

1

sayoonarachu t1_j4n2w5j wrote

Quite a bit, and even more if you use optimized frameworks and packages like voltaml, pytorch lightning, colossalai, bitsandbytes, xformers, etc. Those are just the ones I'm familiar with.

Some libraries allow balancing the load between CPU, GPU, and system memory, though obviously that comes at a cost in speed.

General rule: the more parameters the model has, the higher the memory cost. So unless you're planning to train from scratch or fine-tune models in the billions of parameters, you'll be fine.
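Back-of-the-envelope version of that rule (weights only; training adds optimizer state and activations on top):

```python
def model_memory_gb(n_params, bytes_per_param):
    # Memory to hold just the weights, in GB.
    return n_params * bytes_per_param / 1e9

for n in (110e6, 1.3e9, 7e9):   # e.g. BERT-base-ish, GPT-2-XL-ish, a 7B model
    print(f"{n/1e9:.2f}B params: fp32 {model_memory_gb(n, 4):.1f} GB, "
          f"fp16 {model_memory_gb(n, 2):.1f} GB, int8 {model_memory_gb(n, 1):.1f} GB")
```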

It's gonna take playing around with hyperparameters, switching between 32-, 16-, and 8-bit quantization with pytorch or other python packages, testing offloading weights between GPU and CPU, etc., to get a feel for what you can and can't do.
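For example, loading a model in 8-bit with bitsandbytes and letting accelerate offload whatever doesn't fit on the GPU (exact kwargs vary between transformers versions, and the model name is just a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-neo-1.3B"   # placeholder; pick something that fits your card

# 8-bit weights via bitsandbytes, with accelerate deciding what lives on the
# GPU and what gets offloaded to CPU RAM (slower, but it runs).
model = AutoModelForCausalLM.from_pretrained(name, load_in_8bit=True, device_map="auto")
tok = AutoTokenizer.from_pretrained(name)

prompt = tok("The 16 GB card can", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**prompt, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```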

Also, if I remember correctly, pytorch 2.0 will benefit the consumer nvidia 40 series to some extent once it's more ready.

Edit: P.S. Supposedly the new Forward-Forward algorithm can be "helpful" for large models, since there's no backpropagation.

1