artsybashev t1_j154fhy wrote

That's just for inference. Training requires more like 100x A100s and a cluster to run them on. Just a million dollars to get started.

19

AltruisticNight8314 t1_j1ohh7u wrote

What hardware would be required to (i) train or (ii) fine-tune the weights (i.e., run a few epochs on my own data) of a medium-sized transformer (500M–15B parameters)?

I do research in proteomics, and I have a very specific problem where even just fine-tuning the weights of a pretrained transformer (such as ESM-2) might work well.
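As a minimal sketch of what that fine-tuning could look like with the HuggingFace `transformers` port of ESM-2 (the checkpoint name is a real HuggingFace ID, but the toy sequences, labels, and hyperparameters are placeholders to swap for your own data):

```python
# Minimal ESM-2 fine-tuning sketch for a per-sequence classification task.
# Assumptions: HuggingFace checkpoint facebook/esm2_t33_650M_UR50D;
# the two sequences/labels below are toy placeholders.
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer, EsmForSequenceClassification, Trainer, TrainingArguments

checkpoint = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = EsmForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

class ProteinDataset(Dataset):
    def __init__(self, sequences, labels):
        # Tokenize once up front; pad/truncate to a fixed maximum length.
        self.enc = tokenizer(sequences, truncation=True, padding=True,
                             max_length=1024, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

train_ds = ProteinDataset(["MKTAYIAKQR", "MLLAVLYCLA"], [0, 1])  # toy placeholder data

args = TrainingArguments(
    output_dir="esm2-finetune",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    learning_rate=1e-5,
    fp16=True,  # halves weight/activation memory on an A100
)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```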

Of course, there's always the poor man's alternative of building a supervised model on the embeddings returned by the encoder.
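That route is even simpler: a sketch of the frozen-encoder approach, mean-pooling ESM-2 embeddings and fitting a scikit-learn classifier on top (again, the sequences and labels are toy placeholders):

```python
# Frozen-encoder baseline: extract mean-pooled ESM-2 embeddings, then fit
# a supervised model on them. No gradients flow through the encoder.
import torch
from transformers import AutoTokenizer, EsmModel
from sklearn.linear_model import LogisticRegression

checkpoint = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = EsmModel.from_pretrained(checkpoint).eval()

def embed(sequences):
    # Mean-pool the last hidden state over non-padding tokens.
    batch = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (B, L, D)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, L, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

X = embed(["MKTAYIAKQR", "MLLAVLYCLA"])  # toy placeholder sequences
y = [0, 1]
clf = LogisticRegression(max_iter=1000).fit(X, y)
```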

1

artsybashev t1_j1ph7f3 wrote

One A100 80GB will get you started with models in the 500M–15B range. You can rent one for about $50 per day. See where that takes you in a week.
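For a rough sense of why the 15B end of that range is tight on a single card, here's a back-of-envelope estimate (assuming fp16 weights and gradients and standard Adam with fp32 moments; activation memory comes on top):

```python
# Back-of-envelope GPU memory for full fine-tuning with Adam.
# Assumptions: fp16 weights and gradients, two fp32 optimizer moments.
def finetune_memory_gb(n_params):
    weights = 2 * n_params   # fp16: 2 bytes per parameter
    grads   = 2 * n_params   # fp16 gradients
    adam    = 8 * n_params   # two fp32 moment tensors: 4 + 4 bytes each
    return (weights + grads + adam) / 1e9

print(finetune_memory_gb(650e6))  # ~7.8 GB: fits comfortably on one A100
print(finetune_memory_gb(15e9))   # ~180 GB: exceeds a single 80GB card
```

At fp16 the 15B weights alone (~30 GB) fit on an 80GB card for inference, but full fine-tuning at that scale would need parameter-efficient methods (e.g. LoRA), optimizer offloading, or multiple GPUs.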

2