
Tuggummii t1_j453j7r wrote

If you have multiple GPUs, each with around 20GB of VRAM, on hand, you could try training from scratch. My question is this: is it worth it, given that it takes enterprise-level resources and a lot of time? Why would you choose that route instead of picking a pretrained model and fine-tuning it specifically on code generation from English? OpenAI's GPT-3 davinci-003 does fairly good code generation from English, but sometimes the result is a bit clunky, so you'd still want to fine-tune it. Davinci-class GPT-3 models are reported to have around 175 billion parameters.

A 1.3-billion-parameter model like OPT or GPT-Neo needs about 8GB of VRAM just to load, and roughly 16GB of VRAM to fine-tune. You can probably do that as a single person.
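As a rough illustration of what that single-GPU route looks like, here is a minimal sketch of fine-tuning a 1.3B-parameter model (facebook/opt-1.3b) with Hugging Face Transformers. The dataset file name, sequence length, and hyperparameters are my own placeholder assumptions, not values from the comment above:

```python
# Minimal sketch: causal-LM fine-tuning of a ~1.3B-parameter model on one GPU.
# Batch size, sequence length, and the dataset file are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "facebook/opt-1.3b"            # ~1.3B parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:             # make sure padding works for batching
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical English-to-code dataset with one "text" field per example.
dataset = load_dataset("json", data_files="english_to_code.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM labels

args = TrainingArguments(
    output_dir="opt-1.3b-code",
    per_device_train_batch_size=1,          # tiny batch to stay inside ~16GB VRAM
    gradient_accumulation_steps=8,          # recover an effective batch size of 8
    fp16=True,                              # mixed precision cuts activation memory
    num_train_epochs=1,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```

Even with fp16 and a batch size of 1 this is close to the memory limit, which is why people often add gradient checkpointing or parameter-efficient methods on top of a setup like this.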
