Tuggummii t1_j453j7r wrote

If you have multiple GPUs with around 20GB of VRAM each on hand, you could try training from scratch. My question is this: is it worth an enterprise level of resources and a lot of time? Why would you choose that route instead of picking a pretrained model and fine-tuning it specifically on code generation from English? OpenAI's GPT-3 Davinci-003 does fairly good code generation from English, but sometimes the result is a bit clunky, so you would still want to fine-tune it. Davinci-003 is reported to have 175 billion parameters.

The 1.3-billion-parameter versions of OPT or GPT-Neo need 8GB of VRAM just to load the model. To fine-tune those 1.3 billion parameters, you need 16GB of VRAM. You can probably do that as a single person.
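For a sense of what that single-person fine-tune looks like, here is a minimal sketch assuming the Hugging Face Transformers stack; the model name, hyperparameters, and the one-pair dataset are placeholders, not a tested recipe:

```python
# A minimal sketch (assumed setup, not a tested recipe) of fine-tuning a
# 1.3B-parameter causal LM on English->code pairs with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"  # or "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# fp16 roughly halves the ~8GB fp32 load footprint; a serious run would use
# mixed precision (torch.cuda.amp) or fp32 master weights for stability.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# Hypothetical English->code pairs; a real fine-tune needs a large corpus.
pairs = [
    ("Write a function that adds two numbers.",
     "def add(a, b):\n    return a + b"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for prompt, code in pairs:
    batch = tokenizer(prompt + "\n" + code, return_tensors="pt").to("cuda")
    # For causal LM fine-tuning, the labels are the input ids themselves.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Note that the optimizer states are what push the requirement from 8GB (loading) toward 16GB (fine-tuning), which is why gradient checkpointing or parameter-efficient methods help on smaller cards.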


Tuggummii t1_j3kyf2w wrote

I'm not a professional, but I can offer my personal opinion on some of your questions.

How good is it at writing short stories?

- I don't think GPT-J is dramatically better than the others, especially for text generation. I often see hallucinated, illogical, or incoherent output. If you want results like OpenAI's Davinci-003, you may be disappointed despite your fine-tuning.

How resource-expensive is it to use locally?

- You need 40GB+ of RAM if you're running on CPU. One of my friends failed with 32GB of RAM and had to increase her swap space; after that she succeeded, but with an extremely slow loading time (almost 7~8 minutes). If you want GPU power, you need 32GB+ of VRAM in float16 (though I saw someone running it on 24GB). A CPU generates text from a prompt in 30~45 seconds, whereas a GPU generates text from the same prompt in 3 to 5 seconds.
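For reference, this is roughly how you would load it locally; a minimal sketch assuming Hugging Face Transformers, with float16 on GPU and a float32 CPU fallback, and illustrative generation settings:

```python
# A minimal sketch of running GPT-J-6B locally: float16 on GPU if available,
# float32 on CPU otherwise. Generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=dtype,
    low_cpu_mem_usage=True,  # lowers peak RAM while the checkpoint loads
).to(device)

prompt = "Write the opening paragraph of a short story about a lighthouse."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100,
                         do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```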
