Comments


Tuggummii t1_j453j7r wrote

If you have multiple GPUs, each with around 20GB of VRAM, on hand, you could try training from scratch. My question is this: is it worth an enterprise level of resources and a lot of time? Why would you choose that route instead of picking a pretrained model and fine-tuning it specifically for code generation from English? OpenAI's GPT-3 davinci-003 does fairly good code generation from English, but sometimes the result is a bit clunky, so you'd still want to fine-tune it. GPT-3's davinci-003 is reported to have 175 billion parameters.

A 1.3-billion-parameter model like OPT-1.3B or GPT-Neo-1.3B needs about 8GB of VRAM just to load. Fine-tuning those 1.3 billion parameters takes around 16GB of VRAM. You can probably do that as a single person.
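For scale, a minimal loading sketch, assuming the Hugging Face `transformers` library (the model id is real; the byte math just restates the comment's estimates):

```python
# Illustrative sketch: load GPT-Neo 1.3B and estimate its weight footprint.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B",
    torch_dtype=torch.float16,  # fp16 roughly halves the ~8GB fp32 footprint
)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters, "
      f"~{n_params * 2 / 1e9:.1f}GB of weights at 2 bytes/param in fp16")
```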

7

LiquidDinosaurs69 t1_j44wp7w wrote

It’s definitely infeasible to train and run inference on a large language model on your own; you would need many datacenter GPUs. But you could create an application that interfaces with the ChatGPT API (or some other API-accessible LLM).
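A minimal sketch of that kind of wrapper, assuming the official `openai` Python package (v1+) and an `OPENAI_API_KEY` in the environment; the prompts and helper name are made up:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_code(description: str) -> str:
    """Ask the model to turn an English description into Python code."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You write Python code only."},
            {"role": "user", "content": description},
        ],
    )
    return response.choices[0].message.content

print(generate_code("Read a list of integers from stdin and print their sum."))
```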

2

Mosh_98 t1_j4536jl wrote

It's better to use a pretrained model and maybe further pretrain it to your needs, imo. Try CodeT5; it's a decent model.
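A minimal loading sketch, adapted from the `Salesforce/codet5-base` model card on the Hugging Face Hub:

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# CodeT5 was pretrained with span masking: it fills in <extra_id_0>
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```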

1

ZestyData t1_j45bgzi wrote

This concept already exists so there are plenty of resources (papers, etc) online to learn from.

However, current code generation models are huge and hefty, and take a lot of time and resources to build with current (2023) technology. So it probably isn't a great idea to build a large code-gen language model from scratch.

That said, a school project about Large Language Models (LLMs) that includes finetuning a pretrained model, plus training a small model from scratch as a demonstration, would be cool!
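For the from-scratch half, even a character-level bigram model gets the idea across. A minimal PyTorch sketch (the toy corpus and every name here are illustrative, not from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy corpus and character-level vocabulary
text = "print('hello world')"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
vocab_size = len(chars)

class BigramLM(nn.Module):
    """Each token directly looks up the logits for the next token."""
    def __init__(self, vocab_size):
        super().__init__()
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx):
        return self.table(idx)  # (batch, time, vocab_size)

model = BigramLM(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

# Next-character prediction: targets are the input shifted by one
data = torch.tensor([stoi[c] for c in text])
x, y = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)

for step in range(200):
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```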

1

Far_Butterfly_7987 t1_j45gk6m wrote

Hiya!! As others have suggested, please use transfer learning.

1

Nineshadow t1_j45i8rf wrote

No, 60 problems are not enough, probably not even for fine-tuning. I would also strongly advise against starting from scratch.

The best approach in this case would be to fine-tune a pre-trained LLM that was trained on both natural language and code, something like GPT-Neo with 125M parameters. I'm mentioning the small version because you'll have trouble fitting larger models with billions of parameters in memory!
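A minimal fine-tuning sketch, assuming the Hugging Face `transformers` and `datasets` libraries; the two-example dataset is a stand-in for real problem/solution pairs:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

# Hypothetical training pairs: English description followed by code
examples = [
    "# Print hello world\nprint('hello world')",
    "# Sum two numbers read from stdin\na, b = map(int, input().split())\nprint(a + b)",
]
dataset = Dataset.from_dict({"text": examples}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    # mlm=False gives standard causal (next-token) language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```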

Personally, this is what I used for my Bachelor's, where I made a tool to automatically generate input code from competitive programming statements.

1

AnnualDegree99 t1_j45rbg5 wrote

125M parameters already sounds like it wouldn't be fun on most GPUs that people actually own. I'm imagining a student with, like, a 1650 laptop, waiting for days while it sounds like an F-15 on full afterburner.

1