Submitted by Business-Lead2679 t3_12618zu in MachineLearning

I apologize if this sounds trivial, but I recently trained the 7B version of LLaMA on my JSON dataset of 122k questions and answers. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard that the 65B model is significantly better, so I'd like to train it and see how it performs. I've already tried Google Colab (high-RAM), Paperspace, Deepnote, and JetBrains, and all of them crashed. How can I realistically train the 65B model on a $1k budget and complete the training without any major issues? Any advice is appreciated.

79

Comments


Business-Lead2679 OP t1_je70nka wrote

I'd like to train it with these settings:

EPOCHS = 3

LEARNING_RATE = 2e-5

CUTOFF_LEN = 1024
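
For context, a minimal sketch of how these values might map onto a Hugging Face `TrainingArguments` config; the batch size, paths, and everything else below are placeholders, not the OP's actual setup:

```python
# Hypothetical sketch: plugging the settings above into Hugging Face TrainingArguments.
# Output dir, batch size, and accumulation steps are made-up placeholders.
from transformers import TrainingArguments

EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024  # max sequence length applied at tokenization time

training_args = TrainingArguments(
    output_dir="./llama-finetune",
    num_train_epochs=EPOCHS,
    learning_rate=LEARNING_RATE,
    per_device_train_batch_size=4,   # adjust to fit available VRAM
    gradient_accumulation_steps=8,   # effective batch = 4 * 8 * n_gpus
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
)

# CUTOFF_LEN would be used when tokenizing, e.g.
# tokenizer(text, truncation=True, max_length=CUTOFF_LEN)
```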

3

WarProfessional3278 t1_je790g7 wrote

By training, do you mean fine-tuning with LoRA or a full fine-tune like Alpaca? Realistically you could just rent an 8xA100 machine and spend 4 or 5 hours to get it done.

12

gmork_13 t1_je7dho0 wrote

For more stable compute, check out Google Cloud GPUs.

Consider training a quantized model with LoRA. If you know enough, perhaps the model could be split between VRAM and DDR RAM to make it train on a smaller GPU.

edit: here, I found one: https://github.com/tloen/alpaca-lora
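
For what it's worth, here is a rough, untested sketch of that idea using `transformers` + `bitsandbytes` + `peft`; the checkpoint name and memory limits are placeholders I made up, and a 65B model would still be slow or impractical to train this way:

```python
# Hedged sketch, not a tested recipe: load LLaMA in 8-bit, let accelerate spill
# layers that don't fit in VRAM out to CPU RAM, then attach LoRA adapters.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "decapoda-research/llama-7b-hf"  # placeholder checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,                        # int8 quantization via bitsandbytes
    device_map="auto",                        # let accelerate place layers on GPU/CPU
    max_memory={0: "20GiB", "cpu": "64GiB"},  # cap VRAM, spill the rest to system RAM
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],      # attention projections in LLaMA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)    # only the adapter weights will train
```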

I think you could get this done for far less than your budget if need be.

31

ustainbolt t1_je7plqi wrote

For a 65B model you are probably going to have to parallelise the model parameters across GPUs. See this link. As for training, it would be best to use a VM (any provider will work; Lambda and vast.ai are cheap). I would recommend a 4x (or 8x) A100 machine. I'm sure you can find more information about all of this.
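
One common way to do that parameter sharding is DeepSpeed ZeRO-3 through the Hugging Face Trainer. A minimal, untested sketch follows; the values are illustrative, not tuned for 65B:

```python
# Hedged sketch: shard parameters, gradients, and optimizer states across GPUs
# with DeepSpeed ZeRO stage 3 via the Hugging Face Trainer.
# Launched with something like: deepspeed --num_gpus=8 train.py  (script name is a placeholder)
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,                              # partition params/grads/optimizer states
        "offload_param": {"device": "cpu"},      # optionally spill params to CPU RAM
        "offload_optimizer": {"device": "cpu"},  # and optimizer states too
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}

training_args = TrainingArguments(
    output_dir="./llama-65b-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed=ds_config,   # Trainer wires this config into DeepSpeed
)
```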

31

Justice43 t1_je7vbe7 wrote

I recommend looking into Lambda Cloud VMs. They're much cheaper than AWS, and their largest machine (8x A100, 80GB VRAM each) should be enough to fine-tune the 65B LLaMA model.

3

jd_3d t1_je7xkwq wrote

Enough VRAM is key. Even with all the tricks (LoRA, int8, bitsandbytes) you'll need at least 120GB of VRAM. A full fine-tune would take even more. I'd go with a 4x or 8x A100 80GB machine, since it won't necessarily be more expensive (training will be highly parallel). See here for more info: https://www.storminthecastle.com/posts/alpaca_13B/
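
As a rough back-of-the-envelope check on where a number like that comes from (my own hedged arithmetic, not from the linked post):

```python
# Rough, hedged arithmetic for fine-tuning a 65B-parameter model with int8 + LoRA.
params = 65e9

weights_int8_gb = params * 1 / 1e9                    # ~65 GB: one byte per frozen weight
lora_params = 0.001 * params                          # assume LoRA adds ~0.1% trainable params
lora_states_gb = lora_params * (2 + 4 + 4 + 4) / 1e9  # fp16 weight + fp32 grad + Adam m, v

print(f"frozen int8 weights:             ~{weights_int8_gb:.0f} GB")
print(f"LoRA weights + optimizer states: ~{lora_states_gb:.1f} GB")
# Activations, gradients flowing back through the frozen layers, and framework
# overhead come on top of this, which is why a ~120 GB total-VRAM floor is plausible.
```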

7

OSeady t1_je81gws wrote

Contact Redmond.ai; they can hook you up.

0

machineko t1_je88wj9 wrote

I'm working on an open source library focused on resource-efficient fine-tuning methods called xTuring: https://github.com/stochasticai/xturing

Here's how you would perform int8 LoRA fine-tuning in three lines:

python: https://github.com/stochasticai/xturing/blob/main/examples/llama/llama_lora_int8.py
colab notebook: https://colab.research.google.com/drive/1SQUXq1AMZPSLD4mk3A3swUIc6Y2dclme?usp=sharing

Of course, the Colab still only works with smaller models. In the example above, the 7B model required 9GB of VRAM.
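
For reference, the linked example is roughly the following (paraphrased from the repo's quickstart; exact import paths and the dataset path may differ):

```python
# Approximate shape of the linked llama_lora_int8 example (paraphrased, not verbatim).
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset("./alpaca_data")   # path to an instruction dataset
model = BaseModel.create("llama_lora_int8")     # LLaMA 7B with int8 quantization + LoRA
model.finetune(dataset=dataset)
```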

12

SigmaSixShooter t1_je8kncb wrote

I don’t have an answer for you, but as a fellow noobie, I’d love to hear how you did this. Any tips or resources you want to provide would be greatly appreciated.

2

Evening_Ad6637 t1_jeapgrs wrote

That sounds very interesting. I'm sorry if this question is trivial or stupid, but I'm an absolute newcomer to this field. Is there a way to train the model as you describe here (https://xturing.stochastic.ai/quickstart) using only, or almost only, the CPU? My specs are an i5 @ 3.5 GHz, 16 GB DDR4 RAM, and only a Radeon Pro 575 with 4 GB VRAM. But since I've seen how fast Alpaca runs on my CPU and RAM, I'm hoping I could also fine-tune a LLaMA model with this hardware. I would be very grateful for more information about possibilities in this direction.

1