Submitted by Business-Lead2679 t3_12618zu in MachineLearning

I apologize if what I'm about to say sounds trivial, but I recently trained the 7B version of LLaMA on my JSON dataset containing 122k questions and answers. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard that the 65B model is significantly better, so I'm interested in training it to see how it performs. I already tried Google Colab (high-RAM), Paperspace, Deepnote, and JetBrains, and all of them crashed. How can I realistically train the 65B model on my $1k budget and complete the training process without any major issues? Any advice is appreciated.

79

Comments


gmork_13 t1_je7dho0 wrote

For more stable compute, check out Google Cloud GPUs.

Consider training a quantized model with LoRA. If you know what you're doing, the model could perhaps be split between VRAM and system RAM so it trains on a smaller GPU.

edit: here, I found one: https://github.com/tloen/alpaca-lora

I think you could get this done for far less than your budget if need be.
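
To make that concrete, here is a minimal sketch (not the alpaca-lora script itself) of loading LLaMA in int8 with LoRA adapters via transformers + peft + bitsandbytes; the checkpoint path and LoRA hyperparameters are placeholders, so treat it as a starting point rather than a recipe:

```python
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_path = "path/to/llama-7b-hf"  # placeholder: local path to converted LLaMA weights

model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,     # int8 quantization via bitsandbytes
    device_map="auto",     # places layers across GPU VRAM and CPU RAM (int8 CPU offload may need extra flags)
    torch_dtype=torch.float16,
)
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```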

31

ustainbolt t1_je7plqi wrote

For a 65B model you are probably going to have to parallelize the model parameters. See this link. As for training, it would be best to use a VM (any provider will work; Lambda and Vast.ai are cheap). I would recommend a 4x (or 8x) A100 machine. I'm sure you can find more information about all of this.
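
If you go the sharding route, a minimal PyTorch FSDP sketch looks roughly like this (the model path is a placeholder, and in practice you'd need CPU offload or meta-device init just to load 65B parameters per node):

```python
# Launch with: torchrun --nproc_per_node=4 train_fsdp.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))

model = AutoModelForCausalLM.from_pretrained("path/to/llama-65b-hf")  # placeholder path
model = FSDP(model, device_id=torch.cuda.current_device())  # each rank holds only a shard

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # build the optimizer after wrapping
# ...standard training loop: forward, loss.backward(), optimizer.step()...
```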

31

wrossmorrow t1_je7vy2p wrote

+1 for lambda labs

8

ustainbolt t1_je7xtcw wrote

I love Lambda. More reliable than Vast.ai, and WAY cheaper than AWS/GCP/Azure.

8

Nhabls t1_je9598b wrote

Every time I logged on to Lambda Labs in the past year, all their instances were full. Not that available in my experience.

5

badabummbadabing t1_je9cdf7 wrote

They just had their Series B funding, so they should scale up their resources soon.

1

itsyourboiirow t1_jecqc1d wrote

This is the only downside I've found. Sometimes it's too darn hard to find an instance.

1

learn-deeply t1_je9eovt wrote

Tensor parallelism (a.k.a. model parallelism) combined with checkpointing works better than FSDP in my experience (though they can be used in conjunction). FSDP is easier to work with, though.
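
For the checkpointing part, it's a one-liner on a Hugging Face model (a sketch, assuming a placeholder checkpoint path); it trades extra compute in the backward pass for a large activation-memory saving:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/llama-65b-hf")  # placeholder path
model.gradient_checkpointing_enable()  # recompute activations during backward to save memory
model.config.use_cache = False         # the generation KV cache is incompatible with checkpointing
# ...then wrap with FSDP or a tensor-parallel framework...
```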

1

WarProfessional3278 t1_je790g7 wrote

By training, do you mean fine-tuning with LoRA or training from the ground up like Alpaca? Realistically you could just rent an 8xA100 machine and spend 4 or 5 hours to get it done.

12

Business-Lead2679 OP t1_je7aefg wrote

Just like Alpaca. Even the JSON format is the same as the one released by Stanford, just with different inputs and outputs.

3

Business-Lead2679 OP t1_je794o8 wrote

I tried Vast.ai, which didn't work. I'm a newbie, so maybe I'm doing something wrong.

2

dreaming_geometry t1_je7vmov wrote

If you're having trouble with Vast.ai, you can ask for help on their Discord. Sounds like your use case is a good fit.

3

machineko t1_je88wj9 wrote

I'm working on an open source library focused on resource-efficient fine-tuning methods called xTuring: https://github.com/stochasticai/xturing

Here's how you would perform int8 LoRA fine-tuning in three lines:

python: https://github.com/stochasticai/xturing/blob/main/examples/llama/llama_lora_int8.py
colab notebook: https://colab.research.google.com/drive/1SQUXq1AMZPSLD4mk3A3swUIc6Y2dclme?usp=sharing

Of course, the Colab still only works with smaller models. In the example above, the 7B model required 9 GB of VRAM.
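
For anyone who doesn't want to click through, the linked script boils down to roughly this (a sketch based on the examples in the repo; the dataset path is a placeholder):

```python
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset("./alpaca_data")    # placeholder: directory with instruction data
model = BaseModel.create("llama_lora_int8")      # LLaMA with int8 weights + LoRA adapters
model.finetune(dataset=dataset)
```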

12

Evening_Ad6637 t1_jeapgrs wrote

That sounds very interesting. I'm sorry if this question is trivial or stupid, but I'm an absolute newcomer in this field. Is there a way to train the model as you describe here (https://xturing.stochastic.ai/quickstart) using only, or almost only, the CPU? My machine has an i5 @ 3.5 GHz, 16 GB of DDR4 RAM, and only a Radeon Pro 575 with 4 GB of VRAM. But since I saw how fast Alpaca runs on my CPU and RAM, I'm hoping I could also fine-tune a LLaMA model with this hardware. I would be very grateful for more information about possibilities in this direction.

1

itsyourboiirow t1_jecqjqd wrote

Training requires significantly more memory than inference, since it has to keep track of a gradient (and optimizer state) for every parameter. I would check how much memory the model takes up on your computer first.
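
As a rough back-of-the-envelope for a full fine-tune (assuming fp16 weights and Adam, i.e. not the int8/LoRA path):

```python
# Rough full-finetune memory estimate for a 7B-parameter model (assumptions: fp16 + Adam)
params = 7e9
weights_fp16 = params * 2   # ~14 GB
grads_fp16   = params * 2   # ~14 GB
adam_states  = params * 8   # ~56 GB (fp32 momentum + variance)
total_gb = (weights_fp16 + grads_fp16 + adam_states) / 1e9
print(f"~{total_gb:.0f} GB before activations")  # ~84 GB, excluding activations
```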

2

machineko t1_jecvhyt wrote

16 GB of RAM is not enough for even the smallest LLaMA 7B model. You can try the int8 LoRA approach listed above. Did you try the Python script I linked?

1

jd_3d t1_je7xkwq wrote

Enough VRAM is key. Even with all the tricks (LoRA, int8, bitsandbytes) you'll need at least 120 GB of VRAM; a full fine-tune would take even more. I'd go with a 4x or 8x A100 80GB machine, since it won't necessarily be more expensive (training is highly parallel). See here for more info: https://www.storminthecastle.com/posts/alpaca_13B/
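
A rough sanity check on that figure (assumptions: int8 base weights, ~100M trainable LoRA parameters, activation memory guessed at tens of GB):

```python
# Rough VRAM estimate for int8 + LoRA fine-tuning of a 65B model (all assumptions inline)
params       = 65e9
base_int8    = params * 1 / 1e9                 # ~65 GB for the frozen int8 base weights
lora_params  = 0.1e9                            # assumption: ~100M trainable LoRA parameters
lora_mem     = lora_params * (2 + 2 + 8) / 1e9  # fp16 weights + grads + fp32 Adam states
activations  = 30                               # assumption: depends heavily on batch size / sequence length
print(f"~{base_int8 + lora_mem + activations:.0f} GB")  # same ballpark as the ~120 GB quoted above
```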

7

Business-Lead2679 OP t1_je70nka wrote

I'd like to train it with these settings:

EPOCHS = 3

LEARNING_RATE = 2e-5

CUTOFF_LEN = 1024
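
For reference, one way those settings might map onto a Hugging Face Trainer config (a sketch; model, tokenizer, and dataset setup are omitted, and the cutoff length is applied when tokenizing):

```python
from transformers import TrainingArguments

CUTOFF_LEN = 1024  # max tokenized sequence length, enforced during dataset preprocessing

training_args = TrainingArguments(
    output_dir="./llama-65b-finetune",  # placeholder output path
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=4,      # assumption: tune to fit VRAM
    gradient_accumulation_steps=8,      # assumption
    fp16=True,
    logging_steps=10,
)
# trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_dataset, ...)
```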

3

Justice43 t1_je7vbe7 wrote

I recommend looking into Lambda Cloud VMs. They're much cheaper than AWS, and their largest machine (8x A100, 80 GB of VRAM each) should be enough to fine-tune the 65B LLaMA model.

3

Business-Lead2679 OP t1_je9erdj wrote

Just checked it out; looks interesting. Unfortunately, the availability of that instance is quite limited, so I'm not sure I can get access to it.

2

SigmaSixShooter t1_je8kncb wrote

I don’t have an answer for you, but as a fellow noobie, I’d love to hear how you did this. Any tips or resources you want to provide would be greatly appreciated.

2

OSeady t1_je81gws wrote

Contact Redmond.ai; they can hook you up.

0