Submitted by Business-Lead2679 t3_12618zu in MachineLearning
I apologize if what I'm about to say sounds trivial, but I recently trained the 7B version of LLaMA on my JSON dataset of 122k question-answer pairs. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard the 65B model is significantly better, so I'm interested in training it to see how it performs. I already tried Google Colab (high-RAM), Paperspace, Deepnote, and JetBrains, and all of them crashed. How can I realistically train the 65B model on my $1k budget and complete the training process without any major issues? Any advice is appreciated.
gmork_13 t1_je7dho0 wrote
For more stable compute, check out Google Cloud GPU instances.
Consider training a quantized model with LoRA. If you know what you're doing, the model could perhaps also be split between VRAM and DDR RAM so it trains on a smaller GPU (rough sketch below).
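A minimal sketch of what that looks like, assuming the Hugging Face transformers + peft + bitsandbytes stack (which is what alpaca-lora builds on); the model name and LoRA hyperparameters here are illustrative placeholders, not a tested 65B recipe:

```python
# Sketch: LoRA fine-tuning on an 8-bit-quantized LLaMA.
# Model name and hyperparameters are placeholders, not a tested recipe.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "decapoda-research/llama-7b-hf"  # placeholder; swap in a 65B checkpoint

# load_in_8bit quantizes the frozen base weights; device_map="auto" lets
# accelerate spill layers that don't fit in VRAM over to CPU RAM.
model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Casts norm layers to fp32 for stability and sets up gradient checkpointing.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, as in alpaca-lora
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the tiny LoRA adapters are trainable
```

The point is that the base weights stay frozen in 8-bit, so gradients and optimizer state only exist for the adapters, a fraction of a percent of the parameters. That's what makes 65B plausible on rented GPUs instead of a full fine-tune.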
edit: here, I found one: https://github.com/tloen/alpaca-lora
I think you could get this done for far less than your budget if need be.