Submitted by Business-Lead2679 t3_12618zu in MachineLearning
I apologize if what I'm about to say sounds trivial, but I recently trained the 7B version of LLaMA on my JSON dataset of 122k question-answer pairs. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard the 65B model is significantly better, so I'm interested in training it to see how it performs. I already tried Google Colab (high-RAM), Paperspace, Deepnote, and JetBrains, and all of them crashed. How can I realistically train the 65B model on my $1k budget and complete the training process without any major issues? Any advice is appreciated.
gmork_13 t1_je7dho0 wrote
For more stable compute, check out Google Cloud GPU instances.
Consider training a quantized model with LoRA. If you know what you're doing, the model could perhaps also be split between VRAM and DDR RAM so it trains on a smaller GPU (rough sketch below).
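A minimal sketch of what that looks like, assuming the Hugging Face transformers + peft + bitsandbytes stack (which is what alpaca-lora builds on); the model name and LoRA hyperparameters here are illustrative placeholders, not a tested 65B recipe:

```python
# Sketch: LoRA fine-tuning on an 8-bit-quantized LLaMA.
# Model name and hyperparameters are placeholders, not a tested recipe.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "decapoda-research/llama-7b-hf"  # placeholder; swap in a 65B checkpoint

# load_in_8bit quantizes the frozen base weights; device_map="auto" lets
# accelerate spill layers that don't fit in VRAM over to CPU RAM.
model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Casts norm layers to fp32 for stability and sets up gradient checkpointing.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, as in alpaca-lora
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the tiny LoRA adapters are trainable
```

The point is that the base weights stay frozen in 8-bit, so gradients and optimizer state only exist for the adapters, a fraction of a percent of the parameters. That's what makes 65B plausible on rented GPUs instead of a full fine-tune.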
edit: here, I found one: https://github.com/tloen/alpaca-lora
I think you could get this done for far less than your budget if need be.