Submitted by 00001746 t3_1244q71 in MachineLearning
fiftyfourseventeen t1_je1gprd wrote
Reply to comment by utopiah in [D] FOMO on the rapid pace of LLMs by 00001746
If you just want to change the output of a model to look more like something else in its training data, sure. LoRA trains low-rank adapters for the attention layers (technically a separate set of weights, but they can be merged back into the attention weights), so it doesn't necessarily add anything NEW per se; rather, it refocuses things the model has already learned. For example, if you tried to make a model work well with a language not in its training data, LoRA would not work very well. However, if you want the model to respond in a dialogue-like format (as is the case with Alpaca), it can work, because the model has already seen dialogue before, so the LoRA makes it "focus" on producing dialogue.
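The "separate weights that merge back in" idea is easy to see in a few lines. This is a minimal numpy sketch (shapes and scaling chosen for illustration, not taken from any particular implementation): instead of updating a frozen `d x d` weight `W`, you train two small matrices `A` and `B` of rank `r`, and at inference time the product folds into `W` with no extra cost.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8               # hidden size, LoRA rank, scaling (illustrative)

W = rng.normal(size=(d, d))          # frozen pretrained attention weight
A = rng.normal(size=(r, d)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                 # trainable, zero init so the delta starts at 0

x = rng.normal(size=(d,))

# During training: W stays frozen, the low-rank update runs on the side.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# After training: merge the adapter into W for zero-overhead inference.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapter, y_merged)
```

Because the delta is rank `r`, it can only re-weight directions the pretrained `W` already spans usefully, which is the intuition behind "focusing" rather than adding knowledge.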
You can get useful results with just LoRA, which is nice. But if you want to experiment with architecture improvements or large-scale finetunes / training from scratch, you're out of luck unless you have millions of dollars.
I'd say the biggest limitation of LoRA is that, for the most part, your model already has to "know" everything you're trying to do. It's not a good solution for adding new information to the model (e.g., training it on post-2021 information to make it more up to date). That requires a full finetune, which is a lot more expensive.
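The cheapness and the limitation are two sides of the same coin: LoRA simply doesn't train many parameters. A rough comparison, using hypothetical 7B-class dimensions (32 layers, hidden size 4096, the four attention projections per layer, rank 8):

```python
# Hypothetical model dimensions for illustration, not any specific checkpoint.
d_model, n_layers, r = 4096, 32, 8

# Full finetune of just the four attention projections (q, k, v, o) per layer:
full_attn = n_layers * 4 * d_model * d_model

# LoRA on the same projections: two low-rank matrices (d x r and r x d) each.
lora = n_layers * 4 * 2 * d_model * r

print(f"full attention finetune: {full_attn / 1e6:.0f}M trainable params")
print(f"LoRA (r={r}):            {lora / 1e6:.1f}M trainable params")
print(f"ratio: {full_attn // lora}x fewer with LoRA")   # 256x here
```

A few million trainable parameters is enough to steer behavior, but not much room to store genuinely new knowledge.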
As for the cost, I honestly don't know, because these companies don't like to make data like that public. We don't even know for sure what hardware GPT-3 was trained on, although it was likely V100s, and then A100s for GPT-3.5 and 4. I think people calculated the least they could have spent on training was around $4.5 million for GPT-3 and $1.6 million for LLaMA. That doesn't even include all the work that went into building an absolutely massive dataset, or paying employees to figure out how to do distributed training across tens of thousands of nodes with multiple GPUs each.
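That ~$4.5M figure is roughly reproducible with the common `~6 * params * tokens` FLOPs rule of thumb. The parameter and token counts below are the published GPT-3 numbers; the sustained V100 throughput and the $/GPU-hour are loose assumptions, so treat this as a sanity check rather than an actual cost estimate:

```python
# Back-of-envelope training compute via the ~6 * N * D FLOPs rule of thumb.
params = 175e9        # GPT-3 parameter count (published)
tokens = 300e9        # approximate GPT-3 training tokens (published)
flops = 6 * params * tokens

v100_tput = 28e12     # ASSUMED sustained mixed-precision FLOP/s per V100
dollars_per_gpu_hour = 1.5   # ASSUMED bulk cloud price

gpu_hours = flops / v100_tput / 3600
cost = gpu_hours * dollars_per_gpu_hour

print(f"compute:   {flops:.2e} FLOPs")
print(f"GPU-hours: {gpu_hours:,.0f}")
print(f"cost:      ${cost / 1e6:.1f}M")
```

With these assumptions it lands in the same ballpark as the ~$4.5M estimate, which is the point: the compute alone is millions of dollars before you count data work or engineering salaries.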