utopiah t1_je0zqae wrote on March 28, 2023 at 5:06 PM

Reply to comment by fiftyfourseventeen in [D] FOMO on the rapid pace of LLMs by 00001746

Well I just did so please explain why not, genuinely trying to learn. I'd also be curious if you have a list of trained models compared by cost. I only saw some CO2eq order of magnitude equivalent but not rough price estimations so that would help me to get a better intuition as you seem to know more about this.

That being said the point was that you don't necessarily need to train anything from scratch or buy anything to have useful results, you cant rent per hour on cloud and refine existing work, no?

fiftyfourseventeen t1_je1gprd wrote on March 28, 2023 at 6:51 PM

If you just want to change the output of a model to look more like something else in its training data, sure. LoRa trains the attention layers (technically it trains a separate model but it can be merged into the attention layers), so it doesn't necessarily add anything NEW per se, but rather focuses on things the model has already learned. For example, if you were to try to make a model work well with a language not in its training data, LoRa is not going to work very well. However, if you wanted to make the model give things in a dialogue like situation (as is the case of alpaca), it can work because the model has already seen dialogue before, so the LoRa makes it "focus" on creating dialogue.

You can get useful results with just LoRa, which is nice. If you want to try to experiment with architecture improvements or large scale finetunes / training from scratch, you are out of luck unless you have millions of dollars.

I'd say the biggest limitation of LoRa is that your model for the most part already has to "know" everything that you are trying to do. It's not a good solution to add more information into the model (e.g. training it on information after 2021 to make it more up to date) with lora. That has to be a full finetune which is a lot more expensive.

As for the cost, I honestly don't know because these companies don't like to make data like that public. We don't even know for sure what hardware GPT 3 was trained on, although it was likely V100s, and then A100s for GPT 3.5 and 4. I think people calculated the least they could have spent on training was around 4.5 million for GPT 3, and 1.6 million for llama. That doesn't even include all the work that went into building an absolutely massive dataset and paying employees to figure out how to do distributed training across tens of thousands of nodes with multiple GPUs each.