LetGoAndBeReal t1_je71r0g wrote

Fine-tuning can be great for getting better output from a model based on the knowledge the model already contains. I only meant that fine-tuning is not a viable way to get new data or knowledge into a model: it does not accomplish knowledge absorption.

WokeAssBaller t1_je7y09s wrote

Huh? I think that depends on what kind of fine-tuning you are talking about. Fine-tuning can absolutely add knowledge to a model.

lgastako t1_je8i6dw wrote

Not generally very well.

WokeAssBaller t1_jea0ubd wrote

Fine-tuning is additional training. There are lots of ways of doing it, and sometimes it's absolutely ideal; there are trade-offs.

lgastako t1_jea7kb3 wrote

Would love to see an example of it adding knowledge effectively. I haven't been able to find any at all.

WokeAssBaller t1_jealxm2 wrote

Train one from scratch.

lgastako t1_jeayn8v wrote

I know training a model from scratch will work, but the context of this conversation is fine-tuning an existing model. I would love to see examples of the claims people are making actually working, because I have only been able to find (and create) examples of it not working very well at all.

WokeAssBaller t1_jebpjog wrote

Fine-tuning is just additional training, so if it works from scratch, it works with fine-tuning. And no, it may not be as effective as other methods, but the poster was claiming it was impossible.
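A toy way to see the "fine-tuning is just additional training" point: in the sketch below, pre-training, fine-tuning, and training from scratch all call the same gradient-descent loop, and the only difference is the starting weight. The one-parameter model `y = w * x`, the data, the learning rate, and the step count are all hypothetical choices for illustration; a convex toy like this says nothing about how efficiently a large language model absorbs new facts.

```python
def train(w, data, steps=500, lr=0.01):
    """Plain gradient descent on mean squared error for the toy model y = w * x."""
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * x * (w*x - y))
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

old_data = [(x, 2.0 * x) for x in range(1, 5)]  # hypothetical "pre-training" data: y = 2x
new_data = [(x, 3.0 * x) for x in range(1, 5)]  # hypothetical "new knowledge": y = 3x

w_pretrained = train(0.0, old_data)          # pre-training from a fresh init
w_finetuned = train(w_pretrained, new_data)  # fine-tuning: same loop, pretrained init
w_scratch = train(0.0, new_data)             # training from scratch on the new data

print(round(w_pretrained, 3), round(w_finetuned, 3), round(w_scratch, 3))  # prints: 2.0 3.0 3.0
```

With a single parameter, fine-tuning on `y = 3x` completely overwrites the old `y = 2x` mapping, which is an extreme form of catastrophic forgetting.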

machineko t1_je83m8x wrote

Unsupervised fine-tuning (i.e., continuing the pre-training) on additional data will work. Of course, getting the model to learn new information effectively is a challenge, but not an impossible one.

Goldenier t1_je9uruu wrote

This is false, and most of the time the opposite is actually the problem: the model learns too much of the new data it's fine-tuned on (overfitting to it) but forgets the "knowledge" in the original model. The simplest and most popular example right now is using DreamBooth, LoRA, or other fine-tuning methods to fine-tune parts of the big image diffusion models: if you overtrain, the model places the newly trained face or object in almost all of its output. So it easily learns new data but also easily forgets old data. (One mitigation is a prior-preservation loss, which makes sure the model also keeps the old knowledge.)

And there is no reason why the same methods wouldn't work on LLMs too; for example, LoRA already exists for LLMs.
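The preservation-loss idea can be sketched on a toy linear model. Everything here (the three-feature model, the data, `lam`, the learning rate, the step count) is a hypothetical illustration, not DreamBooth's actual training code. Loosely mirroring DreamBooth's prior preservation, the "prior" targets are produced by the frozen original model; naive fine-tuning on the new example alone shifts an old output, while adding the weighted prior term keeps it.

```python
def predict(w, f):
    """Toy linear model: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, f))

def grad_mse(w, data):
    """Gradient of mean((w.f - y)^2) with respect to w."""
    g = [0.0] * len(w)
    for f, y in data:
        err = predict(w, f) - y
        for i, fi in enumerate(f):
            g[i] += 2 * err * fi / len(data)
    return g

def finetune(w, new_data, prior_data=None, lam=1.0, steps=2000, lr=0.05):
    """Gradient descent on loss_new + lam * loss_prior (prior term optional)."""
    w = list(w)
    for _ in range(steps):
        g = grad_mse(w, new_data)
        if prior_data:
            gp = grad_mse(w, prior_data)
            g = [a + lam * b for a, b in zip(g, gp)]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_pre = [1.0, 1.0, 0.0]  # hypothetical "pre-trained" weights
prior_inputs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Prior targets come from the frozen original model's own outputs.
prior_data = [(f, predict(w_pre, f)) for f in prior_inputs]
new_data = [((1.0, 0.0, 1.0), 3.0)]  # the new fact to learn

w_naive = finetune(w_pre, new_data)             # new data only
w_kept = finetune(w_pre, new_data, prior_data)  # new data + prior preservation

print([round(v, 3) for v in w_naive])  # prints: [2.0, 1.0, 1.0]
print([round(v, 3) for v in w_kept])   # prints: [1.0, 1.0, 2.0]
```

Both runs fit the new example (output 3), but only the regularized run keeps the old output for `(1, 0, 0)` at 1 instead of drifting to 2.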

LetGoAndBeReal t1_je9zfyb wrote

>And there is no reason why the same methods wouldn't work on LLMs too, for example there is already Lora for LLMs too.

It's really not helpful to make strong assertions like this without referring to specific, verifiable sources. Fine-tuning is very typically done in a way where certain layers/parameters of the model are frozen, precisely to avoid the sort of loss we are discussing. The LoRA paper itself states that LoRA "freezes the pre-trained model weights".
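For concreteness, here is a minimal pure-Python sketch of a LoRA-adapted linear layer following the paper's formulation `h = W0 x + (alpha / r) * B A x`, with `W0` frozen, `A` randomly initialized and `B` initialized to zero. The class name, the toy sizes, the squared-error objective, and the learning rate are assumptions for illustration. It shows what "frozen" refers to in that quote: `W0` never changes, while the trainable low-rank product `B A` is learned and added on top of it.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRALinear:
    """Hypothetical toy LoRA layer: h = W0 x + (alpha / r) * B (A x); W0 stays frozen."""

    def __init__(self, W0, r, alpha, seed=0):
        rng = random.Random(seed)
        self.W0 = [row[:] for row in W0]  # frozen pre-trained weight (never updated)
        d_out, d_in = len(W0), len(W0[0])
        self.scale = alpha / r
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init: starts equal to W0

    def forward(self, x):
        delta = matvec(self.B, matvec(self.A, x))
        return [h + self.scale * d for h, d in zip(matvec(self.W0, x), delta)]

    def sgd_step(self, x, target, lr):
        """One gradient step on 0.5 * ||h - target||^2, updating only A and B."""
        ax = matvec(self.A, x)
        err = [h - t for h, t in zip(self.forward(x), target)]  # dL/dh
        # Backpropagate through B before B is modified: (B^T err)[k]
        bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err))) for k in range(len(ax))]
        for i, row in enumerate(self.B):  # dL/dB[i][k] = scale * err[i] * ax[k]
            for k in range(len(row)):
                row[k] -= lr * self.scale * err[i] * ax[k]
        for k, row in enumerate(self.A):  # dL/dA[k][j] = scale * bt_err[k] * x[j]
            for j in range(len(row)):
                row[j] -= lr * self.scale * bt_err[k] * x[j]

W0 = [[1.0, 0.0], [0.0, 1.0]]
layer = LoRALinear(W0, r=1, alpha=1.0)
x, target = [1.0, 2.0], [3.0, 0.0]
before = layer.forward(x)  # equals W0 x because B starts at zero
for _ in range(500):
    layer.sgd_step(x, target, lr=0.05)
after = layer.forward(x)
```

For a `d x k` weight matrix, the adapter trains `r * (d + k)` parameters instead of `d * k`, which is where LoRA's savings come from when `r` is small.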

0