LetGoAndBeReal t1_j3oit19 wrote

How should I think about the way a large language model gains new, specific knowledge? For example, suppose you have a model trained on hundreds of gigabytes of text and then want to continue its training so that it learns a single specific fact it has not yet encountered, such as “Steven Pinker is the author of The Language Instinct.”

I imagine that presenting it with a single sentence like this, embedded in a training set, would contribute very little to its ability to subsequently answer the question “Who was the author of The Language Instinct?” Is that correct?

Is there some heuristic for how many exposures a model like GPT-3.5 would need to a new fact before its weights and biases were adjusted enough to embody it?
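To make this concrete, here is a rough sketch of the experiment I have in mind, using a small Hugging Face causal LM as a stand-in (the model choice, learning rate, and step count are all illustrative guesses on my part, not a recipe):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# The single new fact we want the model to absorb.
fact = "Steven Pinker is the author of The Language Instinct."
inputs = tokenizer(fact, return_tensors="pt")

model.train()
for step in range(10):  # vary this to probe how many exposures are needed
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Probe recall with a prompt worded differently from the training sentence,
# to distinguish learned knowledge from verbatim memorization.
model.eval()
prompt = tokenizer("The author of The Language Instinct is", return_tensors="pt")
generated = model.generate(**prompt, max_new_tokens=8)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```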


I-am_Sleepy t1_j3qa3yq wrote

I am not really in this field (NLP), but you should check out Fast Model Editing at Scale from 2021 (use Google Scholar to find the citation thread).
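As I understand it, the idea there is to train a small editor network that transforms the ordinary fine-tuning gradient for a new fact into a single targeted weight update, rather than running many gradient steps. A very rough sketch of that idea (not the paper's actual code; the editor here is just an untrained placeholder MLP):

```python
import torch
import torch.nn as nn

# Toy stand-in for one weight matrix of a language model.
W = nn.Linear(64, 64)

# Hypothetical editor: maps the raw gradient of W to a corrected update.
# In the paper this is a learned hypernetwork; here it is an untrained placeholder.
editor = nn.Sequential(nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, 64 * 64))

def edit_weight(x, target):
    """One-shot edit: compute the gradient on the new fact, transform it, apply it."""
    loss = ((W(x) - target) ** 2).mean()
    grad = torch.autograd.grad(loss, W.weight)[0]
    delta = editor(grad.flatten()).view_as(W.weight)
    with torch.no_grad():
        W.weight -= delta  # single targeted update instead of many SGD steps

edit_weight(torch.randn(1, 64), torch.randn(1, 64))
```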


LetGoAndBeReal t1_j3r0p45 wrote

Thank you for this. It seems this paper could surely help answer my question, if only I could understand it!

A challenge I keep running into in my quest to learn about ML/NN quickly is that almost everything I read is either too high-level to provide a meaningful explanation or too technically dense for me to follow. I guess I will just take note of this paper for now and circle back to it when I'm a bit further along.


I-am_Sleepy t1_j3vok1m wrote

Hey, I’ve found another paper (Git Re-Basin) about merging model weights trained on disjoint datasets while retaining both models’ performance. This paper is quite technical, but there is an implementation online. I think you should check it out.
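As I understand the trick, hidden units can be permuted without changing what a network computes, so you first find a permutation that aligns one model's neurons with the other's and only then average the weights. A toy sketch of that weight-matching step for a single hidden layer (the real paper handles whole networks; everything here is illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def permute_and_merge(W1_a, W2_a, W1_b, W2_b):
    """Align model B's hidden units to model A's, then average the weights.

    W1_*: (hidden, in) first-layer weights; W2_*: (out, hidden) second-layer weights.
    """
    # Find the permutation of B's hidden units that best matches A's
    # (maximize similarity between rows of the first-layer weights).
    cost = -W1_a @ W1_b.T  # negated similarity, since the solver minimizes cost
    _, perm = linear_sum_assignment(cost)

    # Apply the permutation to B's rows (layer 1) and columns (layer 2),
    # then interpolate midway between the two aligned models.
    W1_merged = 0.5 * (W1_a + W1_b[perm])
    W2_merged = 0.5 * (W2_a + W2_b[:, perm])
    return W1_merged, W2_merged

# Toy usage with random weights standing in for two trained models.
rng = np.random.default_rng(0)
h, d_in, d_out = 8, 4, 3
merged = permute_and_merge(rng.normal(size=(h, d_in)), rng.normal(size=(d_out, h)),
                           rng.normal(size=(h, d_in)), rng.normal(size=(d_out, h)))
```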
