Submitted by super_deap t3_11tmpc5 in MachineLearning
-Rizhiy- t1_jck6j55 wrote
Reply to comment by Spiritual-Reply5896 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Increasing the context window is a simple, albeit costly, method of increasing the amount of addressable information. Working with external memory is not as straightforward.
felheartx t1_jcli6si wrote
You said working with external memory is not as straightforward. Can you explain that?
I've read this: https://arxiv.org/abs/2301.04589# and even though I'm not super familiar with the details, to my untrained eye it seems like attaching external memory is easier than extending the context size.
Just from reading posts on this subreddit I get the feeling that getting larger and larger context sizes is very difficult. Whereas simply attaching this sort of "dictionary" thing is pretty easy to do.
lmericle t1_jcln487 wrote
You will find that in hype circles such as NLP there are a lot of thought-terminating clichés passed around by people who are not so deep in the weeds. Someone says something with confidence, another person doesn't know how to vet it and so just blindly passes it on, and all of a sudden a hack becomes a rumor becomes dogma. It seems to me to be this way with context vs memory.
Put another way: it's the kind of attitude that says "No, Mr. Ford, what we wanted was faster horses".
KerfuffleV2 t1_jclo0oh wrote
I'm not an ML person, but it seems like that paper is just teaching the LLM to simulate a Turing machine. Actually making it respond normally while doing practical stuff like answering user queries would be a different thing.
Also, suppose the LLM has access to external memory. First, you have to teach it how to interact with that external memory (via special command sequences in its tokens, most likely). Then you have to teach it/take steps to make it appropriately note which things are important or not and store/retrieve them as necessary. All of this requires tokens for input/output, so it will increase processing time even when used perfectly, and those tokens will also consume the existing context window.
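Just to make that concrete, here's a purely hypothetical sketch of the kind of plumbing I mean (the tag names and wrapper are mine, not from the paper):

```python
import re

# Hypothetical external key/value memory driven by special tags the model
# is taught to emit, e.g. <MEM_STORE key>value</MEM_STORE> and <MEM_RECALL key>.
memory = {}

STORE_RE = re.compile(r"<MEM_STORE (\w+)>(.*?)</MEM_STORE>", re.DOTALL)
RECALL_RE = re.compile(r"<MEM_RECALL (\w+)>")

def handle_memory_commands(model_output: str) -> str:
    """Execute store/recall tags found in the model's output and return the
    text that has to be fed back into the context on the next turn."""
    for key, value in STORE_RE.findall(model_output):
        memory[key] = value.strip()

    recalled = []
    for key in RECALL_RE.findall(model_output):
        # Recalled values go back in as ordinary tokens, so they still
        # consume the context window and add processing time.
        recalled.append(f"<MEM_VALUE {key}>{memory.get(key, '(nothing stored)')}</MEM_VALUE>")
    return "\n".join(recalled)

handle_memory_commands("Sure. <MEM_STORE user_name>Alice</MEM_STORE> Nice to meet you.")
print(handle_memory_commands("<MEM_RECALL user_name> What was the name again?"))
# -> <MEM_VALUE user_name>Alice</MEM_VALUE>
```

The plumbing itself is trivial; the hard part is getting the model to emit those commands at the right moments in the first place.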
One really big thing with LLMs now is it seems like they don't (and maybe can't) know what they know/don't know. They just predict tokens, they can't really do introspection. Of course, they can be trained to respond that they don't know certain things, but getting the LLM to decide it needs to use the external memory doesn't seem like the simplest thing.
I mean, take humans as an example: Are you effective at taking notes, organizing them in a way that lets you easily recall them in the future, etc? It's not even an easy skill for humans to develop, and we're relatively good at knowing what we don't know.
Another thing is the paper you linked to says it set the temperature to 0, to make the responses very deterministic. Generally this makes them a lot less creative as well. If you turn up temperature, you potentially increase the chances that the LLM generates malformed queries for the external memory or stuff like that.
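(For anyone who hasn't played with it, temperature just rescales the next-token distribution before sampling; at 0 you effectively always take the most likely token. A toy sketch, not how any real implementation is structured:)

```python
import numpy as np

def sample_token(logits, temperature):
    """Toy illustration of temperature sampling over next-token scores."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        # Deterministic: always pick the highest-scoring token.
        return int(np.argmax(logits))
    # Higher temperature flattens the distribution (more varied/"creative"),
    # lower temperature sharpens it (more repeatable, more conservative).
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

logits = [2.0, 1.0, 0.5]          # pretend scores for three candidate tokens
print(sample_token(logits, 0))    # always token 0
print(sample_token(logits, 1.5))  # sometimes token 1 or 2
```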
Anyway, I don't know much about the technical side of increasing the context window but when the context window is bigger the thing can just use it as far as I know. Taking advantage of some sort of external memory system seems like it's a very, very complicated thing to solve reliably.
Again, note this is coming from someone that doesn't really know much about ML, LLMs, etc. I'm just a normal developer, so take all this with a grain of salt.
Art10001 t1_jcnakzz wrote
There is a github project that uses embeddings with GPT-3.5 to create infinite memory, as long as you have infinite disk space. The database grows and grows the more you talk.
KerfuffleV2 t1_jcncad2 wrote
You'd have to link me what you're talking about for me to say anything. I doubt it works as straightforwardly as "infinite memory" though.
Art10001 t1_jcnv4kf wrote
https://github.com/LagPixelLOL/ChatGPTCLIBot
There are other similar projects I found while trying to recover this one, which may also be of interest. You can find them by searching "chatgpt embeddings memory github"
KerfuffleV2 t1_jcp7qcz wrote
I'm not sure I fully understand it, but it seems like it's basically just adding relevant context to the prompt it submits with requests. For obvious reasons, the prompt can only get so big. It also requires making requests to OpenAI's embedding API, which isn't free: so it's both pushing in more tokens and paying for those extra requests.
I can definitely see how that approach could produce better results, but it's also not really unlimited memory. Note: I skimmed the source, but I'm not really a C++ person and I didn't actually set it up to use my OpenAI account via API.
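From skimming it, the core loop seems to be roughly this (very rough sketch, assuming the OpenAI Python client as it existed at the time; the function names are mine, not the project's):

```python
import numpy as np
import openai  # assumes the OpenAI Python client (pre-1.0) and an API key

memory = []  # list of (text, embedding) pairs; grows with the conversation

def embed(text: str) -> np.ndarray:
    # Every call here is one of those extra, non-free API requests.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def remember(text: str) -> None:
    memory.append((text, embed(text)))

def recall(query: str, k: int = 3) -> list[str]:
    """Return the k stored snippets most similar to the query (cosine similarity)."""
    q = embed(query)
    scored = [(float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e))), t)
              for t, e in memory]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

def build_prompt(user_msg: str) -> str:
    # The recalled snippets only help to the extent they fit back into the
    # prompt, so this is still bounded by the context window.
    context = "\n".join(recall(user_msg))
    return f"Relevant earlier conversation:\n{context}\n\nUser: {user_msg}\nAssistant:"
```

So "infinite" in the sense that the database on disk can grow forever, but only a context window's worth of it is ever visible to the model at once.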
127-0-0-1_1 t1_jcqd8se wrote
It's not unlimited memory in a single run, which remains unchanged, but that doesn't seem super relevant to what people want (nothing wrong with multiple runs!). Think about a Turing machine, or heck, yourself. A Turing machine only has access to a single cell of memory at a time, and in practice, modern CPUs only have direct access to their registers. For long-term storage, data goes into RAM, which is accessed on demand.
Similarly, your own memory is not large enough to contain all the information you'd need to complete most complex tasks. That's why you have to write things down and actively try to remember things.
While that uses OpenAI's embedding networks, like the autoregressive LLM itself, it's not like OpenAI has a monopoly on text embeddings by any means (far from it - embeddings have a very straightforward business use and are used in practically any major site you know for things like similarity queries).
While I think OP is overhyping the degree to which this is "infinite memory", in a hypothetical Turing machine formulation where the network can more proactively store and restore memory, it would allow it to be, at least, Turing complete.
Spiritual-Reply5896 t1_jcsq4d9 wrote
Exactly, I wanted to find out whether there is any research on these embeddings. I really think that with efficient pruning/organization of these "memories" it's possible to build quite an advanced memory. Things like embedding consistency then become a big factor: how much does length affect the embedding, what is the optimal information content vs. string size...
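A quick way to poke at the length question empirically (just a sketch; sentence-transformers here is a stand-in for whichever embedding model you'd actually use):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model would do

# Same underlying fact, padded with progressively more surrounding text.
variants = [
    "The meeting is on Tuesday.",
    "Just a reminder that the meeting is on Tuesday, as discussed.",
    "Following up on the earlier thread about scheduling, budgets and the "
    "quarterly review: the meeting is on Tuesday. Let me know if that works.",
]

embs = model.encode(variants, normalize_embeddings=True)

# How far does each longer variant drift from the shortest phrasing?
for text, emb in zip(variants[1:], embs[1:]):
    sim = util.cos_sim(embs[0], emb).item()
    print(f"{sim:.3f}  {text[:60]}...")
```

If similarity stays high as the padding grows, length matters less than I'd expect; if it drops off quickly, that argues for chunking aggressively before embedding.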
hfnuser0000 t1_jcnspad wrote
Hi there! It sounds really interesting! Could you please share the name of the project or provide a link to it? I would love to check it out. Thank you!
Art10001 t1_jcnv4te wrote
https://github.com/LagPixelLOL/ChatGPTCLIBot
There are other similar projects I found while trying to recover this one, which may also be of interest. You can find them by searching "chatgpt embeddings memory github"