Submitted by terserterseness t3_10fxryj in MachineLearning

All the examples from langchain and on Hugging Face create memory by pasting the entire history into every prompt. That runs into the max input length pretty quickly, and it's expensive. Does ChatGPT use something revolutionary? It forgets everything when you create a new session, so it 'feels' like it's using the conversation itself as memory too.
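For concreteness, the pattern I mean looks roughly like this (a minimal sketch of the langchain-style examples, not any library's actual code):

```python
# Naive "memory": resend the entire transcript with every prompt.
# The prompt grows with the conversation, so it hits the model's
# max input length quickly, and every call re-pays for the history.

history = []

def ask(user_message, llm_call):
    """llm_call is any completion function: prompt str -> reply str."""
    history.append(f"User: {user_message}")
    prompt = "\n".join(history) + "\nAssistant:"
    reply = llm_call(prompt)
    history.append(f"Assistant: {reply}")
    return reply
```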

But then the question: how do they get past prompt limits? Chunking doesn't help, since the model still gets no context between chunks. Maybe they ask the same question with different chunks many times and then ask for a final result?

Apologies if this was answered somewhere, I cannot find it at all and all examples use the same kind of history memory.

35

Comments

DaLameLama t1_j4zhqqj wrote

Does ChatGPT actually get past the token limit? Codex supports ~8000 tokens. You might underestimate how much this is. Has anyone tested the limits?

Unfortunately, OpenAI aren't serious about publishing technical reports anymore.

30

andreichiffa t1_j50x4ky wrote

The reported context size is 2048 tokens, but they likely apply a hard attention mask beyond that. In words that's roughly a quarter less, since an average word is more than one token.
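A rough way to sanity-check the token/word ratio, using the open-source tiktoken tokenizer (GPT-2's BPE is an assumption here; ChatGPT's exact tokenizer isn't public):

```python
import tiktoken  # pip install tiktoken

# GPT-2's BPE as a stand-in for whatever ChatGPT actually uses.
enc = tiktoken.get_encoding("gpt2")

text = "An average English word comes out to a bit more than one token."
tokens = enc.encode(text)
print(len(text.split()), "words ->", len(tokens), "tokens")
```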

9

EmmyNoetherRing t1_j510553 wrote

>Unfortunately, OpenAI aren't serious about publishing technical reports anymore.

Do OpenAI folks show up to any of the major research conferences? These days I mostly come into contact with AI when it wanders into the tech policy/governance world, and this seems like the sort of work that would get you invited to an OSTP workshop, but I'm not sure if that's actually happening.

OpenAI's latest not-so-technical report (on their website) has a few folks from Georgetown contributing to it, and since AAAI is in DC in a few weeks I was hoping OpenAI would be around and available for questions in some capacity, in some room at the conference.

5

DaLameLama t1_j519tns wrote

There was an OpenAI party at NeurIPS, but I wasn't there. No clue about AAAI :)

4

EmmyNoetherRing t1_j51cvjh wrote

Yeah, as an uninformed guess it seems like IJCAI or NeurIPS would be a more natural home, but AAAI is actually in DC, which seems helpful for some categories of conversation, if the right people attend.

3

EmmyNoetherRing t1_j50zesm wrote

I've heard a variety of folks talk about leaving ChatGPT tabs/sessions open for days or weeks and maintaining context plausibly well throughout.

3

Daos-Lies t1_j4zpwjr wrote

This is just a suspicion, but I think it's a matter of embedding the conversation and using that embedding as an input alongside your most recent question (which is just classic recurrence, really).

I'm fairly confident the mechanism is something along those lines, because they made a relatively big fuss about their new embedding service around the same time ChatGPT was released (though obviously that didn't get as much attention as ChatGPT itself).
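A minimal sketch of that suspicion, using their embeddings endpoint (text-embedding-ada-002 is an assumption; how the embedding would actually condition the next generation step is the speculative part):

```python
import openai

def embed(text):
    # The embeddings endpoint OpenAI announced around the ChatGPT launch.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return resp["data"][0]["embedding"]

# Compress the running conversation into one vector and carry it forward
# next to the newest question -- recurrence in spirit. How that vector
# would feed back into the model is the unknown part.
state = embed("transcript of the conversation so far")
```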

(And in response to u/DaLameLama asking if ChatGPT goes past the token limit: yes, it deffo can go past 8000 tokens; I have had some v v v long conversations with it.)

21

IntelArtiGen t1_j4zr3iq wrote

Yeah, that's also what I would say. I doubt it's anything revolutionary, as it's likely not necessary. It might be an innovative use of conversation embeddings, but I wouldn't call that "revolutionary".

They probably don't use a single embedding for the whole conversation; perhaps they use one embedding per prompt and/or keep some tokens in memory.

1

MysteryInc152 t1_j50pw6e wrote

With embeddings, it theoretically shouldn't have a hard limit at all. But the experiments here suggest a sliding context window of 8096 tokens:

https://mobile.twitter.com/goodside/status/1598874674204618753?t=70_OKsoGYAx8MY38ydXMAA&s=19
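Client-side, a sliding window like the one those experiments suggest would look something like this (a sketch; the 8096 figure comes from the linked thread, and the GPT-2 tokenizer is a stand-in):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # stand-in tokenizer
WINDOW = 8096  # figure from the linked experiments; unverified

def sliding_window(transcript: str) -> str:
    # Keep only the most recent WINDOW tokens; anything earlier
    # silently falls out of the model's view.
    tokens = enc.encode(transcript)
    return enc.decode(tokens[-WINDOW:])
```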

6

Daos-Lies t1_j50vdq9 wrote

That is indeed fair enough.

Big fan of the concept of screaming at it until it forgets ;)

And I suppose it's very possible that, over the course of my 'v long conversations with it', topics repeated at points (which I'm sure they did), and that could have fooled me into thinking it was remembering things from right at the start.

2

Czl2 t1_j4zqan4 wrote

Ask the model to summarize whatever is about to be cut off as you slide the token window, and replace what is lost with that summary? That way the token window always carries a summarized version of what's missing.
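Something like this, as a sketch (summarize is a placeholder for any completion call; character counts stand in for token counts to keep it self-contained):

```python
def compact(history: str, window: int, summarize) -> str:
    """Keep the last `window` characters verbatim; summarize the rest.

    `summarize` is any LLM call, e.g. a completion request that asks
    "Summarize this conversation: ...".
    """
    if len(history) <= window:
        return history
    cut, kept = history[:-window], history[-window:]
    return f"[Earlier conversation, summarized: {summarize(cut)}]\n{kept}"
```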

8

wind_dude t1_j50pmcc wrote

I would suspect something similar to BlenderBot 2 from Meta and ParlAI.

Chat memory is searched for relevant information, which is sent to the decoder for the final output.
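As a sketch of that retrieve-then-generate pattern (not BlenderBot's actual code; the embeddings and top-3 cutoff are assumptions):

```python
import numpy as np

def retrieve(query_vec, memory, k=3):
    # memory: list of (text, embedding) pairs saved from past turns.
    # Score each stored turn against the current query, keep the top k.
    scored = sorted(memory, key=lambda m: -np.dot(query_vec, m[1]))
    return [text for text, _ in scored[:k]]

# The retrieved snippets are then fed to the decoder (BlenderBot 2)
# or simply prepended to the prompt, instead of the whole history.
```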

https://medium.com/ai-network/is-there-a-chatbot-that-goes-beyond-the-gpt-3-blenderbot-2-0-17e42e674824

https://ai.facebook.com/blog/blender-bot-2-an-open-source-chatbot-that-builds-long-term-memory-and-searches-the-internet/

So it's in the model architecture.

5

drumnation t1_j52yrfo wrote

The API docs don't seem clear on how to recreate the same session memory as in the main app. It looked to me as if it uses stop sequences to achieve this, but I'm still trying to figure out how to emulate conversation memory.
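For what it's worth, the common way to emulate it against the completions API looks like this (a sketch; the Human/AI labels and the stop sequence are conventions, not documented ChatGPT internals):

```python
import openai

history = "The following is a conversation with an AI assistant.\n"

def chat(user_message: str) -> str:
    global history
    history += f"Human: {user_message}\nAI:"
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=history,
        max_tokens=256,
        stop=["Human:"],  # stop sequence keeps it from writing both sides
    )
    reply = resp["choices"][0]["text"].strip()
    history += f" {reply}\n"
    return reply
```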

2

kvutxdy t1_j531msz wrote

I asked ChatGPT, and it said an RNN is used in the system as well (probably not true).

1