
patient_zer00 t1_izuqszr wrote

It doesn't remember stuff; it's mostly the web app that remembers it. It sometimes resends the previous request along with your current one (check the Chrome request logs). It then probably concatenates the prompts and feeds them to the model as one.
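A rough sketch of what that client-side concatenation could look like (names and structure are my guess, not OpenAI's actual code):

```python
# Toy sketch of a chat client that "remembers" by resending the transcript.
# Everything here is illustrative; it is not OpenAI's implementation.
history = []  # (speaker, text) pairs kept by the web app, not by the model

def build_prompt(user_message: str) -> str:
    """Concatenate every previous turn plus the new message into one prompt."""
    history.append(("User", user_message))
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return turns + "\nAssistant:"

def record_reply(reply: str) -> None:
    """Store the model's answer so it gets resent with the next request."""
    history.append(("Assistant", reply))
```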

272

master3243 t1_izv48yc wrote

This is it: they have a huge context size and they just feed it in.

I've seen discussion on whether they use some kind of summarization to fit more context into the same model size, but there's only speculation in that regard.

In either case, it's nothing we haven't seen in recent papers here and there.
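If they did do something like that, a speculative sketch might look like this (MAX_TURNS and the summarize() callable are placeholders, nothing here is a known OpenAI detail):

```python
# Speculative sketch of summarizing older turns so the prompt stays short.
MAX_TURNS = 6  # keep only the most recent turns verbatim (arbitrary choice)

def compress_history(turns: list[str], summarize) -> list[str]:
    """Replace everything but the last MAX_TURNS turns with one summary line."""
    if len(turns) <= MAX_TURNS:
        return turns
    older, recent = turns[:-MAX_TURNS], turns[-MAX_TURNS:]
    summary = summarize("Summarize this conversation:\n" + "\n".join(older))
    return [f"(Earlier conversation, summarized: {summary})"] + recent
```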

114

maxToTheJ t1_izvltcw wrote

It probably does some basic checks for adversarial text, like long runs of AAAAAAAAA*, BBBBBBBBBBBBB*, [[[[[[[[*, or repeated profanity, then preprocesses the text before feeding it in.

EDIT: Only mentioning this since some folks will argue ChatGPT has a crazy long memory (10K tokens) because you can sandwich stuff around some trivial 9.5K tokens of repetition. They likely also added a bunch of defenses against basic prompt-engineering attacks so people don't get it to say certain things.
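Something like this, maybe (the rules and thresholds below are invented for illustration, not anything confirmed):

```python
import re

# Invented example of the kind of pre-filter described above.
def squash_repetition(text: str, max_run: int = 20) -> str:
    """Collapse absurd runs of one character (AAAA..., [[[[...) to max_run copies."""
    return re.sub(r"(.)\1{%d,}" % max_run,
                  lambda m: m.group(1) * max_run,
                  text)

def looks_adversarial(text: str) -> bool:
    """Flag prompts that are mostly filler after squashing repetition."""
    return len(squash_repetition(text)) < 0.5 * len(text)
```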

17

zzzthelastuser t1_izx8k9l wrote

> I've seen discussion on whether they use some kind of summarization to fit more context into the same model size

They could unironically use ChatGPT for this task.

3

master3243 t1_izxkwzt wrote

True, using the embedding from an LLM as a summary of the past for the same LLM is a technique I've seen done before.

1

p-morais t1_izvyzit wrote

It's InstructGPT, which is based on GPT-3.5 with RLHF. People have reverse-engineered that it uses a context window of 8,192 tokens and is primed with a special prompt.
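A sketch of how a client might enforce a budget like that with OpenAI's tiktoken tokenizer (the 8,192 figure is just the claim above, and "cl100k_base" is my assumption about the encoding):

```python
import tiktoken  # OpenAI's open-source tokenizer

# Assumptions: the 8,192 budget is the number claimed above, and
# "cl100k_base" is a guess at the encoding, not a confirmed detail.
enc = tiktoken.get_encoding("cl100k_base")

def trim_to_budget(special_prompt: str, transcript: str, budget: int = 8192) -> str:
    """Keep the priming prompt and as much recent transcript as fits the budget."""
    prompt_tokens = enc.encode(special_prompt)
    transcript_tokens = enc.encode(transcript)
    keep = max(0, budget - len(prompt_tokens))
    kept = transcript_tokens[-keep:] if keep > 0 else []
    return special_prompt + enc.decode(kept)
```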

29

sandboxsuperhero t1_izw2k3k wrote

Where did you see this? text-davinci-003 (which seems to be GPT-3.5) has a context window of ~4,000 tokens.

5

029187 OP t1_izveeec wrote

That is surprisingly clever.

5

[deleted] t1_izvm5ob wrote

[deleted]

−16

MaceGrim t1_izvnq8t wrote

It's definitely some form of large language model implemented as a transformer neural network. GPT references the large language models OpenAI previously built (GPT-3), and it's also likely that ChatGPT is a fine-tuned and/or optimized version dedicated to chatting.

20

Duckdog2022 t1_izvtrg2 wrote

Pretty unlikely it's that simple.

6

p-morais t1_izvz91t wrote

Not "pretty unlikely". The architecture is literally in the name: Generative Pre-trained Transformer.

19

5erif t1_izwnbsq wrote

Their comment was colloquially synonymous with

> I doubt it's that simple.

Your comment could just as easily have started with

> You're right, it's not that simple.

But reddit is what you might call a generative adversarial network.

9