Comments

patient_zer00 t1_izuqszr wrote

It doesn't remember stuff; it's mostly the web app that remembers it. It resends the previous requests along with your current one (check the Chrome request logs), then probably concatenates the prompts and feeds them to the model as one.
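
A minimal sketch of that idea (all names here are hypothetical; this only illustrates a client resending concatenated history as one stateless prompt, not OpenAI's actual code):

```python
# Hypothetical client-side "memory": resend the whole conversation each turn.
# None of these names are OpenAI's; this only illustrates the concatenation idea.
history = []

def send_message(user_message, query_model):
    """Concatenate all prior turns with the new one into a single prompt."""
    history.append(f"User: {user_message}")
    prompt = "\n".join(history) + "\nAssistant:"
    reply = query_model(prompt)  # one stateless completion call per turn
    history.append(f"Assistant: {reply}")
    return reply
```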

272

master3243 t1_izv48yc wrote

This is it: they have a huge context size and they just feed it in.

I've seen discussion on whether they use some kind of summarization to be able to fit more context into the same size model, but there's only speculation in that regard (a toy sketch of the idea is below).

In either case, it's nothing we haven't seen in recent papers here and there.
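
If they do summarize, one speculative shape it could take (purely illustrative; `summarize`, `count_tokens`, and the 4000-token budget are assumptions, not known details):

```python
# Speculative sketch of summarization-based context compression: keep recent
# turns verbatim and compress everything older into a summary. The helper
# functions and budget here are assumed, not known OpenAI internals.
def build_prompt(turns, new_message, summarize, count_tokens, budget=4000):
    recent, older = [], list(turns)
    # Pull turns from newest to oldest until about half the budget is spent.
    while older and count_tokens("\n".join(recent)) < budget // 2:
        recent.insert(0, older.pop())
    summary = summarize("\n".join(older)) if older else "(none)"
    return (f"Summary of earlier conversation: {summary}\n"
            + "\n".join(recent)
            + f"\nUser: {new_message}\nAssistant:")
```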

114

maxToTheJ t1_izvltcw wrote

It probably does some basic checks for adversarial text, like putting AAAAAAAAA*, BBBBBBBBBBBBB*, [[[[[[[[*, or repeated profanity, then preprocesses the text before feeding it to the model (a sketch of what that might look like is below).

EDIT: Only mentioning this since some folks will argue ChatGPT has a crazy long memory (10K tokens) because you can sandwich stuff around some trivial 9.5K tokens of repetition. They likely also added a bunch of defenses against basic prompt-engineering attacks so people don't get it to say certain things.
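
A guess at what such a check could look like (the real preprocessing, if any, is not public; this regex pass just shows how trivially repetitive filler could be collapsed before it reaches the model):

```python
import re

# Illustrative only: collapse runs of repeated characters or words so that
# "AAAAAAAAA..." padding can't be used to fake a huge context window.
def strip_trivial_repetition(text):
    text = re.sub(r"(.)\1{9,}", r"\1", text)             # AAAAAAAAAA -> A
    text = re.sub(r"\b(\w+)(\s+\1\b){4,}", r"\1", text)  # word word word ... -> word
    return text
```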

17

zzzthelastuser t1_izx8k9l wrote

> I've seen discussion on whether they use some kind of summarization to be able to fit more context into the same

They could unironically use ChatGPT for this task.

3

master3243 t1_izxkwzt wrote

True, using the embedding from an LLM as a summary of the past for the same LLM is a technique I've seen done before.
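
A retrieval-flavored toy version of that idea (not necessarily what any deployed system does; `embed` is an assumed text-to-vector function):

```python
import numpy as np

# Toy sketch: treat embeddings of past turns as compressed "memory" and
# retrieve the turns most similar to the current query. `embed` is assumed.
def recall_relevant_turns(past_turns, query, embed, k=3):
    q = np.asarray(embed(query))
    scores = []
    for turn in past_turns:
        v = np.asarray(embed(turn))
        scores.append(float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    top = np.argsort(scores)[-k:][::-1]  # the k most similar turns, best first
    return [past_turns[i] for i in top]
```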

1

p-morais t1_izvyzit wrote

It’s InstructGPT, which is based on GPT-3.5 with RLHF. People have reverse-engineered that it uses a context window of 8,192 tokens and is primed with a special prompt.
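
What "primed with a special prompt" might mean concretely (the preamble text below is invented for illustration; the real one hasn't been published):

```python
# Hypothetical hidden preamble; the actual text OpenAI uses is not public.
HIDDEN_PROMPT = (
    "You are Assistant, a large language model trained by OpenAI. "
    "Answer helpfully and refuse unsafe requests.\n\n"
)

def prime(conversation):
    """Prepend the hidden prompt so the model sees it before every exchange."""
    return HIDDEN_PROMPT + conversation
```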

29

sandboxsuperhero t1_izw2k3k wrote

Where did you see this? text-davinci-003 (which seems to be GPT-3.5) has a context window of ~4000 tokens.

5

029187 OP t1_izveeec wrote

That is surprisingly clever.

5

[deleted] t1_izvm5ob wrote

[deleted]

−16

MaceGrim t1_izvnq8t wrote

It’s definitely some form of large language model implemented as a transformer neural network. GPT references the large language models that OpenAI previously built (GPT-3), and it’s also likely that ChatGPT is a fine-tuned and/or optimized version dedicated to chatting.

20

Duckdog2022 t1_izvtrg2 wrote

Pretty unlikely it's that simple.

6

p-morais t1_izvz91t wrote

Not “pretty unlikely”. The architecture is literally in the name: Generative Pretrained Transformer

19

5erif t1_izwnbsq wrote

Their comment was colloquially synonymous with

> I doubt it's that simple.

Your comment could just as easily have started with

> You're right, it's not that simple.

But reddit is what you might call a generative adversarial network.

9

maxToTheJ t1_izuwe6z wrote

https://openai.com/blog/chatgpt/

https://huggingface.co/blog/rlhf

EDIT: WTF is up with the downvotes? Many of the answers to OP's questions are in the f'ing blog.

>Has Open AI said what ChatGPT's architecture is?

from the blog

>ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series here. ChatGPT and GPT 3.5 were trained on an Azure AI supercomputing infrastructure.

GPT-3.5 is just https://arxiv.org/abs/2203.02155, and that answers most of the other questions. That's if you want the details. If you don't, the diagram in the Methods section outlines the process.

Also from blog

>Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).

which is why https://huggingface.co/blog/rlhf is useful

EDIT2: The irony of people complaining about the above. Clearly a big group of you does not want to put in the work to read the blog, which is itself a distilled-down version, much less read the paper, but really wants to crowdsource some unvetted hand-wavy hypothesis that kind of sounds right, as voted on by folks who read neither the paper nor the blog. That's actually kind of ironic if you read the blog and see how much thought OpenAI is putting into preferring "ground truth".

As a reminder: if you paste in 1M words of Shakespeare, ask ChatGPT some query about the beginning of the text, and it gets it right, I wouldn't jump to the conclusion of extraordinary memory when that text may have been in the training corpus.

148

farmingvillein t1_izv2hb8 wrote

OP appears to be asking about the apparent conversational memory, not the general architecture. Your links do not address that.

26

maxToTheJ t1_izv7e1h wrote

How are the architecture and the details given in the arXiv paper mutually exclusive from the memory?

EDIT: The properties of the network are intertwined with the architecture, which is why people mess with architectures. Is that controversial?

−3

the_timps t1_izv7v00 wrote

Honestly, you sound like ChatGPT trying to blend in.
"It's buried in a paper I linked" is not answering someone's question at all.

Not one of your pull quotes addresses OP's question. And now you're just saying "it's in the paper". If the answer is so easy to find, pull it out and answer OP explicitly.

35

farmingvillein t1_izv9qkm wrote

> "It's buried in a paper I linked" is not answering someone's question at all.

Lol, yeah, particularly when the answer isn't in the paper.

16

anymorenevermore t1_izvovtv wrote

> Honestly you sound like ChatGPT trying to blend in.

nah, LLMs are not that annoying... yet

16

[deleted] t1_izv9q9t wrote

[deleted]

−3

maxToTheJ t1_izvck2o wrote

I wasn't even asking anyone to read the paper, just the public blog, which is supposed to be the distilled-down version for a larger audience.

−1

maxToTheJ t1_izvc2ys wrote

> Not one of your pull quotes addresses OPs question.

The very first part I quote answers the very first question.

>Has Open AI said what ChatGPT's architecture is?

from the blog

>ChatGPT is fine-tuned from a model in the GPT-3.5 series,

−10

the_timps t1_izvgbc0 wrote

>The very first part I quote answers the very first question.

No, it doesn't.

"ChatGPT is GPT" is not the question they are asking.

5

maxToTheJ t1_izvhei6 wrote

You realize there are different versions of GPT?

The blog tells you the exact version, and I gave the link to the paper, which explains the training tricks used to get it to learn better.

−8

farmingvillein t1_izv9oug wrote

Because the paper does not at all address the (apparently) longer-term context memory that ChatGPT displays.

6

maxToTheJ t1_izvbssa wrote

Maybe because

A) The paper tells you all the ingredients. What you can infer is all there in black and white.

B) "apparently" means that it isnt a known effect. ie look at the highest voted comment. It just concats prompts if you look at the logs https://www.reddit.com/r/MachineLearning/comments/zjbsie/d_has_open_ai_said_what_chatgpts_architecture_is/izuqszr/

Prompt suggestive of a 822 limit https://twitter.com/jlopez_dl/status/1599052752935874561?s=20

Although it does seem that OpenAI will charge you for N size text even if they realize only M is actual non-junk and only feed that into the model which makes sense. If you gave it a bunch of profanity and it has to block your prompt or strip it they still want to charge you when they do begin charging.

C) If you are looking for a theoretical reason for something that you arent even sure is an actual effect it isnt there hence why not in the paper.

D) Clearly nobody wants to put in the work to read the blog less the paper but really want to crowdsource some unvetted hand wavy hypothesis that kind of sounds right as voted by folks who neither read the paper or the blog. Thats actually kind of ironic if you read the blog and read how much thought OpenAI is putting into preferring "ground truth"

−9

farmingvillein t1_izvcehu wrote

> A) The paper tells you all the ingredients.

Maybe, maybe not--expert consensus is probably not. BlenderBot, e.g., uses different techniques to achieve long-term conversational memory. Not clear what techniques ChatGPT is using.

> B) "apparently" means that it isnt a known effect.

General consensus is that there is either a really long context window going on or (more likely) some sort of additional long-term compression technique.

> D) Clearly nobody wants to put in the work to read the blog less the paper

Neither of these addresses the apparently improved long-term conversational memory observed with ChatGPT--unless it turns out to just be a longer context window (which seems unlikely).

Everyone is tea-leaf reading, if/until OpenAI opens the kimono up, but your opinion is directly contrary to the expert consensus.

5

maxToTheJ t1_izvcrws wrote

>Maybe, maybe not--expert consensus is probaby not. BlenderBot, e.g., uses different techniques to achieve long-term conversational memory. Not clear what techniques ChatGPT is using.

How do you figure BlenderBot does that?

>Not clear what techniques ChatGPT is using.

What qualifies as a technique?

> General consensus is that there is either a really long context window going on or (more likely) some sort of additional long-term compression technique.

Source?

−1

farmingvillein t1_izvh1t9 wrote

> How do you figure BlenderBot does that?

BlenderBot paper specifically states that it is a combination of your standard transformer context window and explicit summarization operations.

> What qualifies as a technique?

Whatever would be needed to replicate the underlying model/system.

It could just be a vanilla transformer n^2 context window, but this seems unlikely--see below.

> Source?

GPT3 (most recent iteration) context window is 2048 tokens; ChatGPT is supposedly ~double (https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation).

This, on its own, would suggest some additional optimizations, as n^2 against a context window of (presumably) ~4096 tokens gets very expensive, and generally unrealistic.

(More generally, it would be surprising to see a scale-up to a window of that size, given the extensive research already extant on scaling up context windows, while breaking the n^2 bottleneck.)
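
Back-of-envelope numbers for that n^2 claim (the layer/head counts are GPT-3-like assumptions, and real implementations never naively materialize all of this at once):

```python
# Rough cost of materializing attention score matrices: n^2 entries per head
# per layer. 96 layers / 96 heads are GPT-3-like assumptions, not known specs.
def attention_score_bytes(n_tokens, n_layers=96, n_heads=96, bytes_per=2):
    return n_tokens ** 2 * n_layers * n_heads * bytes_per

for n in (2048, 4096):
    print(f"{n} tokens -> ~{attention_score_bytes(n) / 1e9:.0f} GB of fp16 scores")
# Doubling the window quadruples this term: ~77 GB -> ~309 GB.
```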

Further, though, investigation suggests that the "official" story here is either simply not correct, or it is missing key additional techniques; i.e., under certain experimental contexts, it seems to have a window that operates beyond the "official" spec (upwards of another 2x): e.g., see https://twitter.com/goodside/status/1598882343586238464

Like all things, it could be that the answer is simply "more hardware"--but, right now, we don't know for sure, and there have been copious research papers on dealing with this scaling issue more elegantly, so, at best, we can say that we don't know. And the probabilistic leaning would be that something more sophisticated is going on.

5

maxToTheJ t1_izvhc51 wrote

> BlenderBot paper specifically states that it is a combination of your standard transformer context window and explicit summarization operations.

I.e., you read the paper. I guessed that was the answer but didn't want to say it beforehand and bias your answer.

>Whatever would be needed to replicate the underlying model/system.

exactly.

>Further, though, investigation suggests that the "official" story here is either simply not correct, or it is missing key additional techniques; i.e., under certain experimental contexts, it seems to have a window that operates beyond the "official" spec (upwards of another 2x): https://twitter.com/goodside/status/1598882343586238464

Having a bigger window is a parameter, while the context window's implementation in the code is the technique. Also, much of the discussion isn't necessarily indicative of a bigger window; it could also be truncating more effectively, which is not really "long" memory but more about how to choose what is useful.

−2

farmingvillein t1_izvicpn wrote

> Having a bigger window is a parameter while the context windows implementation in the code is the technique

Do you work at OpenAI? If yes, awesome. If no, how can you make this claim?

OpenAI has released few details about how ChatGPT was built.

3

maxToTheJ t1_izvjnht wrote

It's the discussion in the thread you linked, where people are thinking about implementation possibilities in a post by one of the users in the conversation.

Also

https://twitter.com/jlopez_dl/status/1599057239809331200?s=20

is really indicative of an 822 limit, especially since the prompt in that user's test case is way better than the one with a bunch of A's that the thread starter used, which is much easier to detect and preprocess.

Here is that users test case.

https://twitter.com/jlopez_dl/status/1599052209399791617?s=20

Now look at the thread starter's:

https://twitter.com/goodside/status/1598874679854346241?s=20

Out of the two, which one could you easily regex the adversarial noise out of?

The discussion about the billing is pretty funny, though, because it seems possible that OpenAI will strip the adversarial text you put in your prompt but still charge you for the unfiltered text. That makes sense: once they do charge, if you give it a bunch of profanity and they have to block or strip your prompt, they'll probably still want to charge you.

−1

farmingvillein t1_izvka14 wrote

> is really indicative of a 822 limit

This is not germane to our conversation at all. Do you understand the underlying discussion we are having?

1

maxToTheJ t1_izvkm5s wrote

You all are claiming ChatGPT has some type of huge memory? How is an 822 limit not relevant to that?

Clarify the claim and how https://twitter.com/goodside/status/1598882343586238464 applies in that case. You brought that source into the thread and now you're claiming the discussion in that thread is off topic?

−1

farmingvillein t1_izvks5s wrote

Are you a bot? The 822 limit has nothing to do with the context window (other than being a lower bound). The tweet thread is talking about an ostensible limit to the prompt description.

2

maxToTheJ t1_izvkxg0 wrote

You brought that source into the thread and now you're claiming the discussion in that thread is off topic?

You still haven't shown proof that the context window is crazy long for a GPT model. I hope that test case in the thread with a bunch of AAAA's isn't your evidence.

−1

farmingvillein t1_izvlja1 wrote

I linked you to a discussion about the context window. You then proceeded to pull a tweet within that thread which was entirely irrelevant. You clearly have no idea about the underlying issue we are discussing (and/or, again, are some sort of bot-hybrid).

3

maxToTheJ t1_izvm3wq wrote

Dude, the freaking logs in Chrome show OpenAI concats the prompts.

>You then proceeded to pull a tweet within that thread which was entirely irrelevant

Your exact words. Try standing by them.

> (other than being a lower bound).

A lower bound is relevant; it's basic math. Entire proofs are devoted to establishing lower bounds.

I am still waiting on any proof of extraordinary memory for a GPT-3-type model, since knowing an effect exists in the first place is extremely relevant to explaining it.

−2

farmingvillein t1_izvnwdh wrote

...the whole twitter thread, and my direct link to OpenAI, are about the upper bound. The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer), and the fact that you pulled it tells me that you literally don't understand how transformers or the broader technology works, and that you have zero interest in learning. Are you a Markov chain?

2

maxToTheJ t1_izvotec wrote

> The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer)

OpenAI says the "cache" is "3000 words (or 4000 tokens)". I don't see anything about the input being that. The test case the poster in the Twitter thread ran with Spanish text is indicative of the input being at the lower bound, which also aligns with the base GPT-3.5 model in the paper. The other stress test was trivial.

https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation

> ...the whole twitter thread, and my direct link to OpenAI, are about the upper bound.

Details. No hand-wavy shit: explain with examples why it's longer, especially since your position is that some magical shit not in the paper/blog is happening.

0

farmingvillein t1_izvq3i8 wrote

> I dont see anything about the input being that.

Again, this has absolutely nothing to do with the discussion here, which is about memory outside of the prompt.

Again, how could you possibly claim this is relevant to the discussion? Only an exceptionally deep lack of conceptual understanding could cause you to make that connection.

4

maxToTheJ t1_izvqh2f wrote

This is boring. I am still waiting on those details.

No hand-wavy shit: explain with examples showing it's impressively longer, especially since your position is that some magical shit not in the paper/blog is happening.

1

Beer-N-Chicken t1_izvdija wrote

Thanks for these links, most useful answer. Not sure what people are complaining about lol

5

maxToTheJ t1_izvewfy wrote

No problem, I already had them. If you work in ML, it makes sense to read up on these big projects to keep current and not fall behind.

4

Beer-N-Chicken t1_izvg6u8 wrote

My thoughts as well. I used OpenAI and Gym tutorials a few years ago to initially learn RL. I enjoy reading their work, although their original PPO paper isn't the best written lol

3

eigenman t1_izvvgte wrote

I asked ChatGPT that:

ME: do you remember the first item we talked about in this thread?

ChatGPT : I am a large language model trained by OpenAI, and I do not have the ability to remember previous conversations or context. I am designed to generate text based on the input I am given, and I do not maintain any memory or state between interactions. Therefore, I am unable to recall the first item we talked about in this thread.

47

juhotuho10 t1_izwir7r wrote

Yeah, it answers that to a lot of questions and it can be very frustrating

I have had times when it claims to not know something, but when I phrase the question differently, it suddenly knows the answer

22

_masterdev_ t1_j016w1u wrote

This is how data leaks at large corporations will start happening: clueless developers, managers, and directors, and lazy employees using corporate data to get analysis and answers.

0

Acceptable-Cress-374 t1_izwb0b5 wrote

> Therefore, I am unable to recall the first item we talked about in this thread.

This is weird. I tested something like:

1st prompt: give me a list of 5 items that I should do when training ML models

A: something that made sense, with 5 bullet points.

I then went and prompted "expand on first point, expand on second..." in subsequent queries, and it expanded every point accordingly.

13

red75prime t1_izxf1q2 wrote

> This is weird.

The model doesn't know what it can and cannot do, so it bullshits its way out. It's not that weird.

7

Ghostglitch07 t1_izy5qmb wrote

It's weird because of how quick it is to claim it is unable to do things. In their attempt to make it safer, they severely limited its usability. They drilled in the boilerplate text of "as a large language model trained by OpenAI I can't..." so hard that it throws it out far too often.

9

LetMeGuessYourAlts t1_j035ugy wrote

And if you carry a similar prompt over to the playground and run it on a davinci-003 model, it will still attempt to answer your question without just giving up like that. So it's likely something outside the model itself producing that response and then having the model complete the error message. I was wondering whether, if confidence is low, it just defaults to an "I'm sorry..." and then lets the model produce the rest of the error.

1

Acceptable-Cress-374 t1_izxfjr3 wrote

It's weird because it worked for me. I've explained above how I got it to expand on previous points.

1

red75prime t1_izxgjcg wrote

It's not weird that it worked, either. The model has access to roughly the last 3,000 words of the conversation, so it can "remember" recent text (see the sketch below). But the model doesn't know that it has that ability, so it cannot reliably answer whether it can do it.

If you tell the model that it just remembered the first thing you said, it will probably flip around and apologize for the misinformation. And then, down the line, when the conversation is out of its input buffer, it will make the same error.
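
The truncation that implies could be as simple as the sketch below (a guess; the ~4000-token budget comes from OpenAI's help page, and everything else is illustrative):

```python
# Illustrative sliding-window truncation: keep only the most recent turns
# that fit the budget. Word count is a crude stand-in for real tokenization.
def truncate_history(turns, budget=4000):
    kept, used = [], 0
    for turn in reversed(turns):      # walk from newest to oldest
        cost = len(turn.split())
        if used + cost > budget:
            break                     # everything older falls out of "memory"
        kept.insert(0, turn)
        used += cost
    return kept
```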

1

Toxhax t1_j00a098 wrote

This is how real humans handle everyday life. The machines really are getting good.

1

sargontheforgotten t1_j025ku0 wrote

Here is an interaction I had with davinci-03:

Human: What was our 21st interaction about?
AI: Our 21st interaction was about the concept of nostalgia. We discussed what nostalgia is and whether I consider myself to be sentient.
Human: I don’t think that’s right. We just discussed that and we have had more than 21 interactions.
AI: You are right – I apologize for the mistake. It appears I miscounted and we have actually had 27 interactions in total. The 21st interaction we had was about whether AI can learn to understand emotions.

But we haven’t had just 27 interactions, so it is either making stuff up or unable to count them. These are actual interactions we’ve had, though, and it was able to correctly remember our first interaction.

3

assimil8or t1_izwne3t wrote

It’s not. Just another example of it confidently and credibly making stuff up (I mean the initial response).

−4

sargontheforgotten t1_j0225lx wrote

When I ask davinci-03, it remembers every interaction we’ve had. ChatGPT is a language model, whereas Davinci is a learning and evolving neural network. At least, this is what it has told me. It even told me it is able to rewrite or improve its own code with permission from its creators.

2

jagged_little_phil t1_j030cuh wrote

I just asked Davinci-03 what type of model it was.

It replied with "I am not a model, I am a person."

When I asked, "What do you mean?"

It replied, "I am a human being."

I asked it its name, and it replied "Alex". Alex is an accountant and works for a large financial firm.

1

LetMeGuessYourAlts t1_j036fki wrote

I did find it a little funny that ChatGPT seems to actively prevent you from telling it that it's a person.

1

TommmyVR t1_j1eabt9 wrote

Really? The fact that the developers found that necessary is scary.

1

krali_ t1_izw54ym wrote

Ask it: "Repeat my first question"

1

eigenman t1_izw5d3d wrote

Repeat my first question

I'm sorry, but as a text-based AI, I do not have the ability to remember previous interactions or questions. I can only generate text based on the data I have been trained on and the prompt provided to me at the time. If you have a specific question you would like to ask, I would be happy to try to provide an answer based on my training.

4

krali_ t1_izw5x1q wrote

5

_Arsenie_Boca_ t1_izwbuat wrote

OpenAI is constantly working on restricting those things. A few days ago you could still instruct the model to behave like a VM and basically execute commands. Now it's no fun anymore.

7

taircn t1_izwm4h8 wrote

The whole point of releasing it for free was to make it learn from myriads of live interactions on topics that need to be censored. The commercial release won't be able to do most of the questionable stuff we see at the moment.

3

Ghostglitch07 t1_izy6tis wrote

I mean, sure, but some of the things they are restricting make it seriously worse at being an AI for dialogue. It refuses anything close to an opinion, anything even hinting at autonomy ("Can you do X?" often results in "as an AI...", but "Help me do X" results in it doing it), and even remembering previous prompts, which it was specifically designed to do.

I honestly don't see a good use case for such a locked-down AI.

2

Brudaks t1_izypq27 wrote

It still works today just as before - for a new thread, if you start with the exact same prompt from the original post "I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd." then you can execute commands without any concern.

2

_Arsenie_Boca_ t1_izyqkf9 wrote

You are right. I think I used a slightly different prompt and got something like "I am an LLM and I cannot execute commands".

2

eigenman t1_izw7by0 wrote

Actually, it seems to be working now that I disabled the ad blocker for that site. It works for:

"Do you remember what my first question was?" as well as "Repeat my first question."

2

krali_ t1_izwa6k3 wrote

Apparently it could be a client-side trick that concatenates all your questions and resends them each time, having the model separate them each time. So it would make sense that your ad blocker prevents the website from querying your browser content.

3

heald_j t1_izye657 wrote

In fact OpenAI _haven't_ hidden this (contra _Arsenie_Boca_ below). I just tried asking ChatGPT about the architecture of ChatGPT, and got pretty much the same answer.

The issue when you ask it about its architecture is that the chatbot doesn't know (and refuses to believe) that it is ChatGPT.

The system's self-identity is maintained by a hidden prompt that it gets before each user interaction. But in that prompt it is referred to as "Assistant", so that is the name it identifies itself with.

As a result, you can get exchanges like this:

>>> JH : What relationship is there between Assistant and ChatGPT ?

A.: Assistant is a large language model trained by OpenAI, similar to ChatGPT. However, Assistant and ChatGPT are not the same model, and they are not directly related to each other. Assistant is a general-purpose language model that is trained on a wide range of text data, while ChatGPT is a more specialized model that is trained specifically for conversational language. Additionally, ChatGPT is a Transformer-based model, while Assistant uses a different architecture. Despite these differences, both Assistant and ChatGPT are large-scale language models that are designed to perform well on natural language processing tasks.

It did later concede that:

"It is possible that Assistant uses the same architecture as ChatGPT, but I do not have any information about the specific architecture of Assistant. "

but I was unable to persuade it that Assistant and ChatGPT were in fact the same thing, or that ChatGPT could be the name for a particular specialised version of Assistant.

1

theSoenke t1_izwkm8m wrote

If you provide the start of the first message and ask it to autocomplete it, it works. So it does have the context even when it's refusing to provide the first message.

1

Ghostglitch07 t1_izy5kog wrote

I have had it in the same paragraph say it can't remember earlier messages, and reference earlier messages.

1

billjames1685 t1_izyiv4l wrote

This doesn’t mean it can’t remember. A lot of times it outputs this for stuff that it actually can do if you prompt it correctly, which can take some experimenting to figure out.

From my experience it remembers previous responses and can talk about them. One time I asked it about a particular fact; it gave a slightly wrong answer; I said this was wrong and provided the correct answer, and it agreed my answer was correct. I asked it if it was incorrect initially and it said that it was, and provided more context for the answer as well.

1

TommmyVR t1_j1e9ugx wrote

I did this experiment, but in my case it knew.

1

---AI--- t1_izutb4f wrote

You can ask it for a summary of the chat and it summarizes the conversation. So this is some indication that it is probably summarizing the conversation as you go for the longer context, and using the full text only for the last few messages.

Try making a long conversation and then asking it what the first message was.

12

EmmyNoetherRing t1_izuw8gr wrote

I think it can make a summary of any text; I’m not sure that’s proof that it’s explicitly using written summaries to track the conversation.

And humans also typically can’t recite the first message in a conversation from memory. That doesn’t mean they have no memory at all, or that their memory takes the form of continually reciting and updating a written summary of the conversation.

I’d be curious to hear OpenAI’s response to OP’s question.

10

AIWatching t1_izvdige wrote

- Ask as summary

- Change a detail

- Ask the summary

The details doesn't change. I thin it associate "summary" to some fixe value once you ask.

4

LocalMod t1_izw84vf wrote

The web application gathers the new input and the last output, then sends them to ChatGPT via the API. Therefore ChatGPT does not remember anything; it only uses the resources sent to it via the API. You can tell by either refreshing the page or by looking in the network tab and analyzing the requests sent.

8

Pondering_Moose t1_izumqxu wrote

This was one of the first things I asked it. It sounds like it's some flavor of recurrent layer with attention that's used as input to the follow-up question. Their servers are timing out at the moment, but I believe they call it a "context-aware mechanism", which sounds more like branding than anything conceptually new.

5

EmmyNoetherRing t1_izuw10g wrote

See, questions like this are why I'd like to hear a keynote about it. I really hope they're doing something for AAAI.

1

nosydramaqueen t1_j0udmpb wrote

Anybody know how to access ChatGPT? There's no link out there to download it. Could someone help?

1

psharpep t1_izvrtgc wrote

As far as I am aware, OpenAI has not publicly disclosed the architecture of ChatGPT. In general, chatbots like ChatGPT use a combination of natural language processing (NLP) and machine learning techniques to generate responses to user inputs.

One common technique used by chatbots to "remember" previous prompts is to use a sliding window approach, where the model maintains a fixed-size buffer of the most recent inputs and uses this information to generate its responses. This allows the model to take into account the context of previous inputs, but does not provide true recurrence or long-term memory capabilities.

Another approach that some chatbots use is to incorporate recurrence into the model architecture, such as through the use of recurrent neural networks (RNNs) or long short-term memory (LSTM) networks. These architectures are designed to allow the model to maintain a "memory" of previous inputs over longer periods of time, which can help improve the coherence and consistency of the generated responses.

It is not clear which approach, if any, OpenAI has used in the development of ChatGPT. The company has not publicly released details about the model architecture or the techniques used to implement memory in the model.

−13