
maxToTheJ t1_izvhc51 wrote

> BlenderBot paper specifically states that it is a combination of your standard transformer context window and explicit summarization operations.

I.e., you read the paper. I guessed that was the answer, but I didn't want to say it beforehand and bias you against answering that way.
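For context, the mechanism described in that quote can be sketched in a few lines: recent turns stay in the context window verbatim, and everything older gets folded into a summary. This is a minimal illustration, not BlenderBot's actual code, and `summarize` here is a stand-in for a learned summarization model.

```python
# Minimal sketch of "context window + explicit summarization" memory.
# `summarize` is a placeholder for a real abstractive summarizer.

def summarize(turns):
    # Stand-in: a real system would call a summarization model here.
    return "SUMMARY(" + " | ".join(turns) + ")"

def build_context(turns, window=4):
    """Return the text the model would actually condition on."""
    if len(turns) <= window:
        return "\n".join(turns)
    summary = summarize(turns[:-window])  # compress everything older
    return "\n".join([summary] + turns[-window:])
```

The point is that the model's attention window stays fixed; the *system* around it gives the appearance of longer memory.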

>Whatever would be needed to replicate the underlying model/system.

Exactly.

>Further, though, investigation suggests that the "official" story here is either simply not correct, or it is missing key additional techniques; i.e., under certain experimental contexts, it seems to have a window that operates beyond the "official" spec (upwards of another 2x): https://twitter.com/goodside/status/1598882343586238464

Having a bigger window is a parameter, while the context window's implementation in the code is the technique. Also, much of the discussion isn't necessarily indicative of a bigger window; it could also be truncating more effectively, which isn't really a "long" memory but more about how to choose what is useful.
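The distinction being drawn here can be made concrete as two policies over the same fixed budget: naive tail truncation versus keeping the most useful turns. The `score` function below is a made-up placeholder for illustration, not anything OpenAI has published.

```python
# Two ways to fit a conversation into a fixed word budget. Neither
# extends the window; the second just chooses what to keep more
# carefully. `score` is a hypothetical relevance function.

def truncate_tail(turns, budget):
    # Naive policy: keep the most recent turns that fit.
    kept, used = [], 0
    for t in reversed(turns):
        n = len(t.split())
        if used + n > budget:
            break
        kept.append(t)
        used += n
    return list(reversed(kept))

def truncate_by_relevance(turns, budget, score):
    # Smarter policy: greedily keep the highest-scoring turns that
    # fit, then restore chronological order.
    order = sorted(range(len(turns)), key=lambda i: score(turns[i]), reverse=True)
    chosen, used = set(), 0
    for i in order:
        n = len(turns[i].split())
        if used + n <= budget:
            chosen.add(i)
            used += n
    return [turns[i] for i in sorted(chosen)]
```

The second policy can "remember" an old fact that naive truncation would have dropped, without the window being any bigger.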

−2

farmingvillein t1_izvicpn wrote

> Having a bigger window is a parameter while the context windows implementation in the code is the technique

Do you work at OpenAI? If yes, awesome. If no, how can you make this claim?

OpenAI has released few details about how ChatGPT was built.

3

maxToTheJ t1_izvjnht wrote

It's the discussion in the thread you linked, where people are thinking about implementation possibilities in a post by one of the users in the conversation.

Also

https://twitter.com/jlopez_dl/status/1599057239809331200?s=20

is really indicative of an 822 limit, especially since the prompt in that user's test case is much better than the one with a bunch of A's that the thread starter used, which is much easier to detect and preprocess.

Here is that user's test case:

https://twitter.com/jlopez_dl/status/1599052209399791617?s=20

Now look at the thread starter's:

https://twitter.com/goodside/status/1598874679854346241?s=20

Out of the two, in which one could you easily regex out the adversarial noise in the input?

The discussion about the billing is pretty funny, though, because it seems possible that OpenAI will strip the adversarial text you put in your prompt but still charge you for the unfiltered text. That makes sense: when they do charge, if you gave it a bunch of profanity and it has to block or strip your prompt, they will probably still want to charge you for it.

−1

farmingvillein t1_izvka14 wrote

> is really indicative of a 822 limit

This is not germane to our conversation at all. Do you understand the underlying discussion we are having?

1

maxToTheJ t1_izvkm5s wrote

You all are claiming ChatGPT has some type of huge memory. How is an 822 limit not relevant to that?

Clarify the claim and how https://twitter.com/goodside/status/1598882343586238464 applies in that case. You brought that source into the thread, and now you're claiming the discussion in that thread is off topic?

−1

farmingvillein t1_izvks5s wrote

Are you a bot? The 822 limit has nothing to do with the context window (other than being a lower bound). The tweet thread is talking about an ostensible limit to the prompt description.

2

maxToTheJ t1_izvkxg0 wrote

You brought that source into the thread, and now you're claiming the discussion in that thread is off topic?

You still haven't shown proof that the context window is crazy long for a GPT model. I hope that test case in the thread with a bunch of AAAA's isn't your evidence.

−1

farmingvillein t1_izvlja1 wrote

I linked you to a discussion about the context window. You then proceeded to pull a tweet within that thread which was entirely irrelevant. You clearly have no idea about the underlying issue we are discussing (and/or, again, are some sort of bot-hybrid).

3

maxToTheJ t1_izvm3wq wrote

Dude, the freaking logs in Chrome show OpenAI concatenates the prompts.

>You then proceeded to pull a tweet within that thread which was entirely irrelevant

Your exact words. Try standing by them.

> (other than being a lower bound).

A lower bound is relevant; it's basic math. Entire proofs are devoted to establishing lower bounds.
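The lower-bound point both sides are circling can be stated as a toy check: a probe of length n that succeeds only proves the window is *at least* n, and any larger true window (the 4096 below is a hypothetical number) is consistent with that observation.

```python
# Toy illustration of lower bounds from probing. Each successful probe
# length only bounds the true window from below; it never pins down
# the maximum.

def best_lower_bound(successful_probe_lengths):
    # The largest probe that fit is the tightest observed lower bound.
    return max(successful_probe_lengths)

def consistent(true_window, lower_bound):
    # Any true window at least as large as the bound fits the evidence.
    return true_window >= lower_bound
```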

I am still waiting on any proof of an extraordinary memory for a GPT-3-type model, since knowing that something exists in the first place is extremely relevant to explaining it.

−2

farmingvillein t1_izvnwdh wrote

...the whole twitter thread, and my direct link to OpenAI, are about the upper bound. The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer), and the fact that you pulled it tells me that you literally don't understand how transformers or the broader technology works, and that you have zero interest in learning. Are you a Markov chain?

2

maxToTheJ t1_izvotec wrote

> The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer)

OpenAI says the "cache" is "3000 words (or 4000 tokens)". I don't see anything about the input being that. The test case from the poster in the Twitter thread with the Spanish text is indicative of the input being the lower bound, which also aligns with what the base GPT-3.5 model in the paper has. The other stress test was trivial.

https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation
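The "3000 words (or 4000 tokens)" figure implies roughly 0.75 words per token, which matches the common rule of thumb for English text. A back-of-the-envelope converter (an approximation derived from that figure, not OpenAI's actual tokenizer):

```python
# Word/token conversion implied by "3000 words ~ 4000 tokens". Real
# tokenizers vary by language and content; this is only a heuristic.

WORDS_PER_TOKEN = 3000 / 4000  # 0.75

def words_to_tokens(n_words):
    return round(n_words / WORDS_PER_TOKEN)

def tokens_to_words(n_tokens):
    return round(n_tokens * WORDS_PER_TOKEN)
```

By this heuristic, an 822-token input would be on the order of 600 words, well under the stated 4000-token cache.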

> ...the whole twitter thread, and my direct link to OpenAI, are about the upper bound.

Details. No hand-wavy shit: explain, with examples, why it's longer, especially since your position is that some magical shit not in the paper/blog is happening.

0

farmingvillein t1_izvq3i8 wrote

> I dont see anything about the input being that.

Again, this has absolutely nothing to do with the discussion here, which is about memory outside of the prompt.

Again, how could you possibly claim this is relevant to the discussion? Only an exceptionally deep lack of conceptual understanding could cause you to make that connection.

4

maxToTheJ t1_izvqh2f wrote

This is boring. I am still waiting on those details.

No hand-wavy shit: explain, with examples, how it's impressively longer, especially since your position is that some magical shit not in the paper/blog is happening.

1